UPDATE 3/2017: Embedly has removed the free tier of the Extract API described in this post. From their announcement:
“The biggest change comes in removing our free API tiers.
On April 17th, we will block all free API key traffic.”
Looking at other websites can give you ideas for your own marketing and content. Competitive analysis gives you the opportunity to see what resonates with your audience. It shows what opportunities you have to create content, and where to focus your efforts. Knowing what works for the competition is a way to confirm your own ideas.
Part of the relaunch of this site was creating a high-level content strategy for my blog. I say high level because I didn’t go into the level of detail I have for clients in the past.
All I want is a roadmap to get me started and keep me on track.
I picked several popular WordPress bloggers and analyzed their posts from last year. I wanted to know the different topics they write about, the types of posts they publish, and how often.
Even though this is competitive analysis, I don’t consider them competition. I’m not trying to win clients away from them. I chose people in the same space I want this blog to be in.
My plan isn’t to copy them, but to use their work for inspiration. You won’t see my own 30 days of membership plugin reviews, but I might do a similar series.
To do the analysis I needed to:
- Get a list of all posts from last year
- Get the title, URL, description, and keywords for each post
- Manually assign a Topic and Type to each post
- Get the relative topic and type counts
Collecting the Data
Getting the list of posts from last year isn’t as easy as it sounds.
An RSS feed isn’t enough, since it only holds a limited number of posts (usually 10 or 20). I didn’t want to have to page through the site either.
XML Sitemaps to the rescue!
Every site I analyzed was using Yoast SEO for sitemaps. Yoast SEO creates a sitemap_index file containing one or more sitemaps. It groups them by posts, pages, or other custom post types.
To get a list of posts I used the URL of the post-sitemap.xml file. A blog with a lot of posts might have more than one post sitemap file. If so, the one with the highest number in the filename or the most recent last-modified date will give you the latest posts.
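Here is a minimal Python sketch of picking the newest post sitemap out of a Yoast sitemap index. It assumes the standard sitemaps.org namespace and Yoast’s post-sitemap.xml, post-sitemap2.xml naming pattern. It goes by the number in the filename, not the last-modified date. `latest_post_sitemap` is a hypothetical helper and is not part of Yoast.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Standard sitemaps.org namespace used by sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed Yoast naming: post-sitemap.xml, post-sitemap2.xml, post-sitemap3.xml, ...
POST_SITEMAP_RE = re.compile(r"/post-sitemap(\d*)\.xml$")


def latest_post_sitemap(index_xml: str) -> Optional[str]:
    """Return the highest-numbered post sitemap URL listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    best_url, best_num = None, 0
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        url = (loc.text or "").strip()
        match = POST_SITEMAP_RE.search(url)
        if not match:
            continue  # skip page-sitemap.xml, category-sitemap.xml, etc.
        # The unnumbered file is the first one in the series.
        num = int(match.group(1)) if match.group(1) else 1
        if num > best_num:
            best_url, best_num = url, num
    return best_url
```

You would download the site’s sitemap_index.xml yourself and pass its text in.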
Using the Embedly API
In the past I’ve used the DiffBot API for text extraction. The main issue with DiffBot is there’s no free version, only a 14-day trial. I wanted something I could use from time to time to analyze other blogs to refine my strategy.
Instead I’m using the Embedly API. Embedly gets all the data I need and lets you extract 5,000 URLs per month for free.
Embedly is an app that lets developers embed content from third-party content providers. It works roughly the same way as the oEmbed support built into WordPress. When you add a YouTube link, WordPress will embed the player automatically. Embedly does the same for more than 200 different services.
Embedly also has an Extract API that lets you pull content from any page. Page data includes article text, images, titles, keywords, entities and much more.
For my analysis I was using the title, description, keywords and entities.
You can think of Entities as proper nouns, like brands, products, or people. I got mixed results with Entities, but it wasn’t a key factor in my analysis.
I wrote a .Net console app to parse the sitemap, get the post URLs, and filter them by date. Once I had a list of URLs for last year’s posts, I passed them into the Embedly API.
The Embedly Extract API has a rate limit of 15 URLs per second, but it lets you pass URLs in batches of ten at a time. To keep it simple, I looped through the URLs, passing in 10 at a time, and paused for a second after each request. I wasn’t too concerned about performance here, so a pause did the trick.
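Here is a rough Python version of the parse-and-filter step. It is a sketch and not the .Net app itself. It assumes each <url> entry has a <lastmod> value in W3C Datetime format, which starts with the four-digit year. Keep in mind that lastmod is a last-modified date, not a publish date, so an older post that was updated last year would also match. `post_urls_for_year` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard sitemaps.org namespace used by <urlset> sitemaps.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def post_urls_for_year(sitemap_xml: str, year: int) -> List[str]:
    """Return post URLs whose <lastmod> falls in the given year."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # W3C Datetime values begin with YYYY, so compare the first four characters.
        if loc and lastmod[:4] == str(year):
            urls.append(loc)
    return urls
```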
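Below is a sketch of that batching loop, and it rests on several assumptions. The endpoint `https://api.embed.ly/1/extract`, the `key` parameter, and a `urls` parameter that takes a comma-separated list of individually URL-encoded URLs all come from memory of Embedly’s v1 API. Check them against current documentation, especially since the free tier is gone. The HTTP call is passed in as a function, so you can try the batching and pausing without a key. All function names here are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

# Assumed endpoint for Embedly's Extract API (v1); verify before use.
EXTRACT_ENDPOINT = "https://api.embed.ly/1/extract"
BATCH_SIZE = 10  # Extract accepts up to ten URLs per request


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_extract_url(api_key: str, batch: List[str]) -> str:
    # Each URL is percent-encoded on its own, then joined with literal commas.
    encoded = ",".join(urllib.parse.quote(u, safe="") for u in batch)
    return f"{EXTRACT_ENDPOINT}?key={urllib.parse.quote(api_key)}&urls={encoded}"


def fetch_batch(api_key: str, batch: List[str]) -> List[dict]:
    """Request one batch; a multi-URL call is assumed to return a JSON list."""
    with urllib.request.urlopen(build_extract_url(api_key, batch)) as resp:
        return json.load(resp)


def extract_all(
    urls: List[str],
    fetch: Callable[[List[str]], List[dict]],
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Send URLs in batches of ten, pausing after each request."""
    results: List[dict] = []
    for batch in batches(urls):
        results.extend(fetch(batch))
        sleep(pause)  # crude rate limiting: wait after every request
    return results
```

A real run would look like `results = extract_all(post_urls, lambda batch: fetch_batch("YOUR_API_KEY", batch))`. A fixed one-second pause keeps you well under 15 URLs per second. It is simple, not fast.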
I saved the results to a CSV file for the next step: a manual review in Excel.
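Flattening the results into a CSV might look like the sketch below. It assumes the Extract response uses the field names `url`, `title`, `description`, `keywords`, and `entities`. It also assumes keywords and entities come back as lists of objects that each carry a `name`. Names are joined with semicolons so each list fits in a single cell. The helper names are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

FIELDS = ["url", "title", "description", "keywords", "entities"]


def _names(items) -> str:
    # Assumed shape: a list of objects that each carry a "name" field.
    return "; ".join(obj["name"] for obj in items or [])


def write_results_csv(results: Iterable[dict], out: TextIO) -> None:
    """Write one row per Extract result with the fields used for the review."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for item in results:
        writer.writerow({
            "url": item.get("url") or "",
            "title": item.get("title") or "",
            "description": item.get("description") or "",
            "keywords": _names(item.get("keywords")),
            "entities": _names(item.get("entities")),
        })


def results_to_csv_string(results: Iterable[dict]) -> str:
    """Same as write_results_csv, but return the CSV text instead."""
    buf = io.StringIO()
    write_results_csv(results, buf)
    return buf.getvalue()
```

To write a file instead, open it with `newline=""` (as the csv module recommends) and pass the handle to `write_results_csv`.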
Assign Post Topics and Types
Once I had the Embedly API results I went through each row and assigned a topic and type to that post.
Topics: This is the subject matter of the post. I broke them down into about 15 different topics. The most popular were:
- Development
- WordPress
- Business
- Content
- Design
- Tools
Other topics included Themes, Membership Sites, Accessibility, and Social Media.
Types: I determined the type by looking at the title, description and to a lesser degree, the keywords.
- How To
- Advice
- Self
- Product/Course
- Interview/People
How To posts are instructions on how to do something specific. This could be how to do an advanced meta query or documentation for a custom plugin. Advice posts are more about the why.
Job changes, personal achievements, or slides from a speaking event are “self” posts. Remember, I looked at the personal sites of these WordPress pros so this is the proper venue for that.
The people whose sites I analyzed also offered products like themes, plugins, or courses. I kept posts about their products separate from posts about themselves.
Seeing the number of profiles or interviews of other people on these sites took me a bit by surprise. One of the great things about WordPress is the community. It was nice to see how often these bloggers talked about someone else in the WordPress world.
Other types were product reviews, special offers, and curation/round-up posts.
Analyzing the Results
Once I had the topics and types I wanted to see which were most popular at a glance.
I could have created fancy charts, but a wordcloud gives me what I want to know. I could get a count of each topic and type by counting duplicates in a list, but I wanted something visual.
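If you want the numbers rather than the picture, Python’s `collections.Counter` can do the duplicate counting. This sketch also returns each label’s share of the total, which gives you the relative counts. `topic_shares` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple


def topic_shares(labels: List[str]) -> List[Tuple[str, int, float]]:
    """Count each label and its share of all posts, most common first."""
    counts = Counter(label.strip() for label in labels if label.strip())
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]
```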
First, I needed to convert a column from the spreadsheet into comma-separated values.
Once I had the CSV, I could pass it to WordCloud Creator to build a wordcloud, which showed me the most popular topics and types at a glance.
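You can also script the step of turning a column into one comma-separated line. This sketch assumes the review sheet was exported to CSV with a header row. It also assumes a column named something like Topic or Type, which are hypothetical names. Empty cells are skipped, and `csv.writer` quotes any value that itself contains a comma. The sketch does not address how a particular wordcloud tool handles multi-word labels such as How To.

```python
import csv
import io


def column_as_csv_line(csv_text: str, column: str) -> str:
    """Join every non-empty value in one column into a single CSV line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [(row.get(column) or "").strip() for row in reader]
    buf = io.StringIO()
    csv.writer(buf).writerow([v for v in values if v])
    return buf.getvalue().rstrip("\r\n")
```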
For example, if you look at the Topics Carrie Dils writes about, you get:
The Types of posts Tom McFarlin publishes are:
The Embedly API also returns keywords for a post, so I analyzed the groups of keywords for each site too. I don’t know the exact algorithm Embedly uses to extract keywords, but I found they lined up with the Topics. If “Themes” is a popular topic, the keywords include things like Genesis and theme names.
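One way to group keywords across a whole site is sketched below. It assumes each result carries a `keywords` list of objects with `name` and `score` fields. The sketch sums the scores for each lowercased keyword and keeps the top few. Summing scores is only an illustrative choice and does not reflect how Embedly ranks keywords. `top_keywords` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(results: Iterable[dict], n: int = 10) -> List[Tuple[str, float]]:
    """Sum keyword scores across all posts and return the n highest."""
    totals: Counter = Counter()
    for item in results:
        for kw in item.get("keywords") or []:
            # Lowercase so "Genesis" and "genesis" count as one keyword.
            totals[kw["name"].lower()] += kw.get("score", 1)
    return totals.most_common(n)
```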
So how will I use this data?
Again, I’m not trying to mimic these sites. This was to give me a guide and inspiration. I wanted to know a few things about how they write to help me set my own goals.
- How often do they post?
- Which topics are most popular?
- What types of posts are most common?
Through this analysis I was able to define a content strategy with only three to five topics.
You should have a primary topic, a secondary topic, and some misc topics.
From my analysis I found that the primary topic gets about 50% of posts. The secondary topic makes up about 30%, and the misc. topics split the remaining 20% evenly.
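To turn the 50/30/20 split into a posting plan, multiply it out. Take a hypothetical 52 posts a year with two misc topics. That gives 26 primary posts, about 15.6 secondary posts, and about 5.2 posts for each misc topic. Here is the same arithmetic as a small Python sketch (`plan_split` is a hypothetical name):

```python
from typing import Dict


def plan_split(posts_per_year: int, misc_topics: int) -> Dict[str, float]:
    """Split a year's posts 50/30/20, dividing the 20% evenly among misc topics."""
    if misc_topics < 1:
        raise ValueError("need at least one misc topic")
    plan = {
        "primary": posts_per_year * 50 / 100,
        "secondary": posts_per_year * 30 / 100,
    }
    for i in range(1, misc_topics + 1):
        plan[f"misc {i}"] = posts_per_year * 20 / (100 * misc_topics)
    return plan
```

With three to five topics in total, there are one to three misc topics. The more misc topics you have, the thinner each one gets.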
I laid out my content plan the same way. My primary topic is obviously WordPress. That covers development, plugins, and how to use it to grow a business.
My secondary topic is content strategy. I’ll also write about speaking, remote work, book reviews, and more.
Going Forward
While this analysis gives me a good starting point, it’s something I’ll review and optimize as I create more content. Analyzing my own content will tell me if I’ve stayed on course.
And if you’re wondering, I’d classify this as a How To post on the topic of Content.