More than 20 years ago, pioneering SEOs like Rand Fishkin, Brett Tabke, Bruce Clay, Jill Whalen, and Matt Cutts wrote some of the first SEO best-practice checklists. Google was barely even a company yet, and SEOs were debating whether subdomains or folders were better for ranking purposes.
It’s been over 20 years since those days. Rand and Jill don’t do SEO anymore, and Matt left Google over a decade ago and hasn’t even tweeted in almost three years – yet much of the community is still using those same checklists and still putting those same former SEOs at the top of its “SEOs to follow” lists.
WHY?
Why hasn’t the industry evolved as fast as the search engines have? What’s wrong with our approach?
Let’s take the subdomain/directory argument mentioned above. For a long time now Google has told us “it doesn’t matter.” It used to matter back when Google was a set of hard-coded rules, but Google has since figured out how to decide algorithmically – first based on link patterns, and more recently based on website vectors and navigational search patterns. I love that some SEOs are still testing this, but “it doesn’t matter” is exactly why all our “tests” keep reaching different conclusions – because it truly doesn’t matter. The URL is no longer the signal. Search engines have evolved. The same is true of lots of other things (TF-IDF, Keyword Density, PageRank, etc.). More on all that later.
The SEO Industry has evolved, but the SEO hasn’t.
Thanks to too many people to mention, the industry HAS grown up. SEO is taken seriously at almost every Fortune 500 company, commands high dollars, and makes an impact for large and small businesses alike. We’ve definitely come a long way from the days of blackhat forums and “tricking the algorithm” to a place where Mark Cuban regularly mentions the industry on Shark Tank. That’s not a small step.
So why are we still talking about TF-IDF, Keyword Density, Domain Authority, Title tag length, and every other outdated SEO metric from the late 90s?
In our race to grow up, we commoditized too much. We borrowed the checklists that worked at the time, trained teams, created product offerings, and sold them to the C-suite. Somewhere along the line we stopped learning, we stopped testing, we stopped understanding how a search engine worked and why our tactics worked – and the original checklist authors retired and moved on.
No time like the present.
It’s pretty clear that search no longer works like it did 20 years ago. Why is that? It’s a phenomenon AJ Kohn calls “Goog enough,” but there’s more to it than that. Back in Matt Cutts’ day, search was mostly lexical and procedural. What does that mean? It means the engines were mostly looking at the words on the page (lexical) in a formulaic way with a clear set of instructions (procedural). While today’s search engines haven’t fully abandoned those approaches, they’ve made the full shift to more adaptive (machine learning), semantic (word context, e.g. vectors) models. It’s a complete shift in how search works – a complete re-write of the core algorithms we track so closely.
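To make “lexical and procedural” concrete, here’s a toy sketch – a caricature I wrote purely for illustration, not any engine’s real code: count the query terms on the page, apply a fixed weight to title matches, and add it all up. That’s roughly the mental model most of our checklists still assume.

```python
# A caricature of a lexical, procedural ranker: hard-coded rules over raw term counts.
# Purely illustrative -- no modern engine scores pages this way.

def lexical_score(query: str, title: str, body: str) -> float:
    query_terms = query.lower().split()
    title_terms = title.lower().split()
    body_terms = body.lower().split()

    score = 0.0
    for term in query_terms:
        score += 3.0 * title_terms.count(term)  # fixed rule: a title match counts triple
        score += 1.0 * body_terms.count(term)   # fixed rule: a body match counts once
    return score

print(lexical_score("buy running shoes",
                    "Running Shoes on Sale",
                    "Buy the best running shoes for road and trail."))
```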
But we haven’t re-written our checklists to match. We’re still thinking about search in a lexical, procedural way. Most of our tools are still using outdated lexical metrics like Keyword Density and TF-IDF, or nonsensical metrics like “LSI keywords.” That last one (LSI keywords) is a real pet peeve, because the SEO intent behind it is in the right place, but the method stops short of the vectors and transformers it should be reaching for.
Search engines haven’t abandoned lexical models completely. We saw that all over the Yandex code leak – but Yandex wasn’t using the metrics above. It was using the more modern BM25 – along with BM25F, a field-weighted version that scores different parts of the page differently – all over its initial posting lists and rankers, before handing things off to the vector/ML stages.
Is BM25 very similar to TF-IDF and Keyword Density? YES it is – see the definition below, adapted from Wikipedia – but that’s not the point. The point is that we could have been using it for years, and so far almost nobody is. There are literally dozens of Python libraries that will compute it for you.
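For reference, here’s the standard Okapi BM25 scoring function (the form Wikipedia gives), where f(qᵢ, D) is how often query term qᵢ appears in document D, |D| is the document length, avgdl is the average document length in the collection, and k₁ and b are tunable constants:

\[
\mathrm{score}(D,Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D) + k_1\left(1 - b + b\cdot\frac{|D|}{\mathrm{avgdl}}\right)}
\]

And here’s how little code it takes to compute it yourself – a minimal sketch using the open-source rank_bm25 package, with a made-up corpus and query purely for illustration:

```python
# Minimal BM25 sketch using the open-source rank_bm25 package (pip install rank-bm25).
# The corpus and query are made-up examples, not real SERP data.
from rank_bm25 import BM25Okapi

corpus = [
    "best running shoes for flat feet",
    "how to choose a trail running shoe",
    "mechanical keyboard buying guide",
]
tokenized_corpus = [doc.split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)      # builds the term statistics (IDF, doc lengths)
query = "running shoes for flat feet".split()

print(bm25.get_scores(query))           # one BM25 score per document in the corpus
```

(Roughly speaking, BM25F – the field-weighted flavor Yandex used – applies different weights to title, body, anchors, etc. before running the same math.)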
Google started using Word2Vec – taking advantage of neural word embeddings and the vector space model – as early as 2013, but it’s only recently that I’ve seen SEOs talk about it and start to use it. Vectors let us understand “meaning” by comparing them with each other using cosine similarity. It’s why we can rank for what many SEOs call “LSI keywords” without having them on the page – because search engines use the vector model to understand that our content is close enough to those terms’ meanings.
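You don’t need Google’s infrastructure to play with this. Here’s a tiny sketch using pretrained vectors loaded through gensim – the small GloVe model below is just a convenient stand-in for Word2Vec-style embeddings, not anything a search engine actually ships:

```python
# Word-vector sketch using pretrained embeddings via gensim (pip install gensim).
# "glove-wiki-gigaword-50" is a small downloadable stand-in for Word2Vec-style vectors.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")     # downloads the pretrained vectors on first run

# Cosine similarity between vectors approximates relatedness of meaning
print(wv.similarity("shoes", "sneakers"))   # relatively high
print(wv.similarity("shoes", "taxes"))      # much lower

# Nearest neighbors in vector space -- roughly what people mislabel "LSI keywords"
print(wv.most_similar("laptop", topn=5))
```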
Google’s BERT was open sourced in 2018, but so far I don’t see any other (more on that later) SEO tools out there that actually use it. TW-BERT, passage-level BERT, and all the other derivatives unlock so much SEO power they should be illegal. Why isn’t anybody using them? Is it because of the coding barrier, or because nobody has updated the checklist, or because we haven’t set aside the time to keep learning? Sure, we can’t do machine learning at the same scale as Google/Bing/Yandex, and we don’t have access to clicks – but we can definitely do the other stuff. So what’s stopping us?
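The coding barrier, at least, is lower than most people think. This is all it takes to run an open-source BERT model locally through Hugging Face’s transformers library – the model and example sentence here are just my illustrative choices:

```python
# Contextual predictions from an open-source BERT model
# (pip install transformers torch). Illustrative only.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the whole sentence, so the [MASK] prediction is context-dependent
for result in fill("The best running [MASK] for flat feet have extra arch support."):
    print(result["token_str"], round(result["score"], 3))
```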
Where is the appetite? When we pitch these topics to conferences, many of them say the same thing: “Our attendees won’t be interested in that. It’s too technical; it’s not in a bulleted-checklist format.” That’s not the conferences’ fault – their job is to give the audience what it wants. My job with this article is, hopefully, to make those audiences want more.
So Let’s Build The Future
I wanted a tool that looked at SEO the same way search engines do. I looked around a bit and couldn’t find one (at least not all in one place, or not at an affordable price – some tools ARE doing cool stuff, but it’s still not enough for what I wanted).
So I decided to do something about it. After spending the last year reading countless information retrieval textbooks, re-learning Python, poring over the Yandex and Google leaks with my friend Mike King, and reading more ACM research than any human ever should, I decided there’s no reason we can’t use the same metrics in our SEO tools – so I built one. It’s available at SERPrecon.com and I’d love for you to check it out. Plans start at just $10/month and everybody gets a 7-day, 5-credit free trial. It’s important to me that it be affordable.
What’s different? For starters, we can compare your website to the websites that rank, using the methods above: cosine similarity, BM25, a version of PageRank, and so on. We can then monitor search results over time and see exactly WHY rankings changed. We can optimize copy against both the lexical and semantic models BEFORE it gets posted and crawled. We can extract the relevant competitive keywords using BERT and even predict what text the search engine will show as a snippet using passage-level BERT.
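To give a flavor of how snippet prediction can work – this is a rough sketch of the general idea, not SERPrecon’s actual code – you can split a page into passages, score each one against the query with an open-source passage-ranking model, and take the winner:

```python
# Passage-scoring sketch with an open-source cross-encoder trained on MS MARCO
# (pip install sentence-transformers). A rough illustration, not SERPrecon's code.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "do running shoes help flat feet"
passages = [
    "Our company was founded in 1987 and is based in Michigan.",
    "Shoes with structured arch support can reduce overpronation for runners with flat feet.",
    "Free shipping on all orders over $50.",
]

scores = model.predict([(query, p) for p in passages])   # one relevance score per passage
best_score, best_passage = max(zip(scores, passages))
print(best_passage)   # the passage most likely to be quoted for this query
```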
Some other tools out there analyze intent with rules like “if the keyword contains this word…”. That might work fine, but it’s definitely NOT how a search engine does it. It took some time, but using vectors and machine learning I was able to build something much better at guessing intent.
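If you want to experiment with the general approach yourself, here’s a minimal sketch (my own illustration, not SERPrecon’s actual model): embed a handful of labeled example queries and train a small classifier on the vectors.

```python
# Minimal sketch of vector-based intent classification (not SERPrecon's actual model).
# Requires: pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("all-MiniLM-L6-v2")

train_queries = [
    "buy nike pegasus 40",            # transactional
    "nike pegasus 40 price",          # transactional
    "how to clean running shoes",     # informational
    "what is heel drop in shoes",     # informational
]
labels = ["transactional", "transactional", "informational", "informational"]

# Embed the queries and fit a tiny classifier on the resulting vectors
clf = LogisticRegression(max_iter=1000).fit(model.encode(train_queries), labels)

new_queries = ["cheapest trail runners near me", "why do my shoes squeak"]
print(dict(zip(new_queries, clf.predict(model.encode(new_queries)))))
```

In real life you’d want far more labeled examples and more intent classes, but the point stands: the query’s meaning lives in the vector, not in a hard-coded word list.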
But this isn’t (just) an ad for SERPrecon – it’s a call to action. As much as I want you to sign up for my tool, that’s not the point of this article. It’s time for our industry to throw away the old checklists and evolve. Let’s not just stop putting Matt Cutts at the top of our “best SEOs” lists – let’s stop writing those lists altogether. Let’s stop arguing about whether subdomains or directories are better and start understanding WHY “it doesn’t matter” really is the right answer.
It’s time to update our mental model of a search engine and our ways of working. It’s time to admit that we can’t “optimize” something we don’t understand. Let’s start doing real science, real marketing, and real information retrieval. It’s never been easier than it is right now. I’m sure there are several other SEOs out there innovating with information retrieval techniques. I’ve met a few of you, but I’d love to meet the rest.
Ryan Jones runs the SEO practice at Razorfish. As a software engineer turned SEO, Ryan is able to understand sites holistically and solve problems from all angles. He’s a regular speaker at several conferences (Pubcon, SMX, State of Search, Digital Summit, etc.) and a columnist at all the major SEO publications. He recently launched his own SEO tool, SERPrecon.com. When he’s not doing SEO he’s usually playing hockey – and he’s ranked in the top 100 in the state of Michigan in cornhole.