December 6, 2016

Tactical Keyword Research in a RankBrain World

Keyword Research | Advanced SEO

Summary: RankBrain represents a more advanced way of measuring relevance, built on teaching machines to discover the relationships between words. How should RankBrain change our approach to SEO and specifically to keyword research?

This story starts long before RankBrain, but the action really kicked in around May of 2013, when Google announced conversational search for desktop. At the time, voice search on desktop may have seemed like a gimmick, but in hindsight it was a signal that Google was taking natural language search seriously. Just a few months later the Hummingbird update rewrote Google's core engine, and much of that rewrite was dedicated to dealing with natural language searches.

Why should you care about voice? For most sites, voice is still a relatively small percentage of searches, and you've got other priorities. Here's the problem, illustrated by the most simplistic Google algorithm diagram I've ever created...

If there were two algorithms – one for text search and one for voice search – then, yes, maybe you could drag your feet. The reality, though, is that both text and voice search are powered by the same core algorithm. Every single change Google has made to adapt to natural language searches impacts every search, regardless of the source. Voice has already changed the search landscape irreversibly.

Natural language in action

You may be skeptical, and that's understandable. So, let's take a look at what Google is capable of, right now, in 2016. Let's say you wanted to find the height of Seattle's iconic Space Needle. As a seasoned searcher, you might try something short and sweet, like this...

"Space Needle height"

Google understands this question well enough to attach it to the corresponding Knowledge Graph entity and return the following:

The corresponding organic results appropriately match the informational query and are about what we've come to expect. Google serves this search reasonably well.

"What is the height of the Space Needle?"

Let's try to shake off our short-form addiction and try a natural language version of the same search. I won't repeat the screenshot, because it's very similar, as are the organic results. In 2016, Google understands that these two searches are essentially the same.

"How tall is the Seattle Space Needle in meters?"

Let's try another variant, switching the "What" question for a "How" question, adding a location, and giving it a metric twist. Here's what we get back:

Google understands the question and returns the proper units. While the organic results vary a bit on this one, reflecting the form of the question, the matches remain solid. Natural language search has come a long way.

Build great concepts!

This all may be a bit alarming, from a keyword research perspective. Natural language searches represent potentially thousands of variants of even the simplest queries. How can we possibly operate on this scale as search marketers?

The popular notion is that we should stop targeting keywords and start targeting concepts. This approach has a certain logic. The searches above share a general notion of "tallness," which might look something like this:

"Tall" and "height" are fairly synonymous, words like "size" and "big" are highly related, and units like "feet" and "meters" round out this concept. In theory, this makes perfect sense.

In practice, the advice to target concepts is a bit too much like saying "build great content." It's a good goal, in theory, but it's simply not actionable. How do we build great concepts? We all intuitively understand what a concept is, but how does this translate into specific search marketing tactics?

There's an even bigger problem, and I can illustrate it with one box:

Ok, one box, a logo, and two buttons. At the end of the day, you can't type a concept. Search users, whether they're typing or speaking, have to put words into that box. So, how do concepts, which we all agree exist and are useful, translate into keywords, which I hope we can all agree are still unavoidably necessary?

Language in action, part 2

We need to take a side path on this journey for a moment. Part of rethinking keyword research is understanding that we're no longer bound by an exact-match world. This isn't a bad situation to be in, just a complex one. I'd like to tell a story with examples, showing just how far Google has come in understanding the ways that different keywords relate to each other...

Plurals ("scarf" & "proxies")

While we all know the dangers of keyword stuffing, it originated out of a certain necessity. Search engines simply weren't capable of equating even simple terms, like plurals. Those days are long behind us. Google understands, for example, that a search for "scarf" should also return results for "scarves":

In these examples, I'll be using Google's own highlighting (the bold text; I've added the green boxes) to show where Google seems to understand equivalence or related concepts. Of course, Google's core relevance engine and highlighting engine are not exactly the same, but I think it's safe to say that the latter is a useful window into the former.

Google is also fully capable of understanding the reverse. Let's say, for example, that a "friend" of mine wants to buy proxy IPs. He might search for "proxies":

Google can easily understand even irregular plurals in both directions.

Stemming ("ballroom dancer")

Plurals are relatively easy. Let's step it up a little. Another frequent problem in search is dealing with stemming, which relates to root words and the forms they can take, such as "run" vs. "running." Here's a sample search for "ballroom dancer":

Google is perfectly capable of equating "dancer" to other forms of the word, including "dances," "dance," and "dancing." Once again, keyword stuffing is at best outdated thinking.

Abbreviations ("Dr. Who")

Can Google recognize common abbreviations? Let's try a search for our second-favorite doctor (hint, hint, wink), "Dr. Who":

Google easily makes the connection between "Dr." and "Doctor." Interestingly, none of the organic titles or snippets I see on page one contain the word "Dr."

Acronyms ("SNL skits" & "TARDIS")

How about acronyms? Here's a search for "SNL skits":

Google has no problem interpreting "SNL" as equivalent to "Saturday Night Live." Interestingly, they also understand that "skits" is synonymous with "sketches." What if we spell out an acronym that isn't usually spelled out, such as "Time And Relative Dimension In Space"?

Here, Google is happy to tell us "Hey, nerd, just say 'TARDIS' like everyone else." The six-letter acronym is interchangeable with even the much longer search string.

Acronyms+ ("NJ DMV")

This is where things get interesting. Here's a search for "NJ DMV." Look closely:

Not surprisingly, Google understands that "NJ" equals "New Jersey." There's a problem with this search, though – New Jersey doesn't call its motor vehicle office the DMV; it calls it the MVC (Motor Vehicle Commission). Google understands not only how to expand an acronym, but also that the acronyms DMV and MVC are conceptually equivalent.

Synonyms ("discount airfare")

The flip-side of no longer being confined to exact-match keywords is that you might just be finding yourself faced with a lot more competition for any given keyword. Let's look at a competitive, commercial query, such as "discount airfare":

Here, "discount airfare" gets matched to "airfare deals," "discount tickets," and "cheapest flights," with even more variations on the rest of page one.

Synonyms+ ("upscale department stores")

Wait, it gets worse. Google can go beyond traditional synonyms. Consider this search for "upscale department stores" (run from my home-base in the Chicago suburbs):

Not only does Google recognize that "upscale" is synonymous with "luxury," but they've matched on actual examples of luxury department stores, including Bergdorf Goodman, Saks Fifth Avenue, and more.

Answers ("Doctor Who villains")

We've moved from simply synonyms to a world of answers. Here's another example, a search for "Doctor Who villains":

It's a parlor trick to tell you that "villains" is synonymous with "monsters" and "enemies." What you really want to know is that Doctor Who's rogues' gallery includes Daleks, Cybermen, and Weeping Angels. Google can make this connection.

These aren't just exceptions

It's easy to cherry-pick examples, but are these edge cases or the new normal? I ran an analysis on 10,000 keywords (page one only) and found that only 57% of results had the search phrase in both the title and snippet. I used a pretty forgiving match (allowing for plurals, for example) and the keyword set in question is mostly shorter terms, not long-tail queries. I also allowed the terms to occur in any order. Keep in mind, too, that display snippets aren't always META descriptions – they're chosen by Google to be good matches.

All of this is to say that, even with a fairly forgiving methodology and a loose definition of a "match," just over half of page-one results in my data set matched the search query. The examples above are not outliers – they are our immediate, unavoidable SEO future.
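To make the idea of a "forgiving match" concrete, here is a minimal Python sketch of that kind of check. It is not the code behind the analysis above; the plural handling is deliberately crude, and the function names and sample results are assumptions made up for illustration.

```python
import re

def normalize(word):
    """Lowercase a word and crudely strip common plural endings (illustrative only)."""
    word = word.lower()
    if word.endswith("ves") and len(word) > 4:
        return word[:-3] + "f"   # scarves -> scarf
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"   # proxies -> proxy
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]         # stores -> store
    return word

def tokens(text):
    """Split text into a set of normalized words, ignoring punctuation."""
    return {normalize(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

def phrase_matches(query, text):
    """True if every query word appears in the text, in any order."""
    return tokens(query) <= tokens(text)

def match_rate(query, results):
    """Share of (title, snippet) pairs where both contain the whole query."""
    if not results:
        return 0.0
    hits = sum(
        1 for title, snippet in results
        if phrase_matches(query, title) and phrase_matches(query, snippet)
    )
    return hits / len(results)

# Hypothetical page-one results, not real data.
sample_results = [
    ("Space Needle Height and Facts", "The height of the Space Needle is 605 feet."),
    ("How Tall Is the Space Needle?", "The Space Needle stands 605 feet tall."),
]
rate = match_rate("space needle height", sample_results)
```

Even this loose matcher misses the second sample result, which answers the question without ever using the word "height." That is exactly the kind of result Google now surfaces.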

The Algorithm is learning

This deep into the article, you may be wondering what any of this has to do with RankBrain. There's been a lot of speculation around RankBrain, and so I'm going to do my best to work from the facts as we understand them. You're going to need some essential background information...

What, exactly, is deep learning?

First, the one thing we all seem to be able to agree on is that RankBrain uses machine learning, thus the "brain" part. Specifically, RankBrain uses "deep learning." So, what is deep learning? According to Wikipedia:

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.

Crystal clear, right? To understand deep learning and the state of modern machine learning, you have to understand neural networks. Let's start with a simple neural network, the kind that was popular in the early 1990s:

Neural networks were built on a basic understanding of the human brain as a system of "nodes" (neurons) and connections between those nodes. At scale, the human brain is capable of learning incredibly complex ideas using this system of nodes and connections.

So, how do we put this model to work? Let's start with what's known as "supervised learning." In a neural network like this, we have a known set of inputs and a desired set of outputs. Given a certain X, we want to teach the system to return Y. We use these inputs and outputs to train the system, gradually weighting the connections. The hidden layer adds computational complexity, giving the machine enough connections to encode interesting data.

Training itself uses methods that are cousins of linear regression (at the risk of oversimplification). Over a large set of inputs and outputs, we want to minimize the error of our model. In some cases, we work backward from the output(s) to the input(s), in much the same way you might work a difficult paper maze from the finish back to the start.

Why go to all this trouble? If we know the inputs and outputs (sticking just to supervised learning, to keep this simple), why don't we just have a lookup table? If X, then Y – simple. What happens when we get an input that isn't in the table? The system fails. The magic of neural networks is that, if the system is properly trained, it can return outputs for completely new inputs.
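The difference between a lookup table and a trained model is easier to see in code. The sketch below is not a neural network; it is a single "connection" (one weight and one bias) trained by gradient descent to minimize squared error, which is the simplest cousin of the training described above. The training data, learning rate, and function names are all assumptions chosen for illustration.

```python
def train(examples, epochs=2000, learning_rate=0.05):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_weight = grad_bias = 0.0
        for x, y in examples:
            error = (weight * x + bias) - y   # how far off the model is
            grad_weight += 2 * error * x / n
            grad_bias += 2 * error / n
        # Nudge the parameters in the direction that reduces the error.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias

# Known inputs and desired outputs (made-up data following y = 2x + 1).
examples = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

# Option A: a lookup table only knows the inputs it has seen.
lookup = dict(examples)
table_answer = lookup.get(10)          # None: 10 was never in the table

# Option B: a trained model can produce an output for a brand-new input.
weight, bias = train(examples)

def predict(x):
    return weight * x + bias

model_answer = predict(10)             # close to 21, though never seen in training
```

A real neural network stacks many of these weighted connections and adds non-linear steps between layers, but the training loop has the same shape: predict, measure the error, adjust the weights.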

To make a very long story only medium-long, these simple neural networks were interesting playthings, but weren't capable of solving many complex problems. So, we put them aside. Then, the inevitable happened – computing power increased exponentially and got cheaper (thanks, Gordon Moore!). Specifically, we invented the GPU. You might think of the GPU as something built for gamers, but it is, in essence, a very powerful math machine.

At some point, simple neural networks scaled up massively, and I mean massively – on the order of 1,000,000X larger. These new machines were able to perform much more interesting tasks, and a new age of neural networks was born. These new machines required more complex methods, and thus, at the risk of oversimplifying a very complex topic, deep learning was born.

How does Google use deep learning?

Fortunately, we know a bit more about RankBrain. In Steven Levy's excellent article about Google's machine-learning ambitions, he quotes the following from Jeff Dean, head of the broader Google Brain group...

By early 2014, Google’s machine learning masters believed [Amit's approach] should change. “We had a series of discussions with the ranking team,” says Dean. “We said we should at least try this and see, is there any gain to be had.” The experiment his team had in mind turned out to be central to search: how well a document in the ranking matches a query (as measured by whether the user clicks on it). “We sort of just said, let’s try to compute this extra score from the neural net and see if that’s a useful score.”

Amit Singhal, the head of Google's Search team until early 2016, pioneered the heuristic approach – what we might call the "ranking factors." Machine learning (ML) advocates at Google eventually were able to convince the team to test ML in a ranking context. By all accounts, that experiment went very well and the score was indeed useful.

It's also worth noting that Amit, who was reported to be skeptical of using ML in organic search, left Google and was replaced by John Giannandrea, who was instrumental in many ML projects at Google. I won't speculate on Amit's motivations, but the shift in leadership to a strong ML advocate clearly implies that Google considered the RankBrain experiment a success.

Of course, this raises the question: How exactly are ML and deep learning in play in organic search? Google teaches a deep learning course on Udacity, and I was intrigued to find this screenshot in a quiz. The quiz asked how Google might use deep learning in rankings, and this was the answer:

When we train an ML model, the "classifier" is essentially the resulting decision machine. In this case, that classifier takes in a search term and web page as inputs and decides how relevant they are to each other.

Two things are worth noting in this deceptively simple screenshot. First, ML is being used as a relevance engine. I think it's safe to say that the quiz is not entirely hypothetical. Second, notice the query and the matching page. The query is "Udacity deep learning", but the matching result title contains the related phrases "machine learning" and "supervised learning." This is starting to look like some of the examples we saw earlier.

Another resource we have is the original Bloomberg article about RankBrain, which is still one of the more comprehensive pieces on the subject. The article quotes senior Google research scientist Greg Corrado and makes the following very specific claim:

RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities – called vectors – that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

Again, RankBrain is being called out as essentially a relevance engine, a machine for better understanding the similarities and relationships between words. What are these vectors the article mentions, though? In the general sense, vectors are a mathematical concept – a quantity with both direction and magnitude, often pictured as an arrow in space. Vectors are a way of encoding complex information.

Thankfully, we have another clue, from Google's public ML project, TensorFlow. One of Google's side projects is a library called Word2Vec that, as the name implies, uses ML to convert words into vectors. Traditional methods of encoding words for information retrieval can deal with simple problems like pluralization and stemming, but have little or no sense of relationships. Word2Vec and similar models are capable of learning relationships like the examples below (Source: Tensorflow.org, ©2016 Google):

Here, Word2Vec has learned that the relationship between man and woman is the same as the relationship between king and queen (encoded in the direction of the vector). Similarly, the relationship between the verb forms walking and walked is the same as the relationship between swimming and swam. More importantly, these rules didn't need to be specified. The machine learned them by studying large collections of real words in context.
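Here is a toy Python sketch of how this kind of vector arithmetic works. The three-dimensional vectors below are hand-made for illustration, not learned and not taken from Word2Vec (real models learn hundreds of dimensions from large text collections), and the function names are my own assumptions.

```python
import math

# Hand-made toy vectors: (royalty, male, female). NOT real Word2Vec output.
VECTORS = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' using vector offsets (b - a + c)."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    # Skip the input words, as is common when evaluating analogies.
    candidates = {w: v for w, v in VECTORS.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

answer = analogy("man", "woman", "king")   # "man is to woman as king is to ?"
```

With these toy values, the offset from "man" to "woman" applied to "king" lands closest to "queen." The point is only that relationships become directions you can do math on.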

Google's actual algorithms are almost certainly more complex than the publicly available Word2Vec library, and researchers have combined vector-based approaches with other approaches, such as the more familiar LDA (Latent Dirichlet Allocation), but it seems very likely that an approach like this is in play in RankBrain.

RankBrain is NOT query translation

It's easy to mistakenly jump to the conclusion that RankBrain simply translates unfamiliar queries into more familiar ones, or long queries into short queries. This is not the case. RankBrain seems to operate in real-time and can compare multiple versions of a search phrase at once.

If I mistakenly type a search like "Benedict Crumblebatch," Google will tell me this:

In this case, Google has tried to interpret my intent and has replaced my query with what it thinks is a better version. This is query translation. In this case, all of the results match the translated query and it overrules my original search.

Revisiting an example from above, if I search for "scarf," I can get back matches on both "scarf" and "scarves" (even in the same result):

Google is not translating "scarf" --> "scarves" and then returning matches on the new term. Google is applying a powerful relevance engine that recognizes these matches in real-time.

Are we sure it's RankBrain?

Let me be clear on one thing – relevance is a very complex process, and it's hard to know for sure where traditional information retrieval methods end and RankBrain begins. I can't say with certainty that all of the examples I showed previously represent RankBrain in action.

However, there is one more piece of evidence. Remember the "NJ DMV" example? Google was able to understand that "DMV" (Department of Motor Vehicles) and "MVC" (Motor Vehicle Commission) are equivalent concepts in New Jersey.

Our data science team, led by Matt Peters, put together an ML prototype that uses a method similar to Word2Vec. If you input search terms into this tool, it looks at the corresponding Google results and calculates the similarity between those results and the original query:

This screenshot has been edited, but the data is real. What the tool is saying is that a page with the title "State of New Jersey - Motor Vehicle Commission" is a good match (93%, although the system is a little forgiving) for "NJ DMV." The fact that we can train an ML system to perform this task doesn't prove RankBrain does it, but it does at least show that it is well within Google's ML capabilities.

When did RankBrain roll out?

Please note that RankBrain is often tied to the announcement date in October of 2015, but the Bloomberg article that announced it also says that RankBrain was in play "for the past few months." Steven Levy's article on ML in Google gives a date of April 2015 for the rollout, and we believe that timeline is accurate. RankBrain has probably been in play for at least 1 1/2 years at the time of this writing.

How do we adapt to RankBrain?

In a world where Google can understand stemming, synonyms, and even answers, how do we approach keyword research? Let's go back to our Space Needle example. I'm going to use Moz's Keyword Explorer as a backdrop for the rest of this discussion. Let's say I fire up my trusty keyword research tool and enter the phrase "space needle height":

Even out of the gate, we've got 1,000 keywords to deal with, many of which are fairly similar. How do we go about targeting these 1,000 variations?

Option 1 is to write 1,000 pages, each laser-targeted at a single phrase. We know, practically, that this is either going to be a huge amount of work or going to lead to thin content. Sites filled with templated pages that only vary by a few keywords are a lousy user experience and prime bait for Google's Panda algorithm.

Option 2 is to take as many of these phrases as possible and just stuff them into a single paragraph. I've done this for you, and here's the kind of result you can expect:

SPACE NEEDLE HEIGHT
The Space Needle height (Seattle) is 605 feet. The Space Needle height in stories is just over 60. It’s interesting to note that the Space Needle height comparison to the Empire State Building is about half as high. In contrast, the Seattle Space Needle height comparison to Chicago’s Willis Tower is only about one-third the height.

The bolded phrases are my target phrases. I hope we can all agree that this isn't optimal content crafting if our goal is to convince our audience that we're a credible source of information.

I propose a third option. You may have noticed a pulldown in Keyword Explorer for [Group Keywords]. This does exactly what it sounds like it does. Let's take all of these very similar keywords (and you could do this by hand as well, if you're willing to put in the time) and try to group them. We end up with something like this:

The system has tried to bucket the keywords into broader, more useful groups, allowing us to ignore some of the minor variants. So, let's pick three groups from this list:

"space needle height"
"space needle height in stories"
"space needle how tall"

What if we chose representative, natural language phrases within each of these groups? Think of them as exemplars of the group. We might pick something like this:

"height of the Space Needle"
"Space Needle is ___ stories"
"How tall is the Space Needle?"

Now, let's craft a paragraph around these more natural, diverse phrases:

HOW TALL IS THE SPACE NEEDLE?
The height of the Space Needle in Seattle, Washington is 605 ft. (184 m), including the antenna. Interestingly, while the Space Needle is approximately 60 stories tall, it only occupies 6 floors, with most of the tower being structural. While it was once the tallest building in Seattle, the Space Needle now ranks only 7th.

Not only have we written a paragraph that might actually be valuable to humans, but we've covered our three target phrases and even had room for a fourth ("tallest building in Seattle"). What's more, each of these phrases represents a group of dozens or hundreds of similar keywords. By writing to the groups or broader concepts instead of narrowly targeted phrases, we're able to cover many keyword variants efficiently.

3 Gs: Gather, Group, Generate

I've taken to calling this approach to keyword research the 3 Gs, and it goes like this:

Gather keywords
Group keywords into clusters
Generate exemplars

Another way to think of this process is that we're grouping keywords into concepts, and then converting each concept back into a representative keyword/phrase: Keyword --> Concept --> Keyword*. The result is a specific search phrase to target, but that phrase represents potentially dozens or hundreds of similar keywords.
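If you'd rather see the Keyword --> Concept --> Keyword* flow as code, here is a rough Python sketch. It is not how Keyword Explorer groups keywords (that engine is built on ML); as a stand-in, it groups phrases that share the same words once filler words and word order are ignored, then picks the highest-volume phrase as the exemplar. The stopword list, volumes, and function names are all hypothetical.

```python
from collections import defaultdict

# Filler words to ignore when comparing phrases (an illustrative, incomplete list).
STOPWORDS = {"a", "the", "of", "is", "in", "how"}

def signature(keyword):
    """Reduce a phrase to its sorted content words: a crude stand-in for a 'concept'."""
    words = keyword.lower().split()
    return tuple(sorted(w for w in words if w not in STOPWORDS))

def group_keywords(gathered):
    """Group (keyword, volume) pairs that share the same signature."""
    groups = defaultdict(list)
    for keyword, volume in gathered:
        groups[signature(keyword)].append((keyword, volume))
    return dict(groups)

def generate_exemplars(groups):
    """Pick one representative phrase per group (here: the highest volume)."""
    return {
        sig: max(members, key=lambda pair: pair[1])[0]
        for sig, members in groups.items()
    }

# Step 1: Gather. Volumes are made up for illustration.
gathered = [
    ("space needle height", 900),
    ("height of the space needle", 400),
    ("the space needle height", 50),
    ("space needle how tall", 100),
    ("how tall is the space needle", 700),
    ("space needle height in stories", 80),
    ("space needle stories height", 30),
]

groups = group_keywords(gathered)        # Step 2: Group
exemplars = generate_exemplars(groups)   # Step 3: Generate
```

In this sample, seven keywords collapse into three groups, each with one phrase to write around. Picking by volume is just one option; as the examples above show, sometimes a hand-picked, more natural phrase is the better exemplar.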

Let's work through another example, but one with commercial intent. Pretend you're working in the Seattle apartment space and are looking to write an article about rental costs. Just to pick a starting point, you enter "Seattle rental prices" into your keyword research tool of choice and gather your keyword list:

Naturally, we get back a list of related but sometimes very similar keywords. Even in this list, we can start to see some interesting variations ("average rent", prices by year, mapped prices, etc.), but let's take it to step two and group these keywords:

In a real-world keyword research scenario, we'd want to thoroughly explore all of the groups, but I've picked three for now that caught my eye (underlined in green). They are:

"Seattle average rent by neighborhood"
"Seattle housing prices skyrocket"
"cheapest Seattle apartments"

How do we go about generating an exemplar from each group? Sometimes, intuition is fine. For example, the keywords our system has grouped under #2 turn out to be a bit of an odd mix, but I really like how "skyrocket" resonates and "housing prices" is a good keyword variant, so I'll pick a phrase. For something like #3, we may choose to just see what variation has the highest potential for traffic. In Keyword Explorer, we can simply expand that group, select the keywords, and add all of them to a list, like this:

Once the stats for the list are collected, we can take a look and see that "cheapest apartments in Seattle" has both the highest traffic volume and Keyword Potential, according to our metrics:

For the final group ("Seattle average rent by neighborhood"), I browsed the grouped keywords, and one caught my eye: "average rent downtown seattle." I like this one because it's specific to an actual neighborhood, although we might choose to craft content around some kind of neighborhood-by-neighborhood theme as well. What I like about trying to understand our keywords as groups/clusters is that it's also a great process for generating content ideas.

So, let's put some exemplars against our three groups. We might end up with something like this:

"average rent in downtown Seattle"
"Seattle housing prices are skyrocketing"
"cheapest apartments in Seattle"

These are all rich phrases that we can use to craft content, and they're built on a logical framework of keyword research. Even using just this single list, our system claims these three groups represent at least 64 keyword phrases. Factoring in the long-tail, they potentially represent hundreds more.

Eventually, we may have ML tools that can take large groups of related phrases and help find the perfect exemplar. Even now, Keyword Explorer's grouping engine is built on ML. There will come a time very soon when ML is part of our everyday work as SEOs.

There's a fourth, unofficial G: Gap. As our British friends might say, mind the gap. The exemplars you build in this process are meant to be natural-language phrases that represent dozens of keywords, but our understanding of a concept and Google's won't always match, and some searches you hoped you'd rank for will fall through the cracks. It's important to continue to monitor and track a large set of keywords. If you see that some aren't improving, consider generating new exemplars or targeting them separately. This is an iterative process, and we still have to get our hands dirty with real searches every day.
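Minding the gap can be as simple as comparing where each tracked keyword started and where it is now. The Python sketch below assumes you already export rank history from whatever tracking tool you use; the data, threshold, and function name are hypothetical.

```python
def find_gaps(rank_history, min_improvement=1):
    """Return keywords whose ranking has not improved by at least min_improvement.

    rank_history maps keyword -> list of rankings over time (oldest first).
    A lower number is a better position, so improvement = first - last.
    """
    gaps = []
    for keyword, ranks in rank_history.items():
        if len(ranks) < 2:
            continue  # not enough data to judge a trend
        if ranks[0] - ranks[-1] < min_improvement:
            gaps.append(keyword)
    return sorted(gaps)

# Made-up weekly rankings for illustration.
rank_history = {
    "cheapest apartments in seattle": [18, 12, 7],
    "average rent in downtown seattle": [25, 24, 26],
    "seattle housing prices": [40, 40, 40],
}

needs_attention = find_gaps(rank_history)
```

Keywords that land on this list are candidates for a new exemplar or for separate targeting.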

Bonus: Keyword brainstorming

Here's something fun to try. In Keyword Explorer, you can specifically request keyword phrases that contain none of the words in your original phrase. Why would you want to do this? It can help you find related concepts that you might not have considered.

From the [Display keyword suggestions that] pulldown, select "exclude your query terms to get broader ideas." Here are some of the results I get on a search for "Seattle rental prices" with grouping on (I've edited this list a bit just to show some of the more interesting results in the space allowed):

Some of these are obvious (although still interesting), like searches that use specific neighborhood names (e.g. "best Capitol Hill apartments"). Some are less obvious and open up some new avenues. "Kirkland apartments under $1000" reminds us that both neighborhood and price sensitivity matter in similar searches. These are aspects we can't ignore in our broader keyword research on this topic.

The second to the last is really interesting, IMO: "apartments near Amazon headquarters." Being such a big employer (we know all too well, given the competition for talent in Seattle), a content focus on just apartments near Amazon's headquarters could get a lot of traction. Finally, while it's not the most useful topic or keyword to target, "too damn expensive" is certainly a good headline phrase to tuck away.
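If you're working from your own keyword list rather than a tool, the "exclude your query terms" idea is easy to approximate. This Python sketch simply drops any suggestion that shares a word with the original query; it is my own simplification, not Keyword Explorer's implementation, and the suggestion list mixes phrases from above with made-up ones.

```python
def exclude_query_terms(query, suggestions):
    """Keep only suggestions that share no words with the original query."""
    query_words = set(query.lower().split())
    return [
        s for s in suggestions
        if query_words.isdisjoint(s.lower().split())
    ]

suggestions = [
    "Seattle rental prices 2016",           # hypothetical; shares query words
    "average rent prices",                  # hypothetical; shares "prices"
    "best Capitol Hill apartments",
    "Kirkland apartments under $1000",
    "apartments near Amazon headquarters",
]

broader_ideas = exclude_query_terms("Seattle rental prices", suggestions)
```

What survives the filter is, by construction, a set of phrases you couldn't have found by just tacking modifiers onto your starting keyword.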

Why not just write for people?

If Google is really understanding natural language searches and becoming more intelligent, why don't we just write content for people and forget about this whole process? It's a fair question. If your choices are 2005-era keyword stuffing and thin content or writing for people, then please, for the love of all that is holy, write for your human site users (and, by extension, search users).

There's a problem, though, and it's probably easier to show than tell...

Google has come a long way in their journey from a heuristic-based approach to a machine learning approach, but where we're at in 2016 is still a long way from human language comprehension. To really be effective as SEOs, we still need to understand how this machine thinks, and where it falls short of human behavior. If you want to do truly next-level keyword research, your approach can be more human, but your process should replicate the machine's understanding as much as possible.

I hope you'll give the 3 Gs a try and let me know what you think. I'll freely admit I'm biased and hope you'll also give Keyword Explorer a try, if you haven't yet (and if you have, test out some of the new tricks I've talked about).