I’ve got a question. One of those super-innocent, “Why is the sky blue?” kind of questions that a lot of people stopped asking ten years ago, but I can never seem to. It comes down to this: how does Google work? I don’t know how to code, I don’t know very much about the web, but I use Google every day, all the time, for everything. Everybody does. But it occurred to me that I don’t understand the inner workings of it. So I’m asking you: how does a search engine work?
Search engines have become so ubiquitous in our daily lives that we barely notice them. But they are an extremely powerful force that shapes what we experience, when we experience it, and how it gets delivered to us.
The fact is, there is an enormous amount of stuff on the internet. BBC Science Focus Magazine estimated that the “Big Four” online storage companies (Google, Amazon, Microsoft, and Facebook) store “at least 1,200 petabytes between them.” That figure does not include smaller companies such as Dropbox, or the data stored on government and academic servers. All of that information needs to be sorted in order to bring you relevant results for your search. So how exactly does that work?
Google uses automated programs known as “search engine spiders” (or crawlers; Google’s own is called Googlebot) that crawl through the internet. When a spider reaches a web page, it reads through all the text, links, code, and other data on the page. From this information, it builds a profile of the page and hands it to the search engine for further processing. It also follows the links on that page to other pages, both to discover new pages and to understand where the original page fits into the wider web of related pages.
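To make the crawling step concrete, here is a minimal Python sketch of a breadth-first crawler. It is a toy, not how Google’s spiders actually work: instead of downloading pages over the internet, it reads from a small made-up dictionary of pages (`TOY_WEB`), and the names `PageReader` and `crawl` are hypothetical. It does show the basic loop, though: read a page, record its text and links as a profile, then queue those links to visit next.

```python
from collections import deque
from html.parser import HTMLParser

# A tiny, made-up "web": each URL maps to the HTML a spider would download.
# These pages are hypothetical stand-ins for real HTTP fetches.
TOY_WEB = {
    "a.example/": '<p>Why is the sky blue?</p><a href="b.example/">light</a>',
    "b.example/": '<p>Light scatters in air.</p><a href="a.example/">home</a>'
                  '<a href="c.example/">more</a>',
    "c.example/": "<p>Rayleigh scattering explains blue skies.</p>",
}


class PageReader(HTMLParser):
    """Collects the visible text and the outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> link.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty text fragments.
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, web, max_pages=100):
    """Breadth-first crawl: read each page, record a profile, queue its links."""
    profiles = {}
    queue = deque([start_url])
    while queue and len(profiles) < max_pages:
        url = queue.popleft()
        if url in profiles or url not in web:
            continue  # already visited, or a dead link in this toy web
        reader = PageReader()
        reader.feed(web[url])
        reader.close()  # flush any text still buffered by the parser
        profiles[url] = {"text": " ".join(reader.text_parts), "links": reader.links}
        queue.extend(reader.links)  # follow links to discover new pages
    return profiles


profiles = crawl("a.example/", TOY_WEB)
for url, profile in profiles.items():
    print(url, "->", profile["links"])
```

Starting from `a.example/`, the crawler reaches all three pages even though the start page links only to `b.example/`; the third page is discovered purely by following links. A real crawler would also have to fetch pages over HTTP, respect each site’s robots.txt rules, and decide how often to revisit pages.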
Once it has this data, Google adds the page to its index, a huge database that records which words appear on which pages, so that the keywords in your search can be matched to the pages that contain them. It then ranks the matching pages, and the best-known part of that ranking is an algorithm called PageRank. Although the term is thrown around a lot and may seem a little esoteric, an “algorithm” is just a set of rules to be followed in mathematical or computational calculations.
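The matching step typically relies on what is called an inverted index: rather than storing a list of words for each page, the search engine stores, for each word, the set of pages that contain it, so looking up a search term is fast. The sketch below is a simplified illustration of that idea, with hypothetical page texts and hypothetical function names (`build_index`, `search`); Google’s real index is vastly larger and records far more than which words appear where.

```python
import re
from collections import defaultdict

# Hypothetical page profiles, like the text a spider might hand back.
PAGES = {
    "a.example/": "Why is the sky blue",
    "b.example/": "Light scatters in the air",
    "c.example/": "Rayleigh scattering explains why the sky is blue",
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: each word maps to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query (unranked)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages that have this word too
    return results


index = build_index(PAGES)
print(sorted(search(index, "sky blue")))  # ['a.example/', 'c.example/']
```

Note that this only finds candidate pages; deciding which of them to show first is the separate ranking step.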
Google’s PageRank judges a web page by the links pointing to it. Each link counts as a kind of vote, so the number of pages that link to a page serves as a proxy for its authority. Not all votes are equal, though: a page passes part of its own score on to the pages it links to, so if a page with a high PageRank score links to a page with a low score, the second page’s score will go up. Other factors, such as how long a page has existed, are not part of PageRank itself; any weight they carry comes from the many other signals Google combines with it.
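The idea of scores flowing along links can be written as a short formula. In a commonly used normalized form, a page p’s score is PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), where the sum runs over every page q that links to p, L(q) is the number of links going out of q, N is the total number of pages, and d is a “damping factor” usually set to 0.85. The Python sketch below computes this by repeatedly applying the update (power iteration) on a tiny hypothetical link graph. It is a textbook version, not Google’s production system, and spreading the score of pages with no outgoing links evenly over all pages is one common convention, assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict mapping each page to the pages it links to."""
    # Every page mentioned anywhere, as a linker or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Assumed convention: score held by pages with no outgoing links
        # is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its damped score evenly among the pages it links to,
                # so a link from a high-scoring page passes on more weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: "a", "b" and "c" all link to "hub"; nobody links to "new".
before = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["hub"], "new": []}
# The same web, except that the well-linked "hub" page now links to "new".
after = dict(before, hub=["new"])

print("new before:", round(pagerank(before)["new"], 3))
print("new after: ", round(pagerank(after)["new"], 3))
```

Comparing the two runs shows the effect described above: once the well-linked `hub` page links to `new`, the score of `new` climbs from about 0.13 to about 0.38, even though nothing else about `new` changed. Nothing like page age appears anywhere in the calculation; PageRank looks only at links.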
Indexing and ranking web pages in this way has been extremely successful for Google, helping it grow into one of the largest companies on the planet. PageRank is, in fact, the algorithm that got Google off the ground. Larry Page and Sergey Brin developed it as computer science graduate students at Stanford and described it in a paper published in 1998, and PageRank remains one of the many signals Google uses to rank results today.
To Google’s good fortune, the PageRank algorithm scales to the size of the web: the same calculation that ranks pages about national news events can simultaneously rank the websites of small businesses trying to break out through local search engine optimization (SEO) campaigns. Entire industries have grown up around helping people work with Google’s algorithm, with experts offering to optimize web pages so they perform better in Google’s search results.
As more and more of our world moves online, algorithms like PageRank will only become more important to navigating it. Understanding how search algorithms work is key to understanding the way our world works in the modern age.