The 15th of June 2006 was a momentous day for a host of reasons. England actually won a game of international football (2-0 vs Trinidad & Tobago), Bill Gates announced he was stepping down as Microsoft’s top dog and the verb “to google” was introduced in the Oxford English Dictionary.
When in need of advice, help or answers, most of us turn to Google before asking our parents or our most trusted friends, or even applying our own common sense. But how does the magical search engine work? There are trillions of webpages on the internet, and any given search term could match a huge number of them. How does Google filter these pages and produce a list of the top few that return exactly what you want?
Algorithms make the world go round
Google does this using a number of different algorithms, which are simply computer programs that look for certain clues to organise the results that you see. The most important algorithm used by Google is known as PageRank.
PageRank was invented by Larry Page and Sergey Brin while they were at Stanford, still only in their early 20s, and it has become one of the most famous algorithms in computer science. It was such an integral part of Google’s search algorithm when the company was founded that it is sometimes referred to by the slang term “Google juice.”
PageRank in a nutshell
PageRank is a way of ranking the importance of web pages on the internet, by assigning each page a number known as its PageRank. Page and Brin’s theory is that the most important pages on the internet are those with the most hyperlinks leading to them. Each link to a given page thus counts as a vote of support, and the page is ranked higher if there are more links to it.
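To make the voting idea concrete, here is a minimal Python sketch that simply counts the links pointing to each page. The page names and the `links` list are made up for illustration, and this naive count is only the starting intuition, not the full PageRank calculation.

```python
from collections import Counter

# Hypothetical (source, target) pairs, meaning "source links to target".
links = [
    ("blog.example", "news.example"),
    ("shop.example", "news.example"),
    ("news.example", "blog.example"),
    ("forum.example", "news.example"),
]

# Each inbound link counts as one vote for the target page.
votes = Counter(target for _, target in links)

# Pages ordered from most to fewest votes.
ranking = [page for page, _ in votes.most_common()]
print(ranking)
```

In this toy example the page with three inbound links comes out on top, while pages nobody links to get no votes at all.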
Vote quality, not quantity?
Not all votes, however, are equal. A link from a more important page carries a higher weight than a link from a less important page. This means that links from more authoritative sites with a higher PageRank, like google.com, cnn.com, stanford.edu and thefullapple.com, carry more weight.
It’s like getting a vote of support from a more influential person: Hillary Clinton getting endorsed by Barack Obama counts more than Hulk Hogan endorsing Trump. Or at least it should.
The size of each blob reflects its PageRank, which depends on the blobs pointing to it. Even though the green and blue blobs both have 5 links pointing towards them, the blue blob is bigger since it has more important links pointing to it.
If a page links out to a lot of other pages, the value of each of its votes goes down. This makes sense: a page that votes for too many others is spreading its support thinly, so each individual endorsement counts for less.
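One simple way to picture this dilution is to split a page's PageRank evenly across its outbound links. The sketch below assumes exactly that even split; the function name `vote_per_link` is hypothetical and the numbers are invented for illustration.

```python
def vote_per_link(page_rank: float, outbound_links: int) -> float:
    """Share of a page's PageRank passed along each of its outbound links."""
    if outbound_links == 0:
        # In this simplified model, a page with no links casts no votes.
        return 0.0
    return page_rank / outbound_links


# The same page is worth less to each target the more pages it links to.
print(vote_per_link(0.5, 2))
print(vote_per_link(0.5, 10))
```

With two outbound links each target receives a quarter of a point; with ten, each receives only a twentieth.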
Inspiration from academia
This concept was revolutionary at the time, since it did not depend on the content of the page. Earlier search engines assumed that the best way to rank pages was to scan their content for keywords. Not Page and Brin however! They quickly realised that content can be easily modified by spammers with business and financial interests tied to boosting the ranking of their webpages, whereas links to a page from reputable sites are much harder to fake. Page and Brin thus ushered in the link theory revolution.
The fundamental idea behind this comes from academia, where the importance of a research paper or a researcher is usually determined by the number of citations to that paper. The more times a particular research paper is “cited” by other papers, the more important the paper is considered to be.
How is the PageRank calculated?
Right. So now we know roughly what PageRank is. You may notice however that we’re stuck with an endless cycle of re-evaluating the importance of the entire web. For example, to find the PageRank of our beloved thefullapple.com, we first have to find the PageRanks of the pages linking to it in order to determine their importance, which in turn depend on the PageRanks of other pages, and so on. So how do we calculate all the PageRanks in the first place?
This is done using an iterative process. The PageRank of each page is first initialised to some starting value, and we keep applying the PageRank formula over and over again until the numbers barely change from one round to the next. At that point the values are said to have converged.
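The iterative process can be sketched in a few lines of Python. This is a simplified illustration, not Google's actual implementation: the damping factor of 0.85, the tolerance, the even handling of pages with no outbound links and the tiny example web are all assumptions made for the sketch, and every linked page is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Iteratively estimate PageRank for a dict of page -> pages it links to."""
    pages = list(links)
    n = len(pages)
    # Start every page with the same rank.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(max_iter):
        # Every page gets a small baseline share (assumed damping model).
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Split this page's vote evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Assumption: a page with no links spreads its vote over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share

        # Stop once the ranks barely move between rounds (convergence).
        change = sum(abs(new_ranks[page] - ranks[page]) for page in pages)
        ranks = new_ranks
        if change < tol:
            break
    return ranks


# Hypothetical four-page web.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, rank in sorted(pagerank(example_links).items()):
    print(page, round(rank, 3))
```

In this example page c, which collects links from three other pages, ends up with the highest rank, while page d, which nobody links to, ends up with the lowest.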
The final value of thefullapple.com’s PageRank is the probability that a user randomly browsing the web will arrive at this excellent article on our site.
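This "random browsing" reading can be tried out with a small simulation. The sketch below assumes a surfer who usually follows a random link on the current page and occasionally jumps to a random page instead. The 0.85 probability of following a link, the fixed seed and the example web are assumptions for illustration only.

```python
import random


def random_surfer(links, steps=100_000, damping=0.85, seed=0):
    """Estimate how often a random surfer lands on each page."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)

    for _ in range(steps):
        visits[current] += 1
        targets = links[current]
        if targets and rng.random() < damping:
            # Follow one of the links on the current page.
            current = rng.choice(targets)
        else:
            # Get bored (or hit a dead end) and jump to a random page.
            current = rng.choice(pages)

    # Fraction of time spent on each page.
    return {page: count / steps for page, count in visits.items()}


# Hypothetical four-page web.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(random_surfer(example_links))
```

The fraction of time the surfer spends on each page is a rough estimate of that page's PageRank under these assumptions: well-linked pages are visited often, and pages with no inbound links are only reached by the occasional random jump.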
Of course, there’s a plethora of websites online offering to calculate your PageRank for you, but we’ll never actually know for certain – Google now refuses to release official PageRanks in order to prevent attempted manipulations of the system. Not that that stops people trying.
What are the flaws with PageRank?
Though the PageRank algorithm was revolutionary, nothing is perfect. Although no one knows their official PageRank for certain, once people got to know the basic idea behind the algorithm they did start to manipulate the system in order to boost their own websites in the ranking.
One example is the Google bomb, a collective effort to create hyperlinks to a site using the exact same phrase. This ensures that when you Google that phrase, you’re bound to get that site.
Our favourite Google bomb targeted the phrase “miserable failure”. Created in 2003, it pushed the biography of George W Bush to the top search result for that phrase. The ranking was purely the product of manipulated links: the phrase “miserable failure” does not actually appear anywhere within his biography!
Another example is link farming, which involves creating web pages which simply contain a large number of random links with no apparent meaning. If you stumble upon a page that is nothing but random links to other websites, congrats, you’ve found a link farm. These are often automated and created by bots.
Google has since adjusted its search algorithm to account for Google bombs and link farms, filtering them out so that they no longer affect search rankings.
Enough words, can we visualise this?
Google’s PageRank algorithm, though extremely sophisticated and a revolutionary concept at the time, is very easy to visualise, by viewing the entire web as a giant graph.
In this picture, each page is a node in the graph and each hyperlink is an edge connecting two nodes, resulting in this beautiful interconnected network – the world wide web.
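The same picture can be written down as a small data structure. The sketch below uses a made-up three-page web stored as a Python dictionary; the page names are hypothetical.

```python
# Hypothetical mini-web: each page maps to the pages it links to.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

# Every page is a node in the graph.
nodes = set(web)

# Every hyperlink is a directed edge from the linking page to the linked page.
edges = [(source, target) for source, targets in web.items() for target in targets]

print(len(nodes), "nodes and", len(edges), "edges")
```

The real web graph works the same way, just with vastly more nodes and edges.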
You can visit an interactive version of this graph at Internet-map.net. As you might expect, the biggest nodes are the most linked and visited sites and include the usual suspects: Google, Facebook, YouTube and Yahoo.
A patented secret
There you have it – the basic premise behind the PageRank algorithm. Now that you understand it, can you create your own search engine? Not exactly….
PageRank is patented, and we were surprised to find that the patent is actually assigned not to Google but to Stanford University, where Page and Brin started it all. That doesn’t mean anyone can use it: Google holds exclusive licence rights to the patent. In exchange for use of the patent, Stanford received 1.8 million shares of Google, which it sold in 2005 for $336 million. Not too shabby at all.
While PageRank forms the basis of Google’s ranking system, and was the first algorithm ever to be used by Google, the exact search algorithm used by Google is slightly more complicated, taking into account many more factors. Some of the other factors that might affect the ranking system include how long the site has been around, how powerful the domain name is, and how and where certain keywords appear on the site.
Google’s exact search algorithm is a secret, one of Google’s trade secrets that allowed it to shoot to success as the number one search engine in the world.
Understanding PageRank is just the start to unlocking Google’s vast capabilities: who knows what other secrets the Googlebots are keeping up their sleeves? From reverse image searches to our phones secretly listening in on our conversations, we’ve already seen huge developments in the way Google helps us find what we’re searching for.
But one thing is clear: by allowing links to be analysed in a meaningful way, PageRank has made search engines what they are today.
The original paper written by Larry Page and Sergey Brin submitted to Stanford University can be found here.