How Google Page Rank is Calculated?

This is where it gets tricky. The PR of each page depends on the PR of the pages pointing to it. But we won’t know what PR those pages have until the pages pointing to them have their PR calculated and so on… And when you consider that page links can form circles it seems impossible to do this calculation!

But actually it’s not that bad. Remember this bit of the Google paper:

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

What that means to us is that we can just go ahead and calculate a page’s PR without knowing the final value of the PR of the other pages. That seems strange but, basically, each time we run the calculation we’re getting a closer estimate of the final value. So all we need to do is remember the each value we calculate and repeat the calculations lots of times until the numbers stop changing much.

Lets take the simplest example network: two pages, each pointing to the other:

Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and C(B) = 1). Guess 1

we don’t know what their PR should be to begin with, so let’s take a guess at 1.0 and do some calculations: d = 0.85 PR(A) = (1 â€" d) + d(PR(B)/1) PR(B) = (1 â€" d) + d(PR(A)/1)

i.e. PR(A) = 0.15 + 0.85 * 1 = 1 PR(B) = 0.15 + 0.85 * 1 = 1

Hmm, the numbers aren’t changing at all! So it looks like we started out with a lucky guess!!! Guess 2

No, that’s too easy, maybe I got it wrong (and it wouldn’t be the first time). Ok, let’s start the guess at 0 instead and re-calculate: PR(A) = 0.15 + 0.85 * 0 = 0.15 PR(B) = 0.15 + 0.85 * 0.15 = 0.2775 NB. we’ve already calculated a “next best guess” at PR(A) so we use it here

And again: PR(A) = 0.15 + 0.85 * 0.2775 = 0.385875 PR(B) = 0.15 + 0.85 * 0.385875 = 0.47799375

And again PR(A) = 0.15 + 0.85 * 0.47799375 = 0.5562946875 PR(B) = 0.15 + 0.85 * 0.5562946875 = 0.622850484375

and so on. The numbers just keep going up. But will the numbers stop increasing when they get to 1.0? What if a calculation over-shoots and goes above 1.0?

Guess 3

Well let’s see. Let’s start the guess at 40 each and do a few cycles:

PR(A) = 40 PR(B) = 40

First calculation PR(A) = 0.15 + 0.85 * 40 = 34.15 PR(B) = 0.15 + 0.85 * 34.15 = 29.1775

And again PR(A) = 0.15 + 0.85 * 29.1775 = 24.950875 PR(B) = 0.15 + 0.85 * 24.950875 = 21.35824375

Yup, those numbers are heading down alright! It sure looks the numbers will get to 1.0 and stop

Here’s the code used to calculate this example starting the guess at 0: Show the code | Run the program

* Principle: it doesn’t matter where you start your guess, once the PageRank calculations have settled down, the “normalized probability distribution” (the average PageRank for all pages) will be 1.0