altered_equation

The Implementation of an equation can sometimes change it in not-so-subtle ways

Ian Rogers
IPR Computing Ltd.
ian@iprcom.com

Last edited: 16th May 2002

In my paper PageRank Explained I claim that “Chris Ridings of http://search.engine-submission.co.uk” has made a mistake in a paper on the same subject.

I didn’t want to labour the point in the main paper as I have nothing in particular against Chris but because several people have asked for a clarification, and in the interest of open academic critique (Chris does invite that in the first paragraph of his paper), I’ll describe the problem here.

Chris’ paper has roughly the following structure:

Title: “PageRank Explained”, sub-title: “Everything you’ve always wanted to know about PageRank”
Introductory discussion of Google and PageRank.
A quote from the Brin and Page Google paper, including the original algorithm.
Introduces, for illustrative purposes, a ranking calculation called MiniRank to quote: “which is very similar to PageRank. This should help us understand it.” [PageRank]. The terms MiniRank and PageRank are freely mixed in several sections thereafter though.
Gives advice to webmasters about how to structure links in real web sites based on observations of MiniRank behaviour.

This would be fine, and some of the more general comments in the paper are useful, but a fundamental flaw in the implementation of MiniRank means that it models an equation very different from Google’s PageRank! Most of the conclusion in the paper, therefore, are not applicable to real websites.

This is the Google PageRank equation:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

This is explained fully in my paper along with a correct method of implementation.

Chris Ridings’ paper uses an implementation of MiniRank that results in a very different equation though.

In the section “The calculation of PageRank”, Chris uses the English phrases like:

“so at the end of the process we’re going to add 0.425 to Page B’s MiniRank and 0.425 to Page C’s MiniRank”

and, in a diagram just below that, Page B has indeed jumped from 1 to 1.425. Page C has had an even more incredible jump from 1 to 3.125 (because of all the other MiniRank that has been added via other links).

When you work through the examples it’s clear the MiniRank equation is something like

PR(A) =PR(A’) + (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

The numbers Chris comes up with spiral ever upwards. This equation will never converge to settled values (unlike the original Brin and Page PageRank equation)!

It’s interesting to note that Ridings never describes a method for terminating the MiniRank algorithm!

The most important effect of altering the equation like this is that Ridings’ analysis of Feedback Loops is completely wrong. This is most clearly seen in the sections “Looping” and “Extensive Interlinking”. The examples start with a MiniRank of 1, but after “a few iterations” they reach the dizzying heights of 469.5883 without any incoming links! A correct PageRank implementation would converge to 1.0 for all these pages.

Conclusion

The MiniRank model may be interesting, but it has no relevance to Google PageRank. Conclusions from the MiniRank model should not be used to influence website design in the real world.


	[Home] [Services] [Portfolio] [Downloads] [White Papers]