TU130_PageRank: 2008

20080205

pr

A Survey of Google's PageRank: The PageRank Algorithm
The PageRank Algorithm

The original PageRank algorithm was described by Lawrence Page and Sergey Brin in several publications. It is given by:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where:
PR(A) is the PageRank of page A,
PR(Ti) is the PageRank of pages Ti which link to page A,
C(Ti) is the number of outbound links on page Ti and
d is a damping factor which can be set between 0 and 1.

So, first of all, we see that PageRank does not rank web sites as a whole, but is determined for each page individually. Further, the PageRank of page A is recursively defined by the PageRanks of those pages which link to page A.

The PageRank of pages Ti which link to page A does not influence the PageRank of page A uniformly. Within the PageRank algorithm, the PageRank of a page T is always weighted by the number of outbound links C(T) on page T. This means that the more outbound links a page T has, the less page A will benefit from a link to it on page T.

The weighted PageRanks of the pages Ti are then added up. The outcome of this is that an additional inbound link for page A will always increase page A's PageRank.

Finally, the sum of the weighted PageRanks of all pages Ti is multiplied by a damping factor d which can be set between 0 and 1. Thereby, the extent of the PageRank benefit that a page receives from another page linking to it is reduced.
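To make the formula concrete, here is a minimal Python sketch that applies it once for a single page, assuming the PageRanks of the linking pages are already known. The function name `pagerank_of` and its input format (a list of (PR(Ti), C(Ti)) pairs) are hypothetical and chosen only for illustration.

```python
def pagerank_of(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    inbound: list of (pr_ti, c_ti) pairs, one per page Ti linking to A,
             where pr_ti is PR(Ti) and c_ti is the number of outbound links on Ti.
    """
    # Each linking page passes on its PageRank divided by its outbound link count.
    weighted_sum = sum(pr_ti / c_ti for pr_ti, c_ti in inbound)
    # Damp the sum with d and add the constant (1 - d).
    return (1 - d) + d * weighted_sum

# A page without inbound links still gets (1 - d).
print(pagerank_of([], d=0.5))                # 0.5
# Two linking pages with PR 1: one has 2 outbound links, the other has 1.
print(pagerank_of([(1, 2), (1, 1)], d=0.5))  # 1.25
```

The two properties described above follow directly: adding a pair to `inbound` can only increase the result, and a larger C(Ti) shrinks that page's contribution.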
The Random Surfer Model

In their publications, Lawrence Page and Sergey Brin give a very simple intuitive justification for the PageRank algorithm. They consider PageRank as a model of user behaviour, where a surfer clicks on links at random with no regard towards content.

The random surfer visits a web page with a certain probability which derives from the page's PageRank. The probability that the random surfer clicks on one particular link is solely given by the number of links on that page. This is why one page's PageRank is not completely passed on to a page it links to, but is divided by the number of links on the page.

So, the probability for the random surfer reaching one page is the sum of probabilities for the random surfer following links to this page. Now, this probability is reduced by the damping factor d. The justification within the Random Surfer Model, therefore, is that the surfer does not click on an infinite number of links, but gets bored sometimes and jumps to another page at random.

The probability that the random surfer keeps clicking on links is given by the damping factor d, which, being a probability, is set between 0 and 1. The higher d is, the more likely the random surfer is to keep clicking links. Since the surfer jumps to another page at random after he stops clicking links, this probability is implemented as a constant (1-d) in the algorithm. Regardless of inbound links, the probability for the random surfer jumping to a page is always (1-d), so a page always has a minimum PageRank.
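The Random Surfer Model can also be checked by simulation. The sketch below is a hypothetical illustration (the function `simulate_random_surfer` and its parameters are assumptions, not part of Page and Brin's publications): at each step the surfer follows a random link with probability d and otherwise jumps to a page chosen uniformly at random. On the three-page web used in the example later in this article (A links to B and C, B links to C, C links to A) with d = 0.5, the visit frequencies multiplied by the number of pages should come out close to the exact values 14/13, 10/13 and 15/13. Treating a page without outbound links as forcing a random jump is an extra assumption; the example web has no such pages.

```python
import random

def simulate_random_surfer(links, d=0.5, steps=200_000, seed=42):
    """Estimate how often a random surfer visits each page.

    links: dict mapping each page to the list of pages it links to.
    With probability d the surfer clicks one of the current page's links;
    otherwise (probability 1 - d) he gets bored and jumps to a random page.
    """
    rng = random.Random(seed)          # seeded so the run is reproducible
    pages = list(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outlinks = links[current]
        if outlinks and rng.random() < d:
            current = rng.choice(outlinks)   # keep clicking: each link equally likely
        else:
            current = rng.choice(pages)      # bored (or dead end, by assumption): jump anywhere
    # Scale the visit frequencies by N to match the first notation (average value 1).
    n = len(pages)
    return {p: n * visits[p] / steps for p in pages}

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(simulate_random_surfer(web))   # roughly A ~ 1.08, B ~ 0.77, C ~ 1.15
```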
A Different Notation of the PageRank Algorithm

Lawrence Page and Sergey Brin have published two different versions of their PageRank algorithm in different papers. In the second version of the algorithm, the PageRank of page A is given as:

PR(A) = (1-d) / N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where N is the total number of all pages on the web. The second version of the algorithm, indeed, does not differ fundamentally from the first one.

Regarding the Random Surfer Model, the second version's PageRank of a page is the actual probability for a surfer reaching that page after clicking on many links. The PageRanks then form a probability distribution over web pages, so the sum of all pages' PageRanks will be one.

By contrast, in the first version of the algorithm the probability for the random surfer reaching a page is weighted by the total number of web pages. So, in this version PageRank is an expected value for the random surfer visiting a page, when he restarts this procedure as often as the web has pages. If the web had 100 pages and a page had a PageRank value of 2, the random surfer would reach that page on average twice if he restarted 100 times.

As mentioned above, the two versions of the algorithm do not differ fundamentally from each other. A PageRank which has been calculated by using the second version of the algorithm has to be multiplied by the total number of web pages to get the corresponding PageRank that would have been calculated by using the first version. Even Page and Brin mixed up the two algorithm versions in their most popular paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", where they claim that the first version of the algorithm forms a probability distribution over web pages with the sum of all pages' PageRanks being one.
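The relationship between the two notations can be checked numerically. This is a sketch under the assumption that every page has at least one outbound link; `pagerank_v2` is a hypothetical helper that iterates the second formula (each round computes all new values from the previous round's values), starting from a uniform distribution. It uses the three-page web of the example below (A links to B and C, B links to C, C links to A) with d = 0.5. Multiplying the result by N gives the first version's values.

```python
def pagerank_v2(links, d=0.5, iterations=100):
    """Second notation: PR(A) = (1-d)/N + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links: dict mapping each page to the pages it links to (no dead ends assumed).
    Returns a probability distribution over the pages.
    """
    n = len(links)
    pr = {p: 1 / n for p in links}              # start from the uniform distribution
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in links}   # random-jump share
        for page, outlinks in links.items():
            for target in outlinks:
                new[target] += d * pr[page] / len(outlinks)
        pr = new
    return pr

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
v2 = pagerank_v2(web)
v1 = {p: len(web) * value for p, value in v2.items()}   # multiply by N
print(round(sum(v2.values()), 6))   # 1.0
print(round(sum(v1.values()), 6))   # 3.0
```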

In the following, we will use the first version of the algorithm. The reason is that PageRank calculations by means of this algorithm are easier to compute, because we can disregard the total number of web pages.
The Characteristics of PageRank

The characteristics of PageRank shall be illustrated by a small example.

We regard a small web consisting of three pages A, B and C, where page A links to the pages B and C, page B links to page C and page C links to page A. According to Page and Brin, the damping factor d is usually set to 0.85, but to keep the calculation simple we set it to 0.5. The exact value of the damping factor d admittedly has effects on PageRank, but it does not influence the fundamental principles of PageRank. So, we get the following equations for the PageRank calculation:

PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))

These equations can easily be solved. We get the following PageRank values for the single pages:

PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615
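For a web this small, the equations can also be solved mechanically. The sketch below rewrites them as the linear system (I - dM) PR = (1-d), where M (a name introduced here for illustration) holds 1/C(Ti) for every link from Ti, and solves it with Gauss-Jordan elimination on exact fractions from Python's standard `fractions` module. The helper name `solve_pagerank_exact` is hypothetical.

```python
from fractions import Fraction

def solve_pagerank_exact(links, d=Fraction(1, 2)):
    """Solve PR = (1-d) + d * M PR exactly as the linear system (I - dM) PR = (1-d)."""
    pages = list(links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    # Augmented matrix [I - dM | (1-d)]: one row per page, one column per page.
    rows = [[Fraction(int(i == j)) for j in range(n)] + [1 - d] for i in range(n)]
    for source, outlinks in links.items():
        for target in outlinks:
            rows[index[target]][index[source]] -= d / len(outlinks)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        factor = rows[col][col]
        rows[col] = [x / factor for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {p: rows[index[p]][n] for p in pages}

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(solve_pagerank_exact(web))
# {'A': Fraction(14, 13), 'B': Fraction(10, 13), 'C': Fraction(15, 13)}
```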

Note that the sum of all pages' PageRanks is 3 and thus equals the total number of web pages. As explained above, this is not a result specific to our simple example.

For our simple three-page example it is easy to solve the corresponding equation system to determine PageRank values. In practice, the web consists of billions of documents and it is not possible to find a solution by inspection.
The Iterative Computation of PageRank

Because of the size of the actual web, the Google search engine uses an approximate, iterative computation of PageRank values. This means that each page is assigned an initial starting value and the PageRanks of all pages are then calculated in several computation cycles based on the equations determined by the PageRank algorithm. The iterative calculation shall again be illustrated by our three-page example, whereby each page is assigned a starting PageRank value of 1.

Iteration  PR(A)       PR(B)       PR(C)
0          1           1           1
1          1           0.75        1.125
2          1.0625      0.765625    1.1484375
3          1.07421875  0.76855469  1.15283203
4          1.07641602  0.76910400  1.15365601
5          1.07682800  0.76920700  1.15381050
6          1.07690525  0.76922631  1.15383947
7          1.07691973  0.76922993  1.15384490
8          1.07692245  0.76923061  1.15384592
9          1.07692296  0.76923074  1.15384611
10         1.07692305  0.76923076  1.15384615
11         1.07692307  0.76923077  1.15384615
12         1.07692308  0.76923077  1.15384615
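The table above can be reproduced with a short loop. Its values are consistent with updating the pages one after another in the order A, B, C and letting each page use values already updated in the same round (for example, PR(B) in iteration 1 is computed from the new PR(A) = 1). The helper `iterate_in_place` below is a hypothetical sketch of that scheme, not Google's implementation.

```python
def iterate_in_place(links, d=0.5, start=1.0, iterations=12):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)), updating pages in order.

    A page's new value is used immediately by the pages computed after it
    in the same round. Returns the values after every round (round 0 = start).
    """
    pr = {p: start for p in links}
    # For every page, collect the pages that link to it.
    inbound = {p: [q for q in links if p in links[q]] for p in links}
    history = [dict(pr)]
    for _ in range(iterations):
        for page in links:   # dict order: A, B, C
            pr[page] = (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[page])
        history.append(dict(pr))
    return history

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for i, row in enumerate(iterate_in_place(web)):
    print(i, " ".join(f"{row[p]:.8f}" for p in "ABC"))
# last line: 12 1.07692308 0.76923077 1.15384615
```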

We see that we get a good approximation of the real PageRank values after only a few iterations. According to publications of Lawrence Page and Sergey Brin, about 100 iterations are necessary to get a good approximation of the PageRank values of the whole web.

Also, by means of the iterative calculation, the sum of all pages' PageRanks still converges to the total number of web pages. So the average PageRank of a web page is 1. The minimum PageRank of a page is given by (1-d). Therefore, there is a maximum PageRank for a page: since all PageRanks add up to N and each of the other N-1 pages has at least (1-d), no page can exceed N - (N-1)(1-d) = dN+(1-d), where N is the total number of web pages. This maximum can theoretically occur if all web pages solely link to one page, and this page also solely links to itself.
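The extreme case can be checked numerically. The sketch below builds a hypothetical web of N = 5 pages in which every page links only to page 0 and page 0 links only to itself, then iterates the first formula with d = 0.5 (`pagerank_jacobi` is an illustrative helper that computes each round from the previous round's values). Page 0 should reach dN+(1-d) = 3, while every other page keeps the minimum (1-d) = 0.5.

```python
def pagerank_jacobi(links, d, iterations=200):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) from a start value of 1."""
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        new = {p: 1 - d for p in links}          # every page gets at least (1 - d)
        for source, outlinks in links.items():
            for target in outlinks:
                new[target] += d * pr[source] / len(outlinks)
        pr = new
    return pr

n, d = 5, 0.5
# Every page links only to page 0, and page 0 links only to itself.
star = {i: [0] for i in range(n)}
pr = pagerank_jacobi(star, d)
print(round(pr[0], 6), d * n + (1 - d))   # 3.0 3.0
print(round(pr[1], 6), 1 - d)             # 0.5 0.5
print(round(sum(pr.values()), 6))         # 5.0
```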


20080130

LIST OF KEYWORDS

hxhx1hx
hxhx2hx
hxhx3hx
hxhx4hx
hxhx5hx
hxhx6hx
hxhx7hx
hxhxrhx FOR PR5 PR6 PR7 node
63333465x For : PR1 PR2 PR3 PR4

==============================
These keywords are used to test link updates and to get the Google bot to reach the pages, access the data and collect it into the DB.
8x52456
dod6x6
e4f3x3
6458521325648522
==============================

20080122

SEO project : Google Pagerank.

When talking about SEO, it would be strange not to mention Google Pagerank. So let's get to know Google Pagerank a little, based on what I know.

Google Pagerank could well be called Google's signature creation. (Note that "Pagerank" is written as one word, because it is treated as a single term.)

Pagerank is a number assigned to give a score to each individual web page. The system was created at Stanford University in California by Larry Page and Sergey Brin, the founders of Google.

Pagerank works on a simple, broadly democratic principle (the idea of the two founders), based on the way pages on the Internet link to one another. It relies on the inbound links and outbound links among the huge number of websites that Google has indexed: when web page A links to web page B, Google treats this as website A casting one vote for website B.

But the real system is certainly not this simple, because to this day Google Pagerank remains a black hole that nobody fully understands except Google itself.

The one point a website gets from an inbound link is not earned simply by having the link, or just because another website links to that page. Google collects data about each incoming inbound link and decides whether the point is deserved or appropriate, much like checking whether a ballot is valid or spoiled.

Moreover, that one point is not just granted or denied: Google also calculates how much weight it should carry. If the link that counts as one vote is valuable enough, it adds to that page's score, but if the link is worse than normal, it may even cause the page's PR to be lowered (this is considered a Bad neighbourhood).

The resulting score is then processed further to produce a PR value from 0 to 10. Viewed as a whole, the Pagerank calculation resembles an election of sorts:

each link is like one vote; each vote is checked to see whether it is a valid ballot, a spoiled ballot or an abstention; then all the votes are counted and converted into a percentage.

The resulting PR level can be checked with the Google toolbar or on various websites that offer this check.

For a new website or a new page, a PR of 0 or 1 is actually a good start, because it means that Google has accepted that the page has passed its checks and that the new website has escaped the "sandbox", the Google Sandbox.

PR does not rise step by step from 1 to 2 and from 2 to 3; it can be adjusted up or down, and one fine day a page with PR=0 may jump to 3, or a page with PR=6 may drop to 1, depending on factors such as the votes from inbound/outbound links and the importance of those links.

Google has revealed the formula:

PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))

where PR(A) is the Pagerank of web page A, PR(Ti) is the Pagerank of the pages Ti that link to web page A, C(Ti) is the number of outbound links on Ti, and d is the damping factor, with a value between 0 and 1.

The interesting point is this value d, a variable that Google keeps to itself. Looking at the equation simply, without doing any headache-inducing arithmetic, the whole sum in the brackets is multiplied by d as the last step before it is used further.

So even if the bracketed sum PR(t1)/C(t1) + … + PR(tn)/C(tn) comes out large, if Google sets d to only 0.1, only a tenth of it counts. This value d, the damping factor, is what truly sits in Google's black hole.
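A small sketch of the point about d: it evaluates PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn)) for one hypothetical bracketed sum of 5.0 under several values of d. Holding the sum fixed is a simplifying assumption; in a full calculation the PR(ti) values themselves also depend on d, so this only shows the direct scaling effect. The helper name `pr_from_sum` is illustrative.

```python
def pr_from_sum(weighted_sum, d):
    """PR(A) = (1-d) + d * S, where S = PR(t1)/C(t1) + ... + PR(tn)/C(tn)."""
    return (1 - d) + d * weighted_sum

S = 5.0  # hypothetical, fairly large bracketed sum held fixed
for d in (0.1, 0.5, 0.85):
    print(d, round(pr_from_sum(S, d), 4))
# 0.1 1.4
# 0.5 3.0
# 0.85 4.4
```

With d = 0.1 the page keeps only a tenth of the linked-in value, which is why the choice of d matters so much.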

PAGERANK Explanation

FROM http://searchengineland.com/070508-152900.php

PAGERANK

From : Wikipedia
================================================================

PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).

PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank[1]) and later Sergey Brin as part of a research project about a new kind of search engine. The project started in 1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors which determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.[2]

The name PageRank is a trademark of Google. The PageRank process has been patented (U.S. Patent 6,285,999). The patent is not assigned to Google but to Stanford University.

==================================================================

General description

Google describes PageRank:[2] “PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".”

In other words, a PageRank results from a "ballot" among all the other pages on the World Wide Web about how important a page is. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it ("incoming links"). A page that is linked to by many pages with high PageRank receives a high rank itself. If there are no links to a web page there is no support for that page.

Google assigns a numeric weighting from 0-10 for each webpage on the Internet; this PageRank denotes a site’s importance in the eyes of Google. The scale for PageRank is logarithmic like the Richter Scale and roughly based upon quantity of inbound links as well as importance of the page providing the link.

Numerous academic papers concerning PageRank have been published since Page and Brin's original paper.[3] In practice, the PageRank concept has proven to be vulnerable to manipulation, and extensive research has been devoted to identifying falsely inflated PageRank and ways to ignore links from documents with falsely inflated PageRank.

Alternatives to the PageRank algorithm include the HITS algorithm proposed by Jon Kleinberg, the IBM CLEVER project and the TrustRank algorithm.

==============================================================

PageRank algorithm

Our Search: Google Technology

From : http://www.google.com/technology
Google searches more sites more quickly, delivering the most relevant results.

Introduction

Google runs on a unique combination of advanced hardware and software. The speed you experience can be attributed in part to the efficiency of our search algorithm and partly to the thousands of low-cost PCs we've networked together to create a superfast search engine.

The heart of our software is PageRank™, a system for ranking web pages developed by our founders Larry Page and Sergey Brin at Stanford University. And while we have dozens of engineers working to improve every aspect of Google on a daily basis, PageRank continues to play a central role in many of our web search tools.

PageRank Explained

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." Using these and other factors, Google provides its views on pages' relative importance.

Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines dozens of aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.
Integrity

Google's complex automated methods make human tampering with our search results extremely difficult. And though we may run relevant ads above and next to our results, Google does not sell placement within the results themselves (i.e., no one can buy a particular or higher placement). A Google search provides an easy and effective way to find high-quality websites that contain information relevant to your search.

What is PageRank

PageRank is a numeric value that represents how important a page is on the web. Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be. Also, the importance of the page that is casting the vote determines how important the vote itself is. Google calculates a page's importance from the votes cast for it. How important each vote is is taken into account when a page's PageRank is calculated.
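The idea that "the importance of the page that is casting the vote determines how important the vote itself is" can be illustrated with the published formula PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), in which each vote from a page T is worth PR(T)/C(T). This is only a sketch of that published weighting; Google's actual production weighting is not described here, and the page names and numbers below are hypothetical.

```python
def vote_weights(voters):
    """Weight of each page's vote (link) for one target page: PR(T) / C(T).

    voters: dict mapping a voting page's name to
            (its PageRank, its number of outbound links).
    """
    return {name: pr / links for name, (pr, links) in voters.items()}

# Hypothetical pages that all link to the same target page.
voters = {
    "important_page": (4.0, 2),   # high PageRank, few outbound links
    "ordinary_page": (1.0, 2),    # average PageRank, same number of links
    "link_list_page": (4.0, 40),  # high PageRank, but many outbound links
}
weights = vote_weights(voters)
print(weights)  # {'important_page': 2.0, 'ordinary_page': 0.5, 'link_list_page': 0.1}

# Combined into the target page's PageRank with d = 0.85:
d = 0.85
print(round((1 - d) + d * sum(weights.values()), 4))  # 2.36
```

A vote from an important page counts more, but that page's vote is also split across all of its outbound links.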

PageRank is Google's way of deciding a page's importance. It matters because it is one of the factors that determines a page's ranking in the search results. It isn't the only factor that Google uses to rank pages, but it is an important one.

(From here on in, we'll occasionally refer to PageRank as "PR".)

Notes:
Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites. So be careful which sites you link to. If a site has PR0, it is usually a penalty, and it would be unwise to link to it.
