While much of this paper is now “quaint” in light of the amount of money that Google is making, what with their “noble” goals to save the world and all, the fact remains that there are many interesting ideas in the paper and it does a good job of bridging the gap between real world systems and academic research. Essentially, this paper comes down to two main parts that I am interested in:

  1. PageRank - The formula and approach for ranking web pages.
  2. System Anatomy - The discussion of how Brin and Page set up their initial system.

I’ll skip the details on performance, etc., since it should be pretty obvious that it works. It should be noted, though, that this new approach did wonders for how the web was searched. If you remember the web prior to Google, most search results were pretty low quality, except for Yahoo!, which relied on a directory structure.
PageRank
Here is the discussion of PageRank from http://infolab.stanford.edu/~backrub/google.html

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

PageRank is really just the notion that the more pages that point to a page, the higher that page should rank. Additionally, the effect is cumulative: if A points at B and B points at C, then A’s PR is also a factor in calculating C’s PR. The paper doesn’t go into details about the exact algorithm for calculating PR, simply stating that it is a simple iterative algorithm. Of course, this is now part of Google’s proprietary system, so the world may never know, much to the chagrin of many an SEO consultant. In the upcoming papers, we will see references to PR and how it can be used for other NLP tasks, so it is good to know the basic idea.
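To make the "simple iterative algorithm" concrete, here is a minimal sketch in Python of what such an iteration could look like, using the PR formula quoted above. The toy graph, the fixed iteration count, and the starting value of 1.0 are my own choices for illustration, not details from the paper.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}                   # arbitrary starting ranks
    out_count = {p: len(links[p]) for p in pages}  # C(A): outgoing link count
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # sum PR(T)/C(T) over every page T that points at this page
            incoming = sum(pr[t] / out_count[t]
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Tiny example: a three-page cycle A -> B -> C -> A
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})
```

On this symmetric cycle every page's rank converges to the same value, which matches the intuition that no page in the cycle is more "cited" than another.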

System Anatomy

From an engineering standpoint, the System Anatomy section of the paper is quite interesting, especially since it goes into some detail about how to set up a basic Google-like search engine. Of note is the discussion, albeit brief, of the Distributed File System (called BigFiles, section 4.2.1; for an open source version, see Hadoop, which has both a DFS and an implementation of Google’s Map/Reduce algorithm). Moving on, there is some discussion of how the lexicon and the documents are stored. These sections strike me as standard indexing techniques.

Following the lexicon subsection, though, is the discussion of how they store the “Hit Lists”, that is, the lists of occurrences of terms in documents, along with payload information about the terms such as font size, capitalization, etc. In comparison, Lucene currently only supports the term occurrence info and not payload information, although there is a submitted patch that allows for indexing payloads. These Hit Lists are then stored in both the forward index and the inverted index.
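As a rough illustration of the idea, here is a toy hit list builder: for each term it records where the term occurs plus a bit of per-occurrence payload (just capitalization here, standing in for the paper's richer font-size encoding). The field names and structure are my own invention, not the paper's actual compact encoding.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs maps doc_id -> text; returns term -> list of hits (the hit list)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.split()):
            index[word.lower()].append({
                "doc": doc_id,              # which document the hit is in
                "pos": position,            # word position within the document
                "capitalized": word[0].isupper(),  # toy payload info
            })
    return index

index = build_inverted_index({1: "The Web is big", 2: "web search"})
# index["web"] now holds one hit per document, each with position and payload
```

A real engine would pack each hit into a couple of bytes rather than a dict, but the shape of the data is the same.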

Section 4.4 concerns indexing the web and dealing with the plethora of errors that occur in malformed HTML, as well as how to create the various indexes. Currently, I think there are a number of good solutions that can help solve this problem, given the right amount of hardware (see Nutch for an open source version).

Section 4.5 deals with the issues of searching, namely how to handle single word queries and multi-word queries. Multi-word queries can be a bit tricky, since you want to weight hits with terms that occur closer together in a document higher than those that are further apart.
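A quick sketch of that proximity idea: given the positions of two query terms in a document (the kind of positions a hit list provides), score the document by how close the terms ever get. The scoring function here (the reciprocal of the smallest gap) is an arbitrary illustration on my part, not the actual scheme from the paper.

```python
def proximity_score(positions_a, positions_b):
    """positions_*: word positions of each query term within one document.
    Returns a higher score when the terms occur closer together."""
    best_gap = min(abs(a - b) for a in positions_a for b in positions_b)
    return 1.0 / best_gap  # smallest gap of 1 (adjacent terms) scores 1.0

# "search" at positions [2, 40] and "engine" at positions [3, 90]:
# the closest pair is (2, 3), a gap of 1, so this document scores highly
score = proximity_score([2, 40], [3, 90])
```

A document where the same terms sit 20 words apart would score 1/20 under this toy scheme, pushing it down the result list.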

The final section (section 5) covers results and performance, which I’m not going to go into because I think it is obvious to anyone who has had an online pulse in the last 10 years that the Google approach works.

Next week, we will start looking into some graph based papers that have some basis in the PageRank calculation, but use it in different contexts.
