Houston Texas Web Site Design Hosting and Cold Fusion Development
  Saturday, May 17, 2008
  Contact Information
WebWize, Inc.
1006 W. 42nd St.
Houston, Texas 77018
713.682.7111 (office)
713.416.7111 (cell)
713.688.4382 (fax)
info@webwize.com



How Google Indexes the Web

Google built crawler software named Googlebot, a robot that indexes Web pages (and now other types of documents as well). Its principle is simple, though its implementation is not: whenever it reads a page, it adds every page linked from that page to its list of pages to visit.
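The link-following principle above is essentially a breadth-first traversal of the Web's link graph. What follows is a minimal Python sketch, not Googlebot's actual code. The small in-memory `WEB` dictionary and the `crawl` function are hypothetical stand-ins. A real crawler would fetch each page over HTTP, parse its links, obey robots.txt, and spread the work over many machines.

```python
from collections import deque

# Hypothetical in-memory "Web": each URL maps to the URLs it links to.
# A real crawler would download each page and extract its links instead.
WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # no page links here, so it is an orphan
}


def crawl(seeds, get_links, max_pages=1000):
    """Breadth-first crawl: visit a page, then queue every page it links to."""
    frontier = deque(seeds)  # pages waiting to be visited
    seen = set(seeds)        # pages already queued, so each is visited once
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["a.example/"], lambda url: WEB.get(url, []))
print(order)
```

Starting from `a.example/`, the crawl reaches every page except `d.example/`. No page links to it, which is the orphan case described in the next paragraph.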

In theory, it should therefore be able to discover most of the pages on the Web, namely all those that are not orphans (a page is called an orphan if no other page links to it). Because the volume of data to process is enormous, this robot is a program distributed across hundreds of servers.
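You can check whether a page is an orphan directly on a link graph by collecting every page that some other page links to. The sketch below uses a hypothetical in-memory graph (`LINKS`) and a hypothetical helper, `find_orphans`. It lists the pages that nothing points to. A crawler that only follows links cannot discover these pages unless their URLs reach it some other way.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
    "d.example/": ["a.example/"],
}


def find_orphans(links):
    """Return, sorted, the pages that no *other* page links to."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself is still an orphan
                linked_to.add(target)
    return sorted(page for page in links if page not in linked_to)


print(find_orphans(LINKS))  # ['d.example/']
```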

Beyond discovering as many pages as possible, Google also wants to re-index them regularly, because many pages are updated from time to time. How often Googlebot visits a page depends on its PageRank: the higher the PageRank, the more often the page is re-indexed. Between two visits, Googlebot can also detect that a page no longer exists (an HTTP "404 Not Found" error).
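The article does not say how Google turns PageRank into a revisit schedule, so this is only an illustrative sketch. It has three parts:

* `pagerank` computes a simplified PageRank by power iteration on a hypothetical four-page graph with no dangling pages. It uses a damping factor of 0.85, the value commonly cited for PageRank.
* `revisit_interval_days` applies a made-up policy. The top-ranked page is revisited every `min_days`, and a page with half that rank is revisited half as often. Intervals are capped at `max_days`.
* `removed_pages` compares HTTP status codes from two passes to find pages that have started returning 404.

```python
# Hypothetical link graph: page -> pages it links to.
# Every page has at least one outgoing link, so there are no dangling pages.
LINKS = {
    "home": ["news", "about"],
    "news": ["home"],
    "about": ["home", "news"],
    "archive": ["home"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    Assumes every page has at least one outgoing link and that every
    link target is itself a key of `links`.
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Each page gets a small baseline share (the "random jump")...
        new_rank = {page: (1.0 - damping) / n for page in links}
        # ...plus an even split of the rank of every page linking to it.
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def revisit_interval_days(rank, min_days=1.0, max_days=30.0):
    """Hypothetical policy: the interval grows in inverse proportion to rank."""
    top = max(rank.values())
    return {page: min(max_days, min_days * top / r) for page, r in rank.items()}


def removed_pages(previous_status, current_status):
    """URLs that returned 200 on the last pass but 404 on this one."""
    return sorted(
        url
        for url, status in current_status.items()
        if status == 404 and previous_status.get(url) == 200
    )


ranks = pagerank(LINKS)
intervals = revisit_interval_days(ranks)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:8} rank={ranks[page]:.3f} revisit every {intervals[page]:.1f} days")

print(removed_pages({"home": 200, "old-offer": 200},
                    {"home": 200, "old-offer": 404}))  # ['old-offer']
```

In this toy graph, `archive` receives no links, so it keeps only the baseline rank and gets the longest revisit interval.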

Google then analyzes this colossal mass of information in detail. Each word or phrase is associated with a type based on the HTML tags that surround it, so a word in the page title is considered more significant than the same word in the body text. These types can be ranked by importance (page title, headings H1 to H6, bold, italic, etc.). This preprocessing, combined with other criteria including PageRank, makes it possible to return the most relevant results first.
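Tag-based weighting can be sketched with Python's standard `html.parser` module. The numbers in `TAG_WEIGHTS` are invented for illustration. The article only gives the ordering: title first, then headings H1 to H6, then bold and italic. Here each word simply takes the weight of the most important tag enclosing it, and `WeightedTermParser` and `score_terms` are hypothetical names. Google's actual scoring combines many more signals, PageRank among them.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights; only their relative order follows the article.
TAG_WEIGHTS = {
    "title": 10.0,
    "h1": 6.0, "h2": 5.0, "h3": 4.0, "h4": 3.0, "h5": 2.5, "h6": 2.0,
    "b": 1.5, "strong": 1.5, "i": 1.2, "em": 1.2,
}
BODY_WEIGHT = 1.0  # weight for text not inside any weighted tag


class WeightedTermParser(HTMLParser):
    """Adds to each word's score the weight of its most important enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags like <br>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self.open_tags),
                     default=BODY_WEIGHT)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[word] += weight


def score_terms(html):
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return parser.scores


page = """<html><head><title>Houston web design</title></head>
<body><h1>Web hosting</h1><p>We offer <b>design</b> and hosting.</p></body></html>"""
scores = score_terms(page)
print(scores.most_common(3))
```

In this example, `web` scores highest because it appears in both the title and the H1 heading. `hosting` appears twice but only once in a weighted tag.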




© Copyright 1994-2007 WebWize, Inc. All Rights Reserved

