Search Engine Facts
Sitetube.com
All about planning, building and maintaining web sites.


Robots.txt on 6 search engines

Search engine robots check for a special file called "robots.txt" that can be placed in the root directory of a Web server.

This is a plain text file (without HTML code) that allows the Web site administrator to define which parts of the site robots may access and which they may not.

Rumor has it that some search engine robots do not index Web pages on sites that lack a robots.txt file, because they don't know whether they are allowed to access the site.

In our search engine ranking study, we examined 103,260 top 10 Web pages on Google, AltaVista, iWon/Inktomi, AllTheWeb, Teoma and Wisenut. Here are the results for the robots.txt file:

AllTheWeb: 30.5% have it, 69.5% don't have it
AltaVista: 36.3% have it, 63.7% don't have it
Google: 35.7% have it, 64.3% don't have it
iWon/Inktomi: 32.7% have it, 67.3% don't have it
Teoma: 30.4% have it, 69.6% don't have it
Wisenut: 31.4% have it, 68.6% don't have it

As you can see, the majority of the top 10 Web pages don't have a robots.txt file, yet they are still indexed.

Sometimes, it's necessary to include a robots.txt file in the root directory of your server. For example, you may not want search engine robots to index your log files, because anyone could then find your logs in the search engines. In addition, you may want to keep robots away from dynamically created pages because of the heavy load they put on your server.
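As a rough illustration of the two cases above, here is a minimal sketch in Python. It uses the standard library's urllib.robotparser module to read a small robots.txt file and ask whether a robot may fetch a given URL. The directory names /logs/ and /cgi-bin/, the robot name "ExampleBot" and the host www.example.com are assumptions made up for this example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The paths are illustrative assumptions:
# /logs/ stands for a log directory, /cgi-bin/ for dynamically created pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/logs/access.log",
        "http://www.example.com/cgi-bin/search?q=test",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Keep in mind that robots.txt is only a request to well-behaved robots. It does not protect the files themselves.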

If you already have a robots.txt file, it's very important that you check its syntax. Here's a very good free tool:

http://www.tardis.ed.ac.uk/~sxw/robots/check/
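If you prefer a quick local check first, the following Python sketch shows the kind of basic problems a syntax check can catch. It is a simplified stand-in written for this article, not the logic of the tool linked above, and the function name check_robots_txt and the list of accepted fields are assumptions. It only reports lines without a colon, unknown field names, and rules that appear before any User-agent line.

```python
# Fields this sketch accepts (an assumption): User-agent and Disallow come
# from the original robots exclusion standard, Allow is a common extension.
KNOWN_FIELDS = {"user-agent", "disallow", "allow"}


def check_robots_txt(text):
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines and pure comment lines are fine
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif not seen_user_agent:
            problems.append(
                f"line {number}: '{field}' appears before any User-agent line"
            )
    return problems


if __name__ == "__main__":
    for problem in check_robots_txt("Disallow: /logs/\nUser-agent *\n"):
        print(problem)
```

An empty list means only that none of these simple checks failed. It does not guarantee that every robot will interpret the file as intended.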

Martijn Koster, author of the Robots Exclusion Protocol, has compiled information on robots:

http://www.robotstxt.org/wc/robots.html

Listing of robot names (so that you can recognize them in your Web server logs):

http://www.jafsoft.com/searchengines/webbots.html
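Since robots request the robots.txt file, one simple way to spot them in your own logs is to list the user agents that asked for it. The Python sketch below assumes your server writes the widely used "combined" log format, where the user agent is the last quoted field. The function name robots_txt_visitors and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent fields of a
# "combined" format log line (assumed format; adjust for your server).
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_visitors(log_lines):
    """Count requests for /robots.txt per user agent string."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("path") == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, not taken from a real server.
    sample = [
        '192.0.2.1 - - [10/May/2002:13:55:36 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
        '192.0.2.7 - - [10/May/2002:13:56:02 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
    ]
    for agent, hits in robots_txt_visitors(sample).items():
        print(agent, hits)
```

You can then compare the user agent strings it prints against the listing above.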

Source of the search engine percentages above: Search Engine Ranking Studies Q2/2002.

Copyright Axandra.com - Internet marketing and search engine ranking software


May 2002 search engine articles