The importance of robots.txt
Although the robots.txt file plays an important role in how
search engines crawl and index your site, many Web sites
don't offer this file.
When a search engine crawler comes to your site, it will
look for a special file on your site. That file is called
robots.txt and it tells the search engine spider which Web
pages of your site should be indexed and which Web pages should
be ignored.
The robots.txt file is a simple text file (it contains no
HTML code). It must be placed in the root directory of your
domain, for example at www.example.com/robots.txt.
How do I create a robots.txt file?
The robots.txt file is a simple text file. Open a simple
text editor to create it. The content of a robots.txt file
consists of so-called "records".
A record contains the rules for one specific search engine.
Each record consists of two fields: the User-agent line and
one or more Disallow lines. Here's an example:

    User-agent: googlebot
    Disallow: /cgi-bin/
This robots.txt file would allow "googlebot",
which is the spider software program of Google, to retrieve
every page from your site except for files in the "cgi-bin"
directory. All files in the "cgi-bin" directory
will be ignored by googlebot.
The Disallow command works like a wildcard: its value is
matched as a path prefix. If you enter "Disallow: /logs",
both "/logs.html" and "/logs/index.html",
as well as all other files in the "logs" directory,
would not be indexed by search engines.
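You can check this prefix behavior yourself with Python's standard urllib.robotparser module; the rule and the paths below are just illustrations of the "/logs" example above:

```python
from urllib import robotparser

# A hypothetical robots.txt using the prefix rule discussed above
rules = """\
User-agent: *
Disallow: /logs
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())  # parse() takes the file's lines directly

# "Disallow: /logs" blocks every path that starts with /logs
print(rp.can_fetch("googlebot", "/logs.html"))        # False
print(rp.can_fetch("googlebot", "/logs/index.html"))  # False
print(rp.can_fetch("googlebot", "/index.html"))       # True
```

Note that real crawlers may interpret rules slightly differently; this module simply mirrors the classic prefix-matching behavior described in this article.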
If you leave the Disallow line blank, you're telling the
search engine that all files may be indexed. In any case,
you must enter a Disallow line for every User-agent record.
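A record with an empty Disallow line, which permits a spider to index everything, would look like this (the "*" user-agent matches all spiders):

```text
User-agent: *
Disallow:
```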
There's much more to know about robots.txt files than we
have space for in this newsletter. For that reason, we've
created a special article about this topic on our Web site.
Among other things, that article includes these topics:
- Where to find user-agent names
- 7 things you should avoid when designing your robots.txt
- Tips and tricks for a good robots.txt file
- Examples for easy and complex robots.txt files