New robots.txt commands: make sure that Google
can index your site
It seems that Google is currently experimenting with new
robots.txt commands. If your robots.txt file accidentally
contains one of the new commands, it might be that your robots.txt
file tells Google to go away.
What is a robots.txt file?
The robots.txt file is a simple text file that must be
placed in your root directory (http://www.example.com/robots.txt).
It tells the search engine spider which web pages on your
website should be indexed and which web pages should be
ignored.
You can use a simple text editor to create a robots.txt
file. The content of a robots.txt file consists of so-called "records".
A record contains the instructions for a specific search
engine spider. Each record consists of a User-agent
line followed by one or more Disallow lines. Here's an example:
User-agent: googlebot
Disallow: /cgi-bin/
This robots.txt file would allow the "googlebot", which
is the search engine spider of Google, to retrieve every
page from your site except for files from the "cgi-bin" directory.
All files in the "cgi-bin" directory will be ignored
by googlebot.
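
If you want to confirm how a record like this is interpreted, you can test it with a few lines of Python. The following is only a minimal sketch that uses the robots.txt parser from Python's standard library; the helper name is_allowed and the example URLs are assumptions made for illustration, and Google's own parser may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The example record from the article.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on the example domain.
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/cgi-bin/search.cgi"):
        print(url, is_allowed(ROBOTS_TXT, "googlebot", url))
```

With this record, the page in the root directory is reported as allowed and the page in "cgi-bin" as blocked. A spider that is not named in any record is allowed everywhere.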
Which new commands is Google testing?
Webmasters have found out
that Google seems to be experimenting with a Noindex command
for the robots.txt file. It appears to do basically the same
thing as the Disallow command, so it's not clear why
Google is testing it.
Other commands that might be tested by Google are Noarchive
and Nofollow. However, none of these commands is official
yet.
How does this affect your rankings on Google?
If you accidentally use the wrong commands then you might
tell Google to go away although you want them to index
your pages.
For that reason, it is important that you check the content
of your robots.txt file.
How to check your robots.txt file
Open your web browser and enter www.yourdomain.com/robots.txt
to view the contents of your robots.txt file. Here are
the most important tips for a correct robots.txt file:
- There are only two official commands
for the robots.txt file: User-agent and Disallow. Do not
use more commands than these.
- Don't change the order of the commands. Start with the
user-agent line and then add the disallow commands:
User-agent: *
Disallow: /cgi-bin/
- Don't use more than one directory in a Disallow line. "Disallow:
/support /cgi-bin/ /images/" does not work. Use
an extra Disallow line for every directory:
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
- Be sure to use the right case. The file names on your
server are case sensitive. If the name of your directory
is "Support", don't write "support" in
the robots.txt file.
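
The first three tips can be checked automatically. The following Python sketch is one possible way to do it, not an official validator: the function name lint_robots_txt and the wording of the warnings are made up for this example, and it treats only User-agent and Disallow as official, as described above. It cannot check the case of your directory names, because that requires access to your server.

```python
# Commands treated as official, following the tips above.
OFFICIAL_COMMANDS = {"user-agent", "disallow"}


def lint_robots_txt(text):
    """Return a list of (line_number, warning) tuples for a robots.txt text."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            if not raw.strip():
                seen_user_agent = False  # a blank line ends the record
            continue
        if ":" not in line:
            warnings.append((number, "missing ':' after the command"))
            continue
        command, value = (part.strip() for part in line.split(":", 1))
        name = command.lower()
        if name not in OFFICIAL_COMMANDS:
            warnings.append((number, f"unofficial command '{command}'"))
        elif name == "user-agent":
            seen_user_agent = True
        else:  # a Disallow line
            if not seen_user_agent:
                warnings.append((number, "Disallow before User-agent"))
            if len(value.split()) > 1:
                warnings.append((number, "more than one path in a Disallow line"))
    return warnings


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /support /cgi-bin/ /images/\n"
        "Noindex: /private/\n"
    )
    for number, warning in lint_robots_txt(sample):
        print(f"line {number}: {warning}")
```

For the sample text, the sketch flags line 2 for listing several directories and line 3 for using an unofficial command. A file that follows the tips produces no warnings.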
You can find user agent names in your log files by checking
for requests to robots.txt. Usually, all search engine
spiders should be given the same rights. To do that, use
User-agent: * in your robots.txt file.
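
If your log files are large, a short script can collect the user agent names for you. The following Python sketch assumes that your server writes the common "combined" log format, in which the user agent is the last quoted field of each line; other log formats need a different pattern. The function name and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Matches a GET or HEAD request for /robots.txt in a "combined" format
# log line and captures the user agent (the last quoted field).
ROBOTS_REQUEST = re.compile(
    r'"(?:GET|HEAD) /robots\.txt(?:\?[^" ]*)? [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def robots_txt_user_agents(log_lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = ROBOTS_REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; in practice, pass an open log file instead.
    sample_log = [
        '192.0.2.1 - - [10/Oct/2007:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
        '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
        '192.0.2.7 - - [10/Oct/2007:13:56:02 +0000] "GET /index.html HTTP/1.1" '
        '200 5120 "-" "Mozilla/5.0"',
    ]
    for agent, hits in robots_txt_user_agents(sample_log).most_common():
        print(hits, agent)
```

Only the first sample line is counted, because the second one is a request for an ordinary page.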
What happens if you don't have a robots.txt file?
If your website doesn't have a robots.txt file (you can
check this by entering www.yourdomain.com/robots.txt in
your web browser) then search engines will automatically
index everything they can find on your site.
Checking your robots.txt file is important if you want search
engines to index your web pages. However, indexing alone
is not enough. You must also make sure that search engines
find what they're looking for when they index your pages.
You can make sure that Google indexes your web pages for
the right keywords by optimizing your
website. If search engine spiders index unoptimized pages,
chances are that you won't get high rankings.