Google's new web page spider
Search engines use automated software programs that crawl
the web. These programs, called "crawlers" or "spiders",
go from link to link and store the text and keywords from
the pages in a database. "Googlebot" is the name
of Google's spider software.
Many webmasters have noticed that there are now two different
Google spiders that index their web pages. At least one of
them is performing a complete site scan:
The normal Google spider:
66.249.64.47 - "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
The additional Google spider:
66.249.66.129 - "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
What is the difference between these two Google spiders?
The new Google spider identifies itself with a slightly
different user agent:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".
Its log entry also shows that it sends requests over
HTTP/1.1, while the old spider uses HTTP/1.0. The new spider
might be able to understand more content formats, including
compressed HTML.
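To check which of the two spiders is visiting your own site, you can scan your server log for the two user agents. The following Python sketch is only an illustration: it assumes log lines in the shortened format quoted above, and the names LOG_PATTERN and classify_googlebot are made up for this example. Real server logs usually contain more fields (such as a timestamp), so the pattern would need adjusting. Keep in mind that a user agent can be faked, so a match does not prove the visitor really is Google.

```python
import re

# Hypothetical pattern for the shortened log format quoted in this article:
# IP - "METHOD PATH HTTP/x.y" STATUS BYTES "REFERRER" "USER AGENT"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) - "(?P<method>\S+) (?P<path>\S+) HTTP/(?P<http>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def classify_googlebot(line):
    """Return (kind, http_version) for a Googlebot log line, else None.

    kind is "new" for the Mozilla/5.0-style user agent and "old" for
    the plain "Googlebot/2.1" user agent.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # line is not in the assumed format
    agent = match.group("agent")
    if "Googlebot" not in agent:
        return None  # some other visitor
    kind = "new" if agent.startswith("Mozilla/5.0") else "old"
    return kind, match.group("http")


# The two log entries quoted in this article.
old_line = ('66.249.64.47 - "GET /robots.txt HTTP/1.0" 404 1227 "-" '
            '"Googlebot/2.1 (+http://www.google.com/bot.html)"')
new_line = ('66.249.66.129 - "GET / HTTP/1.1" 200 38358 "-" '
            '"Mozilla/5.0 (compatible; Googlebot/2.1; '
            '+http://www.google.com/bot.html)"')

print(classify_googlebot(old_line))  # ('old', '1.0')
print(classify_googlebot(new_line))  # ('new', '1.1')
```

Running the function over every line of a log file and counting the "old" and "new" results would show how often each spider visits.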
Why does Google do this?
Google hasn't revealed the reason for it yet. There are
two main theories:
The first theory is that Google uses the new spider to
spot web sites that use cloaking, JavaScript redirects and
other dubious web site optimization techniques. As the new
spider seems to be more powerful than the old spider, this
sounds plausible.
The second theory is that Google's extensive crawling might
be a panic reaction because the index needs to be rebuilt
from the ground up in a short time period. The reason for
this might be that the old index contains too many spam
pages.
What does this mean for your web site?
If you use questionable techniques such as cloaking or
JavaScript redirects, you might get into trouble. If Google
really uses the new spider to detect spamming web sites,
it's likely that these sites will be banned from the index.
To obtain long-term results on search engines, it's better
to use ethical search engine optimization methods. General
information about Google's web page spider can be found here.
It's likely that the new spider signals a major Google
update. We'll have to see what this means in detail.