Official Google statement: How to deal with duplicate content problems
Duplicate content is a problem that worries many webmasters. Rumor has it that duplicate content can hurt your Google rankings, and that other web sites that copy your content can harm your rankings, too.
For that reason, Google recently made an official
statement about duplicate content.
What is duplicate content and what is not duplicate content?
Duplicate content refers to substantive blocks of content, within the same domain or across different domains, that are identical or very similar.
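Google's statement does not say how "very similar" is measured. As a rough illustration only (a common textbook technique, not Google's published method), near-duplicates can be detected by splitting each text into overlapping three-word sequences ("shingles") and comparing the two sets with the Jaccard index: the number of shared shingles divided by the number of distinct shingles overall. The function names and the shingle length k=3 are assumptions of this sketch.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared elements divided by all distinct elements (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(text_a: str, text_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' k-word shingle sets."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


if __name__ == "__main__":
    original = "Duplicate content is a problem that worries many webmasters."
    copy = "Duplicate content is a problem that worries many site owners."
    print(round(similarity(original, copy), 2))
```

The two sample sentences differ only in their last words, so 6 of their 9 distinct shingles are shared and the script prints 0.67. Identical texts score 1.0; texts with no three-word run in common score 0.0.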
Google mentions several things that can lead
to duplicate content:
"Forums that generate both regular
and stripped-down mobile-targeted pages, store items
shown (and -- worse yet -- linked) via multiple distinct
URLs, and so on. In some cases, content is duplicated
across domains in an attempt to manipulate search engine
rankings or garner more traffic via popular or long-tail
queries."
If the same article is available in multiple languages (for example, English and Spanish), Google doesn't view the translations as duplicate content. Occasional snippets, such as quotes, won't be flagged as duplicate content either.
What does Google do if it finds duplicate content?
Google tries to filter duplicate content out of its search results because it wants to present a diverse cross-section of unique content.
"During our crawling and when serving
search results, we try hard to index and show pages with
distinct information. This filtering means, for instance,
that if your site has articles in 'regular' and 'printer'
versions and neither set is blocked in robots.txt or
via a noindex meta tag, we'll choose one version to list.
In the rare cases in which we perceive
that duplicate content may be shown with intent to manipulate
our rankings and deceive our users, we'll also make appropriate
adjustments in the indexing and ranking of the sites
involved.
However, we prefer to focus on filtering
rather than ranking adjustments ... so in the vast majority
of cases, the worst thing that'll befall webmasters is
to see the 'less desired' version of a page shown in
our index."
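In other words, if Google finds more than one page with the same content, it will pick one of them to show in the search results. Unless the duplication looks like an attempt to manipulate the rankings, the other versions are simply filtered out rather than penalized.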
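A toy version of this filtering step can make the idea concrete. The sketch below is only an illustration, not how Google actually works: it groups pages whose text is identical after collapsing whitespace and case, and keeps one URL per group. The page data and the tie-break rule (prefer the shortest URL) are hypothetical, since Google does not publish how it chooses which version to list.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the page text with whitespace collapsed and case ignored."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(pages: dict[str, str]) -> list[str]:
    """Keep one URL per group of pages with identical normalized text.

    The tie-break (shortest URL, then alphabetical) is an arbitrary
    assumption of this sketch.
    """
    groups: dict[str, list[str]] = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return sorted(min(urls, key=lambda u: (len(u), u)) for urls in groups.values())


if __name__ == "__main__":
    pages = {
        "/article": "How to deal with duplicate content.",
        "/article?print=1": "How to deal with  duplicate content.",  # printer version
        "/contact": "Contact us.",
    }
    print(filter_duplicates(pages))  # ['/article', '/contact']
```

The printer version differs only in whitespace, so it falls into the same group as /article and is dropped from the result.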
How can you avoid duplicate content problems with your web site?
- Tell search engines which pages they should index: If the printer-friendly versions of your pages should not be indexed, block them in your robots.txt file or add a noindex meta tag to them.
- Use 301 redirects: If you have restructured your web site, use permanent (301) redirects to send users and search engine spiders from the old URLs to the new ones.
- Link to each page consistently: Don't link to /page, /page/ and /page/index.htm if all three URLs display the same web page. Pick one form and use it everywhere.
- Use top-level domains to handle language-specific content: If you have German pages, use a .de domain for them.
- Use the preferred domain feature of Google Webmaster Tools: It lets you tell Google whether you prefer the www or the non-www version of your URLs.
- Syndicate carefully: If other web sites republish your content, make sure that they link back to the original on your site.
- Avoid boilerplate repetition and publishing stubs: If possible, don't put the same lengthy copyright text at the bottom of every page. Use a short version with a link to the full text instead. Don't publish category pages that don't have any content yet.
- Understand your content management system (CMS): If you use a CMS, make sure that it doesn't publish the same content in multiple formats.
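For the robots.txt tip, Python's standard urllib.robotparser module can confirm that a rule blocks the pages you intend to block before you upload the file. The /print/ directory and the example.com URLs are hypothetical; adjust them to wherever your printer-friendly versions live. The alternative mentioned in Google's statement is a noindex meta tag, such as <meta name="robots" content="noindex">, in the head of each printer-friendly page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the /print/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules locally, nothing is fetched

for url in (
    "https://www.example.com/articles/duplicate-content.htm",
    "https://www.example.com/print/duplicate-content.htm",
):
    # There is no Googlebot-specific group, so the "User-agent: *" rules apply.
    print(url, parser.can_fetch("Googlebot", url))
```

This should print True for the article and False for its printer-friendly copy.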
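For the 301 tip, the essential part is the response itself: status code 301 plus a Location header that names the new URL. The sketch below uses Python's standard http.server module only to keep the example self-contained; on a production site you would normally set up redirects in your web server or CMS configuration instead. The REDIRECTS mapping is hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from URLs that no longer exist to their new locations.
REDIRECTS = {
    "/old-page.htm": "/new-section/new-page.htm",
    "/products.htm": "/shop/",
}


class RedirectHandler(BaseHTTPRequestHandler):
    """Sends a permanent (301) redirect for moved URLs and a 404 for everything else."""

    def do_GET(self) -> None:
        # Exact match on the request path; query strings are not handled in this sketch.
        target = REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(301)  # 301 Moved Permanently
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Answer HEAD requests with the same status and headers (send_error omits the body for HEAD).
    do_HEAD = do_GET
```

To try it locally, you could run HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever() (HTTPServer is also in http.server) and request /old-page.htm; the response should be a 301 pointing at /new-section/new-page.htm.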
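The consistent-linking and preferred-domain tips both come down to using one spelling per URL. A small helper like the hypothetical canonical_url() below could be run over your internal links, for example in a site audit script, to spot variants. Which form counts as canonical is your choice; this sketch assumes that the www host is preferred, that index.htm and index.html are the server's default documents, and that URLs are written without a trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch (not rules from Google): the www host is the
# preferred domain, and these file names are the server's default documents.
SITE_HOSTS = {"example.com", "www.example.com"}
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = {"index.htm", "index.html"}


def canonical_url(url: str) -> str:
    """Rewrite equivalent spellings of a URL on this site to a single form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()  # ports, if present, are left untouched
    if host in SITE_HOSTS:
        host = PREFERRED_HOST  # www and non-www collapse to the preferred host

    # /page/index.htm -> /page/
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in DEFAULT_DOCUMENTS:
        path = path[: -len(last_segment)]

    # /page/ -> /page (the site root keeps its single slash)
    path = path.rstrip("/") or "/"

    # The fragment (#...) is dropped because it never reaches the server.
    return urlunsplit((scheme.lower(), host, path, query, ""))
```

With these assumptions, http://example.com/page/index.htm, http://www.example.com/page/ and http://www.example.com/page all map to http://www.example.com/page, so any link that doesn't already match its canonical form is a candidate for fixing.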
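For the syndication tip, a short script can check whether a page that republishes your article actually links back to your site. This sketch uses Python's standard html.parser module; links_back() and the example hosts are hypothetical, and you would fetch the partner page's HTML yourself (for example with urllib.request.urlopen).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without an href (or with an empty one)
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def links_back(page_html: str, page_url: str, your_hosts: set[str]) -> bool:
    """True if the republished page links to at least one of your_hosts."""
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    collector.close()
    # urlsplit().hostname is lowercased, so your_hosts should be lowercase too.
    return any(urlsplit(link).hostname in your_hosts for link in collector.links)
```

Passing both the www and the non-www host in your_hosts means a link back counts whichever version the other site used. A relative link such as /about on the partner's page resolves to the partner's own domain and is correctly not counted.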
Duplicate content can cause problems with search engines, so follow the tips above to make your site as easy as possible for them to handle. If you find a web site that copies your original content, you can file a DMCA request.
If you want your web pages to get high rankings on search engines, make it as easy as possible for search engines to parse your pages. Use IBP's Top 10 Optimizer to make your web pages as search engine friendly as possible.