<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: New Google Paper on Near Duplicate Documents</title>
	<atom:link href="http://blog.cre8asite.net/archives/405/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.cre8asite.net/archives/405</link>
	<description>Building Better Web Sites Together, For A Better World</description>
	<pubDate>Fri, 16 May 2008 18:54:09 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Bill</title>
		<link>http://blog.cre8asite.net/archives/405#comment-76834</link>
		<dc:creator>Bill</dc:creator>
		<pubDate>Mon, 19 Mar 2007 22:02:37 +0000</pubDate>
		<guid isPermaLink="false">http://blog.cre8asite.net/archives/405#comment-76834</guid>
		<description>Hi Kees,

I think those are excellent questions.  

What I was trying to point out with my post is that maybe we give too much credit to the search engines when pages are near duplicates as opposed to exact duplicates, in thinking that they might penalize them for duplicate content.  The paper does explore different ways to identify duplicate content, but notes how difficult detection of near duplicates can be.

You probably know the difference between a penalty and a filter.  A penalty would be when a site doesn't rank as high as it should because another site duplicates its content.  A filter is when one duplicate doesn't appear in search results because another copy is showing instead.

It's possible that if there were two sites, one with a .com and another with a .co.uk, I might see the .com version if I searched in the US, and the .co.uk version if I searched in the UK.  But if their content was the same, I wouldn't see both appearing in the search results.  Neither has been penalized in that instance, but both have been filtered.

With booking.com and bookings.net, when I search for "booking" or "bookings" I'm seeing the .com version, and the other is being filtered.

As for the same content in different languages, I believe that Vanessa Fox and Adam Lasnik of Google have both said that if the content is in different languages, then Google doesn't perceive the sites as duplicates.

On my search for "bookings" I get the .com version first, and then I see the .nl version right after it.  It isn't being filtered or penalized, just as Vanessa and Adam suggested. 

I'd be happy to discuss duplicate content in a new thread at the forums if you would like.  I have more new material to discuss on the topic that I think is pretty interesting, including how search engines might treat sites that use templates.</description>
		<content:encoded><![CDATA[<p>Hi Kees,</p>
<p>I think those are excellent questions.  </p>
<p>What I was trying to point out with my post is that maybe we give too much credit to the search engines when pages are near duplicates as opposed to exact duplicates, in thinking that they might penalize them for duplicate content.  The paper does explore different ways to identify duplicate content, but notes how difficult detection of near duplicates can be.</p>
<p>You probably know the difference between a penalty and a filter.  A penalty would be when a site doesn&#8217;t rank as high as it should because another site duplicates its content.  A filter is when one duplicate doesn&#8217;t appear in search results because another copy is showing instead.</p>
<p>It&#8217;s possible that if there were two sites, one with a .com and another with a .co.uk, I might see the .com version if I searched in the US, and the .co.uk version if I searched in the UK.  But if their content was the same, I wouldn&#8217;t see both appearing in the search results.  Neither has been penalized in that instance, but both have been filtered.</p>
<p>With booking.com and bookings.net, when I search for &#8220;booking&#8221; or &#8220;bookings&#8221; I&#8217;m seeing the .com version, and the other is being filtered.</p>
<p>As for the same content in different languages, I believe that Vanessa Fox and Adam Lasnik of Google have both said that if the content is in different languages, then Google doesn&#8217;t perceive the sites as duplicates.</p>
<p>On my search for &#8220;bookings&#8221; I get the .com version first, and then I see the .nl version right after it.  It isn&#8217;t being filtered or penalized, just as Vanessa and Adam suggested. </p>
<p>I&#8217;d be happy to discuss duplicate content in a new thread at the forums if you would like.  I have more new material to discuss on the topic that I think is pretty interesting, including how search engines might treat sites that use templates.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Dolson</title>
		<link>http://blog.cre8asite.net/archives/405#comment-76776</link>
		<dc:creator>Joe Dolson</dc:creator>
		<pubDate>Mon, 19 Mar 2007 15:28:07 +0000</pubDate>
		<guid isPermaLink="false">http://blog.cre8asite.net/archives/405#comment-76776</guid>
		<description>First of all, this post doesn't actually say that there is a duplicate content penalty or filter - it simply talks about the ways in which search engines attempt to identify duplicate content.

Second, those sites you mentioned are different in far more ways than just having different top level domains.  They are in different languages: although their essential substance may be the same, their content is significantly different because of the language issue.

I'd suggest going into the forums to search for our past discussions on duplicate content or start a new one!</description>
		<content:encoded><![CDATA[<p>First of all, this post doesn&#8217;t actually say that there is a duplicate content penalty or filter - it simply talks about the ways in which search engines attempt to identify duplicate content.</p>
<p>Second, those sites you mentioned are different in far more ways than just having different top level domains.  They are in different languages: although their essential substance may be the same, their content is significantly different because of the language issue.</p>
<p>I&#8217;d suggest going into the forums to search for our past discussions on duplicate content or start a new one!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kees J</title>
		<link>http://blog.cre8asite.net/archives/405#comment-76728</link>
		<dc:creator>Kees J</dc:creator>
		<pubDate>Mon, 19 Mar 2007 12:34:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.cre8asite.net/archives/405#comment-76728</guid>
		<description>If there is such a thing as a duplicate content penalty or filter or whatsoever, then how can a huge site like, for example, "booking.com" be indexed with more than 20 extensions, like bookings.nl, bookings.net, bookings.org, bookings.de, etc.?

All these sites are indexed and are ranking high while having 100% exactly the same content.

So if this filter/penalty were used, then this is one of the sites that would surely be penalized.

Please give your comment</description>
		<content:encoded><![CDATA[<p>If there is such a thing as a duplicate content penalty or filter or whatsoever, then how can a huge site like, for example, &#8220;booking.com&#8221; be indexed with more than 20 extensions, like bookings.nl, bookings.net, bookings.org, bookings.de, etc.?</p>
<p>All these sites are indexed and are ranking high while having 100% exactly the same content.</p>
<p>So if this filter/penalty were used, then this is one of the sites that would surely be penalized.</p>
<p>Please give your comment</p>
]]></content:encoded>
	</item>
</channel>
</rss>
