SMX Advanced – Duplicate Content Summit

by TheMadHat on June 4, 2007

This was the first organic track of the day, moderated by Danny, with speakers from the four major engines: Vanessa Fox from Google, Amit Kumar from Yahoo, Peter Linsley from Ask (you should maybe do a few more posts, Peter), and Eytan Seidman from Micro$oft Live Search.

* Microsoft was up first and here are the highlights: duplicate content fragments the weight of anchor text and PageRank and is difficult to detect algorithmically…as if we didn’t realize that from all the MSN issues. He went on to explain how we should handle some of the issues, a couple of which made sense and one that didn’t make any at all. He said if you’re doing a redirect from duplicate content to the original content you should be using a client-side redirect. Say what? Later during the Q&A he retracted and said he was including all types of redirects as long as it returned a 301 and not some kind of silly JavaScript flip. I hope nobody left before Q&A or they could be getting prepared to screw up all kinds of things. He also discussed redirecting HTTPS pages to HTTP pages when security is not necessary. This is something I hadn’t even thought about, so it will be on my list of things to do when I return. It seems to me they should be able to decipher that easily, but apparently not. One thing he emphatically said more than once was that there is no site-wide penalty for duplicate content; it’s handled on a page-by-page basis and isn’t really a penalty at all. He says they try to avoid showing pages with “substantially identical” content. Nice. Please define “substantially identical” for me.
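He didn’t show any code, so for what it’s worth, here’s a rough sketch of the kind of server-side 301 he meant, written as a hypothetical little Python WSGI app (the host name and the app are my own invention, not anything from the talk): bounce HTTPS requests to the HTTP version with a real 301, not a JavaScript flip.

```python
# Hypothetical sketch: consolidate HTTPS duplicates onto the canonical
# HTTP version with a server-side 301 (not from the talk; my own example).
CANONICAL_HOST = "http://www.example.com"  # assumed canonical host

def app(environ, start_response):
    scheme = environ.get("wsgi.url_scheme", "http")
    path = environ.get("PATH_INFO", "/")
    if scheme == "https":
        # Engines see one canonical URL; anchor text and PageRank
        # stop being split across two versions of the same page.
        start_response("301 Moved Permanently",
                       [("Location", CANONICAL_HOST + path)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>canonical page</html>"]
```

Most people would do this with mod_rewrite in .htaccess instead, but the logic is the same: return a 301 with the canonical URL in the Location header.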

* Ask was up next, and it was mostly a reiteration of the Microsoft presentation. Again it was repeated that duplicate content is not a penalty. What is it with this “not a penalty” thing? It’s a damn penalty if your pages don’t rank because the algorithm can’t figure out which one is more important. One new thing he mentioned was that Ask only looks at indexable content for duplication, not other areas like templates and navigational sections. They’re basically using page segmentation and duplicate content analysis together.
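To illustrate what page segmentation plus duplicate analysis might look like (this is purely my guess at the idea, not Ask’s actual algorithm): strip out the template and navigation segments first, then compare only what’s left.

```python
# My own toy version of the idea: segment a page, drop the
# template/navigation blocks, and only compare the indexable content.
import re

BOILERPLATE = re.compile(r"<(nav|header|footer)\b.*?</\1>", re.S | re.I)
TAGS = re.compile(r"<[^>]+>")

def indexable_text(html):
    """Strip navigational segments and markup, keep the main content."""
    return " ".join(TAGS.sub(" ", BOILERPLATE.sub(" ", html)).split())

page_a = "<nav>Home | About</nav><p>Widgets ship in two sizes.</p>"
page_b = "<nav>Shop | Cart</nav><p>Widgets ship in two sizes.</p>"
# Different navigation, identical body: under this scheme the two
# pages compare as duplicates even though the raw HTML differs.
```

So two pages with different nav bars but the same article body would still collide, which is exactly what you’d want.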

* Here comes Yahoo. Same drivel. This time the term was “approximate duplication”. Please define. It’s very difficult to make a site with 50,000 pages not have some type of “approximate duplication” or “substantially identical” content. One of you engineers, give me a range or something. (On another note, all of the “engineers” had trouble working PowerPoint. Except the Microsoft guy.)
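Since nobody on stage would give us a range, here’s the classic back-of-envelope way to put a number on “approximate duplication”: word shingles plus Jaccard overlap. Any threshold you pick is pure guesswork; the engines certainly aren’t telling.

```python
# Sketch of near-duplicate scoring with word shingles and Jaccard
# similarity. The k=3 shingle size is just a common convention;
# whatever cutoff the engines actually use is anyone's guess.
def shingles(text, k=3):
    """All k-word sequences in the text, as a set."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two texts' shingle sets, 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```

A one-word change in a nine-word sentence already knocks the score down to 0.4, which is why tiny template tweaks don’t fool anyone.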

* Next up was Vanessa Fox. And no, unfortunately she wasn’t nude. I kind of glazed over during her presentation. She had duplicate pictures of Alyson Hannigan from Buffy the Vampire Slayer (and no, again unfortunately, she wasn’t nude either). It seemed to be a basic explanation of duplicate content that I thought could have been much more advanced. Google gets an F for info and an A for mentioning Buffy.

* Q&A time didn’t produce many interesting questions. I had my hand up to ask one but never got the chance. My question was whether, if they were using page segmentation to analyze for duplicate content, they would devalue certain sections of content on a page with “substantially identical” content areas. I also wanted them to define or give us some kind of range for what they considered “substantially identical”, but I knew they wouldn’t answer that one. There were a few interesting ideas thrown around, like certain variables that would automatically tell the bots a page was a duplicate, or reporting features inside the webmaster consoles to show what they considered duplicate. I seriously doubt they will ever have tools like that, however; they would be too easily gamed.

That was it for that session. I thought it could have been a little more advanced than it was. Up next: How to spam social media networks. Stay tuned.
