From the monthly archives:

August 2007


This Wikipedia phenomenon is getting way out of hand. I know it’s been said many times before. Graywolf had an interesting post some time ago about Google and their love affair with Wikipedia that stuck in my mind, so I decided to expand on it. I set my scrapers, er, interns to work. Here is the process I went through (the data was pulled on August 12th):

From Wordtracker I pulled the top 1,000 long-term keywords (last 90 days) and the top 1,000 short-term keywords (last 48 hours), with offensive terms removed. I also downloaded the buzz list from the SEOmoz Popular Searches tool. After de-duping and cleaning, the list came to a final count of 1,792 keywords. I needed to compare Wikipedia against some other sites with a high level of “domain authority,” so I also had my scrapers, er, neighbors’ kids run rankings for About.com, Amazon, Craigslist, eBay, and MySpace.

First Conclusion: Wikipedia has secret agents inside Google. Not only are they having a love affair with Google, but with MSN as well. They’re just married to Yahoo, since that’s where they get the least action. See the final results for yourself:
 
 
Google

[Chart: Wikipedia Google Rankings]

Yahoo

[Chart: Wikipedia Yahoo Rankings]

MSN

[Chart: Wikipedia MSN Rankings]

Holy Shit Captain Kirk! Wikipedia is the most or nearly the most relevant for almost half of all my searches! I bet Google can even search for Spock and find him faster than you! Give me a break Google. And Yahoo. And MSN. Wikipedia is full of crap all over the place. Where is Encyclopedia Britannica? Wouldn’t they also be a little bit relevant for something? Let’s look at some terms Wikipedia ranks for in the top 3:

* [search engine optimization] - Currently #1 on every engine and the most relevant source on SEO that exists. Sorry SearchEngineWatch. I guess you and even Danny Sullivan are much less relevant than Wikipedia on SEO.

* [SEO] - See above.

* [text messages] - okay, I suppose in case I don’t know what a text message is.

* [boys] - That’s creepy.

* [girls] - “A girl is a female child, as opposed to a boy, a male child.” Thanks. That deserves a #2 spot.

* [firefox] - (MSN) obviously the Wikipedia entry is more relevant than the actual Firefox page.

* [cheerleaders] - You really think I want to know what a cheerleader is?

* [lap dance] - See above

* [internet] - what is this thing?

* [art] - I’d like to buy some art. Or look at some art. Nope, I get to find out that “Art is a (product of) human activity, made with the intention of stimulating the human senses”

* [booty] - what?

Obviously this could go on and on. I think it’s time for all the search engines to take a hard look at the value of Wikipedia and what a trusted domain really is.

** If you would like to download all the data (terms, ranks, etc) from this experiment, you may grab the rar file here.
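
For anyone who wants to poke at the data themselves, the de-dupe and "how often is Wikipedia in the top 3" steps could be sketched roughly like this in Python. This is a minimal sketch, not the scripts actually used for the experiment: the function names, the sample keywords, and the rank numbers below are all made up for illustration, and it assumes you already have each keyword's best position for the domain (or nothing if it didn't show up).

```python
def clean_keywords(*keyword_lists):
    """Merge keyword lists, normalizing case and whitespace and dropping duplicates.

    Order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for keywords in keyword_lists:
        for keyword in keywords:
            normalized = " ".join(keyword.lower().split())
            if normalized and normalized not in seen:
                seen.add(normalized)
                cleaned.append(normalized)
    return cleaned


def share_in_top(ranks, n):
    """Fraction of keywords where the domain ranks at position n or better.

    `ranks` maps keyword -> best position (1-based), or None when the
    domain was not found for that keyword.
    """
    if not ranks:
        return 0.0
    hits = sum(1 for pos in ranks.values() if pos is not None and pos <= n)
    return hits / len(ranks)


# Hypothetical sample data, NOT the real results from the experiment.
long_term = ["SEO", "firefox ", "Text  Messages"]
short_term = ["seo", "art", "Firefox"]
keywords = clean_keywords(long_term, short_term)

wikipedia_ranks = {"seo": 1, "firefox": 3, "text messages": 2, "art": None}
top3_share = share_in_top(wikipedia_ranks, 3)
top1_share = share_in_top(wikipedia_ranks, 1)
```

Run the same tally for About.com, Amazon, and the rest and you have the comparison the charts are making.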

{ 9 comments }

Super Duper SEO Tools

by TheMadHat on August 15, 2007

So I’m digging through my slow ass Google feed reader Sunday and I come across another SEO tool set. I rolled my eyes thinking “how many more useless SEO tools can these hacks push?”

Let’s start with the current landscape. There are quite a few very useful tools out there that save us all a lot of development time and give beginners the ability to start off on solid ground. There are several iterations of these and many more that are useful, but here are some of my favorites that are publicly available:

* Keyword Cleaner - SEO Book makes cleaning your keyword list a snap. This is a great tool for beginners that don’t have their own customized solutions. Once you’re comfortable with it you can modify the source for your convenience.

* Compete - Trending data that gives you a good snapshot of visitors, engagement, and growth.

* URL Trends - Another domain tracking service that supplies trending data on various things like backlinks, social media mentions, and top 10 rankings. This will give you a good overview of historical growth.

* SEOmoz SEO Tools - All of them; in my opinion, the best collection available to the public. Page Strength is probably the most popular and can be a great benchmark for comparing and analyzing domains. Link Finder is a newer one that finds authority domains that rank well for your specified keywords. The Keyword Difficulty Tool seems reasonably accurate. Popular Searches gives you the best picture of the current buzz and the most popular searches each day. I am currently a premium member and fully recommend paying for a membership, so if you’re on the fence go ahead and pull the trigger. It’s especially great for getting beginner and intermediate SEOs up to speed quickly and is well worth the investment. (I have no affiliation and was not compensated for this post, Matt… I do wish they would change my profile name to TheMadHat, but I guess my domain would stop passing PR if they did so. Right, Google Gestapo Secret Agent Man?)

Anyway, enough link love for Rand and Company; let’s move on to the out-of-date and/or completely worthless tools. There are dump trucks full of crap out there that a 10-year-old could build, if there were a point. Some of these include:

* Can anyone say keyword density tools? Don’t waste your time please.

* Meta Tag Generators. Go to truck driving school please.

* Submit your website to 152,982,482 MAJOR search engines. Are these things really still around? Whatever. Please.

Occasionally you’ll find something that’s worth digging into and will actually help your SEO efforts. Most tools are focused on analysis of websites, keywords, competition, etc. While these of course help in your SEO efforts, very few actively increase your rankings (I’ve heard SQUIRT is effective, but I’ve not yet tried it). That’s why these upcoming elite SEO tools from Digerati Marketing caught my attention. Here are the details:

* Link Backrub - This tool will increase your backlinks massively by scouring the Internet for sites that link to you, that are not indexed in search engines. Any links it finds to your site, which are not indexed by Google, it will get them indexed - almost instantly, thus letting Google see all of your backlinks. This will of course, improve your rankings.

* Flashdex - This tool will get ANY page indexed in Google within 1 hour - guaranteed.

* Social Storm - This neat little bit of script can get a single page or multiple pages socially bookmarked on the top 20 social bookmarking and tagging sites over 190 times, automatically, from different IPs at random times over the course of weeks. This can give you massive traffic boosts.

* StumbleXchange Automator - StumbleXchange is a great site, but it takes so damn long! This program will automate the entire process for you! No more hours of stumbling other peoples pages, just click and go to sleep!

* Link Buster - This tool will build you over 100 relevant links per month, to any page requested - and it’s not blackhat!

I’m really interested in taking a look at these tools and how they work (hint hint). I’m especially curious about how they dodge leaving a footprint. If I get a chance to test them or sign on after they are released, I plan on checking out SQUIRT as well, comparing the tools, and measuring their effectiveness.

{ 0 comments }

SEO Interview Questions - Part II

by TheMadHat on August 9, 2007

On my previous post on SEO interview questions, a commenter asked where they could find the answers to these questions. Most of them are purposely open-ended to get an idea of the level of experience and knowledge of your candidate. I will take the more specific ones and provide some explanations. I have also added a question that was not in the original post. If you are looking to hire or want to dazzle your prospective employers at an interview, this post may be helpful.

8) What areas do you think are currently the most important in organically ranking a site?
Obviously a subjective answer, but domain trust, inbound links/anchor text, and properly formatted title tags are a good start.

10) What kind of strategies do you normally implement for backlinks? What do you think about link buying, link bait, and other specific backlink strategies?
There are too many correct answers for this one, so let’s go with the wrong answer: “Reciprocal link requests”

42) What is the Ultimate Answer to Life, the Universe, and Everything?
42 of course. If your candidate doesn’t know this please shoot him with the Point-of-view gun.

22) What is page segmentation? (ever heard of VIPS?)
VIPS stands for Vision-based Page Segmentation. It comes from a Microsoft research paper and is one approach within the general topic of page segmentation. It analyzes how a user understands web layout structures based on visual perception, independent of the underlying code and technologies. Each section of the page is segmented into blocks, and a different degree of relevance is assigned to each block. This explains one reason why links in content areas are weighted more heavily than sidebar and navigational links (another reason is the use of shingling algorithms, which I’ll get into on another question). Since this is a visual topic, I’ll give you a visual example from the research paper:

[Figure: Vision-based Page Segmentation]

23) What’s the difference between PageRank and Toolbar PageRank?
Internally, PageRank is constantly updated, while toolbar PageRank is updated every 2-3 months. Toolbar PageRank is a whole number from 0 to 10, while the internally calculated PageRank is more like a floating-point number. And the final answer: Who cares?

24) What is Latent Semantic Analysis (LSA, also known as Latent Semantic Indexing or LSI)?
The process of analyzing the relationships between terms in sets of documents. The engine looks not only at the query, but also looks for common terms in the document set. Documents that are semantically similar will carry more weight than those that are not. This is often a misunderstood concept.
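
Since this one gets misunderstood so often, here is a toy sketch of the idea in Python. It is an illustration under loud assumptions, not how any search engine actually does it: the five "documents" are made up, the number of latent dimensions (`k=2`) is picked by hand, and a simple power iteration stands in for the SVD routine a real implementation would take from a numerical library. The point it shows is that two documents with no words in common ("car repair" and "automobile dealer") can still land close together because a third document uses "car" and "automobile" side by side.

```python
import math


def build_term_doc_matrix(docs):
    """Return matrix[t][d] = how often term t occurs in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for words in tokenized for word in words})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for d, words in enumerate(tokenized):
        for word in words:
            matrix[row[word]][d] += 1.0
    return matrix


def gram(matrix):
    """Document-by-document matrix A^T A (shared-term counts)."""
    n_docs = len(matrix[0])
    return [[sum(r[i] * r[j] for r in matrix) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpairs(sym, k, iterations=300):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(sym)
    work = [r[:] for r in sym]
    pairs = []
    for _component in range(k):
        vec = [1.0 / math.sqrt(n)] * n
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n))
                    for i in range(n))
        pairs.append((value, vec))
        # Remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsa_doc_vectors(docs, k=2):
    """Coordinates of each document in a k-dimensional latent space."""
    pairs = top_eigenpairs(gram(build_term_doc_matrix(docs)), k)
    return [[math.sqrt(max(value, 0.0)) * vec[d] for value, vec in pairs]
            for d in range(len(docs))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up documents: three about vehicles, two about gardening.
docs = ["car repair", "automobile dealer", "car automobile engine",
        "flower garden", "garden seed"]
vectors = lsa_doc_vectors(docs, k=2)
same_topic = cosine(vectors[0], vectors[1])       # no shared words, same topic
different_topic = cosine(vectors[0], vectors[3])  # vehicles vs. gardening
```

With this toy data, `same_topic` comes out close to 1 and `different_topic` close to 0, even though a plain keyword match would score the first pair as having nothing in common.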

25) What is Phrase Based Indexing and Retrieval and what roles does it play?
Phrase based indexing classifies phrases as good or bad based on certain criteria within the entire document. The number of phrases and their proximity are taken into account. It can also predict the presence of other phrases on the page and will assign a higher or lower value depending on whether those phrases are present.

26) In Google Lore - what are ‘Hilltop’, ‘Florida’, and ‘Big Daddy’?
Hilltop: An old and often contested algorithm that calculates PageRank based on expert documents and topical relevancy. The theory behind it was to decrease the possibility of manipulation from buying high PR links from off topic pages. This was implemented during the Florida update, which is our next topic.

Florida: The highly controversial update implemented by Google in November of 2003, much to the chagrin of many seasonal retail properties. There were several theories as to what was included in this update: an over-optimization filter, a competitive term filter, and the Hilltop algorithm. This update had catastrophic results for many web merchants.

Big Daddy: A test data center used by Google to preview algorithm changes. This information was made public around November of 2005 by Matt Cutts and allowed marketers to preview upcoming SERPs.

What is a shingling algorithm and how is it used?
A shingling algorithm breaks a page’s content into overlapping runs of words (shingles) that can be compared across pages. Used for page segmentation, it does a job similar to VIPS but is less resource intensive and more likely to be used in search engine algorithms. These algorithms look for blocks of content that do not occur frequently across a web site and for blocks with certain desired features. When the engine stores this information, the navigation, advertisements, and other non-content areas are omitted. This increases speed, saves on storage space, and theoretically makes the results more relevant because of the increase in unique content.
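
To make that less hand-wavy, here is a rough Python sketch of the shingling idea. It is only an illustration under assumptions of mine: the shingle width of 3 words, the 80% "appears on most pages" cutoff, the function names, and the three sample pages are all made up, and nothing here claims to be what any engine actually runs. It cuts each page into overlapping word triples, treats triples that show up on nearly every page of the site as navigation or other boilerplate, and keeps the rest as the unique content.

```python
from collections import Counter


def shingles(text, width=3):
    """Return the set of overlapping word n-grams (shingles) in `text`."""
    words = text.lower().split()
    if len(words) < width:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + width]) for i in range(len(words) - width + 1)}


def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def boilerplate_shingles(pages, width=3, min_share=0.8):
    """Shingles found on at least `min_share` of the pages (likely nav, ads, footers)."""
    counts = Counter()
    for text in pages:
        counts.update(shingles(text, width))  # a set, so each page counts once
    threshold = min_share * len(pages)
    return {shingle for shingle, count in counts.items() if count >= threshold}


def unique_content(text, boilerplate, width=3):
    """Shingles of a page that are not site-wide boilerplate."""
    return shingles(text, width) - boilerplate


# Made-up pages from one hypothetical site sharing a "home about contact" nav bar.
pages = [
    "home about contact cheap widgets for sale today",
    "home about contact history of the jaguar car",
    "home about contact jaguar habitat in the rainforest",
]
site_boilerplate = boilerplate_shingles(pages)
first_page_content = unique_content(pages[0], site_boilerplate)
similarity = jaccard(shingles("a b c d"), shingles("a b c e"))
```

The same `jaccard` comparison is what you would use to spot near-duplicate pages: the more shingles two pages share, the closer the score gets to 1.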

{ 22 comments }

Friday Tea Time - 8/3/07

by TheMadHat on August 3, 2007

Friday Tea Time. Finally I know. Shut it, I’ve been busy. Without further irrelevant content:

* Aaron Wall estimates the value of the long tail and tells you loyal readers how to do it.

* An excellent post for beginner web marketers from SEO Black Hat about where to start.

* Discovered via SEOmoz, a very comprehensive list of blogging resources. I skimmed around on this site for a while and he/she has quite a few excellent lists that don’t normally get a lot of attention, so check it out. And just because I’m in a generous mood this morning and I will probably be referencing that list in the future: Go and get some business credit cards!

* Manchester United is recruiting nine year old kids off of YouTube? Well, he does look fairly decent. A good elbow to the side of the head might slow him down.

* From Search Engine Journal, an interesting take on manipulating your AdWords quality score. You can most certainly use this process for manipulating, er, troubleshooting organic results as well.

* Is today’s SEO tomorrow’s spam? Of course it is, don’t be naive. Paid links are already considered spam and in most markets you can’t compete without doing it to some extent. I suppose if you want to white hat your way to #1 for rolling marbles across the floor then be my guest.

* Cross-linking non-relevant sites is bad. No shit? Way to give the world such timely information on your algorithm. By the way, JavaScript redirects are also bad.

* Unconditional Link Love. Excellent advice DG.

* I’m not a big fan of lists, but they work. I guess this is a list, just without numbers. Oh well. Anyway, Copyblogger has a list about writing lists.

* Darren is doing another build a better blog month. That’s a lot of hard work and so far his advice has been spectacular, so be sure to keep up with ProBlogger this month.

* Intraweb Gold: That’s just weird.

That’s all folks.

{ 0 comments }

Personalized Search And Why It Sucks

by TheMadHat on August 1, 2007

I recently had a guest post on 8 Ways To Optimize For Personalized Search over at Search Engine Journal so I figured I would follow that up with why I think it’s not good for users, marketers, and the search engines alike.

1. Personalized Search limits the discovery of new sites and new information. The higher placement of sites you have visited and browsed before pushes anything new that you may have not seen before down the page. Users will have to start finding new content in other ways than search; through blogs, news sites, etc. The users don’t like this and the search engines certainly don’t want people finding content in other ways. If the result sets are filled with things I’ve already seen I’m going to look elsewhere.

2. Personalized Search does not equal more relevant results. Simply because I’ve visited and browsed through a site once or even multiple times doesn’t mean that I think it’s more relevant. In some cases it might be, in others probably not. Google will never know what is more relevant based on my browsing and search history. Take this fairly old post from Graywolf for an excellent example. They might have some decent indicators, but their accuracy in this area isn’t even going to come close to the relevancy of their current results, which we know are filled with search pages from authority domains and edu’s selling Viagra.

3. It’s the most invasive spyware known to man. Google records everything you search for, everything you look at, and probably everything you think about. Imagine a massive database filled with every search you’ve ever done and every web page you’ve ever seen. It’s not very appealing is it? Sure, I don’t mind if it will actually give me something useful in the long run and if I can turn it off and on at the flick of a switch. That way, when I’m searching for [how to infiltrate the Googleplex] no one will be the wiser.

4. Entering the market just got much more difficult. New (legitimate) sites are going to have problems without experienced and professional help that will be very costly. If Google does not have data on your website, even if you go out and purchase an old domain with some already established trust, you’re still up shit creek. Without any users looking at your site, you won’t have new subscribers to your feeds, you won’t have any click-through data, and you will be pushed down the results in favor of something “more popular”…like craigslist.

5. It won’t put a dent in spam. I keep hearing over and over how this is going to eliminate spam. I said in my article over at SEJ that they will be working with very large data sets and filtering out the majority of the automated spam, and they will. The majority of it. Much like they do now. The other 2% of spammers are the ones that are good enough to hijack botnets and fill the world with edu spam and everything else imaginable. Very quickly they will figure out ways to spam personalized search and the flood gates will once again open.

6. They will never be able to determine intent with any accuracy. Ever. The engine reps always bring up the “Jaguar and Jaguar” example about how to deliver searches to a biologist and a car enthusiast. WTF? Maybe the freaking biology teacher wants to drive a Jaguar and maybe the rich guy already driving a Jaguar wants to go on a safari? No search engine will ever know. That is, until Google finishes SkyNet and they take over with their mind-reading robots. You will be assimilated, or they’ll make you go work for Wikipedia.

{ 3 comments }