Page Segmentation. I know it’s a dead duck, but it seems everyone has a different idea of what it actually is. I’m not an IR expert, but here is my view on the subject. Page segmentation attempts to break a web page into sections and assign a value to each. This is not based on anything visual, as some people often proclaim. It extracts blocks from the document in order to find the more important sections of the page. Repetitive information such as navigation and advertisements can be easily pulled out and given a lower weight, because these components tend to sit in the same positions and formats across a site (and across the web in general) and often reuse the same text. This is done using a “shingling” algorithm to detect duplicate content and filter out those noisy links.
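To make the shingling idea a bit more concrete, here is a rough sketch of how repeated blocks could be spotted. It is not how any search engine actually does it. The function names, the 3-word shingle size, and the 0.8 similarity threshold are all my own assumptions for illustration, and it assumes the page has already been split into blocks of text.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two shingle sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def repeated_blocks(pages, k=3, threshold=0.8):
    """Flag blocks that are near-duplicated on at least one other page.

    pages is a list of pages, each page a list of block texts.
    Returns a set of (page_index, block_index) pairs. The threshold is an
    assumed cutoff, not a value taken from any published system.
    """
    sigs = [[shingles(block, k) for block in page] for page in pages]
    flagged = set()
    for i, page_i in enumerate(sigs):
        for bi, sig in enumerate(page_i):
            for j, page_j in enumerate(sigs):
                if i == j:
                    continue
                if any(jaccard(sig, other) >= threshold for other in page_j):
                    flagged.add((i, bi))
                    break
    return flagged


pages = [
    ["Home About Contact Products Blog",
     "Page segmentation splits a document into blocks of content"],
    ["Home About Contact Products Blog",
     "Phrase based indexing looks at related phrases in a document"],
]
flagged = repeated_blocks(pages)
```

In this toy example the navigation block shows up on both pages and gets flagged, while the two content blocks share no shingles and are left alone. The flagged blocks are the ones you would hand a lower weight.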
Based on this interpretation, different blocks may contain links to different topics. Traditional link analysis did not differentiate between links coming out of different semantic blocks. There is a fair amount of math involved which I will skip over, but the gist of it is that sections with semantically relevant content that have passed the shingling filters are more important than those in navigational (margin) areas. It should also be noted that the user will probably view the larger content-related blocks more often and be more inclined to follow the links in them. There are additional formulas for determining relationships between blocks on a page, as it is likely that some blocks in the navigational area are related. This is why the weight of these blocks can fluctuate.
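I said I would skip the math, but a small sketch shows the general shape of it. This is a plain PageRank-style iteration where each link is weighted by the block it sits in, so a link in a content block passes more than a link in a navigation block. The block labels, the 1.0 and 0.1 weights, and the 0.85 damping factor are assumptions I picked for the example, and the function name is hypothetical.

```python
def block_weighted_rank(links, block_weight, damping=0.85, iterations=50):
    """PageRank-style scores where each link counts by its block's weight.

    links is a list of (source, target, block_type) tuples.
    block_weight maps a block_type to a non-negative weight (assumed values).
    """
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    n = len(pages)

    # Sum the block weights of all links from each source to each target.
    out = {p: {} for p in pages}
    for s, t, block in links:
        out[s][t] = out[s].get(t, 0.0) + block_weight[block]

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for s in pages:
            total = sum(out[s].values())
            if total == 0:
                # No weighted outlinks: spread this page's score evenly.
                for p in pages:
                    new[p] += damping * rank[s] / n
            else:
                for t, w in out[s].items():
                    new[t] += damping * rank[s] * w / total
        rank = new
    return rank


links = [
    ("A", "B", "content"),     # link inside the main content of A
    ("A", "C", "navigation"),  # link in A's sidebar
    ("B", "A", "content"),
    ("C", "A", "content"),
]
rank = block_weighted_rank(links, {"content": 1.0, "navigation": 0.1})
flat = block_weighted_rank(links, {"content": 1.0, "navigation": 1.0})
```

With the block weights applied, B (linked from A's content) ends up ahead of C (linked from A's sidebar). With flat weights, which is roughly what traditional link analysis does, the two come out the same.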
Another theory is to use an authority score on both the entire page and each individual block together. This is calculated by first pulling the most important pages, then looking at the most important blocks on those pages. Theoretically this allows the search engine to filter out the noise of advertisement and navigation type links. Links in advertisements normally sit in less important blocks, so they could be assigned a lower weight (or none at all) than links in main content areas.
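Here is one way that two-step idea could look in code: keep only the top-scoring pages, then weight each of their links by how important its block is. Multiplying the page score by the block score is purely my assumption for combining the two; the real formulas are more involved. All names and scores below are made up for the example.

```python
def link_weights(page_scores, block_scores, links, top_pages=2):
    """Weight links by page authority times block importance.

    page_scores maps page -> authority score.
    block_scores maps (page, block) -> importance between 0 and 1.
    links is a list of (page, block, target) tuples.
    Only links on the top_pages highest-scoring pages are considered.
    """
    # Step 1: pull the most important pages.
    top = sorted(page_scores, key=page_scores.get, reverse=True)[:top_pages]

    # Step 2: weight each link on those pages by its block's importance.
    weights = {}
    for page, block, target in links:
        if page not in top:
            continue
        w = page_scores[page] * block_scores.get((page, block), 0.0)
        weights[(page, target)] = weights.get((page, target), 0.0) + w
    return weights


page_scores = {"hub": 0.6, "blog": 0.3, "spam": 0.1}
block_scores = {
    ("hub", "main"): 1.0,
    ("hub", "ads"): 0.0,      # advertisement block gets no weight at all
    ("blog", "main"): 0.8,
    ("blog", "sidebar"): 0.2,
}
links = [
    ("hub", "main", "x"),
    ("hub", "ads", "y"),
    ("blog", "sidebar", "x"),
    ("spam", "main", "z"),
]
weights = link_weights(page_scores, block_scores, links)
```

The link in the hub's ad block ends up with zero weight, the sidebar link on the blog counts for a fraction of a content link, and the low-authority page is dropped before its blocks are even looked at.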
Lesson here? Get your links inside relevant content. Skip sidebar and ROS (run-of-site) links when you can. Then go and read about phrase-based indexing so you actually use the correct linking model. Then go buy some content links and have a field day.