So, I thought it was time to run some data through the same Wikipedia Link Modeling we had used in the past to test theories on Link Depth, Link Proximity and other link diagnostics.
Last year, after SEOmoz’s groundbreaking work on the relationship between Latent Dirichlet Allocation (LDA) and Google rankings, we brought on Andrew Cron, a Ph.D. statistics candidate at Duke University, to build our own in-house LDA model. We now use it in nearly every content-writing project, and it has also been useful for testing theories about content relevance.
The strategy is actually quite simple.
- Collect the backlinks of around 50 unique Wikipedia articles, then compute the LDA score between each article's title and the content of the pages linking to it.
- Compare a single piece of content to 1,000 randomly selected words to establish a baseline distribution of topical relationships in the English language.
- Observe whether the pages linking to Wikipedia generally outperform random content in terms of relevance.
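The scoring and comparison steps above can be sketched roughly as follows. This is not our in-house model: it assumes each page and title has already been turned into an LDA topic distribution by some model (not shown), and it stands in cosine similarity between topic vectors as a hypothetical "LDA score". The function names and the 95th-percentile cutoff are illustrative choices, not part of the original study.

```python
import math


def cosine_similarity(p, q):
    """Hypothetical 'LDA score': cosine similarity of two topic vectors.

    Assumes p and q are topic distributions of equal length produced
    by some already-trained LDA model.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def baseline_percentile(score, baseline):
    """Fraction of random-baseline scores that fall strictly below `score`."""
    return sum(1 for b in baseline if b < score) / len(baseline)


def share_above_baseline(backlink_scores, baseline, cutoff=0.95):
    """Share of backlinking pages whose score beats `cutoff` of the baseline.

    If backlinking pages were no more relevant than random content, this
    share would sit near 1 - cutoff (about 5% with the default cutoff).
    """
    beating = sum(
        1 for s in backlink_scores if baseline_percentile(s, baseline) >= cutoff
    )
    return beating / len(backlink_scores)
```

In practice, `backlink_scores` would hold one score per (Wikipedia title, backlinking page) pair, and `baseline` would hold the scores of the single reference piece against each of the 1,000 random words.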
The results were actually quite unimpressive. From what we can see, the overwhelming majority of pages that link to Wikipedia articles are no more topically relevant to the article they cite than random content is.
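One common way to check a claim like "no more relevant than random content" is a one-sided permutation test on the mean score. The post does not say which test, if any, was applied, so treat this as an illustrative sketch: the function name, the permutation count, and the seed are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(backlink_scores, baseline_scores,
                        n_permutations=10_000, seed=0):
    """One-sided permutation test (hypothetical helper).

    H1: backlinking pages have a higher mean LDA score than random content.
    A large p-value means the backlink scores are indistinguishable from
    the random baseline at the level of their means.
    """
    rng = random.Random(seed)
    observed = mean(backlink_scores) - mean(baseline_scores)
    pooled = list(backlink_scores) + list(baseline_scores)
    k = len(backlink_scores)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the "backlink" and "baseline" labels.
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

A result like the one described above would show up as a p-value well above conventional thresholds such as 0.05.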
Why People Link
We find that this reinforces two distinct reasons why people link out on the web.
- Citation Links: These are links where the webmaster credits the source of content they have included on their page. You would expect high LDA scores here, because the writer is merely giving credit to the original source of that content (quoted or paraphrased).
- Descriptive Links: These are links where the webmaster chooses to link to content rather than write about it. Because the link is offered in lieu of writing out the content, you can expect lower-than-average LDA scores. The link is there precisely so the related content does not have to be. It is an alternative to relevant content.
Does this mean you should avoid getting links from related sites? Absolutely not. However, it does mean that you should not pass up a link solely because the linking page's content is not textually similar to yours. If the link is good for the user, it is good for Google.