The History of Latent Semantic Indexing
It's sometimes fun (well, if you're involved with SEO) to look at how optimization theories form, take on an air of truth, and are still being debated years later. Such is the case with Latent Semantic Indexing, or LSI.
On February 3, 2005, Aaron Wall wrote this about LSI: “Latent semantic indexing allows a search engine to determine what a page is about outside of specifically matching search query text. By placing additional weight on related words in content LSI has a net effect of lowering the value of pages which only match the specific term and do not back it up with related terms.”
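Whatever any search engine actually deployed, LSI itself is a well-documented information-retrieval technique: build a term-document matrix, apply a truncated singular value decomposition (SVD), and compare documents in the reduced "concept" space, where two documents can score as similar even when they don't share exact terms. Here's a minimal sketch with toy, hypothetical data; this is an illustration of the textbook technique, not a claim about Google's ranking code:

```python
# Minimal LSI sketch: truncated SVD of a tiny term-document matrix,
# then document similarity in the reduced "concept" space.
import numpy as np

terms = ["car", "auto", "engine", "recipe", "flour"]
# Columns are four toy documents; entries are term counts (hypothetical data).
A = np.array([
    [2, 0, 1, 0],  # car
    [0, 2, 1, 0],  # auto
    [1, 1, 2, 0],  # engine
    [0, 0, 0, 2],  # recipe
    [0, 0, 0, 1],  # flour
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # keep the top-k latent "concepts"
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # document coordinates in concept space

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Document 0 ("car"-heavy) and document 1 ("auto"-heavy) share no terms
# directly, yet land close together in the latent space because both
# co-occur with "engine"; the cooking document stays far away.
print(cosine(doc_vecs[0], doc_vecs[1]))  # high
print(cosine(doc_vecs[0], doc_vecs[3]))  # near zero
```

The point the pro-LSI folks lean on is visible even in this toy: related terms pull documents together without exact query-text matches.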
At that time, Orion of Search Engine Watch wasn't convinced and wrote, “No thanks. This time I prefer that others demystify LSA/LSI in connection with search engines ranking/indexing.”
Fast-forward almost two years and we have Dr. Garcia reminding everyone about “all those LSI-based myths promoted by snake oil marketers, like that there is such thing as LSI-friendly documents, LSI and link popularity and the dumb notion that displaying a tag cloud of terms is evidence that a company has any LSI-like technology. I have a debunked collections of these and similar SEO tales.”
Less than two months later, on February 7, 2007, Bruce Clay posted about recent ranking changes that had been dubbed the minus-950 penalty. In his post he writes that some have speculated that the penalty is the result of a recent Google patent describing what seems to be a “low-scale version of latent semantic indexing.” This certainly fits, since the patent abstract includes this high-level description: “Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.”
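For flavor, here's one toy reading of that abstract. This is purely my illustration, with made-up data and a made-up threshold; the patent discloses neither data structures nor cutoffs. The idea: learn which phrases tend to co-occur across a corpus, then flag a document whose count of related phrase pairs looks implausibly high for natural writing.

```python
# Toy illustration (not Google's algorithm) of phrase-based spam detection:
# learn phrase co-occurrence, then count related phrase pairs per document.
from collections import defaultdict
from itertools import combinations

corpus = [  # hypothetical corpus; each document is a set of phrases
    {"latent semantic", "semantic indexing", "search engine"},
    {"latent semantic", "semantic indexing", "information retrieval"},
    {"chocolate cake", "baking time"},
]

# Record which phrases co-occur, as a symmetric "related phrases" map.
related = defaultdict(set)
for doc in corpus:
    for a, b in combinations(sorted(doc), 2):
        related[a].add(b)
        related[b].add(a)

def related_phrase_count(doc):
    """Number of phrase pairs in this document known to be related."""
    return sum(1 for a, b in combinations(sorted(doc), 2) if b in related[a])

THRESHOLD = 4  # hypothetical cutoff; the patent doesn't disclose one

def looks_like_spam(doc):
    return related_phrase_count(doc) > THRESHOLD

# A stuffed page packing in every related phrase trips the check.
stuffed = {"latent semantic", "semantic indexing", "search engine",
           "information retrieval"}
print(related_phrase_count(stuffed), looks_like_spam(stuffed))
```

Even in this cartoon form you can see why the abstract mentions spam: a keyword-stuffed page over-delivers on related phrases relative to natural prose.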
So what's the verdict on LSI? I don't know. It's certainly something worth watching. The good news is that all of the recommendations from pro-LSI folks seem to boil down to using different keywords with similar meanings when writing content, which is something that would happen anyway if you write with your users in mind. In addition, link development that keeps an eye toward “looking natural” will probably survive any algorithm changes that include LSI factors.