Tuesday, November 20, 2007

LSI (Latent Semantic Indexing): The new face of search


How do you look for information on the internet? You type words into a search box and wait for the results to come up. But if you go through the results in detail, you will often find that, after the first few pages, the results have little to do with the term you searched for.

This keyword-based retrieval method is popular the world over today, but it will soon be replaced by a far more sophisticated model called LSI (Latent Semantic Indexing).

LSI is a concept-based retrieval method that uses a term-document matrix to record how often each term occurs in each document. Its results have been reported to be as much as 30% more effective than conventional keyword search.
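To make the term-document matrix concrete, here is a minimal Python sketch that counts how often each term occurs in each document. It assumes simple lowercase whitespace tokenization and raw counts; the function name `term_document_matrix` is hypothetical, and practical LSI systems usually also apply a weighting scheme such as tf-idf or log-entropy.

```python
from collections import Counter


def term_document_matrix(docs):
    """Build a term-document count matrix: one row per term, one column per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # Vocabulary in order of first appearance (dict keys preserve insertion order).
    vocab = list(dict.fromkeys(t for tokens in tokenized for t in tokens))
    counts = [Counter(tokens) for tokens in tokenized]
    # matrix[i][j] = number of times term i occurs in document j (Counter gives 0 if absent).
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


vocab, matrix = term_document_matrix(["mobile phone camera", "cellular phone camera"])
print(vocab)   # ['mobile', 'phone', 'camera', 'cellular']
print(matrix)  # [[1, 0], [1, 1], [1, 1], [0, 1]]
```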

Why it works

LSI works because of what can be called 'shared words'. If you search for the term 'mobile phones', a keyword search may miss pages that use related words such as 'cellular phone', 'lightweight' or 'camera phones' instead.
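The 'shared words' problem is easy to reproduce with a plain keyword matcher. The sketch below is illustrative only: `keyword_search` and the sample pages are hypothetical, and it assumes a page matches only if it contains every query word exactly.

```python
def keyword_search(query, docs):
    """Return indices of documents that contain every query word as an exact token."""
    words = query.lower().split()
    hits = []
    for i, doc in enumerate(docs):
        tokens = set(doc.lower().split())
        if all(w in tokens for w in words):
            hits.append(i)
    return hits


# Hypothetical pages that are all about the same kind of product.
pages = [
    "Compare mobile phones by price",
    "Lightweight cellular phones with cameras",
    "The best camera phones of 2007",
]
print(keyword_search("mobile phones", pages))  # [0]
```

Only the first page is returned, even though the other two describe the same kind of product in different words.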

While a user may see these words as related, that is not how a search engine's spider thinks. LSI addresses this problem by matching the concept behind a search term rather than looking only for the term's literal presence in a page.

This saves effort for both the searcher and the content provider, who no longer has to carefully craft pages around exact keywords.

How it works

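Most LSI software relies on a fully automated mathematical technique called Singular Value Decomposition (SVD). SVD factors the term-document matrix, and keeping only its largest singular values yields a reduced semantic, or concept, space in which queries and documents are compared. Working in this space improves retrieval.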
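A minimal sketch of the SVD step, assuming NumPy is available and using a small hypothetical term-document matrix (rows are terms, columns are documents). `np.linalg.svd` factors the matrix, and keeping the top `k` singular values gives the reduced concept space. The helper name `concept_space` and the choice `k=2` are illustrative; real collections often keep a few hundred dimensions.

```python
import numpy as np


def concept_space(A, k):
    """Return (U_k, s_k, Vt_k), a rank-k truncated SVD of term-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


# Hypothetical toy matrix: three "phone" documents and two "cooking" documents.
A = np.array([
    # d0 d1 d2 d3 d4
    [1, 0, 1, 0, 0],  # mobile
    [1, 1, 1, 0, 0],  # phone
    [1, 1, 0, 0, 0],  # camera
    [0, 1, 1, 0, 0],  # cellular
    [0, 0, 1, 0, 0],  # lightweight
    [0, 0, 0, 1, 0],  # recipe
    [0, 0, 0, 1, 1],  # pasta
    [0, 0, 0, 1, 1],  # sauce
    [0, 0, 0, 0, 1],  # tomato
], dtype=float)

U_k, s_k, Vt_k = concept_space(A, k=2)
A_k = U_k @ np.diag(s_k) @ Vt_k        # best rank-2 approximation of A
doc_coords = (np.diag(s_k) @ Vt_k).T   # row j: document j in the 2-D concept space
print(np.round(s_k, 3))
print(np.round(doc_coords, 3))
```

With this matrix the first concept is carried by the three phone documents and the second by the two cooking documents, so the phone documents get a (numerically) zero coordinate on the second concept.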

In simpler terms, an LSI-based model can identify that 'cellular phones', 'mobile phones' and 'lightweight phones' occur in the same contexts, so a search for one can also return pages that use the others, making the results more complete and relevant.
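The 'same context' idea can be sketched by folding a query into the concept space and ranking documents by cosine similarity. This assumes NumPy, a hypothetical five-document corpus, rank `k=2`, and the common fold-in approach in which the query is projected onto the same term vectors (U_k) and divided by the singular values; `lsi_scores` is a hypothetical helper, not a standard API.

```python
import numpy as np

# Hypothetical toy corpus: three "phone" pages and two "cooking" pages.
docs = [
    "mobile phone camera",
    "cellular phone camera",
    "mobile cellular phone lightweight",
    "recipe pasta sauce",
    "pasta sauce tomato",
]
terms = list(dict.fromkeys(w for d in docs for w in d.split()))
index = {t: i for i, t in enumerate(terms)}

# Term-document count matrix: rows are terms, columns are documents.
A = np.zeros((len(terms), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1


def lsi_scores(query, k=2):
    """Cosine similarity between a query and every document in a rank-k LSI space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Query as a term-count vector; words outside the vocabulary are ignored.
    q = np.zeros(len(terms))
    for word in query.split():
        if word in index:
            q[index[word]] += 1
    # Fold in: project onto the term vectors, then divide by the singular values.
    q_k = (U_k.T @ q) / s_k
    # Row j of Vt_k.T is document j folded in the same way.
    doc_vecs = Vt_k.T
    q_norm = np.linalg.norm(q_k)
    if q_norm == 0:
        return np.zeros(len(docs))
    return (doc_vecs @ q_k) / (np.linalg.norm(doc_vecs, axis=1) * q_norm)


keyword_hits = [j for j, d in enumerate(docs) if "mobile" in d.split()]
print(keyword_hits)  # [0, 2]
print(np.round(lsi_scores("mobile"), 2))
```

A keyword match on 'mobile' finds only documents 0 and 2, but the LSI score for document 1 ('cellular phone camera') comes out close to 1 because it shares 'phone' and 'camera' with the 'mobile' documents, while the pasta documents score close to 0. With only two dimensions all phone documents score alike; larger collections and more dimensions give finer distinctions.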
