Friday, June 27, 2008

Key to Pandora's Box


The need for information is infinite, and the more you get, the more you want. Information is the key to making informed decisions. The more we know, the better we get, and the better we get, the more we want to know.

The Internet, a powerful source of information, has the potential to quench our thirst for knowledge. However, searching this vast database is a daunting task. The introduction of Search Engines opened the floodgates and made already available information more accessible. But can a Search Engine understand a query the way another human would? Can it tell accurately, from context, whether someone searching for 'apple' means the fruit or the computer? Let's set out on a journey to find out.

According to Wikipedia, Latent Semantic Analysis (LSA) is a technique in natural language processing for analyzing the relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. In simple words, it is an umbrella term for a family of techniques used for searching and organizing large collections of digital data. The motive is to find patterns in unstructured data and use them to offer more effective search and categorization. Latent Semantic Analysis is sometimes also referred to as Latent Semantic Indexing (LSI).
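
As a hedged illustration of the "documents and terms" that LSA starts from, the sketch below builds a term-document matrix: one row per term, one column per document, each cell counting how often the term occurs. The three sample documents and the function name `term_document_matrix` are hypothetical, and tokenization is naive whitespace splitting; real systems usually apply a weighting scheme such as tf-idf on top of raw counts.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[i] in docs[j]."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({word for words in tokenized for word in words})
    counts = [Counter(words) for words in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

# Hypothetical sample documents
docs = [
    "apple releases new mac computer",
    "apple pie recipe with fresh fruit",
    "mac os runs on apple computer hardware",
]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:10s} {row}")
```

Here "apple" appears in every column, while "computer" and "fruit" appear in different documents; it is exactly this kind of pattern that LSA tries to exploit.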

Until recently, keyword density ranked high on every Search Engine optimizer's agenda. Most agreed that the keyword density of a web page should be in the region of 2%-7%. With the advent of LSI, keyword density has lost some of its importance. The emphasis now lies on semantically related words and phrases and how often they co-occur. Terms, synonyms, buzzwords, acronyms and anything else that helps establish the topic, context and theme of a page will affect how Search Engines perceive it. The relevance of theme words and phrases will play an increasingly crucial role as more Search Engines incorporate LSI, or parts of the concept, into their algorithms.
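
Keyword density is commonly computed as the number of occurrences of a keyword divided by the total number of words on the page, times 100. A minimal sketch under that assumption follows; the helper name `keyword_density` and the simple regex tokenizer are hypothetical, and only single-word keywords are handled.

```python
import re

def keyword_density(text, keyword):
    """Percentage of words in text equal to keyword (case-insensitive, single word)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)

page = "Apple computer, Apple laptop, Apple store, Mac OS X update"
print(f"{keyword_density(page, 'apple'):.1f}%")  # 3 of 10 words
```

This prints `30.0%`, far above the 2%-7% range. Note that the metric only counts exact matches and says nothing about related words such as "Mac" or "OS", which is the gap LSI-style techniques address.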

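The true purpose of LSI is often misunderstood because of the mathematical complexity involved in making it work. LSI builds on the vector space model, in which documents and queries are represented as vectors of term weights, and then reduces those vectors to a smaller set of “concepts” using a matrix factorization called singular value decomposition (SVD). This involves intricate calculations. However, that should not discourage anyone, because the point is to understand LSI's impact on Search Engine rankings, not how it is implemented.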
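
To make the vector space model concrete: a query and each document become vectors of term counts over the same vocabulary, and documents are ranked by the cosine of the angle between their vector and the query's. The sketch below assumes a hypothetical four-term vocabulary and hand-written count vectors; it shows plain vector space ranking, without the dimensionality reduction that LSI adds.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return (score, doc_index) pairs, best match first."""
    return sorted(((cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)), reverse=True)

# Hypothetical vocabulary: ["apple", "computer", "fruit", "pie"]
docs = [
    [1, 1, 0, 0],  # "apple computer"
    [1, 0, 1, 1],  # "apple fruit pie"
]
query = [1, 1, 0, 0]  # the query "apple computer"

for score, idx in rank_documents(query, docs):
    print(f"doc {idx}: {score:.3f}")
```

The "apple computer" document scores about 1.0 and the fruit document about 0.41, even though both contain "apple".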

LSI, or parts of it, is employed at some level in ranking algorithms to alleviate the problems of ranking pages solely by matching text patterns. Search results based purely on text matching would often be irrelevant, because they ignore context. For example, a person looking for “apple” and “computer” will probably also be interested in “Mac OS”, as the terms are interlinked and hence relevant. It's all about trying to anticipate and understand the nature and intent of the user's query. By doing so, Search Engines endeavor to return information in the context of the searched term.
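
One simple way to see how “apple” plus “computer” can point towards “Mac OS” is to count which other words appear in documents containing all the query terms. This is plain co-occurrence counting, not LSI itself (LSI captures such associations through the matrix factorization), and the function `related_terms` and the sample documents are hypothetical.

```python
from collections import Counter

def related_terms(docs, query_terms, top_n=3):
    """Rank words by how many documents containing every query term also contain them."""
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())  # count each word once per document
        if query <= words:  # document mentions all query terms
            counts.update(words - query)
    return [term for term, _ in counts.most_common(top_n)]

docs = [
    "apple computer with mac os",
    "mac os update for your apple computer",
    "apple pie with fresh fruit",
]
print(related_terms(docs, ["apple", "computer"], top_n=2))
```

This prints `mac` and `os` (in either order, since they tie); the fruit document is ignored because it never mentions "computer".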

The most remarkable thing about implementing the essence of LSI is the way it has made Search Engines more human. They can now discern a link between related terms, for example between a cat and a dog. As humans, we know both are household pets and can categorize them accordingly. This is easy for the human brain to grasp, but not for a Search Engine. LSI has taken Search Engines a step further by helping them draw this kind of connection, so they are better equipped to give users information in context.
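
The cat/dog link can be reproduced on a toy scale. In the hypothetical collection below, "cat" and "dog" never appear in the same document, so their raw term vectors are orthogonal; both, however, co-occur with "pet". Keeping only the k = 2 strongest concepts of the term-document matrix places them side by side. This is a minimal pure-Python sketch, not a production implementation: it finds the leading right singular vectors by power iteration on A^T A instead of calling a library SVD routine, and the function names, the data and the fixed 200 iterations are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0:
        raise ValueError("zero vector: k is larger than the matrix rank")
    return [x / n for x in v]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return 0.0 if nu == 0 or nv == 0 else dot(u, v) / (nu * nv)

def top_eigenvectors(m, k, iterations=200):
    """Top-k eigenvectors of a symmetric positive semi-definite matrix,
    via power iteration with deflation after each one is found."""
    n = len(m)
    m = [row[:] for row in m]
    vectors = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(n)])  # arbitrary non-zero start
        for _ in range(iterations):
            v = normalize([dot(row, v) for row in m])
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient = eigenvalue
        vectors.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors

def lsi_term_vectors(a, k):
    """Map each term (row of term-document matrix a) to k latent dimensions.
    Eigenvectors of A^T A are the right singular vectors V_k, and A V_k = U_k S_k."""
    n_docs = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)] for i in range(n_docs)]
    vs = top_eigenvectors(ata, k)
    return [[dot(row, v) for v in vs] for row in a]

# Hypothetical documents: d0 = "cat pet", d1 = "dog pet", d2 = "stock market"
terms = ["cat", "dog", "pet", "stock", "market"]
matrix = [
    [1, 0, 0],  # cat
    [0, 1, 0],  # dog
    [1, 1, 0],  # pet
    [0, 0, 1],  # stock
    [0, 0, 1],  # market
]
latent = lsi_term_vectors(matrix, k=2)
print("raw cat/dog:   ", round(cosine(matrix[0], matrix[1]), 3))
print("latent cat/dog:", round(cosine(latent[0], latent[1]), 3))
```

The raw similarity is 0.0, while in the two-dimensional latent space it is close to 1.0, because both terms load on the same "pet" concept; "stock" stays unrelated to both. Real LSI systems work with thousands of terms and documents and use optimized, often sparse, SVD routines; the power-iteration loop here only keeps the example dependency-free.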

LSI was rarely discussed until its principles were adopted by Search Engines as a potent tool, although the concept has been around for more than a decade. Parts of it have been used by Search Engines to refine their results, and abused by unethical marketers to mislead people who lack knowledge of the field. The author hopes that this article clears up some of the mystery and myths surrounding LSI. Its application has made Search Engines more effective and efficient, and with ongoing advances in the technique, the author is confident that Search Engines will reach new milestones.

