Friday, June 27, 2008

Key to Pandora's box


The need for information is infinite: the more you get, the more you want. Information is the key to making informed decisions. The more we know, the better we get, and the better we get, the more we want to know.

The Internet, a powerful source of information, has the potential to quench our thirst for information. However, looking for information in this vast database is a daunting task. The introduction of Search Engines opened the flood gates and made the already available information more accessible. But can a Search Engine understand a human query the way another human would? Can it tell with accuracy, from context, whether someone meant 'apple' the fruit or the computer? Let's set out on a journey to find out if they can.

Latent Semantic Analysis (LSA), according to Wikipedia, is a technique employed in natural language processing for analyzing relationships between a set of documents and the terms they contain, by producing a set of concepts related to the documents and terms. In simple words, it is an umbrella term for a family of techniques used for searching and organizing large digital data collections. The motive is to find patterns in unstructured data and use these patterns to offer more effective search and categorization. Latent Semantic Analysis (LSA) is sometimes also referred to as Latent Semantic Indexing (LSI).
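At its core, LSA builds a term-document count matrix and compresses it with a singular value decomposition (SVD), keeping only the strongest few dimensions — the "latent semantic" space. A minimal sketch in Python with NumPy, using a made-up three-document corpus (the terms and counts here are purely illustrative):

```python
import numpy as np

# Hypothetical term-document count matrix: rows = terms, columns = documents.
A = np.array([
    [2, 0, 1],   # "apple"
    [1, 0, 0],   # "computer"
    [0, 2, 1],   # "fruit"
], dtype=float)

# Full SVD: A = U @ diag(s) @ Vt, with singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values: a rank-k approximation of A
# that smooths over noise and exposes latent term/document structure.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_k, 2))
```

The truncation step is the whole trick: terms that co-occur across documents end up sharing latent dimensions, even where their raw counts never overlap.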

Until recently, keyword density ranked high on every Search Engine optimizer's agenda. Most of them agreed that the keyword density per web page should be in the region of 2%-7%. With the advent of LSI, keyword density has lost some of its importance. The emphasis now lies on semantically related words and phrases and their co-occurrence. Terms, synonyms, buzz words, acronyms, etc. — anything that can be used to establish the topic, context and theme of a given page will have an impact on how it is perceived by Search Engines. The relevance of theme words and phrases will indisputably play a crucial role as more Search Engines incorporate LSI, or parts of the concept, into their algorithms.

LSI is often misunderstood in its true purpose because of the mathematical complexity involved in making it work. The vector space model, the concept behind LSI, involves intricate calculations and understanding. However, this fact should not discourage anyone, because the idea is to understand its impact on Search Engine rankings, not how it is implemented.

LSI, or parts of it, is employed at some level in a ranking algorithm to alleviate the issues with ranking pages solely by matching text patterns. Search results based purely on matching text patterns would produce irrelevant results, because relevance to context would be absent. For example, a person looking for “apple” and “computer” will also be interested in “Mac OS”, as they are interlinked and hence relevant. It's all about trying to anticipate and understand more about the nature and intent of the user query. By doing so, Search Engines endeavor to return information in context with the user's search term.
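The classic LSI move for this is "query folding": project the query into the same latent space as the documents and rank by cosine similarity there, so a document can score well even when it shares few literal query terms. A sketch under the same assumptions as before (a hypothetical four-term, three-document corpus; counts are invented for illustration):

```python
import numpy as np

# Hypothetical corpus: doc0 about the fruit, doc1 about the computer,
# doc2 about Mac OS. Rows = terms, columns = documents.
terms = ["apple", "fruit", "computer", "mac"]
A = np.array([
    [2, 1, 0],   # apple
    [2, 0, 0],   # fruit
    [0, 2, 1],   # computer
    [0, 1, 2],   # mac
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# Document coordinates in the latent space: columns of diag(s_k) @ Vt_k.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

# Fold the query "apple computer" into the same space: q_k = U_k^T q.
q = np.array([1, 0, 1, 0], dtype=float)
q_k = U[:, :k].T @ q

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

scores = [cosine(q_k, d) for d in doc_vecs]
```

With these counts, the computer-oriented document ranks first, and the Mac OS document scores respectably despite mentioning "computer" only once — the latent space has linked "mac" and "computer" through their co-occurrence.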

The most amazing thing about implementing the essence of LSI is the way in which it has made Search Engines more human. They can now discern a link between related terms, for example a link between a cat and a dog. We as humans know they are household pets and can categorize them accordingly. It is easy for a human brain to comprehend this, but not for Search Engines. The LSI technique has taken Search Engines a step further in helping them draw this analogy. They are better equipped to provide users with information in context.
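The cat-and-dog analogy can be made concrete: in the truncated SVD space, terms that co-occur with the same neighbors (here, "pet") end up with nearly parallel vectors, while unrelated terms do not. A toy sketch with an invented four-term, four-document corpus:

```python
import numpy as np

# Hypothetical counts: "cat" and "dog" co-occur with "pet" in docs 0-1;
# "car" appears only in unrelated docs 2-3. Rows = terms.
terms = ["cat", "dog", "pet", "car"]
A = np.array([
    [2, 1, 0, 0],  # cat
    [1, 2, 0, 0],  # dog
    [1, 1, 0, 0],  # pet
    [0, 0, 2, 1],  # car
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # term coordinates in the latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

cat, dog, car = term_vecs[0], term_vecs[1], term_vecs[3]
print(cosine(cat, dog))  # high: cat and dog share latent context
print(cosine(cat, car))  # near zero: no shared context
```

Note that "cat" and "dog" are judged similar without any dictionary or taxonomy — purely from the pattern of documents they appear in.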

LSI featured in rare discussions until its principles were adopted by Search Engines as a potent tool. The concept of LSI has been around for more than a decade. Parts of it have been used by Search Engines to refine their search, and abused by unethical marketers to mislead people who lack knowledge in this particular field. The author hopes that this article will clear some of the mystery and myths surrounding LSI. Its application has made Search Engines more effective and efficient, and with advances being made to improve this technique, the author is sure that Search Engines will achieve new milestones.


Tuesday, June 24, 2008

Of the people, for the people, by the people


I was recently asked what web 2.0 stood for, and I found myself groping for a credible explanation. Snippets of information I had read all along flashed through my mind, but still, I found myself at a complete loss for words. The thought “does web 2.0 mean anything at all”, coupled with my colleagues' request to find information relating to web 2.0, triggered this inquest. The term has been tossed around generously across the world wide web, and I decided to start looking for an answer using the same medium.

Wikipedia was my preferred choice, and according to them, “Web 2.0 is a term describing the trend in the use of World Wide Web technology and web design that aims to enhance creativity, information sharing, and, most notably, collaboration among users. These concepts have led to the development and evolution of web-based communities and hosted services, such as social-networking sites, wikis, blogs, and folksonomies”. In my opinion (and as described by wiki), web 2.0 is more of a trend than a technology, and a trend alone cannot be used to demarcate web 2.0 from web 1.0. I was thinking about it more along the lines of an ongoing evolution of the world wide web than a distinct point in time when a technological transition happened. As always, all things evolve over a period of time; it is natural and bound to happen. Describing it as a particular phenomenon is unjust. I wanted a second opinion, as I always do (to be sure), and found resonance of my thoughts in the words of Tim Berners-Lee, the inventor of the World Wide Web. According to him, “Web 2.0 is a piece of jargon. Nobody really knows what it means”. He went on to say that “if web 2.0 for you is blogs and wikis, then that is people to people. But that was what the web was supposed to be all along”. The part relating to people is of particular interest to me, as it signals a shift in the perception of the world wide web as a medium. Web 2.0 could be treated as the next generation of the world wide web, one that draws its effectiveness from the ability of users to collaborate and share information.

I wanted to know more about the genesis of the term, to see if the people who invented it had a different school of thought. According to Tim O'Reilly and Dale Dougherty, credited with coining the term during a conference brainstorming session in 2004, “Web 2.0 does not have a hard boundary, but rather, a gravitational core. You can visualize Web 2.0 as a set of principles and practices that tie together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core”. In principle, the world wide web is still the same, but it has added new dimensions to itself; in a sense, it has evolved. As technology advanced, so did the web, and the process is never ending. So was there a need to give this evolution a name, or was it just a marketing gimmick? Maybe it wasn't a deliberate attempt to name the trend, but just an attempt to show that the web mattered again after the dot-com bubble burst. Four years on, the industry still lacks consensus on what web 2.0 is and what it consists of.

In essence, web 2.0 is more of a platform, based on the principle that a service automatically gets better the more people use it. This architecture of participation uses the web as an intelligent broker connecting the edges (end users). The philosophy is that users don't just passively imbibe information from the web but actively contribute and supplement the information that is already available. Sites like YouTube, Wikipedia, Facebook and LinkedIn derive their effectiveness from this inter-human connectivity. Bart Decrem, founder of Flock, calls this the “participatory Web”. It has empowered users in ways we could only have imagined a decade ago. The web is more democratic than ever before, with big players ready to relinquish control at the user end. Those who fail to see the human side of this technological revolution are, or are destined to be, lost in oblivion. The jury is still out on whether web 2.0 is a new concept or was always present. All we know for sure is that it is happening now and we can see it happen. No one can predict when it started or when it will stop; it will take its own course.

I am quite amazed that I started my research by using Wikipedia, which now turns out to be a facet of web 2.0 itself. Also, I have not forgotten that I still need to define what web 2.0 stands for, so that the next time someone asks me this question, I can save myself from looking like an idiot by having a ready-made answer. Web 2.0 can be described as “the world wide web – of the people, for the people, by the people”. It might not be a scientifically correct definition, but I think this is web 2.0 in a nutshell.