Breakthrough Work to Improve Search Engine Algorithms Earns Best Paper Award

12/2/2011

CS grad student earns best student paper award.

Written by

University of Illinois computer science graduate student Yuanhua Lv received the Best Student Paper Award at the 20th ACM Conference on Information and Knowledge Management (CIKM 2011) for his breakthrough work to improve the general search engine algorithm used in all search engines.

Illinois computer science Yuanhua Lv
Illinois computer science Yuanhua Lv
Illinois computer science Yuanhua Lv

 

 

The performance of a search engine is mainly determined by its retrieval model which formally specifies how to score and rank documents optimally for a user's query. Optimization of retrieval models is a fundamentally important research problem because an improved retrieval model would lead to improved performance for all search engines. However, until Lv’s work, it has proven difficult to improve the current retrieval models.  The well-known BM25 retrieval function was proposed in 1994, but efforts to improve its performance since then have been fruitless.

In order to improve existing approaches, Lv’s work “Lower-bounding term frequency normalization” first analyzed the deficiencies in current models, and revealed a previously unknown common deficiency in all current retrieval methods, namely that the component of term frequency normalization by document length is not properly lower-bounded. As a result of this deficiency, long documents which do match the query term can often be scored by search engines unfairly as having a lower relevancy than shorter documents that do not contain the query term at all. For example, for a query such as "computer virus", a long document matching both "computer" and "virus" can easily be ranked lower than a short document matching only "computer".

Lv’s discovery led him to create a novel methodology for introducing a sufficiently large lower bound for term frequency normalization, which can be used as a plug-and-play patch to multiple current retrieval models to eliminate the problem.  Lab results indicate that Lv’s methodology incurs almost no additional computational costs while delivering more precise search results, particularly in cases where the queries are verbose.

“This paper makes a breakthrough contribution in improving the general search algorithms used in all search engines,” said Illinois computer science professor ChengXiang Zhai, the paper’s co-author. “The new models proposed can improve over many strong baseline methods that have been the state of the art for more than a decade. These new models can be immediately used to replace the old models in all search engines to improve search accuracy.
 


Share this story

This story was published December 2, 2011.