Project 2: Automated Custom Meta-search Engine

Adrian O’Riordan, 2006

 

A meta-search engine (such as WebCrawler or Dogpile) is a search engine that forwards each user request to a number of other search engines in the hope of improving recall. For example, Dogpile (Dogpile.com) searches all of the following engines: Google, Yahoo, LookSmart, AskJeeves/Teoma, and MSN Search. Some newer search technologies (such as Clusty and Vivisimo) also cluster results, but clustering is not considered in this project.

 

 

While a meta-search engine allows more of the web to be searched at once than any single stand-alone search engine, the precision of the results can be poor. The quality of the results is no better than the quality of the search engine databases from which they are obtained.

 

Also, some meta-search engines present the search results from each engine separately, which breaks the ordered list of ranked results. While many meta-search engines (such as Dogpile) integrate the results into a single list, the user cannot specify which search engines to employ, nor does the system try to use the search engines likely to handle the query best. (Search aggregators do provide the ability to choose which search engines to employ but are aimed at RSS feeds.)

 

The goal of this project is to design and prototype a meta-search engine that automatically chooses appropriate search engines to use for each user query. A user should be able to specify a text query (word or phrase) with a number of modifiers: field limiting (such as site:), Boolean logic (AND, OR and NOT operators) and nesting of these operators, requires/excludes, case sensitivity on or off, stemming on or off, stop word removal on or off, etc. Ideally, the meta-search engine should employ only the engines that can explicitly deal with the type of query issued. For example, Yahoo! Search allows nesting in Boolean expressions whereas Google doesn’t, so a query such as “Retrieval AND (Text OR Information)” should be issued to Yahoo! Search but not Google as part of the overall search.
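
One possible way to approach the engine-selection step is sketched below in Python. It is only an illustration under stated assumptions, not a prescribed design: the feature names, the ENGINES table and the functions required_features and select_engines are all hypothetical. The only capability taken from the text above is that Yahoo! Search handles nested Boolean expressions and Google does not; the remaining table entries are placeholders that would need to be checked against each engine's documentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    features: frozenset  # query features this engine is assumed to support


# Hypothetical capability table. Only the "nesting" difference between
# Yahoo! Search and Google comes from the project description; the other
# entries are illustrative assumptions.
ENGINES = [
    Engine("Yahoo! Search", frozenset({"boolean", "nesting", "site"})),
    Engine("Google", frozenset({"boolean", "site"})),
]


def required_features(query: str) -> set:
    """Return the set of features a query needs (a deliberately crude detector)."""
    needed = set()
    # Upper-case AND / OR / NOT are treated as Boolean operators.
    if re.search(r"\b(AND|OR|NOT)\b", query):
        needed.add("boolean")
    # A parenthesised group containing an operator counts as nesting.
    if re.search(r"\([^()]*\b(AND|OR|NOT)\b[^()]*\)", query):
        needed.add("nesting")
    # Field limiting with the site: modifier.
    if re.search(r"\bsite:\S+", query):
        needed.add("site")
    return needed


def select_engines(query: str, engines=ENGINES) -> list:
    """Keep only the engines whose feature set covers everything the query needs."""
    needed = required_features(query)
    return [e.name for e in engines if needed <= e.features]


if __name__ == "__main__":
    print(select_engines("Retrieval AND (Text OR Information)"))
    print(select_engines("Retrieval AND Text"))
```

With this placeholder table, the nested query from the example above would be routed to Yahoo! Search only, while a flat Boolean query would go to both engines. A real prototype would need a proper query parser rather than regular expressions, and a capability table covering the other modifiers (case sensitivity, stemming, stop word removal, requires/excludes).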