Open search aggregator for medical information
The purpose of this project is to develop an open aggregator for medical and health information on the Web using the OpenSearch format.
Search aggregation combines results from different sources: for example, a query is sent to a number of medical databanks in parallel and the results are merged into a single list. OpenSearch is an open, common format for search results or feeds. (OpenSearch 1.0 was unveiled by Jeff Bezos, founder, president and CEO of Amazon, at the Web 2.0 conference in 2005.)
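The parallel query-and-combine pattern described above can be sketched roughly as follows. This is only an illustration: the two source functions are hypothetical stand-ins for real medical databanks (they return canned results and make no network calls), and dropping results with a duplicate URL is an assumed merge policy, not something the project prescribes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real medical databanks. Each takes a query
# string and returns a list of (title, url) tuples.
def search_source_a(query):
    return [(f"{query} overview", "http://example.org/a/1")]

def search_source_b(query):
    return [
        (f"{query} treatment", "http://example.org/b/1"),
        (f"{query} overview", "http://example.org/a/1"),  # same URL as source A
    ]

def aggregate(query, sources):
    """Send the query to every source in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map runs the calls concurrently and yields results in source order.
        result_lists = list(pool.map(lambda source: source(query), sources))

    merged, seen_urls = [], set()
    for results in result_lists:
        for title, url in results:
            if url not in seen_urls:  # assumed policy: keep the first hit per URL
                seen_urls.add(url)
                merged.append((title, url))
    return merged

results = aggregate("asthma", [search_source_a, search_source_b])
```

In a real aggregator each source function would issue an HTTP request, so running them in threads lets the slowest databank, rather than the sum of all of them, determine the response time.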
The search aggregator software could use the Nutch library (described below) to build a simple Web search engine that queries a small number of chosen online medical databanks. Search aggregators differ from feed (RSS) aggregators, which are used for reading feeds (Thunderbird, Google Reader, etc.).
There are dozens of health-focused search engines and portals, including HealthLine, MedHunt, MedicineNet, omnimedicalsearch, and HealthFind. Some of these are proprietary; others use Google Custom Search. They cannot easily be aggregated, since they are primarily intended for direct user interaction and use different formats.
OpenSearch is a set of simple formats for the sharing of search results. OpenSearch helps search engines and search clients communicate by introducing a common set of formats to perform search requests and syndicate search results. Formats such as RSS, Atom and HTML are acceptable.
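As a rough illustration of how a client might use these formats, the sketch below fills in an OpenSearch URL template and reads the OpenSearch elements from an RSS response. The template URL and the sample feed are made up for the example (example.org is a placeholder, not a real databank); the `{searchTerms}` placeholder and the `totalResults` element in the `http://a9.com/-/spec/opensearch/1.1/` namespace come from the OpenSearch 1.1 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

def fill_template(template, terms):
    """Substitute URL-encoded search terms into an OpenSearch URL template."""
    return template.replace("{searchTerms}", quote_plus(terms))

# Made-up response for illustration only; a real one would come from the
# URL produced by fill_template().
SAMPLE_RSS = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example medical search: asthma</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>Asthma overview</title><link>http://example.org/asthma</link></item>
    <item><title>Asthma treatment</title><link>http://example.org/asthma-treatment</link></item>
  </channel>
</rss>"""

def parse_response(xml_text):
    """Return (total result count, list of (title, link)) from an RSS response."""
    channel = ET.fromstring(xml_text).find("channel")
    # Namespaced elements are addressed as "{namespace-uri}localname".
    total = int(channel.findtext(f"{{{OPENSEARCH_NS}}}totalResults"))
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return total, items

url = fill_template("http://example.org/search?q={searchTerms}&format=rss", "asthma inhaler")
total, items = parse_response(SAMPLE_RSS)
```

Because every OpenSearch-compliant source returns the same elements, an aggregator can reuse one parser like this for all of them instead of scraping each site's HTML.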
A9, YaCy and MozDev are examples of general-purpose search engines that support OpenSearch. Nutch is a useful open-source library for building search engines, and it supports OpenSearch. It builds on Lucene Java (a text search engine), adding Web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
Refs:
http://en.wikipedia.org/wiki/OpenSearch
http://www.opensearch.org/
http://a9.com/
http://lucene.apache.org/nutch/
http://www.healthline.com/