COMPUTER SCIENCE PROFESSORS TO DEVELOP FREE SEARCH ENGINE FOR 'DEEP' WEB
If three Old Dominion University computer science professors have their way, general Web users will be able to access a research paper, a technical report, an image of a great painting or a performance of a musical piece in just a few seconds from thousands of libraries all over the world. Currently, most digital libraries use different, non-interoperable technologies, making searches time-consuming and less accessible to the majority of Internet users and researchers.
Kaufman Professor and Computer Science Chair Kurt Maly, Assistant Professor Michael Nelson and Professor Mohammad Zubair, along with Herbert Van de Sompel of the Los Alamos National Laboratory, were recently awarded a $122,000 grant from The Andrew W. Mellon Foundation to create a high-performance federated search engine for digital libraries and a framework to integrate the current disparate digital libraries and general Web communities.
"Google does an incredible job at providing discovery services of the 'shallow' Web to the general public," said Maly. "The ODU team envisions a similar quality, sustainable, free discovery service for students and researchers for parts of the 'deep' Web." The parts of the deep Web referred to in this vision are digital libraries and collections that expose their metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). A high-performance federated search service that exploits the resources of a Grid will make available a large amount of information that is distributed among heterogeneous digital libraries. A search user will be able to access a research paper, a preprint, a technical report, an image of a renowned painting or a musical performance in a few seconds from thousands of libraries scattered all over the world.
Although digital libraries and the general Web share a common toolset, there is not enough interaction between them, according to Maly. The group plans to create an Apache module, mod_oai, which will enable OAI for the general Web community. This will greatly increase the number of people who will be able to export their metadata and resources via OAI-PMH.
Apache is an open-source Web server that is used by 63 percent (approximately 27 million) of the Web sites in the world. OAI-PMH is a protocol for selectively harvesting data from repositories. It has had a considerable impact in the field of digital libraries, but it has yet to be embraced by the general Web community. Through mod_oai, the team hopes to achieve broader acceptance by making the power and efficiency of OAI-PMH available to Web servers and Web crawlers. For example, mod_oai would be able to respond to requests to collect all files added or changed since a specified date.
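As a rough illustration of such a date-based request, the Python sketch below builds an OAI-PMH harvesting URL. The ListRecords verb and the from and metadataPrefix arguments are part of the OAI-PMH protocol, and oai_dc is its standard Dublin Core metadata format. The base URL and the function name build_harvest_url are hypothetical; the article does not describe how mod_oai will lay out its URLs.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, from_date, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for records changed since from_date.

    base_url is a hypothetical repository endpoint; from_date is a UTC date
    in YYYY-MM-DD form, as the protocol expects.
    """
    params = {
        "verb": "ListRecords",              # OAI-PMH verb for harvesting records
        "metadataPrefix": metadata_prefix,  # oai_dc = unqualified Dublin Core
        "from": from_date,                  # only records added or changed since this date
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
print(build_harvest_url("http://www.example.org/modoai", "2004-05-01"))
```

A harvester that remembers the date of its last visit could send this kind of request on each return and receive only what is new or changed, rather than fetching every page again.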
Encouraging the switch from the current, resource-intensive Web harvesting model to the more efficient OAI-PMH model will also significantly reduce the load on Web servers, decrease the amount of repetitive traffic on the network and increase the freshness of harvested resources.
This article was posted on: May 24, 2004
Old Dominion University
Office of University Relations
Room 100 Koch Hall Norfolk, Virginia 23529-0018