Out-of-the-box thinking, similar to the ever more ingenious strategies used to spread spam e-mail across the Internet, may someday help preserve digital data, according to a hypothesis put forward by Old Dominion University (ODU) computer scientist Michael Nelson.

If the researcher is on to something, and the National Science Foundation (NSF) has made a $541,000 bet that he is, our present-day practices for protecting digital information will be augmented by quite different approaches.

Nelson's idea is reminiscent of the maritime practice of sending ships to sea ahead of a storm. The reasoning is that a passive ship tied up in port will be more vulnerable to the storm than a ship under way in the open sea.

Instead of ships, Nelson is trying to protect bundles of information called digital objects. He learned in late December that he had received an NSF Career Grant that will fund his work over the next five years on "self-preserving digital objects." Career grants are awarded to promote the work of extraordinarily promising young scientific researchers.

"Can we create digital objects that preserve themselves?" Nelson asks. "I want to explore this." He said that e-mail spam and viral videos are the best current examples of the approach he proposes.

Mischievous e-mail or a humorous video clip can "live in the Web infrastructure with minimal hierarchical control," he said, and that is precisely how he plans to preserve digital objects containing data for technical papers, historical documents, Web pages and the like.

"I'm going to investigate if these properties can be applied to content other than pop culture ephemera," he said.

Most approaches to preserving digital information involve putting "dumb" objects in "smart" repositories. But, Nelson noted, "This reveals an implicit assumption that the repository is going to be long-lived." A repository, which he sometimes calls a "fortress," could be a digital library maintained at a university or the storage servers of a company such as Yahoo.

The "deadly embrace of repositories" is a phrase coined by computer scientist John Kunze at the University of California, and Nelson likes to repeat it. "Information goes in, but is often difficult to extract. I especially like that phrase as a succinct, vivid description of repositories," he explained.

Nelson said he is not advocating abandonment of repositories or other conventional digital preservation techniques, but he believes an alternative is needed. "These repositories are expensive and they are complicated software systems that require preservation themselves. I'm interested in digital objects that can live longer than their repositories, in information that can live longer than the people or organizations charged with their preservation."

This will require digital objects that are encoded, when they are created, with enough intelligence to fend for themselves in the Web infrastructure. This extra care at creation could help sort out which digital objects should be saved and which should not. The surface Web, which most of us can access, already contains billions of documents, and the so-called "invisible" or "deep" Web contains hundreds of billions more. Most of them exist in some sort of repository or fortress.

Giving digital objects a dollop of artificial intelligence so they can exist outside of fortresses may not be as hard as it sounds, Nelson said. He thinks complex preservation practices can emerge from a relatively simple set of directives.

This thinking reflects his fascination with "emergent behavior," a concept in science, engineering and other fields that describes how complex wholes, much greater than the sum of their parts, can arise from the interactions of simple parts. "Emergent behavior is well known in systems theory, but it has not been attempted in a preservation context," he said.

According to the proposal that NSF accepted, Nelson's project will make use of "flocking" rules created by a computer graphics programmer in the 1980s. Under these rules, each digital agent in a group follows only a few minimal directives, such as "stay close but not too close to your neighbors," yet the group as a whole moves as a flock whose individual motions look random but whose overall behavior is mostly predictable. (Flocking is often used in animated films to create realistic-looking crowd behavior, such as the wildebeest stampede in "The Lion King.")
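The flocking idea can be sketched in a few dozen lines of Python. This is a minimal illustration of the classic "boids" rules (cohesion, separation and alignment), not code from Nelson's project. The article's "stay close but not too close" corresponds to cohesion plus separation; alignment is the third rule commonly used in such simulations. The names `Boid`, `step` and `spread`, the neighborhood radius and the rule weights are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(flock, radius=10.0, too_close=2.0,
         cohesion=0.01, separation=0.05, alignment=0.05, max_speed=2.0):
    """Advance every boid one time step using only local rules.

    Each boid considers only neighbors within `radius` (illustrative values):
      * cohesion:   steer toward their average position ("stay close"),
      * separation: steer away from any nearer than `too_close` ("but not too close"),
      * alignment:  nudge velocity toward their average velocity.
    """
    new_velocities = []
    for b in flock:
        neighbors = [o for o in flock
                     if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        vx, vy = b.vx, b.vy
        if neighbors:
            n = len(neighbors)
            # Cohesion: move toward the local center of mass.
            cx = sum(o.x for o in neighbors) / n
            cy = sum(o.y for o in neighbors) / n
            vx += (cx - b.x) * cohesion
            vy += (cy - b.y) * cohesion
            # Alignment: match the neighbors' average heading.
            avx = sum(o.vx for o in neighbors) / n
            avy = sum(o.vy for o in neighbors) / n
            vx += (avx - b.vx) * alignment
            vy += (avy - b.vy) * alignment
            # Separation: push away from neighbors that are too close.
            for o in neighbors:
                if math.hypot(o.x - b.x, o.y - b.y) < too_close:
                    vx -= (o.x - b.x) * separation
                    vy -= (o.y - b.y) * separation
        # Cap the speed so no boid runs away from the group.
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_velocities.append((vx, vy))
    # Apply all updates at once so every boid reacted to the same snapshot.
    for b, (vx, vy) in zip(flock, new_velocities):
        b.vx, b.vy = vx, vy
        b.x += vx
        b.y += vy


def spread(flock):
    """Mean distance of boids from the flock's center (a rough tightness measure)."""
    cx = sum(b.x for b in flock) / len(flock)
    cy = sum(b.y for b in flock) / len(flock)
    return sum(math.hypot(b.x - cx, b.y - cy) for b in flock) / len(flock)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    flock = [Boid(rng.uniform(0, 20), rng.uniform(0, 20),
                  rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
    print("spread before:", round(spread(flock), 2))
    for _ in range(100):
        step(flock)
    print("spread after: ", round(spread(flock), 2))
```

No boid is told where the group should go; each reacts only to nearby neighbors. Comparing the spread before and after the loop is a simple way to observe what those purely local rules produce at the level of the whole group, which is the kind of emergent behavior the proposal hopes to apply to preservation.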

Nelson was a computer engineer at the NASA Langley Research Center from 1991 to 2002, and during that period he earned a master's degree and doctorate in computer science from ODU. He joined the ODU faculty as an assistant professor in 2002 after a postdoctoral stint at the University of North Carolina, Chapel Hill.

In just a few years, Nelson has become an internationally recognized expert in the areas of digital libraries and digital preservation, said Kurt Maly, chair of the ODU computer science department. "We are extremely proud of Professor Nelson winning the prestigious NSF Career award. He is the first in our department and one of only a few faculty at ODU to ever win this award," Maly added.

In addition to this grant, Nelson has been principal investigator or co-principal investigator on eight grants totaling $1.8 million.

During the NSF Career Grant period Nelson will create a preservation test bed that has real content stored within self-preserving digital objects, in repositories and in the general Web. "I will create a framework for measuring preservation effectiveness and perform a quantitative analysis of the content each year," he wrote in his NSF proposal.

Beneficiaries of his work will include computer science students at ODU, whom he will teach and lead in research. Some projects will involve the preservation of old tests, assignments and projects.

Students should appreciate the timeliness and novelty of the project. As Nelson put it, "I have several Ph.D. students working on other funded projects with the Library of Congress and NSF involving alternative approaches to preservation, but this is probably our most radical approach yet.

"Self-preservation may or may not work, but the Career award will allow me to bring a preservation focus to our course offerings, and maybe the next radical preservation technique will come from the students in those courses."

Added Maly: "Today's students have iPods, digital cameras, laptops and nearly unlimited capacity to create, store and share content. Computer science provides training in many applied areas such as Web programming, networking and databases. The field must now also teach stewardship of large, personal data collections. We look forward to integrating the results of Professor Nelson's Career award into both our graduate and undergraduate curriculum."

Maly and Nelson are members, together with Mohammad Zubair, professor of computer science, of the Digital Library Research Group @ ODU. The group has developed several Web services that are used internationally and was a founding member of the Open Archives Initiative (OAI). The new "mod_oai" module is housed at ODU under Nelson's direction. Funding for the group comes from many government agencies involved in data preservation.

ODU is one of only a dozen universities in the United States that offers courses in digital libraries.

The university's computer science department received more good news last year when the U.S. Department of Energy (DOE) awarded $7 million for the creation of a center to be housed at ODU that will develop software for scientific problem solving on the next generation of high-performance computers.

Alex Pothen, a professor of computer science and a member of the Center for Computational Science at ODU, is the DOE grant's principal investigator. His collaborators come from ODU, Sandia National Laboratories in New Mexico, Argonne National Laboratory in Illinois, Ohio State University and Colorado State University.

With the funding, the researchers will establish the Combinatorial Scientific Computing and Petascale Simulations (CSCAPES, pronounced "seascapes") Institute. The institute will support the DOE's broad-based Scientific Discovery through Advanced Computing (SciDAC) initiative.

This article was posted on: January 10, 2007

