In a Flood Tide of Digital Data, an Ark Full of Books

Brewster Kahle (SHS '78) aims to preserve every book on this planet.

Lianne Milton for The New York Times

The Physical Archive of the Internet Archive hopes to eventually collect 10 million items, and it has started taking in films as well.
By DAVID STREITFELD

Published: March 3, 2012

RICHMOND, Calif. — In a wooden warehouse in this industrial suburb, the 20th century is being stored in case of digital disaster.

Forty-foot shipping containers stacked two by two are stuffed with the most enduring, as well as some of the most forgettable, books of the era. Every week, 20,000 new volumes arrive, many of them donations from libraries and universities thrilled to unload material that has no place in the Internet Age.

Destined for immortality one day last week were “American Indian Policy in the 20th Century,” “All New Crafts for Halloween,” “The Portable Faulkner,” “What to Do When Your Son or Daughter Divorces” and “Temptation’s Kiss,” a romance.

“We want to collect one copy of every book,” said Brewster Kahle, who has spent $3 million to buy and operate this repository situated just north of San Francisco. “You can never tell what is going to paint the portrait of a culture.”

As society embraces all forms of digital entertainment, this latter-day Noah is looking the other way. A Silicon Valley entrepreneur who made his fortune selling a data-mining company to Amazon.com in 1999, Mr. Kahle founded and runs the Internet Archive, a nonprofit organization devoted to preserving Web pages — 150 billion so far — and making texts more widely available.

But even though he started his archiving in the digital realm, he now wants to save physical texts, too.

“We must keep the past even as we’re inventing a new future,” he said. “If the Library of Alexandria had made a copy of every book and sent it to India or China, we’d have the other works of Aristotle, the other plays of Euripides. One copy in one institution is not good enough.”

Mr. Kahle had the idea for the physical archive while working on the Internet Archive, which has digitized two million books. With a deep dedication to traditional printing — one of his sons is named Caslon, after the 18th-century type designer — he abhorred the notion of throwing out a book once it had been scanned. The volume that yielded the digital copy was special.

And perhaps essential. What if, for example, digitization improves and we need to copy the books again?

“Microfilm and microfiche were once a utopian vision of access to all information,” Mr. Kahle noted, “but it turned out we were very glad we kept the books.”

An obvious model for the repository is the Svalbard Global Seed Vault, which is buried in the Norwegian permafrost and holds 740,000 seed samples as a safety net for biodiversity. But the repository is also an outgrowth of notions that Mr. Kahle, 51, has had his entire career.

“There used to be all these different models of what the Internet was going to be, and one of them was the great library that would offer universal access to all knowledge,” he said. “I’m still working on it.”

Mr. Kahle’s partners and suppliers in the effort, the Physical Archive of the Internet Archive, are very glad someone is saving the books — as long as it is not them.

The public library in Burlingame, 35 miles to the south, had a room full of bound periodicals stretching back decades. “Only two people a month used it,” said Patricia Harding, the city librarian. “We needed to repurpose the space.”

Three hundred linear feet of Scientific American, Time, Vogue and other periodicals went off to the repository. The room became a computer lab.

“A lot of libraries are doing pretty drastic weeding,” said Judith Russell, the University of Florida’s dean of libraries, who is sending the archive duplicate scholarly volumes. “It’s very much more palatable to us and our faculty that books are being sent out to a useful purpose rather than just recycled.”

As the repository expands — from about 500,000 volumes today toward its goal of 10 million — so does its range. It has just started taking in films.

“Most films are as ephemeral as popcorn,” said Rick Prelinger, the Internet Archive’s movie expert. “But as time passes, the works we tried to junk often prove more interesting than the ones we chose to save.”

At Pennsylvania State University, librarians realized that most of their 16-millimeter films were never being checked out and that there was nowhere to store them properly. So the university sent 5,411 films here, including “Introducing the Mentally Retarded” (1964), “We Have an Addict in the House” (1973) and “Ovulation and Egg Transport in the Rat” (1951).

“Otherwise they probably would have ended up in a landfill,” said William Bishop, Penn State’s director of media and technology support services.

Not everyone appreciates Mr. Kahle’s vision. One of the first comments on the Internet Archive’s site after the project was announced in June came from a writer who said he did not want the archive to retain “any of my work in any form whatsoever.”

Even some librarians are unsure of the need for a repository beyond the Library of Congress.


“I think the probability of a massive loss of digital information, and thus the potential need to redigitize things, is lower than Brewster thinks,” said Michael Lesk, former chairman of the department of library and information science at Rutgers. But he conceded that “it’s not zero.”

If serious “1984”-style trouble does arrive, Mr. Lesk said, it might come as “all Internet information falls under the control of either governments or copyright owners.” But he made clear he thought that was unlikely.

Under a heated tent in the warehouse’s western corner the other day, Tracey Gutierres, a digital records specialist, worked on a new batch. If a volume has a bar code, she scans it to see if the title is already in the repository. If there is no bar code, she checks the International Standard Book Number on the copyright page. If the book is really old, she puts it aside for manual processing.

Before the books make it the 150 feet to the shipping containers for storage, some will have to travel 12,000 miles to China. The Chinese, who are keen to build a digital library, will scan the books for themselves and the archive and then send them back. The digital texts will be available to the visually impaired and for other legal purposes.

As word about the repository has spread, families are making their own donations.

Carmelle Anaya had no idea what to do with the 1,200 books her father, Eric Larson, left when he died. Then she heard about the project. “He’d be thrilled to think they would be archived so maybe someone could check them out a hundred years from now,” said Ms. Anaya, who lives in California’s Central Valley.

Her daughter Ashley designed a special bookplate. Any readers across the centuries will know where the copies came from. “The books will live on,” Ms. Anaya said, “even if the people can’t.”

