Alex Wright

Search inside

October 23, 2003

Nadav points out Amazon's new "Search Inside" feature, whereby you can now do full-text searching within selected books in their catalog. Nifty.

Nadav also asks the entirely reasonable question: why isn't the Library of Congress already doing this? As I understand it (from library school days), there are two reasons:

  • With a collection numbering over 100 million volumes, it would be staggeringly expensive. Amazon's full-text collection, at c. 100,000 volumes, is about 0.1% of the size of the Library of Congress's collection, and it is composed of books that presumably all originated in softcopy. Scanning and OCR-ing 100 million physical volumes would cost at least tens of billions of dollars, and could take decades.

  • Just as importantly, the process of scanning and digitizing a book quite often destroys the physical artifact, especially in the case of older books. Rare-book librarians would be horrified at the prospect, and many hard-core librarians believe that the physical artifact can often tell us as much about a book's cultural context as the contents inside.
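For anyone who wants to check the back-of-envelope numbers, here is a rough sketch in Python. It uses only the figures quoted above, and it assumes the low end of "tens of billions" means $10 billion; the constant and function names are made up for illustration, and none of this is an actual cost estimate.

```python
# Figures quoted in the post; none of them are independently verified here.
LOC_VOLUMES = 100_000_000         # "over 100 million volumes"
AMAZON_VOLUMES = 100_000          # "c. 100,000 volumes"
TOTAL_COST_USD = 10_000_000_000   # assumption: low end of "tens of billions"


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100


def implied_cost_per_volume(total_cost, volumes):
    """Return the per-volume cost implied by a total budget."""
    return total_cost / volumes


amazon_share = share_percent(AMAZON_VOLUMES, LOC_VOLUMES)
cost_per_volume = implied_cost_per_volume(TOTAL_COST_USD, LOC_VOLUMES)

print(f"Amazon's collection as a share of LoC: {amazon_share:.1f}%")
print(f"Implied scanning cost per volume: ${cost_per_volume:,.0f}")
```

Under those assumptions, the "tens of billions" figure works out to at least roughly a hundred dollars per volume for scanning and OCR.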

Which is not to say that digitizing books is a bad idea; just a more complicated proposition than it might seem.

It's worth noting that the Library of Congress has made a few limited strides towards digitizing its collection in recent years. Worth a look: American Memory.

File under: User Experience
