Search inside
October 23, 2003
Nadav points out Amazon's new "Search Inside" feature, whereby you can now do full-text searching within selected books in their catalog. Nifty.
Nadav also asks the entirely reasonable question of why the Library of Congress isn't already doing this. As I understand it (from library school days), there are two reasons:
- With a collection numbering over 100 million volumes, it would be staggeringly expensive. Amazon's full-text collection, at about 100,000 volumes, is roughly 0.1% of the size of the Library of Congress' collection - and is composed of books that presumably all originated in softcopy. Scanning and OCR-ing 100 million physical volumes would cost at least tens of billions of dollars, and could take decades.
- Just as important, the process of scanning and digitizing a book quite often destroys the physical artifact, especially in the case of older books. Rare books librarians would be horrified at the prospect; and many hard-core librarians believe that the physical artifact can often tell us as much about a book's cultural context as the contents inside.
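For the curious, here is a quick back-of-envelope check of the numbers in the first point, as a minimal Python sketch. The volume counts are the rough figures quoted above; the $10 billion total is my own assumption, taken as the lowest possible reading of "at least tens of billions", and the function names are just illustrative.

```python
# Rough figures from the post (order-of-magnitude only).
LOC_VOLUMES = 100_000_000      # Library of Congress collection
AMAZON_VOLUMES = 100_000       # Amazon's full-text collection

# Assumption: $10 billion as the lowest reading of "at least tens of billions".
MIN_TOTAL_COST_USD = 10_000_000_000


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def implied_cost_per_volume(total_cost: int, volumes: int) -> float:
    """Return the per-volume cost implied by a total cost estimate."""
    return total_cost / volumes


amazon_share = share_percent(AMAZON_VOLUMES, LOC_VOLUMES)
per_volume = implied_cost_per_volume(MIN_TOTAL_COST_USD, LOC_VOLUMES)

print(f"Amazon's collection is {amazon_share:.1f}% of the Library of Congress'")
print(f"Implied cost floor: ${per_volume:,.0f} per volume")
```

In other words, the estimate amounts to a floor of about a hundred dollars per volume for scanning and OCR, under that assumption.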
Which is not to say that digitizing books is a bad idea; just a more complicated proposition than it might seem.
It's worth noting that the Library of Congress has made a few limited strides towards digitizing its collection in recent years. Worth a look: American Memory.
File under: User Experience