[padg] The "Google Five" Describe Progress, Challenges
For those who did not make it to the Google Five panel discussion at  
ALA:
The "Google Five" Describe Progress, Challenges
Brittle books, quality control, and better metadata loom large for  
scan plan.
http://www.libraryjournal.com/info/CA6456319.html?nid=2673#news3
Their numbers have now swelled to 25, but what's up with the five  
pioneering libraries that signed on with the ever-growing Google Book  
Search? At the American Library Association Annual Conference,  
panelists from each library said they were pleased with the progress,  
though they acknowledged continuing challenges ranging from damaged  
books to search quality. Google product manager Adam Smith led off by  
describing the new "About this book" page under construction for  
titles in Google Book Search, which includes key terms and phrases,  
references to the book from scholarly publications or other books,  
chapter titles, and a list of related books—even for books that  
aren't digitized.
At four Harvard libraries, public domain works have been scanned and  
links are being put in the catalog, said Harvard University Library's  
Dale Flecker. "We're filtering out a lot of works that are not  
physically up to being scanned," he noted, citing not just brittle  
paper but problems with binding. "We also find that condition is a  
filtering factor," said John Balow of New York Public Library (NYPL).  
Sarah Thomas of Oxford University's Bodleian Library said that "there  
are many books rejected because of fragile conditions." By contrast,  
Catherine Tierney of Stanford University said that less than one  
percent of books can't be sent for scanning; however, a surprising  
fraction of volumes are limited because they lack bar codes. Are  
damaged copies, one person asked, good enough to scan elsewhere, or  
is any library ready to sacrifice a volume to be digitized? "The  
things we can't send to Google, we have in the queue," Tierney said.  
The accumulated texts would take 36 years, 24/7, to be digitized, she  
said, suggesting that the issue would be reviewed as more scans  
appear elsewhere.
Flecker praised the "About this book" feature and predicted that  
"text mining" will be an important part of research. Tierney said  
that seven to ten reference questions or interlibrary loan requests a  
week are generated by use of Google Book Search. Dunkle added that  
Michigan has received more international reference questions through  
GBS. Thomas said that the scan plan has produced "much more detailed  
knowledge about our collection," including the surprise that about  
one percent of the Bodleian Library's books have uncut pages, meaning  
they've never been opened.
Challenges remain, Smith conceded, including generating better  
metadata. Dunkle said that librarians in the Committee on  
Institutional Cooperation (CIC), the 12-library group that recently  
signed a deal with Google, hope to find ways to search across the  
books, though "I personally think Google will get there first."  
Flecker said Harvard librarians also hope Google will solve some  
access problems. "Right now, to be frank, I don't find the retrieval  
in Book Search to be that impressive," Flecker said. "There's a long  
ways to go." NYPL's Balow said that "good, old-fashioned librarian  
work" will be needed to refine searches. "There's still a great deal  
of room for the skills we've been working on for a long time."
As for specific drawbacks, Tierney said her library received email  
complaining that scans have thumbs visible. "It's a lot of work,"  
conceded Flecker. "C'mon, that's it?" asked a voice from the crowd.  
"Are [we] going to sing 'Kumbaya'?" Dunkle called the tension over  
whether the scan plan is the right thing to do "unfortunate."
Emory University's Martin Halbert, speaking from the audience,  
briefly described his university's alternative plan in which  
libraries retain control of the digital volumes, and can focus on  
coherent subject areas. Google's Smith was magnanimous. "From  
Google's perspective," he said, "We view this as complementary."
How to measure success? "We'll define success as getting as much of  
our collection digitized as we can," observed Oxford's Thomas, noting  
that most of the collection doesn't circulate, and that digital  
access can transform scholarship. Stanford's Tierney said that she  
hoped the growth of the program would help convince publishers to  
make more in-copyright material "available in non-snippet view."  
She said she hoped the "orphan works" issue, which leaves so much  
published material in copyright limbo, is resolved. "I would not want  
my physician to be using pre-'23 medical texts," she observed.
---------------------
Holly Robertson
Preservation Librarian
University of Virginia Library
Alderman Library
Preservation - Rm 113
Charlottesville, VA 22904-4105
434.924.1055
(f) 434.243.7756
AIM: h011y2121 | GoogleTalk: h011yr0b3rts0n
hollyr@xxxxxxxxxxxx
www.lib.virginia.edu/preservation