[padg] The "Google Five" Describe Progress, Challenges

To: padg@xxxxxxx
Subject: [padg] The "Google Five" Describe Progress, Challenges
From: Holly Robertson <har8n@xxxxxxxxxxxx>
Date: Fri, 29 Jun 2007 08:39:44 -0400
List-archive: <http://lists.ala.org/wws/arc/padg>
List-help: <mailto:sympa@ala.org?subject=help>
List-id: <padg.ala.org>
List-owner: <mailto:padg-request@ala.org>
List-post: <mailto:padg@ala.org>
List-subscribe: <mailto:sympa@ala.org?subject=subscribe%20padg>
List-unsubscribe: <mailto:sympa@ala.org?subject=unsubscribe%20padg>
Message-id: <4D595E2F-83D7-4908-A5C9-92F8BAB3DC11@virginia.edu>
Reply-to: padg@xxxxxxx

For those who did not make it to the Google Five panel discussion at ALA:

The "Google Five" Describe Progress, Challenges
Brittle books, quality control, and better metadata loom large for scan plan.
http://www.libraryjournal.com/info/CA6456319.html?nid=2673#news3

Their numbers have now swelled to 25, but what's up with the five pioneering libraries that signed on with the ever-growing Google Book Search? At the American Library Association Annual Conference, panelists from each library said they were pleased with the progress, though they acknowledged continuing challenges ranging from damaged books to search quality. Google product manager Adam Smith led off by describing the new "About the Book" page under construction for titles in Google Book Search, which includes key terms and phrases, references to the book from scholarly publications or other books, chapter titles, and a list of related books—even for books that aren't digitized.

At four Harvard libraries, public domain works have been scanned and links are being put in the catalog, said Harvard University Library's Dale Flecker. "We're filtering out a lot of works that are not physically up to being scanned," he noted, citing not just brittle paper but problems with binding. "We also find that condition is a filtering factor," said John Balow of New York Public Library (NYPL). Sarah Thomas of Oxford University's Bodleian Library said that "there are many books rejected because of fragile conditions." By contrast, Catherine Tierney of Stanford University said that less than one percent of books can't be sent for scanning; however, a surprising fraction of volumes are limited because they lack bar codes. Are damaged copies, one person asked, good enough to scan elsewhere, or is any library ready to sacrifice a volume to be digitized? "The things we can't send to Google, we have in the queue," Tierney said. The accumulated texts would take 36 years, 24/7, to be digitized, she said, suggesting that the issue would be reviewed as more scans appear elsewhere.

Flecker praised the "About this book" feature and predicted that "text mining" will be an important part of research. Tierney said that seven to ten reference questions or interlibrary loan requests a week are generated by use of Google Book Search. Dunkle added that Michigan has received more international reference questions through GBS. Thomas said that the scan plan has produced "much more detailed knowledge about our collection," including the surprise that about one percent of the Bodleian Library's books have uncut pages, meaning they've never been opened.

Challenges remain, Smith conceded, including generating better metadata. Dunkle said that librarians in the Committee on Institutional Cooperation (CIC), the 12-library group that recently signed a deal with Google, hope to find ways to search across the books, though "I personally think Google will get there first." Flecker said Harvard librarians also hope Google will solve some access problems. "Right now, to be frank, I don't find the retrieval in Book Search to be that impressive," Flecker said. "There's a long ways to go." NYPL's Balow said that "good, old-fashioned librarian work" will be needed to refine searches. "There's still a great deal of room for the skills we've been working on for a long time."

As for specific drawbacks, Tierney said her library received email complaining that scans have thumbs visible. "It's a lot of work," conceded Flecker. "C'mon, that's it?" asked a voice from the crowd. "Are going to sing 'Kumbaya'?" Dunkle called the tension over whether the scan plan is the right thing to do "unfortunate."

Emory University's Martin Halbert, speaking from the audience, briefly described his university's alternative plan, in which libraries retain control of the digital volumes and can focus on coherent subject areas. Google's Smith was magnanimous. "From Google's perspective," he said, "we view this as complementary."

How to measure success? "We'll define success as getting as much of our collection digitized as we can," observed Oxford's Thomas, noting that most of the collection doesn't circulate, and that digital access can transform scholarship. Stanford's Tierney said that she hoped the growth of the program would help convince publishers to make more in-copyright material "available in non-snippet view." She said she hoped the "orphan works" issue, which leaves so much published material in copyright limbo, would be resolved. "I would not want my physician to be using pre-'23 medical texts," she observed.

---------------------
Holly Robertson
Preservation Librarian
University of Virginia Library

Alderman Library
Preservation - Rm 113
Charlottesville, VA 22904-4105
434.924.1055
(f) 434.243.7756
AIM: h011y2121 | GoogleTalk: h011yr0b3rts0n
hollyr@xxxxxxxxxxxx
www.lib.virginia.edu/preservation
