Checking Out the Machines Behind Book Digitization
February 21, 2006
By Kimberly Maul
Few topics in publishing have received more attention over the past year than book-digitization. And yet, there are essentially only two companies that sell the robotic equipment and hardware necessary to quickly scan and digitize large volumes of books: Kirtas Technologies and 4DigitalBooks. Not surprisingly, major players in the digitization game—Google, Amazon—often employ their own proprietary systems; a spokesman for Google, for instance, says the company uses “some really cool stuff we’ve developed.”
For most individuals and institutions interested in digitizing their own books, Victor, N.Y.–based Kirtas Technologies and 4DigitalBooks, in Ecublens, Switzerland, will be the solution. Both companies offer their own technologies and unique capabilities, says Linda Becker, vice president of marketing and business development for Kirtas, but the differences seem relatively minor. Both platforms are robotic scanners that consist of three parts: a robot to turn the pages; a cradle, or table, to hold the book; and a camera.
As the book’s pages are turned, the camera captures an image of each page. During the process, the book remains intact. The potential destruction these systems avoid is among their greatest assets, of course: Earlier-generation book scanners that required the disassembly of a book for scanning presented acquisitions personnel, collectors and bibliophiles with the Solomonic challenge of deciding whether it was worth destroying a book in order to preserve it, digitally, forever.
Take Northwestern University’s library, which is creating reprints of “brittle books whose original condition is very fragile” and which are no longer usable on a day-to-day basis, says digital technology librarian Virginia Kerr. Northwestern has been working with a Kirtas machine, and is now in effect able to create brand-new antiques. “We have new copies printed on acid-free paper and bound,” says Kerr. “This will provide users with hard-copy books.”
Both companies’ machines employ suites of sophisticated software that allow for image capture and retouching as well as advanced Optical Character Recognition, or OCR, that can interpret text in scores of languages. Images are transformed into PDF files and used for archiving, web search-and-retrieval or print-on-demand purposes.
As one of the first institutions to work with Kirtas (the company currently has more than 100 customers), Northwestern installed a Kirtas APT 1200 on campus in 2005. While Northwestern is keeping its content offline for now, the archived scans will be saved for possible future online use, Kerr says, but “online availability will be determined by copyright status, uniqueness of the titles, and choice of formats which can be effectively presented online.”
On the other hand, the University of Michigan prides itself on being involved with the Google Book Search project, says University president Mary Sue Coleman, who recently gave a speech backing Google’s efforts. Through the project, the contents of books will be searchable online, though the entire text of a particular book will be available only if that book is in the public domain.
Some of Michigan’s books are scanned onsite, with a nondestructive (but non-robotic) scanner from German company Zeutschel, or a flatbed scanner from Fujitsu that requires destroying the books’ bindings. But most are sent offsite to Google for scanning, says John Wilkin, associate university librarian for library information technology and technical and access services.
Because the robotic scanners take pictures rather than line-scan, the quality of the captured image is better; further, the machines are easy to upgrade in the event of technological improvements to cameras, Becker says. Kirtas products use a 16.6-megapixel consumer camera as a part of the machine. The high-end model, the APT BookScan 2400, uses two cameras to capture both the right and left pages simultaneously.
The top model of 4DigitalBooks scanners, the DL3000, also uses two cameras to capture the whole spread, but these industrial-market cameras each have 35-megapixel resolution, says Ivo Iossiger, founder and president of 4DigitalBooks.
The DL3000, Iossiger says, is the fastest scanner in the world, capturing up to 3,000 pages per hour; it costs approximately $225,000. The lower-end model, the DL1800, can scan 1,800 pages per hour and costs around $190,000. In between is the DL1500, which can scan 1,500 pages per hour with higher image resolution.
Kirtas Technologies offers three models: the BookScan 800, a comparatively low-cost manual book scanner; the APT 1200; and the APT BookScan 2400. They can capture 800, 1,200 and 2,400 pages per hour, respectively, and list prices run from $89,000 to $189,000. The unique aspect of the Kirtas products, Becker says, is that the machine weighs only 150 pounds and is portable.
Rather than purchasing machines for their institutions, many universities or libraries prefer to rent them or send their books for scanning elsewhere for small or medium-sized projects. 4DigitalBooks can install a scanner at the institution, where rent can be charged on a per-month or per-page basis, Iossiger says, no different really than renting a copy machine. The rental cost can be as low as 4 cents per page scanned.
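To put the rent-versus-buy choice in rough numbers, the short Python sketch below works out how many pages an institution would have to scan before per-page rent equals a purchase price. It uses only figures quoted in this article (the DL3000's approximate $225,000 price, its quoted 3,000 pages per hour, and the lowest quoted rental rate of 4 cents per page). The function names are illustrative, and the calculation assumes the lowest rate applies throughout and ignores staffing, maintenance and downtime, so treat the result as a ballpark only.

```python
# Figures quoted in the article; all are approximate.
PURCHASE_PRICE_USD = 225_000   # DL3000, approximate price
PAGES_PER_HOUR = 3_000         # DL3000, quoted top speed
RENTAL_CENTS_PER_PAGE = 4      # lowest quoted per-page rental rate


def break_even_pages(purchase_price_usd: int, rental_cents_per_page: int) -> float:
    """Pages at which cumulative per-page rent equals the purchase price."""
    return purchase_price_usd * 100 / rental_cents_per_page


def hours_at_top_speed(pages: float, pages_per_hour: int) -> float:
    """Scanning hours needed for that many pages at the quoted top speed."""
    return pages / pages_per_hour


pages = break_even_pages(PURCHASE_PRICE_USD, RENTAL_CENTS_PER_PAGE)
hours = hours_at_top_speed(pages, PAGES_PER_HOUR)
print(f"Break-even volume: {pages:,.0f} pages")           # 5,625,000 pages
print(f"Scanning time at top speed: {hours:,.0f} hours")  # 1,875 hours
```

On these assumptions, renting stays cheaper until a project passes roughly 5.6 million pages, which is consistent with institutions preferring to rent for small or medium-sized projects.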
“Up until very recently, all those that were making images of documents were doing it on microfilm,” Iossiger says. 4DigitalBooks was founded in 1998 and Kirtas in 2001. But soon, he says, as Google works to make book content searchable online, more companies will join Kirtas and 4DigitalBooks to take advantage of this large market for digitization.