Open Library is a new online tool for finding information about books – even (perhaps especially) for titles that are out-of-print, scarce, or likely to find one reader per decade, if even that. It is, so to speak, a catalog with benefits. If a text is available in digital format, a link will take you to it. Citations and excerpts from reviews will be available. Likewise, cross-references to other works on related topics. A user of Open Library can see the cover of the book and, in some cases, search the contents.
The project is still very much under development. Force of habit makes us speak of the pre-optimal version of a site as its “beta” version. With Open Library, given its ambitions, chances are that “gamma” is more accurate.
But here’s an encouraging sign: The basic framework is being established by my appallingly accomplished young friend Aaron Swartz, who, at the age of 21, has already helped create RSS (that was in his early teens), published a couple of computer-science papers, and developed Infogami, a system enabling his digitally clueless elders to set up their own websites. He studied sociology as an undergraduate at Stanford University, presumably in his spare time. Aaron has written an essay called “How to Be More Productive” that can be recommended on the grounds that the author does know something on the subject.
I recently sent him a number of questions about the project. Some of his answers were, it seems, typed into a mobile phone. A transcript of the e-mail interview follows.
Q: How is Open Library funded? Are you working on it full time? And how many people are involved in the project?
A: It’s currently being funded by the Internet Archive, with the help of some state and federal library grants. We have some volunteers, but also about 5 people working full-time (a couple programmers, a designer, and a product manager).
Q: What will Open Library offer that you can’t already find online? What was missing from the existing array of online book-data resources – WorldCat, Google Books, Amazon, etc. – that makes it worthwhile to create a new one?
A: As the kind of person who reads Intellectual Affairs (an academophile?), I’m often looking for interesting books on an obscure topic. I can look on Amazon, but its coverage of out-of-print books is pretty poor. (In my experience, most of the really interesting books are out of print.) I can search an academic library or WorldCat, but the quality of data is pretty weak — you can get basic bibliographic info, but no reviews and weak search and a painful interface and most require a subscription.
So I wanted to build a site where one could more easily find those hidden great books, by combining all the data we have on them in one place and letting the people who love them go back and annotate and highlight them.
Q: With any Web 2.0 project, the question of safeguards comes up. Are any built in? I mean, to keep people from going through and systematically attributing the complete works of Shakespeare to Francis Bacon, or whatever.
A: Our plan is to leave it open and then lock things down as need be. Right now we’re watching all the edits so that we can revert things if people do that and we hope to let users watch their favorite pages and so on. That kind of thing has worked pretty well for Wikipedia and we’re hoping it will work similarly here. But we’re willing to try other things if it doesn’t.
Q: Some serious questions have come up about the shrinking depth of subject cataloging from the book records issued by the Library of Congress. That might sound like a problem just for librarians, but it isn’t. It’s basic infrastructure for intellectual life, pretty much. To anyone doing research, having books adequately cataloged by subject offers tremendous benefits. Will Open Library be taking up the slack on this?
A: Yes, it’s amazing the amount of politics around Library of Congress Subject Headings. (And I had no idea that they were thinking about abandoning them — that’s incredible; thanks for the pointer.) Lots of people have different opinions over how things should be characterized and cataloged and which things were important. When we first started the project, librarians kept arguing about which system we should use.
We decided early on to not be partisan but to be a clearinghouse for all the cataloging data we could get our hands on. So in areas where the Library of Congress doesn’t do the cataloging, or doesn’t do the cataloging to your taste, we’ll try to make that data available.
We’re hoping we’ll be able to pull series data from the specialized libraries so that you can view them on our web site. We’ll also republish them so that other libraries can import them from us.
Q: Will you be asking permission before incorporating data from, say, an academic library’s online catalog?
A: Yes, we’re talking to the academic libraries to make deals on how to import their catalogs. Our main pitch so far has been that this is an opportunity to contribute to a public commons — contribute your library catalog to the public, and not only make it available to interested library users everywhere, but also contribute to a system where you’ll get back everyone else’s work, just like libraries have done with RLG.
Q: Open Library will also serve as a central directory for books available in digital formats. Some such material is freely available to everyone (e.g., the Project Gutenberg editions). And some of it has more limited access. Will you link to the latter? And do you have a policy or opinion about dealing with Google Books?
A: Yes, we hope to link to everything interesting — free or not, although obviously we prefer free and can do more with it. We’re planning to link to Google Books and we’re hoping we can get copies of their public domain books.
Q: Do you have a long-term plan to make digitizing books part of the Open Library project? Or does it make more sense to leave that kind of initiative to others?
A: The Internet Archive has a big book digitization project, with scanning centers at the University of California, the University of Illinois Urbana-Champaign, the Brooklyn Public Library, Library of Congress, and others. We hope Open Library can raise money to increase their scanning.
Q: I have a question about Open Library to pass along from Matthew Battles, a senior editor of scholarly books at the Museum of Fine Arts in Boston and the author of Library: An Unquiet History (Norton, 2003). It’s about metadata – an important issue that I will admit just barely understanding. So before going on to the question itself, would you mind giving a crash course on the topic?
A: This is a bit tricky. Metadata generally is stuff like cataloging data. It’s what lets you find books when you want to do a search more complicated than “which books have these words in them?” (or when you don’t have the full text of every book made available for searching, as seems to be the case for the foreseeable future). Whenever you look for everything by a particular author or in a particular subject, you’re using metadata.
It becomes useful in two cases: When you don’t have all the data and when you want to ask more interesting questions. If you just want to find a particular page, searching by full text is usually enough. But if you want to do something more interesting — like graph an author’s output by year, or see which country has produced the most romance novels, or find out which genre has the most growth in the past six months — you need metadata. Here’s a dorky metaphor for you: data is literary criticism and metadata is Franco Moretti.
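To make the distinction concrete, here is a minimal sketch in Python of the kind of question metadata can answer and full-text search cannot. The records and field names (`title`, `author`, `year`, `subject`) are invented for illustration and are not Open Library's actual data format.

```python
from collections import Counter

# Hypothetical catalog records; titles, authors, and field names are made up.
records = [
    {"title": "Book A", "author": "Jane Doe", "year": 1998, "subject": "history"},
    {"title": "Book B", "author": "Jane Doe", "year": 1998, "subject": "romance"},
    {"title": "Book C", "author": "Jane Doe", "year": 2001, "subject": "history"},
    {"title": "Book D", "author": "John Roe", "year": 2001, "subject": "romance"},
]

def output_by_year(records, author):
    """Count how many titles the given author published in each year."""
    return Counter(r["year"] for r in records if r["author"] == author)

def titles_by_subject(records, subject):
    """Return the titles cataloged under a subject, in alphabetical order."""
    return sorted(r["title"] for r in records if r["subject"] == subject)

jane_by_year = output_by_year(records, "Jane Doe")
history_titles = titles_by_subject(records, "history")
```

Neither function looks at a single word of the books themselves; both work only on the descriptive fields, which is the point of the answer above.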
Q: OK, now on to the question from Matthew Battles: “I wonder how much a resource like Open Library can make itself open to metadata mashups, giving developers openings through which they might take metadata, bibliographical info, and text and organize it in undreamt-of ways... and how robust and open will the system become not only with respect to image formats, but metadata concepts? In less convoluted terms, will it be possible for Open Library to ‘accrete’ tags and other metadata, to layer cross-references and hyperlinks, for its metadata to ‘learn’ from users?”
A: Yes, opening up our data to others is a key part of the plan. We will have full database dumps and XML and other formats as export. A big hope of mine is that by making all of this data available in a centralized place, we’ll make it vastly easier to build applications around books. Want to build a site that lets people find other people who have the same books who live near them? No longer do you have to build a whole bunch of infrastructure to locate and refer to books — instead, you just need to build the part relevant to your application. (Like the geolocation stuff.)
As for “accreting” tags, we spent a lot of time building an advanced new type of database for this project so that we could load in data of all sorts from numerous sources. So if someone has been keeping track of, say, the fonts used in every book, we can import all that data and store it with the other stuff we have. Similarly for any user-created data.
Q: So what is your sense of the master plan for this project? The future course of development?
A: We’re taking it step by step. Our first goal is to get catalog information for every book, a big project in itself. We’ve been calling all the publishers and national libraries and research libraries to get copies of their catalogs (we’d love readers’ help with this, by the way!) and then we’re working on algorithms to integrate all that data into one coherent site.
After that, we want to work on improving the book-reading interface for books that we have scans of. We’re hoping to make the scanned text into a wiki as well, so that people can fix typos and correct errors in our processing (OCR) of the scan. We’d also like to think about new ways that people can work with a book’s full text online and what the proper interface for that should be. And, of course, we want to think about ways we can get more books scanned. One idea is a “Scan this book” button on every out-of-copyright book, where for $50 to $100, we’ll page the book from a library, deliver it to the scanners, and then email you a PDF of the book and put the full text online, with a little nameplate thanking you for funding it.
And then, of course, we want to expand beyond just books. We’re eager to do the same thing with journal articles: one open site where we list every journal article, all the journal articles by a particular author, sorts by subject and topic, the abstracts and references, and links to places where you can find a full text copy. I just got back from a science conference and the folks I talked to there loved the idea. And after that there’s music and movies, naturally.
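The interview does not describe the integration algorithms mentioned in the first part of this answer, so the following is only a sketch of one simple approach: treating records from different sources as the same book when their ISBNs match after normalization. The function names, field names, and sample records are all hypothetical, and a real system would need fuzzier matching for books without ISBNs.

```python
def normalize_isbn(raw):
    """Strip hyphens and spaces so one ISBN written two ways compares equal."""
    return raw.replace("-", "").replace(" ", "").upper()

def merge_catalogs(*catalogs):
    """Merge record lists from several sources into one dict keyed by ISBN.

    Subject lists are unioned; for every other field the first source
    that supplies a value wins (an arbitrary policy chosen for this sketch).
    """
    merged = {}
    for catalog in catalogs:
        for record in catalog:
            key = normalize_isbn(record["isbn"])
            target = merged.setdefault(key, {"isbn": key, "subjects": set()})
            for field, value in record.items():
                if field == "isbn":
                    continue
                if field == "subjects":
                    target["subjects"].update(value)
                else:
                    target.setdefault(field, value)
    return merged

# Made-up feeds from a publisher and a library describing the same book.
publisher_feed = [
    {"isbn": "0-00-000000-0", "title": "Example Title", "subjects": ["libraries"]},
]
library_feed = [
    {"isbn": "0000000000", "publisher": "Example Press", "subjects": ["history"]},
]

merged = merge_catalogs(publisher_feed, library_feed)
```

Here the two feeds collapse into a single record that carries the publisher's title, the library's publisher field, and both subject headings, which is roughly the "clearinghouse" behavior described earlier in the interview.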
Q: One last thing... People should be using index-card catalogs to find print-and-ink books in brick-and-mortar libraries! This is just one more effort to turn the US into a nation of screen potatoes! Admit it — you just hate books, don’t you? (I say this tongue in cheek, but there are bound to be people muttering it in all earnestness.)
A: You found me out: I love books. Every time I walk into a library, my face just lights up. There’s something so grand and inspiring about collecting all those books just to share them with people. And I visit them constantly; I always have a dozen books checked out at any one time, with a couple new ones each week. I’m sure that’s nothing for most IHE readers but to my friends in the computer industry, it’s like I’m some kind of bizarre alien. I do this because in a world of Googles, Amazons, and Wikipedias, all encouraging people with computers to stay at home and talk to their screens, I want to have at least one countervailing force encouraging people to go find dusty library books off of disused shelves.
This is the kind of project for which a major revival of monasticism would really come in handy.
Adam Kotsko, Graduate Student at Chicago Theological Seminary, at 11:20 am EDT on August 8, 2007
Good idea about monasticism. In fact, in the brilliant Babylon 5 Series, the monks on the future space station do just that — catalogue all data. Life imitating art?
D Scott, at 1:05 pm EDT on August 8, 2007
While I defer to Scott reflexively in matters intellectual, I think I might have a little more experience with the software development world than he does. “Beta” software refers to something which is basically (or barely) functional, but needs to be tested for bugs, new features, etc., and debugged. I’ve never really heard anyone talk about “gamma” software, because the step after Beta is usually public release. The step before beta is “Alpha” and that’s the really raw, “will it compile and run” stage of programming, when only the programmer is involved in working on it.
I think we might need a new nomenclature for these kinds of web 2.0 projects, where the beta version is released and is in constant flux as new contributions, etc., come in. Alternately, we need to distinguish between “beta” software and “beta” datasets: it sounds like the software side of this project is fairly well set, so it’s not beta anymore, but the dataset is still — relatively, given the ambition of the project — empty.
Jonathan Dresner, at 3:15 pm EDT on August 8, 2007
The obvious solution is to designate Web 2.0 projects that are constantly changing as “delta” releases.
Adam Kotsko, Graduate Student at Chicago Theological Seminary, at 11:20 am EDT on August 9, 2007
Are there already machines available that scan the pages of books fully automatically? On the Open Library website I read that a person has to turn the pages manually, which costs a lot of time. Wouldn’t it be easier to automate this process?
Michael, at 5:30 am EDT on August 13, 2007
Search Google with “automatic book scanners” and you’ll find that some amazingly efficient machines have been developed to turn pages without damaging the books while photographing opposing pages with hi-res cameras. (Of course, if it’s vellum you need to scan, there’s nothing better than the natural oil on your fingers to keep the sheets supple :)
Bill, at 9:55 am EDT on August 13, 2007
Yes, book scanners that turn the pages themselves are now available. I saw a demo of one at a library conference in Ottawa this spring. The machine basically consisted of a book cradle that held the book open, but not quite flat, a robot arm that turned the pages and two digital cameras that took pictures of the pages. There are also other models from other companies that use different methods.
John Beatty, at 9:55 am EDT on August 13, 2007
I’ve seen demos for this scanner, it turns the pages for you: http://www.atiz.com/bookdrive.php
I don’t think you can use them for fragile books, however, and for this project many books may be older and more fragile, since they are out of print. This company also makes a scanner for these types of books though: http://www.diy.atiz.com/bookdrive_diy.php
val forrestal, Information Services Librarian at Stevens Institute of Technology, at 2:40 pm EDT on August 13, 2007