ldd libcongress.so

Posted 29 Nov 2000 at 09:47 UTC by crackmonkey

When the Library of Congress began to use computers in the 1960s, it devised the LC MARC format. This was a simple record specification that allows for portable catalog entries even today. The MARC 21 record format, coupled with the z39.50 transport protocol, is used by libraries all over the US to swap bibliographic data. But what about the Free Software world? What with the proliferation of free POSIX systems that run on cheap hardware, it would seem to be a no-brainer for small libraries and special collections. What's kept this from happening?

For those of you who groaned when you read the title, I assure you that I did that solely so I wouldn't be tempted to make any more bad library puns in the body of this article.

Seth Schoen has been scanning an awful lot of books with his snazzy CueCat barcode reader. His scripts make search requests via the Library of Congress Web Catalog, storing the result in plain text files. Although novel, this is hardly the state of the cataloging art.
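Seth's scripts themselves aren't reproduced here, but a scanned book barcode generally yields an ISBN, and it's worth validating the check digit before firing off a catalog query. A minimal sketch (the function name is mine, not Seth's):

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10: the weighted digit sum (weights 10 down to 1)
    must be divisible by 11. 'X' is allowed as the final check digit (=10)."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), s):
        if ch == "X":
            if weight != 1:          # X is only legal in the check-digit slot
                return False
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0
```

A bad scan usually garbles a digit, and the mod-11 check catches any single-digit error before you waste a round trip to the catalog server.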

Libraries have been using computers for cataloging since long before it was generally practical to do so. A few widely-used standards have been published, including the MARC 21 record format and the z39.50 searching and record retrieval protocol.

The great news is that there exists a Free (MIT license) z39.50 library and toolset called YAZ. It's planned that the YAZ library will be integrated into Mozilla's RDF handling, and it has already been used to create an Apache module. I have lost the link, but I recently stumbled across a z39.50 to X.500 gateway. Although experimental, it prompted me to think how useful it would be to create a similar gateway using LDAP.

What I envision is a system for small collections that uses a local LDAP server to maintain a database of MARC records. When the librarian scans in a book, it is looked up there first. If the record is found, it appears on the screen in some form. If the record is not found, it is fetched from the Library of Congress via z39.50 and then displayed with an option to import the record into the local database.
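The flow above can be sketched with plain callables standing in for the two back ends; a real version would search a local LDAP server and fall back to a z39.50 request (via YAZ, say), but the function and stand-in data here are hypothetical:

```python
# Sketch of the lookup-then-import flow: try the local catalog first, then
# fall back to a remote fetch. Dicts stand in for the LDAP server and the
# Library of Congress; in practice these would be real network lookups.

def lookup_book(isbn, local_catalog, fetch_remote):
    """Return (record, source). A 'remote' result is the caller's cue to
    offer the librarian the option of importing it into the local catalog."""
    record = local_catalog.get(isbn)
    if record is not None:
        return record, "local"
    record = fetch_remote(isbn)          # hypothetical z39.50 round trip
    if record is not None:
        return record, "remote"
    return None, "not found"

# Usage with in-memory stand-ins:
local = {"0-306-40615-2": {"title": "An Example Title"}}
remote = {"0-13-110362-8": {"title": "The C Programming Language"}}
rec, source = lookup_book("0-13-110362-8", local, remote.get)
```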

There are a few steps that need to be completed for this to happen:

  • A suitable openldap schema for MARC records. It should also allow searches to be specific to a particular library, or to span a number of collections (i.e. "Give me all books written by Martin Gardner in libraries in Alameda County"). The LDAP forwarding and slurping capabilities make the details of this fairly easy, although there are messy details involved in handling things like inter-library loan.
  • Information needs to be stored on actual physical copies of library resources. One needs to be able to find out that (for example) the library has two copies of The Myrkin Papers: one is on the stacks, and the other is checked out. Perhaps it could even tell you that the checked out copy is in poor condition, and will need to be re-bound when it is returned.
  • Coming up with a user interface for the system. Web pages are fine for undergraduate researchers, but a librarian will need something with a bit more power. Many are currently used to vt220s with attached barcode readers and printers, but there will probably need to be some GTK or Qt apps for catalog maintenance.
  • Librarians have to deal with internationalization regularly, even with small collections. Being able to enter a record for a monograph with a Farsi title but a Greek author is key.
  • Borrower records. Libraries need to keep track of people, as well.
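The county-scoped search in the first bullet comes down to composing an LDAP filter. The attribute names below (`author`, `libraryCounty`) are hypothetical placeholders, pending the actual MARC schema:

```python
def ldap_escape(value: str) -> str:
    """Escape characters with special meaning in LDAP filter strings."""
    for ch, esc in (("\\", r"\5c"), ("*", r"\2a"),
                    ("(", r"\28"), (")", r"\29"), ("\x00", r"\00")):
        value = value.replace(ch, esc)
    return value

def scoped_search_filter(author: str, county: str) -> str:
    """Build a filter for 'all books by <author> in libraries in <county>'.
    Attribute names are invented; the real ones depend on the MARC schema."""
    return "(&(author=%s)(libraryCounty=%s))" % (
        ldap_escape(author), ldap_escape(county))
```

Composing `scoped_search_filter("Martin Gardner", "Alameda County")` and sending it to a server that forwards across the county's collections would answer the example query in one round trip.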
There are more, but this is what comes to the front of my mind. The only thing I will add to this is that most catalogs fail to store the two most useful pieces of information about a library book: the books immediately to the left and right of it, and the color and thickness of the spine. It may not help a computer find a book, but it sure helps a human.
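Copy-level tracking and internationalized fields from the list above are straightforward to model. A sketch with invented field names (the Farsi title and Greek author below are just illustrative strings):

```python
from dataclasses import dataclass, field

@dataclass
class Copy:
    barcode: str
    status: str = "on shelf"      # or "checked out", "at bindery", ...
    condition: str = "good"       # so the poor-condition copy gets flagged

@dataclass
class Holding:
    title: str                    # Unicode throughout, so a Farsi title
    author: str                   # and a Greek author name coexist happily
    copies: list = field(default_factory=list)

myrkin = Holding("The Myrkin Papers", "A. N. Author")
myrkin.copies.append(Copy("39999000012345"))
myrkin.copies.append(Copy("39999000012346", status="checked out",
                          condition="poor"))

monograph = Holding("کتاب نمونه", "Παπαδόπουλος")  # Farsi title, Greek author
```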

I'm sure that we have a number of people here who have done a lot of database and directory development over the years. Perhaps I'm wrong, and there are already a large number of FreeBSD and Debian boxes hidden away in major libraries around the world, running the show without stealing it. After all, as Cynbe Ru Taren once said, "In software as elsewhere, good engineering is whatever gets the job done without calling attention to itself."

z39.50 to X.500, posted 29 Nov 2000 at 23:23 UTC by samth » (Journeyer)

You were probably thinking of http://archive.dstc.edu.au/RDU/ZXG/ .

a few more details, posted 30 Nov 2000 at 07:42 UTC by dchud » (Journeyer)

Lots to say about this but basically, you're right: it's a simple failing of the library community not to have produced any decent packages for what you want. And no, there aren't a bunch of mystery boxen running openldap. That said, there are lots of pieces in place and it's important to keep the issues straight.

(Btw don't miss the oss4lib projects page. Specific highlights are Koha, a public library-scaled catalog, osdls, which has made a nifty start toward freeing up some useful cataloging tools, and pybliographer, a solid reference management tool in python that could be extended to pull values out of marc records usefully. more are mentioned below)

In rough order of your mention:

  • afaik all the docs on moz+z39.50+rdf etc. are at least a year old, so the project may have gone stale. Haven't checked the code, though... hopefully Sebastian or others can correct my assumption.

  • Index Data released a YAZ-based Apache module called ZAP, which is likely the same thing you pointed to and a better, newer link than the USGS one (though the domain isn't resolving right now for some reason). I tested the .rpm distro and it hammered on the Yale catalog right away. They also linked YAZ into PHP. Great contributions, these.

  • You could transform MARC into an LDAP schema, but it would likely be easier to use MARC.pm as a translator and store the stuff in an XML-friendly engine. I've not used it, but all reports are that MARC.pm can handle loads of data nicely too.

  • The US Library of Congress is not the be-all and end-all of collections. It has millions of volumes, but there are lots and lots it does not have. You might want to hit a variety of Z39.50 servers; they'll all give you different results.

  • If you're keen on writing a new schema for MARC, it might be easier to do in RDF, and somebody's probably already done it (sorry, no links for ya there :( ). In any case the RDF libraries (redfoot, redland, 4suite, etc.) might require less coding than ldap would, although time savings from being able to ldappily slurp up records might make this moot.

  • Your point about using ldap to distinguish regional collections is right on; that would be great but we'd need the registry of libraries first. Also you might be interested in reading about some of the international work being done on identifying localized name forms. If they push their model of national/regional name authority control into an ldap world it would be remarkably useful to all of us. Actually you might want to click up to leaf through all the pages from the recent Bibliographic Control conference at LC. All of the talks will have video streams up soon. Don't miss Priscilla Caplan's excellent overview of when a new descriptive metadata schema works and what the immediate issues are trending to be for all the nascent next-generation schemas. This was an excellent conference filled with true visionaries in the history, present, and future of MARC, AACR2, Z39.50, etc (I, maybe the youngest and most unproven soul in the room, was lucky enough to attend probably because of jake). It will be interesting to see what will come of the recommended output from the group to LC.

  • Don't mix up the issues of keeping a bunch of metadata records and keeping a library catalog going. A library's record handling starts with acquisitions, i.e. purchasing, and although there are EDI sets for speeding those bits up none of our major vendors use free software to do this afaik. Thus when an item is received, any descriptive layers, whether they come from LC or OCLC or RLIN or are created anew, are tied to a purchasing record. Then circulation modules have to tie back to these so we know what to charge users for lost items and so forth. Yeah, yeah, all this is obviously sorta like any now-standard ecommerce framework but if you really want to help build something that big there's lots of pieces to it (but don't let that stop you! :). If all you or Seth really want, though, is a good index of your own personal collection, or that of a group of friends, it gets easier really fast. :) Fyi also there's a NISO Circulation protocol under development and an ISO ILL protocol that could be trimmed by 75% and re-encoded in xml to simplify parsing.

  • As for "failing to store what's on the right and left of a book" there's more there than you think. Catalogers use a 100-year-old system named for Cutter (who started it, natch) to assign an ordered but arbitrary number to identify shelf position based on using alphabetic order of a few characters from certain keywords (usually title I think). This is essentially an ugly human-powered set-value-table hash for uniquely locating items between others -- imagine needing /etc/rc.d/rc4.d/S99X288357454637firewall and you'll get the point. Most library catalogs where this system is used are searchable by a combination of the subject classification (general location) and Cutter code (shelf position). And a handful of libraries track things like color, too.
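The ordering trick behind Cutter codes can be caricatured in a few lines: map the letters after the first onto single digits, so the codes stay short but still sort (weakly) in the same order as the names. Real Cutter-Sanborn tables are hand-tuned lookup tables; the arithmetic mapping below is purely illustrative:

```python
# Toy Cutter-style code: first letter kept, next letters squashed to one
# digit each (A..Z -> 1..9, monotonically). Codes sort in (weakly) the same
# order as the names they came from, which is the property that lets a
# shelver place an item between its neighbors.

def toy_cutter(name: str) -> str:
    name = name.upper()
    head, rest = name[0], name[1:3]
    digits = "".join("%d" % ((ord(c) - ord("A")) * 8 // 25 + 1)
                     for c in rest if c.isalpha())
    return head + digits

codes = [toy_cutter(n) for n in ["Gardner", "Gibson", "Greene"]]
```

Because the mapping is monotone, `codes` comes out already in shelf order; the real tables achieve much finer discrimination between common name prefixes, which is why they took a century of curation.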

In general things like LDAP might be best used to better manage all the personal and corporate name authority pieces but maybe not much more. Why else are all the ecommerce bigwigs building a corporate id registry? It's pretty much the same need. Separating out that data, and making it free would make the catalog records easier to assemble and more accessible to all. We could do the same for book metadata. It's already being done for some music (freedb) and films (imdb) and it's what we're doing for journals (jake) even though we've got a lot of work left to prove our point.

Ok, so when I say uppity librarian I really mean it. :) But this kind of discussion comes up occasionally (1 2 3 4) and it's important imho to clarify perceptions. Hope this hasn't been too overbearing a response... it's great to see folks asking these questions. :p

LDAP makes a really crappy database... , posted 30 Nov 2000 at 19:57 UTC by bbense » (Journeyer)

- LDAP is good for many things, being a database isn't one of them. The best uses of ldap treat it as a write-once read-many kind of device. If there is any potential for contention in write access to the data, you are screwed with ldap. There is no concept of locking in the protocol.

- LDAP makes a pretty good searching interface into a database, but in and of itself it's a poor choice as the database.

- Booker C. Bense
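bbense's lost-update worry can be illustrated with a dict standing in for an LDAP entry: two clients each read, modify, and write back, and with no locking in the protocol one increment silently vanishes.

```python
# Two clients doing read-modify-write against the same entry, with nothing
# in the protocol to serialize them. The dict stands in for an LDAP entry.

directory = {"copiesHeld": 2}

# Client A and client B both read the current value...
a_view = directory["copiesHeld"]
b_view = directory["copiesHeld"]

# ...each adds a copy and writes back. B's write clobbers A's.
directory["copiesHeld"] = a_view + 1
directory["copiesHeld"] = b_view + 1

# Two copies were added, but the directory records only one of them.
```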

It's a big job..., posted 1 Dec 2000 at 06:24 UTC by eskil » (Master)

After having worked in this field for almost five years, I can only say it's a big job. Implementing basic Z39.50 capabilities is pretty simple, but for full v3 capabilities with all the extended services etc., we're talking man-years.

But a lot of this already exists in the public domain. You have the protocol stack (eg. YAZ which you mention, but also the DBVOSI II, loads of publicly available BER systems (java and C) and whatnot).
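For a feel of what those BER systems implement: Z39.50 messages are ASN.1 structures serialized with BER, a tag-length-value encoding. The smallest case, a one-byte INTEGER, looks like this (a sketch of the short-form primitive encoding only):

```python
def ber_encode_small_int(value: int) -> bytes:
    """BER-encode an INTEGER in [0, 127]: tag 0x02, length 0x01, one
    content byte. Larger or negative values need multi-byte content,
    which is left out of this sketch."""
    if not 0 <= value <= 127:
        raise ValueError("only the single-byte short-form case is sketched")
    return bytes([0x02, 0x01, value])
```

A real stack composes nested SEQUENCEs of these TLVs for every search request and response, which is why an ASN.1 compiler saves so much tedium.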

Other, more complete systems were developed under the EU projects SOCKER, ONE and ONE2. The software developed under these projects should be partially publicly available, but even more important, so should the Z39.50 profiles for interoperability etc.

Of course, there is little chance of finding a huge honking tarball with the code, but with enough hassle you should be able to get some of the source code. Some of the projects even included Java Z39.50 protocol stacks (and partial ASN.1 compilers) and clients.

ONE2 should still be in progress, and I know it includes a big C++ API with modular backends for DB systems, plus support for some of the evil Z39.50 features like the extended services, AccessRequests and whatnot. So if the EU projects are still required to disclose the source outside the partnership, search for it.

Otherwise, I can recommend going to a ZIG meeting next time there is one in your vicinity; you'll find a lot of the ZIG people are very open-minded about open source stuff etc.

uh, south park.... gotta go...
