Library Groupware for Bibliographic Lifecycle Management

Author: Daniel Chudnov
Contact: daniel dot chudnov at yale dot edu
Date: 2004-01-28
Web url: http://curtis.med.yale.edu/dchud/writings/blm.html
Copyright: 2004 by Daniel Chudnov
License:
This work is licensed under a Creative Commons License.

Summary

This informal paper proposes that libraries could merge the functions of weblogging, reference management, and link resolution into a new library groupware infrastructure, helping users to better manage the entire lifecycle of the bibliographic research process. Several scenarios explore how such an application suite might help library users by integrating their bibliographic research more closely with communication -- scholarly and otherwise, from private annotation to public discussion. A discussion of related architectural issues suggests a new model of "link routing" to augment "link resolution," and describes how link routing systems could enable library visitors to become users of our groupware services as much as they already are users of the information resources we procure.

Web Trend Conflation

There are three increasingly popular web functions we can tie together in our libraries; our users will soon demand as much, and they are growing increasingly upset to discover we haven't gotten around to something like it yet. One is the realm of "link resolvers," which short-cut access from one web resource to related resources or library services. Another is the set of tools we call "bibliographic reference managers," which enable users to manage records about information resources they might need to reference again. Last is the vast array of weblogging functions being rapidly developed by individuals all over the web, which let anyone write whatever they want about anything they like. More specifically, we can integrate link resolution and reference management with the functions of weblogs that let anyone connect what they have to say with what anyone else says.

How are these tools and the functions they support related? Consider, for instance, whether following a cited reference link to a link resolver is the same kind of action as following a link on someone's weblog. Similarly, are citing a work in a written paper and citing a work on a weblog the same action, or are they different somehow? In any case, these are all tasks library users perform over and over again as they manage the bibliographic lifecycle of their ongoing work.

In a way, we can consider the three functional areas of linking, reference management, and weblogging to represent, in a fluid world where users move regularly between informal discussion and scholarly/research domains, service points on a single continuum of information gathering, study, and creation. Following a reference on a weblog or a research article are each similar steps in exploring threads of related ideas. Capturing a reference in your own weblog or reference library indicates that the citation somehow relates to your own thought process. Citing a reference publicly more closely associates your thinking with others'.

None of these ideas are new. But a suite of interconnected library services which marry these functions would be.

Bibliographic reference managers

A few weeks ago I started out to do some new research in preparation for a paper I'd like to publish. Nothing fancy, just the normal cycle: search in a few databases, look up whatever results I can online (using our handy local link resolver, which uses SFX), divert off from a few references there, and export the choice references into EndNote along the way. When my to-read list of online papers shortens, I print a sorted-by-journal-title list of paper-only articles to go hunt down in our library. I find roughly half of these, take some notes on paper, photocopy a few others, put them in their own folder, and so on.

I've found early papers about the area of the work I'm doing I hadn't seen before, and additional authors to look for in future searches, and have a clearer understanding of what ground I need to cover to get editors and readers to recognize my work as a worthwhile contribution. Tidy up the work, address the right issues in the paper, and there you go, right? Well, actually, no... I'm left with a mess of printouts, scribbled notes, PDF files with random names, and an EndNote library that doesn't quite tie it all together. While there might be many traditional approaches to imposing order over this process, it seems it would all be easier if there were just an extra link on the link resolver screen that let me "log this reference". After years of automating isolated steps in the doing-your-homework part of the scholarly communications cycle, I want my library to help automate my entire record-keeping process.

That's not to say that progress hasn't been made. EndNote and its peers (including the stalwart (La)TeX+BibTeX duo, and clients like Pybliographer that connect the two with higher-level tools like LyX and OpenOffice) are critical in simplifying manuscript preparation. That they search databases directly has long been a great help, too. New products like RefWorks, WriteNote, Quosa, and MetaLib appear to integrate two or more steps in this process, shortening the workflow in one way or another, and appear to offer real benefits.

Even so, I heard Lee Iverson of UBC say something really interesting at his talk at Access 2003 on "Digital Library Research Agendas." He spoke of how libraries need to be "publishers of groupware tools" that manage the workflow of this entire process. And that we would do well to stop thinking solely in terms of shortening one step or combining two steps, and to reconsider the entire lifecycle of the bibliographic research process. And that library-goers should become as much users of our groupware services as they already are of the resources we procure. If I haven't mangled his intent entirely, the Scenarios section below will illustrate one interpretation of Iverson's vision, wherein tools like reference management software move from being ancillary training opportunities to becoming core library services.

Weblogs

If you're not already familiar with weblogs, browse through the Libdex list of library weblogs to get a flavor for what they're about. Scrolling short entries describing events from a blogger's day, musings on any topic, real journalism, opinions all over the map, and plenty of links to stuff bloggers find interesting elsewhere on the net are the meat of what goes into weblogs. Interesting things have been happening in the weblog world, though, beyond the ordinary tedium (and, often enough, extraordinary salience) of a million monkeys typing away.

The first trend to note is the growth of "trackback". As described in "A Beginner's Guide to TrackBack", authored by the progenitors of trackback, Mena and Ben Trott (authors, also, of the popular Movable Type weblog manager):

'In a nutshell, TrackBack was designed to provide a method of notification between websites: it is a method of person A saying to person B, "This is something you may be interested in." To do that, person A sends a TrackBack ping to person B.'

Sounds mundane enough. But there's more going on than meets the eye:

'...the TrackBack ping has created an explicit reference between my site and yours. These references can be utilized to build a diagram of the distributed conversation. Say, for example, that another weblogger posted her thoughts on what I wrote, and sent me a TrackBack ping. The conversation could then be traced from your original post, to my post, then to her post. This threaded conversation can be automatically mapped out using the TrackBack metadata.'

That probably sounds familiar to anyone who has ever followed a cited reference at the end of a scholarly article: TrackBack creates references. Innovation on this front is happening within scholarly journals also, such as BMJ online, and its peer journals published by Highwire Press, which provide visual maps of citation patterns as in this BMJ citation map. At an even simpler level, TrackBack looks much like the same kinds of citation practices appearing in scholarly and other publishing contexts for generations.
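To make the mechanics a little more concrete, here is a minimal sketch in Python of what a TrackBack ping and the resulting thread might look like. It assumes the form fields given in the TrackBack specification (url, title, excerpt, blog_name) sent as a form-encoded POST. The ping URL, the post URLs, and the helper names build_ping and trace_thread are all hypothetical, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ping(ping_url, post_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping as a form-encoded POST."""
    fields = {"url": post_url, "title": title,
              "excerpt": excerpt, "blog_name": blog_name}
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return Request(ping_url, data=body, headers=headers, method="POST")


def trace_thread(pings, start):
    """Follow recorded pings outward from one post.

    `pings` is a list of (citing_post, cited_post) pairs, one per ping
    received. Returns the posts reachable from `start`, breadth-first.
    """
    replies = {}
    for citing, cited in pings:
        replies.setdefault(cited, []).append(citing)
    thread, queue, seen = [], [start], {start}
    while queue:
        post = queue.pop(0)
        thread.append(post)
        for reply in replies.get(post, []):
            if reply not in seen:
                seen.add(reply)
                queue.append(reply)
    return thread


# Hypothetical conversation: my post cites yours, then her post cites mine.
pings = [
    ("http://example.org/me/1", "http://example.org/you/1"),
    ("http://example.org/her/1", "http://example.org/me/1"),
]
thread = trace_thread(pings, "http://example.org/you/1")
request = build_ping("http://example.org/you/tb/1", "http://example.org/me/1",
                     title="A reply", blog_name="My weblog")
```

The list of (citing, cited) pairs is really just a tiny citation index, which is the point of the parallel: the same structure would serve for references at the end of a scholarly article.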

What can we glean from this parallel? Like scholars, webloggers feel a natural urge to connect what they have to say directly to the words of others. This is not surprising, and not only because some webloggers are likely also scholars. Despite broad awareness of the old "open your mouth and remove all doubt" adage, folks -- fools, included -- will find ways to thread their conversations together. That the blogosphere has defined techniques for accomplishing this further reminds us that people want to have their say, and that they are willing to do it publicly. And that they will create ways to bolster connections forward (by leaving TrackBack ping URLs) for others as readily as backward (by citing preceding sources).

The corollary to wanting to have your say is the need to know what others are saying. And again we have ample evidence that the non-scholarly world is finding new ways to track fashions in current awareness. Many in the library world have caught on to the usefulness of RSS, and several have written thoughtfully about it, notably Steven Cohen, among others. This means of syndicating headline content from one website into another is even the central focus of a site devoted to library-related site content syndication, LISFeeds. In 2003 RSS newsfeeds landed on more and more library public websites, too, and in ILS software like Koha, which allows RSS-based syndication of new book lists and the like.

Users can also harvest RSS feeds to create their own "daily newspaper" of the weblog world subset they care about with new applications called "aggregators". Aggregators (examples of which include Bloglines, Feedreader, and Rawdog) let anyone multiplex several weblogs into their own clipping service-like weblog, and even mark which items they've read, and add links into their own weblogs.
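As a rough illustration of what an aggregator does, the following Python sketch merges two RSS 2.0 feeds into one reverse-chronological "daily newspaper" and hides items already marked as read. The feed contents, URLs, and function names are made up for the example, and a real aggregator would fetch feeds over HTTP and cope with the many RSS dialects in the wild, which this does not attempt.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, invented RSS 2.0 feeds standing in for fetched documents.
FEED_A = """<rss version="2.0"><channel><title>Library News</title>
<item><title>New acquisitions</title><link>http://example.org/library/1</link>
<pubDate>Mon, 12 Jan 2004 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>A Weblog</title>
<item><title>Link resolvers</title><link>http://example.org/blog/7</link>
<pubDate>Tue, 13 Jan 2004 15:30:00 GMT</pubDate></item>
<item><title>RSS in the ILS</title><link>http://example.org/blog/6</link>
<pubDate>Sun, 11 Jan 2004 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Yield one dict per <item>, tagged with the feed's own title."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    for item in channel.findall("item"):
        yield {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        }


def aggregate(feeds, already_read=()):
    """Merge several feeds, drop read items, newest first."""
    read = set(already_read)
    items = [item for feed in feeds for item in parse_feed(feed)]
    unread = [item for item in items if item["link"] not in read]
    return sorted(unread, key=lambda item: item["date"], reverse=True)


newspaper = aggregate([FEED_A, FEED_B],
                      already_read={"http://example.org/blog/6"})
```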

RSS feeds get even more interesting when post-processing indexing and analysis is performed on aggregated streams. The best example is probably Blogdex, which indexes outward link patterns from thousands of weblogs to determine which web pages -- anywhere on the web -- make up the daily zeitgeist. If you want to know what people the world over are reading These Days, Blogdex is the fastest way.
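The core of that kind of post-processing can be sketched in a few lines of Python. This is only my guess at the general idea, not a description of how Blogdex itself works: count, for each URL, how many distinct weblogs link to it, and rank by that count. The weblog names and URLs are invented.

```python
from collections import Counter


def rank_links(outbound_links):
    """Rank URLs by how many distinct weblogs link to them.

    `outbound_links` maps a weblog name to the URLs it linked to.
    Ties are broken alphabetically so the ordering is stable.
    """
    counts = Counter()
    for weblog, urls in outbound_links.items():
        counts.update(set(urls))  # a weblog counts once per URL
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Invented sample of one day's outward links.
links_today = {
    "weblog-a": ["http://example.org/story", "http://example.org/paper"],
    "weblog-b": ["http://example.org/story", "http://example.org/story"],
    "weblog-c": ["http://example.org/story", "http://example.org/paper",
                 "http://example.org/other"],
}
zeitgeist = rank_links(links_today)
```

Counting each weblog once per URL is a design choice: it keeps one enthusiastic blogger from pushing a page to the top by linking to it repeatedly.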

One step beyond Blogdex is del.icio.us. Delicious lets anyone sign up and quickly add links they find interesting, and arbitrary tags describing those links, onto a rolling communal linklog (a weblog of just links to other places). On one hand, this is faster than Blogdex, which by definition requires a delay of pattern watching before it declares a site or page blogdexworthy. Delicious lets you watch over people's shoulders Right Now: users have bookmarklets which automatically add a link at del.icio.us to any page they are currently reading, even without any extra work like filling out a web form. So the links you see at the top of del.icio.us right now were probably added to it at about the time you were reading, here, about aggregators. On the other hand, Blogdex is all about post-processing, though it starts at a disadvantage to Delicious in that it scrapes urls from other sites. Delicious perhaps allows easier up-to-the-minute dynamic analysis, as evidenced by the recently-added visualization bit of shading popular urls, along with running totals counting how many times each particular link has been added by different users.

And there are variations on the linklogging theme. Witness Furl, a shared linklog with a slightly more user-friendly front-end promising also to help archive links. And then visit biologging, which directly connects weblogging to the Pubmed database by allowing users of HubMed, a custom Pubmed interface, to blog any reference that interests them to a shared space.

What do these developments add up to? An opportunity, for one. This world outside of libraries has created new protocols and a public service model that might be useful inside of libraries, so it is to our advantage to consider ways to reuse their work directly. That many smart hackers are approaching similar problems with similar solutions is often a good sign that they've hit upon a good match.

We might also read into what we see happening outside of our libraries that increasingly people don't necessarily mind sharing information about what they are reading, and what they have to say about it, even in the least formal context. To the contrary, they will go out of their way to define and use methods for sharing this information seamlessly, and instantly. And, in so doing, they will ensure that the practice of citing references is not merely a scholarly exercise. As of 2004, it's just another information commodity market.

In the near future, we'll likely see smaller domain-specific blogdexes and deliciouses. And, with models and software developing for how to do these things well, there's an opportunity to build layers on top of these. Add to the mix that a few major institutions are bringing up public weblog services for their community members (major as in Harvard Law School and MIT). We might do well to consider a world in which these largely informal, non-scholarly communications blur seamlessly with scholarly research activities.

Scenarios

Let's imagine that these kinds of services are integrated into a library groupware environment. Following are examples of how some common types of users might take advantage of what such an environment could offer. These examples are, admittedly, shamefully biased toward academia, since that's where I work. That said, we are, after all, considering merging scholarly and popular ideas back into the most pragmatic realm within academia (its libraries, that is!). In any case, these examples are intended to flesh out where the interactions between systems might occur, by describing very specific functions and why certain architectural considerations, sketched out further in the next section, need to be on the table.

The undergraduate

Our example undergraduate student is a busy, busy soul. Aside from the classload of a full-time student, he organizes for a student political group, and reviews film for the campus paper. Here's a whirlwind tour of his information needs: For his classes, he writes papers for which he makes heavy use of online aggregate fulltext sources like Academic Universe, and he has to prepare an honors thesis of at least fifty pages which will be somewhat heavy with references. For another class in media studies, he has a group project where he and his project mates survey economic news reporting in the European Union by tracking websites of major news agencies in several European nations for two months. His campus political group has a website with a weblog for both local information and a tie-in to the national campaign website of their favorite candidate. Like I said, a busy soul.

Let's consider some things he might want to do with our not-yet-built infrastructure:

  • Add references and fulltext from online sources like Academic Universe to his web-based reference library.
  • Add EU news links, from one of several sites he's assigned to track, to a weblog shared by members of his group project, commenting on some of them, but not all, and grouping most entries by a simple categorization system they've devised (e.g. "finance," "constitution," "law," etc.). This group log is also available to other students in their course.
  • Add news stories to his student political group's weblog, and comment on links his group peers have sent.
  • Add links to some of the same references and links he adds to these other sites on his own weblog.

The first task isn't too innovative as of 2004; we already have both free/open source (Citation Manager) and proprietary (several) software products to accomplish that. The second and third sound more like what many weblog backends and Delicious already do, but with an extra measure of integration with his courses, and some group-defined access controls, say, for the political group and course project weblog. If you blur the lines between his reference library and the weblogging activities, and imagine he is using library groupware to do all these things, it gets more interesting.

The assistant professor

Our assistant professor is perhaps more focused than our undergraduate friend, but is nonetheless very busy also. Her obligations include teaching four courses a year, working as an editor for a new journal and a reviewer for another, and moving forward on her own research, in polymer science. She also manages a handful of research and staff assistants in a lab she shares with other faculty members.

She spends a lot of her online time doing the following:

  • Managing the online syllabus for her classes, adding very current articles from her specialization, a fairly new branch of science. Sometimes she adds references to both her reference library and her course reading list at the same time.
  • Checking references on papers she's reviewing, sending these through a page on the library website which turns out links to the referenced papers themselves, which she can then scan for relevance and potential value to her own work.
  • Tracking links to popular news stories, grant information, and patents related to her field, all posted to a group weblog shared among members of her lab and peer labs.

It isn't hard to imagine that activities like the last, which used to happen on mailing lists, will move more to the web as email becomes generally less efficient for various reasons -- notably spam, among others. The weblog might be eagerly maintained by ambitious graduate students at the various labs. The first action happens often already, albeit only when professors and their assistants log in to dedicated courseware services, and use those tools' interfaces to augment a reading list. It would save our professor much time to just be able to use a bookmarklet in her web browser to automatically add such links, and for the course server to read in links she sends and allow her to quickly place the linked article beside a certain week's other readings. As for the second example, current link resolvers can already support some of that functionality, but how many libraries have extended resolvers to allow this kind of batch action?
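The batch action in the second example might look something like the Python sketch below, which turns a list of parsed references into resolver links all at once. The resolver base URL and the sample references are placeholders, and the query keys (genre, aulast, issn, volume, spage, date, and so on) follow my understanding of the original key/value style of OpenURL; a local resolver may well expect something different.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your own library's.
RESOLVER_BASE = "http://resolver.example.edu/menu"

# Keys are emitted in this order, and only when the reference has a value.
OPENURL_KEYS = ("genre", "aulast", "atitle", "title",
                "issn", "volume", "issue", "spage", "date")


def to_openurl(reference, base=RESOLVER_BASE):
    """Build one resolver link from a dict describing a reference."""
    params = [(key, reference[key]) for key in OPENURL_KEYS
              if reference.get(key)]
    return base + "?" + urlencode(params)


def resolve_batch(references, base=RESOLVER_BASE):
    """Build resolver links for a whole reference list in one pass."""
    return [to_openurl(reference, base) for reference in references]


# Placeholder references, as they might be parsed from a manuscript.
references = [
    {"genre": "article", "aulast": "Doe", "issn": "1234-5679",
     "volume": "12", "spage": "34", "date": "2003"},
    {"genre": "article", "aulast": "Roe", "title": "Journal of Examples",
     "volume": "3", "issue": "", "spage": "101", "date": "2001"},
]
links = resolve_batch(references)
```

The hard part, of course, is not building the links but parsing a pasted reference list reliably enough to fill in the dictionaries, which this sketch simply assumes has been done.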

The librarian

Our librarian friend is at no loss for work, either. He does reference consulting in the graduate library of his state-university employer, he manages several web-based subject guides, and he is on a campus-wide e-resource collection development committee. A few tools we could assemble would help several areas of his work:

  • When working with a particular researcher to refine her complex searches, the researcher gives the librarian web access to her otherwise private reference libraries. This way, as the consultation develops over email, and the librarian tries various changes to the search strategy, he can compare potential search results to the articles the researcher has already collected. The researcher doesn't have to share what she's working on with anyone else.
  • One-click bookmarklets for adding new sites to the various subject guides he maintains, and which also tell the library proxy server to collect hit statistics for those new sites.
  • A shared weblog for queueing and discussing new resources his committee is considering.
  • A secure way to troll through the aggregated additions to user reference libraries that lets him see what resources and articles are popular during a given time period, but without sacrificing the privacy of users choosing not to share their information.

These activities, all typical of librarians' online work, occur in a range of public-to-private and individual-to-aggregate contexts. The first, helping a single user, needs a private access control toggle. The second is a private hook for adding to a public page. The third is a group space, and the last implies a private, scrubbed aggregation context, which would also have familiar useful public functions, such as a "people who are reading this are also reading this," with a terrific amount of granularity of algorithmic refinement (imagine "people reading this subject guide have also read these articles," or, even better, a librarian's holy grail twist on these algorithms, as perfected by Amazon.com: "people reading this article also read this subject guide").
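The scrubbed aggregation context is the easiest of these to misjudge, so here is a small Python sketch of one way it could behave. The opt-in flag, the minimum-reader threshold, and all of the names are assumptions for illustration only, and this is nowhere near a full privacy design: it only counts libraries whose owners have chosen to share, and only reports a pairing once enough different readers have it in common.

```python
from collections import Counter
from itertools import combinations


def also_read(libraries, sharing, min_readers=2):
    """Return item -> related items, with no user identifiers in the output.

    `libraries` maps a user to the set of items in their reference library.
    `sharing` maps a user to True if they opted in to anonymous aggregation.
    A pairing is reported only if at least `min_readers` opted-in users
    hold both items.
    """
    pair_counts = Counter()
    for user, items in libraries.items():
        if not sharing.get(user, False):
            continue  # never look inside a private library
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    related = {}
    for (a, b), readers in pair_counts.items():
        if readers >= min_readers:
            related.setdefault(a, []).append(b)
            related.setdefault(b, []).append(a)
    return {item: sorted(others) for item, others in related.items()}


# Invented users and items; carol has not opted in.
libraries = {
    "alice": {"article-1", "subject-guide-1"},
    "bob": {"article-1", "article-2", "subject-guide-1"},
    "carol": {"article-1", "article-2"},
}
sharing = {"alice": True, "bob": True, "carol": False}
suggestions = also_read(libraries, sharing)
```

Here the article and the subject guide end up recommending each other, while the article-1/article-2 pairing stays hidden because only one sharing user holds both.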

Architectural Considerations

How could we make these scenarios possible? Or, rather, easily possible? After all, many of these individual activities already occur, but not easily. They are supported by a wide range of incompatible software suites, only some from within the scope of library systems. If it makes sense to integrate these kinds of activities into groupware tools managed by libraries for the benefit of our users, we would need to think deeply about how to architect systems to add these facilities to our current services in a manageable way.

Implementation choices

How would we go about doing all of this? There seem to be two clear implementation paths, both of which involve enhancing existing systems. The first, and most obvious, is to bolt these services onto existing resolvers. Their backend databases are already well-tuned. It seems likely that expanding the rule engines they have for link resolution to add new hooks in the different request phases and more flexible routing/bouncing/chaining shouldn't be too complicated. Layering functions into a user/group management toolkit shouldn't be an obstacle either, especially if we leverage recent work such as the Open Knowledge Initiative specifications for such systems.
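To give a feel for what hooks in the request phases might mean, here is a toy link router in Python. It is a sketch under my own assumptions and not modeled on any actual resolver's rule engine: "before" hooks see the request first (here, to log the reference), rules contribute candidate services, and "after" hooks can reorder or filter them. The class, hook names, services, and URLs are all hypothetical.

```python
class LinkRouter:
    """Toy rule engine with hooks before and after rule matching."""

    def __init__(self):
        self.hooks = {"before": [], "after": []}
        self.rules = []  # (predicate, target builder) pairs

    def add_hook(self, phase, func):
        self.hooks[phase].append(func)

    def add_rule(self, predicate, build_target):
        self.rules.append((predicate, build_target))

    def route(self, request):
        for hook in self.hooks["before"]:
            hook(request)
        targets = [build(request) for matches, build in self.rules
                   if matches(request)]
        for hook in self.hooks["after"]:
            targets = hook(request, targets)
        return targets


reference_library = []           # stand-in for a user's stored references
LICENSED_ISSNS = {"1234-5679"}   # placeholder holdings data


def log_reference(request):
    """'Log this reference': copy the citation if the user asked for it."""
    if request.get("log"):
        reference_library.append(dict(request["citation"]))


def prefer_fulltext(request, targets):
    """Put full text first, keeping the other services in rule order."""
    return sorted(targets, key=lambda target: target["service"] != "fulltext")


def catalog_target(request):
    return {"service": "catalog",
            "url": "http://catalog.example.edu/?issn=" + request["citation"]["issn"]}


def fulltext_target(request):
    return {"service": "fulltext",
            "url": "http://fulltext.example.edu/" + request["citation"]["issn"]}


router = LinkRouter()
router.add_hook("before", log_reference)
router.add_hook("after", prefer_fulltext)
router.add_rule(lambda request: True, catalog_target)
router.add_rule(lambda request: request["citation"]["issn"] in LICENSED_ISSNS,
                fulltext_target)

targets = router.route({"citation": {"issn": "1234-5679"}, "log": True})
```

The resolution rules know nothing about logging, and the logging hook knows nothing about holdings, which is the kind of separation that would let a weblog or courseware target be chained in later.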

A second implementation path involves integration with MyLibrary and uPortal-type systems. Interesting questions would include how close a binding might be necessary between personalized link routers and portals. Should all the personalization happen in a portal, and the routers be just arbitrary rule engines for service resolution? Should the portals just be tuned to be well-behaved sources and targets, but the personal routing functions live in the routers? It's easy to imagine that different institutions, with their widely varied I.T. administration models, would want different pieces to live under different management branches.

This need for design flexibility dovetails nicely with another reason to value solutions which enforce a clean separation of services. Ideally, it should be easy to integrate library groupware with external non-library toolkits (by "non-library" I mean "blogosphere and otherwise general internet community"). After all, as highlighted earlier in this article, many of these technical innovations are occurring nowhere near libraries. Honoring those distinctions by placing the "integrates well with others" design principle high on the agenda, perhaps we can find new ways to feed some of our own system innovations back to the general internet community, from which we have gained so much over the years.

Metadata requirements

The best news about all of this is that it seems the many smart folks behind the new OpenURL specifications thought through many potential service models, surely to include the kinds discussed here. In that context, it is again helpful to remember the for-libraries, not-for-libraries systems dichotomy, and how the services discussed here might span those boundaries. It probably wouldn't be difficult to find useful points of integration for nascent from-libraries specifications like OpenURL and nascent not-from-libraries specifications like the Atom API. If there isn't a clear way to do that, then it shouldn't be too hard to demonstrate something at a lower code-level, by offering up plugins for one or two major weblog toolkits to allow bibliographic entries and OpenURL linking.

The question of which bibliographic datatypes to support is, as ever, a thorny one. Ideally many types, from binary MARC to MODS XML, from arbitrary OWL instances in RDF triples to simple Dublin Core, would be supportable. To do that, we'd need to have a system implementing something like OCLC's metadata switch plugged in near the router, available for schema transforms, or ad hoc record enrichment, at any phase in the routing process. Layer on top of the metadata management layer an export/import/update system which can speak WebDAV, and the possibilities for integrating any user's data with other systems open up more widely.
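A schema transform of the simplest kind can be sketched as follows in Python. This is only an illustration of the idea of a pluggable transform registry, not of how OCLC's metadata switch works. The "simple" record format, the field mapping, and the function names are my own assumptions; only the Dublin Core element namespace is the real one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping from a made-up "simple" record to Dublin Core elements.
FIELD_MAP = {"title": "title", "author": "creator",
             "year": "date", "journal": "source"}


def _as_list(value):
    """Treat a missing field as empty and a single value as a one-item list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_dublin_core(record):
    """Convert a simple citation dict into an element tree of dc: elements."""
    root = ET.Element("record")
    for field, dc_name in FIELD_MAP.items():
        for value in _as_list(record.get(field)):
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
            element.text = str(value)
    return root


# Registry keyed by (source format, target format); others could be added.
TRANSFORMS = {("simple", "dc"): to_dublin_core}


def transform(record, source, target):
    """Look up and apply a registered transform; KeyError if none exists."""
    return TRANSFORMS[(source, target)](record)


record = {"title": "An Example Article",
          "author": ["Doe, J.", "Roe, R."], "year": 2004}
dc_record = transform(record, "simple", "dc")
dc_xml = ET.tostring(dc_record, encoding="unicode")
```

A router could call transform at any phase, for instance to hand a weblog plugin Dublin Core while the reference library keeps something richer.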

A brief note on security

If you spend any time reading personal weblogs, you can see the shifting balance between information individuals choose to make public and what they keep private. Perhaps the trend to expose one's life more vividly and in excruciating, machine-processable detail is simply a corollary reaction to our growing awareness that private corporate and government structures are already aggregating and exchanging more data about us as individuals and groups than ever before. This backdrop, however, is no excuse to take privacy and system security lightly.

Without diving into the mess of shifting U.S. laws surrounding a library's responsibilities to its users and to governmental authorities, suffice it to say that great care would be necessary when wiring up systems such as one we've discussed. Any library system that allowed users to make choices about which data points of their own can be shared -- anonymously or not -- would expose libraries to managing more data about their users than we currently handle. And by definition, retaining more user information means we might be asked to divulge more, if so tasked by relevant authorities.

So we cannot merely hand-wave this issue away, but at least we can take solace in knowing that the smart folks working on efforts such as Shibboleth have made great headway in enabling environments like this. And that the testing already being done on those kinds of tools in the context of user authentication for library services will be very beneficial. Keep in mind also that the general software community is already focused on delivering tools for academia, such as with the recent Mellon Foundation funding of the Chandler project, and the path to building solutions seems even shorter.

Conclusion: Where to now?

A group of librarians investigated some of these ideas in a prototype linklog tool we called Linkstack in the late fall of 2003. That prototype integrated simple link logging with a popular weblog engine, and we found it to be a great way to track current news in digital libraries and software development. It has been successful, if perhaps in the small, in pointing toward many possibilities in our continuing online discussions, some of which I've hopefully reported fairly here.

As of early 2004 we'd like to open up a more ambitious experiment, perhaps to replicate or interact with functions Delicious provides, and to layer in basic group management tools and bibliographic reference storage and linking services. It will be exciting to take a closer look at innovations coming from systems such as COPPUL's reSearcher, which seems to be in the lead in envisioning and delivering the kind of integration discussed here. Perhaps we can create a testbed involving an instance of this project or participation from one or more other resolver/portal system vendors, and a broader group of users that expands our small linkstacking gang into a diverse cross-section of professional colleagues.

For me, an exciting thing about these possibilities is that they would involve putting user-centered service integration at the top of the library systems agenda. It has been interesting to overhear peers at multiple institutions discuss the difficulties they've had while defining cohesive visions of how to integrate link resolvers, university portals, federated search, and courseware servers, among other contemporary systems. It seems that upon realizing we've licensed incompatible remote resources, and partially solving that problem with resolvers, and portals, and so on, we've responded by creating our own suite of incompatible local services, replete with local administration nightmares. If we can be successful in delivering a user front-end to these disparate services and resources that succeeds in integrating how users want to move through and manage information in 2004, we will have gone a long way toward the library groupware vision.

Acknowledgements

Thanks to Art Rhyno for walking with me through all of these ideas, and to Art, Jeremy Frumkin, Ed Summers, Clay Redding, and Kelsey Libner for their many roles as linkstackers and correspondents, and Joshua Schachter as well, for joining in our discussion. Lee Iverson touched off the "library groupware" meme for me in his Access 2003 talk, and offered gracious feedback for this paper, as did Bruce D'Arcus. Thanks also to the staff of the Cushing/Whitney Medical Library at Yale, for bearing with me, as ever, while I explained yet another crazy scheme. And thanks to Matt "I still enjoy the great taste of Flutie Flakes" Wilcox, who has shamelessly belittled my absurd early drafts for six years.

Document History

2004-01-27:

Major and minor edits throughout, new summary, final public draft.

2004-01-13:

Minor edits throughout; second public draft.

2004-01-09:

First public draft.