Processing Citations

To format my documents, and ultimately to test my experiments with modifying the citation style language in BiblioX, I've decided to try writing a stylesheet that draws on new features in XSLT 2.0. A lot of help from the gurus has shown both that citation and bibliographic formatting is a complex problem and that XSLT 2.0 has features that make it easier.

Some quick notes on the general classes of citation styles and their basic rules. Ultimately, any program needs to be designed to switch easily between them.

  1. citekey: The simplest style. Citation markers are short mnemonic keys like [doe99a], and the bibliography is ordered by the order in which the citations appear. Software people often use this style, so it makes sense that it's easy to process!
  2. author-year: Dominant in the social sciences, this style is much more difficult to process. The citation marker is an author-year combination. Where the same author-year combination applies to more than one work in a document, the years get an alphabetic suffix (e.g. 1999a, 1999b). To add another layer of complexity, if a single citation includes more than one work by the same author, the author should be dropped from all but the first (e.g. Doe, 1999a, 1999b). In the bibliography, the entries for each author (Doe, and Doe and Jones, form separate groups) ought to be sorted by date (on which the year suffix may be based), and every entry after the first generally has the author(s) replaced with three em-dashes and a period. This is the sort of complex grouping problem that XSLT 2.0 is well suited to. Finally, author-year citations, like note-based ones, often carry page numbers and similar details, and they come in a variety of forms. For example, if the author is named in the text preceding the citation, the name is dropped from the citation proper. The traditional approach has been for authors to code explicitly how each citation ought to be rendered (full, year-only, etc.). It's an open question at this point whether this can and should be automated. I tend to think it should, but that adds another layer of complexity best left for later.
  3. note: Footnote and endnote styles are common in the humanities. Often there is no bibliography here, so the citation itself contains the full bibliographic information. However, in many (most?) cases note-based citations distinguish between first and subsequent citations: the first occurrence in the text gets the full reference, and later ones get an abbreviated form. These styles also often require back-references to previous entries: ibid., op. cit., etc. I really despise this stuff myself!
  4. numbered: As I understand it, this style is common in the hard sciences. Here the bibliography is just a numbered list, and citations look like (1). The only wrinkle, presumably, is collapsing multiple-reference citations, so that (1, 3, 4, 5) becomes (1, 3-5). This ought to be essentially the same processing problem as collapsing repeated authors in author-year citations.
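
The collapsing step for numbered citations can be sketched as follows. This sketch uses Python rather than XSLT for brevity. The function name `collapse_numbers` is hypothetical. The rule that only runs of three or more consecutive numbers collapse is also an assumption, because styles differ on that threshold and on whether the range uses a hyphen or an en dash.

```python
def collapse_numbers(nums, min_run=3):
    """Render citation numbers in ascending order, de-duplicated, with runs
    of at least `min_run` consecutive numbers collapsed to 'first-last'.
    (Hypothetical helper; the threshold is an assumed style parameter.)"""
    ordered = sorted(set(nums))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j while the next number is exactly one greater.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        run = ordered[i:j + 1]
        if len(run) >= min_run:
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(n) for n in run)
        i = j + 1
    return "(" + ", ".join(parts) + ")"
```

For example, `collapse_numbers([4, 1, 3, 5])` returns `"(1, 3-5)"`, while `collapse_numbers([1, 2])` leaves the short run alone as `"(1, 2)"`.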

As I said above, one functional requirement for any new citation coding and processing tools should be that one can fully switch between these styles without modifying the document source. A change from author-year to footnote ought to involve nothing more than choosing a different style file from a command-line processor or GUI menu. I am unaware of any tools that can do this, but that doesn't mean it can't be done. Indeed, BiblioX has pretty well shown it's possible.
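
To make that requirement concrete, here is a minimal Python sketch. It is not XSLT and not how BiblioX works. The document source is nothing but a sequence of citation keys, and switching styles means picking a different renderer. The records in `BIB`, the function names, and the output formats are all hypothetical. The ibid. rule is simplified to "same key as the immediately preceding citation". Author-year is left out here because its suffixes depend on the whole bibliography.

```python
# Hypothetical bibliographic records, keyed by citation key.
BIB = {
    "doe99a": {"author": "Doe", "year": 1999, "title": "First Book"},
    "smith01": {"author": "Smith", "year": 2001, "title": "Another Study"},
}

def render_citekey(keys):
    # The key itself is the marker.
    return [f"[{k}]" for k in keys]

def render_numbered(keys):
    # Number each work by its first appearance in the document.
    numbers = {}
    out = []
    for k in keys:
        numbers.setdefault(k, len(numbers) + 1)
        out.append(f"({numbers[k]})")
    return out

def render_note(keys):
    # Full form first, short form afterwards, "Ibid." for an immediate repeat.
    seen = set()
    previous = None
    out = []
    for k in keys:
        rec = BIB[k]
        if k == previous:
            out.append("Ibid.")
        elif k in seen:
            out.append(f"{rec['author']}, {rec['title']}.")
        else:
            out.append(f"{rec['author']}, {rec['title']} ({rec['year']}).")
        seen.add(k)
        previous = k
    return out

STYLES = {"citekey": render_citekey, "numbered": render_numbered, "note": render_note}

def format_citations(keys, style):
    """Render the same source keys in whichever style is selected."""
    return STYLES[style](keys)
```

With `source = ["doe99a", "smith01", "smith01", "doe99a"]`, the `"numbered"` style gives `["(1)", "(2)", "(2)", "(1)"]`. The `"note"` style gives a full reference for each first occurrence, `"Ibid."` for the immediate repeat, and `"Doe, First Book."` for the later return to Doe. The source list never changes.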

Which features of XSLT 2.0 make this sort of processing easier? Temporary trees and multi-level grouping (xsl:for-each-group). In essence, you create a virtual bibliography enriched with the processed data you need to insert into the final document (for example, a year with its suffix appended).
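
The following sketch illustrates that idea in Python rather than XSLT. One pass builds the virtual bibliography with year suffixes already computed, and later passes group over it to render the bibliography and citations. In XSLT 2.0 the first pass would be a temporary tree and the grouping would use xsl:for-each-group. The record format, function names, and output layout here are hypothetical. Two rules are also assumptions: suffixes follow title order within an author-year pair, and a citation prints a repeated name only once for adjacent works by the same author.

```python
from itertools import groupby
from string import ascii_lowercase

# Hypothetical source records: key -> (author string, year, title).
RECORDS = {
    "doe99x": ("Doe", 1999, "Beta"),
    "doe99y": ("Doe", 1999, "Alpha"),
    "doe01": ("Doe", 2001, "Gamma"),
    "dj99": ("Doe and Jones", 1999, "Delta"),
}

def build_virtual_bibliography(records):
    """Return enriched entries (the 'temporary tree'), sorted by author,
    year, then title, each with a year label that carries an alphabetic
    suffix when the same author-year pair occurs more than once.
    Assumes at most 26 works per author-year pair."""
    ordered = sorted(records.items(), key=lambda kv: kv[1])
    entries = []
    for (author, year), group in groupby(ordered, key=lambda kv: kv[1][:2]):
        group = list(group)
        for i, (key, (_, _, title)) in enumerate(group):
            suffix = ascii_lowercase[i] if len(group) > 1 else ""
            entries.append({"key": key, "author": author, "year": year,
                            "title": title, "label": f"{year}{suffix}"})
    return entries

def render_bibliography(entries):
    # Group by the full author string, so "Doe" and "Doe and Jones" differ;
    # repeats get three em-dashes (U+2014) in place of the name.
    lines = []
    for author, group in groupby(entries, key=lambda e: e["author"]):
        for i, e in enumerate(group):
            name = author if i == 0 else "\u2014" * 3
            lines.append(f"{name}. {e['label']}. {e['title']}.")
    return lines

def render_citation(keys, entries):
    # Look up the precomputed labels, then print each author name once
    # for a run of adjacent works by that author.
    by_key = {e["key"]: e for e in entries}
    cited = [by_key[k] for k in keys]
    parts = []
    for author, group in groupby(cited, key=lambda e: e["author"]):
        labels = ", ".join(e["label"] for e in group)
        parts.append(f"{author}, {labels}")
    return "(" + "; ".join(parts) + ")"
```

With these records, `render_citation(["doe99y", "doe99x", "dj99"], build_virtual_bibliography(RECORDS))` returns `"(Doe, 1999a, 1999b; Doe and Jones, 1999)"`. The bibliography's second Doe entry begins with the three dashes. The suffixes are computed once, up front, and every later pass simply reads them, which is the point of building the enriched bibliography first.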


Creative Commons License