This document provides a reasonably non-technical introduction to the SGML-based markup scheme developed by the Model Editions Partnership (MEP) for the production of historical documentary editions in electronic form. The most important element types (or `tags') in the MEP markup system are introduced with examples.
In the main body of this document, we will provide a systematic introduction to all the most important features of the markup scheme developed by the Model Editions Partnership (MEP) for creating electronic historical documentary editions. Before we start, however, we give a quick list of our most important assumptions, and then a brief tour of the subject, in which all the key ideas will be introduced informally; when they reappear later in slightly more formal guise, they will not be wholly unfamiliar.
We assume that the reader knows a little bit about markup, the Standard Generalized Markup Language (SGML), and documentary editing -- at least enough to follow the examples. We do not assume deep knowledge of these topics or any others. If you're not sure you know enough, try reading this section; if you need to read up, some tutorials on these topics are listed in the next section. A fuller discussion of our basic assumptions is given in the next section (Preliminaries).
Consider a document like the one shown (in edited form) below: a letter from Abraham Lincoln to Richard N. Yates and William Butler:
April 10. 1862
Hon. R. Yates, & Wm. Butler
I fully appreciate Gen. Pope's splendid achievements with their invaluable results; but you must know that Major Generalships in the Regular Army, are not as plenty as blackberries.
In a historical documentary edition, the text of such a document will be presented, together with editorial additions like headings, source notes, and footnotes or endnotes. Any electronic representation of the document must provide for the transcription of the text and the provision of the additional editorial material.
This is a letter from Abraham Lincoln as published in The Collected Works of Abraham Lincoln, ed. Roy P. Basler ([Springfield]: The Abraham Lincoln Association, 1953).
The MEP form of this document, in its entirety, looks more or less like this. We have indented some lines to make the structure of the example more obvious; in practice, such indentation is not usually necessary or even desirable.
<!DOCTYPE doc PUBLIC "-//MEP//DTD Model Editions Partnership data capture level 3 ver. 2.0//EN" >
<doc>
  <mepHeader>
    <sender>Abraham Lincoln</sender>
    <docDate>April 10. 1862</docDate>
    <addressee>Richard Yates</addressee>
    <addressee>William Butler</addressee>
    <preparedBy>David R. Chesnutt and C. M. Sperberg-McQueen</preparedBy>
    <prepDate>31 August 1998</prepDate>
    <prepDate>20 October 1998</prepDate>
    <idno>... (document tracking number) </idno>
    <sourceDesc><bibl>Abraham Lincoln, <title>Speeches and Writings 1859-1865</title> (New York: Library of America, 1989), p. 315.</bibl></sourceDesc>
  </mepHeader>
  <head>To <addressee><person> Richard Yates </person></addressee> and <addressee><person> William Butler </person></addressee> </head>
  <dateline> <place>Washington</place>, <date value="1862-04-10">April 10. 1862</date> </dateline>
  <docBody>
    <address>
      <addrLine>Hon. R. Yates, & Wm. Butler</addrLine>
      <addrLine>Springfield, Ills.</addrLine>
    </address>
    <p>I fully appreciate Gen. Pope's splendid achievements with their invaluable results; but you must know that Major Generalships in the Regular Army, are not as plenty as blackberries.</p>
    <signed>A. Lincoln</signed>
  </docBody>
</doc>
Let's walk through the example line by line.
The document begins with an indication that it has been encoded using the SGML tag set described here:
<!DOCTYPE doc PUBLIC "-//MEP//DTD Model Editions Partnership data capture level 3 ver. 2.0//EN" >
The first line alerts the SGML parser to the kind of information provided here: this section identifies the SGML document type (DOCTYPE) used in the transcription of the letter. The quoted string is a public identifier, a system-independent way to refer to the markup scheme (formally, document type definition, or DTD) developed by the Model Editions Partnership. The quoted string's internal structure, and the details of the rest of this document type declaration, need not detain us here: we content ourselves with saying that SGML software will understand them even if readers of this document do not.
The document itself is transcribed in a more or less usual way, interspersed with tags which mark the beginnings and ends of the major structural parts of the document. The tags are distinguished from ordinary text by being enclosed in angle brackets.
The document itself is (for SGML purposes) represented as a <doc> element: everything between the start-tag <doc> and the end-tag </doc> is part of that element.
Within the <doc> element, there is a `MEP header' (represented in SGML by a <mepHeader> element). The MEP header contains some internal labeling and control information for the document, including the dates the document was prepared in electronic form and who did the preparation; it also can say explicitly who wrote the document, to whom, and when. The header is often not displayed to the user, but it can be used in indexing the document. In some cases, the heading of a document does not mention the sender (or addressee) and no signature is reproduced, but the header can still make explicit who wrote a document, and when:
<sender>Abraham Lincoln</sender>
<docDate>April 10. 1862</docDate>
and who received it:
<addressee>Richard Yates</addressee> <addressee>William Butler</addressee>
In a paper book, this information may be implicitly provided by context: in some editions, all the documents were written by the same person, and no signatures are given for any documents, since the subject of the edition did not vary the form of his signature. In an electronic edition, this context is much more elusive, and it's useful to make explicit some things which can be left implicit in a typical print edition. In this case, the signature and address are reproduced, so the <sender> and <addressee> elements in the header are slightly redundant. Not completely redundant, however: they give the names in a fuller form than in the document.
Our sample header continues with some administrative information about the preparation of the electronic document, namely who touched the file, and when:
<preparedBy>David R. Chesnutt and C. M. Sperberg-McQueen</preparedBy> <prepDate>31 August 1998</prepDate> <prepDate>20 October 1998</prepDate>
Such information is important in tracking and managing the work flow of an editorial project.
The heading of the document, the dateline, and the document body itself, containing an address, one paragraph of text, and the signature, are all also marked with appropriate SGML tags. (To wit: <head>, <dateline>, <docBody>, <address>, <p>, and <signed>.) By marking the boundaries of such units explicitly, we make it easier for software to process the text automatically: the display or typesetting software can put headings in large bold type, and place the dateline flush against the right margin, while indexing software can use the dateline to associate the date with the text of the document.
In SGML, tags work a bit like brackets or parentheses: each start-tag has a matching end-tag, and vice versa, and pairs of start- and end-tags can nest within other pairs ad infinitum. Pairs of tags are not allowed to overlap each other: they can follow each other, thus:
<head>To Richard Yates and William Butler</head>
<dateline>Washington, April 10. 1862</dateline>
Or they can nest within each other, thus:
<head> To Richard Yates and William Butler
  <dateline>Washington, April 10. 1862</dateline>
</head>
But they are not allowed to overlap:
<!--* ILLEGAL *-->
<head>To Richard Yates and William Butler
<dateline>Washington, </head> April 10. 1862 </dateline>
Overlap is forbidden even when it appears to be wholly nominal or harmless:
<!--* ILLEGAL *--> <addressee><person> Richard Yates </addressee></person>
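One way to see the no-overlap rule at work is to hand both forms to an XML parser. The following is only a minimal sketch: it assumes Python and its standard xml.etree.ElementTree module, which is our choice for illustration and not something the MEP scheme prescribes, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The same two tag pairs, once properly nested and once overlapping.
nested = "<addressee><person>Richard Yates</person></addressee>"
overlapping = "<addressee><person>Richard Yates</addressee></person>"

print(is_well_formed(nested))       # properly nested pair
print(is_well_formed(overlapping))  # overlapping pair, rejected by the parser
```

A check of this kind tests only that tags are properly nested; it does not test whether a document follows the MEP DTD.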
The date of the letter illustrates another important part of SGML: the use of attributes. The element type <date> has an attribute named value, which can be used to provide the date in a standard format (we use the ISO 8601 date format, which gives the year first):
<date value="1862-04-10">April 10. 1862</date>
Special provision can be made for partial dates, uncertain dates, and so on, as is described more fully below.
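To suggest how software might use the value attribute, here is a small sketch, again assuming Python's standard library as an illustrative choice. It reads both the date as written and its standard form from the dateline of the Lincoln letter. It handles only complete dates; partial or uncertain dates would need the special provisions just mentioned.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The dateline from the Lincoln letter shown earlier.
dateline = ET.fromstring(
    '<dateline><place>Washington</place>, '
    '<date value="1862-04-10">April 10. 1862</date></dateline>'
)

date_elem = dateline.find("date")
as_written = date_elem.text                               # the transcribed form
normalized = date.fromisoformat(date_elem.get("value"))   # the standard form

print(as_written)
print(normalized.year, normalized.month, normalized.day)
```

Because the standard form is carried in the attribute, sorting or indexing software never has to interpret the author's own way of writing the date.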
Attributes are commonly used in SGML encoding to provide supplemental information about a portion of the text, or alternate forms of its content (as here). The MEP encoding defines many attributes for the elements of the MEP encoding scheme. Most of them are not discussed here; to simplify the exposition, we will mention only the most important or commonly used attributes. Full reference information is available elsewhere, in sources listed in the bibliography.
This document assumes that the reader knows enough about the Standard Generalized Markup Language (SGML) and descriptive markup to understand why one might choose to use SGML for preparing and publishing a documentary edition and to follow examples of the use of SGML markup to tag documents. If you need a tutorial or refresher course on SGML, we recommend that you consult the Gentle Introduction to SGML prepared by the Text Encoding Initiative (TEI) and accessible from the TEI's Web site (at http://www.uic.edu/orgs/tei/). Other SGML tutorials are available in electronic or paper form; consult the TEI Web site or the well-researched SGML/XML Home Page maintained by Robin Cover (at http://www.oasis-open.org/cover/sgml-xml.html) to find one you like.
We also assume the reader has background knowledge of documentary editing, in particular as it is practiced in the publication of American historical records. Editorial practices vary from country to country and discipline to discipline; the particular compromises and judgments reached in the Model Editions Partnership may or may not be applicable in other communities of readers and editors. For a good overview of the style of editing we are talking about here, see Mary-Jo Kline, A Guide to Documentary Editing, or Michael E. Stevens and Steven B. Burg, Editing Historical Documents: A Handbook of Practice.
We assume no particular background knowledge of SGML software, but we also will not cover this topic at all. The SGML marketplace changes so fast, and the choice of software depends on so many local and personal factors, that anything we write would be likely to be out of date by the time this document is finished, and appropriate only for a minority of our potential readers. All we can recommend is that readers consult current sources of information, at the time they are considering acquiring SGML software.
Among the best sources of information about relevant software are Robin Cover's SGML/XML Home Page at http://www.oasis-open.org/cover/sgml-xml.html (already mentioned above), and Steve Pepper's Whirlwind Guide to SGML & XML Tools and Vendors at http://www.infotek.no/sgmltool/guide.htm. Both of these sources of information are regularly updated, and their authors have well deserved the thanks of everyone interested in electronic text.
There are many ways to organize the work flow in an editorial project, and we assume that readers will have their own views on the best way to organize their own project. We have views of our own, but we have tried to avoid prescribing a single approach. There are a number of practical questions which any real project must address: for example, how to go about tagging documents so that they follow the recommendations given here, at what point in the editorial process the documents should be marked up, and how much it will cost.
Some of the tagging we recommend can be automated in very simple ways, e.g. by simple macros in a word processor, or in some cases even by simple global changes. Other tagging we recommend can be automated successfully only by a skilled programmer. Some things fall between the two extremes, and can be performed by an astute editor or a journeyman programmer. Some kinds of tagging cannot be fully automated, even by expert programmers, but an automatic process can propose tagging for a human editor to accept or reject, in much the same way that a selective global change in a word processor allows the user to decide whether or not to make the change, on a case by case basis. And, finally, some tagging is done most simply by hand.
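The semi-automatic case, in which a program proposes tagging and an editor accepts or rejects each proposal, might be sketched as follows. This is an illustration only, assuming Python: the date pattern, the function name propose_date_tags, and the accept callback are all hypothetical, and a real project would need patterns suited to its own materials.

```python
import re

# Hypothetical pattern: full month name, day, comma, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def propose_date_tags(text, accept):
    """Wrap each candidate date in <date> tags when accept(candidate) is true."""
    def decide(match):
        candidate = match.group(0)
        return f"<date>{candidate}</date>" if accept(candidate) else candidate
    return DATE_PATTERN.sub(decide, text)


line = "Newport, October 22, 1787"
# Here every proposal is accepted; in practice the callback would ask the editor.
tagged = propose_date_tags(line, accept=lambda candidate: True)
print(tagged)
```

Passing a callback which prompts the editor for each candidate gives the case-by-case behavior of a selective global change.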
Automated and semi-automated tagging can substantially reduce the cost of tagging an edition, but failed attempts to automate what cannot be automated can consume alarming amounts of time, patience, and money. The art and challenge of managing the creation of an electronic edition using limited resources lies, in no small part, in automating what can be automated, doing manually what must be done manually, and deciding (perhaps with a sigh) to leave untagged what cannot be tagged automatically and is not essential to the edition. It will not always be easy to decide where to class a particular kind of tagging: some kinds of information require manual tagging in some collections of documents, but can be tagged automatically or semi-automatically in others. Right judgment will depend on the body of materials being edited, on the time and resources at hand, and on the skills of the available programming assistance.
Not every project has access to skilled programming assistance: some projects have full- or part-time programmers at least some of the time, while in other projects the only technical assistance available consists of whatever instruction manuals came with the word processor. For this reason, we distinguish between blanket recommendations and selective recommendations. A blanket recommendation means, in effect: this should always be tagged in documentary editions prepared for electronic publication. The benefits are high enough, and the costs of tagging low enough, that we think it's worthwhile doing this tagging in all cases. If you can automate it, more power to you; if not, we think it's worth whatever labor is required to do it manually. A selective recommendation means, in effect: an electronic edition will be better if this is tagged, but the costs of tagging it manually may be high. If you can automate this tagging, you should do so; if you cannot automate it, you must either find someone who can, or sigh and resign yourself to publishing without it.
The system of markup described here is intended to make documentary editions in electronic form as useful as it's practical to make them. But in this document, we are describing the end product, not the work process. The best way of achieving good results will vary from project to project, depending on the available resources of editorial staff, computing equipment, technical knowledge and support, time, and money; there is no single universal solution. Anyone who claims to offer such a simple universal solution applicable to all editorial projects is either dangerously optimistic, or consciously deceptive; either way, they should be given a wide berth.
Note that in addition to the blanket recommendations and selective recommendations, the MEP encoding scheme also defines some element types which are neither recommended nor deprecated; they are simply there, for use in the cases where they are appropriate. These elements are mostly not described in this document, though some are mentioned from time to time. If you need to use them, you must turn to the other documentation mentioned below.
One approach among the many possible and legitimate approaches to creating electronic historical documentary editions is to add the markup gradually, in layers. In theory, many different ways to identify layers are possible; in practice, a number of projects have come to rely on a set of layers like that described below in the appendix One Approach to Gradual Markup.
The markup scheme described here is based on, and is a conforming use of, the Guidelines for Electronic Text Encoding and Interchange developed by the Text Encoding Initiative (TEI), an international project to develop methods of encoding electronic texts for research and teaching. Most of the SGML element types described here are taken directly from the TEI encoding scheme, and are defined here in the same way as in the TEI. Others are additions to the TEI scheme, and may be defined by reference to standard TEI element types. Where appropriate, we have provided cross references to the TEI Guidelines so that readers seeking further discussion of a particular element type know where to turn in that work.
This document assumes no familiarity with the TEI encoding scheme beyond what the reader has just acquired by reading the preceding paragraph. Readers who are familiar with the TEI and wish to compare the basic TEI encoding scheme with MEP's elaboration of it, and readers who wish to become more familiar with the TEI, will find cross-references to relevant portions of the TEI Guidelines in many sections of this document.
The examples in this document have been formatted for legibility; in particular, line breaks have been introduced in order to keep the lines artificially short. This is an artifact of the presentation of the examples, not an intrinsic feature of SGML or of the MEP encoding scheme.
In some examples, the markup (and even the orthography and punctuation of the examples) has been simplified to avoid distracting the reader from the point being illustrated. In particular, many examples omit markup which is introduced only later in this document, even in contexts where we recommend that the omitted markup always be used. To avoid misleading the reader, examples which omit recommended markup begin with an SGML comment like the following:
<!--* The markup here is simplified. *-->
Some readers may be curious about the implications for electronic historical editions of the Extensible Markup Language (XML), a new formalism for markup which has (at the time this document is being prepared) recently been widely discussed in the trade press. For purposes of this document, SGML and XML are interchangeable: every valid XML document is at the same time a valid SGML document, and every SGML document constructed according to the rules described here can be processed by XML software. The differences between SGML and XML are thus not relevant to the creators of electronic documentary editions.
The differences are relevant, in contrast, to the developers of software. Indirectly, of course, anything relevant to software development may affect software users: if XML software is easy to produce, there will be more of it. The wide adoption of XML will not change the recommendations given here for encoding editions, but it will change the environment within which electronic historical editions are delivered, and the primary result will be to make the recommended encoding described here more useful and more important than ever before.
Two peculiarities of XML must be noted for readers of this document who are already familiar with SGML. First, in XML the end of every element must be explicitly marked, whereas in most SGML systems, end-tags may be omitted when logically redundant. Second, in XML empty elements (i.e. XML elements which have no content) are marked by tags of the form <e/>, or by a start-tag immediately followed by an end-tag (<e></e>); in SGML, empty elements are normally tagged to resemble a start-tag which has no end-tag: <e>. The XML form has the virtue of making clear the precise starting and ending locations of each element, without requiring the reader to know which elements in a DTD are declared EMPTY and which have content.
All examples in this document are tagged in XML form.
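That the two XML forms of an empty element are equivalent can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, chosen here only for illustration; the helper name describe is hypothetical, and <e> is the same placeholder element used above.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<p>before<e/>after</p>")
long_form = ET.fromstring("<p>before<e></e>after</p>")


def describe(parent):
    """Summarize the <e> child: name, content, number of children, following text."""
    e = parent.find("e")
    return (e.tag, e.text or "", len(e), e.tail)


# Both spellings yield the same empty element in the same position.
print(describe(short_form) == describe(long_form))
```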
The encoding scheme described here is intended to be useful both in the initial creation of electronic versions of historical documents (also called `data capture') and in the gradual enrichment and later publication of such electronic editions.
Formally, the element types and attributes described here are grouped into several distinct SGML document type definitions; three of them are intended for data capture and one as an `archival form': less convenient for data capture, but more convenient for processing. The three data-capture DTDs represent three different levels of tagging, described in more detail below in the appendix One Approach to Gradual Markup.
This document provides a systematic introduction to the MEP markup scheme, covering all of the most important element types defined there. Two other documents also describe the MEP scheme: the Reference Manual provides an alphabetical list of all the element types (including those not discussed here) with full lists of attributes and examples. The document Model Editions Partnership: TEI/MEP Encoding Scheme provides a technically oriented overview of the MEP encoding scheme, and describes in detail how the MEP scheme is related to the TEI scheme.
A less technical introduction to the problems of electronic editions, which explains some of our assumptions about the ways electronic scholarly editions can be constructed, can be found in the MEP Prospectus for Electronic Historical Editions.
Full references to these documents, and to the historical editions from which examples are drawn, are given in the bibliography.
For our purposes, a historical edition is a collection of documents, with annotation, indices, and other editorial matter. Tags for the collection as a whole, and tags for editorial matter, are described later; let us begin by considering individual documents in the edition.
Every document in a collection should be tagged as a document, using the <doc> element:
Each <doc> element consists of a header (here tagged <mepHeader>), containing information about the transcription itself (see section 10 The Header), followed by the transcription of the document; editorial material such as notes may precede, follow, or be intermingled with the transcription in various ways (see section 8 Annotation). The key point for now is this: one <doc> element for one historical document.
For example, consider this letter, written to one Lieutenant William Eppes by General Nathanael Greene on 21 January 1781 (like most of the examples used in this document, the letter from Greene to Eppes is reproduced in full in a companion document). The <doc> element, as may be seen, encloses the entire document from the heading to the last endnote. (The other tags shown may be ignored for the present; they will be discussed presently.)
<!--* Papers of General Nathanael Greene, 7:164 *-->
<!--* The markup here is simplified. *-->
<doc>
  <mepHeader> ... </mepHeader>
  <head>To Lieutenant William Eppes</head>
  <dateline>[Camp on the Pee Dee, S.C.] Jan 21st 1781</dateline>
  <docBody>
    <salute>Sir</salute>
    <p>I have your favor of this day. ...</p>
    <p>Many Officers in this Army are doing duty upon their old commission notwithstanding their right of promotion is as incontestible as yours and nothing wanting but their commissions from the board of war. ... I am Sir</p>
    <closing>your humble Ser</closing>
    <signed>N Greene</signed>
  </docBody>
  <sourceNote>ADS (MiU-C)</sourceNote>
  <endnote>Eppes's letter, dated 20 January, ...</endnote>
  <endnote>Eppes resigned in a letter of 22 January....</endnote>
</doc>
In most cases, it's obvious what should count as one document and what should count as two. Some less obvious cases are discussed below; in particular, see section 11.1 Enclosures.
In most documentary editions, the majority of documents included are letters. We begin, therefore, by discussing letters; other kinds of documents are discussed below (section 6 Documents Other than Letters).
The major parts of a letter (heading, dateline, salutation, body, closing, signature, etc.) should be tagged using the element types described in this section.
The heading, dateline, and salutation of a letter should be marked using the following element types:
The heading of the letter should be given in the form customary for the editorial project; typically, for editions of the papers of individuals, this means omitting the name of the subject of the edition (but see discussion below). The names of the addressee or sender should be identified as such. For example, the heading of the Greene letter discussed above might be given thus:
<!--* Papers of General Nathanael Greene, 7:164 *-->
<!--* The markup here is simplified. *-->
<head>To <addressee>Lieutenant William Eppes</addressee></head>
or (in an edition of the papers of Henry Laurens):
<!--* Papers of Henry Laurens, 10:361 *--> <!--* The markup here is simplified. *--> <head>From <sender>Alexander Innes</sender></head>
When the project's policy is to identify both sender and recipient, or when the edition is not focused on an individual, the headings will name both parties to the letter, as in this example from the Sanger Papers:
<!--* Margaret Sanger Papers, document 106565 *-->
<!--* The markup here is simplified. *-->
<head><sender>Emma [Goldman]</sender>: Letter to <addressee>Margaret Sanger</addressee></head>
or in this one from the first Federal Congress:
<!--* The markup here is simplified. *--> <head><sender>John Vining</sender> to <addressee>Charles Thompson</addressee></head>
Note that omitting the name of the sender or recipient in the heading, while useful in avoiding repetition, does substantially complicate the use of electronic editions. A search for the word commission in letters written by Nathanael Greene, for example, is substantially easier for software to perform if all the letters by Greene are marked explicitly as such, rather than implicitly. Either of two methods may be used to avoid the problems of implicit identification of sender or recipient. First, a project may choose to alter its method of constructing headings, by always naming both sender and recipient. In this style, the Greene letter would have this heading:
<!--* Papers of General Nathanael Greene, 7:164 *-->
<!--* The markup here is simplified. *-->
<head>From <sender>General Nathanael Greene</sender> to <addressee>Lieutenant William Eppes</addressee></head>
and letters in the Laurens Papers would be marked thus:
<!--* Papers of Henry Laurens 10:317-319 *--> <!--* The markup here is simplified. *--> <head>From <sender>Henry Laurens</sender> to <addressee>James Laurens</addressee></head>
On the whole, we believe this method (always name both sender and recipient) is the best solution for electronic editions. Readers of these editions may come to them from other electronic documents (e.g. on the World Wide Web), and may or may not know that the hyperlink they have followed has brought them to an edition of a particular individual's papers. The reader of a print edition must almost always look at the spine of the volume closely enough to know whether it is the Laurens Papers or the Greene Papers, before turning to the page or document being sought. The reader of an electronic edition may have no such warning of what exactly they are about to see. Links can (and should) be automatically provided from each document to the title page, introduction, and editorial apparatus for the edition as a whole. But it should not be necessary to follow such links merely to find out who wrote the letter or other document on the reader's screen.
We recognize that not all editors will be persuaded by our reasoning. For those who judge differently, the second alternative is to place the relevant information in the header of the document, rather than in the heading:
<!--* Papers of Henry Laurens 10:317-319 *--> <!--* The markup here is simplified. *--> <doc> <mepHeader> <sender>Henry Laurens</sender> ... </mepHeader> <head>To <addressee>James Laurens</addressee></head> ... </doc>
For discussion, see section 10 The Header.
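To show why explicit identification of the sender helps software, here is a minimal sketch of the search described above: letters by a given sender whose body mentions a given word. It assumes Python's standard library. The <edition> wrapper, the two toy documents (with the sender given in the header), and the function name letters_by are hypothetical and serve only the illustration.

```python
import xml.etree.ElementTree as ET

# Two toy documents in a hypothetical <edition> wrapper; the sender is
# given in the header, as in the second alternative described above.
edition = ET.fromstring("""
<edition>
  <doc>
    <mepHeader><sender>Nathanael Greene</sender></mepHeader>
    <head>To <addressee>Lieutenant William Eppes</addressee></head>
    <docBody><p>... nothing wanting but their commissions ...</p></docBody>
  </doc>
  <doc>
    <mepHeader><sender>Henry Laurens</sender></mepHeader>
    <head>To <addressee>James Laurens</addressee></head>
    <docBody><p>My late Letters to you have been ...</p></docBody>
  </doc>
</edition>
""")


def letters_by(edition, sender, word):
    """Return the <doc> elements sent by `sender` whose body mentions `word`."""
    hits = []
    for doc in edition.iter("doc"):
        # A <sender> element counts whether it sits in the header or the heading.
        senders = [s.text.strip() for s in doc.iter("sender") if s.text]
        body = doc.find("docBody")
        body_text = "".join(body.itertext()) if body is not None else ""
        if sender in senders and word.lower() in body_text.lower():
            hits.append(doc)
    return hits


matches = letters_by(edition, "Nathanael Greene", "commission")
print(len(matches))
```

The search works the same way whichever of the two methods a project adopts, because it looks for <sender> anywhere within the <doc> element; it finds nothing only when the sender is left wholly implicit.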
Datelines should be tagged as datelines; internally, the placename and the date itself should be distinguished and marked using the <place> and <date> elements, which are further discussed elsewhere (see section 5.3 Names and section 5.4 Dates). For example:
<!--* Margaret Sanger Papers, document 106565 *--> <dateline> <date>April 9, 1914</date>, <place>Chicago <supplied>IL</supplied></place> </dateline>
Some projects normalize dates and placenames in the dateline; others do not. When the normalization is not silent, the normal methods of indicating an editorial intervention (such as tagging supplied text as such, using the <supplied> element) should be used. Some projects place the dateline information in the heading; it should be tagged in the same way inside a heading as outside a heading:
<!--* A Necessary Evil?, p. 73. *--> <!--* The markup here is simplified. *--> <head> <sender>Samuel Hopkins</sender> to <addressee>Moses Brown</addressee>, <dateline> <place>Newport</place>, <date>October 22, 1787</date> </dateline> </head>
When the content of the dateline is silently normalized using normalization policies not applied to the text of the document, then the dateline should be considered editorial matter and should appear outside the document body proper (i.e. before the beginning of the <docBody> element); when it is not normalized, it should be treated as part of the transcription proper, and go inside the <docBody> element.
When the contents of the dateline are not normalized, but it has been moved to the top of the document from some other location (e.g. the end of the document), then the <dateline> element should go outside, not inside, the <docBody> element.
When a letter has multiple datelines, they are typically transcribed at their points of occurrence (in addition, perhaps, to a normalized dateline at the top, covering the entire range of dates). If the dateline is set off in the original, then the <dateline> element should occur between paragraphs, rather than within paragraphs; if the original author runs the dateline into a paragraph, the <dateline> element may occur within a paragraph.
For example, in this letter from Henry Laurens to James Laurens, the datelines are set off in both the original and the transcription; in this example, indentation is used to emphasize the structure of the marked up text.
<!--* Papers of Henry Laurens, 10:317-319. *-->
<!--* The markup here is simplified. *-->
<doc>
  <mepHeader> ... </mepHeader>
  <head><sender>Henry Laurens</sender> to <addressee>James Laurens</addressee> </head>
  <docBody>
    <dateline> <place>Charles Town</place>, <date>August 20, 1775</date> </dateline>
    ...
    <p>My late Letters to you have been ... </p>
    <p>For Public News I refer to your Nephew ... </p>
    <p>Your Accounts of Sale remain unperfected.... </p>
    <p>I have seen Mrs. Petrie twice ... </p>
    <p>I lately Paid ... the Balance of your Account ... </p>
    <p>Mr. Hawkins tells me he has a large Sum ... </p>
    <dateline> <date>21st</date>/ Early._ </dateline>
    <p>Our Summer has been very tolerable ... </p>
    <p>My love & good wishes attend you all & I remain with great affection & esteem</p>
    ...
  </docBody>
</doc>
The <head> element is discussed in TEI P3, section 7.2; the <dateline> element is discussed in section 7.2.2. There are no direct analogues in the TEI DTD to <sender> and <addressee>, though the <author> element in the TEI header and the <docAuthor> in the title page can be used to convey much the same information as <sender>.
The address of the letter (if transcribed), the salutation, and the body of the letter itself should be marked up using the following element types:
The internal structure of an address does not need to be captured in the SGML encoding; in order to preserve the lineation of the original, however, the element <addrLine> should be used to mark the lines of the address. If the addressee of the letter has been identified in the heading, there is no need to repeat the identification in the transcription of the address itself; the personal and place names, however, should normally be marked as such:
<!--* Margaret Sanger Papers, document 106565 *-->
<!--* The markup here is simplified. *-->
<doc>
  <mepHeader> ... </mepHeader>
  <head><sender>Emma Goldman</sender> to <addressee>Margaret Sanger</addressee></head>
  <docBody>
    <address>
      <addrLine><person>Mrs. Margaret Sanger</person></addrLine>
      <addrLine>34 Post Ave.</addrLine>
      <addrLine><place>New York City</place></addrLine>
    </address>
    ...
  </docBody>
</doc>
Note that no provision is made, in the MEP encoding scheme, for transcribing the address and other information printed on a letterhead.
The salutation should be transcribed using the normal transcription rules of the edition, and tagged as a <salute>:
<!--* Papers of General Nathanael Greene, 7:164 *--> <salute>Sir</salute>
<!--* Papers of Henry Laurens, 10:317-319 *--> <salute>My Dear Brother</salute>
<!--* Margaret Sanger Papers, document 106565 *--> <!--* The markup here is simplified. *--> <salute>My dear Margaret:-</salute>
The body of the document itself should be marked as a <docBody> element and transcribed using the elements described elsewhere in this document (especially, but not exclusively, section 5 The Document Body). The document body is distinct from the editorial apparatus preceding and following the document; it will typically include the salutation and signature of a letter. For example:
<!--* Papers of General Nathanael Greene, 7:164 *-->
<docBody>
  <salute>Sir</salute>
  <p>I have your favor of this day. ...</p>
  <p>Many Officers in this Army ... <closing>your humble Ser</closing></p>
  <signed>N Greene</signed>
</docBody>
The TEI encoding scheme discusses addresses in section 6.4.2 and salutations in 7.2.2. The <docBody> element is roughly analogous to TEI's <body> element, discussed in the opening paragraphs of chapter 7 of TEI P3.
The closing salutation, signature, and any postscripts should be tagged using the following elements:
Some editorial projects run the closing salutation or flourish up into the preceding paragraph; others format it as a separate paragraph. In the latter case, the closing salutation should definitely be tagged as a <closing> element. For example:
<!--* Papers of Henry Laurens, 10:317-319 *-->
<p>My Love & good wishes attend you all & I remain with great affection & esteem</p>
<closing>My Dear Brother Yours.</closing>
<signed>Henry Laurens</signed>
When the closing salutation is run into the preceding paragraph (either by the author or by the editor), it may optionally be tagged as a <closing>, but this is not required. Thus the following two forms are both legitimate:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony,
    * ECS to Gerrit Smith 6? June 1852 *-->
<p>... Man has never begun to appreciate the wrongs of woman. Your cousin</p>
<signed>E.C.S.</signed>
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony,
    * ECS to Gerrit Smith 6? June 1852 *-->
<p>... Man has never begun to appreciate the wrongs of woman. <closing>Your cousin</closing></p>
<signed>E.C.S.</signed>
Postscripts should be marked as such using the <ps> element, which typically contains a series of paragraphs; if there is only one paragraph, the <ps> element should contain only one <p> element:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony,
    * ECS to Gerrit Smith 6? June 1852 *-->
<p>... Your cousin</p>
<signed>E.C.S.</signed>
<ps><p>Much love to all.</p></ps>
Here is another more extensive example:
<!--* The markup here is simplified. *-->
<doc>
  <mepHeader>
    <sender>Henry Laurens</sender> ...
  </mepHeader>
  <head>To <addressee>Martha Laurens</addressee></head>
  <dateline>
    <place>Charles Town</place>,
    <date>August 17, 1776</date>
  </dateline>
  <salute>My Dear Daughter</salute>
  <p>It is now upwards of twelve Months ...</p>
  ...
  <p>You will take care of my Polly too ...</p>
  <signed>your affectionate Father</signed>
  <ps>
    <dateline><date>19th</date></dateline>
    <p>Casting my Eye over ...</p>
  </ps>
For a discussion of the <signed> element in the TEI encoding scheme, see section 7.2.2 of TEI P3; the TEI <salute> element (discussed in the same place) corresponds both to the MEP <salute> and to the MEP <closing> element. There is no <ps> element in the TEI, though in some cases <back> can be used.
When they occur and when it is the policy of the editorial project to transcribe them, markings on a letter, such as endorsements or docketings, should be transcribed, typically at the top or the bottom of a letter, using the following element types:
Some editions transcribe docketings not as part of the document itself, but as part of the source note. In such cases, the edition may optionally tag the docketing as such within the source note (see below, section 8.2 Source Notes). When the docketing is transcribed as part of the document itself, it should always be tagged as a docketing. The letter from Henry Laurens to James Laurens has a docketing which would be transcribed this way:
<!--* Papers of Henry Laurens 10:317-319 *-->
<doc>
  ...
  <p>My Love & good wishes attend you all & I remain with great affection & esteem</p>
  <closing>My Dear Brother Yours.</closing>
  <signed>Henry Laurens.</signed>
  <docketing>Henry Laurens_/Cha<super>s</super> Town 20 August 1775</docketing>
The term <docketing> should be reserved for systematic annotations made as part of a filing system, such as those made in courts to identify the cases to which briefs and other filings pertain, or by commercial correspondents to identify the sender, date of the letter, and date of receipt. Less systematic annotations, as well as annotations related not to the filing system but to the content of the letter and actions taken (or to be taken) with regard to the subject matter, should be tagged not as docketings but as endorsements.
For example, at the upper left of Emma Goldman's letter to Margaret Sanger asking (among other things) that two hundred copies of The Woman Rebel be shipped to Denver, an unidentified hand has written "200 copies sent to Denver". In an edition which transcribes such endorsements, this should be transcribed thus:
<!--* Margaret Sanger Papers, document 106565 *--> <docBody> <endorsement>200 copies sent to Denver</endorsement> ... </docBody>
In some cases, the same document may have both an endorsement (here, one accompanying President Washington's signature) and an attestation (here, one signed by the Clerk of the House):
<!--* Documentary History of the First Federal Congress 6:2028 *-->
<!--* The markup here is simplified. *-->
<endorsement>Approved <date value="1789-08-07">August the Seventh 1789</date>
  <signed> Go. WASHINGTON President of the United States </signed>
</endorsement>
<attestation>I certify that this Act did originate in the House of Representatives.
  <signed> JOHN BECKLEY — Clerk </signed>
</attestation>
The elements <attestation>, <docketing>, <endorsement>, and <auth> are not in TEI P3.
The MEP encoding scheme distinguishes between paragraph-level elements (roughly: elements which can occur directly within a letter or other document; these normally correspond to discrete identifiable blocks of text in a conventional typographic presentation of a document) and phrase-level (or character-level) elements, which typically occur within some paragraph-level element or other. (A few element types can occur either as paragraph-level elements themselves, or within other paragraph-level elements.)
The rest of this section describes first the most important paragraph-level elements and then the most important phrase-level elements used in transcribing historical documents.
Paragraphs and lists should always be marked, when they occur, using the <p> and <list> element types:
Mark paragraphs as such using the <p> element. For example, consider the paragraphs in this extract from the diary of Susan B. Anthony. (For simplicity, the markup of the place and personal names which would normally be recommended in this document has been omitted here.)
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<!--* The markup here is simplified. *-->
<p>Chataque County Woman's Rights Convention held Tuesday Dec. 26th 1854 at Mayville in the Court House.</p>
<p>Marrietta Richmond of Columbia County in company with self— stopped at the house of Cyrus Underwood— The weather warm & rainy— Sleighing gone, & wagoning dangerous on account of the heavy snowdrifts— </p>
Paragraphs occur most commonly in the body of the document, but the <p> element is sometimes also used within other elements. If a list item or a note consists of several paragraphs, each paragraph within the list item or note should be tagged as a paragraph. If it consists of a single paragraph, no <p> element is needed, though the <p> may be used if desired.
The <list> element is used to mark a list of items, whether the individual items are numbered, lettered, or marked with bullets or other dingbats. For example:
<!--* Papers of General Nathanael Greene, 7:261 *-->
<!--* The markup here is simplified. *-->
<p>Present
  <list>
    <item>The Hon. Major Gen. Greene</item>
    <item>The Hon. Brigadier Gen. [Isaac] Huger</item>
    <item>The Hon. Brigadier Gen. [Daniel] Morgan</item>
    <item>Colonel Otho Williams</item>
  </list>
</p>
Sometimes the list has a heading of its own:
<!--* Documentary History of the First Federal Congress 6:2021 *-->
And the Yeas and Nays being required by one <sCap>FIFTH</sCap> of the Senators present, the determination was as follows: —
<list><head>Yea</head>
  <item>Mr. Butler</item>
  <item>Mr. Few</item>
  <item>Mr. Gunn</item>
  <item>Mr. Grayson</item>
  <item>Mr. Johnson</item>
  <item>Mr. Izard</item>
  <item>Mr. Langdon</item>
  <item>Mr. Lee</item>
  <item>Mr. Wingate</item>
</list>
<list><head>Nay</head>
  <item>Mr. Carroll</item>
  <item>Mr. Dalton</item>
  <item>Mr. Ellsworth</item>
  <item>Mr. Elmer</item>
  <item>Mr. Henry</item>
  <item>Mr. King</item>
  <item>Mr. Morris</item>
  <item>Mr. Read</item>
  <item>Mr. Schuyler</item>
  <item>Mr. Strong</item>
</list>
Occasionally, the items of a list have no bullets or numbers, just line breaks separating them:
<!--* Documentary History of the First Federal Congress 6:2021 *--> <list type='simple'> <item>Yeas 9</item> <item>Nays 10</item> </list> <p>So the question was lost, and the words proposed to be struck out, were retained.</p>
Paragraphs are discussed in TEI P3, section 6.1, and lists in section 6.7.
When it is not clear how to mark up some paragraph-level object in a document, it is convenient to be able to record the fact. For this purpose, the <what> element should be used.
For example, an encoder confronted for the first time with the endorsements on a bill passed by Congress might not be sure how to mark them up. It would be possible simply to tag them as paragraphs, thus:
<!--* Doc. Hist. First Federal Congress 6:2028 *-->
<!--* The markup here is simplified. *-->
<p>Approved <date value="1789-08-07">August the Seventh 1789</date></p>
<signed> Go. WASHINGTON President of the United States </signed>
<p>I certify that this Act did originate in the House of Representatives.</p>
<signed> JOHN BECKLEY — Clerk </signed>
If the encoder suspects (rightly, in this case) that <p> is not quite the right solution, the <what> element should be used to call attention to the problem:
<!--* Documentary History of the First Federal Congress 6:2028 *-->
<!--* The markup here is simplified. *-->
<what>Approved <date value="1789-08-07">August the Seventh 1789</date></what>
<signed> Go. WASHINGTON President of the United States </signed>
<what>I certify that this Act did originate in the House of Representatives.</what>
<signed> JOHN BECKLEY — Clerk </signed>
There is no <what> element in TEI P3; the idea for it came from the Brown University Women Writers Project.
It is recommended that proper nouns in the body of a document always be marked as such; when it is possible to automate the task, it is useful to provide normalized forms of names, and to use the key attribute to indicate when two names refer to the same individual, or the same name refers to different individuals. The following elements should be used for this purpose:
Some projects will wish to mark indirect references to people, places, and organizations (i.e. references other than by name). Such markup is wholly optional; when it is performed, the project may choose to mark all, or only some, such references. The TEI header or MEP header should be used to record whether indirect references are marked at all, and if so whether they are marked uniformly (everything recognized as such a reference is marked) or selectively (only some such references are marked). When such references are marked at all, the following element types should be used to mark them:
All the elements just listed share two attributes used to give useful supplementary information:
It is recommended that both regularized forms and key values be given whenever the process can be made automatic or semi-automatic.
The two attributes have similar but distinct functions, as can be illustrated by this example from Susan B. Anthony's diary. At the very least, the place names should be marked as such:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<p><place>Chataque County</place> Woman's Rights Convention
held Tuesday Dec. 26th 1854 at <place>Mayville</place> in the Court House.</p>
The variant spelling Chataque for Chautauqua can be regularized using the reg attribute; Mayville needs no reg attribute since the spelling SBA uses here is that now current:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<p><place reg="Chautauqua County">Chataque County</place> Woman's Rights Convention
held Tuesday Dec. 26th 1854 at <place>Mayville</place> in the Court House.</p>
The names Chautauqua County and Mayville, however, might refer to any of a number of localities in the United States. The key attribute should be used to give a unique identifier to a specific locality. The value of the key attribute might be a very full form of the name, such as might be found in a gazetteer or biographical dictionary:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<p><place reg="Chautauqua County" key="Chautauqua, Co. N.Y.">Chataque County</place> Woman's Rights Convention
held Tuesday Dec. 26th 1854 at <place key="Mayville, town N.Y.">Mayville</place> in the Court House.</p>
or it might be any arbitrary string of characters. If a project maintains a database of places or persons, the key value should normally be the database key of the appropriate record:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<p><place reg="Chautauqua County" key="Cha001">Chataque County</place> Woman's Rights Convention
held Tuesday Dec. 26th 1854 at <place key="May032">Mayville</place> in the Court House.</p>
If there is no particular reason to prefer the latter form, it is perhaps better to choose the former form of key value; since such keys are themselves normalized forms of the name, their use makes it unnecessary to provide a reg value:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:288-289 *-->
<p><place key="Chautauqua County, N.Y.">Chataque County</place> Woman's Rights Convention
held Tuesday Dec. 26th 1854 at <place key="Mayville, N.Y.">Mayville</place> in the Court House.</p>
In some cases, the reg and key attributes may have very dissimilar values. The pseudonymous signature Ivnivs on a pamphlet might be tagged thus:
<signed><person reg="Junius" key="ps993">Ivnivs</person></signed>
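The practical value of the key attribute can be suggested with a short Python sketch, which is an illustration only and not part of the MEP scheme. It groups tagged names by key, falling back on reg and then on the transcribed spelling when no key is given. It assumes the transcription is available as well-formed XML; the function name index_names and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: a well-formed XML rendering of the diary extract,
# using database-style key values.
SAMPLE = """<p><place reg="Chautauqua County" key="Cha001">Chataque County</place>
Woman's Rights Convention held Tuesday Dec. 26th 1854 at
<place key="May032">Mayville</place> in the Court House.</p>"""


def index_names(xml_text, tag="place"):
    """Map each identifier to the list of spellings found in the text.

    The identifier is the key attribute if present, otherwise the reg
    attribute, otherwise the transcribed form itself.
    """
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        # Collapse line breaks and runs of spaces inside the element.
        spelling = " ".join("".join(elem.itertext()).split())
        ident = elem.get("key") or elem.get("reg") or spelling
        index[ident].append(spelling)
    return dict(index)
```

An index built this way lets an interface send every spelling of a name to the same gazetteer or biographical entry, which is the kind of behavior the key attribute is meant to support.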
If the sender or addressee of a letter were always indicated by name, it would be redundant to supply the <person> element within a <sender> or <addressee> element. But not all indications of the sender and recipient are in fact personal names; to distinguish the two cases, we recommend that personal names be marked as personal names, even when enclosed within <sender> and <addressee> elements. For example, the Papers of Thomas Jefferson identify a sequence of documents in volume 19 thus:
<head><sender>The Secretary of State</sender> to <addressee>the President</addressee></head> <!--* 9 Dec. 1790 *-->
...
<head><sender><person>Tobias Lear</person></sender> to <addressee>the Secretary of State</addressee></head> <!--* 26 Jan. 1791 *-->
...
<head><sender><person>Thomas Jefferson</person></sender> to <addressee>the Rev. <person>William Smith</person></addressee></head> <!--* 19 Feb. 1791 *-->
...
Within <signed> elements, marking up names is optional, since the signer will normally have been identified already as the <sender>. (In cases where the signer and the sender are different, it is highly recommended to tag the name of the signer as a <person>, if appropriate.) For example, the following is acceptable:
<signed><person>Henry Laurens.</person></signed>
but so is this:
<signed>Henry Laurens.</signed>
When the signature does not consist of a personal name, it should of course not be marked as a <person>:
<signed>Your Dear Father</signed>
but it may (optionally) be marked as a reference to a person:
<signed><personRef>Your Dear Father</personRef></signed>
When names are marked up manually instead of automatically or semi-automatically, and when the documents in question are short (say, a thousand words or fewer), some projects will prefer to mark a name only on its first occurrence within a document, in order to reduce the cost of marking up the names. In general, we believe it is preferable to mark names on each occurrence, since marking only some occurrences of names leads to inconsistent and inexplicable behavior in interactive interfaces: users will complain (for example) that when they click on some occurrences of a name, they can jump to a biographical dictionary; but when they click on other occurrences of the same name, nothing happens. In the long run, the cost of markup can be reduced more effectively by using appropriate automated tools than by marking names selectively. In the short run, projects which cannot find or use such tools may find themselves forced to practice selective markup of names; we recommend, in such cases, that at least the first occurrence of each name in each document be marked.
Within datelines, it is recommended that the date be marked as a date, using the <date> element. Elsewhere, dates and times may optionally be marked as such, using the <date> and <time> elements:
<!--* Selected Papers of Elizabeth Cady Stanton and
    * Susan B. Anthony 1:291 *-->
<p>County Woman's Rights Conventions will be held as follows, to discuss all the reasons that impel Woman to demand her Right of Suffrage. At
  <list>
    <item><place>Bath, Steuben Co.</place> <date value="1855-01-05">Friday, Jan. 5</date>.</item>
    <item><place>Elmira, Chemung Co.</place> <date value="1855-01-08">Monday, Jan. 8</date>.</item>
    <item><place>Penn Yan, Yates Co.</place> <date value="1855-01-10">Wednesday, Jan. 10</date>.</item>
    <item><place>Canandaigua, Ontario Co.</place> <date value="1855-01-12">Friday, Jan. 12</date>.</item>
    <item><place>Rochester, Monroe Co.</place> <date value="1855-01-15">Monday, Jan. 15</date>.</item>
    <item><place>Albion, Orleans Co.</place> <date value="1855-01-17">Wednesday, Jan. 17</date>.</item>
    <item><place>Lockport, Niagara Co.</place> <date value="1855-01-18">Thursday, Jan. 18</date>.</item>
    <item><place>Buffalo, Erie Co.</place> <date value="1855-01-19">Friday, Jan. 19</date>.</item>
    <item><place>Warsaw, Wyoming Co.</place> <date value="1855-01-22">Monday, Jan. 22</date>.</item>
    <item><place>Geneseo, Livingston Co.</place> <date value="1855-01-24">Wednesday, Jan. 24</date>.</item>
    <item><place>Batavia, Genesee Co.</place> <date value="1855-01-26">Friday, Jan 26</date>.</item>
  </list>
</p>
<p>The first sessions will commence at <time>1 o'clock p.m.</time>; the second at <time>7 o'clock, evening</time>.</p>
Both of these elements bear a value attribute, for giving a normalized form of the date or time, including the year supplied by the editor. Within datelines or document headers, it is recommended that dates always be given a normalized value for the value attribute; this makes possible better searching, indexing, and processing of documents based on their dates. In other contexts, such normalization is optional.
The TEI <date> and <time> elements are discussed in TEI P3, section 6.4.4.
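To suggest how normalized value attributes support searching and sorting by date, here is a minimal Python sketch. It assumes a well-formed XML rendering of the markup and values in the YYYY-MM-DD form used in the examples in this document; the function name dated_items and the shortened sample are hypothetical, and nothing here is prescribed by the MEP scheme.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample: three items from the convention list,
# deliberately placed out of chronological order.
SAMPLE = """<list>
<item><place>Elmira, Chemung Co.</place>
  <date value="1855-01-08">Monday, Jan. 8</date>.</item>
<item><place>Bath, Steuben Co.</place>
  <date value="1855-01-05">Friday, Jan. 5</date>.</item>
<item><place>Penn Yan, Yates Co.</place>
  <date value="1855-01-10">Wednesday, Jan. 10</date>.</item>
</list>"""


def dated_items(xml_text):
    """Return (date, place) pairs sorted by the normalized date value."""
    pairs = []
    for item in ET.fromstring(xml_text).iter("item"):
        date_elem = item.find("date")
        place_elem = item.find("place")
        if date_elem is None or date_elem.get("value") is None:
            continue  # no normalized value, so nothing reliable to sort on
        when = date.fromisoformat(date_elem.get("value"))
        place = (place_elem.text or "").strip() if place_elem is not None else ""
        pairs.append((when, place))
    return sorted(pairs)
```

Note that the sort relies entirely on the value attribute: the transcribed forms ("Friday, Jan. 5") carry no year and could not be ordered reliably on their own.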
When the transcription policy of a project is to record insertions and cancellations in the text of a document, they should be marked using the <add> and <del> elements. When a project's editorial policy requires that partial transcriptions and omissions of original material be marked, then such omissions should be marked with the <gap> element.
For example, this extract from the Daily Advertiser reporting on the actions of Congress on 20 May 1789 does not contain the beginning of the Daily Advertiser's text; the gap is marked with an asterisk in the printed edition. In the electronic edition, the <gap> element is empty. The asterisk or some other signal for the gap should be produced by the style sheet, not transcribed as if it were part of the original document.
<!--* Doc. Hist. of the First Federal Congress, 10:718 *--> <docBody> <p><gap/></p> <p>The house then resolved itself into a committee of the whole on the order of the day.</p> <p><person reg="Boudinot, Elias"> Mr. B<sCap>oudinot</sCap></person> brought forward a plan for the arrangement of the executive departments. He introduced it ... </p> </docBody>
In this diary entry, Elizabeth Cady Stanton has added the word get above the line, after originally leaving it out:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony *--> <p>Saturday went to <place>Olean</place>, could not <add>get</add> a Church, School House or Academy to speak in, The Landlord <person key="Comstock, John K."> Mr. Comstock</person> gave the use of his Dining Hall for Sunday evening— had the room filled— ... </p>
In this passage, Henry Laurens deletes a phrase already written in order to write a correction.
<!--* Papers of Henry Laurens 10:317-319 *--> <!--* The markup here is simplified. *--> <salute>My Dear Brother</salute> <p>My late Letters to you have been <add>dated</add> the 24th. June per <ship>Rabbit</ship>, Capt Fraser. 2d. <del>19th.</del> July per <del><ship>Sandwich</ship> Packet</del> <ship>Scorpion</ship> Man of War 19th. per <ship>Sandwich</ship> Packet.</p>
The <add> and <del> elements each may carry a hand attribute to indicate who made the addition or deletion if not the original author or scribe. If the addition or cancellation is in the same hand as the rest of the text, no hand attribute need be given; if it is in a different hand, the hand attribute should be used to indicate the identity of the hand.
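One use of this markup is to derive a reading text that keeps additions and leaves out cancelled material. The following Python sketch shows the idea under stated assumptions: the transcription is available as well-formed XML, and the function names final_reading and tidy are hypothetical. It ignores the hand attribute entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the Laurens passage, wrapped as well-formed XML.
SAMPLE = (
    "<p>My late Letters to you have been <add>dated</add> the 24th. June "
    "per <ship>Rabbit</ship>, Capt Fraser. 2d. <del>19th.</del> July per "
    "<del><ship>Sandwich</ship> Packet</del> <ship>Scorpion</ship> Man of War "
    "19th. per <ship>Sandwich</ship> Packet.</p>"
)


def final_reading(elem):
    """Flatten an element to text, keeping additions and dropping deletions."""
    if elem.tag == "del":
        body = ""  # cancelled text is left out of the reading text
    else:
        body = (elem.text or "") + "".join(final_reading(c) for c in elem)
    # The tail is the text that follows the element's end tag.
    return body + (elem.tail or "")


def tidy(text):
    """Collapse the runs of spaces left behind where deletions were dropped."""
    return " ".join(text.split())
```

Calling tidy(final_reading(ET.fromstring(SAMPLE))) yields the passage as Laurens left it, with "dated" in place and the two cancelled phrases omitted.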
Many documentary editions perform some simple normalizations of their material in the interests of legibility; practice varies among projects, and over time within projects, concerning the kinds of editorial interventions countenanced in the text, and whether such interventions are performed silently or are marked by brackets or other apparatus.
We make no recommendation about transcription policy, which is the responsibility of the individual project. When editorial interventions are made and it is desired to record them, the following element types should be used.
Note that these elements may also be used when editorial interventions are not made, but it is nevertheless desired to record the corrected or normalized form of a passage. Providing both the original and the corrected or normalized form makes it possible, in the electronic edition, to provide both a diplomatic text and a `clear text' of the document; this may be particularly useful for readers like high school students, undergraduates, or other lay readers not accustomed to the usual apparatus of documentary editions.
For example, editors might mark the erratic spelling of colonial texts thus (e.g. to avoid repeated requests to check the original to make sure the unusual spelling is no transcription error):
<!--* Papers of General Nathanael Greene 7:165-166 *-->
I <sic>perswade</sic> my self when you consider the matter properly you will neither wish to resign or exercise command different from your commision; ...
A corrected spelling might also be provided, if it were desired to offer a reading text with corrections:
<!--* Papers of General Nathanael Greene 7:165-166 *--> I <sic corr="persuade">perswade</sic> my self when you consider the matter properly you will neither wish to resign or exercise command different from your <sic corr="commission">commision</sic>; ...
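The idea of offering both a diplomatic text and a clear text from a single transcription can be sketched in a few lines of Python. This is only an illustration, assuming a well-formed XML rendering of the markup; the function render and its clear flag are hypothetical names, not part of the MEP scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the Greene extract, wrapped in a <p> element.
SAMPLE = (
    '<p>I <sic corr="persuade">perswade</sic> my self when you consider '
    'the matter properly you will neither wish to resign or exercise '
    'command different from your <sic corr="commission">commision</sic>; ...</p>'
)


def render(elem, clear=False):
    """Flatten an element to plain text.

    With clear=False the original readings are kept (diplomatic text);
    with clear=True each <sic> that carries a corr attribute is replaced
    by its correction (clear text).
    """
    if clear and elem.tag == "sic" and elem.get("corr") is not None:
        body = elem.get("corr")
    else:
        body = (elem.text or "") + "".join(render(c, clear) for c in elem)
    # The tail is the text that follows the element's end tag.
    return body + (elem.tail or "")
```

With root = ET.fromstring(SAMPLE), render(root) preserves "perswade" and "commision", while render(root, clear=True) substitutes the corrected spellings. A <sic> element without a corr attribute is left as transcribed in both cases.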
Spelling regularization, as opposed to correction, involves the substitution of standard spellings for variant ones (e.g. the imposition of American or British spelling); this is less common in documentary editions than the correction of errors, but when it is performed, it may be marked using the <reg> (regularized) and <orig> elements, which are used in much the same way as the <corr> and <sic> elements just described. Probably the most common use of normalization is for names which would otherwise be unrecognizable, as in this dateline. The document (Nathanael Greene to Gen. Daniel Morgan, 19 January 1781) gives the place of writing as "Camp near Cain Creek"; the editor supplies the normal spelling in square brackets: "Camp near Cain [Cane] Creek". One way to tag this dateline is this:
<dateline>
  <place>Camp near <orig reg="Cane">Cain</orig> Creek</place>,
  <date>January 19, 1781</date>
</dateline>
Material supplied, rather than normalized, by the editor (like Charles in the example below) is often also printed in square brackets. Such material should be tagged as <supplied>.
<!--* Papers of General Nathanael Greene, 7:162-63 *--> <p>There is one Company of Artillery belonging to the Continental Regiment command'd by <person>Col <supplied>Charles</supplied> Harrison</person> little more than half full. There is also some State artillery but their time of service is out in a day or two. </p>
The <damage>, <unclear>, <gap>, and <supplied> elements allow a precise distinction among several different states of affairs. A portion of a page of a letter may be water-damaged, for example; that section of the document may be tagged with the <damage> element. Where the damage is minor, the text may remain clearly legible; where it is severe, the text may be difficult to read, and such passages should be tagged <unclear>. A few spots of particularly bad damage may leave the text completely illegible, or the paper itself may have holes from damage of one kind or another. Those spots will entail either gaps in the transcription (which should be marked as <gap>) or an attempt by the editor to supply the missing text from context or from other evidence (which should be marked as <supplied>).
In TEI P3, the <sic> and <corr> element types are discussed in section 6.5.1; <reg> and <orig> are treated in section 6.5.2. The <gap> and <unclear> elements are described in sections 6.5.3 and 18.1; <supplied> is discussed in 18.3.
Quotations from other sources, and other uses of quotation marks, may optionally be distinguished from each other by the use of the following elements:
For example, an editorial note to Elizabeth Cady Stanton's note of June 1852 to Gerrit Smith quotes Smith's remarks on the Hungarian leader Louis Kossuth:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony,
    * ECS to Gerrit Smith 6? June 1852 *-->
Smith characterized <person reg="Kossuth, Louis">Louis Kossuth</person> as <q>a patriot ... who, instead of being absorbed with his individual interests, carries in his patriotic and sympathising bosom the interests of a whole nation.</q> But the patriot, he wrote, is not <q>the summit of human excellence</q>; that position is held by <q>the philanthropist,</q> whose <q>country is the world</q> and <q>countrymen mankind.</q> (<bibl><title>Kossuth</title></bibl>. <bibl>Gerrit Smith to Frederick Douglass, 25 May 1852, broadside, Smith Papers, NSyU</bibl>.)
The <q> and <quote> elements are distinct in order to allow projects to distinguish, if they wish, between actual quotations with identifiable sources on the one hand (<quote>) and feigned quotations or quotations which cannot be verified on the other hand (<q>). Direct discourse in narratives should always be tagged <q>.
If quotations are not marked as such using the elements described here, it is recommended that quotation marks be transcribed using the appropriate SGML entities:
The actual character codes used to distinguish among these forms vary from computer system to computer system; we therefore recommend the use of SGML entities, which will survive transmission from system to system more reliably than raw character codes.
The note quoted above would look like this if the entities were used instead of the <q> element:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony,
    * ECS to Gerrit Smith 6? June 1852 *-->
Smith characterized <person reg="Kossuth, Louis">Louis Kossuth</person> as &ldquo;a patriot ... who, instead of being absorbed with his individual interests, carries in his patriotic and sympathising bosom the interests of a whole nation.&rdquo; But the patriot, he wrote, is not &ldquo;the summit of human excellence&rdquo;; that position is held by &ldquo;the philanthropist,&rdquo; whose &ldquo;country is the world&rdquo; and &ldquo;countrymen mankind.&rdquo; (<bibl><title>Kossuth</title></bibl>. <bibl>Gerrit Smith to Frederick Douglass, 25 May 1852, broadside, Smith Papers, NSyU</bibl>.)
The <q>, <quote>, <cit>, and <soCalled> elements are described in TEI P3, section 6.3.3. The <gloss> element is discussed in 6.3.4.
It is recommended that font shifts be recorded in all cases. Where economically and intellectually feasible it is further recommended that different cases of font shifts be distinguished, using the following elements:
In a letter to Moses Brown on 22 October 1787, Samuel Hopkins uses italics twice; to convey the information that the Hebrew name Achan is italicized because it is a foreign word, while the word them is italicized because it is emphatic, one might tag the relevant sentences thus:
<!--* A Necessary Evil?, p. 73 *--> <p> ... Some of the southern delegates no doubt, insisted upon it that the introduction of slaves should be secured, and obstinately refused to consent to any constitution, which did not secure it. The others therefore consented, rather than have no constitution, or one in which the delegates should not be unanimous. I fear that is an <foreign lang="heb"> Achan</foreign>, which will bring a curse, so that we cannot prosper. ...</p> <p>It has been objected by some of the ministers against prefering a memorial to the <org>General Assembly</org> respecting the Slave trade; That the present ruling part in the Assembly, have appeared to be so destitute of all principles of justice, or regard to it; and have acted such an iniquitous part, that there is an impropriety in applying to <emph>them</emph> for justice; especially for the ministers of the Gospel to do it, whom they hold in the highest contempt, ... </p>
Alternatively, if it were felt that eighteenth-century usage of italics is too erratic to sustain the interpretations just offered (or if it were economically infeasible to distinguish all the various uses of italics in an edition), one might tag the text in a purely typographic fashion:
<!--* A Necessary Evil?, p. 73 *--> <p> ... I fear that is an <ital>Achan</ital>, which will bring a curse, so that we cannot prosper. ...</p> <p> ... that there is an impropriety in applying to <ital>them</ital> for justice; ... </p>
Distinguishing font shifts by cause, as recommended here, has the advantage of making possible more sensitive searching by readers of an edition. In general, it is recommended to identify technical terms, foreign words or phrases, etc., only when they are italicized, underscored, or otherwise given special visual treatment.
Projects may optionally choose to mark other technical terms, foreign words, etc., even when these are not typographically distinct from the surrounding text. In such cases, the rend attribute should always be used to indicate that the material so marked is not typographically distinct.
The <emph> and <hi> elements should be distinguished carefully: <emph> marks rhetorical emphasis or stress, not merely typographic emphasis. The generic tags for all kinds of typographic emphasis are <hi> and the others in the second list above.
TEI P3 discusses these elements in sections 6.3.1 and 6.3.2.
In certain circumstances, it may be desirable to mark the occurrences of some other textual features; the use of the elements described in this section is strictly optional.
For example, in a letter from Emma Goldman to Margaret Sanger:
<!--* Margaret Sanger Papers, document 106565 *--> Let me know what your agent's prices are. You can send me another <num value="100">hundred</num> by express to this city, and if you have sufficient copies on hand you may ship <num>200</num> to Denver ...
or in a letter from Alexander Innes to Henry Laurens:
<!--* Papers of Henry Laurens 10:361 *--> I am Sir Your Most Obed<abbr expan="ient">t</abbr> & most H<abbr expan="umble">ble</abbr> Serv<abbr expan="ant">t</abbr> Alex<abbr expan="ander">:</abbr> Innes
In some cases, it may be desirable to mark a word or phrase even though the MEP encoding scheme provides no element type for marking the particular feature of interest in the word or phrase. The <seg> element may be used to mark such phrases:
<!--* Papers of Henry Laurens 10:361 *--> ... those few arms I possess (which are only such as Gentlemen generally have to protect them from <seg id="s388">insult</seg>) ...
The TEI <num> element is described in TEI P3, section 6.4.3. The <abbr> element is described in section 6.4.5 and again in section 18.1.2. The main discussion of the <seg> element occurs in section 14.3.
Our discussion so far has concentrated on letters, which are one of the document types most frequently encountered in documentary editions. Other types of document do appear, however, and can be encoded using MEP markup. The following discussions mention some other document types and special problems they present.
Without exception, documents of these types should be tagged as <doc> elements, just as letters are. The paragraphs of the text should be tagged <p>, as shown above, and the other elements already introduced apply in the same way. The main differences occur in the material commonly encountered at the beginning and end of the documents; less frequently there are differences in the kinds of paragraph- and phrase-level elements encountered.
Newspaper articles should be transcribed as running prose; of particular interest may be the following element types:
Reports, essays, opinions, pamphlets, etc., should be tagged in the same way as newspaper articles. In addition to the element types described elsewhere, material intended by its authors for publication may contain specialized material, especially at the beginnings of sections:
When reports or essays have internal structures of sections and subsections, these internal structures should be marked; see below.
Account books pose a special problem because of their intricate structure and because of the wide variation in the methods of keeping accounts over the course of history. No general advice can be given, beyond the obvious one that account books may often best be presented in tabular form; see section 11.1 Tables and Figures.
Textual divisions in larger texts (chapters, sections, etc.) should always be identified if they are clearly marked in the original document; the following elements should be used:
The structure of a document may be marked using the single generic <div> element, which can nest within itself (so that subsections within sections are marked by <div> elements occurring within other <div> elements). Alternatively, the structure may be marked using the set of numbered divisions from <div0> or <div1> to <div7>: <div2> elements may occur only within <div1> elements, <div3> elements may occur only within <div2> elements, etc. The choice between these two styles is wholly at the option of the project: some software works better with one form, some with the other form; some manual operations are easier with the one form, some with the other. In general, if there is not a compelling reason for choosing the generic <div> element, most editors new to SGML find the numbered levels slightly easier to work with.
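The nesting rule for numbered divisions can be checked mechanically. The following Python fragment is a minimal sketch, not part of the MEP tools: it assumes that every division carries explicit start and end tags (no SGML tag omission), it ignores the generic <div> element, and the function name check_numbered_divs is ours.

```python
import re

# Matches start and end tags of numbered divisions <div0> .. <div7>.
# The generic <div> element has no digit and is deliberately not matched.
TAG = re.compile(r"<(/?)div([0-7])\b[^>]*>", re.IGNORECASE)


def check_numbered_divs(markup):
    """Return a list of nesting problems found among <div0>..<div7> tags."""
    problems = []
    stack = []  # levels of the numbered divisions currently open
    for match in TAG.finditer(markup):
        closing = match.group(1) == "/"
        level = int(match.group(2))
        if closing:
            if stack and stack[-1] == level:
                stack.pop()
            else:
                problems.append(f"unexpected </div{level}>")
        else:
            if stack and level != stack[-1] + 1:
                # e.g. a <div3> placed directly inside a <div1>
                problems.append(f"<div{level}> inside <div{stack[-1]}>")
            elif not stack and level > 1:
                # numbering may start at <div0> or <div1> only
                problems.append(f"<div{level}> with no enclosing <div{level - 1}>")
            stack.append(level)
    problems.extend(f"unclosed <div{level}>" for level in stack)
    return problems
```

An empty result means that every <div2> sits directly within a <div1>, every <div3> within a <div2>, and so on; an SGML parser working from the DTD remains the authoritative check.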
In the case of particularly complex or elaborate published documents, it may be desirable to use the standard TEI elements <text>, <front>, <body>, <back>, etc. to mark the internal structure of the document. See the TEI Guidelines for full details of how to use those element types.
The TEI tags for text divisions are described in TEI P3, section 7.1; headers and trailers are described in section 7.2.1.
Some editions print abstracts or regests of documents, either in addition to full transcriptions or (more usually among American historical documentary editions) instead of full transcriptions. Such abstracts should be distinguished from full transcriptions by tagging them with the <surrogate> element:
The internal structure of a <surrogate> element is the same as that of a <doc> element: it has a document header (<mepHeader> or <teiHeader>), editorial front matter, a document body (<docBody>), and editorial back matter.
<!--* Papers of General Nathanael Greene, 7:162 *--> <surrogate id="NG07162A"> <mepHeader> ... <prepDate> 98-05-22 ah </prepDate> <idno>ng07162a</idno> <copyright>Copyright 1994. University of North Carolina Press. All rights reserved. </copyright> <docTitle><titlePart> Nathanael Greene to Robert Gillies, 20 January 1781 </titlePart></docTitle> </mepHeader> <pb n="162"/> <head>To <person>Robert Gillies</person>.</head> <dateLine> From <supplied><place>Camp on the Pee Dee, S.C.</place>, <date>20 January 1781</date></supplied>. </dateLine> <docBody> <p>Has <person>Gillies's</person> letter of <date value="1781-01-18">18 January</date>. Thinks his asking price for the salt is <q>exceeding high,</q> but does not know <q>the state of commerce</q> and will not decide the matter. When he learns how much money <person>Gen. <supplied>Horatio</supplied> Gates</person> drew on <place>Maryland</place> and whether the army can <q>provide</q> tobacco, he will have someone settle with Gillies <q>upon terms of equallity.</q> </p> </docBody> <sourceNote> Autograph draft signed (MiU-C) 1 p. </sourceNote> </surrogate>
See the accompanying set of example documents for examples of fully tagged document surrogates.
Image editions (the electronic equivalent of microfilm editions) frequently prepare highly structured surrogates for their documents, giving sender, recipient, and dateline information for letters, and author, title, date information for other documents, together with some information about the subjects treated in the document and individuals or things mentioned in it by name. Such surrogates are sometimes called targets because in some projects they are placed in the photographic field together with the document they describe, and may be used as a focusing target to help ensure that the camera is focused properly.
The MEP markup for targets is based on the structure of the targets used by the Margaret Sanger Papers project, with some modifications to handle targets prepared by the Papers of Elizabeth Cady Stanton and Susan B. Anthony. It is plausible to think it may be useful for targets prepared for other editorial projects, but no claims are made that the element types described here suffice for all, or even most, materials of this kind. The MEP element types used in targets include:
Within notes and references, the elements <person>, <place>, <org>, and <supplied> may be used in the same way as in the transcription of a document.
<!--* Margaret Sanger Papers, document 106565 *--> <targets id="MS106565"> <mepHeader> <prepDate>97-06-23 rg, cm</prepDate> ... <prepDate> 98-05-05 RG </prepDate> <docTitle><titlePart> Emma Goldman to Margaret Sanger, April 9, 1914 </titlePart></docTitle> <docAuthor>Emma Goldman</docAuthor> </mepHeader> <target> <series>SERIES I (Subseries 1 - Correspondence)</series> <idno type='MSP'>106565</idno> <title><person>Emma [Goldman]</person>; Letter to Margaret Sanger </title> <date>April 9, 1914</date> <place> Chicago, [IL]</place> <extent> 1 frame(s).</extent> <permissions> <repository>Library of Congress, Manuscript Division.</repository> Collection: Margaret Sanger Papers. </permissions> <notes> <sourcetype>Typed Letter Signed.</sourcetype> <note>Author of margin notes and corrections not identified. </note> </notes> <person>Julia May Courtney.</person> <figure entity="i1065651"></figure> </target> </targets>
The MEP encoding scheme distinguishes several different types of annotation, based on function.
Headnotes, when provided, should be marked as such:
Every documentary edition identifies the repository holding a document and provides some descriptive information in a source note. The source note may be given before or after the document body; it should be tagged using the <sourceNote> element:
<!--* Selected Papers of Elizabeth Cady Stanton and * Susan B. Anthony 1:291 *--> <sourceNote> <title>New York Daily Tribune</title>, 5 January 1855. </sourceNote>
In some editions, the source note also contains a physical description of the document (size, paper type, ink); in some the address and docketing, if any, are transcribed as part of the source note rather than as part of the document itself. For example:
<!--* Papers of Henry Laurens, 10:317-319 *--> <sourceNote> ALS, HL Papers, ScHi; addressed on cover <endorsement> <person>M<super>r.</super> James Laurens </person> / at the <place>Carolina Coffee House</place> / <place>London</place>. / per <ship rend="roman">Eagle</ship> Packet_ </endorsement>; dated "<place>Charles Town So Carolina</place> <date value="1775-08-20">20. August / 1775</date>."; docketed <docketing> <person>Henry Laurens</person>_ / Cha<super>s.</super> Town 20 August 1775 </docketing>. LB, HL Papers, ScHi; addressed <endorsement> James Laurens / London / per Eagle Packet</endorsement>; dated "20<super>th.</super> August 1775.". </sourceNote>
Footnotes and endnotes, if any, should be tagged using the <note> or <endnote> element:
There are two main methods of attaching notes to documents. In the first, the note (encoded as a <note> element) is embedded in the text of the document at the point of reference; this is the method usual in most document production systems. In this case, the link between the note and the text annotated is accomplished implicitly, by the location of the <note> element. For example:
<!--* A Necessary Evil?, p. 73 *--> <p>... Some of the southern delegates no doubt, insisted upon it that the introduction of slaves should be secured, and obstinately refused to consent to any constitution, which did not secure it. The others therefore consented, rather than have no constitution, or one in which the delegates should not be unanimous. I fear that is an <person rend="ital">Achan</person>, which will bring a curse, so that we cannot prosper.<note id="RC0273N2" n="2"> Achan's actions brought the wrath of God upon the people of Israel (<bibl>Joshua 7</bibl>).</note> At the same time it appears to me that if this constitution be not adopted by the States, as it now stands, we shall have none, and nothing but anarchy and confusion can be expected.—I must leave it with the Supreme Ruler of the universe, who will do right, and knows what to do with these States, to answer his own infinitely wise purposes; ... </p>
In the second, the note (encoded either as a <note> or as an <endnote>) is located at the end of the document, after the <docBody> element and <sourceNote>. The footnote number is transcribed as a <ref> element:
<!--* A Necessary Evil?, p. 73 *--> <p>... I fear that is an <person rend="ital">Achan</person>, which will bring a curse, so that we cannot prosper.<ref type='fnref'>2</ref> At the same time it appears to me that if this constitution be not adopted by the States, as it now stands, we shall have none, and nothing but anarchy and confusion can be expected.—... </p> ... <endnote n="2">Achan's actions brought the wrath of God upon the people of Israel (<bibl>Joshua 7</bibl>).</endnote>
In this case, the link between the note and the text annotated must be made explicitly, by hyperlinking the footnote reference (encoded as a <ref> element) to the note itself. For an example, see section 11.3 Hyperlinks.
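Because the second method separates the reference from the note, it is worth checking that the two still correspond. The following Python fragment is a rough sketch under stated assumptions: it pairs each <ref type='fnref'> with a <note> or <endnote> by comparing the content of the reference with the n attribute of the note, as in the example above, and it uses simple pattern matching rather than an SGML parser. The function name unmatched_note_refs is ours, not part of MEP.

```python
import re

# Footnote references such as <ref type='fnref'>2</ref>
REF = re.compile(r"<ref\s+type=['\"]fnref['\"]\s*>(.*?)</ref>", re.IGNORECASE | re.DOTALL)
# Start tags of <note> or <endnote> carrying an n attribute, e.g. <endnote n="2">
NOTE = re.compile(r"<(?:end)?note\b[^>]*\bn=['\"]([^'\"]+)['\"]", re.IGNORECASE)


def unmatched_note_refs(markup):
    """Return (references without a note, notes without a reference)."""
    refs = {m.group(1).strip() for m in REF.finditer(markup)}
    notes = {m.group(1) for m in NOTE.finditer(markup)}
    return sorted(refs - notes), sorted(notes - refs)
```

Both lists are empty when every footnote number in the text has a note and every note is referred to at least once.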
Which form is preferable depends on the individual circumstances of the project and the personal preferences of those responsible. No general recommendation is made here as to the choice.
The TEI <note> element, which corresponds to all the specialized forms of notes described in this document, is described in section 6.8 of TEI P3.
In most editions, bibliographic references are most common within editorial annotations, and so we cover this topic at this point. When bibliographic references occur in the text of a document, they can and should be encoded using the same element types:
For full-scale bibliographies and lists of sources, use the <listBibl> element described below in section 9.4.2 Bibliographies.
In general, it is recommended that titles be marked as such, and that the level attribute be used to distinguish the kind of item whose title is being given. This distinction is essential to allow display or typesetting software to italicize book titles, give series titles in Roman, and put titles of articles, poems, and unpublished works in quotation marks, in accordance with conventional publishing practice.
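To illustrate why the level attribute matters to display software, here is a small Python sketch of how a formatter might choose a rendering. It assumes the one-letter level codes of TEI P3 (m for monographs, s for series, j for journals, a for analytic titles such as articles and poems, u for unpublished works); treating journal titles like book titles is our assumption, since the paragraph above mentions only books, series, and articles. The HTML-style <i> output and the function name render_title are likewise only illustrative.

```python
def render_title(text, level):
    """Return a display form of a title according to its level code."""
    if level in ("m", "j"):
        # book titles (and, by assumption, journal titles) in italics
        return f"<i>{text}</i>"
    if level == "s":
        # series titles in Roman, i.e. unchanged
        return text
    if level in ("a", "u"):
        # articles, poems, and unpublished works in quotation marks
        return f"\u201c{text}\u201d"
    raise ValueError(f"unknown title level: {level!r}")
```

If titles were tagged without a level, a formatter of this kind would have nothing to go on and could at best apply one treatment to every title.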
The other elements, in particular <author>, may be useful in allowing the reader of an edition to perform fine-grained searches, but their use is optional. The following example shows two different ways of tagging the same bibliographic reference; both are acceptable.
<!--* Papers of Henry Laurens 10:317-319 *--> <bibl>E. Milby Burton, <title>Charleston Furniture, 1700-1825</title> (Columbia, S.C., 1955), p. 114. </bibl> <bibl><author>E. Milby Burton</author>, <title>Charleston Furniture, 1700-1825</title> <imprint>(<pubPlace>Columbia, S.C.</pubPlace>, <date>1955</date>)</imprint>, <biblScope>p. 114</biblScope>. </bibl>
TEI markup for bibliographic references is discussed in section 6.10 of TEI P3.
The <head>, <headnote> (if any), <sourceNote> (if it precedes the document), and <dateLine> elements may be grouped together as editorial front matter, before the transcribed text of the document itself; a trailing source note and endnotes may similarly be grouped together as editorial back matter.
These elements are used for editorial matter relating to individual documents; for front and back matter relating to an edition or a volume as a whole, see section 9 An Edition or Volume as a Whole below.
In normal circumstances, there is no need to use these elements; they are present in the encoding scheme only to handle special cases which arise rarely, and to simplify the translation from the `data capture' tag set described here to the `archival' encoding scheme also defined by MEP. In the archival scheme, grouping elements like these are consistently used to simplify processing.
Front and back matter in general are described in TEI P3 in sections 7.4 and 7.6; the TEI does not define a distinctive element type for editorial front and back matter, as opposed to other front and back matter.
Cross references to other documents are a common feature of editorial annotation; for purposes of markup, such cross references constitute one particular form of hyperlink. For the encoding of hyperlinks of all kinds, see below, section 11.3 Hyperlinks.
The discussion so far has concentrated on the encoding of individual documents together with their accompanying editorial apparatus. A full edition, however, consists of more than a series of documents: at the least, there is usually front and back matter for each volume of the edition, and in some editions related documents may be clustered into series or groups with a headnote for the entire series. This section of this document describes the markup for the overall structure of an electronic edition, and introduces some element types used specifically for front and back matter.
There are two basic ways to organize an electronic edition, which we call the big book approach and the little book approach. To understand these terms and the approaches they designate, we need to understand some basic facts about SGML and the current technology of the Internet.
The formal definition of SGML defines the SGML document (sometimes called the SGML document instance to distinguish it from the document type) as the fundamental unit of data. In many ways, this is purely a convenient formal assumption which provides a common denominator for software, without restricting what software can actually do. SGML systems can, in theory and in practice, deal with units larger than single SGML documents, i.e. with collections of documents, as well as with units smaller than single documents, i.e. with document fragments. Moreover, the designer of an SGML-based encoding scheme can choose more or less freely to use SGML documents of greater or lesser size.
In the case of historical editions, we can choose to treat each historical document as a separate SGML document, so that the edition as a whole is, formally, a collection of SGML documents. Or we can choose to treat the entire edition as a single large SGML document, within which each historical document is represented by an SGML fragment consisting of a single <doc> element. The former approach we call the little book approach (the edition is a collection of small SGML documents); the latter we call the big book approach (the edition is a single large SGML document).
While theorists of markup may be able to discern subtle theoretical differences between these two approaches, for the most part the choice between them can be, and should be, made on practical grounds. While software can in principle deal with SGML documents, with collections of documents, or with document fragments, in practice any piece of software is likely to do some things better with collections and other things better with individual documents. With current editors it is usually easier to edit a full SGML document than a document fragment; with current Internet software it is usually easier to deliver an entire SGML document to the user than either an SGML fragment or a collection of SGML documents.
Our recommendation is simple: during the preparation of a volume or an entire edition, it will usually be more convenient for an editorial project to use the little-book method and to encode each historical document as a separate SGML document. At publication time, however, it will usually be more convenient to use the big-book approach, and gather all the historical documents in the edition or volume into a single SGML document.
In the little-book approach, any front or back matter for the edition as a whole (e.g. an introductory essay, a title page for the entire edition, or an index to the edition) must be encoded as a free-standing SGML document. Since there is no standard way to define interrelationships within a group of independent documents, publication on the little-book model must use whatever methods are provided by the software at hand to make clear the relationships among the edition-wide front and back matter and the documents in the edition itself.
In the big-book model, the front and back matter for the edition stands in exactly the same relation to the edition itself as the editorial front and back matter for an individual document stands to the document: it is part of the same SGML document and precedes or follows the material it relates to. (Note that being in the same SGML document does not mean all the material in an edition needs to be in the same file; the standard SGML and XML mechanisms of external entities and entity references can be used to put together a single SGML document out of many distinct files.)
The remainder of this section describes the markup for the big-book model of editions.
At the top level, an electronic edition consists of a single <tei.2> element, which contains a TEI header and a <text> element. The <text> element in turn consists of front matter (grouped into a <front> element), the documents of the edition (grouped into a <docGroup> element), and back matter (in a <back> element).
For example, the overall organization for the Selected Papers of Elizabeth Cady Stanton and Susan B. Anthony might take the following form:
<tei.2> <teiHeader> ... <!--* header for entire edition *--> </teiHeader> <text> <front> <titlePage> ... <!--* title page of edition *--> </titlePage> ... <!--* other front matter *--> </front> <docGroup> <doc> ... </doc> <doc> ... </doc> ... </docGroup> <back> ... <!--* appendices, index, etc. for edition as a whole *--> </back> </text> </tei.2>
TEI P3 discusses the general problem of compound documents, of which documentary editions are one example, in section 7.3; the MEP <docGroup> element corresponds to the TEI <group> element described there. The MEP <doc> element corresponds in most ways to the TEI <text> element, but unlike <text> it can contain a header element.
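As noted above, a big-book edition need not live in a single file: external entities allow one SGML document to be assembled from many files. The following Python sketch generates a possible driver file for this purpose. The file names, the entity names (doc1, doc2, ...), the DTD system identifier mep.dtd, and the function name build_driver are all hypothetical; only the entity declaration and entity reference syntax is standard SGML.

```python
def build_driver(doc_files, dtd="mep.dtd"):
    """Return the text of a driver file that pulls in one file per document.

    Each file is assumed to contain a single <doc> element with no
    document type declaration of its own.
    """
    names = [f"doc{i}" for i in range(1, len(doc_files) + 1)]
    # One external entity declaration per file, in the internal subset.
    decls = "\n".join(
        f'  <!ENTITY {name} SYSTEM "{path}">'
        for name, path in zip(names, doc_files)
    )
    # One entity reference per document, in edition order.
    refs = "\n".join(f"      &{name};" for name in names)
    return (
        f'<!DOCTYPE tei.2 SYSTEM "{dtd}" [\n{decls}\n]>\n'
        "<tei.2>\n"
        "  <teiHeader> ... </teiHeader>\n"
        "  <text>\n"
        "    <front> ... </front>\n"
        "    <docGroup>\n"
        f"{refs}\n"
        "    </docGroup>\n"
        "    <back> ... </back>\n"
        "  </text>\n"
        "</tei.2>\n"
    )
```

A project following the recommendation above could keep one file per historical document during preparation (the little-book method) and generate a driver of this kind only at publication time.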
The front matter for a volume or edition is no different, in its fundamental structure, from the front matter for any work. The most prominent special feature is the table of contents; other parts of the front matter, such as a preface or introduction or acknowledgment, should be encoded as text divisions using the <div> or the <div1> element described above in section 6.2 Text divisions. The elements described here may be used at the top level to encode the front matter for an entire edition; they may equally be used to encode the front matter of an individual document which was originally published in book or pamphlet form, with its own front and back matter.
Editors sometimes treat documents not individually but as a group of related documents, in order to provide common background information and annotation. Such groups may be kept fairly small, each containing no more than ten or twelve documents; they may also be made fairly large, and used to provide a large-scale organization for an edition.  In some cases, there may even be groups within groups.
See also the discussion of enclosures in section 11.1 Enclosures.
Back matter, like front matter, may consist largely of textual material, which should be encoded as a <div> or a <div1>. Some back-matter material does require specialized tagging; the back-of-the-book index is treated below in section 9.4.1 Indices, and the bibliography is treated in section 9.4.2 Bibliographies.
The generation of electronic indices analogous to traditional back-of-the-book indices remains an open problem; in the meantime, however, it is possible to recommend certain specialized element types for encoding such indices for use in electronic form:
The actual references to documents or passages within the index can be encoded in any of four ways, depending on whether the index is new or an electronic reproduction of an already existing print index, and depending on whether the form of reference can be generated automatically or must be specified manually.
The <docRef> and <docPtr> elements are intended for use in indices created for electronic publication: they should point to a particular document or a passage in a document without reference to the pagination of a particular printed version of that document. For example, using <docPtr> (which assumes the display software will generate an appropriate short reference for the document in question), a simple index entry might look like this:
<bobIndex> <ixe> <entry><place>Albion, Orleans County, NY</place></entry> <docPtr target='SAD52A'/> </ixe> </bobIndex>
Using <docRef>, it is possible to specify what form the short reference should take:
<bobIndex> <ixe> <entry><place>Albion, Orleans County, NY</place></entry> <docRef target='SAD52A'>Ann. by SBA 5 Jan 1855</docRef> </ixe> </bobIndex>
The <pgPtr> and <pgRef> elements are intended to simplify the re-use of existing indices which use page numbers, not references to specific documents; in normal practice they will be hyperlinked to the <pb> elements which mark, in the electronic edition, the page boundaries of the printed edition. The <pgPtr> and <pgRef> elements should not be used for the creation of new indices; they should be used solely for the retrospective conversion of existing page-oriented indices. For example:
<bobIndex> <ixe> <entry>China</entry> <pgref>63-64n2</pgref> <pgref>265n2</pgref> </ixe> </bobIndex>
The <docRef> and <pgRef> pointers, like the <ref> element described below (section 11.3 Hyperlinks) contain the explicit textual form in which the index link should be presented to the reader. The <docPtr> and <pgPtr> elements are empty elements, like the <ptr> element described below. They are intended for use with systems which will generate the appropriate replacement text as part of processing the SGML document. The ptr style is easier to use and maintain, because it adjusts automatically to changes in the edition or to changes in the style desired for such document cross-references. At the present time, high-end systems are typically able to use the ptr-style hyperlinks, while low-end systems require the use of ref-style hyperlinks. The ptr style should be used for the editorial preparation of the material, and translated to ref style at publication time, if necessary.
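The translation from ptr style to ref style at publication time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the short references are supplied in a lookup table built by the project, the markup is handled by pattern matching rather than by an SGML parser, and the function name ptr_to_ref is ours.

```python
import re

# Empty pointers such as <docPtr target='SAD52A'/> or <docPtr target="SAD52A">
DOC_PTR = re.compile(r"<docPtr\s+target=(['\"])([^'\"]+)\1\s*/?>")


def ptr_to_ref(markup, short_refs):
    """Replace each <docPtr> by a <docRef> carrying an explicit short reference.

    short_refs maps document identifiers to the text to display; a target
    missing from the table raises KeyError rather than being silently skipped.
    """
    def expand(match):
        quote, target = match.group(1), match.group(2)
        label = short_refs[target]
        return f"<docRef target={quote}{target}{quote}>{label}</docRef>"

    return DOC_PTR.sub(expand, markup)
```

Applied to the <docPtr> entry shown earlier, with a table that maps SAD52A to "Ann. by SBA 5 Jan 1855", this yields the corresponding <docRef> entry. Because the table is kept separately, a change in the desired style of short reference requires regenerating the output, not re-editing the index.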
A fragment of an index encoded using these elements might look like this:
<bobIndex> ... <ixe> <entry>China</entry> <pgref>63-64n2</pgref> <pgref>265n2</pgref> </ixe> <ixe> <entry><ship>China</ship></entry> <pgref>226n3</pgref></ixe> <ixe> <entry><title>The Choice Humorous Works of Mark Twain</title> (1873, 1874)</entry> <pgref>168n7</pgref></ixe> <ixe> <entry>Cholmondeley, Mary</entry> <pgref>434</pgref></ixe> <ixe type="correspondent"> <entry>Cholmondeley, Reginald</entry> <pgref rend="bold" type="main-id">432-34</pgref> <pgref>522n2</pgref> <pgref>657<hi>illus</hi></pgref> <subs> <ixe><entry>letter to</entry> <pgref>434</pgref></ixe> <ixe><entry>letters by</entry> <pgref>432-34</pgref></ixe> </subs> </ixe> ... </bobIndex>
Bibliographies are simply lists of bibliographic references. Each reference should be encoded as a <bibl> element, as described above in section 8.4 Bibliographic Citations. The bibliography as a whole should be encoded as a <listBibl>:
In normal practice, the <listBibl> will be enclosed within a <div> or <div1> in the back matter of the edition. In cases where the bibliography needs to be subdivided, the <div> or <div2> element should be used to subdivide the enclosing <div> or <div1> element, and so on. Each of the smallest textual divisions should contain a <listBibl>.
Reference to related elements and to TEI P3 ...
Good editorial practice requires that each transcription of a historical document be accompanied by information allowing a reader to learn what the document is, who transcribed it, when, what repository holds the document, etc. Without such metadata (data about data), no electronic document can really be fully evaluated.
We strongly recommend that, even for project-internal work, every historical document be accompanied either by a formal TEI header or by the less formal MEP header described here.
The MEP header can contain the following elements:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony, * ECS to Gerrit Smith 6? June 1852 *--> <mepHeader> <prepDate>cm 20 Nov. 1996;QA 2 Dec. 96 LG</prepDate> <prepDate>rg 17 March 97; QA 24 Mar.97/cm</prepDate> <prepDate>rg 9 April 97, Level 3; QA 4/14/97</prepDate> <prepDate>rg 97-05-16, QA 97-05-21</prepDate> <prepDate>97-06-18 cm</prepDate> <prepDate>97-06-26 rg</prepDate> <prepDate>97-07-13 rg</prepDate> <prepDate>97-08-22 rg</prepDate> <prepDate>97-10-08 ah</prepDate> <prepDate>97-10-27 rg</prepDate> <prepDate>97-10-31 rg</prepDate> <prepDate>97-11-05 ah</prepDate> <prepDate>97-11-06 ah</prepDate> <prepDate>97-11-20 ml</prepDate> <prepDate>98-01-09 rg</prepDate> <prepDate>98-04-07 rg</prepDate> <prepDate>98-04-28 ml</prepDate> <prepDate>98-05-04 rg</prepDate> <prepDate>98-07-01 ml</prepDate> <prepDate>98-07-08 rg</prepDate> <prepDate>98-09-04 ml</prepDate> <idno>sad06</idno> <docTitle><titlePart> ECS to Gerrit Smith, with Enclosure, 6? June 1852 </titlePart></docTitle> <sender>Elizabeth Cady Stanton</sender> <addressee>Gerrit Smith</addressee> <docDate>6 June 1852</docDate> </mepHeader>
<!--* Doc. Hist. First Fed. Congress 10:718 *--> <mepHeader> <prepDate>cm 97-01-23; QA 97-01-24 / lg</prepDate> <prepDate>cm 97-02-24</prepDate> <prepDate>rg 97-04-09; QA 97-04-11</prepDate> <prepDate>rg 97-05-13; QA 97-05-23</prepDate> <prepDate>97-08-28 rg</prepDate> <prepDate>97-09-26 ml</prepDate> <prepDate>98-02-05 ah</prepDate> <prepDate>98-02-10 rg</prepDate> <prepDate>98-04-14 rg</prepDate> <prepDate>98-05-27 mm</prepDate> <prepDate>98-07-06 ml</prepDate> <prepDate>98-07-15 rg</prepDate> <idno>FC10718</idno> <docTitle><titlePart> <title>The Daily Advertiser</title>, 20 May 1789</titlePart></docTitle> <docDate>20 May 1789</docDate> </mepHeader>
The TEI header may also be used to provide meta-information; it is fully described in the TEI Guidelines and will not be discussed further here.
The TEI header is described in chapter 5 of TEI P3.
So far, we have concentrated on fairly straightforward situations, in order to keep the exposition simple. Documentary editions, however, provide ample illustration for Joseph Bédier's axiom that "All cases are special cases." To be useful for real editions, any encoding scheme must handle a number of problems we have not yet addressed. Those editors whose materials do not exhibit the problems described here may skip the sections not applicable to them.
When a document (typically a letter) contains enclosures of other documents, some editions make it a matter of policy to print those other documents at their normal place in the chronological ordering, providing a cross reference from the letter in which they were enclosed. Other editions print them together with the letter in which they were enclosed. Others decide on a case by case basis. When enclosures are printed as separate documents in the edition, no special markup is needed; the cross reference needed will use the same cross-reference markup as any cross reference. When enclosures are printed together with their `host' document, the following element types should be used to mark them up:
The overall structure of the markup of a document with an enclosure is illustrated by this letter from Elizabeth Cady Stanton to her cousin Gerrit Smith:
<!--* Papers of Elizabeth Cady Stanton and Susan B. Anthony, * ECS to Gerrit Smith 6? June 1852 *--> <doc id="SAD06"> <mepHeader> ... </mepHeader> <head><person reg="Stanton, Elizabeth Cady">ECS</person> to <person reg="Smith, Gerrit">Gerrit Smith</person>, with Enclosure</head> <dateLine>[<place rend="ital">Seneca Falls</place>] Sunday eve, [<date rend="ital">6? June 1852</date>] </dateLine> <docBody> <salute>Dear <person>cousin Gerrit</person>,</salute> <p>I read your letter on <person reg="Kossuth, Louis"> Kossuth</person> & like it very much. ... Can you give me two more copies of that letter. I am glad you are to be at the state temperance meeting. I think you will find them prepared to pass such a resolution as you offered a year ago. ... </p> <p>I send you four resolutions. I wish you would embody the ideas in your expressive language & present them to the convention for our society. ... Your cousin</p> <closing><signed><person> E. C. S. </person></signed></closing> <ps><p>Much love to all.</p></ps> </docBody> <enclosure> <head>Enclosure</head> <docBody> <p>Resolved,— That inasmuch as man claims to represent woman, in all our national councils, we have a right to demand of him a wise legislation on the liquor traffic,— ... </p> <p>Resolved,— That drunkeness is a just ground of divorce, yea more, ... </p> <p>Resolved,— That it is the duty of our temperance host to dissolve all connexion with churches that wink at the hedious crimes ... </p> <p>Lastly resolved,— That it is your duty ... </p> </docBody> </enclosure> <sourceNote>ALS and AMs, ECS Papers, NjR. Variant of letter and enclosure, dated <date>25 May 1852</date>, in <title>Stanton</title>, 2:43. </sourceNote> <endnote id="SAD06N1" n="1" target="SAD06N1-ANCHOR" type="note"> 1. The letter anticipates a meeting on <date>17 June 1852</date> and ... </endnote> <endnote id="SAD06N2" n="2" target="SAD06N2-ANCHOR" type="note"> 2. Smith characterized <person reg="Kossuth, Louis"> Louis Kossuth</person> as ... </endnote> <endnote id="SAD06N3" n="3" target="SAD06N3-ANCHOR" type="note"> 3. The <org>Women's New York State Temperance Society</org> named <person reg="Smith, Gerrit">Smith</person> a delegate to the <org>State Temperance Society's</org> annual meeting at <place>Syracuse</place> on <date>17 June 1852</date>, but he did not attend.</endnote> </doc>
Tables, when they appear, should be encoded using the following element types:
Figures should be represented in any convenient graphic format (for photographs, JPEG format is highly recommended). The SGML encoding of the figure should use the following element types:
The page image associated with a microfilm target is often a particularly simple <image> element: just the element itself, without contents, and with a reference to the SGML entity name assigned to the image itself:
A more elaborate encoding of images is usual for illustrations which may have captions (encoded using <head> and <p>) and should normally also have alternate text descriptions (encoded using <figDesc>) for use when the visual display of the image is not technically feasible, or when the reader is visually handicapped. The image from Basler's edition of Lincoln reproduced near the beginning of this document, for example, is encoded as follows:
<figure entity="Basler">
  <head>A sample historical document</head>
  <p>This is a letter from Abraham Lincoln as published in <title>The Collected Works of Abraham Lincoln</title>. ed. Roy P. Basler ([Springfield]: The Abraham Lincoln Association, 1953). </p>
  <figDesc>The figure shows a typeset page containing the document described below.</figDesc>
</figure>
TEI P3 discusses tables and figures in chapter 22. Examples are given there.
When fragments of verse or drama appear in the documents (e.g. in quotations), they should be encoded using the following elements:
The use of these elements is as described in the TEI Guidelines in section 6.11 of TEI P3.
Hyperlinks are connections among passages in a text, in particular connections other than that of adjacency. The most common forms of hyperlink in historical documentary editions are cross references to other locations in the edition, references to other editions, and footnote references. The following elements should be used to encode hyperlinks:
The <ptr> and <ref> elements each bear a target attribute, which gives the ID of the element to which the link or cross reference is being made. The <xptr> and <xref> elements bear the attribute doc, which identifies the document being pointed at (if different from the document where the <xptr> or <xref> itself occurs); the specific location they point to is identified by the from and to attributes, which use the TEI extended-pointer notation to delimit the passage being linked to. (A full description of the TEI extended-pointer notation may be found in TEI P3, chapter 14.)
All four hyperlinking elements may carry a type attribute to indicate the type of link being made.
It is recommended that cross references within an edition use only the ID keyword of the TEI extended pointer notation: references using this form of extended pointing are much more robust and much easier to handle in software.
Any project preparing an electronic edition will do well to establish and keep to a simple system of naming for documents, their component paragraphs, and their notes. Such a naming discipline greatly simplifies creation and maintenance of hyperlinks.
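As one illustration of such a naming discipline, the following Python sketch derives note and anchor identifiers from a document identifier. The pattern (document ID, then N and the note number, with -ANCHOR appended for the point of reference) is inferred from sample identifiers used in this document, such as RC0273N2 and RC0273N2-ANCHOR. The function names are hypothetical and are not part of the MEP scheme; a project may well prefer a different convention.

```python
def note_id(doc_id: str, n: int) -> str:
    """Identifier for the n-th note of a document (assumed pattern: <doc>N<n>)."""
    return f"{doc_id}N{n}"


def anchor_id(doc_id: str, n: int) -> str:
    """Identifier for the in-text reference to that note (assumed suffix: -ANCHOR)."""
    return f"{note_id(doc_id, n)}-ANCHOR"


def ref_tag(doc_id: str, n: int) -> str:
    """Build the in-text <ref> element that points at the note."""
    return (
        f'<ref id="{anchor_id(doc_id, n)}" n="{n}" '
        f'target="{note_id(doc_id, n)}" type="note">{n}</ref>'
    )


def end_note_start_tag(doc_id: str, n: int) -> str:
    """Build the start tag of the <endNote> that points back at the reference."""
    return (
        f'<endNote id="{note_id(doc_id, n)}" n="{n}" '
        f'target="{anchor_id(doc_id, n)}" type="note">'
    )
```

Because both identifiers are computed from the same two values, the reference and the note cannot drift out of step with one another, which is the practical benefit of keeping to a fixed naming system.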
When notes are not embedded at their point of attachment (see above, section 8.3 Footnotes and End Notes), it is necessary to hyperlink the footnote reference to the footnote, and often desirable to hyperlink the note to the reference, so that readers who wish to browse the endnotes can do so, and jump on demand to the point being annotated by a given note. The recommended method for creating these links is as follows:
<!--* A Necessary Evil?, p. 73 *-->
<p>... Some of the southern delegates no doubt, insisted upon it that the introduction of slaves should be secured, and obstinately refused to consent to any constitution, which did not secure it. The others therefore consented, rather than have no constitution, or one in which the delegates should not be unanimous. I fear that is an <person rend="ital">Achan</person>, which will bring a curse, so that we cannot prosper.<ref id="RC0273N2-ANCHOR" n="2" target="RC0273N2" type="note">2</ref> At the same time it appears to me that if this constitution be not adopted by the States, as it now stands, we shall have none, and nothing but anarchy and confusion can be expected.—I must leave it with the Supreme Ruler of the universe, who will do right, and knows what to do with these States, to answer his own infinitely wise purposes; ... </p>
...
<endNote id="RC0273N2" n="2" target="RC0273N2-ANCHOR" type="note">2. Achan's actions brought the wrath of God upon the people of Israel (<bibl>Joshua 7</bibl>).</endNote>
Note that some presentation systems will produce undesirable display results given the tagging just described; at publication time it may therefore be necessary to adjust the tagging. In the MEP samples, we have found it necessary to transcribe the note number at the beginning of the note itself and move the ID attribute from the <note> to the <ref> element. The end result is that the two <ref> elements, one in the text and one in the note, each point at the other, thus:
<!--* A Necessary Evil?, p. 73 *-->
... I fear that is an <person rend="ital">Achan</person>, which will bring a curse, so that we cannot prosper.<ref id="RC0273N2-ANCHOR" n="2" target="RC0273N2" type="note">2</ref> At the same time it appears to me that ...
...
<endNote><ref id="RC0273N2" n="2" target="RC0273N2-ANCHOR" type="note">2.</ref> Achan's actions brought the wrath of God upon the people of Israel (<bibl>Joshua 7</bibl>).</endNote>
This produces more attractive displays in the particular software we are using, but because it also gives false information about what is actually being linked to what, this two-<ref> tagging cannot be generally recommended. When preparing the edition, projects should use the tagging described above, and change it mechanically to this two-<ref> system only at publication time (and only in the publication version: the archival version should retain the more informative and correct tagging).
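The mechanical change from the archival tagging to the two-<ref> publication tagging might be scripted along the following lines. This is only a minimal Python sketch under stated assumptions: the attributes of each <endNote> start tag appear in the order id, n, target, type, as in the examples in this document, and the note text already begins with its number followed by a full stop. The function name to_two_ref is hypothetical, and a production conversion would more likely be carried out with an SGML-aware tool than with regular expressions.

```python
import re

# Matches an archival <endNote> start tag plus the leading note number ("2.").
# Assumes the attribute order id, n, target, type used in the examples.
END_NOTE_START = re.compile(
    r'<endNote\s+id="(?P<id>[^"]+)"\s+n="(?P<n>[^"]+)"\s+'
    r'target="(?P<target>[^"]+)"\s+type="(?P<type>[^"]+)"\s*>\s*'
    r'(?P<label>\d+\.)\s*'
)


def to_two_ref(sgml: str) -> str:
    """Move the linking attributes from <endNote> onto a <ref> around the note number."""

    def rewrite(match: "re.Match[str]") -> str:
        parts = match.groupdict()
        return (
            '<endNote><ref id="{id}" n="{n}" target="{target}" '
            'type="{type}">{label}</ref> '.format(**parts)
        )

    return END_NOTE_START.sub(rewrite, sgml)
```

The conversion leaves the in-text <ref> untouched, so it can be run over the publication copy alone while the archival version keeps the more informative tagging.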
Cross references and the use of the <ptr> and <ref> elements are discussed in section 6.10 of TEI P3. The <xptr> and <xref> elements are discussed and compared with <ptr> and <ref> in chapter 14.
When an electronic edition presents material which has also been published in a letterpress edition, it is desirable to provide page number references to the printed version of the edition. In some cases, it may also be desirable to provide line numbers. The following elements are available to provide this information:
In order to ensure that the page number for each part of a document is given, it is recommended that a <pb> element be provided at the beginning of each document. This will ensure that even in the little-book model, the page number for the beginning of the document will be available. (This is a slight deviation from the prescribed use of the <pb> element in the TEI, where it marks, strictly speaking, only the page break or boundary.)
<!--* Papers of General Nathanael Greene 7:162 *-->
<surrogate id="NG07162A">
  <mepHeader> ... </mepHeader>
  <pb n="162"/>
  <head>To <person>Robert Gillies</person>.</head>
  <dateLine> ... </dateLine>
  <docBody> ... </docBody>
  <sourceNote> ... </sourceNote>
</surrogate>
If a document has been published more than once, it is possible to provide page-number references to multiple editions, which should be distinguished using the ed attribute.
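To suggest how software might make use of page references to more than one edition, the following Python sketch gathers the n values of <pb> elements and groups them by their ed attribute. The edition labels in the sample ("A" and "B") are invented for illustration, as are the function name pages_by_edition and the "unspecified" fallback label; none of them is prescribed by the MEP scheme.

```python
import re
from collections import defaultdict

PB_TAG = re.compile(r"<pb\b([^>]*)>", re.IGNORECASE)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def pages_by_edition(sgml: str, default_ed: str = "unspecified") -> dict:
    """Group the page numbers of all <pb> elements by the value of their ed attribute."""
    pages = defaultdict(list)
    for tag in PB_TAG.finditer(sgml):
        attrs = {name.lower(): value for name, value in ATTRIBUTE.findall(tag.group(1))}
        if "n" in attrs:
            pages[attrs.get("ed", default_ed)].append(attrs["n"])
    return dict(pages)


# Hypothetical sample: the same text paginated in two editions.
SAMPLE = '<pb ed="A" n="162"> ... <pb ed="B" n="40"> ... <pb ed="A" n="163">'
```

Called on SAMPLE, the function reports pages 162 and 163 for edition A and page 40 for edition B; a <pb> with no ed attribute is filed under the fallback label.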
In TEI P3, the <pb> element is discussed in section 6.9.3.
It is useful for readers of an electronic edition to be able to search for documents not only by keywords, but by characteristics like sender, addressee, and date. This is fairly simple in the case of letters or documents with single authors and simple dates, but it is more complicated in the case of diaries, which may cover multiple dates in a single extract, or accounts of debates in a legislature, which may cover multiple dates and record or paraphrase the words of many distinct individuals. The spkr and date attributes may be used on selected elements to indicate the identity of the speaker in a debate, and the date of the speech or diary entry:
For example, the changes of speaker in the debate on the War Department bill in the First Federal Congress, and the dates on which they spoke, are recorded in the spkr and date attributes on the paragraphs of the document:
<!--* Doc. Hist. First Fed. Congress 10:718 *-->
<p>The house then resolved itself into a committee of the whole on the order of the day.</p>
<p spkr="ELIAS-BOUDINOT" date="1789-05-19"> <person reg="Boudinot, Elias"> Mr. <sCap>Boudinot</sCap></person> brought forward a plan for the arrangement of the executive departments. He introduced it by some general observations on the state of the several great officers under the confederation— ... </p>
<p spkr="EGBERT-BENSON" date="1789-05-19"> <person reg="Benson, Egbert">Mr. <sCap>Benson</sCap></person> seconded the general propositions, but did not agree in the propriety of entering into the particulars of the arrangement, till the house had determined the general question, ... </p>
<p spkr="JAMES-MADISON" date="1789-05-19"> This motion was after some debate withdrawn in favor of one made by <person reg="Madison, James, Jr.">Mr. <sCap>Madison</sCap></person>, to this effect, Resolved, that it is the opinion of this committee ... </p>
<p>It was moved as an amendment to this resolution to annex another clause, providing a department for domestic affairs, and several reasons were suggested to prove the present and the encreasing necessity of such an establishment. But this motion was afterwards for the present withdrawn.</p>
<p>It was moved to make a division of the question, and that separate questions should be taken ... </p>
<p>On the clause rendering the heads of departments removable by the President, a considerable debate arose. </p>
<pb n="719">
<p>The objections were that giving the power of removal to the President, would render vain and useless the constitutional provision for impeachment, and that it would convey a dangerous authority to the first magistrate. It was also observed, that if the President had this power, it ought at least to be tempered and qualified by the advice and consent of the <org>Senate</org>; for it was proper that the same power which created, should remove officers.</p>
...
<p>A question was then taken, whether the President should have the sole power of removal, and it was carried in the affirmative by a large majority.</p>
<p>The question was then put, whether there should be a treasury department, and was carried in the affirmative.</p>
These attributes may be used on the elements <div>, <div0> through <div7>, <p>, and <seg> (for cases where the speaker or date changes within a paragraph). They have no direct analogues in the TEI.
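The kind of search these attributes make possible can be suggested with a short Python sketch. It looks only at <p> start tags and reads their spkr and date attributes; a real search interface would also have to handle <div>, <div0> through <div7>, and <seg>, and would normally work from a parsed document rather than from regular expressions. The function names and the abbreviated sample text are illustrative assumptions.

```python
import re

# "<p" followed by a word boundary, so that <pb>, <ps>, and <person> are not matched.
P_START = re.compile(r"<p\b([^>]*)>", re.IGNORECASE)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def paragraph_attributes(sgml: str) -> list:
    """Return one attribute dictionary per <p> start tag, in document order."""
    return [
        {name.lower(): value for name, value in ATTRIBUTE.findall(tag.group(1))}
        for tag in P_START.finditer(sgml)
    ]


def find_speeches(sgml: str, spkr=None, date=None) -> list:
    """Select paragraphs carrying spkr or date, optionally filtered by either value."""
    hits = []
    for attrs in paragraph_attributes(sgml):
        if "spkr" not in attrs and "date" not in attrs:
            continue
        if spkr is not None and attrs.get("spkr") != spkr:
            continue
        if date is not None and attrs.get("date") != date:
            continue
        hits.append(attrs)
    return hits


# Abbreviated sample based on the debate shown above.
SAMPLE = (
    "<p>The house then resolved itself ...</p> "
    '<p spkr="ELIAS-BOUDINOT" date="1789-05-19"> ... </p> '
    '<pb n="719"> '
    '<p spkr="EGBERT-BENSON" date="1789-05-19"> ... </p>'
)
```

A query such as find_speeches(SAMPLE, spkr="EGBERT-BENSON") then returns only the paragraph attributed to Benson, while paragraphs with neither attribute are ignored.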
In general, it is very strongly recommended that the presentation of material in an electronic edition should be determined by (a) its markup and (b) the stylesheet in effect. Distinguishing rigorously between the encoding of the material, which should describe it neutrally, and the presentation, which should be carefully thought through by the editor, preferably with the help of a good designer, may not come naturally to all editors. Many of our colleagues have developed a very strong attachment to the particular methods used by their own projects to indicate supplied text, corrections, and the like.
Such an attachment is understandable. But experience shows that cleanly separating the markup of the document from decisions about its presentation gives the editor, and the reader, far greater flexibility and power in the long run.
In some cases, however, it is very useful to record some information about the original typographic rendition of a word or phrase, either in order to reproduce it faithfully or for other reasons. The rend attribute may be used in such cases. It is a global attribute, by which we mean it is defined for every element type in the MEP markup language.
The rend attribute may be used with any element in the MEP encoding scheme.
It is particularly useful when most, but not quite all, occurrences of a given element type should be rendered in one way, but a few exceptions should be rendered differently. In such a case, the rend attribute on the exceptions should have a value that allows them to be distinguished from the normal cases.
For example, suppose (a) that in a particular body of materials foreign words are, for the most part, underlined or printed in italics, but (b) that some foreign words are not. Suppose further (c) that in order to make it easier to search effectively for foreign words the project has tagged all foreign words as such, whether they are styled distinctively in the originals or not, but (d) that it is nevertheless desired to follow the originals and to underline or italicize only what was underlined or italicized in the original. The most common case, that of an italicized foreign word, might be tagged simply as a <foreign> element; the less common case of a foreign word not italicized should then be tagged <foreign rend='roman'>. When the material is presented to the reader, the style sheet can present each foreign word or phrase in italics unless its rend attribute indicates otherwise.
Another example may be found in the letter from Samuel Hopkins to Moses Brown already cited above: the name Achan is italicized, even though most names in the edition in question are not italicized. The unusual rendition of the personal name may be registered by using the rend attribute on the <person> element:
<!--* A Necessary Evil?, p. 73 *-->
<p> ... Some of the southern delegates no doubt, insisted upon it that the introduction of slaves should be secured, and obstinately refused to consent to any constitution, which did not secure it. The others therefore consented, rather than have no constitution, or one in which the delegates should not be unanimous. I fear that is an <person lang="heb" rend="ital">Achan</person>, which will bring a curse, so that we cannot prosper. ...</p>
The rend attribute is discussed in sections 3.5 and 6.3 of TEI P3. Section 5.3.4 should be consulted for a discussion of how to document the meaning of particular keywords used with the rend attribute.
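The interplay between a default rendition and the rend attribute can be pictured with a small Python sketch of the decision a style sheet has to make. The table of defaults (italic for <foreign>, roman for <person>) reflects the two examples discussed above and is otherwise an assumption; the names DEFAULT_RENDITION and rendition are hypothetical, and real style sheet languages express the same rule in their own notation.

```python
from typing import Optional

# Assumed project-wide defaults: foreign words italic, personal names roman.
DEFAULT_RENDITION = {"foreign": "ital", "person": "roman"}


def rendition(element: str, rend: Optional[str] = None) -> str:
    """Return the rendition to use: an explicit rend value overrides the default."""
    if rend:
        return rend
    return DEFAULT_RENDITION.get(element.lower(), "roman")
```

Under these assumptions an untagged <foreign> element is shown in italics, <foreign rend='roman'> is shown in roman, and <person rend="ital"> is italicized even though personal names are normally not.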
Many documentary editions practice a relatively conservative transcription of their sources. Some of the most frequently needed elements have already been discussed above (5.5 Insertions and Deletions). Others which are less commonly required, but which may be useful when called for by the project's transcription policy are these:
One advantage of producing books electronically is that most document production systems are capable of generating tables of contents, indices (within limits!), and cross references to sections or pages of the document. When these are produced automatically, the document production system is able to provide accurate page numbers (or document numbers, etc.) with far less effort from the editor than is necessary when the editor must manually key in all the page numbers in an index or in the cross references.
The actual generation of such material depends heavily on the details of the system in use; no general discussion is possible here. All such systems share, however, the requirement that the marked up document indicate, somehow, where the generated text should go.
Markup used to request generation of cross-reference text has already been dealt with in the discussion of the <ptr> element (see section 11.3 Hyperlinks).
The MEP encoding scheme also provides an element used to request generation of tables of contents, indices, etc.
toc for tables of contents,
figlist for lists of figures,
maplist for lists of maps, and
tablist for lists of tables.
The <divGen> element is discussed in TEI P3 in section 6.8.2.
Most of the documents from which examples are drawn are reproduced in a companion document, with
The specific documents in question are listed below; they are all from the editions indicated in the bibliography.
The authors thank the editors of the editions from which these documents are drawn for their cooperation and assistance, and the rights holders and the archives and repositories in which the documents are held, for their permission to reproduce the text of the documents.
This document is not intended to prescribe how projects should go about supplying the markup described here; as noted above (2.4 Project Organization and Work Flow), projects vary too much in their material, their internal organization, their computing apparatus, and the level of technical expertise available to them.
As an illustration of one approach to organizing the creation of markup for electronic editions, however, we include here a description of a system of markup `levels' which can be used to plan and perform the work. Similar sets of levels have been developed by a number of projects, which suggests that the explicit identification of different levels of markup may be widely useful.
In level-one markup, all typographic blocks and character-level font styles are captured, so that the documents can be rendered on the screen or page in an appropriate way. Level-one markup must also capture any characteristic of the original manuscript or artifact which must be transcribed from the original and cannot be marked up later. In particular, level-one markup must record anything which would otherwise be invisible in the transcription. If a transcription does not record the paragraph boundaries in the source, for example, or fails to register a place where the transcriber was unable to decipher a few words, then it is impossible to reconstruct that information from the transcription. Names of people and places, on the other hand, are as recognizable in a transcription as they would be in the original manuscript, and therefore do not need to be captured at level one. Level-one markup also includes markup that is readily included at transcription time without loss of transcription speed. Marking a postscript as such, for example, is not slower than marking it as a paragraph; when a book is cited in a note, it is just as fast to mark it as a title as it would be to mark it as an italic phrase. Accordingly, a few elements such as <ps> and <title> have been included in our level-one DTD. Note that judgement may vary among transcribers and among projects as to what elements are simple and convenient enough to tag in passing during transcription, and thus belong in level one, and what elements belong in level three. The assignment of element types to levels is no exact science.
The element types included in the MEP level 1 DTD include those for
Because level-one markup is mostly motivated by typographic phenomena (paragraphs and other blocks of type, and font shifts within blocks), it is not very useful for searching a collection of documents (the sender and addressee of a document, for example, are not identified as such), but it is sufficient for producing adequate paper or screen display of documents. Since both editors and users of editions spend a lot of time reading documents, level-one markup is essential.
In level-two markup, hyperlinks are added for notes and cross references. The additional element types are: <nav> (for navigational information -- this is in some ways a hack made necessary by shortcomings in current display tools), <ptr>, <ref>, <xptr>, and <xref> (for cross-references and hyperlinks of various kinds), and <anchor> (which exists solely to provide target identifiers for hyperlinks). When documents are marked up to this level, it is possible to follow cross-references to other documents, or to notes. This level of markup is thus essential for online display and use of the edition, whether by editorial staff or by readers, but this level of markup is not particularly useful for production of printed editions.
In level-three markup, additional markup is added for typographically indistinct, but intellectually important, phenomena like names, dates, etc. From the level 3 DTD the data can be translated automatically into the archival form. The new element types fall into a number of groups.
Level-three markup includes most of the kinds of markup we think likely to be generally useful in constructing search interfaces for general users or for providing basic information about certain specialized kinds of textual phenomena such as damage to manuscripts. It does not by any means exhaust the list of things that might be useful to mark up for specialized users, or for material of specialized interest. But markup for linguistic analysis, or for the history of books, printing, and bookbinding, or for any of the other disciplines which may have an interest in historical documents, goes beyond our scope in this document.
Where to go for more information about:
As, for example, in the popular reprinting of this
document in the Library of America volume Abraham Lincoln:
Speeches and Writings 1859-1865 (New York: Library of America,
1989), p. 315. The Note on the Texts says
(1:856) "Abraham Lincoln's name at the end of letters (which he
almost invariably signed "A. Lincoln") has been omitted."
This pattern is a general one
in the MEP markup language. In general, if the contents of any element in the
markup scheme are expected always to be a single paragraph, then
no <p> element is required, while if several paragraphs
may occur, one or more <p> elements must be used.
Chautauqua is the name of counties
in New York and Kansas, a township in New York within the county of the
same name, and a lake in New York from which the county takes its name,
as well as being the name of the educational and improvement institution
based in that county.
Examples of small-scale groups may be found in many
editions; a fairly typical one is the group at the beginning of volume
19 of the Papers of Thomas Jefferson, ed. Julian P. Boyd,
Ruth W. Lester Assistant Editor (Princeton: Princeton University
Press, 1974). An example of large-scale document groups may be found in
the practice of many microfilm editions of dividing the edition into
distinct series on the basis of provenance or document type. A print
edition which uses a similar principle is the Documentary History
of the First Federal Congress, which has published separate
series of volumes containing the journals of Congress, legislative
histories of the bills considered by the first Congress, letters of
is drawn from Mark Twain's Letters,
Vol. 5, 1872-1873,
ed. Lin Salamo and Harriet Elinor Smith
(Berkeley: University of California Press, 1997), p. 887.
In particular, the common practice of displaying the target of a link in
reverse video leads some display systems to display the entire note
in reverse video.