Why eXist Should Be in Every Digital Humanist's Toolkit

Chances are that if you're in the digital humanities, you either use TEI or some other flavor of XML to store all of your data, or your project uses XML in some key areas. If you use XML, then eXist should be in your toolkit. Why? Well, as you already know, XML is a fantastic way to encode and annotate scholarly data and metadata, but without a database to store it, a web server to publish it, or a search engine to analyze it, your project may fall short of its potential. eXist does all of the above: It's a fast web server, a powerful database, and a full-featured search engine. (To contrast it with other tools used in digital humanities work, eXist isn't a content management system like Drupal or Omeka, or a digital object repository like Fedora; it's more of a database and an application server that can be adapted to your project's needs.) It's free, built on open standards, and continually improved by the open source community. It runs on Macs, PCs, and Linux and is easy to install; you can install it anywhere from your netbook or laptop to a desktop computer or a dedicated server.

What does eXist really do with your XML? At its core is the following process: You give it your XML files, and eXist happily stores and indexes them; the files immediately become available for search and retrieval. Then you use "queries" to search within the documents, organize them into collections, and analyze, transform, and publish your data. You can limit eXist to being an XML storage facility that your existing web server draws content from, or you can store your entire web application in eXist (CSS, JavaScript, images, and all), and make eXist your project's website.

While nothing this powerful could be trivial to learn and use, eXist is entirely feasible to dabble in (or even master) for someone with a humanities background. You or your colleagues will need to learn XQuery, a language designed expressly for working with XML. But fear not: XQuery is a high-level language that abstracts most of the programming away and lets you focus on extracting the information you need from your XML. (See below for how to try live examples.) There are excellent resources for learning eXist and XQuery, including a vibrant community of users, many of whom work on humanities applications. In fact, eXist is so flexible and well-suited to the work of the digital humanist that XQuery could be the first and last computer language you'll ever need to learn. For all these reasons, digital humanists should see eXist as an absolutely essential tool.

One of the most direct ways to get a sense of the functionality and power eXist offers digital humanities projects is to visit eXist's homepage and browse to eXist's XQuery Sandbox. The Sandbox contains sample texts (Hamlet, Macbeth, and Romeo & Juliet) and canned queries that you can try, alter, and play with. Find the "Paste Example" drop-down menu, and select the first item: "Simple full text query on the Shakespeare plays." You'll see that the query window will populate with the following:

//SPEECH[ft:query(., 'love')]

This query instructs eXist to show all speeches (SPEECH elements) that contain the word "love"—but for now let's set aside the semantics of the query, and get to the results. Click on the "Send" button. Watch the results of the query stream back to you in the bottom results window. Notice how the word "love" is highlighted in the results to help you see the matching text. (Here's what the syntax means: //SPEECH asks for all speech elements, and the square-bracketed expression filters or restricts the results to just those that have a match in eXist's fulltext index for the word "love". It's okay not to understand every query now; it's time to play and experiment.)
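
To see how these filters compose, here is a small sketch that stacks a second square-bracketed filter onto the same query. It assumes that the sample plays mark each speaker with a SPEAKER element inside the speech, and that names are written in capitals (as in "HAMLET"); look at a speech in your own results to confirm before relying on it.

```xquery
(: Sketch: speeches that mention "love" AND are spoken by Hamlet.
   The SPEAKER element and the value 'HAMLET' are assumptions
   about how the sample data is encoded. :)
//SPEECH[ft:query(., 'love')][SPEAKER = 'HAMLET']
```

Each filter narrows the previous result, so the order here doesn't change which speeches come back.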

Let's experiment! Try changing the word from "love" to another word (say "cold"), and hit "Send" again. Change the word to "bird*", and notice how the search now returns hits with "bird," "birds," and "bird's": the asterisk is a wildcard for the ft:query() function. Now try each of the next few options in the drop-down menu. By the time you reach the fourth option, "Show the context of a match," the real power of XQuery becomes evident: We're still searching speeches, but now the results of your search show each speech's scene, act, and play. This is possible because eXist understands the hierarchical structure of XML, and can use that structure to enhance your search results. You can try as many of the queries as you like. Don't worry, you can't do anything wrong here, and even if you did, the eXist homepage resets itself every several hours.
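
If you're curious how a query can climb the hierarchy like that, here is a rough sketch of the idea. It is not the Sandbox's own query; the element names PLAY, ACT, SCENE, and TITLE are assumptions about how the sample plays are encoded, and the hit, play, act, and scene wrapper elements are simply made up for the output.

```xquery
(: Sketch: for each speech that matches, look "upward" in the
   document tree to find the scene, act, and play around it.
   PLAY, ACT, SCENE, and TITLE are assumed element names. :)
for $speech in //SPEECH[ft:query(., 'love')]
let $scene := $speech/ancestor::SCENE
let $act := $speech/ancestor::ACT
let $play := $speech/ancestor::PLAY
return
    <hit>
        <play>{ string($play/TITLE) }</play>
        <act>{ string($act/TITLE) }</act>
        <scene>{ string($scene/TITLE) }</scene>
        { $speech }
    </hit>
```

The "ancestor::" step is what does the climbing: from any speech, it finds the enclosing element of the given name, however many levels up it sits.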

If this demonstration piques your interest and strikes you as having potential for your project, here are 5 steps you can follow to download and install eXist onto your own computer and get working with your own data.

  1. Download eXist: Go to the eXist homepage, click on the big "Download" button, and look for the section entitled "Stable Release." If you are running Windows, download the version ending in ".exe." Otherwise, if you're running Mac or Linux, download the version ending in ".jar." The file that downloads is the eXist installer. (Note to Windows or Linux users: Before you can install eXist, you need to download and install the Java JDK.)
  2. Install eXist: Once the file is downloaded, double-click on it to start the eXist installer. Follow the prompts to select an installation directory on your hard drive, and choose a password (or leave the password blank for now). The default choices that the installer provides you with are all acceptable. (Once you've finished installing eXist, if you navigate to the folder where you installed eXist, you'll see about 50 files and folders. Keep them all for now, and you can mostly ignore them.)
  3. Start eXist: eXist is different from many applications on your computer, and starting eXist is your first indication of this. When you start eXist, you'll notice that it's more like a service that runs quietly in the background than an application with its own windows and graphical interface; in fact, you usually interact with eXist through other programs, like your web browser. So let's get it started. Starting eXist on Windows is pretty straightforward: you'll find an icon on your desktop called "Start eXist"; double-clicking on this icon will launch a command line window and display a cryptic log of eXist's startup routine. Just keep this window open. On Linux or Mac, though, you'll need to open a command line: On Mac, go to Applications > Utilities, and start Terminal. Then use the "cd" command to navigate into the folder where you installed eXist, and type "bin/startup.sh". You'll see the cryptic log of eXist's startup routine, and again, just keep this window open. The contents of this log aren't important for now, but you should see it advance pretty quickly, until it halts with a message like, "Server has started on ports 8080." If you see that, you're golden.
  4. Take eXist for a spin: Now that eXist is running, you can begin interacting with it through your web browser. Open your web browser to http://localhost:8080/exist/, and you'll see a page very much like eXist's homepage. In fact, it is identical to eXist's homepage, since eXist's homepage is run, naturally enough, on eXist. (Note: This link only works when your eXist is running. The "localhost" bit means your own computer, and the 8080 bit is a "port" that eXist runs on by default; if this bothers you, don't worry, since it's not hard to change eXist's configuration so you don't need to type 8080. For now we'll stick with 8080.) Now that eXist is running on your own computer, you don't have to be on the internet to explore eXist. (You'll never be bored on a train or plane again.) I'd suggest clicking around a bit to get acquainted with eXist: from the homepage, you'll find a link to the "Main Documentation," the "Feature Sheet," and the all-important "Admin" page. The Admin page will ask you for your username ("admin") and the password you chose during the installation process, and from here you can perform many useful tasks. For example, you can install the example Shakespeare files and the sample Sandbox by clicking on "Examples Setup" and then "Import Files." If you want to search eXist's documentation, you can install it by clicking on "Install Documentation" and then "Generate." Once you've installed the examples and the documentation, it's instructive to click on the "Browse Collections" panel to see the data you've just added to the database: the Shakespeare data is in the "shakespeare" collection, and the Sandbox example queries are in the "example.xml" file. The root collection is called "db," so the full path to this file is "/db/example.xml."
  5. Add your own data: eXist really starts to shine when you add your own data to the database and begin writing queries on your data. There are several ways to upload files to the database, but we'll start with one simple way. From the Admin page (see step 4), click on "Browse Collections." Let's create a new collection for your data. In the "New collection:" field near the bottom of the page, enter "mydata", and click "Create Collection." Notice that the new "mydata" collection appears in the listing. Click on the "mydata" collection. It's empty, so let's add an XML file. Click on "Choose File," browse to one of your XML files (if you need one, download more Shakespeare), and click on "Upload." Notice that your file (say, "myfile.xml") is now in the list of files. You can even upload non-XML files, and while they're not searchable like XML, eXist happily stores them. Now that your data is in eXist, you can return to the Sandbox and begin querying it. It's unlikely that your data matches the structure of the Shakespeare data, so you'll need to experiment with your own queries. (Note that the ft:query() function in the first Sandbox queries above may not work on your data until you've added full text indexes to your data; instead, try contains(). To browse all of the functions built into eXist, look under Function Library on eXist's homepage or on your local copy of eXist.) If you're ready to turn your Sandbox query into a webpage with its own URL, save the text of your query to a file ending in ".xq" (e.g. "myquery.xq") and upload it to your collection; then enter its address in your browser, for example, http://localhost:8080/exist/rest/db/mydata/myquery.xq. If you hit a roadblock, don't despair. This is a good time to explore online resources for learning XQuery, like the XQuery Wikibook. Priscilla Walmsley's XQuery (O'Reilly 2007) is a great reference book too. Remember too that you've got all of the eXist documentation in your browser, browsable and searchable.
Now is a good time to join the eXist-open mailing list (search or subscribe) for answers to your questions about eXist, and the XQuery-talk mailing list (search or subscribe) for answers to your questions about basic XQuery.
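
To make step 5 a little more concrete, here is a minimal sketch of what a stored query like "myquery.xq" might contain. It uses contains() rather than ft:query(), so it shouldn't depend on a full text index. The file name "myfile.xml", the search word, and the "results" wrapper element are all placeholders of mine; swap in whatever fits your data.

```xquery
xquery version "1.0";

(: Sketch of a stored query, e.g. saved as myquery.xq in /db/mydata.
   The document path, the search word, and the <results> wrapper
   are placeholders, not anything eXist requires. :)
let $doc := doc('/db/mydata/myfile.xml')
(: Find every element whose own text contains the word "love". :)
let $hits := $doc//*[text()[contains(., 'love')]]
return
    <results count="{ count($hits) }">
        { $hits }
    </results>
```

Keep in mind that contains() is a plain, case-sensitive substring test: it will also match "lovely" and "glove," it won't match "Love," and it doesn't understand wildcards the way ft:query() does.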

I hope this helps give you a taste of what eXist could offer your digital humanities project, and whets your appetite for more. Questions? Comments?

(This post was inspired by coffee break and hallway conversations I had at the Chicago Colloquium on Digital Humanities and Computer Sciences 2010 meeting. See the tweets: #dhcs10.)
