Showing posts with label NDIIPP. Show all posts

Friday, July 25, 2014

2014-07-25: Digital Preservation 2014 Trip Report

Mat Kelly and Dr. Michael L. Nelson travel to Washington, DC to report on their current research and to learn about others' work in the field.
On July 22 and 23, 2014, Dr. Michael Nelson (@phonedude_mln) and I (@machawk1) attended Digital Preservation 2014 in Washington, DC. This was my fourth consecutive NDIIPP (@ndiipp) / NDSA (@ndsa2) meeting (see trip reports from Digital Preservation 2011, 2012, 2013). With the largest attendance yet (300+) and compressed into two days, the schedule was jam-packed with interesting talks. Per usual, videos for most of the presentations are included inline below.

Day One

Micah Altman (@drmaltman) led the presentations with information about the NDSA and asked, regarding Amazon's claim of 99.999999999% reliability, "What do the eleven nines mean?". "There are a number of risks that we know about [as archivists] that Amazon doesn't", he said, continuing, "No single institution can account for all the risks." Micah spoke about the updated National Agenda for Digital Stewardship, which is to have the theme of "developing evidence for collective action".
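To put the "eleven nines" question in concrete terms, here is a minimal arithmetic sketch. It assumes the figure is read as the probability that a single stored object survives one year, and that losses are independent; both are assumptions made for illustration, not something stated in the talk. The function names are hypothetical.

```python
from fractions import Fraction


def expected_annual_losses(num_objects: int, nines: int) -> Fraction:
    """Expected number of objects lost per year.

    Assumption: "N nines" means each object independently survives a year
    with probability 1 - 10**-N.
    """
    loss_probability = Fraction(1, 10 ** nines)
    return num_objects * loss_probability


def years_per_expected_loss(num_objects: int, nines: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects, nines)


if __name__ == "__main__":
    # Hypothetical collection of ten million objects at eleven nines.
    print(years_per_expected_loss(10_000_000, 11))
```

Exact fractions are used instead of floats because a probability this close to 1 loses precision in floating point. The calculation covers only the risk the vendor models, which is the point of Micah's remark about risks that archivists know about and Amazon does not.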

Matt Kirschenbaum (@mkirschenbaum) followed Micah with his presentation Software, It’s a Thing with reference to the recently excavated, infamous Atari E.T. game. "Though the data on the tapes had been available for years, the uncovering was more about aura and allure", he said. Matt spoke about George R. R. Martin's (of Game of Thrones fame) recent interview where he stated that he still uses a word processing program called WordStar on an un-networked DOS machine and that it is his "secret weapon from distraction and viruses." In the 1980s, WordStar dominated the market until WordPerfect took the reins, followed by Microsoft Word. "A power user that has memorized all of the WordStar commands could utilize the software with the ease of picking up a pencil and starting to write."
Matt went on to talk (also see his Medium post) about software as different concepts, including as an asset, as an object, as a kind of notation or score (qua music), as shrinkwrap, etc. For a full explanation of each, see his presentation:

Following Matt, Shannon Mattern (@shannonmattern) shared her presentation on Preservation Aesthetics. "One of preservationists' primary concerns is whether an item has intrinsic value.", she said. Shannon then spoke about the various sorts of auto-destructive software and art, including art that is light sensitive and software that deletes itself on execution. In addition to recording her talk (see below), she graciously included the full text of her talk online.

The conference briefly had a break and a quick summary of the poster session to come later in day 1. After the break, there was a panel titled "Stewarding Space Data". Hampapuram Ramapriyan of NASA began the panel by stating that "NASA data is available as soon as the satellites are launched." He continued, "This data is preserved...We shouldn't lose bits that we worked hard to collect and process for results." He also stated that he (and NASA) is part of the Earth Science Information Partners (ESIP) Data Stewardship Committee.

Deirdre Byrne of NOAA then presented, speaking about their dynamics and the further need to document data, preserve it with provenance, provide IT support to maintain the data's integrity, and be able to work with the data in the future (a digital preservation plus). Deirdre then referenced Pathfinder, a technology that allows the visualization of sea surface temperatures, among other features like indicating coral bleaching, fish stocks, the water quality on coasts, etc. Noting that it is now a de facto standard for this purpose, she described the physical dynamics as having 8 different satellites for its functionality along with 31 mirror satellites on standby for redundancy.

Emily Frieda Shaw (@emilyfshaw) of University of Iowa Libraries followed Deirdre in the panel and spoke about the Iowan role in preserving the original development data for the Explorer I launch. Upon converting and analyzing the data, her fellow researchers realized that at certain altitudes, the radiation detection dropped to zero, which indicated that there were large belts of particles surrounding the Earth (later, they were recognized as the Van Allen belts). After discovering more data in a basement about the launch, the group received a grant to recover and restore the badly damaged and moldy documentation.

Karl Nilsen (@KarlNilsen) and Robin Dasler (@rdasler) of University of Maryland Libraries were next, with Robin first talking about her concern with issues in the library realm related to data. She referenced one project's data that still resided at the University of Hawaii's Institute for Astronomy because it was the home institution of one of the project's original researchers. The project consisted of data measuring the distances between galaxies that came about by combining and compiling data from various data sources originating from both observational data and standard corpora. To display the data (500 gigabytes total), the group developed a UI that used web technologies like MySQL to make the data more accessible. "Researchers were concerned about their data disappearing before they retired.", she stated about the original motive for increasing the data's accessibility.
Karl changed topics somewhat by stating that two different perspectives can be taken about data from a preservation standpoint (format or system centric). "The intellectual value of a database comes from ad hoc combination from multiple tables in the form of joins and selections.", he said. "Thinking about how to provide access", he continued, "is itself preservation." He followed this with approaches including utilizing virtual machines (VMs), migrating from one application to a more modern web application, and collapsing the digital preservation horizon to ten years at a time.
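Karl's point about joins and selections can be illustrated with a small sketch. The table names, column names, and values below are hypothetical placeholders, not the project's actual schema or measurements, and SQLite stands in for the MySQL database mentioned above so the example needs no server.

```python
import sqlite3


def distances_by_method(method: str) -> list:
    """Join two hypothetical tables and select rows for one measurement method."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE galaxies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE distances (
            galaxy_id INTEGER REFERENCES galaxies(id),
            method TEXT,
            mpc REAL
        );
        -- Placeholder rows for illustration only.
        INSERT INTO galaxies VALUES (1, 'Galaxy A'), (2, 'Galaxy B');
        INSERT INTO distances VALUES
            (1, 'method-x', 0.8), (1, 'method-y', 0.7), (2, 'method-x', 3.5);
        """
    )
    rows = conn.execute(
        """
        SELECT g.name, d.method, d.mpc
        FROM galaxies AS g
        JOIN distances AS d ON d.galaxy_id = g.id
        WHERE d.method = ?
        ORDER BY g.name
        """,
        (method,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distances_by_method("method-x"):
        print(row)
```

A flat export of each table keeps the bits but not this ability to ask new questions across tables, which is one way to read the contrast between the format-centric and system-centric perspectives.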
Following a brief Q&A for the panel was a series of "Lightning talks". James (Jim) A. Bradley of Ball State University started with his talk, "Beyond the Russian Doll Effect: Reflexivity and the Digital Repository Paradigm", where he spoke about promoting and sharing digital assets for reuse. Jim then talked about Digital Media Repository (DMR), which allowed information to be shared and made available at the page level. His group had the unique opportunity to tell what materials are in the library, who was using them and when. From these patterns, grad students made 3-D models, which were then subsequently added and attributed to the original objects.
Dave Gibson (@davegilbson) of Library of Congress followed Jim by presenting Video Game Source Disc Preservation. He stated that his group has been the "keepers of The Game" since 1996 and has various source code excerpts from a variety of unreleased games including Duke Nukem: Critical Mass, which was released for Nintendo DS but not PlayStation Portable (PSP), despite being developed for both platforms. In their exploration of the game, they uncovered 28 different file formats on the discs, many of which were proprietary, and wondered how they could get access to the files' contents. After using MediaCoder to convert many of the audio files and hex editors to read the ASCII, his group found source code fragments hidden within the files. From this work, they now have the ability to preserve unreleased games.
Rebecca (Becky) Katz of Council of the District of Columbia was next with UELMA-Compliant Preservation: Questions and Answers? She first described the UELMA, an act that declares that if a U.S. state has passed the act and its digital legal publications are official, the state has to make sure that the publications are preserved in some way. Because of this requirement, many states are reluctant to rely solely on digital documents and instead keep paper copies in addition to the digital copies. Many of the barriers in the preservation process for the states lie in how they ought to preserve the digital documents. "For long term access," she said, "we want to be preserving our digital content." Becky also spoke about appropriate formats for preservation and that common features of formats like authentication for PDF are not open source. "All the metadata in the world doesn't mean anything if the data is not usable", she said. "We want to have a user interface that integrates with the [preservation] repository." She concluded by recommending that states that adopt the UELMA have agreements with universities or long-standing institutions to allow users to download updates of the documents to ensure that many copies are available.
Kate Murray of Library of Congress followed Becky with "We Want You Just the Way You Are: The What, Why and When of fixity in Digital Preservation". She referenced the "Migrant Mother" photo and how, through moving the digital photo from one storage component to another, there have been subtle changes to the photo. "I never realized that there are three children in the photo!", she said, referencing the comparison between the altered and original photo. To detect these changes, she uses fixity (related article on The Signal) on a whole collection of data, which ensures bit level integrity.
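As a rough illustration of what a fixity check over a collection involves, the sketch below records a checksum per file and later reports files whose checksums differ. The function names and the choice of SHA-256 are assumptions made for this example; it is not the tooling described in the talk.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict) -> list:
    """List files that were added, removed, or altered since the manifest was made."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

A typical use would be to call `build_manifest` when a collection is ingested, store the result alongside it, and run `changed_files` after each move between storage components. A single flipped bit changes the digest, which is what makes the check a bit-level one.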
Following Kate, Krystyna Ohnesorge of Swiss Federal Archives (SFA) presented "Save Your Databases Using SIARD!". "About 90 percent of specialized applications are based on relational databases.", she said. SIARD is used to preserve database content for the long term so that the data can be understood in 20 or 50 years. The system is already used in 54 countries with 341 licenses currently existing for different international organizations. "If an institution needs to archive relational databases, don't hesitate to use the SIARD suite and SIARD format!"
The conference then broke for lunch where the 2014 Innovation Awards were presented.
Following lunch, Cole Crawford (@coleinthecloud) of the Open Compute Foundation presented "Community Driven Innovation", where he spoke about Open Compute being an internationally based open source project. "Microsoft is moving all Azure data to Open Compute", he said. "Our technology requirements are different. To have an open innovation system that's reusable is important." He then emphasized that his talk was to be specifically on the storage aspects of Open Compute. He started with "FACT: The 19 inch server rack originated in the railroad industry then propagated to the music industry, then it was adopted by IT." He continued, "One of the most interesting things Facebook has done recently is move from tape storage to ten thousand Blu-ray discs for cold storage". He stated that in 2010, the Internet consisted of 0.8 zettabytes. In 2012, this number was 3.0 zettabytes, and by 2015, he claimed, it will be 40 zettabytes in size. "As stewards of digital data, you guys can be working with our project to fit your requirements. We can be a great engine for procurement. As you need more access to content, we can get that."

After Cole was another panel titled, "Community Approaches to Digital Stewardship". Fran Berman (@resdatall) of Rensselaer Polytechnic Institute started off with reference to the Research Data Alliance. "All communities are working to develop the infrastructure that's appropriate to them.", she said, "If I want to worry about asthma (referencing an earlier comment about whether asthma is more likely to be obtained in Mexico City versus Los Angeles), I don't want to wait years until the infrastructure is in place. If you have to worry about data, that data needs to live somewhere."

Bradley Daigle (@BradleyDaigle) of University of Virginia followed Fran and spoke about the Academic Preservation Trust, a group consisting of 17 members that takes a community-based approach in attempting to have not just more solutions but better solutions. The group would like to create a business model based on preservation services. "If you have X amount of data, I can tell you it will take Y amount of cost to preserve that data.", he said, describing an ideal model. "The AP Trust can serve as a scratch space with preservation being applied to the data."

Following Bradley on the panel, Aaron Rubinstein from University of Massachusetts Amherst described his organization's scheme as being similar to Scooby Doo, iterating through each character displayed on-screen and stating the name of a member organization. "One of the things that makes our institution privileged is that we have administrators that understand the need for preservation."

The last presenter in the panel, Jamie Schumacher of Northern Illinois University, started with "Smaller institutions have challenges when starting digital preservation. Instead of obtaining an implementation grant when applying to the NEH, we got a 'Figure it Out' grant. ... Our mission was to investigate a handful of digital preservation solutions that were affordable for organizations with restricted resources like small staff sizes and those with lone rangers. We discovered that practitioners are overwhelmed. To them, digital objects are a foreign thing." Some of the roadblocks her team eliminated were the questions of which tools and services to use for preservation tasks, to which Google frequently gave poor or too many results.

Following a short break, the conference split into five different concurrent workshops and breakout sessions. I attended the session titled Next Generation: The First Digital Repository for Museum Collections where Ben Fino-Radin (@benfinoradin) of Museum of Modern Art, Dan Gillean of Artefactual Systems and Kara Van Malssen (@kvanmalssen) of AVPreserve gave a demo of their work.

As I was presenting a poster at Digital Preservation 2014, I was unable to stay for the second presentation in the session, Revisiting Digital Forensics Workflows in Collecting Institutions by Marty Gengenbach of Gates Archive, as I was required to set up my poster. At 5 o'clock, the breakout sessions ended and a reception was held with the poster session in the main area of the hotel. My poster, "Efficient Thumbnail Summarization for Web Archives", is an implementation of Ahmed AlSum's initial work published at ECIR 2014 (see his ECIR Trip Report).

Day Two

The second day started off with breakfast and an immediate set of workshops and breakout sessions. Among these, I attended the Digital Preservation Questions and Answers from the NDSA Innovation Working Group, where Trevor Owens (@tjowens), among other group members, introduced the history and migration of an online digital preservation question and answer system. The current site, residing at http://qanda.digipres.org, is in the process of migration from previous attempts including a failed try at a Digital Preservation Stack Exchange. This work, completed in part by Andy Jackson (@anjacks0n) at the UK Web Archive, began its migration with his Zombse project, which extracted all of the initial questions and data from the failed Stack Exchange into a format that would eventually be readable by another Q&A system.
Following a short break after the first long set of sessions, the conference then re-split into the second set of breakout sessions for the day, where I attended the session titled Preserving News and Journalism. Aurelia Moser (@auremoser) administrated this panel-style presentation and initially showed a URI where the presentation's resource could be found (I typed bit.ly/1klZ4f2 but that seems to be incorrect).
The panel, consisting of Anne Wootton (@annewootton), Leslie Johnston (@lljohnston), and Edward McCain Reynolds (@e_mccain), initially asked, "What is born digital and what are news apps?". The group had put forth a survey toward 476 news organizations, consisting of 406 hybrid organizations (those that put content in print and online), and 70 "online only" publications.
From the results, the surveyors asked what the issue was with responses, as they had kept the answers open ended so that the journalists could give an accurate account of their practices. "Newspapers that are in chains are more likely to have written policies for preservation.
The smaller organizations are where we're seeing data loss." At one point, Anne Wootton's group organized a "Design-a-Thon" where they gathered journalists, archivists, and developers. Regarding the surveyors' practice, the group stated that Content Management System (CMS) vendors for news outlets are the holders of the keys to the kingdom for newspapers in regard to preservation.

After the third breakout session of the conference, lunch was served (Mexican!) with Trevor Owens of Library of Congress, Amanda Brennan (@continuants) of Tumblr, and Trevor Blank (@trevorjblank) of The State University of New York at Potsdam giving a preview of CURATECamp, to occur the day after the completion of the conference. While lunch proceeded, a series of lightning talks was presented.
The first lightning talk was by Kate Holterhoff (@KateHolterhoff) of Carnegie Mellon University and titled Visual Haggard and Digitizing Illustration. In her talk, she introduced Visual Haggard, a catalog of many images from public domain books and documents that attempts to have better quality representations of the images in these documents compared to other online systems like Google Books. "Digital Archivists should contextualize and mediate access to digital illustrations", she said.

Michele Kimpton of DuraSpace followed Kate with DuraSpace and Chronopolis Partner to Build a Long-term Access and Preservation Platform. In her presentation she introduced a few tools like Chronopolis (used for dark archiving), DuraCloud and a few other tools and her group's approach toward getting various open source tools to work together to provide a more comprehensive solution for preserving digital content.

Following Michele, Ted Westervelt of Library of Congress presented Library of Congress Recommended Formats, where he referenced the Best Edition Statement, a largely obsolete but useful document that needed supplementation to account for modern best practice and newer mediums. His group has developed the "Recommended Format Specification", which provides this without superseding the original document and is a work-in-progress. The goal of the document is to set parameters for its target objects so that most content that is currently unhandled by the in-place specification will have directives to ensure that digital preservation of the objects is guided and easy.

After Ted, Jane Zhang of Catholic University of America presented Electronic Records and Digital Archivists: A Landscape Review, where she did a cursory study of employment positions for digital archivists, both formally trained and trained on-the-job. She attempted to answer the question "Are libraries hiring digital archivists?" and tried to see a pattern from one hundred job descriptions.

After the lightning talks, another panel was held, titled "Research Data and Curation". Inna Kouper (@inkouper) and Dharma Akmon (@DharmaAkmon) of Indiana University and University of Michigan, Ann Arbor, respectively, discussed Sustainable Environment Actionable Data (SEAD, @SEADdatanet), and a Research Object Framework for the data that will be very familiar for practitioners working with video producers. "Research projects are bundles", they said, "The ROF captures multiple aspects of working with data including unique ID, agents, states, relationships, and content and how they cyclically relate. Research objects change states."
They continued, "Curation activities are happening from the beginning to the end of an object's lifecycle. An object goes through three main states", they listed, "Live objects are in a state of flux, handled by members of project teams, and their transition is initiated by the intent to publish. Curation objects consist of content packaged using the BagIt protocol with metadata, and relationships via OAI/ORE maps, which are mutable but allow selective changes to metadata. Finally, publication objects are immutable and citable via a DOI and have revisions and derivations tracked."
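To make the "content packaged using the BagIt protocol" step more concrete, here is a minimal sketch of the on-disk layout a bag uses: a `data/` payload directory, a `bagit.txt` declaration, and a checksum manifest. It follows the layout in the BagIt specification as later standardized (RFC 8493, version 1.0), which is an assumption about versions since SEAD's own packaging code is not shown in the talk. The function name `make_bag` is hypothetical, and the OAI-ORE relationship maps are not covered.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: str, bag_dir: str) -> Path:
    """Copy source_dir into a new bag at bag_dir and write the required tag files."""
    bag = Path(bag_dir)
    payload = bag / "data"
    # copytree creates the bag directory and the data/ payload directory.
    shutil.copytree(source_dir, payload)

    # Bag declaration: identifies the directory as a bag.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(payload.rglob("*")):
        if path.is_file():
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

Descriptive metadata would normally go in additional tag files such as `bag-info.txt`, which is where a curation object's selectively editable metadata could live without touching the payload.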
Ixchel Faniel from OCLC Research then presented, stating that there are three perspectives for archeological practice. "The data has a lifecycle from the data collection, data sharing, and data reuse perspective and cycle." Her investigation consisted of detecting how actions in one part of the lifecycle facilitated work in other parts of the lifecycle. She collected data over one and one-half years (2012-2014) from 9 data producers, 2 repository staff, and 7 data re-users and concluded that actions in one part of the cycle have influence on things that occur in other stages. "Repository actions are overwhelmingly positive but cannot always reverse or explain documentation procedures."

Following Ixchel and the panel, George Oates (@goodformand), Director of Good, Form & Spectacle, presented Contending with the Network. "Design can increase access to the material", she said, describing her experience with Flickr and Internet Archive. Relating to her work with Flickr, she referenced The Commons, a program that is attempting to catalog the world's public photo archive. In her work with IA, she was most proud of the interface she designed for Understanding 9/11. She then worked for a group named MapStack and created a project called Surging Seas, an interactive tool for visualizing sea level rise. She recently started a new business, "Good, Form & Spectacle", and proceeded on a formal mission of archiving all documents related to the mission through metadata. "It's always useful to see someone use what you've designed. They'll do stuff you anticipate and that you think is not so clear."

Following a short break, the last session of the day started with the Web Archiving Panel. Stephen Abrams of California Digital Library (CDL) started the presentation asking "Why web archiving? Before, the web was a giant document retrieval system. This is no longer the case. Now, the web browser is a virtual machine where the language of choice is JavaScript and not HTML." He stated that the web is a primary research data object and that we need to provide programmatic and business ways to support web archiving.

After Stephen, Martin Klein (@mart1nkle1n) of Los Alamos National Laboratory (LANL) and formerly of our research group gave an update on the state of the work done with Memento. "There's extensive Memento infrastructure in-place now!", he said. New web services soon to be released by LANL include a Memento redirect service (for example, going to http://example.com/memento/20040723150000/http://matkelly.com will automatically be resolved in the archives to the closest available archived copy); a Memento list/search service to allow Memento lookup using a user interface with specifying dates, times, and a URI; and finally, a Memento TimeMap service.
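The redirect URI pattern above can be sketched in a few lines. The base `http://example.com/memento/` is the placeholder from the example, not a real endpoint, and the function names are hypothetical. The "closest available archived copy" logic is an assumed illustration of what such a service does, not LANL's implementation. The `Accept-Datetime` header format is the one the Memento protocol (RFC 7089) uses for datetime negotiation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def redirect_uri(service_base: str, when: datetime, target_uri: str) -> str:
    """Build a URI of the form <base>/<YYYYMMDDhhmmss>/<original URI>."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{service_base.rstrip('/')}/{timestamp}/{target_uri}"


def accept_datetime_header(when: datetime) -> str:
    """Format a datetime the way the Accept-Datetime request header expects (GMT)."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def closest_capture(requested: datetime, captures: list) -> datetime:
    """Pick the capture datetime nearest to the requested one (assumed behavior)."""
    return min(captures, key=lambda capture: abs(capture - requested))


if __name__ == "__main__":
    wanted = datetime(2004, 7, 23, 15, 0, 0, tzinfo=timezone.utc)
    print(redirect_uri("http://example.com/memento/", wanted, "http://matkelly.com"))
    print(accept_datetime_header(wanted))
```

The 14-digit timestamp keeps the requested datetime in the URI itself, so the link can be shared and resolved later without any special request headers.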

After Martin, Jimmy Lin (@lintool) of University of Maryland and formerly of Twitter presented on how to leverage his big data expertise for use in digital preservation. "The problem", he said, "is that web archives are an important part of heritage but are vastly underused. Users can't do that much with web archives currently." His goal is to build tools to support exploration and discovery in web archives. A tool his group built, Warcbase, uses Hadoop and HBase for topic modeling.

After Jimmy, ODU WS-DL's very own Michael Nelson (@phonedude_mln) presented, starting off with "The problem is that right now, we're still in the phase of 'Hooray! It's in the web archive!' whenever something shows up. What we should be asking is, 'How well did we archive it?'" Referencing the recent publicity of Internet Archive capturing evidence toward the plane being shot down in Ukraine, Michael said, "We were just happy that we had it archived. When you click on one of the videos, however, it just sits there and hangs. We have the page archived but maybe not all the stuff archived that we like." He then went on to describe the ways that his group is assessing web archives: determining the importance of what's missing, detecting temporal violations, and benchmarking how well the tools handle the content they're made to capture.

After Dr. Nelson presented, the audience had an extensive number of questions.

After the final panel, Dragan Espenschied (@despens) of Rhizome presented Big Data, Little Narration (see his interview, transcript, & links page). In his unconventional presentation, he reiterated that some artifacts don't make sense in the archives. "Each data point needs additional data about it somewhere else to give it meaning.", he said, giving a demonstration of an authentic replay of Geocities sites in Netscape 2.0 via a browser-accessible emulator. "Every instance of digital culture is too large for an institution because it's digital and we cannot completely represent it."
As the conference adjourned, I was glad I was able to experience it, see the progress other attendees have made in the last three (or more) years, and present the status of my research.
— Mat Kelly (@machawk1)

Friday, July 26, 2013

2013-07-26: Digital Preservation 2013 Trip Report

The time of year has again arrived for conferences related to our research area of web sciences and digital libraries. While much of our group will be representing the university at the Joint Conference on Digital Libraries (JCDL) in Indianapolis (trip report), I was given the opportunity to attend Digital Preservation 2013 in Alexandria, Virginia.

With the conference being much closer to my home in Hampton Roads, this is the third year running that I have attended (2012 Trip Report, 2011 Trip Report), having presented digital preservation tools at each: Archive Facebook in 2011 and WARCreate in 2012. Following up on the recent public release of WARCreate (see the announcement), I gave a presentation titled WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy. It covered WARCreate; another package I had created, Web Archiving Integration Layer (WAIL), originally unveiled at Personal Digital Archiving 2013 in February (Trip Report); and how all of the pieces fit together.

Long before it was my turn to present, however, the lineup included a fantastic cast of other presenters. To start off the conference, Bill LeFurgy (@blefurgy) gave the welcoming remarks.

Day One

Bill started by noting that this was the 9th year of the annual NDIIPP meeting and that a lot had changed in that time. He reminisced about when the conference first started in 2004 and about how much progress has been made in preservation efforts since. "One of the principal goals was to build a community around the process of digital stewardship.", he said. Bill then introduced the first speaker of the conference, Hilary Mason.

Hilary Mason (@hmason) is the chief scientist at bit.ly. Hilary started her presentation titled "Humans and Data" by noting that she was there to learn, being an engineer. She offered her expertise on "How engineers and startup people think about preservation when they think about it at all.", she described, "...Which is not that often. That's the punchline."

Commenting on social behavior, she referenced a reddit thread that posed the question "If someone from the 1950s suddenly appeared today, what would be the most difficult thing to explain to them about life today?" to which the top answer was, "I possess a device, in my pocket, that is capable of accessing the entirety of information known to man. I use it to look at pictures of cats and get in arguments with strangers."

"That's the Internet!", she exclaimed, "While technology and our technical capability has changed very rapidly, human nature has not changed at all."

She went on to speak of the origins of bit.ly and how it was a complete accident. bit.ly was part of a feature of another product spun out of a company called Betaworks. "They [Betaworks] had this brilliant idea that when you're reading a news article on the Internet, other people are reading that article at the same time and yet our experience of that is very lonely. It is not in any way social." she continued, "So they thought, what if we added a social layer to news consumption? They built a system where you could see the mouse cursors of everybody else on the news article with you. So you can guess what happened then." She described the behavior of the users, who did the exact opposite of what was intended by swearing at each other and chasing each other around the screen.

"It was horrible!", she said, "It had the opposite social effect that the product was intended to have. But two different things that were useful came out of it. One of them was bit.ly, which was just a little way to share content in that tool."

Along with most of the presentations at Digital Preservation 2013, I captured this one on digital video and made it viewable here.

Part One
Part Two

Following Hilary, Sarah Werner (@wykenhimself) of the Folger Shakespeare Library presented "Disembodying the Past to Preserve It". She spoke of collections of indulgences and how, because the physical items were not considered valuable, the ones that did survive were reused as waste paper and thus found in bindings of other saved items. "Being treated as disposable is how they survived." she said.

She continued by describing work on The Great Parchment Book, a collection of 165 leaves describing a survey compiled in 1639 of all of the estates managed by the city of London, which were badly damaged by a fire in 1786.

"Through careful preservation about 50% of the text was recovered but the brittle wrinkled parchment remained an intractable obstacle for further work.", she said, "After extensive physical preservation work, the UCL [University College London] team was able to virtually un-wrinkle the pages."

She continued, "About 90% of the text of The Great Parchment Book is now readable and available for examination online as images of the leaves, enhanced images, or transcription of the text. In both of these cases, digitization makes available objects for study that would otherwise be restricted either because they're too fragile to handle or they're too dispersed to work with."

After a short break, Micah Altman (@drmaltman), the Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, formally announced the 2014 National Digital Stewardship Agenda and gave a brief on the document.

Describing why such a document is drawn up, he said, "Effective digital stewardship is vital for maintaining authenticity of public records...and information on how to do it, what to do, and what's going on is distributed across practice, research, sectors, disciplines, communities of practice. There's a diversity of perspectives in organizations that are involved. So that sort of sounds like us." More on the reasoning for the document can be seen in the full video.

Following Micah, Leslie Johnston (@lljohnston) of Library of Congress introduced the next panel titled "Creative Approaches to Content Preservation", which "is only a panel in name.", she stated, commenting on the greater similarity of the format to a series of presentations with subsequent questions to the group rather than the traditional format.

Anne Wootton (@annewootton), one-half of Pop-Up Archive (the other being Bailey Smith (@baileyspace)), started the panel by first describing her organization and then "starting with the tale of an archive", referencing the Kitchen Sisters and her organization's work with them when approached with an "archival crisis". The Sisters had been working in public radio for decades, recording thousands of hours of sound, and had these recordings stored on a variety of mediums in a variety of places.

For their Master's thesis at Berkeley, Anne and Bailey surveyed the digital archiving and public media ecosystems to see if they could identify a solution that would meet the Kitchen Sisters' needs while keeping in mind the restrictions of resources, workflow, and lack of technical proficiency. "We saw the need for an inexpensive tool", she said, "that could be used [by] oral history, archives, and media creators alike to store and/or create access to their materials safely and make it discoverable in a way that would be standardized with their industries." Their initial efforts were in creating plugins for Omeka.

Travis May of the Federal Reserve Bank of Saint Louis followed Anne, describing his work on FRED, an economic database with over 83,000 economic series from 57 different sources with a majority of the data coming from the United States.

Cal Lee of University of North Carolina at Chapel Hill was up next with "Taking Bitstream Seriously". "The category of dealing with everything we get is pretty large...relates to trying to be a little bit more systematic to try to deal with messy situations where we get this kind of media.", he said after referencing existing systematic tools for transfer like BagIt. Cal went on to describe his project, BitCurator (funded by the Andrew W. Mellon Foundation), which is soon to head into phase II. "The main goals are to develop and disseminate a package and support open source tools that can help people apply digital forensics methods.", he said. "There are two main things that aren't traditionally addressed by the digital forensics field itself: building these things into library/archival workflows and supporting provisioning of public access to data."

The famed Jason Scott (@textfiles) of Archive Team took the stage next in patriotic attire (and his token black hat) and began, "I am the harbinger of death. I am the angel of death. I am the sad grim reaper that sits at the crossroads of your lost and dying dreams. I am the boatman on the River Styx who takes your hard drive from you and rides with you across the river to your utter destiny." He continued, humorously, "When the handshakes no longer happen and when the smiles fade - that's where I am living. I am living in this world because I help found something called Archive Team, archiveteam.org."

He went on to describe a few of Archive Team's recent projects, including saving all of Xanga (project page), for which he described the progress and indicated that the preservation won't end well, and Snapjoy (now down, project page), for which he showed more hope due to the small number of users.

Jason emphasized that there are many online communities that are "real shifting sand" that have no guarantees or laws preventing them from going away. "The fundamental question", he said, "is 'Is an online presence a valid humanitarian concern?'".

"Unfortunately, we are now the victims of the 'brogrammer/journalist' complex, which has worked together to really convince us that the place to put all of our stuff is with people we don't know for reasons we don't know until they decide that they're done with us...or have they've sold it to Google.", he said. "We have three virtues within Archive Team: Rage, Paranoia, and Kleptomania. So basically, we're very angry about these things going away, we have an enormous paranoia about things that might go away at any given time, and we take everything as fast as we can."

Jason went on with further allusions and anecdotes about Archive Team and their projects but the video (below) does his presentation better justice.

With the panel complete, a series of Lightning Talks followed.

Lightning Talks

William Ying of ARTstor presented "ARTstor Shared Shelf Preservation Plan Based on the NDSA Levels of Digital Preservation".


Abbie Grotke (@grotke) of Library of Congress presented "Content Working Group Case Studies"


Kim Schroeder of Wayne State University presented "Realities of Digital Preservation — What Are the Concerns and the Practice?"


David Brunton of Library of Congress presented "The Importance of Being Developers"


Cathy N. Hartman of University of North Texas presented "International Internet Preservation Consortium: Update"


Patrick Loughney of Library of Congress presented "The Library of Congress National Recording Preservation Plan"


Christina Drummond of University of North Texas presented "Your VO 'Lab' Results Are in: What NDSA Members Think of the NDSA"


Yvonne Ng (@ng_yvonne) of WITNESS presented "The Activists' Guide to Archiving Video"


With Yvonne closing up the lightning talks, Barrie Howard of NDIIPP excused the audience from the first day and encouraged everyone to view the poster session just outside of the main presentation room.



Day 2

Lisa Green (@boudicca), Director of @commoncrawl, started off Day 2 (with an introduction by Bill LeFurgy) with her presentation "Digital Preservation for Machine-Scale Access and Analysis". Citing Hilary's work from Day 1, she said, "By machine scale I believe we have to be doing digital preservation in such a way that enables us to do data science on information we're preserving."

Lisa continued by giving a history of how we have moved from the concept of archiving hard bound information to machine readable information. "By the end of the 20th century, we had significantly increased our storage capacity. At this point, we were able to store and move around megabytes of data very easily. This was about the time that some of the really forward thinking people at Library of Congress started thinking about digital preservation. We can store so much information now that we needed a new unit to even wrap our heads around the amount of storage we have: a 'Library of Congress' worth of information."

She continued, describing how rare books are set up for display without being accessible (e.g., behind glass) and juxtaposed them with Google Books and Google NGram Viewer in how the latter does not necessarily give direct access to information in the former. "We're not building a time capsule here. We're not putting things away so that they're safe for future generations and maybe we take a peek at them now and then." She then cited the Library of Congress' mission statement:
The Library's mission is to make its resources available and useful to the Congress and the American people and to sustain and preserve a universal collection of knowledge and creativity for future generations.

"To me, the first part is the most important part - To make its resources available and useful. What good is collecting all of the information if we're not pushing forward the boundaries of human knowledge? I would propose that some efforts in digital preservation are focused a little too much on the second part, to preserve and sustain, and not enough on the available and useful.", she said.

Emily Gore (@ncschistory) of Digital Public Library of America followed Lisa. She spoke of the various partners that have contributed data to the organization and said that "we free our data. Our data is your data. Our partners' data is your data. You can download the complete repository of data you've given us. Do with it what you will.", stating that, by default, the partners' data is under a CC0 license.

As with the first day of the conference, a panel followed titled "Green Bytes: Sustainable Approaches to Digital Stewardship" with an introduction by Erin Engle (@erinengle).

Green Bytes Panel

David Rosenthal of Stanford University

Kris Carpenter of Internet Archive

Krishna Kant of George Mason University and the National Science Foundation

With the completion of the panel, the crowd was given a half hour to preview the workshops to follow. The five workshops/sessions occurred simultaneously with five different topics:

Workshops/Sessions

While the other sessions were very relevant to our interests at WS-DL, I presented at the Web Archiving session and so cannot give an account of them.

The Tools of the Trade: The Library of Congress Perspective session contained presentations titled "World Digital Library" by Sandy Bostian of Library of Congress; "Jukebox" by Sam Brylawski of University of California, Santa Barbara; and "Congress.gov" by Andrew Weber (@atweber) of the Law Library of Congress.

The Digital Curation Education and Curriculum session started with the first presentation titled "National Digital Stewardship Residency Program" by Kris Nelson of Library of Congress, Bob Horton of IMLS, Andrea Goethals (@andreagoethals) of Harvard University, Jefferson Bailey (@jefferson_bail) of Metropolitan New York Library Council, and Prue Adler of Association of Research Libraries. The second presentation of the session was titled "Closing the Digital Curation Gaps: Getting Started Guide" by Helen Tibbo of UNC at Chapel Hill.

The Digital Preservation Tools session contained presentations titled "WGBH Media Library and Archives" by Karen Cariani of WGBH and "DSpace and Fedora Commons: A Comparison of Projects" by Wayne State University students.

The Managing Software Projects session was more panel-like with David Brunton of Library of Congress, Lisa LaPlant of GPO, and Daniel Chudnov of George Washington University Libraries, and was moderated by Kate Zwaard (@kzwa) of Library of Congress.

Post-Panel Q&A

After the simultaneous sessions, the crowd was excused for lunch, where the NDSA Innovation Awards were presented by Jefferson Bailey (@jefferson_bail).

Following lunch was a short break then another set of simultaneous workshops and sessions.

The Web Archiving session contained presentations titled "WARCreate and WAIL" by Mat Kelly (@machawk1) of Old Dominion University and "DuraCloud and Archive-It Integration: Preserving Web Collections" by Carissa Smith of DuraCloud.

The Digital Preservation Services session contained presentations titled "Digital Preservation Network" by David Minor of UCSD and "Integrating Repositories for Research Data Sharing" by Stephen Abrams of UCC.

The Graduate Curriculum in Digital Preservation session was panel-like involving Jane Zhang of Catholic University, Anthony Cocciolo of Pratt Institute, Kara Van Malssen of AudioVisual Preservation Solutions and Jefferson Bailey of Metropolitan New York Library Council.

The Digital Stewardship Tools from the Library of Congress session contained presentations titled "EDeposit and DMS" by Anupama Rai and Laura Graham of Library of Congress, "NDNP/ChronAm" by David Brunton of Library of Congress and "Viewshare" by Camille Salas of Library of Congress.

The Project Pitching session required prior sign-up and involved three funding agencies: Institute of Museum and Library Services, National Historical Publications and Records Commission, and National Endowment for the Humanities.

The final session of the day was another panel titled "Innovative Approaches to Digital Stewardship".

Innovative Approaches to Digital Stewardship

Amy Robinson of EyeWire

Rodrigo Davies of MIT Center for Civic Media

Aaron Straup Cope of Cooper-Hewitt Museum Labs

After Aaron's presentation, the three presenters fielded questions.

With the completion of the panel, the conference wrapped up and the crowd was dismissed.

— Mat