
International Internet Preservation Consortium
2012 General Assembly
April 30 - May 4, 2012


Tweeted using #IIPC12

Monday April 30, 2012 - "The Broad Value of Web Archives: Demonstrated Use" - Open to the public.
Time   Notes
8:30 am Breakfast Mumford
9:00 am Welcome & Framing remarks  
9:20 am Researcher Use Cases  
  A decade and a half of archiving the web for data mining: Lessons learned and how users use web archives Kalev Leetaru, University of Illinois
  How web archives are used in the Text REtrieval Conference (TREC) Ian Soboroff, National Institute of Standards and Technology (NIST)
10:25 am Break Mumford
11:00 am Using web materials in researching contemporary terrorism Bruce Hoffman, Georgetown University
  The Challenges of Researching the Social Web Stuart W. Shulman, University of Massachusetts
11:40 am Discussion  
12:00 pm Lunch Montpelier
1:00 pm Trends in Archive Use  
  Data Mining in News Data from Multiple Media Claude Mussou & David Rapin, Institut national de l'audiovisuel (InA)
  Trends in Pandora Monica Omodei, National Library of Australia (NLA)
  Actual and potential users of the BnF web archives: experiences and expectations Clément Oury and Peter Stirling, Bibliothèque nationale de France (BnF)
2:10 pm Discussion  
2:30 pm Break Mumford
2:45 pm Business Use  
  Web archives for investigation, e-discovery and compliance for the legal industry Rod Wittenberg, Reed Technology and Information Services
  Web archives to meet regulatory, management, e-discovery and cultural heritage needs Mark Williamson, Hanzo Archives
3:25 pm Discussion  
3:40 pm Break Mumford
3:55 pm Use in the Public Sphere  
  Harvesting from the harvest: Automatic extraction of state government publications from web archives Kathleen Kenney, State Library of North Carolina
  How can Web Archives become a critical component of today's Internet? Leïla Medjkoune, Internet Memory
  Web Archiving as part of a Research Library Special Collection: the Latin American Government Documents project Kent Norsworthy, University of Texas at Austin
5:00 pm Adjourn  


Tuesday May 1, 2012 - General Assembly - IIPC members only
Start time   Notes
8:30 am Breakfast Mumford
9:00 am Chair speech Martha Anderson, Library of Congress
  Program Officer update Aaron Binns, Internet Archive
  Treasurer update Clément Oury, Bibliothèque nationale de France (BnF)
10:00 am Break Mumford
  Communications & Membership update Abbey Potter, Library of Congress
  Website redesign presentation
  New member presentations Mumford
  Web Archiving at Columbia University: Collecting Web Content for Research Robert Wolven, Columbia University Libraries
12:00 pm Lunch Montpelier
  Web Archives at George Washington University Daniel Chudnov, George Washington University
  Estonian Web Archive: Preserving the Estonian Mind Jaanus Kõuts, National Library of Estonia
  Los Alamos National Laboratory Herbert Van de Sompel, Los Alamos National Laboratory
  Project Updates  
  IIPC Memento Aggregator Robert Sanderson, Los Alamos National Laboratory
  How to fit in? Integrate a web archiving program in your organization Clément Oury, Bibliothèque nationale de France (BnF)
  JhoNAS, WARC support in JHove2 and NetarchiveSuite Nicholas Clarke, Netarchive.dk
  Twittervane Helen Hockx-Yu, British Library
2:30 pm Break Mumford
  Member Updates & Announcements
  Library of Congress Web Archives Update Abbie Grotke & Nicholas Taylor, Library of Congress
  HIVE for LC Web Archives: Web Archives and Automatic Subject Indexing Rick Fitzgerald, Library of Congress; Craig Wills, UNC
  International Digital Exchange Assessment (IDEA) Megan Caverly, Library of Congress
  Leveraging Web archives Research Leïla Medjkoune, Internet Memory
  Web Archiving in 2012 at National Diet Library Masaki Shibata, National Diet Library
3:45 pm Break Mumford
  Challenges and Opportunities in the Absence of Legal Deposit: Web Harvesting for the US Government Printing Office and the US Federal Depository Library Program David Walls, Government Printing Office
  British Library Update Helen Hockx-Yu, British Library
  SCAPE Update Barbara Sierman, National Library of the Netherlands
  The Spanish Legal Deposit Law: knitting the web for digital resources Mar Pérez Morillo, National Library of Spain
  Havel Collection Update Czech National Library
5:00 pm Adjourn  
6:30 pm Reception Great Hall


Wednesday May 2, 2012 - Working Group meetings - IIPC members only
Working Group Presentations
Access Working Group  
Preservation Working Group  
Harvesting Working Group Bibliotheca Alexandrina
Heritrix User Group  
Steering Committee meeting  


Thursday May 3, 2012 - Workshops & Cross Working Group meetings - IIPC members and invited guests
Working Group Presentations
Web Lifecycle Management Web Archiving ‘Lifecycles’ Workshop
Netarchive Workshop Agenda
Legal Roundtable
Harvesting and Preserving the Future Web Future of the Web Workshop IIPC Future of the Web Workshop – Introduction & Overview (Draft)

Los Alamos National Laboratory Research Library


Friday May 4, 2012 - IIPC GA - Workshops - IIPC members and invited guests
Workshops Presentations
Crowdsourcing workshop The Crowd & the Library

UDFR Unified Digital Format Registry (UDFR) Understanding the System and Service
ISO workshop on metrics and quality
Workshop on quality indicators
New collections, new measures: metrics and quality indicators for web archives


Meeting and Workshop descriptions

This agenda is a draft and subject to changes and updates.

Access Working Group
9am - 10:30am: Updates from AWG member institutions, research initiatives, product/tool enhancements/demos, etc. (e.g. QA module for WCT, Access2Preserve, launches, recent policy decisions, legislation or other legal impacts, etc.)
10:30am BREAK
11am - 11:45am: Memento Project discussion
11:45am - 12:30pm: Olympics 2012 Curation/Planning/Crawl
LUNCH 12:30-1:30pm
1:30pm-2:30pm AWG 'Birds of a Feather' discussion sessions
Access Working Group members will meet as a single group or in small groups to discuss the key challenges and initiatives they need or plan to tackle in the next year, and to identify any areas where they would like IIPC help or involvement. The session will be self-organized, as Helen & Kris will be in the SC meeting.
2:30pm BREAK
AWG meetings conclude for the day following the break.

Crowdsourcing workshop
Led by Trevor Owens, Library of Congress

The web is a social platform, built by people and organizations for people and organizations. Web archives are, to all intents and purposes, no different. Yet the disparity between the number of people involved in developing the web and the number of people involved in archiving the web is enormous. This proposal seeks to investigate how crowdsourcing web archiving activities may begin to redress that balance and increase the amount of manpower available throughout all stages of the web archiving workflow in member institutions.

  • Session 1 on crowdsourcing for cultural heritage (summarizing discussion paper)
  • Session 2 presenting a generic model around which to frame discussions (either the generic workflow model developed by IIPC DPWG in 2009 or the digital content & crowdsourcing lifecycle model (Oomen & Aroyo, 2011))
  • Session 3 with breakout groups representing different types of web archive collections, developing, enhancing, adding to or ruling out ideas floated in discussion paper, followed by feedback
  • Session 4 group brainstorm on common issues or themes, barriers to implementation, critical success factors.

Participation will be open to all members, though capped at 24.

Harvesting and Preserving the Future Web
Facilitated by Kris Carpenter, Internet Archive

The Web's initial content model was the document. Great progress has been made in collecting and preserving this Web for future scholars. The Web is evolving to a content model that is a programming environment with services. This meeting's topic is the much more difficult problem of collecting and preserving this future Web. To frame ideas and thoughts, we are bringing together people working on various aspects of this problem, such as collecting AJAX and HTML5, synchronizing Web resources, and preserving Web services such as scientific workflows, together with institutions with an interest in preserving the future Web. The goal of this meeting is to begin to scope the challenges.

Harvesting Working Group
Status of the HWG (Kristinn Sigurðsson)
Follow up on items from last meeting (Kristinn Sigurðsson)
Browsers as crawlers (David Rapin)
DeDuplication (Youssef Eldakar)
WARCs (Kristinn Sigurðsson)
Other topics, general discussion

Heritrix User Group

  • Introductions
  • Heritrix 3 training (emphasis on advanced training)
  • Best Practice Birds of a Feather discussions (small groups)
  • Closing Thoughts/Recommendations for Future Meetings

ISO workshop on metrics and quality
Led by Dr. Clément Oury, Bibliothèque nationale de France (National Library of France)

The goal of the workshop is to present and discuss the main outcomes of the report: the statistics and quality indicators chosen to evaluate collection development, collection characterization, collection access and usage, collection preservation, and web archiving costs. The workshop will illustrate these with real examples from institutions: members of the WG will present how they gather and use the proposed indicators. This workshop is open to all kinds of institutions. Attendees are encouraged to present their own experiences and bring examples of commonly used indicators within their institution.

Legal Roundtable
Web archivists meet lawyers: how can legislation (or lack thereof) encourage or limit your web archiving program?

This roundtable will facilitate an open discussion between web archivists, expert practitioners and lawyers to compare the impact of international and national legislation and policies on web archiving activities. The discussion is open to all participants who have some background or interest in legal matters and want to raise practical or prospective questions about the impact of legislation on their day-to-day work and on advocacy efforts to promote web archiving. What are the current challenges and how can IIPC help?

Netarchive Workshop
NetarchiveSuite is a complete open source web archiving software package. It provides the ability to prepare, schedule, run and monitor harvests of websites. It also supports quality assurance and preservation of harvested content. See more information at http://netarchive.dk/suite/ and https://sbforge.org/jira/browse/NAS.

NetarchiveSuite is currently developed for production purposes and maintained by the NetarchiveSuite community, which includes the following institutions: Netarkivet.dk (State and University Library in Aarhus and The Royal Library in Copenhagen), Denmark, the National Library of France (BnF), and the National Library of Austria (ONB).

Preservation Working Group
Status updates of the current work packages and activities. Will also include a demo of JhoNAS (to be recorded, probably using WebEx).
- Discussion on strategies for preservation for web archiving
- What is the current status on this topic in each institution?
- Is there an existing example of a relevant obsolete file format?
- How can we deal with preservation issues on the basis of currently used tools and systems?
- Do we need a common pilot project?
New work packages and activities
- What are the planned activities in each institution in this area?
- In which areas would it be helpful to work together?

Steering Committee
Will be disseminated to members by separate correspondence

Web Lifecycle Management
This session will discuss the web archiving lifecycle in terms of tools and workflow, how it is evolving and what that means for our infrastructure and architectures. We will also discuss how spontaneous archiving of world events affects the "typical" lifecycle for web archiving.

UDFR
The meeting is open to all interested members of the preservation community; space is limited, however, and prior registration is required. The meeting will include technical presentations on the UDFR architecture; code walkthroughs of the major components of its open source technology stack, including OntoWiki, RDFAuthor, Virtuoso, Zend, PHP, Apache, and Noid; and a review of the four main ontological models: OntoWiki system configuration, UDFR user profiles, UDFR class and property ontology, and UDFR data.


Abigail Potter, IIPC Communications Officer
