
Monday, April 17, 2017

2017-04-17: Personal Digital Archiving 2017

On March 29-30, 2017 I attended the Personal Digital Archiving Conference 2017 (#pda2017) held at Stanford University in sunny Palo Alto, California. Other members of the Web Science and Digital Libraries Research Group (WS-DL) had previously attended this conference (see their 2013, 2012, and 2011 trip reports), and from their rave reviews of previous years' conferences, I was looking forward to it. I also just happened to be presenting and demoing the Web Archiving Integration Layer (WAIL) there as an added bonus.

Day 1

Day one started off at 9am with Gary Wolf giving the first keynote on Quantified Self Archives. Quantified Self Archives comprise data generated from health monitoring tools such as the FitBit, or lifelogging data, which is used to gain insights into your own life through data visualization.
After the keynote was the first session, Research Horizons, moderated by WS-DL alumna Yasmina Anwar.
The first talk of this session was Whose Life Is It, Anyway? Photos, Algorithms, and Memory (Nancy Van House, UC Berkeley). In the talk, Van House spoke on the effects of "faceless" algorithms on images and how they can distort the memory of the images they are applied to in many personal archives. Van House also spoke about how machine learning techniques, when applied in aggregate to images without context, can have unintended consequences, especially when attempting to detect emotion. To demonstrate this, Van House showed a set of images tagged with the emotion Joy, one of which was a picture of an avatar from the online life simulator Second Life.

The second talk was Digital Workflow and Archiving in the Humanities and Social Sciences (Smiljana Antonijevic Ubois, Penn State University). Ubois spoke on the many ways scholars use non-traditional archives, such as Dropbox or photos taken with their smartphones, to preserve their work. One of the biggest points Ubois raised was that humanities and social sciences scholars still see the web as a resource rather than as home to a digital archive.

The third talk was Mementos Mori: Saving the Legacy of Older Performers (Joan Jeffri, Research Center for Arts & Culture/The Actors Fund). In the talk, Jeffri spoke on the efforts being made by the Performing Arts Legacy Project to document and preserve the works of artists. The project found that one in five living artists in New York had no documentation of their work, especially the older artists.
The final talk in the session was Exploring Personal Financial Information Management Among Young Adults (Robert Douglas Ferguson, McGill School of Information Studies). Ferguson spoke on the passive preservation practiced by young adults when managing their money, i.e., relying on the web portals and tools provided by financial services, and the need to consider long-term preservation of these materials.
Session two was Preserving & Serving PDA at Memory Institutions moderated by Glynn Edwards.
This session started off with Second-Generation Digital Archives: What We Learned from the Salman Rushdie Project (Dorothy Waugh and Elizabeth Russey Roke, Emory University). In 2010, Emory University announced the launch of the Salman Rushdie Digital Archives. This reading room kiosk offered researchers at the Manuscript, Archives, and Rare Book Library the opportunity to explore born-digital material from one of four of Rushdie’s personal computers through dual access systems. One of the biggest lessons Waugh noted was the need to document everything the software engineers do, as their work is just as ephemeral as the born-digital information they wished to preserve.
After Waugh was Composing an Archive: the personal digital archives of contemporary composers in New Zealand (Jessica Moran, National Library of New Zealand). In recent years the Library has acquired the digital archives of a number of prominent contemporary composers. Moran discussed the personal digital archiving practices of the composers, the composition of the archive, and the work of the digital archivists, in collaboration with curators, arrangement and description librarians, and audio-visual conservators, to collect, describe, and preserve this collection.
The final talk in session two was Learning from users of personal digital archives at the British Library (Rachel Foss, The British Library). Foss discussed the efforts made by the British Library to provide access to those of its digital collections that require emulation to be viewed. Foss also discussed how archiving professionals need to consider how to assist and educate researchers to make use of born-digital collections, which implies understanding more about how they want to interrogate these collections as a resource.

Lunch happened. Session 3, Teaching PDA, was moderated by Charles Ransom.
Journalism Archive Management (JAM): Preparing journalism students to manage their personal digital assets and diffuse JAM best practices into the media industry (Dorothy Carner & Edward McCain, University of Missouri). In collaboration with MU Libraries and the school’s Donald W. Reynolds Journalism Institute, a personal digital archive learning model was developed and deployed in order to prepare journalism-school students, faculty and staff for their ongoing information storage and access needs. The MU J-School has created a set of PDA best practices for journalists and branded it: Journalism Archive Management (JAM).
An archivist in the lab with a codebook: Using archival theory and “classic” detective skills to encourage reuse of personal data (Carly Dearborn, Purdue University Libraries). Dearborn designed a workshop, inspired by the Society of Georgia Archivists' personal digital archiving activities, to introduce attendees to archival concepts and techniques which can be applied to familiarize researchers with new data structures.
Session 4: Emergent Technologies & PDA 1 moderated by Nicholas Taylor
Cogifo Ergo Sum: GifCities & Personal Archives on the Web (Maria Praetzellis & Jefferson Bailey, Internet Archive). In the talk Praetzellis and Bailey spoke on GifCities, the GeoCities Animated GIF Search Engine created for the Internet Archive's 20th anniversary, which comprises over 4.6 million animated GIFs from the GeoCities web archive behind a search interface. Each GIF links back to the archived GeoCities web page on which it was originally embedded. The search engine offers a novel, flabbergasting window into what is likely one of the largest aggregations of publicly-accessible archival personal documentary collections. It also provokes a reassessment of how we conceptualize personal archives as being both from the web (as historical encapsulations) and of the web (as networked recontextualization).
Comparison of Aggregate Tools for Archiving Social Media (Melody Condron). In the talk Condron spoke about many tools that can make archiving social media easier: Frostbox, If This Then That, and digi.me. Of all the tools mentioned, If This Then That provided the easiest way for its users to push social media into archives such as the Internet Archive or Webrecorder.

Video games collectors and archivists: how might private archives influence archival practices (Adam Lefloic Lebel, University of Montreal)

Demonstrations:
There were two different demonstration sessions; the first was between sessions 4 and 5 and the second was at the end of the day after session 6.
The demo for the Web Archiving Integration Layer (WAIL) consisted of two videos and myself talking to those who stopped by about particular use cases of WAIL or answering any questions they had about it. The first video, viewable below, is a detailed feature walkthrough of WAIL, and the second showed WAIL in action.
Session 5: Emergent Technologies & PDA 2 moderated by Henry Lowood

CiteTool: Leveraging Software Collections for Historical Research (Eric Kaltman, UC Santa Cruz) Kaltman spoke about how the tool is currently being used in a historical exploration of the computer game DOOM as a way to compare conditions across versions and to save key locations for future historical work. Since the tool provides links to saved locations, it is also possible to share states amongst researchers in collaborative environments. The links also function as an executable citation in cases where an argument about a program’s functionality is under discussion and would benefit from first-hand execution.


Applying technology of Scientific Open Data to Personal Closed Data (Jean-Yves Le Meur, CERN) Le Meur explained how the methodology and technologies developed (partly at CERN) to preserve scientific data (like High Energy Physics data) could be re-used for personal restricted data. He first reviewed existing initiatives to collect and preserve personal data from individuals for the very long term, as well as a few examples of well-established collective memory portals. He then compared the solutions implemented for Open Data in HEP, looking at the guiding principles and underlying technologies. Finally, he drafted a proposal to foster a solid shared platform for closed Personal Data Archives on the model of Open Scientific Data Archives.


Personal Data and the Personal Archive (Chelsea Gunn, University of Pittsburgh) Gunn questioned whether data from quantified self and lifelogging applications forms part of our personal archives, or whether it constitutes a form of ephemera, useful for the purposes of tracking progress toward a goal but not of long-term interest.

Using Markdown for PDA Interoperability (Jay Datema, Stony Brook University). Datema noted that the only thing you can count on with born-digital projects is that you will have to migrate the content at some point. Having done digital library development for over a decade, he made the case for simple text, a problem with a proven solution. Markdown is an intermediate step between text and HTML, and if you're writing anything that requires an HTML link, its shortcuts are worth learning. Most web applications rely on the humble submit button: once text goes in, it becomes part of a database backend, and extracting it may require a set of database calls, parsing a SQL file, or hoping that someone wrote a module to let you download what you entered.
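As an aside, the appeal of Markdown here is that the same plain-text file stays readable on its own yet converts cleanly to HTML. A minimal sketch using the third-party Python markdown package (the note text and URL below are made up for illustration, not material from the talk):

```python
# Hedged illustration: rendering a Markdown note to HTML with the
# "markdown" package (pip install markdown). The note text and URL are
# hypothetical examples.
import markdown

note = """# Field notes, 2017-03-29

See the [PDA 2017 program](https://example.org/pda2017) for the schedule.
"""

html = markdown.markdown(note)
print(html)
# <h1>Field notes, 2017-03-29</h1>
# <p>See the <a href="https://example.org/pda2017">PDA 2017 program</a> for the schedule.</p>
```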

Session 6: PDA & The Arts moderated by Kate Tasker

From Virtual to Reality: Dissecting Jennifer Steinkamp’s Software-Based Installation (Shu-Wen Lin, New York University) Lin spoke about how time-based and digital art combines media and technology in ways that challenge traditional conservation practices and require dedicated care, drawing on her work with Steinkamp's animated installation Botanic, which was exhibited in Times Square Arts: Midnight Moment. Lin's talk focused on the internal structure of and relationships between the software used (Maya and After Effects), scripts, and final deliverables. Lin also spoke about providing a risk assessment that will enable museum professionals, as well as the artist herself, to identify the sustainability and compatibility of digital elements in order to build documentation that can collect and preserve the whole spectrum of digital objects related to the piece.

The PDAs of Others: Completeness, Confidentiality, and Creepiness in the Archives of Living Subjects (Glen Worthey, Stanford University) The title and inspiration for Worthey's presentation came from the 2006 German film Das Leben der Anderen, which dramatized the covert monitoring of East Germans. Although the biography was "authorized", Worthey spoke on how the process of gathering and documenting materials often reveals tensions between completeness and a respect for privacy; between on-the-record and off-the-record conversations; between the personal and the professional; and between the probing of important questions and voyeuristic-seeming observation of the subject's complex inner life.

RuschaView 2.0 (Stace Maples, Stanford University) In 1964, LA painter Ed Ruscha put a Nikon camera in the back of his truck, drove up and down the Sunset Strip, and shot what would become a continuous panorama, "Every Building on the Sunset Strip" (1966). Maples's talk highlighted both Ruscha's multi-decade project and Maples's own multi-month attempt to create the metadata required to reproduce something like Ruscha's "Every Building..." publication in a digital context.

(Pete Schreiner, NCSU) Between 2003 and 2013 an associated group of independent rock bands from Bloomington, Indiana shared a tour van. When the owner, a librarian, was preparing to move across the country in 2014, Pete Schreiner, band member and proto-librarian, decided to preserve this esoteric collection of local music-related history. Subsequently, as time allowed, he created an online collection of the photographs using Omeka. The case study presented a guerrilla archiving project, the issues encountered throughout the process, and attempts to find the balance between professional archiving principles and getting it done.

Day 2

At the request of one or more presenters who did not want their slide material recorded or shown to anyone beyond the attendees, no photos were taken.
Session 7: Documenting Cultures & Communities moderated by Michael Olson
(Anna Trammell, University of Illinois) Trammell's talk discussed the experience gained from forming relationships and building trust with the student organizations at the University of Illinois, capturing and processing their digital content, and utilizing these records in instruction and outreach.

Online grieving and intimate archives: a cyberethnographic approach (Jennifer Douglas, University of British Columbia) Douglas presented a short paper discussing the archiving practices of communities of parents grieving stillborn children. In the paper, Douglas demonstrated how these communities function as aspirational archives, not only preserving the past, but creating a space in the world for their deceased children. Regarding the ethics of online research and archiving, Douglas' paper introduced the methodology of cyberethnography and explored its potential connections to the work of digital archivists.

(Barbara Jenkins, University of Oregon) In the talk Jenkins spoke on the development of an Afghanistan personal archives project which was created in 2012 and was able to expand its scope through a short sabbatical supported by the University of Oregon in 2016. The Afghanistan collection Jenkins was able to build combines over 4,000 slides, prints, negatives, letters, maps, oral histories, and primary documents.
Session 8: Narratives, Biases, PDA & Social Justice moderated by Kim Christen

Andrea Pritchett, co-founder of Berkeley Copwatch, Robin Margolis, UCLA MLIS in Media Archives, and Ina Kelleher presented a proposed design for a digital archive aggregating different sources of documentation toward the goal of tracking individual officers. Copwatch chapters operate from a framework of citizen documentation of the police as a practice of community-driven accountability and de-escalation.

Stacy Wood, PhD candidate in Information Studies at UCLA, discussed the ways in which personal records and citizen documentation are embedded within techno-socio-political infrastructural arrangements and how society can reframe these technologies as mechanisms and narratives of resistance.
Session 9: PDA and Memory moderated by Wendy Hagenmaier

Interconnectedness: personal memory-making on YouTube (Leisa Gibbons, Kent State University) Gibbons spoke about the use of YouTube as a personal memory-making space and research questions concerning what conceptual, practical and ethical role institutions of memory have in online participatory spaces and how personal use of online technologies can be preserved as evidence.

(Sudheendra Hangal & Abhilasha Kumar, Ashoka University) This talk was about Cognitive Experiments with Life-Logs (CELL) and how it is a scalable new approach to measure recall of personally familiar names using computerized text-based analysis of email archives. Regression analyses revealed that accuracy in familiar name recall declined with the age of the email, but increased with greater frequency of interaction with the person. Based on those findings, Hangal and Kumar believe that CELL can be applied as an ecologically valid web-based measure to study name retrieval using existing digital life-logs among large populations.

(Frances Corry, University of Southern California) Corry spoke about the screenshot feature built into most smartphones, tablets, and computers today, which enables users to “photograph” what rests on the surface of their screens. These “photographs”, or rather screenshots, were presented as a valuable tool worthy of further attention in digital archival contexts.
Session 10: Engaging Communities in PDA 1 moderated by Martin Gengenbach
Introducing a Mobile App for Uploading Family Treasures to Public Library Collections (Natalie Milbrodt, Queens Public Library) The Queens Public Library in New York City has developed a free mobile application for uploading scanned items, digital photos, oral history interviews and “wild sound” recordings of Queens neighborhoods for permanent safekeeping in the library’s archival collections. It allows families to add their personal histories to the larger historical narrative of their city and their country. The tool is part of the programmatic and technological offerings of the library’s Queens Memory program, whose mission is to capture contemporary history in Queens.

The Memory Lab (Russell Martin, District of Columbia Public Library) The Memory Lab at the District of Columbia Public Library is a do-it-yourself personal archiving space where members of the public can digitize outdated forms of media, such as VHS, VHS-C, mini DVs, audio cassettes, photos, slides, negatives, and floppy disks. Martin's presentation covered how the Memory Lab was developed by a fellow from the Library of Congress' National Digital Stewardship Residency, the budget for the lab, the equipment used and how it is put together, training for staff and the public, as well as success stories and lessons learned.

(Wendy Hagenmaier, Georgia Tech) Hagenmaier's presentation outlined the user research process the retroTECH team used to inform the design of its carts, offered an overview of the carts’ features and use cases, and reflected on where retroTECH’s personal digital archiving services are headed. retroTECH aims to inspire a cultural mindset that emphasizes the importance of personal archives, open access to digital heritage, and long-term thinking.

The Great Migration (Jasmyn Castro, Smithsonian NMAAHC) Castro presented the ongoing film preservation efforts at the Smithsonian for the African American community and how the museum invites visitors to bring their home movies into the museum to have them inspected and digitally scanned by NMAAHC staff.
Session 11: Engaging Communities in PDA 2 moderated by Mary Kidd
Citizen archive and extended MyData principles (Mikko Lampi, Mikkeli University of Applied Sciences) Lampi spoke about how Digitalia – Research Center on Digital Information Management – is developing a professional-quality digital archiving solution available to ordinary people. The Citizen archive relies on an open-source platform allowing users to manage their personal data and ensure access to it on a long-term basis. The MyData paradigm connects with personal archiving by managing coherent descriptive metadata and access rights, while also ensuring privacy and usefulness.

Born Digital 2016: Collecting for the Future (Sarah Slade, State Library Victoria) Slade presented Born Digital 2016: Collecting for the Future, a week-long national media and communications campaign to raise public awareness of digital archiving and preservation and why it matters to individuals, communities, and organizations. The campaign successfully engaged traditional television and print media, and online news outlets, to increase public awareness of what digital archiving and preservation is and why it is important.

Whose History? (Katrina Vandeven, MLIS Candidate, University of Denver) Vandeven discussed the macro appraisal and documenting intersectionality within the Women's March on Washington Archives Project, where it went wrong, possible solutions to documenting intersectionality in activism, and introduced the Documenting Denver Activism Archives Project.
Bringing Personal Digital Archiving 2017 to a close was Session 12: PDA Retrospect and Prospect, a panel moderated by Cathy Marshall.

Howard Besser, Clifford Lynch and Jeff Ubois discussed how early observers and practitioners of personal digital archiving will look back on the last decade, and forward to the next, covering changing social norms about what is saved, why, who can view it, and how; legal structures, intellectual property rights, and digital executorships; institutional practices, particularly in library and academic settings, but also in the form of new services to the public; market offerings from both established and emerging companies; and technological developments that will allow (or limit) the practice of personal archiving.
- John

Saturday, May 9, 2015

2015-05-09: IIPC General Assembly 2015 Trip Report


The day before the International Internet Preservation Consortium (IIPC) General Assembly 2015, we landed in San Francisco and some delicious Egyptian dishes were waiting for us. Thank you Ahmed, Yasmin, Moustafa, Adrian, and Yusuf for hosting us. It was a great way to spend the evening before the IIPC GA and we were delighted to see you all after a long time.

Day 1

We (Sawood Alam, Michael L. Nelson, and Herbert Van de Sompel) entered the conference hall a few minutes after the session had started, and Michael Keller from Stanford University Libraries was about to leave the stage after the welcome speech. IIPC Chair Paul Wagner gave brief opening remarks and invited the keynote speaker, Vinton Cerf from Google, on stage. The title of the talk was "Digital Vellum: Interacting with Digital Objects Over Centuries" and it was an informative and delightful talk. He mentioned that high-density, low-cost storage media are evolving, but the devices to read them might not last long. While mentioning Internet-connected picture frames and surfboards, he added that we should not forget about security. To emphasize the security aspect he gave an example: grandparents would love to see their grandchildren in those picture frames, but will not be very happy if they see something they do not expect.
Moving on to software emulators, he invited Mahadev Satyanarayanan from Carnegie Mellon University to talk about their software archive and emulator called the Olive Archive. Satya gave various live demos including the Great American History Machine, ChemCollective (a copy of the website frozen at a certain time), PowerPoint 4.0 running in Windows 3.1, and the Oregon Trail, all powered by their virtual machines and running in a web browser. He also talked about the architecture of the Olive Archive and how, in the future, multiple instances could be launched and orchestrated to emulate a subset of the Internet for applications that rely on external services, with some instances running those services independently.
In the QA session someone asked Cerf how to ask big companies like Google to provide the data about their Crisis Response efforts for archiving after they are done with it. Cerf responded, "You just did," while acknowledging the importance of such data for archiving.
After the break, Niels Brügger and Ditte Laursen presented their case study of the Danish websphere under the title "Studying a nation's websphere over time: analytical and methodological considerations". Their study covered website content, file types, file sizes, backgrounds, fonts, layout and, more importantly, domain names. They also raised points like the size of the ".dk" domain, geolocation, inter- and intra-domain link networks, and whether Danish websites are actually in the Danish language. They talked about some crawling challenges. Their domain name analysis shows that only 10% of owners own 50% of all ".dk" domains. I suspected that this result might be due to private domain name registrations, so I talked to them later; they said they had not thought about private registrations, but would revisit their analysis.
Andy Jackson from the British Library took the stage with his presentation titled "Ten years of the UK web archive: what have we saved?". This case study covers three collections: the Open Archive, the Legal Deposit Archive, and the JISC Historical Archive. These collections store over eight billion resources in over 160 TB of compressed files and are now adding about two billion resources per year. With the help of a nice graph he illustrated that not all ".uk" domains are interlinked, so to maximize coverage the crawlers need to include other popular TLDs such as ".com". He also presented an analysis of reference rot and content drift utilizing the "ssdeep" fuzzy hash algorithm. Their analysis shows that 50% of resources are unrecognizable or gone after one year, 60% after two years, and 65% after three years.
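To make the measurement concrete, here is a minimal sketch of how such a fuzzy-hash comparison can be done with the ssdeep Python bindings; this is my own illustration (with made-up file names), not Jackson's actual analysis code:

```python
import ssdeep  # pip install ssdeep

# Hypothetical captures of the same page taken ten years apart.
with open("page-2005.html", "rb") as f:
    capture_old = f.read()
with open("page-2015.html", "rb") as f:
    capture_new = f.read()

# ssdeep produces context-triggered piecewise hashes; compare() returns a
# 0-100 similarity score, where low scores indicate heavy content drift.
similarity = ssdeep.compare(ssdeep.hash(capture_old), ssdeep.hash(capture_new))
print(f"fuzzy-hash similarity: {similarity}")
```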
I had lunch together with Scott Fisher from the California Digital Library. I told him about the various digital library and archiving related research projects we are working on at Old Dominion University, and he described the holdings of his library and the challenges they have in upgrading their Wayback to bring Memento support.
After lunch, the keynote speaker of the second session, Cathy Marshall from Texas A&M University, took the stage with a very interesting title, "Should we archive Facebook? Why the users are wrong and the NSA is right". She motivated her talk with some interview-style dialogues around the primary question, "Do you archive Facebook?", to which the answer was mostly "No!". She highlighted that people have developed a [wrong] sense that Facebook is taking care of their stuff, so they do not have to. She also noted that people usually do not value their Facebook content, or they think it has immediate value but no archival value. In a large survey she asked whether Facebook should be archived; three fourths objected and half of them said "No" unconditionally. In the later part of her talk, she built the story of the marriage of Hal Keeler and Joan Vollmer by stitching together various cuttings from local newspapers. I am not sure I could fully appreciate the story due to the cultural difference, but I laughed when everyone else did, and I did follow her effort and intention to highlight the need to archive social media for future historians. And if someone asks me whether the NSA is right, my answer would be, "Yes, if they do it correctly with all the context included."
Meghan Dougherty from Loyola University Chicago and Annette Markham from Aarhus University presented their talk "Generating granular evidence of lived experience with the Web: archiving everyday digitally lived life". They illustrated how, sometimes intentionally and sometimes unintentionally, people record moments of their lives with different media. Among the various visual illustrations, I particularly liked the video of a street artist playing with a ring that was posted on Facebook in a very different context than the one it appeared in on YouTube. They ended their talk with a hilarious video of Friendster.
Susan Aasman from the University of Groningen presented her talk "Everyday saving practices: "small data" and digital heritage strategies". The talk was full of motivation for why people should care about personal archives of their daily-life moments. She described how the service Kodak Gallery launched in 2001 with the tag-line "live forever" and closed in 2012 after transferring billions of images to Shutterfly, which was only available to US customers. As a result, people from other countries lost their photo memories. She also played the Bye Bye Super 8 video by Johan Kramer, which was amusing and motivating for personal archiving.
After a short break, Jane Winters from the Institute of Historical Research, Helen Hockx-Yu from the British Library, and Josh Cowls from the Oxford Internet Institute took the stage with their topic "Big UK domain data for Arts and Humanities", also known as the BUDDAH project. Jane highlighted the value of archives for research and described the development of a framework to help researchers leverage the archives. She illustrated the big data analysis interface of the BUDDAH project, described the planned outputs, and presented various case studies showing what can be done with that data.
Helen Hockx-Yu began her talk "Co-developing access to the UK Web Archive" with reference to the earlier talk by Andy. She noted that a scenario that fits everyone's needs is difficult. She described the high-level requirements, including query building, corpus formation, annotation and curation, and in-corpus and whole-dataset analysis. She illustrated the SHINE interface that provides features like full-text search, multi-facet filters, query history, and result export.
Finally, Josh Cowls presented his talk about the book "The Web as History: Using Web Archives to Understand the Past and the Present", to which he contributed a chapter. He talked about four second-level domains under the ".uk" TLD, including ".co.uk", ".org.uk", ".ac.uk", and ".gov.uk", and how they are interlinked. He described the growth of the web presence of the BBC and British universities.
IIPC Chair Paul Wagner concluded the day by emphasizing that we have only started scratching the surface. He also noted in his concluding remarks that the context matters.

Day 2

Herbert Van de Sompel from Los Alamos National Laboratory started the second day's sessions by talking about "Memento Time Travel". He started with a brief introduction to Memento, followed by a bag full of announcements. For ease of use in JavaScript clients, Memento now supports JSON responses along with the traditional Link format. The Memento aggregator now provides responses in two modes: DIY (Do It Yourself) and WDI (We Do It). The service now also allows exporting the Time Travel Archive Registry in a structured format. Due to the default Memento support in Open Wayback, various web archives now natively support Memento. There is also an extension available to enable Memento support in MediaWiki. Herbert described Robust Links (Hiberlink) and how it can be used to avoid reference rot. He said that their service usage is growing, hence they upgraded the infrastructure and are now using the Amazon cloud for hosting services. He noted that going forward everyone will be able to participate by running Memento service instances in a distributed manner to provision load-balancing. He also demonstrated Ilya's work on constructing composite mementos from various sources to minimize temporal inconsistencies while visualizing the sources of the mementos.
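For a sense of what the JSON support looks like from a client's point of view, here is a small sketch that queries the public Time Travel service for the memento closest to a requested datetime; the endpoint shape and field names are assumptions based on the service's public API documentation, not code shown in the talk:

```python
import json
import urllib.request

uri = "http://www.slac.stanford.edu/"     # page to look up
datetime14 = "19970101000000"             # desired datetime, YYYYMMDDhhmmss

# Endpoint shape assumed from the public Time Travel API documentation.
api = f"http://timetravel.mementoweb.org/api/json/{datetime14}/{uri}"
with urllib.request.urlopen(api) as resp:
    data = json.load(resp)

# "closest" describes the memento nearest to the requested datetime across
# the aggregated archives (field names assumed from the API docs).
print(data["mementos"]["closest"]["uri"])
```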
Daniel Gomes from the Portuguese Web Archive talked about "Web Archive Information Retrieval". He started by classifying web archive information needs into three categories: Navigational, Informational, and Transactional. He noted that the usual way of accessing an archive is URL search, which might not be known to the users. An alternate method is full-text search, which poses the challenge of relevance. Daniel described various relevance models in great detail and how to select features to maximize relevance. He announced that all the datasets and code are available for free under an open source license. The code is hosted on Google Code, but due to the announced sunsetting of that service, the code will be migrated to GitHub soon.
After this talk there was a short break, followed by the announcement that the remaining sessions of the day would have two parallel tracks. It was a hard decision to choose one track or the other, but I can watch the missed sessions later when the video recordings are made available. Later, the parallel sessions were interfering with each other, so the microphone was turned off.
After the break Ilya Kreymer gave a live demo of his recent work "Web Archiving for all: Building WebRecorder.io". He acknowledged the collaboration with Rhizome and announced the availability of an invite-only beta implementation of WebRecorder. He demonstrated how WebRecorder can be used to perform personal archiving in a What You See Is What You Archive (WYSIWYA) mode.
Zhiwu Xie from Virginia Tech presented "Archiving transactions towards an uninterruptible web service". He described an indirection layer between the web application server and the client that archives each successful response and, when the server returns a 4xx/5xx failure response, serves the most recent copy of the resource from the transactional archive. From the clients' perspective it is similar in functionality to services like CloudFlare, but it has the added advantage of building a transactional archive for website owners. Zhiwu demonstrated the implementation by reloading two web pages multiple times, one of which was utilizing the UWS while the other was directly connected to the web application server, which was returning the current timestamp with random failures. He mentioned that the system is not ready for prime time yet.
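To illustrate the indirection idea (and only the idea), here is a toy WSGI sketch that forwards requests to a hypothetical origin server, archives successful responses in memory, and falls back to the last good copy on failure; Xie's actual system is of course more elaborate:

```python
# A toy sketch of a transactional-archive indirection layer; the origin URL
# and storage strategy are hypothetical placeholders.
import urllib.request
from urllib.error import HTTPError, URLError
from wsgiref.simple_server import make_server

ORIGIN = "http://localhost:8080"   # hypothetical backend application server
archive = {}                       # path -> most recent successful body

def app(environ, start_response):
    path = environ["PATH_INFO"]
    try:
        with urllib.request.urlopen(ORIGIN + path) as resp:
            body = resp.read()
        archive[path] = body       # archive the good copy transactionally
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body]
    except (HTTPError, URLError):
        if path in archive:        # origin failed: serve the last good copy
            start_response("200 OK", [("Content-Type", "text/html"),
                                      ("X-Served-From", "transactional-archive")])
            return [archive[path]]
        start_response("502 Bad Gateway", [("Content-Type", "text/plain")])
        return [b"origin unavailable and no archived copy"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```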
During the lunch break I was with Andy, Kristinn, and Roger where we had free style conversation on advanced crawlers, CDX indexer memory error issues, the possibility of implementing CDX indexer in Go, separating data and view layers in Wayback for easy customization, some YouTube videos such as "Is Your Red The Same as My Red?", hilarious "If Google was a Guy", Ted talks such as "Can we create new senses for humans?", "Evacuated Tube Transport Technologies (ET3)", and the possible weather of Iceland around the time IIPC GA 2016 is scheduled.
Jefferson Bailey presented his talk on "Web Archives as research datasets". With various examples and illustrations from Archive-It collections he established the point that web archives are great sources of data for a variety of research. He noted that WAT is a compact and easily parsable metadata file format that is about 18% of the size of the corresponding WARC data files.
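As a rough illustration of why WAT files are easy to work with, here is a short sketch that iterates a WAT file with the warcio library and prints the target URI of each metadata record; the file name is hypothetical and the envelope field names follow the commonly documented WAT layout:

```python
import json
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

with open("example.warc.wat.gz", "rb") as stream:    # hypothetical WAT file
    for record in ArchiveIterator(stream):
        if record.rec_type != "metadata":
            continue
        envelope = json.loads(record.content_stream().read())
        # WAT wraps per-record metadata in a JSON "Envelope"; pull the URI
        # of the capture this metadata record describes.
        header = envelope.get("Envelope", {}).get("WARC-Header-Metadata", {})
        print(header.get("WARC-Target-URI"))
```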
Ian Milligan from the University of Waterloo presented his talk on "WARCs, WATs, and wgets: Opportunity and Challenge for a Historian Amongst Three Types of Web Archives". He described the importance of web archives and why historians should use them. His talk was primarily based on three case studies: the Wide Web Scrape, the GeoCities End-of-Life Torrent, and the Archive-It longitudinal collection Canadian Political Parties & Labour Organizations. I enjoyed his style of storytelling, some mesmerizing visualizations, and in particular the GeoCities case study. He noted that the GeoCities data was not in the form of WARC files; instead it was a regular Wget crawl.
After a short break, Ahmed AlSum from the Stanford University Library (and a WS-DL alumnus) presented his work on "Restoring the oldest U.S. website". He described how he turned yearly backup files of the SLAC website from 1992 to 1999 into WARC and CDX files with the help of Wget and some manual changes, to mimic the effect as if the site had been captured in those early days. These transforms were necessary to allow the modern Open Wayback system to correctly replay it. Ahmed briefly handed the microphone over to Joan Winters, who was responsible for taking backups of the website in the early days, and she described how they did it. Ahmed also mentioned that the Wayback codebase had 1996 hardcoded as the earliest year, which was fixed by making it configurable.
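As a rough sketch of the general idea (not AlSum's actual pipeline), one could wrap a static backup file into a back-dated WARC response record with the warcio library so that a Wayback-style replay tool treats it as an early capture; the file paths below are hypothetical:

```python
from warcio.warcwriter import WARCWriter                # pip install warcio
from warcio.statusandheaders import StatusAndHeaders

uri = "http://www.slac.stanford.edu/"                   # SLAC home page
# Hypothetical paths: a file from an old yearly backup and the output WARC.
with open("slac-1992/index.html", "rb") as payload, \
     open("slac-1992.warc.gz", "wb") as out:
    writer = WARCWriter(out, gzip=True)
    http_headers = StatusAndHeaders(
        "200 OK", [("Content-Type", "text/html")], protocol="HTTP/1.0")
    record = writer.create_warc_record(
        uri, "response", payload=payload, http_headers=http_headers,
        # Back-date the capture so replay tools index it under 1992.
        warc_headers_dict={"WARC-Date": "1992-12-01T00:00:00Z"})
    writer.write_record(record)
```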
As an afterthought, I would love to see this effort combined with Satya's Olive Archive so that everything from the server stack to the browser experience can be replicated as close to the original environment as possible.
Federico Nanni from the University of Bologna presented "Reconstructing a lost website". Looking at the schedule, my first impression was that it was going to be a talk about tools to restore any lost website and reconstruct all the pages and links with the help of archives. I was wondering if they were aware of Warrick, a tool developed at Old Dominion University with this very objective. But it turned out to be a case study of the world's oldest university, established around 1088. One of the many challenges in reconstructing the university website, he mentioned, was the exclusion of the site from the Wayback Machine for unknown reasons, which they tried to resolve together with the Internet Archive. Amusingly, one of the many sources of collected snapshots was a clone of the site prepared by student protesters.
The last speaker of the second day, Michael L. Nelson from Old Dominion University, presented the work of his student Scott G. Ainsworth, "Evaluating the temporal coherence of archived pages". With an example from the Weather Underground site he demonstrated how unrealistic pages can be constructed by archives due to temporal violations. He acknowledged that among the various categories of temporal violations, there are at least 5% of cases where there exists a provable temporal violation. He also noted that a temporal violation is not always a concern.

Day 3

The third day's sessions were in the Internet Archive building in San Francisco instead of the usual Li Ka Shing Center at Stanford University in Palo Alto. A couple of buses transported us to the IA, and we enjoyed the trip through the valley as the weather was very good. The IA staff were very humble and welcoming. The emulator of classic games installed in the lobby of the IA turned out to be the prime center of attraction. We came to know some interesting facts about the IA, such as that the building was a church acquired because of its similarity to the IA logo, and that the pillows in the hall were contributed by various websites, with their domain names and logos printed on them.
Sessions before lunch were mainly related to consortium management and logistics; these included Welcome to the Internet Archive by Brewster Kahle, a Chair address by Paul Wagner, a Communications report by Jason Webber, a Treasurer's report by Peter Stirling, and Consortium renewal by the chair, followed by break-out discussions to gather ideas and opinions from the IIPC members on various topics. Also, the date and venue for the next general assembly were announced: April 11, 2016 in Reykjavik, Iceland.
After the lunch break, your author, Sawood Alam from Old Dominion University, presented the progress report on the "Profiling web archives" project, funded by the IIPC. With the help of some examples and scenarios he established the point that the long tail of archives matters. He acknowledged the growing number of Memento-compliant archives and the growing use of the Memento aggregator service. In order for the Memento aggregator to perform efficiently, it needs query routing support apart from caching, which only helps when requests are repeated before the cache expires. He then acknowledged two earlier profiling efforts, one being a complete-knowledge profile by Sanderson and the other a minimalistic TLD-only profile by AlSum. He described the limitations of the two profiles and explored the middle ground for various other possibilities. He evaluated his findings and concluded that his work so far gained up to 22% routing precision at less than 5% of the cost relative to the complete-knowledge profile, without any false negatives. Sawood also announced the availability of the code to generate profiles and benchmark them in a GitHub repository. In a later wrap-up session the chair Paul Wagner referred to Sawood's motivation slide in his own words: "sometimes good enough is not good enough."
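To illustrate what profile-based query routing buys an aggregator, here is a toy sketch with made-up profiles; the real profiles explore much richer keys between the TLD-only and complete-knowledge extremes:

```python
# Toy illustration of routing a lookup only to archives whose (hypothetical)
# profiles suggest they might hold the URI, instead of broadcasting to all.
from urllib.parse import urlsplit

profiles = {
    "archive-a": {"tld": {"uk"}, "domains": {"bbc.co.uk", "parliament.uk"}},
    "archive-b": {"tld": {"com", "org"}, "domains": {"example.org"}},
}

def route(uri):
    host = urlsplit(uri).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    candidates = []
    for archive, profile in profiles.items():
        # Route if a known domain suffix or the TLD appears in the profile.
        if any(host.endswith(d) for d in profile["domains"]) or tld in profile["tld"]:
            candidates.append(archive)
    return candidates

print(route("http://news.bbc.co.uk/"))   # ['archive-a']
```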
In the break various IA staff members gave us a tour of the IA facility, including the book scanners, the television archive, an ATM, storage racks, and the music and video archive where they convert data from old recording media such as vinyl discs and cassettes.
After the break the historian and writer Abby Smith Rumsey talked about "The Future of Memory in the Digital Age". Her talk was full of insightful and quotable statements. I will quote one of my favorites and leave the rest in the form of tweets. She said, "ask not what we can afford to save; ask what we can afford to lose".
Finally the founder of the Internet Archive, Brewster Kahle, took the stage and talked about digital archiving and the role of the IA in the form of various initiatives, including the book archive, music archive, and TV archive to name a few. He described the zero-sum book lending model utilized by the Open Library for books that are not free for unlimited distribution. He invited all the archivists to create a common collective distributed library where people can share resources such as computing power, storage, man power, expertise, and connections. During the QA session I asked whether, when he thinks about collaboration, he envisions a model similar to inter-library loan, where peer libraries refer to other places in the form of external links if they don't have the resources but others do, or in contrast one where they copy each other's resources. He responded, "both."
The chair gave a wrap-up talk and formally ended the third day's session. The buses still had some time before they left, so people were engaged in conversation, games, and photographs while enjoying drinks and food. I particularly enjoyed a local ice cream named "It's-It" recommended by an IA staff member. Lori Donovan from the Internet Archive approached me and Mohamed Farag and initiated a good conversation about possible collaboration on archiving projects. We also talked about a project that the WS-DL group at Old Dominion University was working on a few years ago to identify disaster-related news and archive it. Our conversation ended with a group selfie of the three of us.

Day 4

On the fourth day Sara Aubry presented her talk on "Harvesting Digital Newspapers Behind Paywalls" in Berge Hall A, where the Harvesting Working Group was gathered, while IIPC's communication strategy session was going on in Hall B. She discussed her experience of working with news publishers to make their content more crawler friendly. Some of the crawling and replay challenges included paywalls requiring authentication to grant access to the content and the inclusion of a daily changing date string in the seed URIs. They modified the Wayback to fulfill their needs, but the modifications have not been committed back to the upstream repository. She said that if they are useful to the community, the changes can be pushed out to the main repository.
Roger Coram presented his talk on "Supplementing Crawls with PhantomJS". I found his talk quite relevant to my colleague Justin Brunelle's work. This is a necessary step to improve the quality of crawls, especially as sites become more interactive with extensive use of JavaScript. For some pages, he uses CSS selectors and takes screenshots to later complement the rendering.
Kristinn Sigurðsson engaged everyone in a discussion about the "Future of Heritrix". He started with the question, "is Heritrix dead?", and I said to myself, "can we afford this?". This ignited the talk about what can be done to increase the activity on its development. I asked what is slowing down the development of Heritrix: is it out of ideas and new feature requests, or are there not enough contributors to continue the development? There was no clear answer to this question, but it helped continue the discussion. I also suggested that if new developers are afraid of making changes that would break the system and discourage upgrades, then we could introduce a plug-in architecture where new features can be added as optional add-ons.
Helen Hockx-Yu took the microphone and talked about Open Wayback development. She gave a brief introduction to the development workflow and periodic telecons. She also talked about the short- and long-term development goals, including better customization and internationalization support, displaying more metadata, ways to minimize live leaks, and acknowledging/visualizing temporal coherence.
After a short break Tom Cramer gave his talk on "APIs and Collaborative Software Development for Digital Libraries". He formally categorized the software development models into five categories. He suggested that the IIPC take the position of unifying a high-level API for each category of archiving tools so that they can interoperate interchangeably. This was very appealing to me because I was thinking along the same lines and had done some architectural design of an orchestration system that achieves the same goal via a layer of indirection.
Daniel Vargas from LOCKSS presented his talk on "Streamlining deployment of web archiving tools" and demonstrated the use of Docker containers for deployment. He also demonstrated the use of plain WARC files on a regular file system and in HDFS with Hadoop clusters. I was glad to see someone else deploying the Wayback Machine in containers, as I was pushing some changes to the Open Wayback repository that will make containerization of Wayback easier.
During the lunch break Hunter Stern from the IA approached me and told me about the Umbra project to supplement the crawling of JS-rich pages. Kristinn, I, and a few more people talked about the precision of time in HTTP/2.0, but no one was sure whether it had been changed from one-second granularity to anything smaller, such as a millisecond or microsecond. Later I asked this question on the IETF HTTP WG mailing list and the responses suggest that no change was made to it. After lunch there was a short open mic session where every speaker got four minutes to introduce exciting stuff they are working on. Unfortunately, due to the shortage of time, I could not participate in it.
After the lunch break the Access Working Group gathered to talk about "Data mining and WAT files: format, tools and use cases". Peter Stirling, Sara Aubry, Vinay Goel, and Andy Jackson gave talks on "Using WAT at the BnF to map the First World War", "The WAT format and tools for creating WAT files", and "Use cases at Internet Archive and the British Library". Vinay had some really neat and interactive visualizations based on WAT files. I talked to Vinay during the break and we had some interesting ideas to work on, such as building a content store indexed by hashes while using WAT files in conjunction for replay, and a WebSocket-based BOINC implementation in JavaScript to perform Hadoop-style distributed research operations on IA data on users' machines.
After a short break the Access Working Group talked about "Full-text search for web archives and Solr". Anshum Gupta, Andy Jackson, and Alex Thurman presented "Apache Solr: 5.0 and beyond", "Full-text search for web archives at the British Library", and "Solr-based full-text search in Columbia's Human Rights Web Archive" respectively. Anshum's talk was on the technical aspects of Solr while the other two talks were more toward case studies.

Day 5

On the last day of the conference the Collection Development and Preservation Working Groups discussed their current state and plans in separate parallel tracks. Before the break I attended the Collection Development Working Group. They demonstrated Archive-It account functionality. I expressed the need for a web-based API to interact with the Archive-It service. I gave the example of a project I was working on a few years ago, in which a feed reader periodically read news feeds and sent articles to a disaster classifier that Yasmin AlNoamany and I (Sawood Alam) built. If the classifier classified a news article in the disaster category, we wanted to archive that page immediately. Unfortunately, Archive-It did not provide a way to do that programmatically (unless we used page scraping or a headless browser), so we ended up using the WebCite service for that.
After the break I moved to the Preservation Working Group track where I had a talk scheduled. David S. H. Rosenthal presented his talk on "LOCKSS: Collaborative Distributed Web Archiving For Libraries". He described how LOCKSS works and how it has benefited the publishing industry. He described how Crawljax is used in LOCKSS to capture content that is loaded via Ajax. He also noted that most publishing sites try not to rely on Ajax, and if they do, they provide some other means to crawl their content in order to maintain their search engine ranking.
Sawood Alam (me) happened to be the last presenter of the conference, presenting a talk on "Archive Profile Serialization". This talk was a continuation of his earlier talk at the IA. He described what should be kept in profiles and how they should be organized. He also talked briefly about the implications of each data organization strategy. Finally he talked about the file format to be used and how it can affect the usefulness of the profiles. He noted that single-root file formats like XML, JSON, and YAML are not suitable for profiles, and he proposed an alternative format that is a fusion of the CDX and JSON formats. Kristinn provided his feedback that it seems like the right approach to serializing such data, but he strongly suggested naming the file format something other than CDXJSON.
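To give a flavor of the proposed fusion (my own illustration of the idea, not the exact format from the talk), each line could pair a sort-friendly CDX-style key with a JSON blob, so the file remains binary-searchable while carrying structured values; the keys and field names below are made up:

```python
import json

# Hypothetical profile lines: a SURT-style URI prefix followed by JSON stats.
profile_lines = """\
com,cnn)/ {"frequency": 4017, "spread": 3}
uk,co,bbc)/news {"frequency": 9620, "spread": 5}
"""

def parse_line(line):
    # The key contains no spaces, so split at the first space only.
    key, _, blob = line.partition(" ")
    return key, json.loads(blob)

for line in profile_lines.splitlines():
    key, stats = parse_line(line)
    print(key, stats["frequency"])
```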
While we were having lunch, the chair took the opportunity to wrap up the day and the conference. And now I would like to thank all the organizing team members, especially Jason Webber, Sabine Hartmann, Nicholas Taylor, and Ahmed AlSum, for organizing and making the event possible.
In the afternoon Ahmed AlSum took me to the Computer History Museum where Marc Weber gave us a tour. It was a great place to visit after such an intense week.

Missed Talks

Due to the parallel tracks I missed some sessions that I wanted to attend, such as "SoLoGlo - an archiving and analysis service" by Martin Klein, "Web archive content analysis" by Mohammed Farag, "Identifying national parts of the internet" by Eld Zierau, "Warcbase: Building a scalable platform on HBase and Hadoop" by Jimmy Lin, "WARCrefs for deduplicating web archives" by Youssef Eldakar, and "WARC Standard Revision Workshop" by Clément Oury, to name a few. I hope the video recordings will be available soon. Meanwhile I was following the related tweets.

Conclusions

IIPC GA 2015 was a fantastic event. I had a great time, met a lot of new people (and some of those whom I knew only on the Web), shared my ideas, and learned from others. It was one of the most amazing complete weeks I have ever had. I appreciate the efforts of everyone who made this possible, including organizers, presenters, and attendees.

Resources

Please let us know about links to various resources related to IIPC GA 2015 to include below.

Official

Aggregations

Blog Posts

Tools

Update (May 12, 2015): Added reference to HTTP/2.0 time resolution and some more blog posts.
Update (May 22, 2015): Added more blogs and tool references.
Update (June 1, 2015): Added link to the video recording playlist.
--
Sawood Alam