Showing posts with label MementoFox. Show all posts

Tuesday, June 18, 2013

2013-06-18: NTRS, Memento, and Handles

In a previous post I covered the shutdown of the NASA Technical Report Server, which has since come back online in a reduced capacity.  In this post we examine some of the peculiarities of the current state of NTRS, particularly with respect to Handles and Memento.

Earlier this week I needed to access an old NASA report of mine, ironically enough about NTRS, from 1996:
Richard C. Tuey, Mary Collins, Pamela Caswell, Bob Haynes, Michael L. Nelson, Jeanne Holm, Lynn Buquo, Annette Tingle, Bill Cooper and Roy Stiltner, NASAwide Electronic Publishing System-Prototype STI Electronic Document Distribution: Stage-4 Evaluation Report, NASA TM-104630 (parts 1 and 2), May 1996.
It is not a particularly enjoyable report; it is the kind of lengthy, multi-authored, sanitized, bureaucratic-engineering report that people write but don't read (a "better" summary can be found in AIAA-95-0964).  I probably have a PDF of the report somewhere in my files, but instead I pulled up my publication list and clicked on the linked URI: http://hdl.handle.net/2060/19960028185, which resulted in a redirection to http://ntrs.nasa.gov/errors/PDF-removed.html and an HTTP "403 Forbidden" error:


The raw HTTP:





In short, NTRS is denying me access to an engineering report about NTRS -- as it existed nearly 20 years ago.  I created the link to the Handle (i.e., http://hdl.handle.net/2060/19960028185) for the report because that's the right thing to do (tm): Handles are "cool URIs" and hide the "how we do it today", with the idea that the publisher registers with the Handle System the mapping of a particular Handle to its current URI.  When the publisher changes its content management system, gets bought by another publisher, etc., the Handle itself doesn't change even if the value it maps to changes.  The Handle System is what implements the more familiar Digital Object Identifier (DOI) system that most major publishers use; that is, the set of all DOIs is a proper subset of the set of all Handles.
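The indirection described above can be sketched with a toy model. This is only an illustration: ToyHandleRegistry is a hypothetical in-memory class, not the Handle System's actual API, and the "before" target is assumed to be the report's original URI (it is the URI quoted alongside the Handle later in this post).

```python
class ToyHandleRegistry:
    """In-memory stand-in for a Handle-to-URI mapping (not the real Handle System)."""

    def __init__(self):
        self._targets = {}

    def register(self, handle, target_uri):
        # Registering again overwrites the old target; the Handle string is unchanged.
        self._targets[handle] = target_uri

    def resolve(self, handle):
        # A real resolver such as hdl.handle.net would answer with an HTTP redirect.
        return self._targets[handle]


registry = ToyHandleRegistry()
handle = "2060/19960028185"

# Assumed original target for this Handle.
registry.register(
    handle,
    "http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028185_1996060716.pdf",
)
before = registry.resolve(handle)

# What NTRS effectively did: the Handle now leads to an error page.
registry.register(handle, "http://ntrs.nasa.gov/errors/PDF-removed.html")
after = registry.resolve(handle)

print("http://hdl.handle.net/" + handle)
print("  before:", before)
print("  after: ", after)
```

The link a reader bookmarks never changes; only the registered value does, which is why a publisher can either preserve access across a migration or, as here, break it for every Handle at once.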

I've always been critical of popular coverage of science stories because they often fail to link to the DOIs (or Handles).  For example, in this randomly chosen story the author links to the final target URI:

http://iopscience.iop.org/0004-637X/770/2/148

when he "should" link to the DOI itself:

http://dx.doi.org/10.1088/0004-637X/770/2/148

In this case, you can lexically map between the target URI and its DOI, but that's not always the case.  And truthfully, if iopscience.iop.org commits to the stability of the former URI, then regular users won't notice or care about the difference (only digital library wonks like myself).
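For this one example the mapping is plain string manipulation, roughly as sketched here. The function names are hypothetical, and the rule (prepend the 10.1088 prefix to the URI path) is inferred only from the two URIs quoted above; it is not claimed to hold for every IOP article, let alone for other publishers.

```python
from urllib.parse import urlsplit

IOP_HOST = "iopscience.iop.org"
IOP_DOI_PREFIX = "10.1088"  # taken from the DOI quoted in this post
DOI_RESOLVER = "http://dx.doi.org/"


def iop_uri_to_doi_uri(target_uri):
    """Map an iopscience.iop.org article URI to its dx.doi.org URI (assumed rule)."""
    parts = urlsplit(target_uri)
    if parts.hostname != IOP_HOST:
        raise ValueError("not an iopscience.iop.org URI: " + target_uri)
    # parts.path starts with "/", e.g. "/0004-637X/770/2/148"
    return DOI_RESOLVER + IOP_DOI_PREFIX + parts.path


def doi_uri_to_iop_uri(doi_uri):
    """Inverse of the mapping above, again only for this URI pattern."""
    expected_start = DOI_RESOLVER + IOP_DOI_PREFIX + "/"
    if not doi_uri.startswith(expected_start):
        raise ValueError("not a 10.1088 DOI URI: " + doi_uri)
    return "http://" + IOP_HOST + "/" + doi_uri[len(expected_start):]


print(iop_uri_to_doi_uri("http://iopscience.iop.org/0004-637X/770/2/148"))
```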

So you can imagine my disappointment when I clicked on http://hdl.handle.net/2060/19960028185 and discovered that NASA has mapped this -- and all of its Handles -- to a "403 Forbidden" page.  I could not access this report.  Searching my own personal archives is always the last resort, so I went to Google Scholar and found that they still had recorded the original target URI for the report:


It does not display in the image above, but the URI is:

http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf

That's an ugly URI, and not one that you'll discover using the NASA TM number, the title, or other semantic clues. Unfortunately, clicking on that URI produces another "access denied" page, different from the one you receive when clicking on the Handle:


The raw HTTP:



To add insult to injury, the above page is a "soft 404" -- the WWW equivalent of turning on your porch light for Halloween but not distributing candy.  Fortunately, I was using MementoFox, so I simply activated my timeslider and was able to grab a copy of the report from Archive-It at:

http://wayback.archive-it.org/all/20100518033903/http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf
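The archived URI above carries both the capture time and the original URI. A minimal sketch of pulling them apart, assuming the usual Wayback-style layout of a 14-digit YYYYMMDDhhmmss timestamp (taken here to be UTC) followed by the original URI; split_wayback_uri is a hypothetical helper name:

```python
import re
from datetime import datetime, timezone

# <archive prefix>/<14-digit timestamp>/<original URI>
WAYBACK_RE = re.compile(r"^(?P<prefix>https?://.+?)/(?P<ts>\d{14})/(?P<original>.+)$")


def split_wayback_uri(memento_uri):
    """Return (capture datetime, original URI) for a Wayback-style URI."""
    match = WAYBACK_RE.match(memento_uri)
    if match is None:
        raise ValueError("no 14-digit timestamp segment in: " + memento_uri)
    captured = datetime.strptime(match["ts"], "%Y%m%d%H%M%S")
    # Assumption: the timestamp is in UTC.
    return captured.replace(tzinfo=timezone.utc), match["original"]


captured, original = split_wayback_uri(
    "http://wayback.archive-it.org/all/20100518033903/"
    "http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf"
)
print(captured.isoformat())
print(original)
```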

For those who care about the details, here is the TimeMap for the ntrs.nasa.gov URI:



I was able to access the ntrs.nasa.gov URI because Google Scholar had maintained the mapping, but we can also query Memento servers for the TimeMap of the Handle as well and discover five more copies:



Unfortunately, the Internet Archive won't serve their versions because the current NTRS robots.txt file is blocking access (see IA's policy on robots.txt).
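How a robots.txt rule translates into "blocked" can be sketched with Python's standard robots.txt parser. The robots.txt body below is hypothetical (the actual NTRS file is not reproduced in this post), and the "ia_archiver" user agent name is an assumption about how the Internet Archive's crawler is addressed; is_blocked is an illustrative helper, not part of any archive's software.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; NOT the actual NTRS file.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt, user_agent, uri):
    """True if the given robots.txt text disallows user_agent from fetching uri."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, uri)


uri = "http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf"
print("ia_archiver blocked: ", is_blocked(HYPOTHETICAL_ROBOTS_TXT, "ia_archiver", uri))
print("SomeOtherBot blocked:", is_blocked(HYPOTHETICAL_ROBOTS_TXT, "SomeOtherBot", uri))
```

The point of the sketch is only that the check is made against the site's current robots.txt, so a rule added today can hide copies that were captured years ago.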




The fact that the TimeMaps are different for http://hdl.handle.net/2060/19960028185 and http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028185_1996060716.pdf is the subject of Ahmed AlSum's TempWeb 2013 paper; this is a surprisingly tough problem. 
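For readers who have not seen one, a TimeMap is a list of links, each with a "rel" and (for mementos) a "datetime" attribute. The sketch below parses a hand-written sample in that link format. The sample is an illustration built from the single Archive-It URI quoted earlier, not an actual server response, and the regular expressions are a simplification that assumes every attribute value is double-quoted.

```python
import re
from email.utils import parsedate_to_datetime

# Hand-written illustrative TimeMap, NOT a real server response.
SAMPLE_TIMEMAP = (
    '<http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf>'
    ';rel="original",\n'
    '<http://wayback.archive-it.org/all/20100518033903/'
    'http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960028014_1996060715.pdf>'
    ';rel="memento";datetime="Tue, 18 May 2010 03:39:03 GMT"\n'
)

# Match on <...> rather than splitting on commas, since datetimes contain commas.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return one dict per link: its URI plus its quoted attributes."""
    links = []
    for uri, raw_attrs in LINK_RE.findall(text):
        link = dict(ATTR_RE.findall(raw_attrs))
        link["uri"] = uri
        links.append(link)
    return links


def list_mementos(text):
    """Return (datetime, URI) pairs for links whose rel includes 'memento'."""
    found = []
    for link in parse_timemap(text):
        if "memento" in link.get("rel", "").split() and "datetime" in link:
            found.append((parsedate_to_datetime(link["datetime"]), link["uri"]))
    return sorted(found)


for when, uri in list_mementos(SAMPLE_TIMEMAP):
    print(when.isoformat(), uri)
```

Because a TimeMap is keyed on one original URI, the Handle and the ntrs.nasa.gov URI each get their own list, which is how the same report can end up with different sets of copies.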

In summary, NTRS erased the mapping from their Handles to the target URIs, which creates additional work when it comes to finding another copy in a public web archive.  It's not just my report (which is no big loss to "Science"), but other reports too; randomly replacing some digits in the Handle turns up another report that is archived in Archive-It as well:



I'm not sure how many of the unprocessed reports are available via Memento, but until NTRS is fully restored, the suite of Memento tools will help you out.

--Michael

Tuesday, April 12, 2011

2011-04-13: Implementing Time Travel for the Web

Recent trends in digital libraries are towards integration with the architecture of the World Wide Web. The award-winning Memento Project proposes extending HTTP to provide protocol-level access to mementos (archived previous states) of web resources. Using content negotiation and other protocol operations, rather than archive-specific methods, Memento provides the digital library and preservation community with a standardized method to navigate between the original resource and its mementos.
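As a rough sketch of what the content negotiation looks like from the client side: the client states the desired time in a request header. The header name Accept-Datetime and its HTTP-date format are assumed here as later standardized in RFC 7089; the helper names are hypothetical, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_value(when):
    """Format an aware datetime as an HTTP-date in GMT."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request_headers(when):
    """Headers a client would send when asking for the state of a resource at `when`."""
    return {"Accept-Datetime": accept_datetime_value(when)}


headers = timegate_request_headers(datetime(2010, 2, 22, tzinfo=timezone.utc))
print(headers)
```

The server side then picks the memento closest to that time and redirects the client to it, so the same mechanism works regardless of which archive holds the copy.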

Memento Client State Chart

The ODU Web Sciences and Digital Libraries Research Group has partnered with the LANL Research Library to create Memento and develop prototype Memento-compliant client and server implementations. A variety of Memento clients have been created, tested, and co-evolved along with the Memento protocol. There is now a Firefox extension, an Internet Explorer browser helper object, and a WebKit-based Android browser. The design and technical solutions identified during the development of these clients will be of interest to those considering implementation of a Memento-based platform, especially on the client side, and the interactions are also important for building conformant server-side systems.

MementoFox Screenshot

The full article can be found at:

Robert Sanderson, Harihar Shankar, Scott Ainsworth, Frank McCown, and Sam Adams. Implementing Time Travel for the Web. code{4}lib Journal, Issue 13, 2011-04-11. http://journal.code4lib.org/articles/4979.

-- Scott G. Ainsworth

Friday, March 19, 2010

2010-03-19: MementoFox Add-on Released

There have been a number of developments in the Memento project. Perhaps the most interesting is the release of the MementoFox Mozilla Add-on. Shown to the left is MementoFox installed in Firefox 3.6. I went to cnn.com, then turned on MementoFox by clicking the green "(M)" logo near the top left. I used the slider bar to select a date of 2010-02-22 (red text box), some magic happened, and then I was presented with an archived version of cnn.com in the WebCite archive with an actual date of 2010-02-23, one day later than the date I requested (green text box). Entering a new date in the red text box or using the slider bar will cause MementoFox to find the closest archived copy of cnn.com, possibly in an archive other than WebCite. Everyone is encouraged to go to the Memento Demos page, install MementoFox, and walk through some of the other time traveling scenarios detailed there. It is actually quite a lot of fun to play with. Feedback is welcome on the memento-dev group.
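The "some magic happened" step boils down to choosing the archived copy whose date is nearest to the requested one. A minimal sketch follows; it is not MementoFox's actual code, and the capture list is hypothetical apart from the 2010-02-23 date mentioned above (the archive-a.example and archive-b.example URIs are placeholders).

```python
from datetime import datetime


def closest_memento(requested, mementos):
    """Return the (datetime, URI) pair nearest in time to `requested`."""
    return min(mementos, key=lambda memento: abs(memento[0] - requested))


# Hypothetical captures; only the 2010-02-23 date comes from this post.
captures = [
    (datetime(2010, 2, 10), "http://archive-a.example/20100210/http://cnn.com/"),
    (datetime(2010, 2, 23), "http://archive-b.example/20100223/http://cnn.com/"),
    (datetime(2010, 3, 5), "http://archive-a.example/20100305/http://cnn.com/"),
]

best = closest_memento(datetime(2010, 2, 22), captures)
print(best[0].date(), best[1])
```

Note that "closest" can land on either side of the requested date, which is why asking for 2010-02-22 can return a copy from the following day.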

We have also had our paper accepted to the Linked Data on the Web 2010 Workshop (LDOW 2010), April 27, 2010, in Raleigh, NC. The paper details some minor tweaks in the protocol (see a prior post and the updated technical slides) and provides a DBpedia example of how time series analysis can be done using Memento for resource versioning. The paper has also been posted to arXiv.org:

Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson, Lyudmila L. Balakireva, Harihar Shankar, Scott Ainsworth, An HTTP-Based Versioning Mechanism for Linked Data, Proceedings of Linked Data on the Web (LDOW2010), 2010. (Also available as arXiv:1002.2439).
A complete list of tools for implementing Memento support in your web server, wiki, etc. is available at the Memento Tools page. Let us know if you have a compliant server & happy Web time traveling.

--Michael