The proliferation of electronic information and communication systems has created a crisis of accountability and evidence. As increasing proportions of the records of our society are available in electronic form, users are asking how they can be sure that electronic records created in the past will be available in the future, and how they can be sure that those received today are trustworthy. The issue is a critical one for all areas of humanistic study, because these scholarly disciplines depend on the study of original texts, images and multimedia sources. Without correct attribution, certainty of authenticity, and the ability to view sources decades and centuries after they were created, the humanities could hardly be imagined.
While the question of how to create and preserve electronic evidence (records with provable authenticity) has been with us as long as computing, research in this field is relatively new, in part because until very recently few source materials were created electronically and available only in electronic form. Thus, in 1991, when the U.S. National Historical Publications and Records Commission sponsored a working meeting on Research Issues in Electronic Records, virtually no published research was available. Since the publication of the report of that meeting, research in the field has proliferated (see the special issues of American Archivist [US], Archivaria [CA] and Archives & Manuscripts [AU] published within the past year), although major areas are still underdeveloped.
At present, research on archiving and authenticity falls into four broad categories.
On the simplest level, archiving has to do with preserving bits. Because electronic recording media are inherently unstable, ensuring that the electronic signal survives over time has always been a concern. Practical interest in denser and longer-lasting methods of storing data has meant that, during the short history of electronic recording, we have witnessed the commercialization of a large number of different data storage media and media formats. The rapid evolution of media has meant that considerable attention has been devoted to avoiding obsolescence and to developing methods of reading and copying media from previous generations of systems. In general, older media, layouts, and formats can be read with appropriate hardware and special-purpose software, but the task of devising methods to read old signals on old media is becoming more complex as media proliferate, recording and layout methods become more proprietary, and firmware plays a greater role in decoding.
Archivists, and increasingly scholars, are aware that beyond the preservation of bits lies the arena of preserving "recordness". Research into what makes an electronic document or dataset a record, and how its constituent parts can be bound together, has become critical as the communication of electronic information has become more widespread. In the past several years, electronic mail, groupware and digital image banks have forced society to confront the issue of the authenticity or reliability of electronic communications, and have spawned much research. Most recently, research has attempted to define the functional requirements for recordkeeping and the metadata attributes of evidence.
Electronic records are always software dependent, but the extent of this dependence varies widely. Increasingly, electronic objects are not merely static entities but parts of systems in which they represent potential functionality. In recent years, dynamic links, objects that change system states, and data entities that respond to their environment have significantly increased the difficulty of preserving electronic records. New questions are arising about the concept of migrating functionality and the meaning of interoperability. Methods of overcoming, or at least representing, software dependence over time are critical to the survival of the record.
Finally, society has responded unevenly to the spread of electronic communications capabilities. Some new legal and professional standards have been established; elsewhere, research is underway to define new practices and guidelines for electronic documentation and action. Methods for bilateral commercial contractual communication are in place, but multilateral methods are still being studied. How to enable electronic patient records, patent documentation, or copyright registration, and how to ensure privacy, confidentiality, protection of proprietary information and the management of similar information-related risks, are the subjects of active research at the interface of sociology, policy and technology.
Current Research & Its Promise
While research continues on each new medium, to establish its life and the best conditions for its storage and use, the research agenda has moved beyond storing bits with the growing acceptance that the only way to preserve electronic data across time is to periodically copy (refresh) the information to new storage media and, at appropriate times, to new formats. Leadership in these technical means of preserving bits has belonged to the National Media Laboratory, a spin-off of the 3M company and the contractor used by Federal projects and by the National Institute of Standards and Technology, which establishes tests for media. Considerable research in recent years has focused on how to determine the right time for media conversion, how to choose appropriate new media, and how to predict long-term costs. While this research is important to computer operations, it does not contribute specifically to arts and humanities computing.
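The refresh cycle described above (copy the bits to new media, then confirm that nothing was lost in transit) can be illustrated with a short sketch. It is a minimal example only, assuming files on a mounted filesystem and using a modern SHA-256 digest as the fixity check; the function names `sha256_of` and `refresh` are hypothetical, and conversion to new formats is not shown.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, new_medium_dir: Path) -> Path:
    """Copy a file onto new storage and confirm that the bits arrived unchanged.

    Hypothetical helper: new_medium_dir stands in for the new storage medium.
    """
    before = sha256_of(source)
    new_medium_dir.mkdir(parents=True, exist_ok=True)
    target = new_medium_dir / source.name
    shutil.copy2(source, target)  # copies the bytes plus basic file metadata
    after = sha256_of(target)
    if before != after:
        # Discard a corrupted copy rather than keep a silently damaged record
        target.unlink()
        raise OSError(f"fixity check failed for {source}")
    return target
```

A matching digest shows only that the bits survived the copy; it says nothing about whether the copy is still a usable record, which is the concern of the research discussed next.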
The issue of the authenticity of records, on the other hand, is at the heart of all humanistic scholarship. If we do not know the context in which information was created, and who participated in its creation, many of the questions of greatest interest to historians, philosophers, linguists, and creative artists are unanswerable. Contemporary electronic information systems generally do not create or store records satisfying these criteria. Not surprisingly, research into methods of ensuring the creation and retention of electronic evidence is a hot topic in archives, museums and electronic libraries. The most important research in this area has focused on the functional requirements for records. It has appeared under the corporate names of the National Archives of Canada (principally John McDonald [1]) and the World Bank and, more recently, of the University of Pittsburgh (David Bearman, Richard Cox, and Ken Sochats [2]). It is recognized in the published research of the Rand Corporation (Tora Bikson [3], Jeff Rothenberg [4]) and the Dutch Ministry of the Interior (Peter Waters [5]). This research joins a recent thread of discussion and debate in the library community regarding what Peter Graham (Rutgers University) has called "Intellectual Preservation". Although this concern is the focus of discussion in the RLG/Commission on Preservation and Access Task Force on Digital Archiving [6], it is not at present the subject of original research in the library community.
Current research on software dependence and interoperability is largely not driven by archival concerns and takes a relatively short-term view of the requirement to preserve functionality. Little research has been done on modeling the information loss that accompanies multiple migrations, or on the risks inherent in using commercial systems before standards are developed, yet these are the critical questions being posed by archives. Little in these studies specifically addresses the humanities, except that the humanities are particularly heavy users of old documentation and thus especially need to develop means of overcoming system dependencies in data.
Margaret Hedstrom (New York State Archives) [7] and the University of Pittsburgh project [8] have led the way in exploring the social and legal guidelines for electronic records management. The Association for Information and Image Management has sponsored conferences and a task force examining these issues, and the Center for Electronic Law at Villanova University (principally Henry Perritt) has also examined them. There has been substantial research in the realms of electronic laboratory notebooks and electronic patient records, but surprisingly little research has been done to identify the critical dimensions of archiving for program audits in areas like decision support systems, groupware and team support systems, or even traditional "management information systems" or project management environments.
Related areas of research include:
The metadata required for recordness, and the means to capture these data and ensure that they are bonded to electronic communications, constitute the most significant area for research in the near future. The announcement by the National Institute of Standards and Technology of a proposed Federal Information Processing Standard (FIPS) for "Record Description Records" [11] could stimulate immediate research, as could the proposal by Standards Australia [12], which is based on the University of Pittsburgh research. Continued investigation of mechanisms to specify metadata-encapsulated objects (D. Bearman, K. Sochats [13]) and to capture them in implementations (M. Hedstrom, J. McDonald, P. Waters) is most promising.
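The idea of bonding metadata to a communication can be made concrete with a small sketch of a metadata-encapsulated object. This is an illustration under stated assumptions, not the model specified in the cited research: the class `EncapsulatedRecord`, its field names, and the `seal`/`verify` helpers are all hypothetical, and a plain SHA-256 digest over a canonical JSON form stands in for the binding. A digest detects alteration of either content or metadata, but proving who created the record against a deliberate forger would require a digital signature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EncapsulatedRecord:
    """Content bundled with descriptive metadata (all field names hypothetical)."""
    content: str
    creator: str
    business_function: str
    created: str  # ISO 8601 date, e.g. "1995-10-13"

    def seal(self) -> str:
        """Digest over metadata and content together, in a canonical JSON form."""
        # sort_keys and fixed separators make the serialization deterministic
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record: EncapsulatedRecord, expected_seal: str) -> bool:
    """True if neither the content nor any metadata field has changed since sealing."""
    return record.seal() == expected_seal
```

Because the digest covers the metadata as well as the content, changing only the recorded creator or business function breaks verification just as surely as editing the text does; that is the sense in which the metadata is bound to the record rather than stored beside it.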
Over the next five years, specifications for workgroup tools and electronic office environments will need to have these methods built in. Large-scale networks, and the acceptance of electronic transactions as the preferred means of intra-corporate communication, will depend on methods of uniquely identifying messages, controlling access to and use of them, and decoding their structure, context and content. As the scientific community has come to realize (NRC, Preserving Scientific Data on our Physical Universe, 1995 [14]), standard metadata, grounded in a continually updated understanding of disciplinary perspectives, is essential to documentation for the future. Unless generic, scalable approaches for representing humanistic points of view are developed soon, the history of modern societies in the late twentieth century will be extremely incomplete, to the detriment of future scholarship in all humanities fields (Getty AHIP, Sledge et al.).
Ongoing applied research on the archival significance of dynamic documents, object-oriented software environments, and interoperability is needed in the medium term. There is very little active work in this area, but the potential benefits to archives would be substantial if even such basic questions as how best to avoid loss of functionality in software migrations were answered. Solutions to most of these problems will need to involve collaborations in which archival participants and potential future users are partners with technologists. Such research projects can be expected to be relatively costly and of extended duration, and will be ongoing as new functionality is propagated. Yet we can hardly imagine the widespread acceptance of interactive documents, multimedia and visualizations within traditional communications unless such software independence can be achieved.
Within organizations, archivists must find automatic means of identifying the business process for which a record is generated. Such data modeling will become increasingly critical in an era of ongoing business re-engineering. If records are retained for their evidential significance and for a period associated with risk, then certain knowledge of their functional source is essential to their rational control. If they are retained for long-term informational value, knowledge of context is necessary to understand their significance. Work in these areas will be stimulated by standards such as those drafted by Standards Australia [15] and NIST [16] in the spring of 1995.
Concrete work on social and legal issues would be best focused on identifying warrant for archival functional requirements in professional and organizational practices, and on locating required changes in law, both in such areas as privacy, freedom of information, and protection of proprietary rights, and in applications such as electronic patient records, electronic laboratory notebooks, and contractually binding electronic communications and commerce. While progress can be expected in all these areas anyway, a concerted research agenda would coordinate findings, hasten the arrival of the fully electronic society, and enable realization of the benefits of electronic records within the next decade. Much work on the attributes of electronic business systems is being conducted in these areas, but it is presently little informed by professional archivists.
Ultimately, we must research the use of electronic records after their value for accountability has been realized. How and why are they used? What value does the information they contain have for users, and is the value of information in records created for other purposes commensurate with the value of information contained in self-consciously created information sources, such as books and articles? What do we need to know about the content of records to support the discovery and retrieval of billions of them across heterogeneous environments? What does the subsequent use of records itself tell us about the nature of society in the years since the creation of the record and the transaction it documents? Archivists could take the lead here, but little substantive research has been undertaken to date except in the area of defining the requirements for Networked Information Discovery and Retrieval (C. Lynch et al., CNI Study team; G. Marchionini, U.Md.).
It is now evident that we can envision a world in which virtually all records are digital, including much of the knowledge of the past. How can we make our solutions to the retention, access and preservation of the world's digital cultural heritage scalable? What cost efficiencies can we achieve over keeping paper records and making them available through physical libraries, archives and museums when we are deploying systems of distributed control and access spanning all records of all societies? Research in the future will need to focus on a variety of implementation issues having to do with intelligent information seeking, end-to-end delivery and migration of data on a universal scale (B. Kahin et al., Harvard University). Again, very little has been done in this area, although recent progress in implementing Government Information Locators using the Z39.50 protocol [17] suggests some of the potential for a Global Information Infrastructure locator and document delivery service (E. Christian, USGS).
References to selected works by researcher teams and organizations cited:
Earlier Research Agendas/Overviews:
Physical Care:
Functional/Logical Control:
Social and legal guidelines/new organizational arrangements:
Proposed Standards:
Last revised: 1995-10-13