The law remains an ass: copyright and data protection in digitization projects

Alistair Dunning, Arts and Humanities Data Service

In articles on nearly anything to do with the Internet, the terminology of "looking at" or "browsing" a website or data resource is employed. This is slightly misleading, given that most websites operate by copying and sending the information that the user requested. Thus, when users look at a website they are actually exploiting copies of resources that now exist on their own computers, rather than looking at a unique website being sent from a distant source.

This, of course, has an effect on the scholar wishing to create and disseminate digital resources. As soon as the resource is published in electronic form, it proliferates wildly. Thus, many scholars embarking on digitization projects (converting novels, engravings, audio recordings, maps etc. to digital form) need to grasp the copyright nettle early on, securing the necessary permissions that will allow the project to proceed without any possibility of a legal sting later on.

At first glance, copyright law seems straightforward: an image or a text is in copyright for the life span of its creator plus seventy years. But, and this is where the title of my talk becomes relevant, there are bizarre sub-clauses. One slightly obscure example is that of James Barrie's Peter Pan. The text has an unlimited copyright span, so that Great Ormond Street Hospital, the copyright holder, can still derive an income from it.

More seriously, anyone dealing with French material usually has to wait not an extra seventy, but eighty-one years before the copyright expires (the French do not accept the war years as being included in the seventy years adhered to by other EU countries). And a scholar dealing with early to mid twentieth-century photography can find that the person who took the photograph is not necessarily the person who owns the copyright. Elsewhere, those dealing with the performing arts can find that even the most humble player or behind-the-scenes worker in a grand theatrical production can have a share of the rights; indeed of any modern Shakespearian performance, the only person who does not hold any copyright is Shakespeare himself. More generally, copyright law tends not to be retrospective; those dealing with material created in, for instance, 1938 need to consult the relevant legislation from 1911. And even those digitization projects dealing with objects that have fallen out of copyright (a medieval manuscript, a Renaissance fresco) may need to be careful; contemporary photographs or transcriptions of those objects may have copyright attached to them.
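
The basic arithmetic can be sketched in a few lines of Python. This is only an illustration of the two figures quoted above (seventy years after the creator's death, and the eighty-one years cited for French material); the function and constant names are hypothetical, and the sketch deliberately ignores the special cases described in this talk, such as unpublished works, earlier legislation and separate rights in photographs or performances. It is not a substitute for legal advice.

```python
# Post-mortem terms in years, taken from the figures quoted in this talk.
# These constants are illustrative assumptions, not a statement of the law.
STANDARD_TERM = 70
FRENCH_TERM = 81


def last_year_in_copyright(death_year: int, term_years: int = STANDARD_TERM) -> int:
    """Return the last year in which the work is still in copyright,
    using the simple 'life plus term' rule and nothing else."""
    return death_year + term_years


def in_copyright(death_year: int, current_year: int,
                 term_years: int = STANDARD_TERM) -> bool:
    """True if, under the simple rule, the work is still protected."""
    return current_year <= last_year_in_copyright(death_year, term_years)


# A hypothetical creator who died in 1930, checked in 2003.
print(in_copyright(1930, 2003))               # seventy-year rule
print(in_copyright(1930, 2003, FRENCH_TERM))  # eighty-one-year rule
```

Under these assumptions the same work would be free to digitize under the seventy-year rule but still protected under the eighty-one-year rule, which is why the jurisdiction of the material needs to be established before anything else.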

Identifying what copyright subsists within an historical object is only the first of three stages in clearing it. The second problem is that it is often difficult to trace copyright holders, especially for work that is unpublished or anonymous. The law here can be unhelpful too: legislation is very much on the copyright holder's side, leaving the potential digitizer with the burden of responsibility in locating and negotiating with the relevant people.

A good example of this is the Darwin Correspondence Project, based at Cambridge University and in the US. The project is creating printed and electronic versions of all letters sent to and from Darwin. Surely, you would think, as Darwin was writing in the nineteenth century, the copyright has expired? Not so, lawyers advising the project stated: the letters are deemed unpublished and therefore remain in copyright until 2039. While realizing that the actual chance of litigation is small, the Darwin team had to initiate a copyright arm within its digitization project. Securing permissions for the letters sent by Darwin was easy (the copyright holder, a descendant of Darwin, was involved in the project), but the project has begun a process of tracing and liaising with copyright holders for the letters of the nearly 2,000 people who wrote to Darwin. As you can imagine, this is no small job, but it is an essential part of the project. While they have not traced all copyright holders, such good intentions have allowed them to go ahead with the digital preparation of the letters, reasonably safe in the knowledge that such procedures would stand them in good stead in case of any litigation.

The third stage of securing permission involves developing a licence form. Projects may often require specialist legal advice to write up the necessary documentation. Negotiating with rights holders may also involve diplomatic skills. Some rights holders assume that a degree of financial recompense is in the offing, and it is essential to make sure that they remain well disposed whilst alerting them to the fact that projects cannot usually afford to pay rights holders. Projects also need to deal tactfully with owners of the objects, for often they are not the owners of the copyright, despite their protestations to the contrary.

The nature of electronic resources also means that there is a greater need for licence forms to consider the future existence of the digital object. A digital object can easily take on roles beyond those originally expected of it. How will a digital image be edited? Will it be cropped, re-touched, re-coloured? What kind of security will be provided for the disseminated resource? Will there be watermarking or password-protection? Additionally, will any other organizations be involved? Will a preservation copy of the digital resource be lodged at a suitable data archive, such as the Arts and Humanities Data Service? And does this archive need to be included in the licence form? A project manager has to think through the long-term aims of his or her project in order to guard against having to contact rights holders later on in its life span.

Data protection (most recently the Data Protection Act of 1998) is the other issue with the potential to influence the work of humanities scholars. Databases and other electronic resources often reveal personal details about living people, and consent needs to be obtained before such resources are made public. Exemptions in the law mean that scholars can make use of such databases, but they must ensure that any results are anonymized prior to digital publication.

In some cases, this exemption is sufficient to allow historical projects to advance. In other cases, however, it causes problems, and projects need to go through processes similar to those of the Darwin project outlined above: establishing, locating and negotiating with the persons featured in the resource in order to avoid possible legal difficulties, or further anonymizing the data to make it safe for public usage. Again, dedicated legal advice is often required (many universities now have data protection officers in place) to clarify the ambiguous points. In many cases, as with copyright, a lack of defining case law makes it difficult to take firm decisions: there is no clear case law, for example, defining whether a person has given clear, explicit, express or implied consent to the use of their data for a research project.

A good example of the influence of the Data Protection Act is the experience of a project entitled the Newcastle Electronic Corpus of Tyneside English. The project, run by the universities of Sheffield and Newcastle, is aiming, in part, to digitize recordings of interviews with Tyneside citizens, originally made on tape in the nineteen-sixties. As with the Darwin project, procedures for coping with the legal issues had to be built into the project structure. Project staff had to investigate whether interviewees had given consent to the further exploitation of the recorded material. Documentation surrounding the original recordings was studied in detail, and even though this was generally positive, and offered an implied consent, the interviewees often talked on issues designated as sensitive in the Data Protection Act (health, voting preferences etc.). With some of the interviews, therefore, it was obvious that disseminating the recordings would be breaking data protection laws, necessitating, in many cases, attempts to locate the interviewees, almost forty years after the original recordings were made.

One other issue relating to data protection concerns the life of future databases. If, for reasons of data protection, databases are handed to digital archives in anonymous form, future historians are deprived of the full richness of the database. Key features, such as personal names, dates of birth and death, addresses and names of relatives, will be stripped out, impoverishing the historical record. It may be that the real difficulties faced by historians in this area will not be now, but 100 years in the future.
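
To make the loss concrete, here is a minimal Python sketch of what stripping those features does to a single record. The field names and the sample record are invented purely for illustration and do not come from any real project; note also that removing direct identifiers in this way is only an assumption about how a project might anonymize, and does not by itself guarantee that a person cannot be re-identified.

```python
# Fields treated as personal details, following the list in the text above.
# The field names themselves are hypothetical.
PERSONAL_FIELDS = {
    "name",
    "date_of_birth",
    "date_of_death",
    "address",
    "relatives",
}


def anonymize(record: dict) -> dict:
    """Return a copy of the record without the personal fields.
    The original record is left unchanged."""
    return {key: value for key, value in record.items()
            if key not in PERSONAL_FIELDS}


# An invented record, not real data.
record = {
    "name": "Example Person",
    "date_of_birth": "1901-01-01",
    "address": "1 Example Street",
    "relatives": ["Example Relative"],
    "occupation": "shipwright",
    "interview_year": 1969,
}

archived = anonymize(record)
print(sorted(archived))  # only the non-personal fields remain
```

What reaches the archive in this sketch is an occupation and a year: usable for statistics, but of little help to a future historian trying to trace a family or a household.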

To return to the two case studies I have cited, the genuine likelihood of litigation for breaching the relevant laws is minimal; people tend not to care about interviews they gave forty years ago, or letters written by long-dead relatives. Nevertheless, the theoretical possibility of litigation means that projects such as these have had to incorporate an additional element in their work, eating up valuable time and money that might be better spent on actual research. Some, but thankfully not all, digitization projects will discover that the law remains an ass, and a stubborn one at that.

July 2003

