EU Commission sets out plan to allow free data mining

Researchers will be given freedom to interrogate and search articles currently locked behind journal paywalls. But scientists want faster progress and publishers say the Commission should move carefully when proposing changes to copyright law

The European Commission has proposed a new regulation which would clear the way for researchers to perform text and data mining, as part of a broad update of European copyright rules.

Under the proposed new rule tabled for early next year, researchers in Europe would have free rein to use computer programmes to search journals, a practice which is tightly controlled by publishers. To date, the UK is the only country that has exempted automated computer crawling from copyright law.

This will benefit public interest research organisations, while taking into account the impact of a new law on the publishing market, the Commission claimed.

Researchers have battled for years to get permission from publishers to use programmes that can extract data from thousands of publications to which their institutions subscribe, arguing that “the right to read is the right to mine.” On this view, computer “reading” should require no higher level of permission than human reading.

Over the past year, Research Commissioner Carlos Moedas has been pushing for the right for researchers to mine papers unhindered, and he hailed the proposal as a staging post in moves to completely open up access to the outputs of publicly-funded research.

“I have strongly supported a copyright exception for our researchers and innovators because they should be given the best conditions to do their jobs. The exception proposed today will be pivotal in spurring innovation and growth in Europe,” Moedas said. “We also need to ensure that Europe does not fall behind other regions of the world, where text and data mining is already made easy.”

Concrete proposals

While researchers broadly welcomed the announcement, they feel there is still some way to go.

Kristiina Hormia-Poutanen, president of Liber, an association of European research libraries, said researchers need legal certainty. “The Commission communication implies that there will be a mandatory copyright exception for text and data mining, which we very much welcome. Liber will hold the Commission to its word on this when concrete proposals are launched in 2016,” she said.

“I am happy with the commitment, but there is much to be done in the next six months to ensure that there will be proper legislation on text and data mining,” said Chris Hartgerink, a statistician with Tilburg University in the Netherlands.

However, the League of European Research Universities (LERU), which alongside Liber has campaigned for new rules, was less enthusiastic, saying it is disappointed the legislative programme is not addressed in a “more convincing and coherent way”.

While the Commission says it is ‘assessing options’ and will ‘consider legislative proposals’, “it has had years to consider options and proposals. It is time to stop considering and to start implementing,” LERU said.

The Commission needs to define ‘public interest research organisations’, said Christoph Bruch, adviser at the Helmholtz Association Open Science Coordination Office and member of Science Europe, another vocal campaigner. 

Proportionate changes to copyright law

Publishers said they welcomed the Commission’s pledge to study the effects of new rules on their market. 

“Given the central importance copyright plays in the remuneration of authors and publishers, it is a relief to see that the Commission is beginning to appreciate the need to move carefully and proportionately when proposing changes to copyright law,” said Richard Mollet, chief executive of the UK lobby group, the Publishers Association. “The commitment that the proposals will take into account relevant market situations and licensing practices is particularly very welcome.”

Mollet called for more clarity on the term ‘public interest research organisations’, saying: “This is an incredibly broad term which could encompass many commercial businesses [which] are well able to pay for licences, and in fact already do so.”

Although there is no clear explanation in the proposal, a Commission official said the term ‘public interest research organisations’ covers all universities, research institutes (both public and private), foundations and libraries.

It does not include private companies performing data mining on their own behalf. “Big companies can afford to pay publishing licences,” the official said.

It is not clear what the position of freelance researchers and citizen scientists will be. Liber appealed for “a less constricted system for the benefit of all stakeholders including citizen scientists.”

The Commission has found it difficult to define exactly who will benefit from the exemption. For a while the proposal made a distinction between ‘commercial and non-commercial’ institutes, but this was considered too rigid and confusing.

Impact assessment

Hartgerink said any assessment the Commission makes of how new rules affect the publishing market needs to be stringent. “The importance of evidence-based impact assessment is crucial because arguments are being put forth that have no proper foundation,” he said.

Hartgerink was recently blocked by the scientific publisher Elsevier from a mass download of papers. He subsequently wrote a blog post venting his frustrations, which has since circulated widely on social media.

“The argument Elsevier put forth for not allowing me to mine their articles was server overload, which I have proven they should be able to handle easily, considering it was less load than a YouTube video,” he said.

While publishers have lowered the barriers to their journal databases, they have not gone fast enough or far enough for researchers.

Some publishers have resisted liberal text and data mining rules, fearing the re-distribution and selling of their articles. Others have reported little interest in text mining from researchers.

Currently, publishers block data-mining software by default and instead grant special licence permissions to academics and university libraries. Many work with the Copyright Clearance Center to give researchers access to content for text mining.