An Open Letter in Support of Palaeontological Digital Data Archiving


Dear Reader,


The Internet offers an unprecedented opportunity to disseminate palaeontological research data, in all its forms, far and wide, improving accessibility, transparency, innovation, synthesis, education and outreach.

This is obvious, but we believe journals are not yet making full use of it. Significant barriers to data re-use are still all too common.

To quote the Panton Principles:

"Science is based on building on, reusing and openly criticising the published body of scientific knowledge."

We therefore believe that truly scientific palaeontology should open up the data underlying its many and varied publications by making all analysed data (as far as is technically possible) available in appropriate, usable, digital formats no later than the day of publication, to fully enable and even actively encourage validation, replication, repurposing, re-analysis and synthesis of published works.

For this reason, we strongly and wholeheartedly advocate that funding bodies, academic societies, research journals and individual researchers act now to make digitally archiving research data in appropriate, freely accessible databases a 'normal' part of the publication process.




Ross Mounce (University of Bath), Aodhán Butler (Uppsala University), Katie Davis (NHM), Alex Dunhill (University of Bristol), Russell Garwood (Imperial College London), James Lamsdell (University of Kansas), David Legg (Imperial College London), Graeme Lloyd (NHM), Michael Pittman (UCL), Rachel Warnock (University of Bristol), Jo Wolfe (Yale University)

Add your name to the list in support of our statement



1. When using the term 'research data' we embrace a wide definition encompassing every form of data that is digitised and analysed as the basis of a research publication. In the specific context of palaeontology this includes stratigraphic occurrences, nomenclatural statements and taxonomic data (we note that the palaeontological community commonly archives these data digitally on PaleoDB, albeit voluntarily); photographic images (which can be digitally archived in colour, and in better quality than the paper/PDF publication version); phylogenetic data, including character-by-taxon matrices, cladograms (trees) and associated statistics; specimen measurement data; CT scanning and other non-photographic imaging data; and many more.

2. Few, if any, technical barriers exist that could restrict the digital archiving of ALL palaeo-research data. The desired data are often very small, typically just kilobytes (kB) for specimen measurements and phylogenetic data. Photographic images can be larger, requiring megabytes (MB), but the cost of hard-disk storage is now so low that large-scale archiving of these is a simple matter. Even much larger data such as CT scans, which can be many gigabytes (GB), can be freely shared using peer-to-peer file sharing, as practised by BioTorrents, which distributes the load.

3. To fully enable Reproducible Research we need to look 'beyond the PDF' and archive data in its 'natural' digital formats instead. For example, 'Hennig', Nexus, NeXML and PhyloXML formats are appropriate for phylogenetic data matrices; Newick format is appropriate for cladogram topology data; spreadsheet files or, more simply, tab-delimited or comma-separated value (.csv) plain-text formats are appropriate for specimen measurement data; and photographic images should be archived in full-colour, high-resolution, non-proprietary formats so that those who do not have physical access to the specimen can still see a fair degree of detail. Indeed, we should also bear in mind that the wider tax-paying, non-scientist community may also wish to see these photos (and other data), which in many cases they have arguably paid for via tax-raised money channelled into government funding bodies [see for more]. By archiving our data online, we can proudly demonstrate real output from the work we've been doing.

4. We note that several funding bodies already explicitly suggest that data output from their funded projects should be made publicly and freely accessible, e.g.:
NSF - "Grantees are expected to encourage and facilitate such [data] sharing"
NIH - "NIH believes that data sharing is essential"
BBSRC - "BBSRC is committed to getting the best value for the funds we invest and believes that making research data more readily available will reinforce open scientific enquiry and stimulate new investigations and analyses"

5. The Journal of Vertebrate Paleontology (SVP), and Paleobiology and Journal of Paleontology (PalSoc), have already made some progress towards instigating digital data archiving plans. However, we note that the Journal of Paleontology has not yet archived any data with Dryad. To our knowledge, no other palaeontology journals have yet mandated digital data archiving in centralised databases as part of editorial policy.

6. A list of relevant databases and data initiatives for the palaeontological community:
PaleoDB, MorphoBank, BioMesh, TreeBASE II, BioTorrents, DigiMorph, MorphologyNet, MorphBank, Dryad, The Open Dinosaur Project, Zoobank, Cladestore, FigShare
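
As an illustration of the plain-text formats mentioned in note 3, the following is a minimal sketch in Python of how specimen measurements might be written as comma-separated values and a cladogram topology as a Newick string. The specimen identifiers, measurement values, column names and taxon names are all hypothetical placeholders invented for this example, and the helper functions `to_csv` and `to_newick` are our own illustrative names rather than part of any existing tool.

```python
import csv
import io

# Hypothetical specimen measurements (illustrative values, not real data).
measurements = [
    {"specimen": "SPEC-001", "femur_length_mm": 412.5, "femur_width_mm": 58.0},
    {"specimen": "SPEC-002", "femur_length_mm": 398.0, "femur_width_mm": 55.2},
]


def to_csv(rows):
    """Serialise a list of dicts as comma-separated plain text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_newick(tree):
    """Serialise a nested tuple of taxon names as a Newick topology string."""

    def render(node):
        # A string is a leaf (a taxon name); a tuple is an internal node.
        if isinstance(node, str):
            return node
        return "(" + ",".join(render(child) for child in node) + ")"

    return render(tree) + ";"


# Hypothetical topology: Taxon_A and Taxon_B are sister taxa.
cladogram = (("Taxon_A", "Taxon_B"), "Taxon_C")

csv_text = to_csv(measurements)
newick_text = to_newick(cladogram)

print(csv_text)
print(newick_text)
```

Both outputs are ordinary text that can be opened in any editor, which is the main reason such formats suit long-term archiving alongside a publication.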

