Parallel Session B

Data and Publications

Scholarly data, which in the publishing context means content that backs up and/or provides the source material for articles in journals, is a topic of considerable interest. You do not need to buy into talk about the fourth paradigm to see that digital data is likely to be of growing importance in the research process, particularly in how it is curated and how it is linked to publications. The three speakers in this session will each attempt to show how data relates to scholarship and publications in practical terms.

An Introduction to the current debate about data

Anthony Watkinson, Senior Lecturer, Department of Information Studies, University College London, UK

The intention of this presentation is to frame the current state of play. There are a number of different stakeholders involved, including librarians and publishers. In particular, the digital curation movement in the library sector and the conclusions of the NISO/NFAIS project on supplemental data will be explained and analysed. The fact that scholars in many sectors are doing their own thing cannot be ignored, but there are many disciplines where the building of big databanks is not a realistic proposition and serious help from information professionals is needed.

Data’s Progress – and its implications for scientific publishing

Fiona Murphy, Earth & Environmental Sciences, Wiley-Blackwell, UK

This presentation has had to be withdrawn as Fiona Murphy is unable to attend: key points will be covered by Anthony Watkinson in his talk.

Improving the scientific record: data citation and publication

Sarah Callaghan, Project Manager for the NERC Data Citation and Publication project, Oxford, UK

Data forms the foundation on which scientific results and conclusions are based; yet in the traditional academic publishing process, only the methods and conclusions resulting from a given data set are scrutinised by a process of peer review. Scientists are benefiting from new methods which make it easy to collect and record data in quantity, while at the same time they have to deal with the increased complexity of their datasets, which requires ever greater amounts of documentation to ensure that the dataset is comprehensible even a short time after its collection.

Curating, archiving and managing data are difficult jobs, and most data-producing scientists have neither the time nor the inclination to focus on them. Yet they are essential, both for the completeness of the scientific record, and to ensure that research data (which often cost a significant amount of money and can be irreplaceable) can be reused for other purposes. Citation and publication of datasets will provide formal recognition and academic credit and will encourage data-producing scientists to document and deposit their data appropriately. This presentation will describe how the mechanisms for citing and publishing data were created and implemented in the environmental data centres of the UK’s Natural Environment Research Council.

It is great to publish your data, but what about the standards?

Christiaan Sterken, University of Brussels, Belgium

This presentation deals with some general conceptual aspects of standards and standardisation in the broadest sense. Examples of good and bad standards will be presented, then the requisites of any standard system will be underlined.

Publishing in the digital environment offers unprecedented facilities for bringing out data of all sorts: experimental measurements, raw data, graphs, calculations, computer code, applets, etc. Unfortunately, the standards against which measurements are calibrated are often omitted because, traditionally, publication of standard data has always been bulky and expensive. As a result, seemingly precise data (in the statistical sense) are in reality often inaccurate, because the standards of measurement are not given. This situation has led to a widespread lack of awareness of standardisation.

But providing the standards is not sufficient: standards must abide by some basic rules, viz., they must be coherent and constant in time, they must be expressed in unambiguous units, and they should occupy the same parameter space as the experimental data. Last, but not least, procedures for quality control of published and archived data must be set up. Not adhering to these principles leads not only to failures, but also to casualties, especially in the medical and pharmaceutical sciences.

Proper standardisation is also of prime importance in the world of publishing, and in academic evaluation via bibliometric indices that are based on citation counts. In the same vein, free access to standards should be offered, operational procedures should be accurately described, and appropriate teaching in the use and the production of standards should be provided.