A very important piece of work from RIN – about the critical need for data. Peter Suber has summarised it but you have to read it. This study should be at the top of every science funder’s reading list. The research – carried out by Key Perspectives (aka Alma Swan) – is thorough and compelling. But you don’t need me to tell you – just read it. (The RAE doesn’t come out with glory.)
The Research Information Network (RIN) has released a new study, To Share or Not to Share: Publication and Quality Assurance of Research Data Outputs, June 2008. The study was commissioned by RIN and executed by Key Perspectives. From the executive summary:
…There are two essential reasons for making research data publicly-available: first, to make them part of the scholarly record that can be validated and tested; second, so that they can be re-used by others in new research.
This report presents the findings from a study of whether or not researchers do in fact make their research data available to others, and the issues they encounter when doing so. The study is set in a context where the amount of digital data being created and gathered by researchers is increasing rapidly; and there is a growing recognition by researchers, their employers and their funders of the potential value in making new data available for sharing, and in curating them for re-use in the long term….
We gathered information on researchers’ attitudes and data-related practices in six discrete research areas – astronomy, chemical crystallography, classics, climate science, genomics, and social and public health sciences – and two interdisciplinary areas – systems biology and the UK’s rural economy and land use programme. The primary methodology used was interviews with over 100 researchers, data managers and data experts….
Key findings….
3. …The convention in many fields is that derived or reduced data – as distinct from raw data – are what is made available to other researchers. Providing access to raw data is relatively rare, though it may be the most effective means of ensuring that the research is reproducible. But there is discussion in some fields about the lack of access to raw data.
4. Many datasets of potential value to other researchers and users – particularly those arising from small-scale projects – are not managed effectively or made readily-accessible and re-usable….
5. Many research funders are putting policies in place to ensure that datasets judged to be potentially useful to others are curated in ways that allow discovery, access and re-use. But there is not a perfect match between those policies and the norms and practices of researchers in a number of research disciplines….
10. Some researchers are motivated to publish their data by factors such as altruism, encouragement from peers, or hope of opening up opportunities for collaboration. But the lack of explicit career rewards, and in particular the perceived failure of the Research Assessment Exercise (RAE) explicitly to recognise and reward the creating and sharing of datasets – as distinct from the publication of papers – are major disincentives.
11. Many researchers wish to retain exclusive use of the data they have created until they have extracted all the publication value they can. When combined with the perceived lack of career rewards for data creation and sharing, this constitutes a major constraint on the publishing of data. Other disincentives include lack of time and resources; lack of experience and expertise in data management and in matters such as the provision of good metadata; legal and ethical constraints; lack of an appropriate archive service; and fear of exploitation or inappropriate use of the data.
12. Some publishers are taking steps to underpin the scholarly record by creating persistent links from articles to relevant datasets; and this signposting is viewed positively by researchers.
13. Relatively few researchers have the expertise, resources and inclination to perform themselves all the tasks necessary to make their data not only available, but readily accessible and usable by others.
14. …Datasets on journal websites are commonly in PDF format which is unsuitable for meaningful re-use.
15. Other obstacles to locating and gaining access to datasets produced by researchers and other organisations include inadequate metadata, refusal to release the data; the need for licences (which may restrict how the data may be used or disseminated) and/or for the payment of fees; or the need to respect personal and other sensitivities.
16. Effective use of raw scientific data in particular may require access to sophisticated specialist tools and technologies, and high level programming skills….
Conclusions and recommendations….
3. Research funders and institutions should seek more actively to facilitate and encourage data publishing and re-use by [using the following 10 strategies]….
5. Publishers should wherever possible require their authors to provide links to the datasets upon which their articles are based, or the datasets themselves, for archiving on the journal’s website. Datasets made available on the journal’s website should wherever possible be in formats other than pdf, in order to facilitate re-use.
6. Researchers and publishers should seek to ensure that wherever possible, datasets cited in published papers are available free of charge, even if access to the paper itself depends on the payment of a subscription or other fee.
7. Funders, researchers and publishers should seek to clarify the current confusion with regard to publishers’ policies with regard to allowing access for text-mining tools to their journal contents….
PS: For background, see our post from June 2007 on the launch of this study.
Unfortunately, sometimes it is not only the data that is restricted; you also have to be very careful not to point out an error in the data, because that might invite the wrath of the company hosting the data: http://www.simbiosys.ca/blog/2008/06/18/public-apology-to-ccdc/
ZZ