Data sharing and Quixote meeting (Zaragoza)

In a few minutes I am talking to a group of chemists, other scientists, computational scientists, informatics specialists, IR managers, etc. at ZCAM (computational chemistry) in Zaragoza. This is a very exciting project and we hope not only to talk but actually to do things today.

Rather than use PowerPoint, I blog my materials. Much of this is in previous blog posts, but this post adds an overview of what I might say and some of the materials I might use. What I actually say depends, as always, on what has already been said (and not said) and on the interests of the people present.

My motivation

[With ChEBI, Christoph Steinbeck] Compute properties (spectra, conformations, reactivity) of compounds in the human metabolome.

Quixote…

Open to all – no central ownership (cf. Wikipedia). Not my project, but OURS
Very cost-effective with a high potential for success
A long-tail discipline, with discrete data.

Data Sharing

Must be driven by scientists (researchers, editors)
Should be domain-specific

Why share data?

To promote MY work and receive credit (data citation)
To save MY work
To share MY datasets with ME (i.e. look for patterns, correlations)
To share MY datasets with MY colleagues
To share MY datasets with the world
To improve methodology
To validate science

What are the problems?

People want to use their results as intellectual capital
People can sell their data for money
It takes effort and money
It challenges established interests (priesthood, market)
Chemists are more conservative than many disciplines

Why/how will it happen?

Because individuals (e.g. grad students) find it useful
Because groups find it useful
Because journals find it useful enough to mandate
Because funders require it
Because developers (e.g. of programs) find it useful

What should we do today?

Make a wish list for compchem data sharing
What is possible right now?

Resources related to Data Sharing

Recent blogs by PMR

criteria-for-datasharers/
data-repositories-for-long-tail-science-setting-the-scene/
data-publication-some-replies/
publishing-data-the-long-tail-of-science/

Data repositories
Dryad A repository for data, especially biosciences
2011/07/28/uk-parliament-report-supports-dryad-and-data-access/
JISC and Dryad
How to deposit in Dryad
Figshare – Mark Hahnel’s community for sharing figures and data
Dataverse – publicly visible (not fully open) datasets in social science
dataverse deposit-terms
Authors in Figshare
Figures in Figshare
An author in Figshare
Proliferation_of_PBMCs,_expressed_as_stimulation_indices_ in Figshare
Polymerase_chain_reaction_data_from_matrix_metalloproteinase in Figshare

Validation of Science requires data
http://pubs.acs.org/doi/suppl/10.1021/jo200117p
Spectral data should be digital. The Mestrec blog (NMR software) argues that if the data had been digital, the potential fraud would have been detected.
The PDF of the controversial data. These data cannot easily be understood by machines, so automated validation is impossible.

CML, Quixote and Crystaleye Data sharers
Crystaleye home page
Crystaleye2 data sharer
Quixote data sharer
CML resources (dictionaries and conventions)
COMO Chem Eng knowledgebase

SPARQL query for Crystaleye2

[This will be used interactively with crystaleye2. Try it under SPARQL. It’s very new. If it works, congratulate Sam.

If it fails, maybe the server is down, or blame me.]

Report only structures with R values less than 0.02:

PREFIX cif: <http://www.xml-cml.org/dictionary/cif/>
SELECT ?uri ?rfactor {
  ?uri cif:refine_ls_r_factor_gt ?rfactor
  FILTER (?rfactor < 0.02)
}
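For anyone who would rather script this than paste it into a web form, here is a minimal Python sketch of sending the same query to an endpoint. It assumes the server follows the standard SPARQL 1.1 Protocol (query passed as a GET parameter named `query`) and can return results as `application/sparql-results+json`. The endpoint URL is a hypothetical placeholder, not the real CrystalEye2 address, and `query_url`, `parse_results` and `run` are illustrative names only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: the real CrystalEye2 SPARQL endpoint address is not given here
ENDPOINT = "http://example.org/sparql"

# The query from above (SPARQL allows the WHERE keyword to be omitted or included)
QUERY = """PREFIX cif: <http://www.xml-cml.org/dictionary/cif/>
SELECT ?uri ?rfactor WHERE {
  ?uri cif:refine_ls_r_factor_gt ?rfactor
  FILTER (?rfactor < 0.02)
}"""


def query_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' URL (query is URL-encoded)."""
    return endpoint + "?" + urlencode({"query": query})


def parse_results(body):
    """Turn a SPARQL JSON results document into a list of (uri, rfactor) tuples."""
    doc = json.loads(body)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each variable binding carries its lexical form under "value"
        rows.append((binding["uri"]["value"], float(binding["rfactor"]["value"])))
    return rows


def run(endpoint=ENDPOINT, query=QUERY):
    """Send the query over HTTP and parse the JSON results (needs a live endpoint)."""
    req = Request(query_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Against a working endpoint, `run()` would return a list of `(uri, rfactor)` tuples, all with `rfactor < 0.02` because the server applies the FILTER. Whether CrystalEye2 actually offers the JSON results format is an assumption worth checking first.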


5 Responses to Data sharing and Quixote meeting (Zaragoza)

Egon Willighagen says:

August 26, 2011 at 5:22 am

Is that SPARQL end point public?

- pm286 says:
  
  August 26, 2011 at 7:22 am
  
  You mean: is there an API/URL? Will have to ask Sam.
  
Egon Willighagen says:

August 26, 2011 at 5:24 am

Oh, never mind: http://crystaleye.ch.cam.ac.uk/
I really need to have a close look at that software, used for Quixote and now for CrystalEye too… nice work!

- pm286 says:
  
  August 26, 2011 at 7:27 am
  
  It’s very nice. Still very new. It’s at https://bitbucket.org/chempound – suggest you see if you can get it running. Sam has developed a modular approach where different CML conventions can be used.
  
Marcus D. Hanwell says:

September 4, 2011 at 4:00 pm

This is exciting work, sorry I missed the meeting. We had a very interesting session on chemical databases in Frederick, MD too. I met someone interested in sharing his data more widely, and will be following up, but there is potential to use Quixote there. I need to try and get an instance up and running now the code is out there.
