Data sharing and Quixote meeting (Zaragoza)

I am talking in a few minutes to a group of chemists, other scientists, computational scientists, informatics specialists, IR managers, etc. at ZCAM (computational chemistry) in Zaragoza. This is a very exciting project and we hope not only to talk, but actually to do things today.

Rather than use PowerPoint I blog my materials. Much of this is in previous blog posts, but this post adds an overview of what I might say and some of the materials I might use. What I actually say depends, as always, on what has already been said (and not said) and on the interests of the people present.

My motivation

[With ChEBI, Christoph Steinbeck] Compute properties (spectra, conformations, reactivity) of compounds in the human metabolome.


  • Open to all – no central ownership (cf. Wikipedia). Not my project, but OURS
  • Very cost-effective with a high potential for success
  • A long-tail discipline, with discrete data.

Data Sharing

  • Must be driven by scientists (researchers, editors)
  • Should be domain-specific

Why share data?

  • To promote MY work and receive credit (data citation)
  • To save MY work
  • To share MY datasets with ME (i.e. look for patterns, correlations)
  • To share MY datasets with MY colleagues
  • To share MY datasets with the world
  • To improve methodology
  • To validate science

What are the problems?

  • People want to use their results as intellectual capital
  • People can sell their data for money
  • It takes effort and money
  • It challenges established interests (priesthood, market)
  • Chemists are more conservative than many disciplines

Why/how will it happen?

  • Because individuals (e.g. grad students) find it useful
  • Because groups find it useful
  • Because journals find it useful enough to mandate
  • Because funders require it
  • Because developers (e.g. of programs) find it useful

What should we do today?

  • Make a wish list for compchem data sharing
  • What is possible right now?

Resources related to Data Sharing

Recent blogs by PMR

SPARQL query for Crystaleye2

[This will be used interactively with CrystalEye2. Try it under SPARQL. It’s very new. If it works, congratulate Sam.

If it fails maybe the server is down, or blame me.]
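For the non-crystallographers in the room: the R factor measures how well the refined model reproduces the measured diffraction data. As a reminder (my summary of the CIF core dictionary definition, not anything specific to CrystalEye2), _refine_ls_R_factor_gt is computed only over the reflections judged significantly intense, i.e. those above the reported threshold:

```latex
R_{\mathrm{gt}} = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{meas}}| - |F_{\mathrm{calc}}| \,\bigr|}{\sum_{hkl} |F_{\mathrm{meas}}|}
```

Lower is better, so R < 0.02 picks out structures whose calculated structure-factor amplitudes agree with the measured ones to within about 2% on this measure, which is typically a very well-refined structure.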


Report only structures with R values less than 0.02:


PREFIX cif: <>
SELECT ?uri ?rfactor {
  ?uri cif:refine_ls_r_factor_gt ?rfactor
  # keep only structures whose R factor is below 0.02
  FILTER (?rfactor < 0.02)
}
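For anyone who would rather run the query from a script than from a web form, here is a minimal Python sketch using only the standard library. It assumes the endpoint speaks the standard W3C SPARQL 1.1 Protocol (query sent as a URL-encoded `query` parameter on a GET) and can return the SPARQL JSON results format. The endpoint URL is a hypothetical placeholder, since the address is not given here, and the `cif:` namespace IRI is left empty exactly as in the query above; both must be filled in before use.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint URL: the real CrystalEye2 SPARQL address is not given here.
ENDPOINT = "http://example.org/sparql"

# The query above. The cif: namespace IRI is elided ("<>") and must be
# replaced with the real one before this will match anything.
QUERY = """PREFIX cif: <>
SELECT ?uri ?rfactor {
  ?uri cif:refine_ls_r_factor_gt ?rfactor
  FILTER (?rfactor < 0.02)
}"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_results(body: str) -> list[tuple[str, float]]:
    """Turn a SPARQL JSON results document into (uri, rfactor) pairs."""
    doc = json.loads(body)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each bound variable is an object whose "value" holds the lexical form.
        rows.append((binding["uri"]["value"], float(binding["rfactor"]["value"])))
    return rows


def run(endpoint: str = ENDPOINT, query: str = QUERY) -> list[tuple[str, float]]:
    """Send the query and return structures sorted by ascending R factor."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        rows = parse_results(resp.read().decode("utf-8"))
    return sorted(rows, key=lambda row: row[1])
```

Once ENDPOINT and the cif: prefix are filled in, run() returns (uri, R factor) pairs with the best-refined structures first. Sorting on the client is just a choice made for this sketch; adding ORDER BY ?rfactor to the query would let the server do it instead.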



5 Responses to Data sharing and Quixote meeting (Zaragoza)

  1. Is that SPARQL end point public?

  2. Oh, never mind:
    I really need to have a close look at that software, used for Quixote and now for CrystalEye too… nice work!

  3. This is exciting work, sorry I missed the meeting. We had a very interesting session on chemical databases in Frederick, MD too. I met someone interested in sharing his data more widely, and will be following up but there is potential to use Quixote there. I need to try and get an instance up and running now the code is out there.
