Open-Data-driven science and a brokering system for ONS

Cameron Neylon and Jean-Claude Bradley have blogged about a directory of Open Notebook Science (ONS) where projects using this approach can register.

21:19 14/10/2007, Cameron Neylon,
As has been flagged up by Jean-Claude Bradley there are a couple of places now where people can sign up to say that they have Open Notebook Science in their laboratory, practise Open Notebook Science, or even would like to find a place where they can keep an Open Notebook. Jean-Claude has put a list on the Nodalpoint Wiki and I have set up a database at DabbleDB. DabbleDB is a rather cool web-based database system that provides free access as long as you make the database contents freely available. Because the data is completely open I am not asking for people’s email addresses.

If you want to be included in the database you can put your details in on the form here. This will allow anyone to re-use the data (which you can find here) to generate lists on appropriate web-pages, or maps or any number of other nice re-uses of the data. If you are interested in the working of the database give me a yell and I can give you admin access.

PMR: As soon as we start to get the results of the NMR calculations on NMRShiftDB we'll put them up, but I don't want to register this before we have actually started (I have seen too many empty web pages in my career and I don't want to leave them myself.) So we all have to be a little patient.

But then I thought that CrystalEye is an ideal resource for data-driven science. I've blogged about how crystal-data-driven research started in the mid-1970s, but there is a great opportunity to use CrystalEye in new ways. Unlike the Cambridge Data Centre, the data includes inorganic structures. The software is modern and extensible and it should be economical to develop many new applications.

CrystalEye is, of course, OpenData (we use the OKFN licence at present) and anyone can download it (we are still working out how to implement APP, the Atom Publishing Protocol, to make this easy). But we'd also love to explore collaborative projects. We have all the data and software here so you don't have to set it up. Crystallographic data makes good undergraduate, Master's and PhD projects - Egon should know. So if you, or your collaborators, students, supervisor or whoever, are interested in using this data, perhaps we could explore this on the Wiki.

6 Responses to Open-Data-driven science and a brokering system for ONS

  1. Peter, I posted on the ChemSpider Blog asking who would like to see us link to CrystalEye. Linking via structures to your collection makes sense in terms of expanding the semantic web of chemistry. The link is here:

    Two people have commented so tonight I looked for the CrystalEye data to download but cannot find the download site. Sorry, it may be in an obvious place but I don't see it. Please point me to the link.

    I know you see PubChem as a valuable resource. Is there a reason that the CrystalEye data are not deposited there?

  2. One additional question... are all articles linked to CrystalEye Open Access articles or are some linked with permission from the publishers? Thanks

  3. Pingback: ChemSpider Blog » Blog Archive » Another Response to Constructive Feedback from Peter Murray-Rust…

  4. pm286 says:

    First, CrystalEye is a resource for Nick Day's thesis and is made available as Open Data as part of our commitment to Open Science. In effect it is Open Notebook Science. One reason was that we wished people to try it out and comment when they found problems with the data - so far there haven't been any.

    It is currently possible to scrape the site and thereby download the whole content - this can be done without our permission. At least one group has done this and is putting it into Freebase.

    Since this is not much fun we are hard at work implementing the Atom Publishing Protocol, which will mean it can be downloaded in reasonably sized chunks. Since this isn't really part of Nick's thesis it takes time.

    I do not believe that PubChem accepts this sort of data and anyway it isn't appropriate or necessary. As long as PubChem points to the data it can be identified. Rather than aggregate data (the old way) the modern way is to point to distributed data resources using URIs and RDF. We are actively talking with Steve Bryant about the right type of URI to use.

    But anyone can deposit the links in PubChem if they want - the data is Open. That's the point.


    The data points to the DOI of the paper from which the CIF was downloaded. Where the CIF is copyrighted (in my opinion illegally) we may or may not point to the publisher's site.

    The articles are mainly not Open Access but the CIFs are posted in accessible locations.

  5. People can certainly list their systems on the wiki and Dabble but even if they don't have anything concrete they can list their policies. The idea is to help students and postdocs find PIs that have compatible views of Open Science.

  6. Thanks for the comments in 4. above. We don't need all the data. What we do with ChemSpider is index the structures and provide the links back to the original data sources. For example, the page here has the links for fluconazole (when the sources are online). Would you happen to have an SDF file of the structures, or maybe Nick can export the structures with the IDs, which we can use to construct URLs pointing back to the structures? This would save a lot of work. If this is not possible I understand, but I am asking to save time. Thanks
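
The exchange in comments 4 and 6 (point to distributed data with URIs and RDF, and build links back to structures from exported IDs) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the two base addresses are hypothetical placeholders on example.org, the ID values are invented, and the choice of the standard `rdfs:seeAlso` predicate is only one plausible way to express "this record points to that structure". It is not the URL scheme CrystalEye, ChemSpider or PubChem actually use.

```python
from urllib.parse import quote

# Hypothetical base addresses; the real URL schemes are not given in the post.
STRUCTURE_BASE = "https://example.org/crystaleye/structure/"
COMPOUND_BASE = "https://example.org/index/compound/"
# Standard RDFS predicate, used here as an assumed way to say "see this resource".
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def structure_url(structure_id: str) -> str:
    """Build a link back to one structure from its exported ID."""
    # Percent-encode the ID so slashes or spaces cannot break the URL.
    return STRUCTURE_BASE + quote(structure_id, safe="")


def link_triple(compound_id: str, structure_id: str) -> str:
    """Return one N-Triples line pointing from an indexed compound to a structure."""
    subject = COMPOUND_BASE + quote(compound_id, safe="")
    return f"<{subject}> <{SEE_ALSO}> <{structure_url(structure_id)}> ."


if __name__ == "__main__":
    # Invented (compound ID, structure ID) pairs standing in for an exported list.
    exported = [("c1", "abc123"), ("c2", "x/y 9")]
    for compound_id, structure_id in exported:
        print(link_triple(compound_id, structure_id))
```

The index in this sketch stores only identifiers and links, never the crystallographic data itself, which is the "point, don't aggregate" approach described in comment 4.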

