The future of chemical informatics and the forces opposing it (PM-R, ACS 2010)

Dictated to my machine Arcturus

When I spoke at the ACS symposium in March in San Francisco I was delighted that Cameron Neylon was there because he was able to record my presentation. He has now put it online at http://www.viddler.com/explore/CameronNeylon/videos/11/ (about 26 minutes) and has done a very good job. The on-screen text is not always clear (this is a feature of recording screens), but I normally speak much of what is on the screen for exactly this reason, so it should be possible to understand the gist if not the detail. The talk covers a wide range of subjects, with illustrations of most of them, and includes:

  • The problems of producing non-semantic information, such as continuing to publish in PDF to the exclusion of XML or other modern formats.
  • The major problem of digital rights on scientific information. I argue that this must change and that publishers who continue to copyright data or otherwise hide it behind firewalls are actively preventing science.
  • I argue that almost all the current innovation in chemical software comes from individuals and that most of them work in the open source and open data community.
  • I show that machines can read millions of chemical paragraphs a year and that the major sources are journals (where they are not protected by false digital rights), theses (where they are not hidden in non-semantic repositories) and patents. I demonstrate how we can read a patent in less than a minute and extract most of the chemical reactions from it.
  • I show how it is possible to create machines that crawl the current websites of publishers and, where allowed, abstract, semantify, and aggregate the content in searchable form. I believe the large secondary abstracters in chemistry will soon be largely replaced by robotic tools which will create information resources that are more detailed, richer and easier to search. Assuming these are open, this will also lead to a wealth of innovation. As an example I showed Nick Day’s CrystalEye, which has aggregated nearly 200,000 structures from the literature and added a huge amount of semantics (http://wwmm.ch.cam.ac.uk/crystaleye).
  • I showed the potential of linked open data and how the bioscientists are years ahead of the chemists because they wish to share data rather than protect and resell it.
  • I showed the Panton Principles, which a group of us have launched recently under the auspices of the Open Knowledge Foundation and Science Commons. These principles are aimed at making scientific data open and making it easier to do so.
  • I showed a new project, Amy, where we are developing ways of talking to fume cupboards rather than using a mouse, and where we are incorporating many other techniques of communication such as gestures, ultrasound, computer vision etc.

Once again, my thanks to Cameron for this, because it is difficult for me to produce conventional slides of my presentations. This is partly because I use multimedia and partly because I do not know beforehand what I’m going to say.
