OSCAR4 launch roundup


IMO the OSCAR4 launch was a great success. We had visitors from outside the Unilever Centre, and remote followers on Twitter and the streaming video. The talks were very well presented and were all captured, both on the stream and more permanently. I have had a peek at the recordings and they will be useful in their own right; the text is a little small in places in mine (but the blog is public). For me there was a real surprise, captured on video. I compared OSCAR3 to OSCAR4 as a Heath Robinson train to a Lego™ train. As I switched back to the web page there was a Google splash page with Stephenson’s Rocket, completely unplanned. For a few secs I thought it was deliberate, but it was just a coincidence. Follow the web page http://www-pmr.ch.cam.ac.uk/wiki/OSCAR4_Launch to see details of where the videos are available.

Here are the famous cupcakes and the journal-eating-robot…

OSCAR4 is a library, and we trailed bits of its API. Dave Jessop did an excellent job of pulling out the essentials and showing how, with a few commands, we could search for chemical terms, customize the dictionaries and ontologies, and create chemical structures.

The first task was to extract NamedEntities from text. If you know what text is, and know what a NamedEntity is then it’s simple. Here’s the code. The whole code:


        Oscar oscar = new Oscar();
        List<NamedEntity> namedEntityList = oscar.findNamedEntities(text);


This 2-line program (idiomatic in almost all modern languages) says:

  • Create an Oscar object
  • Feed it some text and get a list of the named entities.

That’s it. If you understand the terms “text” and “named entity” then you can run Oscar. If you don’t know what a named entity is then just run the program and look at what comes out.


Here’s OSCAR munching a spectrum:

        String text = "1H NMR (d6DMSO, 400MHz) d 9.20 (1H, m), 8.72 (1H, s)," +
                " 8.58 (1H, m), 7.80 (1H, dd, J=8.3 and 4.2 Hz) ppm.";


Let’s take it in reverse order. In a well-engineered library there’s often only one way you can fit the bits together. Like the Lego™ train.


Hmm, I need a DataParser (that was hinted at):


        List<DataAnnotation> annotationlist = DataParser.findData(procDoc);


So now I need to feed it a ProcessingDocument:


        ProcessingDocument procDoc = ProcessingDocumentFactory.getInstance().makeTokenisedDocument(tokeniser, text);


And to make one of those I need a Tokeniser. Ah, here’s one of those:


        Tokeniser tokeniser = Tokeniser.getDefaultInstance();


So I made a default one. This is the convention-over-configuration stuff – use what is in the box and it will work, usually in a reasonable manner.

Put them together in the right order:

        Tokeniser tokeniser = Tokeniser.getDefaultInstance();
        ProcessingDocument procDoc =
                ProcessingDocumentFactory.getInstance().makeTokenisedDocument(tokeniser, text);
        List<DataAnnotation> annotationlist = DataParser.findData(procDoc);


And I have my DataAnnotation (only one because there is only one spectrum). It’s in XML and can be easily converted into CML, and so displayed in CML-compliant tools.
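To get a feel for what structure is hiding inside that spectrum string, here is a minimal standalone sketch that pulls out the peaks with a regular expression. It is emphatically not how OSCAR's DataParser works internally; the class name NmrPeakSketch, the findPeaks method and the regex are all my own assumptions for illustration, and they only cope with peaks written in the "shift (nH, multiplicity" style used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of OSCAR4 and not its algorithm.
public class NmrPeakSketch {

    // Matches "<shift> (<n>H, <multiplicity>", e.g. "7.80 (1H, dd"
    private static final Pattern PEAK =
            Pattern.compile("(\\d+\\.\\d+) \\((\\d+)H, ([a-z]+)");

    /** Returns one "shift ppm, nH, multiplicity" string per peak found in the text. */
    public static List<String> findPeaks(String text) {
        List<String> peaks = new ArrayList<>();
        Matcher m = PEAK.matcher(text);
        while (m.find()) {
            peaks.add(m.group(1) + " ppm, " + m.group(2) + "H, " + m.group(3));
        }
        return peaks;
    }

    public static void main(String[] args) {
        String text = "1H NMR (d6DMSO, 400MHz) d 9.20 (1H, m), 8.72 (1H, s),"
                + " 8.58 (1H, m), 7.80 (1H, dd, J=8.3 and 4.2 Hz) ppm.";
        // Print each peak on its own line
        for (String peak : findPeaks(text)) {
            System.out.println(peak);
        }
    }
}
```

On the spectrum above this finds four peaks. The coupling constants (J=8.3 and 4.2 Hz), the solvent and the field strength are ignored, which is exactly the sort of detail a real parser has to handle and a throwaway regex does not.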


So here we are in the Panton afterwards…


