I have been invited to give a lecture at the Mathematical Knowledge Management 2007 meeting next week in Hagenberg, Austria. My talk is entitled “Mathematics and scientific markup”. I am both excited and apprehensive about this: what is a chemist (whose mathematics finishes at Part IA for scientists in Cambridge) doing talking to experts in the field?

However, in the spirit of the new Web, I’m blogging my thoughts before the meeting. This serves several purposes:

- helps me get my ideas in order
- gets feedback from anyone who may have an interest
- identifies other people who may also be blogging about the meeting
- acts as a public resource from which I can give my talk if I have problems with my machine.

The conference topic …

Mathematical Knowledge Management is an innovative field in the intersection of mathematics and computer science. Its development is driven on the one hand by the new technological possibilities which computer science, the internet, and intelligent knowledge processing offer, and on the other hand by the need for new techniques for managing the rapidly growing volume of mathematical knowledge.

The conference is concerned with all aspects of mathematical knowledge management. A (non-exclusive) list of important areas of current interest includes:

- Representation of mathematical knowledge
- Repositories of formalized mathematics
- Diagrammatic representations
- Mathematical search and retrieval
- Deduction systems
- Math assistants, tutoring and assessment systems
- Mathematical OCR
- Inference of semantics for semi-formalized mathematics
- Digital libraries
- Authoring languages and tools
- MathML, OpenMath, and other mathematical content standards
- Web presentation of mathematics
- Data mining, discovery, theory exploration
- Computer Algebra Systems
- Collaboration tools for mathematics

What has molecular informatics to do with this? More than it appears. Chemistry overlaps considerably with physics, and here formal systems are important. It should be possible to explore the formal representation of thermodynamics or material properties in semantic form (though I may find that my use of “semantics” is imprecise or even “wrong”). Repositories are an obviously exciting area: can we find mathematical objects either by form or by metadata? OCR is important for all content-rich disciplines (see below). Inference and semantics are becoming increasingly important in the emerging Web. And so I tick about half the topics above, not in mathematical detail, of course, but in the general approach to the problems.

As an example, which objects contain enough structure and canonicalized content to act as their own discovery metadata? Most objects need a human or a lookup table to add the metadata for their web discovery. For example, you need to know the names of humans; you cannot work these out by looking at them. But in chemistry we can describe a molecule by its InChI, a canonical representation of the connection table (which is not easily human-interpretable). This is both its content and its discovery metadata. Searching Google for InChIs will find instances of molecules on the web. I wondered what other objects could be identified just by their textual content. Perhaps a poem (although it won’t tell you who wrote it). I started typing lists of numbers into Google and suddenly found I was getting hits on Neil Sloane’s Encyclopedia.
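To make the idea of “canonical content as its own key” concrete, here is a minimal sketch. It is emphatically not the InChI algorithm: it is a brute-force canonical labelling of a tiny, hydrogen-suppressed connection table (bond orders ignored), and the function name `canonical_key` and the string format are hypothetical. It tries every renumbering of the atoms and keeps the lexicographically smallest description, so two different numberings of the same molecule give the same string, while a different molecule with the same formula does not.

```python
from itertools import permutations

def canonical_key(atoms, bonds):
    """Toy canonical form for a small molecular graph (NOT InChI).

    atoms: element symbols indexed by an arbitrary input numbering.
    bonds: (i, j) index pairs; bond orders and hydrogens are ignored here.
    Exponential in the number of atoms, so only suitable for illustration.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new] = old, so invert it to relabel the bond endpoints
        old_to_new = {old: new for new, old in enumerate(perm)}
        elems = tuple(atoms[old] for old in perm)
        edges = tuple(sorted(
            tuple(sorted((old_to_new[i], old_to_new[j]))) for i, j in bonds
        ))
        candidate = (elems, edges)
        # keep the lexicographically smallest (elements, bonds) description
        if best is None or candidate < best:
            best = candidate
    elems, edges = best
    return ".".join(elems) + "/" + ",".join(f"{i}-{j}" for i, j in edges)

# Ethanol numbered two different ways, and dimethyl ether (same formula)
ethanol_a = canonical_key(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_key(["O", "C", "C"], [(0, 1), (1, 2)])
ether = canonical_key(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol_a, ethanol_b, ether)
```

Both ethanol numberings should print as `C.C.O/0-1,0-2`, and dimethyl ether as `C.C.O/0-2,1-2`. Because the key is derived purely from the structure, anyone can recompute it and search for it, which is the property that makes InChI useful as discovery metadata.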

In this, a sequence can be identified by its content: search Google for “1,3,6,10,15” and you get A000217 in The On-Line Encyclopedia of Integer Sequences. I had a chat with a well-known computer scientist (and ex-mathematician) at WWW2007, and he bet that I couldn’t tell him the next term in a sequence within 5 minutes. I bet him the drinks that I could. As we had wireless in the bar, I searched Google and immediately found the answer. He was astounded, and bought the drinks.
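The bar bet works because a run of terms is its own search key. The sketch below imitates that lookup offline, assuming a hypothetical three-entry stand-in for the OEIS (`MINI_OEIS`, with generators for A000217 triangular numbers, A000290 squares and A000045 Fibonacci numbers). It scans each sequence for the query as a contiguous run and returns the A-number and the following term. The real encyclopedia holds a vast number of sequences and can return several matches; this toy simply returns the first one it finds.

```python
from itertools import count, islice

def triangular():
    for n in count():
        yield n * (n + 1) // 2

def squares():
    for n in count():
        yield n * n

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Hypothetical miniature stand-in for the OEIS: A-number -> (name, generator)
MINI_OEIS = {
    "A000217": ("Triangular numbers", triangular),
    "A000290": ("Squares", squares),
    "A000045": ("Fibonacci numbers", fibonacci),
}

def identify(query, n_terms=50):
    """Return (A-number, next term) for the first sequence that contains
    `query` as a contiguous run within its first n_terms terms, else None."""
    query = list(query)
    k = len(query)
    for anum, (_name, gen) in MINI_OEIS.items():
        terms = list(islice(gen(), n_terms))
        # stop early enough that terms[start + k] always exists
        for start in range(n_terms - k):
            if terms[start:start + k] == query:
                return anum, terms[start + k]
    return None

print(identify([1, 3, 6, 10, 15]))  # ('A000217', 21)
```

With this table, `identify([1, 1, 2, 3, 5])` returns `('A000045', 8)`, and a run that appears in none of the three sequences returns `None`.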

So many of the problems are generic between domains. Searching for MKM2007, I found this paper on how to extract mathematics from PDF (Retro-enhancement of Recent Mathematical Literature). It’s better than recreating cows from hamburgers, as they have some access to the source, but there are similarities to what we are trying to recover from PDF.

I shall use the tag mkm2007 for this and subsequent posts (in which I’ll explore things like the difference between top-down and bottom-up management systems). No one has used it yet, but maybe someone will find it; let’s see.