Welcome to the petermr blog! This is one of a series of blogs
from scientists in the Unilever Centre for Molecular Informatics at
Cambridge. I’ll indicate some of the others on my blogroll. For
now, just note that there is another blog specifically dedicated to
Chemical Markup Language (CML) and I’ll be contributing a lot to that as
well.
This blog will cover a wide range of topics that are mushrooming
on today’s web and which will change the practice of science. Areas
on which I expect to blog frequently are:
- The relationship between human readable material (“full text”)
 and scientific data. Henry Rzepa and I have coined the term datument
 for the synthesis of these, especially using XML technology. The
 scientific publication in its current form is inspired by 19th Century
 printing technology and “electronic publications” merely encourage
 outdated ways of communication. Web inspired technologies should
 revolutionize scientific communication. A particular interest is the
 development of the “robotic amanuensis” for scientists – personal
 software which can help individuals read and publish information
 effectively.
- Open data, open source, open access, open knowledge. Unless we
 have free access to the primary outputs of science we are denied the
 opportunity to develop new ideas in informatics-driven science. I have
 argued publicly that primary scientific data belong to the scientific
 commons and that they must be free. A corollary is that the output of
 funded science is not just full-text but the complete supporting
 information environment of the experiments.
- “Programming for scientists”. Modern scientists are enhanced
 by “information prosthesis” – the ability to receive and repurpose
 information. If they are able to “program”, they have greater
 expressive power. Many of the future skills will not be with
 conventional programming languages but the tools emerging from the
 explosion of social and technical operations in today’s web. I’ll be
 learning from my colleagues and trying to give readers and contributors
 a flavour of what is now possible.
- Markup languages in (physical) science. These are the
 handmaidens of the goals above. Currently there are a few main
 approaches for content: MathML, GML (geography), Scalable Vector
 Graphics, Chemical Markup Language, AnIML (analytical chemistry),
 ThermoML (thermochemistry). There are many obvious gaps and I’ll suggest
 guidelines for any person or group interested in building a language.
- Creation and management of virtual communities. I’ve been involved
 with creating and nurturing communities for the last 15 years, including
 BioMOO, the Virtual School of Natural Sciences, XML-DEV, and now the
 Blue Obelisk. I also believe strongly in Wikipedia and related efforts.
 I’ll review the features of successful communities and the
 guidelines for growth.
We welcome anyone as a poster but require them to register (to
prevent spam). We honour copyright, but ask that posters make their
contributions available under Creative Commons. This allows the posters
to retain their moral rights, but allows us to re-use the blog
(including their contributions) for other purposes if required (e.g. it
might be revised for supporting information, tutorials, etc.). We will
always attribute posters.
Technical note: I can edit this blog (e.g. if I make typos or get something wrong) but no-one else can. If you post a comment, we don’t think that anyone can change it. So be careful!
Please let us know your ideas.
P.
Good stuff. Glad to see you’re blogging and I’m sure I’ll have some more comments on your posts. As good science is replicable, it’s interesting that in cheminformatics the data are controlled so that it is impossible to replicate the experiment — if I understand correctly.
Absolutely right, Christine. Chemoinformatics is now de facto irreproducible. It used not to be so, as Rich Apodaca has blogged. I shall write more of this in the next two days.