Welcome to the petermr blog! This is one of a series of blogs
from scientists in the Unilever Centre for Molecular Informatics at
Cambridge. I’ll indicate some of the others on my blogroll. For
now, just note that there is another blog specifically dedicated to
Chemical Markup Language (CML) and I’ll be contributing a lot to that as
well.
This blog will cover a wide range of topics that are mushrooming
on today’s web and which will change the practice of science. Areas
on which I expect to blog frequently are:
- The relationship between human readable material (“full text”)
and scientific data. Henry Rzepa and I have coined the term datument
for the synthesis of these, especially using XML technology. The
scientific publication in its current form is inspired by 19th Century
printing technology and “electronic publications” merely encourage
outdated ways of communication. Web inspired technologies should
revolutionize scientific communication. A particular interest is the
development of the “robotic amanuensis” for scientists – personal
software which can help individuals read and publish information
effectively.
- Open data, open source, open access, open knowledge. Unless we
have free access to the primary outputs of science we are denied the
opportunity to develop new ideas in informatics-driven science. I have
argued publicly that primary scientific data belong to the scientific
commons and that they must be free. A corollary is that the output of
funded science is not just full-text but the complete supporting
information environment of the experiments.
- “Programming for scientists”. Modern scientists are enhanced
by “information prosthesis” – the ability to receive and repurpose
information. If they are able to “program”, they have greater
expressive power. Many of the future skills will not be with
conventional programming languages but with the tools emerging from the
explosion of social and technical operations in today’s web. I’ll be
learning from my colleagues and trying to give readers and contributors
a flavour of what is now possible.
- Markup languages in (physical) science. These are the
handmaidens of the goals above. Currently there are a few main
approaches for content: MathML, GML (geography), Scalable Vector
Graphics, Chemical Markup Language, AnIML (analytical chemistry),
ThermoML (thermochemistry). There are many obvious gaps and I’ll suggest
guidelines for any person or group interested in building a language.
- Creation and management of virtual communities. I’ve been involved with creating and nurturing communities for the last 15 years including
BioMOO, the Virtual School of Natural Sciences, XML-DEV, and now the Blue Obelisk. I also believe strongly in
Wikipedia and related efforts. I’ll review the features of successful communities and the
guidelines for growth.
We welcome anyone as a poster but require them to register (to
prevent spam). We honour copyright, but ask that posters make their
contributions available under Creative Commons. This allows the posters
to retain their moral rights, but allows us to re-use the blog
(including their contributions) for other purposes if required (e.g. it
might be revised for supporting information, tutorials, etc.). We will
always attribute posters.
Technical note: I can edit this blog (e.g. if I make typos or get something wrong) but no-one else can. If you post a comment, we don’t think that anyone can change it. So be careful!
Please let us know your ideas.
P.
Good stuff. Glad to see you’re blogging and I’m sure I’ll have some more comments on your posts. As good science is replicable, it’s interesting that in cheminformatics the data are controlled so that it is impossible to replicate the experiment — if I understand correctly.
Absolutely right, Christine. Chemoinformatics is now de facto irreproducible. It used not to be so, as Rich Apodaca has blogged. I shall write more of this in the next two days.