Category Archives: ahm2007

Name that graph (acknowledgements to Rich)

Rich Apodaca has an excellent series of graphs (e.g. Name That Graph) where he has removed key annotations (titles, units, axes, etc.). I'm not going to steal his theme, but there is one graph that I hope my readership is familiar with. I've been using it in an article - more later - and will also blog before the article appears. So, with apologies to Rich, what's this?

A clue. It's about chemistry. But you don't need to be a chemist... and since most of you should know the answer, please don't post it.

AHM2007: Best paper (Jon Blower) - Virtual globes Hurricanes and penguins

Jon Blower was awarded the best paper at AHM2007. This is an outstanding example of eScience where SIMPLE technology is brought to bear on multiple datasets, each of which by itself does not carry a message but the combination does. From the abstract:

Virtual globe technology holds many exciting possibilities for environmental science. These easy-to-use, intuitive systems provide means for simultaneously visualizing four-dimensional environmental data from many different sources, enabling the generation of new hypotheses and driving greater understanding of the Earth system. Through the use of simple markup languages, scientists can publish and consume data in interoperable formats without the need for technical assistance. In this paper we give, with examples from our own work, a number of scientific uses for virtual globes, demonstrating their particular advantages. We explain how we have used Web Services to connect virtual globes with diverse data sources and enable more sophisticated usage such as data analysis and collaborative visualization. We also discuss the current limitations of the technology, with particular regard to the visualization of subsurface data and vertical sections.

Jon showed some stunning slides and animations, which had the theme of combining datasets.

He showed Keyhole Markup Language (KML), which supports simple geographic features - points, lines, polygons, etc. It is successful because it's NOT trying to do too much. It enables the mashups between the datasets by providing the common frame of reference. And it, together with the software, is all Open (unlike the Google Earth mashup approach).
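To give a feel for how little markup a KML point feature needs, here is a minimal sketch in Python that builds a single Placemark. It is not taken from Jon's work: the function name, the placemark label and the coordinates are hypothetical, and the OGC KML 2.2 namespace is an assumption (files of this period often used a Google-hosted namespace instead).

```python
import xml.etree.ElementTree as ET

# Assumed namespace: OGC KML 2.2. Older files may declare a different one.
KML_NS = "http://www.opengis.net/kml/2.2"


def make_placemark_kml(name, lon, lat, alt=0.0):
    """Return a KML document (as a string) holding one point Placemark.

    Hypothetical helper for illustration only.
    """
    # Serialise the KML namespace as the default (unprefixed) namespace.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML writes coordinates as longitude,latitude,altitude (in that order).
    coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
    coords.text = f"{lon},{lat},{alt}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Made-up position, purely to show the shape of the output.
    print(make_placemark_kml("Hypothetical track point", -89.6, 25.2))
```

Because every dataset ends up as features in the same longitude/latitude frame, two independently produced files like this can simply be loaded side by side, which is the "common frame of reference" point above.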

Hurricane Katrina - satellite meteorology mashed with hurricane intensity showed unexpected sea cooling, which was critical to understanding the effect of hurricanes in mixing hot and cold sea.

A mashup of penguin tracks (through radio transmitters) with satellite chlorophyll showed that the penguins circulated round areas of high chlorophyll - presumably in the ocean (?).

The message is that we need open data, open standards and code, simple, universal technology for visualisation.

It is critical to fund the data exploration area.

So did the recent Hurricane Felix cause the sea to cool? Apparently much less than Katrina from his movie. But this is real eScience commenting on today's events of world importance.

DBPedia2: major opportunity for semantic web (including chemistry)

I have blogged about the exciting potential of DBPedia before (dbpedia - structured information from Wikipedia => dbchem). It is a semistructured RDF triple collection created automatically from Wikipedia. The really exciting thing is that huge numbers of WPedians have contributed to DBPedia without even knowing it. Simply by evolving simple community metadata (tagging and infoboxes) the WPedians have created a top-class semantic resource. A WP category of, say, "1997 deaths" gets translated to a triple something like:
:Diana :death_date "1997"^^xsd:date
which says that the object with label "Diana" had a "death_date" predicate with value "1997" which is of type date.
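As a rough illustration of that translation (and not DBPedia's actual extraction code), a sketch in Python might look like the following. The function name and the "YYYY deaths" pattern are assumptions made for the example; note also that a bare year is strictly closer to xsd:gYear than xsd:date, but the sketch keeps the form used above.

```python
import re


def category_to_triple(subject, category):
    """Turn a 'YYYY deaths' category into a Turtle-style triple string.

    Hypothetical helper: returns None for categories that do not match.
    """
    match = re.fullmatch(r"(\d{4}) deaths", category)
    if match is None:
        return None
    year = match.group(1)
    # Same shape as the example triple in the text above.
    return f':{subject} :death_date "{year}"^^xsd:date'


if __name__ == "__main__":
    print(category_to_triple("Diana", "1997 deaths"))
```

The point is that the person who tagged the page only supplied the category; the triple falls out mechanically.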
Now the OKFN has blogged:

DBpedia recently released the new version of their dataset. The project aims to extract structured information from Wikipedia so that this can be queried like a database. On their blog they say:

The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.

As well as improving the quality of the data, the new release includes coordinates for geographical locations and a new classificatory schema based on Wordnet synonym sets. It is also extensively linked with many other open datasets, including: “Geonames, Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP Bibliography and Project Gutenberg datasets”.

This is probably one of the largest open data projects currently out there - and it looks like they have done an excellent job at integrating structured data from Wikipedia with data from other sources. (For more on this see the W3C SWEO Linking Open Data project - which exists precisely in order to link more or less open datasets together.)

PMR: DBPedia1 was mindblowing, but - not surprisingly - suffered from inconsistency and incompleteness. For example there were several RDF predicates for deathDate: "death_date", "deathdate", etc. This is entirely forgivable for a first try. As DBPedia awareness spreads through WPedians they will converge on how infoboxes are created to give maximum semantic value. It only needs one or two evangelists in a discipline - e.g. in chemistry - to work this out, show the value, and then popularise it. The main body of WPedians will then adopt these methods and rapidly create a coherent semantic hyper-object.
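Until the community converges, a consumer of the data can paper over the spelling variants. The sketch below is one simple way to do that in Python, assuming triples arrive as (subject, predicate, value) tuples; both function names are hypothetical and this is not how DBPedia itself reconciles predicates.

```python
import re


def normalise_predicate(name):
    """Collapse variants such as death_date, deathdate and deathDate."""
    # Lower-case, then drop anything that is not a letter or digit.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_variants(triples):
    """Group values under (subject, normalised predicate) keys."""
    merged = {}
    for subject, predicate, value in triples:
        key = (subject, normalise_predicate(predicate))
        merged.setdefault(key, set()).add(value)
    return merged


if __name__ == "__main__":
    triples = [
        ("Diana", "death_date", "1997"),
        ("Diana", "deathdate", "1997"),
    ]
    print(merge_variants(triples))
```

This kind of string folding only handles trivial variants; agreeing on the infobox conventions at source is the real fix.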

The exciting thing is that this is zero-cost.

This will revolutionise reference chemistry. We have recently shown - and will be demoing at AHM2007 - how we can extract semantic chemistry from eTheses. That means that any student writing a thesis can increasingly link - painlessly - to WPedia for their lightweight ontological resource. Authors will know they are using the terms correctly - readers will know what the terms mean - and much more.

So I predict that within a few years DBPedia will become the semantic resource for chemistry. Every entry in WPedia enhances it - you never go backwards. We'll be able to combine fundamental information for compounds such as colour, melting point, density, etc. There will be enough semantic data that a machine could rediscover the periodic table.
And that's just the start. So, I'll be browsing DBPedia in the blank spaces at AHM2007.

UK eScience All Hands 2007

I'm at the UK eScience All Hands Meeting - the sixth - and I think I have been to all of them. The meeting is closely, but not completely, coupled to the UK's pioneering investment in eScience (the roughly equivalent US term is cyberinfrastructure). I'm listening to the keynote - Malcolm Atkinson:

  • research using eScience
  • research enabling eScience
  • eInfrastructure supporting research and innovation

He highlights Computational Thinking (Jeannette Wing), which will be my reading during (perish the thought) any boring talks.

Cameron Neylon has blogged:

If it hasn’t been obvious from what has gone previously I am fairly new to the whole E-science world. I am definitely not in any form a computer scientists. I’m not a computer-phobe either but my skills are pretty limited. It’s therefore a little daunting to be going for the first time to an e-science meeting. This is the usual story of not really knowing the people from this community and not necessarily having a clear idea of what people within the field or community think the priorities are. The programme is available online and my first response on looking at it in detail was that I don’t even understand what most of the session titles mean. “OMII-UK” is a fairly inpenetrable workshop title for which the first talk is “Portalization Process for the Access Grid”. Now to be fair these are somewhat more specialised workshops and many of the plenary session names make more sense. This is normal when you go to an out-of-your-field conference but it will be interesting to see how much of the programme makes sense.
PMR: Don't panic. There will be a lot of technology that is not familiar. Not all is relevant to you. The people are often more important than the technology.
One of the issues with e-science programmes is the process of bringing the ‘outside’ scientist into the fold. Systems such as our lab e-notebook require an extra effort to use, certainly at the beginning, and during the development process there are often very few tangible benefits. Researchers are always time poor people so they want to see benefits. In theory we are here to demonstrate and promote our e-notebook system but I suspect this may be a case of preaching to the converted. It will be interesting to see a) whether we get much interest b) whether the comments we get are more on the technical implementation or the practical side of actually using it to record experiments. One of the great things about starting this blog has been the way it has facilitated discussion with others interested in open notebook science and open science in general. I am less sure it has brought scientists who are interested in the work in our notebook in. My feeling is that this meeting may be a bit similar. On the other hand it may get us some good ideas on solving some of the problems of visualising the notebook that I want to discuss in a future post.

So if you are at the meeting and want to see the notebook please drop by to the BBSRC booth on Wednesday afternoon and do say hello if you see a shortish balding bearded guy who is looking lost or confused.

PMR: There is a tension between the needs of "scientists" and the desires and directions of "computer scientists". Sometimes they overlap - frequently they don't. A great deal of the technological development takes place because it is needed, but some takes place because it pushes the boundaries of computer science. That's not a bad thing, unless it dominates. I am continually refreshing my judgement about what eScience gets right and what it doesn't. Some disciplines need heavyweight technology, but others like chemistry probably don't. But using existing lightweight technology is not sexy, and doesn't engage many computer scientists.
I'm tagging this as ahm2007. I could only find 2 tags in Technorati. Compare to www2007 where there were hundreds of posts. So any bloggers might congregate round this tag.