petermr's blog

How to write good documentation quickly

Posted on April 26, 2010 by pm286

As you know we launched the Panton Principles 2 months ago. There has been a lot of interest and I have been talking with a number of open access and open data publishers to convince them of the value of making their data explicitly open. I described the principles in words but realized that they needed something more substantial to read which is tailored to their particular needs. So I decided to write an FAQ.

This can be a very tedious process but I enlisted the help of the Open Knowledge Foundation. We decided that I would ask the questions and that we would communally answer them. I wrote a series of questions.

And then we set up a PiratePad. This is a communal open website where anyone can edit a document. It’s rather like Google Docs but is better for simultaneous editing and causes fewer problems when inviting people.

About six or seven people actively participated and within a day we had the bulk of the FAQ written. This was possible because different people could answer different questions, each bringing different experience and ideas. If I had had to do the whole thing myself I would probably still be scratching away at some of the questions and answers.

You can see the answers at http://pantonprinciples.org/faq/ .

Joe and I have decided to create an FAQ for Chem4Word. I have suggested about 20 questions along the lines of the FAQ above – and together we will answer them. In fact given the relative ease of speaking to the machine I will find it quite pleasant to dictate some of the answers on this blog. This means that not only can they be incorporated in the FAQ but that there will be a public exposure of some of the issues.

Posted in Uncategorized | 1 Comment

OKCon 2010 thoughts

Posted on April 26, 2010 by pm286

A brief blog post with a few thoughts about the OKCon conference in London last Saturday.

It was a wonderful meeting where I got the feeling that the Open Knowledge Foundation was now a real power in the world of making information and resources available to everyone. It was well attended and all the sessions were exciting and varied. The plenary session started with a State of the Nation session led by Rufus which has already been recorded and posted on Vimeo (http://vimeo.com/11220474; thanks to Jo Walsh).

There were presentations (see previous blog post) covering a wide variety of topics. In my own presentation I tried to show how science is critical to making major decisions in the current world, such as those on climate change. Access to data is critical, and it is frequently difficult to know what data exists or to get it even when that is known. For example the IPCC copyright their publications and require formal permission to reproduce any material. This is unacceptable in the modern world where we increasingly require machines to discover information and bring it back to us. These machines cannot and should not be required to understand legal niceties, so the only reasonable way forward for the semantic web is for all public information to be categorically open. I reported on the work done by the OKF and Science Commons on creating appropriate protocols and licences for ensuring that data is dedicated to the public domain and appropriately licensed. I believe that the OKF’s support of the Panton Principles is an important milestone in open science.

There were many exciting presentations but two stood out: the work on clean climate code and the work on OpenStreetMap. The presentation on climate code dealt with the FORTRAN which was used to generate the hockey stick graph. Some critics claimed that this FORTRAN could not be compiled and was of such low quality that it could not have been used. The presenters showed that they could in fact compile the FORTRAN and reproduce the graph pretty well. Several other groups had written similar programs and generated very similar curves. This shows that the actual calculation is reproducible. They are now campaigning that all code used in this endeavour should be clean and public. Our group takes a similar view in chemoinformatics software.

The work by OpenStreetMap was also impressive. When the earthquake struck in Haiti the existing maps were extremely poor and it was difficult for the rescue services to know where the roads had been. However various companies and organisations made satellite images available and in a remarkably short time volunteers created high quality maps of Haiti, allowing the rescue services to know where buildings had been and how to get to them. In fact the maps of Haiti are now superior to those available before the earthquake. This is a major credit to OSM, which has reached this position in only about five years from its early beginnings in mapping the streets of London by bicycle courier GPS traces.

We finished with a session in the bar where we discussed what needed to be done to help the effort to make climate change data available and computations more accessible. We agreed to set up a working party and Jonathan Gray at the OKF is the contact for anyone who wishes to explore how we can help this process.

I am proud to be a member of the advisory board of OKF and congratulate all those who organised the meeting especially Rufus, Sara and Jo.

(In the dictation I am like Don Marquis’ archy the cockroach, who cannot reach the shift keys of the typewriter. I shall learn.)

Posted in Uncategorized | 2 Comments

Can machines understand science?

Posted on April 26, 2010 by pm286

I am taking a new approach to blogging by dictating my thoughts to the machine.

On Friday I am giving a talk called “Can machines understand science?”. I will argue that the combination of technology and information now allows us to communicate with machines and for machines to communicate with us. This does not cover the whole domain of human activity but in limited areas such as formal aspects of chemistry machines can perform as well as many human beings.

John Searle devised a thought experiment, the “Chinese room”, in which a human who does not understand Chinese sits in a room. The person receives Chinese characters and looks them up in a large book of instructions which says how to react to the characters and how to transmit an appropriate output. If the book of instructions is sufficiently large the room might produce output that appears to be an intelligent response to the input. Suppose this could be done so well that a human observing from outside could not tell whether there was a Chinese speaker or a rule-follower inside. Searle himself argued that this would still not amount to real understanding; on a purely operational view, however, the machine+human “understands” Chinese.

I believe that we are in the same position with some areas of chemistry. Machines can carry out tasks in a way that cannot be distinguished from humans doing the same task. For example I believe that a machine can answer some chemistry questions on exam papers as competently as a human. These will fall into a number of categories such as

· regurgitating rote learning

· carrying out certain algorithms

· and looking up rules or data in formal procedures.

For example we have now written a system OPSIN which can translate IUPAC nomenclature into chemical diagrams. Daniel Lowe has now achieved a very high success rate for organic compounds with over 95% conversion and virtually no errors. This is clearly considerably better than a first year undergraduate who only has a limited chemical vocabulary.

Of course the machine has to read the examination paper, and if this is on printed paper this is a technical problem (although current scanning will probably allow most of the essential material to be captured). There will be ambiguities, especially for short lines and dots, and the system has to be able to make a reasonable guess [1]. If the paper were available in ASCII or XML then it would be possible to read it without errors. Having done that the machine has to understand the language in the question, but here we are fortunate in that most exam questions are phrased in very formal language and it is fairly easy to apply language processing techniques to understand them. As a result I assert that a machine could answer an organic nomenclature question on an exam paper.

This is of course only one small sub domain of chemistry and a fairly unusual one in that it consists of a very large number of fairly well explained rules. There are other parts of chemistry which also have formal rules such as balancing equations or predicting the outcome of well described reactions such as in elementary organic chemistry. In these I believe that machines can do as well as student chemists.
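To make the point about formal rules concrete, here is a minimal sketch of the simplest such rule: checking that a chemical equation is balanced by counting atoms on each side. It is an illustration only and is not taken from OPSIN or any other software mentioned here; the function names are made up, and it assumes simple formulae without brackets, charges or hydrates, with "->" as the reaction arrow.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in a simple formula such as 'H2O' or 'CO2' (no brackets or charges)."""
    atoms = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(number) if number else 1
    return atoms


def side_totals(side):
    """Sum atoms over one side of an equation, e.g. '2 H2 + O2'.

    A leading integer in each term is treated as its coefficient.
    """
    totals = Counter()
    for term in side.split("+"):
        match = re.fullmatch(r"\s*(\d*)\s*([A-Za-z0-9]+)\s*", term)
        if match is None:
            raise ValueError("cannot parse term: %r" % term)
        coefficient = int(match.group(1)) if match.group(1) else 1
        for element, n in count_atoms(match.group(2)).items():
            totals[element] += coefficient * n
    return totals


def is_balanced(equation):
    """Return True if both sides of 'reactants -> products' contain the same atoms."""
    left, right = equation.split("->")
    return side_totals(left) == side_totals(right)


print(is_balanced("2 H2 + O2 -> 2 H2O"))
print(is_balanced("H2 + O2 -> H2O"))
```

Actually finding the coefficients, rather than just checking them, is a small linear algebra problem, which is equally mechanical.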

You will argue that this is a very small part of what chemistry is about but we have been able to take it considerably further. Our machines can now understand parts of research papers such as the experimental recipes. This is a valuable process in that the machines can now read the literature much faster than humans. For example we can read a 300 page patent in a minute or two and understand much of the chemistry. This has been automated to scale to the weekly output of patent offices.

This blog post was dictated. It is a very different process from typing and I think the style is rather stilted. However with practice I expect to be able to dictate my thoughts to the machine at least as fast as I can type them. Stop to think what the machine has actually done in transcribing my audible noises into meaningful English language sentences. There is a lot that has to be done by the machine but that emphasises how considerable the advances have been in the last 30 years.

[1] Rant: and of course PDF isn’t much better.

Posted in Uncategorized | 2 Comments

Chem4Word and Blue Obelisk software

Posted on April 25, 2010 by pm286

Egon Willighagen has been a pillar of ODOSOS chemistry and comments yesterday in Chem4Word goes Apache 2.0

Early March I reported about Konstantin‘s JChemPaint-based chemistry plugin for OpenOffice, but there is competition: Chem4Word. Being for Microsoft Word, the plugin only works on top of proprietary software, unfortunately; therefore, I cannot tell you if Chem4Word release is any good, but what Jim has showed me about a year ago, it is pretty cool. Another big difference is that Microsoft gave the Chem4Word a big grant, and Konstantin does not have such funding, AFAIK, and relies on community support.
Now, Chem4Word was released earlier this month, as announced by Joe, and I just heard from Jim about it now being opensourced (and Peter blogged it too). Congratulations to all involved in the development! The Chem4Word project page indicates the actual license: Apache 2.0. Good choice!
Now, I said that a limitation of the plugin is that it requires proprietary software to run. This is why you will not quickly see my use it. Well, this is even why you do not see any screenshot! However, this should not spoil the news. This is for two reasons:

1. The plug-in is Open Source: this means that the community can learn from their project, and how the make molecular structures in Word documents semantic.

2. The plug-in saves the chemistry in the Chemical Markup Language in the XML-based Word document: this means that anyone will be able to extract the molecular structures in a semantic meaningful way.

And that’s, to me, the biggest news: if the organic chemists start using this plug-in, this will be a big win for Open Data. I am sure this is the hidden agenda of an unorthodox move of our fellow Blue Obelisk community members.

Many thanks Egon – carefully argued. I wouldn’t describe C4W as competition for OpenOffice – even if the community wishes to port it. In the Blue Obelisk community we have a useful amount of “duplication” or different ways of doing things, but we don’t compete. We are, in fact, glad when others come up with solutions that mean we don’t have to write code ourselves. Typical examples are CDK and JUMBO and Joelib, JChemPaint and Chem4Word, etc. These give people an environment in which to try out new ideas, check consistency of data, etc. But we avoid having 2 different versions of the Periodic table, bonding radii etc. We try to agree on the explicit and implicit semantics and interpretation of chemistry. We use CDK for substructure search and 2D diagram generation, for example; OpenBabel for substructure search and JUMBO for crystallography and geometry.

In the current case Konstantin is welcome to borrow material from C4W as long as credit is given. In practice .NUMBO is sufficiently different from existing approaches that there probably aren’t major bits that can be borrowed. But maybe it offers a chance for a fork – I don’t know enough about the details of the code. But the more Open offerings that there are, the more we convince the chemistry community that it’s worth using and worth developing.

Posted in Uncategorized | 1 Comment

Is Chem4Word Open Source? Yes (it can be forked)

Posted on April 25, 2010 by pm286

I have got a pingback (http://passthesource.org.nz/2010/02/11/enemy-action/ ) from my announcement of Chem4Word (our chemistry Add-in for Word). It is clearly argued and I share many of the sentiments. This is a LONG Reply.

A while ago, I asked whether we are seeing a trend to promote shallow layers of “open source” on top of a deep proprietary software stack. Once is happenstance. Twice is coincidence. The third time it’s enemy action. [PMR: This refers to Microsoft’s funding of the British Library and the City of Edmonton; the phrase (from Goldfinger) argues that there is a concerted campaign by Microsoft to use Open Source to create lockin to its products].

As a lapsed chemist, it saddens me to criticise a project with such a worthy goal. But this software spreads proprietary lock-in, not freedom. Those wishing to use it can only do so by first buying a stack of proprietary software. Those receiving documents created using it may well not be able to open them unless they have the same software and the same plug-in. Those who distribute the software, or documents created using it, are making science less free.

I am left with some questions:

Will those funding the project also be funding a port to a free software alternative, such as OpenOffice; if not, why not?
Is the phrase “open source” being embraced, extended and de-commoditised?
[…]

This shows once again what happens when we focus on the software licensing, instead of on the user’s freedom. Or am I missing something? Are the chemists involved in this doing something that I have missed?

Before I tackle this I’ll recall what I wrote a year ago (quoted by Bill Hooker) http://www.sennoma.net/main/archives/2009/03/peters_mr_and_s_on_science_and.php

[BH] Peter MR takes the view, with which I concur, that it’s more important to get scientists using semantic markup than to take an ideological stand against Microsoft:

[PMR] Microsoft is “evil”. I can understand this view – especially during the Halloween document era. There are many “evil” companies – they can be found in publishing (?PRISM), pharmaceuticals (where I used to work) (Constant Gardener), petrotechnical, scientific software, etc. Large companies often/always? adopt questionable practices. [I differentiate complete commercial sectors – such as tobacco, defence and betting where I would have moral issues]. The difficulty here is that there is no clear line between an evil company and an acceptable one. […]

[PMR]The monopoly exists and nowhere more than in in/organic chemistry where nearly all chemists use Word. We have taken the view that we will work with what scientists actually use, not what we would like them to use. The only current alternative is to avoid working in this field – chemists will not use Open Office.

[BH] There’s a difference between the plugins being Open Source and the plugins being useful to the F/OSS community. If collaborators hold Microsoft to real interoperability, the “Evil Empire” concerns largely go away, because the project can simply fork to support any applications other than Word.

There has also been a brief appeal from Glyn Moody (with whom I shared a platform yesterday at the Open Knowledge Foundation) for the OO community to fork and port C4W. Glyn is a consistent critic of threats against freedom. This includes NetNeutrality, extension of copyright and software piracy (ACTA). He has also been severely critical of the process used by Microsoft to get OOXML adopted as a standard by ISO (http://www.computerworlduk.com/community/blogs/index.cfm?blogid=14&entryid=2889).

(http://identi.ca/notice/29705683) Chem4Word is out as OpenSource – http://bit.ly/beigya great, but only works with MS Word: could someone hack this for OpenOffice, please?

First, to answer the questions:

1. There is currently no Microsoft funding to our group to translate Chem4Word into OpenOffice, but I will transmit the request to them and probably suggest they reply directly (although I can carry the message). Whether we are the best group to do it will depend on the scale of the project and what elements of research there are in it.

2. I cannot answer this from my personal interactions with Microsoft (I deal primarily with MSResearch). Microsoft has only fairly recently become active in the Open Source world. It has now joined the Apache Foundation. That means the issues will be more public and will be debated more openly. I would expect that Apache would be very concerned if it were to be converted to supporting “embrace, extend, etc.”. I am an optimist and believe that the influence is just as likely to be in the opposite direction where OSS successes get fed back into the culture of Microsoft and change it.

The architecture of Chem4Word has always been designed to keep the chemistry component Open (in terms of Open Specifications, Open Data and Open Source) and we have delivered that. It has involved a great deal of non-Word work and product including:

1. A port of the Open Source Java library JUMBO5 to C# (dotNUMBO). This has no dependence on Word and has resulted in some novel ideas in functional/stateless programming. The intention was and still is to try to keep JUMBO and .NUMBO in sync. This might be done by automatic translations, library wrappers or by blood and sweat. The .NUMBO enhancements will be fed directly into the next version of JUMBO6.

2. Development of a sub-schema of Chemical Markup Language (CML) which is now used to validate input files. This is probably the strictest validation of chemical input anywhere (syntactic and semantic variation is a serious problem in chemistry leading to a great deal of loss and corruption). Again the Schema is Open.

3. Development of a new approach to building molecules.

4. Web services which link to Open resources such as Pubchem and our OPSIN system.

I’ll pause to reflect on portability. The C# is written against unit tests and to – I think – an acceptable porting standard. Although C# is primarily used in .NET environments it can be compiled and run under Mono (though we haven’t done so). The graphics is written in WPF and XAML. I doubt this is really portable to Mono. But graphics portability has been the curse of my life for 35 years and the horror never changes. It won’t be trivial to convert WPF/XAML to Java but then it wouldn’t be trivial to do the same with, say, OpenGL. I think we have to concede that user interfaces have to be written in a bespoke manner. (Even the hope of standardisation through browsers is a mess).

Next I’ll tackle OOXML (http://en.wikipedia.org/wiki/Office_Open_XML ). (Apparently the I4I patent case didn’t invalidate what we have written – Word2010 is free of problems and C4W runs under it). I accept Glyn’s concerns about Microsoft’s approach to the standards process. It looks ugly. I don’t think it’s as bad as the Halloween document. OOXML is therefore an Open standard and this is what we use. I’ve now looked in some depth at the bits of OOXML relevant to C4W and written code to process the chemistry in them. That code is open. We expect to do the reverse – create C4W documents from APIs. In passing I note that whatever is used for managing the document is necessarily complex. Documents are complex, and typography adds a serious extra axis. OOXML seems to be a reasonable approach (at least I can’t think of a much simpler one). The chemistry and the figures/media are also part of the Open Packaging Conventions. We use CustomXML for managing the chemistry (as CML).
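Because an OOXML document is a zip package, the chemistry can in principle be pulled out with nothing but standard open tools. The following is a minimal sketch, not the code we have written: it assumes the CML lives in custom XML parts under the conventional customXml/ folder of the package, and that the root element of such a part is in the CML namespace (http://www.xml-cml.org/schema). The function name is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace of the chemistry parts (the usual CML schema namespace).
CML_NS = "http://www.xml-cml.org/schema"


def find_cml_parts(docx_file):
    """Return (part name, root element) for each custom XML part that holds CML.

    docx_file may be a path or a file-like object containing a .docx package.
    """
    found = []
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            # Custom XML parts conventionally sit in the customXml/ folder.
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                continue  # not well-formed XML, so not something we can use
            # ElementTree writes qualified tags as '{namespace}localname'.
            if root.tag.startswith("{" + CML_NS + "}"):
                found.append((name, root))
    return found
```

Calling find_cml_parts("paper.docx") on a document (the file name is only an example) would give the chemistry as ordinary XML elements, which any CML-aware library could then process without Word being installed.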

Up to this point it leaves us with open specifications (CML), data (in CML) and software which, though Microsoft-oriented, requires no proprietary tools if you are prepared to work hard enough. It may (though we haven’t tried it) be possible to create a graphic display in Mono.

That leaves interaction with Word itself. Personally I don’t like Word and until recently preferred to create my documents in ASCII or HTML. I don’t like WYSIWYG editors (and regret the passing of WordPerfect, etc.). But it is what people use. If we don’t like it there are only these possibilities:

· Persuade the chemical community to use OpenOffice. I’ve used OO. I like it even less than Word. It feels to me like a slightly unfinished product. Peter Sefton integrated OO into my blogging post and I wrote posts that way but didn’t enjoy it (admittedly native WordPress was worse).

· Use LaTeX. But LaTeX is not semantic and never will be and it’s entirely unsuitable for chemistry. We need structured semantic documents.

· Wait till there is a sufficiently credible Open alternative that chemists will adopt. This will take decades at the present rate of progress.

· Give up trying to promote semantic documents in chemistry.

Glyn highlights that TimBL will not touch Word (http://opendotdotdot.blogspot.com/2010/04/rms-and-tim-berners-lee-separated-at.html). “Yet Berners-Lee refuses, on principle, to use Word, which is a proprietary rather than an open source format. On one occasion, one official recalled, Berners-Lee received an urgent document in Word from one of the most senior civil servants—and refused to look at it until a junior official had rushed to translate it into an acceptable format.” I used to take the same view – but have changed to tolerate it. If I’m a class-traitor, sorry. My main concern is with unacceptable practices in chemistry:

· Manufacturers who ensure lock-in to equipment through binary formats

· Bad binary chemical software using proprietary standards

· Secrecy in data and algorithms

· Companies which sue chemists for openly reporting bugs or publishing benchmarks

· Publishers who actively lobby against freedom of information

I thought about it and believe that the value of opening up chemistry is more important than taking an absolute stand against the use of Word. (I might have done this ten years ago). We have always developed the code such that it is possible to fork. This is the primary contract of Open Source. Glyn has appealed to the OS/OpenOffice community to create an OO version. I don’t think this can be done by a single person and certainly not someone who doesn’t understand OO. F/OSS gives the right to fork. It does not promise that it can be done with zero cost. It needs funding in cash or in kind (significant committed volunteer effort).

So if the F/OSS community wishes to see C4W in OO I am happy to give what little technical suggestions I can. (It will need a different name). I can’t speak for my colleagues but I would not be surprised if they agreed.

Posted in Uncategorized | 2 Comments

Open Knowledge Foundation OKCon 2010 – what I might say

Posted on April 24, 2010 by pm286

I’ve been asked to give a 10-min talk at OKCon 2010 today in London (http://www.okfn.org/okcon/) in the State of the nation opening section:

STATE OF THE NATION

Chair: Becky Hogge

· A Year in Review, Rufus Pollock, http://www.okfn.org

· Local Government Data, Chris Taggart, http://openlylocal.com

· Bibliographic Data and the Public Domain, Matthias Schindler, Wikimedia DE

· Open Science, Peter Murray-Rust, University of Cambridge

· Linked Open Data, Sören Auer, Universität Leipzig

· Open Licensing for Data, Jordan Hatcher, Open Data Commons

· The Post-Analogue World, Glyn Moody, opendotdot

· Open Philanthropy – in search of change agents (short talk/announcement), Helen Turvey, Shuttleworth Foundation

Since I don’t use PowerPoint on principle, mainly because it’s not (yet) semantic but also because Tufte kills kittens, I try to blog about what I hope to say. However the OKF will be recording this so I hope there will be a permanent record. Because I try to complement what other speakers say I don’t know in detail what precise topics I need to concentrate on.

At present I want to concentrate on making science truly Open. I’ll pay tribute all too briefly to pioneers of Open Experiments (Cameron Neylon, Jean-Claude Bradley) but I’ll primarily emphasize what we can do to make science Open after or at the point of formal publication. So points I may cover:

· Science is not Open. Some of it is publicly visible (free beer) but that’s not good enough for Linked Open Data. Here’s a topical example:

Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change

Published for the Intergovernmental Panel on Climate Change

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of the Intergovernmental Panel on Climate Change.

So we need written permission if we want the data needed to save the planet. The problem is that this mindset is endemic. Ask any Research Librarian whether you can reproduce some data and the universal answer is NO. You need permission. As Cambridge sinks under the rising North Sea from the melting glaciers and millions of books are destroyed we’ll have the satisfaction of knowing that no Copyright was violated.

· Why we need Linked Open Data. Tim Berners-Lee has 4 rules of Linked Data (http://data.gov.uk/wiki/Linked_Data). They’re great. But they don’t require the Data to be Open. Not even accessible. So we need a fifth rule of Linked Open Data. The data must be Open, with an OKDefinition-compliant licence (http://www.opendefinition.org/okd/). Explicit.

· Panton Principles. Almost all scientists are ignorant of the issues. This is not their fault. Most of them want their data to be open, and they think it is. But they actually need to state that explicitly. So we came up with a simple set of principles and actions to make this possible and launched them at the Panton Arms (Cambridge) in Panton Street (a major hub of OKF activity). (http://pantonprinciples.org/). The principles are simple:

When publishing data make an explicit and robust statement of your wishes.

Use a recognized waiver or license that is appropriate for data.

If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.

Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.

· IsItOpen. If a scientist finds some data can they re-use it? If it isn’t licensed with an OKD-compliant licence they simply don’t know. If they ask an expert (lawyer, librarian) they will be told NO. Ritually. Because the sacred law of Copyright is supreme. It’s actually a law and if you break it you might go to jail. So people tend to be cautious. However many data providers want their data to be Open. IsItOpen (http://www.isitopendata.org/ ) is a service created by OKF volunteers (and inspired by sites such as WhatDoTheyKnow) which allows anyone to ask a data provider whether their data is open. It will also serve as a way of making the providers aware of the OKF and OKD. Hopefully this will spread virally. Please use it. We’d suggest starting with those who are likely to say YES.

· Librefication through Software. If we develop the right software we can virally spread the power of Openness. For example if every authoring tool highlighted the option to librefy data then many people would. If every repository promoted the Panton Principles rather than the god of restrictive copyright there would be openness.
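As a rough illustration of how an authoring tool or repository might build this in, here is a small sketch that looks at the licence declared for a dataset and reports whether it is explicitly open. It is only a sketch under stated assumptions: the function name and the three status strings are invented, the licence identifiers are SPDX-style short names, and the "open" set contains only the two dedications that the Panton Principles recommend by name (PDDL and CCZero), not every OKD-compliant licence.

```python
# Dedications recommended by name in the Panton Principles (SPDX-style identifiers).
RECOMMENDED_OPEN = {"PDDL-1.0", "CC0-1.0"}


def openness_status(declared_licence):
    """Classify a declared licence string as 'open', 'restricted' or 'unknown'.

    'unknown' covers both a missing statement and one this sketch does not
    recognise; in either case a re-user cannot tell and has to ask.
    """
    if declared_licence is None or not declared_licence.strip():
        return "unknown"
    licence = declared_licence.strip().upper()
    if licence in RECOMMENDED_OPEN:
        return "open"
    # Non-commercial and no-derivatives clauses are the restrictive ones
    # the principles say should not be used.
    if "-NC" in licence or "-ND" in licence:
        return "restricted"
    return "unknown"


for example in ["CC0-1.0", "CC-BY-NC-4.0", None]:
    print(example, "->", openness_status(example))
```

The useful part is the "unknown" case: a tool that flags a missing statement at the moment of deposit prompts the author to make the explicit statement that the first principle asks for.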

There’s a lot more that I can and will say but after months of inaction I haven’t found fluency yet. I have a secret weapon which I’ll reveal in future posts.

Posted in Uncategorized | 1 Comment

Chem4Word is out as OpenSource

Posted on April 23, 2010 by pm286

This is the day I have been waiting for! Chem4Word has been released as open source (http://chem4word.codeplex.com/). It has been a tremendous effort by the team.

The current “Dr. Who” of Chem4Word is Joe Townsend. Joe has worked tirelessly for two years to develop the chemical code. There are many other contributors and I’ll blog them later, but just to highlight Alex Wade (Microsoft Research), who has been awarded a Blue Obelisk (http://www.blueobelisk.org) for his enormous contribution to Open Data, Open Source and Open Standards.

Microsoft? Open Source?

Yes.

I’ll explain later, but the world is changing. Chem4Word is Open. Apache. Anyone can download it and do what they like. The only real constraint is that if you republish modified code you have to acknowledge the original authors.

Open Source is a reality in chemistry. Jmol, CDK, OpenBabel and many more. OSCAR, OPSIN from our own group.

Open Source is about community, innovations and quality. Those concepts are changing our world.

Posted in Uncategorized | Leave a comment

Back in the blogosphere

Posted on April 23, 2010 by pm286

I have deliberately not blogged for many months, devoting my efforts to producing code for the various projects in our group.
I am now able to start again and I’ll tell you why in the next post. We have an awful lot to talk about.
And while I’m posting – there’s the OKF meeting in London tomorrow… http://www.okfn.org/okcon/ where I’ll be talking about our new Open Bibliography initiative.
More later

Posted in Uncategorized | Leave a comment

The Mind Wobbles at Science Online

Posted on August 22, 2009 by pm286

There’s hardly any need for me to blog the sessions at #solo09 because I’m sitting next to the author of “The Mind Wobbles” who’s typing at breakneck speed. There’s a full report at:

http://themindwobbles.wordpress.com/2009/08/22/blogging-for-impact-science-online-london-2009/

There’s also a lot of FriendFeed activity – I find this more useful than Twitter as it’s threaded (and there’s transfer between the two).

Now we are contemplating “What is a scientific paper?” Four moderators; 2 have spoken, 2 still to go. So far nothing world-shattering or even mind-wobbling or mind-teetering.

Now Theo Bloom from PLoS is saying it’s time for a radical overview. It’ll all be on “The Mind Wobbles”! So I’ll reserve comments for later. But I think I’ll suggest that the panel isn’t going far enough.

25% of authors can’t find one/any of the images (e.g. gels) that they “included” in the paper. That’s a strong case for the sort of work we are doing in data capture with CLARION.

Posted in Uncategorized | 1 Comment

Galaxy Zoo at Mendeley

Posted on August 22, 2009 by pm286

Am returning to the blogosphere after some considerable hiatuses (due to debugging of Chem4Word, which has taken many of the 168 hours in the week; that’s starting to ease up). So I’m at the Royal Institution in the company of 170 bloggers – the meeting (#solo09) was oversubscribed and I’m grateful for being squeezed in. I may blog on some of the sessions, but the current one is off-limits – for good reasons.

We had a great party last night hosted by Mendeley – the new company which is gearing up to revolutionise how we use references. I hadn’t realised they were in London – Farringdon – we had a very pleasant rooftop gathering with about 50-70 others.

Twitpic of party

We had 4 sessions – 2 planned – 2 unconference – where we shared aspirations, history, problems, etc. Had a great time talking about Galaxy Zoo – the online community that annotates galaxies. There’s about a million galaxies that needed to be annotated – do they look like spirals or “green swirly things” or “processed peas”? I knew GZ was large, but I was surprised to find out there were 230,000 registered members. About 70,000 are active.

So why do they do it? Apparently they are motivated by contributing to science. They’ve had 12 papers published on GZ work. So what a splendid idea to translate to other endeavours. We had a session at Scifoo on crowdsourcing science and this is certainly something that I’ll be taking on board.

More later…

Posted in Uncategorized | Leave a comment
