I'm now finishing the second month of my Shuttleworth Fellowship - the most important thing in my whole career. My project, The Content Mine, aims to liberate all the facts in the scientific literature.
That's incredibly ambitious and I don't know in detail how it's going to happen - but I am confident it will.
This week we posted our website - and showed how we create content. What's modern is that this is a community website - we're inspired by Wikipedia and OpenStreetMap, where volunteers can find their own area of interest and contribute. Since there is no other Open resource for content-mining, we shall provide one - we have 100 pages and intend to go beyond 1000. Obviously you can help with that. And of course Wikipedia's information is invaluable.
We have an incredible team:
- Michelle Brook. Michelle is Manager and is making a massive impression with her work on Open Access.
- Jenny Molloy. Jenny has co-authored the foundations of Open Content Mining and ran the first workshop last year.
- Ross Mounce. Ross has championed Open Content Mining in Brussels and is developing software for mining phylogenetics.
- Mark MacGillivray. Mark co-authored Open Bibliography and founded CottageLabs, who are supporting our web presence and IT infrastructure.
- Richard Smith-Unna. Richard founded the volunteer scientist-developer community solvers.io, to which he is pitching ContentMine to support crawling.
But we also have masses of informal links and collaborations. Because we are Open, people want to find out what we are doing and offer help. It's possible that many of our requirements for crawling will be provided by the community - and that has been happening over the last week. We've had an important contribution to our approach to Optical Character Recognition. Today someone Skyped me with suggestions about Chemistry in the ContentMine.
This all happens because of the Digital Enlightenment. People round the world are seeing the possibilities of zero-cost software, efficient voluntary Open communities and the value of liberated Knowledge. There are many projects wanting to liberate bibliography, reform authoring, re-use bioscience, etc. Occasionally we wake up and think "wow! problem solved!". If you think "we", not "me", the world changes.
The Fellows and Foundation are fantastic. I have an hour-long Skype call every week with Karien, and another hour with the whole Fellowship. These are incredibly valuable. With such a huge ambition we need focus.
There's huge synergy with several formal and many informal projects. Once you decide that your software and output are Open, you can move several times faster. No tedious agreements to sign. No worries about secrecy, so no delays in making knowledge open. Of the formal projects:
- Andy Howlett is doing the 3rd year of his PhD in the Unilever Centre here on metabolism. He can use the 10 years' worth of Open Source we have developed and because his contributions are also Open we'll benefit in return.
- Mark Williamson is using our software in similar fashion.
- Ross Mounce and Matt Wills at Bath are running the PLUTo project. Because it's completely Open they can use our software and we can re-use their results.
- We are starting work with Chris Steinbeck at EBI on automated extraction of metabolites and phytochemistry from the literature.
Informally we are working with Volker Sorge (Birmingham) and Noureddin Sadawi (Brunel) on scientific computer vision and re-use of information for Blind and Visually Impaired people. With Egon Willighagen and John May on the (Open) Chemistry Development Kit. With the Crystallography Open Database...
How can it possibly work?
In the same way that Steve Coast, "single-handedly" and with zero cash, built up OpenStreetMap:
- By promoting the concept. We are already well known in the community and people are watching and starting to participate.
- By building horizontal scalability. By dividing the problem into separate journals, we can build per-journal solutions. By identifying independent disciplines (chemistry, species, phylogenetics...) we can develop each one independently.
- By using an Open modular software and information architecture. We build libraries and tools, not applications, so it's easy to reconfigure. If people want a command-line approach we can offer that.
- By re-using what's already Open. Need a chemical database? We don't build it ourselves - we work with EBI and PubChem. An Open bibliography? We work with Europe PubMed Central.
- By attracting and honouring volunteers. RichardSU has discovered the key point is to offer evening-sized problems. Developers don't want to tackle a complex infrastructure - they want something where the task is clear and they can complete it before they go to bed. And we have to make sure that they are promoted as first-class citizens.
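To make the "per-journal solutions" and "libraries and tools, not applications" ideas a little more concrete, here is a minimal Python sketch. It is not ContentMine code: the two registries, the decorator names, the `example-journal` page layout and the toy word lists are all assumptions invented for illustration. The only point is that a volunteer could add one small function for one journal or one discipline without touching anything else.

```python
import re
from typing import Callable, Dict, List

# Hypothetical registries (illustrative only, not part of any real ContentMine API).
SCRAPERS: Dict[str, Callable[[str], str]] = {}
EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {}


def scraper(journal: str):
    """Register a per-journal function that turns a raw page into body text."""
    def register(func):
        SCRAPERS[journal] = func
        return func
    return register


def extractor(discipline: str):
    """Register a per-discipline function that pulls facts out of body text."""
    def register(func):
        EXTRACTORS[discipline] = func
        return func
    return register


@scraper("example-journal")
def scrape_example_journal(raw: str) -> str:
    # Assumption: this imaginary journal wraps the article in <body> tags.
    match = re.search(r"<body>(.*?)</body>", raw, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


@extractor("species")
def extract_species(text: str) -> List[str]:
    # Toy rule: a genus from a tiny hard-coded list, followed by an epithet.
    return re.findall(r"\b(?:Mus|Homo|Escherichia) [a-z]+\b", text)


@extractor("chemistry")
def extract_chemistry(text: str) -> List[str]:
    # Toy rule: a tiny hard-coded list of compound names.
    return re.findall(r"\b(?:glucose|caffeine|ethanol)\b", text)


def mine(journal: str, raw: str) -> Dict[str, List[str]]:
    """Scrape with the journal's own scraper, then run every extractor."""
    text = SCRAPERS[journal](raw)
    return {name: find(text) for name, find in EXTRACTORS.items()}


page = "<html><body>Mus musculus was fed glucose.</body></html>"
facts = mine("example-journal", page)
print(facts)
```

Each decorated function here is meant to be the sort of "evening-sized problem" described above: one journal or one discipline, with a clear start and finish.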
Much of what we do will depend on what happens every week. A month ago I hadn't planned for solvers.io; or Longan Java OCR; or Peer Library; or JournalToCs; or BoofCV; or ...
PS: You might wonder what a 72-year-old is doing running a complex knowledge project. RichardSU asked that on Hacker News and I'm pleased that others value my response. If Neelie Kroes can change the world at 72, so can I - and so can YOU.
If you are retired, you're exactly the sort of person who can make massive contributions to the Content Mine. And it's fun.