Figshare: how to publish your data to write your thesis quicker and better

I’m at the JISC Repository Fringe (#rfringe11) in Edinburgh (if you want to follow it, the live blog is great). I was really excited to meet Mark Hahnel today, the creator of Figshare. Mark is exceptional: he has not only done research in cell biology and just finished his thesis on stem cells, but has also developed Figshare, a tool to publish his data on the open web.

Here’s Mark’s account of Figshare on the OKF blog. Some extracts:

The following post is from Mark Hahnel, founder of the Science 3.0 network and member of the Open Knowledge Foundation’s Working Group on Open Data in Science.

Scientific publishing as it stands is an inefficient way to do science on a global scale. A lot of time and money is being wasted by groups around the world duplicating research that has already been carried out. FigShare allows you to share all of your data, negative results and unpublished figures. In doing this, other researchers will not duplicate the work, but instead may publish with your previously wasted figures, or offer collaboration opportunities and feedback on preprint figures.


What? Publishing your data. To everyone else? And spending valuable thesis time doing it? Wasting time on things that didn’t work? And giving your competitors an advantage?

In fact there is a very good self-centred reason for publishing your data as you do the experiment: it means you are always in control of your data. You won’t need to frantically hunt for the missing gel that you thought had only one spot on it but now you aren’t sure. The spectrum that clearly showed the methyl group was on an aromatic ring. The crystal structure that showed the metal was zinc, not magnesium. If you publish your data, openly, at the first possible opportunity, then you know it’s safe. It will have all the metadata it needs. It will be coupled to the bibliography.

And this means that your thesis will take less time to write and will be of higher quality, because the discipline of publishing data will become second nature. You won’t forget to label the axes. You won’t wonder whether the distance was in Bohr or Angstroms, or the energy in kcal or kJ. In fact you will be working towards continuous integration for your thesis.

What’s that? It’s a software concept: every time you make a small increase in your program’s capability, you check (usually with automated tests) that the whole program still works. For data, it means that every time you collect a new piece of data you make sure that it’s in the right place, describing the right sample. It’s so easy to muddle things if you don’t record them properly at the time, and doing it as you go saves time and worry.
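To make the analogy concrete, here is a minimal Python sketch of a "continuous integration" check that could run on every new piece of data before it is published. It is not part of Figshare; the `DataRecord` fields, the `ALLOWED_UNITS` table and the `check_record` function are hypothetical names chosen for illustration, and a real lab would define its own required metadata.

```python
from dataclasses import dataclass, field

# Hypothetical table of acceptable units for each kind of quantity.
# A distance in "hartree" is caught, because the Hartree is an energy unit.
ALLOWED_UNITS = {
    "distance": {"angstrom", "bohr"},
    "energy": {"kcal/mol", "kJ/mol", "hartree"},
}


@dataclass
class DataRecord:
    """One hypothetical data item, with the metadata a thesis chapter needs."""
    sample_id: str
    quantity: str
    unit: str
    x_label: str
    y_label: str
    values: list = field(default_factory=list)


def check_record(record: DataRecord) -> list:
    """Return a list of problems; an empty list means the record is ready to publish."""
    problems = []
    if not record.sample_id:
        problems.append("missing sample_id")
    if not record.x_label or not record.y_label:
        problems.append("unlabelled axis")
    allowed = ALLOWED_UNITS.get(record.quantity)
    if allowed is None:
        problems.append(f"unknown quantity {record.quantity!r}")
    elif record.unit not in allowed:
        problems.append(f"unit {record.unit!r} not valid for {record.quantity}")
    if not record.values:
        problems.append("no values")
    return problems


if __name__ == "__main__":
    # Run the check every time a new record is collected, before it goes online.
    record = DataRecord("", "distance", "hartree", "", "bond length", [1.09])
    for problem in check_record(record):
        print("FIX BEFORE PUBLISHING:", problem)
```

Returning every problem at once, rather than stopping at the first, means you can fix a record in one pass while the experiment is still fresh in your mind.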

So the benefit is primarily to YOU. We can expect Figshare and related sites to expand their capabilities. There will be a bibliographic spine, and you’ll be able to look for figures that look “a bit like yours”. Or indeed, if you are repeating a protocol, you’d hope they looked a lot like yours. And rather than simply trawling through the journal of indeterminate biology in the hope of finding relevant diagrams, you’ll be able to search for them directly.

Figshare is so simple in concept that it will succeed (which doesn’t mean it was easy to write!). And because it was written by someone intimately involved in research, and in tune with the business of writing a thesis, it fits perfectly. Mark’s talking tomorrow and I’m looking forward to it; I hope the live blog catches it.

