berlin5 : Open Access to Research Data: surmountable challenges

This is the abstract I have submitted for the Berlin-5 meeting: “Berlin 5 Open Access: From Practice to Impact: Consequences of Knowledge Dissemination”
Open Access to Research Data: surmountable challenges
Many scientists and organisations have recognised the power and importance of “Data-driven Science” where existing data is a primary resource in scientific research. In some communities (astronomy, particle physics, and some biosciences) this type of work flourishes and the primary challenges are technical – size, complexity, metadata, automation, etc. In many fields, however, and in almost all multidisciplinary endeavours, the major obstacle is finding scattered, heterogeneous data. Many of the data first occur in scholarly publications and, while they can be interpreted and understood in low volume by humans, are poorly presented for re-use by machines. As an example, over 1 million new chemical compounds are published yearly, but are scattered through hundreds or thousands of journals.
In principle this could be solved by robotic indexing and the use of search engines. In chemistry, for example, we have developed text-mining techniques which can recognise as chemicals over 80% of terms in mainstream publications, and identify a similar percentage. Our tools could rapidly index the scientific chemical web and add significant semantic value.
The biggest problem, however, is that many publishers forbid or obstruct this activity. Most chemistry journals are closed and thereby immediately inaccessible to many. Even for subscribers there are usually lengthy licences which are fuzzy and difficult even for experts to interpret. There is an embedded fear of offending publishers’ conditions either because of breaking copyright (even unintentionally) or being cut off by the publishers’ machinery (anecdotally very common). Many publishers specifically forbid robotic indexing.
The problem is solved for any “Open Access” publisher that adopts the spirit of the BBB declarations. Taken logically BBB requires that all content can be indexed and downloaded without permission. Unfortunately many publishers use “Open Access” but decorate their web site with additional licence conditions which are logically and ethically incompatible.
The label “Open Access” is a weak tool when describing access to, and re-use of, data. I and others have promoted the term “Open Data” (http://en.wikipedia.org/wiki/Open_data and references therein) to describe the need to consider data as a critical resource which needs political and legal activity. The use of Creative/Science Commons licences is extremely valuable but will need refinement as the principles of Open Access and Open Source do not translate automatically to data.
I shall give demonstrations of Open Data resources and outline some of the issues that the scholarly community must address rapidly if we are not to be impoverished by the “land grabbers” in the digital dataverse. We need a radical rethink of conventional information protection and need to be braver and more outspoken.
[Note: This was written pre-PRISM. I am concerned that if PRISM has any traction it will impact on Open Data as well as Open Access and will blog this later.]

Posted in berlin5, open issues | 1 Comment

Emotion and logic and PRISM

I’ve taken a week off blogging to write code and woken up to find I have nearly missed PRISM. PRISM is a publishers’ alliance which appears to be solely devoted to protecting twentieth century business methods by whatever process is expedient. I’ve come to the sad position that, unless I breathe deeply, I take the default position that publishers are a problem to be overcome, not part of the way forward.
So I breathe deeply. I work with some wonderful people in the publishing industry. The list isn’t exhaustive and I hope they aren’t embarrassed:
  • Timo Hannay from Nature. An early sponsor of our work and champion of innovation
  • David James, Richard Kidd and Colin Batchelor from RSC (and Alan McNaught) who have supported our work for several years. Colin was here on Tuesday involved in developing methods for semantic chemistry
  • David Martinsen from ACS, who has consistently supported new ideas and run the spring ACS meeting on new ideas in publishing
  • Brian McMahon and Peter Strickland from IUCr who have also supported our work and built superb scientific semantics.
PMR: So it’s sad to see the other side – the industry reacting viscerally to threats. Here is Bill Hooker, reporting Peter Suber and adding comments:

From Peter Suber:

The AAP/PSP has launched PRISM (Partnership for Research Integrity in Science & Medicine). I’m quoting today’s press release in its entirety so that I can respond to it at length:

A new initiative was announced today to bring together like minded scholarly societies, publishers, researchers and other professionals in an effort to safeguard the scientific and medical peer-review process and educate the public about the risks of proposed government interference with the scholarly communication process. [much egregious lying]
Anyone who wishes to sign on to the PRISM Principles may do so on the site.

Bill: Fortunately for us all, Peter has already responded; I won’t excerpt his point-by-point rebuttal here, you should go read it all.
This is disgusting. This runs counter to everything that science, academia, scholarship (and scholarly publishing!) stand for.
There are no names on the PRISM site yet — but I’m going to find as many as I can and publish them here. Sunlight is the best disinfectant, and I want to know just who is taking part in this revolting effort to steal from the commons and turn public goods into private profit.
(We can start with the AAP: their members page is essentially one long list of companies and organizations with whom I will assiduously avoid doing business until and unless they dissociate themselves from PRISM, and preferably from the AAP altogether.)
More later. Oh yes indeedy.

PMR: The arguments from the PRISM community are not new – primarily that OA destroys peer review and therefore science/scholarship. This is, of course, completely fallacious. If you wish to see a clinical dismissal of the publishers’ position read PeterS. Otherwise imbibe the raw emotion of Bill.
PMR: To amplify (again reported by PeterS):

Tom Wilson, Publisher panic, Information Research Weblog, August 24, 2007.

The commercial journal publishers are really in a state of panic. Reports from various sources point to their launch of PRISM: The Partnership for Research Integrity in Science & Medicine, a lobby organization to help them try to persuade the US Congress (and presumably Parliament in the UK) to ban Open Access. Of course, they don’t say that: we have the usual weasel-worded statement that lobby organizations in the USA seem to be adept at….
[On the alleged threat to peer review] they are simply lying, and they know it. Free OA, scholarly journals operate the same peer review process as do commercial journals: if they didn’t scholars wouldn’t publish in them, but free, collaboratively supported journals are growing in number and take away submissions from the commercial journals, which will find it harder and harder to maintain quality….
What this recent initiative by the publishers points to is that the only sure way for the scholarly communities to take charge of the scholarly communication process is to rid themselves of their commercial exploiters and promote the publication of free, collaboratively produced and subsidised journals….

PMR: What disappoints me is that few of the conventional publishers have taken a positive view about the future. The future is EXCITING. The publishers are obstructing us getting there. Even the more forward-looking ones.
Part of the problem is that publishing is a cross between a public service and a commercial business. It hasn’t worked out where it stands and where it should stand. It is becoming increasingly clear that if it takes the business route it will go down the video media route typified by the appalling FACT [1] adverts on DVDs. (These are the ubiquitous adverts telling you what will happen if you copy the DVD you have bought or rented. It really sets the scene for an evening’s watching.) Perhaps we should have:
“You wouldn’t steal a car?”
“You wouldn’t steal a TV?”
“If you read a scientific paper you are not entitled to this is THEFT!!!!”
And it should be mandatory to have to read this declaration for 30 seconds before you are allowed to read the paper.
After all I am not just a scientific reader of a paper, I am a potential thief. And I should be told what dire fate awaits me if I dare to read scientific research I haven’t paid for. I shall have more replies from publishers to publish shortly.
Meanwhile back to Java.
[1] ADDED LATER. FACT is the Federation Against Copyright Theft – at least in the UK. Every time you watch a movie – at home or in the cinema – you are treated to an obligatory series of advertisements about the immorality, illegality and cost-ineffectiveness of pirate videos and movies. For many people it spoils the experience of the work. That’s increasingly how I feel about the conventional publishing industry. Having my work described as “junk science” when it is published in Open Access journals is simply an illiterate insult. Having Open Access described as “ethically flawed” is as bad.
Publishers should be enhancing the process and quality of scholarly communication. A publication should be something in which all can take some pride, not simply a piece of commerce to be defended.
Posted in open issues | 1 Comment

What do we mean by open science?

There seems to be a critical mass of activity in the Open Science camp – possibly sparked off (or at least given amplification) by scifoo. Here is a very useful summary from Bill Hooker (Timo, invite Bill to scifoo next year). Bill missed the Second Life event (so did I and I’m disappointed, but I really had other things to do)…

(Addressed in absentia to “Tools for Open Science”, Second Life, Aug 20 2007. I am sorry I could not be there.)
I think we all know what we want, and I think we all want much the same thing, which boils down to just this: cooperation. A way forward for science, a way out of the spiralling inefficiency of patent thickets, secret experiments and dog-eat-dog competition. But we use a variety of terms, and probably mean slightly different things even when we use the same terms. It might — I am not sure — be useful at this point to come together on an agreed definition for an agreed term or set of terms — something equivalent to the Berlin/Bethesda/Budapest Open Access Declarations.
If this does not seem like a “tool for open science”, consider what the BBB definition has done for Open Access. It provides cohesion, a point of reference and a standard introduction for newcomers, and acts as a nucleation center for an effective movement with clear and agreed goals. Since this SL session takes off from SciFoo, and SciFoo is by all accounts very good at converting brainstorming sessions into practical outcomes, I thought perhaps the idea of a definition or declaration of Open Science might be a suitable topic. In what I hope is the spirit of SciFoo, here are some ideas that might be useful in such a discussion.
Terms
Whatever this thing is, what should we call it? There are a number of terms in use:
  • Open Science — has the weight of Creative Commons/Science Commons behind it, via iCommons
  • Open Source Science — Jamais Cascio, Chemists Without Borders
  • Open Source Biology — Molecular Biosciences Institute
  • I think “biology” too narrow — there seems little point in Open Chemistry, Open Microbiology, Open Foo all having different names. I think Open Source Foo too likely to lead to confusion with software initiatives, and too likely to lead to pointless arguments about what the “source code” is.
  • That leaves Open Science, which would be my choice for an umbrella term. A case can be made, though, for Open Research, on the same basis on which I argue against Open Biology etc — see this comment from Matthias Röder
  • Another “inclusive” possibility is to focus on information — Open Data, as per PMR’s wikipedia entry, or the broader Open Content. In the same vein, the Open Knowledge Foundation provides a fairly comprehensive definition of Open Knowledge.
  • I have seen “Science 2.0” around quite a bit lately, though it’s a bit too marketing-speak for my taste
  • Open Notebook Science is a very specific subset of Open Science: if your notebook is open to the world, there’s not much confusion about access barriers! It even comes with its own motto: “no insider information”. This is as Open as Open gets.

Sources and Models
We don’t have to re-invent the wheel:


Flexibility
We don’t want to start a cult, and we don’t want to bog anyone down in semantics. There’s no purity test or loyalty oath. My own view is that Open Science (or whatever we end up calling it) is not an ideology but an hypothesis: that openly shared, collaborative research models will prove more productive than the highly competitive “standard model” under which we now operate.
Openness in scientific research covers a range of practices, from tentative explorations with a single small side-project all the way to Open Notebook Science à la Jean-Claude, and we should welcome every step away from the current hypercompetitive model. Open Notebook Science provides a useful marker for the Open end of the spectrum; perhaps all a Declaration need do is identify the minimum requirements that mark the other end of the spectrum?

Conditions
What standards must a research project or programme meet in order to be considered Open?

  • obvious: Open Access publication
  • equally crucial: Open Data, that is, raw data as freely available (including machine access) as OA text
  • probably indispensable: Open Licensing so as to avoid confusion as to what is truly available and for what purposes; as per Peters Suber and Murray-Rust, this must be
    • explicit
    • conspicuous
    • machine-readable
  • Open Semantics: perhaps none of this will be much good without metadata and standards to allow interoperability and free flow of information
  • desirable: Free/Open Source Software
  • David Wiley: “four Rs” of Open Content (cf. Stallman’s four fundamental freedoms for software):
    • Reuse – Use the work verbatim, just exactly as you found it
    • Rework – Alter or transform the work so that it better meets your needs
    • Remix – Combine the (verbatim or altered) work with other works to better meet your needs
    • Redistribute – Share the verbatim work, the reworked work, or the remixed work with others
  • OKF definition of Open Knowledge

PMR: This is really useful. I can’t think of significant alterations. No-one is suggesting that science is altruistic – it can be hard and cruel as well as beautiful. And science doesn’t care who wins, but knows that the more who play by the rules the greater the progress and enlightenment.
Open availability of tools, methods, specimens, results, recipes, codes, data, etc. MUST enhance science. Not providing them simply impoverishes the field and provides personal gain at the expense of the rest. Scientists are people and they want to succeed personally.
I am very fortunate that the scientists I have known and who have acted as my mentors have been fantastic people. They have nurtured younger scientists, built a sense of community, fostered international science, cared about the human race. That is not a necessary part of science, but it is sufficiently common that it is worth striving for even if, occasionally, it leads to a non-optimal decision in the prisoner’s dilemma.
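
As an illustration of the “machine-readable” licensing condition in Bill’s list, here is a minimal sketch in Python of how a robot might check whether a page declares its licence. It assumes the page marks the licence with the `rel="license"` link convention (as Creative Commons recommends); the class name `LicenseLinkFinder`, the helper `find_licenses` and the sample page are hypothetical, and a real crawler would need to handle many other ways of stating a licence.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> or <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr.get("href"):
            self.licenses.append(attr["href"])


def find_licenses(html_text):
    """Return every licence URL declared in the given HTML text."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.licenses


# Hypothetical page, for illustration only
SAMPLE_PAGE = """
<html><head><title>Example dataset page</title></head>
<body>
<p>Spectra for one compound.</p>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</body></html>
"""

if __name__ == "__main__":
    print(find_licenses(SAMPLE_PAGE))
```

A page with no such link returns an empty list, which is exactly the case that leaves a robot (and its owner) guessing about what re-use is permitted.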

Posted in scifoo | 1 Comment

scifoo: Cameron Neylon on Open Notebook Science

More on Open Science from Jean-Claude Bradley. It’s sad to see how paper-driven we have become. It’s critical to publish, but I continually sense there is an increasing pressure of “I need a paper – what’s the most cost-effective way of getting one?” This is Jean-Claude on Cameron Neylon:

22:22 23/08/2007, Jean-Claude Bradley, Useful Chemistry
There has been a lot of discussion lately about the philosophy of Open Science in general terms.
This is certainly worthwhile but I think it is even more interesting to discuss the mechanics of its implementation. That is what I was trying to push a little more by setting up the “Tools of Open Science” session on SciFoo Lives On.
That’s why I’ve been very impressed by Cameron Neylon’s recent posts in his blog “Science in the Open”.
He has been discussing details of the brand of Open Science that interests me most: Open Notebook Science, where a researcher’s laboratory notebook is completely public.
Cameron has been looking at how our UsefulChem experiments could be mapped onto his system and this has sparked off some interesting discussion. I am becoming more convinced than ever that the differences between how scientific fields and individual researchers operate are much deeper than we usually assume.
By focussing almost entirely on the sausage (traditional articles), we tend to forget just how bloody it actually is to make it and we probably assume that everybody makes their sausage the same way.
The basic paradigm of generating a hypothesis then attempting to prove it false is certainly a cornerstone of the scientific process but it is certainly not the whole story. However, after reading a lot of papers and proposals, one gets the impression that science is done as an orderly repetition of that process.
What I have observed in my own career, after working and collaborating with several chemists, is that most of the experiments we do are done for the purpose of writing papers! The reasoning is that if it is not published in a journal, it never happened. This often leads to the syndrome of sunk costs, similar to a gambler throwing good money after bad, trying to win back his initial loss.
After a usually brief discovery phase, the logical scientist will try to conceive of the fewest number of experiments (preferably of lowest cost and difficulty) to obtain a paper. In this system, like in a courtroom, an unambiguous story and conclusion is the preferred outcome. Reality rarely cooperates that easily and that is why the selection of experiments to perform is truly an artform.
We’re currently going through that process. We have an interesting result observed for a few compounds and a working hypothesis. That’s not enough for a paper in my field. We cannot prove the hypothesis without doing an infinite number of experiments but we are expected to make a decent attempt at trying to falsify it. I know from experience roughly the number of experiments we need with clear cut outcomes to write a traditional paper.
So how much more value to the scientific community is that paper relative to the single experiment where this effect was first disclosed on our wiki then summarized on our blog?
Is this really the most efficient system for doing science or is this the tail wagging the dog?
When the scientific process becomes more automated, I predict that the single experiments will be of more value than standard articles created for human consumption and career validation.
[…]
One of the most useful outcomes of Open Notebook Science (and why I’m highlighting Cameron’s work) might be the insight it will bring to the science of how science actually gets done. (Researchers like Heather Piwowar should appreciate that)

This is where it starts – the passion, the innovation and publicity of people who want to change the current complacency. The exciting thing is that the Internet makes that possible. Within months.

Posted in open issues, scifoo | 2 Comments

scifoo: the mindless impact factory

More scifoo follow up from Richard Akerman. No comment from me needed. I’m leaving the Second Life picture because …
open science and the impact factory


Jean-Claude Bradley instigated a session in Second Life – SciFoo Lives On: Open Science.
[SF-SL-004]
Next week will be something like “Medicine 2.0”.
You can see in the transcript that one part of SciFoo that definitely lived on was a discussion around Open Science and webliometrics, both definitions and how to handle impact. It seems to me that we get tangled in endless debates about definitions. I have proposed that the nodalpoint Open Science wiki page be used to come to a consensus definition, but in the meantime:
open science
opening your scientific activities up to public examination, making work available without it having gone through formal peer review
peer review
The process of a group of scientific peers assessing the quality of a submitted piece of scientific work, currently most commonly associated with gatekeeping into a scientific publication, wherein it may also involve aspects of improving both the scientific thinking used in the paper and the expression thereof. There is no relationship between peer review and closed or open access.
open access
making a publication available without subscription fee, but possibly with usage limitations
free access
unfortunate term due to existing definition of open access, adding element of unrestricted usage and reuse (e.g. text mining)
impact factor
An imperfect measure of the scientific “importance” of an entire journal. Misused to measure the quality of individual scientific output

(Marked up using HTML definition lists, which you have probably never heard of, which incidentally is why the Semantic Web will fail.)
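
For readers who have not met them: a definition list wraps each term in `<dt>` and each definition in `<dd>`, all inside a `<dl>` element. Here is a minimal sketch in Python of how a glossary like the one above could be emitted as such markup. The function name `definition_list` and the shortened entries are illustrative assumptions, not the markup actually used on the original page.

```python
from html import escape


def definition_list(entries):
    """Render (term, definition) pairs as an HTML definition list."""
    lines = ["<dl>"]
    for term, definition in entries:
        # escape() protects characters such as <, > and & in the text
        lines.append(f"  <dt>{escape(term)}</dt>")
        lines.append(f"  <dd>{escape(definition)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


# Shortened, paraphrased entries for illustration only
GLOSSARY = [
    ("open access", "publication available without subscription fee"),
    ("impact factor", "an imperfect measure of a journal's importance"),
]

if __name__ == "__main__":
    print(definition_list(GLOSSARY))
```

The point of the markup is that a machine can tell which string is the term and which is the definition, which a pair of plain paragraphs does not convey.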
Yes, there are many types of peer review in different disciplines, and yes, things are often considered published and citable without having gone through peer review, such as conference papers and presentations which often go through a sort of editorial board selection instead.
I know these definitions are far from perfect, but good lord, can we get to good enough and go beyond this debate?
What I keep hearing is, how can we impact factorize open science. Well, the answer is, you can’t. Let’s stop trying to find some magic algorithm whereby a machine tells us what quality science is. What’s completely mad to me about this is that we already have processes to assess science quality. Every time you review a new student, every time you look at a grant proposal, heck, even on the infamous tenure committees and research assessments, a group of humans looks at a portfolio of existing or proposed work, and decides whether it is good enough.
So if I may modestly propose, let’s continue to do that, and no one other than journal publishers should ever look at impact factor numbers again. Arise, qualitative assessment, begone quantitative nonsense.
There is still a place for technology, but it’s not in providing some bogus seemingly-quantitative quality measure. It’s in enabling us all to present our scientific portfolios online, or to use Euan’s words, our “professional lifestreams”. And there is a real problem to be solved. It starts with students and their scholarly output stuck in closed university systems. Students move around. Scientists move around. Their work history should move with them, not be lost in some scholarly dark web, or frozen as some web page at their previous institution that they no longer can access.
The European e-Portfolio is one effort to address this for students.
Electronic Theses and Dissertations is another piece.
The next step is to have those integrate into some, shall we say, flow or… flux (sekrit inside Nature joke) of the rest of their scholarly activity when they graduate. Bookmarks created, databases curated, papers reviewed, etc. etc.
That’s the technology piece.
The other piece, however, cannot be solved with technology.
Find better ways for humans to review scholarly portfolios and make decisions based on them. That’s going to address this problem of evaluation far better than anything else.
SIDEBAR
And of course you can do some side bits with technology once you have all this info circulating around, like ranking relevance to help people find the best, most relevant work in the flood of science that is sloshing around. Usage factor, other metrics, these may all help in recommending things to read.
END SIDEBAR
References
Richard Monastersky, “The Number That’s Devouring Science”, Chronicle of Higher Education, Volume 52, Issue 8, Page A12 (2005)
The PLoS Medicine Editors, “The Impact Factor Game”, PLoS Med 3(6): e291 doi:10.1371/journal.pmed.0030291 (2006)
Peter A. Lawrence, “The Mismeasurement of Science”. Current Biology, 17 (15), r583. doi:10.1016/j.cub.2007.06.014 (2007)
Bruno Granier, “Impact of research assessment on scientific publication in Earth Sciences” (PDF), a presentation at ICSTI June 2007 Public Conference on Assessing the quality and impact of research: practices and initiatives in scholarly information
Richard Akerman, “Web tools for peer reviewers…and everyone” (PDF), a presentation at ICSTI June 2007 Public Conference on Assessing the quality and impact of research: practices and initiatives in scholarly information
Corie Lok, “Scifoo: day 1; open science” (2007)
Alex Palazzo, “Scifoo – Day 2 – Science Communication” (2007)
Alex Palazzo, “Scifoo – Day 3 (well that was yesterday, but I just didn’t have the time …)” (2007)

Posted in open issues, Uncategorized | 4 Comments

berlin5: Berlin 5 Open Access

I am delighted to have been asked to present on the topic of Open Data at “Berlin-5”.

The University of Padua, the CRUI (Council of Rectors of Italian Universities) and the Max Planck Gesellschaft are pleased to announce that the fifth conference in the “Berlin Declaration” tradition will take place on September 19-21, 2007 in Padua, Italy, with the title “Berlin 5 Open Access: From Practice to Impact: Consequences of Knowledge Dissemination”.
The aim of the conference will be to bring together the various initiatives and key players within the Open Access movement in order to:
— maintain the enthusiasm of all people involved in the Open Access field,
— have an overview of the developing tools that sustain Open Access in scientific data and cultural heritage dissemination,
— develop the effective strategies that can contribute to the construction and implementation of this new paradigm of the scholarly communication world.
Further details are available on the conference website.
Program
The general subjects of the conference will focus on:
a) state-of-the-art of the sharing of the Berlin Declaration vision: survey on the impact of the new paradigm in the institutions that signed the declaration; supporting bodies policies and activities in favour of innovative scholarly communication processes;
b) the Open Access scene in the developing countries and emerging economies: strategies, achievements, impact;
c) Open Access and the e-science: how to support the free circulation of scientific raw data to facilitate cooperation and effective reuse;
d) e-publishing: the emerging of new strategies in scientific data dissemination; estimate of the impact in OA journals: new tools for scholarly evaluation in the growing layer of Open Access publications; the perspective of a changing landscape in the scientific journals policies; progress reports on the transition from reader-pays to author-pays models;
e) ICT developments and collaborations that support e-publishing and Open Access.

It’s very useful to tag pre-conference posts so that attenders can get an idea of the issues. This works very well with ICT-conferences – zillions of posts on www2007, scifoo, etc. So I’m tagging this with berlin5 and suggest that anyone interested do the same. I will probably manage a few posts before the meeting and hope to report some back on this blog.

Posted in berlin5, open issues | Leave a comment

Special issue of CTWatch on the coming revolution in scholarly communication

I have been busy with grants and hacking so have been away from the blog. (Making good progress on new ways of inputting and displaying chemistry). Here is a very important set of papers which are all highly relevant to this blog. I had hoped to find time to comment on each individually, but for now here is the table of contents.

18:45 18/08/2007, Peter Suber, Open Access News

The August issue of Cyberinfrastructure Technology Watch (CTWatch) is devoted to The Coming Revolution in Scholarly Communication and Cyberinfrastructure. All the articles are OA-related:

I will return to these as and when.
Also my mind is starting to turn to the Berlin-5 meeting … next post

Posted in open issues | 1 Comment

scifoo: One chemical per one laptop?

On the Open Knowledge Foundation blog I noticed a call for projects related to One Laptop Per Child (which we saw at scifoo). I’m wondering what we could do in chemistry – there is so much around and so much that would be fun to do…


Tomorrow is the first day of the Northern Summer of Content 2007. The Summer of Content is an initiative of WikiEducator and the One Laptop Per Child project. Inspired by Google’s Summer of Code, the programme aims to match creators with mentors and stipends to “develop open content and run free culture events throughout the world”. The Northern pilot will run until the end of September and a Southern version will run from December 2007 to February 2008.
The organisers place an emphasis on community in content production, and aim to create what they call “a self-supporting networked ecosystem of projects”. They aim to educate participants about open licensing, meta-data and accessibility, as well as providing support for technical aspects of creating content. A list of proposed projects can be found here.

PMR: Here are the current ideas:

Articles in category “Summer of Content proposals”

There are 33 articles in this category.

PMR: I haven’t read these but what could we do in chemistry to create content? Wikipedia has fantastically good chemistry (even though most “academic” chemists aren’t interested and don’t contribute). It would be easy to do it on a one-compound-per-laptop basis – each volunteer gets one compound to find out as much as they can. Or, perhaps, one product. Many products have a list of ingredients – I have a mineral water bottle that has Calcium, Magnesium, Potassium, Sodium, Bicarbonate, chloride, fluoride, nitrate, sulphate (sic). Since most kids can’t do real experiments in the classroom any more, here’s a list of real chemicals doing real things. And there are lots of Wikis and blogs in the chemical blogosphere that might be interested.
But I am sure that others have more exciting ideas than this.
Posted in chemistry, open issues | Leave a comment

When does open science work?

It’s funny how things turn out in the blogosphere. I’d posted about how ludicrous copyright on dead scientists’ work (Copyright madness – story 2) was and expected some comment from the librarian community. Silence (there’s still time to comment!). I got a brief exclamation of horror from BlackKnight and, to check that this wasn’t spam, visited his blog, where I saw the Green Fluorescent Wow! My comments about this example of immediate Open Notebook Science have turned into a thread on when and whether to publish results on blogs. Here’s Black Knight:

Whee. I checked Technorati this evening, as you do (seeing as the bastard spammers have destroyed the usefulness of trackbacks), and discovered that yesterday’s post was spreading ripples in the blogospheric pond. It came first to the attention of Peter Murray-Rust, who has a thing (a good thing! — I hasten to add) about open access and open science in general, and thence to the open science community itself, in the shape of Cameron Neylon.
Funnily enough, Neil Saunders then picked it up from the OpenWetWare people, and I do some digging and find that not only did Neil do a DPhil but that he is now in Bostjan Kobe‘s lab. Bostjan is a long-time collaborator of my previous boss and from meeting him at Lorne he seems to have quite forgiven me for not going to work for him when I had the chance (or forgotten about it).
Brisbane, bleh.
So, anyway, it turns out that I had previously made contact with OpenWetWare, and talked about them over a year ago. Which just goes to show that (a) it’s a small world and (b) incest is more fun than you’d think.
But all that is not really what I wanted to write about now. The OpenWetWare (have you any idea how difficult it is to type that?) project is a laudable effort to promote collaboration within the life sciences. And this is cool, but then I realize that the devil is in the details.
Share my methods? Yeah! Put in some technical detail? Yea–hang on.
For sure, the ‘Green Fluorescent Wow!’ experiment (HT to Peter) was pretty simple and straightforward: An easy cloning experiment with a slight cleverness in choice of reagents, no IP and nothing particularly smart. But I’ve got other experiments underway that are clever, and potentially very exciting.
So can I write on my weblog about them? And how much detail can I give? If I say “My protein seems to do something odd to cell-motility”, is that an elegant sufficiency of detail? Surely people will get bored with generalizations, but am I right to worry, as one of our PIs does, that I might compromise my project by posting too much detail? Should I really be posting pictures of cells that are doing odd things?
It’s not a case of “Can I trust you bastards not to steal my work?” but balancing the ideal of ‘open source science’ with the need to publish before anyone else. I have responsibilities — to the boss and to my cow-orkers —, but I also want to share the fun and joy and heartache of this vocation.
So it’s all a little bit confusing, really. I want to bounce when experiments work, and scream and shout when I have a ‘little technical difficulty’; but how much can I say without compromising stuff? Seriously, I have lots I want to write about, but am not sure whether I should.

Comments (Cameron Neylon)

I think this is the real key to the whole thing. Will it compromise your work/career/future happiness? If everyone shared and was honest then it should work. What we need is some game theory/evolutionary biology person to tell us how many people it takes before we can support freeloaders.
But I agree it takes a shift in the way science works for people, especially postdocs, to be ready to risk making their knowledge base available. It would be absolutely key for people to get credit, and citations in some form, for making protocols/data available.
  • I don’t think anyone is suggesting that all science everywhere be automatically Open (e.g. there is – as yet – no Richard Stallman of Open Notebook Science). At an obvious level we have industrially sponsored projects in our group and we are required (and quite happy to be required) to check all new discoveries with the sponsors.
  • It’s very domain-dependent. In some areas such as maths and physics the idea can be the whole thing. A single result, such as the Watson-Crick DNA model or the Franklin data, can communicate the whole message in a few seconds. But most science is made up of tedious work, much of which doesn’t.
  • It’s often thought that the “idea” is the most important thing. And sometimes it is. But most ideas don’t work out. Who has not had the grey-haired community (I’m one) telling them that it won’t work? And they are sometimes right. So in many cases the credit goes to someone who has stuck doggedly at making a half-baked idea work.
  • There are, however, many cases where “your idea” has already been tried elsewhere and failed, and the reasons for its failure are documented and possibly understood. An awful lot of this happens at bars in conferences. “We’re trying to see how protein X might interact with Y” (not giving too much away). “Have you made Y?”. “No, we’re trying A’s method.” “Oh, we tried that for 2 years and we couldn’t get it to work; and A also published Z which didn’t work either. So we distrust stuff from that lab”. If this is not disinformation then it’s very valuable and could save wasted years of work. Remember that the unproductive work has to be put on the balance sheet as well.
  • So – as Cameron says – it’s a game. You weigh the value of releasing your idea against the value of not releasing it. Either could be positive or negative.
But there are other pluses to publishing work on the Web:
  • You make a reputation. I’m not hiring green fluorescent rats but if I were I would recognise the applicant.
  • Your contacts may be genuine collaborators, not competitors. So, for example, if I were interested in a multidisciplinary programme on malaria including medicinal chemistry I would certainly wish to make contact with Jean-Claude Bradley. Indeed if I were interested in an open programme of screening compounds (as is the NIH) I would also make contact.

So “I’ve got something novel – I’m looking for collaborators” will become more common in the electronic era. Some of this will be public – others (as we heard at scifoo) will be mediated through brokers in private. Perhaps we’ll see embargo periods – publish the science into a closed arena for a few months with a requirement that it then becomes public. Ideas and novel science are too valuable to be allowed to decay.


Green Fluorescent Wow!

A blogger (Black Knight) left a comment on my blog yesterday and in an idle moment I went to see what sort of things they were interested in and found:
Velvet Green

This has got to be in the running for the coolest cloning experiment ever. Last Tuesday a grad student in the reciprocal space cadet lab, let’s call him Fu Manchu, asked me if I had any GFP. ‘GFP’ expands to ‘green fluorescent protein’, which is a protein that is green, and, um, fluorescent. It’s naturally found in a certain jellyfish and glows green when you shine a certain wavelength of blue light at it. This is really useful for studying where proteins go inside cells or whole animals, because you can join the DNA that codes for GFP to the bit of DNA that codes for the protein you’re interested in, and put that into your experimental model. Much like I did here, there and elsewhere, in fact.
But Fu Manchu didn’t want to localize a protein in a cell (because as I hinted above, he’s a scatter-brain and wouldn’t recognize a whole cell if it mitosed); he wanted the protein itself to use as a crowding agent in some unspeakable experiment. I had to tell him that no, it wasn’t the sort of thing I kept in my fridge, but I did have the DNA that codes for GFP in a plasmid vector and he was quite welcome to take it out and make the protein in a system of his choice. It would take a few days to design and make the primers and do the PCR but it would be simple enough.
I went to have a cup of tea and realized that actually, GFP in a protein expression vector, that is, in plasmid that we use to make vast quantities of purified protein rather than in a different sort of plasmid that we use to make relatively small amounts of protein in situ, might be a useful reagent and I was a damned fool for leaving a very similar reagent in Cambridge and not getting it shipped over with my other useful bits and bobs (and I couldn’t get it shipped in the time frame that Fu Manchu wanted it). So I trooped back to the computer and had a look at restriction sites, and realized that if I was only semi-clever I could cut the GFP gene out of the one plasmid and into the other sort.
So on Wednesday I set up a couple of enzyme digests, purified the appropriate bits of linearized DNA and set up the reaction to stick that gene for GFP into this expression vector, where it would, if I did nothing else, make GFP and GFP alone (and left a couple of restriction sites so that I could, at a future date, drop in the gene for some other protein after the gene for GFP and make green some-other-protein).
To retrieve the construct made in this way I had to transform some bacteria — in other words, persuade them to take up this new construct and propagate it. The bacteria can be persuaded to do this because the vector you make has a gene for resistance to some antibiotic on it, and you select only the bugs that contain the plasmid you want by growing the bugs on plates that contain that antibiotic. Thursday morning, then, I hoped to see colonies on my plates; each colony having grown from a single antibiotic-resistant bacterium.
That’s the theory: in practice you always get ‘background’; bacteria that grow because they have taken up some plasmid that doesn’t have the gene you really want, or one of a myriad other excuses. And you then have to make DNA from several of these colonies and cut them up in special ways and all sorts of tedious stuff. Knowing this, I chose the bacteria that I would transform to be the sort that cannot help but make protein, even when you don’t want them to. And I reasoned that the bacteria that had the right plasmid, that is with the GFP in it, would be green.
On Thursday morning then, I took my plates upstairs — which indeed had colonies — and asked to borrow Tiffany Case’s fluorescent microscope. I told her what the plan was, and we looked at the plates together, and this is what the butler saw:
All the little colonies
That’s two photographs of the plate. On the left, normal light, and you can see all the colonies. On the right, we’ve illuminated with blue light and the only colonies you can see are the ones that are making GFP, and are therefore glowing green. Ignore the black pen marks — they’re just from when I counted colonies. Here’s a closer look:
Soylent green is PEOPLE!.
See those two really bright suckers? They’re making GFP from the DNA I gave them. That third colony isn’t, and therefore glows not. So on Thursday afternoon, forty-eight hours after conceiving the experiment, I was able to hand Fu Manchu a fresh plate containing bugs that make the protein he was after.
Basically, at this moment you’re either going “Dude, that is so totally awesome!”
or
“Huh? What?”
There is no middle ground.
PMR: I am in the “so totally awesome” camp. As a crystallographer I know about GFP but I have no idea whether what LabRat/BlackKnight has done is routine or novel. But what comes over is the Wow! factor. That is part of the essence of science. An experiment with a message so clear that it hits you in the face. Science is a hard mistress and many of us work years with sludges, non-crystals, and other junk. But there is the sheer beauty of seeing something – perhaps under the microscope – that you – and perhaps everyone else – has never seen before. And this is Open Science. Black Knight has told the world precisely what he has done – as he did it. No apparent thought of “shall I rush off and patent it?”. Not for everyone, perhaps, but great to see it happen.