Peer Review – I write unacceptable code

I have so much stored up that I need to write about that I shall be triggered by random occurrences.

On Thursday Joe told me (and colleagues) that some of the code I had written in our Microsoft Chem4Word project was rubbish.

He was right. It was. I told him so and thanked him and the rest of the project.

How could he do this? I’ve been writing code for decades. The answer is simple – everyone writes unacceptable code. Not always, and not always at the end of the process. The reasons are many.

We are a smallish group, working under time pressure (what’s new about that) and with a new technology. Our collaborators are half a world away in Seattle. Nine months ago we knew nothing about C#, WPF, .NET, XAML, and had only just hacked our way through bits of DOCX. They knew nothing about chemistry, chemical editing idioms, validity, etc. We are both going through a rapid learning process. It’s inevitable that our code is unacceptable. Among several other techniques we need Code Review. By chance I came across:

The Case for Peer Review
published by SmartBear software. Just an intro, but it makes the case that Code Review saves time and money. Lots of it.

So what is it? WP defines it as:

Code review is systematic examination (often as peer review) of computer source code intended to find and fix mistakes overlooked in the initial development phase, improving both the overall quality of software and the developers’ skills.

and gives examples as:

Lightweight code review typically requires less overhead than formal code inspections, though it can be equally effective when done properly. Lightweight reviews are often conducted as part of the normal development process:

  • Over-the-shoulder – One developer looks over the author’s shoulder as the latter walks through the code.
  • Email pass-around – Source code management system emails code to reviewers automatically after checkin is made.
  • Pair Programming – Two authors develop code together at the same workstation, such as is common in Extreme Programming.
  • Tool-assisted code review – Authors and reviewers use specialized tools designed for peer code review.
  • Some of these may also be labeled a “Walkthrough” (informal) or “Critique” (fast and informal).

It helps that we work in a scientific environment. Science tests its assertions against the harsh reality of the world and the equally rigorous process of peer review. We expect to be told that our ideas are flawed and must be reworked. No scientist would stand in front of their colleagues and assert they are right because they are an expert – anyone (even Nobel Laureates) can be humbled by the peer-review process. Many – perhaps most – papers are seriously revised as part of the review process (and whatever you think of Open or Closed access no-one should eliminate peer review). Note, however, that Openness increases the number of potential informal reviews.

So what do we do when coding? Every day, yes, every day, we have a 10-60 minute telcon, including Live Meeting and MS’s code development environment (Visual Studio, Team Foundation Server, etc.). We show bleeding-edge code. We comment on it. Comments are always seen as constructive. Authors do not try to defend what they have written – rather they ask “what are we going to do about it?”. It’s often useful to determine why the code is unacceptable, but not to use that to defend it.

Why was the code I wrote unacceptable? There’d been more than one author, and my code had been meant to clean up some of the problems (which it had done). But in doing so I had to check the validity of some of the input. We didn’t have a communal approach for reporting invalidity. I was in a rush so I chose to throw an unchecked exception – at least the code would tell people something was wrong. But it’s very very ucky.
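
To make the idea of a communal approach to reporting invalidity a little more concrete, here is a minimal sketch. It is written in Python for brevity rather than in the project's C#, and it is not what Chem4Word does: the names `ValidationReport` and `check_atom_counts`, and the rule being checked, are all hypothetical. The point is only that a validity check can hand back a shared report object and leave the caller to decide how to react, instead of each author throwing whatever exception comes to hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidationReport:
    """Hypothetical shared container for validity problems."""
    problems: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        # Record a problem instead of raising immediately.
        self.problems.append(message)

    @property
    def is_valid(self) -> bool:
        # Valid means that no problems were recorded.
        return not self.problems


def check_atom_counts(counts: Dict[str, int]) -> ValidationReport:
    """Hypothetical check: every element count must be a positive integer."""
    report = ValidationReport()
    for element, count in counts.items():
        if not isinstance(count, int) or count < 1:
            report.add(f"{element}: count must be a positive integer, got {count!r}")
    return report


good = check_atom_counts({"C": 2, "H": 6, "O": 1})
bad = check_atom_counts({"C": 2, "H": 0})
print(good.is_valid)
print(bad.problems)
```

Every check then reports in the same way, and whether to log, warn the user or abort is decided in one agreed place rather than separately by each author.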

This code will be going out to the world. So it has to be perfect. No, it has to be as good as we can make it with the given resources. But we are communally clear that spending some of the resources on code peer review is one of the most efficient things we can do.

Posted in Uncategorized | 4 Comments

Wellcome gets tough on Open Access depositions

When one is active in an area (in this case Open Access) it’s often difficult to see how important it is from outside. So I was delighted to get an internal email to all staff making it clear that it was MANDATORY for Wellcome grantees to publish their papers as Open Access. Here are excerpts from the mail:

As you may be aware, the Wellcome Trust’s award terms and conditions require that all research papers arising from Wellcome Trust funded research must be made available on the PubMed Central website (http://ukpmc.ac.uk/) within six months of publication.

The Wellcome Trust have been monitoring compliance rates, and have been disappointed to find that these are currently very low.  As a result of this, they intend to more actively monitor compliance, and in future will be contacting researchers who have not had articles published as Open Access papers.

The University of Cambridge has been given a grant to cover costs associated with Open Access publishing.  If your journal charges for making your article available on PubMed Central, please refer to this website: http://www.bio.cam.ac.uk/sbs/funds/wt_claims.html for how to claim these costs back from my office.

Further information on the Wellcome Trust’s Open Access policy can be found here: http://www.bio.cam.ac.uk/sbs/funds/wtinfo.html, or at the Wellcome Trust’s website here: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Open-access/Guides-and-FAQ/WTD018855.htm.

and the claims site announces:


Claiming Open Access Charges
This page describes how to claim back costs charged by publishers for placing papers on the UK PubMed Central website. Initially, you will have to pay the publisher’s Open Access charges. You can then claim these costs back as follows:

  1. Fill out a form (Open Access request form) with the requested information.
  2. Please return the form and an internal invoice …
  3. Once we have this, the monies you have paid for Open Access charges will be re-imbursed to your account.

I have the privilege of being on the UKPMC advisory board and we’ve been thinking about how to make the policies and practices more widely known. UKPMC is doing roadshows (the first in Oxford last month) and I am sure they would welcome enquiries from institutions or individuals wanting more info.
We have to realise that Open Access will take hard work. It’s not just building deposition systems and expecting them to get filled. It needs a commitment from the grant holder. It’s simple:

  • If you receive a grant you have to publish the results as Open Access.

If you don’t want to, no-one is forcing you to apply for grants.
(Well, yes they probably are, so you had better get used to the practice of publishing Open Access)

Posted in open notebook science, semanticWeb, Uncategorized, xtech2007 | 2 Comments

Restarting posts

I am reinvigorated to start posting again, and there will be another post shortly.
I stopped posting last year for a number of reasons, some of which may seem strange and some of which may make sense to other bloggers.
Firstly I hit a number of small technical barriers. These included changing machine, losing my password and the volume of spam. Nothing insuperable in themselves, but in the internet era activity is limited by a large number of small barriers. Yes, I could use FriendFeed, FaceBook, Twitter, Delicious, etc. but they each charge a quantum of energy as an entry fee, and adding them all up just means it doesn’t happen. My mail is in a mess, etc.
Then I found the blog was driving me – feeling I had to post once a day, twice a day, and so on. It takes time – a post can take 30-60 minutes and I increasingly didn’t have that.
Then I went to the Nature Blogfest in August. I felt a bit of a fraud. Worse, there was a session on roughly “why one gives up blogging” or “should you feel guilty for not blogging”. I’d not blogged for 2-3 weeks and being a speaker/panellist I felt I’d let people down. The session gave me the perspective that there was actually no guilt in not blogging. So I stopped.
I’d not meant to stop for very long, but found there was a barrier to getting restarted. And I had a great deal going on here that I had to concentrate on. So, when I woke in the morning, I would do other things than blogging. It’s as simple as that.
However, people IRL apparently like some of the blog and I’ve had a few hints to resume. So here we go.
There’s an awful lot to talk about:

  • Open Data and Open Knowledge.
  • Our Chem4Word project with Microsoft
  • Semantic chemistry – ontologies and Chemical Markup Language
  • Software

I’ve been on trips and missed adding this perspective, so I’ll revisit some of the highlights.
(And this mail will test whether the system works)

Posted in Uncategorized | 3 Comments

Recent developments

I have been offline for some time because a lot has been going on. As I mentioned earlier we have had 5 students over this summer and they have been phenomenal. Not that previous students – such as Joe Townsend – haven’t also been. But in the last 2 months we have achieved a huge amount of software and systems development and I’ll tell you about this later.
However it has kept me 100% occupied on software and I haven’t had a chance to think about blogging or anything much else.
There are several things that I shall write about (see http://wwmm.ch.cam.ac.uk)

  • The Chem4Word project where Microsoft and we are developing a chemical authoring system within the Office (and related) XML environments.
  • The Crystal editor that we have been working on with sponsorship from the Int. Union of Crystallography and which Nick Day announced at the Osaka IUCr meeting
  • A departmental data repository for crystallography
  • An ontology for chemical reactions (sponsorship from Royal Soc. Chemistry)

And there are upcoming meetings:

  • This week I am talking at the Ticer meeting on digital libraries in Tilburg
  • On Friday there is the Nature blogging meeting in London

So I have a full set of topics to cover. As I am in the process of changing machines and getting software ready for presentation I’ll leave the details for a day or two.

Posted in Uncategorized | 3 Comments

Update

I have been off the air for some time because of travel and also technical problems with the blog which Jim Downing has solved (thanks).
I hope to blog soon about data, repositories and escience among various topics.

Posted in Uncategorized | Leave a comment

ESOF2008 Alma Swan's session

Alma Swan has organised the session at Barcelona ESOF (http://www.esof2008.org/fileadmin/media/programme/scientific_programme_preliminary_abstracts.pdf) [Saturday, 18th July, 1630].

Sharing scientific data: who benefits?
Alma Swan, Key Perspectives Ltd, United Kingdom

Abstract: Digital datasets—text-based, numeric, audio, video or image-based—form the output of all scientific disciplines. How are these data being made available for sharing? What quality control mechanisms are in place? What kinds of naming conventions, tags, and metadata are in use and how effective are they at helping to manage open data? Who is storing, archiving and curating open data and at which levels? And how is the production and sharing of open data assessed: what processes are in place for crediting scientists for making their raw data openly accessible for sharing and re-use. How much can and should data publication replace traditional forms of publication of research findings?
Posted in Uncategorized | Leave a comment

ESOF

We have been so busy with our summer program – semantic authoring and capturing of chemistry – that I haven’t had a breathing space. I’ll be blogging more about that. However a change of scene – tomorrow I’m in Barcelona at ESOF: The Euroscience Open Forum. I’ll post more later. It’s very important that Europe is a world leader in this arena.
==========================================================
ESOF: The Euroscience Open Forum
About ESOF
For too long, Europe was lacking an independent arena for open dialogue on the role of all the sciences, including the humanities, in society. We have it now with the Euroscience Open Forum. The initiative was taken in 1999 by the researchers themselves: the Euroscience Open Forum was brought to life by Euroscience.
Euroscience recognised the need for an interdisciplinary, pan-European meeting place for open dialogue and the exchange of ideas.
Visit the ESOF2008 web site
The ESOF concept
Science and technology are becoming increasingly important as they concern and affect everybody. The Euroscience Open Forum is not an ordinary scientific conference, but a totally new concept. It consists of a Forum for discussion of topical issues, an embedded conference (with an exhibition) to showcase European achievements right across the scientific and technological spectrum, and an outreach programme.
The outreach programme consists of a large number of events and happenings throughout the ESOF host city, which are targeted to the public at large of all ages. At ESOF2004 in Stockholm, the outreach programme “Science in the City” attracted 11000 visitors. At ESOF2006, the outreach programme was linked to the “Wissenschaftssommer”, attracting some 60000 visitors.
ESOF also serves as a young scientists’ forum, encouraging students, PhD students and post-docs to share their experience and participate in debates about such subjects as the European Charter for Researchers, how to motivate young people to engage in scientific careers, and how the construction of the European Research Area enhances the prospects of young scientists.
ESOF’s aims are:
* Presenting scientific and technological developments at the cutting edge in all their variety from natural sciences to the social sciences and the humanities
* Stimulating the European public’s awareness of and interest in science and technology
* Fostering a European dialogue on science and technology, society and policy by offering a platform for cross-disciplinary interaction and communication on current trends and future roads for science and technology, their interaction with society and policy and the role of the public
ESOF’s European itinerary
The Euroscience Open Forum is held every other year, visiting the major scientific cities of Europe and bringing European science to the attention of all citizens.
The starting point of ESOF’s European journey was Stockholm, Sweden, in 2004. Two years later, ESOF’s itinerary brought the event to Munich, Germany. And, after ESOF2006, the route will continue southwards: ESOF2008 will be held in the capital of Catalonia, Barcelona, Spain. ESOF’s exciting host cities reflect Europe’s cultural diversity. Thus, you will experience that the spirit of every Euroscience Open Forum is different…
ESOF’s success depends on you, too!
You can contribute to this open dialogue on all the sciences and on their role in shaping a knowledge-based society.
ESOF invites individuals and organisations to submit their best ideas in the form of proposals for the programme. The best of these proposals will be selected for the Forum by a Programme Committee of international standing.
For information about ESOF2008, please visit http://www.esof2008.org/
You can also propose the next destination for ESOF’s travel plans. For further information, please contact us:

Posted in Uncategorized | Leave a comment

John Sulston calls for reform of IPR policy

Whether you support Open Access and Open Data or believe that Closed Access and patents are the best way of promoting high quality science, there is no doubt that restrictions on access to IPR are a major drain on scientific effort. We all spend a significant amount of time having to investigate contracts, and finding out whether or not we can actually do something. Now John Sulston has spoken out:

John Sulston, recipient of the 2002 Nobel Prize for medicine, has launched a new research institute, the Institute for Science, Ethics and Innovation at the University of Manchester. Sulston is using the launch to highlight his views on openness in science and the need to reform innovation and intellectual property policy. (Thanks to Subbiah Arunachalam.)

See the op-ed co-authored by Sulston and Joseph Stiglitz in the July 5 edition of The Times:

… The question of “Who owns science?” is therefore a crucial one, the answer to which will have broad-reaching implications for scientific progress and for the way in which the benefits of science are distributed, fairly or otherwise. Two of the most pressing issues concern equity of access to scientific knowledge and the useful products that arise from that knowledge. …

The second issue we wish to highlight is that of access to science itself. The ideal shared by almost all scientists is that science should be open and transparent, not just in its practices and procedures, but so that the results and the knowledge generated through research should be freely accessible to all. There is a broad consensus in the scientific community that such openness and transparency promotes the advancement of science and enhances the likelihood that the benefits of science are enjoyed by all. For more than a hundred years, these principles have been the bedrock of academia and the scientific community.

We call upon all interested in the future of science to join with us in an active and open-ended search for answers.

See also coverage in The Times and the BBC.

PMR: I hope that this message finds its way to the policy makers in academia as they have the power and the responsibility to act. In many cases the academic staff are unable to find the information they want or to allow it to reach those that they would hope to collaborate with. Not only are there patent and copyright restrictions, but universities often sign draconian contracts with the gatekeepers of scientific information. For example software companies can revoke licences or even sue the universities if we publicize bugs in the program. Publishers require libraries to sign contracts that forbid the use of the information in ways that individual staff don’t even know about. It’s only hearsay but I understand that these can include “excessive downloads” or data-mining.

In no way can any of this be seen as anything other than holding science back.
Posted in Uncategorized | 2 Comments

In praise of Undergraduates

One of the highlights of my year is our summer program of undergraduate projects in the Centre. We’ve done this for six years and each student spends 8-10 weeks working on projects in Molecular Informatics.

I have been astonished and delighted by what the students have been able to achieve and the lasting legacy they have left and are continuing to leave. I’m leaving out names and will speak in general terms. The students are usually sponsored by an external organisation and we have built up good relations with quite a number – such as publishers and pharma companies. Some students are also supported by the Department, and some by Unilever. We advertise by word of mouth and by the subject email lists. In general the number of positions has roughly matched the number of applicants – this year we have four projects which are all filled and I hope to talk more about them in this blog.

Oscar – our chemical text- and data-mining/processing facility sprang from summer projects (support from Royal Society of Chemistry and Nature Publishing Group). I am consistently delighted with the standard of the Oscar summer software – the Experimental Data Checker has run for nearly 5 years without needing any software support. CrystalEye sprang from a summer project sponsored by the International Union of Crystallography.

You might think that 2 months is too little time to do anything useful, and most of the time you would be wrong. It’s not uncommon to start getting useful material in the first week. This is in some part because we work as a large team. Some of us Centre members hot-desk into the “training area” and we work communally – fixing each other’s problems and discussing strategy.

Most of the students get to present to the sponsors and this has been very useful. One presented over a video link to the US office of the sponsor.

And there is a longer-term benefit – 5 of the students are now doing – or have just finished – PhDs with us. That has been an enormous benefit to the knowledge, expertise and culture of the Centre.

In more general terms, when anyone asks me how they are going to adjust to the rapid changes in modern thinking I advise them to include undergraduates in their team. If you are in the Library sector you have to understand how students think and act and the only way to do this is to work alongside them. You’ll find that long-held views about metadata, bibliographies, customised databases, and the linear reading of articles no longer hold. The e-generation works differently. And it’s often us who have to be educated.

I’m not involved in formal undergraduate education here (I have done some demonstrating) but if I were I would turn the system on its head and involve the students in preparing and delivering course material. They are pretty good at finding it, after all.
Posted in Uncategorized | 2 Comments

Open Access Data Repositories

Peter Suber has been working with colleagues to create a Wiki of Open Access Data repositories. From his blog

List of data repositories The Open Access Directory (OAD) list of Data repositories is now open for community editing.

OAD is a wiki, and you can help the cause by adding or revising entries to its lists.

Data repositories are becoming very important now and it’s clear that they are primarily useful if they are Open. Some subjects such as bioscience have had a long history of Open data repositories – and if the Wiki listed every one it would dominate the field.

Of course there are lots of nuances to discuss. What is Data? And what is Open? I’ve spent time on this blog discussing these. At present I’ll just reiterate that we should label data as “Open Data” (from the Open Knowledge Foundation). And we should protect freedom with Community Norms, not licences or contracts.

Every creator of an Open data resource should label it as such. All you need is:


This material is Open Knowledge
Posted in Uncategorized | 1 Comment