I submit a Nature article to Nature Precedings

I have been invited by the editors of Nature to submit a review/commentary article on the theme of “Open Chemistry”. It is currently under the title “Horizons”, though the actual format may change before publication. I wrote the article two weeks ago and it has entered the editorial process. I’m assuming that means it will appear in due course, currently expected ca. January 2008.
I have taken the opportunity to test 4-5 aspects of the publication process:

  • pre-printing it in Nature Precedings. Anyone is allowed to do this. Submissions are vetted before being released into view (I suspect primarily to make sure they are in scope for scientific discourse, not that they reach a given standard of excitement). I have done this (it’s not necessary for the actual Nature submission process) and it will doubtless appear at some stage. When it does, I will note it on this blog, if you haven’t found it already.
  • asking that the final manuscript be available under Creative Commons. I have suggested CC-BY (this term was unknown to the Nature permissions office, although Nature Precedings itself is licensed under CC). They are going to get back to me about CC-BY, and I have also suggested the SPARC author addendum. Let’s see.
  • using images without restrictive copyright. I have therefore chosen 2 from Wikipedia (which uses the GNU Free Documentation License), one from CrystalEye, whose data is Open Data and where I am the author, one from Jean-Claude Bradley’s SecondLife snapshots, and one screenshot of our OSCAR3. There is no need to seek permission for any of these. However, the Nature copyright office still feels it has to write to Wikipedia for permission. What? They are FREE, OPEN as in free beer. No permission required. How do I explain what the words of the GFDL mean?
  • reducing the citation count to zero. I provide one link to the blogosphere which then links to the rest of the blogosphere and I provide 2 other links to other information. For the rest I use copious links to Wikipedia, which should increasingly replace ritualised citation of methods, algorithms, fundamental work, etc. Of course this isn’t applicable to most current scientific publications, but it’s worth considering whether the reader is disadvantaged. I doubt it.
  • posting into the institutional repository. I know that Southampton have got this down to “one click”, but my previous experience with DSpace suggests I shall need a few more. I’ll time it.

I’ll also try to see how many clicks and links it gets, from wherever. Nature have a voting system, but I don’t know whether they release download information for Precedings.
It will be quite fun to see how much the manuscript changes during the process. Even if I can’t re-use the final version perhaps I can mount a delta.
NOTE: The manuscript is now public: http://precedings.nature.com/documents/1200/version/1

Posted in chemistry, open issues | 2 Comments

Comments on comments and agents and eyeballs

One of the difficult features of blogs is how to manage comments. On this blog these are relatively infrequent, while on, say, ChemBark, TotallySynthetic.com or The Chem Blog some articles generate over 100 replies. I got into the habit of responding to most comments, but have sometimes dropped out of the habit, especially when busy.
In the highly commented blogs, I suspect, some of the regulars go back frequently to check how the discussion is going, but here, I think, most people visit once. Because of that, comments do not always get prominence, so I have also got into the habit of extracting comments and commenting on them.
(BTW in some cases I would much prefer a forum rather than a blog, as it allows you to review posts on a given topic more easily. Slashdot uses this type of approach, with added moderation. I am surprised that it isn’t easier to create a structured history from a blog. Yes, you don’t want it for pictures of kitty (or perhaps you do), but for blogs with advocacy it’s really valuable. It saves cutting and pasting links and other horrors.)
When I leave a comment unanswered it’s probably simply that I haven’t time. Sometimes also I don’t want to get distracted into a discussion that, though interesting, isn’t mainstream. So here are two recent ones…

  1. Steven Bachrach Says:
    October 1st, 2007 at 2:10 pm
    Peter, not to splash cold water on this, but one should keep in mind that the hexacyclinol story blew up because of Rychnovsky’s paper in a closed access journal (Org. Lett.). Rychnovsky chose not to post this in a blog or even in an OA journal. The blogosphere only then picked up on it, and did greatly promote this subject to a wider audience. I think we are still a long way from web 2.0 being the publication access of choice for chemists for original research. For the time being, web 2.0 seems to be more suited as a more personal, chatty style of communication – sort of a hipper, more broadly authored, and more current version of Chemical and Engineering News (or Chemistry in Britain).
    Steve
  2. ChemSpiderMan Says:
    October 1st, 2007 at 3:55 pm
    Peter, I’ve given many examples of the issue of Data Quality on the blog. Some links are:
    http://www.chemspider.com/blog/?p=64
    http://www.chemspider.com/blog/?p=164
    http://www.chemspider.com/blog/?p=168
    http://www.chemspider.com/blog/?p=137
    Also, today at the PubChem “Advisory Group” meeting I will be presenting on this issue to the attendees. I will put the presentation online later.
    You should be aware that issues of Quality are showing up already and proliferate problems…for example, while the story about structure drawing quickly is “interesting” the problem is that the structures shown there have errors (at http://liquidcarbon.livejournal.com/13138.html)
    It’s good to know that you appreciate our efforts around Data Quality. I’ve never seen a response to my data quality comments on your blog and I assumed that you would have been very interested in the work I’ve done on Wikipedia and taxol validation this past week. FYI, the link back to PubChem and the systematic name have now been edited on the DrugBox and the Wikipedia record is now correct.

(1) I think I kinda knew this – I am not an organic chemist, so I came on the story later. Would it have been discovered by the blogosphere anyway? My take is twofold:

  • if it’s interesting, yes. AFAICS hexacyclinol is not very interesting except in the synthetic organic chemical olympics and so wouldn’t get unearthed except by them. The best I have on the blogosphere is:
  • Anonymous (on Tenderbutton’s blog)
    June 1st, 2006 | 12:08 pm
    Has anyone seen Org Lett ASAP today? Take a look at Scott Rychnovsky’s “Predicting NMR Spectra by Computational Methods: Structure Revision of Hexacyclinol” then look at the earlier synthesis paper “Total Synthesis of Hexacyclinol” ACIE 2006 2769-2773 (especially the so-called supplemental data). Things don’t add up. Dylan, can you get to the bottom of this peculiar situation?
  • or if we use machines (agents – see later)
  • PMR: so Steve is right – the paper alerted the blogosphere. What would have happened without the public interest? Would it have been reported in C&E News or at least the blog (I think I’m right on that one)? It’s certainly true that quality has a much higher profile because of the discussions.
    (2) This is simply because I was too busy. I’m not personally interested in the structures of diazonamide, taxol, etc. and my silence signified “good! thank you”. I will try to add something on a regular basis to comments when possible. I do, however, want to avoid infinite regress of discussion.

    There are clearly many data “out there” which are “wrong”. Sometimes it is typos; there are a lot in peaklists, because these are a particularly stupid C19 way of reporting science when there are acceptable (if not brilliant) eSpectra technologies. Some are mislabelling: right spectrum, wrong compound association. Note that it is the associations that are the problem in all of this, including the versioning. “Right” and “wrong” do not apply to molecular formulae or spectra, only to the association of names, spectra, molecular structures, etc.
    So La Clair had the wrong association between a structure and a spectrum (and between various samples). This is harder to formalize than it seems. Each component is “right”.
    One of the real problems is that the association between structure and spectrum is never transmitted; if it were, our problems would be much easier. The simple way to do this is using CML – the molecules and spectra are contained in the same container. NMRShiftDB does this, and we have been able to download thousands of CML files and extract the structures and spectra (as peaklists) automatically.
    Now what could we possibly want to do that for? If you comment I guarantee to comment – once.
    … and did I mention “Open data”?
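    To illustrate why keeping the structure and spectrum in one container matters, here is a minimal Python sketch. It assumes a simplified, hypothetical CML fragment: the element names (molecule, atom, spectrum, peakList, peak) follow CML conventions as I understand them, but the attribute layout (moleculeRef, xValue) and the ethanol example data are illustrative assumptions, not taken from NMRShiftDB. Because the association travels with the data, a program can pair each peaklist with its structure and apply a crude sanity check: a 13C peaklist cannot contain more distinct peaks than the molecule has carbon atoms.

```python
import xml.etree.ElementTree as ET

# CML namespace; the element/attribute layout below is a simplified assumption.
CML_NS = "http://www.xml-cml.org/schema"


def q(tag):
    """Qualify a tag name with the CML namespace for ElementTree lookups."""
    return f"{{{CML_NS}}}{tag}"


# Hypothetical example: ethanol heavy atoms (hydrogens omitted) plus an
# illustrative 13C peaklist, held in the SAME document so the association survives.
EXAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="C"/>
      <atom id="a3" elementType="O"/>
    </atomArray>
  </molecule>
  <spectrum id="s1" moleculeRef="m1" type="NMR">
    <peakList>
      <peak id="p1" xValue="58.3" xUnits="ppm"/>
      <peak id="p2" xValue="18.1" xUnits="ppm"/>
    </peakList>
  </spectrum>
</cml>"""


def extract_pairs(cml_text):
    """Return (molecule id, element symbols, peak shifts) for every spectrum
    whose moleculeRef points at a molecule in the same document."""
    root = ET.fromstring(cml_text)
    molecules = {m.get("id"): m for m in root.iter(q("molecule"))}
    pairs = []
    for spectrum in root.iter(q("spectrum")):
        molecule = molecules.get(spectrum.get("moleculeRef"))
        if molecule is None:
            continue  # association missing: this spectrum cannot be paired
        elements = [atom.get("elementType") for atom in molecule.iter(q("atom"))]
        shifts = [float(peak.get("xValue")) for peak in spectrum.iter(q("peak"))]
        pairs.append((molecule.get("id"), elements, shifts))
    return pairs


def carbon_count_plausible(elements, shifts):
    """Crude 13C check: distinct carbon peaks cannot outnumber carbon atoms."""
    return len(shifts) <= elements.count("C")


if __name__ == "__main__":
    for mol_id, elements, shifts in extract_pairs(EXAMPLE_CML):
        print(mol_id, elements, shifts, carbon_count_plausible(elements, shifts))
```

    The check is only a necessary condition: passing it does not prove the association is right, but failing it flags a likely wrong structure-spectrum pairing, which is exactly the kind of error that cannot even be detected when the two are published separately.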

    Posted in chemistry, open issues | 2 Comments

    Eyeballs from the blogosphere

    Fantastic! The blogosphere has already responded to our request for accounts of data quality enhancement.

    1. Egon Willighagen Says:
      October 1st, 2007 at 8:18 am
      Peter, I’ve placed some pointers to past blog items from my blog that I feel are relevant [1]. I’ve also tagged this overview with ‘pmrgrantproposal’ and requested others to do the same.
      1.http://chem-bla-ics.blogspot.com/2007/10/how-blogosphere-changes-publishing.html
    2. Cameron Neylon Says:
      October 1st, 2007 at 9:43 am
      Peter, you don’t half set yourself steep targets with a 36 hour deadline starting on a Sunday morning! My posts on open notebook science are at http://blog.openwetware.org/scienceintheopen/category/open-notebook-science/

    PMR: Many thanks. Please keep these coming, especially anything from the hexacyclinol stuff.
    We have to ‘fess up. ‘I’ am actually a pan-dimensional hyperbeing like the mice in H2G2.
    And “36-hours” is a meaningless spatio-temporal measure – we borrow from a virtual universe and then replace the “time”.

    Posted in "virtual communities", blueobelisk | 1 Comment

    Guerilla OA activity

    Blogged by Peter Suber:
    Graham Steele, Conference Report, McBlawg, September 29, 2007.  Excerpt:

    Here is a report in relation to my attendance of NeuroPrion 2007 26th – 28th September, Edinburgh, Scotland….
    Given the approximate number (~ 800) [of attendees], clearly, it would not be possible to cover OA/IR’s with many on a one-on-one basis as originally planned. Having unfortunately previously *lost* my short podium slot, I started to consider other methods of getting my message across….Thankfully, with “Research Made Public” emblazoned on the front of my t-shirt all that day, this set the tone. A large proportion of delegates noticed this and I was the only person present with any *message carrying* clothing on that I was aware of.
    I chose my “I’m Open” t-shirt for day 2 since it was a much more visible and striking one.
    More familiar with the surroundings/set up, I noted that there was a 2 hour lunch/poster session which appeared to be, on paper at least, the best time to swoop into action. One hour in though, the only manned booths were commerce diagnostic related – so I had to quickly think of something else….Since I clearly couldn’t “post” on posters, I rapidly started to leave some basic “Open Access” posters and postcards on the tables where all the delegates were in discussion with one another. Process took only a few minutes and then *I vanished*.
    The ~ 950 were all on the lower floor with only two means of exit to ground level:- stair or escalator. I decided to leave a trail of the same “Open Access” postcards meaning that almost all (delegates) of them would see them….On the tables on the ground floor, I chose to leave some more of the same posters along with a few dozen DOAJ postcards….
    To a smaller extent, a few seeds were dropped up to level 3 where the main auditorium is situated….Within 20 minutes, I managed to place *something* on ~ 800 seats/armrests. Armrests are great since they cover two seats at once….A trickle of delegates started to arrive just as I was finishing so *I vanished* again….
    Upon my return, I could see hundreds of delegates reading/looking at what had *appeared* whilst they were away….
    Since the entrance area to both suites was quiet – I set up an “Open Access” stall on the most prominently placed empty (nice fluke) table. One of the most eye catching *goodies* I had was the blue/silver PLoS goblet which I proudly placed at the centre of *my stall* which contained a broad selection of what I had left. I also left a couple of our glossy “CJD Alliance” ring-bound *brochures* on display so that passers by got the connection to what I was doing. It was cool to sit at *my stall* with the ever so fitting “I’m Open” message across my chest.  I then *vanished* again….
    My final activity was to clear my stall and then stick up a final “Open Access” poster on the back of the prominently placed entry sign to the main auditorium. This meant that when all left it that day, they had their final reminder….
    Of those that I was able to discuss OA/IR’s with, almost all of the feedback was positive in nature. I was easily able to respond to any less positive feedback….

    PMR: I’m guessing that this was a meeting with a fairly traditional organisation and agenda, so that OA had a low profile for many delegates. These people do not hang out in revolutionary corners of Second Life such as the Blue Obelisk and are probably not impacted by OA on a frequent basis.
    The same is true of most chemists, many of whom (possibly even most, judging from our SPECTRa survey) have not heard of OA.
    So we have to make the idea more prominent. I’m thinking about something like a set of flyers or posters which could be put up in a departmental coffee room. (Of course none of us have time for coffee any more.) Although I am sympathetic to OA publishers, such material would come better from an organisation like SPARC or ARL. And, to pick up on Graham’s ideas, what is the best real-life advocacy that is seen as responsible?

    Posted in open issues | 1 Comment

    NSF/JISC meeting on eScience/cyberinfrastructure

    I was privileged to be at a meeting between JISC (UK) and NSF (US). Every paragraph of the report is worth reading – I quote a few…

    William Y. Arms and Ronald L. Larsen, The Future of Scholarly Communication: Building the Infrastructure of Cyberscholarship, September 26, 2007. Report of the NSF/JISC Repositories Workshop (Phoenix, April 17-19, 2007). It announces

    The fundamental conclusions of the workshop were:
    • The widespread availability of digital content creates opportunities for new forms of research and scholarship that are qualitatively different from traditional ways of using academic publications and research data. We call this “cyberscholarship”.
    • The widespread availability of content in digital formats provides an infrastructure for novel forms of research. To support cyberscholarship, such content must be captured, managed, and preserved in ways that are significantly different from conventional methods.
    As with other forms of infrastructure, common interests are best served by agreement on general principles that are expressed as a set of standards and approaches that, once adopted, become transparent to the user. Without such agreements, locally optimal decisions may preclude global advancement. Therefore, the workshop concluded that:
    • Development of the infrastructure requires coordination at a national and international level. In Britain, JISC can provide this coordination. In the United States, there is no single agency with this mission; we recommend an inter-agency coordinating committee. The Federal Coordinating Council for Science, Engineering and Technology (FCCSET), which coordinated much of the US government’s role in developing high performance computing in the 1990s, provides a good model for the proposed Federal Coordinating Council on Cyberscholarship (FC3S). International coordination should also engage organizations such as the European Strategy Forum on Research Infrastructures (ESFRI), the German research foundation DFG, and the Max Planck Digital Library.
    • Development of the content infrastructure requires a blend of interdisciplinary research and development that engages scientists, technologists, and humanities scholars. The time is right for a focused, international effort to experiment, explore, and finally build the infrastructure for cyberscholarship.
    • We propose a seven-year timetable for implementation of the infrastructure. The first three years will focus on developing and testing a set of prototypes, followed by implementation of coordinated systems and services.

    Computer programs analyze vast amounts of information that could never be processed manually. This is sometimes referred to as “data-driven science”. Some have described data-driven science as a new paradigm of research. This may be an over-statement, but there is no doubt that digital information is leading to new forms of scholarship. In a completely different field, Gregory Crane, a humanities researcher, recently made the simple but profound statement, “When collections get large, only the computer reads every word.” A scholar can read only one document at a time, but a supercomputer can analyze millions, discovering patterns that no human could observe.

    The National Virtual Observatory describes itself as “a new way of doing astronomy, moving from an era of observations of small, carefully selected samples of objects in one or a few wavelength bands, to the use of multiwavelength data for millions, if not billions of objects. Such datasets will allow researchers to discover subtle but significant patterns in statistically rich and unbiased databases, and to understand complex astrophysical systems through the comparison of data to numerical simulations.” From: http://www.us-vo.org/

    The workshop participants set the following goal:
    Ensure that all publicly-funded research products and primary resources will be readily available, accessible, and usable via common infrastructure and tools through space, time, and across disciplines, stages of research, and modes of human expression.

    The shortcomings of the current environment for scholarly communication are well-known and evident. Journal articles include too little information to replicate an experiment. Restrictions justified by copyright, patents, trade secrets, and security, and the high costs of access all add up to a situation that is far from optimal. Yet this suboptimal system has vigorous supporters, many of whom benefit from its idiosyncrasies.
    For example, the high cost of access benefits people who belong to the wealthy organizations that can afford that access. Journal profits subsidize academic societies. Universities use publication patterns as an approximate measure of excellence.
    Younger scholars, who grew up with the Web, are less likely to be restrained by the habits of the past. Often – but not always – they are early adopters of innovations such as web search engines, Google Scholar, Wikipedia, and blog-science. Yet, they come under intense pressure early in their careers to conform to the publication norms of the past.

    … and so the final proposal

    … a seven year target for the implementation of the infrastructure for cyberscholarship. The goal of establishing an infrastructure for cyberscholarship by 2015 is aggressive, but achievable, when coordinated with other initiatives in the U.S., Britain, and elsewhere. A three-phase program is proposed over seven years: a three-year research prototype phase that explores several competing alternatives, a one-year architecture specification phase that integrates the best ideas from the prototypes, followed by a three-year research and implementation phase in which content infrastructure is deployed and research on value-added services continues. Throughout the seven years, an evaluation component will provide the appropriate focus on measurable capability across comparable services. A “roadmap” for the program is suggested in the following figure.
    [… it’s too large to cut, so you’ll have to read it for yourselves…]
    … and the details …

    Posted in cyberscience, open issues | Leave a comment

    Open grant writing. Can the Chemical Blogosphere help with "Agents and Eyeballs"?

    In the current spirit of Openness I’m appealing to the chemical blogosphere for help. Jim Downing and I are writing a grant proposal for the UK’s JISC (“supporting education and research”), which supports digital libraries, repositories, eScience/cyberinfrastructure, collaborative working, etc. The grant will directly support the activities of the blogosphere, for example by providing better reporting and review tools, hopefully with chemical enhancement.
    The basic theme is that the Chemical Blogosphere is now a major force for enhancing data quality in chemical databases and publications, and we are asking for 1 person-year to help build a “Web 2.0”-based system to help support the current practice and ethos. The current working title is “Agents and Eyeballs”, reflecting that some of the work will be done by

    • machines, as in CrystalEye – WWMM, which aggregates and checks published crystal structures on a daily basis.
    • humans, as in the Hexacyclinol? Or Not? saga. Readers may remember that there was a report of the synthesis of a complicated molecule. This was heavily criticized in the blogosphere, and indeed the top 9 hits on Google for “hexacyclinol” are all blogs; the formal, Closed, peer-reviewed paper comes tenth.

    “Given enough eyeballs, all bugs are shallow” – Eric Raymond. In chemistry it is clear that the system of closed peer review by 2-3 humans sometimes leads to poor data quality and poor science. We’ve found that in some chemistry journals almost every paper has an error – not always “serious”, but … So:
    “Agents and eyeballs for better chemical peer-review”.
    Not very catchy but we’ll think of something.
    It’s unusual to make your grant proposal Open (and we are not actually putting the grant itself online, especially the financial details). But there are parts of the case that we would like the blogosphere to help with. If you have already written a blog post on any of the aspects here, please give the link. You may even wish to write a post:

    • showing that the blogosphere is organised and effectively oversees all major Open discussion in chemistry. I take Chemical blogspace as the best place for a non-chemist (as the reviewers will be) to start.
    • showing that the Blogosphere cares about data. Here I would like to point to the Blue Obelisk and the way ChemSpider has reacted positively to the concerns about data quality.
    • showing that important bad science cannot hide. I would very much like an overview of the hexacyclinol story (which is still happening) with some of the most useful historical links. Anything showing that the blogosphere’s role was reported in the conventional chemical grey literature would be valuable.
    • Open Notebook Science.

    We have three partners from the conventional publishing industry – I won’t name them – who have offered to help explore how the Agents and Eyeballs approach could help with their data peer review.
    You might ask “why is PMR not doing this, but asking the blogosphere?” It’s precisely because I want to show how responsive and responsible the blogosphere is, when we ask questions like this.
    There is considerable urgency. To include anything in the grant we’ll need it within 36 hours, although contributions after that will be seen by the reviewers. I suggest that you leave comments on this post, with pointers where necessary. Later I suspect we’ll wikify something, but it’s actually the difficulty of doing this properly and easily that is – in part – motivating the grant.
    TIA

    Posted in "virtual communities", blueobelisk, cyberscience | 6 Comments

    Volunteers: does the computer experience translate to chemistry?

    One of the spinoffs of having been to SciFoo is that I skim over 50+ posts/day from the blogs that participants run. Some are multi-author blogs. Here’s Andy Oram on Tim O’Reilly’s blog, talking about what makes volunteer documenters click. Read it all.

    01:47 30/09/2007, Andy Oram, Planet SciFoo
    By Andy Oram
    […]If value increasingly comes from communities of volunteers outside the compass of corporate management, isn’t it only right to shift resources to support these communities? I have to deal with that question in my own field of computer documentation, where the shift to community production is happening as fast as it is anywhere. (I examine this trend in a series of articles about community documentation.) [PMR – listed below] But many industries could ask the same question I explore in this article: how can society shift its resources to support the important new source of value in communities?

    Volunteerism needs support

    The idea that volunteers play an important social role goes at least as far […]
    Volunteers who are paid, of course, are no longer volunteers. Companies have hit upon an enormous number of intermediate forms of reward by now: invitations to focus groups and conferences, honorable mentions, free products, etc. Still, serious problems in the concept of rewarding volunteers have been publicized:

    • Rewards create incentives to game the system, which would ultimately lead productive volunteers to abandon the system as unfair.
    • Even when rewards are fair, they “crowd out” the original incentives that led volunteers to serve in the first place.
    • It’s just plain impossible to determine how much each volunteer’s contribution is worth.

    The final point just listed is the killer. The reasons for it are easy to state: the ultimate value created by any new idea may lie far out in the future, and the give-and-take discussion around information makes it hard to trace a valuable idea to an individual or small group. Let’s look at this problem more closely.

    The value of information

    […]In computer documentation (as in journalism), certainly, it’s becoming harder and harder to add value to what the community contributes for free. So the challenge becomes how to improve the community’s offerings.
    I find the key traits of value in documentation to be:

    • Availability–somebody has to write it in the first place. (Readers also need computers and Internet access in order to meet this goal.)
    • Findability–people need something better than current search techniques to find obscure documents, and particularly need help finding background when they read a document that assumes too much prior knowledge.
    • Quality–this covers such general and complex issues as accuracy, relevance, and readability.

    A particularly urgent aspect of quality is keeping a document up to date. Many a project has annoyed its users by starting out with reasonably good documentation and failing to keep it updated. Somehow, people who enjoyed writing something the first time lose interest in maintaining it. This is just as true for comments in source code and commercial books. (Many of my authors have built their reputations and businesses on books they’ve written, and despite good intentions have been unable to find time to update the books.) I myself have lived out the feeling of writing new documentation for a free software project and then lacking the motivation to go back to it.
    Thus, companies and user consortia who want to direct resources toward making software more usable can consider:

    • Offering incentives that make the best people contribute, while trying to avoid invoking the crowding-out phenomenon.
    • Providing paths through documentation, so readers can find what they need in their particular state of knowledge. This task is an ongoing research project for any particular body of documentation.
    • Ensuring continuity, by tracking the need to update documents and finding people to do so.
    • Training contributors to do a better job and make the most of their efforts.

    The last of the tasks interests me in particular, because it provides scope for offering my skills as an editor and O’Reilly’s as a publisher. But we need some compensation for it.
    I feel funny, of course, offering our services as editors or other quality providers when the original authors might not be paid. But if you accept that it’s harder to recruit people for supporting roles than for leading roles, payment is justified.
    To conclude, I think volunteers can be supported without being paid directly. If they know their work will be improved to be more useful and will have lasting value, they’ll have more incentive to contribute.
    […]

    PMR: and the details:

    … writings by Andy Oram about web pages, forums, and other media used by users of technology to educate each other. Articles include (in reverse chronological order):


    Andy Oram
    Editor, O’Reilly Media
    Home page

    PMR: This is very relevant to recent developments in the Blue Obelisk, where a volunteer community has become the keeper of the SMILES de facto standard. We should read Andy’s thoughts carefully.
    The equations are similar but not isomorphic. Why do people work with the BO? Here are some ideas:

    • A sense of community. This is a major reward for many people, being able to keep in touch and knowing that you are on the right track (or more importantly, on the wrong one). And the price of membership, though not explicitly stated in the gift economy, is to contribute and to uphold the ideals of the work.
    • A fuzzy mixture of morals, ethics and politics. It is the “right thing to do”. If that drives some people, great. Conversely, I have been attacked several times for being immoral in promoting various aspects of Open Chemistry, on the grounds that it destroys the jobs of honest hard-working developers. [No, it creates jobs for those people who wish to translate to C21.]
    • Personal “academic” karma. This is a major motivation. As the BO succeeds, those people who have been associated with it will be asked to write articles for valued publications, to cooperate on the next phase of funded Web 2.0 grants, etc. It is also a way for aspiring scientists to work together.
    • Personal financial reward. This is a powerful and valid motivation. There is lots of potential; I wouldn’t have a job today if I hadn’t contributed to the development of XML. When we look for people to join us, the blogosphere is an obvious recruiting ground. And as the balance shifts from closed to open there will need to be ways of monetizing Openness. The chemical information market is worth at least a few billion USD, and it’s still going to be there in 10 years’ time. But many of the conventional players will be gone and new ones will have taken their place.
    • Fun. Yes, fun. We like writing algorithms. If you are a Sudoku addict you’d enjoy writing a chemical substructure search. We like drawing molecules. Many artists – like Jane Richardson – have joined the community of molecular graphics. We like building in Second Life. We like writing blogs.
    • Changing the world. Everyone contributing to the BO is changing the world… It may not be apparent, but it’s real.

    and as Alma Swan, quoting Gandhi (blogged by Barbara Kirsop), reminded us:

    ‘first they ignore you, then they laugh at you, then they fight you, then you win’.

    The BO has not won yet. It’s somewhere between ignore and laugh, and for the next little while we’d love some documentation volunteers!

    Posted in blueobelisk | Leave a comment

    Open Access at Abbey Square

    Yesterday Jim Downing, Nick Day and I were the guests of Peter Strickland and Brian McMahon at the International Union of Crystallography in their gorgeous offices at Abbey Square, Chester, UK.
    .iucr.png
    The IUCr is a member of ICSU (the International Council for Science) and as such acts as a governing body. It has taken a very proactive role over the last 5 decades (and probably longer, but I can’t remember) in areas such as data quality, standards and community building. All the scientific unions, such as IUPAC (which recently did me the honour of making me a Fellow), do this, but I hope I’m not being divisive in giving the IUCr some individual praise.
    I remember the IUCr running a community exercise, I think in the 1950s or 1960s, in which labs were invited to collect data sets from a standard crystal (something like sodium ammonium tartrate, but I forget). That meant the community could estimate the precision and accuracy achievable at that time. The philosophy has continued, and of course technology has improved so much that routine crystallographic data now have excellent precision and accuracy. The IUCr has also emphasized the publication of data sets as part of the scientific record, so that results can be checked, and with the expectation that future scientists might revisit and re-use them. (For example, when I did my doctorate the programs couldn’t model anisotropic scattering from atoms; it would now be easy to re-analyse those data.) The IUCr has always promoted the publication of the raw data, and it is thanks to their advocacy that Nick Day has been able to create CrystalEye (WWMM) from the supplemental crystallographic data that many responsible scientific publishers mount on their websites. The IUCr had given us some initial support for a summer student, Mark Holt, and we were showing them how far the work had got. CrystalEye is an excellent model for harvesting data from publisher sites, at least from those publishers who don’t try to possess public-domain data. More on all of this later.
    The IUCr is also a publisher: its flagship journal is Acta Crystallographica (sections A-F). CrystalEye takes data mainly from sections E and C and parts of B. Acta has a hybrid approach to OA, and the charge to authors is 900 USD, which is much less than most publishers charge. I think we can expect more developments in this area.

    Posted in Uncategorized

    Four theses and a repository

    I’ve been advocating that all theses should be deposited in institutional repositories under CC-BY licences, and here is an interlude about four that I know personally. I’m keeping the authors anonymous, although those in the know will be able to identify some of them.
    ONE is from someone I have supervised. They have submitted their thesis and are awaiting a viva. I shouldn’t comment publicly on its quality, even though I am not an examiner. But we are both keen to see the thesis made Open Access; the question is what to do with the data, ca. 15 GB of computational output. Is the repository the right place for it?
    TWO is from someone I know well who has also submitted their thesis. It’s at a University which already has a tradition of Open Access in its IR. Although it’s not in a field where I can claim expertise (music and AI), I think it deserves widespread visibility.
    THREE is from someone I examined recently at another University. I can’t publicly say what we recommended for the candidate, but at least we had a drink afterwards. I broached the subject of Open Access and the candidate was excited: they want the world to know what they have done. So I have written to the University, and although this is not yet an established routine there, I got an encouraging reply.
    FOUR was written many years ago on manual typewriters (sic) with several carbon copies. I think physical copies still remain. So I wondered if the author of FOUR might wish to see his thesis digitised [1] and made Open Access. Should I suggest he write to his alma mater? If he can get his act together.
    [1] Institutions such as Caltech have been retrospectively digitising their theses – see for example this one.

    Posted in open issues

    Chemical Speeddrawing

    There used to be an advert on the London Tube for “Speedwriting” (something like “f u cn rd ths u cn gt a gd jb”). What about speed-drawing of chemical structures? Here’s Liquid Carbon:

    Finally, I’d like to offer a small pissing contest. It takes me:
    • 30 sec to draw THC
    • 38 sec to draw Penicillin G
    • 82 sec to draw discodermolide
    What about you? The compounds should be drawn with all stereochemical information and in the same general style (bond angles, side chains positioning) as in the picture below.

    PMR: some of the blogosphere responded and the times were similar.
    What about non-graphical input such as SMILES, or even WLN (which Depth-First resurrected in “Everything Old is New Again: Wiswesser Line Notation (WLN)”)? Of course WLN doesn’t do stereo, but I bet its practitioners could beat the times above by a considerable margin. And it wouldn’t be too difficult to include the stereo: in the last 40 years we have gained lowercase letters on our keyboards!
    And it took me 27 seconds to type the SMILES for penicillin (admittedly without stereo and orientation). But, as readers of this blog know, I can’t type either.
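    For readers who haven’t met SMILES: it is a plain-text line notation in which every heavy atom is written out as an element symbol (hydrogens are implicit), which is why even a crude script can read it. Below is a minimal Python sketch, assuming only the simple “organic subset” with no bracket atoms, so no stereo. The penicillin G string is one valid non-stereo SMILES chosen for illustration, not necessarily the one typed above, and heavy_atom_counts is a hypothetical helper name.

```python
import re
from collections import Counter

# One valid SMILES for penicillin G (benzylpenicillin), written WITHOUT
# stereochemistry. Illustrative string chosen for this sketch.
PENICILLIN_G = "CC1(C)SC2C(NC(=O)Cc3ccccc3)C(=O)N2C1C(=O)O"

# Organic-subset atoms only. Two-letter symbols come first so that "Cl" is
# not read as "C" followed by "l"; lowercase letters are aromatic atoms.
ATOM = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Tally heavy atoms by element in an unbracketed SMILES string (hypothetical helper)."""
    if "[" in smiles:
        # Bracket atoms such as [C@@H] carry stereo and charges; out of scope here.
        raise ValueError("bracket atoms are outside this sketch")
    # Ring-closure digits, bonds and parentheses are simply not matched.
    return Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))


counts = heavy_atom_counts(PENICILLIN_G)
print(dict(counts))  # {'C': 16, 'S': 1, 'N': 2, 'O': 4}
```

    The tally (16 C, 2 N, 4 O, 1 S) matches the heavy atoms of penicillin G, C16H18N2O4S. Adding stereo would mean bracket atoms such as [C@@H], which this sketch deliberately rejects rather than miscounts.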

    Posted in chemistry