Monthly Archives: January 2012

What have the Publishers ever done for us? And do we need them?

Tim Gowers has used Spike Milligan as an inspiration for challenging Elsevier: http://gowers.wordpress.com/2012/01/21/elsevier-my-part-in-its-downfall/. British satire is one of the things that keeps us going. I’ll use the equally irreverent Pythons in “Life of Brian” (http://en.wikipedia.org/wiki/Monty_Python%27s_Life_of_Brian). From Wikipedia:

There is also a famous scene in which Reg gives a revolutionary speech asking, “What have the Romans ever done for us?” at which point the listeners outline all forms of positive aspects of the Roman occupation such as sanitation, medicine, education, wine, public order, irrigation, roads, a fresh water system, public health and peace, followed by “what have the Romans ever done for us except sanitation, medicine, education…”.

Many industries have generated criticism in the wider community, ranging from anger to outright hate and revolution. Microsoft was (justifiably in my opinion) a major robber baron of the late 20th century. It was brought to heel by public/governmental anger and regulation, and also by forces of innovation. Microsoft was effectively a monopoly but could not continue as such. Yet if you ask

“What has Microsoft ever done for us?”

even the most anti-Microsoft people would admit that they have brought new products and culture to the marketplace, and that huge numbers of people use these. If Microsoft products were suddenly taken off the market, businesses would fold and kids would be crying. That’s true of most robber barons – steel, railways, cotton, etc. They brought new products and opportunities (albeit at great social and moral cost to many). Word is used by hundreds of millions (billions?), as is Excel.

I reiterate – I am not condoning Microsoft’s history – quite the reverse. I am simply saying they innovated. And some of that innovation is valued by many people.

The same can be said of most other entrepreneurs in ICT – Google, Facebook, etc. Whatever their sins they have innovated.

But when it comes to scholarly publishers it’s a different story. [I have acknowledged a few publishers, such as IUCr, and some Open Access publishers – BMC, PLoS, EGU – whom you should mentally exclude. But the rest – including many society publishers – have to stand up and be counted.]

Mike Taylor is a dinosaur expert who has got so angry with the publishing industry that he not only blogs about it but wrote an article in the Guardian, http://www.guardian.co.uk/science/2012/jan/16/academic-publishers-enemies-science, where he asserted that “Academic publishers have become the enemies of science”. I agree with this phrase. I have blogged for some years about the restriction, the intransigence, the arrogance of the scholarly publishing industry and I shall continue to do so. (I should be writing semantic code, but I am so upset that I have to write this blogpost first.) Read his post; I won’t quote from it.

There has been a reply from Graham Taylor – director of academic, educational and professional publishing at the UK Publishers Association: http://www.guardian.co.uk/science/2012/jan/27/academic-publishers-enemies-science-wrong. It makes the case for why the publishing industry creates value. Some of it is a reaction to Mike Taylor’s article, but in a few places it attempts to show why the publishing industry is essential and justifies the 10 billion USD it takes in every year. I have extracted the paragraphs that bear on this:


  1. when the reality is that their investments have made more research available to more readers at a lower unit cost than ever before. [and] Worldwide, around 3m research papers are submitted every year to scholarly journals – rising by around 3% per year in line with research budgets – of which around 1.5m are eventually published, including over 120,000 from UK researchers. Such journals are on the whole by their very nature tailored and adapted to the needs and interests of specific research communities. This is a complex and nuanced system that needs time to adapt to new methodologies.


  2. The scholarly world is not yet fully open access, nor even approaching it, but that is not the fault of the publishers. [and] Publishers are certainly not opposed to open access. [and] Publishers pursue the goal of universal access through whatever means are practically available.

This is all I can find on the value that publishers contribute. My analysis:

“Publishers are trying as hard as possible to create Open Access”. This is simply false. Remember PRISM? A publisher consortium that paid 500,000 USD to create the phrase “Open Access means junk science”. “Open Access is ethically flawed” [RSC. Yes, they then got rid of the person who said it. And if you look at the RSC licence for “Open Science”, which is NOT BOAI compliant, it is not the sign of a publisher trying as hard as possible to create OA.] And that’s typical of the industry.

“We’re publishing more each year so we’re putting our charges up”. This argument may work in industries where there is an innate limitation on the supply of goods. But in digital industries we see costs plummeting every year. We expect disks, bandwidth and CPUs to get massively cheaper each year. And the software that creates digital objects improves. So any INNOVATIVE industry would be reducing its costs.

So back to my question: “What have the publishers ever done for us?” Here’s my list – and they are all negative.

  • Double-column PDF. About the most senseless way of providing information in the current age. [Oh, they'll tell us that they are creating stuff for new formats. But it "takes time".]
  • Restrictive and impenetrable licences. The industry has been excellent at this. It’s almost impossible to find out what you are forbidden to do – the easy answer is “everything except read the PDF”.
  • Branding. Readers do not want a different interface for each journal. It’s usually impossible to find the current issue – hidden among the glossy Flash adverts for how wonderful the publisher is.
  • The rent-for-one-day-for-40-dollars article.
  • DRM.

I can’t think of any positive innovation in the industry. I mean innovation. Any 10-billion-dollar industry will slowly track what everyone else did years ago. Wow! We have hyperlinks!!!! Crossref? DOI? These weren’t developed by the industry. There is NO industry research and innovation. [I'll note the efforts of Nature to develop new ideas – Connotea, etc. – but these were often short-lived because they were experiments, not commitments.] And what have they stubbornly missed and even fought against?

  • Taking authors seriously. The industry sees authors as cattle. The interfaces used for submitting papers are AWFUL.
  • Taking readers seriously. Readers don’t exist. The industry’s end-users are purchasing officers.
  • Semantics.
  • Interactive publication.
  • The social revolution.

So the industry can be seen to be stagnant, self-serving, introverted and arrogant, relying on either its lawyers or its branding.

And that’s a VERY dangerous place to be. “Be afraid, be very afraid”.

Because the publishing industry relies on a dam built on sand. Reed Elsevier used to be active in the arms trade: http://www.idiolect.org.uk/elsevier/

Reed Elsevier have been forced to drop their links with the arms trade – and the reasons are clear: individual and collective action by members of the academic and medical community, combined with disquiet from the public, investors and employees of Reed Elsevier. Thanks to everyone who signed the petition and who lent support in every way.

Yes, petitions. Petitions can grow very quickly in the Internet age. And that’s what Tim Gowers and Tyler Neylon have started http://thecostofknowledge.com/. “If you would like to declare publicly that you will not support any Elsevier journal unless they radically change how they operate, then you can do so by filling in your details in the box below.”

I’ve signed it, and I’m proud to have done so. So have Mike Taylor and Michael Nielsen. Perhaps no surprises there.

BUT:

We all have blogs and they reach different communities.

And those communities will reach others. Out beyond the rotten walls of academia. To the scholarly poor, whose tax dollars go to prop up the industry. An industry dedicated in practice to denying them the results of research.

Yes. Because any innovative industry would have picked up the discontent beyond academia and thought:

Wake up – we’re in the 21st century – the cost of distribution is zero. Academia is 0.1% of the world’s population [a guess, but it's less than 1%]. We have a potential market 100 times bigger than our current market. WOW! People like Tim O’Reilly (one of the most innovative publishers) think this way. He’s dismissed the puny protestations of the industry on SOPA and PIPA: “In short, SOPA and PIPA not only harm the internet, they support existing content companies in their attempt to hold back innovative business models that will actually grow the market and deliver new value to consumers.” https://plus.google.com/107033731246200681024/posts/LZs8TekXK2T

So we need a revitalised scholarly publishing industry.

But it will not come from the current one. They have shown themselves incapable of change, and arrogant towards their feeders – the academics. We have it in our power – to kill any or all of them and start again. It is a question of getting our act together.

Because almost all monopolist empires carry the seeds of their own destruction.


Panton Fellowships: What they are about and how to apply

#pantonfellowships #pantonprinciples

In 2010 we launched the Principles of Open Scientific Data and, because we met more than once in the Panton Arms, we called them the “Panton Principles”. Since then “Panton” has started to become a brand for Openness. We’ve now had 6 Panton discussions with people who champion Open Data, and we are creating Panton Papers to support the discussion and formulation of ideas in Open Data.

Now we take this a major step forward. The idea to create Fellowships came from Jonathan Gray and he and I worked up an application to the Open Society Foundations (previously called OSI http://en.wikipedia.org/wiki/Open_Society_Institute ) . We were delighted that in November the OSF told us that the grant had been successful, and we thank them.

The idea of the fellowships is to support scientists to develop activities in Open Data. The remit is deliberately broad as we want to encourage novel approaches. The details are given at http://blog.okfn.org/2012/01/25/panton-fellowships-apply-now/; an excerpt:

We firmly believe that “open data means better science”. Panton Fellowships have been created in order to support scientists – particularly graduate students and early-stage career scientists – to explore this idea, and to tackle those barriers which currently prevent science data from being made open.

Dr Cameron Neylon, of the Panton Fellowships Advisory Board, commented on the ‘real potential’ of the Fellowships to influence practice surrounding open data in the scientific community. ‘Panton Fellowships will allow those who are still deeply involved in research to think closely about the policy and technical issues surrounding open data’, observed Dr Neylon. By allowing scientists the scope both to explore the ‘big picture’ – gathering evidence to promote discussion throughout the community – and also to work on specific technical solutions to individual problems, the Panton Fellowship scheme has the potential to make a real impact upon the practice of open data in science.

Panton Fellows will have the freedom to undertake a range of activities, and prospective applicants are encouraged to formulate their own work plan. As Fellows will continue to be employed and/or study at their current institution, activities undertaken for the Panton Fellowship should ideally complement and enhance their existing work.

Fellowships will be held for one year, and will have a value of £8k p.a. For more details and information on how to apply, please visit http://pantonprinciples.org/panton-fellowships/.

The Panton philosophy is increasingly adopted by promoters of Openness; as an example, BiomedCentral have donated banner advertising on their pages.

I am really looking forward to reading the applications!

Marcus Hanwell: The way ahead for CML and the community

#semphyssci

Marcus Hanwell (Kitware) reported from the #semphyssci working group which was looking at how to grow the development and use of CML in the community. One of the great excitements of the Workshop is the agreement among participants that CML is valuable, worth developing and that they will put the effort in to make it happen.

There are several ways to develop a successful information infrastructure:

  1. Commercial entrepreneurship, typified by Gates and Jobs – and also Google and Facebook. Build the products that people want and sell them. Failure is common, but the market doesn’t care WHO succeeds.
  2. Design by committee. In many cases this works. CIF and W3C are committee-based, managed and successful. Failure is common here too, normally because committees are very slow. I’ve sat on ISO committees – nuff said. It’s very difficult to get innovation.
  3. Benevolent Dictator For Life (BDFL). Linux and Python are examples of this. It depends crucially on the energy, vision, political skills, etc. of the BDFL.
  4. Leaderless meritocracy. Almost always requires an oligarchy of a small number of hardworking, disciplined drivers. Overlaps with (3).


A common model is for a BDFL to create an initial prototype and then for (1) or (2) to take some role – either complete or partial.

Henry and I have been BDFLs for CML for nearly 20 years. We’ve never wanted to be self-important and so much of the work has been low-key. Unlike ICT, where new developments are welcomed both in terms of new ideas and new markets, chemistry regards new ideas with suspicion. There’s a large chemical information industry (content and software) which is almost completely out of touch with the 21st century. There are a few sparks, but most are based on the concepts of possessing content and building walled gardens, and of developing monolithic applications. Both are failing. So far there has been almost no interest from the chemical information market in semantics. The result is that we have had to build the ecosystem ourselves.

Now it’s changing. This meeting #semphyssci has shown that there is not only a desire for the CML ecosystem, but also a willingness to develop it. That’s why I invited the National Laboratories to this meeting and I’d like to congratulate them on their commitment. The mechanisms are yet to be worked out but I have no doubt it will happen.

An aside: Four years ago I went to CERN (http://blogs.ch.cam.ac.uk/pmr/2008/01/29/big-science-and-long-tail-science/) to talk to Salvatore Mele about open access publishing. SCOAP3 (http://scoap3.org) would cost some tens of millions of dollars but change the model of publishing to a scientist-centered one, rather than a publisher-centered one. Salvatore took me to the Large Hadron Collider, pointed to a hole 100 metres deep and said “last week we lowered 500 million Euros worth of instrument down the hole. We know how to make large projects work” (with the implication that open access was only a middle-sized project).

I have already welcomed Marcus and Kitware’s commitment to Open Source, Open Access and collaborative models of information infrastructures. These will inevitably triumph and the current ecosystem will adapt or die. Even their lawyers will not be able to enforce sustainability – the market will simply move on. And CML will be part of the new market. Because that’s a central part of the new market – free flow of raw data and ideas, managed by semantics.

And CML is currently the only system which supports chemical semantics across the major subdisciplines – especially in this context spectra, computation and solid-state.

Marcus coordinated the working group on the vision and ecosystem of CML and here’s his presentation http://vimeo.com/35400550. I have snipped from this – I think it’s rather fun compared with just the flat presentation of slides!

0:00 overview and value of CML

1:15 creating CML ecosystem

4:45 Resources for developers using CML

6:50 End-user applications (maybe whitepaper)

8:20 wish list for new tools

9:20 Features of community

10:20 future meetings

11:00 exercise in validation

Andrew Walker: Fantastic Mr FoX II

#semphyssci

Andrew Walker (video http://vimeo.com/35562270) presented “FoX, CML, and semantic tools for atomistic simulation” at the Semantic Physical Science symposium. Andrew has taken over from Toby White as the “Doctor Who” of FoX, the FORTRAN library for managing XML input and output. FoX allows for domain-specific XML conventions and supports a subset of CML (scalar/array/matrix, module/list, and molecule). There are now about 12 codes which have been substantially converted to allow CML output.

In this presentation Andrew describes the philosophy and current status of FoX.

Here I’ll take time to thank Andrew for continuing the work on FoX. Without that I don’t think we could expect groups such as PNNL and Daresbury to commit to supporting CML. There is now a critical mass of users and new features are clearly worth the investment.
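What FoX does for a FORTRAN simulation – emitting well-formed CML-style XML at each output step – can be illustrated with a rough Python analogy. This is my sketch, not FoX's API; the element and attribute names follow the CML scalar-property style mentioned later in this post but are illustrative only:

```python
import xml.etree.ElementTree as ET

def add_scalar_property(parent, dict_ref, value, units):
    # Append a CML-style scalar property, roughly what a FoX-instrumented
    # code does when writing a computed quantity (names are illustrative)
    prop = ET.SubElement(parent, "property", {"dictRef": dict_ref})
    scalar = ET.SubElement(prop, "scalar", {"units": units})
    scalar.text = str(value)

root = ET.Element("cml")
add_scalar_property(root, "sim:totalEnergy", -1234.56, "units:eV")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The point is that the code, not a downstream regex, decides the structure and units of each value, so the output is parseable without reading the source.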

The Time-annotated talk (on VIMEO you can click times)

0:00 Intro and data management

1:00 traditional example, glue code

2:00 Code in FORTRAN, require XML. Only tool C compiler

3:00 options

3:30 FORTRAN-XML is most attractive choice

3:50 Homage to Toby White and Alberto Garcia


5:30 Benefit of XML

6:00 example of XML code

6:40 benefits of XML

7:00 CML convention for atomistic simulation

8:00 overview of FoX

8:50 Codes using FoX – about 12

10:00 SIESTA tests new version against CML

11:00 example of output transformations (uses XSLT)

11:50 SIESTA output

12:30 Jmol output

12:50 AMBER example

13:50 Where to get FoX and completely reusable (BSD licence) with mailing list.

14:36 end (and some questions)

Martin Dove: The value of CML in managing simulations and data; “the best kept secret” is out

#semphyssci

Martin Dove and I have collaborated for almost all the time that I have been at Cambridge. It’s fair to say that we wouldn’t have had a lot of the progress in CML without Martin’s encouragement, collaboration, getting funding, and publications. Martin was an essential choice for our symposium on Semantic Physical Science.

We were colleagues in the Cambridge eScience Centre and Martin picked up the value of CML immediately. He invited me to be part of eMinerals and later MaterialsGrid – two large collaborative projects which addressed high-throughput simulation of (mainly regular) crystalline and similar materials. One subproject which really impressed me was simulating the damage done by alpha-particles in solidified nuclear waste. The particles zinged off at high speed into the surrounding crystal (NaCl, TiO2, etc.). http://rsta.royalsocietypublishing.org/content/367/1890/967.full Figure 3. IIRC materials such as NaCl recovered well from the damage (but are soluble) while TiO2 did not recover well.

One of the key features of this work was the “parameter sweep”. There are many variable parameters in studies like this – the material, the energy of the particle, the model used (e.g. the force field), etc. It easily leads to large numbers of calculations since we have to take steps along each parameter axis and multiply all possibilities.
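A parameter sweep is just the cross product of every parameter axis, which is why the job count grows so fast. As a sketch (the axes and values here are invented for illustration, not from eMinerals):

```python
from itertools import product

# Hypothetical parameter axes for a radiation-damage sweep
materials = ["NaCl", "TiO2", "ZrSiO4"]
energies_keV = [1, 5, 10, 50]
force_fields = ["Buckingham", "shell-model"]

# One simulation job per combination of parameter values
jobs = [
    {"material": m, "energy_keV": e, "force_field": f}
    for m, e, f in product(materials, energies_keV, force_fields)
]
print(len(jobs))  # 3 * 4 * 2 = 24 runs
```

Add one more axis with ten values and you are at 240 runs; the combinatorics is what makes a Grid (and semantic output you can harvest automatically) essential.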

Martin had the foresight to develop an impressive local Grid of departmental computers which lay idle for much of the time (e.g. when students were asleep). “CamGrid” (http://www.ucs.cam.ac.uk/scientific/camgrid ) has been a great success with over 1000 machines. It’s simple to use (CONDOR) and popular. The difficulty, as Martin shows below, is how to manage all the output.

That is where his support for CML has been so valuable. In MaterialsGrid the data representation was predicated on CML, and the members of the group built CML-aware components. Toby White developed FoX to support CML in FORTRAN programs. It’s not fun using a non-object-oriented language to support XML, but Toby’s FoX can do this in a way that the community has found straightforward and useful. FoX had generic XML support and also managed a useful subset of CML. Moreover Toby, Martin and colleagues built visualisations (such as ccViz) and you will see these below.

Martin makes it clear that he uses CML because it is useful and saves him time. We have responded by trying to build in the elements and attributes needed to support solid state and computation and these have stood up well over the last 5-7 years.

But CML is not yet universal. In the talk Martin describes it as the “best kept secret”. I’d agree, and offer some reasons why.

  • The chemistry/materials community is intrinsically conservative (compared to bioscience).
  • The codebase is “mature” and quite a lot is commercial. It’s not easy to convince developers to add in yet another feature. Even so, we have made progress with codes such as GULP, DL_POLY and CASTEP (through MaterialsGrid).
  • Many scientists need a working prototype before they believe. The prototype has taken a great deal of work – years.
  • The infrastructure needs to be stable. That was almost impossible in the first 10 years of CML, with W3C bringing out new specs, changing toolkits etc. However the spec has been essentially stable for about 5 years. It continues to be challenged and has been able to deal with those challenges. So Martin was a very early adopter and has been through a good deal of pain.

There’s more, but it’s only in the last 2-3 years that there are signals that CML might be widely deployable. It still needs a lot of work but the way is clear. We have proved dictionaries, conventions, data validation, etc. And #semphyssci has shown that there is a desire for the value that a rigorous approach brings.

So many thanks Martin, and here is the annotation of your talk.

1:10 Scientific Example materials that shrink when heated
1:40 plotted simulated volume against temperature
2:30 no reason not to plot lots of points
2:50 workflow
3:20 can launch hundreds of jobs
3:40 hard bit is extracting results
4:00 traditional output file
5:00 traditionally might have to read code to understand
5:45 traditionally bad practice is tolerated
5:50 CML makes extracting data easy
6:00 Example of CML
6:30 input parameters and metadata
7:00 typical CML scalar property
8:00 Codes which produce CML
8:15 introduces FoX; writing CML is simple
9:45 otherwise have to write parsers
10:10 Toby White’s ccViz (DEMOs from here on)
11:45 Final quantities (e.g. Diffusion Coefficient)
12:00 Graphs
12:45 can send marked up file for collaboration
13:00 very good to avoid training students on legacy
13:20 extracting data (xtract will create a CSV file)
15:00 Martin was able to present this to high-school students – they could run hundreds of jobs
15:50 Martin cares because it makes lives easier
16:15 CML is “Best kept secret”
16:40 end

Brian McMahon: Publishing Semantic Crystallography; EVERY science (data) publisher should watch this ALL THE WAY through

#semphyssci

I owe a huge debt to the International Union of Crystallography (IUCr) and Brian McMahon. Quite simply they are the best semantic scientific publishers of the current century. They also have the best community-base for scientific publishing that I know. The Union exists for its members and not for itself; its processes are as democratic as a scholarly body allows, and it is passionate about doing science properly.

The IUCr has always had a major emphasis on data and terminology. It has run experiments on how reproducible crystallographic experiments can be. It spends much time on the basis of the science and how to describe it. For over three decades it has had initiatives in defining data representation. It’s blessed with the fact that modern instruments are highly reproducible and that crystallization is a classic method of purification. Because of that, crystal structures determined in labs A and B are likely to be in very close agreement. There are exceptions – biological macromolecules are more heterogeneous – but generally it’s a highly reproducible science.


This tradition is now central to its publication ethos. Essentially every published result must be replicable (potentially falsifiable) from the information in the publication. Even 45 years ago (when I started) we were expected to type our raw data (thousands of observations) into the pages of the journals. Now it’s electronic but the bar has risen – we now have to publish the X-ray images. There is no room for subjectivity – and if the methodology is flawed the community WILL find it out.

The Union is committed to making crystallography accessible to everyone. For this reason it has advocated for 30 years that ALL publications (not just Acta Cryst) should publish their crystallographic data. It’s moving towards Open Access and has a completely Open Access journal, Acta Cryst. E. In this journal the complete crystallographic experiment is checked, and if it’s apparently flawed it’s returned to the authors for comment. Every atom, every bond is checked.
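To give a flavour of what checking “every atom, every bond” automatically can look like, here is a toy sketch. It is emphatically not CheckCIF, which runs a far larger and more sophisticated battery of tests; the bond-length ranges below are rough illustrative numbers I have chosen, not IUCr reference data:

```python
# Toy illustration of automated structure checking.
# The ranges are rough, illustrative values in angstroms.
TYPICAL_BOND_RANGES = {
    ("C", "C"): (1.20, 1.60),
    ("C", "O"): (1.15, 1.45),
}

def check_bond(atom1, atom2, length):
    # Flag any bond whose length falls outside the tabulated range
    key = tuple(sorted((atom1, atom2)))
    lo, hi = TYPICAL_BOND_RANGES[key]
    if not (lo <= length <= hi):
        return f"ALERT: {atom1}-{atom2} bond of {length} A outside [{lo}, {hi}]"
    return None

alerts = [a for a in (check_bond("C", "C", 1.54),
                      check_bond("C", "O", 2.10)) if a]
print(alerts)
```

Because the data are semantic (atoms and bonds, not prose), checks like this can run on every submission at essentially zero marginal cost, which is part of why the real thing can be done for 150 dollars.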

Surely that’s enormously expensive? How much does it cost?

ONE HUNDRED AND FIFTY (150) DOLLARS. That’s all. For a paper where every inch is peer reviewed. Where the contribution by the publisher is enormous.

Contrast that with a publisher which charges TWENTY times that and adds NO value.

The reason that IUCr can do this, and why it is so highly regarded in all disciplines, is that over the years they have steadily invested in the information infrastructure (ontology) of their discipline. And it’s been a community effort. Many people (Sid Hall, Howard Flack, Herb Bernstein, John Westbrook, and 30 others http://www.iucr.org/resources/cif/comcifs/members including me) have contributed in mails, meetings, software, specifications and lots more. Progress has been steady.

And all of this has been designed, guided, glued together by Brian. And he’s done more – in the small Chester office of IUCr he and a few others have built a remarkable suite of publishing software. Fit for purpose, respected by the community of authors and readers/users alike. What other science can say that? (A very few, and I hope they’ll identify themselves here).

And for me, IUCr/CIF/Brian have been a guiding light in the development of CML.

Here he is at our Semantic Physical Science symposium, 2012-01-12 http://vimeo.com/35397924

START AT THE BEGINNING AND WATCH IT ALL THE WAY THROUGH. Then watch it again. Then point your friends at it and take a copy.

0:00 Title

0:40 International Union of Crystallography

1:08 CIF

3:36 CIF Syntax and dataTypes

4:30 Publishing with CIF

6:41 Demonstration: CheckCIF

12:02 Interactive Chemical validation

14:42 Linking data to journal article and search for novelty of data

15:08 Jmol display applet

21:03 Supplementary data

21:47 PublCIF a tool to merge data and text and annotate them

27:08 end

BibSoup: It’s here! How to create and populate your own Bibserver

I am really excited about BibSoup. It didn’t really click till last night how easy and powerful it is. I think it will be very important. Here’s a summary and then we show you how YOU can use it. It literally takes 3 minutes to learn and probably 10 to create your own bibliography (if you already have the data).

  • It’s trivial to load and use. Yes, really trivial.
  • It’s free-as-in-beer. You pay nothing. Mark and colleagues have done all the work.
  • It’s Open as in free-as-in-speech. Everyone can see my bibliography. And vice versa. (Of course I can download and run my private instance if I wish.) Anyone can download the software, or any of the content. There are no walled gardens. The entries are CC0 and the collections are CC0.
  • There’s a vibrant community. If you have an idea you are an equal. If it’s a good idea you can convince the community it’s worth high priority (we are a meritocracy). You can contribute in a communal system.
  • Everything is re-usable. If I want to teach about polymers I can search Karol Langner’s bibliography and copy references.
  • It’s extensible. BibJSON makes it trivial to add new fields – e.g. pointing to datasets, images, commentaries, etc. As a community we shall certainly be building new tools – there are so many ideas that are now possible.
  • And it’s powerful. Bibliography is now an instrument of 21st century web democracy. It’s the roadmap of scholarship.
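The extensibility point is worth a concrete illustration. Since BibJSON is plain JSON, extending a record is just adding keys; the extra field names below (`dataset_url`, `commentary`) are hypothetical examples of mine, not part of any agreed schema:

```python
import json

# A BibJSON-style record sketch; BibJSON is just JSON, so
# project-specific extensions are simply extra keys.
record = {
    "title": "Semantic science and its communication - a personal view.",
    "author": [{"name": "Murray-Rust, P"}],
    "journal": {"name": "J Cheminform"},
    "year": "2011",
    # Extra fields, invented for illustration:
    "dataset_url": "http://example.org/data/123",
    "commentary": ["http://example.org/review/456"],
}
print(json.dumps(record, indent=2))
```

Any tool that doesn't understand the extra keys can simply ignore them, so extensions don't break existing consumers.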

Now I show how anyone can create their own Bibserver and populate it.

I’m going to upload my publication list from the Symplectic system at Cambridge. It will probably have errors and infelicities. ALL bibliographic collections have errors and infelicities. It starts:

@article{murray-rust2011semanticview.,
author = "Murray-Rust, P",
journal = "J Cheminform",
month = "Oct",
pages = "48",
title = "Semantic science and its communication – a personal view.",
volume = "3",
year = "2011",
abstract = "ABSTRACT: The articles in this special issue represent the culmination of about 15 years working with the potential of the web to support chemical and related subjects. The selection of papers arises from a symposium held in January 2011 ('Visions of a Semantic Molecular Future') which gave me an opportunity to invite many people who shared the same vision. I have asked them to contribute their papers and most have been able to do so. They cover a wide range of content, approaches and styles and apart from the selection of the speakers (and hence the authors) I have not exercised any control over the content.",
doi = "10.1186/1758-2946-3-48",
eissn = "1758-2946",
issue = "1",
language = "ENG",
pii = "1758-2946-3-48",
day = "14",
}


This is in BibTeX and that’s all we need to know – Bibserver does the rest. So let’s go there (http://bibsoup.net/). We need the first box:

I have previously registered (it’s trivial), so now I have uploaded the file.
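Under the hood, the ingest step has to turn entries like the one above into structured records. Here is a minimal sketch of the idea – this is my illustration, not Bibserver's actual parser, and real BibTeX (brace-delimited values, nesting, escapes) needs much more care:

```python
import re

BIBTEX = '''@article{murray-rust2011semanticview.,
  author = "Murray-Rust, P",
  journal = "J Cheminform",
  title = "Semantic science and its communication - a personal view.",
  volume = "3",
  year = "2011",
}'''

def naive_bibtex_fields(entry):
    # Extract key = "value" pairs from one simple BibTeX entry.
    # Only handles the quoted style shown above; real BibTeX needs
    # a proper parser.
    return dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', entry))

fields = naive_bibtex_fields(BIBTEX)
print(fields["journal"])
```

Once the fields are a dictionary, serialising them as BibJSON (or anything else) is trivial, which is what makes the whole pipeline so quick.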

It’s not all my publications, but the most recent ones.

So now we can start browsing.

Yes, WE. If you go to http://bibsoup.net/petermr/peter_murray_rust_publication_list/ you can do everything I can do! Bibliography is truly Open!


It’s very easy. Let’s see the years:

[I have more publications – I think this concentrates on the last 10 years].

And who are my most prolific co-authors? Use the author button:

Let’s see who has contributed most to Chemical Markup Language (CML). “CML” is a good search term for a small collection like this (I haven’t published on Chronic myelogenous leukemia). We get:

19 publications.

Let’s see the authors:

(and 50+ more). Display them as a bubble diagram:

Which gives:

Showing that (of course) my CML-symbiote is Henry Rzepa.

Let’s see what CML papers I have published with Joe Townsend. Just click on the bubble and get:

Now let’s look at my journals – see if you can work out for yourself how I get:

So my favoured journal is J. Cheminform. which is an Open Access journal published by BiomedCentral. But look at its neighbours:

  • J. Chem Inf Model
  • J CHEM INF COM SCI
  • J Chem Inf Comput Sci

They are actually all the same journal!! That’s because it changed its name from JCICS to JCIM some years back. And because there is no consistency in abbreviations. So the idea that we can have one true global platonic bibliography is a myth – we shall never achieve it. We may get close in some areas but there will always be differences of opinion about “sameAs” and different practices.
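The difficulty is easy to demonstrate. In the sketch below (my own illustration), exact normalisation fails to unify the three variants because the abbreviations genuinely differ, and fuzzy matching only gives a similarity score whose threshold is an arbitrary judgment call – which is precisely the problem:

```python
from difflib import SequenceMatcher

def normalise(name):
    # Strip punctuation, case and spacing; crude and illustrative only
    return "".join(ch for ch in name.lower() if ch.isalnum())

variants = ["J. Chem Inf Model", "J CHEM INF COM SCI", "J Chem Inf Comput Sci"]
norms = [normalise(v) for v in variants]

# Exact normalisation does NOT unify them - the abbreviations differ:
print(len(set(norms)))  # 3 distinct strings

# Fuzzy similarity gets close, but where to set the threshold
# is a matter of opinion, not fact.
ratio = SequenceMatcher(None, norms[1], norms[2]).ratio()
print(round(ratio, 2))
```

So "sameAs" decisions end up being editorial judgments, and different bibliographies will make them differently.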

So try it out. At present it does BibTeX and hopefully very shortly it will do RIS/Endnote.

So what happens when we get thousands of bibliographies? Won’t we need a tool to search for bibliographies?

Yes, and I think we have some ideas :-)


Cameron Neylon at Semantic Physical Science; Software philosophy, why the RWA is wrong, and how we change the publishing market

#semphyssci

Cameron Neylon gave one of the first talks at Semantic Physical Science. No slides, just analysis and passion. Some of this has been published in his blog (http://cameronneylon.net/blog/ip-contributions-to-scientific-papers-by-publishers-an-open-letter-to-rep-maloney-and-issa/ )

Cameron, like me, knows that Semantic science depends on people as well as technology and so he dwelt a lot on culture and practices rather than details. The talk was split into two parts. I have timelined this (roughly, there are no subtitles/breaks).

Here is the link to the video: http://vimeo.com/35398123 (Thanks Adrian Pohl)

The first section makes the case that good modern science must adopt quality control of the sort regularly practised in good software groups (common in industry, but not academia). Cameron is launching a new journal (Open Research Computing) to allow this approach to be published and thereby give its practitioners formal value. Too much science has poor controls and poor aims and is not reusable; Cameron argues that we can learn these values from software engineering.

0:30 unit testing, criteria for judging science

1:30 continuous integration and test-driven design

2:00 good software practice helps to think about managing scientific process; better architectures for science in a broader term.

2:35 Creation of trained workforce, balance between training and research especially for graduates

3:30 good software is good mentality for good infrastructure for research

4:15 mustn’t create resources which aren’t used by anyone

4:30 computational experiments often don’t work

5:20 a unit test is just a control

5:50 publication and continuous integration

7:00 software can push quality into science

7:30 must be serious about replicating experiments

7:50 impact factor correlates with retractions

8:20 incentive not properly structured

9:00 new journal Open Research Computing

10:00 software methods papers

10:50 if 5% of papers are software get high impact
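Cameron's point at 5:20 — that a unit test is just a control — can be sketched in a few lines. This is my illustration, not code from the talk, and the analysis step is a hypothetical stand-in:

```python
def mean_intensity(values):
    """A stand-in for any computational step in an experiment."""
    return sum(values) / len(values)

def control_passes():
    """The 'control': run the step on input whose answer is known in advance.
    We trust the real run only if this passes - exactly how a wet-lab
    control experiment works."""
    return abs(mean_intensity([1.0, 2.0, 3.0]) - 2.0) < 1e-9
```

Run the control before every real analysis (continuous integration does this automatically on every change) and a whole class of silent errors disappears.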

The second half critiques the Research Works Act. Cameron has shown that publishers add NO VALUE to his last 10 papers. As his blog (above) says:

Results: The contribution of IP by publishers to the final submitted versions of these ten papers, after peer review had been completed, was zero. Zip. Nada. Zilch. Not one single word, line, or graphical element was contributed by the publisher or the editor acting as their agent. A small number of single words, or forms of expression, were found that were contributed by external peer reviewers. However as these peer reviewers do not sign over copyright to the publisher and are not paid this contribution cannot be considered work for hire and any copyright resides with the original reviewers.

 

11:25 Research Works Act would wipe out NIH mandate and set us back years

12:00 Argument is that publishers contribute to quality of science

12:30 what do publishers contribute to Cameron?

12:50 ZERO (Zip Nada Zilch)

13:30 Publishers do provide a service

13:50 publishing not zero-cost

14:10 Old paper model is obsolete; all costs in generating first copy

14:50 publishing model is really bad way to run business

15:10 must address service costs

15:40 Transitional period required; RWA turns clock back

16:15 how to build service market

16:45 Must turn publishing round and address service

17:10 Software is a service. How do we configure the market?

18:30 Software gives us experience and clues

BibSoup! A new OPEN approach to managing personal and group bibliographies

We have just finished 3 hectic days of “sprint” (design, coding, documentation, testing, deployment) on the JISC/OKF Openbiblio2 project in Cambridge. This is an Open international project, with major input from Jim Pitman in Berkeley, and offered to anyone interested in collaborating and benefitting. Before the word “bibliography” makes you switch off, don’t! EVERYONE needs bibliography. Here are some examples which show how universal bibliography is:

  • Your publication list
  • Your reading list
  • A list of your software and the software you use
  • A list of your datasets and the datasets you use
  • A catalogue of the books you possess

The overall concept is “BibSoup” – a novel approach (some would call it Web 3.0) based on complete Openness of code, content and most importantly attitude. It’s based on meritocracy rather than central control. YOU control your own bibliography – a pot of BibSoup. It can be as perfect or imperfect as you like – BibSoup doesn’t mind. You don’t have to have all the information for a book (other people do that). You don’t have to have the author’s full name. If you don’t understand the difference between works, manifestations and expressions don’t worry.

The basis of BibSoup is that you build your own bibliography for your own purposes using the BibSoup technology and software. You don’t have to understand it – it’s easy to use. It consists of a server (Bibserver) which is easy to clone and deploy to hold your data. Bibserver uses JSON (“Jason”) as a transfer format (BibJSON).
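What travels over the wire is just JSON, as simple or as rich as you like. Here is a minimal sketch of the kind of record involved — the field names follow common BibJSON usage but are my assumption rather than a definitive schema, and the values are placeholders:

```python
import json

# A hypothetical minimal record: BibJSON is deliberately permissive,
# so you can omit almost anything and add your own fields.
record = {
    "title": "An example paper",
    "author": [{"name": "A. Author"}, {"name": "B. Author"}],
    "journal": {"name": "J. Example"},
    "year": "2011",
}

# Serialise for transfer; any JSON-aware tool can read it straight back.
print(json.dumps(record, indent=2))
```

Because it is plain JSON, an incomplete record is still a valid record — which is the whole BibSoup attitude in one sentence.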

You could run Bibserver on your own laptop for your own purposes (e.g. browsing all those articles).

You could run Bibserver on your website to tell the world about your Open collection and to share it with others. This is the most novel feature of BibSoup. By sharing your collection you’ll find people who are also interested in the same things. Maybe you’ll find that your annotations are valuable to others and vice versa. Maybe you’ll want to set up a group where you pool your references. But none of this is mandatory.


 

As Mark MacGillivray puts it (http://openbiblio.net/2012/01/17/tuesday-17th-january-open-biblio-sprint-day-1/ )

we are not trying to re-do what is already available online, we are not getting into the detail of normalisation or disambiguation within a centralised database, and we are not intending to alter the academic culture overnight; however, we are going to improve the BibJSON facility for wider use, we are trying to determine how we can get more small groups and individuals involved, and we are identifying compelling, essential and simple reasons for people to support the project at this early stage before the ultimate global benefits can be realised.

To reiterate. We are NOT compiling the one true bibliographic collection and competing with Open Library, Mendeley, Microsoft Academic Search, Google Scholar, Symplectic and other semi-open/Free collections of bibliography. We are NOT competing with Zotero as a reference manager. We, and our adopters, will use these as valuable sources of bibliographic input. We ARE praising the virtues of completely open bibliographic collections such as the British National Bibliography. Our adopters may wish to use BibSoup as a way of cleaning up some collections of bibliography (references).


We believe that a completely Open ecology of bibliography will lead to communal contributions which rapidly enhance BibSoup (because Open projects belong to YOU). We want to encourage the creation of bibliographies (e.g. publication and reading lists) in a rapid and Open manner.

We’ve set up a series of resources:

Some bibliographies (just open them and browse – the “visualise” is fun if there are some high frequency components (author, journal) – facets in top-left corner).

And some videos (apologies for some truncation and quality – we are getting a better site soon)

Bibserver is EASY to use. Just login (http://bibsoup.net/account/login ) and upload your collection (Don’t try millions of records – suggest you contact us if you have more than 10,000). No software to install (although it’s all open and you can run it privately). Browse your collection anywhere on the web with any browser. And we are committed to making all these collections Easily Openly downloadable – there are no walled gardens (http://vimeo.com/34323486 ) in BibSoup.

We can ingest BibTeX, RIS and some other common formats at present. Parsers are easy to write, so join the project if yours isn’t there. BibJSON is as easy or complex as you want to make it. You can use the Open Openbiblio Bibserver or you can clone the code and run your own privately.
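To give a feel for why parsers are easy to start (if not to finish), here is a toy sketch. This is emphatically not BibSoup's parser: it handles one simple entry and ignores almost all of BibTeX's real grammar (nested braces, quoted values, @string, comments):

```python
import re

def parse_simple_bibtex(text):
    """Parse a single, very simple BibTeX entry into a flat dict.
    Handles only `field = {value}` pairs with no nested braces."""
    entry_type, key = re.match(r"@(\w+)\{([^,]+),", text.strip()).groups()
    fields = dict(re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", text))
    return {"type": entry_type, "id": key, **fields}

example = """@article{example2011,
  title = {An example paper},
  year = {2011}
}"""

print(parse_simple_bibtex(example))
```

Going from this toy to a robust parser is where the real work lies — which is exactly why contributing a parser for your format is a useful way to join the project.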

Open Update

I am snowed under with things that I have to do and want to do. I therefore cannot give as much attention to others, so I will comment briefly on them below.

My main focus now is on these areas (all of which require several blog posts)

So some others which I would normally devote whole posts to:

  • Congratulations to Alma Swan on her appointment as Director of Open Advocacy (https://groups.google.com/a/arl.org/group/sparc-oaforum/browse_thread/thread/9d1c030b3239bb08 ). I have had the privilege to work with Alma for several years and can testify to her single-minded commitment to Open Access. She has made a major contribution in adding unchallengeable metrics that show that Open Access increases the value of scholarship through, for example, increased citations. I quote Alma: “I’m delighted to be taking on this new role,” said Swan, “Policymakers are increasingly interested in hearing the arguments. Presenting the evidence-based case to them will help to bring about the policy developments we all want to see.”

    Alma also created a very beautiful calligraphic calendar of Open Access – something that still brings inspiration when I look at it.

  • Springer announces a change for its author-paid hybrid Open Access from CC-NC to CC-BY. https://groups.google.com/a/arl.org/group/sparc-oaforum/browse_thread/thread/24ef282b6ec3b963# I am VERY pleased by this and congratulate Springer. (I knew about this earlier when I wrote to Springer and agreed to abide by the embargo). This brings Springer’s hybrid model (Open Choice) into line with its Open Access offerings (all of which use CC-BY). Very few publishers use CC-BY, but Springer is taking the lead among the major publishers and there is every reason why the others should follow.

     

    CC-BY allows text-and-data mining and represents the full value that one can get from OA.

     

  • Thanks to Richard Poynder for his frequent and very objective blogging of the current state of Open Access – see http://poynder.blogspot.com/ with a very comprehensive daily list of those who have distanced themselves from the Research Works Act, H.R.3699.

Expect a daily post on Semantic Physical Science (it may depend on my weekly Vimeo quota) although Charlotte is also mounting them at http://www.sms.cam.ac.uk/ , the excellent Cambridge streaming video site. And expect slightly lower frequency for Open Biblio and Panton discussions.