Reclaim our Scientific Scholarship (Beyond the PDF)

I’m going to an important and exciting meeting in January which looks at new ways of scholarly communication (https://sites.google.com/site/beyondthepdf/). Unlike some of the communities I interact with, this one has already had a huge amount of discussion (160 messages). Many of these are concerned with the technicalities of formats. They are completely missing the point. (I am going to contribute very shortly and may well do it through this blog so as to gather more opinion.)

Here’s the REAL problem.

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of James Stroud [xtald00d@GMAIL.COM]
Sent: 17 November 2010 06:39
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] [RANT] Publication Data Formats

I was reading the PNAS author guidelines and I came across this gem:

Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be published in raw format and will not be edited or composed.

Did I read those last two file formats correctly? I have actually came across a dataset in supplementary information that was several dozen pages of PDF. It was effectively impossible to extract the data from this document. (I can dig it up if pressed, probably.) I had no idea that the authors may have been encouraged to submit their data like that.

Does a premiere scientific journal actually request data to be in PDF format?

I can think of dozens of other formats that would be more fitting. They are summarized here:

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

What is the scholarly equivalent to a torch and pitchfork march and how can we organize such a march to encourage journals to require proper serialization formats for datasets in supplementary info? [PMR’s emphasis]

James

The last sentence has it absolutely right. It’s not about formats. It’s about control of the scientific process by organizations outside our control. The very fabric of this mail shows our serfdom.

We do not own our scholarship. The Antaran Stellar Society runs the communication of scholarship for the personal gain of it and its officers. The Sirius Cybernetics Library Corporation has copyrighted the Library of the Galaxy cataloguing system. It also runs it for itself and officers. The motto of these organizations is:

  • Embrace
  • Control
  • Exterminate

The only way forward for scientific publishing is to reclaim it. That’s not easy when scientific societies have sold their journals to Whitehole publishing. Major societies have abandoned their role as stewards of scholarship and turned it to maximising income.

What can we do? Currently we use publishers :

  • To legitimise our reported work. But do we really need them for that any more?
  • To establish priority. But the web (public or private) carries date stamps.
  • To moderate and correct our work. Given the appalling state of journal copy-editing and the complete disinterest in data, this is on the way out.
  • To announce our work. But do we need publishers to do it? This blog reaches more people than the huge amount of effort I put into an invited paper for Serials Research (or something similar) that is behind a paywall and no-one reads. BTW it’s on Nature Precedings.
  • To get career, grant, institution brownie points. This seems to be a fact of modern life. But publishing PDFs is a stupid way to run it. My software is evaluated by how many people compile and run it, not by who reads the source code.
  • To preserve our work. Journals are not good at this. They destroy data-rich science because it increases their costs and anyway they haven’t a clue what data is. Librarians do understand this.

I except a few publishers from this and there are probably many more. The ones I am familiar with are society and community based, such as Int. Union of Cryst., Eur. Geoscience U, Am. Soc. Biol. Mol. Biol. But most publishers care about neither the author, the reader nor the scientific community.

So where are the pitchforks? Yes – we should and must revolt.

We can now run scientific publishing ourselves. That’s not such a difficult concept – after all it’s what we did when I started science and before the scholarly digital gold rush. And we can do it again.

In the Quixote project (http://quixote.wikispot.org/Front_Page) we are managing all our computational chemistry ourselves. Creation of the experiment, calculations, archival, analysis, and bundling for publication. We can write a paper ourselves without the barrier of a plutocratic publisher. We have enough e-presence that the world knows what we are doing and we can spread it. Google will index our work. It will index it BETTER than publishers because we know how to prepare it for machines. Why do we need Table of Contents of Galactic Science? We don’t – we can build crawlers and feeds ourselves. BETTER than anything out there.

So it comes down to a single requirement:

  • An independent body awarding merit for pieces of science. That’s hard. It seems to be necessary. The current commercial and pseudo-commercial publishers do it very badly because their main motivation is not to evaluate work but to brand it to sell journals.

     

Everything else we can do ourselves.

And we should.

 

UPDATE: Lots of discussion of all flavours on : https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1011&L=CCP4BB#88

Posted in Uncategorized | 3 Comments

What do these columns mean? (chemical help required)

 

In Quixote we are writing a semantic infrastructure for computational chemistry. That means we have to create precise and consistent annotation for components. The community “knows the semantics” so often they aren’t explicit. But the problem is that if the semantics aren’t clear, or if they could be misinterpreted, then it’s impossible to write programs that behave consistently. Here we are interpreting a logfile [the voluminous output that is designed to be read by humans and not machines] and I want to know what it means. So I’m asking for your help. Now before you all tell me to RTFM it’s not common to describe every part of a logfile in the manual so I haven’t gone looking.

So here’s a chunk I was marking up using an element of knowledge and intuition (== guesswork). I had a sudden worry I’d guessed wrong. So I’m asking readers to help me annotate this chunk (the file is about 100 times larger). In particular do we all understand the same things?

Task: Please annotate the columns in the two tables as fully as necessary (i.e. Write a description of what each means in sufficient detail that it could be used by someone in – say – the Blue Obelisk community but who wasn’t familiar with the precise code). Where there are precise terms agreed in the community please use those. [BTW I have no particular comment on the level of semantics in this code compared to others.]

      No.  Tag     Charge        X              Y              Z
      ---- ------- ---------- -------------- -------------- --------------
         1 c           6.0000     1.00000000     3.00000000     5.00000000
         2 f           9.0000     1.00000000     3.00000000     6.38300000
         3 h           1.0000     2.02800000     3.00000000     4.65000000
         4 h           1.0000     0.48600000     3.89000000     4.65000000
         5 h           1.0000     0.48600000     2.11000000     4.65000000

 

 

      Atomic Mass
      -----------

      c                 12.000000
      f                 18.998400
      h                  1.007825
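
To make the problem concrete, here is a minimal Python sketch of what a program has to do with the first table. It is only an illustration, not Quixote code: the names GeometryRow and parse_geometry are made up for this example, and the field names simply echo the column labels in the logfile. It deliberately assigns no units and no meaning to "Charge", "X", "Y" or "Z", since that is exactly the question being asked.

```python
from dataclasses import dataclass

# Data rows copied from the logfile chunk above. The column labels are kept
# verbatim; their meaning and units are NOT assumed anywhere in this sketch.
GEOMETRY_TEXT = """\
1 c 6.0000 1.00000000 3.00000000 5.00000000
2 f 9.0000 1.00000000 3.00000000 6.38300000
3 h 1.0000 2.02800000 3.00000000 4.65000000
4 h 1.0000 0.48600000 3.89000000 4.65000000
5 h 1.0000 0.48600000 2.11000000 4.65000000
"""


@dataclass
class GeometryRow:
    """One row of the table (hypothetical record type for illustration)."""
    no: int
    tag: str
    charge: float
    x: float
    y: float
    z: float


def parse_geometry(text: str) -> list[GeometryRow]:
    """Split each data row on whitespace; skip headers, rules and blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly six fields and starts with an integer index.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        no, tag, charge, x, y, z = fields
        rows.append(
            GeometryRow(int(no), tag, float(charge), float(x), float(y), float(z))
        )
    return rows


rows = parse_geometry(GEOMETRY_TEXT)
for row in rows:
    print(row)
```

Reading the numbers is the easy part. What the code cannot tell us is what the numbers mean, and that has to come from the annotation.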

 

 

UPDATE: I’ll give my own answer later (this is NOT trivial) but I’d prefer others to contribute because then *I* learn something

Posted in Uncategorized | Leave a comment

Carolina Conversations on a Cambridge Train

I’ve been so busy putting in grants and helping to build the Quixote system (which will revolutionize the management of compchem data) that I haven’t had time to blog. So here’s something from about two weeks ago.

I’m sitting on the 0920 from Cambridge (the first that oldies can get cheap fares) to London en route to the Imperial War Museum. (All will be revealed later, dear reader). Because I get there early I’m in my favourite seat (no, I shall not tell you why as I don’t want others knowing, though it’s less critical than it used to be). I’ve sat down when the person in front sees why I’m sitting there and remarks on it. I’ve got permission to report the conversation though I shan’t give complete names unless my companions wish to identify themselves on this blog.

P: Hi, I’m Peter.

J [apparently US accent though I’m awful on accents]: I’m J.

P: Are you visiting Cambridge? [A safe gambit – they might be living there, anything…]

J: Yes, we’re over here from North Carolina – this is D [introduces D sitting across the aisle.]

P: Hi. How long are you in Cambridge?

J: We’re here for six months – I’m a visiting fellow in [College].

P: Ah! What’s your subject?

J: I’m studying New Orleans [IIRC] in the 1920’s. Cambridge Library is a great place to study…

[Discussion of subject… then PMR starts traditional rant on copyright, etc.]

J: Yes, it can be very difficult to publish images. For example the [Museum of Art] will sell photos of its pictures for 15 USD. But you aren’t allowed to reproduce them without lots of effort and money.

P: So what do you do?

J: Most of the time we simply give up and don’t publish the pictures.

[P comments that this destroys useful scholarship. But readers of this blog know that already and you are spared the details. Conversation turns to D. We chat in similar vein. P explains he is a scientist/informatician]

P. But today I’m off to the Imperial War Museum. We are putting in a grant to catalogue War memorials [more on this later].

D: I’m very interested in War memorials. Yesterday I visited Madingley Memorial.

[from Wikipedia]

D: there are over 30,000 US servicemen commemorated in East Anglia. Most of them flew out of the airfields. There are lots of memorials. Whenever I’m over I try to visit them….

… but there isn’t a comprehensive map of them. I’d like to know where they are so I can make sure of visiting them.

P. Well it’s the UK National Inventory of War Memorials I’ve been talking with. I think they cover all War memorials on UK soil or in territorial waters. They’ve got over 60,000 in the catalogue but they reckon there are probably about 100,000 altogether. I’ll ask. [UPDATE: Yes, UKNIWM does catalogue US memorials.]

P. This has been so exciting. It’s wonderful to discover new people through shared interests. I’ll copy you into the staff at UKNIWM. And I’ll let you know how our grant proposal gets on. Maybe there is scope for future collaboration.

And you’ll learn more about UKNIWM and our grant proposal in future posts.

 

Posted in Uncategorized | 2 Comments

Positive and negative at RLUK

#rluk10 #quixote

It’s been a great day at RLUK. Lots of progress towards Openness. We discovered millions of open bibliographic records. A great day for the Edinburgh node of the OKF. Lots of contact with the University, the National Library of Scotland. Deep nostalgia for my many years in this country. A better legal system. A more united view. The Enlightenment looks down on us and encourages us to greater endeavour. The colonies look to their Scottish past.

And then we got to talking about publishers. Yes, they’re in it for the money. Some of them care about the community. But increasingly the scholarly societies are defecting, either by selling their society journals to publishers who couldn’t care less about the discipline or by trying to emulate the commercial publishers. Result – a void in the support of scholarship in many disciplines. Where are the community standards for acceptable scientific practice? They cannot come from commercial publishers – they can only come from societies – national or international. The best of these are great. The International Union of Crystallography is a paragon – creating ontologies, recording methodologies – caring about the discipline. The European Geosciences Union (EGU) is another – with its commitment to Open Access, its continual process of discussion. The ASBMB – which cares for standards in reporting bioscience.

But there are huge vacuums…

And that’s why a growing group of us are defining the practice and reporting of computational chemistry. It’s been abandoned by the scientists many of whom have sold their souls to the god of commerce. No-one at the top cares about interoperability, quality. It’s the easiest scientific discipline I know of to provide standards and ontologies. That’s because God created Quantum Mechanics (in a humorous off day) and Schroedinger showed us how to compute God’s creation.

It’s been waiting for 30 years for an ontology. And Quixote is creating it.

That’s my relaxation for the train journey back tomorrow.

Train journeys are fun.

Posted in Uncategorized | Leave a comment

Versita/Springer – please edit our commercial journals for free so we can sell them to you

#rluk10

One of our graduate students received a request from VERSITA – a Springer Journal – to become a “Language Editor”. S/he came to me and asked my opinion. I didn’t know what this involved so I went to the masthead and found:

 

http://versita.com/career/editorial_positions/mathematics/language_editors/

Central European Journal of Mathematics invites applications for position of the LANGUAGE EDITOR.

Central European Journal of Mathematics provides editorial assistance to authors from non-English speaking regions.

We are looking for volunteers to support us with language editing. Graduate students in mathematics are especially welcome.
 

LANGUAGE EDITOR PROFILE

The task of our language editors is to perform linguistical corrections on up to six manuscripts per year. Specifically, we would expect you to:

– correct inadvertent errors, mistakes in grammar, spelling, word order and punctuation
– improve the style of English, polish the fluency, provide internal language consistency throughout the entire paper
– clarify where there is room for misinterpretation

OUR IDEAL CANDIDATE

– is a mathematician
– knows TeX/LaTeX
– English is his/her mother tongue
– has constant/easy access to Internet

WHAT WE OFFER IN RETURN

– your name added to the list of the Journal’s Editors at our website and published in every journal issue – being attached as an editor to an international journal raises the attractiveness of your CV to potential employers
– references and recommendation letters 

 
 

CONTACT

Candidates interested in the position are requested to send their cover letter and CV to the Managing Editor, Tatiana Sworowska (tsworowska@versita.com), with the subject line “CEJM Language Editor”. 

My interpretation is this. “We are a commercial company who wishes to decrease our costs/increase our profits by getting rid of traditional copy editors (whose job is to improve the language and style of papers). We have outsourced almost all our production to companies who are not in a position to do copy-editing. We are therefore trying to get graduates to do this job for free so we can maximise our profits. By the way if you wish to publish in some Open Access Springer journals it can cost thousands of pounds.”

I would like to feel that publishers are part of the value added to scholarship. However it is becoming clear day by day that many (not all, but many) publishers are only in it for money and their primary effect is to restrict and cripple the publication process.

I am at a RLUK meeting where some of the keynotes have addressed the impossibility of continuing with the current publication process. I hope that people actually DO something instead of talking. “Reclaim our scholarship”.

 

UPDATE:

Useful clarification on figures (see comments). My use of “thousands of pounds” was correct:

John Mark Ockerbloom says:

November 11, 2010 at 1:03 am

I had a look at Springer’s online price list for the Central European Journal of Mathematics. This isn’t a cheap journal. The price of an institutional subscription is $1401 US for their basic annual access, or $1681 US for “enhanced” access. Plus $42 for “shipping and handling”.

Or you can buy one article at a time; this seems to cost $34 US for an article I checked.

PMR: Many journals do not allow purchase of the article; they RENT them for 2 days.

Mind you, authors can also choose to have their articles provided free by the journal, via Springer’s “Open Choice” program. The cost? $3000 US.

 

So, not only do they get their articles for free (or for $3000), and get their peer reviews for free, they also want to have the articles edited for free. Exactly what are we readers and libraries paying our $34/article or $1700/year for?

Posted in Uncategorized | 6 Comments

RLUK and the Democratization of Knowledge

#rluk10 @RL_UK #jiscopenbib

I’m talking later this week on the Democratization of knowledge to the RLUK conference. RLUK is the professional body for Research Libraries (i.e. mainly University Libraries) in the UK.

I don’t yet know what I shall talk about. I had hoped that we could generate a bottom-up activity in the domain of libraries which would excite people about the new possibilities and help to grow new activities. I had thought that the Open release of the BL’s catalogue data would excite librarians and give rise to community activities, but I can’t find interest by blogging and tweeting. I’d hoped we could arrange a mini-bookathon in 30 minutes using this as a focus. I wanted at least 15 minutes of the session as constructive but tough discussion.

I am told that I upset people. Brian Kelly describes me as a “critical friend” – someone who stands outside and tries to help by making suggestions and analyses. I’ve been trying to do this for at least 5 years – the only positive sign that it’s useful is that I occasionally get invited back. But maybe it’s time to move on.

So you can relax. I am not going to say anything that can be seen as critical of research libraries. I am going to paint a picture of the present and future that is active and exciting in other fields. I will leave space at the end to see if the participants want to pick up on ideas and translate them to the library domain. But I shan’t attempt to do it myself other than in the area of bibliography and scientific research where I actually know a little.

My current theme is simple:

We are engaged in a struggle between the freedom of knowledge and its centralised control by commercial and political organizations/companies.

The tools available to the forces of undemocratization are many. Here are a few:

  • Paid Lobbying.
  • Acquisition or control of the means of electronic distribution
  • Control of the creation of electronic content
  • Changing society’s thinking through universal dissemination (media, books, etc.)
  • Digital Rights Management
  • Renting knowledge (e.g. journal articles, books, etc.)
  • Lawyers

Of course not all of these are universally bad – universal dissemination can be a major force for good or bad. But some have very few positive sides.

The tools that are available to the democratization of knowledge are:

  • Constant re-affirmation of fundamental rights
  • Immediate and universal announcement and dissemination of unacceptable practices
  • Cross-fertilization between different groups leading to an n-squared enhancement of power
  • The ability to reach and hear from (almost) anyone on the planet
  • The ability to create bottom-up democratic content
  • Virtual communities

I don’t know which is going to triumph and whether it will be a jigsaw of good and bad. On even days of the week I think that web democracy is winning. On odd days that we are being swamped by forces of control.

What is clear is that if we stand and observe we shall not get a second chance.

Here are some examples of bottom-up democracy. One of these areas I have helped to spark off.

  • MySociety. A group which creates tools through which ordinary people have a democratic voice. I have used its What Do They Know to ask why and when the BL introduced DRM on ILLs.
  • OpenStreetMap. Where 250,000+ people worldwide have taken democratic control of the production of maps
  • Zooniverse. Where again hundreds of thousands of “ordinary people” are carrying out real, top-quality science, annotation, cataloguing, etc.
  • The BlueObelisk and the Quixote project on QC databases. Bottom-up, zero-cash (but not zero-cost) communities which are inexorably bringing openness to chemistry because they are Free gratis, Free Open and better than commercial offerings.

In Quixote we have the potential to create an infrastructure which provides enhanced (possibly even standalone) teaching and learning for computational chemistry. It’s less than 2 months old and already we are getting volunteers who want to use it as a new way of publishing scientific research in this area, and also a different approach to “teaching”. Possibly one where the students themselves run the course.

My question to RLUK before the meeting is:

“What aspects of the democratization of knowledge are you interested in? Which of these are you interested in putting effort into?”

According to the answers I get I’ll try to arrange the session so there is a real chance for bottom-up discussion at the end.

If I get no response I will leave the elephant unmentioned.

But it won’t have gone away.

Posted in Uncategorized | 4 Comments

Quixote as a publishing tool/process for compchem – homework

#jiscxyz #quixote

Every day brings more interest and excitement in Quixote. We are getting groups who are interested in using it (a) for education – managing student experiments and projects and creating Open educational resources and (b) publishing. Here I talk about (b).

Many disciplines require publication of supplemental data. Computational chemistry is somewhat half-hearted about this but there is a reasonable amount published in the Society journals. I’m talking with Quixote members about how the system can help the publication process. In principle Quixote can do much of the local management of the input and output files of compchem. That will be supplemented by the EmMa-Chem# (“chem-pound”) repository system that we (Sam) are developing in JISC-CLARION. So Quixote will be able to use this RSN. The challenge comes when the material is published.

Some publishers like repositories such as PDB, Genbank, Tranche, DRYAD, etc. But there is nothing in compchem (that’s the reason for starting Quixote, after all). The publishers require PDFs. It makes them feel happy. It has several drawbacks:

  • It takes quite a lot of work to create the PDFs. That has to be done by unpaid slaves (graduate students)
  • It introduces errors, which corrupt the data.
  • It makes the result unusable and therefore uncheckable.

So here is your homework. I asked one of the collaborators to send me 6 DOIs with supporting info to get some idea of what we would have to create. This is the first one – I expect the others to be of similar/worse quality. Use the URL http://pubs.acs.org/doi/suppl/10.1021/ol1002384/suppl_file/ol1002384_si_001.pdf (it’s freely visible and I claim it’s Openly reusable without permission).

  • What paper does it relate to? If you saved this file on your hard disk would you be able to answer the question in 6 months? How?
  • What are molecules 2a, 2b, 2c, 2d? How would you find out?
  • What compchem program was used to create the data? How did you find out?
  • Which table has corrupted numeric information during the cut and paste so seriously that it requires careful hand-editing to recreate the correct version?
  • Which scientific units have been corrupted by the cut-and-paste?
  • Which numeric/scientific quantities can only be extracted by retyping them?
  • How long do you think it took the authors to create this document? We will attempt to be considerably quicker in Quixote as well as more accurate
  • Would it be possible to re-run the calculation from the material present? If so how long would it take you to prepare the jobs?
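
For the first question, one cheap habit is to save a small metadata record next to the file at download time. The Python sketch below is only an illustration under stated assumptions: the function name sidecar_record is hypothetical, the retrieval date is a placeholder, and the regular expression assumes that the publisher's URL embeds the DOI after "/doi/suppl/" (true of the URL above, not guaranteed in general).

```python
import json
import re
from datetime import date

# Assumption: the supplementary-file URL contains the article DOI
# immediately after "/doi/suppl/". This matches the URL quoted above.
DOI_IN_URL = re.compile(r"/doi/suppl/(10\.\d{4,9}/[^/]+)/")


def sidecar_record(url: str, retrieved: date) -> dict:
    """Build a small provenance record to store alongside a downloaded file."""
    match = DOI_IN_URL.search(url)
    return {
        "source_url": url,
        # None means the DOI could not be recovered from the URL.
        "doi": match.group(1) if match else None,
        "file_name": url.rsplit("/", 1)[-1],
        "retrieved": retrieved.isoformat(),
    }


url = (
    "http://pubs.acs.org/doi/suppl/10.1021/ol1002384/"
    "suppl_file/ol1002384_si_001.pdf"
)
# Placeholder retrieval date, for illustration only.
record = sidecar_record(url, date(2010, 11, 1))
print(json.dumps(record, indent=2))
```

This only answers "what paper does it relate to?". None of the other questions can be rescued by metadata, because the damage is inside the PDF.
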
Posted in Uncategorized | Leave a comment

Dudley Williams

Departmental announcement:

It is with great personal regret that [we] have to tell you that Professor Dudley Williams FRS died early today in Arthur Rank House after a short illness.

In the early 1960s, as a postdoc with Carl Djerassi at Stanford, Dudley was a pioneer in applying NMR and mass spectrometry in organic chemistry.  He joined the academic staff in Cambridge in 1964 and stayed here for the rest of his career, retiring in 2004.

During his 40 years here, Dudley developed techniques for understanding molecular structures and interactions in chemistry and biology; he elucidated the metabolism of Vitamin D, leading to life-saving therapies for individuals with kidney failure; he determined the structures and mode of action of Vancomycin and related antibiotics, making a huge contribution to fighting drug-resistant bacteria; and he contributed to our understanding of how molecules recognise each other.  His legacy lives on not only through this research but also through the students and postdocs who worked with him and through their students and postdocs.

Details of the funeral and of a memorial event celebrating Dudley’s love of people, science and music will be announced later.

Dudley was not only a colleague in the Department but a fellow of Churchill. He was a great person to have known.

Posted in Uncategorized | Leave a comment

BHT and TTT

#jiscopenbib #jiscxyz

I have a three-dimensional vector of my state – the axes are Busy/Happy/Tired. They are somewhat but not completely orthogonal. So today is 9/11/10 (I mark out of 10). If the B goes above 10 then we call the fire-brigade to stop the smoke. Blogging normally requires B<9. So a brief update.

The next few months will be momentous – we really are changing the world, but we are also seeing the forces that represent the digital gold-rush as strong as ever. That’s why the OKF is critical. The emergence of crowdsourcing/citizen activities is so heartening.

Quixote is zooming along. We are going to be getting students soon. Students are our hope for the future. Universities are too conservative and we need students to generate new ways of doing things. Why do we need lectures in QM calculations? It’s an outdated approach. Of course we need inspired interpreters but they don’t have to lecture.

Bibliography. Ben O’Steen met with Ben White and Neil Wilson of the BL. I also joined Neil. Neil can see the opportunities for the BL Open Bibliography and there are so many things we can do. We talked about involving citizens – there’s so much talent outside the academic system.

Then off to the Imperial War Museum (yes, you heard that right). We’re putting together a proposal for creating an electronic volunteer community. Their volunteers who catalogue and map War memorials are based on paper and they really want to go electronic and semantic. And that’s exactly where the OKF can help (in case you don’t realise it much of the OKF strength is based on making things happen with software). I call it Liberation Software. So if anyone is interested in metadata for heritage sites, etc. get in touch.

CML. Quixote has given me so much energy that I’m making good progress with CML for compchem. We’ve been partway down that road already but there is still a lot of code to write and that’s what does for the B-vector. But the parser/semantic/dictionary framework is now becoming robust.

And we get encouragement from strange places. Henry Rzepa mailed today about SVG. In 1998 I thought SVG would sweep the world. 12 years later it’s in excellent undramatic health: http://www.codedread.com/svg-support.php. So it’s a great pointer for CML. And who knows, even chemical/MIME might become honest.

As Piet Hein said (Quoted with gratitude if not permission from Wikiquotes)

Put up in a place
where it’s easy to see
the cryptic admonishment

T.T.T.

When you feel how depressingly
slowly you climb,
it’s well to remember that
Things Take Time.

 


Posted in Uncategorized | Leave a comment

Beyond the PDF. Can you read this? Can your computer?

#jiscxyz

I am delighted to have been invited to a very important meeting in January that Phil Bourne and Anita de Waard are organizing: https://sites.google.com/site/beyondthepdf/ . Why do we want to go Beyond the PDF? Isn’t the PDF the epitome of human creativity and aesthetics? Doesn’t it produce the most elegant means of communication the world has ever seen?

Here’s some chemical information. Sorry, a Chemical Hamburger. No, it’s actually a chemical cow-pat. A machine wrote it and it was sparkling clear.

  • How did it get to this state?
  • Can you read it? Would you bet your life on your interpretation?
  • Can your machine read it? I’d be interested how well OCR does.

Posted in Uncategorized | Leave a comment