Presentation to Open Scholarship 2006

I am presenting this “talk” from the Web and including parts of my blog. This means I have to decide what I think I am going to say before I do or don’t say it. You know by now what I think of PDF and Powerpoint. This talk is in HTML and can be trivially XMLised robotically. It should be preservable indefinitely.
==== what I intend to cover ====
Data as well as text is now ESSENTIAL – we should stop using “full-text” as the goal, because discarding the data is dangerously destructive in science and PDF is an extremely effective way of discarding it. We need compound documents (Henry Rzepa and I have coined the term datument).
Need automated, instant, access to and re-use of millions of published digital objects. The Harnad model of self-archiving on individual web pages with copyright retained by publishers is useless for modern robotic science.
Much scientific progress is made from the experiments of others by making connections, simulations, re-interpretation. We need semantic authoring. Librarians must support the complete publication process.
Problems:

  • apathy and lack of vision – scientists (especially chemists) need demonstrators before people take us seriously
  • restrictive or FUDdy IPR. Enormously destructive of time and effort
  • emphasis on visual rendering rather than semantic content. Insidiously dangerous
  • broken economic model (anticommons)

Successes:

Other initiatives:

  • SPARC – Open Data mailing list

What must be done

  1. DEVELOP TOOLS FOR AUTHORING, VERSIONING AND DISSEMINATING DATUMENTS. THESE MUST BE IN XML.
  2. INSIST THAT ALL AUTHORS’ WORKS ARE THEIR COPYRIGHT AND RE-USABLE UNDER COMMONS-LIKE LICENSE (from menu)
  3. INTRODUCE NEW APPROACHES TO PEER-REVIEW OF COMPLETE WORKS (WITH/WITHOUT “TEXT”). INCLUDE YOUNG PEOPLE AND SOCIAL COMPUTING
  4. DEVELOP AND USE LOOSELY-CONTROLLED DOMAIN-SPECIFIC VOCABULARIES (cf. microformats).
  5. PAY PUBLISHERS FOR WHAT ADDED VALUE THEY PROVIDE, NOT WHAT VALUE THEY CONTROL. CREATE A MARKET WHERE PUBLISHERS HAVE TO COMPETE WITH OTHER WAYS OF SOLVING THE PROBLEM (Google, folksonomies, etc.)

======= Previous posts and related blogs ======
Open Data – the time has come
Open Source, Open Data and the Science Commons
Is “peer-review” holding back innovation?
Beginnings
Blogging and the chemical semantic web
the blogs

My data or our data?
Science Commons
Science Anticommons
Hamburger House of Horrors
Horrible GIFS
Hamburgers and Cows – the cognitive style of PDF
Thanks (and XML value chain)
The cost of decaying Scientific data
OSCAR – the chemical data reviewer
Linus’ Law and community peer-review
============= Live demos =========
Taverna
OSCAR1 (applet version)
OSCAR3 (local demo)
Crystallography (not yet released)
MACiE
GoogleInChI
DSpace (individual molecule)
chemstyle (needs MSIE)

===== what I actually said ====
Many thanks to William for recording all the talks and I am delighted to have this record made available. (I have not yet discussed copyright but I hope it can go in our repository 🙂)

Posted in open issues | 2 Comments

Is "peer-review" holding back innovation?

As part of my talk at Open Scholarship I’m going to show two pieces of scholarly work of which I am proud, which I believe fit all the criteria of publication and for which I get no formal credit. (I also regard this blog as a scholarly work, and also get no credit)…
The first is an invited talk at Google. (Yes, I can claim some minor formal credit for an invited talk, but probably not for one given to a company!) This was videoed and has received 1727 downloads and 12 5-star ratings. (Of course some of this may be done by robots or my friends, and probably some of them only watch the first few minutes, but there must be some serious viewers). It has everything a scientific publication requires:

  • accessible
  • peer-reviewable
  • formal record
  • re-usable
  • archivable (When I have time I’ll put it in our DSpace…)

The second is our WorldWideMolecularMatrix (WWMM). This is an evolving system for open access to the world’s molecules and properties, and as part of it we have put 175,000 objects in the Cambridge DSpace. But it has never been formally published in a full paper. That’s partly because it’s not finished and partly because everyone can see it. Why publish it?
But it has been peer-reviewed! Someone – I have no idea who – started a Wikipedia entry. I’m naturally proud of this. The entry quotes extensively from the talk I gave at OAI4 in 2005 at CERN (“CERN Workshop on Innovations in Scholarly Communication (OAI4)”) (Video). Joanne Yeomans recorded this talk as a video and it has – I gather – been regularly accessed. Again it has most of the features of a publication – but I can’t get any formal credit for it.
So to the current UK Research Assessment Exercise (RAE) – its requirement of 4 citable papers in peer-reviewed journals does not allow for this type of innovation in scholarly publishing. Should I abandon the new approaches and concentrate on paper? It’s what the management would like…

Posted in open issues | 7 Comments

Open Scholarship 2006 – 2

My colleague and DSpace superguru Jim Downing has also blogged parts of the meeting:
These are some impressions of the Open Scholarship meeting so far… Some are notes, so it may be a bit jerky in places. I shan’t blog all talks.
IRs have made massive progress in the last year. Hundreds, even thousands, of institutions now have them. There are commercial technology offerings and commercial hosting services.
Stephen Pinfield (Nottingham) reviewed progress – 250 repositories (2004), 790+ (2006). 12 million records worldwide. Self-archiving has become common and recently – catalysed by the Wellcome Trust – journals have moved towards hybrid publishing. He emphasised the bit-by-bit nature of progress: “We overestimate the importance of short-term change, and underestimate the significance of long term change” (after John Kay). Even publishers are starting to take OA axioms on board. Challenges:
* Cultural change – the biggest problem. The “awareness” problem is being solved. But lack of incentives for *individuals* – they accept the idea intellectually, but…
* Practical support – still not easy enough. Must be drag and drop, self-archiving by proxy
* IR and institutional strategy – IR must be part of institutional policy – so IR managers must engage with *whole research process*, not just dissemination. Promote the institution, liaise with industry…
* discipline differences. Ginsparg believes all will converge on repository model, but others believe we have to have different models for different disciplines (I believe this – PeterMR). Early adoption happened in specific domains.
* Is self-archiving publication? Publication is now becoming a process, not an event (I shall show this in my presentation – PeterMR).
* versioning. “version of record”? “self-published”
* quality control. Current IRs are quality neutral – but quality flagging is essential. Not homogeneous within single IR.
* Metadata – cannot be worldwide agreement, but need standards and coordination
* standards – OA standards are community owned so still fluid
* digital preservation – which versions? Is institution responsible for preservation, or national agency
* IPR – who owns copyright is not clear. Institution? Author? we are still ducking the questions
* business models. costing and funding?
Don’t yet have enough examples of good service providers.
Open access is NOT just about access – it is about USE. (Dear to my heart – PeterMR)
Institutional vs Subject? Shouldn’t matter, but until we get better services it does. (Agreed – PeterMR. I need to know where to look for thousands of articles in a subject)
Directions…
* OA but otherwise limited change (Harnad model). No reason for anything to change
* Hybrid business model – income from input (publication charge)… cf Wellcome
* Deconstruct the journal. Quality control does not have to be done by publisher
* Overlay – virtual journals draw from IR. Maybe quality at time of assembly
* multi-layered process – screen – IR – submit to peer-review – then mounted – dialogue etc. Citation could determine the course of future research. Demise of the journal article?
* fluid communication model (this is me – I shall show it in my talk – PeterMR)
Bill Hubbard
(OpenDOAR – 797+ repositories).
Quality assessment of repositories – does it have data? is it OA? broken links? metadata-only sources?
Two-thirds have no metadata policy or harvesting policy; some forbid robot harvesting. Most don’t allow commercial re-use of metadata. We need clear policies, and OpenDOAR hopes to have machine-readable policies in a few months.
Authors must find what they want in repositories.
A lot of repositories are run on marginal costs – not easy to get strategic funding. Learned societies had the opportunity to create subject repositories but have failed to respond.

Posted in open issues | 1 Comment

Open Scholarship 2006 – 1

I’m at the University of Glasgow – in the splendid castellated Hunter Halls – for the European meeting on Open Scholarship. There are over 200 delegates – a mixture of librarians, information technologists, research funders, etc. Hardly any publishers – BioMed Central (which also manages repositories) being an exception. The theme is “New Challenges for Open Access Repositories”. I’ll try to blog highlights.
Having worked for many years in a Scottish University (Stirling) I’m delighted to highlight the great progress and national coherence in Scottish Open Access. This was emphasised in the Opening Keynote by Derek Law from Strathclyde University. (Posts from this meeting may be a bit jerky as I am taking notes as we go)…

Scotland – “The best small country in the world”. Small countries can aspire to national solutions. Scotland has a history of declarations of freedom (Arbroath 1320) and is disproportionally strong in research (12.5% on UK metrics vs 8% of population).
Why is Scottish government interested in Open Access? Scottish education is venerated and OA is seen as providing: wider access, better value, quality measures. And there is no Dept. Trade and Industry in Scotland (which in England/UK is heavily lobbied by publishers and slows down OA). So, IRs with the right metadata will create a quality resource to market Scottish Resources. Even 2 cabinet members understand what “metadata” means. Sharing resonates with government.
Scottish Science Information Strategy – Open Access thread has flourished (SLIC – Scottish Library Inf. Council). 2004 declaration of Open Access stresses
…also exposing Scottish research to the rest of the world. “Publicly funded work must be publicly accessible”.
Use the Research Assessment Exercise (RAE) as a tool for mandating deposit. Glasgow has nearly 3000 entries in its IR. Scottish IRI-S has 3 out of 10 of the top UK repositories. There will be a Cream Of Science project (cf. the Dutch one).
The publishers of the future will be a new generation and only the bravest of the current ones will survive.
and ended with a modified Declaration of Arbroath…
“for so long as 100 of us are left alive we will yield in no way to Elsevier domination”

However Stirling was where I made the biggest mistake of my scientific life – I first signed a form transferring the copyright of my work to a publisher (I think Acta Crystallographica). Why, in the early 1970s, did no-one in the academic sector foresee the problems? A simple refusal by universities to hand over copyright would have forestalled the commercial publishing industry with its ownership and, worse, its power to direct scholarship. Why were librarians, senior editors and principals silent? Can we be sure that our continued inability to control our own scholarship is not leading us into an even worse future?

Posted in open issues | 1 Comment

What are the advantages of XML and why should I care? (text)

This is an attempt to explain why XML is important in a scientific context. I shall try to assemble as many reasons as possible, but there are also many other tutorials and overviews.
I believe that XML is a fundamental advance in our representation of knowledge. It’s not the first time this has been attempted – for example you can do anything in LISP that you can do in XML and a good deal more. But XML has caught on and is now found in every modern machine on the planet.
Let’s start with a typical piece of information:

Pavel Fiedler, Stanislav Böhm, Jiří Kulhánek and Otto Exner, Org. Biomol. Chem., 2006, 4, 2003

How do we interpret what this means? We guess that there are 4 authors (although it is not unknown for people to have “and” in their names), that the italic string is the abbreviation of a journal, that 4 is a journal number. But what are “2006” and “2003”? Unless you know that the first number is the year and the third the starting page (see RSC site) you have to guess. And many of you would guess wrong.
If, however, this is created as:

<author>Pavel Fiedler</author>
<author>Stanislav Böhm</author>
<author>Jiří Kulhánek</author>
<author>Otto Exner</author>
<journal>Org. Biomol. Chem.</journal>
<year>2006</year>
<volume>4</volume>
<page>2003</page>

you can see that each piece of information is clearly defined. There is no reliance on position, formatting or other elements of style to denote what something means.
But isn’t this harder to create and read? If everything is done by a human, perhaps. But almost all XML documents are authored by machines, either from editors or as the output of a program. And the good news is that the style – the italics, etc. – can be automatically added. XSLT allows very flexible and precise addition of style information through stylesheets.
So it won’t surprise you that publishers actually create their content in XML. When you submit a Word or LaTeX document it gets converted into XML – either by software (which isn’t always perfect) or by retyping :-(. The final formatting – either as PDF or HTML – can be done automatically by applying different stylesheets. So the document process is:
XML + PDFstylesheet -> PDF
XML + HTMLStylesheet -> HTML
The stylesheets don’t depend on the actual document being processed and work for any instance. Of course it takes some work and care to create them, but most of you don’t need to worry.
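To make this concrete, here is a minimal sketch of such a stylesheet – not a production example, and it assumes the reference elements above are wrapped in a single <reference> root element:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render one bibliographic reference as an HTML paragraph -->
  <xsl:template match="/reference">
    <p>
      <!-- list the authors, separated by commas -->
      <xsl:for-each select="author">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
      <xsl:text>, </xsl:text>
      <!-- the journal abbreviation is italicised only at rendering time -->
      <i><xsl:value-of select="journal"/></i>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="volume"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="page"/>
    </p>
  </xsl:template>
</xsl:stylesheet>

Swap this stylesheet for a PDF-oriented one and the same XML gives a different rendering; the content itself never changes.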
So for anyone working with documents, XML allows the content to be stored independently of the style. That’s a great advantage also when it comes to preservation and archival. Because XML is standard, Open, ASCII, etc. it doesn’t suffer from information loss when it is moved from one machine to another (how many of you have lost newline characters when going from Windows to Mac to Zip to Word, etc.?) It’s possible to calculate a digital checksum for a canonical XML document so any corruption can be immediately spotted.
There are a number of other aspects. Notice that the second and third authors have diacritic marks in their names. XML supports a very wide range of encodings and character sets so is an international specification.
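For instance, the character encoding is declared explicitly at the top of the document, so the accented characters survive being moved between systems (a one-line illustration, not part of the original reference example):

<?xml version="1.0" encoding="UTF-8"?>
<author>Stanislav Böhm</author>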
In later posts I’ll show the power of XML for validation, how software can be applied and how data can be structured. Please feel free to add comments or questions.

Posted in XML | Leave a comment

What are the advantages of XML and why should I care? (0)

As I have blogged before we are looking at ways of improving the information infrastructure in our Centre. We’re all very conscious of how little we know – I know I know very little and I’m quite prepared to admit it in public. Ignorance per se is not a crime – only wilful ignorance. As part of the process we created some self-help groups and the first feedback is that they would like a set of FAQs for a wide variety of questions. Remembering that this is a group of 40+ molecular informatics scientists, I’ll post some of the questions on an occasional basis. Because others can contribute to this blog maybe we’ll build some communal FAQs…
So I cannot resist “What are the advantages of XML and why should I care?”. I’ve invested several years of my life in developing XML, and layering Chemical Markup Language (CML) on top of it. So it’s very dear to my heart. This post won’t answer the general question directly so there will be more.
I got introduced to Markup Languages after WWW2. At WWW1 (1994) it was clear that HTML had succeeded very well with text and graphics but that more formality was required for other science disciplines. Recall that the early web was about science, not commerce, and although TimBL saw the commercial potential, it was low key at that time. So Henry Rzepa went off to WWW2 and came back saying that people were talking about “something called SGML”. It was also clear that CERN (where TBL developed HTML) was strong on SGML and that it could support complex documents. I had been struggling for several years with the need to formalize chemistry into a component-based system, and with SGML this seemed possible. You could create your own tags for whatever you liked as long as you defined it formally (with a DTD).
So I created some sample CML documents with my own tags. That was the easy bit. The DTD (which defined the language) was harder but possible. The real difficulty was actually doing anything useful with SGML. You could read it and… agree it was correct… and send it to other people… but it didn’t do anything. Why would chemists use it? At that stage they wouldn’t….
The users of SGML were somewhat esoteric groups. A typical example was the Text Encoding Initiative (a project to encode the world’s literature in SGML). At the other end were creators of aircraft maintenance manuals. (Although there were hints that SGML could be used for anything, it was primarily used for text.) The good news was that almost all major publishers of scientific articles used SGML in the production process.
I soon realised that to do anything useful – especially for chemistry – required procedural code. And there was very little. Some of it was extremely expensive – one company wanted $250K (sic) for a site license. The main clients were technical publishers – e.g. in aerospace. So I started to write my own system without any idea what I had got into. I found myself having to refer to “parents and children” of parts of documents – this seemed very strange to me at the time. I was extremely grateful to Joe English who developed a system called CoST and gave me huge amounts of virtual help. Joe, you were very patient – hope all is well! There were a few pioneers of Open Source like Joe and IMO they saved the day for SGML and paved the way for XML. Top of the list is James Clark – whom I’ve never physically met – but who has underpinned much of XML with his code and ideas. His nsgmls system was the only code that had the power I required and which could transform the (potentially incredibly complex) SGML documents into something tractable.
So by 1995 I had a system which could represent chemistry in SGML and process it with a mixture of tcl/CoST and nsgmls. It had fairly advanced graphics (in tk) and could even do document analysis of sorts. At that stage (another story) I was converted to Java and effectively wrote a complete system for CML/SGML in Java. This had a simple DOM, a menuing system and a tree widget (in AWT!) and could hold a complete chemical document.
Then, in 1996, Henry pointed me at a small activity on the W3C pages called XML. (Actually Henry and I had already used “XML” as part of CML, but we surrendered the term). I got myself onto the working group and was therefore one of about 100 people who contributed to the development of XML.
When XML was first started it was “SGML on the Web”. It wasn’t expected to be important and it wasn’t even on the front page of the W3C. As SGML was seen as complex and limited, XML wasn’t really expected to flourish.
XML’s success was due to the foresight and energy of a number of people, but especially Jon Bosak – the “father of XML”. Jon worked on technical document management in Sun (I hope that’s right) and saw very clearly that XML was part of the future of the Web. He coordinated the effort, got funding and political support, and I remember his pride in showing the back cover of the first draft of XML – ” sponsored by Sun and Microsoft”. This was a great technical and political achievement.
Tim Bray – another champion and parent of XML – writes:

“It is to Jon Bosak’s immense credit that he (like many of us) not only saw the need for simplification [of SGML] but (unlike anyone else) went and hounded the W3C until it became less trouble for them to give him his committee than to keep on saying SGML was irrelevant.”

It was supported by one of the largest and most active virtual communities. Henry and I offered to run the mailing list, XML-DEV, on which much of the planning and software development took place. By insisting on Open software as a primary means of verification the spec was kept to a manageable, implementable size. This meant that, unlike SGML, XML could be implemented by the DPH (“Desperate Perl Hacker”). And it was….
… the rest is history. XML has become universal. Jon (I think) described it as “the digital dial tone” – i.e. wherever information is being passed on the web it will increasingly be in XML.
So that explains why I care :-). Next post I’ll explain why you should also care.

Posted in "virtual communities", XML | Leave a comment

Blogging and the chemical semantic web

This post will explain how chemically-aware blogs can be indexed and searched. If you’re not a chemist, but still interested in the semantic web, this may be interesting.
I revealed in recent posts that molecules in blogs can be indexed on their chemical structure, thus making the web chemically semantic. (I use the lower-case version to show that we are not using the heavyweight Semantic Web (OWL, triples, etc.) but something much more akin to microformats.) Anyway the idea is simple…
For any document containing chemistry, mark up the compounds with an InChI tag, which is guaranteed unique for each of them. I’m going to concentrate on blogs, but the idea extends to any web document. (I’ll exclude most chemical papers as they are generally closed, so we can only access them with subscriptions and are often legally prevented from the indexing described below.)
The main ways of adding InChI tags are:

  • persuade the author to do this when they create the post. Most of the current types of chemical software either create InChIs or create a file that can be converted into InChIs (e.g. with our WWMM services). With practice this would probably take 1-2 extra minutes per compound, especially if we can create a drag-and-drop InChIfication service at Cambridge or elsewhere. The InChI (which is simply a text string) can either be added to the blog or hidden in the alt tags of the imgs for the chemical structures (see the sketch after this list). Again fairly straightforward (though I have had to fight my editor). And I think we can expect blog tools to become semantic – at least for microformats – during the next months.
  • extract the structure from the blog and turn it into InChI. This is harder (unless the authors use a robust format such as CML or possibly SMILES). One way is to interpret chemical names as structures – we’ll explain our work on this later. But semantic authoring is far better.
  • extract a known Open chemical ID from the site. Pubchem is the only realistic approach (it has ca. 6 million compounds); CAS numbers are closed and copyright so cannot be used. If we do this, then I would suggest the Pubchem entry is indexed like this: “CID: 2519” (this is very easily cut-n-pasted from the Pubchem site). I am normally hesitant to use IDs but I think we can make an exception for Pubchem.
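To show what the first option looks like in practice, here is a sketch of a blog fragment with the InChI carried in the alt attribute of the structure image (the file name is invented; the InChI is the methylamine string that appears in the mystery-molecule posts below):

<p>We prepared methylamine as a test compound.</p>
<img src="methylamine.png"
     alt="InChI=1/CH5N/c1-2/h2H2,1H3" />

A human reader just sees the structure diagram; a search engine indexes the InChI string in the alt attribute, so a search for exactly that string finds the post.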

A good example of an InChIfied site is the Carcinogenic Potency Database (CPDB) at Berkeley, which contains a list of chemicals with a typical entry showing the InChI (scroll to the bottom part of the page). This site consistently gets good hits on Google when searched by the InChI string (try it at our GoogleInChI server).
So, this post is to suggest to chemical bloggers that they add InChIs to their blogs. There are about 15 blogs that seem to have enough chemistry to make this worthwhile (I’ve taken these from Post Doc Ergo Propter Doc) and I’d be grateful for comments on what I have misrepresented or what I’ve left out. The loose criteria for inclusion are (a) are there frequent chemical structure diagrams or (b) are there enough chemical names that are worth tagging.

I add:


but exclude RSS and CMLRSS feeds at this stage (though they will be the future of some chemical newsfeeds).
So this is to encourage chemical bloggers to add InChIs (or Pubchem CIDs) to your blogs. If you do, we can index your blogs and we’ll be showing some more magic RSN.

Posted in "virtual communities", chemistry | 10 Comments

The mystery unfolded – the molecules have been (and can be) found

(I think this was delayed by WordPress.)

  1. Jean-Claude and his students cracked a bit of it. Egon has explained it fully and provided the motivation…
  2. Egon Says:
    October 14th, 2006 at 7:55 pm I have not been able to track down all of the involved blogs, but my final guess would be that these molecules are taken from chemical blogs. The first one is from tenderbutton, the last one already recognized by J-C (thanx for the tip!). (Peter, please don’t say I’m wrong… :)

Yes – they are from the blogs I mentioned – Useful Molecules, Totally Synthetic, Org Prep Daily, and Tenderbutton. The second posting showed the very first molecules on those blogs; the third showed the most recent molecules (which were also in PubChem so I could copy the images).
The other theme (which some people hinted at) was that the molecules had InChIs. These were concealed in the alt attribute of the img so they weren’t visible to humans. Paul Docherty (Totally Synthetic) dropped in for a chat last week and I showed him GoogleInChI. He was interested but worried that the InChI would take up too much space on the page. I first tried hiding it in:


<span style="display:none">InChI=...</span>

in the first molecules but I was advised that Google doesn't like non-displaying information. So in the second batch I hid it in the alt attribute and this seems to work. (Unfortunately WordPress seems to corrupt handcrafted HTML on images and some of the alts got overwritten, so that is why the earlier molecules didn't all work).
So the main message is that if you put InChIs in alt attributes, Google will index your blogs. This means that we have chemically aware blogs for the first time. If all the blogs do this we shall have a de facto chemical knowledgebase.

Posted in "virtual communities", chemistry, open issues | 3 Comments

Final Mystery Molecules

Four more mystery molecules – this will be the last lot. Jean-Claude Bradley has guessed some, if not all, of my purpose. (Forget the first mystery molecules – WordPress corrupted the steganography.) But the second lot of molecules and this lot have two strong themes. The actual link requires some knowledge or some intuition into my thought processes. There is a purpose behind this! – which will be revealed soonish – and I know it works.


Bis-(4-morpholino)-methane (org prep daily) – CID 21839
InChI=1/C9H18N2O2/c1-5-12-6-2-10(1)9-11-3-7-13-8-4-11/h1-9H2

Bengazole A – CID 374628
InChI=1/C26H42N2O8/c1-3-4-5-6-7-8-9-10-11-12-13-23(32)36-25(22-15-27-17-35-22)26-28-19(16-34-26)20(30)14-21(31)24(33)18(2)29/h15-18,20-21,24-25,29-31,33H,3-14H2,1-2H3

Methylamine – CID 6329
InChI=1/CH5N/c1-2/h2H2,1H3

levodopa – CID 6047
InChI=1/C9H11NO4/c10-6(9(13)14)3-5-1-2-7(11)8(12)4-5/h1-2,4,6,11-12H,3,10H2,(H,13,14)/t6-/m0/s1/f/h13H

Posted in "virtual communities", open issues | Leave a comment

How do I keep up with the Literature?

Here’s a cry from the heart (Stephen Koch, Department of Chemistry, SUNY Stony Brook) on the CHMINF-L list (for chemical informatics and libraries).

I would like to bring up an issue which has not generated a lot of
discussion on this list. How does a research chemist keep up with the current literature in his or her field? I will outline what I do, and am curious to hear other approaches.
In order to keep up with the literature in my field of research, which happens to be Inorganic Chemistry, I have always scanned the table of contents of a number of journals. JACS, Inorg. Chem., Comm., Dalton Trans. Angew Chem. etc. In the distant past, I would go through the paper copies of the new journals every week. In more recent times, I have had the table of contents of the journals emailed to me, and then I go online to view the graphic table of contents of the various issues. I delete the email when I have gone through that issue. The problem is that the number of journal articles has become overwhelming, so many of the journals are now coming every week and there are so many more journals which publish high quality research. What tends to happen is that the unread table of contents emails
build up. For me, it is only possible to begin to cope with the situation because I can rapidly scan through graphics table of contents.
I don’t have a large research group so I can’t do a division of labor and use my graduate students. I have also always feared that they would not do a complete job.
When I realize that months have gone by since I looked through some of the journals, I fall back on the Web of Science alerts that I have set up for keywords and author searches in the areas directly related to my current research projects. The fear is that you are going to miss a critical article. It is bad enough to be scooped; it is not good to discover the article six months after it has appeared in press. The email alerts are also critical for finding relevant articles that appear in journals that I don’t look at. I also use WOS citation alerts on my own papers with the idea that I will be interested in any paper which references one of my own papers.
I have started to use the new version of Google Reader where I have RSS “subscriptions” to the journals. The articles are added as they go online so if I use it every day or so, I have been able to keep up. You place a star on an article that you want to keep for further reference.
The RSS feeds from the ACS and Chemical Society journals are excellent, with very large graphics. The Wiley journals (Angew Chem etc) RSS feeds are poor, with no graphics. Maybe it is not surprising, since the graphics in their tables of contents are also very poor; you need a magnifying glass to view them.
It would be nice if you could automatically import your starred articles into Endnote and perhaps to automatically download the pdf files.
There remains the problem of how to deal with reading the articles and archiving them so that you can retrieve them at a later date. I will save what I do for another day.

I am sure we can all sympathise with all of this! If we believe in constant brain size I can see the following solutions:

  • limit your research to a tiny field. This, unfortunately, is happening more and more. In contrast, science, and the world, become larger and more diverse.
  • Use your colleagues’ brains as a collective organism. I do this a lot. It works as long as there is a reward for collaboration. If, however (as is frequent in chemistry), the ethos is competitive within the laboratory or at least the department, it doesn’t.
  • Use your discipline’s brains as a collective organism. This is how Open Source (such as the Blue Obelisk) and Open Science work. Again it fails in much of chemistry where the ethos is bitterly competitive rather than collaborative (I would be interested to know of any organic syntheses where the work was planned and shared between institutions). By contrast the biologists have long realised that collaborative working is essential (though it is certainly not trivial). The great thing about folksonomies and collective organisms such as the Blue Obelisk is that they are self-selecting. But there is no way of controlling their evolution. See also Useful Chemistry.
  • Use machines to read the literature for you. Our software in Cambridge can now read and use large amounts of the primary chemical literature. This is a good way of providing alerts and aggregation for well-defined concepts (e.g. are there any papers which mention the sort of molecule I am interested in?). The main difficulty is that publishers (see the reference to Wiley) are still producing Hamburgers, not Cows, so the machines struggle. So my prediction is that publishers who include semantics in their publications (Cows) will gain market share over Hamburgers. But of course while the mighty impact factor is the only thing that matters, that will be slow.
  • Go into a field where you create the literature. Very difficult, of course, and often lonely.

I’ve had a similar post from my immediate colleagues – “how do I find out how to do X?”, “which method for Y should I use?” I’ll post on that soon.

Posted in general | Leave a comment