Congratulations to RSC and ChemSpider

I was delighted to hear from colleagues at the Royal Society of Chemistry that they have acquired Antony Williams's ChemSpider. I had the chance to have a long talk with Antony two weeks ago at BioIT, where we discussed possible areas of mutual interest. At UCC we've also had a long and successful collaboration with the RSC, including work which led to the creation of OSCAR and Project Prospect.
As I blogged last week, this is a pioneering example of the positive role of learned societies in the C21, adding sustainability and trust.
I’ll write more later – still fighting the blogging software.
I think this will help to make a major change in chemical informatics and I’m keen to see how we could add synergy.

Posted in Uncategorized | Leave a comment

Stop the blacklist of UK scientists

I recently blogged about the UK's Engineering and Physical Sciences Research Council's (EPSRC) retrograde policy of blacklisting any scientist who has had three grants rejected.

There is now a petition – ON THE PRIME MINISTER'S SITE – allowing scientists to petition against this, and I have signed it:

“You are now signed up to this petition. Thank you.
For news about the Prime Minister’s work and agenda, and other features including films, interviews, a virtual tour and history of No.10, visit the main Downing Street homepage
If you’d like to tell your friends about this petition, its permanent web address is: http://petitions.number10.gov.uk/UKScience/”

PMR: I’m impressed that the government has built this site – there is a section of the cabinet office which is actively promoting e-government and this is an excellent idea.

UK SCIENTISTS PLEASE VOTE AND PLEASE TELL YOUR COLLEAGUES TO VOTE AND SPREAD THE WORD.

[I am still trying to get ICE working on Windows 7 so blogs will be fairly short and badly formatted at times]

Posted in Uncategorized | Leave a comment

Windows 7, ICE and "Program Files"


I’ve just upgraded to Windows 7 (beta, I think). I’m trying to post this with ICE/OO which currently throws an error message that it cannot find ICE. I think the problem is that Windows has two directories:

C:\Program Files

and

C:\Program Files (x86)

[Filenames with spaces are bad enough – filenames with brackets and close aliases look like a disaster waiting to happen.]

Anyway I am copying the ICE 2 installation to both. If you see this post (especially in Toowoomba) you'll know it worked.
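For what it's worth, this double-directory split is why tools on 64-bit Windows generally probe both roots rather than hard-coding one: 32-bit applications get redirected to "Program Files (x86)". A minimal sketch of that lookup (the helper and its name are mine, not part of ICE):

```python
import os

# On 64-bit Windows, 32-bit applications live under "Program Files (x86)",
# so a launcher looking for an install should probe both roots rather
# than hard-coding one.
DEFAULT_ROOTS = [r"C:\Program Files", r"C:\Program Files (x86)"]

def find_install(app_dir_name, roots=None):
    """Return the first existing <root>/<app_dir_name> directory, or None."""
    for root in roots if roots is not None else DEFAULT_ROOTS:
        candidate = os.path.join(root, app_dir_name)
        if os.path.isdir(candidate):
            return candidate
    return None
```

Copying the files to both directories, as I am doing, is the brute-force version of the same idea.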

… update

It didn't work, though I think I got past the first error (and then got namespace errors).

Posted in Uncategorized | Leave a comment

Trust in scientific publishing

[Please excuse formatting – reinstalling ICE soon]

Two stories have coincided – both relate to the role of trust in scientific publishing.

The first came when Emma Marris, reporting for Nature, rang me last week and asked what I thought of the financial problems in the American Chemical Society. I said that I wasn't really the right person to ask, as I had no previous involvement or specialist knowledge in the area, but that scientific societies are an essential part of the scientific community. She's included a quote in this week's Nature:

http://www.nature.com/news/2009/090506/full/459017a.html

American Chemical Society makes cutbacks to fight financial losses.

Emma Marris

The American Chemical Society (ACS), the world’s biggest scientific society, is feeling the effects of the global economic downturn.

On 28 April, six months after tightening its belt a first notch, the society laid off 56 people, 3% of its employees…. [the rest is Pay-to-read]

I can’t reproduce the article (copyright) but here’s my bit …

Even vocal critics of the society’s opposition to open-access publishing aren’t delighting in its financial woes. Peter Murray Rust of the University of Cambridge, UK, whose blog covers open-access chemical information, says that he wishes the society well. “I have not been a supporter of many of [its] policies,” he says, “but I would say that we absolutely need national scientific societies.”

As Emma says I have been critical of some aspects of the ACS’s public policy, most notably its proactive role in PRISM – a coalition of (a few) leading publishers to discredit Open Access. From Peter Suber’s blog (2007):

[3] July 2006 – As Nature later reports, several publishing executives with ACS, Wiley and Elsevier meet with PR operative Eric Dezenhall to discuss a plan to defeat open access. Dezenhall advises the executives to equate Open Access with a reduction in peer-review quality.

This and similar actions have led people to question the scientific integrity of the participants.

In the C21 one of the critical commodities is trust. A typical (and misguided) mantra is: "You can't trust anything in Wikipedia". So who can, by their nature, be trusted in the scientific arena? I'll try the following list and am happy for comments:

  • learned societies (and international scientific unions)

  • universities, national laboratories and government agencies

  • libraries

  • funding bodies including (most) charities

  • (some) regulatory bodies if business is conducted publicly

Scientific societies have a critical role, and that's why I wish to see a healthy and growing involvement of scientific societies in establishing trust. Trust cannot be mandated; it has to be earned. It is hard won and easily lost. In the C21, openness and democratisation are major tools in speeding up the growth of trust.

I've excluded the commercial publishers. There are worthy ones, but there are also ones driven at least partly by the search for revenue at the cost of trust. The following story (http://www.earlham.edu/~peters/fos/2009/05/elsevier-and-merck-published-fake.html) broke recently about Elsevier's publication – for money paid by Merck – of a fake journal. The "journal" was made to look like a typical medical peer-reviewed journal:

Merck paid an undisclosed sum to Elsevier to produce several volumes of a publication that had the look of a peer-reviewed medical journal, but contained only reprinted or summarized articles–most of which presented data favorable to Merck products–that appeared to act solely as marketing tools with no disclosure of company sponsorship. …

The Australasian Journal of Bone and Joint Medicine, which was published by Excerpta Medica, a division of scientific publishing juggernaut Elsevier, is not indexed in the MEDLINE database, and has no website (not even a defunct one). …

This might well have gone unnoticed in a pre-digital age, and it's clear that the blogosphere is a major tool in detecting unacceptable publication. So – as many have noted – here is a commercial company, one which has campaigned to rubbish Open Access as "junk science", behaving in a manner which destroys trust in its ethics and practice. I have no option but to say that I can no longer absolutely trust the ethical integrity of every piece of information in Elsevier journals.

The need for Open, trusted, scientific data and discourse is now clear. The scientific societies are well placed to help us make the change from closed paper to open trusted semantic digital. They clearly need a business model that transforms the new qualities into a revenue stream. This will not be easy but it has to be tried – there is no alternative. Some of the modern tools will help – the ability to mashup, aggregate, etc. will lead to new forms of high-quality information that will have monetary value. Certified validated information will lead to productivity gains and may be a valuable commodity.

So this should be a time for scientific societies to look positively to the future rather than fearfully at the receding past.

Posted in "virtual communities", Uncategorized | 4 Comments

British Library document on copyright

From Ben White of the BL (who sought views from me and others to go into the document). There is a lot that is positive in this and I really hope the Government takes the recommendations seriously in revising the law. [BTW the format of the document itself is strange and rather difficult to read on screen – it looks more like a poster].

Please find attached the British Library’s latest paper on Copyright and Research. http://www.bl.uk/ip/pdf/copyrightresearchreport.pdf

We had an event (see podcast if you have the time at www.bl.uk/ip) this Tuesday to discuss copyright and research – those on the panel included Lynne Brindley, CEO of the British Library, as well as IP and Higher Education Minister David Lammy, Torin Douglas BBC etc. Lots of great people in the audience too of course!

Please spread the word regarding the paper!

Sincerely yours

Ben White

Here is an excerpt:

In a supreme irony, the ease of access enabled by the digital age actually leads to greater access restrictions:

1. Researchers increasingly find a black hole when researching 21st Century material – ironically, the material of previous centuries has become easier to access than the websites, Word documents and blogs of today, because clearing rights to give access to modern-day material can be lengthy and expensive.

Currently Google blocks post-1868 material on their Google Books site from users in the European Union because of the longer duration of copyright in the EU. This means that European researchers wanting to read material up to 1923 have to travel to the United States to view material that is freely available there on the web but not in Europe. Much of this material was of course produced by Europeans…

Some historical publishers have had to abandon post-war social history projects as the rights issues are too complex.

2. Researchers of the future find a black hole when researching late 20th Century history, as much of our digital history has decayed and become digitally corrupted.

Parts of the British Library's archive of celebrated photographer Fay Godwin may no longer be accessible to researchers when Microsoft and Adobe no longer support Windows XP/Vista and Photoshop (CS3) servers, as the servers are essential for viewing some of her digital photographic collection. Restrictions in copyright law mean that the British Library can do little practically to prevent this.

3. Computer-based research techniques become restricted by copyright and contract law. Computer technology has already significantly changed the way in which scientific research is conducted. Scientists increasingly do not read books or journals, but by writing computer programmes they search, analyse and extract data from written sources in a technique known as 'data mining' or 'text mining'. Science is propelled forward by access to and collaborative reuse of scientific information. It is important that computer-based research techniques are allowed for by future copyright law, in the way that in the analogue world we have protected research activity through 'fair dealing'.

Medical researchers write their own computer programmes to search across thousands of digitised articles in their libraries to extract important medical data, such as the relationship between a certain enzyme and the spread of cancer. Despite this, the researcher is not able to share the results of their findings with other scientists, as this will contravene the terms of their licence with the database provider, and the relationship between the provider and the university.
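The "text mining" the excerpt describes is, at its simplest, pattern extraction over free text. A deliberately naive Python sketch (the sentences and the pattern are invented for illustration; real mining uses NLP pipelines and chemistry-aware tools such as OSCAR):

```python
import re

# Toy corpus standing in for digitised article text - invented
# sentences, not real literature.
ABSTRACTS = [
    "We show that the enzyme telomerase promotes the spread of cancer cells.",
    "No association was found between amylase and tumour growth.",
    "The enzyme kinase X inhibits the spread of cancer in vitro.",
]

# Crude pattern: "enzyme <name> promotes/inhibits the spread of cancer"
PATTERN = re.compile(
    r"enzyme\s+([\w\s]+?)\s+(promotes|inhibits)\s+the\s+spread\s+of\s+cancer"
)

def extract_relations(texts):
    """Pull (enzyme, relation) pairs out of free text."""
    hits = []
    for text in texts:
        for name, verb in PATTERN.findall(text):
            hits.append((name, verb))
    return hits

print(extract_relations(ABSTRACTS))
# [('telomerase', 'promotes'), ('kinase X', 'inhibits')]
```

Real systems replace the regex with proper entity recognition, but the copyright problem is identical: the program must read the full text in order to extract anything at all.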

It is heartening to see such a positive view being promoted at a national level. Perhaps this is something that individual libraries can help to support and propagate. Hopefully it can give encouragement to those who wish to challenge the unacceptable status quo.

Posted in "virtual communities", Uncategorized | 2 Comments

British Library urges new approaches to copyright

The British Library has a formal role in advising Government and solicited views (including mine) on aspects of copyright in relation to research and scholarship. Here are some of their conclusions, which I hope the Government will take on board when changing (yes, changing) copyright law. There is a clear indication that the BL will argue for relaxed copyright for research purposes, and that should be an important contribution to text-mining and data-driven research.

From Open Access blog:

Adapting UK copyright law to accommodate research

Copyright – what is the future for education and research? A press release from the British Library, May 5, 2009.  (Thanks to ResourceShelf.)  Excerpt:

…[L]eading figures in UK education and research today met at the British Library to explore the tensions and opportunities surrounding the role of copyright law in an increasingly digital knowledge economy….

At this morning’s debate, Chief Executive of the British Library Dame Lynne Brindley launched the Library’s campaign to ensure that copyright issues of importance to the research and education sector are included in the ongoing public debate on copyright and are reflected in any subsequent legislation, rules or regulations resulting from recent Government initiatives. These suggestions include:

  1. Public Interest – Many contracts undermine the public interest exceptions in copyright law agreed by Parliament to foster education, learning and creativity. Addressing this issue is crucial so that existing and new exceptions are not over-ridden by contract law.
  2. Preserving our cultural heritage – Libraries must be able to make preservation copies of the material they acquire, including web harvesting of the UK domain.
  3. Orphan works – 40% of the British Library’s collections are Orphan Works (where the rightsholder can no longer be found or traced). A legislative solution to Orphan Works would help provide access to the UK’s large historical collections over the internet.
  4. Fair Dealing – Researchers and libraries need to be able to make available "fair dealing copies" of anything in their collections, including sound and film recordings, to which Fair Dealing does not currently apply.
  5. Technology Neutral – Computer-based research techniques, such as those used in scientific research, need to be allowed by future copyright law, in the same way that in the analogue world research activity is protected through "fair dealing"….

Dame Lynne added: "There is a supreme irony that just as technology is allowing greater access to books and other creative works than ever before for education and research, new restrictions threaten to lock away digital content in a way we would never countenance for printed material. Let's not wake up in five years' time and realise we have unwittingly lost a fundamental building block for innovation, education and research in the UK." …

Posted in Uncategorized | Leave a comment

CML provides semantic chemistry

[I have had to change machines temporarily so this post not ICE-enabled]

Henry Rzepa has commented on CML/Chem4Word:

Henry Rzepa says:

The progress with Chem4Word is hugely impressive and important. But I would like to remind readers that the first steps to this were taken a long time ago. In an article published in 2001, we [Henry, Michael Wright and PMR] set out much of the framework. I have taken the liberty of pasting the abstract of that effort here. In essence, we have moved from a very simple article markup (DocML) to the sophisticated one used by Microsoft, but the essence of transcluding interoperating, namespaced XML languages – the CML component of which is data rich – is preserved to the present.

“We report the first fully operational system for managing complex chemical content entirely in interoperating XML-based markup languages. This involves the application of version 1.0 of chemical markup language (CML 1.0) and the development of mechanisms allowing the display of CML marked up molecules within a standard web browser (Internet Explorer 5). We demonstrate how an extension to include spectra and reactions could be achieved. Integrating these techniques with existing XML compliant languages (e.g. XHTML and SVG) results in electronic documents with the significant advantages of data retrieval and flexibility over existing HTML/plugin solutions. These documents can be optimised for a variety of purposes (e.g. screen display or printing) by single XSL stylesheet transformations. An XML schema has been developed from the CML 1.0 DTD to allow document validation and the use of data links. A working online demonstration of these concepts, termed ChiMeraL, containing a range of online demonstrations, examples and CML resources such as the CML DTD and schema has been associated with this article via the supplementary material.”

I would note that IE 5 was a quite different beast from the present one, and that inevitably, our demonstrator called ChiMeraL, no longer functions! Would anyone like to offer to repair it?

Bachrach has also blogged on the topic, and a discussion is also developing there

[PMR] Henry is correct that this was the first full semantic chemistry publication (and it's worth noting that – 8 years later – there are now publications emphasising "semantic" which go over much of the same ground). In fact the very first publication is:

http://acscinf.org/docs/meetings/216nm/216cinfabstracts.htm#33

33. THE COMPLETE CHEMICAL E-PUBLICATION.

Peter Murray-Rust, University of Nottingham, Nottingham, NG7 2RD, UK.

The development of new tools for use on the WWW is now extremely rapid. Even allowing for the current “hype” over XML (eXtensible Markup Language) and other protocols, it seems certain that most Web-based information systems will be adopting them. The announcement by major suppliers that they will be developing XML-based browsers and editors means that there will be a large number of affordable high-quality tools available very shortly. The goal is to make information available globally, in any discipline, and as easily as possible to authors and readers/users. Authors and tool developers will use documents and data from different domains that interoperate in a platform- and vendor-independent manner. The XML family of protocols will allow integrated documents and data for the first time. To support chemistry, we need to address specifically molecular problems. Although there are no agreed semantics for molecular data nor de facto standards, a starting point, Chemical Markup Language (CML) will be presented.

… which is now nearly 11 years old. At that meeting I presented an interactive semantic document where I had had to write almost all the software for molecular display, spectral display, tree-based navigation, etc. Most of that is now lost, although Henry had the foresight to archive some of it on a CD-ROM as one of his electronic conferences.

We are gratified that the design we worked out 10 years ago for CML 1.0 is still largely relevant today. CML covers the main areas of chemical publication – molecules, reactions, spectra, crystallography and compchem. These have all been extensively implemented and tested so we are clear that the design works. It is the only approach to managing all of these objects in a single schema.

Since that time there have been many enhancements to the design, some driven by experience and some by new web technology. We now make extensive use of RDF and ontologies to implement the CML dictionaries. We also regard CML as a set of microformats which can be used in an arbitrary or a controlled manner (for the latter we use the @convention attribute). There are extensive converters to and from CML, and tools for display and – now – editing. There is no technical reason why it should not become the digital dialtone of the [chemical] web (Jon Bosak's phrase for XML).
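Because CML is ordinary namespaced XML, the data-retrieval advantage over HTML/plugin approaches is visible with any XML toolkit. A toy Python sketch (the water fragment is hand-written for illustration, not taken from any published document; the namespace URI is the one conventionally used for the CML schema):

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written CML fragment for water - purely illustrative.
CML = """<cml:molecule xmlns:cml="http://www.xml-cml.org/schema" id="water">
  <cml:atomArray>
    <cml:atom id="a1" elementType="O"/>
    <cml:atom id="a2" elementType="H"/>
    <cml:atom id="a3" elementType="H"/>
  </cml:atomArray>
  <cml:bondArray>
    <cml:bond atomRefs2="a1 a2" order="1"/>
    <cml:bond atomRefs2="a1 a3" order="1"/>
  </cml:bondArray>
</cml:molecule>"""

NS = {"cml": "http://www.xml-cml.org/schema"}

def element_counts(cml_text):
    """Count atoms by element in a CML molecule - the kind of data
    retrieval an opaque image or plugin cannot offer."""
    root = ET.fromstring(cml_text)
    counts = {}
    for atom in root.findall(".//cml:atom", NS):
        el = atom.get("elementType")
        counts[el] = counts.get(el, 0) + 1
    return counts

print(element_counts(CML))  # {'O': 1, 'H': 2}
```

The same few lines work unchanged on reactions, spectra or crystallography blocks, which is the point of keeping everything in one schema.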

Until Chem4Word and ICE, the authoring of complete documents had to be done manually. We can now use these tools to create documents, include chemistry, and add hyperlinks and – where there is open source code – some behaviour.

Posted in Uncategorized | Leave a comment

The European Net – have we won a victory?

We appear to have won something – a battle? A skirmish? But we have also failed to win the struggle to contain the potential domination of the telecoms operators.

From La Quadrature du Net

Amendment 138/46 adopted again. Internet is a fundamental right in Europe.

The debates on the Telecoms Package, thanks to a remarkable citizen mobilization, led to an extremely strong recognition of access to the internet as a fundamental right, with the re-adoption of amendment 138/46 at second reading by a qualified majority. It is the final blow against three-strikes laws such as Nicolas Sarkozy's HADOPI bill, which are explicitly banned. The European Parliament nevertheless adopted a soft compromise on issues of network equity: no strong protection against net discrimination was adopted.

Strasbourg, May 6 2009

La Quadrature warmly thanks the numerous European citizens who have contributed to the possibility of this new and stronger-than-ever statement for fundamental rights. Even on issues connected to network offers, the worst provisions introduced since the beginning of the legislative process were not adopted. Thanks to the public debate, the ill-intended co-operation between ISPs and rights holders, and discrimination of Net services and contents, will not be forced, even though doors are still open for introducing it in Member States.

A formidable campaign by citizens put the issues of freedoms on the Internet at the centre of the debates on the Telecoms Package. This is a victory in itself. It started with the declaration of commissioner Viviane Reding considering access to the Internet as a fundamental right. The massive re-adoption of amendment 138/46 rather than the softer compromise negotiated by rapporteur Trautmann with the Council is an even stronger statement. "These two elements alone confirm that the French three-strikes scheme, HADOPI, is dead already," explains Jérémie Zimmermann, co-founder of La Quadrature du Net.

To safeguard these provisions, European civil society will have to be strongly mobilized during a conciliation phase that would proceed with a newly elected Parliament and a new Presidency. Furthermore, some provisions in the compromise amendments to the Harbour directive adopted today allow telecoms operators to alter the Internet as we know it. Nothing will forbid them to turn the Internet away from a neutral zone where people have equal access to all content, applications and services.

As these provisions have been negotiated with the Council, they are likely to become law. Citizens will have to be particularly attentive to the transposition and implementation of the adopted provisions. It would be disastrous for the Internet to stop being a space where all can create innovative services and content without permission from gatekeepers. In order for consumers to be in a position to endorse equitable network offers and reject discriminatory ones, it is essential for at least some of the offers to be non-discriminatory. We will call on the regulatory authorities and the Commission to ensure this by all policy means.

The strong statement for access to the Internet as a fundamental right demonstrates that the Parliament can be courageous and reject the pressure to compromise when essential values are at stake. Unfortunately, on issues that appear more technical, such as the absence of discrimination of services and content on the Internet, the Parliament has not yet taken the full measure of what is at stake. "Citizens must remain mobilized on these crucial questions," concludes Gérald Sédrati-Dinet, analyst for La Quadrature.

The price of freedom is eternal vigilance. At least the Net helps to keep us awake.

Posted in Uncategorized | 2 Comments

Green and Gold Open Access in Yorkshire

We had a great weekend in the Yorkshire Dales and aimed to do the three peaks (Pen-y-Ghent, Whernside and Ingleborough) in three days. (Yes, many people do them in one day, but it takes many hours.) Anyway the last day was horrible and we took a lower-level route instead. In the horizontal rain there was this sign. The colours (in the sun) really are green and gold.

[image: the green and gold sign]

Posted in Uncategorized | Leave a comment

Mandate Theses in HTML

Peter Sefton has put forward a cogent case that all theses should be submitted as HTML. [If you are a PDF junkie, read the sentence carefully before howling. It doesn't say only HTML.] He wishes USQ to…

Be the first university in Australia to mandate that theses are deposited in the institutional repository in HTML, with linked data and embedded semantics as well as the standard paper-on-screen PDF file.

[…]

I'll [PTS] start with the theses. The Open Access movement is now well established, and USQ already has a mandate (1) that all theses are to be submitted electronically and to go into ePrints when the degree is conferred. This does help to make research available to the community that paid for it, but it is such a pity that in the web age we are still stuck with the paper view of a research output. Citations are not reliably machine readable, data sets are rarely made available and if they are they are not linked into the thesis. And worst of all, the thesis is not made available in HTML, where it is part of the fabric of the web. Can you imagine a university getting away with a web site which was PDF only? We certainly try not to deliver courses that way. In most web situations PDF is considered an accessibility barrier and yet in the repository community it's the main game.

There are some universities around the world with XML production systems for theses, where HTML should be available, but as far as I know none of them has achieved the level of automation that we have, or spent as much effort on the semantic web in a way that will be usable by candidates. This is partly because most of the efforts have used complex XML schemas which are not a good match for word-processing documents, whereas we target HTML, which is a reasonably good match for a generic styled word-processing document.

So why are institutions in general not mandating that theses must be available as web pages?

Well, in most places that would be because it is too hard to do. Regrettably you can't just save as HTML from Word and expect to get repository-quality web pages, or expect any old LaTeX file to be magically web-ready. You can read me ranting on about that in this list of delicious links about how hard it is to make HTML from word processors. But at USQ, we have a not-so-secret weapon: ICE. The Integrated Content Environment is the core university system we use here to create our long-form courseware, a lot of which is very similar in size and structure to a thesis. With Jim Downing and team at Cambridge, we have shown on the ICE-Theorem (2) project how chemical theses can be created in ICE and published to the web complete with embedded chemical semantics and everyone's favourite, the rotating molecule (they're taking this much further with Chem4Word, which we will try to work with as well). Jim and I will be presenting that work at Open Repositories 2009. And we have a few other sample theses that show that we can produce rich web-based theses and still have the core part delivered as a printable PDF file.

We have the systems. We know it can be done. It's a small institution. Let's do it.

[PMR] I'll be presenting the power of the semantic thesis at ETD2009 next month. There is no technical reason why theses cannot be deposited as HTML (alongside the PDF, of course). And when that's in place we can also deposit as Word or ODT. Then we will have a truly semantic thesis.
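To make "embedded semantics" concrete: in HTML the markup can be as light as a machine-readable attribute on an ordinary element. A hypothetical Python sketch – the class and attribute names here are invented for illustration, not those used by ICE or Chem4Word:

```python
# Build a small HTML fragment in which a chemical name carries
# machine-readable identity (here, an InChI) alongside the human-readable
# text. The "compound" class and "data-inchi" attribute are invented.
from xml.sax.saxutils import escape, quoteattr

def semantic_compound(display_name, inchi):
    """Wrap a compound name in a span that also carries its InChI."""
    return (f'<span class="compound" data-inchi={quoteattr(inchi)}>'
            f"{escape(display_name)}</span>")

html = semantic_compound("water", "InChI=1S/H2O/h1H2")
print(html)
# <span class="compound" data-inchi="InChI=1S/H2O/h1H2">water</span>
```

A crawler or repository indexer can then recover the chemistry from the deposited HTML thesis directly, with no plugin and no guessing from the prose.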

And, just to remind you for the n'th time, universities can make their own rules here. They do, anyway. If a graduand can be required to wear a gown of colour c, speak words in a foreign language, sign impenetrable declarations, etc., then surely producing the work in HTML – which babies now learn in their cradles – is possible. The students will hardly blink if you require HTML from them.

So just do it

Posted in Uncategorized | 2 Comments