CLARION Chemical Data repository at Cambridge – (2nd try)

I mentioned yesterday that we had been funded by JISC to develop a departmental repository, starting from C3DER (crystallography) and expanding to spectroscopy and chemical syntheses. We shall be working with a commercial supplier of Electronic Lab Notebooks in a tightly coupled project where both sides will benefit from the synergy: we shall have the use of a robust platform and can add many of the innovations we’ve been developing here, which should then gain wider currency. We think this is a new and exciting way of exploring the next generation of chemical informatics, which will be semantic, enhanced and guided by ontologies.

The vendor has not been selected so I am keeping my mouth shut…

CLARION project Cambridge Chemistry Department

 

The data challenge: Chemistry laboratories produce many types of information and data: raw data, processed data, observations, chemical structures, reaction schemes, experimental write-ups, conclusions, graphs, images, crystallographic and spectroscopic data, papers, references, and so on.  It is challenging to store this variety of information such that it is accessible and usable by a variety of users.  The challenges include:

 

  • Storing data in formats that allow its use by specialist data-processing tools
  • Using data formats that are suitable for publication and long-term preservation
  • Allowing certain data to be used by people outside the department
  • Motivating researchers to open their data
  • Enhancing the meaning and context of the data to improve its usability
  • Making the data searchable and easily navigable
  • Ensuring that the system has minimal support overheads, yet continually evolves as required to meet changes in the IT environment.

 

Using an ELN:  The Cambridge Chemistry Department has a basic repository which stores crystallographic data.  Project CLARION (Cambridge Laboratory Repository In/Organic Notebooks) will create an enhanced repository that captures core types of chemistry data and ensures their access and preservation.  The Chemistry Department is implementing a commercial Electronic Laboratory Notebook (ELN) system; CLARION will work closely with the ELN team to create a system for ingesting chemistry data directly into the repository with minimum effort by the researcher.
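
To make “with minimum effort by the researcher” concrete, here is a minimal sketch of the kind of deposit step we have in mind: the ELN exports an experiment’s data file plus a little metadata, and a script pushes both to the repository’s deposit interface. Everything specific here (the endpoint URL, the metadata fields, the response handling) is a hypothetical illustration, not the actual CLARION interface.

    # Sketch only: the endpoint, metadata fields and response handling are
    # hypothetical illustrations, not the actual CLARION interface.
    import json
    import requests

    DEPOSIT_URL = "https://repo.example.cam.ac.uk/deposit"  # hypothetical endpoint

    def deposit(data_path, title, experiment_id):
        """Push one exported ELN data file plus minimal metadata to the repository."""
        metadata = {"title": title, "experiment": experiment_id}
        with open(data_path, "rb") as f:
            response = requests.post(
                DEPOSIT_URL,
                files={"data": f},
                data={"metadata": json.dumps(metadata)},
            )
        response.raise_for_status()
        return response.text  # e.g. the identifier of the new repository item

    deposit("nmr_run_42.jdx", "1H NMR of compound 42", "EXP-0042")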

 

Enhancing and expanding data usage:  CLARION will provide functionality to enable scientists to make selected data available as Open Data for use by people external to the department.  The project will use techniques for adding semantic definition to chemical data, including RDF (Resource Description Framework) and CML (Chemical Markup Language); many of these techniques will be extensible to other disciplines.  CLARION will address general issues such as ownership of data, and it will publicise its results to the chemistry and repositories communities.  Effort will be put into developing a sustainable business model for operating the repository that can be adopted by the department after project completion.
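
To give a flavour of what “adding semantic definition” means in practice, here is a minimal sketch using the Python rdflib library: a repository item is described with RDF triples (typed, machine-followable statements) alongside its CML data file. The URIs and the property vocabulary are my illustrative assumptions, not CLARION’s actual scheme.

    # Sketch: describing a repository item with RDF triples via rdflib.
    # The namespaces and properties are illustrative assumptions only.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    REPO = Namespace("http://repo.example.cam.ac.uk/item/")  # hypothetical
    CHEM = Namespace("http://example.org/chem#")             # hypothetical

    g = Graph()
    item = REPO["cryst-2009-0001"]
    g.add((item, RDF.type, CHEM.CrystalStructure))
    g.add((item, DCTERMS.title, Literal("Crystal structure of compound 42")))
    g.add((item, DCTERMS.format, Literal("chemical/x-cml")))
    g.add((item, CHEM.dataFile, URIRef(str(item) + "/structure.cml")))

    # Any RDF-aware tool can now query or merge this description.
    print(g.serialize(format="turtle"))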

 

Timelines: The project runs for two years from April 2009. The initial pilot deployment of the ELN is scheduled for late 2009, and we hope to be publishing open data from it in early 2010.

 

Project blog:                http://clarionproject.wordpress.com/

Twitter:                       CLARIONproject    http://twitter.com/CLARIONproject

Contact:                      Brian Brooks <bjb45@cam.ac.uk>

Posted in Uncategorized | 1 Comment

BioIT 2009 – Trends from the Trenches

Plenary: “Trends from the Trenches”, Chris Dagdigian, BioTeam, Inc.
Some brief notes from plenaries
What’s Mainstream:

Virtualization, partly because of power requirements. The simplest and most powerful thing you can do. It protects the web apps and databases that are lashed up ad hoc: a valid use case, but not enterprise-grade. Virtualization allows an enterprise-like environment that preserves this innovation without danger. Vital for science.

Not coming soon: VMs for grids and clusters. Too much admin hassle.

Storage: the first 100 TB single-namespace projects are here. Jobs have been lost over data loss. Data triage is a given. Examples: a single namespace on a Mac system holds 80 TB; 1.1 PB on a Linux system.

Users have no idea of the true cost of storage. $124 for 1 TB of hardware is misleading. Individual labs put in 100 TB+ systems.

The days of unlimited data storage are over: triage is needed. It can be cheaper to repeat an experiment than to keep the data.

Data loss: for example, a double disk failure in the metadata of a 10 TB system at a government lab. You will get double disk failures. You need RAID 6.

Backup is becoming a thing of the past; no nightly full backups.

Amazon, Google and Microsoft can store data for 80 cents/GB/year. Can you do that?
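
As a back-of-envelope check on those two figures (the 80 cents/GB/year cloud price against the $124/TB raw-disk price quoted above), a quick sketch; the gap between the two numbers is everything the raw-disk figure hides:

    # Back-of-envelope comparison of the two prices quoted in the notes above.
    GB_PER_TB = 1024
    capacity_tb = 100                 # a typical lab-scale system from the talk

    cloud_per_gb_year = 0.80          # quoted cloud storage price
    disks_per_tb = 124.0              # quoted raw hardware price

    cloud_yearly = capacity_tb * GB_PER_TB * cloud_per_gb_year
    disks_once = capacity_tb * disks_per_tb

    print(f"Cloud, 100 TB for one year: ${cloud_yearly:,.0f}")  # ~$81,920
    print(f"Raw disks only, 100 TB:     ${disks_once:,.0f}")    # ~$12,400
    # The difference is what power, cooling, admin, RAID overhead,
    # replication and backup cost once you run the system yourself.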

IT cannot be the sole decision maker for triage or for storage optimization.

The rate limits are chemistry, reagent costs and human factors.

The problem is somewhat scary, but most people are surviving.

Amazon is the cloud: it has a multi-year head start.

Security in the cloud: don't expect things that you don't provide. Objections are often political.

Compute power is easier than I/O. He believes that Amazon is working on data ingestion.

There will be a big move of science data into the storage cloud. Science data will take a one-way trip: the data will stay in the cloud and only derived data will return.

The McKinsey report on Cloud Computing is very good; so is James Hamilton.

Watch Amazon, Google and MS.
Best data practices are starting to trickle out. Google is now showing what it did 5 years ago, so they must be up to very exciting things now.
Finally, federated data storage: something for the future.

Posted in Uncategorized | 1 Comment

BioIT in Boston: What is Open?

My talk is Open Semantic Data in Science. I’ll probably write 3-4 blog posts on the various aspects of this, and at present I’m thinking of:

  • What is Open? (this post)
  • What is semantic? And what do we require for it?
  • What is data?
  • What are we able to offer (with some modest emphasis on our own endeavours).

I am starting with the assumption that for science, now and in the future, Open Data will be essential. The culture, especially among young people, is that the answer is out there and is retrievable within seconds or less. There's also a realisation that increasingly we don't know in detail what we are looking for when we start a study. We read bits of papers, skim around till we get a feel for the subject, ask our colleagues, post questions on blogs, etc.

We’re also using machines much more to help us with the data, both in the volume and the diversity. This is a central theme at BioIT. So the fundamental postulate of Openness is:
  • ANY barrier to access and re-use, however small and seemingly trivial COMPLETELY destroys public semantic data.

(Note that I accept that there are closed worlds, such as companies and healthcare, which require access controls, but their technology can feed off what we are trying to create in public view.)

Why am I so insistent on this? I'll leave the moral and ethical arguments aside here and concentrate on the technical aspects. The Open Knowledge Foundation has addressed this point in its definition, and I'll quote from that, highlighting particular points (and abbreviating occasionally):

A work is open if its manner of distribution satisfies the following conditions:

  • 1. Access


The work shall be available as a whole …, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.
Comment: This can be summarized as ‘social’ openness – not only are you allowed to get the work but you can get it. ‘As a whole’ prevents the limitation of access by indirect means, for example by only allowing access to a few items of a database at a time.

  • 2. Redistribution

The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.

  • 3. Reuse


The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work. The license may impose some form of attribution and integrity requirements: see principle 5 (Attribution) and principle 6 (Integrity) below.
Comment: Note that this clause does not prevent the use of ‘viral’ or share-alike licenses that require redistribution of modifications under the same terms as the original.

  • 4. Absence of Technological Restriction


The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format…

  • 5. Attribution


The license may require as a condition for redistribution and re-use the attribution of the contributors and creators to the work.

  • 6. Integrity

The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.

  • 7. No Discrimination Against Persons or Groups
  • 8. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for military research.
Comment: The major intention of this clause is to prohibit license traps that prevent open source from being used commercially. We want commercial users to join our community, not feel excluded from it.
  • 9. Distribution of License

The rights attached to the work must apply to all to whom the work is redistributed without the need for execution of an additional license by those parties.

  • 10. License Must Not Be Specific to a Package

  • 11. License Must Not Restrict the Distribution of Other Works

and now the absolute requirement for Openness.

NONE OF THE ABOVE CONDITIONS ARE OPTIONAL

This is the crux. There are many data resources which are described as Open but which fail in one or more of these aspects. The commonest failures are:

  • exposing only part of the data. A database system with a query interface is normally not Open Data even if individual items can be downloaded without barrier: it is generally impossible to extract the whole work, as its boundaries are concealed by the search interface.
  • limiting the amount downloaded. This is very frequent (“you may use a maximum of 100 entries”).
  • forbidding re-use (“this data is copyright X and may not be re-used without permission”).
  • requiring access through specific technology. A search form limits the access.
  • requiring any form of sign-in, even if free. Robots are illiterate in this respect.
  • restricting the purpose of re-use. Thus CC-NC (no commercial re-use) is NOT OKF-compliant.
  • failing to provide a clear statement that the data are open and comply with the Open Knowledge definition. It's almost universal that data are NOT labelled as Open. This is easy to fix: just add the OKF's tags (a minimal sketch follows below).

So the message is simple, though it will take time to spread: use the OKF definition for all your data and tag it as such.
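
As a concrete illustration of “just add the tags”, here is a sketch that stamps a dataset directory with a machine-readable openness declaration. The file name and field names are my own invention, not an OKF standard; the license URL points at the Open Data Commons PDDL, one OKF-conformant choice.

    # Sketch: write a machine-readable "this data is Open" declaration
    # alongside a dataset. The file name and fields are illustrative, not
    # an OKF standard; PDDL is one OKF-conformant license choice.
    import json
    from pathlib import Path

    OPEN_DEFINITION = "http://www.opendefinition.org/1.0/"
    PDDL = "http://www.opendatacommons.org/licenses/pddl/"

    def tag_as_open(dataset_dir):
        declaration = {
            "conformsTo": OPEN_DEFINITION,
            "license": PDDL,
            "statement": "This data is open: anyone may access, re-use and "
                         "redistribute it for any purpose, without restriction.",
        }
        Path(dataset_dir, "OPEN-DATA.json").write_text(
            json.dumps(declaration, indent=2))

    tag_as_open(".")  # tag the current dataset directory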

This blog authored with ICE + Open Office; thanks to Peter Sefton and USQ

Posted in Uncategorized | 6 Comments

BioIT in Boston: What I shall say and How I shall say it

I am talking on Wednesday at the 2009 Bio-IT World Conference, which “unites life sciences, pharmaceutical, clinical, healthcare, and IT professionals. It is the perfect place to learn, be recognized and network.” Well, I hope I can be recognized, as I'm meeting Steve Heller at 1600 and Antony Williams tomorrow. And networking is good: they have wireless everywhere, which is a great relief as so many of these conferences have no wireless or charge zillions per day. And I've found a free lunch (again, many of these places are really mean, but you can often sneak in at the back to special functions).

So what shall I say? I was going to talk about the Chemical Semantic Web, but then looked at my program and found I had offered:

Open Semantic Data in Science

which is much better because (a) a Pfizer article in the conference magazine says the SW doesn't exist but Semantic Data does, and (b) it will fit well into the revised program. Antony Williams is now just before me and Rajarshi Guha is just afterwards. So we shall try to meet beforehand and see if we can make a seamless program.

The conference asked us to upload slides a month beforehand; I argued that my style of presentation didn't use slides and that anyway I didn't know what I was going to say. I therefore generally work it out the day or night beforehand. This isn't because I am casual about it: I take my presentations seriously and put a lot of work into them, but a lot of this goes on in my head in the preceding days. I don't recommend this to others, and in Cambridge we require colleagues to present dry runs a week beforehand.

But the main thing is that what matters to me is what I say rather than what I write. I try to adjust my words to the actual audience: there could be 10 or 100 people in this session, I have no idea. I often use a blog to work my ideas out, and I'll do that here. The blog serves as a record, and now that I have ICE as an authoring tool I should be able to add images to the blog. Here goes…

[image: photograph of a panda]


[Question: what is the name of the panda? Who is his famous companion? And if you didn't know, how did you find out?]

So, yes, images seem to work great! So I can start to use blogs with things other than words.

So how can this technology compete with the awful PowerPoint? Its main virtue, and it's an important one, is that it has a good editing tool and it acts as a container for various types of content. Not a robust one, as anyone who has transferred it from or to Mac or Unix knows.

So what alternative is there?

I suggest Word or Open Office. Ideally I'd like to use this for the complete presentation, but I have got fouled up with the page flips. I like my HTML presentations; I like the ability to scroll. The problem is packing them together afterwards (I can't do this beforehand as I don't know what slides I will use).

So my current approach is to blog my main message before the presentation. By doing this with ICE I can author the blog as ODT. Then I can convert this to DOCX and upload it to the conference site. I’ve talked about this with Peter Sefton and we’re thinking about a way to manage some of this.

Ideally I’d like to be able to record which HTML slides I showed (I assume this requires Javascript). Then it would be useful to combine this into a single ODT/Word/HTML/PDF document for the later reader. That’s exactly what Peter does for the courseware at USQ.

So I’m blogging with ODT and will upload several blogs of my talk during the meeting.

I don't know whether they will make sense, but at least they'll be more semantic than PowerPoint.

This blog authored with ICE + Open Office; thanks to Peter Sefton and USQ

Posted in Uncategorized | 1 Comment

BioIT – Chem4Word

I'm in Boston for Bio-IT World Conference & Expo 2009 for two main reasons: an invited talk, “The Chemical Semantic Web” (Computational Chemistry track), and also our first public demonstration of the Chem4Word software (research.microsoft.com/en-us/projects/chem4word/). For those who are at the meeting, the first's on Wednesday morning, the second on Tuesday lunchtime.

The C4W demo has been worked on very hard for the last month. There was a dress rehearsal in Redmond at the Microsoft External Research meeting, for which the demo was ready about 5 minutes before the presentation. We took the decision to freeze that functionality and to show it in Boston after the bugs had been ironed out. The discipline of having a fixed deadline (an international meeting) is an excellent way of concentrating minds within a project. Rudy Potenzone is demo-ing the software, but I've got the demo on my machine as well.

What does Chem4Word do? It’s more important to say what it is.

At one level it's an add-in that chemists can use to author documents. At the other end it's a toolkit which can be used to develop the next generation of bench-top chemical software. I owe Rudy some introductory material, so I might as well use this blog to do it.

Chem4Word is an Open Platform for collaborative chemical software development in a .NET environment.

C4W will be transferred to CodePlex (the MS Open software site) and will be available for anyone to help develop, much in the spirit of the Blue Obelisk. Learning from other Open Source chemistry projects, we have thought closely about sustainability of management.

Chem4Word is an Add-In to Word2007 that creates a semantic authoring tool for chemistry.

Word2007 is a platform that supports semantic authoring. Its use of smartTags allows words and phrases to be linked to a range of document components, including a Gallery and a Navigator.

Chem4Word uses (chemical) Ontologies.

With the new Microsoft Research Ontology Add-In and external ontologies (we use Nico Adams' ChemAxiom), document components can be managed by a formal ontology. At one level this is a chemical spell-checker, at another a thesaurus, at another a converter between scientific units, and at yet another a transformation tool between scientific concepts.

Chem4Word emphasizes semantics by using CML as its exposed data model

Current chemical toolkits require a fixed data model for objects. C4W communicates with CML (and other XML) as its data model. This gives a declarative programming model where there are no side effects. Effectively this is a new programming language for chemistry, both formal and flexible.
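
To show what “CML as the exposed data model” looks like from the outside, here is a minimal CML molecule (water) built and serialized with Python's lxml. This is a sketch of the general shape of CML only; CML-Lite's precise rules govern what C4W actually accepts.

    # Sketch: a minimal CML molecule (water) built with lxml. Illustrative of
    # CML's shape only; CML-Lite governs what C4W actually accepts.
    from lxml import etree

    CML_NS = "http://www.xml-cml.org/schema"
    CML = "{%s}" % CML_NS

    mol = etree.Element(CML + "molecule", id="m1", nsmap={None: CML_NS})
    atoms = etree.SubElement(mol, CML + "atomArray")
    etree.SubElement(atoms, CML + "atom", id="a1", elementType="O")
    etree.SubElement(atoms, CML + "atom", id="a2", elementType="H")
    etree.SubElement(atoms, CML + "atom", id="a3", elementType="H")
    bonds = etree.SubElement(mol, CML + "bondArray")
    etree.SubElement(bonds, CML + "bond", atomRefs2="a1 a2", order="S")
    etree.SubElement(bonds, CML + "bond", atomRefs2="a1 a3", order="S")
    etree.SubElement(mol, CML + "formula", concise="H 2 O 1")
    etree.SubElement(mol, CML + "name").text = "water"

    # The same XML is what the engine, the document and external processors see.
    print(etree.tostring(mol, pretty_print=True).decode())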

Chem4Word is modular

The graphics and UI are decoupled from the chemical engine. This means that commands can be issued to the engine from sources other than the UI. The document is also modular: it's possible to examine the chemistry, the links and the tags all as XML, and to build document processors independent of Word.

Chem4Word supports validation

All CML has to conform to a schema (CML-Lite) and can be validated at every stage. The import pipeline takes 4-5 stages, with validation and normalization. It is impossible to import or author an invalid file. This is intended as an important contribution to bringing much-needed quality into chemistry.
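
Outside Word, the same validate-at-every-stage discipline can be reproduced with standard XML tooling. A sketch with lxml, assuming a local copy of the CML-Lite schema (the file names here are my assumptions):

    # Sketch: schema validation of a CML file with lxml.
    # "cml-lite.xsd" and "molecule.cml" are assumed local files.
    from lxml import etree

    schema = etree.XMLSchema(etree.parse("cml-lite.xsd"))
    doc = etree.parse("molecule.cml")

    if schema.validate(doc):
        print("valid CML-Lite")
    else:
        # Reject invalid input, as the C4W import pipeline does.
        for error in schema.error_log:
            print(error.line, error.message)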

Chem4Word integrates text, chemistry and styles

The Word document introduces ChemistryZones: chunks of the document representing chemistry. These are all backed by a CML object which itself can have many components, currently:

  • single molecule

  • compound molecule (salts, hydrates, complexes)

  • formula

  • name

Each of these can be displayed in a chemistry zone, making it possible to change the representation of an object, while preserving the semantics. The Navigator allows the user to select a given zone or to navigate from it.

Current functionality

The current project had to balance functionality, semantics and aesthetics and has put most emphasis on semantics. The primary functionality is currently:

  • manage gallery, navigator and other Word concepts

  • create chemistry zones

  • import CML molecules

  • validate them

  • render them, with different styles in different zones

  • tweak them (move atoms to prettify the molecule)

  • change atoms

We have deliberately not (yet) introduced chemical editing tools, as we wish to get the UI framework correct and validate the semantics. With the large number of molecules now available (e.g. in PubChem) we can convert these to valid CML outside C4W and import them. This means that, unless chemists are working with new molecules, C4W will already support many of their authoring needs.

The future

The current project runs for another few months, at the end of which we'll have a release version. (We shall make the current version available to a few pre-alpha collaborators.) A major emphasis is to create a distribution which is well designed for development, even if that means limiting the initial functionality. We'll work hard on developing use cases where C4W is useful, especially in the creation of compound documents.

We'll tell you where this is going after that.

This blog authored with ICE + Open Office; thanks to Peter Sefton and USQ

(Note: Just when I thought I had the ICE plugin working, it now fails to post. I think this may be due to firewalls or something else, but I can’t grab the error message as it disappears. So I have to cut and paste. I think that’s why the fonts go wonky)

Posted in "virtual communities", Uncategorized | Leave a comment

Three days to save the European Internet

Two days ago I had no idea the European Internet was under severe threat, and I'm a European. Part of the problem is that Europe is incredibly complicated and the governance is baroque and bizarre. It uses terms like “acquis communautaire”; admittedly I suffer from Anglophone blindness, but in any language the complexity of terminology and governance is horrendous.

The normal thing most Brits do is ignore it. I have a cosy feeling that continentals are better educated about it, but that's probably false. So we have a governance process that's out of control. They pay themselves huge allowances and are regularly corrupt, but as a war baby I reckon that's a small price to pay for not carpet-bombing civilians. Yes, the UK tabloids regularly bash the Common Agricultural Policy, etc., but…

I was shocked out of my complacency when the issue of Software Patents in Europe arose. I went to UCL (London) to hear Richard Stallman talk on this and was embarrassed to find an American who knew how European government worked. He knew where the power lay, with the Council of Ministers (who are unelected), etc., and he gave us clear instructions as to how best to mobilise.

Now we are at it again. Although I'm an educated citizen of Europe, I don't know how best to promote my views. But one of the great powers of the Web is that it promotes e-democracy. Not only can anyone say what they want, but groups can use crowdsourcing to assemble arguments and advocacy. So I know that I can read up rapidly on the issues and decide what the best use of my very limited efforts is. (Here I think it's mainly raising the issues on this blog and writing as an individual to my MEP.)

I've found Twitter very useful here. 2-3 followers have, in the rather cryptic style of Twitter, pointed out that there are two issues:

  • Net neutrality

  • 3-strikes

Both are evil, but the wisdom seems to be that net (non)neutrality is even more evil. What's NN? Here's a helpful site (http://www.savetheinternet.com/=faq). Essentially Net Neutrality is about the infrastructure of the net as provided by companies such as telcos, which by default do not have our interests at heart.

From the site:

What is Network Neutrality?

Network Neutrality — or “Net Neutrality” for short — is the guiding principle that preserves the free and open Internet.

Put simply, Net Neutrality means no discrimination. Net Neutrality prevents Internet providers from blocking, speeding up or slowing down Web content based on its source, ownership or destination.

Net Neutrality is the reason why the Internet has driven economic innovation, democratic participation, and free speech online. It protects the consumer’s right to use any equipment, content, application or service on a non-discriminatory basis without interference from the network provider. With Net Neutrality, the network’s only job is to move data — not choose which data to privilege with higher quality service.

Who wants to get rid of Net Neutrality?

The nation’s largest telephone and cable companies — including AT&T, Verizon, Comcast and Time Warner — want to be Internet gatekeepers, deciding which Web sites go fast or slow and which won’t load at all.

They want to tax content providers to guarantee speedy delivery of their data. They want to discriminate in favor of their own search engines, Internet phone services, and streaming video — while slowing down or blocking their competitors.

These companies have a new vision for the Internet. Instead of an even playing field, they want to reserve express lanes for their own content and services — or those from big corporations that can afford the steep tolls — and leave the rest of us on a winding dirt road.

The big phone and cable companies are spending hundreds of millions of dollars lobbying Congress and the Federal Communications Commission to gut Net Neutrality, putting the future of the Internet at risk.

Isn’t the threat to Net Neutrality just hypothetical?

No. By far the most significant evidence regarding the network owners’ plans to discriminate is their stated intent to do so.

The CEOs of all the largest telecom companies have made clear their intent to build a tiered Internet with faster service for the select few companies willing or able to pay the exorbitant tolls. Network Neutrality advocates are not imagining a doomsday scenario. We are taking the telecom execs at their word.

And you should read more.

Here's an analogy. I shall start my journey to BioIT on two trains, East Coast Capital Connect (used to be British Rail) and Transport for London (the tube). Each makes up its own rules as to what services operate and what the fare structure is. For example, if I want to travel from Cambridge to London they decide that I cannot have a cheap fare at certain times, even though I have a concession. So as a class of citizen I am discriminated against in favour of corporate passengers (customers). That's Train non-neutrality.

If I travel at the wrong time I incur a penalty. Let's call that a strike. And let's assume that a company decides that a recidivist breaker of this rule gets banned from travelling. That's a per-person decision, and somewhat analogous to the three-strikes rule. There may be good reasons for wanting to ban individuals: repeated disorderly behaviour, for example. I don't know, but I expect there are people banned from rail travel.

So in writing to my MEP I referred him to a summary of the issues: better than trying to explain them myself when I don't know what's being voted on, when, and by whom.

I hope he knows.

Posted in "virtual communities", Uncategorized | Leave a comment

Three Strikes and You Are Disinternetted

I am learning more about the proposed (and terrifying) control of the European Internet by vested interests through ISP control. Since I've only known about this for one day, forgive errors in the learning process…

Here's a useful, recent post explaining it.

In simple terms, if you infringe copyright three times you can be cut off from the Internet by your ISP. I don't know for how long. Probably for ever. Or at least for as long as the foul-ups on credit ratings and gas bills to people with no gas.

This is the modern equivalent of having a hand cut off for stealing, or being transported to Australia.

So what's the problem? None of us infringe copyright, do we? We sit through those awful adverts on rental DVDs (“you wouldn't steal from a pensioner, so don't infringe our copyright”).

I try to abide by copyright. But it's technically hard. The copyright on many works is highly dubious. Anyone can add copyright notices to the public domain with a very good chance of making it stick. Publishers such as Wiley and ACS require their copyright to be added to supplemental scientific data. And, as the Shelley Batts case (“Sued for 10 Data Points”) shows, they pursue those who copy it.

So it's entirely possible that if I copy a graph or a table of numbers or a spectrum from a Wiley paper I will be infringing “their” copyright. Now all they have to do is tell the ISP to cut me off. I will have to argue with an ISP who has no knowledge of or interest in the rights and wrongs and simply sees “Copyright Wiley” stamped on the document.

And it will get worse. Just as with universal video cameras, the front-line policing of the Internet could be through ISPs.

I will be a second-class citizen for the rest of my life.

Posted in Uncategorized | Leave a comment

Dear MEP, Please Save Our Internet

Glyn Moody has confirmed that there is a real threat to the European Internet:


Glyn Moody says:

April 25, 2009 at 4:14 pm

Yes, this is very serious. It's threatening Net neutrality – which means that bits are passed end to end without caring what they are. Telecoms companies want to be able to block certain kinds of traffic, or charge more (for Skype, for example). This is likely to kill innovation.

Here's my commentary from a few weeks back, together with the letter I wrote to my MEPs. There's also a link to what Sir Tim Berners-Lee said on Net neutrality, and why it matters:

http://opendotdotdot.blogspot.com/2009/03/save-european-internet-write-to-your_30.html

Feel free to contact me with questions.


I have therefore written to my member of the European Parliament (MEP):


Dear Andrew Duff,

I am writing to urge you to vote to keep Europe’s Internet free when the issue is voted on May 5th.

I am sure you are aware of the issues, which are summarised on http://www.blackouteurope.eu/.

I am a chemist in the University of Cambridge, deeply involved with the Digital Information Environment, and have been supported by grants from the UK's eScience programme (DTI/EPSRC) and the JISC (www.jisc.ac.uk) to develop radically new information systems for scientists. The Internet and the new generation of Web activity (Sir Tim Berners-Lee's Semantic Web) are a revolution in human history; they will bring major advances and be a major generator of wealth. This will be critical in pulling the world out of recession.

Cambridge and the East of England are leaders in this revolution, which depends on uncontrolled and ultra-rapid innovation. This innovation depends absolutely on the free flow of information, typified by TimBL's “Linked Open Data”, where different information sources are “mashed up” through software. Often these systems are created by single, inspired programmers with nothing more than a PC connected to the Internet. Any barriers, however small, destroy this innovation completely. That is why many of us campaign for the removal of barriers, such as software patents, DRM and copyright, that restrict the digital revolution. For example, the Open Knowledge Foundation (http://www.okfn.org), which started in Cambridge, works to create the instruments and the advocacy for unrestricted knowledge. Any control over the flow of digital information will have major adverse effects on the innovation we are now seeing.


Besides the economic argument, the proposed changes will have a very serious impact on digital democracy. Again Cambridge has been in the forefront of developing new web-based democracy, as seen in mySociety (http://www.mysociety.org) with products such as TheyWorkForYou (http://www.mysociety.org/projects/theyworkforyou/), which gives immediate and comprehensive digital access to information on Westminster MPs' records and activities. (Some of their work is supported by the Cabinet Office.)

The future of our planet will also depend on access to data and its unrestricted use. A typical example is AMEE (http://www.amee.com/), again a single person initiative, which produces carbon and energy calculators which are now used by major multi-nationals to unify the reporting of carbon footprints.

The European debate is characterised by large multinationals with powerful lobbies arguing for wealth creation through restriction and monopolies. This is backwards-looking and slow-moving, and the economies and governments governed by this approach will be rapidly overtaken by those, like the new US administration, who espouse Openness as a primary approach.

Yours

Peter Murray-Rust


This blogpost was prepared with ICE+OpenOffice.

Posted in Uncategorized | 2 Comments

Is Europe's Internet in danger?

I got the following mail yesterday, which alerts us to a potential threat to our Electronic Freedom. I'd value a reality check. I have checked on Twitter and got a little feedback: it's a real issue. But like all Euro stuff I am sure it's complex. I haven't seen anything on EFF, for example. (EFF were active in fighting European Software Patents.)

So should I write to my MEP before May 5th? And if so, with what message?

Do NOT Touch The Internet
Citizens Ready For A Fight Across The European Union.

The privatization of the Internet could be decided in the next few weeks. On May 5th, the European Parliament will vote on a package of measures which will affect the national laws of all EU countries.


The European Parliament is about to give up our rights to open access to the Internet in order to protect the interests of entertainment and communications multinationals. A civil campaign by organizations from across the European Community will be launched simultaneously tonight in order to prevent the privatization of the Internet and to defend the democratic right of access to information and digital tools.

This campaign includes:
1 – An informative website explaining what the so-called “Telecommunications Package”, which is about to be voted on in the European Parliament, is, why it is so dangerous for the future of citizens, and the impact on their daily economy, as well as useful information and tools created by citizens for citizens, including open letters, press releases, a blog and video material:
http://www.blackouteurope.eu/

2 – A letter to members of the European Parliament that is being sent by hundreds of organizations and citizens (one Euro-Parliamentarian acknowledged that they are receiving as many as 200 letters a day):
http://www.blackouteurope.eu/act/letter_to_meps.html

3 – An automatic system that allows citizens to send the letter directly to all the Members of the European Parliament:
http://blackouteu.wordpress.com/directly-send-to-parlamentarians/

4 – A system to monitor their vote in the matter:
http://blackouteu.wordpress.com/follow-their-vote/

And

5 – A Europe-wide casting call for grandmothers to create an inter-European viral video:
Fortheinternet.wordpress.com/

This blogpost was prepared with ICE+OpenOffice.

Posted in Uncategorized | 1 Comment

CLARION – our chemical data repository project

We were very pleased to be told recently that we had been awarded a grant from JISC for repository enhancement. It's CLARION (Chemical Laboratory Repository In/Organic Notebooks) and the JISC page is https://pims.jisc.ac.uk/projects/view/1276. We're in the process of uploading a project description to JISC, but here's some more informal background…

We believe that most chemistry data in most departments is valuable to science. That sounds like a platitude, but it's a tribute to the standards of training in chemistry and to the standards of those who develop chemical protocols and instrumentation. Chemists care about quality, and a single spectrum or crystal structure can be the ugly fact that slays a beautiful hypothesis (Huxley). These facts are largely reproducible, so that the same substance in different laboratories will give the same analytical data (crystal structure, spectra, composition). Of course there are exceptions, but by and large this works extremely well, to the extent that the publication process increasingly requires these data to be made available to reviewers and to readers.
And the data are born digital. They come out of machines as reproducible numbers. The semantics are not always explicit, but they can usually be added if this is done by the author. But all too often the data are emitted as unsemantic PDF, printed on paper, scribbled on with pencil, covered with coffee-mug rings and then published as some ugly bitmap. The poor reader then has to measure the peaks with a ruler.
I repeat. We are in the twenty-first century and we still use rulers.
That's because the data publication process is not yet developed. Perhaps I should say the data publication culture, because the tools are all there. We've done this for the whole of the department's crystal structures and put them in a repository (C3DER).
The structures are not yet all exposed, as we need agreement with the researchers. I'm sure this will be forthcoming readily: many have said it gives them a warm fuzzy feeling to make their data available. Usually it has to be done after publication (we don't expect everyone to adopt Open Notebook Science yet), and this needs culture and process.
So an important part of CLARION will be developing the means for working with scientists to expose their data at the appropriate time. CLARION will expand to include a variety of spectral data, both from central analytical services and from individual labs.
Another key aspect of CLARION is that we shall be integrating it with a commercial electronic laboratory notebook (ELN). We're in the process of evaluating offerings and expect to make an announcement soon. This will be a key opportunity to see how feasible it is to integrate a standard system with the needs of a departmental repository. The protocols may be harder, but we'll have the experience from the crystallography and spectroscopy.
An important aspect is that we are keen to develop the Open Data idea globally, and we'd be very interested to hear from other groups who are doing, or thinking of doing, similar things.

This blogpost was prepared with ICE+OpenOffice.

Posted in Uncategorized | Tagged | 1 Comment