CAS Discourages Using SciFinder to Help Curate Wikipedia

Antony Williams (Chemspiderman) is actively involved in creating Open chemistry. Here he reveals the limitations imposed by the American Chemical Society on creating Open data.

CAS Discourages Using SciFinder to Help Curate Wikipedia Structures and CAS Numbers

Posted by: Antony Williams in Uncategorized. Copyright ©2008 Antony Williams
Tonight I was catching up with my Watchlist on Wikipedia for the first time in a long time and noted that a comment had been added to the Wikipedia Project: CAS Validation page. This discussion page was started to have a place to discuss a second validation of my work by other members of the WP:Chem team and especially to deal with my concerns about CAS numbers not matching the structure drawn in the Chemical Box or Drug Box. Sometimes the CAS number might be for the chloride salt but the structure would be the neutral form for example. So, this was our discussion place. I believe there is general agreement by all participants at WP:Chem that CAS Numbers have value for the users of Wikipedia and chemists in general so the presence of a CAS number in the boxes makes absolute sense and, of course, the correct CAS number for the structure makes sense in an encyclopedia. Therefore, validation and sourcing of CAS numbers have been pursued.
A comment from Eric Shively at CAS can be found here online at Wikipedia. He comments:
Chemical Abstracts Service (CAS) objects to anyone encouraging the use of SciFinder® and STN® to curate third-party databases or chemical substance collections, including the one found in Wikipedia. SciFinder and STN are provided to researchers under formal license agreements, under which the researchers agree to refrain from using these tools to build databases. We urge and expect those researchers to respect the explicit terms of the agreements they have entered into. CAS is a division of the American Chemical Society. Please contact CAS if you have questions. Eric Shively, CAS, eshively@cas.org Eshively (talk) 20:56, 5 March 2008 (UTC)
It’s an interesting stance. This at a time when there is more focus on facilitating information exchange. In an environment where people are using resources such as Wikipedia to source information one would assume that the availability of CAS numbers would actually be encouraged rather than so blatantly discouraged. It’s been said before that CAS numbers are like the phone numbers of the chemistry world so if they were to be sourced from a vendor’s catalog would that be acceptable? And how would anybody know where they are sourced anyway? If they were sourced from a bottle of chemicals on the shelf and added to Wikipedia is that acceptable?
Nevertheless, as Mr Shively comments there are legal agreements in place and they are expected to be respected. Question: does every user of Scifinder read the agreement? When a large Pharma company licenses access to Scifinder for their users do they expect people to know the legalities of usage and train their users in such detail? Maybe…
As it is I am not a user of SciFinder…though I’d like to be. I think it’s an incredible resource. So, I don’t have to worry about the legal repercussions of using the system (yet). As it is I will continue my work of curating and I guess there will be a discussion now with the WP:Chem team about what to do about CAS Numbers.

PMR: I should at least thank the CAS/ACS for being so clear about their position – even though it is a simple NO. (It is usually impossible to get any replies at all from Closed Access and Closed Data publishers). In a previous post (Robert Massie on OA and PMR) I reported when Robert Massie commented on the value of Scifinder. Here the issue was that Scifinder (a search engine) and the content (Chemical Abstracts) were Closed, which in my opinion limits their use in Web2.0 applications – RobertM disagreed, saying that Web2.0 and Scifinder was not a binary decision.
Here the issue is that CAS identifiers have come to be accepted as a primary identifier system for chemistry – thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void – it cannot be worked out like an InChI. InChI and CAS serve different purposes – CAS can be related to any substance including mixtures of molecules such as kerosene – InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.
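To illustrate how little a CAS number encodes: as far as I know the only internal structure is the final digit, which is a checksum over the digits before it (each digit weighted by its position counting from the right, summed, modulo 10). The minimal Python sketch below tests that checksum; the function name is my own invention for illustration and is not part of any CAS tool. A passing checksum only shows that the number is well formed. It says nothing about which substance the number refers to, which is exactly the information that is closed.

```python
import re

# Pattern for the printed form: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit_ok(cas_number: str) -> bool:
    """Return True if the string is shaped like a CAS number and its check digit matches.

    Hypothetical helper for illustration: it validates the form only and
    cannot tell you what substance (if any) the number is assigned to.
    """
    # Tolerate the bracketed style used in this post, e.g. "[58-08-2]".
    match = CAS_PATTERN.match(cas_number.strip("[] "))
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1.
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit
```

For caffeine, [58-08-2], the weighted sum is 8×1 + 0×2 + 8×3 + 5×4 = 52, and 52 modulo 10 gives the final digit 2. A check like this can catch a mistyped digit in an infobox, but it cannot detect a well-formed number attached to the wrong structure (the salt versus the neutral form, say).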
CAS numbers are copyright CAS/ACS who have the legal right to regulate their use – as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS – about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly Pubchem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission).
An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.
I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.
If CAS do not adapt to the culture of the modern web tensions will continue to increase in the chemical information arena. RobertM has already hinted that there is systematic stealing of CAS material. I do not condone this, but neither do I condone the closed control of a valuable system of identifiers.

Posted in Uncategorized | 10 Comments

What does "Open" mean in Open Repositories?

  1. Continuing the discussion on this blog Robin Rice Says:
    March 6th, 2008 at 3:08 pm Thanks for the response to my comments, Peter. Of course you’ve got a lot of issues in there, and I look forward to hearing the keynote, but on the openness question, yes I think it’s undeniable that not all data can or should be made open. I just couldn’t help but notice its absence in your abstract to a conference about Open Repositories, and have heard you credited as the originator of the term Open Data as well. [see March 6th, 2008 at 3:08 pm for further comment]…

PMR: I suspect that “Open” is used in a variety of ways in “Open Repositories”. First from OR08:

Repositories play a pivotal role in the evolving scholarly information environment of open access research outputs and scholarly collections. With its theme of “Practice and Innovation”, OR08 will create an opportunity for practitioners and researchers to share experiences and to explore the challenges of the new scholarly communication.
During the four-day conference, Open Repositories 2008 will provide focused workshops and tutorials, followed by general conference sessions that cover cross-cutting and overarching issues and EPrints/DSPace and Fedora user group meetings.
The many repository platforms available today are changing the nature of scholarly communication. Institutions such as universities, research laboratories, publishers, libraries, and commercial organizations are creating innovative repository-based systems that address the entire lifecycle of information—from supporting the creation and management of digital content, to enabling use, re-use, and interconnection of information, to ultimately ensuring long-term preservation and archiving.
The conference program will cover the following themes:

  • transformational change in the knowledge workplace
  • professionalism and practice
  • sustainability
  • legal issues
  • successful interoperability
  • models, architectures and frameworks
  • value chains and scholarly communications
  • services built on repositories
  • use cases for repositories

PMR: The word “open” doesn’t actually occur here and the emphasis is on repositories.
From OR06:

The possibilities of internet-based collaboration have encouraged people to accept new levels of openness. In the current digital repository world that openness manifests itself in two ways:

  1. the desire to disseminate knowledge with a minimum of restrictions (open access)
  2. an open approach to developing the repository software tools (open source software development projects)

However in both cases that openness needs to be managed carefully. In both cases a variety of structures, licences and agreements are possible to enable openness.
Managing appropriate access to repository materials is key to the success of the repository. Researchers and analysts want some parts of their work openly available to the widest possible audience but want other parts only available to their collaborators. Protocols are needed to clarify this spectrum, and systems are needed to enable it. The boundaries of open access need to be expressed by formal agreements.
Open Source software development projects also require some structure to their openness. A variety of licences give form to the open source agreements, and governance structures are necessary for large projects. Higher education communities are taking responsibility for their own software development needs in collaborative “community source” projects. Commercial models also live harmoniously with open source projects.

PMR: This is clearer, but suggests that Open may refer both to Access and Source. It’s further complicated by the Open Archives Initiative which states:

The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. OAI has its roots in the open access and institutional repository movements. Continued support of this work remains a cornerstone of the Open Archives program. Over time, however, the work of OAI has expanded to promote broad access to digital resources for eScholarship, eLearning, and eScience.

PMR: “Open” can therefore refer to Standards and architecture. I’d be grateful for comment from others who have been to past ORnn meetings.
Moreover “Open” is often a political rather than an algorithmic term. Some proponents of “Open Access” maintain that access to human eyeballs (no price barriers) is sufficient – others (like me) require that it also removes permission barriers (i.e. the information can be re-used without hindrance). I have discussed my views of “Open Data” in Open Data in Science (preprint) (and, yes,  I was one of several people who independently started to use the term).
There are many aspects of data that I can discuss and the balance may change as a result of these discussions. At present what is foremost in my mind is that we have few effective data repositories in institutions. (There are several effective domain repositories). Extending the practice of “Institutional Repositories” – designed to hold single copies of PDF manuscripts – will not address data – we need fundamentally different approaches.
Data are different from manuscripts. Data need managing from the start of the experimental process. It is most unlikely that they will be Open at this stage. (I am a great supporter of Jean-Claude Bradley’s Drexel CoAS E-Learning: Open Notebook Science – where the data is made open as soon as it is collected – and have tried to emulate it but it requires completely new approaches).  Therefore almost by definition data is initially hidden.
I shall argue that the conventional model where information is “put” into “repositories” is the wrong design – certainly for data.  Repositories have to be part of the scientific process – the key person is the scientist. This will actually make much more data open – and I’ll show how this can be nurtured.


Acta Crystallographica E is Open Access

We work closely with the  IUCr – International Union of Crystallography – one of the scientific unions (see International Council for Science) that has done much to develop new approaches to publication. The IUCr publishes a wide range of journals and runs a hybrid Open Access model where authors can pay for full (gold) Open Access.
The IUCr has  always had the policy that data should be Openly available and besides implementing this on its own journals works through the wider crystallographic community to encourage this. As a result major publishers such as the Royal Society of Chemistry and the American Chemical Society expose the data associated with crystallographic work. (In contrast Wiley, Elsevier and Springer do not publish freely accessible crystallography). This exposure of data has allowed us to create the CrystalEye system where machines collect crystallographic data as it is published.
Recognising the  value of publishing data the IUCr created  Acta Crystallographica, Section E, Structure Reports online (journal home) which has the primary purpose of enabling  crystallographers and chemists to publish simple, high-quality, reports of individual crystal structures. All submissions are refereed by humans and robots. There are now hundreds of articles per month leading to a rapid and highly valuable publication of scientific work. This is an important model for the future – I have found people in many fields who wish to publish work that is data-centric – it may not in itself create new scientific paradigms but it is the bedrock on which progress is made. And critical for the scientific semantic web.
In effect the publication of crystallographic data provides the basis of a semantic crystallographic web. We convert the format to XML and then (Andrew Walkinshaw) to RDF. This is giving us a large semantic resource with power beyond the conventional dissemination of aggregated chemical and crystallographic data. More of that later.
This post is to congratulate the IUCr in moving Acta E to full (author/funder pays) Open Access with CC-BY licence. The fee is modest (150 USD) and reflects the real costs of publication.
It will help us to enhance CrystalEye in that any information (including images such as structural diagrams) can now be re-used without permission. In effect CrystalEye becomes an overlay journal for a range of crystallographic publications. We point back to the complete article on the Acta E site, knowing that anyone in the world can then have access to the full publication. (For the record, we also point to closed access articles from other journals).
So compliments to the IUCr. Acta E should be an excellent model for learned societies who wish to develop data publication.


Should Data Repositories be Open?

  1. In a comment to my last post (OR08 abstract) … Robin Rice Says: March 5th, 2008 at 4:51 pm Hi Peter,
    I’ve a question. What happened to the impetus for open data in this abstract? This looks like a useful set of solutions for storing/managing/curating data within research centres but not necessarily for disseminating or publishing that data. Repository services could play a role with that, by either
    packaging up some of those long tail datasets and making them accessible now and in the future (after the researchers have moved on to new projects), or by using the embargo features that repository software offers to make data available after the date of publication of a paper on which it’s based, or to create metadata records for discovery, with access controlled by the researcher, as you suggest is often necessary.

PMR: Obviously I’m a fan of Open Data and Open Access but I don’t take it as axiomatic that all Repositories must be completely Open. The primary purpose (IMO) is that (scientific) repositories preserve information and that they should try to capture all meaningful output from an institution. Much of this is, necessarily, not Open in the first instance. There are, for example, theses (and the data associated with them) that are closed because of commercially sensitive information, humanly sensitive information, etc. and universities have managed this concern for many years. So it’s reasonable that some information may stay closed for a considerable time.
There is also a pragmatic aspect. Many scientists (e.g. in chemistry) would never put their data in an Open repository at the beginning. They fear being scooped (perhaps even by their own colleagues), being banned from publication by publishers who regard this as prior disclosure, or invalidating a patent application. To overcome that we have created an embargo process so that data can be stored and only disseminated later (in our eCrystals meeting with UKOLN and Soton 3 years was reported as probably tolerable to chemists). I hope that by carefully choosing the protocol it may be possible to lower this time gradually but it takes time and data.
Then – when the data come out of embargo – should they always be Open? I’d say yes, but there may be domain or community norms that militate against that, particularly in fields containing human data.
What is axiomatic, however, is that if we don’t capture it at all, then we cannot ever disseminate it, so my emphasis is on capture.
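The capture-first, release-later idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class and field names are hypothetical, the 3-year default simply echoes the figure reported at the eCrystals meeting, and real repository software will have its own embargo mechanisms.

```python
from dataclasses import dataclass
from datetime import date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later, handling 29 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 28 February.
        return start.replace(year=start.year + years, day=28)


@dataclass
class Deposit:
    """A captured dataset that stays closed until its embargo expires (hypothetical model)."""

    identifier: str
    captured_on: date
    embargo_years: int = 3  # assumed default, taken from the figure quoted above

    def release_date(self) -> date:
        return add_years(self.captured_on, self.embargo_years)

    def is_open(self, today: date) -> bool:
        # The data is captured immediately; only dissemination is delayed.
        return today >= self.release_date()
```

The point of the design is that capture and dissemination are separate decisions: the record exists from day one, and lowering `embargo_years` later is a policy change rather than a software change.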
When giving the talk I do not feel bound to the precise topics in the abstract – so I’ll probably mention Open Data. What is on my mind at the moment is the critical need to adjust the thinking that Institutional Repositories as currently set up will address the data capture problem. They won’t – and if they try they will be much less successful than the IRs have been at capturing PDFs or other fulltext. So the need for a new breed of Data Repositories is clear. They will look very different from IRs if they are going to succeed.


Thank you to hosts in Australia

Just got back from a fantastic 3.5 weeks in Australia where so many people looked after us so well. I’m more or less gutted (ended up with 7 presentations, all somewhat different, and a lot of discussions). It’s really helped me firm up ideas on Data Repositories (the Monash group – and ARCHER, ANDS, gave up most of a day to listen and talk). I like what they are developing as a Data Repository – a bit-store with different access technologies overlaying it (File system, SVN, SRB, etc.) We didn’t have time to explore details and I’d like to know more.
We’ve found several groups interested in data capture in chemistry and crystallography using SPECTRa and I’ll be debriefing to Jim Downing on the train to Bath (yes, leave at 0645 for an all day meeting on eCrystals with Simon Coles). Gets rid of the jet lag – simply transforms it into exhaustion which is easier to predict.


Repositories for Scientific Data (at OR08)

I am honoured to have been asked by Liz Lyon and Les Carr to give the opening keynote at Open Repositories 2008 at Southampton. In our group we have been very actively trying to work out what repositories in lab subjects might look like. It is quite clear that the current approach to institutional repositories, which was designed to manage fulltext, cannot and should not be extended for data. The theme of my presentation will be the need for a complementary, but separate, system of Data Repositories. The Abstract I have submitted is:

“Repositories for Scientific Data”
“Scientists are producing data at an ever increasing rate (“the data deluge”) due to automated instruments, image capture and simulation tools. This holds the promise of “data-driven science” where scientific discovery can be made by linking or mining existing data. The reality is, unfortunately, that almost all this data is lost. Although some publishers welcome data as an adjunct to “fulltext”, many do not and most do not have the domain expertise to store and curate the data. And although “big science” (such as high energy physics, geospatial imaging, genomics and structural biology) can often provide domain repositories (e.g. in bioinformatics) most science (“the long tail”) cannot.
There is an urgent need to address this problem. Current Institutional Repositories (IRs) are geared to storing and disseminating scholarly manuscripts and while some are prepared to accept other digital artefacts the practice is fragmented and does not scale. We need to define “Data Repositories” (DRs) which serve the interests of the scientists directly. This is highly domain-dependent and there is no one-size-fits-all solution. However there are some general principles.
* The DRs must be intimately embedded in the current practice of the scientists – ideally they should be invisible to them.
* They must directly support the scientific effort and be seen as doing so rather than being confused with metrics, business processes, etc.
* The people running them should be physically present in the scientific laboratories (“wearing lab coats”).
It is important not to overcomplicate with unnecessary middleware and metadata. The typical informatics toolset of a scientist includes Word/LaTeX, Excel, and the good old filing system – which with huge storage comes back into its own. Free text indexing tools will do as good a job of creating domain metadata as humans. Many departments are starting to introduce backup systems such as Active Directory, Samba or SVN which satisfy the most important user of the repository – the scientist themself. HTTP/REST is good enough for many departments. These tools are an excellent starting point to engage the scientists and show there is real benefit.
This is a new field and I shall review some of the current approaches, including work from our own group (in chemistry and crystallography). It is critical that prototypes are developed with sustainability in mind. This is difficult (it is rarely possible to get direct grants) but the tools are often well known and easy/free to install. In many cases it may be possible to “hide” the costs of data capture in other accepted activities (“backup”, “publication”, “thesis preparation”, “instrument maintenance”, “analytical services”, etc.). Good prior design is much cheaper than retrofitting “repositories” and can be seen to have an immediate benefit on quality of data, re-use and mashups, speed of thesis preparation, etc. Indeed, if good principles of data management are brought into the teaching and learning process (e.g. in final year projects) then the students themselves will provide much of the innovation and tools.
On the assumption that we can have an Internet connection there will be live demonstrations.

Much of this is due to my colleagues especially Jim Downing. I believe that there has been too much over-engineering and that we should look to simpler approaches based on common tools where possible. In many cases what the scientist wants is a “bit-bucket” where the data can be stored in the knowledge that they won’t be automatically lost when the desktop crashes or they change laptop. Most scientists will not have worked with a versioning system such as SVN and this may be an important productivity tool for managing manuscripts and theses. Access control is an unavoidable necessity (the lab down the corridor may be your worst competitors…) and it highlights the central requirement for any repository system – enough people embedded in the department who can actually fix the glueware on a regular basis. This is the central and costly challenge we have to solve.
(I’m tagging this as OR08 – I couldn’t find any other suggestion)
UPDATE. After I wrote this I realised I need to say more about RDF and ORE and will do so.


Make Openness part of the process

From Open Access News:

UAuckland to embed CC metadata for theses in its IR

03:43 01/03/2008, Gavin Baker, Open Access News

Michelle Thorne, University of Auckland embeds CC licensing, Creative Commons blog, February 28, 2008.

The University of Auckland has just announced that they have embedded Creative Commons licensing for all new submissions by PhD students into the university’s digital repository, ResearchSpace.
From the repository’s librarian Leonie Hayes:

“At the moment the showcase collection is PhD theses, there are nearly 800 in the PhD collection, most are open access. There are another 900 awaiting signoff from authors. When new graduates submit online they have a choice of adding a CC licence along with their consent for a digital copy.
We are also investigating application of Creative Commons licenses to our other digital collections.” …

PMR: This is the right way to promote Openness. Make it part of the process.  Put it into authoring tools, software for computation and simulation, for data analysis. So that when someone produces something useful Open Access and Open Data are part of the environment. It also helps to educate the generation of students and other scientists (though actually they are doing most of the educating – to us). 3-4 years of this and even the most conservative staff will become aware of the process.
We should do this in Blue Obelisk and other software. Create a licence which is emitted by every run of the program and requires conscious effort to remove (or setting a non-default input flag).
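A minimal sketch of what I mean, in Python. Everything here is hypothetical: the flag name, the notice text and the toy “result” are placeholders of my own, not an existing Blue Obelisk convention. The only point is the default: the licence statement appears on every run unless the user consciously asks for it to be left out.

```python
import argparse

# Placeholder notice; a real program would carry its chosen licence text here.
LICENCE_NOTICE = "This output is made available under an open licence (placeholder text)."


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Toy program that emits a licence notice by default."
    )
    # Suppressing the notice needs a deliberate, non-default flag.
    parser.add_argument(
        "--no-licence-notice",
        action="store_true",
        help="omit the licence notice from the output (not the default)",
    )
    return parser


def run(argv: list[str]) -> list[str]:
    """Return the output lines for one run of the toy program."""
    args = build_parser().parse_args(argv)
    lines = ["result: (the program's real output would go here)"]
    if not args.no_licence_notice:
        lines.append(LICENCE_NOTICE)
    return lines
```

Calling `run([])` includes the notice; only `run(["--no-licence-notice"])` leaves it out. Openness is then the path of least resistance, which is the same principle as Auckland building the CC choice into thesis submission.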


Proteopedia and BIOMOO at the Weizmann Institute

I was delighted to receive the following letter, with support from Joel Sussman...

Hi Dr. Murray-Rust,
I’m a student in Joel Sussman’s lab at the Weizmann Institute of
Science. Joel, Jaime Prilusky and I have developed Proteopedia, a
new online tool/database with the overall goal of making structural
biology clearer for chemists and biologists by linking textual
content to 3D structures.
(It’s best explained by clicking the green links on the main page at
www.proteopedia.org)
We came across your article “Chemistry for everyone”, and felt we
should share Proteopedia with you as we subscribe to the same belief
system of open scientific databases accessible to scientists, non-
scientists, and machines.
We would very much be interested in your opinion of Proteopedia, and
if you would be interested in a userid/password to edit some pages,
we would be very pleased.
Joel wanted me to send you his best regards, and to pass along that
he is currently abroad and has poor email contact.
Best regards,
Eran Hodis

PMR: Eran – I have briefly looked at your pages but because I am abroad cannot give a full commentary. However this looks potentially a very valuable resource. Like all such it will require hard work but it will be worth it. I am sure you will find people of like spirit in the Blue Obelisk community (http://www.blueobelisk.org) and feel free to ask questions or suggest ideas.
Much of my personal inspiration came from another graduate student at the Weizmann, working with JaimeP, Gustavo Glusman. Gustavo became the caretaker/wizard of BioMOO, the biologists’ virtual meeting place. See http://www.cryst.bbk.ac.uk/PPS95/vsns-pps/technology/biomoo.html for a feel of what it was like. Much of the technology was text-only, though in 1994 we were able to develop “portable” graphics for proteins, using RasMOL and MAGE with eMosaic browsers. As a result we “built” rooms in BioMOO and had a course involving 250 “students” from round the world in the “Principles of Protein Structure” course. Gustavo, Jaime and many others deserve much credit for these efforts – many of the ideas are still very relevant to the Open scientific world today.
So much of what has been accomplished has been done by students and I wish you every success.
PeterMR
NB Spelling is “Proteopedia”


Robert Massie on OA and PMR

From Peter Suber’s Open Access Blog:

Robert Massie on OA

15:39 26/02/2008, Peter Suber, Open Access News

InfoInnovation has blogged some notes on Robert Massie’s talk at the NFAIS Annual Conference (Philadelphia, February 24-26, 2008). Massie is the president of the American Chemical Society’s Chemical Abstracts Service (CAS). Excerpt:

…It turns out that from its beginnings in the 19th century until 1966, CAS’ abstracts were written by volunteer abstractors – a robust early example of user-generated content. True, Massie noted, today new standards for chemical information exchange are developing; open access repositories are growing; collaborative websites are emerging; and political/social pressures for more free access characterize the age. “But do [these trends] have to be opposed? Or assimilated?” Massie noted in particular an article that appeared this month in Nature – “Chemistry for Everyone.” In it, noted researcher Peter Murray-Rust argues that CAS is “incompatible with the requirements of Web 2.0”; that “closed publications, binary software and toll-access databases are being swept away by emerging philosophies and approaches.” But, Massie noted, universities are the Web 2.0 homeland, and SciFinder Scholar now serves over 1500 schools. Not only that – many sites in China have sprung up to provide information on how to break into the computer systems of major US universities in order to gain access to SciFinder. So, clearly, “young people in China like SciFinder a lot.”
Massie asserted that the question of Web 2.0 vs. traditional publications is “not a binary problem.” …

Comment (PeterS). I wish I had access to the full talk in order to see two parts in full context. First, what did Massie mean by asking whether the trends toward OA (or Web 2.0?) “have to be opposed? Or assimilated?” It sounds like he thinks opposition is unnecessary and unwise. But does assimilation mean adoption? Second, I’d like to see whether he went beyond a narrow response to Peter Murray-Rust’s claim that the new models were sweeping away the old, and offered a wider response to his argument that the new models were superior.

PMR: Like PeterS I only have this snippet to comment on – perhaps Robert can make his slides available? A few points:

  • SciFinder (Scholar) is a good and valuable product. It is de rigueur in chemistry departments. However it is also expensive and many institutions cannot afford it. (I believe that some countries manage a national deal).
  • The information cannot be re-used (it is protected by copyright). This prevents mashups, compilation of secondary resources, etc. It cannot be linked to in a Web 2.0 manner, tagged, etc.

I am prepared to believe the assertion about China. There is a hunger for scholarship. I would also assert that “young people in China like Pubmed a lot” is true.
I will not comment on the ethics or politics of the alleged Chinese actions. However it seems clear that, for whatever reason, scientific information is becoming a battleground. I have already suffered from getting the University cut off by the ACS publications server (for actions that were entirely legal and where the server behaved IMO in an automatic and inappropriate manner – it thought I was stealing info – I wasn’t).
There is clearly a cost to the closed publishing community in trying to protect its content. Whatever the rights and wrongs of copyrighting scientific raw data, it is clear that the content in Chemical Abstracts is won by the sweat of many brows and is copyrightable. If the Chinese students are trying to get this by hacking into subscribers rather than providers there is a threat to academic systems in general. I’m guessing, but I would assume there will be an increasing pressure in contracts for the subscriber to have to provide mechanisms to prevent misuse of the subscribed information. This, of course, goes beyond SciFinder and may have to be seen as a major concern of academia. Do we have to police our information sources in the same way as we police access to airplanes?
But where the information is free these arguments vanish. I’m not arguing for the complete abolition of copyright but there is increasingly little value for it in the promotion of scientific activity. That is why I have urged publishers to prepare for Open Access (and Open Data) as it seems inevitable. The costs (financial and social) of controlling access to what an increasing number of scientists regard as Open information will become unacceptable.


Australian update

Very brief. Had an excellent time in South Australia hosted by Philip Lock. Presented to Computer Science Department at UNISA and developed more ideas in “can machines understand chemistry”. A lot of synergy in discussion over XML, RDF, etc. Then a brief visit to the University of Adelaide to talk about eResearch. We spent time talking about what would be useful for Departmental Respositories (sic). This has helped me firm up my ideas well and I hope to post some detailed ideas soon. Shall be giving a talk next Monday (2008-03-03) at Monash on scientific data. Have been overwhelmed by hospitality and have found a lot of commonality in our discussions.
