the chemical semantic web has arrived! just do it NOW

I have been overwhelmed with excitement about the new maturity of semantic technology and RDF data that is available for our construction of the chemical semantic web. Note that I used to write “Chemical Semantic Web” with the assumption that it had to use the whole paraphernalia of the Semantic Web. But – as a newly discovered “scruffy” – I now know that we only need lightweight – very lightweight – approaches and this is usually labelled as the “lowercase semantic web”. So, from now on, I’ll probably write “csw”.
The vision is simple – make everything a URI and use RDF to support searches using the modern generation of tools. We’ve had several sessions – some under the theme “Linking Data”. The first contained tutorial material; the second – at which TBL and others demonstrated examples and vision – blew away all my scepticism.
Tim’s message was simple – don’t hang around – just do it NOW! Unfortunately I don’t have any material to hand and will have to rely on memory. But I have no doubt that we have the chance to transform the world of chemical information within months. And remember that we can now start using machines to help.
There’s a huge number of tools, and I am still struggling to know which to use. Here is some of the terminology (from ontoworld):

A SPARQL endpoint is a conformant SPARQL protocol service as defined in the SPROT specification. A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Therefore, a SPARQL endpoint is mostly conceived as a machine-friendly interface towards a knowledge base. Both the formulation of the queries and the human-readable presentation of the results should typically be implemented by the calling software, and not be done manually by human users.

So from what I can see

  • we find a URI (dereferenceable) linked to a set of RDF that we are interested in (e.g. “chempedia”)
  • point it at an endpoint (e.g. tabulator)

and then issue a query.
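The steps above can be sketched in a few lines – the endpoint URL and query here are purely hypothetical, but the mechanics (an HTTP GET with a `query` parameter, as the SPARQL protocol specifies) are essentially the whole thing:

```python
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query (the HTTP binding)."""
    return endpoint + "?" + urlencode({"query": query})

# A hypothetical "chempedia" endpoint and a minimal query over its RDF.
ENDPOINT = "http://example.org/sparql"  # invented for illustration
QUERY = """
SELECT ?compound ?formula WHERE {
  ?compound <http://example.org/prop/formula> ?formula .
}
LIMIT 10
"""

url = sparql_request_url(ENDPOINT, QUERY)
print(url)
```

In practice you would fetch that URL with any HTTP client and ask for a machine-processable results format; the point is that the entire “protocol” is just a URL.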
I’m probably wrong, but I’ll know more tomorrow. At least I am doing it NOW!

Posted in chemistry, semanticWeb, www2007 | 2 Comments

semantic chemistry – "dbchempedia" and crystaleye

An obvious requirement for the chemical semantic web is that we have chemistry – non-trivial as most is in walled-gardens. But things have really moved in the last hour. I left a message on Martin Walker’s Talk on WP, then at lunch met two of the semantic wiki-people – Chris Bizer who is creating dbpedia and Denny V who is creating a semantic chemistry Wiki in Karlsruhe. By the end of lunch Martin replied as below:

  1. Martin Walker Says:
    May 11th, 2007 at 7:45 pm. As far as I can tell, there are around 3000 compounds with chemboxes, and over 2000 with drugboxes. I think we have many compounds on WP without chemboxes, but they are typically very brief articles (stubs) with little information. Of course linking into the mainstream of chemical information, as dbpedia seeks to do, may provide an incentive for more wikichemists to work on adding chemboxes. Sounds great!
    Martin A. Walker (Walkerma on WP)

So now they are all in touch and will work out a way that chemistry infoboxes on WP can be extracted into RDF. That will be sensational. It will give everyone a semantic chemistry handbook. You’ll be able to search it with the next generation of RDF tools – these are no longer vapourware. TimBL has a “tabulator” which can browse an RDF triple collection, fetching from the web where necessary. There are many other tools, and people are looking for content for them. So chemistry could be an exciting demonstration.
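As a sketch of what such an extraction might involve – the wikitext fragment, field names and property namespace below are invented for illustration, not the real WP chembox template:

```python
import re

# A hypothetical fragment of a Wikipedia infobox (field names invented).
WIKITEXT = """{{chembox
| Name = Benzene
| Formula = C6H6
| MolarMass = 78.11
}}"""

def chembox_to_triples(wikitext, subject):
    """Turn '| key = value' lines of an infobox into crude RDF-style triples."""
    triples = []
    for key, value in re.findall(r"\|\s*(\w+)\s*=\s*(.+)", wikitext):
        predicate = "http://example.org/prop/" + key  # hypothetical namespace
        triples.append((subject, predicate, value.strip()))
    return triples

triples = chembox_to_triples(WIKITEXT, "http://example.org/compound/Benzene")
for t in triples:
    print(t)
```

The more regular the template, the more of this can be done with a single pattern – which is exactly the DBP message.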

At the same time we are looking at Nick Day’s RSS feeds from CrystalEye and it looks like these are great starting places for SPARQL et al.

Posted in chemistry, semanticWeb, www2007 | Leave a comment

dbpedia – structured information from Wikipedia => dbchem

I’m at a session at WWW 2007 on Linking Data – which I think will be enormously important for us. Something I had never heard of before (it’s new this year):
DBPedia
it scrapes 750,000 infoboxes from WP and turns them into structured RDF. The message is simple – the more implicit structure there is in WP, the easier it is for DBP to extract it. If there is a template for a given category (e.g. chemical compounds) then we can easily create an interface to extract structured RDF. For example DBP now has:

  • 1,600,000 concepts
  • 58,000 persons
  • 75,000 YAGO categories
  • 207,000 WP categories
and I am sure it will be relatively easy to extract the chemistry (Martin, how many compounds are there with infoboxes?)
DBP has a SPARQL endpoint, on an OpenLink Virtuoso server (I am sitting next to these guys). Typical Q:
“All German musicians born in Berlin in 19th Century”
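To see what such a query amounts to, here is a toy version in a few lines – the triples and predicate names are invented; a real endpoint would match the same kind of patterns against DBP’s extracted data:

```python
# Toy triples: (subject, predicate, object). All values invented for illustration.
TRIPLES = [
    ("ex:MusicianA", "ex:occupation", "musician"),
    ("ex:MusicianA", "ex:birthPlace", "ex:Berlin"),
    ("ex:MusicianA", "ex:birthYear", 1833),
    ("ex:MusicianB", "ex:occupation", "musician"),
    ("ex:MusicianB", "ex:birthPlace", "ex:Hamburg"),
    ("ex:MusicianB", "ex:birthYear", 1870),
]

def objects(subject, predicate):
    """All objects of matching triples, i.e. one basic graph pattern."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

# "Musicians born in Berlin in the 19th century", done as pattern matching:
subjects = {s for s, p, o in TRIPLES if p == "ex:occupation" and o == "musician"}
hits = [s for s in subjects
        if "ex:Berlin" in objects(s, "ex:birthPlace")
        and any(1800 <= y < 1900 for y in objects(s, "ex:birthYear"))]
print(hits)
```

SPARQL expresses exactly this kind of graph pattern, but declaratively and over millions of triples.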
Extensions include

  • free text search
  • COUNT()

Key components are:

  • All concepts are identified by URIs
  • All URIs dereferenceable over the web into a small RDF snippet.

The fantastic thing is that we now have a complete RDF resource FOR FREE. One example which was shown was “von Baeyer”, so whenever we refer to him we get his date of birth, history, probably even his FOAFs! DBP is becoming one of the central information hubs of the emerging web of data.
In that way DBP can become the “popular” chemical hub, while Pubchem-RDF will become the “specialist” chemical hub. Of course they will be linked and possibly even indistinguishable in some RDF snippets.
The queries are fantastic:
“A soccer player with #11 shirt in a club with a stadium of over 40,000 seats born in a country with over 10 M inhabitants”
Let’s think what the Blue Obelisk will be able to do for chemistry. TBL has said we can lash/mash things up “in an afternoon”. I am going to find out today what we can do with the chemistry we have got.
The other RDF resources in the same web are books, US census, geonames, CIA factbook, DBLP, dbtune, FOAF, Revyu
600 RDF triples. This is staggering. 100K links out of DBPedia.
And then in 2 months: music, gutenberg, SW-lifesci, flickr, eurostat, freebase, HTMLweb GRDDL, blogosphere (SIOC), musicbrainz…
So – let’s do dbchem…!!! There is still a lot for me to learn. There are starting to be several large hubs of links. Which is the hub for a community will depend on what they want and what they create.

Posted in semanticWeb, www2007 | 1 Comment

ChemZoo properties : treat as dangerous

Chemspider has recently replied in depth to my concerns:

ChemSpider as a part of Web 2.0 – and what is that Web 2.0 anyways?
In this blog I am going to excerpt from another blog (and bolded to identify) regarding ChemSpider (based on my previous post it’s the way of the blogosphere) and it’s non Web 2.0 status since pages from the ChemSpider blog are being excerpted in the same way.

I shall tackle the Web 2.0 issues separately but here I am concerned that the material and services produced by Chemspider are likely to be seriously misused by students. Chemspider states:

May I use your service in my teaching class ?

Absolutely. We would especially like the academic community to benefit from the information available on ChemSpider.

Now students – especially those starting their courses – are likely to accept what they read on the web. When they access a company who sells the calculation of molecular properties they assume that the molecules, calculations and metadata are of sufficient quality for them to use. They cannot be expected to have enough judgment to realise that a large number of the answers they get will be wrong, and that the definitions and explanations of the properties are wrong or unclear. While I, as an experienced scientist in chemoinformatics, realise how suspect much of the material and services (not just from Chemspider) is, students cannot.

Here are the “definitions” of some of the properties that Chemspider/ACD provide. The standard of description – the lack of units, metadata and experimental constraints – is below that which an undergraduate would be expected to present. I shall pick out a few examples:

PhysChem Properties (as defined by ACD/Labs):

The Partition Coefficient (LogP) is the equilibrium distribution of a solute between two liquid phases, the constant ratio of the solute’s concentration in the upper phase to its concentration in the lower phase. ACD/Labs provides access to logP prediction through their freeware ACD/ChemSketch and their LogP addon.

PMR: The Partition Coefficient is NOT LogP, it is P.

The Distribution Coefficient (LogD) is the ratio of the amounts of solute dissolved in two immiscible liquids at equilibrium. The distribution coefficient (logD) equation accounts for all possible partition coefficients (logP) that a system can obtain. For compounds containing a single ionizable group (acid/base) there are 2 partition coefficients or a single distribution coefficient accounting for the relative concentration of each species within each of the two possible phases

PMR: these two paragraphs do not make clear what the differences between D and P (or LogD and LogP) are. Moreover there is no mention that a P (or LogP) is meaningless unless the solvents are fully identified. (I assume that the non-polar phase is octanol but I cannot find this in this document or in the calculation of properties.) As P is temperature dependent it is necessary to report this – but I cannot find this in the data. (I assume it is 298 K but this is not mentioned.)
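For the record, the quantities the definitions above blur can be stated in three lines (assuming, as is conventional but unstated in the ACD text, an octanol/water system in which only the neutral form HA of a monoprotic acid partitions):

```latex
% Partition coefficient is a ratio, not its logarithm:
P = \frac{[\mathrm{HA}]_{\text{octanol}}}{[\mathrm{HA}]_{\text{water}}},
\qquad \log P = \log_{10} P
% Distribution coefficient of a monoprotic acid at a given pH,
% assuming only the neutral form HA enters the octanol phase:
\log D = \log P - \log_{10}\!\left(1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}\right)
```

Note that both quantities are undefined without naming the two solvents and the temperature.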

Polar Surface Area (PSA) is the measure of how much exposed polar area any two- or three-dimensional object has.

PMR: This is a fuzzy definition – the algorithm calculates a precise quantity but this gives no indication of how this is done. Different vendors will report different values for this quantity. The impression given is that the precise definition is unimportant. Note also that when this property is first displayed to the student there are no units (we try very hard to impress on students that all numeric quantities must have units).

Surface Tension is a property of liquids arising from unbalanced molecular cohesive forces at or near the surface, as a result of which the surface tends to contract and has properties resembling those of a stretched elastic membrane

PMR: Again a fuzzy definition; many educators would say it indicates that the student did not understand surface tension.

Molar Refraction is the equation for the refractive index of a compound modified by the compound’s molecular weight and density. Also known as the Lorentz-Lorenz molar refraction.

PMR: Molar Refraction is NOT an equation. Any student writing stuff like this would get near-zero marks.

Now I appreciate that Chemspider is “beta” which means that they want the community to correct their bugs, but it is not fair to encourage students to be part of it. For example if you want properties of “sodium hydride” it will draw a picture looking like:

Na+ HH2

It is clear that this is NOT a copy of the pubchem entry (which is the normal NaH) but that the Chemspider software (or the ACD software) has taken a correct formula and displayed it incorrectly. Verify this for yourself but do not let students near it.

Chemspider will calculate properties for any compound, and many of these are meaningless. For example, try “prussian blue” and it will give a logP even though the stuff is an insoluble pigment. That is because the chemical formula has been represented as separated iron ions and cyanide ions. This may be useful for searching, but it is unacceptable for calculating properties.

So, in summary, you cannot rely on some of the properties calculated by Chemspider. For students that means you should not rely on any.

Posted in chemistry | 3 Comments

tbl+13 – the magic exposed (if not explained)

Here is the theme from TBL’s keynote: (slide 14/39)
tbl1.PNG
The Two Magics of Web Science. He used many examples of the slide below, emphasizing the cyclic nature of the process. Start at the top and cycle clockwise.
tbl2.PNG
This shows how Google evolved – a need (or issue) – developments in technology and social structure – success – and then the dark side (in this case spoofing) starts to foul the system. Here’s a similar diagram for wikis
tbl3.PNG
The stars (dotted lines) are the magic – you need both. Creativity has been common but collaboration – real collaboration – is rarer. Sometimes it evolves – e.g. in flickr, where people tag each others’ photos – sometimes it’s built into the system, as in bioinformatics. Here’s blogs
tbl4.PNG
and here is the bottom-up – or lowercase – semantic web.
tbl5.PNG
So the challenge for the Blue Obelisk community and the blogosphere is what we can do to maximise the chance of building – or evolving – collaboration. Of course we have that on Sourceforge but we need to move to a wider community – the excitement of the blogosphere – the Blue Obelisk Cemetery, etc.
When I tell people here about the coherence, excitement and quality of the chemical blogosphere they are impressed. There are certainly other communities in other disciplines but Chemistry can feel it’s making excellent progress. In the next posts I’ll explain how easy it is to create the lc-semantic web and what benefits it will bring us. Don’t be frightened of SPARQL (a query language), RDF (the basis of the semantic web) or OWL (an ontology reasoning language). There are – now – sufficient tools to tackle this.

Posted in semanticWeb, www2007 | Leave a comment

WWW 2007 Presentation

[This is roughly my presentation for the meeting, with conclusions. I may edit it during the day so early feed readers will have captured early versions]
The presentation concentrates on science, but applies to all scholarly journals. It addresses copyright and licenses; patents are a completely separate issue.
Background Resources:

petermr posts:

We must act:

  • Need statement on Open Data (c.f. Open Access)
  • Funders must insist on Open Data
  • Institutions must insist that staff publish Open Data
  • Authors should use Science Commons Author Addenda in all data
  • Publishers should make all supporting information Open

In any case the scientific semantic web (2.0) will become so powerful it will ultimately sweep away twentieth century practices. Publishers, you have been warned.

Posted in open issues, www2007 | 1 Comment

Access to and re-use of Open Data in chemistry – impressions

Continuing the preparation of material for WWW 2007 …
It is almost universally held (see Open Data – Wikipedia) that facts cannot be copyrighted. It is common for scientific papers to be accompanied by “supporting information” or “supplemental data”. In most people’s vision, including many publishers, these are “facts” – melting point – molecular weight – amounts of compound obtained, etc. Not “creative works” – any scientist who is “creative” with their facts deserves no sympathy.
But some publishers see it differently. Here’s the American Chemical Society:

Electronic Supporting Information files are available without a subscription to ACS Web Editions. All files are copyrighted by the American Chemical Society. Files may be downloaded for personal use; users are not permitted to reproduce, republish, redistribute, or resell any Supporting Information, either in whole or in part, in either machine-readable form or any other form. For permission to reproduce this material, contact the ACS Copyright Office by e-mail at copyright@acs.org or by fax at 202-776-8112.

At least it’s viewable (but not usable) for free.
By contrast here’s Wiley:
Angewandte Chemie (no public supporting information)
… Blackwell
Chemical Biology and Web design
no public supporting information, but I could purchase the complete article and post it…

Quick Price Estimate

For a quick price estimate to reuse the content enter the information below and click Quick Price. To order, click Place Order.

I would like to…

  • send it in an e-mail
  • republish it in an academic coursepack
  • republish it in a book
  • republish it on a CD-ROM/DVD
  • republish it in a brochure or pamphlet
  • republish it in a journal or magazine
  • republish it in a newsletter
  • republish it in a newspaper
  • post it on a Web site
  • post it on an intranet site
  • purchase the article

No content delivery. This service provides permission for reuse only.

User type

  • Individual
  • Educational institution
  • STM signatory
  • Pharmaceutical corporation
  • Health care organization
  • Other organization/institution
  • Author of the article

Portion of the article

  • Entire article
  • Text extract
  • Any 1 figure
  • Any 2 figures
  • Any 3 figures
  • Any 4+ figures

Quick Price

$306.00

Presumably this sale can be made many times – once for each purchaser.

Elsevier… Tetrahedron – no public supporting info
… so …
There seems to be a complete lack of Open Data among these publishers. My recollection may be faulty but I thought that the data used to be more exposed. But the current reality is that major publishers expose virtually nothing of the data…
…One representative of Wiley told me that’s because they want to sell it back to us.

Posted in chemistry, open issues, www2007 | Leave a comment

The pit-bull and the pendulum

Continuing the preparation of my WWW 2007 panel material blogwise (and with apologies to those who have heard me before on this), the following epitomises the difference of interests in the Open/Closed Access/Data community. In 2004 Rudy Baum (C&EN: Editor’s Page – Socialized Science) wrote strongly against opening chemical data:

National Institutes of Health director Elias A. Zerhouni seems hell-bent on imposing an “open access” model of publishing on researchers receiving NIH grants. His action will inflict long-term damage on the communication of scientific results and on maintenance of the archive of scientific knowledge. More important, Zerhouni’s action is the opening salvo in the open-access movement’s unstated, but clearly evident, goal of placing responsibility for the entire scientific enterprise in the federal government’s hand. Open access, in fact, equates with socialized science. Late on Friday, Sept. 3, NIH posted its proposed new policy on its website, setting in motion a 60-day public comment period (C&EN, Sept. 13, page 7). Under the policy, once manuscripts describing research supported by NIH have been peer reviewed and accepted for publication, they would have to be submitted to PubMed Central, NIH’s free archive of biomedical research. The manuscripts would be posted on the site six months after journal publication.
Many observers believe that, if the NIH policy takes effect, other funding agencies will quickly follow suit. In short order, all research supported by the federal government would be posted on government websites six months after publication. This is unlikely to satisfy open-access advocates, who will continue to push for immediate posting of the research.
I find it incredible that a Republican Administration would institute a policy that will have the long-term effect of shifting responsibility for communicating scientific research and maintaining the archive of science, technology, and medical (STM) literature from the private sector to the federal government. It’s especially hard to understand because access to the STM literature is more open today than it ever has been: Anyone can do a search of the literature and obtain papers that interest them, so long as they are willing to pay a reasonable fee for access to the material.
What is important to realize is that a subscription to an STM journal is no longer what people used to think of as a subscription; in fact, it is an access fee to a database maintained by the publisher. Sure, many libraries still receive weekly or monthly copies of journals printed on paper and bound as part of their subscription. Those paper copies of journals are becoming artifacts of a publishing world that is fast receding into the past. What matters is the database of articles in electronic form.
As I’ve written on this page in the past, one important consequence of electronic publishing is to shift primary responsibility for maintaining the archive of STM literature from libraries to publishers. I know that publishers like the American Chemical Society are committed to maintaining the archive of material they publish. Maintaining an archive, however, costs money. It is not hard to imagine a scenario in which some publishers, their revenues squeezed at least in part by loss of subscriptions as a result of open-access policies, decide to cut costs by turning off access to their archives. The material, they would rationalize, is posted on government websites.
Which is, I suspect, the outcome desired by open-access advocates. Their unspoken crusade is to socialize all aspects of science, putting the federal government in charge of funding science, communicating science, and maintaining the archive of scientific knowledge. If that sounds like a good idea to you, then NIH’s open-access policy should suit you just fine.

“put the [] government in charge of funding science, communicating science, and maintaining the archive of scientific knowledge. If that sounds like a good idea to you, then [] open-access policy should suit you just fine.”
Well, I can’t see much wrong with that – it’s certainly a major theme of funding in the UK. It’s not the government alone, of course; there’s the splendid work being done by the Wellcome Trust and other funding bodies. There is the problem of cost, of course, and publishing and archiving cost money. But if a funding body funds research it has a right (and a duty IMO) to make sure that work is as widely available as possible for the longest possible time.
Of course not all publishers use words like “socialized science” – which sounds slightly strange in other countries. But, lest you think that this was a storm in a teacup 3 years ago we have (news @ nature.com – PR’s ‘pit bull’ takes on open access – Journal )


Nature
Published online: 24 January 2007; Corrected online: 25 January 2007 | doi:10.1038/445347a

PR’s ‘pit bull’ takes on open access

Journal publishers lock horns with free-information movement.
Jim Giles

The author of Nail ‘Em! Confronting High-Profile Attacks on Celebrities and Businesses is not the kind of figure normally associated with the relatively sedate world of scientific publishing. Besides writing the odd novel, Eric Dezenhall has made a name for himself helping companies and celebrities protect their reputations, working for example with Jeffrey Skilling, the former Enron chief now serving a 24-year jail term for fraud.
Although Dezenhall declines to comment on Skilling and his other clients, his firm, Dezenhall Resources, was also reported by Business Week to have used money from oil giant ExxonMobil to criticize the environmental group Greenpeace. “He’s the pit bull of public relations,” says Kevin McCauley, an editor at the magazine O’Dwyer’s PR Report.
Now, Nature has learned, a group of big scientific publishers has hired the pit bull to take on the free-information movement, which campaigns for scientific results to be made freely available. Some traditional journals, which depend on subscription charges, say that open-access journals and public databases of scientific papers such as the National Institutes of Health’s (NIH’s) PubMed Central, threaten their livelihoods.

Media messaging is not the same as intellectual debate.

From e-mails passed to Nature, it seems Dezenhall spoke to employees from Elsevier, Wiley and the American Chemical Society at a meeting arranged last July by the Association of American Publishers (AAP). A follow-up message in which Dezenhall suggests a strategy for the publishers provides some insight into the approach they are considering taking.
The consultant advised them to focus on simple messages, such as “Public access equals government censorship”. He hinted that the publishers should attempt to equate traditional publishing models with peer review, and “paint a picture of what the world would look like without peer-reviewed articles”.
Dezenhall also recommended joining forces with groups that may be ideologically opposed to government-mandated projects such as PubMed Central, including organizations that have angered scientists. One suggestion was the Competitive Enterprise Institute, a conservative think-tank based in Washington DC, which has used oil-industry money to promote sceptical views on climate change. Dezenhall estimated his fee for the campaign at $300,000–500,000.
In an enthusiastic e-mail sent to colleagues after the meeting, Susan Spilka, Wiley’s director of corporate communications, said Dezenhall explained that publishers had acted too defensively on the free-information issue and worried too much about making precise statements. Dezenhall noted that if the other side is on the defensive, it doesn’t matter if they can discredit your statements, she added: “Media messaging is not the same as intellectual debate”.
Officials at the AAP would not comment to Nature on the details of their work with Dezenhall, or the money involved, but acknowledged that they had met him and subsequently contracted his firm to work on the issue.
“We’re like any firm under siege,” says Barbara Meredith, a vice-president at the organization. “It’s common to hire a PR firm when you’re under siege.” She says the AAP needs to counter messages from groups such as the Public Library of Science (PLoS), an open-access publisher and prominent advocate of free access to information. PLoS’s publicity budget stretches to television advertisements produced by North Woods Advertising of Minneapolis, a firm best known for its role in the unexpected election of former professional wrestler Jesse Ventura to the governorship of Minnesota.
The publishers’ link with Dezenhall reflects how seriously they are taking recent developments on access to information. Minutes of a 2006 AAP meeting sent to Nature show that particular attention is being paid to PubMed Central. Since 2005, the NIH has asked all researchers that it funds to send copies of accepted papers to the archive, but only a small percentage actually do. Congress is expected to consider a bill later this year that would make submission compulsory.
Brian Crawford, a senior vice-president at the American Chemical Society and a member of the AAP executive chair, says that Dezenhall’s suggestions have been refined and that the publishers have not to his knowledge sought to work with the Competitive Enterprise Institute. On the censorship message, he adds: “When any government or funding agency houses and disseminates for public consumption only the work it itself funds, that constitutes a form of selection and self-promotion of that entity’s interests.”

So the pit-bull is loose – which way will the pendulum swing?

Posted in open issues, www2007 | Leave a comment

The reality of closed access

Here’s a typical example of getting information from the literature. Assume I don’t belong to a rich University and I need to find out about cystic fibrosis. I can go to the splendid Pubmed (MEDLINE)

PubMed is a service of the U.S. National Library of Medicine that includes over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources.

I type in “cystic fibrosis” and find 27497 articles! Here’s #7:

7: Dellon EP, Leigh MW, Yankaskas JR, Noah TL. Related Articles, Links
Abstract Effects of lung transplantation on inpatient end of life care in cystic fibrosis.
J Cyst Fibros. 2007 May 2; [Epub ahead of print]
PMID: 17481967 [PubMed – as supplied by publisher]

Sounds important, let’s read it. First we get the abstract (the summary):

[title and authors omitted]

BACKGROUND: The impact of lung transplantation on end of life care in cystic fibrosis (CF) has not been widely investigated. METHODS: Information about end of life care was collected from records of all patients who died in our hospital from complications of CF between 1995 and 2005. Transplant and non-transplant patients were compared. RESULTS: Of 38 patients who died, 20 (53%) had received or were awaiting lung transplantation (“transplant” group), and 18 (47%) were not referred, declined transplant, or were removed from the waiting list (“non-transplant”). Transplant patients were more likely than non-transplant patients to die in the intensive care unit (17 (85%) versus 9 (50%); P=0.04). 16 (80%) transplant patients remained intubated at or shortly before death, versus 7 (39%) non-transplant patients (P=0.02). Do-not-resuscitate orders were written later for transplant patients; 12 (60%) on the day of death versus 5 (28%) in non-transplant patients (P=0.02). Transplant patients were less likely to participate in this decision. Alternatives to hospital death were rarely discussed. CONCLUSIONS: Receiving or awaiting lung transplantation affords more aggressive inpatient end of life care. Despite the chronic nature of CF and knowledge of a shortened life span, discussions about terminal care are often delayed until patients themselves are unable to participate.

But I want to know more… maybe the data need re-interpreting … so let’s read the whole article …

Access Online Article
Effects of lung transplantation on inpatient end of life care in cystic fibrosis
Journal of Cystic Fibrosis, In Press, Corrected Proof, Available online 3 May 2007,
Elisabeth P. Dellon, Margaret W. Leigh, James R. Yankaskas and Terry L. Noah View Abstract
You must have cookies enabled on your browser to successfully login.
If you have a User Name & Password, you may already have access to this article. Please login below.
User Name:
Password:
  Cancel
Athens/Institution Login
Forgotten your User Name or Password?
If you do not have a User Name and Password, click the “Register to Purchase” button below to purchase this article.Price: US $ 30.00
Register to Purchase

… and it will cost you 30 USD…
So that – I hope – is an accurate depiction of the difference between Open (PubMed) and Closed (Journal of Cystic Fibrosis).

Posted in open issues, www2007 | 1 Comment

The importance of Open Data

(Note for true effect, go to the real live pages mentioned here).
Here is a page from the Canadian National Committee for CODATA (sent by Alison Ball). I’m going to choose just one of many data sources:

 

About This Database

This database is devoted to the collection of mutations in the CFTR gene and is currently maintained by Julian Zielenski, Anluan O’Brien and Lap-Chee Tsui as a resource for the international cystic fibrosis genetics research community. It was initiated by the Cystic Fibrosis Genetic Analysis Consortium in 1989 to increase and facilitate communications among CF researchers. The specific aim of the database is to provide CF researchers and other related professionals with up to date information about individual mutations in the CFTR gene and phenotypic data associated with CFTR genotypes. While we will continue to ensure the quality of the data, we urge the international community to give us feedback and suggestions. Since the purpose of this database is to facilitate research, we ask our colleagues to use the information with great discretion in clinical settings. Similarly, we ask those who are looking for genotype-phenotype correlation to exercise extreme care in interpreting the recorded data. For information related to this mutation database, please send an email to cftr.admin. For general information on cystic fibrosis, please use our linked sites. Previous website can be found here.

Comments or questions? Please email to cftr.admin
The Database was last updated at Mar 02, 2007

If you have never seen a bioinformatics database, try the following: mRNA(cDNA) and Polypeptide Sequence
and you might get something like:

Click in the following graph to get the CFTR mRNA(cDNA) sequence of 600nts


mrnapolypeptideimagesvc.png

The page offers interactive controls: enter the start and end nucleotide of the CFTR mRNA(cDNA) sequence, or the start and end amino acid of the CFTR polypeptide; move along the sequence (100–5000 nt) and zoom in or out (200/400 nt); and fetch a sequence-only copy as a DNA sequence or as a three-letter or one-letter polypeptide sequence.

The point here is that the data is of great interest to many people – by no means just scientists. It allows you to query mutations in every part of the gene. Imagine if this data were locked up behind a commercial firewall…

Posted in open issues, www2007 | 1 Comment