Why PubMed is so important in the NIH mandate – cont.

In "Why PubMed is so important in the NIH mandate" – which got sent off prematurely – I started to show why the NIH/PubMed relationship matters so much. To pick up where I left off…
The difference between PubMed and almost all other repositories is that, over many years, it has developed into a top-class, domain-specific information engine. Here’s a typical top page:
[Screenshot: the PubMed top page]
Notice the range of topics offered. Many of these are searches over collections of named scientific entities, such as genes, proteins, molecules and diseases. One really clever idea – at least two decades old – was that you search in one domain, bring back the hits, use them to search another domain, and so on: an early form of mashup.
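The "search one domain, then carry the hits into another" pattern survives today in NCBI’s public Entrez E-utilities interface: ESearch returns record IDs from one database, and ELink maps those IDs to linked records in another (for example, from PubMed to Gene). The sketch below only builds the request URLs and unpacks a parsed ESearch JSON response; it makes no network calls, and it ignores NCBI’s usage rules (rate limits, API keys). It is an illustration of the chaining idea, not a description of how the PubMed web page works internally.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez E-utilities (HTTP interface to PubMed, Gene, PubChem, ...).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Step 1: search one Entrez database and ask for a JSON list of record IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, ids: list) -> str:
    """Step 2: carry the hits from one database across to linked records in another."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def ids_from_esearch(response: dict) -> list:
    """Pull the ID list out of a parsed ESearch JSON response (empty if absent)."""
    return response.get("esearchresult", {}).get("idlist", [])


if __name__ == "__main__":
    # Prints the PubMed search URL for "p450"; fetching it and feeding the
    # returned IDs into elink_url("pubmed", "gene", ...) is left to the reader.
    print(esearch_url("pubmed", "p450"))
```

Keeping URL construction separate from fetching makes each hop of the chain easy to inspect and test before any request is sent.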
You can’t do this with Google. If you search for CAT you get all sorts of things. But in PubMed you can differentiate between the animal, the 3-base codon, the tripeptide, the enzyme, the gene, the scanning technique and so on, which vastly improves accuracy. You can search for CAT scans on cats. And there are non-textual searches as well: homology searches for sequences, similarity searches for molecules using connection tables, and more.
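To give a flavour of a non-textual search, here is a toy version of "similar molecules using connection tables". It is a sketch under simple assumptions, not PubChem’s actual method: each molecule’s connection table (atoms plus bonds, hydrogens implicit) is reduced to a multiset of bond descriptors, and two molecules are compared with the Tanimoto coefficient. Real chemistry engines use much richer structural fingerprints.

```python
from collections import Counter


def bond_features(atoms, bonds):
    """Turn a connection table into a multiset of bond descriptors.

    atoms: element symbols indexed by atom number, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples, e.g. (1, 2, 1) for a single bond between atoms 1 and 2
    Each bond becomes a descriptor such as "C-O:1" (sorted element pair + bond order).
    """
    feats = Counter()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        feats[f"{a}-{b}:{order}"] += 1
    return feats


def tanimoto(f1: Counter, f2: Counter) -> float:
    """Tanimoto (Jaccard) similarity on multisets: |A ∩ B| / |A ∪ B|."""
    inter = sum((f1 & f2).values())  # & keeps the minimum count of each descriptor
    union = sum((f1 | f2).values())  # | keeps the maximum count of each descriptor
    return inter / union if union else 1.0


if __name__ == "__main__":
    ethanol = bond_features(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    methanol = bond_features(["C", "O"], [(0, 1, 1)])
    print(tanimoto(ethanol, methanol))  # prints 0.5
```

Ethanol and methanol share the C-O single bond but not the C-C bond, hence a score of 0.5; a text search for either name would never find the other.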
Then there is the enormous economy of scale. Let’s say I search for p450 (a liver enzyme). I get 23,000+ hits. I can’t possibly read them all. But OSCAR, the chemistry text-miner, can. OSCAR can already read the abstracts, but now it will be able to read many more full texts as well. It can pass them on to chemistry engines, which pass them on to … and then on to …
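The "machine reads the hits, then hands them on" idea can be sketched as a chain of generator stages. OSCAR’s real interface is not shown here: the tagger below is a deliberately crude, hypothetical stand-in that matches words against a fixed list, and the record IDs in the example are made up. The point is the shape of the pipeline, where each stage consumes the previous one’s output and never needs a human to read 23,000 abstracts.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical mini-lexicon standing in for a real chemical name recogniser.
LEXICON = {"benzene", "caffeine", "ethanol", "warfarin"}
WORD = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

Record = Tuple[str, str]  # (record ID, abstract text)


def tag(records: Iterable[Record]) -> Iterator[Tuple[str, List[str]]]:
    """Stage 1: yield each record ID with the lexicon terms found in its text."""
    for rec_id, text in records:
        names = [w.lower() for w in WORD.findall(text) if w.lower() in LEXICON]
        yield rec_id, names


def aggregate(tagged: Iterable[Tuple[str, List[str]]]) -> Counter:
    """Stage 2: count mentions across the whole result set."""
    counts = Counter()
    for _rec_id, names in tagged:
        counts.update(names)
    return counts


if __name__ == "__main__":
    # Hypothetical records; in practice these would be the abstracts behind the hits.
    records = [
        ("hypothetical-1", "Caffeine and warfarin were measured."),
        ("hypothetical-2", "Ethanol was added; caffeine was not."),
    ]
    print(aggregate(tag(records)).most_common(1))  # prints [('caffeine', 2)]
```

Because the stages are generators, further stages (a structure lookup, a property calculator) could be slotted in between without holding the whole result set in memory.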
You can’t do that with institutional repositories or with self-archiving. They don’t have the domain search engines, they don’t have the comprehensive coverage, and they don’t emit the science in standard XML.
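Part of the value of emitting the science in standard XML is that a few lines of generic code can pull the same fields out of every record. The sketch below assumes element names as they appear in PubMed’s article XML (PubmedArticle, MedlineCitation, PMID, ArticleTitle, AbstractText, MeshHeading/DescriptorName); check them against the current DTD before relying on it.

```python
import xml.etree.ElementTree as ET


def text_of(elem) -> str:
    """All text inside an element, including text inside inline markup such as <i>."""
    return "".join(elem.itertext()) if elem is not None else ""


def summarise(xml_text: str) -> list:
    """Return one dict per PubmedArticle with PMID, title, abstract and MeSH terms."""
    root = ET.fromstring(xml_text)
    out = []
    for art in root.iter("PubmedArticle"):
        cit = art.find("MedlineCitation")
        if cit is None:
            continue
        out.append({
            "pmid": cit.findtext("PMID", default=""),
            "title": text_of(cit.find("Article/ArticleTitle")),
            # Structured abstracts have several AbstractText sections; join them.
            "abstract": " ".join(text_of(t) for t in cit.iterfind("Article/Abstract/AbstractText")),
            "mesh": [d.text for d in cit.iterfind("MeshHeadingList/MeshHeading/DescriptorName")],
        })
    return out
```

Fed the XML that EFetch returns for `db=pubmed&retmode=xml`, this would yield one small dict per article, ready for the next tool in the chain. Nothing comparable is possible when each repository emits its own ad hoc HTML.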
For science it is likely that we will need domain repositories, with domain-specific search engines, XML, RDF, ORE, the lot. That is the natural way scientists will work.
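To make the RDF point concrete, here is a sketch that turns a small bibliographic record into RDF triples serialized as N-Triples. The choices are assumptions for illustration, not a published PubMed mapping: the article’s PubMed URL is used as the subject, Dublin Core terms (`dcterms:title`, `dcterms:subject`) as predicates, and MeSH headings are written as plain literals rather than links to MeSH descriptor URIs.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(pmid: str, title: str, mesh_terms: list) -> str:
    """One title triple plus one subject triple per MeSH heading, one triple per line."""
    subject = f"<https://pubmed.ncbi.nlm.nih.gov/{pmid}/>"
    lines = [f"{subject} <{DCTERMS}title> {literal(title)} ."]
    for term in mesh_terms:
        lines.append(f"{subject} <{DCTERMS}subject> {literal(term)} .")
    return "\n".join(lines)
```

Once records are triples, statements from different domain repositories (literature, genes, compounds) can be merged into one graph simply by concatenating them.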
And PubMed – with its whole information infrastructure of MeSH, PubChem, Entrez, etc. – is so well constructed and run that it serves as an excellent example of where we should be aiming. It’s part of the future of scientific information and data-driven science.
