Can I data- and text-mine PubMed Central?

Until last week I had assumed that the NIH policy on access to publicly funded research grants full Open Access rights to anyone in the world. The works will be deposited in PubMed Central. PubMed Central has its own definition of “open access” and generally uses the phrase “public access” – which is operationally unclear.
Last week I learned at Dagstuhl that data- and text-mining of PubMed Central was blocked by the site itself – delegates had found that there is a maximum of two papers that can be downloaded before the IP address is blocked.
I’d very much like clarification (as I have found the NIH sites and elsewhere extremely difficult to navigate on a consistent basis). There is no explicit mention of the right to download material for data-mining, and a lot of verbiage about “consistency with publishers’ policies”, which is no help to scientists like me.
So – simply – when the flood of public depositions comes on stream after April 7 (obviously with some delay) can I text-mine them?
This is important. Biology is in critical need of machine help in reading papers. The bioscience community spends tens of millions of dollars (a figure mentioned at Dagstuhl) on annotating genomes, including the ontologies and lexicons. Without this we simply do not understand much of the science being published. It is hugely costly to use humans for this.
When George Bush signed the mandate he clearly envisaged that the information should be used for the benefit of human health…
…and this means text-mining.
So – simply – can I run my robots over the material deposited by mandate?

  1. Yes – without question or fear of reprisal.
  2. No – not at all.
  3. Well – um – err – it depends on each individual paper and each individual publisher, and nobody can give a clear answer.

The current answer appears to be 2 (I will be cut off mechanically). I suspect the real answer is 3. Note that although our group has been able to write robots that can understand chemistry, we are a long way from understanding publishers’ policies on access (mainly because many are designed to be unhelpful). So bulk mining is impossible, as we cannot differentiate publisher policies.
Please tell me I am wrong and that it’s really 1. If not, should we not prepare a case to the NIH – they have asked for submissions – asking them to assert that the answer is 1 and to make it clear? Perhaps the Open Knowledge Foundation should create a submission.
If the NIH aren’t prepared to do this then the “victory” is only the first step in a long struggle for liberating data.


update and OR08 postscript

I’m behind at the moment as I was grounded for a whole day in Amsterdam Airport. There is too much to blog about so this is an interim. In no particular order:

  • still need to finalise thoughts from Dagstuhl on text- and data-mining (especially wrt NIH policy and UKPMC)
  • what did I say at OR08 (appended below)
  • congratulations to Egon Willighagen on his doctorate (I’ll try to find space later)
  • unofficial (as always) meeting of the Blue Obelisk in Nijmegen
  •  ORE. The final OR08 day was on ORE.  In UCC we now have the TheOREM and OREChem projects. My self-appointed role is to keep ORE simple.

Also next week I am giving an invited talk at the UK Serials Group in Torquay (on Open Scientific Data, I think).
So here is what I think I said at OR08. (I’d appreciate knowing whether it was recorded.) Since about 20% was talk without any slide on the screen, it’s easy to show those slides – “a perfect and absolute blank” [1]. (Seriously, it’s good discipline to occasionally give a presentation without any visible aids.)
Here are my notes. I’m working towards a system where the notes are intertwingled with the menu of slides. So every few slides one of the notes appears in the menu. (I wish that more people were trying to create HTML-based slides – Slidy, S5, etc. don’t give me what I want, which is the freedom to choose what to show when I feel like it. Remember acetates? That was a good system. You could select them in different orders and even write on them to record new thoughts. PowerPoint stifles innovation.)
=============

“Repositories and Scientific Data”

Rough agenda:

  • What scientists actually do (esp. biologists and chemists)
  • Data loss
  • My use of repositories: (I have been using repositories for 20 years and they are far better than DSpace, Fedora, EPrints. I really wanted to show these in action)
    • PDB (Protein Data Bank)
    • SF (computer code using Subversion)

UNIT TESTS AND SUBVERSION (I wasn’t able to show unit tests – making sure that every time you write something it is (a) valid and (b) preserved.)
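To make the idea concrete, here is a minimal sketch of such a test – the file name and the checks are purely illustrative, not the tests I actually use:

    import unittest
    import xml.etree.ElementTree as ET

    DATA_FILE = "molecule.cml"   # hypothetical data file kept under Subversion

    class DataValidationTest(unittest.TestCase):

        def test_file_is_well_formed(self):
            # (a) valid: the file must at least parse as XML
            tree = ET.parse(DATA_FILE)
            self.assertIsNotNone(tree.getroot())

        def test_every_atom_has_an_element_type(self):
            # a simple domain-level check: every atom carries an elementType attribute
            root = ET.parse(DATA_FILE).getroot()
            for atom in (e for e in root.iter() if e.tag.endswith("atom")):
                self.assertIn("elementType", atom.attrib)

    if __name__ == "__main__":
        unittest.main()

Run the suite before every commit; (b) preserved then simply means the data only enters Subversion when the tests are green.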


  • What scientists want: pervasive data management
  • We have to build the systems and create the data market
  • Problems of data in current repositories (DSpace)
  • Examples from chemistry:
    • theses
    • publications

  • Demos of Semantic markup (OSCAR, CML, Prospect) (I managed to show OSCAR)
    • data
    • text

  • The LONG-TAIL in Science

      How data gets published:
    • it doesn’t
    • as supplemental
    • on web pages
    • through community / organizations

    You can only rely on a scientist having a knowledge of the following informatics concepts:

    • hierarchical filing systems and directories
    • Data-typing through file extensions
    • Addressing through URLs

    and tools

    • Word
    • programs with input and output
    • spreadsheets
    • click-on-the-web
    • Google full-text search
      The SPECTRa project:
    • survey
    • misconceptions
    • temporal disconnect between experiment and publication
    • community

    Bottom-up developments:

    • Open Notebook Science (Bradley)
    • Community through data (Neylon, Piwowar, Coles)
    • Blue Obelisk
    • Wikipedia

    Recording and reporting…NOT the same

    • electronic lab notebooks don’t work
    • Reporting in theses
    • Reporting as publication

    SEMANTIC WEB


    DBpedia (didn’t manage to demo)

      Text and image mining:
    • PubMed
    • A necessary step to create the scientific semantic web
    • Enhances the formal understanding of the subject
    • helps to create ontologies
    • enhances the value of existing repositories

      We shall not create the appropriate systems unless we know what people actually want…
    • What do scientists want?
    • What do their institutions want?
    • What does the LIS want?

    • RECOMMENDATIONS:

      • Train students to understand the value of information management
      • In their undergraduate projects
      • Create “repositories” with a natural structure
      • Gradually make tools more semantic – RACSO – ICE – WORD2007
      • Introduce validation / unit test for data
      • Use the thesis… academia’s primary advantage
      • Use free- and semi-structured text.
      • Always provide alternatives to PDF
      • Promote Open Data
      • Resources (all Googlable)
        “Open Data in Science” (Murray-Rust, on Nature Precedings, http://precedings.nature.com)
      • “Open Data” on Wikipedia
      • Science Commons, Open Knowledge Foundation
      • WWMM (WorldWideMolecularMatrix, http://wwmm.ch.cam.ac.uk) – Murray-Rust

    =========
    In summary the main ideas that I wanted to promote are:

    • Young people are the future. Help them create it
    •  Theses are a major opportunity. Use them. Don’t hand them over to commercial control
    • Create semantic resources. Don’t rely on PDF. Use Word or LaTeX. (I think this message is starting to be heard)
    • HELP the scientists at the beginning of their process – not just at the end.

    Most scientists have never heard the word repository, and those that have regard it with as much enthusiasm as “research assessment exercise” (with which it is synonymous in far too many places). Instead LIS staff should work with the scientists to solve their real problems – lost data, unsharable data, yet more lost data. You will have to start wearing white coats.
    =================
    [1] from The Hunting of the Snark (how lovely to find something out of copyright)
    He had bought a large map representing the sea,
    Without the least vestige of land:
    And the crew were much pleased when they found it to be
    A map they could all understand.
    “What’s the good of Mercator’s North Poles and Equators,
    Tropics, Zones, and Meridian Lines?”
    So the Bellman would cry: and the crew would reply
    “They are merely conventional signs!
    “Other maps are such shapes, with their islands and capes!
    But we’ve got our brave Captain to thank”
    (So the crew would protest) “that he’s bought us the best—
    A perfect and absolute blank!”


    OR08 "Repositories and Scientific Data" – the challenge of complexity

    I’m at OR08 – normally I would try to blog the meeting but (a) I am recovering from my presentation and (b) I’m off to NL to defend Egon Willighagen’s thesis. So these posts will be bitty…
    I was honoured to be asked to kick off the meeting and chose the title above. In my presentations I generally don’t know what I am going to say in detail – I have a menu of several hundred slides and choose on-the-fly, depending in part on the audience (their backgrounds, interests, etc.). This occasion took this to new heights.
    I’d prepared a number of interactive demos – OSCAR, OSCAR3, videos from JoVE: Journal of Visualized Experiments, Eclipse unit tests, and a movie from Andrew Walkingshaw. At least three of these had never been shown in public, so this was fairly ambitious.
    Andrew had prepared his movie on a Mac (and bought the pukka Apple movie generator). He mailed it to me – I downloaded QuickTime and iTunes on Vista. It showed a movie of polar bears in a thick snowstorm (i.e. white on white). No matter what I and Jim did in the small hours of the morning we couldn’t get it to run. So we planned to run it on Jim’s machine and switch the video when necessary.
    Then I found out that because the conference was oversubscribed there was an overflow room. So the presentations were geared to use PowerPoint running on a dedicated Mac…
    Um… So the Soton techies (who did a great job) linked my output not through the VGA but through a VPN over an RJ45 cable to the other room.
    Years of presenting have thrown up some fun situations. Henry Rzepa and I created chemical presentations (Mage, RasMol) for WWW1 at CERN in 1994. We prepared it two days in advance. Then, before the meeting, someone deleted the shared libraries (*.so) to “save space”. We got the demo working at – I think – T+5 (i.e. 5 mins after Henry had started presenting). Another was when a stewardess spilt coffee over my laptop – it recovered, except that the semicolon key crashed the machine and to type an “a” it had to be cut and pasted from existing text.
    You can see what is coming. The complexity of the system – movie -> Vista -> VPN -> Mac -> screen – bombed. We abandoned the VPN (in real time) and rebooted. When Vista reboots it is nice and leisurely, so it gives natural spaces for me to fill in. When it finally came up there were two displays – one on my machine and one on the screen. But they were different. So I had to drive the talk from the screen…
    I have no idea what I said, so I can’t blog it. I hope I got most of the messages over. I’ll try to remember some of the things I’d like to have said and blog this later.
    But it was fun for April 1…


    Structured Experiments and OR08

    I’m gathering data for my presentation at OR08. Having appealed to the readership of this blog and found zero responses 🙁 I’m now looking at other blogs. A very valuable post from Cameron Neylon …

    The structured experiment


    From Neil [Saunders]:

    My take on the problem is that biologists spend a lot of time generating, analysing and presenting data, but they don’t spend much time thinking about the nature of their data. When people bring me data for analysis I ask questions such as: what kind of data is this? ASCII text? Binary images? Is it delimited? Can we use primary keys? Not surprisingly this is usually met with blank stares, followed by “well…I ran a gel…”.

    Part of this is a language issue. Computer scientists and biologists actually mean something quite different when they refer to ‘data’. For a comp sci person data implies structure. For a biologist data is something that requires structure to be made comprehensible. So don’t ask ‘what kind of data is this?’, ask ‘what kind of file are you generating?’. Most people don’t even know what a primary key is, including me, as demonstrated by my misuse of the term when talking about CAS numbers, which led to significant confusion.

    I do believe that any experiment [CN – my emphasis] can be described in a structured fashion, if researchers can be convinced to think generically about their work, rather than about the specifics of their own experiments. All experiments share common features such as: (1) a date/time when they were performed; (2) an aim (“generate PCR product”, “run crystal screen for protein X”); (3) the use of protocols and instruments; (4) a result (correct size band on a gel, crystals in well plate A2). The only free-form part is the interpretation.

    Here I disagree, but only at the level of detail. The results of any experiment can probably be structured after the event. But not all experiments can be clearly structured either in advance, or as they happen. Many can, and here Neil’s point is a good one, by making some slight changes in the way people think about their experiment much more structure can be captured. I have said before that the process of using our ‘unstructured’ lab book system has made me think and plan my experiments more carefully. Nonetheless I still frequently go off piste, things happen. What started as an SDS-PAGE gel turns into something else (say a quick column on the FPLC).

    [… and a good deal more…]

    PMR: This is very important and I shall draw heavily on this and add my interpretation. Simply put, the whole idea of “putting data in repositories” is  misguided. It is not addressing the needs of the scientific community (and I’m not going to expand ideas here because they are only half formed).
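    To make Neil’s four common features concrete, here is a minimal sketch of such a structured record (the field names and example values are my own illustration, not a proposed standard):

        from dataclasses import dataclass, field
        from datetime import datetime
        from typing import List

        @dataclass
        class Experiment:
            performed: datetime                                    # (1) date/time of the experiment
            aim: str                                               # (2) e.g. "generate PCR product"
            protocols: List[str] = field(default_factory=list)     # (3) protocols used...
            instruments: List[str] = field(default_factory=list)   #     ...and instruments
            result: str = ""                                       # (4) e.g. "correct size band on gel"
            interpretation: str = ""                               # the only genuinely free-form part

        exp = Experiment(
            performed=datetime(2008, 3, 28, 14, 30),
            aim="run crystal screen for protein X",
            protocols=["sitting-drop vapour diffusion"],
            instruments=["crystallisation robot"],
            result="crystals in well plate A2",
        )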

     

    Cameron – I’d be grateful for any more thoughts on this issue – public or private. They will be attributed, of course. Your ideas will probably form the “front end” for the work that the Soton group has been doing so attribution will be important there.

     


    Why the triple needs to be a quint

    From Cameron Neylon:

    Semantics in the real world? Part I – Why the triple needs to be a quint (or a sext, or…)


    … This definitely comes with a health warning as it goes way beyond what I know much about at any technical level. This is therefore handwaving of the highest order. But I haven’t come across anyone else floating the same ideas so I will have a shot at explaining my thoughts.

    The Semantic Web, RDF, and XML are all the product of computer scientists thinking about computers and information. You can tell this because they deal with straightforward declarations that are absolute. X has property Y. Putting aside all the issues with the availability of tools and applications, the fact that triple stores don’t scale well, and all the other technical problems, a central issue with applying these types of strategy to the real world is that absolutes don’t exist. I may assert that X has property Y, but what happens when I change my mind, or when I realise I made a mistake, or when I find out that the underlying data wasn’t taken properly? How do we get this to work in the real world?
    [… lots more – on provenance, probability, etc. snipped …]

    PMR: In essence Cameron outlines the frustration that many of us find with the RDF model. It makes categorical assertions which have 100% weight and – in its default form – are unattributed. Here are three assertions:

    • The formula of water is H2O
    • The formula of snow is C17H21NO4
    • Snow is frozen water

    Assuming I have the implicit semantics that freezing does not change the chemical nature of a substance (not always true), these three statements taken at face value create a contradiction.
    I can remove the contradiction by introducing the semantics that a formula may be associated with more than one name and that a name may be associated with more than one formula. This, taken at face value, prevents us from making any useful inferences.
    What I have felt a great need for (echoing Cameron) is that the triple should be enhanced with two properties:

    • the provenance (the person or software making the assertion)
    • the weight of the assertion

    “At Dagstuhl it is continuing to snow.”
    If I pass this sentence to OSCAR it may mark up snow as a chemical substance. In doing so it now gives every annotation a weight based on the confidence (I shan’t explain how here). So, for example, it is much more likely that 2-acetylfoobarane is a chemical than HIV (hydrogen-vanadium-iodide) and OSCAR addresses these concerns.
    It’s possible to add provenance and confidence to RDF but I don’t know of a standard approach for doing this. If we start doing this we need to make sure we have consistent schemas.
    (Interestingly we’ve just been discussing the value of adding “strength of statement” to the results of text mining.)
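    One way to experiment with this today is RDF reification – a sketch only (the namespace and property names below are invented for illustration; I am not claiming this is a standard):

        from rdflib import Graph, Namespace, Literal, BNode, RDF, XSD

        EX = Namespace("http://example.org/")   # illustrative namespace, not a real vocabulary
        g = Graph()

        # the bare triple: "the formula of snow is C17H21NO4"
        s, p, o = EX.snow, EX.formula, Literal("C17H21NO4")
        g.add((s, p, o))

        # reify the statement so that provenance and a weight can hang off it
        stmt = BNode()
        g.add((stmt, RDF.type, RDF.Statement))
        g.add((stmt, RDF.subject, s))
        g.add((stmt, RDF.predicate, p))
        g.add((stmt, RDF.object, o))
        g.add((stmt, EX.assertedBy, EX.OSCAR))                            # the provenance
        g.add((stmt, EX.confidence, Literal(0.2, datatype=XSD.double)))   # the weight of the assertion

        print(g.serialize(format="turtle"))

    (Named graphs – i.e. quads – are the other obvious route; either way we need an agreed schema for the assertedBy and confidence properties.)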


    The high cost of the lack of open data

    From Peter Suber’s blog:

     

    The high cost of the lack of open data

    14:43 25/03/2008, Peter Suber, Open Access News

    The Value of Spatial Information, a report by ACIL Tasman prepared for Australia’s Cooperative Research Centre for Spatial Information and ANZLIC, March 2008.  (Thanks to Baden Appleyard.)  From the executive summary:

    …Constraints on access to data are estimated to have reduced the direct productivity impacts in certain sectors by between 5% and 15%. It is estimated that this could have resulted in GDP and consumption being around 7% lower in 2006-07 (around $0.5 billion) than it might otherwise have been….

    Comment.  These are big numbers and it takes a minute to put them in perspective.  In one country (Australia) in one year (2006-07), lack of OA to one kind of data (spatial data) cost the economy $500,000,000.

    PMR: I hardly need to comment. However in our current discussions at Dagstuhl on Text Mining and Ontologies in the Life Sciences it is clear how valuable Open Data is. It’s also clear how much the lack of open data in chemistry holds back innovation. I don’t have numbers, but it would be great to have an economist look at this…


    Text Mining and Ontologies

    A brief update. I’m privileged to have been invited to a meeting at Dagstuhl in Germany. It’s on Text Mining and Ontologies (I expect that we shall all post abstracts over the next few days). It’s a heavyish program and – rightly – wireless is forbidden in the seminar room so there won’t be many posts. I’m applying Chatham House rule although you can see the participants on the web.
    The discussion today included

    •  whether ontologies modelled reality
    • can we create ontologies from text-mining (general answer no, except in limited cases, but it may be a useful help)
    • whether ontologies should always be created by domain experts (generally yes – any contracted-out ontology is garbage).

    I presented our group’s work today – OSCAR is now well known and being used elsewhere. It’s nice to be in an area where software and resources are freely available. My optimism level about free knowledge in science has risen.
    Also some useful unattributed conversations about repositories – the computer scientists are not impressed with DSpace etc. and the resources spent by universities. I’m gathering ideas for my presentation at OR08 next week on Data Repositories. I am gearing up to generate lively discussion.


    Update on molecular repositories

    Catalysed by a recent comment on a 2007-12 post (Exploring RDF and CML):

    Tim Berners-Lee says (March 21st, 2008 at 1:09 am):
      Peter, sounds exciting! Do you have any public RDF molecule data others could explore? URIs? Tim BL

    here’s an update of where we are at with molecular repositories. (We shall have a clearer idea when several of our group present at Open Repositories 2008 (OR08) in 10 days’ time – lots of progress can be made in 10 days.) I’m omitting details here (so as not to spoil the show next month).

    • We are committed to RDF+XML/CML as the future for molecular information. This is the only way that we can manage such diverse information as documents, recipes, results of calculations, spectra, crystallography, physical and analytical properties, etc. The CML schema is now being used in many places and has remained stable for 15 months. Almost all parts have now been tested in the field (the main exception is isotopic variation – e.g. in geoscience). We can easily go from CML to RDF – the reverse is not always possible. The value of CML is that it is currently easier to use for chemical calculations as there is a knowable coherence of related concepts. Note that the CML community is developing a number of subdomains (“conventions”) which allows some degree of local autonomy as in CMLComp.
    • We are enthusiastic partners in the OREChem project (Chemistry Repositories, and from Jim Downing ORE! Unh! Huh! What is it good for?). This uses named (RDF) graphs to describe local collections (“Aggregates”) of URIs. The project will have several molecular repositories, of which we shall contribute at least two (CrystalEye and our nascent “molecular repository”). All content will be Open Data.
    • Jim Downing has developed a lightweight repository (MMRe) based on Atom/REST and RDF. I won’t give too much away except to say it is deployed and over the last few days Joe Townsend has been adding data from chemistry theses (SPECTRaT) and Lezan Hawizy has been adding our collection of “common molecules” (scraped from various sources). This can now be queried through SPARQL.
    • Andrew Walkingshaw has converted the CML in CrystalEye to RDF – 100,000 entries and probably about 10 million triples. He’s been working with a well-known semantic web company (not sure if this is public yet) and has done some very exciting extraction and mashups. SPARQL searches work at this scale. Andrew has also developed Golem – a system which extracts dictionary links (cml:@dictRef) from CML computations and is able to build dictionaries (ontologies) automatically and then to extract data.
    • In the last four days Thomas Steinke has converted VAMP to emit CML. We have run a few hundred calculations automatically (by extracting molecules from the NMREye repository, converting them to input, running the calculation, and then converting to RDF). The results – which contain coordinates, energies and NMR peaks – are being fed into another local repository.

    So we have a variety of sources which will all be available. We face a number of exciting questions.

    • How do we express a molecule in RDF? We are gradually converging on an “aggregate” where a molecule has identifiers, properties, and special resources such as chemical formula, the CML connection table, and a list of chemical names (a minimal sketch follows this list).
    • How do we assign identifiers? This is a really hard problem. Although for many chemicals there is little doubt about the relationship between names, identities and properties, there cannot, in general, be a “correct” structure or a “unique URI” for a chemical. Look, for example, at “phosphorus pentoxide” (in Wikipedia). Experiment shows that there are several different forms, with different chemical connectivities. There are two formulae (P2O5 and P4O10), each with a different CAS number (Chemical Abstracts is a major authority in chemistry). Are these different chemicals or do they represent our changing chemical knowledge? Is one used for early publications and another for later ones? Only CAS can say when one number is used and not the other. It is because of this uncertainty that we cannot know exactly how many different chemicals there are in the CAS collection.
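    Here is a minimal sketch of what such an aggregate might look like (the vocabulary is a placeholder of my own, not the schema we will actually publish):

        from rdflib import Graph, Namespace, Literal

        CHEM = Namespace("http://example.org/chem/")   # placeholder vocabulary
        g = Graph()
        mol = CHEM["phosphorus-pentoxide"]             # one URI for the whole aggregate

        g.add((mol, CHEM.name, Literal("phosphorus pentoxide")))
        g.add((mol, CHEM.name, Literal("phosphorus(V) oxide")))    # more than one name is allowed...
        g.add((mol, CHEM.formula, Literal("P2O5")))
        g.add((mol, CHEM.formula, Literal("P4O10")))               # ...and more than one formula
        g.add((mol, CHEM.casNumber, Literal("1314-56-3")))
        g.add((mol, CHEM.connectionTable, CHEM["phosphorus-pentoxide.cml"]))   # link to the CML resource

        print(g.serialize(format="turtle"))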

    There cannot be a platonic semantic description of chemical identity – many chemicals do not have a “correct” structure. Antony Williams has been doing a heroic and valuable job in detecting inconsistencies in reporting chemical structure and resolving them where possible (eMolecules and ChemSpider – A Respectful Comparison of Capabilities). But he is not establishing a “correct” structure – he is making authoritative statements about the relationship between names, structures and identifiers.
    This brings us to why RDF – probably in its quad form (i.e. with provenance) – is important to describe chemical structure.

    • Many substances occur in several forms and there is no single structure. We hope that RDF can manage these relationships.
    • Many name-to-structure assignments have changed over time as our experimental techniques become more powerful. Thus the C19 chemists would first write PO5 (atomic weights were not “correct”), then P2O5, and only after X-ray crystallography P4O10. To understand historical chemistry we have to know the relationships used at the time.

    We have scraped about 10000 compounds from the web including Wikipedia and have a variety of triples associated with each. There is little overlap of triples – names, CAS, formulae are present or absent. So we now need to use RDF technology to reconcile this information. It’s a complex task and we will probably have to add weights/probabilities to some of the statements – some authorities are less reliable than others.
    In the first instance we’ll probably use some of the commonest identifiers to assert identity and that’s the version we should be releasing in a few days.
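    For flavour, a SPARQL query over such data might look like this (the file name and vocabulary are the same illustrative placeholders as above):

        from rdflib import Graph

        g = Graph()
        g.parse("common_molecules.ttl", format="turtle")   # hypothetical dump of the scraped compounds

        query = """
            PREFIX chem: <http://example.org/chem/>
            SELECT ?mol ?name
            WHERE {
                ?mol chem:formula "P4O10" .
                ?mol chem:name    ?name .
            }
        """
        for mol, name in g.query(query):
            print(mol, name)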


    More COSTly work – DALTON and VAMP

    I am honoured to be a member of the COST D37 CCWF working group on interoperability in chemical computing (see Semantic Chemical Computing for the last meeting in Berlin). COST enables European collaboration by not only having group meetings but also by exchanging scientists (STSM – short term scientific missions).
    We’re excited by this model because we’ve had two working visits from COST members. First was Kurt Mikkelsen from Copenhagen with the DALTON code (I was away so I may get details wrong). Kurt is interested in physical quantities such as hyperpolarizability and other tensors of high rank. Although the code is fairly ancient – and (I gather) has spaghetti in places – Kurt and colleagues here (Toby White, Andrew Walkingshaw) were able to put enough FoX in that DALTON can be said to be CMLized.
    Now we’ve had a visit from Thomas Steinke from Berlin for 2 weeks (Thomas also helps run the COST process). Thomas is a developer on the VAMP program (Tim Clark), which is a semi-empirical code from the AMPAC phylogeny. Although the code is pretty hairy in places (e.g. backwards GOTOs out of loops), it took us (Thomas, Andrew, Toby and me) about 2 days to get a running version that emitted metadata, calculated NMR shifts, and final properties and coordinates. Yesterday we added the history (steps, cycles, etc.) which is not conceptually difficult but a spaghetti nightmare in places.
    So – if you understand a code well – it is possible to make substantial progress towards CMLizing it in about 2 days. Well-structured code is easy and the difficulties arise primarily from unmaintainable code. I’ll blog this in detail later, but it’s straightforward to output coordinates, NMR shifts (peakList), energies, and a wide range of scalar, array and tensor/matrix properties.
    The next phase was to run Andrew’s Golem over the outputs and create a dictionary; we’re now about to convert the results into RDF and put them into our molecular repository. More later…
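    As a flavour of the dictionary-building step, here is a toy sketch of harvesting dictRef attributes from a CML output (my own illustration, not Andrew’s Golem code; the file name is hypothetical):

        import xml.etree.ElementTree as ET
        from collections import defaultdict

        def harvest_dictrefs(cml_file):
            """Collect every dictRef seen in a CML output, with the values it points at."""
            entries = defaultdict(list)
            for elem in ET.parse(cml_file).getroot().iter():
                ref = elem.get("dictRef")
                if ref is not None:
                    entries[ref].append((elem.text or "").strip())
            return entries

        # e.g. a VAMP output that has been converted to CML
        for dict_ref, values in harvest_dictrefs("vamp_run.cml").items():
            print(dict_ref, values[:3])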


    Repositories and Scientific Data (for OR2008)

    I have been invited to give a keynote lecture at Open Repositories 2008 (see the programme – about 25% down) and have chosen the title “Repositories for Scientific Data”. I’d value help from the repositarian blogosphere and elsewhere.
    My thesis is that the current approach for Institutional Repositories will not translate easily to the capture of scientific data and related research output. In some fields of “big science” (e.g. High Energy Physics) the problem is or will be solved by the community and their funders, and institutions have effectively no role. However much – probably most – science is done in labs which are the primary unit of allegiance. Typical disciplines are chemistry, materials science, biochemistry, cell biology, neuroscience, etc. These labs are often focussed on local and short-term issues rather than long-term archival, dissemination of data to the community, etc. Typical worries are:

    • My grad student has just left without warning – can I find her spectra?
    • How can we rerun the stats that our visitor last year did for us?
    •  My laptop has just crashed and I’ve lost all the images from the microscope
    • My chosen journal had to retract papers due to recent scientific malpractice. Now they want me to send them all my supporting data to prove I have adopted correct procedures. This will take me an extra month to retype in their format.

    If we are to capture and preserve science we have to do it to support the scientist, not because the institution thinks it is a good idea (even if it is a good idea). So we have to embed the data capture directly into the laboratory. Of course in many cases there is a key role for the Department, particularly when – as in chemistry – there is a huge investment in analytical services (crystallography, spectroscopy, computing).
    I am developing this theme for the presentation and would be very grateful for anecdotal or other information as to where the institution or department has developed a data capture system which ultimately feeds into medium-term (probably Open) preservation. Two emerging examples are Monash, which has acquired a petabyte for storage of University scientific data and will layer a series of access mechanisms (SVN, Active Directory, Samba, RDB, SRB, etc.) on top of it, and Oxford, which has recently announced a Data Repository.
    If you have material that will help give a balanced picture of data reposition in institutions I’d be grateful for email (or comments on the blog, but I’ll be offline for a few days from Monday). I’m aware that some disciplines have domain repositories independent of institutions (e.g. HEP, bio-sequences, genes, structures, etc. and David Shotton’s image repository for biology) – I’m after cases where the institution has invested in departmental or lab facilities which are actually being used.
    Many thanks in advance.
