- ease of putting things in. It doesn’t require a priesthood (as so many relational databases do). You should be able to put in a wide range of things – theses, molecules, spectra, blogs, etc. You shouldn’t have to worry about datatypes, VARCHARs, third normal forms, etc.
- it should also be easy to get things out. That means a simple, understandable structure to the repository, and being able to find the vocabulary used to describe the objects.
- flexibility. Web 2.0 teaches us that people will do things in different ways. Should a spectrum contain a molecule or should a molecule contain a spectrum? Some say one, some the other. So we have to support both. Sometimes required information is not available, so it must be omitted, and that shouldn’t break the system.
- interoperability. If there are several repositories built by independent groups, it should be possible for one lot to find out what the others have done without mailing them. And the machines should be able to work this out. That’s hard but not impossible.
- avoid preplanning. RDBs suffer from having to have a schema before you put data in. Repositories can describe a basic minimum and then we can work out later how to ingest or extract.
- power is more important than performance (at least for me). I’d rather take many minutes to find something difficult than not be able to do it. When I started on relational databases for molecules it took all night to do a simple join. So everything is relative…
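
To make the flexibility and no-preplanning points a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the `Repository` class, its methods, and the identifiers `mol1` and `spec1` are hypothetical and are not part of the eChemistry project or any existing repository software. It assumes objects are stored as free-form property bags and that molecule-spectrum relations are kept as symmetric links, so neither has to "contain" the other.

```python
from collections import defaultdict


class Repository:
    """Hypothetical schema-free store: objects are plain dicts linked by id."""

    def __init__(self):
        self.objects = {}               # id -> dict of whatever was supplied
        self.links = defaultdict(set)   # id -> ids of related objects

    def put(self, obj_id, **properties):
        # No schema check: any properties are accepted and none are required.
        self.objects[obj_id] = dict(properties)

    def link(self, a, b):
        # Record the relation in both directions, so neither side "owns" it.
        self.links[a].add(b)
        self.links[b].add(a)

    def related(self, obj_id, kind=None):
        # Linked ids, optionally filtered by the (optional) "kind" property.
        return sorted(
            other for other in self.links[obj_id]
            if kind is None or self.objects[other].get("kind") == kind
        )

    def vocabulary(self):
        # The property names actually in use, so others can discover them.
        return sorted({key for obj in self.objects.values() for key in obj})


repo = Repository()
repo.put("mol1", kind="molecule", formula="C6H6")
repo.put("spec1", kind="spectrum", technique="NMR")  # solvent unknown: omitted
repo.link("mol1", "spec1")

print(repo.related("mol1", kind="spectrum"))   # spectra for a molecule
print(repo.related("spec1", kind="molecule"))  # molecules for a spectrum
print(repo.vocabulary())                       # terms used so far
```

The same link answers both "which spectra belong to this molecule?" and "which molecules belong to this spectrum?", and a missing property is simply absent rather than an error. A real repository would of course need persistence, identifiers that work across groups, and a shared way of publishing the vocabulary.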
Richard Van Noorden – writing in the RSC’s Chemistry World – has described the eChemistry repository project in “Microsoft ventures into open access chemistry”. This is very topical as Jim Downing, Jeremy Frey, Simon Coles and I are off to join the US members of the project at the weekend. It’s exciting, challenging, but eminently feasible. So what are the new ideas? The main theme is repositories. Rather a fuzzy term, and therefore valuable as a welcoming and comforting idea. Some of the things that repositories should encourage are: