Big Science and Long-tail Science

Jim Downing and I were privileged to be the guests of Salvatore Mele at CERN yesterday and to see the ATLAS detector of the Large Hadron Collider. This is a "wow" experience: although I "knew" it was big, I hadn't realised how big. I felt like Arthur Dent watching the planet-building in The Hitchhiker's Guide to the Galaxy. It is enormous, and yet the detectors at the edges have a resolution of microns. I would have no idea how to go about building it. So many thanks to Salvatore and colleagues. It also gives me a feeling of ownership: I shall be looking for my own sponsored hadron (I've never seen one). So this is "Big Science": big in mass, big in spending, big in organisation, with a bounded community. A recipe for success.


CMS detector for LHC


The main business was digital libraries, repositories, Open publishing, etc. It's clear how CERN, with its mega-projects ("big science"), can manage ventures such as the SCOAP3 Open Access publishing initiative. And the community will need somewhere to find the publications, which is where repositories come in.

There is no question that high-energy physics (HEP) needs its own domain repository, for its coherence, its specialist metadata and its specialist data for re-use. HEP physicists will not go to institutional repositories: they have their own metadata (SPIRES), and they will want the community to provide the next generation of it. We also found a lot of complementarity between our approaches to repositories; as a matter of necessity we have had to develop tools for data indexing, full-text mining, automatic metadata, etc.

But where do sciences such as chemistry, materials, nanotech, condensed matter, cell biology, biochemistry, neuroscience, etc. etc. fit? They aren’t “big science”. They often have no coherent communal voice. The publications are often closed. There is a shortage of data.

But there are a LOT of them. I don't know how many chemists there are in the world who read the literature, but it's vastly more than the 22,000 HEP scientists. How do we give a name to this activity? "Small science" is not complimentary; "lab science" describes much of it but is too tied to buildings.

Jim Downing came up with the idea of "Long Tail Science". The Long Tail is the observation that on the modern web the tail of the distribution is often more important than the few large players. Large numbers of small units is an important concept. And it's complimentary and complementary.

So we are exploring how big science and long-tail science work together to communicate their knowledge. Long-tail science needs its own domain repositories; I am not sanguine that institutional repositories (IRs) can provide the metalayers (search, metadata, domain-specific knowledge, domain data) that are needed for effective discovery and re-use. We need our own domain champions. In bioscience that role is filled by PubMed. I think we will see the emergence of similar repositories in other domains.

I am on the road a lot so the frequency (and possibly intensity) of posts may decrease somewhat…


