It’s always fun to find one’s blogging picked up in places one didn’t know existed – that’s a great virtue of the trackback system. This is from insideHPC.com (“HPC news for supercomputing professionals. * Reading the HPC news, so you don’t have to.”).
[Murray-]Rust asks, “Where should we get our computing?” 06.4.2008
A post from Murray Rust’s blog at the Unilever Cambridge Centre for Molecular Informatics asks this very question. He leans toward hosted providers:
HPC comes out of the “big science” tradition. CERN, NASA, etc. Where there are teams of engineers on hand to solve problems. Where there are managed projects with project managers. Where there are career staff to support the facilities. Where there are facilities.
…So who knows how to manage large-scale computing? The large companies. Amazon, etc. COST D37 had a talk from (I think) Rod Jones at CERN who said that in building computing systems he always benchmarked the costs against Amazon.
It’s an interesting post if you’re wanting to dip your toes in someone else’s perspective on large scale computing.
PMR: It’s actually very topical for us. Two events yesterday. I am handing over the chairmanship of the Computer Services Committee (in Chemistry) and was discussing with my replacement and one of the Computer Officers (COs) what we should be doing about servers, clusters and air-conditioned rooms. Our building was not designed ab initio – it has evolved over 60 years. New bits come and old bits go. So we are constantly having fun such as gutting rooms discovered to be full of asbestos, mercury and goodness knows what. Sometimes these are in the basement and get turned into server rooms. With lots of power to run the machines and lots more power to remove the heat generated by the first lot of power. And, for those of you who do not live on this planet, the cost of power has been increasing. So it costs money to run these rooms, and the rooms can’t be used for X-ray diffractometers or human scientists.
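To put rough numbers on that – entirely made-up, illustrative figures, not our actual bills – here is the kind of back-of-envelope sum involved: an assumed rack load, a cooling overhead (the PUE figure), and a unit price for electricity.

```python
# Back-of-envelope cost of powering and cooling a small departmental
# server room. Every number here is an illustrative assumption, not a
# real figure from our building.

it_load_kw = 20.0          # assumed IT load for a few racks of servers
pue = 1.8                  # assumed Power Usage Effectiveness: cooling nearly doubles the draw
price_per_kwh = 0.10       # assumed electricity price per kWh
hours_per_year = 24 * 365

total_draw_kw = it_load_kw * pue
annual_cost = total_draw_kw * hours_per_year * price_per_kwh

print(f"Total draw including cooling: {total_draw_kw:.1f} kW")
print(f"Annual electricity bill: ~{annual_cost:,.0f} (in your currency unit of choice)")
```

On those assumptions the cooling costs nearly as much as the computing, and the bill runs to tens of thousands a year before anyone has run a single calculation.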
And every so often something goes wrong. Last week a fan belt went on the aircon. Things melted. The COs had to fix things. Opportunity cost. Money cost.
But we also had a visit from our eScience collaborators in Sydney – Peter Turner and Douglas du Boulay. They’ve been spending some weeks in Europe including visiting NeSC, RAL and various other eSciency things and people. Should they use storage systems like SRB (NGS – SRB)? SRB is a distributed file system which lets you spread your files across sites – in different continents, etc. It became de rigueur in eScience projects in the UK to use SRB, for example in the eMinerals project. This project, which combines eScience and minerals and has ca. 6-9 institutions in the UK and several overseas collaborators, was run from the Department of Earth Sciences in Cambridge. Martin Dove has done a great job of combining science and technology and we have directly imported ideas, technology and people to our Centre.
The point here is that they started off with Globus, SRB, etc. And they found they were a lot of hassle. And they didn’t always work. They worked most of the time, but when your files are 500 or 5000 kilometres away “nearly always working” isn’t good enough. Toby White joined us in the pub yesterday and expressed it clearly: “I don’t want my files 500 miles away”. His files are at Daresbury. Daresbury has suffered severe losses under the black hole fiasco and files may not always be instantly available. It’s a fact of the world we live in.
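A toy calculation makes Toby’s point concrete. Suppose – purely hypothetically – that each individual remote file fetch succeeds 99% of the time, and that a job needs all of its input files before it can run:

```python
# Why "nearly always working" isn't good enough once files live far away.
# Hypothetical assumption: each remote fetch succeeds 99% of the time,
# and a job needs every one of its input files before it can run.

per_fetch_success = 0.99

for n_files in (10, 100, 500):
    p_job_runs_cleanly = per_fetch_success ** n_files
    print(f"{n_files:>4} remote files needed -> clean run {p_job_runs_cleanly:.1%} of the time")
```

At 99% per fetch, a 100-file job runs cleanly only about a third of the time – which is roughly what “nearly always working” feels like in practice.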
I believe my colleagues in chemical computing would agree. There is a psychological need to have most of the resources “physically close”. And that’s difficult to define, but it means more than a telephone to someone in another country who is answerable to a different manager and project.
What about a commercial service whose sole job is providing computing services? I think there’s a strong tension here. Not least with the money. How important is it for us to be able to tweak our CPUs? Especially with new generations of GPUs and so on? If it’s important, we need things locally. And often we do.
But it will cost money. At present much of the money is fudged – the machines are “there anyway”. But it won’t always be like that.
So long-tail science – including chemistry – will increasingly need to choose between academic HPC facilities (maybe “free”, but maybe real money), local clusters – autonomy but probably costly – and the cloud. There won’t be a single answer but we shall certainly see the market changing.
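In the spirit of Rod Jones’s benchmarking against Amazon, here is a hedged sketch of how that choice might be weighed. All the figures below – capital cost, power, staff time, the cloud rate, utilisation – are placeholder assumptions, not quotes from anyone; the point is only that the answer swings on how heavily the local machines are actually used.

```python
# A sketch of the local-cluster-versus-cloud sum, in the spirit of
# benchmarking against Amazon. All figures are placeholder assumptions;
# substitute your own capital, power, staffing and utilisation numbers.

cluster_capex = 100_000.0          # assumed purchase price of a local cluster
lifetime_years = 4                 # assumed useful life for amortisation
annual_power_and_cooling = 15_000.0
annual_staff_share = 10_000.0      # assumed slice of a Computer Officer's time
cores = 128

cloud_rate_per_core_hour = 0.10    # assumed on-demand cloud price

hours_per_year = 24 * 365
annual_local_cost = (cluster_capex / lifetime_years
                     + annual_power_and_cooling
                     + annual_staff_share)

for utilisation in (0.25, 0.50, 0.90):
    core_hours_used = cores * hours_per_year * utilisation
    local_rate = annual_local_cost / core_hours_used
    print(f"utilisation {utilisation:.0%}: local ~{local_rate:.3f}/core-hour "
          f"vs cloud ~{cloud_rate_per_core_hour:.2f}/core-hour")
```

On those made-up numbers the local cluster only beats the cloud rate when it is kept busy most of the time – which is exactly the kind of pressure that will change the market.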