Unilever Centre for Molecular Informatics
 

petermr's blog

A Scientist and the Web

 


#scholpub , Maxwell and the Laws of Acadynamics

Tuesday, May 15th, 2012

For many days we have been discussing #scholpub on the GOAL mailing list, run by Richard Poynder. Some important issues are coming up and there is now a healthy divergence of views which RP runs well. I’ll talk more later, I hope.

In the time between trying to content-mine PDF (yes, more later), I thought about the tragedy of the academic commons. We have 10,000,000,000 USD (count the zeros) of mainly public money and student fees to “buy” the #scholpub we produce. That’s a sizable market. It’s not as large as many, but quite enough to run competently and for the benefit of everyone.

Including the #scholarlypoor

But we don’t. #scholpub is the most inefficient “market” in the world. (No, perhaps arms procurement is worse.) I’ll analyse more in a later post. Hint, here’s the answer to my question:

“What’s the difference between Elsevier and British Gas (or Central Trains, or Scottish Power or umpteen more)?”

Answer: There is no regulator for #scholpub.

I wondered why. Basically because academia is 10,000 institutions all going in different directions.

In molecular sciences these particles obey a Maxwellian distribution. Some fast, some slow, some east, some west, some north, some south, some up, some down. Occasionally they bump into each other, but they are basically uncoordinated.

And they give rise to the laws of thermodynamics. The analogy that follows has some merit – I am still working it out – feel free to contribute: The laws in their formal form are not easily accessible but there’s a witty synopsis (http://en.wikiquote.org/wiki/Thermodynamics )

  0. You have to play the game.
  1. You can’t win; you can only break even.
  2. You can only break even at absolute zero.
  3. You can’t reach absolute zero.

Law 1 says you can move resources (heat and work) around and that you conserve energy

Law 2 says that there are inefficiencies in the system (loss of useful energy) which only disappear at absolute zero (the lowest possible temperature)

Law 3 is obvious

 

So I thought – there is ten billion dollars in the system. It can be moved around. There are inefficiencies in the system, but if we work together we can achieve high efficiency. And then? The sad truth. So I proposed 3 laws. They are raw, you are welcome to tune the wording. But they are roughly based on the three laws of Thermodynamics and perhaps there is a zeroth here:

0. There is a lot of money in the academic #scholpub system

1. We can change the system by moving money around

2. To do this academics must collaborate

3. Academics will never collaborate

 

 

And when I published them Jan Velterop came up with the lovely “Laws of Acadynamics”. Thanks Jan.

Now there is a way to get round the Second Law: Maxwell’s Demon (http://en.wikipedia.org/wiki/Maxwell%27s_demon). A superintelligent being that bats individual molecules around. Organizes Universities to point in the same direction. Yes, we need a Maxwell demon.

But haven’t we had a Maxwell Demon already in #scholpub?

 

 

What’s the difference between Elsevier and British Gas?

Saturday, May 12th, 2012

This is a serious question and I have a serious answer. See if you can guess it. If so add a comment.

You can substitute “FooPub” for “Elsevier” where FooPub is any #scholpub such as ACS, PLoS, Wiley, BMC, etc.

You can substitute Eastern Water, Scottish Power, First Capital Connect (a train operator) and many others for “British Gas”.

I shall continue to turn my attention to content-mining in the next few posts.

 

 

Data are part of the future; the OKFN’s contribution

Tuesday, May 8th, 2012

I am really excited about the OKF’s commitment to data.

Most data is lost, badly produced, unclear, etc. The OKFN-P2PU School of Data intends to create a new approach to education for the data-age. I’m very excited to be part of this.

Don’t have time to do more than advertise:

http://blog.okfn.org/2012/05/08/were-recruiting/

The Open Knowledge Foundation are currently recruiting for a Data Wrangler and a Data Visualisation Developer. If you’d like to join our team, please visit our jobs page.

At the Open Knowledge Foundation, we build tools and communities to create, use and share open knowledge – and to help others to do the same. In recent months, we have become involved in a growing number of open data projects, and two new positions have now been created within our team.

We are seeking two data experts to join us as a Data Wrangler and a Data Visualisation Developer. Read on to find out more about what the roles involve.

Data Wrangler

We’re looking for a data wrangler who is excited to tell stories through data. You will work on various datasets, to understand them and to tell their story to a broader audience. You will also be involved in training efforts, creating and teaching courses in data analysis to technical and non-technical audiences.

Your role will be exciting and varied, and will include:

  • Work on the School of Data, building learning challenges and course content (see our previous post for more information on the School)
  • Research for our new data blog, coming soon.
  • Collaborations with our Working Groups, for example the Working Group on Open Economics
  • Work on OpenSpending, one of our flagship projects.

Skills

We are open to people from a wide variety of backgrounds; whether coding, visualisation, journalistic, statistical or otherwise. We are seeking someone who has:

  • Experience in data analysis and statistical methods
  • Experience with data cleansing, ETL patterns
  • Good written communication skills
  • Experience with R/Stata/SPSS
  • Coding skill in a modern script language, e.g. Python, Javascript.
  • Basic skills in information/data visualization

If that sounds like you, please visit our jobs page to find out more.

Data Visualisation Developer

As a Data Visualisation Developer, much of your time will be spent on our flagship OpenSpending project.

OpenSpending is about mapping the money. We want to make government finances accessible to advocates, journalists and citizens. Our goal is to collect budgeting information from across the world and to present it in a form that promotes understanding, analysis and participation. Some of the questions we ask are:

  • How much is government spending on health? Is expenditure growing or shrinking? How does this translate into results?
  • What are the proportions of different government programmes? What is spending on prisons compared to schools? How much is Ghana spending on education compared to Nigeria?
  • How much tax do I pay into which area of government?

Our day-to-day work has many facets. We work on the core platform, undertake journalistic projects as part of “Spending Stories”, which won the Knight News Challenge in 2011, and work with organizations and civic activists world-wide to set up local budget transparency projects.

Your role with us

You’ll help us to create new visualizations to answer spending questions through meaningful, visual narration.

Skills we’re looking for:

  • Strong visual design skills
  • HTML5/Javascript visualisation experience
  • Familiarity with several visualization toolkits (e.g. D3, Raphael)
  • Experience with cross-browser compatibility
  • Plus (but optional): Knowledge of Python

Basically: send us some demos of good stuff you’ve done.

Towards a manifesto on Open Mining of scholarship

Tuesday, May 1st, 2012

Tomorrow a small group of people interested in “textmining” will have a Skype meeting under the auspices of the OKFN. We have sort-of-pushed this agenda for some years and now it’s come to fruition – there is clear public awareness of the value of textmining and the barriers that prevent it being used. Indeed my blog has even got mentioned in a financial analyst’s review of Elsevier (the implication being that if Elsevier continues to drag their feet the market will react against them). Of course it’s not just Elsevier, but they are the ones that have had most prominence. So this post is to prepare my mind and hopefully come out with some useful ideas.

There is no doubt that the lack of positive approaches to textmining is having huge costs:

  • Opportunity. We cannot do the things that we want to. Moreover this stifles the imagination of the rest of the community – without exciting examples of what can be done – and they *are* exciting – people do not realise what they are missing. And that’s all of us, not just subscribers to journals.
  • In wasted time. Anyone wishing to do textmining has to spend huge amounts of time trying to get permissions, worrying about being taken to court, and simply waiting for null responses.
  • Bad science. Much published scientific data is flawed. Not necessarily deliberately, but by the outdated methods of publication. Almost no scientific data are reviewed (a few publishers like Int. Union of Crystallography are shining exceptions). And their tools have unearthed bad and fraudulent science. There is no reason to believe it is different elsewhere – in fact I suspect it’s worse – the chance of getting caught is often near zero. Textmining is a major tool in data review.
  • Unexploited information and products. Google et al. have shown that there are huge new markets. There is undoubtedly a large market in downstream information and information products from scientific research. I estimate it at low billions for chemistry alone.
  • Bad policy decisions. If the scientific literature is not used fully then decisions are flawed. These range from new drugs, to climate, to the effects of chemicals to… Machines can provide decision support that complements humans.
  • Bad scholarship and bad scholarly relations. When a new technology emerges of benefit to scholarship then its wilful prevention for non-scholarly reasons has harmful effects on the whole community. It’s fair to say that many textminers see publishers as a major problem, solely bent on making money by restrictive practices.

There are more – but that should be more than enough to build an overwhelming case.

Now what is “textmining”? The word is very unfortunate for several reasons:

  • There are specific legal aspects of text which may differ from other forms of information.
  • There is a confusion with “fulltext”.
  • It suggests that only the words in scholarship are involved. This is particularly damaging since much information is conveyed in images, diagrams, audio and video (in fact all of the major MIME-types!). For example commercial publishers often forbid the re-use of diagrams or charge large amounts because artistic images have special protection under copyright.

I would like to see a more general term – perhaps “information mining” (IM) which covers all the types above and also “data”. Or possibly “publication mining”. It would be a disaster if we only agreed how to manage “text” and left the rest unchallenged.

Some technical background. (I actually suspect that most of the people who make the rules about IM (libraries, publishers) haven’t a clue how it’s done). Simply:

  • You write (or borrow) a program that retrieves the things you want to mine. A simple F/OSS one is called wget. Ours (Nick Day, Sam Adams) is called “PubCrawler” and has been specially built for crawling scholarly publications. You point it at a website and it systematically retrieves files/pages one-by-one. The only problem is that if you do this too quickly then it may overload the website, so responsible crawlers have a delay (perhaps 5 seconds) – POINT 1. The argument that textmining will destroy servers is a smokescreen. (There are many ways of avoiding technical problems). Note that if you already have the papers on a local machine this step is unnecessary. Universities create caches to avoid repeated downloads but publishers want the downloads so they can count-the-clicks. This process does NOT violate copyright though it may technically violate the restrictive publisher contracts that Universities have signed.
  • You have another program that mines information from each paper. This is hard and tedious to write but once done is automatic to run. How well it performs depends on many factors (the format of the paper, the language/style of the journal/authors, the use of dumb (GIF/PNG) or semi-semantic (SVG) diagrams, etc.). For text you could use Lucene – an Apache project. Daniel Lowe has shown that it’s possible to mine 500,000 chemical reactions from US patents using our F/OSS OSCAR/OPSIN/ChemicalTagger and the NIH’s OSRA for chemical diagrams. Things are better than they were 5 years ago and I am fairly hopeful about the technical mass-mining of chemistry. This process does NOT violate copyright though it may technically violate the restrictive publisher contracts that Universities have signed.
  • You publish your results. Here there is a potential problem with copyright although I suspect it has never been tested. I suspect anything less than bulk republishing of verbatim full-text would be allowable in many courts. In particular republishing “factual” information would incur no legal penalties, whether or not for commercial purposes.
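The first two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not PubCrawler, wget, Lucene or OSCAR, and the function names (`crawl`, `mine`, `crawl_and_mine`) and the temperature pattern are hypothetical choices made for the example. The one idea it takes directly from the description is the pause between requests (here 5 seconds by default).

```python
import re
import time
import urllib.request
from typing import Callable, Dict, Iterable, List

# Hypothetical pattern, for illustration only: temperatures such as
# "25 °C" or "-78.5 °C". Real chemistry mining needs far richer tools.
TEMPERATURE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°\s*C")


def http_get(url: str) -> str:
    """Fetch one page over HTTP and decode it as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str] = http_get,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Step 1: retrieve pages one by one, pausing between requests."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Be a responsible crawler: never hit the server back to back.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


def mine(text: str) -> List[float]:
    """Step 2: extract facts (here, temperatures in degrees Celsius)."""
    return [float(value) for value in TEMPERATURE.findall(text)]


def crawl_and_mine(urls: Iterable[str], **crawl_options) -> Dict[str, List[float]]:
    """Run steps 1 and 2; the result is a table of facts per source."""
    pages = crawl(urls, **crawl_options)
    return {url: mine(text) for url, text in pages.items()}
```

The `fetch` and `sleep` parameters exist so the sketch can be tried on papers already held on a local machine (where, as noted above, the download step is unnecessary) by passing in a function that reads local files instead of `http_get`. The output is a set of extracted facts, which is what step three would publish, rather than the original full text.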

The miner’s problem.

Simply stated:

  • IM MIGHT fall foul of copyright law. Because of the risk-averseness of libraries and the pressure from some publishers to limit activities such as UK/PMC no authorities are prepared to challenge or test this. Individual researchers are left to make their own judgments, with little hope that they will get support from institutions. This canopy of fear is a dampener for research.
  • There are NO explicit rules. Because of this researchers do not know what they can and cannot do. Logic does NOT work in courts of law – only laws and precedent. People who make facile assertions that you can/not do something only muddy the waters.
  • It MIGHT fall foul of database laws such as sui generis in Europe. Again, in our risk-averse culture no-one offers support to challenge this.
  • It probably WILL fall foul of the Publisher-imposed extensions to University contracts. These are basically unethical and imposed solely (IMO) for protecting the market.

Simply stated: Miners need clear, simple, permanent, automatic answers so they know what they can and cannot do.

Researchers are responsible people. There are many places where research has to take account of law and there are very few public breaches. The same should be assumed for IM.

The publishers’ problem.

The primary problem is that publishers now have a market (not necessarily of their own making) which is profitable and where change may bring problems. The flip-side, that IM may bring benefits, is never mentioned! Thus Richard Kidd of the Royal Soc. Chemistry has voiced the fear on this blog that my textmining may undermine the RSC’s viability, and he wants an assurance that I won’t do anything to harm their income. I think of all publishers in the world the RSC is best placed to benefit massively from IM instead of preventing it happening.

This is a typical problem with monopolies (which the publishers have). They want to see their income continue indefinitely in the same way rather than changing their models. It’s natural, and history shows it’s ultimately doomed. Only the conservatism of academia (see Michael Eisen’s blog) keeps them in business. Whether or not we take the publishers’ interests into account depends on the worth that society gives to their services – and that is changing rapidly.

There is no natural law that says we do or don’t have to accommodate the publishers, whether or not they are learned socs. They no longer have the moral right to control unilaterally how scientific knowledge is published and used. There has been no constructive debate in this area and publishers should think about their source of material and its volatility.

The libraries’ problem.

This is a completely new technology which is opaque to many libraries. There are, of course, some world-leaders in information management, especially the NLM and national libraries, but the average University has no experience of either the technology or the law. This makes it problematic when publishers suggest that text-miners should go through their libraries and have joint discussions with publishers. This is counterproductive as it drastically slows the process and means that many of the decisions are made by non-practitioners. [I have so far written several times to my librarian and am waiting for a reply]. The rigmarole that Elsevier put Heather Piwowar through with UBC librarians is out of order and in any case doesn’t scale across publishers, libraries or researchers.

Current concerns and why we need principles

There is a high probability that some well-intentioned academics will “negotiate” terms with publishers which then are used as a precedent to constrain everyone else. I, for example, am unwilling to accept the terms that UBC have. For that reason we are setting out principles, which we believe are absolute and which will inform the practices and their adoption. In the spirit of the excellently crafted BOAI and other declarations we are working towards words which will last for decades.

Bases of the principles:

  • The scholarly literature is created to inform and enlighten humankind. Authors expect that their material will be as widely used in as many ways as possible and by as many people as possible.
  • Information mining is a natural and major advance in the use of the scholarly literature and brings very large benefits.
  • The only inexorable laws relating to IM are copyright and database rights. These were not designed to restrict the flow of scholarship and should not be used for this purpose.
  • Subscribers to the scholarly literature are responsible people and will not deliberately break the law. They need a globally published set of principles by which they can determine what they may do.
  • Technology and human attitudes are changing rapidly and we should be positively and proactively responsive to them. We cannot and should not try to guess the future and we should not jeopardise it by short-term considerations.

And perhaps a single definition. I suggest the term “Open Mining” as inclusive. Note that these principles are statements of what we wish to be the case, not a negotiation. BBB are statements of aspiration.

  • “By Open-mining we mean the unrestricted use of machines to extract, process and republish content in whatever form (text, diagrams, images, data, audio, video, etc.) without prior specific permissions other than community norms of responsible behaviour in the electronic age.”

“Responsible behaviour” and “community norms” covers stuff like server overloading, personal data, deliberate corruption, and adherence to generally accepted Internet practice.

That’s the aspiration. BBB are aspirations. Some scholars and some publishers have adopted them enthusiastically. They have helped enormously.

 

 

 

 

A pictorial Amusement

Monday, April 30th, 2012

I dropped in to see our computer officers today – they’ve just had an aircon failure and I was offering sympathy – they have a lot to deal with. While there I noticed this splendid spanner (== wrench/US). I love tools and this one has a majesty of its own in a computer office. It’s about 40 cm long (see ruler) and we guess it’s about 2 kilos.

I naturally assumed it was for bolting units to the floor or something like that, but that’s not why it was ordered. The reason is gently amusing (perhaps you can make some guesses).

Meanwhile tomorrow I’ll be blogging about text-mining. I’ve been hacking code furiously over the last 5 days and feeling it. There is a lot I need to write about but textmining is the priority.

 

Text-mining the scholarly literature: towards a set of universal Principles; Update and strategy

Wednesday, April 25th, 2012

For some years I have seen the primary literature as an enormous untapped resource of scholarly information. We humans are very good at some aspects of “reading the literature” but there are many areas where machines are better and should be used. These include scale (hundreds of thousands of manuscripts), checking, validation, transformation (e.g. scientific units), deduction (many papers have implicit semantics), aggregation of knowledge, and much more. We are now reaching the time when the technology of “text-mining” is mature enough to deploy and, for example, my group and I have developed among the best tools in the world for mining chemistry. I am now expanding that to other fields which I will describe in later posts.

In general the readers of the scholarly literature (who may include the #scholarlypoor) have been seriously frustrated by the restrictions imposed by publishers and universally agreed by librarians. Most subscriptions to most major journals have terms forbidding readers to mine/crawl/index/extract etc. This is not a consequence of copyright – it is an additional restriction imposed by publishers and apparently automatically assented to by academic purchasing systems (mainly libraries). This automatic assent has done scholarship a grave disservice, so I give the library community a chance to correct the historical record:

Has any library ever publicly challenged the terms of use [on mining] set by publishers? I haven’t seen any. But I’d be grateful to know public cases, and what happened. My current view is that publishers set conditions and that libraries accept them verbatim, which, unfortunately, means that they don’t have a track record of fighting for text-mining or other freedoms.

Moving on, the UK Hargreaves report has recommended removing these restrictions (which are not legally required) and also modifying copyright law. My grapevine suggests there is a high probability that significant changes will be made and that “text-mining” will become widely available without requiring explicit permission. We should prepare for this, and any responsible publisher and library/purchaser should be preparing for this.

A month ago I and colleagues in OKF submitted cases to the Hargreaves process. As part of that I asked 6 major publishers whether I could “text-mine” their journals. Naomi Lillie of OKF is summarising the results and I will keep you in suspense till then. It’s fair to say some were helpful, some were not and some were fuzzy (for whatever motivation).

A number of publishers said we should discuss it with the library. There is no need for this. I and my group can text mine material by ourselves – in one week Daniel Lowe extracted 500,000 chemical reactions from the US Patent Office without needing any help. Nick Day has built PubCrawler and extracted 200,000 crystal structures from supplemental information without any help. The only thing I need is:

  • An assurance I won’t be sued for behaving like a responsible scholar
  • An assurance that my institution won’t get cut off for (my) responsible behaviour

In case anyone in the publishing or library communities doesn’t understand what “responsible” means, it means:

  • I do not intend deliberately to re-publish the publishers’ manuscripts (“the PDF”) in bulk without valid scholarly reason.

I am a responsible scholar. I conform to health and safety. I obey the law of the UK. I do not steal. I can justify the expenditures on my grants. I attempt to value and promote human equality in my scholarship. I try to give credit where it is due. Responsible scholarship is a fundamental principle which I believe applies to almost all readers of the scholarly literature. Occasionally I and others fail – there are ample mechanisms for addressing these without forbidding textmining.

So this post asserts my absolute right as a subscriber to the scholarly literature to carry out textmining and to disseminate the results to anyone. I do not need any other permissions.

A number of details follow which I’ll address in later posts.

At present, therefore, a group of us – under the aegis of the Open Knowledge Foundation – is drafting a set of principles for textmining. They include:

We shall come up with a manifesto/set-of-principles. This will be a statement of our rights and our responsibilities. It is not a negotiation, any more than Tom Paine or the Founding Fathers negotiated in the construction of their declarations. Or, more recently, the BBB declarations of Open Access. Those declarations are priceless – it’s just a pity that there are not enough who believe in them enough to push for their universal acceptance. We shall not make the same mistake with the principles of textmining.

 

Panton Fellows, Principles in Japanese, #pantonscience

Tuesday, April 24th, 2012

 

 

It’s been an exciting week in Pantonia. I have been very active with our new Panton Fellows (http://science.okfn.org/2012/04/03/introducing-our-panton-fellows/). Last Monday Ross Mounce came over to Cambridge and we looked in depth at liberating information about phylogenetic trees. This is exciting and keeps me up at night and active on train journeys. And yesterday I took the train to Oxford to visit Sophie Kershaw who’s putting together a radically different course for Graduates, with emphasis on reproducible computing. I’m deliberately downplaying both of these here, as they’ll be telling you all about what they are doing.

Part of yesterday was an evening meeting run by Jenny Molloy – a new Open Science group with about 12 of us in the Oxford eResearch Centre (OeRC), where we met Dave de Roure, who took us out to dinner in the Royal Oak. While there we discussed in some depth, with Diane Cabell and Dave Shotton among others, what needs to be done for text-mining. It’s really great to see critical mass in this way. I will have a LOT to write about textmining.

So today I met with Ayumi Koso (above) from Tokyo. Ayumi works with the Japanese government in Tokyo on the National Bioscience Database Centre (NBDC). She has already translated the Panton Principles into Japanese (http://pantonprinciples.org/translations/#Japanese ). She’s staying in Cambridge and so today has a chance to meet some OKF people. Here’s our visit to the Panton Arms, preceded by a visit to Hinxton/Sanger Centre to visit Tim Hubbard (OKF advisory). And this afternoon Laura Newman will be coming round to meet.

I am really fortunate to be living in the middle of all this.

(We’ve decided today that the Panton hashtag is #pantonscience)

 

My virtual talk in Poland

Friday, April 13th, 2012

 

I am presenting a talk in Poland today – although I am in Rome. http://www.eifl.net/events/open-science-education-conference-poland

Open science & education conference, Poland

13 Apr 2012 – 14 Apr 2012

Nicolaus Copernicus University (NCU) invites you to the Third International Conference on Open Access, that will take place on 13-14 April in Bydgoszcz, Poland, at Collegium Medicum NCU. This year’s theme is Open science and Open education.

I hope to be on skype from Rome airport – it may be rather hairy.

I tried to create a set of slides and run an audio over them. I used Powerpoint, because it allowed narration easily. It was quite easy to create and I made a 6 minute introduction. But trying to upload it was a disaster – the upload is so asymmetric that it was taking hours and crashing. So I have changed the strategy.

We’ll play the first 6 minutes. http://dl.dropbox.com/u/6280676/first.pptx

Then if we can’t skype it is worth playing the OKF/JennyMolloy/PMR video http://vimeo.com/31861413

If we can skype then Cameron Neylon has agreed to click through the points and links below while I speak.

Start with the “Academic Spring”: The Guardian’s remarkable and remarkably apposite article, http://www.guardian.co.uk/science/2012/apr/09/wellcome-trust-academic-spring

General points

  • Most science research/data is never properly published or used => Bad science, duplication
  • This costs/loses 100 Billion+ per year; so HUGE opportunities for new business/products. Europe or Silicon Valley??
  • The long-tail of science; scholarship OUTSIDE academia?
  • Conventional publication does not work for data
  • Diversity. No single solution. Communities of scholarship. HEP, Astronomy, Chemistry
  • Domain repositories essential; Inst Repos don’t work for science
  • OPEN. Must be BOAI-compliant: use CC-BY/CC0
  • Are universities the solution or the problem?
  • Sustainability. Funders and National Laboratories
  • Mandates are poor instruments; Culture must change. Rewards?
  • Create an author-centric culture/technology. Semantic documents. “ScienceForge”
  • Sustainability. Alliance with wealth-generation industries?
  • Text-mining VERY topical
  • Theses. Must become centralised semantic, Europe?? NL++, UK--
  • Demos: text-mining, repositories
  • Growing points:
    • Open (Web) Technology continues to advance
    • Linked Open Data / Semantic Web
    • Graduate students
    • Scholarly poor
    • Wikip(m)edia
    • Open Knowledge Foundation

“Slide links” – bold is priority

And a big thank you to everyone in Poland for their patience and to Cameron for helping.

Horizon2020 what I said in Rome (and what Neelie said)

Wednesday, April 11th, 2012

I always try to blog what I said in meetings as I don’t (can’t) use traditional slides. Today I scraped slides off the web and my talk was significantly different from what I had prepared. This was in considerable part because of what had been said in the morning by, among others, Neelie Kroes (Deputy European commissioner) and Geoffrey Boulton (Royal Society). They anticipated many of my concerns (previous blog post) and I could simply praise them for it. Neelie Kroes was very impressive. She knew the field very well and was clearly fundamentally committed to making it happen. Europe can feel proud of her.

I was able to ask her a question – shouldn’t we be supporting young people and how can we get them to contribute to European wealth creation? Why no Euro Google/facebook, etc.? She was very excited and recounted how she’d been to a young person’s hacker camp (? In Spain) with ?thousands camping in tents. And how when she asked a 14-year old “aren’t you afraid of giving away information” – he said “you don’t get it, it’s about sharing”. We exchanged cards and I’m hoping that I can get some of the young people in the OKF involved.

I’d love to blog other aspects of the meeting – don’t even know whether it’s being tweeted.

Open Infrastructure for Open Science/Data; and Academic Spring

Wednesday, April 11th, 2012

 

I am presenting this afternoon in Rome to an important group of science-oriented people/organizations – about 70 people will be there. As always I try to talk to people before the presentation. I’ve got 20 minutes, and I want to get across both ideas and examples. So I can’t do it all. This is my “checklist” for things I think are important. (Almost all my “slides” are scraped from the web and I will publish the links shortly in a separate blog).

  • Most science research/data is never properly published or used => Bad science, duplication
  • This costs/loses 100 Billion+ per year; so HUGE opportunities for new business/products. Europe or Silicon Valley??
  • The long-tail of science; scholarship OUTSIDE academia?
  • Conventional publication does not work for data
  • Diversity. No single solution. Communities of scholarship. HEP, Astronomy, Chemistry
  • Domain repositories essential; Inst Repos don’t work for science
  • OPEN. Must be BOAI-compliant: use CC-BY/CC0
  • Are universities the solution or the problem?
  • Sustainability. Funders and National Laboratories
  • Mandates are poor instruments; Culture must change. Rewards?
  • Create an author-centric culture/technology. Semantic documents. “ScienceForge”
  • Sustainability. Alliance with wealth-generation industries?
  • Text-mining
  • Theses. Must become centralised semantic, Europe?? NL++, UK--
  • Demos: text-mining, repositories
  • Growing points:
    • Open (Web) Technology continues to advance
    • Linked Open Data / Semantic Web
    • Graduate students
    • Scholarly poor
    • Wikip(m)edia
    • Open Knowledge Foundation