I am honoured by this award. I’ll describe the current struggle for ownership of digital scholarly knowledge, emphasize young people and machine-understandable theses, and suggest some practices.
Early Career Researchers see the digital literature, including theses, as a primary research resource. We’ve set up ContentMine, a non-profit supporting machine reading and analysis of scholarship. Between 10,000 and 20,000 journal articles and several hundred theses appear every day, so machines are essential. Today we’re announcing six ContentMine fellows, all of whom have exciting projects to create new bioscience from the scholarly literature.
But this brave new world is often opposed by the Publisher-Academic complex. Academia feeds knowledge and public money into companies that, in return, define the scholarly infrastructure and the rules by which Academia has to play.
The key issue is: who controls scholarship? Universities? Students? Researchers? Or corporations answerable only to their shareholders? How many universities have been arbitrarily cut off by publishers with the accusation that “their” content is being stolen? Knowledge that should be available to the whole world is being controlled and monitored. Increasingly, universities acquiesce, and are even required by publishers to police “compliance”.
Last month a member of our fellowship, a graduate student colleague in the Netherlands, was legally mining the literature to detect malpractice, such as unjustifiable statistical procedures. After 30,000 downloads, a publisher cut off the University and, without discussion, wrote denouncing him for “stealing” content. They demanded that his research be stopped. The University complied. Then another publisher. And a third. Also last month, Cambridge was cut off for 3 weeks by one publisher. No explanation. No dialogue.
Europe is trying to reform copyright to support research. I am working with them, but publishers are lobbying massively. They want to control and monitor everything: textual content, data repositories, metadata, and the metrics for academic glory.
Machine-understandable e-theses are one of the few remaining areas not controlled by publishers. They are a new opportunity for universities and a knowledge resource for everyone, citizens as well as academics. They report research worth billions of dollars, and are often the only place where that research is published. Young people are passionate about spreading knowledge. To maximize that spread, here are some suggestions:
- Be proud of theses.
- Think of “use” rather than “deposit”.
- Make theses globally discoverable.
- Involve citizens everywhere. Think of the Global South.
- Don’t repeat the mistakes of the “West”. Do it differently.
- Release immediately.
- Use DOCX, TeX, CSV, SVG and XHTML as well as PDF.
- Use version control for text and data (Git, Dat …).
- Use openly controlled international repositories.
- Use permissive licences allowing mining and re-use.
- Do not hand over rights for content, discovery or access.
- Don’t buy systems; encourage young people to build them.
- Experiment with Open Notebook Science.
- Encourage and use e-theses as a primary tool for research.
- Use Wikipedia / Wikidata as the default metadata for scholarship.
And a warning: unless libraries seize opportunities like this now, they will increasingly be replaced by commercial services and will disappear. E-theses and young people are your chance.