These are thoughts for my 15-minute session at #rds2013. Feel free to comment. I’d particularly like to know of any F/OSS that manages timed slide presentations on Windows so I don’t have to use PowerPoint. I have 900 seconds, including 5 at each end for stepping up and stepping down. I shall refuse to be introduced – it’s all in Wikipedia (http://en.wikipedia.org/wiki/Peter_Murray-Rust ). It’s therefore essential to have timed transitions, à la PechaKucha. The cryptic notes here will be elaborated in a detailed blog post for each. The order is random and the number of principles will change.
Management of data is a state of mind, not a process or technology. Follow Ranganathan.
- The world owns the data, not you.
Use CC0 (see Ross Mounce’s work on licences).
The data you work with is provided by the universe of things and ideas. It is yours to nurture, refine and evangelize, but not yours to own.
- You do not fully understand the potential of your data.
Encourage downstream use. Data increases in value with refinement, subtraction, and addition. Example: The historic observation of a Chinese eclipse has been used to calculate the coefficient of dynamic viscosity of the earth’s mantle.
- Walled gardens destroy the potential of data and innovation.
Walled gardens, however benign, control access and seriously limit innovation and re-use. You cannot get all of the data out for Open re-use. Examples: Sciverse, CCDC crystallography, Reaxys, Chemical Abstracts. Now, Mendeley. Will Figshare remain unwalled for long?
#animalgarden have made a 3.5-minute video (http://vimeo.com/34323486 ). There won’t be time to show it, but it will exercise all your emotions.
- Build the memex for data. (http://en.wikipedia.org/wiki/Memex )
Manage data without noticing. SourceForge and GitHub capture our code with zero effort, because we want to use them, not because we have to. We can do this for data. Turn instruments, laboratories and authoring systems into memexes. If you have to “put it in the repository”, the system has failed.
- Revere the long tail.
Most data is in the long tail of science, collected in individual laboratories with unique protocols and strange instruments. This can only be tackled by giving scientists toolkits for informatics and allowing them to build the solutions.
- Text, data, audio, images, movies are different views of “data” – scientific truth.
They must all be free. The idea that scientific images, video, audio “belong” to people or institutions must be challenged. They are all CC0.
- Mentor young people in data and let them mentor you.
Young people have a different, fearless attitude. I’ve seen them attempt the impossible. Sometimes they succeed. (Sophie Kershaw, a doctoral student, has been mentoring Oxonians in how to manage data.)
- The problems of data are people, not storage or bandwidth.
A computational chemistry program solves Schrödinger’s equation. If you publish the results in full, the company behind the program will send the lawyers.
I can mine 500,000 reactions from patents (and my colleague Daniel has). Elsevier won’t let me mine any. Nor will ACS. Or the others. These restrictions destroy imaginative thought.
- Develop patterns for data.
Cameron McLean has shown me how architects have patterns for building. These were adapted to patterns for software. He’s adapting these to research. We don’t yet have patterns for data.
- Honour Tim Berners-Lee’s 5 stars of Linked Open Data.
Yes. Open Data, Open standards, Open links and Open minds.
- Work collaboratively.
Share tools and ideas. Use hackfests. The library should run hackfests. Not for academics. For everyone. You would be surprised who you get.
- Computing and Bioscience have got it as right as possible.
Emulate them. Use their tools. Create communities like theirs.
- Build your own tools, don’t buy anything.
“Rough consensus and running code” built the Internet and the web. Build, test, tear down, rebuild. Building teaches you. Buying things numbs your imagination. Renting information is even worse.
- Get out more.
Wikipedia was built by non-academics. Academics sneered (and some still do). Wikipedia is the future of scientific information. Steve Coast built OpenStreetMap. Galaxy Zoo brought in hundreds of thousands of citizens. Academia neglects the #scholarly poor – non-academics (everywhere) facing daily paywalls.
- Campaign for change.
Read and honour Aaron Swartz. Mail your representatives. Blog. You don’t have to go to jail if enough people protest.
- Use domain repositories.
Institutional repositories don’t work – for science and for data. We must create our own. Commercial ones will be constraining and controlled.
- Start bottom-up Communities.
Wikipedia is a bottom-up community. It creates not only knowledge but models of governance. We’ve created the Blue Obelisk for chemistry.
PMR has been involved in all of the above and will no doubt think of more.