update and OR08 postscript

I’m behind at the moment because I was grounded for a whole day at Amsterdam Airport. There is too much to blog about, so this is an interim post. In no particular order:

  • still need to finalise thoughts from Dagstuhl on text- and data-mining (especially wrt NIH policy and UKPMC)
  • what I said at OR08 (appended below)
  • congratulations to Egon Willighagen on his doctorate (I’ll try to find space later)
  • unofficial (as always) meeting of the Blue Obelisk in Nijmegen
  • ORE. The final OR08 day was on ORE. In UCC we now have the TheOREM and OREChem projects. My self-appointed role is to keep ORE simple.

Also next week I am giving an invited talk at the UK Serials Group in Torquay (on Open Scientific Data, I think).
So here is what I think I said at OR08. (I’d appreciate knowing whether it was recorded.) Since about 20% of it was talk with no slide on the screen, those slides are easy to show: “a perfect and absolute blank” [1]. (Seriously, it’s good discipline to give a presentation occasionally without any visible aids.)
Here are my notes. I’m working towards a system where the notes are intertwingled with the menu of slides, so that every few slides one of the notes appears in the menu. (I wish more people were trying to create HTML-based slides. Slidy, S5, etc. don’t give me what I want, which is the freedom to choose slides whenever I feel like it. Remember acetates? That was a good system: you could select them in different orders and even write on them to record new thoughts. PowerPoint stifles innovation.)
=============

“Repositories and Scientific Data”

Rough agenda:

  • What scientists actually do (esp. biologists and chemists)
  • Data loss
  • My use of repositories (I have been using repositories for 20 years, and they are far better than DSpace, Fedora or EPrints. I really wanted to show these in action.)
    • PDB (Protein Data Bank)
    • SF (computer code using Subversion)

UNIT TESTS AND SUBVERSION (I wasn’t able to show unit tests, which, together with Subversion, make sure that every time you write something it is (a) valid and (b) preserved.)


  • What scientists want: pervasive data management
  • We have to build the systems and create the data market
  • Problems of data in current repositories (DSpace)
  • Examples from chemistry:
    • theses
    • publications

  • Demos of Semantic markup (OSCAR, CML, Prospect) (I managed to show OSCAR)
    • data
    • text

  • The LONG-TAIL in Science

    How data gets published:
    • it doesn’t
    • as supplemental
    • on web pages
    • through community / organizations

    You can only rely on a scientist knowing the following informatics concepts:

    • hierarchical filing systems and directories
    • Data-typing through file extensions
    • Addressing through URLs

    and the following tools:

    • Word
    • programs with input and output
    • spreadsheets
    • click-on-the-web
    • Google full-text search

    The SPECTRa project:
    • survey
    • misconceptions
    • temporal disconnect between experiment and publication
    • community

    Bottom-up developments:

    • Open Notebook Science (Bradley)
    • Community through data (Neylon, Piwowar, Coles)
    • Blue Obelisk
    • Wikipedia

    Recording and reporting are NOT the same:

    • electronic lab notebooks don’t work
    • Reporting in theses
    • Reporting as publication

    SEMANTIC WEB

    DBpedia (didn’t manage to demo)

    Text and image mining:
    • PubMed
    • A necessary step to create the scientific semantic web
    • Enhances the formal understanding of the subject
    • helps to create ontologies
    • enhances the value of existing repositories

    We shall not create the appropriate systems unless we know what people actually want…
    • What do scientists want?
    • What do their institutions want?
    • What does the LIS community want?

    • RECOMMENDATIONS:

      • Train students to understand the value of information management
      • In their undergraduate projects
      • Create “repositories” with a natural structure
      • Gradually make tools more semantic (RACSO, ICE, Word 2007)
      • Introduce validation / unit test for data
      • Use the thesis: it is academia’s primary advantage
      • Use free- and semi-structured text.
      • Always provide alternatives to PDF
      • Promote Open Data
      • Resources (all Googlable):
        • “Open Data in Science” (Murray-Rust, on Nature Precedings, http://precedings.nature.com)
        • “Open Data” on Wikipedia
        • Science Commons, Open Knowledge Foundation
        • WWMM (WorldWideMolecularMatrix, http://wwmm.ch.cam.ac.uk) + Murray-Rust
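The recommendation to introduce validation and unit tests for data (and the earlier point about unit tests and Subversion) can be made concrete with a small example. This is a minimal sketch, assuming a hypothetical CSV of melting points; the column names, the sample rows and the plausibility limits are illustrative assumptions, not taken from any real project. It uses Python’s standard `unittest` module, so every run re-checks that the data are still (a) valid, while keeping the data and tests together under version control such as Subversion takes care of (b) preserved.

```python
import csv
import io
import unittest

# Hypothetical example data. In practice this would be read from a data file
# kept under version control (e.g. Subversion) alongside this test.
SAMPLE_CSV = """compound,melting_point_c
benzoic acid,122.4
naphthalene,80.2
"""


def load_measurements(text):
    """Parse CSV text into a list of (compound name, melting point in C) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["compound"], float(row["melting_point_c"])) for row in reader]


class TestMeasurements(unittest.TestCase):
    def test_every_row_has_a_name(self):
        # A blank compound name means the row cannot be identified later.
        for name, _ in load_measurements(SAMPLE_CSV):
            self.assertTrue(name.strip())

    def test_melting_points_are_plausible(self):
        # Hypothetical sanity range: above absolute zero and below 4000 C.
        for name, mp in load_measurements(SAMPLE_CSV):
            self.assertGreater(mp, -273.15, name)
            self.assertLess(mp, 4000.0, name)


if __name__ == "__main__":
    # exit=False lets the script carry on after the tests have run.
    unittest.main(exit=False)
```

The point is the habit rather than the particular checks: whenever a new row is added, a typo such as a missing value or a misplaced decimal point makes the tests fail immediately instead of surfacing months later at publication time.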

    =========
    In summary, the main ideas that I wanted to promote are:

    • Young people are the future. Help them create it
    • Theses are a major opportunity. Use them. Don’t hand them over to commercial control
    • Create semantic resources. Don’t rely on PDF. Use Word or LaTeX. (I think this message is starting to be heard)
    • HELP the scientists at the beginning of their process – not just at the end.

    Most scientists have never heard the word repository, and those who have regard it with as much enthusiasm as the “research assessment exercise” (with which it is synonymous in far too many places). Instead, LIS staff should work with scientists to solve their real problems: lost data, unsharable data, lost data. You will have to start wearing white coats.
    =================
    [1] From The Hunting of the Snark by Lewis Carroll (how lovely to find something out of copyright)
    He had bought a large map representing the sea,
    Without the least vestige of land:
    And the crew were much pleased when they found it to be
    A map they could all understand.
    “What’s the good of Mercator’s North Poles and Equators,
    Tropics, Zones, and Meridian Lines?”
    So the Bellman would cry: and the crew would reply
    “They are merely conventional signs!
    “Other maps are such shapes, with their islands and capes!
    But we’ve got our brave Captain to thank”
    (So the crew would protest) “that he’s bought us the best—
    A perfect and absolute blank!”
