#animalgarden welcome Charlie the @peerJMonkey

#animalgarden like everything that’s Open and especially animals so they were very excited to hear of @thePeerJ and its blue monkey. PeerJ is an Open Access publisher (and PMR thinks it’s a good thing and may publish in it). But not everybody liked the monkey. Here’s Kent Anderson of the #scholarlykitchen http://scholarlykitchen.sspnet.org/2012/05/22/the-risks-of-launching-a-new-services-business-branding-cash-flow-and-the-fraught-start-of-peerj/

The PeerJ branding start isn’t promising … [it] currently has a blue monkey as its branding element … Apparently, the blue monkey’s days are numbered, but this is what we have today. … The problems with the PeerJ brand are obvious and easily fixed. Here’s the prescription:

  • No silly blue monkeys.

#animalgarden were upset. It sounded like people didn’t like the monkey. And for many weeks monkey had no name. It’s not easy to be friends with someone with no name. But then @thePeerJ had a name-the-monkey-competition http://blog.peerj.com/post/43649138800/name-the-peerj-monkey . That meant that monkey was going to be permanent. And a few days ago (http://blog.peerj.com/post/44637115411/naming-monkey ) the monkey was given a name:

Charlie

Named after Charles Darwin.

That’s tremendous. So #animalgarden had a welcoming party – complete with red carpet:

[Photo: Tom Murray-Rust, CC-BY 3.0]

 

Charlie is a bit overwhelmed. He’s not met three-dimensional animals before and that white stuff in the garden is wet and cold and not very good for cardboard. He hopes PeerJ give him a “proper body” soon.

When they do, #animalgarden can interview him at the Panton Arms as they did with Gulliver Turtle from BioMedCentral.

Hear/see interview on http://vimeo.com/34259668. Charlie likes the idea of a fire, but not while he is still cardboard.

And #animalgarden think mascots are very important. Here are some of them:

[Photo: Tom Murray-Rust, CC-BY 3.0]

  • 12 o’clock: Tux, Linux penguin
  • Chuff, @okfn_okapi
  • Sleepless: (from Seattle)
  • Gulliver Turtle BioMedCentral
  • #ami2 liberating STM publications by software
  • CrazyFrog (Gilles Frydman @gfry and ACOR, www.acor.org: Association of Cancer Online Resources)
  • WOL: the Semantic Web Ontology Language (OWL)
  • JUMBO: PMR chemical software
  • Felix Q Potuit (London School of Economics) [the LSE beaver is on Felix’s T-Shirt]

So, #animalgarden have two requests:

  • When is PLoS going to get a mascot?
  • When will we be able to get a three-dimensional #peerJMonkey ?
Posted in Uncategorized | Leave a comment

#ami2 #animalgarden #ignorantchemist use SVG2XML to turn PDFs into XHTML for BeyondThePDF2

#animalgarden are excited. They are going to Beyond-the-PDF2 #btpdf2 and have been accepted for a demo. That’s a lot of hard work so they are working hard. Last time you saw them (/pmr/2013/03/05/animalgarden-ami2-svg2xml-ignorantchemist-transforming-pdfs-into-xml/ ) they had extracted the characters from PDF and turned them into SVG. Now they are creating HTML – the language of the web.

 

#ami2 is explaining sub- and super-scripts, italics, colours and a lot more. (AMI got some bling from @kitware on their 15th birthday. She’s not really a bling-animal and it gets in the way of the keyboard). Here’s what she started with:

And here is what she has produced:

#animalgarden and #ignorantchemist think that’s great. It’s captured everything that’s necessary for further interpretation (sub/superscripts, italics) and coloured (red) the characters they had to translate from non-Unicode fonts (mainly MTSYN). Although #scholarlykitchen might condemn the lack of beautiful typesetting it would be possible to improve it automatically using CSS stylesheets (e.g. to create equispaced lines). That’s not the point. It will be possible to extract the mathematics from it. And the data. We can get the rate constant. And turn it into SI units (by dividing by 3600*1000). (You can’t do that from PDF.)

That’s real beauty.

(unlike the images from my phone…)
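The SI conversion mentioned above is a single division once the number is available as text. A minimal sketch follows; the post does not state the rate constant's original units or value, so the number 7.2 is purely hypothetical. One reading consistent with the factor 3600*1000 is a constant reported per hour and per litre being converted to per second and per cubic metre, but that is an assumption.

```python
# Factor quoted in the post: 3600 (seconds per hour) times 1000.
# What the 1000 stands for is not stated; litres per cubic metre is
# one plausible reading (an assumption, not taken from the source).
DIVISOR = 3600 * 1000


def to_si(value, divisor=DIVISOR):
    """Divide a reported rate constant by the factor quoted in the post."""
    return value / divisor


# Hypothetical extracted value; the real one is in the article AMI read.
k_reported = 7.2
k_si = to_si(k_reported)
print(k_si)
```

The point is that the value has to be machine-readable text first, which is what the XHTML output provides.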


#animalgarden #ami2 #svg2xml #ignorantchemist Transforming PDFs into XML

 

#ami2 has “solved” the problem of transforming PDFs into SVG and Unicode. “solved” is relative as there is a perpetual increase in non-conformant PDFs with strange fonts, but AMI2 has transformations for the most important. She probably has a conversion rate of better than 99.9%. Now she has to turn these into structured XML.

As before the protagonists are Sleepless (project manager), Chuff (@okfn_okapi, Open enthusiast), #ami2 (semantic expert but no understanding of humans or pragmatics) and PMR (#ignorantchemist).

S: Why is PMR called #ignorantchemist?

PMR: Because the Scholarly kitchen called me that and said I knew nothing about typesetting.

S: Is that true?

PMR: I think I know enough to instruct #ami2.

A: I only understand what PMR tells me in semantic form. It must be deterministic or occasionally controlled stochastics.

S: Here’s an example. PMR didn’t say where it came from. It probably doesn’t violate copyright.

S: There are 11 lines here…

A: … there are 17 different values of y-coordinate so I count 17 lines.

S: I don’t understand.

A: There is no concept of “line” or “word” in PDF. Only characters at given coordinates. I count each set of characters with the same y-coordinate as a “line”.

C: #ami2 never makes a mistake.

S: what’s the first line, #ami2?

A: spaces “minus (#8722)” “one”, spaces, “minus (#8722)” “one”. They all have y=“343.872”. I put them in order of increasing x-coordinate. There is normally no concept of “space” so this means “estimated disjoint characters”.

C: Why isn’t “The rate constant…” the first sentence?

A: because it has a greater y value. Here is the end of the first line and the start of the next. I’ve simplified it for you.

<text svgx:width="333.0" x="441.801" y="343.872" font-size="7.074">−</text>

<text svgx:width="500.0" x="447.319" y="343.872" font-size="7.074">1</text>

<text svgx:width="611.0" x="297.915" y="347.308" font-size="9.465">T</text>

<text svgx:width="500.0" x="303.698" y="347.308" font-size="9.465">h</text>

C: That’s simple???

A: I have removed the font and colour information. “−” is a minus sign. Note that the font of line 0 is smaller than that of line 1.
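AMI's definition of a “line” (all characters sharing one y-coordinate, ordered by x) can be sketched in a few lines. This is an illustration, not AMI2's actual code: the function name `group_into_lines` is hypothetical, and the four characters and coordinates are the ones in the fragment above.

```python
from collections import defaultdict

# (character, x, y) for the four <text> elements shown above,
# deliberately shuffled to show that the input order does not matter.
chars = [
    ("h", 303.698, 347.308),
    ("\u2212", 441.801, 343.872),  # U+2212 MINUS SIGN
    ("T", 297.915, 347.308),
    ("1", 447.319, 343.872),
]


def group_into_lines(chars):
    """Group characters by identical y, then order each group by x."""
    by_y = defaultdict(list)
    for ch, x, y in chars:
        by_y[y].append((x, ch))
    # In SVG, y increases down the page, so ascending y is top to bottom.
    return [
        (y, "".join(ch for _, ch in sorted(items)))
        for y, items in sorted(by_y.items())
    ]


for y, text in group_into_lines(chars):
    print(y, text)
# 343.872 −1
# 347.308 Th
```

Exact equality of y-values matches what AMI describes. A tolerance might be needed for other PDFs, but the post does not say.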

S: Why is the y-coordinate bigger? It’s further down the page.

A: Because SVG has y going DOWN the page…

S: OK. So we have to work out that -1 is a superscript because it’s 3.5 units above the next line and because the font-size is smaller.

A: Yes.
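The rule Sleepless states (smaller font, baseline a few units above the next line) can be written as a small predicate. This is a sketch under assumptions: the `Line` class, the function name and the threshold `max_rise_factor` are all hypothetical, and the post does not give the exact rules PMR taught AMI.

```python
from dataclasses import dataclass


@dataclass
class Line:
    y: float          # baseline; SVG y grows down the page
    font_size: float
    text: str


def is_superscript(candidate, following, max_rise_factor=1.0):
    """Heuristic: smaller font AND baseline slightly above the next line.

    max_rise_factor is an assumed threshold: the rise must be less than
    this many font-sizes of the following line.
    """
    rise = following.y - candidate.y  # positive if candidate sits higher
    smaller = candidate.font_size < following.font_size
    close = 0 < rise < max_rise_factor * following.font_size
    return smaller and close


# The two "lines" from the fragment above.
line0 = Line(y=343.872, font_size=7.074, text="\u22121")
line1 = Line(y=347.308, font_size=9.465, text="Th")

print(is_superscript(line0, line1))  # True: rise is about 3.4, font is smaller
```

A subscript test would be the mirror image, with the smaller-font line slightly below the baseline instead of above it.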

S: And “130 and 200…” has both sub and superscripts. That’s very complicated.

A: once I have been taught complicated things precisely I can do them. PMR has told me what to do with sub/superscripts.

S: so what does this line say?

A: “1”, “3”, “0” space “2”, “0”, “0” space SUPERSCRIPT(WHITE_BULLET) “C”

PMR: ARGHHH! That’s an abomination!

S: What’s the problem?

PMR: It is meant to mean “degrees” “C” but they have used the wrong symbol. It should be a degree sign (“°”). There’s a perfectly good Unicode symbol.

S: But the #scholarlykitchen said they are the experts in typesetting and you are #ignorantchemist. And people have paid a lot of money for the typesetting. It is more important to be beautiful than correct.

PMR: Well they have got it wrong. It’s garbage. It’s not even beautiful.

S: Please calm down. #AMI2 can you detect when people use superscript(whiteBullet)?

A: Yes.

S: So we can read all the published papers and find the errors. It would be a form of tidy().

C: Yes and provide a service to the world.

PMR: *****
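The tidy() that Sleepless proposes could look something like the sketch below. Everything here is illustrative: the token representation (plain strings plus a `("SUPERSCRIPT", char)` tuple) is a hypothetical stand-in for AMI's internal form, and it is an assumption that the offending glyph is U+25E6 WHITE BULLET.

```python
WHITE_BULLET = "\u25e6"  # assumed code point of the glyph used in the PDF
DEGREE_SIGN = "\u00b0"   # the Unicode character that should have been used


def tidy(tokens):
    """Replace SUPERSCRIPT(WHITE_BULLET) directly before "C" with a degree sign."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == ("SUPERSCRIPT", WHITE_BULLET) and nxt == "C":
            out.append(DEGREE_SIGN)
        else:
            out.append(tok)
    return out


# The line AMI read out in the dialogue above.
tokens = ["1", "3", "0", " ", "2", "0", "0", " ",
          ("SUPERSCRIPT", WHITE_BULLET), "C"]

print("".join(tidy(tokens)))  # 130 200 °C
```

A superscript bullet that is not followed by “C” is left untouched, since it may be a genuine bullet.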


#animalgarden find a name for @thepeerj #PeerJMonkey

#animalgarden are excited – the Blue Monkey needs a name: http://blog.peerj.com/post/43649138800/name-the-peerj-monkey

Many of you already know and love the PeerJ Monkey Mascot. Up until now, we just called it “Monkey” and Monkey has already starred in its own line of T-Shirts, Mugs, Stickers and Pins. But it doesn’t seem right for Monkey to go nameless anymore, especially now that Monkey is an International Media Star and Monkey’s Infinite Troupe of Monkey Co-authors have almost finished their Shakespeare play.

So, to help celebrate our launch, we are holding a ‘Name the Monkey’ competition. We don’t know if Monkey is male or female; old or young; what species of monkey; or from what part of the world (universe?), but we are hoping you can help us out! Please suggest suitable names and a backstory for Monkey. Why is Monkey holding a test-tube and a pencil? Why that enigmatic smile? And what is Monkey sitting on? Only you can tell…

   

How to Enter: Enter by tweeting to Twitter; or by replying to this post on Facebook; or by replying to this post on Google+; or by making a blog post (or even an entire web site) all of your own!  

Simply make a Post(s) or Comment(s), with your suggested name and any suggested backstory. You must include the hashtag #PeerJMonkey and you must include a link to a PeerJ page (you choose the most appropriate link). Otherwise all remaining characters are yours, but there are bonus points if you can make the Monkey appear in unusual places on the internet, or graphically re-imagine the Monkey…


On Monday 4th March we will make a decision, and announce the new name. Humorous, scientifically oriented, or generally ingenious names are preferred!

And what is the prize? Fame clearly. Fortune may have to wait. But we will open a twitter account in the name of Monkey, and Monkey will start tweeting (with its own personality). Plus, you get a Monkey branded T-Shirt and a Monkey Mug! Short of a lifetime’s supply of bananas, it doesn’t get any better than this.

So stop monkeying around and send us your entries! 

AnimalGarden had a meeting:

Here’s Sleepless, Gulliver, WOL, Tux, and Chuff. #AMI2 has been working and blogging on DinoMike’s article, https://peerj.com/articles/36/ .

They couldn’t agree on a name. Too many ideas. Animals acquire names and you can’t rush the process. They have to do it by tomorrow. That’s too short. So they asked PMR, and PMR asked his brother. And the name of his childhood monkey was:

EFFIANMINGO

They think that’s a great name.

What’s the background story? It’s true.

Many years ago EffianMingo, a much loved and worn monkey, was travelling with PMR and HammondMR in their parents’ car in rural England. It was a Morris 10 with a sliding roof. As it was a lovely day outside the roof had been slid back and the sun shone in and the wind whistled over the top. EffianMingo created a hypothesis: Monkeys can fly. He asked Hammond to test this hypothesis and throw him out of the car into the wind. This was an easy experiment to carry out and soon EffianMingo hurtled through the roof. He didn’t fly back.

So the car was stopped and an expedition mounted and after a long and tearful time EffianMingo was found on the road. He didn’t like hitting the road – it was hard and bumpy.

From N=1, EffianMingo asserts the NullHypothesis:

Monkeys can’t fly.

When BlueMonkeys are available as real monkeys (not just T-shirts) then we’ll get one into #animalgarden, just like @gulliverturtle from BioMedCentral.

 

 

 


#animalgarden “Hybrid gold” and “Universal Access” #elsevier at “Beyond the PDF” #btpdf2

#animalgarden are very excited. They are going to Beyond-the-PDF-2 #btpdf2 to give a demo of AMI2. AMI2 is a tool to read the whole scientific literature and extract the factual data. That will be legal in the UK in October 2013, so they are preparing by reading Open Access articles. AMI2’s mascot is AMI, the kangaroo. Here she talks to Chuff the @okfn_okapi (Open Knowledge expert) and Sleepless the project manager. Sleepless will contact PMR for human problems. Note that Chuff is bouncy, full of enthusiasm and wants to see the whole world Open. AMI has no emotions and doesn’t understand humans. She can be taught algorithms, heuristics, semantics, logic. She never gets tired, angry, bored or makes a mistake (unless PMR has given her something incorrect).

The demo will show how Scientific Technical Medical (STM) PDFs can be converted to semantic form automatically. #ami2 has done a lot of this already. She wants PMR to work hard until #btpdf2 to give her as much power as possible. That means analysing a range of publishers. #ami2 has already done the CC-BY Open Access publishers BioMedCentral (#animalgarden likes Gulliver Turtle), eLife (no animal yet) and PeerJ (the blue-monkey-with-no-name). Now they have come to #elsevier:

S: Elsevier is not an OpenAccess publisher so almost all their content is closed and AMI2 cannot analyse it.

C: But there are some hybrid articles which are author-paid to be Open.

A: Where is the list? I can download them and analyse them.

S: There is no list.

C: Why not? If the authors have paid for them, Elsevier should list them.

S: That’s what PMR thought. He asked @wisealic , the director of Universal Access, (/pmr/2012/08/05/elsevier-replies-about-hybrid-openacess-i-am-appalled-about-their-practices-breaking-licences-and-having-to-pay-to-read-open-access/ ) on 2012-08-05

5. Where is the machine-readable list of all articles published under this scheme? I wish to download and analyze all of them.

At this time we do not publish a separate machine-readable list of all sponsored articles, but I will share this suggestion with appropriate colleagues involved in our various open access infrastructure projects.

C: How many articles were there? And how much had the authors paid?

S: about 2000 articles. Assuming 3000 USD each that is SIX MILLION DOLLARS

C: So Elsevier can afford to pay someone to make a list?

S: Yes. You can get a lot of human work for a small fraction of 6 million dollars.

C: I would have thought that Elsevier would want people to read these articles.

S: That’s what PMR thought. But obviously #elsevier thinks otherwise (PMR: or doesn’t think). Could AMI make a list by reading all the #elsevier splash-pages and seeing which are Open?

C: Elsevier doesn’t label them consistently. Every publisher is different. So we don’t know what to look for.

PMR: and why should WE do Elsevier’s work for them? They take our money …

S: No rants, PMR. This is a constructive discussion. Last August @wisealic said they were working on the problem:

August 8, 2012 at 3:04 pm

Hi Peter,

We are currently investing in a major overhaul of our open access infrastructure and until that upgrade is complete do have various systems limitations in presenting open access content. To make our open access more clear and visible in the interim we’ve created various work-arounds. You have found two problems with these – many thanks for flagging them for our attention.

A: Six months have passed. What have they changed?

S: Here’s @wisealic a few days ago

What is Elsevier doing to ensure that OA content in hybrid journals is discoverable by institutions that do not subscribe to that title? My colleagues inform me that they are discussing the issue with an array of vendors to find a solution.

A: I understand an Array. It is an ordered list over which I can iterate. So Elsevier has

Array<Vendor>

If you tell me what a vendor is and how to get the articles from it I can iterate over all the articles.

S: A vendor is somebody who sells things to libraries. I do not understand this reply, I’ll ask PMR:

PMR: I do not understand it either. Elsevier does not need vendors to count 2000 articles and make a list. They simply need the will to do it and give the community what they have paid for and deserve. I do not feel Elsevier is behaving in a constructive manner. Since some of their “open access” articles still appear to be behind a paywall, I think such a list might cause problems.

S: AMI, I cannot give you a milestone date for when Elsevier will release a list of its Open Access articles. Maybe @wisealic will at least give us a short list.

C: Does she read this blog?

S: I will ask PMR to mail her.

A: If you give me a list I will download the articles and iterate over them.

PMR: Let us hope @wisealic gives us some answers that we can act on.

 

 


Liberation software

[This is a first draft! Will be used for the talk and then refined].

I’m talking this morning at Kitware, a US company (SME) spun out of GE and making enormous contributions to Openness. It creates and distributes a widely used toolkit (VTK, http://www.vtk.org/ ), and the molecular visualizer and builder Avogadro . They’ve invited me to Clifton Park, NY and I’m launching my ideas on Liberation Software.

Liberation Software is the concept that software is a critical element of fighting the digital battle (see Eben Moglen’s Guardian article). There are many things software can do:

  • Publication of information (my first HTTP server gave me immense power)
  • Discovery and linking of information
  • Creation of communities
  • Speaking truth to power (e.g. online petitions)

Today I’m going to talk about software that liberates knowledge. I’m a bit rushed because I only got my ideas together at breakfast and I haven’t drawn a good diagram. I’ll summarize the problems, then the liberation. There’s probably a bias on scholarly publication but it’s meant to be general.

Problems

  • Gatekeepers. An excellent example is STM publishers who create “paywalls” (usually 30-50 USD to read a single paper for a day). No rights are given and control is absolute. There are reverse walls – the difficulty of getting your voice heard.
  • Technological mismatch. Typical examples are binary files (sometimes encrypted). These can often be deciphered with enough effort.
  • Missing metadata. The information only works with a given program which “knows what the data means”. Or the creator uses abbreviations on their Excel columns.
  • Apathy. Many people don’t care to make their material available at all.
  • Lawyers, Politicians, lobbyists. By far the hardest to tackle. Organizations assert they own the info and will sue/jail people who have succeeded in the earlier steps.

Solutions

[Placeholder until I get time to draw it.]

Components :

  • Crawler: visits legally visible sites
  • Format Converter: deciphers what it finds
  • Semantifier: makes semantic sense out of data.
  • Repository: easy storage of semi-structured information (RDF/XML/Mongo, etc.)
  • Services: (often domain specific) addition/filtering/transformation/conversion of information (taxonomy, content-mining, annotation…)
  • Browser: Human interacts with information
  • Javascript in browser to extract and transform received information
  • Amanuensis: “desktop” companion providing close coupling with human. (e.g. AMI2, Avogadro). Lightweight “AI”.
  • LinkedOpenData Cloud everywhere.

The emphasis is to transfer power from the gatekeepers to the human-amanuensis partnership.
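As a stand-in for the missing diagram, the first four components listed above can be chained as a simple pipeline. This is a toy sketch, not an implementation of any real system: every name in it (`Repository`, `run_pipeline`, `convert`, `semantify`) is hypothetical, the “crawled” documents are hard-coded bytes rather than fetched pages, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repository:
    """Stand-in for the Repository component: an in-memory record list."""
    records: List[Dict] = field(default_factory=list)

    def store(self, record: Dict) -> None:
        self.records.append(record)


def run_pipeline(documents, convert, semantify, repository):
    """Crawler output -> Format Converter -> Semantifier -> Repository."""
    for raw in documents:
        text = convert(raw)                 # decipher the raw format
        repository.store(semantify(text))   # attach meaning, then store
    return repository


def convert(raw: bytes) -> str:
    """Toy Format Converter: bytes to text."""
    return raw.decode("utf-8")


def semantify(text: str) -> Dict:
    """Toy Semantifier: treat the last word as a number, the rest as its name."""
    *name, value = text.split()
    return {"property": " ".join(name), "value": float(value)}


# Pretend these came from a Crawler visiting legally visible sites.
crawled = [b"rate constant 7.2", b"temperature 130"]

repo = run_pipeline(crawled, convert, semantify, Repository())
print(repo.records)
```

Services, the browser and the amanuensis would then read from the repository; they are left out to keep the sketch short.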


The Scholarly Kitchen Challenges me over STM PDFs; I’d like your help

I have recently blogged about the standard of “typesetting” in STM publishing and commented that much of it was very poor in that it destroyed the identity of the characters in the document (i.e. many fonts do not use Unicode so machines cannot automatically read the PDFs (correctly)). This has drawn criticism from “The Scholarly Kitchen” blog about me:

Ignorance As Argument — A Chemist Alleges Publishers Exploit Typography for Money

 

In a recent blog post, Cambridge chemist and crystallographer Peter Murray-Rust argues that publishers are using typesetting and typography to slow down science, extract fees, and control business:

It’s interesting that Murray-Rust explicitly admits ignorance about how typesetting is done at most publishers while also implicitly admitting ignorance about typography in general.

I wrote a comment to the SK challenging their statement and starting to put my position. (Having studied the output of STM publishers on a daily basis for 8 months I am extremely aware of the differences between Helvetica and Arial). However the editor of SK, Kent Anderson, deleted my comments and replied in private email. (I had offered to write a guest blog for the SK). I am therefore answering their criticisms on my own blog.

For those not familiar with SK, it’s a group of experts in the STM publishing industry. Some of the contributions are useful and informative. But they are mixed with a group-think politics that questions Open Access, the role of the NIH and PubMedCentral, and is generally self-congratulatory and conservative. [I’ve put that gently]. Recently Anderson has taken an anti-eLife, anti-PMC, anti-OA approach with the religious zeal of Senator McCarthy and the sleuthing ability of Woodward and Bernstein. He’s been extracting emails through FOI/A from anyone connected with NIH/PMC etc. [Note: I am on the Project Advisory Board of EuropePMC so I’m not de facto persona grata at SK].

The charge of “ignorance” is meant pejoratively (as, I suspect, is “chemist” – i.e. someone outside the sanctum of STM publishing). But I take it as a compliment. I AM ignorant of everything that goes on in STM backrooms (typesetting, format conversion, graphics, etc.) because it’s highly secret for each publisher. I therefore have to guess, and there is no shame in that. My current hypothesis is that STM backroom technology is inefficient, costly, unsuited to the modern age and could and should be swept away in the same way that Murdoch swept away hot metal in the UK newspaper industry.

But my main concern is that STM publishing corrupts and destroys scientific communication, especially to unsighted humans and machines. The SK has countered by saying that the care they devote to creating typeset PDFs is exactly what readers want. They’ve asked them. [I am not sure what options they gave.]

So my question to the readers of this blog is:

“IN AN ELECTRONIC AGE ARE YOU HAPPY WITH THE PDFs THAT STM PUBLISHERS CREATE?”

For myself I never print PDFs (except boarding passes) and so I’ll give you two examples of what I find seriously problematic:

  • Cut-and-paste from PDFs garbles the content if it’s not in Unicode
  • Double-column PDF is extremely difficult to read naturally on a modern laptop

I’d like other indications of (dis)satisfaction with PDFs. Please avoid commenting on XML or HTML – the former is hardly ever available from closed STM publishers and the latter has other problems I’ll address later. I’d like to stick to READING – I will address authoring later. Please post comments here – they will all be posted (unlike SK) though there is a delay when I’m asleep. Or you can tweet (@petermurrayrust) if you can express yourself in 140 chars.

 

 

 


#rds2013: Managing Research Data: What I said

This is a list of tweets from my talk. It’s a very good summary – thanks everyone. I have removed duplicates so that each tweet is a separate topic. They aren’t in true order of my presentation or in time. I’ve removed tweeters’ comments and also most of the tags (e.g. no #rds2013). In some cases this is a retweet so the original tweeter is removed (sorry). Most are either direct or indirect PMR speech.

 

 

Wikipedia best knowledge development of 21st century. @AfrolatinProjec

When [Wikipedia] opened, acad[emia] scoffed but [PMR] said “I believe in Wikipedia, the bits I wrote are right.” @anitawaard

Need proper structure & communities for data. We are building walled gardens ie FB. Public governance that is believable needed @jvinopal

“The challenge is whether @figshare can be not a walled garden” – have public governance @rmounce

[PMR] celebrates this excellent youtube video [NYU HSL on lost data]. Applause for its creators http://t.co/GAIq7StTK6 #opendata @DataAtCU

Data gets lost for lack of proper social structure @planarrowspace

Young people aren’t accepting what we have given them and are changing the world. If you aren’t with them, you’ll be left behind @gailst

[PMR] adapts Ranganathan’s principles for data (“data is for use,” more). Consider applying Ranganathan’s 5 laws http://t.co/fKM6RZUJlF @moncia

Encouraging us to read Aaron Swartz’s guerrilla manifesto // find it here: http://t.co/VdM3jOS2gC @rmounce

[PMR] sure that Ranganathan would agree “Data belongs to the world.” #sixthlaw @yasmeen_azadi

[PMR] gives a shout out to @McDawg ‘s recent post on Open Access here: http://t.co/4vNHRawZE2 @rmounce

[PMR] Cites the excellent work of @jackandraka – this wouldn’t have been possible w/o #openaccess @mew687

The lack of access to scholarly information means people die @rmounce

Access to public knowledge is a *fundamental* human right @lwillm

Journalists, doctors, everyday citizens like #opendata Science should also provide this @moncia

Global Knowledge Commons needed @carolynthelib

“values matter, the community; technology and protocols then follow” @jvinopal

Publishers in Europe trying to limit text mining initiatives #openaccess @jvinopal

AMI is about taking the literature into semantic form…if the publishers and their lawyers will let it happen. @elotroalex

Open Research Data Handbook http://t.co/IPWeETNvVr. @AndyDrewCreamer

We [scientists] have to work for the benefit of humanity. Please work with us, join us @jvinopal

Will be legal to text mine scientific articles in the UK soon. @ResearchAtCU

Current Problems in Managing Research Data. If we want to do it, we will solve them http://t.co/JW5fzi6JFI @jvinopal

in the UK the Hargreaves Report copyright reforms will be implemented so it will be legal to mine papers 🙂 @greentea166

The only major barrier to getting data out of papers is LEGAL – the lawyers. We have the tools e.g. #AMI2 @ResearchAtCU

[PMR] salute to Wikipedia, Open Street Map…not worried about impact factor @DataAtCU

Target Research data management training at graduate students @anitawaard

[PMR] shoutouts for PhD students @rmounce & @StilettoFiend , @OKFN Panton Fellows for #openscience @lisarnorberg

[PMR] give shoutout to linkedopendata @rmounce

Blue Obelisk open source community for chemistry providing open data & tools, costs just $20 p/a to run @rrkennison

[CrystalEye tool live demo!] Fantastic tool that demonstrates the utility of #opendata @moncia

[PMR] gives examples of open data sharing: http://t.co/JnRU3zg9pq, http://t.co/KVpMC6m8y9 (code hosting). @Wilderbach

We must build or make tools, not buy or rent them @planarrowspace

[paraphrase PMR] a repository should provide benefits to the data originator, it’s like a bee and a flower – symbiosis @robincamille

We the community must BUILD, not buy/rent, our tools. @yasmeen_azadi

[PMR] “I made a video. If you’re bored you can watch it…It will break your heart”:. [You can find it on his blog.] @anitawaard

[PMR] shoutout to https://t.co/QoIsFXzr7h @ashleyrjester

I put my software openly on bitbucket not because I’m mandated to but because it’s helpful, better than repos @rmounce

To help humanity we scientists need to release the grip on our data and let people USE IT! @kcrews

[PMR] singing the praises of #Wikipedia. @DataAtCU

Examples of #opendata success: Wikipedia, Open Street Map (not worried about impact factor) @rmounce

[PMR] Cites OpenStreetMap as an excellent example of excellence that doesnt need cash, just willpower @ResearchAtCU

 


#rds2013 Managing Research Data: How I put talks together and thanks to CDRS

The Center for Digital Research and Scholarship (CDRS) at Columbia have done a truly magnificent job of capturing the Managing Research Data event. As a result we have access to videos, aggregated tweets etc. http://www.ustream.tv/recorded/29603442 contains my presentation (mins 31-50). Others have also contributed tweets analysis (see graph), tagExplorer (awesome), aggregation etc.

The reason it matters is that in my presentations I never know what I am going to say in detail. I work hard beforehand to get the most likely material into my head and turn it over and over. Normally I have probably 1,000++ “slides” to choose from, organised in directories, and make a list of the most likely directories. I often blog my thoughts beforehand and this helps in several ways: listing the items, possibly getting comments from the world and refining, and also something to fall back on if I can’t present from my own machine. (This happened on my farewell to CSIRO – because there were remote attendees the connection had to be central and I couldn’t use my laptop. But all of what I wanted to say was already on the web or the blog).

I am often nervous before talking. This is a good thing. It means I am taking it seriously. Indeed a very good touchstone is that if I take a talk casually (“I’ve done that before so I don’t have to prepare”) I may give a bad one. Audiences deserve commitment. I have – on 2-3 occasions – been truly terrified (one just a month ago). What matters is getting the right story for the right audience. And because I need to have a feel for the audience it can be very difficult to get it right until very close to the meeting. The “right” talk for the wrong audience can be a poor talk.

Since I use my own laptop (and insist on it) and because I agree that “Power corrupts; Powerpoint corrupts absolutely” (Edward Tufte) I use HTML. HTML has many virtues – it scales to the window, it wraps, and when I download a web resource I can just use it (although font size can be a problem). The disadvantage is that it is difficult to add multimedia without significant editing and it’s almost impossible to distribute the presentation (Powerpoint is a good container format and I don’t understand why the W3C has failed to generate good container approaches for multiple pages – which would then spawn editing tools). BTW I sometimes do PPT when I am forced to as part of a larger presentation – they want my “bit”.

I also edit my presentation constantly and – if I have the chance – I may be editing it while the speaker before me is talking. This isn’t lack of preparation, so much as adding little details that reflect the makeup of the audience. If I don’t have this chance I find that I am always working after midnight the night before.

So when I give a presentation I know roughly what I want to say but I have far too much material to cover in the time. Because the slides are organised through links I don’t have to “skip over some” as so many Powerpoint presenters have to do. I note those slides which are essential and mark them so that I make sure I don’t forget them. I then ask the host to signal when I have 3 minutes left and make sure that I have wrapped up OK – I try never to overrun. I usually put the thanks first since I might forget them.

This works well for 30+ min presentations. The experience is slightly like my skiing – just out of control and having to think ahead while talking. That’s not the same as being rushed – it’s that I have to constantly make decisions about directions. (Linear Powerpoint is just click-click-click).

However one problem is that I can’t easily “mount my slides”. That’s also because there are interactive demos. So whenever I get the chance I ask to be recorded. And yesterday has turned out wonderfully – thanks again CDRS.

Yesterday however I had to fit a plenary lecture into 15 mins. That’s tight, especially as I was going to be controversial. I agonized about how to do this. I knew that if I did my normal process I would seriously overrun. I therefore thought hard about using Powerpoint with timed transitions. But I just couldn’t feel happy about that. I had reserved a whole day beforehand to prepare. I was still exhausted from travelling back from Australia and not sleeping very well. So I spent the day writing blog posts. I wrote 8 posts in a day-and-a-half. They have the added advantage that people not at the symposium can read them. (Many slide presentations don’t contain explanatory detail). At this stage I was very worried that the presentation would be woolly and unfocused. But during the blog posts I discovered the story to hang the presentation on: the messages that I wanted people to take away (see next post).

I created a linear list of topics. In 15 minutes you have to be close to linear but there were linkouts. I certainly needed to show some of those (e.g. Aaron Swartz). But a list of 20-odd links isn’t exciting as the main stream. So I interspersed those with images from the linkouts. (Generally my images are meaningful – if I show flowers, then the flowers have a clear message. This time I showed a cow on a common because I was talking about commons.)

So the main stream is a mixture of images, links and short phrases or sentences that I scroll through. I haven’t done this before but feedback was positive. I start at the top and scroll down manually – there is no “slide” but often a concept fits on one screen. I don’t know how meaningful the final result is to people who weren’t there, but at least it links to the blog posts.

The next post shows the tweet analysis of what I said.

 


Kitware: Liberation Software

This is a talk given today at Kitware, Clifton Park, NY state. It overlaps with #rds2013

Peter Murray-Rust, University of Cambridge and Open Knowledge Foundation

 

This work is licensed under a Creative Commons Attribution 3.0 License.

 


Neelie Kroes. (Vice President European Commission). “This is personal for me. I am 71; I don’t have to do this job. But I want to. I want to because I am inspired by this new generation.” [PMR: I’m exactly the same]

 

Overview

Many thanks to Kitware for advancing the cause of Openness and sustainability

We are in the middle of a global battle for our digital future

ICT can control us; it can also liberate and democratise us

We can develop software to liberate knowledge [Crawlers, repositories, amanuenses]

Communities: Open Knowledge Foundation, Blue Obelisk, Citizen Science

Demos: Semantic software; content mining

[For next week: Open ontologies; reproducible declarative computing]

 

/pmr/2013/03/01/liberation-software/

Where are we at and who are we? (Scale, “market”)

  1. Values matter; then community; technology and protocols then follow

  2. Our current problems are people problems not technology

  3. Communities and ideas that have worked – demos

  4. What can and should we do?

  5. BUILD A GLOBAL KNOWLEDGE COMMUNITY WITH CITIZENS

[#opendataday PMR]

 

We should CREATE A GLOBAL KNOWLEDGE COMMONS

Midsummer Common (Cambridge) – traditional grazing

…USING HACKATHONS …

The City of Palo Alto teams with Stanford University to complete the City’s first hack-a-thon. The challenge: build an application in twenty-four hours to utilize geographical information system data provided by the City.

 

30-40 people at OKF London #opendataday

 

Values and Principles

 

[Wikipedia]

We must remember Aaron [READ GUERILLA MANIFESTO]

Closed data mean people die. (Jack Andraka, 14, invented a new diagnostic test for pancreatic cancer)

“This was the [paywall to the] article I smuggled into class the day my teacher was explaining antibodies and how they worked. I was not able to access very many more articles directly. I was 14 and didn’t drive and it seemed impossible to go to a University and request access to journals”.

We are in the middle of a digital revolution. We are fighting for our digital commons against digital enclosure.

“Bliss was it in that dawn to be alive,

But to be young was very heaven!” (Wordsworth: FRENCH REVOLUTION)

 

[Bastille: Wikipedia]

Ideas from Ranganathan in the data age.

 

Problems

Current problems in managing research data. (Vested interests, academic apathy, intrinsic difficulty, finance)

Walled gardens

 

Animal Garden video at Serpentine Gallery. Plot: Some animals grow flowers and give them away. Other animals build walls and sell them back. Happy ending? Who knows.

A garden is walled if you cannot fork, or download the whole contents. Unwalled gardens MUST have clear governance.

 

Solutions and communities that work

Wikipedia (piezoelectricity)

OpenStreetMap

(building a world map for social good)

Melbourne Bicycle Map

 

Linked Data

http://linkeddata.org/

Bitbucket commits https://bitbucket.org/petermr/pdf2svg/commits/all

http://stackoverflow.com/questions/15085814/unusual-floating-point-numbers-in-tables#15094017

16 principles for managing research data (PMR)

 

We (community) must BUILD (not buy/rent) our own Tools

 

Semantics [Flags]

 

YOU CAN USE THESE. IDEAL FOR (a) TEACHING (b) REPOS

CRYSTALEYE (a chemistry/solid state repository)

 

QUIXOTE (a computational chemistry repository – RDF/LoD)

Complete CML ecosystem

/pmr/2011/11/02/searchable-semantic-compchem-data-quixote-chempound-fox-and-jumbo/

AVOGADRO (Chemical Desktop, Kitware, NY).

 

NWChem (PNNL US Lab)

OSCAR

 

Chemical Tagger (tags chemistry and geo)

==geo==

 

 

Content Mining

 

Ross Mounce has got an AMI-award

AMI2 mines PHYLOGENETIC TREES (thousands of CPU) in “print”

TO CREATE:

CMLSpect mining

 

Communities

Blue Obelisk

Semantic web for materials science

 

People

http://blog.okfn.org/2012/03/30/introducing-our-panton-fellows/

    

http://rossmounce.co.uk/

The Stilettoed Mathematician


Open Science Atop 5 Inch Heels… http://sophiekershaw.wordpress.com/author/sophiekershaw/

 

Fight

 

/pmr/2012/12/21/opencontentmining-massive-step-forward-come-and-join-us-in-the-uk/

Business Secretary, Vince Cable said:

“Making the intellectual property framework fit for the 21st century is not only common sense but good business sense. Bringing the law into line with ordinary people’s reasonable expectations will boost respect for copyright, on which our creative industries rely.”

Europe must legitimize Text+DataMining

We believe that without assurance from the Commission that the following points will be reflected in the proceedings of Working Group 4, there is a strong likelihood that representatives of the European research and technology sectors will not be able to participate in any future meetings:

WHAT SHALL WE DO?

We are creating

Collaborate with us! CREATE HACKATHONS:

  • BIODIVERSITY
  • ECONOMICS OF CITIES
  • TRANSPORT
  • MATERIALS SCIENCE

 

My fellow citizens of the world: ask not what [knowledge] will do for you, but what together we can do for the freedom of [knowledge]. (adapted from J F Kennedy)

 

[1] [Power corrupts, Powerpoint corrupts absolutely]

[2] disclaimer: I asked to be freed from Elsevier sponsorship so I could speak my mind. [4 years wasted]

Many thanks to Columbia, to our group, both at Cambridge and throughout the world
