Content Mining starts today!

There is now unstoppable interest in and desire for content-mining. People want to know how, when, where – what the problems are … all sorts of things. So Jenny Molloy and Katelyn Rogers (OKFN) have set up a mailing list: https://lists.okfn.org/mailman/listinfo/open-contentmining – join in the normal way.

Here’s my second post:

Many thanks to Jenny Molloy and Katelyn Rogers for setting up this list.

 

Last night we had a get-together in London catalysed by PLoS, with representation from OKFN, BioMedCentral, CrossRef, eLife … all the usual suspects … and there was lots of discussion about content mining. I encouraged people to post their ideas to this list.

 

Here are some potential topics:

 

* what’s a responsible way to run a crawler over content?
* what are the current practices for obtaining content?
* what are the legal and contractual aspects of content mining?
* what types of content can be mined? What are the technical, social and contractual bases?
* what software exists?
* how do I do natural language processing?
* what can I get from images?
* where can we put the mined content?
* where can we find dictionaries for annotating content?
* where’s the next meeting on content-mining?

 

etc.

 

We are also developing the technology very rapidly. We have two trial datasets on the CKAN instance at datahub.io, where we’ve extracted species, and we’ll be discussing these over the next 2-3 days. The intention is to extract facts from about 150 PLoSONE articles every day and put them in the Datahub. We’re talking with Amye Kenall from BioMedCentral about the best way to crawl all of BMC daily and we’ll be revisiting BMC after Ross’s viva. We’ve asked Geoff Bilder from CrossRef to post some exciting ideas which we discussed last night …

 

(Must rush to the OKF/BL hack/love-in today…)



Would you share your genome sequence? Come to the Panton Arms on Monday!

On Monday, December 16, 2013 from 6:30 PM to 8:30 PM (GMT) Cambridge, United Kingdom

http://www.eventbrite.co.uk/e/would-you-share-your-genome-sequence-tickets-9293969513

we are running our second Open Science meeting, in the Panton Arms. It’s led by Fiona Nielsen, founder of DNADigest, a non-profit startup:

“The genomic era is at our doorstep, together with a lot of promises of personalized medicine – but what exactly is a genome? Do I have to share it? Do I have to share it all? With whom should I share it? What ethical issues might arise? What are my rights concerning my genome?”

Fiona came round for dinner on Monday and we had a discussion of what’s involved – people want their genome to be private, but they also wish it to be used by others for science. These wishes are difficult to reconcile (I have been rightly taken to task on this blog for not recognising the difficulty). It’s said that data can be either anonymous or useful, but not both. See http://gigaom.com/2013/03/28/when-theres-no-such-thing-as-anonymous-data-does-privacy-just-mean-security/ for a recent blog post.

So DNADigest (http://dnadigest.org/) is looking for solutions. It’s a hard problem, but if it’s crackable Fiona and colleagues will crack it… From her site…

 

The genomics revolution is already here

The techniques for researching and characterising genomics diseases are available to both researchers (next generation DNA sequencing) and the general public (in the form of personal testing), so we should soon be able to diagnose any genetic disease by sequencing a patient’s DNA.
This is the ultimate goal of research into all genetic diseases, including into hereditary diseases and cancer.

But the sharing of the data isn’t

However, while data output is flooding research centres around the world and genomics results are published in highly prestigious journals, the sharing of the data that enables this research is embarrassingly limited.

The data ownership, the legal consent of the patients involved, the privacy of the patients involved and the mere volume and complexity of these datasets are a major hindrance to sharing of personal genetics data.

So genetic discoveries remain hidden

As a result, many research units are currently maintaining their own ‘silos’ of potentially valuable sequence and patient data. Needless to say, there may be several big genetic discoveries “out there” already sequenced, but not discovered, because no-one has had the means to bring together the matching pieces of the puzzle.

Solution: Secure the data, share the knowledge

DNAdigest is a non-profit organisation, founded for the purpose of solving the problem of accessing genomics data for research purposes, while addressing all of the above concerns.

DNAdigest presents a secure mechanism for querying genome data, which would otherwise not be shared with the broader research community.



Can we trust Commercial Publishers or are we moving to 1984-like “Publishers of Truth”? We must act now

In Orwell’s 1984 the Ministry of Truth rewrote history and rewrote the present. Orwell showed that if you control the provision of information you can alter people’s thoughts and values. I think we are in great danger of scholarly publishing moving in that direction, where commercial organisations, answerable to no-one except their money-oriented shareholders, reengineer truth in scholarship to generate profits rather than reflect three thousand years of hard-won values.

At least three events have deeply troubled me.

  • The distortion of content. The most recent is the implication from – I think – ISIS in the Ecologist blog that the accepted values of scholarly publication are becoming distorted by the publisher. See http://www.theecologist.org/blogs_and_comments/commentators/2187010/scientists_pledge_to_boycott_elsevier.html which I reproduce in full at the bottom of this post. It contains the phrases:

     

    This arbitrary, groundless retraction of a published, thoroughly peer-reviewed paper is without precedent in the history of scientific publishing, and raises grave concerns over the integrity and impartiality of science.


    The retraction is erasing from the public record results that are potentially of very great importance for public health. It is censorship of scientific research, knowledge, and understanding, an abuse of science striking at the very heart of science and democracy, and science for the public good.

    And:

  • the appointment of ex-Monsanto employee Richard Goodman to the newly created post of associate editor for biotechnology at FCT [the journal in question]
  • the retraction of another study finding potentially harmful effects from GMOs (which almost immediately appeared in another journal)
  • the failure to retract a paper published by Monsanto scientists in the same journal in 2004, for which a gross error has been identified.

Readers may recall that Merck paid Elsevier to publish a fake journal promoting their products (https://en.wikipedia.org/wiki/Australasian_Journal_of_Bone_&_Joint_Medicine).

Merck paid an undisclosed sum to Elsevier to produce several volumes of [Australasian Journal of Bone and Joint Medicine], a publication that had the look of a peer-reviewed medical journal, but contained only reprinted or summarized articles—most of which presented data favorable to Merck products—that appeared to act solely as marketing tools with no disclosure of company sponsorship.[4][5]

    I cannot comment authoritatively about the present case but readers should follow it.

  • The distortion of merit. In 2012 Thomson Reuters discontinued the indexing of Acta Crystallographica E (http://journals.iucr.org/services/impactfactors.html). I have lauded this journal as the best data-journal in the whole of science – it is meticulously peer-reviewed by humans and has a world-beating data-review system with over 500 checks. I [2] have read every single article (over 10,000). TR arbitrarily removed it from their index without telling the IUCr, who then reported a drop in submissions to the journal. TR have the arbitrary power to decide what is an acceptable 1984-journal and what is not.
  • The distortion of discovery. If an article cannot be discovered it does not 1984-exist. The academic world has sat back and waited for commercial organizations to index its material. When Google Scholar was created it was a Friday-afternoon project, but it gained traction and is now the main public arbiter of where a journal is to be found. Last month a major linkup between Google Scholar and TR was announced: http://www.against-the-grain.com/2013/11/newsflash-thomson-reuters-google-scholar-linkage-offers-big-win-for-stm-users-and-publishers/ . Effectively this means that if an article is not exposed in the first page by Google Scholar it does not exist. Neither Google nor TR is answerable to anyone except shareholders.

    The bibliographic management system Mendeley was acquired by Elsevier. Mendeley are answerable only to Elsevier shareholders. No one knows what content Elsevier has acquired. No one knows what content is exposed, with what priority.

This means that the control of the content of scholarship, the dissemination of scholarship, and the valuation of scholarship is in the hands of mega-corporations. Do you trust that this is not becoming the Ministry of Scholarship?

What can we do?

A lot. We can’t look to universities, as they have completely failed to address C21 scholarship. But Wikipedia and Mozilla (and others) have shown that concerned citizens can create massive value, which, being Open, is at a high level of Truth. The technology is now in our hands. What we must do is:

  • Build our own index of scholarship. It’s technically possible, and in my own Content Mine project I am making a start. The only things holding us back are lawyers and apathy.
  • Make it blindingly better and more useful than the present system. That’s a challenge, but Wikipedia is already the best scholarly publishing system in C21 and much of the hard work has been done. We can build a better content, discovery and valuation system for Scholarship.

Join us before it is too late.

============================

[1] Full text of the Ecologist blog

Following the retraction of the Seralini et al scientific paper which found health damage to rats fed on GM corn, by the Journal ‘Food and Chemical Toxicology’, over 100 scientists have pledged in this Open Letter to boycott Elsevier, publisher of the Journal.

To: Wallace Hayes, Editor in Chief, Food and Chemical Toxicology; Elsevier

Re: “Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize”, by G E Séralini et al, published in Food and Chemical Toxicology 2012, 50(11), 4221-31.

Your decision to retract the paper is in clear violation of the international ethical norms as laid down by the Committee on Publication Ethics (COPE), of which FCT is a member. According to COPE, the only grounds for retraction are

  1. clear evidence that the findings are unreliable due to misconduct or honest error,
  2. plagiarism or redundant publication, or
  3. unethical research.

You have already acknowledged that the paper of Séralini et al (2012) contains none of those faults.

This arbitrary, groundless retraction of a published, thoroughly peer-reviewed paper is without precedent in the history of scientific publishing, and raises grave concerns over the integrity and impartiality of science. These concerns are heightened by a sequence of events surrounding the retraction:

  • the appointment of ex-Monsanto employee Richard Goodman to the newly created post of associate editor for biotechnology at FCT
  • the retraction of another study finding potentially harmful effects from GMOs (which almost immediately appeared in another journal)
  • the failure to retract a paper published by Monsanto scientists in the same journal in 2004, for which a gross error has been identified.


The retraction is erasing from the public record results that are potentially of very great importance for public health. It is censorship of scientific research, knowledge, and understanding, an abuse of science striking at the very heart of science and democracy, and science for the public good.

We urge you to reverse this appalling decision, and further, to issue a fulsome public apology to Séralini and his colleagues. Until you accede to our request, we will boycott Elsevier, i.e., decline to purchase Elsevier products, to publish, review, or do editorial work for Elsevier.

[2] in conjunction with my colleagues and machines.


Why does scholarly publishing give me so much technical grief? (A post from “Ignorant Chemist”)

[The Scholarly Kitchen branded me as an “Ignorant Chemist” for criticising the technical standard of scholarly publishing. So read this with caution.

BTW I feel slightly unhappy criticising an Open Access publisher, but their technology is just as bad as the legacy publishers’.]

As readers will know we are gearing up to index the whole of the scientific scholarly literature. The idea is simple. Download all the links to papers, and then read the papers [NOTE: we’ll only do what we are allowed to do, and we promise not to burn out your servers].

So we’ve started with PLoSONE. AMI is reading the page http://www.plosone.org/#recent which gives the latest PLoS papers. She’s going to parse it into a XOM document (using the validator.nu HTML parser: http://about.validator.nu/htmlparser/) and then extract all the papers using XPath. She knows how to do this. So off we go:
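Something like this, roughly – a minimal sketch, assuming the validator.nu HtmlBuilder and XOM’s XPath support; the “/article/” filter is an illustrative guess at the link pattern, not AMI’s real rule:

    import java.net.URL;

    import nu.validator.htmlparser.common.XmlViolationPolicy;
    import nu.validator.htmlparser.xom.HtmlBuilder;
    import nu.xom.Document;
    import nu.xom.Nodes;

    public class RecentPapers {
        public static void main(String[] args) throws Exception {
            // Parse the publisher's (tag-soup) HTML into a well-formed XOM tree;
            // ALTER_INFOSET asks the parser to repair anything that isn't legal XML.
            HtmlBuilder builder = new HtmlBuilder(XmlViolationPolicy.ALTER_INFOSET);
            Document doc = builder.build(new URL("http://www.plosone.org/#recent").openStream());
            // XPath over every href; local-name() sidesteps the XHTML namespace.
            Nodes hrefs = doc.query("//*[local-name()='a']/@href");
            for (int i = 0; i < hrefs.size(); i++) {
                String href = hrefs.get(i).getValue();
                if (href.contains("/article/")) { // guessed filter for paper links
                    System.out.println(href);
                }
            }
        }
    }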

It crashes on the parse. This does not upset AMI (who has the emotional capacity of a FORTRAN compiler) but it drives me wild. I should be able to read a modern document with modern tools. I know what I am doing. What has crashed it is the tag “<italic>”. Note that it’s not terminated by an “</italic>” tag. Moreover I can’t find it in the HTML5 vocabulary (which has a perfectly good, 22-year-old <i> tag: http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-i-element). So I have to create a kludge. Since I have no idea what other horrors are waiting in PLoS (or any other publisher’s HTML – this is not a PLoS-specific complaint) it can throw my work off by days.

It’s not fair to take USD 2900 (the highest PLoS charge) or even USD 1350 (PLoSONE) from authors and produce non-conformant output. [I criticize elsewhere the destruction of vector diagrams into JPEGs.] Nor should this non-standard HTML ever have happened. HTML is produced by the W3C, and a huge amount of effort has gone into producing HTML tools, including validation and compliance, because the W3C cares about technical quality. So it’s possible and easy to validate. And it’s free.

Is this XML? No. Is this HTML5? No. Are the contents of the tag rendered in italic font? No. Does this matter?

Yes. Because I am looking for italic content, since it may contain species. In fact Gorilla gorilla IS a species. You might be able to guess what genus.

The file starts with an XHTML DOCTYPE.

The DOCTYPE is a statement that defines an XML document. I am not ignorant about this because I spent 2 years helping the W3C develop XML and was co-inspirer of the SAX protocol (http://www.saxproject.org/sax1-history.html). The statement says that the document following must be well-formed XML.

Which it isn’t. “itemscope” is NOT XHTML.

Here’s another illiteracy (“<” is not allowed in attribute values):

title="Gorilla Mothers Also Matter! New Insights on Social Transmission in Gorillas (<italic>Gorilla gorilla gorilla</italic>) in Captivity"

This isn’t XML. It’s wasted 3 hours of my time. I am going to have to write a TidyPLOS2WellFormedXML tool.
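The tidy step itself need not be clever. Here is a minimal sketch (the tool name is mine from above; everything inside is hypothetical, not the eventual implementation): first escape the angle brackets hiding inside attribute values, then map the remaining <italic> tags to honest <i> tags:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TidyPLOS2WellFormedXML {
        // Matches double-quoted attribute values so stray angle brackets can be escaped.
        private static final Pattern ATTRIBUTE_VALUE = Pattern.compile("=\"([^\"]*)\"");

        public static String tidy(String rawHtml) {
            Matcher matcher = ATTRIBUTE_VALUE.matcher(rawHtml);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                // "<" (and ">" for good measure) become entities, making the attribute legal XML.
                String value = matcher.group(1).replace("<", "&lt;").replace(">", "&gt;");
                matcher.appendReplacement(sb, Matcher.quoteReplacement("=\"" + value + "\""));
            }
            matcher.appendTail(sb);
            // Outside attributes, map the non-HTML italic tags to the 22-year-old <i>.
            return sb.toString().replace("<italic>", "<i>").replace("</italic>", "</i>");
        }

        public static void main(String[] args) {
            System.out.println(tidy("title=\"Insights (<italic>Gorilla gorilla</italic>)\""));
            // => title="Insights (&lt;italic&gt;Gorilla gorilla&lt;/italic&gt;)"
        }
    }

It won’t catch every horror (an unterminated <italic> still needs closing), but it unblocks the parse.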

If it’s HTML5, then the standard way the W3C indicates it (http://www.w3.org/wiki/Doctypes_and_markup_styles#The_HTML5_doctype ) is:

<!DOCTYPE html>

 

Hardly difficult. Easier than putting all the wrong stuff into a non-well-formed document.

Now I can guess the technical reason why PLoS got it wrong. I’ll leave you to guess.

I can also guess the social reason. It stems from the observation that scholarly publishing doesn’t care about technical quality. The look and feel of a journal is all that matters. (That’s not much use for unsighted humans and machines). I suspect there are a relatively small number of typesetters and that most of them (perhaps Kaveh excepted) don’t care about technical quality.

The technical quality of material in the arXiv preprints is pretty good. So it should be – maths and physics are high-quality subjects. But when it comes out of the publishers the technical quality is often significantly worse.

But then it’s not their own money that universities are spending – 15 billion USD – it’s taxpayers’ and students’. They don’t care about the quality of what they are paying for. And until somebody cares, this will probably continue.


Content Mining: #ami develops her commandline and meets Ducktyping

Sleepless the bear and #ami2 the kangaroo meet Duck.

S: Hello Duck. What are you doing?

D: I’m helping #ami2 create her commandline parser. We’re going to use ducktyping.

S: what’s a commandline? [*]

Chuff: It’s one of the greatest inventions in computing (https://en.wikipedia.org/wiki/In_the_Beginning…_Was_the_Command_Line). You type commands that you and the machine both understand. It’s better than GUIs. It interfaces with UNIX tools. #animalgarden will use commandlines for #ami2.

S: so what’s Ducktyping?

D: Let’s look in Wikipedia (https://en.wikipedia.org/wiki/Duck_typing): “the duck test, attributed to James Whitcomb Riley, which may be phrased as follows:

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.[1]

I walk like a duck and I quack like a duck so I am a duck.

A: The test is a rule. I can implement rules. Is there more?

D:

In duck typing, one is concerned with just those aspects of an object that are used, rather than with the type of the object itself. For example, in a non-duck-typed language, one can create a function that takes an object of type Duck and calls that object’s walk and quack methods. In a duck-typed language, the equivalent function would take an object of any type and call that object’s walk and quack methods. If the object does not have the methods that are called then the function signals a run-time error. If the object does have the methods, then they are executed no matter the type of the object, evoking the quotation and hence the name of this form of typing.

Duck typing is aided by habitually not testing for the type of arguments in method and function bodies, relying on documentation, clear code and testing to ensure correct use.

S: So why are we using it?

D: Humans don’t like complicated commandlines, so we’ll make it very simple for them. The commandline has to work out what they want.

S: You mean guess?

A: No. I am not allowed to guess. I have to have precise rules.

D: Exactly. So here’s the problem. #ami2 can run over many different types of object. If we had commandline options for all of them the humans would forget them and muddle them. We’ll do the hard work. First the command itself. #ami has lots of things it can do, but if we write something like:

java -jar org.xmlcml.xhtml2stm.species.SpeciesVisitor

the humans will mistype it or get bored and never even try. So we’ll replace this with a script:

species

S: Isn’t that a lot of work for PMR and #ami2?

A: No. Mark Williamson showed us how to use maven to do it automatically. It’s already in the POM file.

D: Now we come to the ducktyping. Let’s say we ask a human to type:

species --input ducks.html --inputFormat HTML

half of them will mistype it, half will forget the “--inputFormat” and half will get bored. (The numbers add up since some humans will make TWO mistakes.) So we use ducktyping to work out what they want:

species -i ducks.html

S: “-i”??

D: Some humans don’t like long words so we give them two options (--input and -i). It’s hardly any more work for PMR.

Chuff: So how do we know it’s an HTML file?

D: we use https://en.wikipedia.org/wiki/Convention_over_configuration . We assume that the suffix “.html” means it’s an HTML file.

S: Suppose it isn’t?

D: Then the job fails.

Tasmanian Devil: Serves them right. Devil will drag them off to hell.

S: No, we like to think humans can learn gradually. We’ll try to fail gracefully. We can ask the file what type it thinks it is.

D: Many good file formats have a magic number that tells you what they are. XML files should know what namespace(s) they have. For example I could say “this XML file contains Chemistry (CML) and Maths (MathML) embedded in XHTML”.

S: Wow. And are you implementing this, #ami?

A: PMR said he might. At present there are the following input types:

  • HTML (from the publisher’s site)
  • HTML from #ami2
  • PDF
  • XML from publisher
  • XML from PubMed (NLM)
  • SVG (from #ami)
  • Text
  • DOI

And lists or directories of all the above.

S: Wow! What a lot to have to manage. Can it go wrong?

D: Yes. If a human points #ami2 at a list of holiday photos we won’t get much sense out.

Chuff: If we want people to use it we have to assume they will goof up whenever they can. We have to do almost all the work.

D: But when it’s working, then it’s easy for all of us.

S: So are there any optional options?

D: Yes. Where we have a directory with many files in it, we might want to filter the ones we want. For example:

species -i ducks/ --inputFormat xml htm html

will look for files of the form

    ducks/*.xml ducks/*.htm ducks/*.html

S: So presumably the bored ones can just type “-if”?    

D: Yes.

OWL: What happens if the directory contains other directories?

D: What a smart question!

OWL: I am the semantic OWL after all.

D: We have an option for recursion through the directories.

S: “Recursion”??

D: One of the most powerful and beautiful concepts in software (https://en.wikipedia.org/wiki/Recursion). You ask the method to invoke itself (either directly or indirectly).

S: Won’t that go on for ever?

D: If someone’s made a mistake, yes. Then you get a Stack Overflow …

Devil: Or I cart you off to hell.

D: … but good programs trap this. #ami will use the (-r / --recursive) flag. If present it will go through all directories.

S: So I can write:

species -i notebook/ -r -if html pdf

D: Exactly. The “/” tells us it has to be a directory. And if you type:

species -i / -r -if html pdf

it will traverse your whole disk…

Devil: Unless I carry you off before that.

S: OK. What about output?

Duck: We need you to specify it. If you omit it, #ami2 would either have to use the same place each time (overwriting) or use the input as a template. We might do that later. But at present we write:

species -i ducks/ -if html -o my/results/species.xml

and the results go to a named file.

S: Isn’t it too complicated for humans to work out where to put their results?

D: We try to make it easy for them. But at some stage a scientist has to work out what they are doing…

Devil: … because if they don’t, I’ll cart them off…

All: TO HELL.
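[For the technically curious: here, in rough Java, is the suffix “ducktyping” and recursive walk the animals have been describing. It is a sketch only – class and method names are made up, not #ami2’s real code.]

    import java.io.File;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class InputTyper {
        enum InputType { HTML, PDF, XML, SVG, TEXT, UNKNOWN }

        // Ducktyping by convention-over-configuration: the suffix decides the type.
        static InputType typeOf(File file) {
            String name = file.getName().toLowerCase();
            if (name.endsWith(".html") || name.endsWith(".htm")) return InputType.HTML;
            if (name.endsWith(".pdf")) return InputType.PDF;
            if (name.endsWith(".xml")) return InputType.XML;
            if (name.endsWith(".svg")) return InputType.SVG;
            if (name.endsWith(".txt")) return InputType.TEXT;
            return InputType.UNKNOWN; // fail gracefully, not to hell
        }

        // The -r flag: recurse through directories, keeping files that match a suffix,
        // as in: species -i ducks/ -r -if xml htm html
        static List<File> collect(File dir, List<String> suffixes, boolean recursive) {
            List<File> found = new ArrayList<File>();
            File[] children = dir.listFiles();
            if (children == null) return found; // not a directory, or unreadable
            for (File child : children) {
                if (child.isDirectory()) {
                    if (recursive) found.addAll(collect(child, suffixes, recursive)); // recursion!
                } else {
                    for (String suffix : suffixes) {
                        if (child.getName().endsWith("." + suffix)) {
                            found.add(child);
                            break;
                        }
                    }
                }
            }
            return found;
        }

        public static void main(String[] args) {
            for (File f : collect(new File("ducks"), Arrays.asList("xml", "htm", "html"), true)) {
                System.out.println(typeOf(f) + ": " + f);
            }
        }
    }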


[*] I was asked this question by an informatics student [no names or places] halfway through their PhD. People just assume button pressing will solve everything.


Content-mining: Using Tabula

Extracting tables from PDFs is not fun. But @TabulaPDF have made it possible. So we are going to learn how to extract tables at Jenny’s liberation-fest tomorrow. Here’s how:

Go to http://tabula.nerdpower.org/

(That’s right: NERD POWER. Nerds are good. Nerds are clever). We find:

Tabula

Tabula is a tool for liberating data tables trapped inside PDF files.

Why Tabula?

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is — you can’t easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple interface. And now you can download Tabula and run it on your own computer, like you would with OpenRefine.

Download and install Tabula

Note: You’ll need a copy of Java installed. You can download Java here.

1. Download the version of Tabula for your operating system.

2. Extract the zip file using the file extractor of your choice (such as 7-zip).

3. Go into the folder you just extracted. Run the “Tabula” program inside.

4. In your web browser, go to http://localhost:8080/. (This should automatically happen, actually.) There’s Tabula!

5. Upload a file of your choice. Select a section of a table, and go.

That’s it.

It works! Here’s a table (they are Gibbons!) [#animalgarden needs more monkeys…]

We use Tabula. Select the table and we get:

WOW! It’s actually in a CSV table

OK, the spaces are occasionally elided, and the italics are gone, but we are working on that…

… TOGETHER!


Tabula is better than #ami2 on tables (AMI relies on the English word “Table” and her tables are currently being refactored). Tabula is developing auto-detect. #ami2 is probably better on italics and spaces.

So we join forces. Everyone gains.



Content-mining the scientific literature into CKAN

CKAN was pioneered by the Open Knowledge Foundation as an Open Source tool to make government and related data more easily available. Governments love it, because it’s good and it’s free and it’s open. But why would we use it for science?

Because it’s good and it’s free and it’s open.

Here’s Wikipedia:

The Comprehensive Knowledge Archive Network (CKAN) is a web-based open source data management system for the storage and distribution of data, such as spreadsheets and the contents of databases. It is inspired by the package management capabilities common to open source operating systems like Linux, and is intended to be the “apt-get of Debian for data”.[2]

Its code base is maintained by the Open Knowledge Foundation. The system is used both as a public platform on thedatahub.org[3] and in various government data catalogues, such as the UK’s data.gov.uk,[4] the Dutch National Data Register, the United States government’s “data.gov 2.0”[5] and the Australian government’s “Gov 2.0”.[6]
The South Australian state government has also joined the ranks of many jurisdictions worldwide in making government data freely available to the public on the CKAN platform.[7]

Well, if Barack Obama uses it, it must be good. But how to use it?

You can use it in private, without permission, of course (that’s what Open Source means). But the OKFN runs a public server, and Chuff (@okfn_okapi) spoke very nicely to Mark Wainwright, who set up an Organization (November25). Chuff asked for petermr to be an admin.

It’s very easy. You just enter some metadata and tip the data in. And here’s what it looks like:

This is the prototype. The animals know that Hydraena dentipes and H. dentipes are the same, so they’ll replace all H. with Hydraena (probably before the data gets in). There are between 5 and 20 species per paper full-text, but often far more in the diagrams and tables (their speciality).

So soon they will have the best collection of published animals in the scientific literature…

.. and the best plants, and bacteria and …

It’s easy to add metadata to CKAN, and we’ll do it by machine – the upload is a single call to the CKAN API (there’s a sketch after the list below). That means we can search the literature by species.

And soon (maybe Wednesday) by:

  • Chemistry
  • Sequences
  • Phylogenetic trees
  • Identifiers.
  • And whatever YOU can contribute (it’s easy)
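As promised, here is roughly what the machine upload looks like – a minimal sketch of a single JSON POST to CKAN’s action API (the dataset name, organization and values are invented for illustration, and you need a real Datahub API key):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class DatahubUpload {
        public static void main(String[] args) throws Exception {
            // CKAN's action API: package_create makes a new dataset in one call.
            URL url = new URL("http://datahub.io/api/3/action/package_create");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            // CKAN reads the API key from the Authorization header.
            conn.setRequestProperty("Authorization", System.getenv("CKAN_API_KEY"));
            conn.setDoOutput(true);
            // Dataset name, organization and extras below are invented examples.
            String json = "{\"name\": \"species-2013-11-25\","
                    + " \"title\": \"Species mined from the literature\","
                    + " \"owner_org\": \"november25\","
                    + " \"extras\": [{\"key\": \"species_count\", \"value\": \"17\"}]}";
            OutputStream out = conn.getOutputStream();
            out.write(json.getBytes("UTF-8"));
            out.close();
            System.out.println("CKAN replied: " + conn.getResponseCode());
        }
    }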

So if this excites you – and it should – please let us know.


Content-mining: #animalgarden discover CKAN/Datahub and create the November25 declaration of the Right to Mine

A massive day for #animalgarden. They’ve made a huge technical breakthrough. They can now store all the facts from the scientific literature.

They’d been worrying about where to put the data they extracted, now that the STM publishers have blessed Content-mining. Today Mark Wainwright from the OKFN visited and told them about CKAN. It holds very flexible metadata and can also hold the data. CKAN was developed for governments to manage Open Data, but can it manage science? Mark said yes – so they had a try. But before that, Mark said, you need a name for the Datahub project.

#animalgarden? Not quite right. Panton? Again not quite right.

November 25! The November 25 revolution! The first liberated data set in CKAN. That sounded just right.

But if there is a revolution, shouldn’t there be a declaration? Yes, of course. They know it by heart: “The right to READ is the Right to MINE”!

And a proper declaration needs to be big and signed by everyone. So here are all our names.

 

What about the data? No problem. You can see it on the Datahub (datahub.io).


Would you share your genome sequence? (can you get to Cambridge?)

Would you?

The next event in Cambridge Open Research is http://www.eventbrite.co.uk/e/would-you-share-your-genome-sequence-tickets-9293969513

We are going to be led by Fiona Nielsen, who created DNADigest.org:

The genomic era is at our doorstep, together with a lot of promises of personalized medicine – but what exactly is a genome? Do I have to share it? Do I have to share it all? With whom should I share it? What ethical issues might arise? What are my rights concerning my genome?

EVERYONE welcome

The only requirement is possessing a genome.


Legacy publishers! The Berlin moment: your paywalls are history. We want freedom and we want it now!

Yesterday was the Open Access Button ThunderClap: https://www.thunderclap.it/en/projects/5675-open-access-button-launch.

I’m proud to have been one of the 434 and to have donated 2000+ followers. That means that yesterday they will all have got a tweet like the ones below:

I’d love to see the list. But note our own Cambridge MP Julian Huppert retweeting the clap.

For me this is the Berlin moment. The critical date when the wall started to fall down. A million people telling the conventional publishing system:

Your world is over.

Open access is not fundamentally about free access to the literature (such as Green) or extortion through Hybrid Gold.

It’s about Freedom. Freedom to build our own world where rich corporates and out-of-date rich scholarly societies do not control the means of production. Where everyone, not just academics, feels ownership of the publishing system as a modern means of communication.

Where young people are seen as the wellspring of the future.

As happened yesterday with Joe and David’s great, simple vision.

I grew up in the time of great political movements. Read John Lewis’ speech to The March On Washington (http://www.crmvet.org/info/mowjl2.htm). Read it all. Fifty years ago the issues were different, but this echoes the feeling of many of us in the present:

To those who have said, “Be patient and wait,” we must say that we cannot be patient. We do not want our freedom gradually but we want to be free now.

We are tired. We are tired of being beat by policemen. We are tired of seeing our people locked up in jail over and over again, and then you holler “Be patient.” How long can we be patient? We want our freedom and we want it now.

We do not want to go to jail, but we will go to jail if this is the price we must pay.

Open Access may seem a smaller issue than racism and human rights. But we are fighting for our digital future and if we lose it we may lose our humanity.
