CopyCamp 2: workshop on ContentMining - what is it and how to do it

In the last post I explained why I became interested in contentmining to do scientific research and started to explain how it is still a major political and legal challenge. I am excited that I have been asked to run a workshop at CopyCamp, and here is the information I am giving to participants. (You may also find my slides useful.)

Workshops on TDM/contentmining cover many areas and the precise format of this one will depend on the participants. On the program notes I suggested:

  • hackers (who can make tools such as R, Python, etc. do exciting things)
  • scientists (including citizens) who want to explore questions in bioscience
  • librarians who want to explore C21st ways of creating knowledge
  • open activists who want to change policy both by political means and using tools
  • young people: we have had wonderful contributions from a 15-year-old

So if everyone wants to talk about European and UK copyright politics, that's fine. But we also have tools and tutorials showing how mining is done, and we suggest people get some hands-on experience. It's probably going to be a good idea to work in small groups where there are complementary skills.

Dear workshop participant:
I am delighted that you have signed up to my workshop on Friday 29th at CopyCamp.
Wikidata, ContentMine and the automatic liberation of factual data: (The Right to Read is the Right To Mine)
The workshop will explore how Open Source tools can extract factual information from the Open Access scientific literature (specialising in BioMedicine). We will introduce Wikidata, a rapidly growing collection of 30 million high-quality data and metadata items, and use it to index scientific articles. Participants will query the literature at EuropePMC using "getpapers" and retrieve hundreds or thousands of full-text articles.
We will adapt the workshop to the skills and wishes of participants when we assemble, though please contact me earlier if there are things you would like to do. Topics can be chosen from:
* online demo of mining
* installation of full ContentMine software stack, and use of public repositories (EuropePubMedCentral, arXiv)
* introduction to WikiFactMine for extracting facts from open access publications.
* political and legal aspects of contentmining (with a European and UK slant)
If any participants are connected with (Polish) Wikipedia, that could be valuable and exciting. (By default we shall use English Wikipedia.) Note that Wikidata carries a large number of links to other-language Wikipedias and this may be a valuable resource to explore.
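As a rough illustration of those cross-language links, the Python sketch below asks the public Wikidata API (the `wbgetentities` action) for the Wikipedia page titles attached to one item. It is only a sketch under stated assumptions: the function names and the User-Agent string are made up for this example and are not part of any ContentMine tool, and the choice of English and Polish as default sites is just for this workshop.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_sitelinks_url(item_id, sites=("enwiki", "plwiki")):
    """Build a wbgetentities URL asking only for the sitelinks of one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "sitelinks",
        "sitefilter": "|".join(sites),  # restrict to the wikis we care about
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_titles(response, item_id):
    """Map site code (e.g. 'plwiki') to page title from a decoded response."""
    entity = response.get("entities", {}).get(item_id, {})
    sitelinks = entity.get("sitelinks", {})
    return {site: link["title"] for site, link in sitelinks.items()}


def fetch_titles(item_id, sites=("enwiki", "plwiki")):
    """Call the live API (needs network access)."""
    request = urllib.request.Request(
        build_sitelinks_url(item_id, sites),
        # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "copycamp-workshop-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_titles(json.load(reply), item_id)
```

Calling `fetch_titles` with a Wikidata item identifier (a "Q" number) returns a small dictionary of page titles; a site that has no article for that item is simply absent from the result.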
If you want to run the full ContentMine stack it's a good idea to install it beforehand, so here are the instructions for *adventurous* members of the workshop:

This is a VM and should be independent of the operating system of the host machine. It has been tested in several installations but there may be problems with non-US/UK keyboards and encodings. By default the tutorial is in English (all the resources, EuropePMC and the dictionaries are also in English and generally use only ASCII 32-127).

Of course anyone anywhere can also try out the tutorials.
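
To give a flavour of what "querying the literature at EuropePMC" involves, here is a minimal Python sketch. It is not getpapers itself: it calls the public Europe PMC REST search endpoint directly and lists the open access hits. The function names are illustrative only, and the response fields read here (`hitCount`, `resultList`, `pmcid`, `title`) are assumed to be as they appear in the service's JSON "lite" results.

```python
import json
import urllib.parse
import urllib.request

EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a Europe PMC search URL that returns JSON."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return EUPMC_SEARCH + "?" + urllib.parse.urlencode(params)


def summarise(response):
    """Return (hit_count, [(pmcid, title), ...]) from a decoded response.

    Records without a PMCID are skipped, since those have no full text
    in Europe PMC to mine.
    """
    results = response.get("resultList", {}).get("result", [])
    papers = [(r["pmcid"], r.get("title")) for r in results if r.get("pmcid")]
    return response.get("hitCount", 0), papers


def search(query, page_size=25):
    """Run the query against the live service (needs network access)."""
    with urllib.request.urlopen(build_search_url(query, page_size), timeout=30) as reply:
        return summarise(json.load(reply))
```

For example, `search('zika AND OPEN_ACCESS:y', page_size=5)` should return the total number of matching open access articles and the first few PMCIDs with their titles. getpapers does considerably more than this (paging through results and downloading the full text), so treat the sketch as a way to see the shape of the data rather than a replacement.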
