Monthly Archives: May 2013

Jailbreaking the PDF – 4; Making text from characters

In previous posts I have shown how we can, in most cases, create a set of Unicode characters from a PDF. If the original authors (e.g. of many Government documents) were standards-compliant, this is almost trivial. For scholarly publications, where the … Continue reading

Posted in Uncategorized | Leave a comment

“Licences4Europe” has not accepted “The Right to Read is the Right to Mine”

One sentence summary (this link has all the documentation): Stakeholders representing the research sector, SMEs and open access publishers withdraw from Licences for Europe. I have formally been a member of EC-L4E-WG4, a working group of the European Commission … Continue reading

Jailbreaking the PDF – 3; Styles and fonts and the problems from Publishers

Many scientific publications use specific styling to add semantics. In converting to XML it’s critical we don’t throw these away at an early stage, yet many common tools discard such styles. #AMI2 does its best to preserve all these and … Continue reading

Jailbreaking the PDF – 2; Technical aspects (Glyph processing)

A lot of our discussion in Jailbreaking related to technical issues, and this is a – hopefully readable – overview. PDF is a page description format (does anyone use pages any more? other than publishers and letter writers?) which is … Continue reading

Jailbreaking the PDF; a wonderful hackathon and a community leap forward for freedom – 1

Yesterday we had a truly marvellous hackathon (http://scholrev.org/hackathon/) in Montpellier, in between the workshops and the main Eur Semantic Web Conference. The purpose was to bring together a number of groups who value semantic scholarship and free information from the traditional forms … Continue reading

SePublica : Overview of my Polemics presentation #scholrev

This is a list of the points I want to cover when introducing the session on Polemics. A list looks a bit dry but I promise to be polemical. And try to show some demos at the end. The polemics … Continue reading

SePublica: Making the scholarly literature semantic and reusable

Scholarly literature has been virtually untouched by the digital revolution in this century. The primary communication is by digital copies of paper (PDFs) and there is little sign that it has brought any change in social structures either in Universities/Research_Establishments … Continue reading

SePublica: What we must do to promote Semantics #scholrev #btpdf2

In the previous post (/pmr/2013/05/23/sepublica-how-semantics-can-empower-us-scholrev-scholpub-btpdf2/) I outlined some of the reasons why semantics are so important. Here I want to show what we have to do (and again stick with me – although you might disagree with my stance). The … Continue reading

SePublica: How semantics can empower us; #scholrev #scholpub #btpdf2

I’m writing blog posts to collect my thoughts for the wonderful workshop at SePublica http://sepublica.mywikipaper.org/drupal/ where I am leading off the day. [This also acts as a permanent record instead of slides. Indeed I may not provide slides as such … Continue reading

#scholrev #ami2 #btpdf2 Jailbreaking content (including tables) from PDFs

We’ve got a splendid collection of about 600 Open PDFs for our jailbreak hackathon. They seem to have a medical focus. They are of very variable type and quality. Some are reports or guidelines, some academic papers. Some are born … Continue reading