#ami2 #ukont2013 15-min demonstration of AMI2 (and maybe OPSIN and ChemicalTagger)

I’m demoing after lunch to the 2nd UK Ontology Network Workshop in Edinburgh and it’s billed as AMI2 (our content-mining software for #scholpub and related documents). Why content-mining at an ontology meeting? Because many ontologies are created “bottom-up” from the language we use. This post is just to announce what I am going to show (hopefully) and also to give URLs.

  • AMI2 will read PDFs and convert them to XHTML (prior to creating domain-specific XML). AMI2 is at: https://bitbucket.org/petermr/pdf2svg (for converting PDF to SVG) and https://bitbucket.org/petermr/svg2xml (for converting SVG to XML). Use https://bitbucket.org/petermr/pdf2svg-dev and https://bitbucket.org/petermr/svg2xml-dev for the bleeding-edge versions of the code (I’ll be demoing the latter, using Maven from the command line). We’re beginning to get collaborators: recently, for example, AMI2 started working with Renaud Richardet at EPFL Lausanne.

    For newcomers, AMI2 reads a PDF using PDFBox, and uses PDF2SVG to interpret STM publisher characters (which usually are not Unicode). That creates a raw SVG made up of single characters and discrete paths and images. Then she uses SVG2XML to create running text and to separate out figures and tables. We’ll show how species can be extracted. That’s where today stops. (In the final phase, AMI2-Aaron, named in memory of Aaron Swartz, we shall support domain-specific plugins.)

  • Then we’ll show OPSIN as an example of a domain-specific plugin; it translates chemical names to Chemical Markup Language.
  • Lastly we’ll show ChemicalTagger (http://chemicaltagger.ch.cam.ac.uk/), which uses Natural Language Processing to create semantic chemistry (using a CML/XML ontology).

PARTICIPANTS: PLEASE LET AMI2 HAVE SOME PDFs TO EAT!


5 Responses to #ami2 #ukont2013 15-min demonstration of AMI2 (and maybe OPSIN and ChemicalTagger)

  1. Can I email you some for demo-ing?

  2. Henry Vieira says:

    Can’t get svg2xml-dev to compile. I always get:
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 2:41.393s
    [INFO] Finished at: Fri Apr 12 09:04:13 AMT 2013
    [INFO] Final Memory: 15M/148M
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project svg2xml-dev: Compilation failure: Compilation failure:
    [ERROR] /home/henry/pdft/svg2xml-dev/src/main/java/org/xmlcml/svg2xml/tools/BoundingBoxManager.java:[290,26] error: cannot find symbol
    [ERROR] variable totalBox of type Real2Range
    [ERROR] /home/henry/pdft/svg2xml-dev/src/main/java/org/xmlcml/svg2xml/text/WordSequence.java:[99,29] error: cannot find symbol
    [ERROR] variable boundingBox of type Real2Range
    [ERROR] /home/henry/pdft/svg2xml-dev/src/main/java/org/xmlcml/svg2xml/text/TextLineContainer.java:[580,17] error: cannot find symbol
    [ERROR] variable bbox of type Real2Range
    [ERROR] /home/henry/pdft/svg2xml-dev/src/main/java/org/xmlcml/svg2xml/paths/LineMerger.java:[123,29] error: cannot find symbol
    [ERROR] -> [Help 1]
    [ERROR]
    changeset: 48:ad23191a537f
    tag: tip
    user: petermr
    date: Thu Apr 11 22:52:10 2013 +0100
    summary: tidied debug output added cunk ids
    Any idea?

    • pm286 says:

      Many thanks Henry,
      It looks like the Hudson/Maven build server at https://hudson.ch.cam.ac.uk/ is stalled. It may have to wait till Monday as I can’t restart it.
      Try the following:
      * hg clone http://bitbucket.org/euclid
      * If this fails let me know
      * mvn clean install
      * check out pdf2svg-dev (separately from svg2xml-dev) – if you haven’t done this
      * mvn clean install
      * If this fails let me know
      * cd ../svg2xml-dev
      * update
      * mvn clean install
      I hope this works. In principle Hudson should build euclid seamlessly and redistribute it under Maven. I will mail you so you have my address.
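
The steps above can be sketched as a small script. This is only an illustration under stated assumptions, not a tested recipe: the euclid clone URL is copied exactly as written in the reply, the pdf2svg-dev URL is the one given in the post, svg2xml-dev is assumed to be already checked out as a sibling directory, each project is assumed to have a `pom.xml` at its root, and the bare "update" step is read as `hg pull -u`. The `run` and `build_all` helpers are hypothetical names introduced here. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch of the suggested build order: euclid, then pdf2svg-dev, then svg2xml-dev.
# Dry run by default: commands are printed, not executed. Set RUN=1 to run them.

run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

build_all() {
  # 1. euclid (clone URL copied as given in the reply; skip if already present)
  if [ ! -d euclid ]; then
    run hg clone http://bitbucket.org/euclid euclid || return 1
  fi
  run mvn -f euclid/pom.xml clean install || return 1

  # 2. pdf2svg-dev, checked out separately from svg2xml-dev
  if [ ! -d pdf2svg-dev ]; then
    run hg clone https://bitbucket.org/petermr/pdf2svg-dev pdf2svg-dev || return 1
  fi
  run mvn -f pdf2svg-dev/pom.xml clean install || return 1

  # 3. svg2xml-dev (assumed already cloned); "update" interpreted as hg pull -u
  run hg -R svg2xml-dev pull -u || return 1
  run mvn -f svg2xml-dev/pom.xml clean install
}

build_all
```

Each step returns early on failure, which mirrors the "if this fails let me know" checkpoints: the first command that fails is the one to report.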
