Crystaleyesing The Fascinator

We have covered a lot of ground here at USQ in Toowoomba (and we haven’t finished, as there is still a pub visit to come). At 9 am yesterday we didn’t know what we were going to do, but we followed these strands:

  • Understanding Chem4Word in an authoring environment. We arrived at the (reasonable) conclusion that this is a one-way process: documents coming from Chem4Word can be repurposed using any number of downstream tools (JUMBO, ICE, etc.), but we are not looking to reinject documents into Word2007 (at this stage), nor to carry behavioural interoperability into other word processors. Of course we still require semantic interoperability.

  • Review of ICE-TheOREm. This is a JISC project in which we are working with USQ, who provide the bulk of the development. It’s a proof of concept to show how a thesis can be assembled from components using ORE and sent to a Board of Graduate Studies, with individual components embargoed where necessary. All will be revealed at Open Repositories (OR09) by Jim Downing and Peter Sefton. Too bad I can’t be there, but maybe we’ll show something at ETD2009…

  • Reusing UCC content in USQ tools. The most immediate attraction was to port Crystaleye into the Fascinator.

  • From the USQ description:

    The Fascinator is a software platform for eResearch. Development started in 2008 as an attempt to create a clean and usable Institutional Repository user interface. We succeeded in creating a faceted search interface for repositories such as ePrints and Fedora Commons.

  • I’m blogging this to make sure I have understood it… The Fascinator has a back-end repository (currently Fedora, though this might be swapped for another engine or a file system). At present it is populated mainly with metadata from other resources, which hold the original blobs, movies, etc.; one of these is 20 TB of Vietnam history. The content of the repository is then indexed by local scripts (Python/Jython) and passed to Solr, a web interface for Lucene. This provides an indexed search engine for the content.

  • There is no reason why finely grained data should not be held as well, and Oliver has ingested Crystaleye as XML. We have written filters for all the important content such as atom counts, CIF dictionary items, etc. A preview can be seen at http://rspilot.usq.edu.au:8080/the-fascinator/search/cml where 29 entries have been indexed. Try browsing the entries, or searching for “blue”, which will find blue crystals. Obviously it is possible to customise the interface to provide specific search boxes and terms.

  • The main point is that Daniel hacked this together in about a day of elapsed time. That’s a tribute to Daniel and also to the flexibility of the platform. Daniel is planning to put 100,000 entries into the system and virtualise it.

  • There are overlaps and differences between the Fascinator and our (Jim’s) Lensfield. The Fascinator is a faceted indexer and has been designed for indexing large documents. Lensfield is a native RDF approach which is more fine-grained and can hold more complex structures. Jim and Peter will be comparing notes at OR09 and I am sure that we will be able to exploit the synergy between the two. It’s clear that tools for the semantic web are starting to arrive.
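
To make the "filter, then index" idea above a bit more concrete, here is a minimal sketch in Python of the kind of filter that could pull atom counts out of a CML entry and turn them into a flat document ready for a Solr-style index. This is not the actual Fascinator or Crystaleye code. The sample molecule, the field names (`elements`, `atom_count_C`, ...) and the Solr base URL are all assumptions made up for illustration; only the CML namespace and the standard Solr `q` and `wt` query parameters are real.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlencode

# Namespace used by CML documents.
CML_NS = "{http://www.xml-cml.org/schema}"

# Hypothetical, hand-written CML fragment standing in for one Crystaleye entry.
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="example1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
</molecule>"""


def atom_counts(root):
    """Count atoms per element symbol in a parsed CML tree."""
    return Counter(
        atom.get("elementType")
        for atom in root.iter(CML_NS + "atom")
        if atom.get("elementType")
    )


def to_index_document(cml_text):
    """Flatten one CML entry into a dict of index fields.

    The field names are illustrative assumptions, not the Fascinator's schema.
    """
    root = ET.fromstring(cml_text)
    counts = atom_counts(root)
    doc = {"id": root.get("id"), "elements": sorted(counts)}
    for element, n in counts.items():
        doc["atom_count_" + element] = n
    return doc


def search_url(base, term):
    """Build a Solr-style query URL (q = search term, wt = response format)."""
    return base + "?" + urlencode({"q": term, "wt": "json"})


doc = to_index_document(SAMPLE_CML)
print(json.dumps(doc, sort_keys=True))
# The base URL below is a placeholder, not the USQ server.
print(search_url("http://localhost:8983/solr/select", "blue"))
```

Each extracted value becomes its own field, which is what lets a faceted interface offer "entries containing O" or "entries with two oxygen atoms" as browsable facets rather than free-text matches.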
