#quixote
#chempound
The eResearch workshop on Semantic Physical Science was a technical success and has convinced me that now anyone can deploy our Chempound semantic repository. If you don’t know what a repository does, here is an explanation (you need to know that :
“It’s a Useful Pot,” said Pooh. “Here it is. And it’s got ‘A Very Happy Birthday with love from Pooh’ written on it. That’s what all that writing is. And it’s for putting things in. There!”
When Eeyore saw the pot, he became quite excited.
“Why!” he said. “I believe my Balloon will just go into that Pot!”
“Oh, no, Eeyore,” said Pooh. “Balloons are much too big to go into Pots. What you do with a balloon is, you hold the balloon ”
“Not mine,” said Eeyore proudly. “Look, Piglet!” And as Piglet looked sorrowfully round, Eeyore picked the balloon up with his teeth, and placed it carefully in the pot; picked it out and put it on the ground; and then picked it up again and put it carefully back.
“So it does!” said Pooh. “It goes in!”
“So it does!” said Piglet. “And it comes out!”
“Doesn’t it?” said Eeyore. “It goes in and out like anything.”
“I’m very glad,” said Pooh happily, “that I thought of giving you a Useful Pot to put things in.”
“I’m very glad,” said Piglet happily, “that I thought of giving you something to put in a Useful Pot.”
But Eeyore wasn’t listening. He was taking the balloon out, and putting it back again, as happy as could be….
[reproduced without permission; however, my aunt was Marjorie Milne and I am sure she would forgive me]
If you can follow this, you will also be able to follow my account of how a Chempound server works, and how YOU can set one up and run it.
Our Chempound repository is somewhere to put chemistry (and other science) in and take it out again.
Now a “repository” sounds frightening, but it’s only a piece of software. But you have to understand the concept of “server”.
The client-server model (http://en.wikipedia.org/wiki/Client%E2%80%93server_model ) is for me one of the great advances in the last 50 years. Partly because it decouples and modularises functionality and represents clean design. Partly because it allows complex operations to be concentrated in one place and so more easily maintained, especially when there is software that cannot be deployed (complexity, licence, etc.).
[Wikipedia] – the server is in the middle – the clients are distant in space and can be disconnected without disturbing the server or the other clients.
But mainly because the HTTP servers of the 1990s brought power and democracy to individuals.
Huh?
Yes. I used to think that information systems could only be set up by a priesthood. You had a mainframe and dumb terminals. You couldn’t do anything without a mainframe. The early generations of client-server were proprietary and opaque. A different protocol for each system.
But HTTPD and NCSA changed that (http://en.wikipedia.org/wiki/NCSA_HTTPd ). I have heard it said that the great breakthrough for the takeoff of the web was not the HTML browser but the HTTPD server. Not Mosaic (fantastic though that was) but the NCSA server.
In 1994 I discovered that I could run a server!
It was a revelation. I could publish whatever I wanted to whomever I wanted! I was free. I was doing it through Birkbeck College Crystallography – all they had to do was give me a directory where I could put all my stuff and then run the server software.
Ordinary people could set up their own radio stations on the new World Wide Web.
Now we have become accustomed to this. We can tweet with zero effort. Get a WordPress blog and tell the world what we think. The client-server model means that the client doesn’t have to listen if it doesn’t want to! Publishing on the web doesn’t mean that people have to take any notice. Which is what democracy is.
So Chempound now brings democracy to physical science.
Anyone can set up a server, but you have to have a place where you can run one. If you want others to play, that means having a web-hosting service that is able to run a Java-based server. That may be a question of talking to your university/company sysadmin, or alternatively you can pay a few dollars to get your own domain.
But even if you don’t have this you can publish to yourself! You can discover the power of semantic resources on your own laptop. Everyone can publish to http://localhost:8080 and practice.
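To make "publishing to yourself" concrete, here is a minimal sketch in Python of a server and a client on one machine. It is not Chempound (which is Java); it only uses Python's built-in HTTP server to show how little is needed. The function names `start_local_server` and `fetch` are my own illustrative choices.

```python
import threading
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_local_server(directory, port=8080):
    """Serve the files in `directory` from this machine in a background thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    # Binding to 127.0.0.1 means only this machine can see the server.
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Play the client: GET a URL and return the response body as text."""
    # Ignore any configured proxy; we are only talking to our own machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `start_local_server("some/directory")` and then pointing a browser (or `fetch`) at http://localhost:8080/ lists whatever is in that directory. Passing `port=0` asks the operating system for any free port, which is handy if 8080 is already taken.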
By now you will have realised that you need two bits of software:
- The client. That’s easy; your browser is all you need. That’s because Chempound uses REST (a convention for using HTTP).
- The server. That’s Chempound.
(Actually you also need another piece of software to load the data, because it needs to be converted into semantic form.)
A repository should support CRUD (http://en.wikipedia.org/wiki/Create,_read,_update_and_delete ). At present we don’t normally support Update but rather retransform and reinsert the whole entry. This is because Chempound/Quixote or Chempound/Crystaleye are “final” snapshots of a piece of work. And I shan’t cover Delete today.
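The Create/Read part, and the "retransform and reinsert" stand-in for Update, can be sketched with a toy in-memory pot. This is purely illustrative Python and says nothing about how Chempound stores entries internally; the class name `UsefulPot` and its methods are hypothetical.

```python
class UsefulPot:
    """A toy in-memory repository: entries go in and come out by identifier."""

    def __init__(self):
        self._entries = {}

    def create(self, entry_id, content):
        """Create: put a new entry in the pot."""
        if entry_id in self._entries:
            raise KeyError(f"{entry_id} already exists; use reinsert()")
        self._entries[entry_id] = content

    def read(self, entry_id):
        """Read: take a copy of the entry out again."""
        return self._entries[entry_id]

    def reinsert(self, entry_id, raw, transform):
        """Stand-in for Update: retransform the source and replace the whole entry."""
        self._entries[entry_id] = transform(raw)
```

The point of `reinsert` is that nothing is patched in place: the original source is transformed again and the whole entry is replaced, which suits entries that are "final" snapshots.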
OK – what do you have to do? I’m only going to paint outlines here as it’s all been documented by Jorge Estrada. Don’t worry about the details. This is NOT a full set of instructions. The point is to show how easy it is:
- You must have a machine for the server which runs Java and you must tell it where JAVA_HOME is. Normally no problem. If you don’t have Java you will have to install it. Again normally not a problem.
- You need some generic Java server – either Jetty or Tomcat. They may be bundled in our distribution to make it easy for you.
- You’ll need some workspace for the server to put stuff.
If you know Maven (and I’d recommend it) you can follow procedure 1:
Procedure 2.1. Steps to install a Chempound server
[You will need Mercurial to download the Chempound code. This is very easy – I suggest you use Tortoise Mercurial]
- Clone the Quixote Chempound sources from https://bitbucket.org/chempound/quixote-repository
- Create a directory for Chempound to store its files during runtime:
- Launch the Chempound server. You will need to provide the path to the workspace directory (chempound.workspace) and the root URL where Chempound will be running (chempound.uri).
A typical execution will run Chempound at localhost and port 8080:
That takes a few minutes at most.
OR you can run it directly under Jetty. We supply a huge file, quixote-webapp-version-jar-with-dependencies.jar, with everything you need. You will still have to install Jetty.
Procedure 2.2. Installing Jetty
- If you do not have Jetty installed, download a current Jetty distribution from http://jetty.codehaus.org/jetty/. This will be a file similar to jetty-distribution-7.4.5.v20110725.tar.gz
- Unpack the downloaded file. You will find a directory with several files (including start.jar) and directories. This directory will be referred to from here on as /path/to/jetty
- Clone the Quixote Chempound sources from https://bitbucket.org/chempound/quixote-repository
- Create the WAR package and the JAR with the dependencies with the Maven package phase. After this step, you will find the files quixote-webapp-version.war and quixote-webapp-version-jar-with-dependencies.jar in the target directory.
- You will have to copy both the WAR and JAR files to the webapps directory of your Jetty server, changing the name of the WAR file to just quixote.war
- Create a directory for Chempound to store its files during runtime:
- Configure Jetty to run Chempound as the root application by deleting the default Jetty files under the contexts directory of your Jetty installation:
And then, create a quixote.xml configuration file in that same directory.
In this file, you would replace URL with the URL of your Chempound service (for example, http://localhost:8080/). /PATH/TO/CHEMPOUND/WORKSPACE should point to the directory you want to use as workspace for Chempound. Finally, when specifying the JAR file with the Chempound dependencies, you would substitute VERSION with the appropriate value.
- Finally, launch the Jetty server from the /path/to/jetty directory:
- You can stop the server at any time by typing CTRL+C.
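The file shuffling in the steps above (copy the WAR and JAR into Jetty's webapps directory, rename the WAR to quixote.war, make a workspace directory) can be sketched as a small Python helper. This is my own illustration, not part of the Chempound distribution: the function name `stage_quixote` and its parameters are hypothetical, and you supply the version string yourself. It does not write quixote.xml or touch the contexts directory.

```python
import shutil
from pathlib import Path


def stage_quixote(target_dir, jetty_home, workspace, version):
    """Copy the built WAR and JAR into Jetty's webapps and create the workspace."""
    target = Path(target_dir)
    webapps = Path(jetty_home) / "webapps"
    webapps.mkdir(parents=True, exist_ok=True)

    war = target / f"quixote-webapp-{version}.war"
    jar = target / f"quixote-webapp-{version}-jar-with-dependencies.jar"

    # The WAR is renamed to just quixote.war; the JAR keeps its name.
    staged_war = Path(shutil.copy2(war, webapps / "quixote.war"))
    staged_jar = Path(shutil.copy2(jar, webapps / jar.name))

    # Somewhere for Chempound to store its files at runtime.
    workspace_dir = Path(workspace)
    workspace_dir.mkdir(parents=True, exist_ok=True)
    return staged_war, staged_jar, workspace_dir
```

A Jetty distribution of this vintage is conventionally started by running `java -jar start.jar` in the /path/to/jetty directory, which is where the start.jar mentioned above comes in.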
Again a few minutes. Now you have a working Chempound repository, running on http://localhost:8080
But there isn’t anything in it!
OK, Eeyore, let’s put a balloon in it. (Or more accurately, ingest a legacy compchem logfile):
Procedure 3.1. Steps to build the Quixote utils software
- Clone the Quixote utils repository from https://bitbucket.org/sea36/quixote-utils
- Build the JAR packages needed, using the Maven profile uberjar and the target package.
OR we will supply quixote-utils-0.1-SNAPSHOT-jar-with-dependencies.jar directly and you can omit this step
Depositing NWChem log files in a Chempound server
Using the JAR packages created previously, you can deposit your NWChem log files by running the following command:
java -cp target/quixote-utils-0.1-SNAPSHOT-jar-with-dependencies.jar net.quixote.utils.DepositNWChem {chempoundSwordEndpoint} myfile.log
Where myfile.log is the file you wish to ingest.
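If you have a whole directory of log files, the same command can be run once per file. The following Python sketch only wraps the command line shown above; the helper names `deposit_command` and `deposit_all` are mine, and it assumes (as the example suggests) that DepositNWChem takes one log file per invocation. You still have to supply your own SWORD endpoint URL.

```python
import subprocess
from pathlib import Path

# Taken verbatim from the command line above.
UTILS_JAR = "target/quixote-utils-0.1-SNAPSHOT-jar-with-dependencies.jar"
DEPOSIT_CLASS = "net.quixote.utils.DepositNWChem"


def deposit_command(endpoint, log_file, jar=UTILS_JAR):
    """Build the java command that deposits one NWChem log file."""
    return ["java", "-cp", jar, DEPOSIT_CLASS, endpoint, str(log_file)]


def deposit_all(endpoint, directory, run=subprocess.run):
    """Deposit every *.log file in `directory`, one java invocation per file."""
    deposited = []
    for log_file in sorted(Path(directory).glob("*.log")):
        # check=True stops at the first failed deposit instead of carrying on.
        run(deposit_command(endpoint, log_file), check=True)
        deposited.append(log_file)
    return deposited
```

The `run` parameter is there so you can pass in a stand-in function and see which commands would be issued before letting it loose on a real server.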
Now DepositNWChem carries out the following magic:
- Converts myfile.log to CML (myfile.cml)
- Validates the CML
- Converts the CML to RDF (myfile.rdf)
We now have three files and we direct SWORD2 to:
- Upload all of them
- Index them
- Add them to the RDF triplestore
- Create a web page
That’s a great deal of magic for one command! Thank Sam, Jorge, and PMRGroup for all the code. At present it has to be done from the commandline (which is the best) but it would be easy to create a simple GUI so you could select files and upload them. Any volunteers among the GUI-writing addicts?
The server allows you to browse and search the repository. It “exposes a SPARQL endpoint”. That means you can search the RDF. So I have described the CR of CRUD.
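For the curious, here is roughly what "searching the RDF" looks like from a client. The standard SPARQL protocol lets you send a query as the `query` parameter of an HTTP GET. This Python sketch assumes that convention; the endpoint URL you pass in is up to you (I have not stated Chempound's endpoint path here), the function names are hypothetical, and whether a given server returns JSON results is also an assumption.

```python
import json
import urllib.request
from urllib.parse import urlencode

# A deliberately generic query: show me any ten triples.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL query (query text goes in ?query=...)."""
    return endpoint + "?" + urlencode({"query": query})


def run_select(endpoint, query):
    """Send a SELECT query and parse the result, assuming the server speaks JSON."""
    request = urllib.request.Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`sparql_get_url` on its own is useful for pasting a query into the browser's address bar, which keeps to the idea that the browser is all the client you need.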
It’s alpha. We are proud of it, but it has bugs. If you like hacking alpha software please let us know. If you are interested in public semantic chemical software and content let us know. Because with CSIRO and PNNL and YOU we are going to revolutionise semantic physical science – starting with computational chemistry, solid state, and spectra.