Can we mashup pollution data onto OpenStreetMap in realtime?

Sometimes a fantastic idea hits you in a millisecond, and that’s just happened to me at coffee in the Chemistry Department. I happened to bump into Mark Calleja (who is part of our eScience (eMinerals) collaboration) and he told me about their latest project, Cambridge Mobile Urban Sensing (CamMobSens):

CamMobSens is the Cambridge end of the MESSAGE project, a collaboration between Cambridge University, Imperial College London, Leeds University, Newcastle University and Southampton University. In Cambridge we mount sensors on pedestrians and cyclists to monitor pollution and send back the information to a website as soon as it is gathered.

[Figure: Carbon Monoxide (CO)]

[Figure: Nitric Oxide (NO)]

Team members:

Prof. Jean Bacon, Department of Computer Science         
Dr. Mark Calleja, Cambridge eScience Centre
Mark Hayes, Cambridge eScience Centre
Prof. Rod Jones, Department of Chemistry

Prof. Peter Landshoff, Department of Applied Mathematics
Dr. Iq Mead, Department of Chemistry
Michael Simmons, Cambridge eScience Centre
Dr. Eiman Kanjo, Department of Applied Mathematics

Now anyone who knows anything about OpenStreetMap.org will immediately make the connection as I did. OSM has been built by the voluntary efforts of zillions of pedestrians and cyclists who have used GPS to map the world. They’ve now built the best map of Cambridge and at least one has cycled every street.

Mark and colleagues need volunteers to go out and monitor pollution on a regular basis. The technical aspects are solved: a mobile phone in one pocket and a sensor in the other emitting Bluetooth. The signal is routed to satellites and then to an Openly accessible database run by the project. All you have to do is follow the simple instructions of the project.

So I’ve asked Mark if I can be first on the list for the kit; volunteer activity starts in late summer. I cycle every day down East Road (at the top left of the CO picture). It must be one of the most polluted roads (although the bus station is worst).

So I am appealing to OSM volunteers IN CAMBRIDGE to contact Mark. (The idea is clearly adaptable to other cities but we shouldn’t overwhelm the project at this stage.) If you know how to spread the word in the OSM and similar communities, please do so. There is no technical reason why this couldn’t rapidly spread just as OSM has done.
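For anyone wondering what the mashup step might look like, here is a minimal sketch, assuming each sensor reading arrives as a record with a GPS position, a pollutant name and a value. The field names and the sample values are invented for illustration and are not the project's actual data format. GeoJSON is chosen only because most web map layers that sit on top of OSM tiles can display it.

```python
import json


def readings_to_geojson(readings):
    """Convert a list of reading dicts into a GeoJSON FeatureCollection."""
    features = []
    for r in readings:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {
                "pollutant": r["pollutant"],
                "value": r["value"],
                "time": r["time"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample readings (hypothetical fields, not real measurements).
sample = [
    {"lat": 52.2053, "lon": 0.1340, "pollutant": "CO", "value": 0.8,
     "time": "2009-06-23T09:00:00Z"},
    {"lat": 52.2040, "lon": 0.1365, "pollutant": "NO", "value": 0.1,
     "time": "2009-06-23T09:00:05Z"},
]

geojson = readings_to_geojson(sample)
print(json.dumps(geojson, indent=2))
```

The resulting document could be loaded as an overlay layer by a web map; how the real database exposes its readings is not specified here.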

Posted in Uncategorized | 4 Comments

More on Avogadro

More on Avogadro and the Blue Obelisk.

First, many apologies to Marcus Hanwell who is the real Doctor Who of Avogadro (but who has been temporarily transported through time and space due to his first child). It’s always great to see new people joining.

Then to acknowledge the great synergy shown by Jan Jensen from Copenhagen, who has adopted and promoted Avogadro and got some stunning movies. These are both fun to watch and also show the very nice interface. Here are some posts; enjoy:

Nicking transition states from Nick Greeves

Vote early, vote often

A useful equation

The force is strong in this one

Just one of those links

It takes a village to solve a Jmol puzzle

Symmetry Prozac

An Atkins diet of Molecular Workbench

Some Jmol basics

Do I have to draw you a phase diagram?

Cool new build option in Avogadro 0.9.5

Building a Transition State

Showtime!

Tools of the trade

Getting started

Posted in Uncategorized | 1 Comment

Vote for Avogadro

Here is a really exciting message from Geoff Hutchison, a founder member of the Blue Obelisk:

PMR: EVEN IF YOU ARE NOT A CHEMIST, READ ON, PLEASE. WE NEED YOUR INPUT (ESPECIALLY THE VOTE)

Avogadro has been nominated as a finalist for the SourceForge
Community Choice Award for “Best Project for Academia”:
https://sourceforge.net/community/cca09/vote/

This is a real honor for us, and we appreciate everyone who nominated
us for the award. I certainly didn’t stuff the ballot box, so many of
you must have voted for us.

We haven’t yet released version 1.0, but we’re working hard on it. So
far, we’ve had over 40,000 downloads, been translated into ~14
languages on Launchpad.Net and are amazed and humbled by everyone
who’s contributed in different ways.

PMR: Geoff has been working on a new OPEN SOURCE molecular editor. When I visited Geoff in Pittsburgh he showed me it in the cafe at lunch.

IT’S BRILLIANT. JUST GO TO SOURCEFORGE AND VOTE (ONCE ONLY, BUT GET YOUR FRIENDS TO VOTE)

EVEN IF YOU ARE NOT A CHEMIST, WATCH THE VIDEO (SEE BELOW) AND IF YOU ARE COMPUTER-LITERATE DOWNLOAD AND PLAY WITH IT. THE WAY IN WHICH MOLECULES ARE BUILT

So why is this important? It’s because the Blue Obelisk has now reached critical mass and is able to build on what it already has. So Avogadro uses the library in OpenBabel to minimise molecules. The almost haptic-like feel of the build depends on the system optimising the molecule in real time. It is the first time I have ever seen a system which can easily convert a chair to a boat. And, of course, it is currently limited to a mouse. When we go touchy-feely then this will be way in front.

Geoff is the Doctor Who of Avogadro. There have been a lot of contributions and he told me of one person who had contributed a complete library. And Avogadro has happened within about 2 years. Most Blue Obelisk projects have taken 5 years or more to reach critical mass. That’s of course a tribute to Geoff but it also represents maturity in the libraries (OpenBabel has taken ca 10 years) and the better collaborative and engineering tools. And the fact that an increasing number of people believe in the Blue Obelisk.

Remember that most BO software is not directly funded (many of the competing software projects like OpenOffice have contributors who at least in part are expected to donate code as part of their day-job). It’s probably fair to say that some like OSCAR are funded on the margins of grants and OPSIN now has a full-time graduate student Doctor Who (Daniel Lowe). But most are found in the recesses of the early mornings and weekends. And they are often not approved of by the establishment: what is X doing when they should be doing science?

The Blue Obelisk software is like a series of telescopes. They will shortly reach the power of many commercial offerings and then they will go beyond them. That’s because there is a great drive for innovation, for Open methods to ensure quality, for re-use of existing code. We’ve got a few problems to iron out (different libraries and OSes) but there is now enough redundancy.

So when you vote for Avogadro (as you will) you are not just voting for a piece of software but you are voting to add Openness to a major scientific domain which has been suffering from the darkness of closed source and hidden data for far too long. Just as mySociety is liberating our democracy, the Blue Obelisk is liberating chemistry.

Posted in Uncategorized | 6 Comments

WhereShallIGo.org on July 4th (and FOI in London)

From mySociety Blog. This looks like a must attend day. My only problem is do I go as a wonky geek or a geeky wonk? I think the former.

But it’s stuffed with great contributors: Heather Brooke, who tried for years to bring MPs’ expenses into the light, and Ben Goldacre, whom I heard last year but who almost certainly has a different Bad Science.

And when I WroteToHim (using WriteToThem) on Net Neutrality I have only got acknowledgements. It’s a race between getting a reply and being cut off by HADOPI-UK.

It is this sort of thing – not conventional politics – that keeps my faith in democracy alive.

================================================================

mySociety blog » Share tips with 6 brilliant Freedom of Information experts on 4th July

By Francis Irving on Monday, June 22nd, 2009

Is there something part of the government is doing that you’d like to investigate? Find out everything from MPs’ expenses, to the length of allotment waiting lists, to whether your council’s Guy Fawkes bonfire is properly checked for hedgehogs.

mySociety are running a practical workshop on Freedom of Information at OpenTech on 4th July.

The workshop will help you make your first Freedom of Information request, including working out what to request, where to request it from and what exactly to write.

If you’re an old hand, you can get and give tips on how to take requests further.

We’ve got a fantastic team of Freedom of Information (FOI) experts to kick things off and answer hard questions.

Heather Brooke used FOI to cause the furore over MPs’ expenses.

Francis Davey is a lawyer with a specialist knowledge in FOI.

Elena Egawhary is a freelance journalist, currently working and using FOI for Panorama.

John Cross, Alex Skene and Richard Taylor are volunteers who run and improve WhatDoTheyKnow, and all use it for their own activism.

Bring a laptop if you have one. Internet will be provided for the workshop only, so we can scour Government websites, and make requests on mySociety’s WhatDoTheyKnow.com website.

As usual, the rest of OpenTech is brimming with great talks, and will be full of interesting geeky wonks and wonky geeks. Book your place here so you can go to them and to the workshop. Hurry, it’s nearly sold out.

This entry was posted on Monday, June 22nd, 2009 at 8:21pm and is filed under Events, WhatDoTheyKnow.


Peter S and PMR on NPG and fair-use

Recently Nature Publishing Group released a policy allowing users to text- and data-mine some of their content (specifically that which was in some way Open Access). This policy was negotiated with the Wellcome Trust, who applauded it. Peter Suber attached some words of approbation. In contrast I feel it’s a serious step backwards. I have set the scene in the previous article by addressing free, gratis and fair-use.

The first question, which no-one seems to have addressed, is why we need permission at all. I believe that we actually have a right to mine both text and data without permission. I’m not reproducing significant amounts of the original work in its original form so I don’t believe I’m actually infringing fair-use. I invite any publisher to explain why content created by the community cannot be mined by machine in ways that have been done for centuries by hand.

If, however, the community agrees that we need specific permission to do mining then (to me at least) it is logical that it is a greater libre-dom than fair-use. If, after all, it is simply fair-use then the publisher should say so. And remember that fair-use applies to all the publisher content, not just the OA stuff. (Note, however, that librarians and other purchasing officers normally sign publisher contracts which are more restrictive and limit subscribers’ use to less than fair-use. That affects the closed access articles that I have access to in my institution.)

So what really worried me was that the Wellcome Trust (for whom I have a high regard) seem to have implicitly agreed that NPG’s free-doms for mining need special negotiated permissions, whereas I regard them as fair-use. So the agreement formally limits fair-use to less than I thought we had. When it comes to other publishers and author/funder-pays articles, funders may be paying large amounts for what was our right anyway.

That’s why the precise interpretation really matters. And where I finally get round to PeterS’s comments:

Peter Suber says:

June 21, 2009 at 3:30 pm

Hi Peter. There are several threads here that I’d like to separate.

Weak/strong was an early, regrettable proposal for the distinction now captured by gratis/libre. Don’t think of it as an additional pair of terms but as a superseded or deprecated pair of terms.

PMR: agreed and I am very glad to see the new pair.

When I introduced the terms gratis/libre into the OA context (borrowing them from the FOSS context), I tried to be clear, careful, and detailed about what I meant by them. I don’t legislate usage, of course. But if the question is about how I use the terms, then my original article should answer it. I also think that my article will answer your questions about what the terms mean in practice, or how they can make our discussions less confusing rather than more confusing.

PMR: Yes. Peter writes a great deal of extremely clear explanations. For him libre represents the granting of at least one freedom. (For me it represents all in BBB but I’ll go with Peter here).

I don’t know whether the new NPG policy goes beyond fair use either. This depends on whether fair use already covers text-mining, a question on which informed people continue to disagree. We may not know whether fair use allows the downloading of full-text copies for processing, but at least we now know that NPG does allow it.

PMR: This is the central issue. But if it is fair-use, then let’s recognise it as such and announce it as a clarification, not new added value.

Whether the NPG policy is libre OA in my sense depends on whether it exceeds fair use, and I’m admitting that that’s unclear. If the policy exceeds fair use, then it’s libre OA (barely). If it doesn’t exceed fair use, then it isn’t.

PMR: we agree on the central issue

Remember that libre OA is not a synonym for BBB OA. Libre OA covers all the different ways of exceeding fair use or removing permission barriers. It covers a *range* of positions, not just one position. If the NPG policy is libre at all, it’s at the lower or minimal end of the range; the BBB OA is a position at the higher or maximal end of the range.

PMR: we agree completely. I personally find it difficult to write libre-OA as meaning other than BBB-OA (as it gives an air of legitimization) but I am happy to write enhanced-permission-OA or some-libre-dom-OA.

5. I agree with everything you say about the limitations built into the NPG policy. Removing many more permission barriers would greatly facilitate text-mining and (I’m convinced) cause no harm to NPG.

PMR: one of the worst aspects is that this is a complete waste of my time and yours. I should have been writing unit tests to manage chemistry today and I’ve been sidetracked onto this. I don’t mind day-after-day trying to fight nature when trying to crystallise a glue, or distill an oil or whatever. But it is incredibly dispiriting day-after-day trying to win back what is ours by right and with which we could actually do science.

But you inspire me so I shall struggle on.


Libre Gratis and Fair-Use

Peter Suber is one of the people I most respect, though we have never met, and we’ve been having a discussion about whether the text-mining policy announced by Nature Publishing Group is libre or fair-use. [Here I discuss his comments, but add some background first].

A major problem is the use of terms, whether derived from common English usage (fair and use) or specially constructed (libre). In either case the meaning is never self-evident and is interpreted differently by different people. For open access there is a huge spectrum. At one end are Klaus Graf, PLoS, BMC and me, who want Open Access to mean complete adherence to the BBB declarations, which means you can do anything with the paper (including selling it) as long as you acknowledge the authorship. At the other end are publishers who charge authors (funder-pays in my jargon) for the privilege of having their paper readable on the publishers’ web site but with no other permissions (you may not download this paper, keep a copy, re-use it in whole or part for any purpose, put it in your repository, etc.). In the English language these are both free, which is highly confusing. In Open Source terms these are explained as free-as-in-speech and free-as-in-beer. To resolve the English ambiguity the terms libre and gratis are increasingly used. Wikipedia elaborates slightly.

Peter and I are agreed that it is really important to get this right. It’s not just theoretical: if I mis-use a publisher’s item because I think I can do something with it when they think I can’t, I’ll get a lawyer’s letter or have my institution cut off (both have happened to me, so they are not academic). And, if the UK passes HADOPI-UK the publisher will simply ask Ofcom to have my home broadband terminated, with no appeal. (That’s why I am writing about HADOPI: it matters.)

Words generate arguments. I warn everyone in our group that when we talk about ontologies we will fight. And we do. And that’s when we are all trying to reach a common goal: a machine-implementation of human understanding. With publishing it’s worse because there are some publishers who deliberately want to make it difficult for us to use our (sorry their) content on their sites. So they have no interest in a common definition of Open Access and the more confusion the better. A publisher can now get funders to pay large amounts (1000-3000 USD) for a toll-free (gratis to readers) publication. So the precise meaning of the term can carry a great deal of money with it.

Some publishers such as BMC are quite clear. Author/funder pays and reader can do whatever they like. It’s defined by the Creative Commons Attribution licence. Clear and trivial to interpret. I’ve not heard any problem of people re-using CC-BY content.

You also have to understand that there is something called fair use. This is impossible to define precisely (see Wikipedia) but it’s country dependent, depends on the monetary damage to the copyright owner, depends on the amount re-used relative to the whole, etc. What is fair-use can only be resolved by paying lawyers huge amounts of money to fight it out in civil court. It’s generally agreed that reproducing chunks of text to back up one’s science is legitimate and photocopying papers for teaching is not. (Personally I disagree ethically with the latter; after all it’s OUR content.) There is a particular problem with images as most copyright regards images as creative works (e.g. cartoons, streetmaps, photographs in museums, etc.). But a spectrum? Created by a machine?

It would help a great deal if publishers actually said what they regarded as fair-use and what other privileges of re-use the author/funder may have bought. But they don’t. It’s far more profitable to keep everyone in FUD. Librarians are now so terrified of publishers that they will always err on the side of conservatism (I don’t know precisely what you want to do but assume you can’t). I’ve brought this up on my blog and publishers know there is a problem and they have failed to make any reasonable approach to the academic community.

After all, a publisher can charge an author 100 USD for including a diagram (created by another author) in a review when the publisher had no hand in its creation. Why give up the cash cow?

So what can a reader do without being sued?

  • They can write down in pencil and type up (on a manual typewriter) words and data from an article. They’ve been doing this for >100 years and no-one has objected.

  • They can redraw diagrams. (I can remember a review I published where I included one of my own diagrams published in an ACS journal. The Royal Society of Chemistry redrew the diagram (it had ca 1000 data points). It was badly redrawn. How completely absurd.)

  • They can compile facts (like melting points, spectra, etc.) as long as they write them in cuneiform

But when the material is electronic (which should make this process easier) the publishers absolutely forbid it. I can do text-mining by hand, but not apparently by machine.

Therefore, whether you approve of their motives or not:

Closed Access publishers deliberately make it difficult to re-use their information

They claim to be supporting science. They aren’t, they are supporting their shareholders or CEO’s remuneration.

So where does the NPG text-mining issue come? I’ve written enough, so I’ll cover that in the next post.

Just remember that we are allowed fair-use though no-one agrees what it means


ILI2009: Repositories – I can't and may not extract my own data

In the previous post I indicated that I might be able to search University and other repositories through OpenDOAR, a repository of repositories. I can technically crawl the content of these repositories if I can get:

  • a list of the repositories

  • for each repository a list of the content.

I am quite prepared for these to be nested, or for modern technology such as RSS/Atom or similar to be used.
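The two-level crawl described above can be sketched as follows, assuming (hypothetically) that the directory offers an Atom feed whose entries link to one Atom feed per repository, and that each repository feed links to its items. Neither feed is known to exist in this form; the URLs and the in-memory "web" below are invented so the sketch runs without a network.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_links(atom_xml):
    """Return the href of every <entry><link> in an Atom feed document."""
    root = ET.fromstring(atom_xml)
    links = []
    for entry in root.iter(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            href = link.get("href")
            if href:
                links.append(href)
    return links


def crawl(top_feed_url, fetch):
    """Top feed lists repository feeds; each repository feed lists its items."""
    items = {}
    for repo_feed_url in entry_links(fetch(top_feed_url)):
        items[repo_feed_url] = entry_links(fetch(repo_feed_url))
    return items


# Hypothetical stand-in for HTTP: maps URL -> feed document.
FAKE_WEB = {
    "http://example.org/repositories.atom": """<feed xmlns="http://www.w3.org/2005/Atom">
      <entry><link href="http://example.org/repoA.atom"/></entry>
      <entry><link href="http://example.org/repoB.atom"/></entry>
    </feed>""",
    "http://example.org/repoA.atom": """<feed xmlns="http://www.w3.org/2005/Atom">
      <entry><link href="http://example.org/repoA/item/1"/></entry>
      <entry><link href="http://example.org/repoA/item/2"/></entry>
    </feed>""",
    "http://example.org/repoB.atom": """<feed xmlns="http://www.w3.org/2005/Atom">
      <entry><link href="http://example.org/repoB/item/1"/></entry>
    </feed>""",
}

result = crawl("http://example.org/repositories.atom", FAKE_WEB.__getitem__)
for repo, item_urls in result.items():
    print(repo, len(item_urls), "items")
```

Passing `fetch` in as a function keeps the crawl logic separate from the transport, so a real HTTP fetcher (with whatever politeness rules apply) could be swapped in later.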

So here we go… Here’s the list of repositories:

OpenDOAR – Countries and Organisations

Africa | Asia | Australasia | Caribbean | Central America | Europe | North America | South America

Click on a name to see the corresponding OpenDOAR summaries, or on a URL to visit the relevant website.

AFRICA

Cape Verde | Egypt | Ethiopia | Kenya | Namibia | South Africa | Uganda | Zimbabwe

Cape Verde

Universidade Jean Piaget de Cabo Verde
http://www.unipiaget.cv/

Biblioteca Digital da Universidade Jean Piaget de Cabo Verde
http://bdigital.cv.unipiaget.org/dspace/

Egypt

Bibliotheca Alexandrina (مكتبة الإسكندري)
http://www.bibalex.org/

Digital Assets Repository (DAR)
http://dar.bibalex.org/

British University in Egypt
http://www.bue.edu.eg/

The BUE e-print repository
http://e-prints.bue.edu.eg/

and presumably about 1400 more. So far so good. Our robots can navigate a table like this. I’d prefer RSS (and maybe there is something I’ve missed) but HTML will do.
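As an illustration of how a robot might navigate such a listing, here is a minimal sketch using only the Python standard library. The table markup is an assumption (the real page's HTML is not reproduced here); only the repository names and URLs come from the listing above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Assumed markup for illustration; names and URLs are from the listing.
SAMPLE_HTML = """
<table>
  <tr><td><a href="http://bdigital.cv.unipiaget.org/dspace/">Biblioteca Digital da Universidade Jean Piaget de Cabo Verde</a></td></tr>
  <tr><td><a href="http://dar.bibalex.org/">Digital Assets Repository (DAR)</a></td></tr>
  <tr><td><a href="http://e-prints.bue.edu.eg/">The BUE e-print repository</a></td></tr>
</table>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for name, url in collector.links:
    print(name, "->", url)
```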

So let’s go to Cambridge, which uses DSpace repository technology…

Description: A community centred university repository with a wealth of supporting information and documentation. Most of the items are CML files from the WorldWideMolecularMatrix dataset of small molecules. Otherwise, it is especially rich in multimedia (images and video) objects, less well populated with full-text papers. Some articles are restricted access and are not freely visible. Users may set up RSS feeds to be alerted to new content.

Yes, Jim Downing populated the repository with about 150,000 molecules, one entry per molecule.

Now the metadata and policies:

Grade: Metadata re-use permitted for not-for-profit purposes

Anyone may access the metadata free of charge.

The metadata may be re-used in any medium without prior permission for not-for-profit purposes provided the OAI Identifier or a link to the original metadata record are given.

The metadata must not be re-used in any medium for commercial purposes without formal permission.

For more information, please see webpage: http://www.lib.cam.ac.uk/repository/about/policies.html

Standardised Data Policy for full-text and other full data items

    Grade: Harvesting full data items by robots prohibited

Anyone may access full items free of charge.

Copies of full items generally can be:

reproduced in any format or medium

for personal research or study, or not-for-profit purposes without prior permission or charge.

Full items must not be harvested by robots except transiently for full-text indexing or citation analysis

Full items must not be sold commercially in any format or medium without formal permission of the copyright holders.

Mention of the repository is appreciated but not mandatory.

For more information see webpage: http://www.lib.cam.ac.uk/repository/about/policies.html.

And this is where we hit the major problem:

Harvesting full data items by robots prohibited

Full items must not be harvested by robots except transiently for full-text indexing or citation analysis

So, simply, I put 150,000 items in the database and I am not allowed to extract them by robots. OK, I doubt Cambridge will dismiss me if I do, but consider the import of that message:

We don’t want any old hacker using our repository.

I’ve chosen Cambridge because it’s my institution, but this restriction is extremely common in repositories.

We repository managers don’t want you using them.

If this is a problem with server overload there are well-known ways of getting round it. And most people who write crawlers will try hard to avoid damaging servers. So this can’t be the motivation.
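One of the well-known ways of avoiding overload is for the crawler to honour robots.txt and space out its requests. A minimal sketch, assuming a made-up robots.txt and a made-up crawler name; it only plans the fetches (which URLs, and how long to wait before each) rather than making any requests.

```python
import urllib.robotparser


def polite_fetch_plan(urls, robots_lines, user_agent="example-crawler",
                      default_delay=1.0):
    """Return (url, seconds_to_wait_before_fetch) for URLs robots.txt allows."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)
    # Use the site's Crawl-delay if it states one, else a default pause.
    delay = rp.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay
    plan = []
    for url in urls:
        if rp.can_fetch(user_agent, url):
            # No wait before the first request, one delay before each later one.
            plan.append((url, 0 if not plan else delay))
    return plan


# Hypothetical robots.txt content for illustration.
ROBOTS = [
    "User-agent: *",
    "Crawl-delay: 2",
    "Disallow: /private/",
]

plan = polite_fetch_plan(
    [
        "http://example.org/handle/1",
        "http://example.org/private/secret",
        "http://example.org/handle/2",
    ],
    ROBOTS,
)
print(plan)
```

A real crawler would sleep for the planned delay before each request; the point is only that throttling is a few lines of code on the client side.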

No, the motivation is that most repositories don’t want to take the risk of anyone downloading material and possibly breaking copyright. The repositories are for preservation: look, not touch. Here’s Edinburgh:

    Grade: Metadata re-use policy explicitly undefined

Anyone may access the metadata free of charge.

No metadata re-use policy defined. Assume no rights at all have been granted.

Standardised Data Policy for full-text and other full data items

    Grade: Full data item policies explicitly undefined

Anyone may access full items free of charge.

No full-item re-use policy defined. Assume no rights at all have been granted.

So what does explicitly undefined mean? It means that the repository managers will not help the user in determining what they can do with the material. Essentially it’s your problem, not ours.

Assume no rights at all have been granted.

So this is why scientists don’t use repositories, don’t use libraries. Why they use PubMedCentral, not their library. Why they use PubChem rather than their repository.

I actually want to give libraries something that they might be interested in: a tool which will extract chemistry from their theses. But their whole attitude is so web-unfriendly that I’m not sure it’s worth it. It’s far more important to uphold copyright than try to do something innovative in the C21.

I am still trying to get some positive input from libraries for my talk. So far nothing. Time is getting short.

Posted in Uncategorized | 2 Comments

ILI2009: Is OpenDOAR (a repository of repositories) the answer?

I am now feeling worried about my talk at #ILI2009: I can’t think what to say without sounding dismissive of current academic libraries and I know that will upset people. But I came away from #ETD09 with feelings that I was 20 years in the past: the library is a priesthood and normal mortals are expected to visit its temple. We hear from time to time that the library is trying to engage with faculty and the faculty don’t listen. Sorry, that’s like saying to the mice that you have a better mousetrap. Unless you provide something they want they have no reason to come. And for scientists that’s essentially true already.

Over the last few days I have posted about my talk at #ILI2009 asking for suggestions, especially about repositories. I’ve not got any. I read the FriendFeed comments on my posts (http://friendfeed.com/petermr); I won’t quote them but I think you can read them. I had asked whether I could search the whole world repository collection by content, and I assumed I would get some help. OK, whatever; maybe I will sometime.

Meanwhile I revisited OpenDOAR (The Directory of Open Access Repositories) which was set up a few years ago by JISC/OSI/SPARC and others. It’s a worthy effort and it collects information about repositories and publishes it. There are 1409 repositories and the list is updated daily. It states:

OpenDOAR is an authoritative directory of academic open access repositories. Each OpenDOAR repository has been visited by project staff to check the information that is recorded here. This in-depth approach does not rely on automated analysis and gives a quality-controlled list of repositories.

As well as providing a simple repository list, OpenDOAR lets you search for repositories or search repository contents. Additionally, we provide tools and support to both repository administrators and service providers in sharing best practice and improving the quality of the repository infrastructure.

I’d heard of the content search because someone (perhaps Peter Suber’s blog) posted an account that by using Google Custom Search the results were at least as good as searching by human metadata. This seemed believable to me then and I believe it even more strongly now. Human metadata does not scale with either volume or diversity; software does. So the search site shows:

Search Repository Contents

OpenDOAR is pleased to present a trial search service for the full-text of material held in open access repositories listed in the Directory. This has been made possible through the recent launch by Google of its Custom Search Engine, which allows OpenDOAR to define a search service based on the Directory holdings.

Users of this service can search through the world’s repositories of freely available research information, with the assurance that each of these repositories has been assessed by OpenDOAR staff. This quality controlled approach will minimise (but not eliminate!) spurious or junk results, and lead more directly to useful and relevant information.

As this is a trial service, please send us feedback on your experiences.

To search for open access repositories rather than their content, please use the Find page.

This service does not use the OAI-PMH protocol, or the metadata held within repositories. Instead, it relies on Google’s indexes, which in turn rely on repositories being suitably structured and configured for the Googlebot web crawler. If you are an administrator and your material is not being retrieved, first check that your repository is listed in OpenDOAR. If it is listed, you may need to review your set-up against Google’s Guidelines for Webmasters and see the related pages in the Webmaster Help Center, especially the FAQ on how Google crawls sites. There is also excellent advice on How to Facilitate Google Crawling prepared by Peter Suber.

This sounds like what I want so I tried it with my query. You will remember that when I searched Google Scholar for theses with the term aminobenzoic I got only 2 hits, but I knew there were many more because I’d seen over 20 in one repository alone. So I searched…

Results 1-10 for aminobenzoic. (0.25 seconds)

Google Custom Search

QUT ePrints

23 Jan 2009 Smith, Graham and Botta, Raymond C. and Lynch, Daniel E. (2000) The 1:1 adduct of 4-aminobenzoic acid with 4-aminobenzonitrile.
eprints.qut.edu.au/13136/
by G Smith – 2000 – Cited by 2 – Related articles – All 11 versions

Radiation chemical studies of P-aminobenzoic acid derivatives

by Karl Ford Nakken Published in 1966, Universitetsforlaget (Oslo). Radiation chemical studies of P-aminobenzoic acid derivatives. Karl Ford Nakken
openlibrary.org/b/OL5575399M

SEPARATION OF p-AMINOBENZOIC ACID BY REACTIVE EXTRACTION. 1

21 Apr 2009 The comparative study on the reactive extraction of p-aminobenzoic acid with Amberlite LA-2 and D2EHPA in two solvents with different
eprints.kfupm.edu.sa/104280/
by AI GALACTION – 2008 – Cited by 1 – Related articles

Docking of oxalyl aryl amino benzoic acid derivatives into PTP1B

PTP1B inhibitors such as Formylchromone derivatives, 1, 2-Naphthoquinone derivatives and Oxalyl aryl amino benzoic derivatives may eventually find an
www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2637956
by N Verma – 2008 – Related articles – All 4 versions

Decarboxylation of substituted 4-aminobenzoic acids in acidic

Decarboxylation of substituted 4-aminobenzoic acids in acidic aqueous solution. Dewey: 547/.637. LC: QD341.A7 T6. Subject: Aminobenzoic acids.
openlibrary.org/b/OL5767515M

These are real documents and real results containing aminobenzoic in the text and should be really useful. But they really aren’t for several reasons:

  • There is no indication how many hits I have got. I know there are 10 pages, but how many more after that? Google gives a number, OpenDOAR/GoogleCSE does not.

  • There is no indication of priority. OK, we have got used to Google’s page rank through the eigenvectors of hyperlinks but we need some guidance. Perhaps the number of accesses? Maybe the number of lexical occurrences of the search term? There are other clever things that could be done with Lucene.

  • There is no indication of the type of document. A thesis, a data set, a Green-Access publication?

  • The search cannot be restricted to document types (e.g. theses, which is what I want).
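To make the wish-list above concrete, here is a minimal sketch of one of the suggested options: ranking by the number of lexical occurrences of the search term, with an optional filter on document type and an explicit hit count. The document records and their `type` field are invented for illustration; a real service would need repositories to expose that information.

```python
import re


def rank_hits(documents, term, doc_type=None):
    """Return (count, id) pairs, most occurrences first, optionally by type."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    scored = []
    for doc in documents:
        if doc_type is not None and doc["type"] != doc_type:
            continue
        count = len(pattern.findall(doc["text"]))
        if count:
            scored.append((count, doc["id"]))
    # Sort by descending count, then by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


# Made-up documents (hypothetical ids, types and text).
DOCS = [
    {"id": "thesis-1", "type": "thesis",
     "text": "4-Aminobenzoic acid was prepared; aminobenzoic esters and "
             "aminobenzoic salts were characterised."},
    {"id": "article-1", "type": "article",
     "text": "Extraction of p-aminobenzoic acid."},
    {"id": "thesis-2", "type": "thesis",
     "text": "Benzoic acid only."},
    {"id": "thesis-3", "type": "thesis",
     "text": "Adduct of aminobenzoic acid with 4-aminobenzonitrile."},
]

all_hits = rank_hits(DOCS, "aminobenzoic")
thesis_hits = rank_hits(DOCS, "aminobenzoic", doc_type="thesis")
print(len(all_hits), "hits in total;", len(thesis_hits), "are theses")
print(thesis_hits)
```

Raw term counts are a crude ranking; an indexer such as Lucene offers more refined scoring, but even this gives the user a total and an ordering.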

What I want to be able to do is retrieve ALL documents which are relevant to my search. Here I can only do this by manually clicking on each one.
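One route to "ALL documents" that does not depend on the search page is the OAI-PMH protocol mentioned in the OpenDOAR text above, which lets a client page through a repository's metadata records. A minimal sketch, assuming the repository exposes an OAI-PMH endpoint with Dublin Core (`oai_dc`) records; the endpoint URL, identifiers and titles below are invented, and the responses are canned strings so the sketch runs offline. It harvests metadata only, not full items.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def harvest_titles(base_url, fetch, term):
    """Page through ListRecords; yield (identifier, title) where title has term."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        for record in root.iter(OAI + "record"):
            identifier = record.findtext(OAI + "header/" + OAI + "identifier")
            for title in record.iter(DC + "title"):
                if title.text and term.lower() in title.text.lower():
                    yield identifier, title.text
        # A non-empty resumptionToken means there is another page to request.
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            break
        params = {"verb": "ListRecords", "resumptionToken": token}


def record_xml(identifier, title):
    """Build one made-up oai_dc record for the canned responses."""
    return (
        "<record><header><identifier>" + identifier + "</identifier></header>"
        '<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>" + title + "</dc:title></oai_dc:dc></metadata></record>"
    )


def page(records, token):
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        + "".join(records)
        + "<resumptionToken>" + token + "</resumptionToken>"
        + "</ListRecords></OAI-PMH>"
    )


BASE = "http://example.org/oai"  # hypothetical endpoint
FAKE_WEB = {
    BASE + "?verb=ListRecords&metadataPrefix=oai_dc": page(
        [record_xml("oai:example.org:1", "A thesis on 4-aminobenzoic acid")], "page2"),
    BASE + "?verb=ListRecords&resumptionToken=page2": page(
        [record_xml("oai:example.org:2", "Unrelated work"),
         record_xml("oai:example.org:3", "Aminobenzoic acid derivatives")], ""),
}

hits = list(harvest_titles(BASE, FAKE_WEB.__getitem__, "aminobenzoic"))
print(hits)
```

Whether a given repository's policy permits this kind of harvesting, and whether its metadata is rich enough to be useful, is a separate question from whether the protocol makes it possible.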

To be slightly fair to OpenDOAR, this was an experimental service installed 3 years ago and, I suspect, with no maintenance since. But on the other hand this is the single reason why a scientist might actually want to use repositories: to search for a term (yes, I know I really want chemical searches but I can only do it if the priesthood allows it; see next post).

The whole site and project is clearly not aimed at the general scientist but at repository managers and experts; it says so:

The aim is to provide a comprehensive and authoritative list of such repositories for end-users who wish to find particular archives or who wish to break down repositories by locale, content or other measures. OpenDOAR will also provide listings to third-party “service providers” – typically search services who wish to use the categorised lists within their service. This will increase the accessibility and use of the content of these repositories, which will benefit the authors of the research material and the researchers who wish to find it.

So OpenDOAR provides a platform that service providers can build services on; fair enough. It’s been going three years or more, so how many services are there based on it? And are they useful to scientists like me? Can I get things there that I can’t get more easily at PubMedCentral or PubChem?

Anyway I am (or rather Nick Day is) a service provider (CrystalEye), and we know how to build crawlers and we know how to index text and chemistry, so this looks great. The repository is Open (I assume), so why don’t we just add theses to CrystalEye’s offering?

I’ll let you know in the next post how we get on…


ILI2009: Why scientists can't search institutional Repositories

I am trying to search University repositories for chemistry and recording my experiences. Here are my results for Google Scholar, the main free engine for searching academic publications. I’m looking for 4-aminobenzoic acid:

Results 1-10 of about 51,800 for 4 aminobenzoic. (0.13 seconds)

The use of 4-aminobenzoic acid as a marker to validate the completeness of 24 h urine collections in

S Bingham, J Cummings – Clinical Science, 1983 – cs.portlandpress.com
Send to a friend. The use of 4-aminobenzoic acid as a marker to validate the
completeness of 24 h urine collections in man. Bingham S, Cummings JH.

Cited by 133 – Related articles – Cached – Web Search – All 7 versions

Covalent modification of a glassy carbon surface by 4-aminobenzoic acid and its application in

J Liu, L Cheng, B Liu, S Dong – Langmuir, 2000 – pubs.acs.org
Covalent Modification of a Glassy Carbon Surface by 4-Aminobenzoic Acid and Its
Application in Fabrication of a Polyoxometalates-Consisting Monolayer and

Cited by 81 – Related articles – Web Search – BL Direct – All 3 versions

Use of the derivatizing agent, 4-aminobenzoic acid 2-(diethylamino) ethyl ester, for high-

K Yoshino, T Takao, H Murata, Y Shimonishi – Analytical Chemistry, 1995 – pubs.acs.org
Use of the derivatizing agent, 4-aminobenzoic acid 2-(diethylamino)ethyl ester,
for high-sensitivity detection of oligosaccharides by electrospray ionization

Cited by 54 – Related articles – Web Search – BL Direct – All 3 versions

Mechanism of inactivation of myeloperoxidase by 4-aminobenzoic acid hydrazide.

nih.gov [PDF]
AJ Kettle, CA Gedye, CC Winterbourn – Biochemical Journal, 1997 – pubmedcentral.nih.gov
Page 1. Biochem. J. (1997) 321, 503-508 (Printed in Great Britain) 503 Mechanism
of inactivation of myeloperoxidase by 4-aminobenzoic acid hydrazide
Cited by 44 – Related articles – View as HTML – Web Search – BL Direct – All 4 versions

and I have stopped there; there were about 52,000.

How many of these are theses? I have no idea. But I know that university IRs contain many references to 4-aminobenzoic acid (see below). First, here’s Google Scholar again. I don’t know whether there is a better way, but I have simply put “thesis” in the “published in” field:


Results 1-2 of 2 for 4 aminobenzoic. (0.03 seconds)

[DOC] Bacterial Cellulose

D Holmes, NZ Christchurch – A Thesis Presented for the Degree of Master of Engineering, 2004 – rpi.edu
Sulfaguanidine is an analog of p-aminobenzoic acid, hence, the increased production
(Ishkawa
in aerated (by shaking) cultures the organism doubles every 4 to 6
Cited by 1 – Related articles – View as HTML – Web Search

[PDF] The interviewer-administered, open-ended diet history method for assessing usual dietary intakes in

GS Martin – University of Wollongong Thesis Collection, 2004 – ro.uow.edu.au
p Physical activity level (equation) PABA Para-amino benzoic acid PAL Physical activity
level
Page 13. 11 20:3 Homo-gamma linolenic acid 20:4 Arachidonic acid
Related articles – Web Search – All 4 versions

2 results.

The first is (apparently) on a personal web page. The second IS in an IR:

Research Online is an open access digital archive promoting the scholarly output of the University of Wollongong, Australia. For further information contact Michael Organ, Manager Repository Services – 02 4221 3108.

So kudos, at least, to the University of Wollongong.

But they are in there. Let’s go to the Edinburgh University Research Archive: http://www.era.lib.ed.ac.uk/. I’ve chosen this because Scotland is more enlightened about Open Access than England and several Universities have mandates. The ERA states:

ERA is a digital repository of research produced at The University of Edinburgh. Here we present a selection of our best research including full-text digital Theses and Dissertations, book chapters, working papers, technical reports, journal pre-prints and peer-reviewed journal reprints.

If you are a member of the University of Edinburgh and would like to deposit your items, please send an email to prg-help@ed.ac.uk

If you believe that any material held in ERA infringes copyright, please contact prg-help@ed.ac.uk providing details and we will remove the work from the repository and investigate your claim.

Comments: Why a selection of our best material? Why not all our theses, by mandate? And there is Mordor the Copyright officer doing the great job of removing academic copyright material from public view.

So let’s have a go:

Edinburgh Research Archive  >

Search Results

Search: 

for  

Results 1-10 of 33.

[PMR: 4-aminobenzoic acid doesn’t work]

Item hits:

Date of Issue | Title | Authors | Type
2005 | Natural formation and degradation of chloroacetic acids and volatile organochlorines in forest soil: challenges to understanding | Laturnus, Frank; Fahimi, Isabelle; Gryndler, Milan; Hartmann, Anton; Heal, Mathew R; Matucha, Miroslav; Schoeler, Heinfried; Schroll, Reiner; Svensson, Teresia | Research Paper
2007 | Design and synthesis of benign, N- and O-containing, organic ligands for surface engineering | Renz, Robert Phillip | Thesis or Dissertation
2006 | Structural and Computational Studies of Small Organic and Biological Molecules | Lozano-Casal, Patricia | Thesis or Dissertation

I look at the first thesis. Yes! It has 4-aminobenzoic acid on page 111 (page?? well, of course it’s in PDF).

Can I download it?

==========================================================

URI: http://hdl.handle.net/1842/2578

Type: Thesis or Dissertation

Appears in Collections: Organic Synthesis PhD thesis collection

Files in This Item:

File | Description | Size | Format
Robert Renz PhD Thesis.pdf | Open Access version | 4697Kb | Adobe PDF | View/Open
Word Documents.zip | Original files are restricted access | 7016Kb | Zip file | View/Open
Xray Crystal Structures.zip | Original files are restricted access | 567Kb | Zip file | View/Open

All items in ERA are protected by copyright, with all rights reserved.

============================================================

Well, can I? Yes, it’s Open Access; no, it’s copyright with all rights reserved. ???? The thesis contains no copyright statement or licence, so Mordor will tell me that I am not allowed to do anything with it except read and destroy after reading.

If I want to get all the material out I have to visit all 15 documents (Thesis or Dissertation).

There are probably about 1000 universities worldwide that I might wish to visit. It will take me half a day to extract stuff from Edinburgh, doing it manually.

That is 500 days, more than a year of searching the academic archive for a single term.
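The estimate is simple arithmetic on my own round numbers. A small Python sketch; the figure of 250 working days per year is an added assumption, not something measured.

```python
universities = 1000          # "about 1000 universities" (my rough estimate)
days_per_university = 0.5    # "half a day" per repository, done manually

total_days = universities * days_per_university
calendar_years = total_days / 365   # searching every single day
working_years = total_days / 250    # assumption: about 250 working days per year

print(total_days)                   # 500.0
print(round(calendar_years, 2))
print(working_years)
```

Either way it is well over a year of manual clicking for one search term.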

Google can give me an answer in seconds. Nick Day’s CrystalEye has a complete archive of publicly visible crystallography on publisher web sites. But none on academic sites as it simply isn’t possible.

Unless the academic world starts to provide modern search technology for its archives they will remain preservation-only. It will be impossible to get scientists to put material into places which are fundamentally unsearchable.

[Note, of course, that this is a full-text site. That’s not very useful for chemistry. Our new technology can PROPERLY index the chemistry in these sites vastly cheaper than the human methods used by conventional chemical abstracters. But we aren’t even able to start.]


#ILI2009: The challenge of searching in Institutional Repositories

In preparation for my challenge to Internet Librarian #ili2009 I am going to explain what I want to do. I want to search for a common and important chemical. I want to be able to search University Repositories for this. If I can do this for several hundred repositories simultaneously I will be moderately pleased.

Even if you are not a scientist, read on because I will explain why Google/Bing works and Institutional Repositories don’t. You MUST understand why IRs are failing because they don’t use the web properly.

I have chosen to search for 4-amino-benzoic acid because:

  • it’s relatively unambiguous (I’m glossing over some problems of chemical names for non-chemists)

  • it’s common in several fields (e.g. chemical research, bioscience, healthcare, medicine, industry)

  • many people buy it every day in sunscreens (even if they don’t know it).

  • It’s unlikely to have been specifically indexed by a human metadata librarian

  • and I know it’s in some repositories because I have looked manually

If I asked any undergraduate student to find out about 4-amino-benzoic acid they would go to Google. (I also include Bing because I want to be nice to our Microsoft funders). This is not the most accurate way as they will miss synonyms and they will get some noise, but it’s pretty good. (Science librarians will say they ought to go to Chemical Abstracts Scifinder, but that costs a lot of money, and doesn’t contain much of the information you will see below – and I will get flak for saying this and I’ll deal with it). And no doubt Wolfram will improve its (currently awful) chemistry. So here’s what they get from Google (the first 9 out of 194000). I add comments as [FOO]

4-Aminobenzoic acid – Wikipedia, the free encyclopedia

4-Aminobenzoic acid (also known as para-aminobenzoic acid or PABA) is an organic compound with the molecular formula C7H7NO2. PABA is a white crystalline
en.wikipedia.org/wiki/4-Aminobenzoic_acid – Cached – Similar – [WIKIPEDIA]

p-AMINOBENZOIC ACID (PABA)

FORMULA, H2NC6H4COOH. MOL WT. 137.14. H.S. CODE, 3922.49. TOXICITY, Oral rat LD50: 6000 mg/kg. SYNONYMS, 4-Aminobenzoic Acid, 4-Amino-Benzoesaeure;
chemicalland21.com/…/p-AMINOBENZOIC%20ACID.htm – Cached – Similar – [A BROKER/PORTAL FOR SUPPLIERS OF THE REAL STUFF]

File:4-Aminobenzoic acid.svg – Wikimedia Commons

English: 4-Aminobenzoic acid; p-aminobenzoic acid; Hachemina; Paraminol. Deutsch: p-Aminobenzoesäure; PABA; 4-Aminobenzoesäure; p-Carboxyanilin
commons.wikimedia.org/wiki/File:4-Aminobenzoic_acid.svg – Cached – Similar – [WIKIMEDIA, the structural formula as an image, presumably because it’s linked from Wikipedia]

4 Aminobenzoic Acid – a comprehensive view – Wellsphere

Expert articles, personal stories, blogs, Q&A, news, local resources, pictures, video and a supportive community. 4 Aminobenzoic Acid – Health Knowledge
www.wellsphere.com/wellpage/4-aminobenzoicacid – Cached – Similar – [A HEALTHFOOD SUPPLIER; this community is often not evidence-based]

4-aminobenzoic acid (CHEBI:30753)

29 Sep 2008 Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical
www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:30753 – Cached – Similar – [CHEBI; ONE OF THE MAIN ONTOLOGIES FOR CHEMISTRY]

IngentaConnect Aminaftone, a Derivative of 4-Aminobenzoic Acid

Aminaftone, a Derivative of 4-Aminobenzoic Acid, Downregulates Endothelin-1 Production in ECV304 Cells: An In Vitro Study. Authors: Scorza, Raffaella1;
www.ingentaconnect.com/content/adis/rdd/2008/…/art00005 – Similar
by R Scorza – 2008 – Related articles – All 3 versions [A PUBLICATION IN A CLOSED ACCESS SCHOLARLY PUB; a bargain at only 55 USD for human readers; I didn’t pay it]

A9878 4-Aminobenzoic acid 99%

A9878 4-Aminobenzoic acid 99% Linear Formula: H2NC6H4CO2H. Molecular Weight: 137.14. Beilstein Registry Number: 471605. EC Number: 205-753-0
www.sigmaaldrich.com/catalog/ProductDetail.do?… – Cached – Similar – [A MAJOR SUPPLIER OF CHEMICALS FOR RESEARCH SCIENTISTS]

Intermediates /4-Aminobenzoic Acid ( P-Aminobenzoic Acid, PABA

China Intermediates /4-Aminobenzoic Acid ( P-Aminobenzoic Acid, PABA) and China 4-hydroxyindole, 3-Aminobenzoic acid, 4-aminobenzamide, 4-nitrobenzamide,
www.made-in-china.com/…/China-Intermediates-4-AminobenzoicAcid-P-AminobenzoicAcid-PABA-.html – Cached – Similar – [ANOTHER SUPPLIER; CHINA IS A MAJOR SOURCE OF FINE CHEMICALS]

Safety (MSDS) data for 4-aminobenzoic acid

20 Aug 2003 Safety (MSDS) data for 4-aminobenzoic acid. Synonyms: p-aminobenzoic acid, PABA, vitamin BX, anticanitic vitamin
msds.chem.ox.ac.uk/AM/4-aminobenzoic_acid.html – Cached – Similar – [SAFETY DATA FOR 2000 SUBSTANCES COLLECTED BY OXFORD UNIVERSITY; on the department web site (that’s where students look, not in the IR)]

I have also used Bing, the new Microsoft engine. It returns several of these sites and doesn’t get ChEBI, but it gets:

1-10 of 9,520 results

ABAH

Acronym Finder: ABAH stands for 4-Aminobenzoic Acid Hydrazide

www.acronymfinder.com/4_AminobenzoicAcid-Hydrazide-(ABAH).html

4-Aminobenzoic acid – definition from Biology-Online.org

Definition and other additional information on 4-Aminobenzoic acid from Biology-Online.org dictionary. [A GLOSSARY]

AccessMedicine | 4-aminobenzoic acid

Table 57-5 FDA Category 1 Monographed Sunscreen Ingredients a Harrison’s Online > Chapter 57. Photosensitivity and Other Reactions to Light > Photoprotection [HEALTHCARE – sunscreen]

AccessMedicine | Mechanisms of Action of Antimicrobial Drugs

Topics Discussed: 4-aminobenzoic acid; aminoglycosides; antimicrobials; azalides; beta-lactam antibiotics; beta-lactamase; cell membrane transport; cell wall biosynthesis … [MEDICINE – infection]

4-Aminobenzoic acid

PubChem Substance (SID) 152180 3847 PubChem Compound (CID) 978 KEGG Compound ID C00568 CAS Registry IDs 150-13-0 8014-65-1 Miscellaneous Databases and IDs 30753- CHEBI 7627 – NSC 6840 – HSDB 6209 – CCRIS 4-27-00-07875 – Beilstein Handbook Reference 205-753-0 – EINECS Natural Isotopic Abundance Mass 137.1359800000 Mono-Isotopic Molecular Masses

Biological Magnetic Resonance Data Bank A Repository for Data from NMR Spectroscopy on Proteins, Peptides, Nucleic Acids, and other Biomolecules [CHEMICAL DATA (NMR)]

web.grcc.cc.mi.us

web.grcc.cc.mi.us/Pr/msds/physicalscience/2006/4AminobenzoicAcid99percent.pdf [SAFETY – MSDS]

To sum up.

This is a very good place to start from. Wikipedia has a good overview and several useful links. ChEBI has all the links to Open sites that you could want. PubChem has comprehensive but variable (author-supplied) information. I haven’t looked at Google Scholar yet. A student will conclude (correctly) that Wikipedia and Bing provide useful high-quality information.

So what if I want to get 4-amino-benzoic acid out of Institutional Repositories? I can’t, or at least I don’t know how to. I know it’s in those temples but I can’t get at it.

So why do Google/Bing work so well at finding what people want?

It’s about the hyperlinks.

Uhh?

It’s about the hyperlinks.

Google collects the information about which documents link to which other documents. The links are based on HTML which contains a special tag (<a href>) to point to other documents. Google collects all these hyperlinks and builds a giant network. It then computes the eigenvectors. Don’t switch off, I only put this in to show that there is a clear algorithm for deciding the relative popularity of various sites. In very simple terms the sites which are most linked to are given the highest rank.
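For those who didn’t switch off, here is roughly what “computing the eigenvectors” means in practice. This is the textbook power-iteration form of PageRank in Python, a sketch and certainly not Google’s production algorithm; the damping value of 0.85 is the commonly quoted one, and the tiny link graph (including the page names) is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump")
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally among its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical mini-web: three interlinked sites and one isolated PDF
links = {
    "wikipedia": ["chebi", "pubchem"],
    "chebi": ["wikipedia", "pubchem"],
    "pubchem": ["wikipedia"],
    "thesis.pdf": [],  # no links out, and nothing links to it
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy network the isolated PDF ends up at the bottom, which is the point of the rest of this post.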

This ranking is based on exposing static HTML pages with hyperlinks to the search engines. If you don’t expose HTML pages you don’t get indexed. If you expose a database interface (e.g. a form) you don’t get indexed. (There are other methods, and Google will trawl OAI-PMH, but the primary linking is through HTML.)
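For readers who haven’t met OAI-PMH: it is a harvesting protocol in which a client sends ordinary HTTP requests with a `verb` parameter to a repository’s base URL and gets XML metadata back. The Python sketch below only builds such request URLs and does not contact any server; the base URL is a made-up placeholder, and the helper name is mine.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    if resumption_token is not None:
        # When continuing a partial list, the token is sent on its own with the verb
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, for illustration only
base = "https://repository.example.org/oai/request"

first_page = list_records_url(base)
next_page = list_records_url(base, resumption_token="abc123")
print(first_page)
print(next_page)
```

Note that this returns metadata records (titles, authors and so on in Dublin Core), not the full text of the thesis, so it doesn’t by itself solve my problem.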

Theses are reposited in PDF so they don’t contain hyperlinks. So a thesis doesn’t produce GoogleJuice. Theses are exposed through forms so they don’t get indexed that way.
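To see why a form is a dead end for a link-following crawler, here is a minimal Python sketch using the standard library’s HTML parser. It collects `<a href>` targets the way a very naive crawler might; the two page fragments are invented, and real crawlers are of course far more sophisticated.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

# Hypothetical page fragments, for illustration only
static_page = '<p>See <a href="/theses/1.html">thesis 1</a> and <a href="/theses/2.html">thesis 2</a>.</p>'
form_page = '<form action="/search" method="get"><input name="query"><input type="submit"></form>'

print(extract_links(static_page))  # two links the crawler can follow
print(extract_links(form_page))    # nothing to follow
```

The static page hands the crawler somewhere to go next; the search form hands it nothing unless it starts guessing queries.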

So generally a scientific thesis in an IR is largely invisible to the main web. I am happy to modify this statement if anyone can provide evidence that a significant number of scientific theses have been discovered by Google and indexed.

So my question is simple:

How do I search for all occurrences of 4-amino-benzoic acid in theses worldwide? A simple, useful request. I don’t believe I can do it. If I still can’t do it by October (#ILI2009) I will highlight the issues.
