Copyright madness – story 2

Continuing the theme of my last post on copyright absurdities, here is the gist of a mail I got today – I have rewritten it in my own words and changed identities.
My correspondent is writing a scholarly account of the work of a famous Hungarian scientist (Prof. Z) and needs 3 articles in Hungarian published 85 years ago. Her (highly prestigious) institution does not have paper copies either locally or in its stores. So the only option was to request interlibrary loans from the British Library. These are photocopies, not the actual volumes of course. When the photocopies came, she was called to SIGN for them. SIGN?? (I remember ILL as something that came through the post – I never had to sign for them). The signature indicated – in her words – “what I could NOT do. Thus

1. I could not copy it, or convert it to any digital form.
2. I could not pass the original paper on to anyone else (including students).
3. I could not quote from it without permission.”

PMR: What on earth is happening here??? If an alien visitor came here tomorrow it would be told:

  • Scientists do science which they wish to communicate to other people, especially scientists.
  • They do this by writing papers. They do not expect to get paid by the people who read the papers.
  • They are delighted when people read their papers. They never refuse permissions for people to re-use their material.
  • They only wish to be recognised as the authors of the work.
  • Eventually they die. Dead scientists are unable to respond to requests for permissions. Prof. Z has been dead for 40 years. That’s a long time.
  • The national library of Britain has a policy which is – presumably – based on some idea of copyright, and which effectively restricts the readership of these articles to near zero.

PMR: I think the whole library world is now infected with copyright paralysis. I have come across this several times in the case of theses. “Oh we cannot let you actually datamine any useful information from this thesis because we are not the copyright holder. Oh and s/he is dead.”
So what is actually the problem? My feeble understanding of copyright is that (at least in the UK and much of Europe) this is a civil matter between me and the copyright holder. The worst that can happen is that the copyright holder can sue me. And that the damages will reflect the material loss that they have suffered. I remind you, Prof. Z is dead. He is unlikely to sue anyone. If he were alive he would give my correspondent permission. Perhaps his estate (were we able to locate it) would sue her? And what would be the damage? Letting a student read his work?
So here is a risk assessment so she can decide whether to take the risk of giving the photocopy to a student. These are probabilities:

  • Chance Prof Z rises from grave: 6 × 10⁻²³³ (a conservative estimate)
  • Chance his estate is notified of apparent copyright violation: 10⁻¹²
  • Chance his estate regards this as a material breach: 10⁻¹⁰
  • Chance court awards judgement in favour of Prof. Z’s estate: 10⁻¹⁰
  • Amount of damages: 1 forint

So her likely loss is 10⁻³² forints. I could live with this.
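For the quantitatively minded, here is the same arithmetic as a scrap of Javascript (heavily hedged: every number is my own invention, exactly as in the list above):

    // Back-of-envelope expected loss for handing the photocopy to a student.
    // All probabilities are my invented estimates from the list above.
    var pNotified  = 1e-12; // estate hears of the apparent violation
    var pMaterial  = 1e-10; // estate regards it as a material breach
    var pJudgement = 1e-10; // court finds for the estate
    var damages    = 1;     // forints

    var expectedLoss = pNotified * pMaterial * pJudgement * damages;
    // expectedLoss comes to 1e-32 forints – I could live with this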
On the other hand she is very likely to be taken to task by the British Library for violating something I do not understand and which is totally irrelevant.
Some simple facts (not FACT)

  • scientific papers are different from Harry Potter novels
  • scientific papers are different from Mickey Mouse cartoons
  • scientific papers are different from Beatles records
  • scientific papers are different from videos

So why don’t we decide as a community that copyright of scientific papers written by dead scientists is an irrelevancy in the modern age? Yes, we recognise that copyright on recent publications is different because the publishers will sue us or cut off our supply, and that’s a different argument for a different day.
Please, librarians, help us take some positive steps for once.

[NOTE ADDED LATER.
Professor Z is not, in fact, dead – see comments – and I apologize profusely for this slur. I will revise the probability downwards to ca. 10⁻⁶. Even at these odds, however, I will take the risk. ]

Posted in open issues | 2 Comments

Copyright madness – story 1

Today I have come across two accounts of copyright problems which highlight the complete absurdity of our current practices in the twenty-first century. We are crippling our scientific process. Here’s the first from my colleague Nico Adams’ blog. It’s a full, racy account and I’ve only omitted the images.

Requesting Permissions for Re-use of Copyrighted Material. – August 15th, 2007

Now I am not normally one to rant, at least not on a blog, but today I encountered something that makes me just mad…..and I mean hopping mad.
I have just finished writing a review paper […] I have included some figures, which were taken from the original research papers forming part of the review. Given that these were not my figures and that I respect and honour the copyright of other authors who have worked hard to produce high-quality and illustrative figures for their publications and the copyright of publishers who have been assigned those rights by an author, I went off to request permissions for re-use of copyrighted material from the relevant publishers. The review was based on about 150 papers, and I had taken figures from a few of them…ACS, RSC…no problem. Their procedures are all more or less automated and relatively pain free, although time consuming. And then, well then I came to Elsevier……
Elsevier has outsourced their copyright clearance procedure to a company called the Copyright Clearance Centre (I have included the link for your edification), which, on its website claims to “help to advance education, innovation and the free flow of information.” So far so good. Following Elsevier’s instruction, the first thing I have to do to obtain permission, is to go and find the resource I took the figure from on ScienceDirect. So off I go and locate the relevant journal (Talanta) and citation on ScienceDirect. Next, the website instructs me to find the abstract of the paper and to press the “Request Permissions” button.
[… image omitted …]
Pressing this button launches a pop-up window which asks me what I want to do and I make my
selections:
[… image omitted …]
I am somewhat curious as to why it asks me which currency area I am currently in, but decide to ignore it for the moment. Having made my choices, I hit the “continue” button. I am then asked to set up an account as I have never used Rightslink before. Ok, getting tedious, but I hit the button to set up an account (note: none of this is necessary with the other publishers). I am now taken to a page where the anger really sets in: they are asking me how I want to pay.
[… image omitted …]
How I want to pay?? All I want is to request permission for reuse of one small figure. I do not want to pay anything – my institution is subscribing to the journal for me. Why on earth would you want to lump requests for re-use of copyrighted material together with a business process that may be appropriate for the purchase of pay-per-view access? If I do not want to have pay-per-view access, why do I need to hand over payment details? However, the dropdown menu only gives me the opportunity to choose between a credit card payment and an invoice.
Hmmm…..on I go and fill in my details hoping that the “payment” thing is just going to go away down the line. But no such luck and sure enough, on the next screen I am being asked for my credit card details IN ORDER TO BE ABLE TO SET UP AN ACCOUNT to request re-use permissions.
[… image omitted …]
At this stage, I broke off the procedure. I understand that it might be convenient for the “Copyright Clearance Centre” to set up an account for me in such a way, that if I ever wanted to purchase a journal article from one of their customers, they have all the necessary information. IT IS NOT CONVENIENT FOR ME. All I want is permission to re-use a figure in a paper. I do not think that I should have to hand over my credit card details for this and I refuse to do so.
So what is the consequence of this? I am not prepared to set up a Rightslink account with the Copyright Clearance Centre under these circumstances. Therefore I cannot obtain permission to reproduce the figure I wanted and therefore I cannot use the figure in my paper. Furthermore, there is the personal inconvenience: I now have to throw the figure out of the manuscript and to renumber all of my figures in the text. This will cost me at least half an hour.
More significantly though, this has a negative impact on scientific dissemination. On the grand scale of things, it is only a tiny thing, but in effect this has stopped me from re-using a figure created by other scientists, who, I am sure, have a vested interest in their research being talked about, evaluated and disseminated. That is part of a scientist’s core business. The Copyright Clearance Centre has neither helped to advance education and innovation, nor indeed the flow of information, but rather has impeded it. And Elsevier is indirectly guilty: they have not done their best for their authors by helping to disseminate their science, but are collaborating with an organisation which actually puts people off reusing science. They have allowed requests for re-use of material to be lumped into the same procedure used for the purchase of pay-per-view articles. At best that is thoughtless and very poor customer service.
Now as I say, I don’t like to rant, but this kind of thoughtlessness makes me mad.

PMR: I feel the same emotions. I would not have paid to include an image from someone else’s paper. It could even have been in another Elsevier article – I don’t know the details. I would also have omitted the image, as it’s too much effort. So all that this has achieved is:

  • to impoverish science. The article is worse than it should have been because a publisher made it not worth the author’s while. The publisher has made no money from the non-deal. It has also antagonised a scientist. So little wonder that the scientific community feels that many publishers are part of the problem.
  • to waste time. This sort of stuff takes ages. If, by contrast, the works are licensed under Creative Commons, all the author has to do is acknowledge the source of the image(s) which takes a minute or two. So all this negative stuff is also wasting time. More impoverishment.

PMR: Nico has been very balanced and posted an article from a helpful publisher, PNAS (see Copyright Permissions – How they can also be done!). Again worth reading in detail, including a clear account of why PNAS (fairly recently) requested copyright transfer. Partly because:

“Unfortunately, PNAS cannot provide permission for others to use all or part of articles published from 1915 to 1992 because we do not hold copyright. Only the original authors or their designees can grant permission. Researchers are frustrated when they contact us for permission to use seminal works and we are unable to grant their requests. “

PMR: Possibly true. But a paper in 1915 would require most authors to be about 112 years old or more. I have not been able to contact many spirits for permissions so it’s rather difficult. But more in the next post…

Posted in open issues | Leave a comment

CrystalEye GreaseMonkey

Nick Day has just released a Greasemonkey script which provides a full crystallographic overlay for existing journals. It’s worth trying as it’s visually exciting as well as very useful. This post tells you what it does, how it works, and why all publishers will actually benefit from making their crystallographic data Open.
The CrystalEye GreaseMonkey (Javascript) needs to be installed (from http://userscripts.org/scripts/show/11439) inside your Firefox browser. (I don’t believe this is a risk, but make your own decision). It is then activated whenever a new page is loaded from certain sites (e.g. pubs.acs.org/* for any ACS journals). You can switch it on or off from the box and also decide which sites you wish to visit.
[… image: crystaleye0.PNG …]
When it finds a DOI in the page (usually from a TOC) it asks the CrystalEye site whether this DOI is listed as one containing one or more crystal structures. (CrystalEye contains over 100,000 crystal structures, most from the last 5 years but some, via the Crystallography Open Database, going back several decades). CrystalEye returns the addresses of those structures corresponding to the given DOI. The Greasemonkey then adds the CrystalEye logo (I have removed the publisher’s graphic because of copyright).
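For the technically curious, the flow is roughly this – a minimal sketch in the Greasemonkey idiom, where the CrystalEye lookup URL and its newline-separated reply are my illustrative placeholders, not the script’s actual interface:

    // ==UserScript==
    // @name     CrystalEye overlay (sketch)
    // @include  http://pubs.acs.org/*
    // ==/UserScript==

    // Find DOIs in the table of contents.
    var dois = document.body.innerHTML.match(/10\.\d{4,}\/[^\s"<>]+/g) || [];

    dois.forEach(function (doi) {
      // Ask CrystalEye whether it holds structures for this DOI.
      // Endpoint and reply format are assumptions for illustration only.
      GM_xmlhttpRequest({
        method: "GET",
        url: "http://wwmm.ch.cam.ac.uk/crystaleye/lookup?doi=" + encodeURIComponent(doi),
        onload: function (response) {
          var structures = response.responseText.split("\n").filter(Boolean);
          if (structures.length > 0) {
            // ... insert the blue-eye logo next to the DOI, one link per structure
          }
        }
      });
    });

The point is that everything happens in the reader’s browser: the publisher’s page is untouched on the server, and the overlay simply points at Open data held elsewhere.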
[… image: crystaleye1.PNG …]
The blue eye (because this is a BlueObelisk-eyed monkey) indicates crystals and in this case there are three [1][2][3]. Clicking on the first immediately loads the Jmol applet and the metadata:
[… image: crystaleye2.PNG …]
The links are direct to the publisher’s site and if you have a licence (or if the article is Open Access) you’ll be able to read the fulltext. The material here is all automatically derived from the data (no images or text have been taken). You can even see what we calculate the chemical structure to be:
[… image: crystaleye3.PNG …]
Again all this is automatic. (Credits to Jmol and – right – the CDK structure diagram generator).
So here we have something very close to an overlay journal. No textual commentary, but we are working on that.
So thank you to Dave Martinsen of ACS for reviewing Greasemonkey. And we hope that it increases the clicks on your full-text – people will see the crystal structure and be so excited they will wish to read the full article.
It also works for RSC, IUCr and others like American Mineralogist. But not for Wiley, Springer and Elsevier. Not because we have anything against them, but because they don’t make their structures available. CrystalEye cannot find them, so it can’t point to them. And so, publishers, you are losing out to those publishers who DO expose their crystallography. And perhaps CrystalEye will persuade authors to publish their structures where they can be most seen.

Posted in "virtual communities", blueobelisk | Leave a comment

Gerry Toomey, Richard Jefferson and open science

I was very pleased to meet Richard Jefferson of CAMBIA at scifoo. I was reminded of our conversation by a quote in a recent item on Peter Suber’s blog (below), and thence tempted into reading the whole article which is very compelling on Open Science (at least to people like me who don’t need convincing). It contains a number of very useful case studies and interviews indicating that wholesale patenting (e.g. of biotechnology) is counterproductive (the tragedy of the anticommons). I would like to quote the whole article but will let Peter’s excerpt suffice.
16:07 15/08/2007, Peter Suber, Open Access News
Gerry Toomey, Sharing the fruits of science, University Affairs, August/September 2007. Excerpt:

…We…know that the social behaviour of modern science, and of the broader domain of innovation, is marked by a continual tug-of-war. At one end of the rope we find the forces of collaboration and sharing. At the other end are the instincts to compete and to protect one’s hard-earned intellectual property. While both kinds of behaviour lubricate scientific discovery and technological innovation, IP protection via patenting, with a view to future profits, has become a dominant trend in recent decades, particularly in the life sciences.
But now an international scientific counterculture is emerging. Often referred to as “open science,” this growing movement proposes that we err on the side of collaboration and sharing. That’s especially true when it comes to creating and using the basic scientific tools needed both for downstream innovation and for solving broader human problems.
Open science proposes changing the culture without destroying the creative tension between the two ends of the science-for-innovation rope. And it predicts that the payoff – to human knowledge and to the economies of knowledge-intensive countries like Canada – will be much greater than any loss, by leveraging knowledge to everyone’s benefit….
“The reason we talk about open source,” explains Richard Jefferson, a California-born biotechnologist now living in Australia, “is because it was the first movement to embed in the creative process, in this instance software engineering, the permission not just to inspect inventions but to use them to create economic value. Open source imposes covenants of behaviour rather than financial agreements. Unrestricted use and the right to make a profit don’t usually get in bed together. In open source, they’ve done so quite productively.”
Dr. Jefferson is founder of an international research institute in Canberra called CAMBIA. He and his centre are among the most outspoken and active proponents of open science….

Posted in data, open issues | Leave a comment

Dave Martinsen reviews ACS and Greasemonkey

Noel O’Boyle has highlighted a review by Dave Martinsen of the American Chemical Society. Dave has been very supportive of the new technologies and ideas that are emerging and has run sessions at the ACS meetings highlighting them. Here he reviews the last meeting (Chicago) and also adds a postscript about the Greasemonkey:

Yes, librarians are doing it too. To begin with, my Greasemonkey userscript for adding bloggers’ quotes to journal pages has just gotten an enthusiastic write-up by Mark Rabnett, a hospital librarian and blogger. He learned of this userscript by reading a recent paper in ACS Chemical Biology by D.P. Martinsen, “Scholarly Communication 2.0: Evolution or Design?”. This was news to me, so I checked it up. It turns out that it’s pretty much a review of the Spring ACS sessions on Web 2.0. He begins by giving a good description of what the term Web 2.0 means, and why scientists should know about it. Then he goes on to discuss the presentations by Nick Day, Henry Rzepa and Colin Batchelor among others (these are just the people I know or know of).
Then we come to the good bit. At the end of page 370 it says:

Two additional items, unrelated to the ACS meeting, are significant. Using Greasemonkey, a Firefox extension that allows anyone to write scripts that can change the way a web page looks, the Blue Obelisk group, a community of chemists who develop open source applications and databases in chemistry [ref to BO paper], has created several such scripts to enable chemistry-related features. One of these tools will insert links to blog stories about journal articles into the tables of contents of any ACS, RSC, Wiley, or NPG journal [ref to old BO wiki]. This enhancement to a journal’s table of contents is completely independent of the journal publisher.

That’s a pretty lucid summing up of the userscript and its significance. Somewhere I suspect PMR’s hand in this. 🙂
PMR: No hand in what Dave wrote, but we talked about the Greasemonkey and he sees the potential.
Just thinking about this, we are close to having an overlay journal. That’s a journal where the editors create tables of contents for material that already exists, and add some commentary. In fact I would call TotallySynthetic.com (and some other chemical blogs) overlay journals. The editor (TotSynth) selects articles of note (almost always because of merit, though occasionally because of controversy), adds a top-class and engaging commentary and invites comments which are usually very much to the point (i.e. little wibble). I’d call that an overlay journal.
CrystalEye (created by Nick Day – above) is also close to an overlay journal. Currently it’s a comprehensive collection of all up-to-date crystal structures (esp. ACS, RSC, IUCr but necessarily without publishers who do not make the crystallographic material available). Here’s an example where making data (sic) Open leads to increased exposure.
We’ve been combining these ideas for CrystalEye, but that deserves a whole post to itself.
Posted in data, open issues, programming for scientists | 1 Comment

scifoo: academic publishing and what can computer scientists do?

Jim Hendler has summarised several scifoo sessions related to publishing and peer-review and added thoughts for the future (there’s more to come). It’s long, but I didn’t feel anything could be selectively deleted so I’ve left only the last para, which has a slight change of subject – speculation about what computer scientists could do to help.

15:16 14/08/2007, Planet SciFoo
Here’s a pre-edited preprint of my editorial for the next issue of IEEE Intelligent Systems. I welcome your comments – Jim H.
=======================
[… very worthwhile summary snipped …]
I believe it is time for us as computer scientists to take a leading role in helping to create innovation in this area. Some ideas are very simple, for example providing overlay journals that link already existing Web publications together, thus increasing the visibility (and therefore impact) of research that cuts across fields. Others may require more work, such as exploring how we can easily embed semantic markup into authoring tools and return some value (for example, automatic reference suggestions) via the use of user-extensible ontologies. In part II of this editorial, next issue, I’ll discuss some ideas being explored with respect to new technologies for the future of academic communication that we as a field may be able to help bring into being, and some of the obstacles thereto. I look forward to hearing your thoughts on the subject.

PMR: I’d love to see some decent semantic authoring tools – and before that just some decent authoring tools. For example I hoped to have contributed code and markup examples to this blog and I simply can’t. Yes there are various plugins but I haven’t got them to work regularly. So the first step is syntactic wikis, blogs, etc. We have to be able to write code in our blogs as naturally as we create it in – say – Eclipse. To have it checked for syntax. To allow others to extract it. And the same goes for RDF, MathML. SVG is a disaster. I hailed it in 1998 as a killer app – 9 years later we are struggling to get it working in the average browser. These things can be done if we try hard enough, but we shouldn’t have to try.
It’s even more difficult to create and embed semantic chemistry (CML) and semantic GIS. But these are truly killer apps. The chemical blogosphere is doing its best with really awful baseline technology. Ideas such as embedding metadata in PNGs are better than nothing but almost certain to decay within a year or so. Hiding stuff in PDFs? Hardly semantic. We don’t even have a portable mechanism for transferring compound HTML documents reliably (*.mht and so on). So until we have solved some of this I think the semantic layer will continue to break. The message of Web 2.0 is that we love lashups and mashups, but it is not yet clear that this scales to formal semantic systems.
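To be concrete about the PNG idea: the metadata goes into a tEXt chunk, and getting it back out is a dozen lines of Javascript (a sketch – the chunk keyword I look up at the end is an invented example):

    // Walk a PNG (as a binary string) and collect its tEXt chunks.
    // PNG layout: 8-byte signature, then chunks of
    // [4-byte big-endian length][4-byte type][data][4-byte CRC].
    function readTextChunks(png) {
      var chunks = {};
      var pos = 8; // skip the signature
      while (pos + 12 <= png.length) {
        var len = (png.charCodeAt(pos) << 24) | (png.charCodeAt(pos + 1) << 16) |
                  (png.charCodeAt(pos + 2) << 8) | png.charCodeAt(pos + 3);
        var type = png.substring(pos + 4, pos + 8);
        if (type === "tEXt") {
          var data = png.substring(pos + 8, pos + 8 + len);
          var sep = data.indexOf("\0"); // keyword, NUL, then the value
          chunks[data.substring(0, sep)] = data.substring(sep + 1);
        }
        pos += 12 + len; // on to the next chunk
      }
      return chunks;
    }

    // e.g. readTextChunks(bytes)["chemical-structure"] might hold an InChI or CML

And the fragility is obvious: the first service that resizes or recompresses the image silently throws the chunk away.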
What’s the answer? I’m not sure since we are in the hands of the browser manufacturers at present and they have no commitment to semantics. They are focussed on centralised servers providing for individual visitors. It’s great that blogs and wikis can work with current browsers but they are in spite of the browsers rather than enabled by them. The trend is towards wikis and blogs mounted on other sites rather than our own desktop, rather than enabling the power of the individual on their own machine.
Having been part of the UK eScience program (== cyberinfrastructure) for 5 years I’ve seen the heavy concentration on “the Grid” and very little on the browser. My opinion is that the middleware systems developed are too heavy for innovation. Like good citizens we installed SOAP, WSDL etc and then found we couldn’t share any of it – the installation wasn’t portable. So now we are moving to a much lighter, more rapid environment based on minimalist approaches such as REST. RDF rather than SQL, XOM rather than DOM, and a mixture of whatever scripts and templating tools fit the problem. But with a basic philosophy that we need to build it with sustainability in mind.
The Grid suits communities already used to heavy engineering – physics, space, etc. But it doesn’t map onto the liberated Web 2.0. An important part of the Grid was controlling who could do what where. The modern web is liberated by assuming that we live our informatics lives in public. Perhaps the next rounds of funding should concentrate on increasing the emphasis on enabling individuals to share information.

Posted in cyberscience, programming for scientists, scifoo | Leave a comment

lemon8-XML and theses

Via Peter Suber. Although the full post is important for Open Access news, I concentrate on an XML tool I hadn’t heard of:
15:13 13/08/2007, Peter Suber, Open Access News
Dean Giustini, UBC’s John Willinsky – Stanford Takes Him (For Now), Open Medicine blog, August 12, 2007. Excerpt:

UBC’s Dr. John Willinsky is no stranger to open access advocates. His book The Access Principle is ‘required reading’ for all those who believe in the connection between access to information and the economic and social well-being of knowledge-based societies. Recently, John accepted an appointment at Stanford University….
[…]
As for what’s next for PKP, we will be releasing the next version of OJS, in a few months time, in association with our parallel release of Lemon8-XML, developed by MJ Suhonos, which will automate XML conversion from Word and ODT documents.

PMR: So I looked it up:

Lemon8-XML

Lemon8-XML is a web-based service designed to make it easier to convert academic papers from typical word-processor editing formats such as MS-Word .DOC and OpenOffice .ODT, to publishing layout formats such as XML. It provides the ability to edit document metadata such as the list of authors, as well as robust citation editing, checking and correction. Lemon8-XML is a project developed by the Public Knowledge Project, as a demonstration of technology that can help significantly decrease the cost and effort of scholarly publishing. Although it is a standalone service, Lemon8 works well with journals published using Open Journal Systems.
Much of the work involved in Lemon8 has been developed from years of journal publishing experience, and continues to take advantage of the newest web-based technology as it becomes available.
We will soon be creating a mailing list for interested developers and beta-testers, along with some documentation, an FAQ, and a PKP discussion forum for Lemon8-XML.
If you’d like to be kept up-to-date on Lemon8-XML developments, please let us know.

PMR: This is very exciting for our SPECTRa-T: Submission, Preservation and Exposure of Chemistry project, where we are capturing metadata from academic theses. Although the preferred method of presentation is PDF, these theses are originally born-digital as Word or LaTeX. But these versions are often hidden away and not reposited. The PDF looks so wonderful, doesn’t it? Surely no-one wants that ugly Word doc? But for use it’s 100 times better. And if Lemon8-XML can capture authors and other metadata that’s a really important advance.
Because the more structured the document is, the better we can analyze it. For example it’s not a good idea to look for chemical names in author lists. (Murray-“Rust” could be indexed as Fe3O4 and PMR as proton magnetic resonance). But normal Word documents just contain undifferentiated paragraphs, usually with no sections. Bold 12-point text is not obviously a chapter heading, an author, or a citation.
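A toy illustration of the point (the dictionary and the section test here are mine, not SPECTRa-T’s actual machinery):

    // Only mine sections we know to be narrative text – never author
    // lists or references. With a flat Word document there is no
    // section.role to test, so "Rust" in an author list becomes Fe3O4.
    var chemicalNames = { "rust": "Fe3O4", "pmr": "proton magnetic resonance" };

    function tagChemicals(section) {
      if (section.role !== "body") return [];
      return section.text.split(/\W+/)
        .map(function (word) { return chemicalNames[word.toLowerCase()]; })
        .filter(Boolean);
    }

The structured XML gives us that section.role for free; the flat PDF or Word file makes us guess it.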
I couldn’t find a download button. (I am assuming that it is Open Source, given that it comes from the home of Open Journals. No logical connection, of course, but…)
NOTE ADDED LATER:
There is a forum http://pkp.sfu.ca/support/forum/ for lemon8-xml and some slides from a meeting: Lemon8-PKP-Conference.pdf
The slides have a bit more information suggesting this is an early adopter tool at present. I have written asking for more info and will post when it appears. Since they have other Open Source software on their site it should be a good bet that lemon8-xml is Open.

Posted in theses | Leave a comment

touchgraph for this blog

Having mentioned touchgraph, Egon has already gone and got it running.

Touchgraphing my blog

Via SciFoo Planet (from Partial immortalization) I learned about TouchGraph Google (Peter brought it into Chemical blogspace). It’s cool, though not open source. Here’s the touch graph for my blog:

As you can see, plenty of blogspot bloggers around me, among which, in purple, Useful Chemistry. Funny thing is, each time I repeat the Google search, the output is different. Oh, and make sure to drag one of the halos around; that will keep you procrastinating for the whole afternoon 🙂

Actually it only takes half a minute or so – so I typed this blog’s URL into the applet. The graph is a measure of connectedness of the sites, possibly links, possibly content similarity, possibly co-mentions by a third party. Perhaps not surprisingly Google sees little linking to the chemical blogosphere but a lot to Open Access and Open Data.
[… image: touchgraphthumbb.PNG …]
(Just about readable). Co-existing with Peter Suber, SPARC, Talis, etc.

Posted in Uncategorized | Leave a comment

open data: centralised or decentralised?

Deepak Singh highlights one of the emerging approaches to global data, Freebase. Recall that at scifoo we also heard about Google’s offer to host scientific data:

Freebase at Scifoo

Published 15 hours, 44 minutes ago

One of the sessions at Scifoo that left me a little confused was the demo by Danny Hillis and colleagues on Freebase, something that I have discussed previously at bbgm. I love the concept of Freebase, the ability to create structures on top of data in a collaborative, somewhat ad hoc way.
Something that I wasn’t aware of was that the folks at Metaweb are using Freebase (the website) as a test case, and expect that the primary use will be for developers to build applications using the Freebase API. The killer application that was mentioned was people search. I wonder how people search using Freebase would get significantly better traction than something like Spock, although it’s easy to see how a proper implementation could easily leap ahead of any people search engine (and someone should develop one right now).
The somewhat disappointing aspect, at least as I understand things today, was that all data had to be local to Freebase. That would mean that if I wanted to use Freebase as an annotation engine for multiple distributed data sets (e.g. at NCBI or EBI), it would not be too practical. However, I wonder if there was a way of using Freebase as a store for annotations, etc, which link out to all these data sources, e.g. a store for protein interactions based on literature data stored elsewhere.
I believe that to be applicable in the biosciences, and perhaps elsewhere, Freebase needs to be untethered. While the website can remain a source of information, and people can use it as a backend data source, an open data model, query language and API which can be run anywhere and put on top of any data source would make things very possible. Does it make sense for the folks at Freebase to do that? I don’t know and haven’t had the opportunity to quite put my head around the problem, but if all data has to be local, it’s going to be hard to use the power in a practical way. The metaweb, as it were, should not be centralized. Perhaps Freebase is just one example, a test ground for what Metaweb Technologies will make available, and we just need to wait for that.
Can you make out that I am a little confused?

2 Responses to “Freebase at Scifoo”

  1. Jim Hendler Aug 12th, 2007 at 4:56 pm
    A lot of the Semantic Web vision is based on exactly what you are asking for – something like MetaWeb, but open and distributed – like the difference between a great ebook and the Web – each has its place, but the place for an open distributed store as a way of linking things seems to be important — check out the W3C’s Semantic Web Activity (http://www.w3.org/2001/sw)
  2. Deepak Singh Aug 12th, 2007 at 8:00 pm
    I am quite familiar with the W3C work (I’ve blogged about it before as well), and I completely agree that each has its place. What appeals to me about Freebase is the ability of people without expertise in Ontologies and XML to build structure on top of data.

I am attracted by Freebase/Metaweb and also DBPedia/openlink. These are technologies which build ontologically-supported repositories where large amounts of metadata can be centrally stored. I talked with some of the people involved at the WWW2007 meeting and some of them have the vision of vast central stores of metadata – loosely, tera-triplestores or larger. I think that technology now allows this.
However I also picked up on this centralist approach. There was also a view that the whole of the world’s information could be given unique IDs. This won’t work generally as there are many concepts which are important but too fuzzy to label. Copies, containers, addresses, versions etc. all cause major problems.
And I think Deepak is right for bioscience – it can’t be centralised and the semantic web has to be distributed.
But chemistry is smaller. I have already suggested that a year’s core information on newly published compounds could be squeezed into a few terabytes. Not everything, perhaps, but enough to make it worthwhile. And, in chemistry, most concepts can be given unique labels. So, as always, it’s discipline-dependent.
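To make that concrete, here is what a few such uniquely-labelled triples might look like, using an InChI as the label (a sketch only – the property URIs and the DOI are invented placeholders):

    // Methane, uniquely identified by its InChI; no central registry needed.
    var inchi = "InChI=1S/CH4/h1H4";
    var subject = "http://example.org/compound/" + encodeURIComponent(inchi);

    var triples = [
      [subject, "http://example.org/prop/formula",      "CH4"],
      [subject, "http://example.org/prop/meltingPoint", "90.7 K"],
      [subject, "http://example.org/prop/publishedIn",  "doi:..."]
    ];

Because the InChI is computed from the structure itself, two laboratories describing the same compound generate the same subject – which is exactly what a distributed triplestore needs.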
Did I mention that such a repository has to be completely Open Data?
Posted in data, open issues | 1 Comment

miniblogosphere

Here’s Pimm (attilachordash) with a nice picture of the linkages in the scifoo tag cloud.

SciFoo links visualized by TouchGraph Google Browser

Posted by attilachordash on August 11th, 2007

The Google Hacks book from O’Reilly was one of the free goodies at SciFoo last weekend. Hack #3 is Visualize Google Results with the TouchGraph Java applet that allows you to visually explore the connections between related websites. Of course I started with the term “scifoo” with the setting of filtering single nodes out of the network in order to see the separate groups of nodes behind.

[… image: scifootouchgraph …]

Explore the detailed properties of the SciFoo URL cloud by double clicking the individual nodes in the network.

PMR: (the click didn’t work for me in either Firefox or IE – maybe something has to be enabled). Perhaps someone would like to do this for the chemical blogosphere?

Posted in scifoo | Leave a comment