Talk in Biochemistry: Can Machines Understand the Scientific Literature?

Typed into Arcturus

I am giving a talk in the Biochemistry Department (at Cambridge) this morning (10:15) and here is a brief outline of some of the topics I might cover. As always I don’t know what I shall say in detail until I meet the audience. In this case I am hoping that I get useful feedback and discussion.

I think that within the next 5 years technology will allow scientists to have machines that manage their immediate information environment – reading the literature, recording many of their actions, remembering their preferences, helping them write papers, theses and grants, etc. These “amanuenses” will be built from standard technology and will have natural interfaces – speech, gestures, etc. They will be built on communally agreed semantics and ontologies – technologies where bioscience is leading the world. It should, for example, be possible to ask a machine (using speech):

“What papers published last week do I need to know about? Summarise the most important points.”

“Consult our lab thesis-bank and find the most commonly used buffers – has our use changed over 5 years?”

“Find me all the selective inhibitors of CYP2D6 and compute the energy of docking them into my mutant PMR123.”

This is possible in bioscience because the community has been committed for over 3 decades to publishing Open information in semantic form and building ontologies (e.g. GO, ChEBI). There are agreed universal identifiers for information. As a result it is possible to build Tim Berners-Lee’s vision of a semantic web of Linked Open Data which then links to all the other Open data on the web – geonames, bibliography, institutions, etc. This distributed knowledge base is at the core of scientific machine intelligence.

But this vision is threatened by major information and media interests. We are starting to see new restrictions on our use of scientific information, e.g. through Digital Rights Management in the library system and the power of the publishers to restrict and control innovation to their advantage. As a result it is critical that the bioscience community continues to press for Open Science (Data, Source, Access, Standards).

The talk will give demos of scientific speech recognition, machine understanding of theses and the semantic web.

 
