Feature Extraction and Feature Authoring

An interesting review from Deepak Singh, The value of feature extraction:

Let’s start with a quote from a talk on Ambient Findability:
  • For every search on cancer.gov, there are over 100 cancer-related searches on public search engines.
  • Of these searches, 70% are on specific types of cancer.

There is another statement of interest in the same talk:

… the ability to find anyone or anything from anywhere at anytime

The above statements bring to mind the subject of context. Let us agree that “data finds the data”. In that case we must also agree that data must be found in the correct context. Don’t believe me, just ask Jeff Jonas. In my mind, if machines are to do this, semantic markup of some sort is the only way. Extracting information from documents, regardless of format, whether they be text, images, video, is one of the key challenges of our times. In the life sciences, right now, I don’t really know of any ways (if someone knows of any, let me know) that someone can extract the meta-data from an image or a video, and correlate it to meta-data in a set of text files and automatically come to a conclusion about the potential context of the two observations. I talked about Persistent Context for the life sciences in the past. Let me steal another of Jeff’s ideas, that of Sequence Neutrality. Essentially, “context engines must constantly be on the lookout for new observations that change earlier assertions – and if a new observation provides such evidence – the invalidated assertions from the past must be remedied.” Context and feature extraction together make a very powerful mix, which can help pharma companies find better, safer drugs faster. This is especially critical in the kind of healthcare environment taking root today, with an emphasis on pharmacovigilance, early safety assessment, etc. If we can continuously update our safety databases based on new data, we are likely to identify adverse events faster, and essentially could carry out constant meta-analysis.
Jon Udell, in a post commenting on Tim O’Reilly’s review of Twine, talks about entity extraction and a Firefox plugin called Gnosis. I had heard about Gnosis before, but only looked at it askance. However, Jon’s post made me take a second look, and all I can say is WOW. Take a look at the screenshot below [PMR: omitted here]. It shows the features that Gnosis extracted from my blog post on pharma futurology. The interesting thing is not the actual results, but the concept. If you could do the Freebase thing, and add additional information which gets stored in a dictionary somewhere, you have that much power available to you.

PMR: And OSCAR does pretty much the same for chemistry. Maybe the way forward is a mashup of domain-specific engines in a single framework. I’d certainly like to see the context added. There is so much experimentation to be done – and like all experiments we have to expect failures as well as successes. But the cost of each is getting less.
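
To make the "mashup" idea a little more concrete, here is a minimal Python sketch of what a single framework hosting several domain-specific engines might look like. Everything in it is an assumption for illustration: the names Entity, dictionary_extractor and ExtractionFramework are hypothetical, the term lists are toy dictionaries, and simple dictionary lookup stands in for whatever real engines such as OSCAR or Gnosis do internally. It is not the API of either tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    """One extracted feature: the matched text, its type, and its offsets."""
    text: str
    kind: str
    start: int
    end: int


# A domain engine is modelled as any function from text to a list of entities.
Extractor = Callable[[str], List[Entity]]


def dictionary_extractor(kind: str, terms: List[str]) -> Extractor:
    """Build a toy engine that finds whole-word matches for a term list.

    Hypothetical stand-in for a real domain engine.
    """
    # Longest terms first, so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def extract(text: str) -> List[Entity]:
        return [
            Entity(m.group(0), kind, m.start(), m.end())
            for m in pattern.finditer(text)
        ]

    return extract


class ExtractionFramework:
    """Runs every registered domain engine over the same text."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register(self, name: str, extractor: Extractor) -> None:
        self._extractors[name] = extractor

    def annotate(self, text: str) -> List[Entity]:
        found: List[Entity] = []
        for extractor in self._extractors.values():
            found.extend(extractor(text))
        # Merge results from all engines into document order.
        return sorted(found, key=lambda e: (e.start, e.end))


framework = ExtractionFramework()
framework.register(
    "chemistry", dictionary_extractor("chemical", ["aspirin", "acetic acid"])
)
framework.register(
    "medicine", dictionary_extractor("disease", ["melanoma", "leukemia"])
)

sample = "Aspirin was dissolved in acetic acid; the melanoma cohort was unaffected."
entities = framework.annotate(sample)
for e in entities:
    print(f"{e.kind:10s} {e.text!r} [{e.start}:{e.end}]")
```

The one design choice here is that each engine is just a function from text to entities, so a chemistry engine and a medical one can be registered side by side without knowing about each other. Adding context, in the sense discussed above, would need more than this: the merged entities would have to be linked and revised as new observations arrive.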

But shouldn’t we be getting these sorts of tools to authors as well as readers? That’s one of our next steps.

