He talked about how natural language (NL) helps the Semantic Web (SW), specifically on both sides of the chicken-and-egg problem (the chicken AND the egg). On one side, annotations can be created from unstructured text, and ontologies can be generated, mapped and linked. On the other side, NL search can consume SW information and expose SW services in response to NL queries.
The goal of Powerset is to enable people to interact with information and services as naturally and effectively as possible, by combining NL and scalable search technology. Natural language search interprets the Web, indexes it, interprets queries, searches and matches.
Historically, search has matched query intents with document intents, and changes in the document model have driven the latest innovations. The first is proximity: there’s been a shift from treating documents as a “bag of keywords” to treating them as a “vector of keywords”. The second relates to anchor text: adding off-page text to search is next.
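To make the proximity point concrete, here is a minimal sketch of my own (not anything shown in the talk). It assumes that “vector of keywords” means keeping word order, and the function names and example sentences are made up for illustration. A bag of keywords can only say whether terms occur, while a positional view can also say how close together they are.

```python
from collections import defaultdict


def bag_of_keywords(text):
    """Document as an unordered set of lowercase tokens (word order is lost)."""
    return set(text.lower().split())


def positional_index(text):
    """Document as token -> list of positions (word order is kept)."""
    positions = defaultdict(list)
    for i, token in enumerate(text.lower().split()):
        positions[token].append(i)
    return positions


def bag_match(text, terms):
    """True if every term occurs somewhere in the document."""
    bag = bag_of_keywords(text)
    return all(term in bag for term in terms)


def min_distance(text, a, b):
    """Smallest gap (in tokens) between terms a and b, or None if one is missing."""
    index = positional_index(text)
    if a not in index or b not in index:
        return None
    return min(abs(i - j) for i in index[a] for j in index[b])


# Hypothetical example documents.
doc1 = "heath died from pneumonia"
doc2 = "pneumonia was rare but heath later died"

# Both documents look identical to a bag-of-keywords match...
both_match = bag_match(doc1, ["died", "pneumonia"]) and bag_match(doc2, ["died", "pneumonia"])

# ...but the positional view can tell that the terms sit closer together in doc1.
d1 = min_distance(doc1, "died", "pneumonia")
d2 = min_distance(doc2, "died", "pneumonia")
```

A ranker could then prefer `doc1` because its terms are closer together, which is something the bag model cannot express at all.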
Documents are loaded with linguistic structure that is mostly discarded and ignored (due to cost and complexity), but it has immense value. A document’s intent is actually encoded in this linguistic structure. Powerset’s semantic indexer extracts meaning from the linguistic structure, and Barney believes that they are just at the start of exciting times in this area.
Converging trends that are enabling this NL search are language technologies, lexical and ontological knowledge resources, Moore’s law, open-source software, and commodity computing.
Powerset integrates diverse resources, e.g. websites, newsfeeds, blogs, archives, metadata (“MetaSearch”), video, and podcasts. It can also do real-time queries to databases, where an NL query is converted into a database query. Barney maintains that results from databases drive further engagement.
He then gave some demos of Powerset. With the example “Sir Edward Heath died from pneumonia”, Barney showed how Powerset parses each sentence; extracts entities and semantic relationships; identifies and expands these to similar entities, relationships and abstractions; and indexes multiple facts for each sentence. He showed an interesting demonstration where multiple queries on the same topic to Powerset retrieve the same “facts”. The information on the various entities or relationships can come from multiple sources, e.g. information on Edward Heath or Deng Xiaoping comes from Freebase and details on pneumonia come from WordNet.
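Here is a toy sketch of how I picture the “index multiple facts for each sentence” step. It is not Powerset’s implementation: the parse is hand-written as a (subject, relation, object) triple, and the expansion tables are hypothetical stand-ins for resources such as Freebase and WordNet. The idea is that each sentence is stored under every combination of expanded terms, so differently worded queries land on the same fact.

```python
from itertools import product

# Hypothetical expansion tables; a real system would derive these from
# lexical and ontological resources rather than hard-coding them.
ENTITY_EXPANSIONS = {
    "edward heath": ["edward heath", "person", "politician"],
    "pneumonia": ["pneumonia", "disease", "illness"],
}
RELATION_EXPANSIONS = {
    "die_from": ["die_from", "die_of", "killed_by"],
}


def expand(term, table):
    """Return the term plus its expansions, or just the term if none are known."""
    return table.get(term, [term])


def index_sentence(index, sentence_id, triple):
    """Store the sentence under every expanded (subject, relation, object) fact."""
    subj, rel, obj = triple
    for fact in product(
        expand(subj, ENTITY_EXPANSIONS),
        expand(rel, RELATION_EXPANSIONS),
        expand(obj, ENTITY_EXPANSIONS),
    ):
        index.setdefault(fact, set()).add(sentence_id)


def lookup(index, subj, rel, obj):
    """Return the ids of sentences indexed under exactly this fact."""
    return index.get((subj, rel, obj), set())


index = {}
# Hand-written parse of "Sir Edward Heath died from pneumonia".
index_sentence(index, "s1", ("edward heath", "die_from", "pneumonia"))

# Two differently worded queries retrieve the same sentence.
specific = lookup(index, "edward heath", "die_from", "pneumonia")
abstract = lookup(index, "politician", "die_of", "disease")
```

With three expansions per slot, the single sentence is indexed under 27 facts. That trades index size for the ability to answer more abstract queries with a plain lookup.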
He gave an example of the search query “Who said something about WMDs?”. This is difficult to express using keyword search, which cannot easily capture both that someone “said something” and that it is about weapons of mass destruction. Barney also showed a parse for the famous wrestler / actor Hulk Hogan, with all the relations or “connections” to him (e.g., defeat) and the subjects or “things” that he is related to (e.g., André the Giant).
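One way to see why such a query suits structured matching better than keywords is to treat it as a pattern with wildcards. The sketch below is my own illustration, not Powerset’s query language: the facts are placeholder triples of (speaker, relation, topic), the speaker names are invented, and `None` stands for “anyone” or “anything”.

```python
# Placeholder facts of the form (speaker, relation, topic); purely illustrative.
FACTS = [
    ("speaker_a", "say", "wmd"),
    ("speaker_b", "say", "economy"),
    ("speaker_c", "deny", "wmd"),
]


def matches(pattern, fact):
    """A fact matches if every non-wildcard slot of the pattern is equal."""
    return all(p is None or p == f for p, f in zip(pattern, fact))


def search(pattern, facts=FACTS):
    """Return all facts that match the pattern, in their original order."""
    return [fact for fact in facts if matches(pattern, fact)]


# "Who said something about WMDs?" -> unknown speaker, relation "say", topic "wmd".
who_said = search((None, "say", "wmd"))

# The same machinery answers "What did speaker_b say something about?".
what_about = search(("speaker_b", "say", None))
```

A keyword query for “said” and “WMDs” would match any page containing both words, whereas the pattern only matches facts where the saying is actually about WMDs.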
Powerset’s language technologies are the result of commercialising the XLE work from PARC, leveraging their “multidimensional, multilingual architecture produced from long-term research”. Some of their main challenges are in the areas of scalability, systems integration, incorporating various data and knowledge resources, and enriching the user experience.
He next talked about accelerating the SW ecosystem. Barney said that the wisdom of crowds can help to accelerate the Semantic Web. What starts as a broad platform gets deeper faster when it gets deployed at a large scale, realising a Semantic Web faster than expected. This drive comes from four types of people:
- The first category is publishers, who upload their ontologies to get more traffic, and can get feedback to help with improving their content.
- Users are the next group, as they will “play games” to create and improve resources, will provide feedback to get better search, and will create (lightweight, simple) ontologies for personalisation and organising their own groups.
- There are also developers, who can package knowledge for specialised applications (e.g., for vertical search).
- Finally, advertisers will want to create and upload ontologies to express all the things that should match their commercial offerings.
For the community, Powerset will provide various APIs and will give access to their technologies to build mashups and other applications. Powerset’s other community contributions are in the form of datasets, annotations, and open-source software.
Their commercial model is based on advertising (like most search engines) and on licensing their technologies to other companies or search engines. Another related company (a friend of Barney’s) is [true Knowledge]™.
I’m still waiting for my Powerset Labs account to be approved; looking forward to getting in there and trying it out myself. Thanks to Barney for the great talk.