Wanted: “Joined-Up” News Search For Grown-Ups Via Linked Data

All the information in the world and beyond falls into one of three categories.

  • What we know
  • What we know we don’t know
  • What we don’t know we don’t know

    Sounds very much like the poetry of Donald Rumsfeld, but the idea of separated knowledge predates him all the way back to the time of Plato’s Dialogues. When it comes to the news – the information updates that inform our knowledge of the world and our specific interests – there are two other conditions to take into consideration: the expected and the unexpected.

    Expected news is the regular features and updates that we get from feature writers and from blogs that share our views. It is largely a comforting place to be: something new to consider on a regular basis, but in a familiar continuum of form and style. These updates aren’t really news so much as more information on a topic. Our knowledge is expanded in a staged manner, a bit like taking a walk along an unfamiliar path in a familiar piece of countryside.

    We all subscribe to blogs, tweets, Facebook pages, etc., because they give us more of what we want to know. Very often, the information is new but rarely is it a break from the past. We are mildly stimulated and feel comforted because we feel abreast of the times.

    This is the stuff we know about.

    The unexpected news item, by definition, comes as a huge surprise. In the traditional media, huge surprises tend to be huge, bad surprises (the Haiti earthquake being the most recent). They tend to arrive at irregular and unwelcome moments and usually have a moral, physical or spiritual effect on us: for those of us who care, anyway.

    We can never predict these big stories and the most that journalists around the world can do to prepare for them is to keep a ‘crash bag’ by the door full of travel essentials.

    This is the stuff we don’t know about.

    What these two categories have in common is the presence of an awful lot of information. Either a large database of information accrued over a period of time or large amounts of information generated in a very short time. What both content providers and content consumers really want is a well-told and relevant story. Turning information into stories makes things easier to comprehend, easier to remember and perhaps easier to learn from. Making information meaningful is what a journalist does.

    Relevance is the true promise of the Semantic Web. Profium and the Open Calais project from Thomson Reuters are just two of the increasing number of Semantic Web efforts that promise better and more relevant information. Profium applies Semantic Web technology to the massive ever-filling databases of news agencies, and links that information in a meaningful way to produce wire-copy that can then be sold on to their news publishing clients. This timely relevance means that conflicting reports can be avoided and accuracy enhanced.

    Open Calais (nice overview here from Lullabot) can be used to convert great lumps of text into something meaningful and relevant through predesigned taxonomies, classifications and preassigned tags. Slate magazine has News Dots, which is a very interesting application of Semantic Web technology to news reports.

    This powerful pre-filtering and tagging means that a journalist on the wrong end of a lot of text information (which is most of the time) now has a significantly useful tool for making sense of the incoming data. On the Lullabot page there is a link to an Open Calais viewer. Drop a lump of text in there, this article for example, and see what comes up. It becomes immediately obvious how useful this technology can be.
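    To get a feel for what tagging against a predesigned taxonomy means, here is a toy sketch in Python. It is not the Open Calais API and makes no claim about how that service works: the TAXONOMY table, its categories and the tag_text function are all made up for illustration, and the matching is a plain case-insensitive phrase lookup.

```python
import re

# Hypothetical hand-built taxonomy: category -> known terms (illustrative only).
TAXONOMY = {
    "Organization": ["Thomson Reuters", "New York Times", "Profium"],
    "Technology": ["Semantic Web", "linked data", "Open Calais"],
    "Place": ["Haiti"],
}


def tag_text(text):
    """Return {category: sorted list of taxonomy terms found in the text}."""
    tags = {}
    for category, terms in TAXONOMY.items():
        found = [
            term
            for term in terms
            # Whole-phrase, case-insensitive match.
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            tags[category] = sorted(found)
    return tags


sample = "Open Calais from Thomson Reuters applies Semantic Web ideas to news."
tags = tag_text(sample)
print(tags)
```

    A real service goes well beyond a lookup table, but the shape of the result is the point: a lump of text goes in, and named things grouped by category come out.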

    Through their Open blog, the New York Times, in their usual wonderfully innovative way, have given us the opportunity to build our own API to retrieve and assemble information from the NYT archives that we think might be relevant to us. The demos here use the info boxes from Wikipedia that were converted to linked data by DBpedia. But the awesomeness is there for all to see. (Tip: The code looks even easier if you squint your eyes.)

    This is joined-up search for grown-ups, where we can bring ideas together by means of linked data. Contrast this with the block-letter searches of Google, where many items are returned but few of them are relevant and none are related to each other in a meaningful way.

    The holdup for semantic technologies is in implementation. Things will really speed up once data is stored in ways that make searching and finding quicker and more accurate. The momentum is already there and will only increase exponentially over time.

    So that leaves us with the third part of the statement to resolve: what about the stuff we don’t know we don’t know? Well, by definition it’s unknowable, but linked data gives the possibility of throwing up a surprise or two.

    Perhaps separate pieces of data that we have thought of until now as having no relation to each other are in fact related by connections hidden away from our awareness. Through the magic of the Semantic Web we have the possibility of increased serendipity and the making of vital life-enhancing connections, in the same way that talking to a complete stranger can reveal a mutual acquaintance.
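    The mutual-acquaintance idea can be sketched in a few lines of Python. Linked data is usually described as subject, predicate, object statements; the triples below are invented for illustration, and build_graph and find_connection are hypothetical helper names rather than part of any real linked data library. A breadth-first search then looks for the shortest chain linking two things that are not directly related.

```python
from collections import defaultdict, deque

# Made-up (subject, predicate, object) triples, for illustration only.
TRIPLES = [
    ("Alice", "wrote", "Article A"),
    ("Article A", "mentions", "Haiti"),
    ("Bob", "photographed", "Haiti"),
    ("Bob", "works_for", "Agency X"),
]


def build_graph(triples):
    """Index triples in both directions so links can be followed either way."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph[obj].append((predicate + " (inverse)", subject))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search: shortest chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(TRIPLES)
path = find_connection(graph, "Alice", "Bob")
print(" -> ".join(path))
```

    Nothing says outright that Alice and Bob are connected, yet following the links through the article and the place turns up the relationship: the small, pleasant kind of surprise.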

    We may never be able to get rid of the big nasty surprises, but having lots more fun small ones will always be welcome.

5 thoughts on “Wanted: “Joined-Up” News Search For Grown-Ups Via Linked Data”

  1. hi thank you so much for your post. This is a great post about the news I would say. Because you have provided so many necessary information here. Which are helpful. And this will help me out in future.


  2. A vital correction re. this post. There isn’t any such thing as DBpedia in Germany. OpenLink doesn’t operate a data center in Germany. I do know that Freie and Leipzig universities have contributed, and do contribute, extraction code from Wikipedia that ends up in a live instance of our database (the DBpedia the world uses) running our software (Virtuoso Database) deployed from our data centers, etc. Please re-check:

     1. http://dbpedia.org/About
     2. http://wiki.dbpedia.org/Imprin…
     3. http://www.openlinksw.com/data… (What is DBpedia?)

     Kingsley


  3. Tom, the stuff we don’t know we don’t know is not unknowable, not at all. Anyone who keeps his or her eyes open and gets out in the world, physically or electronically or just by reading a good old newspaper, learns new stuff every day, and not only in categories he/she was already aware of.

     In the electronic information world, the way you find out about things you didn’t know existed is via information collection and data mining, technology that finds patterns. For applications to online social and news media, you might look into the “text analytics” category. (Evidently, I’ve now given you an example of something you didn’t know you didn’t know.) The ClearForest technology that’s the basis of Thomson-Reuters’ Open Calais fits this category.

     If you’d like to pursue this topic further, please check out a conference I’m organizing, Smart Content, http://smartcontentconference…. You’ll find Semantic Web proponents well represented, also folks using a variety of analytical technologies to enrich news, social media, search, etc.

     Seth, http://sethgrimes.com


  4. Hi Seth, the phrase in its absolute sense is axiomatic for any given slice of time. We will always live in a state of basically having no idea what’s out there. But the great gift that nearly all of us have is that of curiosity, which leads to exploration. Wonderfully, we get to invent or evolve the tools, such as speech, so that we can have conversations and exchange ideas, and so explore more and discover more.

     Once we learn something, the state of our knowledge changes, the boundaries move, and before long we are back at the edge again discovering cool new things.

     But we have to be prepared for surprises, good and bad. Since, as far as I can work out, most of Greek literature seems to be about warning humans not to get too big for their boots, we would be wise to be ready for the unexpected regardless of how smart or knowledgeable we think we are.

     Good luck with the http://smartcontentconference…. I’ll be looking for the Twitter hashtag.

