I keep hearing the term Linked Data, but what does it mean?
More or less what it says. All the data on the Internet linked together.
And that is important to me because…?
Your company, like everyone else’s, has a number of separate processes going: accounts, marketing, HR, government compliance, legal issues, transport considerations. The information that pertains to each process is stored for convenience in separate databases. These databases are tied to the applications used to create them: SAP for accounts, various spreadsheet formats, and document pages.
All contain information vital to the running of your company. Yet each holds information that the others cannot access.
It has worked so far because we have had the human workaround. If a report has to be written for the quarterly board meeting, then someone has to get the information from each of these ‘data silos’ separately and spend a goodly amount of time and energy on finessing the disparate contents into something understandable and useful to act as a basis for a fruitful discussion.
With Linked Data technology running on your system, you ask your system for the information you want in the format you want. The computer itself works out what is relevant and useful. Linked Data enables the various databases to talk to each other and work out what is needed.
Can you imagine, if all the databases across the world were linked, the sort of really interesting questions it would be possible to ask with the expectation of a realistic and relevant answer?
I thought it was already linked together?
The data, usually words, numbers or symbols, is contained on a page, usually the webpage that we see before us in our browser. But it is these pages that are linked together. The data itself is all on its lonesome.
While we can get to the pages through searches, links or entering addresses in the location bar, the same can’t be said for the data: the words, numbers and symbols that represent objects.
In addition, most databases where information is stored are not linked. They are in effect separated ‘silos’ of information. Until recently, with or without the permission of the owners, it was not possible for these data silos to ‘talk’ to each other.
The main idea of Linked Data is to develop the processes to make communication possible, by means of commonly-agreed protocols, between various datasets wherever they may be.
Isn’t what we have enough? It all seems to work well.
Well, it is a fantastic thing to be able to navigate the world of information going from page to page, but imagine how much more could be done if we could navigate the world of information using the data, the numbers and words on a given page. Imagine if we could instruct computers to combine these particles of information in such a way as to produce useful and usable results. At present we find stuff and have it presented to us, but we are still left to do all the heavy lifting and join the dots ourselves.
But if I type a set of words into Google, I find a whole bunch of pages containing those words. So what?
Well, what’s so is that you have got yourself to a page with the words you were looking for. But at present, if you were, say, planning a trip, you would have to visit separate sites for each stage of the journey: airlines, ground transport, hotels, meeting points, etc. If the data were connected together, you could put in a set of search terms (date, place, journey time, cost) and see a number of options for the entire journey presented to you. That is a big timesaver.
How does that work?
Although Linked Data, or the original term, the Semantic Web, denotes one underlying principle of connectedness, Linked Data is in fact made up of a potpourri of technologies, each dealing with an aspect of bringing the Web of Data to life through individual but interrelated means.
For instance, we know most of the useful pages on the Web are not filled with capricious and random words and images, but are generally about something and are structured in a fairly standard way. Timetables look like timetables, blogs look like blogs, landing pages look like landing pages and so on. If they were to contain information that could help a computer distinguish one from another then that would save us all a lot of time and trouble when looking for something. Searching for events would return results from pages structured to represent events and so on.
A list of some of the major technologies being implemented appears below, with a brief outline of the purpose of each.
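As a rough illustration of that idea, here is a minimal sketch in Python. It assumes each page declares what kind of thing it is; the page records, the type labels and the `search` helper are all made up for this example and do not describe any real search engine.

```python
# Hypothetical pages, each carrying a machine-readable "type" label.
pages = [
    {"url": "http://example.org/jazz-night", "type": "Event", "text": "Jazz night, Friday"},
    {"url": "http://example.org/blog/jazz", "type": "BlogPosting", "text": "Why I love jazz"},
    {"url": "http://example.org/timetable", "type": "Timetable", "text": "Bus times"},
]

def search(pages, keyword, page_type=None):
    """Return URLs of pages containing `keyword`, optionally limited to one type."""
    hits = [p for p in pages if keyword.lower() in p["text"].lower()]
    if page_type is not None:
        hits = [p for p in hits if p["type"] == page_type]
    return [p["url"] for p in hits]

keyword_only = search(pages, "jazz")           # matches on words alone
events_only = search(pages, "jazz", "Event")   # matches on words and declared type
print(keyword_only)
print(events_only)
```

The keyword-only search returns both the event and the blog post. Once the type is available, the same search can be narrowed to events without a human reading each page.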
If this technology is so great then why isn’t it everywhere?
First of all, there is no killer application in the sense of a spreadsheet or a text document. Nor is there a single killer service like Facebook or RSS. The technology is constantly being rolled out behind the scenes.
This is primarily computer-to-computer technology. As it is becoming more widely available, new applications are and will be created to take advantage of the new ways of handling the huge amounts of data that will be liberated and it is those applications that we will see and use.
Secondly, reinventing the way the world handles data is no small thing, especially when there is so much of it. It has been estimated that the information contained in all the world’s databases has passed the three petabyte mark. A petabyte is a one followed by fifteen zeroes (10^15) bytes. That is a lot of data to make useful and relevant.
Fortunately, an increasing amount of the new information being created is now stored in databases in such a way as to make it meaningful and useful for Linked Data technologies.
Thirdly, research and development in Linked Data processes is continually coming up with new and better ways of doing things. The technology is constantly being updated and improved so data can be stored and retrieved more efficiently.
So, it may not be everywhere at the moment but it will be one day. That will most likely be sooner rather than later.
What do you mean by meaningful and relevant and why is that important?
There are two main senses in the way we use the word meaning in everyday life. One is that if we say a thing has meaning then we are giving the thing significance, as opposed to something we don’t find significant and consequently has no meaning for us. You can see right away that this is a very individual way of looking at things. What is meaningful to one person is not necessarily meaningful to another. Two, we can also say something has meaning because we gain understanding from it. “Don’t touch the stove,” is a meaningful statement because we derive from the meaning of the statement the understanding that we may burn our fingers if we should go ahead and do so.
The use of the word meaning in the Linked Data space is very specific and refers to defining information, data, in such a way that it is understandable by computers. Up until now, words on a web page were understandable by us and findable by search engines, but they were meaningless to computers and thus rendered irrelevant.
By defining the words in such a way that they are understandable by machines, given meaning, the words become data that is relevant and useful.
Why is that important?
Because up until the advent of Linked Data technology, this page you are reading now on your screen needed you to make it meaningful and relevant. The words, numbers or symbols meant nothing at all to the massive processing power in the computer that is sending the information to your screen nor to all the servers around the world.
But now they do. Computers can now understand data instead of simply recognising it. They understand that the data has meaning and because it has meaning they can link meaning with meaning in a meaningful way.
Say again?
Since data now has meaning, computers can understand the meaning of the information that is contained in their own databases. Moreover, and this is the neat bit, they can now understand what is contained in every other database in the world. This is the true significance of liberated data.
So I can compile the world’s information in any way that suits my requirements. Great, so what are my takeaways? What do I say to my colleagues and customers and any others that may be interested?
Linked Data is going to change the Web, and is going to be as much of a seismic shift in how we use the Internet as the original introduction of HTML.
It will happen over a longer period of time as there is so much more information to be processed.
Linked Data makes it very easy for computers to communicate with each other. Words, numbers and symbols will mean something to a computer.
Linked Data creates the possibility of a Web of Data. A Web where plans can be found in the pattern of links and then retrieved. A Web where projects can be mined and pulled from the mass of information fully formed. A Web with the possibility to find crucial patterns and connections that hitherto have lain undiscovered, which in turn can lead to new breakthroughs in how we ourselves see and interact with the world:
- To be able to work together seamlessly with greater efficiency and mutual understanding.
- To be able to collaborate and share information more effectively regardless of the scale of any project.
- To be able to see the world better because we can join the dots better.
Background Material
Probably the best place to start is with this presentation by Sir Tim Berners-Lee on Linked Data given at a conference. It is short and to the point but gives a great overview of the field.
Technologies
We have listed some of the major contributing technologies and sources of development for Linked Data here.
There is no need to learn them all as most of this has been designed so computers are better able to understand each other and avoid the need for human intervention. But looking through them should give you an idea of the scope and scale of the Linked Data project.
RDF http://www.w3.org/TR/rdf-primer/ – “The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.” A lingua franca for inter- and intra-computer communication. It describes what kind of information a piece of data is in such a way that another computer can understand it. Instead of data being a series of 1s and 0s, it now has the added idea of being a graph of interconnected objects with some meaning attached to these objects and the relationships between them.
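To make the “graph of interconnected objects” idea a little more concrete, here is a minimal sketch in plain Python. RDF statements are triples of subject, predicate and object. The sketch stores them as tuples rather than using a real RDF library, and every example.org identifier in it is hypothetical.

```python
# A graph of RDF-style statements, each a (subject, predicate, object) triple.
# All example.org identifiers are hypothetical and used only for illustration.
EX = "http://example.org/"

triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "acme", EX + "locatedIn", EX + "london"),
}

def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def describe(graph, subject):
    """Return all (predicate, object) pairs known about `subject`, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)

# Follow two links in the graph: where is Alice's employer located?
employers = objects(triples, EX + "alice", EX + "worksFor")
locations = {loc for e in employers for loc in objects(triples, e, EX + "locatedIn")}
print(locations)
```

The point of the sketch is that the answer is reached by following relationships between things, not by reading a page.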
URI http://en.wikipedia.org/wiki/Uniform_Resource_Identifier – “Uniform Resource Identifier.” A label for data, somewhat like the way your page is labelled by the URL (Uniform Resource Locator) that you see at the top of the browser. But different in that it also indicates within itself how the data should be acted upon. Things that are described in two or more different places can be linked together using common identifiers.
OWL http://en.wikipedia.org/wiki/Web_Ontology_Language – “Web Ontology Language.” (OWL sounds better than WOL.) This project works out how to define words for computers to understand. The same word in English can have different meanings, and humans are really good at working out what the correct interpretation should be. Interestingly, it is not that a computer can’t find a correct meaning for a given piece of data; the challenge is to avoid wasting time and energy doing the whole thing over and over again. http://en.wikipedia.org/wiki/Knowledge_representation
SPARQL http://www.w3.org/TR/rdf-sparql-query – “SPARQL Protocol and RDF Query Language.” This is the tool that is used to find and manipulate the RDF data. Something has to do it and it might as well sound glamorous.
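The last point, linking through common identifiers, can be sketched as follows. This is only an illustration in plain Python. The two datasets, the vocabulary URIs and the `merge` and `facts_about` helpers are invented for the example.

```python
# Two independent datasets; the URIs below are hypothetical examples.
ACME = "http://example.org/company/acme"
GLOBEX = "http://example.org/company/globex"
REVENUE = "http://example.org/vocab/revenue"
HEADCOUNT = "http://example.org/vocab/employeeCount"

accounts_data = {(ACME, REVENUE, "1200000")}
hr_data = {(ACME, HEADCOUNT, "45"), (GLOBEX, HEADCOUNT, "300")}

def merge(*datasets):
    """Combine triple sets; statements sharing a URI line up automatically."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged

def facts_about(graph, uri):
    """Collect predicate -> object for one identifier across the merged graph."""
    return {p: o for s, p, o in graph if s == uri}

combined = merge(accounts_data, hr_data)
acme_facts = facts_about(combined, ACME)
print(acme_facts)
```

Because both sources use the same identifier for the same company, no matching on names or spellings is needed to bring the two sets of facts together.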
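To give a feel for what such a query does, here is a toy pattern matcher in plain Python. It is not SPARQL and not how a real query engine is built. It only shows the basic idea of matching triple patterns that contain variables. The data and the `ex:` names are hypothetical.

```python
# Toy triple-pattern matcher; terms starting with "?" are variables.
# This is an illustration only, not a SPARQL implementation.
triples = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:globex"),
    ("ex:acme", "ex:locatedIn", "ex:london"),
]

def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` equals `triple`."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return result

def query(patterns, graph):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Roughly: SELECT ?person WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ex:london }
results = query(
    [("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "ex:london")],
    triples,
)
people = sorted(r["?person"] for r in results)
print(people)
```

The question asked is “who works for an organisation located in London?”, and the shared `?org` variable is what joins the two patterns.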
SIOC http://www.w3.org/Submission/sioc-applications/ – “SIOC is designed to export information about the content and structure of online community websites in a machine-readable form.” Social media is about people reaching out to each other on the Web. This inevitably results in the creation of communities. This tool enables the convergence between organised communities and the Web.
DBpedia http://wiki.dbpedia.org/FacetedSearch?v=17jm Uses the little boxes of data that are featured on almost every Wikipedia page and makes them more ‘askable’. That is, you can form a search question that would sound natural to human ears. Try out the samples on the page and ask a few questions of your own. It’s really quite cool and raises the question, why can’t all searches be like this?
insightful; useful; a good new direction for the micro serfs;
For Linked Data to become commonplace/grow there needs to be some strong incentive for it to be “rolled out behind the scenes”. What will that incentive be?
What was your incentive to create a webpage in 1992?
1) To share data in webpages
2) Because your neighbor is doing it?
What is your incentive to create Linked Data in 2010?
1) To share data as data
2) Because your neighbor is doing it? (New York Times, BBC, US gov, UK Gov, etc…)
Tom – nice explanation of the topic. One comment – I’m not sure it’s really helpful to say that computers can ‘understand the meaning’ of the data because it suggests to some people that something magic is going on, when it isn’t.
However, the reality of Linked Data is still very powerful. It allows you to describe very precisely not only the data, but also the data *model* (how different parts of the data are related to each other). And that means that two people, or two computer systems, can exchange information more reliably and with less effort by the programmers.
Point taken, Bill. Will amend. Thanks.
The first big incentive is big players adopting the Linked Data technologies either in whole or in part. We are already seeing this happen with Facebook and Open Graph. More developers will be able to take advantage of the Linked Data technologies to do new things or old things in new ways. The incentive will simply be to keep pace with change. The question is, perhaps, why aren’t things happening more quickly? One possible answer is that there are more than three petabytes of data http://j.mp/ajuhd6 out there with more being added every day. Fortunately, more and more of that new data (as well as previous data being updated) is Linked Data friendly. Therefore, all things being equal, adoption and change will happen more quickly as time passes.
works for the cia. why not for YOUR company.
It feels like we are on a data inception point – spending more time with data than with solving problems. The other day somebody asked me how to deal with a certain situation and where to get the relevant data to get a better picture. My answer: “Why don’t you just call the person?” – Hmm yeah that is easy….
Pareto, the one with the 80/20 rule, said that increasing the quality of a product above the 80% level increases cost exponentially, and 100% quality is similar to trying to get to light speed. We are now moving towards the 81% level of data analysis. We slice and dice data for the sake of analysis, not to solve a problem.
The real problem I see is that we look at data analysis and data leverage with the eyes of an engineer from the 90’s. I miss discussions like smart pattern recognition algorithms, context relevant relationship analysis, time related data aging processes…. Then and only then a humanoid would be able to leverage it – computer to computer communication is so 1985 😉
An ontology web language (OWL) is mentioned, but not the need for a computer ontology: a systemization of meaningful terms and information about them. A big problem with linked data is that different users are likely to use different terms for the same meaning or the same term for different meanings. For a computer to combine information from two different sources, the information needs to be described in the same way.
If one system uses “Bank”, another “Financial_Institution”, and a third “FinancialInstitution”, a system without an ontology which specifies the relationship among these cannot combine the data.
A more interesting thing about DBpedia is that it provides unique labels for many classes of object and individual things. Multiple systems that use the DBpedia labels can have their data combined by third parties. Some other standards that many systems use have standard labels for relationships between two things, e.g. FOAF has relationships for a person’s name, homepage, people the person knows, and a lot more. Geonames has labels for millions of geographical places and for various geographical relationships. UMBEL has 20,000+ labels for classes derived from Cyc, providing a third name for concepts already in both Cyc and DBpedia, and possibly in other systems. Multiple business systems for electronic data interchange each have had their own sets of codes for concepts they deal with, overlapping labels from many of the above mentioned systems.
The use of multiple sets of labels with overlapping meaning raises the Tower of Babel problem which the use of standardized ontologies was meant to solve. Mappings can be created among the systems, but this slows down processing among systems which use different labeling systems.
The use of a few interlinked standard ontologies is necessary for the promise of linked data to show its full potential. Any information promoting linked data should prominently emphasize this.
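The label mismatch described in this comment can be sketched in a few lines of Python. This is a minimal illustration and makes no claim about how any real ontology tool works. The `CANONICAL` mapping, the three record lists and the organisation names are all made up, with the mapping standing in for an agreed ontology.

```python
# Hypothetical mapping from each system's own label to one shared term.
# In practice this role would be played by an agreed ontology.
CANONICAL = {
    "Bank": "FinancialInstitution",
    "Financial_Institution": "FinancialInstitution",
    "FinancialInstitution": "FinancialInstitution",
}

# Made-up records from three systems: (organisation, class label).
system_a = [("org-1", "Bank")]
system_b = [("org-2", "Financial_Institution")]
system_c = [("org-3", "FinancialInstitution")]

def normalise(records, mapping):
    """Rewrite each label to its shared term; unknown labels pass through."""
    return [(name, mapping.get(label, label)) for name, label in records]

def members(records, label):
    """Names of all records carrying the given class label."""
    return sorted(name for name, cls in records if cls == label)

raw = system_a + system_b + system_c
combined = normalise(raw, CANONICAL)

print(members(raw, "FinancialInstitution"))       # without the mapping
print(members(combined, "FinancialInstitution"))  # with the mapping applied
```

Without the mapping, a query for financial institutions finds only the one system that happened to use that exact label. With it, all three sources can be combined.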