I keep hearing the term Linked Data, but what does it mean?
More or less what it says. All the data on the Internet linked together.
And that is important to me because…?
Your company, like everyone else’s, has a number of separate processes going: accounts, marketing, HR, government compliance, legal issues, transport considerations. The information that pertains to each process is stored for convenience in separate databases. These databases are tied to the applications used to create them: SAP for accounts, various spreadsheet formats, and document pages.
All contain information vital to the running of your company. Yet the information in each one is inaccessible to the others.
It has worked so far because we have had the human workaround. If a report has to be written for the quarterly board meeting, then someone has to get the information from each of these ‘data silos’ separately and spend a goodly amount of time and energy on finessing the disparate contents into something understandable and useful to act as a basis for a fruitful discussion.
With Linked Data technology running on your system, you ask your system for the information you want in the format you want. The computer itself works out what is relevant and useful. Linked Data enables the various databases to talk to each other and work out what is needed.
Can you imagine, if all the databases across the world were linked, the sort of really interesting questions it would be possible to ask with the expectation of a realistic and relevant answer?
I thought it was already linked together?
The data, usually words, numbers or symbols, is contained on a page, usually the webpage that we see before us in our browser. But it is these pages that are linked together. The data itself is all on its lonesome.
While we can get to the pages through searches, links or entering addresses in the location bar, the same can’t be said for the data: the words, numbers and symbols that represent objects.
In addition, most databases where information is stored are not linked. They are in effect separated ‘silos’ of information. Until recently, with or without the permission of the owners, it was not possible for these data silos to ‘talk’ to each other.
The main idea of Linked Data is to develop the processes to make communication possible, by means of commonly-agreed protocols, between various datasets wherever they may be.
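As a rough illustration of what a commonly-agreed way of identifying things buys you, here is a minimal Python sketch. It is not how any particular Linked Data product works: the two "silos", the example.org identifiers and the field names are all invented for the example. The only point is that once both datasets name the same thing with the same identifier, a program can combine them without a human in the middle.

```python
# Two hypothetical "silos" that describe the same people.
# Every identifier and value below is invented for illustration.
hr_records = {
    "http://example.org/staff/42": {"name": "A. Jones", "department": "Marketing"},
    "http://example.org/staff/43": {"name": "B. Singh", "department": "Legal"},
}
accounts_records = {
    "http://example.org/staff/42": {"expenses_q1": 1250},
    "http://example.org/staff/43": {"expenses_q1": 310},
}


def merge_by_identifier(*datasets):
    """Combine records that share the same identifier into one view."""
    merged = {}
    for dataset in datasets:
        for identifier, fields in dataset.items():
            # Records with the same identifier are folded into one entry.
            merged.setdefault(identifier, {}).update(fields)
    return merged


combined = merge_by_identifier(hr_records, accounts_records)
for identifier, fields in sorted(combined.items()):
    print(identifier, fields)
```

Without the shared identifiers, someone would have to work out by hand that "A. Jones" in one system is the same person as employee 42 in the other, which is exactly the human workaround described earlier.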
Isn’t what we have enough? It all seems to work well.
Well, it is a fantastic thing to be able to navigate the world of information going from page to page, but imagine how much more could be done if we could navigate the world of information using the data, the numbers and words on a given page. Imagine if we could instruct computers to combine these particles of information in such a way as to produce useful and usable results, rather than the process we now have of finding stuff and having it presented to us while still being left to do all the heavy lifting and join the dots ourselves.
But if I type a set of words into Google, I find a whole bunch of pages containing those words. So what?
Well, you have got yourself to a page with the words you were looking for, and that is all. At present, if you were, say, planning a trip, you would have to visit separate sites for each stage of the journey: airlines, ground transport, hotels, meeting points, etc. If the data were connected together, you could put in a set of search terms (date, place, journey time, cost) and see a number of options for the entire journey presented to you. That is a big timesaver.
How does that work?
Although Linked Data, or the original term, the Semantic Web, denotes one underlying principle of connectedness, Linked Data is in fact made up of a potpourri of technologies, each dealing with an aspect of bringing the Web of Data to life through individual but interrelated means.
For instance, we know most of the useful pages on the Web are not filled with capricious and random words and images, but are generally about something and are structured in a fairly standard way. Timetables look like timetables, blogs look like blogs, landing pages look like landing pages and so on. If they were to contain information that could help a computer distinguish one from another then that would save us all a lot of time and trouble when looking for something. Searching for events would return results from pages structured to represent events and so on.
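Here is a small Python sketch of that idea, offered as an illustration rather than a description of any real search engine. It assumes each page carries a short machine-readable description with a declared type, written in the style of JSON-LD's "@type" key. The three page descriptions and the helper name find_by_type are made up for the example.

```python
import json

# Hypothetical machine-readable descriptions that three pages might embed.
# The contents are invented; only the idea of a declared type matters here.
pages = [
    '{"@type": "Event", "name": "Quarterly board meeting", "startDate": "2024-03-15"}',
    '{"@type": "BlogPosting", "headline": "Why data silos hurt"}',
    '{"@type": "Event", "name": "Linked Data workshop", "startDate": "2024-04-02"}',
]


def find_by_type(raw_pages, wanted_type):
    """Return the parsed descriptions whose declared type matches."""
    found = []
    for raw in raw_pages:
        description = json.loads(raw)
        if description.get("@type") == wanted_type:
            found.append(description)
    return found


events = find_by_type(pages, "Event")
for event in events:
    print(event["name"], event["startDate"])
```

The search never looks at the wording of the page. It relies only on the page saying what kind of thing it describes, which is why the blog post is skipped even though it could easily mention events in its text.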
Below is a list of some of the major technologies being implemented, with a brief outline of the purpose of each.
If this technology is so great then why isn’t it everywhere?
First of all, there is no killer application in the sense of a spreadsheet or a text document. Nor is there a single killer service like Facebook or RSS. The technology is constantly being rolled out behind the scenes.
This is primarily computer-to-computer technology. As it is becoming more widely available, new applications are and will be created to take advantage of the new ways of handling the huge amounts of data that will be liberated and it is those applications that we will see and use.
Secondly, reinventing the way the world handles data is no small thing, especially when there is so much of it. It has been estimated that the information contained in all the world’s databases has passed the three zettabyte mark. A zettabyte, counted in bytes, is a one with twenty-one zeroes after it. That is a lot of data to make useful and relevant.
Fortunately, an increasing amount of the new information that is being created now is being stored in databases in such a way as to make it meaningful and useful for Linked Data technologies.
Thirdly, research and development in Linked Data processes is continually coming up with new and better ways of doing things. The technology is constantly being updated and improved so data can be stored and retrieved more efficiently.
So, it may not be everywhere at the moment but it will be one day. That will most likely be sooner rather than later.
What do you mean by meaningful and relevant and why is that important?
There are two main senses in the way we use the word meaning in everyday life. One is that if we say a thing has meaning then we are giving the thing significance, as opposed to something we don’t find significant and consequently has no meaning for us. You can see right away that this is a very individual way of looking at things. What is meaningful to one person is not necessarily meaningful to another. Two, we can also say something has meaning because we gain understanding from it. “Don’t touch the stove,” is a meaningful statement because we derive from the meaning of the statement the understanding that we may burn our fingers if we should go ahead and do so.
The use of the word meaning in the Linked Data space is very specific and refers to defining information, data, in such a way that it is understandable by computers. Up until now, words on a web page were understandable by us and findable by search engines, but they were meaningless to computers and thus rendered irrelevant.
By defining the words in such a way that they are understandable by machines, given meaning, the words become data that is relevant and useful.
Why is that important?
Because up until the advent of Linked Data technology, this page you are reading now on your screen needed you to make it meaningful and relevant. The words, numbers or symbols meant nothing at all to the massive processing power in the computer that is sending the information to your screen nor to all the servers around the world.
But now they do. Computers can now understand data instead of just recognising it. They understand that the data has meaning and because it has meaning they can link meaning with meaning in a meaningful way.
Since data now has meaning, computers can understand the meaning of the information that is contained in their own databases. Moreover, and this is the neat bit, they can now understand what is contained in every other database in the world. This is the true significance of liberated data.
So I can compile the world’s information in any way that suits my requirements. Great, so what are my takeaways? What do I say to my colleagues and customers and any others that may be interested?
Linked Data is going to change the Web, and is going to be as much of a seismic shift in how we use the Internet as the original introduction of HTML.
It will happen over a longer period of time as there is so much more information to be processed.
Linked Data makes it very easy for computers to communicate with each other. Words, numbers and symbols will mean something to a computer.
Linked Data creates the possibility of a Web of Data. A Web where plans can be found in the pattern of links and then retrieved. A Web where projects can be mined and pulled from the mass of information fully-formed. A Web with the possibility to find crucial patterns and connections that hitherto have lain undiscovered, which in turn can lead to new breakthroughs in how we ourselves see and interact with the world:
- To be able to work together seamlessly with greater efficiency and mutual understanding.
- To be able to collaborate and share information more effectively regardless of the scale of any project.
- To be able to see the world better because we can join the dots better.
We have listed some of the major contributing technologies and sources of development for Linked Data here.
There is no need to learn them all as most of this has been designed so computers are better able to understand each other and avoid the need for human intervention. But looking through them should give you an idea of the scope and scale of the Linked Data project.
RDF http://www.w3.org/TR/rdf-primer/ – “The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.” A lingua franca for inter- and intra-computer communication. It describes what kind of information a piece of data is in such a way that another computer can understand it. Instead of data being a series of 1s and 0s, it now has the added idea of being a graph of interconnected objects with some meaning attached to these objects and the relationships between them.
URI http://en.wikipedia.org/wiki/Uniform_Resource_Identifier – “Uniform Resource Identifier.” A label for data, somewhat like the way your page is labelled by the URL (Uniform Resource Locator) that you see at the top of the browser. A URL is in fact one kind of URI: it names a thing and also says where to find it. Things that are described in two or more different places can be linked together using common identifiers.
OWL http://en.wikipedia.org/wiki/Web_Ontology_Language – “Web Ontology Language.” (OWL sounds better than WOL.) This project works out how to define words for computers to understand. The same word in English can have several meanings, and humans are really good at working out what the correct interpretation should be. Interestingly, it is not that a computer can’t find a correct meaning for a given piece of data; the challenge is to avoid wasting time and energy on doing the whole thing over and over again. See also http://en.wikipedia.org/wiki/Knowledge_representation
SPARQL http://www.w3.org/TR/rdf-sparql-query – “SPARQL Protocol and RDF Query Language” (the name is a recursive acronym). This is the tool that is used to find and manipulate RDF data. Something has to do it and it might as well sound glamorous.
SIOC http://www.w3.org/Submission/sioc-applications/ – “SIOC is designed to export information about the content and structure of online community websites in a machine-readable form.” Social media is about people reaching out to each other on the Web. This inevitably results in the creation of communities. This tool enables the convergence between organised communities and the Web.
DBpedia http://wiki.dbpedia.org/FacetedSearch?v=17jm – Takes the little boxes of data that are featured on almost every Wikipedia page and makes them easier to ask questions of. That is, you can form a search question that would sound natural to human ears. Try out the samples on the page and ask a few questions of your own. It’s really quite cool and raises the question, why can’t all searches be like this?
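To make the RDF, URI and SPARQL entries a little more concrete, here is a toy Python sketch. It is not RDF or SPARQL itself, and it uses no real RDF library. It only imitates the shape of the idea: facts stored as subject, predicate, object statements whose parts are identifiers, and a query that matches patterns against them. The example.org identifiers, the predicate names capitalOf and memberOf, and the helper match are all assumptions made for the illustration.

```python
# A toy model of the idea behind RDF: every fact is a
# (subject, predicate, object) statement, and each part is an identifier.
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

triples = {
    (EX + "Paris", EX + "capitalOf", EX + "France"),
    (EX + "Berlin", EX + "capitalOf", EX + "Germany"),
    (EX + "France", EX + "memberOf", EX + "EU"),
    (EX + "Germany", EX + "memberOf", EX + "EU"),
}


def match(pattern, data):
    """Return the statements that fit the pattern; None is a wildcard."""
    return sorted(
        statement
        for statement in data
        if all(want is None or want == got for want, got in zip(pattern, statement))
    )


# "Which things are capitals, and of what?"
capitals = match((None, EX + "capitalOf", None), triples)

# Follow the links one step further: capitals of EU members.
eu_members = {s for s, _, _ in match((None, EX + "memberOf", EX + "EU"), triples)}
eu_capitals = sorted(s for s, _, o in capitals if o in eu_members)
print(eu_capitals)
```

The second query works only because "France" is the same identifier in both kinds of statement. That shared identifier is the link in Linked Data, and following such links across many datasets is the job a real query language like SPARQL does at scale.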