Sindice is a semantic web index that lets you access and leverage the “web of data”: the rapidly expanding number of websites that are semantically marked up, that is, tagged with RDF, RDFa, Microformats or Microdata. These tags identify online content as belonging to different categories.
This week Sindice, in partnership with Hepp Research and OpenLink Software, launched Sindice Ltd, a new startup that will manage Sindice’s intellectual property and oversee the commercial drive of its products.
Giovanni Tummarello is the CEO of Sindice, which originated at the Digital Enterprise Research Institute in Galway, Ireland. He explains how the web of data will revolutionise online data management, and how it is “set to explode” in the coming months. Once it does, he enthuses, the web of data “all becomes a big graph which one can join with a single query”.
“Semantic markup is basically a markup that you put on the page to express what you have on the page. So if you have the name of a movie, because you are discussing that movie in a blog article for example, you might want to tag the title of the movie, the director of the movie, whatever data makes that page recognisable to a search engine for exactly what it is: a page which talks specifically about that movie.
“In a regular search engine, it’s just the keywords that are being searched, so you’re looking for the title of the movie, which could be ‘The Blue Tomato’, but there are all sorts of pages which can contain these words, for all sorts of reasons. On the other hand, if you put a markup saying that it is a movie, you will be stating that you are talking about a movie.”
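The difference Giovanni describes can be illustrated with a short sketch. It is not Sindice’s code: the two page snippets, the director’s name and the helper functions are made up for illustration, and the tiny parser only handles the simplest text-valued Microdata properties. It uses the schema.org `Movie` type with Python’s standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical pages: both contain the words "The Blue Tomato", but only the
# first one declares, via Microdata, that it is talking about a movie.
MOVIE_PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">The Blue Tomato</h1>
  <span itemprop="director">Jane Doe</span>
</div>
"""
RECIPE_PAGE = "<p>Roast the blue tomato until soft.</p>"


class MicrodataParser(HTMLParser):
    """Collects itemtype values and simple text-valued itemprop values."""

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_types.append(attrs["itemtype"])
        if "itemprop" in attrs:
            # Remember the property name until its text content arrives.
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


def keyword_match(page, query):
    """What a plain keyword search sees: do the words appear at all?"""
    return query.lower() in page.lower()


def is_about_movie(page, title):
    """What markup adds: the page states it describes a movie with this name."""
    parser = MicrodataParser()
    parser.feed(page)
    return ("http://schema.org/Movie" in parser.item_types
            and parser.properties.get("name") == title)
```

With these snippets, `keyword_match` is true for both pages, while `is_about_movie` is true only for the page carrying the markup.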
As Giovanni points out, Sindice acts like a search engine for the 270 million or so pages which currently have semantic markup, but its real utility is greater than that: “OK, you can put in a keyword and search it, it’s fine, but that’s not really the point”.
“Sindice is basically a search engine which is not just a search engine. Really it’s an infrastructure for leveraging all the web data out there. We have 270 million pages or so at the moment; they are not normal web pages, they are only web pages which have semantic markup on them. What Sindice does is it has a very powerful engine that can correlate information from one website to another.
“You can basically use the entire web as if it is your playground by merging information here and there. You can get the name of a movie from a page which is marked up, and the name of the movie can be looked up on Wikipedia, where you can see what the director is, and then go on Rotten Tomatoes and get the rating, and all together it can be queried with a single query which goes all over the web and returns the information all ready to be consumed to enhance websites, anywhere where you want content aggregated from multiple sources on the web.”
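As a rough illustration of that “single query” idea, the toy sketch below merges statements from three made-up sources into one graph and answers a question that none of them could answer alone. It says nothing about how Sindice’s engine or query language actually works: the identifiers, the director and the rating are invented, and the sources are assumed to share an identifier for the same movie, which is the hard part in practice.

```python
# Each hypothetical source publishes (subject, predicate, object) statements.
BLOG = {("blog:post-1", "mentions", "movie:blue-tomato")}
ENCYCLOPEDIA = {
    ("movie:blue-tomato", "title", "The Blue Tomato"),
    ("movie:blue-tomato", "director", "Jane Doe"),
}
REVIEWS = {("movie:blue-tomato", "rating", "87%")}


def merge(*sources):
    """Combine statements from several sources into one graph."""
    graph = set()
    for source in sources:
        graph |= source
    return graph


def value(graph, subject, predicate):
    """Return one object for (subject, predicate), or None if nothing matches."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def movies_on_page(graph, page):
    """One query over the merged graph: every movie a page mentions,
    enriched with facts that originate from the other sources."""
    results = []
    for s, p, movie in sorted(graph):
        if s == page and p == "mentions":
            results.append({
                "title": value(graph, movie, "title"),
                "director": value(graph, movie, "director"),
                "rating": value(graph, movie, "rating"),
            })
    return results
```

Calling `movies_on_page(merge(BLOG, ENCYCLOPEDIA, REVIEWS), "blog:post-1")` returns a single record combining the title, director and rating, ready to be dropped into a web page.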
The formation of the company is a sign that Sindice are ready to commercialise their technology. Sindice Ltd solidifies the “very important” partnerships with Hepp Research and OpenLink Software, and also manages the intellectual property the partners now share, keeping it “nice and tidy” so that Sindice can seek further investment.
“There are two main markets we are pursuing. The first one is customised cloud-hosted data spaces”, continues Giovanni.
“We’re going to be allowing people to have their own data spaces, we call them, and that comes for a price of course, it’s kind of data as a service.
“You want to have the data that comes from the web of data but you also want to have your own data, you also want to have your own correlation of the data. So you want subsets of Sindice data that need to be live and fresh, that need to be combined in the way that you want, to solve your problems.”
Not yet available, this is, he says, “something that will basically appeal to anybody that has a website that they want to enhance with information coming from multiple websites at the same time. This is good for everybody, because the websites which are providing the information, they get traffic and they get links so it becomes a syndication network. That obviously has a value in terms of the possibility of sharing revenue and advertisement”.
Giovanni is confident that, now that the three major search engines have taken a common approach to create what they call “a shared markup vocabulary”, the time is right for Sindice to capitalise on the expected flurry of markup activity in the near future.
“It’s exploding as we speak. Microsoft, Google and Yahoo! are telling everybody at the same time to do this! In search engine optimisation circles they are raving about this stuff, and everybody’s implementing it so there’s no alternative.
“This means that there will be a lot of people who want to do services on top of this, so if there’s a market, it’s right now.”