Forums, the Semantic Web, and SEO

Edit: A new, longer version of this post is available here.

I think we need to see some developments for forums and bulletin board software regarding the Semantic Web. Yahoo SearchMonkey and Google Rich Snippets are already indexing semantic content from the Web, and you can be sure that Bing will go the same way if recent developments are anything to go by.

Drupal is about to get the jump on vBulletin and phpBB by becoming one of the first forum (content) systems to have semantic markup out of the box, allowing search engines to know what is a post, a reply, a user / person, a topic, etc. And not just for the last 15 posts, a la RSS, but for all site content.

On October 19th, Dries Buytaert committed RDFa additions to Drupal 7 core into the code repository (Twitter announcement). RDFa is a microcontent format based on Semantic Web technologies that allows people to embed semantic information in their web pages.

So I propose that when vBulletin 4 comes out, someone grabs the default set of templates, and marks them up with RDFa just as was done for Drupal.

Then, whoever wants to benefit from having semantic markup identifying the posts, replies, users, topics and forums on their multi-forum website can do so.
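
As a rough illustration of what such markup might look like, here is a minimal Python sketch that renders one forum post as an HTML fragment with RDFa attributes, using terms from the SIOC and Dublin Core vocabularies. The function name, the layout of the fragment and the example URLs are assumptions made for illustration; they are not taken from Drupal 7 or from any vBulletin template.

```python
from html import escape

# Namespace URIs for the SIOC and Dublin Core Terms vocabularies.
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"


def render_post(post_url, title, author_url, author_name, body):
    """Return an HTML fragment describing one forum post in RDFa.

    Hypothetical helper: the layout is illustrative only and does not
    reproduce Drupal's or vBulletin's real templates.
    """
    return (
        f'<div xmlns:sioc="{SIOC}" xmlns:dcterms="{DCTERMS}"\n'
        f'     about="{escape(post_url)}" typeof="sioc:Post">\n'
        f'  <h2 property="dcterms:title">{escape(title)}</h2>\n'
        f'  <a rel="sioc:has_creator" href="{escape(author_url)}">'
        f'{escape(author_name)}</a>\n'
        f'  <div property="sioc:content">{escape(body)}</div>\n'
        f'</div>'
    )
```

Calling render_post("http://example.org/t/1#p2", "Hello", "http://example.org/u/7", "alice", "First post") returns a fragment in which the post, its title, its author and its text are each labelled. A real template would carry the same attributes on whatever elements it already uses.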

Related posts:

RDFa and search engines (phpBB SEO)

How To: Converting From A WordPress Blog To A vBulletin Forum

I decided after an interesting chat with Loren Feldman from 1938 Media earlier this year that this site should move from a blog to a forum. He had recently swapped from WordPress to vBulletin.

I’m a data hoarder, so I had to figure out how to migrate the data. It’s not simple, because although vBulletin has a WordPress importer, it’s for an old version of WordPress and it’s designed to work with the vBulletin Blog system (not for importing into a forum).

So, here are the steps I went through to get it to work.


Install Drupal 5.

Download and install WordPress Import and Trackback modules for Drupal 5. Enable these, along with the Blog and Forum modules.

Import the WXR file from your WordPress install.

Update all imported story or blog types to forum types by running:

update node set type="forum" where type in ("story", "blog");

Create a container using the Forums administration panel. Create a forum within that container. Note the forum ID.

Run MySQL queries for every node to insert the term-to-node and forum-to-node links. You can use Excel to put the node IDs in a column and then write a simple formula to generate the SQL queries below. Replace 78 with the forum ID created above, and have the Excel formula replace 60 with the node ID. For example, ="insert into term_node (nid, tid) values ("&A1&", 78);" and ="insert into forum (nid, vid, tid) values ("&A1&", "&A1&", 78);" (type the formulas with straight quotes). Note that this reuses the node ID as the revision ID (vid), which should hold for a fresh import where no node has more than one revision.

insert into term_node (nid, tid) values (60, 78);
insert into forum (nid, vid, tid) values (60, 60, 78);
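
If Excel is not to hand, the same statements can be generated with a short script. The sketch below is one possible way of doing it, assuming you already have the node IDs as a list of integers (for example, from a select on the node table). The function name and the sample node IDs are made up for illustration.

```python
def forum_link_queries(node_ids, forum_tid):
    """Build the term_node and forum INSERT statements for each node ID.

    Assumes, as the Excel formula does, that each node's revision ID (vid)
    is the same as its node ID (nid).
    """
    tid = int(forum_tid)
    queries = []
    for nid in node_ids:
        nid = int(nid)  # reject anything that is not a plain integer
        queries.append(
            f"insert into term_node (nid, tid) values ({nid}, {tid});"
        )
        queries.append(
            f"insert into forum (nid, vid, tid) values ({nid}, {nid}, {tid});"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical node IDs; 78 is the forum ID used in the example above.
    for query in forum_link_queries([60, 61, 62], 78):
        print(query)
```

Redirect the output to a file and feed it to the mysql client in the same way as any other SQL script.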

(If you want to import your tags / categories into vBulletin tags, you may want to dump the term_data and term_node tables from Drupal and somehow import them into tag and tagthread respectively in vBulletin. I’ll give some tips later.)

Delete the imported WordPress category and tag vocabularies.

Install ImpEx on vBulletin and use the Drupal importer. It should “just work”.


If you want to preserve commenter names on comments, you will also need to mysqldump the comments table from your Drupal database and import it into your vBulletin database. (I had to do this as my databases were on different servers.)

e.g.

mysqldump -u drupaluser -p drupaldb comments > comments.sql
mysql -u vbuser -p vbdb < comments.sql

Then run:

update post, comments set post.username=comments.name where post.importpostid=comments.cid;
update post, comments set post.ipaddress=comments.hostname where post.importpostid=comments.cid;

to set the commenter name to a name instead of “Guest” (and optionally to preserve the IP address history).


If you wanted to import tags / categories, here are the queries once you’ve imported the term_data and term_node tables.

INSERT INTO tag ( tagid, tagtext, dateline ) SELECT tid, name, UNIX_TIMESTAMP( ) FROM term_data;
INSERT INTO tagthread ( tagid, threadid, userid, dateline ) SELECT term_node.tid, thread.threadid, thread.postuserid, UNIX_TIMESTAMP( ) FROM term_node, thread WHERE thread.importthreadid = term_node.nid;

You may want to write some query to update the thread table with the taglist, using the following query as inspiration:

SELECT threadid, GROUP_CONCAT(tagtext SEPARATOR ', ') FROM tag, tagthread WHERE tag.tagid = tagthread.tagid GROUP BY tagthread.threadid;

I just used an INSERT statement based on the above SELECT to populate a new table of IDs and text strings, and then used that to update the taglist column in the thread table.
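
For anyone who would rather script this step, here is a minimal Python sketch of the same idea. It groups the tags per thread in Python instead of using GROUP_CONCAT and an intermediate table, so it is a different route to the same result and not the exact method described above. It runs against an in-memory SQLite database with cut-down stand-in tables (only the columns mentioned in this post), and the function names and sample rows are hypothetical. Against a real vBulletin database you would use a MySQL driver, whose placeholder syntax differs from SQLite's.

```python
import sqlite3


def fill_taglists(conn):
    """Copy each thread's tags into thread.taglist as 'tag1, tag2, ...'."""
    rows = conn.execute(
        "SELECT tagthread.threadid, tag.tagtext "
        "FROM tag JOIN tagthread ON tag.tagid = tagthread.tagid "
        "ORDER BY tagthread.threadid, tag.tagtext"
    ).fetchall()
    # Group the tag texts by thread ID, keeping the sorted order.
    taglists = {}
    for threadid, tagtext in rows:
        taglists.setdefault(threadid, []).append(tagtext)
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
    conn.executemany(
        "UPDATE thread SET taglist = ? WHERE threadid = ?",
        [(", ".join(tags), threadid) for threadid, tags in taglists.items()],
    )
    conn.commit()


def demo():
    """Build stand-in tables with sample rows and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (tagid INTEGER, tagtext TEXT);
        CREATE TABLE tagthread (tagid INTEGER, threadid INTEGER);
        CREATE TABLE thread (threadid INTEGER, taglist TEXT);
        INSERT INTO tag VALUES (1, 'semantic web'), (2, 'forums');
        INSERT INTO tagthread VALUES (1, 10), (2, 10), (2, 11);
        INSERT INTO thread VALUES (10, NULL), (11, NULL), (12, NULL);
    """)
    fill_taglists(conn)
    return conn.execute(
        "SELECT threadid, taglist FROM thread ORDER BY threadid"
    ).fetchall()
```

Threads with no tags are left untouched, so their taglist stays empty.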

Some of my (very) preliminary opinions on Google Wave

I was interviewed by Marie Boran from Silicon Republic recently for an interesting article she was writing entitled “Will Google Wave topple the e-mail status quo and change the way we work?”. I thought my longer answers might be of interest, so I am pasting them below.

As someone who is both behind Ireland’s biggest online community boards.ie and a researcher at DERI on the Semantic Web, are you excited about Google Wave?

Technically, I think it’s an exciting development. Commercially, it obviously gives others (Google included) the potential to set up a service that competes with us (!), but what I think is good is the way Google Wave has been shown to integrate with existing platforms. For example, there’s a nice demo showing how Google Wave plus MediaWiki (the software that powers Wikipedia) can be used to help editors who are simultaneously editing a wiki page. If it can be done for wikis, it could aid with lots of things relevant to online communities like boards.ie. For example, moderators could see which other moderators are online at the same time, communicate about issues such as troublesome users or posts with questionable content, and avoid stepping on each other’s toes when dealing with them.

Does it have potential for collaborative research projects? Or is it heavyweight/serious enough?

I think it has some potential when combined with other tools that people are using already. There’s an example from SAP of Google Wave being integrated with a business process modelling application. People always seem to step back to e-mail for doing various research actions. While wikis and the like can be useful tools for quickly drafting research ideas, papers, projects, etc., there is that element of not knowing who is doing stuff at the same time as you. Just as people use Gtalk to augment Gmail by communicating with contacts in real time while browsing e-mails, Google Wave could potentially be integrated with other platforms such as collaborative work environments, document sharing systems, etc. It may not be heavyweight enough on its own, but at least it can augment what we already use.

Where does Google Wave sit in terms of the development of the Semantic Web?

I think it could be a huge source of data for the Semantic Web. What we find with various social and collaborative platforms is that people are voluntarily creating lots of useful related data about various objects (people, events, hobbies, organisations) and having a more real-time approach to creating content collaboratively will only make that source of data bigger and hopefully more interlinked. I’d hope that data from Google Wave can be made available using technologies such as SIOC from DERI, NUI Galway and the Online Presence Ontology (something we are also working on).

If we are to use Google Wave to pull in feeds from all over the Web, will both RSS and widgets become sexy again?

I haven’t seen the example of Wave pulling in feeds, but in theory, what I could imagine is that real-time updating of information from various sources could allow that stream of current information to be updated, commented upon and forwarded to various other Waves in a very dynamic way. We’ve seen how Twitter has already provided some new life for RSS feeds in terms of services like Twitterfeed automatically pushing RSS updates to Twitter, and this results in some significant amounts of rebroadcasting of that content via retweets etc.

Certainly, one of the big things about Wave is its integration of various third-party widgets, and I think once it is fully launched we will see lots of cool applications building on the APIs that they provide. There have been a few basic demonstrator gadgets shown already, like polls, board games and event planning, but it’ll be the third-party ones that make good use of the real-time collaboration that will probably be the most interesting, as there’ll be many more people with ideas compared to some internal developers.

Is Wave the first serious example of a communications platform that will only be as good as the third-party developers that contribute to it?

Not really. I think that title applies to many of the communications platforms we use on the Web. Facebook was a busy service but really took off once the user-contributable applications layer was added. Drupal was obviously the work of a core group of people but again the third-party contributions outweigh those of the few that made it.

We already have e-mail and IM combined in Gmail and Google Docs covers the collaborative element so people might be thinking ‘what is so new, groundbreaking or beneficial about Wave?’ What’s your opinion on this?

Perhaps the real-time editing and updating process. Often, it’s difficult to go back in a conversation and add to or fix something you’ve said earlier. But it’s not just a matter of rewriting the past: you can also go back and see what people said before they made an update (“rewind the Wave”).

Is Google heading towards unified communications with Wave, and is it possible that it will combine Gmail, Wave and Google Voice in the future?

I guess Wave could be one portion of a UC suite but I think the Wave idea doesn’t encompass all of the parts…

Do you think Google is looking to pull in conversations the way FriendFeed, Facebook and Twitter do? If so, will it succeed?

Yes, certainly Google have had interests in this area with their acquisition of Jaiku some time back (everyone assumed this would lead to a competitor to Twitter; most recently they made the Jaiku engine available as open source). I am not sure if Google intends to make available a single entry point to all public waves that would rival Twitter or Facebook status updates, but if so, it could be a very powerful competitor.

Is it possible that Wave will become as widely used and ubiquitous as Gmail?

It will take some critical mass to get it going; integrating it into Gmail could be a good first step.

And finally – is the game changing in your opinion?

Certainly, we’ve moved from frequently updated blogs (every few hours/days) to more frequently updated microblogs (every few minutes/seconds) to being able to not just update in real-time but go back and easily add to / update what’s been said any time in the past. People want the freshest content, and this is another step towards not just providing content that is fresh now but a way of freshening the content we’ve made in the past.