Web 2.0 Expo Tokyo: Scott Dietzen – “The impact of ’software as a service’ and Web 2.0 on the software industry”

Scott Dietzen, president and CTO of Zimbra, gave the third talk last Friday at Web 2.0 Expo Tokyo. Zimbra has been on the go for four years (so they are Web 2.0 pioneers). Scott’s aim for this talk was to share the experience of building one of the largest AJAX-based web applications (thousands of lines of JavaScript code). Because Zimbra’s status has changed since they originally signed up for the conference, he mentioned that Yahoo! are the new owners of Zimbra, but he affirmed that Zimbra will remain open source and committed to the partners and customers who have brought Zimbra to where it is.

Web 1.0 started off for consumers, but began to change the ways in which businesses used technology. A handful of technologies allowed us to create a decade of exciting innovations. With Web 2.0, all of us have become participants, often without realising the part we play on the Web – clicking on a search result, uploading a video, updating a social network page – all of this contributes to and changes the Web 2.0 infrastructure. This has enabled phenomena like Yahoo!’s Flickr and open source, where a small group of people get together, put a basic model forward, and then let it loose; the many contributions that follow from around the world are what produce these phenomena. There are 11,000 participants in the Zimbra open source community, vastly more people than Zimbra or Yahoo! could put into the project themselves.

Mashups may be the single best thing for Zimbra. AJAX has won over much of the Internet because websites have voted with their feet, and according to Scott “it actually works”. Scott was formerly part of the WebLogic team, and one of that team said recently that there was a special place in heaven for whoever in Zimbra had the patience to get all of that JavaScript programming working properly. There are currently 50 or 60 AJAX development toolkits, but Scott hopes that the industry can rally around a smaller number, especially open-source technologies which offer long-term portability across all the leading platforms.

Another issue is that browsers weren’t initially designed for this kind of “punishment”, so it’s taken time for browsers to become solid productive AJAX containers. They can still do better, and Scott said he is excited to see the emergence of JIT-compilation technology that will allow browsers to operate at least two to three times faster.

With Zimbra, there is no caching of user data within the client, so even on a public kiosk there will be no security leaks under AJAX. The challenge is that the source code is now available to anyone with a web browser, and it is crucial to protect against ever executing any JavaScript code that is external to your application. For the first time, we have a universal, network-connected user interface that allows one to mix and match various UIs together: Scott reckons we’ve only just begun to scratch the surface of what can be done with mashups.
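
As an aside, the “never execute external JavaScript” point boils down to treating anything fetched over the network as data rather than code. Here is a minimal sketch of that principle (my illustration, not Zimbra’s actual code):

```typescript
// Treat externally supplied strings as inert data: escape them before they
// reach markup, and prefer DOM APIs that never parse HTML at all.

function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render an e-mail subject received from the server as plain text.
// textContent never parses markup, so nothing in `subject` can execute.
function renderSubject(container: HTMLElement, subject: string): void {
  container.textContent = subject;
}

// If markup has to be assembled as a string, escape every untrusted piece.
function renderRow(from: string, subject: string): string {
  return `<tr><td>${escapeHtml(from)}</td><td>${escapeHtml(subject)}</td></tr>`;
}
```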

There are four techniques for speeding up AJAX applications. Firstly, combine files together where possible. Then, compress the pages to shrink the required bandwidth for smaller pipes. Next is caching, to avoid having to re-fetch the JS and re-interpret it (in Zimbra, they include dates for when the JS files were last updated). The last and best technique is “lazy loading”. Zimbra is a very large JS application in one page; by breaking it up into several modules that can be loaded on demand, one can reduce the time before you can first see and start using the application.
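
Lazy loading looks roughly like the sketch below. The module names are invented, and it uses today’s dynamic import() rather than the script-injection tricks AJAX apps relied on in 2007, but the shape of the idea is the same: fetch only the module for the view the user actually opens.

```typescript
// Load application modules on demand instead of shipping one huge JS payload.

type ModuleName = "mail" | "calendar" | "contacts"; // hypothetical module split

const inFlight = new Map<ModuleName, Promise<{ init: () => void }>>();

function loadModule(name: ModuleName): Promise<{ init: () => void }> {
  // Cache the in-flight promise so each module is fetched at most once.
  if (!inFlight.has(name)) {
    inFlight.set(name, import(`./${name}.js`));
  }
  return inFlight.get(name)!;
}

// Bring up the default view immediately; defer everything else.
loadModule("mail").then((m) => m.init());

document.getElementById("calendar-tab")?.addEventListener("click", async () => {
  const calendar = await loadModule("calendar");
  calendar.init();
});
```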

Offline AJAX is a fundamental change but offers many opportunities. You can have the web experience when on a flight or when far away from the data centre that you normally use. Zimbra is faster to use as an offline version that synchronises back to California than it is when every operation has to cross the ocean and back again. For Zimbra, they took the Java server code and produced a micro version to install on the local desktop, which preserves all the “smarts” and makes them available to desktop users. Offline isn’t for everything: for example, when data is so sensitive that it shouldn’t be cached on a PC, or when old data goes stale and no longer makes sense. You also have to solve synchronisation issues: you can be changing your mailbox while on a plane, but new mail is arriving in the meantime and some reconciliation has to take place. And there is the general problem of desktop apps: once code is installed, how do you upgrade it, and so on.
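
To make the reconciliation point concrete, here is a toy sketch under deliberately simplified assumptions (per-message last-writer-wins; real mail synchronisation is considerably more involved than this):

```typescript
// Replay changes made offline against the mailbox state the server accumulated
// while the client was away. New mail that arrived in the meantime is kept;
// for messages both sides know about, the most recent modification wins.

interface MessageState {
  id: string;
  read: boolean;
  folder: string;
  modifiedAt: number; // epoch milliseconds of the last change
}

function reconcile(
  server: MessageState[],
  offlineChanges: MessageState[],
): MessageState[] {
  const merged = new Map(server.map((m) => [m.id, m] as const));
  for (const change of offlineChanges) {
    const current = merged.get(change.id);
    if (!current || change.modifiedAt > current.modifiedAt) {
      merged.set(change.id, change);
    }
  }
  return [...merged.values()];
}
```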

In Web 1.0, UI logic, business logic, and data logic were all supposed to be separated. One could fix some (X)HTML and SQL schemas to aid with this, but in practice people didn’t modularise. In Web 2.0, there is an effort to have clearer separations (due to mashups, feeds, etc.) between client / UI logic, server / business logic, and data logic. It’s better to modularise: moving towards a more modular architecture will allow you to get more value from your business applications, and will also allow you to “play in this Web 2.0 world”. In relation to SOA, Scott said that we are perhaps moving from something like ISO, where there’s one big document with a 10,000-page specification, to something almost as problematic, where there are one-page specs for 10,000 or more web services. There is a well-known theory that you can’t start with something complex and expect everyone to suddenly start using it.

He then focused on software as a service, or SaaS. SaaS is inherent in the Web, since the service is off somewhere else when you open up a browser page. He also talked about the opportunities when businesses are put together in the same data centres. This results in multi-tenancy: the ability (or need) to set up a single server farm with tens of thousands of companies’ data all intermixed securely, without compromising each other. There is a need to think about how to manage so many users at once, usually by defining a common class of service for whole groups of users. Administration should be delegated to some extent, and this is an important aspect to get right. You may also want to allow users to customise and extend their portion of the applications they are using, where appropriate.

Scott next talked about convergence. E-mail has made great progress in becoming part of the web experience (Hotmail, GMail, Yahoo! Mail, etc.). The same thing is now happening to IM, to VoIP, to calendars, etc. For example, a presence indicator next to an e-mail inbox shows if each user is available for an IM or a phone call. Or the reverse: someone tries to call or IM you, but you can push back and say that you just want them to e-mail you because you’re not available right now. Being able to prioritise communications based on who your boss is, who your friends are, etc., is a crucial aspect of harnessing the power of these technologies. On voice, we want to be able to see our call logs, to use these to dial our phone, e.g., you want to just click on a person and call that person. You may also want to forward segments from that voice call over e-mail or IM.

In Japan, they have had compelling applications for mobile devices for a lot longer than the rest of the world, and Scott showed a demonstration of the Zimbra experience on an iPhone. He finished by saying that these new technologies have to be used in the right way to make someone’s life better and to make usage more compelling. Innovation, or how deeply we can think about what the future ought to look like, is very important.

Seiji Sato from Sumitomo, whose subsidiary Presidio STX invested in Zimbra last year, then spoke. He started by mentioning that over 100 corporations are now using Zimbra. Sumitomo hopes to contribute to “synergy effects” in Japan’s Web 2.0 community by mashing up services from various businesses and by providing the possibility to extend and utilise available company competencies.

To expand Zimbra’s usage in Japan, Sumitomo have been working with their associate company FeedPath and other Web 2.0 businesses, providing Zimbra localisation and organising a support structure both for this market and for early adopters. Sato said that although Sumitomo are not a vendor or manufacturer, they feel that the expansion of Web 2.0 is quite viable and very important.

After the talk I asked Scott if Zimbra would look at leveraging any widgets developed under the much-hyped OpenSocial initiative within the Zimbra platform, since it seemed to me that there is a natural fit between the implicit social networking information being created within Zimbra and the various widgets that are starting to appear (and since they are just XHTML / JS, there’s a technology match at least). Scott told me that Zimbra already has around 150 plugins, and that the ability to layer mashups on top of this implicit social network information is certainly very important to them. He was unsure whether OpenSocial widgets would fit with Zimbra, since their e-mail platform is quite different from SNS platforms, but he did say [theoretically, I should add, as there are no plans to do so] that if such widgets were ported to work with Zimbra, they would probably require extensive testing and centralised approval rather than letting anybody use whatever OpenSocial widget they wanted within Zimbra.

Web 2.0 Expo Tokyo: Rie Yamanaka – “A paradigm shift in advertisement platforms: the move into a real Web 2.0 implementation phase”

The second talk at the Web 2.0 Expo Tokyo this morning was by Rie Yamanaka, a director with Yahoo!’s commercial search subsidiary Overture KK. (I realised after a few minutes of confusion that Ms. Yamanaka’s speech was being translated into English via portable audio devices.)

According to Yamanaka, Internet-based advertising can be classified into three categories: banners and rich media, list-type advertisements (which was the central topic of her presentation), and mobile advertising (i.e., a combination of banner and listings grouped onto the mobile platform).

First of all, she talked about advertisement lists. Ad lists are usually quite accurate in terms of targeting since they are shown and ranked based on their degree of relevance. Internet-based ads (when compared with TV, radio, etc.) are growing exponentially, and this increase is primarily being driven by ad lists and mobile ads. In yesterday’s first keynote conversation with Joi Ito, it was mentioned that the focus has already shifted a lot towards internet advertising in the US, perhaps more so than in Japan, but that this shift is now occurring in Japan too.

She then talked about the difference between banners and ad lists. In the case of banner ads, what matters is the number of impressions, so the charge is based on CPM (cost per mille or thousand), and some people think of it as being very “Web 1.0”-like. However, ad lists, e.g., as shown in search results, are focussed more on CPC (cost per click), and are often associated with Web 2.0.

Four trends (with associated challenges) are quite important and are being discussed in the field of Internet advertising: the first is increased traceability (one can track and keep a log of who did what); the next is behavioural or attribute targeting, which is now being implemented in a quite fully-fledged manner; third are APIs, which are now entering the field of advertisement; and finally (although it’s not “Web 2.0”-related in a pure sense) there is the integration between offline and online media, where the move to search for information online is becoming prevalent.

  • With traceability, you can get a list of important keywords in searches that result in subsequent clicks. Search engine marketing can help to eliminate the loss of opportunities that may occur through missed clicks.
  • Behavioural targeting, based on a user’s search history, can give advertisers a lot of useful information. One can use, for example, information on gender (i.e., static details) or location (i.e., dynamic details, perhaps from an IP address) for attribute-based targeting. This also provides personalised communication with users, and one can then deploy very flexible products based on this. Yahoo! Japan recently announced details of attribute-based advertising for their search which combines an analysis of the log histories of users and advertisers.
  • As in yesterday’s talk about Salesforce working with Google, APIs for advertising should be combined with core business flows, especially when a company provides many products, e.g. Amazon.com or travel services. For a large online retailer, you could have some logic that matches a keyword to the current inventory, and the system should hide certain keywords if the associated items are not in stock. This is also important in the hospitality sector, where for example there should be a change in the price of a product when it goes past a best-before time or date (e.g., hotel rooms drop in price after 9 PM); there is a small sketch of this kind of logic after this list. With an API, one can provide highly optimised ads that could not be created on the fly by humans, and advertisers can take a scientific approach to dynamically improving offerings in terms of cost and sales.
  • Matching online information to offline ads, while not directly related to Web 2.0, is important too. If one looks at TV campaigns, one can analyse information about how advertising the URL for a particular brand can lead to the associated website. Some people may only visit a site after seeing an offline advertisement, so there could be a distinct message sent to these types of users.
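
As noted in the third bullet above, here is a hedged sketch of the inventory-driven keyword logic: keywords whose items are out of stock are hidden, and time-sensitive products get their price adjusted. The data shapes and the 9 PM hotel cut-off are purely illustrative, not any particular ad platform’s API.

```typescript
// Drive ad listings from an advertiser's inventory feed.

interface Listing {
  keyword: string;
  sku: string;
  inStock: boolean;
  basePrice: number;
  discountAfterHour?: number; // e.g. hotel rooms drop in price after 21:00
  discountedPrice?: number;
}

function activeAds(
  listings: Listing[],
  now: Date,
): { keyword: string; price: number }[] {
  return listings
    .filter((l) => l.inStock) // hide keywords whose items are out of stock
    .map((l) => ({
      keyword: l.keyword,
      price:
        l.discountAfterHour !== undefined &&
        l.discountedPrice !== undefined &&
        now.getHours() >= l.discountAfterHour
          ? l.discountedPrice
          : l.basePrice,
    }));
}
```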

In terms of metrics, internet-based ads have traditionally been classified in terms of what you want to achieve. In cases where banners are the main avenue required by advertisers, CPM is important (if advertising a film, for example, the volume of ads displayed is what matters). On the other hand, if you actually want to get your full web page up on the screen, ranking and CPC are important, so the fields of SEO and SEM come into play.

Ms. Yamanaka then talked about CPA (cost per acquisition), i.e., how much it costs to acquire a customer. The greatest challenge in the world of advertising is figuring out how much [extra] a company makes as a result of advertising (based on what form of campaign is used). If one can figure out a way to link sales to ads (e.g., through internet conversion, where a person moves onwards from an ad and makes a purchase), then one can get a measure of the CPA. For companies who are not doing business on the Web, it’s hard to link a sale to an ad (e.g., if someone wants to buy a Lexus and reads reference material on the Web, he or she may then go off and buy a BMW without any traceable link). On the Web, one can get a traceable link from an ad impression to an eventual deal or transaction (through clicking on something, browsing, getting a lead, and finding a prospect).
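
The arithmetic itself is simple once conversions can actually be traced back to the ads; a back-of-the-envelope illustration with invented numbers:

```typescript
// Cost per click and cost per acquisition for a hypothetical listings campaign.

const adSpend = 500_000;   // yen spent on the campaign
const clicks = 20_000;     // clicks on the ad listings
const conversions = 400;   // purchases traced back to those clicks

const cpc = adSpend / clicks;      // 25 yen per click
const cpa = adSpend / conversions; // 1,250 yen per acquired customer
console.log({ cpc, cpa });
```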

She explained that we have to understand why we are inviting customers who watch TV onto the Web: is it for government information, selling products, etc.? The purpose of a 30-second advert may actually be to guide someone to a website where they will read material online for more than five minutes. With traceability, one can compare targeted results and what a customer did depending on whether they came from an offline reference (she didn’t say it, but I presume through a unique URL) or directly online. Web 2.0 is about personalisation, and targeting internet-based ads towards segmented user groups is also important (e.g., using mobile or PC-based social network advertising for female teens in Tokyo; for salarymen travelling between Tokyo and Osaka, it may be better to use ad lists or SMS advertising on mobiles or some other format; and for people at home, it may be appropriate to have a TV ad at a key time at night when there’s a high probability of them going on to carry out a web search for the associated product), so there’s a need to find the best format and media.

She again talked about creating better synergies between offline and online marketing (e.g., between a TV-based ad and an internet-based ad). If a TV ad shows a web address, it can result in nearly 2.5 times more accesses than can be obtained via the Internet alone (depending on the type of products being advertised), so one can attract a lot more people to a website this way. By combining TV and magazines, advertisers can prod / nudge / guide customers to visit their websites. There is still a lot of room for improvement in determining how exactly to guide people to the Web. It depends on what a customer should get from a company, as this will determine the type of information to be sent over the Web and whether giving a good user experience is important (since you don’t want to betray the expectations of users and what they are looking for). Those in charge of brands for websites need to understand how people are getting to a particular web page, as there are so many different entry points to a site.

Ms. Yamanaka referenced an interesting report from comScore about those who pre-shop on the Web spending more in a store. These pre-shoppers spend 41% more in a real store if they have seen internet-based ads for a product (and for every $1 these people spent online, they would spend an incremental $6 in-store).

There’s also a paradigm shift occurring in terms of ubiquitous computing, which is already a common phenomenon here in Japan. At the end of her presentation, she also referenced something called “closed-loop marketing” which I didn’t really get. But I did learn quite a bit about online advertising from this talk.

Web 2.0 Expo Tokyo: Evan Williams, co-founder of Twitter – “In conversation with Tim O’Reilly”

The first talk of the day was a conversation between Tim O’Reilly and Evan Williams.

Evan started off by forming a company in his home state of Nebraska, then moved to work for O’Reilly Media for nine months, but says he never liked working for other people. A little later he formed Pyra, which within a year, in 1999, had Blogger as its main focus. They ran out of money in the dot-com bust, had some dark times, and he had to lay off a team of seven in 2000. He kept it alive for another year and built it back up. Evan then started talks with Google and sold Blogger to them in 2003, continuing to run Blogger at Google for two years. He eventually left Google anyway, partly, he says, because of his own personality (working for others) and partly because within Google, Blogger was a small fish in a big pond. Part of the reason for selling to Google in the first place was that he respected them, it was a good working environment, and they would provide a stable platform for Blogger to grow (eventually without Evan). But in the end, he felt that he’d be happier and more effective outside Google.

So he then went on to start Odeo at Obvious Corp. Because of timing and the fact that they got a lot of attention, they raised a lot of money very easily. He ran Odeo as it was for a year and a half. With Jack Dorsey at Odeo / Obvious, they began the Twitter project. Eventually Evan bought out his investors when he realised Odeo had possibly gotten it wrong as it just didn’t feel right in its current state.

Tim asked Evan what Twitter is and what Web 2.0 trends it shows off. Evan says it’s a simple service described by many as microblogging (a single Twitter message is called a tweet). That is, blogging based on very short updates with the focus on real-time information: “what are you doing?” Those who are interested in what someone is doing can receive updates on the Web or on their mobile. Some people call it “lifestreaming”, according to Tim. Others think it’s just lots of mundane, trivial stuff, e.g. “having toast for breakfast”. It’s interesting not so much because the content is interesting but because you want to find out what someone is doing. Evan gave an example: while a colleague was pulling up dusty carpets in his house, the colleague got a tweet from Evan saying “wine tasting in Napa”, so it’s almost a vision of an “alternative now”. Through Twitter, you can know very minute things about someone’s life: what they’re thinking, that they’re tired, etc. Historically, we have only known that kind of information about the very few people we are close to (or about celebrities!).

The next question from Tim was how do you design a service that starts off as fun but becomes really useful? A lot of people’s first reaction to Twitter is “why would I do that?” But then people try it and find lots of other uses. The motivation (personal expression and social connection) is much the same as for other applications like blogging, according to Evan, and a lot of it comes from the first users of the application. As an example, Twitter didn’t have a system allowing people to comment, so the users invented one by using the @ sign and a username (e.g., @ev) to comment on other people’s tweets (and that convention has now spread to blog comments). People are using it for conversation in ways that weren’t expected. [Personal rant here, in that I find the Twitter comment tracking system to be quite poor. If I check my Twitter replies and look at what someone has supposedly replied to, it’s inaccurate, simply because there is no direct link between a microblog post and a reply. It seems to assume by default that the recipient’s “previous tweet by time” is what a tweet sender is referring to, even when they aren’t referring to anything at all but are just beginning a new thread of discussion with someone else using the @ convention.]
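
Part of why the @ convention could spread without any support from Twitter itself is that it is nothing more than a textual pattern, so any client or bot can pick it out. A tiny illustrative sketch (the regex is mine; Twitter’s real username rules may differ):

```typescript
// Pull @username mentions out of a tweet's text.
function extractMentions(tweet: string): string[] {
  const matches = tweet.match(/@(\w+)/g) ?? [];
  return matches.map((m) => m.slice(1)); // drop the leading "@"
}

extractMentions("@ev nice talk at the Expo, say hi to @jack");
// => ["ev", "jack"]
```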

Tim said that the team did a lot for Twitter in terms of usability by offering an API that enabled services like Twittervision. Evan said that their API has been surprisingly successful: there are at least a dozen desktop applications, others that extract data and present it in different ways, various bots that post information to Twitter (URLs, news, weather, etc.), and more recently a timer application that will send a message at a given time in the future as a reminder (e.g., via the SMS gateway). The key thing with the API is to build a simple service and make it reusable by other applications.
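
For a flavour of what one of those bots involves, here is a deliberately generic sketch: fetch or compute something, trim it to tweet length, and POST it. The endpoint URL, auth scheme and field name are placeholders of my own, not the actual Twitter API of the time.

```typescript
// A toy "post a status" helper for a weather or reminder bot.

const UPDATE_URL = "https://example.invalid/statuses/update"; // placeholder, not Twitter's real endpoint

async function postStatus(status: string, token: string): Promise<void> {
  await fetch(UPDATE_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Bearer ${token}`,
    },
    body: new URLSearchParams({ status: status.slice(0, 140) }), // keep it tweet-sized
  });
}

// e.g. run from a scheduler:
postStatus("Tokyo: 14°C, light rain expected this evening", "my-api-token");
```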

Right now, Twitter doesn’t have a business model: a luxury at this time, since money is plentiful. At some point, Tim said they may have to be acquired by someone who sees a model or feels that they need this feature as part of their offering. Evan said they are going to explore this very soon, but right now they are focussed on building value. A real-time communication network used by millions of people multiple times a day is very valuable, but there is quite a bit of commercial use of Twitter, e.g., Woot (the single special offer item per day site) have a lot of followers on Twitter. It may be in the future that “for this class of use, you have to pay, but for everyone else it’s free”.

20% of Twitter users are in Japan, but they haven’t internationalised the application apart from having double-byte support. Evan says they want to do more, but they are still a small team.

Tim then asked how important it is to have rapid application development for systems like Twitter (which is based on Ruby on Rails). Most of Google’s applications are in Java, C++ and Python, and Evan came out of Google wanting to use a lightweight framework for this kind of development, since there’s a lot of trial and error in creating Web 2.0 applications. With Rails, there are challenges to scaling, and since Twitter is one of the largest Rails applications, there are a lot of problems that have yet to be solved. Twitter’s developers talk to 37signals a lot (and to other developers in the Rails community); incidentally, one of Twitter’s developers has Rails commit privileges.

Tim says there’s a close tie between open source software and Web 2.0. Apparently, it took two weeks to build the first functional prototype of Twitter. There is a huge change in development practice related to Web 2.0: a key part of it is a willingness to fail, since people may not like certain things in a prototype version. One can’t commit everything to a single proposition, but on the flip side, sometimes you may need to persist (as in the case of Blogger: if you believe in your creation and it seems that people like it).

So, that was it. It was an interesting talk, giving an insight into the experiences of a serial Web 2.0 entrepreneur (of four, or was it five, companies). I didn’t learn anything new about Twitter itself or about what they hope to add to their service in the future (apart from the aforementioned commercial opportunities), but it’s great to have people like Evan who seem to have an intuitive grasp of what people find useful in Web 2.0 applications.

Talk by Barney Pell, CTO of Powerset, at ISWC 2007

Barney Pell gave the opening talk of the day at ISWC this morning. Barney is former CEO, now CTO of natural language search company Powerset.

He talked about how natural language (NL) helps the Semantic Web (SW), especially both sides of the chicken-and-egg problem (the chicken AND the egg). On one side, annotations can be created from unstructured text, and ontologies can be generated, mapped and linked. On the other side, NL search can consume SW information, and can expose SW services in response to NL queries.

The goal of Powerset is to enable people to interact with information and services as naturally and effectively as possible, by combining NL and scalable search technology. Natural language search interprets the Web, indexes it, interprets queries, searches and matches.

Historically, search has matched query intents with document intents, and changes in the document model have driven the latest innovations. The first is proximity: there has been a shift from treating documents as a “bag of keywords” to treating them as a “vector of keywords”. The second is anchor text: adding off-page text to search is next.

Documents are loaded with linguistic structure that is mostly discarded and ignored (due to cost and complexity), but it has immense value. A document’s intent is actually encoded in this linguistic structure. Powerset’s semantic indexer extracts meaning from the linguistic structure, and Barney believes that they are just at the start of exciting times in this area.

Converging trends that are enabling this NL search are language technologies, lexical and ontological knowledge resources, Moore’s law, open-source software, and commodity computing.

Powerset integrates diverse resources, e.g. websites, newsfeeds, blogs, archives, metadata (“MetaSearch”), video, and podcasts. It can also do real-time queries to databases, where an NL query is converted into a database query. Barney maintains that results from databases drive further engagement.

He then gave some demos of Powerset. With the example “Sir Edward Heath died from pneumonia”, Barney showed how Powerset parses each sentence; extracts entities and semantic relationships; identifies and expands these to similar entities, relationships and abstractions; and indexes multiple facts for each sentence. He showed an interesting demonstration where multiple queries on the same topic retrieve the same “facts” from Powerset. The information on the various entities and relationships can come from multiple sources: for example, information on Edward Heath or Deng Xiaoping comes from Freebase, and details on pneumonia come from WordNet.
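
To make “indexing multiple facts for each sentence” a little more tangible, here is an illustrative data shape for one such fact. This is my guess at the general form, not Powerset’s actual index format; the sources shown are the ones Barney mentioned.

```typescript
// One extracted fact: a subject-relation-object triple plus provenance.

interface Fact {
  subject: { label: string; source: string };  // entity resolved against, e.g., Freebase
  relation: string;                            // e.g. "died from"
  object: { label: string; source: string };   // e.g. a concept from WordNet
  sentence: string;                            // the text the fact was extracted from
}

const example: Fact = {
  subject: { label: "Edward Heath", source: "Freebase" },
  relation: "died from",
  object: { label: "pneumonia", source: "WordNet" },
  sentence: "Sir Edward Heath died from pneumonia",
};
```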

He gave an example of the search query “Who said something about WMDs?”. This is difficult to express using keyword search: you need to capture that someone “said something” and that it is about weapons of mass destruction. Barney also showed a parse for the famous wrestler / actor Hulk Hogan, with all the relations or “connections” to him (e.g., defeat) and the subjects or “things” that he is related to (e.g., André the Giant).

Powerset’s language technologies are the result of commercialising the XLE work from PARC, leveraging their “multidimensional, multilingual architecture produced from long-term research”. Some of their main challenges are in the areas of scalability, systems integration, incorporating various data and knowledge resources, and enriching the user experience.

He next talked about accelerating the SW ecosystem. Barney said that the wisdom of crowds can help to accelerate the Semantic Web. What starts as a broad platform gets deeper faster when it gets deployed at a large scale, realising a Semantic Web faster than expected. This drive comes from four types of people:

  • The first category is publishers, who upload their ontologies to get more traffic, and can get feedback to help with improving their content.
  • Users are the next group, as they will “play games” to create and improve resources, will provide feedback to get better search, and will create (lightweight, simple) ontologies for personalisation and organising their own groups.
  • There are also developers, who can package knowledge for specialised applications (e.g., for vertical search).
  • Finally, advertisers will want to create and upload ontologies to express all the things that should match their commercial offerings.

For the community, Powerset will provide various APIs and will give access to their technologies to build mashups and other applications. Powerset’s other community contributions are in the form of datasets, annotations, and open-source software.

Their commercial model is based on advertising (like most search engines) and on licensing their technologies to other companies or search engines. Another related company (run by a friend of Barney’s) is True Knowledge.

I’m still waiting for my Powerset Labs account to be approved; looking forward to getting in there and trying it out myself. Thanks to Barney for the great talk.

Brewster Kahle’s (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; somewhat larger with images). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online; 250,000 books have been scanned so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so bookmobiles for print-on-demand books are now coming: a bookmobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At best, there are two to three million discs that have been commercially distributed. The biggest issue with this is in relation to rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive audio files (with 40,000 concerts so far); for “unlimited storage, unlimited bandwidth, forever, for free”, the Internet Archive offers bands their hosting service if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but there are many people who are very interested in having these published online. Audio costs about $10 per disk (per hour) to digitise. The Internet Archive has 100,000 items in 100 collections.
  • Moving images or video came next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items designed for movie theatres, and half of these are Indian! Many are locked up in copyright and are problematic; the Internet Archive has 1,000 of them (out of copyright or otherwise permitted). There are other types of material that people want to see: thousands of archival films, advertisements, training films and government films, which are being downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are 400 “original” television channels (ignoring duplicate rebroadcasts). Recording a television channel for one year requires 10 TB and costs $20,000 for that year. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (currently about 1 PB in size, or 1 million hours of TV), but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections.
  • Software was next. For example, a good archival source is old software that can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt which unfortunately is best known for burning. Libraries also tend to be burned by governments due to changes in policies and interests, so the computer world solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including the power bill, bandwidth and people costs, their total costs are about $3,000,000 per year; 6 GB bandwidth is used per second; their storage hardware costs $700,000 for 1 PB). They have a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Also, their Open Content Alliance initiative allows various people and organisations to come together to create joint collections for all to use.

Access was the next topic of his presentation. Search is making inroads in terms of time-based search: one can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people to deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and help people make sense of them. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring of the wiki world, but also ways to incorporate the structure that we all know is in there, flexible enough for people to add structure one item at a time or to have computers help with the task.

In the recent initiative OpenLibrary.org, the idea is to build one web page for every book ever published (not just ones still for sale), including content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; and making it all open source (both the data and the code, in Python). OpenLibrary.org has 10 million book records, with 250,000 in full text.
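
The “tuple-based database with history” phrase suggests something like the shape below, where every edit becomes a new revision rather than overwriting the record. This is purely a rough guess on my part; OpenLibrary’s own code is in Python and its schema will differ, so the field names here are invented.

```typescript
// A wiki-style book record: the data is a list of (property, value) tuples,
// and every edit is stored as a new revision so history is never lost.

interface Revision {
  revision: number;
  author: string;              // anyone can edit, wiki-style
  timestamp: string;           // ISO 8601
  tuples: [string, string][];  // (property, value) pairs for the book
}

const bookHistory: Revision[] = [
  {
    revision: 1,
    author: "import-bot",
    timestamp: "2007-10-01T12:00:00Z",
    tuples: [["title", "A Sample Book"], ["language", "eng"]],
  },
  {
    revision: 2,
    author: "some-editor",
    timestamp: "2007-11-02T09:30:00Z",
    tuples: [["title", "A Sample Book"], ["language", "eng"], ["pages", "212"]],
  },
];

// The current record is simply the latest revision; older ones stay queryable.
const current = bookHistory[bookHistory.length - 1];
```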

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.

“Product 2.0” competition on the Web 2.0 Expo blog…

The Web 2.0 Expo blog recently launched a “Product 2.0” competition for free tickets to any of the forthcoming Web 2.0 Expo events. Here are my entries:

Broc.co.li is the healthy green vegetable you will WANT to eat! Our Broc.co.li heads contain powerful nanites that can target the mouth’s taste buds and provide a variety of flavours including chocolate, ice cream and steak. (Only available through mass quantity pre-order at our website http://broc.co.li where you can view the most popular flavours and suppliers in your area.)

Are you tired of looking at your own boring life through your own eyes or a battered set of glasses? If so, then you should try out gla.ss.es 2.0, incorporating our new “Social Spectating” (TM) functionality. Simply put on the glasses, select the friends menu using the iris pinpoint focuser, and choose a friend to switch to their glasses view. Don’t worry, you can still interact in your own world if you want to – your friend’s world can be shown in various levels of transparency (up to fully opaque) as per your preferences. Visit our website at http://gla.ss.es where you can choose from a range of frames and lens sizes.

The stapler has been with us since the 1700s, but never before has anyone attempted to make use of the social connections that are formed through this tiny metal object. Now, with staplr (beta), your staple connects you via wireless RFID technology to whoever is currently in possession of the stapled documents. You can view a mashup of your staples on our site showing where they are around the world and who you are now connected to! Never has stapling been so interesting. Sign up now at http://staplr.com/ and we will bind you…

Eco Treadsetters: Yokohama Tyre Corp. gets ‘traction’ with Pringo SNS

In the niche SNS area, Yokohama Tyre Corporation recently implemented the Pringo white-label SNS solution for their “online presence and green marketing initiative” called Eco Treadsetters.

LA-based marketing agency PCGCampbell built the site using Pringo so that Yokohama consumers across the US could communicate with each other, create and navigate custom profiles, form sub-communities, and submit their environmental projects via the Eco Treadsetters site.

Gary Hall, president of Pringo, claims: “As the only provider of both hosted and non-hosted social media solutions, while also offering a complete Internet consultancy, Pringo is a preferred choice among businesses looking to strengthen online branding.” Pringo also provides a third-party tracking system and traffic monetisation features through their service. Their customer base includes media outlets, corporations and marketing agencies such as 11on11.com, CBS Radio and ePharma.