Polecat: Mining for Meaning with MeaningMine

People now know that if you don’t want the world to know about something, then you shouldn’t post it on the internet. (Sometimes sadly, but sometimes entertainingly, that particular message hasn’t quite reached everyone yet.)

While it is quite possible for individuals to disappear from view online, for most businesses this is neither a desirable nor a worthwhile option. After all, online is where the customers are. However, while a company can control its own postings, it has no control over what others may say in response.

Leaving libel issues aside, these comments, hurtful or not, are a mine of invaluable data. If correctly interpreted and acted upon, they could help a company shape a profitable future, or enable a non-profit to build a more effective presence, in ways that could not otherwise have been anticipated.

Data mining is a set of tools and techniques that enable individuals and organizations to find out how the data they themselves generate is being received, reacted to and given meaning by others. Correctly interpreted, the results can show how a given activity, or series of activities, is being perceived.

Deployed in the fields of business intelligence and analytics, this technology is proving particularly useful and powerful. It can help answer the questions that any organization with an online presence has to ask itself on a regular basis, such as:

Is what we are doing working?
Do people know we exist? One reason for the absence of sales may be the absence of knowledge about your product.
Do people care that we exist? Are customers seeing our product story as being relevant to them?
Is our offer appealing? Would people appreciate a bit more taste in the design and presentation?
Do people like or dislike dealing with us?
Are people interested, but just not quite enough? What would it take to make that sale?

These are just a few of the many questions to which a new breed of specialist companies in the big data space can retrieve meaningful answers. These companies have the technology to measure customer sentiment from many sources, including social media channels, and to provide long-form analytics for business insight.

One such company is Polecat, who are based in Dublin, Ireland. They have an R&D team in Bristol, in the UK, and an analytics team in San Francisco, in the US.

According to John Peavoy, the Head of Sales at Polecat, “The company has done a lot of research into linguistics, machine learning and search algorithms. The team out of Bristol have created an engine that provides very relevant results to any search terms that you provide.”

The company’s information acquisition platform is a program called MeaningMine, which provides visualizations and graphs that describe the health of a conversation: key topics, key phrases, sentiment, and magic quadrants around influencers and their key roles in a conversation.

“These are tools that enterprises will typically find very valuable as a briefing mechanism for the broader organization, informing decisions on how to engage with stakeholders on any particular subject.

“MeaningMine is a browser-based tool that allows you to enter various search terms, manage them, filter them, iterate, and immediately see the results of any term you include in the research.”

John goes on to describe the MeaningMine interface (screenshot above): “You have a Google-like interface on the left. You have six standard visualizations or graphs, which are all customizable, on the right side of the browser page. As you enter more terms into your search, you can see the effect they have on either increasing or reducing your results. It makes the results more representative of what you are trying to find out.

“We have done a lot of work with some industry specific taxonomies around the energy sector, the financial services sector and some of the government sectors such as tourism.”
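MeaningMine’s query engine is not described in any detail, but the effect John mentions, where extra terms either widen or narrow the result set, can be illustrated with a toy keyword filter. This is a minimal sketch, assuming plain whole-word matching over short texts; the `search` function and its `any_of`, `all_of` and `none_of` parameters are hypothetical and are not Polecat’s API.

```python
import re


def tokens(text):
    """Lower-case a document and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search(documents, any_of=(), all_of=(), none_of=()):
    """Filter documents with a simple keyword query (hypothetical helper).

    any_of:  at least one term must appear, so adding terms broadens results
    all_of:  every term must appear, so adding terms narrows results
    none_of: no term may appear, so adding terms narrows results
    """
    any_set = {t.lower() for t in any_of}
    all_set = {t.lower() for t in all_of}
    none_set = {t.lower() for t in none_of}
    results = []
    for doc in documents:
        words = tokens(doc)
        if any_set and not (words & any_set):
            continue  # none of the alternative terms is present
        if not all_set <= words:
            continue  # a required term is missing
        if words & none_set:
            continue  # an excluded term is present
        results.append(doc)
    return results


if __name__ == "__main__":
    articles = [
        "Wind farm plans draw local protest",
        "Offshore wind investment rises",
        "Gas prices rise again",
    ]
    print(len(search(articles, any_of=["wind"])))
    print(len(search(articles, any_of=["wind", "gas"])))
    print(len(search(articles, any_of=["wind"], none_of=["protest"])))
```

Running it shows the result count rising from two to three when an alternative term is added, and dropping to one when an exclusion is added, which is the kind of immediate feedback the interface is described as giving.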

But this is just the start. According to John, “We are also continuing to add visualizations. At the moment we have visualizations around top number of citations, the health of a particular conversation (which is like an advanced sentiment analysis), top organizations, top people, and top phrases and words.

“The next level of visualization, the more advanced visualizations we will be introducing over the next quarter, will include a force by sentiment chart: a graph that shows both positive and negative sentiment along a time axis. It provides a very strong snapshot of how a conversation is evolving or has evolved. It allows you to identify where you may need to engage with the stakeholders.”
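To make the idea of a sentiment timeline concrete, here is a minimal sketch of how positive and negative mentions might be bucketed by day before being plotted. It assumes each mention has already been scored by some upstream sentiment classifier as a number between -1 and +1; the `Mention` type, the `sentiment_timeline` function and the zero threshold are illustrative assumptions, not MeaningMine’s actual method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Mention:
    """One hypothetical piece of conversation, already scored for sentiment."""
    day: date
    score: float  # assumed range: -1.0 (very negative) to +1.0 (very positive)


def sentiment_timeline(mentions):
    """Return a sorted list of (day, positive_count, negative_count) tuples.

    Scores above zero count as positive and scores below zero as negative.
    Neutral mentions (exactly zero) are ignored, so a day with only neutral
    mentions does not appear at all.
    """
    buckets = defaultdict(lambda: [0, 0])
    for m in mentions:
        if m.score > 0:
            buckets[m.day][0] += 1
        elif m.score < 0:
            buckets[m.day][1] += 1
    return [(day, pos, neg) for day, (pos, neg) in sorted(buckets.items())]


if __name__ == "__main__":
    sample = [
        Mention(date(2012, 3, 1), 0.8),
        Mention(date(2012, 3, 1), -0.4),
        Mention(date(2012, 3, 2), -0.9),
        Mention(date(2012, 3, 2), -0.2),
        Mention(date(2012, 3, 2), 0.0),
    ]
    for day, pos, neg in sentiment_timeline(sample):
        # A sudden rise in the negative column is the kind of signal
        # that might prompt engagement with stakeholders.
        print(day, f"+{pos}", f"-{neg}")
```

Plotting the two counts against the date gives the positive/negative-along-a-time-axis view described above; a real system would also weight mentions by reach and use a far better classifier than a fixed sign threshold.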

Gartner estimated the business intelligence and analysis market to be worth over $10 billion in 2010. That figure is derived from the software sales of big players such as SAP, Oracle and IBM. Another guesstimate, based on a broader range of platforms and tools, suggests the market could already have been worth as much as $50 billion two years ago.

Either way, the indicators show that the business intelligence, analytics and data-mining market is very much in the boom stage at the moment.

“The Irish organization has grown from two people at the end of last year to nine right now. We may be up to fifteen by the end of 2012. We are well on target to hitting thirty people in the Dublin office within three years if not sooner.”

Data Mining: Using Predictive Analysis and Social Network Analysis

Data mining is the extraction of information from raw data: the attempt to find hidden patterns within the data and determine what they might mean. Eric Robson leads the Data Mining and Social Networks Analysis Group at the TSSG, which is based at the Waterford Institute of Technology in Ireland. It is a small group, focused mainly on the commercialization of research.

They look at data in two main ways: predictive analysis and social network analysis. The general approach in predictive analysis is to classify and group users, customers or subscribers.
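Grouping customers is often done with clustering. The sketch below uses plain k-means, which is our own choice of technique for illustration; the article does not say which algorithms the TSSG group uses. Each customer is represented by two hypothetical features, visits per month and average basket value.

```python
import math


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm.

    Initial centroids are simply the first k points, which keeps this
    sketch deterministic; real tools use smarter seeding such as k-means++.
    Returns a list of cluster labels, one per point.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average basket value in euro)
    customers = [(2, 15.0), (3, 18.0), (12, 90.0), (10, 85.0)]
    print(kmeans(customers, k=2))
```

Here the two occasional, low-spend shoppers end up in one group and the two frequent, high-spend shoppers in the other. In practice features should be standardized first, since otherwise the larger-valued basket figure dominates the distance calculation.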

Eric explains further, “For instance, a large supermarket has many thousands of customers and many thousands of products to sell. Usually each customer is tracked via their charge card or their club card and we are able to see return visits.

“On day one, a customer might buy bread and some butter. On day three, they buy some more bread but it might not be until day fourteen that they need to buy some more butter. From this simple example we can see how a trend or a purchasing pattern can be determined.”
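Eric’s bread-and-butter example boils down to measuring the gap between repeat purchases of the same product by the same customer. A minimal sketch, assuming purchase history is available as (customer, day, product) records keyed by a club card; the record layout and the `repurchase_intervals` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def repurchase_intervals(purchases):
    """Map (customer, product) -> list of day gaps between consecutive purchases.

    `purchases` is an iterable of (customer_id, day_number, product) tuples,
    e.g. as reconstructed from club-card transactions.
    """
    days = defaultdict(list)
    for customer, day, product in purchases:
        days[(customer, product)].append(day)
    intervals = {}
    for key, seen in days.items():
        seen.sort()
        gaps = [later - earlier for earlier, later in zip(seen, seen[1:])]
        if gaps:  # a single purchase gives no interval yet
            intervals[key] = gaps
    return intervals


if __name__ == "__main__":
    history = [
        ("card-001", 1, "bread"), ("card-001", 1, "butter"),
        ("card-001", 3, "bread"), ("card-001", 14, "butter"),
    ]
    for (customer, product), gaps in repurchase_intervals(history).items():
        print(customer, product, "average gap:", mean(gaps), "days")
```

With more history, the average gap per product becomes the customer’s purchasing pattern, and a predicted next purchase date is simply the last purchase day plus that average.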

In telecoms, another area with a great many users and frequent, variable engagement with the services or products provided, customers are profiled in terms of their usage.

“Once we have a profile we can see when things go wrong or are not working as they should be. It can help us with fault detection or fraud detection.

“One of the major security risks is SIM cloning, where someone gets hold of your SIM card, clones it and then makes calls using your account. Suddenly on your bill you see a whole lot of calls going out to countries you never actually called.”

Knowing what constitutes a normal pattern of behaviour for a given customer allows the system to alert its administrators to unusual or anomalous activity.
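One simple way to turn a “normal pattern of behaviour” into an alert is to compare each new day’s activity against the customer’s own history. The sketch below flags calls to countries the customer has never called before, and a daily call count far above their usual level using a z-score test. The three-standard-deviation threshold, the country codes and the data layout are illustrative assumptions, not the operators’ actual rules.

```python
from statistics import mean, pstdev


def flag_anomalies(history_counts, history_countries, today_calls, z_limit=3.0):
    """Return a list of human-readable alerts for one customer's day of calls.

    history_counts:    daily call counts from the customer's normal past usage
    history_countries: set of country codes the customer has called before
    today_calls:       list of country codes dialled today, one per call
    """
    alerts = []
    # Rule 1: destinations never seen in this customer's profile.
    new_countries = sorted(set(today_calls) - history_countries)
    if new_countries:
        alerts.append("calls to new countries: " + ", ".join(new_countries))
    # Rule 2: call volume far above the customer's usual daily level.
    avg, spread = mean(history_counts), pstdev(history_counts)
    if spread > 0:
        z = (len(today_calls) - avg) / spread
        if z > z_limit:
            alerts.append(f"call volume z-score {z:.1f} exceeds {z_limit}")
    elif len(today_calls) > avg:
        alerts.append("call volume above a perfectly flat history")
    return alerts


if __name__ == "__main__":
    usual_days = [4, 5, 6, 5, 4, 6, 5]
    usual_countries = {"IE", "GB"}
    suspicious_day = ["IE"] * 12 + ["GB"] * 3 + ["TV"] * 5
    for alert in flag_anomalies(usual_days, usual_countries, suspicious_day):
        print("ALERT:", alert)
```

Both rules fire on the cloned-SIM pattern Eric describes, while an ordinary day of five calls to familiar destinations produces no alerts. Real fraud systems learn far richer profiles, but the principle of measuring deviation from a per-customer baseline is the same.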

In a previous article we gave a brief overview of social network analysis, describing it as a way of measuring how we are connected. The Data Mining Group at the TSSG have an interest in making this technology more useful to the general business community.

“In social network analytics, people are constantly passing messages to each other. From a marketing perspective, we can look at who we should be targeting with our viral message for further [propagation]. Who are the biggest distributors of content? It may not necessarily be commercial entities. It could be bloggers, or people with very active Facebook or Twitter accounts.

“In terms of product, we can start identifying who the key influencers are. Say I wanted to sell something like running shoes, and this guy is a marathon runner who blogs about them. If we know that people listen to him, then the running shoe manufacturer can start targeting this guy: ‘Here’s a free pair of running shoes. Tell us what you think of them.’ More importantly, ‘Tell the world what you think of them.’

“If someone is blogging about something, we want to understand exactly what they are blogging about and what their opinion is on that subject. Did they like it or not like it? To what extent did they like it or not like it?”
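A first cut at “who are the biggest distributors of content” is to count how many distinct people pass on each person’s messages. The sketch below builds a directed graph from hypothetical (author, resharer) events and ranks authors by the number of distinct people who reshared them, a simple in-degree centrality measure. This is our illustrative choice; production systems typically use richer measures such as PageRank or betweenness centrality, and all names here are invented.

```python
from collections import defaultdict


def rank_influencers(reshares, top_n=3):
    """Rank authors by how many distinct people reshared their messages.

    `reshares` is an iterable of (author, resharer) pairs, one per observed
    reshare (a retweet, a share, a link from a blog post, and so on).
    Returns up to top_n (author, audience_size) pairs, largest first,
    with ties broken alphabetically.
    """
    audience = defaultdict(set)
    for author, resharer in reshares:
        if author != resharer:  # ignore people resharing themselves
            audience[author].add(resharer)
    ranked = sorted(audience.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(author, len(people)) for author, people in ranked[:top_n]]


if __name__ == "__main__":
    events = [
        ("marathon_blogger", "alice"), ("marathon_blogger", "bob"),
        ("marathon_blogger", "carol"), ("marathon_blogger", "alice"),
        ("shoe_brand", "alice"), ("bob", "carol"),
    ]
    print(rank_influencers(events))
```

In this toy network the marathon blogger, not the brand, reaches the most people, which is exactly the kind of non-commercial distributor the quote suggests targeting. Counting distinct resharers rather than raw events stops one enthusiastic follower from inflating a score.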

These tools tend to be used by large organizations such as supermarket chains and telecoms operators. They are expensive in terms of software and hardware, and expensive in terms of the people needed to operate them.

Eric says, “We decided to see if we could make it applicable to the SME (small and medium-sized enterprise) market. We took these techniques and put them into our cloud-based system.

“We will host the infrastructure, and host the knowledge and techniques that people need, and we will offer it as a pay-as-you-go service.”

This service is not yet live, but the Data Mining and Social Networks Analysis Group are still able to bring their knowledge and expertise to the marketplace. Through an Innovation Partnership set up in conjunction with Enterprise Ireland, they are working with a Dublin-based company called Datafusion International, which writes software for law enforcement agencies such as the Gardai (the Irish police service) and Homeland Security in the US.

Each of these agencies has a number of data sources. They can access records such as the land registry, to determine who owns a property, or the vehicle registry, to determine who owns a car, van or other vehicle. They also have access to revenue records and court transcripts.

These are all discrete sources of data available to the law enforcement agencies. The problem is that each is housed separately in its own department, such as the departments of justice, the departments of transport, the ports authorities and so on, and there is no linkup between them.

Eric explains, “What they said to us was, ‘We have all this data. We would like to try and link people together. We would like to see a social network map of everybody in the system.’

“So we took all that data and started discerning relationships within it. If two people had the same address, we would put a link between them. If they came in on the same flight, we would also be able to indicate that there was a link between them.

“We are using a product called GATE from the University of Sheffield. It is a term extraction engine. We can look at any kind of news article or any piece of free text and it will parse that text. It will tokenize it and break it up into different parts of speech, in terms of what’s a noun and what’s a verb. But, more importantly, it will identify the names mentioned in the article and who they are mentioned in relation to.”
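The linking rule Eric describes, a link whenever two people share an address or arrived on the same flight, can be sketched as grouping records by shared attribute values and connecting everyone in each group. The record fields and sample data are hypothetical; Datafusion’s actual schema and matching logic are not described in the article.

```python
from collections import defaultdict
from itertools import combinations


def link_people(records, keys=("address", "flight")):
    """Build an undirected link map: person -> set of linked people.

    Two people are linked if they share a value for any attribute in `keys`.
    `records` is a list of dicts with a "name" field plus attribute fields;
    missing attributes are skipped, and people with no links are omitted.
    """
    by_value = defaultdict(set)
    for rec in records:
        for key in keys:
            value = rec.get(key)
            if value:
                by_value[(key, value)].add(rec["name"])
    links = defaultdict(set)
    for people in by_value.values():
        for a, b in combinations(sorted(people), 2):
            links[a].add(b)
            links[b].add(a)
    return dict(links)


if __name__ == "__main__":
    records = [
        {"name": "Person A", "address": "1 Main St", "flight": "XY123"},
        {"name": "Person B", "address": "1 Main St"},
        {"name": "Person C", "flight": "XY123"},
        {"name": "Person D", "address": "9 Quay Rd"},
    ]
    for person, linked in sorted(link_people(records).items()):
        print(person, "->", sorted(linked))
```

Person A ends up at the centre of the map, linked to B by address and to C by flight, while D has no links. A real system would key flights on number and date together, and would keep the reason for each link so an analyst can see why two people are connected.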

“We don’t use linked data technology as yet, but we do use fuzzy logic. The software is designed to be used by trained individuals within the various law enforcement agencies. Although the program can identify different people, or the same person in different places, there will be a human involved in the process of checking and verifying identities.”
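The article does not say how fuzzy logic is applied. One common, simpler technique for spotting that two differently spelled records may refer to the same person is fuzzy string similarity; note that this is fuzzy matching rather than fuzzy logic in the formal sense. The sketch below uses `difflib.SequenceMatcher` from Python’s standard library and only proposes candidate matches, leaving the decision to a human reviewer as the article describes. The 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case a name and collapse whitespace so trivial differences vanish."""
    return " ".join(name.lower().split())


def candidate_matches(names, threshold=0.85):
    """Return (name_a, name_b, score) pairs similar enough to review by hand.

    Scores come from SequenceMatcher.ratio(), which is 1.0 for identical
    strings and falls towards 0.0 as they diverge. Nothing is merged
    automatically: the output is a queue for a trained analyst to verify.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


if __name__ == "__main__":
    names = ["Sean O'Brien", "Sean OBrien", "Mary Walsh"]
    for a, b, score in candidate_matches(names):
        print(f"review: {a!r} vs {b!r} (similarity {score})")
```

Only the two spellings of the same name are queued for review. Comparing every pair is quadratic, so larger systems first group records by a cheap key (for example, date of birth) before scoring.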