Data mining is the extraction of information from raw data. It describes the attempt to find hidden patterns within the data and determine what they might mean. Eric Robson leads the Data Mining and Social Networks Analysis Group at the TSSG, which is based at the Waterford Institute of Technology in Ireland. It is a small group that concentrates mainly on the commercialization of research.
They look at data in two main ways: predictive analysis and social network analysis. Predictive analysis generally works by classifying and grouping users, customers or subscribers.
These tools tend to be used by large organizations such as supermarket chains and telecoms operators. They are expensive in terms of software and hardware, and in terms of the people needed to operate these systems.
Eric explains further, “For instance, a large supermarket has many thousands of customers and many thousands of products to sell. Usually each customer is tracked via their charge card or their club card and we are able to see return visits.
“On day one, a customer might buy bread and some butter. On day three, they buy some more bread but it might not be until day fourteen that they need to buy some more butter. From this simple example we can see how a trend or a purchasing pattern can be determined.”
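The bread-and-butter example can be sketched in a few lines of Python. This is only an illustration of the idea, not the group's software: the log format, the card number and the function name `repurchase_intervals` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical club-card log: (customer_id, day_number, product).
purchases = [
    ("card-001", 1, "bread"),
    ("card-001", 1, "butter"),
    ("card-001", 3, "bread"),
    ("card-001", 14, "butter"),
]

def repurchase_intervals(log):
    """Return the average gap in days between repeat purchases,
    keyed by (customer, product). Items bought only once are omitted."""
    days = defaultdict(list)
    for customer, day, product in log:
        days[(customer, product)].append(day)

    intervals = {}
    for key, visit_days in days.items():
        visit_days.sort()
        # Gap between each purchase and the next one of the same product.
        gaps = [later - earlier for earlier, later in zip(visit_days, visit_days[1:])]
        if gaps:
            intervals[key] = mean(gaps)
    return intervals

intervals = repurchase_intervals(purchases)
```

With the four purchases above, bread comes out at a two-day interval and butter at thirteen days. A real system would need far more history per customer before treating such a gap as a pattern.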
Telecoms is another area with a great many users and frequent, variable engagement with the services or products provided. Here, customers are profiled in terms of their usage.
“Once we have a profile we can see when things go wrong or are not working as they should be. It can help us with fault detection or fraud detection.
“One of the major security risks is SIM-cloning, where someone can get hold of your SIM card, clone it, and then make calls using your account. Suddenly on your bill you see a whole lot of calls going out to countries you never actually called.”
Knowing what constitutes a normal pattern of behaviour for a given customer allows the system to alert its administrators to unusual or anomalous activity.
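A very rough sketch of that idea, assuming the profile is nothing more than a count of the countries a subscriber has called before. The record layout, the placeholder country codes "XX" and "YY", and the function names are hypothetical. Operators' real profiles would draw on many more signals.

```python
from collections import Counter

def build_profile(call_history):
    """Count how often the subscriber has called each destination country."""
    return Counter(country for country, _minutes in call_history)

def unusual_calls(profile, new_calls, min_seen=1):
    """Return new calls to countries seen fewer than min_seen times before."""
    # Counter returns 0 for countries that never appear in the history.
    return [call for call in new_calls if profile[call[0]] < min_seen]

# Hypothetical call records: (destination_country, minutes).
history = [("IE", 5), ("IE", 12), ("GB", 3), ("IE", 7)]
latest = [("IE", 4), ("XX", 30), ("YY", 45)]

profile = build_profile(history)
alerts = unusual_calls(profile, latest)
```

Here the two calls to destinations absent from the subscriber's history are flagged, while the call to a familiar country passes. Raising `min_seen` makes the check stricter.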
In a previous article we gave a brief overview of social network analysis. We described it as a way of measuring how we are connected. The Data Mining Group at the TSSG have an interest in making this technology more useful to the general business community.
“In social network analytics people are constantly passing messages to each other. From a marketing perspective we can look at who we should be targeting to send our viral message out to for further propagation. Who are the biggest distributors of content? It may not necessarily be commercial entities. It could be bloggers, people with very active Facebook accounts, people with very active Twitter accounts.
“In terms of product, we can start identifying who are the key influencers. Say, I wanted to sell something like running shoes and this guy is a marathon runner and blogs about them. If we know that people listen to him then the running shoe manufacturer can start targeting this guy. ‘Here’s a free pair of running shoes. Tell us what you think of them.’ More importantly, ‘Tell the world what you think of them.’
“If someone is blogging about something we want to understand exactly what they are blogging about and what their opinion was on that subject. Did they like it or not like it? To what extent did they like it or not like it?”
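The question of who the biggest distributors of content are can be illustrated with a simple count. The sketch below assumes we already have a list of who passed a message to whom, and ranks people by the number of distinct recipients they reached. That measure is a crude stand-in chosen for the example. The article does not say which measures the group actually uses, and the names in the data are invented.

```python
from collections import defaultdict

def top_distributors(shares, n=3):
    """Rank people by how many distinct recipients they passed content to."""
    reach = defaultdict(set)
    for sender, recipient in shares:
        reach[sender].add(recipient)
    # Sort by reach (largest first), then by name so ties are stable.
    ranked = sorted(reach.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(person, len(recipients)) for person, recipients in ranked[:n]]

# Hypothetical message passes: (sender, recipient).
shares = [
    ("marathon_blogger", "ann"),
    ("marathon_blogger", "bob"),
    ("marathon_blogger", "cat"),
    ("ann", "dan"),
    ("shoe_brand", "marathon_blogger"),
    ("ann", "bob"),
]

top = top_distributors(shares, n=2)
```

In this toy data the blogger reaches more people than the brand itself does, which is the point Eric makes about influencers not necessarily being commercial entities.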
As noted, these tools tend to be used by large organizations, because the software, the hardware and the skilled people needed to run them are all expensive.
Eric says, “We decided to see if we could make it applicable to the SME, small and medium-sized enterprise, market. We took these techniques and put them into our cloud-based system.
“We will host the infrastructure and host the knowledge and techniques that people need and we will put it up as a pay-as-you-go service.”
As yet, this service is not live but the Data Mining and Social Networks Analysis Group are still able to bring their knowledge and expertise to the marketplace. In an arrangement called an Innovation Partnership, set up in conjunction with Enterprise Ireland, they are working with a Dublin-based company called Datafusion International. Datafusion writes software for law enforcement agencies such as the Gardai (the Irish police service) and Homeland Security in the US.
Each of these agencies has a number of data sources. They are able to access data from such things as the land registry to determine the owner of a property or the vehicle registry to determine ownership of a car, van or some other vehicle. Also, they have access to revenue records and court transcripts.
These are all discrete sources of data that the law enforcement agencies can draw on. The problem is that they are housed separately in their own departments, such as the departments of justice, the departments of transport, the ports authorities and so on, and there is no linkup between them.
Eric explains, “What they said to us was, ‘We have all this data. We would like to try and link people together. We would like to see a social network map of everybody in the system.’
“So, we took all that data and started discerning relationships between them. If two people had the same address we would put a link between them. If they came in on the same flight we would also be able to indicate that there was a link between them.
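The linking rule Eric describes, a shared address or a shared flight, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(person, attribute, value)`, the sample people and the function name `build_links` are invented for the example and do not describe Datafusion's software.

```python
from collections import defaultdict
from itertools import combinations

def build_links(records):
    """Link every pair of people who share a value for the same attribute.

    records: iterable of (person, attribute, value).
    Returns {(person_a, person_b): {attributes they share}}.
    """
    groups = defaultdict(set)
    for person, attribute, value in records:
        groups[(attribute, value)].add(person)

    links = defaultdict(set)
    for (attribute, _value), people in groups.items():
        # Sorting gives each pair one canonical ordering.
        for a, b in combinations(sorted(people), 2):
            links[(a, b)].add(attribute)
    return dict(links)

# Hypothetical records pulled from separate registries.
records = [
    ("Person A", "address", "12 Example Street"),
    ("Person B", "address", "12 Example Street"),
    ("Person B", "flight", "XX123 2010-01-01"),
    ("Person C", "flight", "XX123 2010-01-01"),
    ("Person D", "address", "3 Other Road"),
]

links = build_links(records)
```

Person A and Person B end up linked by address and Person B and Person C by flight, so B becomes the bridge between two people who otherwise never appear together. Person D has no links at all.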
“We are using a product called GATE from the University of Sheffield. It is a term extraction engine. We can look at any kind of news article or any piece of free text and it will parse that text. It will tokenize it and break it up into different parts of speech in terms of what’s a noun and what’s a verb. But more importantly it will identify the names mentioned in the article and who they are mentioned in relation to.
“We don’t use linked data technology as yet but we do use fuzzy logic. The software is designed to be used by trained individuals within the various law enforcement agencies. Although the program can identify different persons or the same person in different places there will be a human presence involved in the process of checking and verifying identities.”
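To give a flavour of how software might suggest that two records refer to the same person while leaving the decision to a human, here is a small sketch. It uses approximate string matching from Python's standard `difflib` module as a stand-in. The article does not describe how the group's fuzzy logic works, and the names, the 0.85 threshold and the function names are assumptions for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-to-1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_matches(names, threshold=0.85):
    """Return pairs of names similar enough to queue for human review."""
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = similarity(first, second)
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs

# Hypothetical names drawn from different data sources.
names = ["John Murphy", "Jon Murphy", "Mary Byrne"]
for_review = candidate_matches(names)
```

Only the two spellings of the first name are paired up. The result is a list of candidates for an analyst to check rather than an automatic merge, in keeping with the human verification step Eric describes.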