Data Mining: Using Predictive Analysis and Social Network Analysis

Data mining is the extraction of information from raw data. It describes the attempt to find hidden patterns within the data and determine what they might mean. Eric Robson leads the Data Mining and Social Networks Analysis Group at the TSSG, which is based at the Waterford Institute of Technology in Ireland. It is a small group that concentrates mainly on the commercialization of research.

They look at data in two main ways: predictive analysis and social network analysis. The general approach in predictive analysis is the classification and grouping of users, customers or subscribers.

These tools tend to be used by large organizations such as supermarket chains and telecoms operators. They are expensive in terms of software and hardware, and in terms of the people needed to operate them.

Eric explains further, “For instance, a large supermarket has many thousands of customers and many thousands of products to sell. Usually each customer is tracked via their charge card or their club card and we are able to see return visits.

“On day one, a customer might buy bread and some butter. On day three, they buy some more bread but it might not be until day fourteen that they need to buy some more butter. From this simple example we can see how a trend or a purchasing pattern can be determined.”
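The bread-and-butter example can be sketched in a few lines of Python. This is only an illustration of the idea, not the group's actual method: the function name `repurchase_intervals` and the `(day, product)` record layout are assumptions made for the example, and the data is the shopping pattern Eric describes.

```python
from collections import defaultdict


def repurchase_intervals(purchases):
    """Return the average gap in days between repeat purchases of each product.

    `purchases` is a list of (day, product) pairs for one customer
    (a hypothetical record layout chosen for this sketch).
    """
    days_by_product = defaultdict(list)
    for day, product in purchases:
        days_by_product[product].append(day)

    intervals = {}
    for product, days in days_by_product.items():
        days.sort()
        # Gap between each purchase and the next one of the same product.
        gaps = [later - earlier for earlier, later in zip(days, days[1:])]
        if gaps:  # products bought only once have no interval yet
            intervals[product] = sum(gaps) / len(gaps)
    return intervals


# Bread on days 1 and 3, butter on days 1 and 14, as in the example above.
purchases = [(1, "bread"), (1, "butter"), (3, "bread"), (14, "butter")]
print(repurchase_intervals(purchases))
```

With thousands of customers and products the same calculation would run per club card, which is where the cost of hardware and skilled operators starts to matter.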

In telecoms, another area where there are a great many users and frequent and variable engagement with the services or products provided, customers are profiled in terms of their usage.

“Once we have a profile we can see when things go wrong or are not working as they should be. It can help us with fault detection or fraud detection.

“One of the major security risks is SIM cloning, where someone can get hold of your SIM card, clone it, and then make calls using your account. Suddenly on your bill you see a whole lot of calls going out to countries you never actually called.”

Knowing what constitutes a normal pattern of behaviour for a given customer allows the system to alert its administrators to unusual or anomalous activity.
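As a rough sketch of what checking new activity against a profile might look like, the Python below counts the destination countries a customer has called in the past and flags new calls to destinations that fall outside that history. The function names, the `country` field and the `min_seen` threshold are all assumptions made for illustration; a production system would profile far more than destinations.

```python
from collections import Counter


def build_profile(history):
    """Count how often the customer has called each destination country."""
    return Counter(call["country"] for call in history)


def flag_unusual_calls(profile, new_calls, min_seen=1):
    """Return new calls to destinations seen fewer than `min_seen` times before.

    A Counter returns 0 for a country that never appears in the history,
    so calls to never-before-seen destinations are always flagged.
    """
    return [call for call in new_calls if profile[call["country"]] < min_seen]


# Hypothetical call records; "XX" stands in for a country never called before.
history = [{"country": "IE"}, {"country": "IE"}, {"country": "GB"}]
new_calls = [{"country": "IE"}, {"country": "XX"}]

profile = build_profile(history)
for call in flag_unusual_calls(profile, new_calls):
    print("Unusual destination:", call["country"])
```

Flagged calls would go to an administrator for review rather than being blocked automatically, since a customer may simply have started calling somewhere new.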

In a previous article we gave a brief overview of social network analysis. We described it as a way of measuring how we are connected. The Data Mining Group at the TSSG have an interest in making this technology more useful to the general business community.

“In social network analytics people are constantly passing messages to each other. From a marketing perspective we can look at who we should be targeting to send our viral message out to for further propagation. Who are the biggest distributors of content? It may not necessarily be commercial entities. It could be bloggers, people with very active Facebook accounts, people with very active Twitter accounts.

“In terms of product, we can start identifying who are the key influencers. Say, I wanted to sell something like running shoes and this guy is a marathon runner and blogs about them. If we know that people listen to him then the running shoe manufacturer can start targeting this guy. ‘Here’s a free pair of running shoes. Tell us what you think of them.’ More importantly, ‘Tell the world what you think of them.’

“If someone is blogging about something we want to understand exactly what they are blogging about and what their opinion was on that subject. Did they like it or not like it? To what extent did they like it or not like it?”
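The question of who the biggest distributors of content are can be approximated very simply by counting how many distinct people each person passes messages to. The sketch below does that in Python. It is an assumption-laden illustration, not the group's tooling: the `(sender, receiver)` pairs and the names in the sample data are invented, and real influence measures weigh much more than raw reach.

```python
from collections import defaultdict


def rank_distributors(messages):
    """Rank senders by the number of distinct people they pass messages to.

    `messages` is a list of (sender, receiver) pairs. Ties are broken
    alphabetically so the ordering is stable.
    """
    reach = defaultdict(set)
    for sender, receiver in messages:
        reach[sender].add(receiver)
    return sorted(reach, key=lambda person: (-len(reach[person]), person))


# Invented sample data: the marathon blogger reaches the most people.
messages = [
    ("marathon_blogger", "anna"),
    ("marathon_blogger", "brian"),
    ("marathon_blogger", "ciara"),
    ("anna", "brian"),
    ("brian", "anna"),
    ("brian", "ciara"),
]
print(rank_distributors(messages))
```

The person at the top of the list is the one a running shoe manufacturer might approach first with a free pair of shoes.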

Until now, the cost of the software, the hardware and the skilled people needed to operate these systems has kept such tools in the hands of large organizations such as supermarket chains and telecoms operators.

Eric says, “We decided to see if we could make it applicable to the small and medium-sized enterprise (SME) market. We took these techniques and put them into our cloud-based system.

“We will host the infrastructure and the knowledge and techniques that people need, and we will put it up as a pay-as-you-go service.”

As yet, this service is not live, but the Data Mining and Social Networks Analysis Group are still able to bring their knowledge and expertise to the marketplace. In an arrangement called an Innovation Partnership, set up in conjunction with Enterprise Ireland, they are working with a Dublin-based company called Datafusion International. Datafusion writes software for law enforcement agencies such as the Gardai (the Irish police service) and Homeland Security in the US.

Each of these agencies has a number of data sources. They are able to access data from such things as the land registry to determine the owner of a property or the vehicle registry to determine ownership of a car, van or some other vehicle. Also, they have access to revenue records and court transcripts.

These are all discrete sources of data that the law enforcement agencies can draw on. The problem is that they are all housed separately in their own departments, such as the departments of justice, the departments of transport, the ports authorities and so on, and there is no linkup between them.

Eric explains, “What they said to us was, ‘We have all this data. We would like to try and link people together. We would like to see a social network map of everybody in the system.’

“So, we took all that data and started discerning relationships between the people in it. If two people had the same address we would put a link between them. If they came in on the same flight we would also be able to indicate that there was a link between them.

“We are using a product called GATE from the University of Sheffield. It is a term extraction engine. We can look at any kind of news article or any piece of free text and it will parse that text. It will tokenize it and break it up into different parts of speech in terms of what’s a noun and what’s a verb. But more importantly it will identify the names mentioned in the article and who they are mentioned in relation to.

“We don’t use linked data technology as yet, but we do use fuzzy logic. The software is designed to be used by trained individuals within the various law enforcement agencies. Although the program can identify different persons, or the same person in different places, there will be a human presence involved in the process of checking and verifying identities.”
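To make the linking rules concrete, here is a minimal Python sketch of the idea: put a link between two records that share an address or a flight, and flag pairs whose names are close enough to be the same person. Everything specific here is an assumption for illustration. The record fields, the sample people and the 0.85 threshold are invented, and approximate string matching from Python's standard `difflib` module stands in for whatever fuzzy logic the group actually uses.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Return a 0..1 similarity score between two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_links(records, name_threshold=0.85):
    """Return (id, id, reasons) tuples for every pair of linked records."""
    links = []
    for left, right in combinations(records, 2):
        reasons = []
        # Missing values (None) never count as a shared attribute.
        if left.get("address") and left.get("address") == right.get("address"):
            reasons.append("same address")
        if left.get("flight") and left.get("flight") == right.get("flight"):
            reasons.append("same flight")
        if name_similarity(left["name"], right["name"]) >= name_threshold:
            reasons.append("possible same person")
        if reasons:
            links.append((left["id"], right["id"], reasons))
    return links


# Invented records standing in for entries from separate data sources.
records = [
    {"id": 1, "name": "John Smith", "address": "1 Main St", "flight": "EI123"},
    {"id": 2, "name": "Mary Smith", "address": "1 Main St", "flight": None},
    {"id": 3, "name": "Jon Smith", "address": "5 High St", "flight": "EI123"},
    {"id": 4, "name": "Ann Other", "address": "9 Low Rd", "flight": None},
]
for link in build_links(records):
    print(link)
```

The "possible same person" label is deliberately tentative. As Eric says, a trained person still has to check and verify identities, so the sketch only proposes links for review.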


9 thoughts on “Data Mining: Using Predictive Analysis and Social Network Analysis”

  1. Great article, very interesting. I’m in this area, going to look these guys up… but if they pop in I hope they’ll post their home page url or Twitter handle or something?


  2. Excellent article. As our company is currently building a global network that links official company registers together, the whole area of data mining is very interesting. It could help identify cross shareholdings, cross directorships, patterns of company closures etc etc.


  3. There is a lot of ambiguity about the differences between Data Mining and Machine Learning (you could also throw Knowledge Discovery, Predictive Analytics and, to a lesser extent, Statistics into the terminology mixture), with many definitions suggesting they are the same or at least have a significant amount of overlap. However, the commonly accepted difference is that Machine Learning works with a sample of data and some background (domain) knowledge to develop and train its algorithms, while Data Mining starts with a large data set already in place in order to train its algorithms.


  4. This article and description are amazing. My graduation project is in this area. My question is: what do I need to build an SNA tool? Do I need machine learning, in which I take a sample and then train on the data, or data mining, where I start with a large amount of data?

