Statistical forecasting is a familiar technique where lots of data points are gathered and assessed. An aggregation is formed and the analysis of that aggregated information is used to predict everything from elections to global warming.
This method, as pioneered by W. Edwards Deming, was used as an industrial tool when he worked to help create a viable manufacturing industry, virtually from scratch, amid the burnt-out wreckage of post-war Japan. Now, nearly all modern industry uses statistical probability in its manufacturing and engineering processes to great effect, as long as precision is not confused with accuracy.
However, statistical approaches that are ideal for maximizing the performance of machines encounter serious issues when it comes to working with humans. That is because even when we are given a highly restricted number of options from which to choose, it doesn’t follow that the optimal decision gets made. This source of unpredictability is by itself not such a big deal when properly trained and competent people are working in a team. But even within clearly defined parameters the possibility always exists that, even with good intentions and sound thinking, decisions can get made which could have disastrous consequences.
In any system where humans are required to make a decision, options and methodologies can vary from person to person. They can vary because of environmental conditions and also because of how the information presents itself.
Technology, through the use of robots, automated systems, and computerization, has at every stage of its growing capability been used to take humans out of the industrial process, but human involvement can never be eliminated.
Even in well-resourced and vast global systems (commercial, political, military and environmental), it is the individual doing the unpredictable thing who can invalidate complex, deeply wrought models of statistical forecasting based on petabytes or more of data.
We know humans are unreliable. We all think it is wise to get a second opinion on important medical matters even if we have faith in the training, ability and competence of the doctor that we are dealing with. So, the question becomes, can we account for our unpredictability in predictable ways?
Michal Kosinski, Operations Director at the Cambridge University Psychometrics Centre, thinks it is possible. In an article soon to be published by the Proceedings of the National Academy of Sciences he suggests that the minor or incomplete digital information that exists about our individual selves, when collated and aggregated with that of hundreds of thousands of people roughly like us, can serve to make accurate predictions about what we will do next.
The testbed used for his theory was Facebook. Using a sample of 58,000 users and measuring only their activity with the Like button, the researchers were able to make predictions of such accuracy that the information, according to the researchers, could be “Worthwhile for advertisers.”
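To make the idea of predicting an individual from "people roughly like us" concrete, here is a minimal sketch in Python. It is not the researchers' method, which this article does not describe. It assumes a made-up population of users, each with a set of Likes and a known yes/no trait, and estimates the trait for a new user from the few users whose Likes overlap most. The names `jaccard`, `predict_trait` and all the sample data are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of Likes: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_trait(target_likes, population, k=3):
    """Estimate the chance a user has a trait from the k most similar users.

    `population` is a list of (likes, has_trait) pairs. This is a toy
    nearest-neighbour estimate, assumed here purely for illustration.
    """
    ranked = sorted(
        population,
        key=lambda person: jaccard(target_likes, person[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(1 for _, has_trait in nearest if has_trait) / len(nearest)


# Invented sample data: (set of Likes, whether the user has the trait).
population = [
    ({"hiking", "camping", "kayaking"}, True),
    ({"hiking", "climbing"}, True),
    ({"camping", "fishing"}, True),
    ({"opera", "chess"}, False),
    ({"chess", "poetry"}, False),
    ({"opera", "poetry", "theatre"}, False),
]

new_user = {"hiking", "camping"}
print(predict_trait(new_user, population))
```

With only two Likes on record, the new user is scored entirely on what similar people turned out to be, which is the sense in which sparse individual data becomes predictive once it is aggregated.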
[NB: When the PNAS article is published Technology Voice will be covering the findings in greater detail.]
But the application of this work goes beyond selling stuff on a social network site. Being able to make predictions in the very area where predictive power is weakest could save us from all sorts of disasters. Apart from natural disasters, nearly all catastrophes can be traced to human error of some sort.
The classic failure of statistical forecasting as a predictive tool was seen in the recent financial meltdown. Financial Services was, and still is, a massively data-ized industry. Yet the gazillion bits of intensely analyzed data and the resulting highly thought-through prognostications were rendered illusory when the economy disappeared.
The idea of econometrics as a trustworthy and reliable tool for policy-makers evaporated along with it. But this attitude only shows a lack of understanding of the limits on what can be gleaned from an approach that is based purely on statistical forecasting. The failure was not in the data but in how people used and interacted with (or ignored) the data.
However, by focusing on what people are likely to do rather than what information on its own does, we can perhaps model intensely complex systems such as economies in a far more realistic and useful manner.
Approaching the same issue from another angle is the research taking place in what are called agent-based models, which in turn have been developed from complexity theory (the theory that enables companies to move things from A to B around the world but often, and counter-intuitively, not by the shortest route). It looks very promising, and the work being done by projects such as CRISIS (Complexity Research Initiative for Systemic InstabilitieS) is beginning to attract wider attention and is receiving major funding.
The idea of developing agent-based models is to predict what human decision-makers in a given system will do. That agent could be a pilot in a plane, a manager tasked with a project or a customer picking up something from the supermarket on the way home from work. All are humans in a system, each bringing their own rationale and temperament to the process.
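As a rough illustration of what an agent-based model looks like, the Python sketch below uses a simple threshold model. It is not the model used by CRISIS or any project named here. It assumes each agent has one personal number, a threshold, and acts only once at least that many agents are already acting. The function name `run_cascade` and the threshold values are invented for the example.

```python
def run_cascade(thresholds):
    """Return how many agents end up acting.

    Each agent acts once the number of agents already acting has
    reached that agent's own threshold. Rounds repeat until nobody
    new joins in.
    """
    acting = 0
    while True:
        now_acting = sum(1 for t in thresholds if t <= acting)
        if now_acting == acting:
            return acting
        acting = now_acting


# 100 agents with thresholds 0, 1, 2, ... 99: each one tips the next.
uniform = list(range(100))
print(run_cascade(uniform))  # 100

# Change a single agent's temperament: threshold 1 becomes 2.
nudged = uniform.copy()
nudged[1] = 2
print(run_cascade(nudged))  # 1
```

The two populations are almost identical in aggregate, yet one produces a full cascade and the other stops at a single agent. That is the kind of outcome a purely statistical summary misses and a model of individual decision-makers can capture.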
Apart from the obvious benefit to marketers, being able to predict how individuals will make decisions in a given situation while working within the confines of a process or a system would help to reduce randomness and increase reliability.
However, it seems we now have the very odd situation where the abstracted you, which has been formulated by the aggregation of data derived in part or in whole from vast numbers of people who are somewhat like you, is more likely to behave like you than the real you. Where’s Douglas Adams when you need him?