Society faces many challenges, from individual health to global financial crises, food shortages, disease outbreaks, and ethnic violence. These issues are highly complex. Our trouble addressing them is rooted in a basic mathematical problem: How can we identify the important information, the information that determines effective interventions, in a large amount of data?
We have more data about our society than ever before, but the path from big data to actionable solutions is not straightforward. “From big data to important information,” a paper summarizing research at the New England Complex Systems Institute, provides a framework that shows what the problem is and how to make real progress in solving these challenges.
What is the scientific trajectory for taking a sick person or a struggling country’s economy to a state of health or stability? It would seem that the right approach is straightforward, if difficult. Study the available data and build a model. Simulate the model, predict the effects of interventions, and voilà, the problem is solved. This approach seems reasonable, but it has a hidden flaw: it works only if we identify the right properties to study.
What if we simply keep collecting more and more detail about a system? Wouldn’t the model eventually be right? The problem is that the data is never-ending; we can’t get all of the microscopic information about a system, even a small one.
NECSI’s solution to this challenge comes from a new mathematical approach to describing systems developed originally in physics, specifically the “renormalization group” study of phase transitions. Consider the boiling point of water. The transition causes a discontinuous change in density. If we increase the pressure, the boiling temperature also increases, and the discontinuity in the density decreases until it disappears. At that point, called a second-order phase transition point, the system has a behavior that cannot be described by the density change itself. Instead, there are fluctuations of density across the material. To handle these mathematically, Ken Wilson developed the renormalization group, which treats the behavior as a function of scale.
NECSI’s paper extends these concepts to a generalized multiscale information theory. In this approach, the data or information itself is given a specific scale. Different properties of a complex system affect its behavior at different scales. When we wish to change the behavior of a real-world system, we usually want to affect the largest scale. It follows that the properties most important to our models would be those that can be seen on the largest scale.
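The idea that different properties show up at different scales can be illustrated with a toy calculation. The sketch below is not the formalism of the NECSI paper; it is a minimal illustration that assumes a made-up one-dimensional signal (a large-scale step plus fine-scale random noise) and uses the variance of block averages as a rough stand-in for "how much variation is visible at a given scale." The names `block_average`, `variance_by_scale`, and `make_signal` are hypothetical helpers introduced only for this example.

```python
import random
import statistics


def block_average(values, block_size):
    """Coarse-grain a sequence by averaging non-overlapping blocks."""
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


def variance_by_scale(values, block_sizes):
    """Variance that remains after coarse-graining at each block size."""
    return {
        b: statistics.pvariance(block_average(values, b))
        for b in block_sizes
    }


def make_signal(n=4096, seed=0):
    """Synthetic data (an assumption of this sketch, not real data).

    Large-scale property: the first half sits near -1, the second near +1.
    Small-scale detail: independent Gaussian noise added at every point.
    """
    rng = random.Random(seed)
    half = n // 2
    return [
        (-1.0 if i < half else 1.0) + rng.gauss(0.0, 3.0)
        for i in range(n)
    ]


signal = make_signal()
scales = [1, 4, 16, 64, 256]
result = variance_by_scale(signal, scales)

for b in scales:
    print(f"block size {b:>3}: variance {result[b]:.3f}")
```

At the finest scale the noise dominates the variance. As the block size grows, the noise averages away while the contribution of the large-scale step remains, so the variance that survives at the coarsest scales is mostly due to the one property that matters at that scale.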
This approach is a massive simplification. Rather than studying all the data and all the chains of cause and effect that influence the behavior of a system from the molecular scale up, we can focus only on the specific information that matters. The approach does require its own mathematical treatment, which can be challenging to implement in practice. When the effort is made, however, multiscale information theory can provide detailed guidance about which interventions can lead to the changes we want to make in human health or society.
NECSI has successfully applied these methods to a number of real-world problems. An analysis demonstrated that the Arab Spring was precipitated by rising food prices. Those prices were in turn driven up by corn being converted to ethanol and agricultural market speculation in the United States. Another analysis found the biggest predictor of violence in many countries is the geographic distribution of ethnic groups.
Before the West African outbreak, we correctly predicted that a dramatically larger outbreak of Ebola would occur. Then we correctly predicted that door-to-door health screenings would rapidly stop the virus from spreading. In all these cases, our approach untangled the complex web of cause and effect to find the most important properties and levers for solving some of society’s biggest challenges.
The study, “From big data to important information,” was recently published by Yaneer Bar-Yam in the journal Complexity.