Big data is already a lot like magic. Based on your spending habits, Mastercard are supposed to know that you are getting a divorce before you do; computers are better at predicting good vintages than wine experts; and Facebook and Google probably know you better than your own mother. Now the data competition site Kaggle is about to get meta on us in an attempt to take data science to a whole new level. They want computers to do the science for us. In particular, they have launched a contest to find an algorithm for working out what causes what. Here’s the challenge:
Cause-Effect Pairs Competition
Given samples from a pair of variables A, B, find whether A is a cause of B.
As is known, "correlation does not mean causation." More generally, observing a statistical dependency between A and B does not imply that A causes B or that B causes A; A and B could be consequences of a common cause. But is it possible to determine from the joint observation of samples of two variables A and B that A should be a cause of B? There are new algorithms that have appeared in the literature in the past few years that tackle this problem. This challenge is an opportunity to evaluate them and propose new techniques to improve on them.
We provide hundreds of pairs of real variables with known causal relationships from domains as diverse as chemistry, climatology, ecology, economy, engineering, epidemiology, genomics, medicine, physics, and sociology. Those are intermixed with controls (pairs of independent variables and pairs of variables that are dependent but not causally related) and semi-artificial cause-effect pairs (real variables mixed in various ways to produce a given outcome).
via Kaggle.
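To make the A-versus-B question a bit more concrete, here is a minimal sketch of one idea from the literature the challenge alludes to: the additive noise heuristic. It fits a curve in each direction and asks in which direction the leftover residuals look independent of the supposed cause. This is not Kaggle's scoring method or any entrant's solution. The function names (`hsic`, `residual_dependence`, `guess_direction`), the cubic polynomial fit, the kernel width and the toy data are all assumptions made for illustration. It also only chooses between "A causes B" and "B causes A", ignoring the independent and common-cause cases the contest includes.

```python
import numpy as np

def _standardize(v):
    """Rescale to zero mean and unit variance so scores are comparable."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def hsic(a, b, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels.

    Close to 0 when a and b are independent, larger when they are dependent.
    The kernel width sigma=1.0 is an arbitrary illustrative choice.
    """
    a, b = _standardize(a), _standardize(b)
    n = len(a)
    K = np.exp(-(a[:, None] - a[None, :]) ** 2 / (2 * sigma**2))
    L = np.exp(-(b[:, None] - b[None, :]) ** 2 / (2 * sigma**2))
    H = np.eye(n) - 1.0 / n  # centring matrix I - (1/n) * ones
    return float(np.trace(K @ H @ L @ H)) / n**2

def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial (an assumption),
    then measure how strongly the residuals still depend on the cause."""
    cause, effect = _standardize(cause), _standardize(effect)
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

def guess_direction(a, b):
    """Prefer the direction whose residuals look more independent of the input."""
    forward = residual_dependence(a, b)   # hypothesis: A causes B
    backward = residual_dependence(b, a)  # hypothesis: B causes A
    return "A->B" if forward < backward else "B->A"

# Toy data in which A really does drive B through a nonlinear function.
rng = np.random.default_rng(0)
a = rng.uniform(-2, 2, 500)
b = a**3 + rng.normal(0, 1, 500)
print(guess_direction(a, b))
```

The trick only works when the relationship is nonlinear or the noise is non-Gaussian. For a straight line with Gaussian noise the two directions look the same, which is part of why the problem is hard enough to warrant a contest.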
That may sound a little dry to you, but it is actually dangerously revolutionary. Science is all about measuring what can be measured and finding signals in the noise. The more data you have, the more chance you have of finding patterns, but the harder you have to search. An automated pattern classifier has the potential to make science move a lot, lot faster. It might even break it.
Finding patterns is one half of science. Explaining them is the other. With a clever computer algorithm scouring huge datasets and throwing up causal relationships left, right and centre, we will most likely lose the ability to keep up.
So this is just the sort of thing that might accidentally move science beyond human understanding. That may sound like a contradiction in terms, since science is the very process of understanding the world, but that’s the thing about revolutions: they completely change your perspective and your understanding. In this case, though, it may just highlight the limits of our understanding.