## Can Machines Learn Cause and Effect?
*June 6, 2018*

*Posted by Peter Varhol in Algorithms, Machine Learning.*

Tags: artificial intelligence, Bayes Theorem, Judea Pearl, Machine Learning



Judea Pearl is one of the giants of what started as an offshoot of classical statistics but has evolved into the field of machine learning. His foundational contributions deal with Bayesian statistics, in particular prior and conditional probabilities.

If it sounds like a mouthful, it is. Bayes Theorem and its accompanying statistical models are at the same time surprisingly intuitive and mind-blowingly obtuse (at least to me, of course). Bayes Theorem describes the probability of a particular outcome, based on prior knowledge of conditions that might be related to the outcome. Further, we update that probability when we have new information, so it is dynamic.
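That dynamic updating is easier to see in code than in prose. Here is a minimal sketch of a Bayesian update, using hypothetical numbers (a condition with 1% prevalence, and a test that is 90% sensitive and 95% specific — my illustration, not from the post):

```python
# Bayes Theorem: P(H|E) = P(E|H) * P(H) / P(E)
# Hypothetical numbers: a condition with 1% prevalence, and a test
# that is 90% sensitive (true positive) and 95% specific (true negative).

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Update a prior belief given new evidence, via Bayes Theorem."""
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence

# First positive test: update the 1% prior.
p1 = posterior(0.01, 0.90, 0.05)   # roughly 0.154

# The updating is dynamic: a second positive test simply
# uses the first posterior as the new prior.
p2 = posterior(p1, 0.90, 0.05)     # roughly 0.766

print(round(p1, 3), round(p2, 3))
```

The intuitive part is the recipe itself: yesterday's posterior is today's prior. The obtuse part, at least for me, is keeping straight which conditional probability is which.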

So when Judea Pearl talks, I listen carefully. In this interview, he points out that machine learning and AI as practiced today are limited by the techniques we are using. In particular, he claims that neural networks simply “do curve fitting” rather than understand relationships. His goal is for machines to discern cause and effect between variables, that is, “A causes B to happen, B causes C to happen, but C does not cause A or B.” He thinks that Bayesian inference is ultimately a way to do this.

It’s a provocative statement to say that we can teach machines about cause and effect. Cause and effect is a very situational concept, and even most humans stumble over it. For example, does more education cause people to have a higher income? Well, maybe. Or it may be that more intelligence causes a higher income, and more intelligent people also tend to have more education. I’m simply not sure how we would go about training a machine, using only quantitative data, to recognize cause and effect.
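The education/income puzzle is easy to simulate. In the toy model below (my hypothetical setup, not the author's), a single latent trait drives both education and income, and education has no direct effect on income at all — yet the two still correlate strongly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounded model: one latent trait drives BOTH
# education and income; education has no direct effect on income.
intelligence = rng.normal(0, 1, n)
education = intelligence + rng.normal(0, 1, n)
income = intelligence + rng.normal(0, 1, n)

# Education and income correlate strongly anyway (about 0.5 here)...
r = np.corrcoef(education, income)[0, 1]

# ...but once we subtract out the confounder, the
# remaining correlation is essentially zero.
resid_edu = education - intelligence
resid_inc = income - intelligence
r_partial = np.corrcoef(resid_edu, resid_inc)[0, 1]

print(round(r, 2), round(r_partial, 2))
```

Quantitative data alone can’t distinguish this world from one where education really does cause income — unless you already know to measure, and condition on, the confounder.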

As for neural networks being mere curve-fitting, well, okay, in a way. He is correct to point out that what we are doing with these algorithms is not finding Truth, or cause and effect, but rather looking for the best way of expressing a relationship between our data and the outcome produced (or desired, in the case of supervised learning).

All that says is that there is a relationship between the data and the outcome. Is it causal? It’s entirely possible that not even a human knows.

And it’s not at all clear to me that this is what Bayesian inference is saying. In fact, I don’t see anything in any statistical technique that allows us to assume cause and effect. Right now, the closest we come to this in simple correlation is R-squared, which allows us to say how much of the variation in the outcome is “explained” by the data. But “explained” doesn’t mean what you think it means.
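To make that concrete, here is a short sketch of R-squared computed by hand on simulated data (my hypothetical example). Note that the number is symmetric in a sense that causation is not: the same R-squared falls out whether x causes y, y causes x, or both share a common cause.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 1000)
y = 2 * x + rng.normal(0, 1, 1000)   # hypothetical linear relationship

# Fit a least-squares line and compute R-squared by hand.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ss_res / ss_tot          # about 0.8 for this setup

# "Explained" here is purely statistical: R-squared says how much of
# the variance the fitted line accounts for, nothing about causation.
print(round(r_squared, 2))
```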

As for teaching machines cause and effect, I don’t discount it eventually. Human intelligence and free will are an existence proof; we exhibit those characteristics, at least some of the time, so it is not unreasonable to think that machines might someday also do so. That said, it certainly won’t happen in my lifetime.

And then there’s data. We fool ourselves there, too. More on this in the next post.