Judea Pearl on the Future of AI, LLMs, and the Need for Causal Reasoning
At the recent Causal AI Conference in San Francisco, attendees were treated to a special premiere: a pre-recorded fireside chat with Judea Pearl, a true pioneer in the field of causal inference and artificial intelligence. This wasn’t just any recorded interview. Darko Matovski, the founder and CEO of causaLens, had personally traveled to Pearl’s office in Los Angeles to meet Judea and bring the insights of this legendary figure directly to the conference audience.
Awarded the Turing Award in 2011, Pearl has made groundbreaking contributions to artificial intelligence through his development of a calculus for probabilistic and causal reasoning. His book, “The Book of Why: The New Science of Cause and Effect,” has become something of a bible at causaLens and has inspired countless researchers and practitioners to think causally.
In this candid conversation with Darko, Pearl addresses some of the most critical questions in AI today. He explores the limitations of deep learning, the transformative potential of causal reasoning, and his vision for the future of AI development. His insights serve both as a thoughtful critique of current approaches and a compelling roadmap for the future of AI.
Whether you’re a data scientist, an AI enthusiast, or a business leader looking to explore the power of causal AI, this interview is a must-watch (or a must-read if you prefer reading over watching).
This transcript has been lightly edited for length and clarity.
The Limitations of Deep Learning and the Need for Causal AI
Darko: “Judea, if I were a data scientist or researcher doing deep learning, and I had found good results and was very happy, and someone came and said, ‘Hey, you should also do Causal AI,’ what would you say to that?”
Judea: “Well, deep learning can give you answers to a very limited class of questions. Everything depends on the question you want to ask and the answer you want to get. Deep learning essentially gives you a functional approximation. If you want to approximate a probability function, say the conditional probability of a scene interpretation given the scenery, it will provide you with a good, effective method of going from input to output. But if you want to ask more sophisticated questions, as we do in everyday life, if you want to read and understand a story in the New York Times, or communicate with any social scientist or political scientist, you’re talking about cause and effect.
What causes what? What will happen if I do this? Can you explain to me why I got sick? These kinds of questions form what we call the ladder of causation, and deep learning cannot answer them unless you insert all the answers explicitly, every question you anticipate asking together with its answer, like in a Chinese room of sorts.
You could make a long list of all the questions you anticipate asking in the future, together with all their answers. But this, of course, takes enormous memory and computation. So perhaps one day we may be able to approximate that ideal Chinese room with a functional approximation.
We are not there yet, because the set of all possible questions, once you include causal questions and counterfactuals, explodes combinatorially. So we have to have an encoding, a parsimonious encoding, that will generate the answer to the class of questions you anticipate asking. That is what causal models are. They are a parsimonious representation of a world model that can generate the answers to questions.
You notice that I’m talking about answers to questions. I’m not talking about what other people are talking about.”
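Pearl’s memory-and-computation point can be made concrete with a toy count. The counting scheme below is our own illustration, not Pearl’s: with n binary variables, each variable in a question can be left alone, forced to 0, or forced to 1, so the number of distinct interventional questions alone grows as 3^n, while a structural causal model that can answer all of them needs only n mechanisms.

```python
# Illustrative only: counting distinct do()-style questions vs. model size.
n = 30  # number of binary variables in a toy world model

# Each variable can be left alone, forced to 0, or forced to 1,
# giving 3**n intervention regimes (minus the all-observational one).
interventional_questions = 3**n - 1

print(f"{interventional_questions:,} distinct interventional questions")
print(f"{n} mechanisms suffice in a parsimonious causal model")
```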
The Ladder of Causation
Darko: “That makes sense. For people who are not familiar with the ladder of causation, could you share it in your own words and explain why it’s so important to climb that ladder?”
Judea: “The ladder of causation consists of three levels. At the bottom, you have the standard statistics questions. Deep learning belongs there.
I observe X; what can you tell me about Y? It essentially means: give me the conditional probability of Y given X, or approximations thereof, or bounds thereof, but give me a function from observation to the probability of the output. This is what deep learning is; this is what logic is too, by the way, logic just doesn’t have the probability.
It’s ‘give me the input, and I’ll give you the output.’
Beyond that, you want to go to the second level: what if I do? I want to ask, ‘I want to raise the price. Okay, how much am I going to gain in revenue? How much am I going to lose in sales?’ and so forth. That’s a good example of a question I cannot answer just by observation.
Even though I have a record of past prices and sales figures, I cannot tell you what will happen if I raise the price today, in the current market situation.
Then you get to the third level: explanation. I know the output; tell me what caused it. My headache went away. Was it the aspirin, or perhaps the good news I got on television?
So, I have to go backward in time and do retrospection. I have an output; something happened to me; tell me what caused it. This covers most of our everyday conversation and most of scientific conversation.”
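The pricing example can be simulated in a few lines. The sketch below is ours, with made-up coefficients: hidden market demand drives both price and sales, so the rung-one quantity E[sales | price = 3] (what the data shows) disagrees with the rung-two quantity E[sales | do(price = 3)] (what would happen if we set the price).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

demand = rng.normal(0, 1, n)                          # hidden market demand
price = 2.0 + 0.8 * demand + rng.normal(0, 0.5, n)    # sellers price higher when demand is high
sales = 10.0 - 1.5 * price + 3.0 * demand + rng.normal(0, 0.5, n)

# Rung 1 (seeing): condition on the prices that happened to occur.
observed = sales[np.abs(price - 3.0) < 0.05].mean()

# Rung 2 (doing): rerun the sales mechanism with price forced to 3.
forced = (10.0 - 1.5 * 3.0 + 3.0 * demand + rng.normal(0, 0.5, n)).mean()

print(f"E[sales | price=3]     = {observed:.2f}")   # ~8.2, inflated by high-demand days
print(f"E[sales | do(price=3)] = {forced:.2f}")     # ~5.5, the causal answer
```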
Bayesian Networks vs. Causal Reasoning
Darko: “Often we get asked this question: obviously, you are the inventor of Bayesian networks. But we often get asked, ‘I’m doing Bayesian stuff, why do I need causal stuff? Why can I not get away with just a Bayesian reasoning engine?'”
Judea: “What is Bayesian? Bayesian is still built on statistics, on probability theory. It just inverts the conditioning of the joint probability. You express it in two different directions, X given Y or Y given X, but both of them are computable from the joint probability of X and Y. So it’s all still within the framework of probability theory.
Nothing outside of probability can be done with it, because it speaks only the language of probability. Causal questions need a new language, and therefore you cannot do it with Bayesian reasoning. Although, interestingly, historically speaking, Bayes himself was motivated by causal considerations. He wanted to prove, in Price’s phrase, the existence of the deity, the final cause. So, what kind of evidence do we need to prove that Jesus was indeed resurrected? How much evidence do you need for that?”
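Pearl’s point that Bayesian inversion never leaves the joint distribution is easy to see with a toy table (the numbers are ours, purely illustrative): both conditional directions are arithmetic on the same joint, while P(Y | do(X)) simply cannot be computed from that table without causal assumptions.

```python
# Joint distribution P(X, Y) for a toy disease/test example (numbers made up).
p_joint = {
    (1, 1): 0.08, (1, 0): 0.02,   # X=1: disease present
    (0, 1): 0.09, (0, 0): 0.81,   # X=0: disease absent
}

p_x1 = p_joint[(1, 1)] + p_joint[(1, 0)]   # marginal P(X=1)
p_y1 = p_joint[(1, 1)] + p_joint[(0, 1)]   # marginal P(Y=1)

# Both conditioning directions come from the same joint table:
print("P(Y=1 | X=1) =", p_joint[(1, 1)] / p_x1)   # forward conditional
print("P(X=1 | Y=1) =", p_joint[(1, 1)] / p_y1)   # Bayesian inversion

# P(Y=1 | do(X=1)), by contrast, depends on the causal structure that
# generated this table and cannot be derived from the table alone.
```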
The Scarcity of Causal Reasoning Skills in Data Science
Darko: “When I visited you here last time, you told me that we have a serious problem, and that is the lack of data scientists trained in causality. I think that problem still persists today. Do you have any solutions in mind?”
Judea: “Here’s the problem. We produce one or two PhDs a year in causal reasoning. Do you know how many PhDs data science produces a year? I wouldn’t be exaggerating if I said 5,000 a year. Why does that matter? If you want to hire people to work at your company, or at any other enlightened company that wants to get into causal reasoning, where do you get the staff?
The PhDs, or even BS-level engineers, who can think causally, not necessarily experts, simply don’t exist. And why? The number one limitation on how many we can produce is attention. If I go today to my dean, to the chancellor of my university, and I tell him, ‘Let’s start a center in causal reasoning,’ they’d say, ‘Who needs causal reasoning? Students want data science.’
So, if you ask me what effect the data science of deep learning has had on causal reasoning, it’s the diversion of attention. I have nothing against it; it did a good job and was very successful. But at the same time, to go to the next level, we need the attention of both funders and educators.”
LLMs and Causal Reasoning
Darko: “That makes complete sense. Do you see that we can change the game a little bit and, rather than relying on the few PhD students, train AI agents to do causal reasoning for us? Perhaps if an AI agent has seen thousands or hundreds of thousands of causal problems, each solved by a human, it could observe how they approach the problem, how they apply the do-operator, or how they build a counterfactual explanation?”
Judea: “To some degree, the GPT of today, ChatGPT, empirically does that, in a very sloppy way. But I’ll give you an example. I give it the famous canonical problem of the firing squad, and I ask: the prisoner is dead, right? What would happen if rifleman one were to refrain from shooting? The first answer you’ll get is, ‘It’s illegal to shoot in California.’
You have to be very careful. But then you try to prompt it. You tell it, ‘I don’t care about legality. I want to know what happens to the prisoner.’ So it becomes a better reasoner. If you prompt it well enough, eventually, it will give you an answer, which is a cookbook answer taken from some papers in philosophy.
It tells you the names and slogans of overdetermination. So it does it to a certain degree, but you have to prompt it properly. Right now, prompting is an art. I don’t know how to control it. Some people are better at it, and they think it’s a new science, the science of prompting. You tell it, for example, that you are asking from an engineering perspective.”
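For readers unfamiliar with the firing-squad example, the standard version of the model (as presented in The Book of Why) is small enough to run by hand: a court order U, a captain C = U, two riflemen A = C and B = C, and death D = A OR B. Because the model is deterministic, the counterfactual reduces to Pearl’s abduction–action–prediction recipe; the sketch below is a minimal rendering of it.

```python
def world(u, a_override=None):
    """One run of the firing-squad model: returns death D."""
    c = u                                        # captain relays the court order
    a = c if a_override is None else a_override  # rifleman A (may be forced)
    b = c                                        # rifleman B follows orders
    return int(a or b)                           # prisoner dies if either shoots

# Abduction: the prisoner is dead, so the court order must have been given.
u = 1
assert world(u) == 1

# Action + prediction: hold U fixed, force rifleman A to refrain.
print("Dead if rifleman A refrains?", bool(world(u, a_override=0)))  # True: B still shoots
```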
Causal AI and Large Language Models (LLMs)
Darko: “So how do you see, at a high level, the interplay between LLMs and causal models? How do you think they play together?”
Judea: “Let’s not talk about LLMs, because we don’t know what they are. Let’s talk about deep learning, and let’s talk about it as a functional approximator. If it’s a functional approximator, we know exactly how to use it, because we can do ordinary causal inference, and every time the answer consists of a function, we approximate that function using deep learning. For instance, take the Back Door formula.
What is the Back Door formula? It’s just an adjustment for certain variables. It’s built from probability distributions, so go ahead and approximate those probability distributions. We know exactly how to use it and in what context.
At that level, we know how to use it effectively. If you want to use it at a higher level, we still don’t know.”
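Here is a minimal sketch of the recipe Pearl describes, on synthetic data with one binary confounder Z (the data-generating numbers are ours): the causal quantity P(y | do(x)) is assembled from ordinary conditionals via the Back Door formula, P(y | do(x)) = Σ_z P(y | x, z) P(z), and each of those conditionals is exactly the kind of function a deep net could be trained to approximate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

z = rng.binomial(1, 0.5, n)                        # confounder (back-door variable)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))    # treatment depends on Z
y = rng.binomial(1, 0.2 + 0.3 * x + 0.4 * z)       # outcome depends on X and Z

# Naive rung-1 estimate: biased, because Z drives both X and Y.
print("P(Y=1 | X=1)     =", round(y[x == 1].mean(), 3))          # ~0.82

# Back Door adjustment: sum over z of P(Y=1 | X=1, Z=z) * P(Z=z).
adjusted = sum(y[(x == 1) & (z == v)].mean() * (z == v).mean() for v in (0, 1))
print("P(Y=1 | do(X=1)) =", round(adjusted, 3))                  # ~0.70, the causal value
```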
Causal AI and Artificial General Intelligence (AGI)
Darko: “So that’s maybe a good segue into how you think about reaching artificial general intelligence and how you see causal methods playing into that end state of the AI field.”
Judea: “First of all, I believe in artificial general intelligence. One day, we will be able to replicate all human functions. Causal reasoning is important, but it’s not sufficient. It has to be coupled, of course, with natural language understanding, with vision, and with many other things. But it’s one facet where we have a science.
We have a framework, and we know what we are doing. So it is a necessary component, one which has scientific backing. It should be a role model for the other facets, for which we don’t yet have that scientific backing.”
Impact of Generative AI on the Causal Community
Darko: “So how do you feel the emergence of generative AI, LLMs, and such things has impacted the causal community? Has it helped it? Has it hindered it? Do you see any effects?”
Judea: “I already mentioned the way it hindered it: by shifting attention and creating a vacuum. We need to overcome that, and I hope people will come to realize the limitations of generative AI and the success of causal reasoning in directly answering the questions that users have in mind. Most users have causal questions in mind: counterfactuals, explanations, and, I would say, personalized, situation-specific decision making.”
The Importance of Individual-Level Effects
Judea: “All decision making is situation-specific decision making. You want to treat this particular case. Do you want to pacify this particular customer?
You get some information from the population, but you want to apply it to a specific case. So, in a simple case, you want to be able to identify the customers who will be retained if you give them an offer and who will quit if you don’t.
The whole purpose of targeting is to find the customer who will act one way if I give him an incentive and a different way if I don’t. And this is already a specifically counterfactual question: yes if I do, no if I don’t, for this individual; forget about the population.
So we now have a theory for that. We have bounds, and these bounds can direct decision makers. And the bounds are computable. In most cases, especially for people who are already doing it using a causal model, obtained either by learning, by discovery, or by making assumptions, you can compute these bounds without doing any extra work.”
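The bounds Pearl refers to come from his work with Jin Tian and Scott Mueller on probabilities of causation: the probability that an individual customer benefits (retained if treated and lost if not) is a counterfactual quantity that experimental and observational data together can bound. Below is a sketch assuming the published Tian–Pearl bounds on the probability of benefit; the helper name and all input numbers are ours, purely illustrative.

```python
def benefit_bounds(py_do_t, py_do_c, p_ty, p_cy, p_t):
    """Tian-Pearl style bounds on P(benefit): retained if treated AND lost if untreated.

    py_do_t, py_do_c : experimental P(retain | do(treat)), P(retain | do(control))
    p_ty, p_cy       : observational joints P(treated, retained), P(untreated, retained)
    p_t              : observational P(treated)
    """
    p_y = p_ty + p_cy                  # observational P(retained)
    p_tn = p_t - p_ty                  # P(treated, lost)
    p_cn = (1 - p_t) - p_cy            # P(untreated, lost)
    lower = max(0.0, py_do_t - py_do_c, p_y - py_do_c, py_do_t - p_y)
    upper = min(py_do_t, 1.0 - py_do_c,
                p_ty + p_cn,
                py_do_t - py_do_c + p_tn + p_cy)
    return lower, upper

# Made-up numbers: an experiment says treating retains 70% vs. 40% untreated;
# observational data adds who self-selected into treatment and what happened.
lo, hi = benefit_bounds(py_do_t=0.70, py_do_c=0.40, p_ty=0.30, p_cy=0.15, p_t=0.40)
print(f"P(customer benefits from the offer) is between {lo:.2f} and {hi:.2f}")  # 0.30 .. 0.55
```

Note how the observational data earns its keep: with the experiment alone, the upper bound would be min(0.70, 0.60) = 0.60; combining both data sources tightens it to 0.55, exactly the “extra work” Pearl describes next.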
Challenges in Implementing Causal AI
Judea: “And if you don’t have one, you can do the extra work: you combine observational and experimental data, and the combination gives you information about the specific individual. So this is doable. I only need to go to the FDA and convince them that this is the way to select and develop drugs, not the way it’s being done today.
We care about harm and benefit to patients. We don’t care about maximizing the probability of survival. It’s easy to maximize the probability of survival: just select the people who are immune, healthy people who would survive anyway, and look, it’s easy. But you need to spend your resources on the people who can actually be helped.
So that’s a tough, tough task to convince people who have been doing things differently for the past two centuries.”
Industries Likely to be Transformed by Causal AI
Darko: “If you were to guess, which industries do you feel causal reasoning will transform first?”
Judea: “I think it’s going to be medicine, personalized medicine, for example. Or maybe not, because medicine is very conservative; maybe marketing will come even earlier, because people there want more profit, and marketing doesn’t have a scientific tradition that ties its hands, one it would have to pay lip service to.”
Judea Pearl’s insights underscore the transformative potential of causal AI. The ability to reason causally, to ask “what if” questions, and to understand the true drivers of outcomes represents the next crucial step in AI’s evolution. This shift from pattern recognition to causal understanding opens up new possibilities across industries, from personalized medicine to more effective business strategies.
At causaLens, we’re witnessing firsthand how these concepts are being applied in real-world scenarios. Every day, we see organizations leveraging causal AI to gain deeper insights and make more informed decisions. It’s exciting to be part of this journey, helping to bridge the gap between academic theory and practical application.