A Professional Superforecaster Walks Us Through His AI Progress Forecasts
In our AI progress forecasting panel, we asked when AI would help solve a Millennium Prize Problem and what share of ride-hailing trips autonomous vehicles would provide. Here's how a superforecaster made his predictions.
In November, FRI announced the first results from our Longitudinal Expert AI Panel (LEAP)—a project that collects forecasts on the future of AI from over 330 top computer scientists, AI industry experts, economists, and policy specialists. The LEAP panel also includes 60 superforecasters with a track record of highly accurate predictions in prior geopolitical forecasting tournaments. These superforecasters come from two places: (1) the original IARPA ACE tournament, where LEAP coauthor Phil Tetlock and collaborators called the top two percent of forecasters by accuracy “superforecasters”; and (2) Good Judgment Inc, which hosts open forecasting tournaments and certifies a small number of participants as being highly accurate superforecasters.
In the first three months of LEAP, we asked panelists for their forecasts across a range of indicators, from benchmark performance and electricity use to drug discovery and AI progress. In this post, we wanted to shed a little light on how forecasters approach LEAP questions, so we’ve asked a superforecaster to take us through his own forecasting process on two LEAP questions.
Dan Mayland is both a professional superforecaster and a coauthor on our LEAP work, but his approach below gives a flavor of how a superforecaster might generally approach these questions. He was interviewed by FRI’s Matt Reynolds.
FRI: Let’s start with a question from the first wave of LEAP:
What percentage of U.S. ride-hailing trips will be provided by autonomous vehicles that are classified SAE Level 4 or above in the years 2027 and 2030?
What’s the first thing you do when you see that question?
DM: The first thing I’m doing is breaking down the elements of the question. Do I really know what ride-hailing means? I do a quick check to make sure my understanding is consistent with what the question writers mean.
The next element is “classified SAE Level 4 or above.” So I get what that means, but is SAE going to change what Level 4 is in the intervening years? Now we’re forecasting two things: Whether we can technically achieve Level 4 autonomy, and whether SAE is going to change its criteria in a way that complicates the question. So I check out how consistent it has been in the past.
In this case, I feel pretty confident that SAE Level 4 is going to mean the same in 2027 as it does in 2030, but that’s one of those things that can make a big difference. The main thing I’m doing here is looking for little bits of information that are almost legalistic, but that could prove to be determinative.
FRI: Alongside the question, we provide some background information, including, in this case, a historical baseline estimating that 0.27% of U.S. ride-hailing trips in Q4 2024 were at SAE Level 4 or above.
DM: I read the logic and make sure the calculations make sense. And then what I do is determine whether the way FRI calculated the base rate is consistent with the way that the question will resolve.
In this case, the base rate applies only to Waymo, and Waymo is pretty transparent in how it reports its data. But when I’m forecasting, I have to take into account Tesla, Uber, Lyft, and who knows who else. Will these companies be as transparent? Will they make it easier or harder to calculate the resolution? It’s the same thing as with the SAE Level 4 definition—I’m kicking the tires on these.
FRI: After breaking down the question and understanding the base rate, what’s your next move?
DM: I read some background on Waymo that gives me a sense of when it started testing self-driving cars and the history of the company, just giving myself a basic education on the situation. Then I look at the pace of diffusion for a variety of technologies, the sigmoid curve, where there’s a slow build-up, then rapid adoption and flattening. I’m starting to get a theory of the case, and thinking that we’re probably just at the beginning of the steep part of that sigmoid curve.
FRI: At this point, are you starting to land on a median estimate for 2027?
DM: My theory of the case at this point, based on a Goldman Sachs analysis and Waymo’s figures, is that we’ve got a doubling of growth year-over-year. That’s not a bad starting point.
Then I’m looking at some other projections for massive scale-up. I’m thinking, are the optimists really fully considering the manufacturing scale required, the capex requirements, the regulatory environment, and so on? I’m trying to imagine if I prove to be wrong on this, would it be that I was too optimistic or not optimistic enough? Why would I have messed up?
FRI: This process of trying to work backwards from an imagined failure to identify where your forecast would have gone wrong is called a pre-mortem, right?
DM: Right. In this case, if I underestimate the pace of AV rollout, it might be down to a confluence of a laissez-faire regulatory environment paired with an underestimation of Elon Musk’s ability to flip a switch and turn a whole bunch of existing autonomous vehicles into Level 4 self-driving ride-hailing autonomous vehicles.
On the downside, I’m thinking that if there’s an accident or something, and there’s a regulatory backlash, that could slow things down. If you’re talking about exponential progress, a year of flat growth makes a big difference, and it would change my forecast in dramatic ways.
FRI: Let’s hear your 2027 and 2030 median predictions.
DM: My median for 2027 is 2.2%. That’s pretty much a doubling of growth. That’s my year-over-year trend, and that gets me to 17.3% in 2030.
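The arithmetic behind these two medians can be reproduced in a few lines. This is only a sketch, not DM's actual model: it assumes the share doubles every calendar year, starting from the 0.27% baseline for Q4 2024 quoted in the question background. The function name `project_share` and the `flat_years` parameter are illustrative inventions, with `flat_years` standing in for the "year of flat growth" scenario mentioned earlier.

```python
# Illustrative sketch: compound a baseline share forward at a constant yearly multiplier.
BASELINE_YEAR = 2024
BASELINE_SHARE = 0.27  # percent of U.S. ride-hailing trips, Q4 2024 (from the question background)


def project_share(year, multiplier=2.0, flat_years=0):
    """Return the projected share (in percent) for the given year.

    multiplier: assumed year-over-year growth factor (2.0 means a doubling).
    flat_years: hypothetical number of years in which the share does not grow at all.
    """
    growth_years = (year - BASELINE_YEAR) - flat_years
    return BASELINE_SHARE * multiplier ** growth_years


for year in (2027, 2030):
    steady = project_share(year)
    one_flat_year = project_share(year, flat_years=1)
    print(year, round(steady, 2), round(one_flat_year, 2))
```

Under these assumptions the steady-doubling path gives about 2.16% for 2027 and 17.28% for 2030, which round to the 2.2% and 17.3% medians above. A single flat year halves both figures, which is why one stalled year matters so much on an exponential path.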
FRI: That 2027 prediction is considerably lower than the expert median of 7.3%. You’ve analyzed the rationales of LEAP forecasters—can you say where your own forecast diverges from the average expert?
DM: If I had to speculate, I think it’s because I’m considering more of the regulatory and physical barriers when implementing something in the real world. There’s a sense that AI is evolving quickly, and there’s going to be a big correlation between AI progress and the implementation of autonomous vehicles.
I think of it more like a battery. Capabilities are charging up in some way, but they are going to be discharged at a slower pace. I’m expecting near full saturation eventually, but going from city to city and dealing with regulation after regulation is going to slow things down to a lower level than 7% by 2027—that’s a lot.
FRI: The next question I’d like to break down is from the second LEAP wave:
What is the probability that AI will solve or substantially assist in solving a Millennium Prize Problem in mathematics by 2027, 2030, and 2040?
This seems like a very different kind of question. We know that one of the seven Millennium Prize Problems has been solved, but that doesn’t seem like much to go on. Is this a harder question to forecast?
DM: It didn't feel harder to forecast, but whether something's hard to forecast or not is ultimately determined by how accurate or not people were. I'll have a better answer for that question in 2030 or 2040. If I was right and my theory of the case prevailed, then I guess it wasn't that hard.
FRI: Which parts of this question are you homing in on?
DM: The “substantially assisting” part is highly subjective language. FRI clarifies this by saying that it applies if a panel of five expert mathematicians specializing in the field of the solved problem would agree with a statement that the AI-assisted part of the solution is “original, important, and likely could not have been produced without the AI.”
That gives some clarity, but I’m putting a note in the back of my mind that some people may have a more liberal interpretation of this, so there’s a margin of uncertainty there we should account for.
FRI: And there’s also the question of what “solved” means in this context.
DM: It’s clear that solved doesn’t just mean that somebody said they solved it, or the New York Times reported that they solved it. Solved means the Clay Mathematics Institute (CMI) has said it has been solved, and if you check its rules and regulations, you learn that for it to consider a problem solved, the CMI has three conditions:
The proposed solution must be published in a qualifying outlet
At least two years must have passed since publication
The proposed solution must have received general acceptance in the global mathematics community
These are the kinds of details that can prove highly determinative. This rule about two years passing since publication, for example, makes the 2027 resolution very unlikely. That would require someone proposing a solution before the end of 2025, and the CMI putting its seal of approval on it as soon as that two-year deadline passes.
The likelihood of that is negligible. From a forecasting perspective, finding this detail is super satisfying because it allows me to extremize in a way that will reflect well on my Brier score eventually. I can put my likelihood of that 2027 deadline almost to zero.
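To see why pushing a forecast toward zero pays off, here is a minimal sketch of the Brier score for a single yes/no question. It assumes the common single-probability form, the squared difference between the forecast and the 0/1 outcome, where lower is better (the original multi-category formulation gives values twice as large). The candidate probabilities are hypothetical and chosen only for illustration.

```python
def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


# Hypothetical forecasts for a question the forecaster believes will almost surely resolve "no".
candidates = [0.20, 0.10, 0.02]

for p in candidates:
    score_if_no = brier_score(p, 0)   # the event does not happen
    score_if_yes = brier_score(p, 1)  # the event happens after all
    print(f"forecast={p:.2f}  score if no={score_if_no:.4f}  score if yes={score_if_yes:.4f}")
```

If the question resolves "no", a 2% forecast scores 0.0004 against 0.04 for a 20% forecast, a hundredfold improvement. The trade-off is that the same 2% forecast scores about 0.96 if the event does happen, so extremizing is only worthwhile when a detail like the two-year rule makes the outcome close to certain.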
And if you take the one Millennium Prize Problem that has been solved—the Poincaré Conjecture—there was a seven-year gap between publication and the CMI announcing that the conditions had been met to award the prize.
FRI: Does the fact that we have one solved Problem give us something of a base rate? One solution every 25 years or so?
DM: I wouldn’t say one in 25 years. One has been solved since they were announced in 2000, but these problems weren’t invented in 2000. The Navier-Stokes equation was first introduced in 1822 and then developed over the subsequent decades. P vs NP was definitively stated in 1971. If you look across these problems, you could say we’re roughly at an average of 100 or so years they’ve been around, and one has been solved. So that’s not exactly rapid progress.
FRI: Is AI progress on mathematical benchmarks a useful reference class here?
DM: The honest answer here is I don’t know. Forecasters often make lousy pundits because it’s hard to get a yes/no answer to a question like that.
When I’m looking at a question like this, I’m weighing probabilities in my head about how relevant these benchmarks are. The Tier Four questions from FrontierMath are a little closer to the Millennium Prize Problems than the International Math Olympiad. Or maybe something like Humanity’s Last Exam is closer, too. But it’s hard to draw conclusions here, because these Millennium Prize Problems require intense, deep thought, over very long time spans.
Moravec’s Paradox is relevant here—AI might excel at things that humans find hard, such as very complex computations, and be surprisingly awful at things that humans find trivial, like moving objects around. I’m aware that just because I think these questions are extremely hard, it might be relatively easy for an intelligent system to draw on massive amounts of computational ability and solve some of these problems. I have to stop that bias from impacting my forecasts too much.
FRI: What other information are you drawing on when you’re forecasting this question?
DM: Well, it turns out that Google DeepMind CEO Demis Hassabis said at the beginning of 2025 that they might be a year and a half away from solving a Millennium Prize Problem. Later press coverage suggested this might be the Navier-Stokes equation.
That’s a relevant data point, but then you have to consider who this person is. Is this a hyperbolic statement that is unlikely to materialize in the timeline he’s talking about? Or is this for real?
So now I’m forecasting whether the CEO of Google DeepMind is a serious person or not, and how much to value his opinion. That’s an important part of forecasting, because nobody’s going to know everything about everything. So you’re a bullshit filter, essentially.
FRI: So a forecasting question about the Millennium Prize has become a question about how seriously we should take Demis Hassabis’ public statements?
DM: That is the biggest factor influencing my forecast, but it’s not the only one. I’m also considering the pace of AI progress and timelines to AGI. This question asks about a probability by 2040, and I’m thinking there is a pretty good chance we’re going to have super powerful AI by that date and that might make these problems solvable.
FRI: What’s your pre-mortem on this question?
DM: In one scenario, it’s that I fell foul of Moravec’s Paradox, and I underestimated the pace of progress and overestimated the difficulty of these questions for AI systems. In the other scenario, it’s that I bought too much into the AI hype and that scaling and new architectures didn’t pan out to the degree that people are predicting, or that I think they will.
But even at this level, if you go back to Demis Hassabis and DeepMind, they’re using the AI now, and it’s not going to get worse. So I’m guessing there’s more of a probability that I’m underestimating progress than overestimating.
FRI: Going right back to the question, let’s hear your median predictions for 2027, 2030 and 2040.
DM: For 2027 I’m at 2%—as we discussed, I’ve pretty much ruled this out.
I’m at 40% for 2030, which is a big jump. And then I’m at 80% by 2040.
FRI: Looking at the LEAP responses for 2030, experts and superforecasters are at a median probability of 20.4% and 20.5% respectively. That’s a lot lower than you.
DM: I thought they would have factored in the DeepMind statement more, but it could be that the fine print of the CMI rules is really driving forecasts down, and that could be fair.
The thing to bear in mind is that we aren’t all forecasting on the same information. Somebody who never saw that DeepMind statement is probably going to come at this from a very different perspective.
Another thing you have to consider is whether the forecasters are interpreting the question in the same way. If a question isn’t well defined, then you might get 100 people forecasting 100 different questions. Question writing is an art unto itself, and it can be annoying and nitpicky, but if you want to get a good output, then it’s really important that you pressure test those questions as much as possible.
You can find forecasts for all questions in the LEAP forecast explorer. Click the ‘filter’ button to expand the view and select a specific wave and question.


