We’ve just released an updated version of ForecastBench, our LLM forecasting benchmark. Here’s what the new results reveal about the accuracy of state-of-the-art models.
So happy to see you’ve started a Substack to more widely share your great work!
Questions: what happens when you extremize the public and superforecaster forecasts, rather than using the median? And what happens when you also do that with forecasts from multiple LLMs?
"LLMs have surpassed the general public"
If a finance analyst ends up informally leaning on a “Claude” take... not realising so many other analysts are doing the same... it’s an AI herd. And herds crash.
A dodgy AI owner may even game the system :-)