The Brier score is a proper scoring rule that measures the accuracy of probabilistic predictions. Using API data from various prediction market platforms, we can calculate scores for each market and check how often they were correct.
The results: One month before close, 62% of markets were already within 30% of the correct resolution, equivalent to a Brier score of 0.09 or better (n=76,941). The median Brier score at market midpoint was 0.0225 (n=443,535).
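For reference, here is a minimal sketch of that calculation using made-up markets rather than the real dataset: each resolved market contributes the squared difference between its predicted probability and its 0-or-1 resolution, and the scores are averaged.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a predicted probability and the 0/1 resolution."""
    return (probability - outcome) ** 2

# Hypothetical resolved markets: the probability one month before close,
# and whether the market ultimately resolved YES (1) or NO (0).
markets = [
    (0.85, 1),  # confident and correct -> 0.0225
    (0.10, 0),  # confident and correct -> 0.01
    (0.60, 0),  # uncertain and wrong   -> 0.36
]

scores = [brier_score(p, o) for p, o in markets]
print(sum(scores) / len(scores))  # 0.0 is perfect; always guessing 50% scores 0.25
# A prediction within 30 points of the resolution scores at most 0.3 ** 2 = 0.09.
```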
Additionally, we've matched markets across platforms in a curated collection of questions. Using those matches, we can generate scores for each platform that reward markets that are correct, confident, and early.
Those results indicate that the platforms perform similarly on average, though each one outperforms the others in certain categories. See the table below for those scores.
One month before close, most markets are already within 30% of the resolution.
When graded directly against each other, some platforms perform better in specific categories.
Category | Kalshi | Manifold | Metaculus | Polymarket |
---|---|---|---|---|
Culture | A- | D | F | B |
Economics | B+ | D+ | C | A- |
Politics | B | C- | C | C+ |
Science | C | C- | B | A |
Sports | A- | D+ | D | A- |
Technology | A- | C | C | C+ |
Letter grades based on relative Brier scores from n=942 matched markets.
Source: brier.fyi
Relative Brier scores behind the letter grades above (lower is better):

Category | Kalshi | Manifold | Metaculus | Polymarket |
---|---|---|---|---|
Culture | -0.013 | +0.018 | +0.051 | -0.008
Economics | -0.008 | +0.011 | +0.004 | -0.011
Politics | -0.006 | +0.005 | +0.003 | -0.001
Science | +0.004 | +0.008 | -0.007 | -0.025
Sports | -0.015 | +0.013 | +0.019 | -0.011
Technology | -0.014 | +0.005 | +0.004 | -0.001
Each card here asks a general question that several prediction market platforms have independently tried to predict. We find and link those markets when they resolve in order to judge all platforms on an even playing field. As of 6/4/2025 we have 942 linked markets across 378 unique questions.
A traditional accuracy analysis would look at a single point in time, usually midway through the market. We calculate our absolute scores this way and show them below as the midpoint Brier score.
However, that form of scoring often misses a lot of important information. The probability charts below show how each market's prediction changes over time as traders respond to news, polls, and other information. In addition, each platform has different rules on what it predicts and how early its markets open, which makes a direct comparison difficult.
To address this, we start by scoring every market on every day that it's open. Then we aggregate those daily scores into a relative score - grading each market's performance against the other linked markets on the same day and rewarding those that were correct earliest. Check out the scoring section for details about the system.
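The sketch below illustrates the general idea with two hypothetical probability histories for the same linked question; it is not the exact formula brier.fyi uses. Each platform gets a daily Brier score, each day's score is compared against the average across the linked markets that day, and the differences are averaged so that negative values mean "better than the group."

```python
from statistics import mean

def daily_brier(prob: float, outcome: int) -> float:
    return (prob - outcome) ** 2

# Hypothetical probability histories (one value per day) for two platforms
# predicting the same linked question, which eventually resolved YES.
outcome = 1
histories = {
    "platform_a": [0.55, 0.70, 0.85, 0.95],  # converges on the answer early
    "platform_b": [0.50, 0.55, 0.65, 0.90],  # gets there later
}

# Score every market on every day it's open...
daily_scores = {p: [daily_brier(x, outcome) for x in h] for p, h in histories.items()}

# ...then grade each day's score against the average of the linked markets that day.
relative_scores = {}
for platform, scores in daily_scores.items():
    diffs = []
    for day, score in enumerate(scores):
        day_average = mean(daily_scores[other][day] for other in daily_scores)
        diffs.append(score - day_average)
    relative_scores[platform] = mean(diffs)

print(relative_scores)  # platform_a is negative (better) because it was correct earlier
```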
Predicting the future is hard, but it's also incredibly important. Let's say someone starts making predictions about important events. How much should you believe them when they say the world will end tomorrow? What about when they say there's a 70% chance the world will end in 50 years?
Prediction markets are based on a simple concept: If you're confident about something, you can place a bet on it. If someone else disagrees with you, agree on terms with them, and whoever wins takes the money. By aggregating the implied odds of these trades, you can gain insight into the wisdom of the crowd.
Imagine a stock exchange, but instead of trading shares, you trade on the likelihood of future events. Each prediction market offers contracts tied to specific events, like elections, economic indicators, or scientific breakthroughs. You can buy or sell these contracts based on your belief about the outcome - if you are very confident about something, or you have specialized information, you can make a lot of money from a market.
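For intuition, the price of a binary contract maps directly onto a probability: a contract that pays $1 if the event happens and currently trades at 70 cents implies the crowd puts roughly 70% on that outcome (ignoring fees and interest). The numbers below are purely illustrative.

```python
# Illustrative only: converting a binary contract price into an implied probability.
payout_if_yes = 1.00   # the contract pays $1.00 when the event happens
current_price = 0.70   # a YES share currently trades at $0.70

implied_probability = current_price / payout_if_yes
print(f"Implied probability: {implied_probability:.0%}")  # -> Implied probability: 70%
```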
Markets give participants a financial incentive to be correct, encouraging researchers and skilled forecasters to spend time investigating events. Individuals with insider information or niche skills can profit by trading, which also updates the market's probability. Prediction markets have outperformed polls and revealed insider information, making them a useful tool for information gathering or profit.
Some popular prediction market platforms include Kalshi, Manifold, Metaculus, and Polymarket.
The traditional way to score predictions is using Brier scores, which measure how far off your prediction was from reality. While these work great for individual predictions, they struggle to compare predictions across different time periods - being 90% confident a month before an event is more impressive than being 90% confident the day before.
To account for this, we use a relative Brier scoring system. For each matched question across platforms, we compare how early each platform reached the correct probability range. Platforms that arrive at accurate predictions earlier receive more points, while those that take longer or never reach accuracy receive fewer points.
As an example, let's look at the probability history for an actual set of markets.
We calculate these scores for all linked markets, since we are confident that they meet our standards for serious markets making real predictions. We can average these scores together to get overall scores for each platform, category, and combination thereof.
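As a rough sketch of that averaging step (with invented per-market scores, not the real data), the per-market relative scores can simply be grouped and averaged by platform and by platform-category pair; the letter grades in the table above are then assigned from those averages.

```python
from collections import defaultdict
from statistics import mean

# Invented per-market relative scores; the real dataset has 942 linked markets.
linked_market_scores = [
    {"platform": "Kalshi",     "category": "Politics", "score": -0.010},
    {"platform": "Manifold",   "category": "Politics", "score": +0.020},
    {"platform": "Kalshi",     "category": "Sports",   "score": -0.030},
    {"platform": "Polymarket", "category": "Sports",   "score": -0.005},
    {"platform": "Manifold",   "category": "Sports",   "score": +0.015},
]

# Group the per-market relative scores by platform and by (platform, category).
by_platform = defaultdict(list)
by_platform_and_category = defaultdict(list)
for market in linked_market_scores:
    by_platform[market["platform"]].append(market["score"])
    by_platform_and_category[(market["platform"], market["category"])].append(market["score"])

print({platform: round(mean(scores), 3) for platform, scores in by_platform.items()})
print({key: round(mean(scores), 3) for key, scores in by_platform_and_category.items()})
```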
Accuracy is a good metric, but another lens we can use for analysis is calibration. For a group of markets to be perfectly calibrated, their average resolution values must match their average prediction values.
For example, let's say there are a handful of markets that will be determined by rolling a 6 on a fair six-sided die. We would expect each market to have an average probability of around 17%, and once they resolve we would expect around 17% of them to resolve positively. If both are true, then those markets were well-calibrated. If not, then some of our assumptions were incorrect.
This plot takes all of the prediction and resolution values and shows how closely they match. The points should form a straight line from the bottom-left to the top-right - points significantly above or below that line represent systematic errors.
Calibration plot for all platforms, with market probability at midpoint versus average resolution value. Includes all resolved binary and multiple choice markets. n=443,535 markets
Source: brier.fyi
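Here is a minimal sketch of how such a calibration table can be built, with a handful of invented markets in place of the real 443,535: group midpoint probabilities into 10-point buckets and compare each bucket's average prediction to the fraction of its markets that actually resolved YES.

```python
from collections import defaultdict
from statistics import mean

# Invented (midpoint probability, resolution) pairs; 1 = resolved YES, 0 = resolved NO.
markets = [(0.12, 0), (0.17, 0), (0.22, 1), (0.64, 1), (0.71, 0), (0.83, 1), (0.91, 1)]

# Group predictions into 10-point buckets, then compare the average prediction in each
# bucket to the observed resolution rate. Perfect calibration keeps every bucket on
# the diagonal, i.e. average prediction == observed frequency.
buckets = defaultdict(list)
for probability, resolution in markets:
    buckets[min(int(probability * 10), 9)].append((probability, resolution))

for b in sorted(buckets):
    probabilities, resolutions = zip(*buckets[b])
    print(f"{b * 10:>2}-{b * 10 + 10}%: "
          f"average prediction {mean(probabilities):.2f}, observed {mean(resolutions):.2f}")
```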