The Brier score is a proper scoring rule that measures the accuracy of probabilistic predictions. Using API data from various prediction market platforms, we can calculate scores for each market and check how often they were correct.
The results: One month before close, 62% of markets were already within 30% of the correct resolution, equivalent to a Brier score of 0.09 or better (n=76,941). The median Brier score at market midpoint was 0.0225 (n=443,535).
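For reference, here is a minimal sketch of that calculation using made-up markets rather than the real dataset: each resolved market contributes the squared difference between its predicted probability and its 0-or-1 resolution, and the scores are averaged.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a predicted probability and the 0/1 resolution."""
    return (probability - outcome) ** 2

# Hypothetical resolved markets: the probability one month before close,
# and whether the market ultimately resolved YES (1) or NO (0).
markets = [
    (0.85, 1),  # confident and correct -> 0.0225
    (0.10, 0),  # confident and correct -> 0.01
    (0.60, 0),  # uncertain and wrong   -> 0.36
]

scores = [brier_score(p, o) for p, o in markets]
print(sum(scores) / len(scores))  # 0.0 is perfect; always guessing 50% scores 0.25
# A prediction within 30 points of the resolution scores at most 0.3 ** 2 = 0.09.
```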
Additionally, we've matched markets across platforms in a curated collection of questions. Using those matches, we can generate scores for each platform that reward markets that are correct, confident, and early.
Those results indicate that the platforms perform similarly on average, though each one outperforms the others in certain categories. See the table below for those scores.
One month before close, most markets are already within 30% of the resolution.
When graded directly against each other, some platforms perform better in specific categories.
Category | Kalshi | Manifold | Metaculus | Polymarket |
---|---|---|---|---|
Culture | A- | D | F | B |
Economics | B+ | D+ | C | A- |
Politics | B | C- | C | C+ |
Science | C | C- | B | A |
Sports | A- | D+ | D | A- |
Technology | A- | C | C | C+ |
Letter grades based on relative Brier scores from n=942 matched markets.
Source: brier.fyi
Relative Brier scores behind the letter grades above (lower is better):

Category | Kalshi | Manifold | Metaculus | Polymarket |
---|---|---|---|---|
Culture | -0.013 | +0.018 | +0.051 | -0.008
Economics | -0.008 | +0.011 | +0.004 | -0.011
Politics | -0.006 | +0.005 | +0.003 | -0.001
Science | +0.004 | +0.008 | -0.007 | -0.025
Sports | -0.015 | +0.013 | +0.019 | -0.011
Technology | -0.014 | +0.005 | +0.004 | -0.001
Each card here asks a general question that several prediction market platforms have independently tried to predict. We find and link those markets when they resolve in order to judge all platforms on an even playing field. As of 6/4/2025 we have 942 linked markets across 378 unique questions.
A traditional accuracy analysis would look at a single point in time, usually midway through the market. We calculate our absolute scores this way and show them below as the midpoint Brier score.
However, that form of scoring often misses a lot of important information. The probability charts below show how each market's prediction changes over time as traders respond to news, polls, and other information. In addition, each platform has different rules on what it predicts and how early its markets open, which makes a direct comparison difficult.
To address this, we start by scoring every market on every day that it's open. Then we aggregate those daily scores into a relative score - grading each market's performance against the other linked markets on the same day and rewarding those that were correct earliest. Check out the scoring section for details about the system.
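The sketch below illustrates the general idea with two hypothetical probability histories for the same linked question; it is not the exact formula brier.fyi uses. Each platform gets a daily Brier score, each day's score is compared against the average across the linked markets that day, and the differences are averaged so that negative values mean "better than the group."

```python
from statistics import mean

def daily_brier(prob: float, outcome: int) -> float:
    return (prob - outcome) ** 2

# Hypothetical probability histories (one value per day) for two platforms
# predicting the same linked question, which eventually resolved YES.
outcome = 1
histories = {
    "platform_a": [0.55, 0.70, 0.85, 0.95],  # converges on the answer early
    "platform_b": [0.50, 0.55, 0.65, 0.90],  # gets there later
}

# Score every market on every day it's open...
daily_scores = {p: [daily_brier(x, outcome) for x in h] for p, h in histories.items()}

# ...then grade each day's score against the average of the linked markets that day.
relative_scores = {}
for platform, scores in daily_scores.items():
    diffs = []
    for day, score in enumerate(scores):
        day_average = mean(daily_scores[other][day] for other in daily_scores)
        diffs.append(score - day_average)
    relative_scores[platform] = mean(diffs)

print(relative_scores)  # platform_a is negative (better) because it was correct earlier
```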
Predicting the future is hard, but it's also incredibly important. Let's say someone starts making predictions about important events. How much should you believe them when they say the world will end tomorrow? What about when they say there's a 70% chance the world will end in 50 years?
Prediction markets are based on a simple concept: If you're confident about something, you can place a bet on it. If someone else disagrees with you, agree on terms with them, and whoever wins takes the money. By aggregating the implied odds of these trades, you can gain insight into the wisdom of the crowd.
Imagine a stock exchange, but instead of trading shares, you trade on the likelihood of future events. Each prediction market offers contracts tied to specific events, like elections, economic indicators, or scientific breakthroughs. You can buy or sell these contracts based on your belief about the outcome - if you are very confident about something, or you have specialized information, you can make a lot of money from a market.
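For intuition, the price of a binary contract maps directly onto a probability: a contract that pays $1 if the event happens and currently trades at 70 cents implies the crowd puts roughly 70% on that outcome (ignoring fees and interest). The numbers below are purely illustrative.

```python
# Illustrative only: converting a binary contract price into an implied probability.
payout_if_yes = 1.00   # the contract pays $1.00 when the event happens
current_price = 0.70   # a YES share currently trades at $0.70

implied_probability = current_price / payout_if_yes
print(f"Implied probability: {implied_probability:.0%}")  # -> Implied probability: 70%
```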
Markets give participants a financial incentive to be correct, encouraging researchers and skilled forecasters to spend time investigating events. Individuals with insider information or niche skills can profit by trading, which also updates the market's probability. Prediction markets have outperformed polls and revealed insider information, making them a useful tool for information gathering or profit.
Some popular prediction market platforms include Kalshi, Manifold, Metaculus, and Polymarket.
The traditional way to score predictions is using Brier scores, which measure how far off your prediction was from reality. While these work great for individual predictions, they struggle to compare predictions across different time periods - being 90% confident a month before an event is more impressive than being 90% confident the day before.
To account for this, we use a relative Brier scoring system. For each matched question across platforms, we compare how early each platform reached the correct probability range. Platforms that arrive at accurate predictions earlier receive more points, while those that take longer or never reach accuracy receive fewer points.
As an example, let's look at the probability history for an actual set of markets.
We calculate these scores for all linked markets, since we are confident that they meet our standards for serious markets making real predictions. We can average these scores together to get overall scores for each platform, category, and combination thereof.
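As a rough sketch of that averaging step (with invented per-market scores, not the real data), the per-market relative scores can simply be grouped and averaged by platform and by platform-category pair; the letter grades in the table above are then assigned from those averages.

```python
from collections import defaultdict
from statistics import mean

# Invented per-market relative scores; the real dataset has 942 linked markets.
linked_market_scores = [
    {"platform": "Kalshi",     "category": "Politics", "score": -0.010},
    {"platform": "Manifold",   "category": "Politics", "score": +0.020},
    {"platform": "Kalshi",     "category": "Sports",   "score": -0.030},
    {"platform": "Polymarket", "category": "Sports",   "score": -0.005},
    {"platform": "Manifold",   "category": "Sports",   "score": +0.015},
]

# Group the per-market relative scores by platform and by (platform, category).
by_platform = defaultdict(list)
by_platform_and_category = defaultdict(list)
for market in linked_market_scores:
    by_platform[market["platform"]].append(market["score"])
    by_platform_and_category[(market["platform"], market["category"])].append(market["score"])

print({platform: round(mean(scores), 3) for platform, scores in by_platform.items()})
print({key: round(mean(scores), 3) for key, scores in by_platform_and_category.items()})
```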
Accuracy is a good metric, but another lens we can use for analysis is calibration. For a group of markets to be perfectly calibrated, their average resolution values must match their average prediction values.
For example, let's say there are a handful of markets that will be determined by rolling a 6 on a fair six-sided die. We would expect each market to have an average probability of around 17%, and once they resolve we would expect around 17% of them to resolve positively. If both are true, then those markets were well-calibrated. If not, then some of our assumptions were incorrect.
This plot takes all of the prediction and resolution values and shows how closely they match. The points should form a straight line from the bottom-left to the top-right - points significantly above or below that line represent systematic errors.
Calibration plot for all platforms, with market probability at midpoint versus average resolution value. Includes all resolved binary and multiple choice markets. n=443,535 markets
Source: brier.fyi
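Here is a minimal sketch of how such a calibration table can be built, with a handful of invented markets in place of the real 443,535: group midpoint probabilities into 10-point buckets and compare each bucket's average prediction to the fraction of its markets that actually resolved YES.

```python
from collections import defaultdict
from statistics import mean

# Invented (midpoint probability, resolution) pairs; 1 = resolved YES, 0 = resolved NO.
markets = [(0.12, 0), (0.17, 0), (0.22, 1), (0.64, 1), (0.71, 0), (0.83, 1), (0.91, 1)]

# Group predictions into 10-point buckets, then compare the average prediction in each
# bucket to the observed resolution rate. Perfect calibration keeps every bucket on
# the diagonal, i.e. average prediction == observed frequency.
buckets = defaultdict(list)
for probability, resolution in markets:
    buckets[min(int(probability * 10), 9)].append((probability, resolution))

for b in sorted(buckets):
    probabilities, resolutions = zip(*buckets[b])
    print(f"{b * 10:>2}-{b * 10 + 10}%: "
          f"average prediction {mean(probabilities):.2f}, observed {mean(resolutions):.2f}")
```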