Brier.fyi launched in 2025 to help people compare and evaluate prediction markets fairly. Our mission is to help the public make informed decisions about which markets to trust and how to interpret forecasting data. We believe in transparency, honest comparisons, and providing meaningful context.
The idea for this site started in July 2023 when we set out to create a calibration plot for Manifold, as they weren't publishing their own accuracy statistics at the time. What began as "Calibration City" eventually grew to include other platforms like Kalshi, Metaculus, and Polymarket. Thanks to grants from the Manifold Community Fund and EA Community Choice programs, we were able to expand our work. We added more markets, created new filters and charts, tracked accuracy metrics, and built guides for newcomers. However, we weren't comfortable sharing hard numerical accuracy scores or directly comparing platforms, because the platforms themselves were fundamentally different from each other.
While our data was solid, it wasn't conveying the insights people expected. Calibration is very useful, but it can't ever tell the whole story. In January 2025, we initiated a complete overhaul. We started fresh, linking similar markets across platforms and finding directly comparable questions. We built a new website, less focused on experimentation and more focused on showing valuable results. We wanted to have something that actually answered questions like “How accurate are prediction markets?” and “Which platform is more accurate on the topics I care about?”
We're still growing - adding new platforms, curating interesting questions, and building new features. You can find all our source code in the Themis project on GitHub, complete with our open issues and roadmap. All our data is available through our PostgREST API. We welcome collaboration and encourage others to use our data, with the hope that you'll share your findings with us.
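If you'd like a starting point for exploring the data, here is a minimal sketch of querying a PostgREST endpoint with Python. The base URL, table name, and column names below are placeholders for illustration, not our documented schema - check the Themis repository or the API root for the real endpoints.

```python
import requests

# Placeholder values: swap in the real PostgREST root and table/column names.
BASE_URL = "https://example.org/api"

# PostgREST filters are passed as query parameters, e.g. platform=eq.kalshi
resp = requests.get(
    f"{BASE_URL}/markets",
    params={"platform": "eq.kalshi", "select": "id,title,resolution", "limit": 10},
    timeout=30,
)
resp.raise_for_status()
for market in resp.json():
    print(market)
```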
Broadly, we support all binary and multiple-choice markets on all supported platforms. There are a few asterisks around this, however.
For platforms like Kalshi and Polymarket where all markets are binary, the process is straightforward. Simple yes/no questions are extracted as-is, with the probability based on the price of the YES side. On these platforms, question groups are constructed out of binary markets (usually in the form of "Will team X win game Y?" or "Will metric X be greater than Y at time Z?"), so those are extracted the same way.
Manifold and Metaculus have a number of different market types, which they use for question groups and continuous spreads.
We used to assume that the implied probability of a market before the first trade was 50%, since that is how the probability is often shown on each platform's frontend. However, this caused problems if a significant amount of time elapsed between market creation and the first trade, or if there were no trades whatsoever. Now, we consider the market to have no probability until the first trade. We also ignore any market that has zero trades for the same reason.
For traditional market sites, the implied probability is equivalent to the price of one YES share (where payout would be $1 if it resolves YES). For sites that aggregate predictions in other ways, we follow the probability that they display most prominently. For Metaculus this is the community prediction, which is exposed as recency_weighted in the API.
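As an illustration, a probability extraction step along these lines might look like the sketch below. The field names ("yes_price", "recency_weighted", "probability") are assumptions for the example, not the exact API schemas.

```python
# Hypothetical sketch of reading an implied probability from each platform's
# API response. Field names are illustrative, not the real schemas.

def implied_probability(platform: str, raw: dict) -> float:
    if platform in ("kalshi", "polymarket"):
        # Price of one YES share, assumed here to be on a 0-1 scale.
        return float(raw["yes_price"])
    if platform == "metaculus":
        # The community prediction, exposed as recency_weighted in the API.
        return float(raw["recency_weighted"])
    if platform == "manifold":
        return float(raw["probability"])
    raise ValueError(f"unsupported platform: {platform}")
```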
We do not evaluate non-market items from any platform, such as bounties, posts, or non-forecasting polls. Our downloader runs nightly, and notifies us of any new or unrecognized market types so we can implement them as quickly as possible.
When we first started working on the Calibration City site, we realized how different each prediction market platform was. The apples-to-oranges problem has reared its head many times and we were certainly not the first to realize this.
Our goal with the matching process is to find equivalent markets across platforms, usually by targeting one of the following situations.
After downloading items from each platform's API, we use a couple of techniques to try to find these markets. We generate embeddings to find similar markets, then refine matches using tags, keywords, duration overlap, and other heuristics.
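As a rough sketch of this idea (not our production pipeline), candidate matches could be generated like this. The embed() function is a placeholder for any sentence-embedding model, and the similarity and overlap cutoffs are made-up values.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def overlap_fraction(open_a, close_a, open_b, close_b) -> float:
    # Fraction of the shorter market's lifetime that overlaps with the other.
    overlap = (min(close_a, close_b) - max(open_a, open_b)).total_seconds()
    shortest = min(close_a - open_a, close_b - open_b).total_seconds()
    return max(overlap, 0.0) / shortest

def candidate_matches(market, others, embed, sim_cutoff=0.90, overlap_cutoff=0.5):
    """Return markets similar enough to be reviewed by a human."""
    query = embed(market["title"])
    candidates = []
    for other in others:
        if cosine_similarity(query, embed(other["title"])) < sim_cutoff:
            continue
        if overlap_fraction(market["open"], market["close"],
                            other["open"], other["close"]) < overlap_cutoff:
            continue
        candidates.append(other)
    return candidates
```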
The final decision as to whether two markets are "equivalent" can be surprisingly difficult. For instance, two markets might resolve on December 31st versus January 1st, use two different news sources, or have any number of other slight variations that make them not 100% equivalent. To stay in the spirit of the concept, we allow grouping markets as long as these differences wouldn't have more than a 1% chance of changing the resolution.
All matches are picked and approved by real people. We do our best, but there may be some mistakes. Contact us if you think there's an issue with a market link, or if you have a suggestion for additional market links.
Currently we have two main types of scores: absolute and relative scores.
All absolute scores are calculated from a criterion probability and scored using a scoring rule. The criterion probability is what we refer to as the market's "prediction": it can be the probability at a specific point in time, such as the market midpoint or 30 days before resolution, or an aggregate such as the time-weighted average probability.
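For example, the time-weighted average could be computed with something like the following sketch. It assumes a market's probability history is a list of (timestamp, probability) points, where each probability holds until the next point or the market's close.

```python
from datetime import datetime

def time_weighted_average(history: list[tuple[datetime, float]], close: datetime) -> float:
    """Average probability weighted by how long each value was in effect."""
    weighted_sum = 0.0
    total_seconds = 0.0
    # Pair each history point with the start of the next one (or the close).
    for (start, prob), (end, _) in zip(history, history[1:] + [(close, 0.0)]):
        duration = (end - start).total_seconds()
        weighted_sum += prob * duration
        total_seconds += duration
    return weighted_sum / total_seconds
```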
The Brier score is fairly intuitive, with better scores closer to zero and worse scores closer to one. Random guesses tend towards a score of 0.25, with superforecasters around 0.10. With a market's criterion probability $p$ and resolution $o$ (1 for YES, 0 for NO), we can calculate the Brier score with:

$$\text{Brier} = (p - o)^2$$
The logarithmic score is another strictly proper scoring rule, but with better scores closer to zero and worse scores closer to negative infinity. Predictions far from the correct resolution are punished extremely hard under this rule. With a market's criterion probability $p$ and resolution $o$, we can calculate the logarithmic score with:

$$\text{Log} = o \cdot \ln(p) + (1 - o) \cdot \ln(1 - p)$$
The spherical score is a third strictly proper scoring rule. Better scores tend towards one and the worst possible score is zero, but the vast majority of predictions fall very high on this scale (between 0.99 and 1.0), so differentiation is difficult. With a market's criterion probability $p$ and resolution $o$, we can calculate the spherical score with:

$$\text{Spherical} = \frac{o \cdot p + (1 - o)(1 - p)}{\sqrt{p^2 + (1 - p)^2}}$$
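Putting the three rules together, a small Python sketch for a binary market (with prob as the criterion probability of YES and outcome as 1 for YES, 0 for NO) might look like:

```python
import math

def brier_score(prob: float, outcome: float) -> float:
    return (prob - outcome) ** 2

def log_score(prob: float, outcome: float) -> float:
    # Log of the probability assigned to the outcome that actually happened.
    p_outcome = prob if outcome == 1.0 else 1.0 - prob
    return math.log(p_outcome)

def spherical_score(prob: float, outcome: float) -> float:
    p_outcome = prob if outcome == 1.0 else 1.0 - prob
    return p_outcome / math.sqrt(prob ** 2 + (1.0 - prob) ** 2)

# For example, a 90% prediction on a market that resolves YES:
# brier_score(0.9, 1.0) == 0.01
# log_score(0.9, 1.0) is approximately -0.105
# spherical_score(0.9, 1.0) is approximately 0.994
```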
Relative scores are calculated based on the performance of the market relative to other markets. They provide a measure of how well the market has performed compared to its peers. These are only present for markets that are linked in a question, since they are scored against the other markets in that question.
We calculate a relative score with each scoring rule, which you can find on the individual question pages. The overall process is exactly the same for each; the only difference is the scoring rule used. Each rule keeps its direction in relative form, so a lower relative Brier score is better while a higher relative logarithmic score is better.
The process to calculate relative scores for a group of markets starts by determining the scoring period. We choose to score groups for the duration where at least two markets are open. In some situations, we also override the start or end dates so that the scoring period does not include days where the outcome was already known.
For each day in the scoring period, we calculate each market's score using a scoring rule (Brier, log, etc.) and, from those, calculate the median score. We then subtract the median from each market's daily score and save it as the daily relative score.
Finally, we sum all of the daily relative scores for each market and divide that by the total number of days in the scoring period. Note that this is not a simple average! For markets that were not open for the entire scoring period, their sum is being divided by more days than they had values for. This means that a market that otherwise performed the same as another but was open for less time will have a relative score closer to zero. Also note that relative scores can be both positive and negative, since this is the difference from a median score.
One way to represent this score for each market would be:

$$\text{Relative score} = \frac{1}{n} \sum_{d} \left( s_d - m_d \right)$$

Where $s_d$ is the market's score on day $d$, $m_d$ is the median score on day $d$, $n$ is the number of days in the scoring period, and the sum runs over the days the market was open.
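A sketch of that procedure in Python might look like the following, where daily_scores maps each market to its daily scores (already computed with one of the absolute scoring rules, indexed by day 0 to num_days - 1) and num_days is the length of the scoring period.

```python
import statistics

def relative_scores(daily_scores: dict[str, dict[int, float]], num_days: int) -> dict[str, float]:
    # Median score across all markets open on each day of the scoring period.
    daily_medians = {
        day: statistics.median(
            scores[day] for scores in daily_scores.values() if day in scores
        )
        for day in range(num_days)
    }
    # Sum each market's daily difference from the median, then divide by the
    # full scoring period - not just the days the market was open.
    return {
        market: sum(score - daily_medians[day] for day, score in scores.items()) / num_days
        for market, scores in daily_scores.items()
    }
```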
You can learn more about relative scores at the following sources:
The letter grades are intended to be an easy-to-read, intuitive representation of how well the market has performed on a specific axis. Each score (e.g. Brier score at market midpoint, spherical score one month before close, etc.) has a corresponding letter grade, which is determined by comparing the score to a set of predefined thresholds.
The thresholds for absolute scores are:
Grade | Brier Score | Logarithmic Score | Spherical Score | Probability Margin |
---|---|---|---|---|
S | 0.0000 to 0.0001 | 0.0000 to -0.0101 | 1.0000 to 0.9999 | 0.0000 to 0.0100 |
A+ | 0.0001 to 0.0009 | -0.0101 to -0.0305 | 0.9999 to 0.9995 | 0.0100 to 0.0300 |
A | 0.0009 to 0.0018 | -0.0305 to -0.0434 | 0.9995 to 0.9990 | 0.0300 to 0.0424 |
A- | 0.0018 to 0.0022 | -0.0434 to -0.0480 | 0.9990 to 0.9988 | 0.0424 to 0.0469 |
B+ | 0.0022 to 0.0030 | -0.0480 to -0.0563 | 0.9988 to 0.9983 | 0.0469 to 0.0548 |
B | 0.0030 to 0.0045 | -0.0563 to -0.0694 | 0.9983 to 0.9974 | 0.0548 to 0.0671 |
B- | 0.0045 to 0.0055 | -0.0694 to -0.0771 | 0.9974 to 0.9968 | 0.0671 to 0.0742 |
C+ | 0.0055 to 0.0075 | -0.0771 to -0.0906 | 0.9968 to 0.9955 | 0.0742 to 0.0866 |
C | 0.0075 to 0.0150 | -0.0906 to -0.1306 | 0.9955 to 0.9904 | 0.0866 to 0.1225 |
C- | 0.0150 to 0.0250 | -0.1306 to -0.1721 | 0.9904 to 0.9828 | 0.1225 to 0.1581 |
D+ | 0.0250 to 0.0500 | -0.1721 to -0.2531 | 0.9828 to 0.9609 | 0.1581 to 0.2236 |
D | 0.0500 to 0.1100 | -0.2531 to -0.4030 | 0.9609 to 0.8958 | 0.2236 to 0.3317 |
D- | 0.1100 to 0.2500 | -0.4030 to -0.6931 | 0.8958 to 0.7071 | 0.3317 to 0.5000 |
F | 0.2500 to 1.0000 | -0.6931 to -3.4028235e+38 | 0.7071 to 0.0000 | 0.5000 to 1.0000 |
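For instance, mapping a Brier score to its letter grade is just a lookup against the upper bound of each row in the table above. A small sketch:

```python
# Upper bound of each grade's Brier range, taken from the table above.
BRIER_GRADE_CUTOFFS = [
    ("S", 0.0001), ("A+", 0.0009), ("A", 0.0018), ("A-", 0.0022),
    ("B+", 0.0030), ("B", 0.0045), ("B-", 0.0055), ("C+", 0.0075),
    ("C", 0.0150), ("C-", 0.0250), ("D+", 0.0500), ("D", 0.1100),
    ("D-", 0.2500), ("F", 1.0000),
]

def brier_grade(score: float) -> str:
    for grade, upper_bound in BRIER_GRADE_CUTOFFS:
        if score <= upper_bound:
            return grade
    return "F"
```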
The thresholds for relative scores are a little different. The relative scoring algorithm we use results in a lot of scores very close to zero, with a sharp dropoff and a roughly symmetrical curve on either side. We calculate our grade cutoffs so that C+ is centered at zero, with widths based on the deviations of the scores.
The thresholds for relative scores can be found on GitHub for now while we continue to add questions and tweak the grades in response.
wasabipesto, lead developer
Hi, I'm wasabipesto. You can find me at wasabipesto.com, or on GitHub at github.com/wasabipesto. I don't have a twitter, don't look for me there. If you want to contact me directly, you can email me at contact@wasabipesto.com.
If you find this site useful, please share it with others! We believe prediction markets are valuable, but only if their accuracy is verified and understood. We believe prediction markets should be evaluated rigorously and publicly, showing both their strengths and weaknesses, in order to earn credibility.
Please also share your feedback about the site with us. Currently our focus is on improving the site, making it more intuitive to use while also adding new features that give valuable insights. We're specifically interested in:
You can use the form below to submit this or any other feedback.
If you want to support the site financially, you can donate via GitHub Sponsors. For $5 per month you can have your name listed on the site as a supporter.
Having an issue with the site? See a bug or a typo? Found a set of markets that aren't covered here? Do you need help accessing the data for a research project? Here's how you can contact us:
And finally, here's a handy form for anything else:
Astral Codex Ten: Prediction Market FAQ
Scott Alexander gives a summary of what prediction markets are, their fundamental qualities, and common objections. It's excellent and super easy to read - if you read anything from this list, it should be this one.
The obligatory Wikipedia page on the topic. It has a good overview and timeline of the recent history, but not much else.
Prediction Markets are not Polls
A common refrain to prediction markets is that they're "just polls of random people on the internet". Isaac King puts together a great rebuttal with examples as to why this is not the case.
Prediction Markets: When Do They Work?
In an older post (from 2018), Zvi discusses some situations where prediction markets thrive, and some where they don't. There are many more markets today, but I believe the basis of this post still holds.
First Sigma: What can we learn from scoring different election forecasts?
Jack compares head-to-head performance between Metaculus, 538, Manifold, Polymarket, EBO, and PredictIt on the US 2022 midterm elections. Metaculus and 538 took the lead but with a small sample size.
EA Forum: Predictive Performance on Metaculus vs. Manifold Markets
A direct comparison of 64 binary markets mirrored between Manifold and Metaculus. Metaculus had a better score on 75% of the questions.
JHK Forecasts: Forecast Database
Jack Kersting (a different Jack) assembled an impressive list of US election forecasts across dozens of predictors from 2016 to 2024. While this isn't comparing prediction markets, it's still a good example of prediction metrics.
Many platforms will score themselves and publish their results, or users will create dashboards similar to this site using API or blockchain data. We took a lot of inspiration from these sites when creating our standardized scoring format and our charts.
The Wikipedia page on scoring rules. It was a great starting point for our research because it covers many different score types.
Cultivate Labs: Relative Brier Scores
This post from Cultivate Labs describing their relative scoring system is really the basis for this site. It describes the method of creating relative scores based on the daily mean score, with the added twist of penalizing forecasters that start predicting later.
Eigil Fjeldgren Rischel: Against calibration
A good summation of calibration versus accuracy, mainly that calibration can be applied more broadly but is less meaningful. This is exactly why we switched focus on this site away from calibration!
Prediction market accuracy in the long run
A comparison of the performance of the Iowa Electronic Markets (a small academic platform) against contemporary polls. Between 1988 and 2004, the markets outperformed polls 74% of the time.
How manipulable are prediction markets?
A team attempts to manipulate 817 random markets on Manifold in early 2024 and finds that the manipulations were somewhat reversed after 7 days, and more so after 30 days.
Metaforecast was created by Nuño Sempere (and now maintained by QURI) to be a search engine for prediction markets from over a dozen platforms. One search bar to find open predictions from basically anywhere.
Saul Munn: Prediction Market Map
If you're interested in exploring everything there is to know about prediction markets, Saul keeps a categorized list of resources on Notion.