Brier Score measures the quality of probability predictions, not just whether the winner was picked correctly. For binary win/loss outcomes it ranges from 0 (perfect) to 1 (worst possible).
Formula: the average of (predicted probability - actual outcome)² across all predictions, where the actual outcome is 1 for a win and 0 for a loss.
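The formula above can be sketched directly; this is a minimal illustration, with the single-prediction examples chosen only to show the score's bounds:

```python
def brier_score(predictions, outcomes):
    """Average squared error between predicted win probabilities
    and actual outcomes (1 = win, 0 = loss)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# A perfect prediction scores 0; a maximally wrong one scores 1.
print(brier_score([1.0], [1]))  # 0.0
print(brier_score([0.0], [1]))  # 1.0
```

Note that a constant 50% prediction scores 0.25 on every match, which is why 0.25 is the coin-flip baseline in the table below.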
Consider two models predicting the same 10 matches: both pick the same winners, but one hedges near 50% while the other commits to confident probabilities. Accuracy alone treats them identically. Brier Score does not: the model whose confident predictions are correct earns a much better (lower) score.
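A hedged sketch of that comparison, using hypothetical match results and probabilities (both models always pick the eventual winner, differing only in confidence):

```python
def brier_score(predictions, outcomes):
    """Average squared error between predicted win probabilities
    and actual outcomes (1 = win, 0 = loss)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Hypothetical results for 10 matches: 1 = team won, 0 = team lost.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# Model A hedges: 55% on winners, 45% on losers (always "right", barely).
model_a = [0.55 if o else 0.45 for o in outcomes]
# Model B is confident: 90% on winners, 10% on losers.
model_b = [0.90 if o else 0.10 for o in outcomes]

print(brier_score(model_a, outcomes))  # 0.2025 -- barely better than a coin flip
print(brier_score(model_b, outcomes))  # 0.01   -- far lower (better)
```

Both models have 100% accuracy here, yet their Brier Scores differ by a factor of twenty, which is exactly the distinction the metric exists to make.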
| Brier Score | Interpretation |
|---|---|
| 0.25 | Random coin flip (baseline) |
| 0.20-0.25 | Poor — barely better than random |
| 0.15-0.20 | Good — meaningful prediction quality |
| 0.10-0.15 | Very good — strong predictive power |
| <0.10 | Excellent — rare in esports |
We also track "Sharp" — accuracy specifically when the model is highly confident (80%+). A model with 85% Sharp score means when it's very confident, it's right 85% of the time. This is what matters for high-stakes decisions.
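The Sharp metric described above could be computed along these lines; this is a sketch under the assumption that "highly confident" means a predicted probability of 80%+ for either side, with the function name and details being illustrative:

```python
def sharp_accuracy(predictions, outcomes, threshold=0.8):
    """Accuracy restricted to high-confidence predictions: those where the
    predicted probability is >= threshold (confident win) or
    <= 1 - threshold (confident loss). Returns None if no prediction
    clears the confidence bar."""
    confident = [(p, o) for p, o in zip(predictions, outcomes)
                 if p >= threshold or p <= 1 - threshold]
    if not confident:
        return None
    # A prediction is "right" when its implied pick matches the outcome.
    correct = sum(1 for p, o in confident if (p >= 0.5) == (o == 1))
    return correct / len(confident)

# Hypothetical predictions: the 0.6 call is excluded as not confident enough.
preds = [0.90, 0.85, 0.15, 0.60, 0.95]
wins  = [1,    0,    0,    1,    1]
print(sharp_accuracy(preds, wins))  # 0.75 -- right on 3 of 4 confident calls
```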