Prediction Methodology
This page explains how CS2PREDICT generates match predictions, what data we use, how models are evaluated, and what the accuracy metrics mean. Our approach prioritizes transparency: every prediction is auditable, and every model's track record is public.
Data Sources
All match data comes from HLTV.org, the authoritative source for professional CS2 statistics. We collect:
- Match schedules — upcoming matches with teams, format (bo1/bo3/bo5), and tournament info
- Match results — final scores, map scores, round-by-round data
- Team rankings — HLTV top-30 world rankings, updated weekly
- Player statistics — individual player ratings, K/D ratios, and recent performance
- Head-to-head records — historical results for specific team pairings
Our database contains 15,000+ matches and tracks 860+ teams with 2,900+ players. Data refreshes every 5 minutes.
Feature Engineering
For each match, we compute 50+ features that capture different aspects of competitive performance:
| Category | Features | Description |
|---|---|---|
| ELO Ratings | ELO, ELOHQ, TierELO, Tier2ELO, Tier3ELO | Multiple ELO systems with different K-factors and pools. Standard ELO tracks all matches; TierELO variants focus on top-30 teams where signal-to-noise ratio is highest. |
| Form | Win rate (last 10), momentum, streak | Recent performance trajectory. Momentum captures whether a team is trending up or down. Streak detects winning/losing runs. |
| Head-to-Head | H2H win rate, H2H matches count | Historical record between the two specific teams. Weighted toward recent encounters. |
| Strength of Schedule | SOS rating, win rate vs top-10 | How strong a team's recent opponents have been. A team beating top-10 opponents carries more signal than beating unranked teams. |
| Psychology | Tilt factor, recovery rate | Mental state indicators. Teams on losing streaks may underperform their skill level. Some teams recover quickly from losses, others don't. |
| Player-level | Average player rating, rating variance | Individual player performance aggregated at team level. Accounts for roster changes and stand-ins. |
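To make the ELO and form categories concrete, here is an illustrative sketch of how such features can be computed. The K-factor, scale constant, and function names are assumptions for illustration, not CS2PREDICT's actual implementation.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B under a standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating: float, expected: float, actual: float, k: float = 32) -> float:
    """Move a rating toward the observed result; K controls the update size."""
    return rating + k * (actual - expected)

def form_features(results: list[int], window: int = 10) -> dict[str, float]:
    """Win rate over the last `window` matches plus the current streak.
    `results` is 1 for a win, 0 for a loss, most recent last."""
    recent = results[-window:]
    win_rate = sum(recent) / len(recent) if recent else 0.0
    streak = 0
    for r in reversed(results):
        if r == results[-1]:
            streak += 1
        else:
            break
    # Positive streak length for wins, negative for losses
    signed = streak if results and results[-1] == 1 else -streak
    return {"win_rate_10": win_rate, "streak": signed}
```

The TierELO variants described above would run the same update loop over a restricted match pool (top-30 opponents only) with their own K-factors.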
Model Architecture
CS2PREDICT runs 27 independent models simultaneously. Each model uses a different subset of features and/or a different algorithm to produce a win probability. We deliberately avoid a single "best" model approach because:
- Different models excel in different contexts — ELO-based models work best for well-known teams; form-based models capture hot/cold streaks better
- Model disagreement is informative — when models disagree, the match is likely unpredictable; when they agree, confidence is warranted
- Transparency over accuracy — showing multiple perspectives helps users form their own judgment rather than blindly trusting one number
All models use logistic regression or ensemble methods. We intentionally avoid deep learning — our models are fully interpretable, and every prediction can be traced back to specific input features.
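As a minimal sketch of what one base model looks like, the following hand-rolled logistic regression maps a feature vector to a win probability. The weights and feature names here are invented for illustration; real models would be fit on historical match data.

```python
import math

def win_probability(features: dict[str, float],
                    weights: dict[str, float],
                    bias: float = 0.0) -> float:
    """Logistic regression: sigmoid of a weighted sum of features."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Example: a positive ELO difference and better recent form favor team A
p = win_probability(
    {"elo_diff": 120.0, "win_rate_10_diff": 0.2},
    {"elo_diff": 0.005, "win_rate_10_diff": 1.5},
)
```

Because the model is a weighted sum, every prediction can be decomposed into per-feature contributions, which is what makes this class of model fully auditable.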
Model Categories
- Base models (21) — each produces predictions from primary features (ELO, OPEN, PRO, MAJOR, MIND, FORM, SOS, WR10, STREAK, etc.)
- Meta-models (6) — consensus filters that activate only when multiple base models agree strongly (VANGUARD, SENTINEL, STRONGHOLD, HIGHORACLE, NEXUS, TRIBUNAL). These have higher accuracy but lower coverage.
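The consensus-filter behavior of the meta-models can be sketched as follows. The agreement threshold, minimum count, and mean-probability aggregation are assumptions, not the actual VANGUARD/SENTINEL logic.

```python
def consensus(base_probs: list[float],
              min_agree: int = 5,
              threshold: float = 0.65):
    """Return an aggregated win probability, or None (no prediction)
    when too few base models are confident about the same team."""
    favor_a = [p for p in base_probs if p >= threshold]
    favor_b = [p for p in base_probs if p <= 1 - threshold]
    if len(favor_a) >= min_agree:
        return sum(favor_a) / len(favor_a)
    if len(favor_b) >= min_agree:
        return sum(favor_b) / len(favor_b)
    return None  # models disagree: skip the match (lower coverage)
```

Returning `None` on disagreement is exactly the accuracy/coverage trade-off described above: the meta-model only speaks when the match looks predictable.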
Evaluation Metrics
We evaluate model quality using multiple metrics, not just accuracy:
| Metric | What It Measures | Good Value |
|---|---|---|
| Accuracy | Percentage of correct winner predictions (excluding 48-52% range) | > 65% |
| Brier Score | Mean squared error of predicted probabilities vs actual outcomes. Measures calibration quality. | < 0.20 (0.25 = coin flip) |
| Log Loss | Penalizes confident wrong predictions heavily. A model that says 90% and is wrong gets punished much more than one that says 55%. | < 0.55 |
| Sharp | Accuracy when model confidence exceeds 80%. Shows whether high-confidence predictions are reliable. | > 75% |
| Calibration | When a model says 70%, does the team actually win 70% of the time? Ideal: predicted probability matches observed frequency. | Diagonal on calibration plot |
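The metrics in the table follow standard definitions and can be computed over a list of `(predicted_probability, outcome)` pairs, where the outcome is 1 if the predicted-on team won. The dead-zone handling mirrors the 48-52% exclusion described for accuracy; function names are our own.

```python
import math

def brier_score(preds: list[tuple[float, int]]) -> float:
    """Mean squared error of probabilities; 0.25 for a constant coin flip."""
    return sum((p - y) ** 2 for p, y in preds) / len(preds)

def log_loss(preds: list[tuple[float, int]], eps: float = 1e-12) -> float:
    """Heavily penalizes confident wrong predictions."""
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in preds) / len(preds)

def accuracy(preds: list[tuple[float, int]],
             dead_zone: tuple[float, float] = (0.48, 0.52)) -> float:
    """Correct-winner rate, skipping near-coin-flip predictions."""
    scored = [(p, y) for p, y in preds
              if not dead_zone[0] < p < dead_zone[1]]
    correct = sum(1 for p, y in scored if (p >= 0.5) == (y == 1))
    return correct / len(scored)
```

A calibration check is the same data sliced differently: bucket predictions by stated probability and compare each bucket's mean probability with its observed win frequency.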
Match Tier Classification
Matches are classified into tiers based on tournament importance and team quality:
- Top — Premier events (PGL Major, IEM, BLAST), 5-star matches, top-ranked teams. Model accuracy is evaluated primarily on this tier.
- Major — Major qualifiers, premium events with strong teams.
- Other — Regional online matches, lower-tier tournaments. Less data available, predictions less reliable.
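The tier rules above might be expressed roughly as follows. The event-name list, star threshold, and qualifier flag are placeholders standing in for the actual classification criteria.

```python
# Hypothetical tier assignment following the rules described above.
PREMIER_EVENTS = ("PGL Major", "IEM", "BLAST")

def match_tier(event: str, stars: int, is_qualifier: bool) -> str:
    if stars >= 5 or any(event.startswith(p) for p in PREMIER_EVENTS):
        return "Top"
    if is_qualifier or stars >= 3:
        return "Major"
    return "Other"
```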
Calibrating Models
New models must accumulate at least 20 evaluated predictions on top-tier matches before they appear in the leaderboard ranking. During calibration, accuracy statistics are displayed but considered preliminary. This prevents ranking models based on insufficient data.
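The 20-prediction minimum exists because a small sample makes observed accuracy noisy. A rough normal-approximation confidence interval illustrates why (this is our illustration, not the site's actual statistics):

```python
import math

def accuracy_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for true accuracy given wins out of n."""
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

# 14 correct out of 20 looks like 70% accuracy, but the interval
# spans roughly 50% to 90% — too wide to rank on.
lo, hi = accuracy_interval(14, 20)
```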
Limitations
- Models cannot account for roster changes announced after prediction generation
- Map veto/pick strategies are not modeled (we predict overall best-of-series outcomes, not individual maps)
- Player health, motivation, and external factors are not captured
- "Other" tier matches have less historical data and lower prediction reliability
- Past accuracy does not guarantee future performance — models are continuously monitored and updated
For technical questions about our methodology, contact us through the support system. For the complete model leaderboard with current accuracy, visit Models.