Prediction Methodology
This page explains how CS2PREDICT generates match predictions, what data we use, how models are evaluated, and what the accuracy metrics mean. Our approach prioritizes transparency — every prediction is auditable, every model's track record is public.
Data Sources
All match data comes from HLTV.org, the authoritative source for professional CS2 statistics. We collect:
- Match schedules — upcoming matches with teams, format (bo1/bo3/bo5), and tournament info
- Match results — final scores, map scores, round-by-round data
- Team rankings — HLTV top-30 world rankings, updated weekly
- Player statistics — individual player ratings, K/D ratios, and recent performance
- Head-to-head records — historical results between specific team matchups
Our database contains 15,000+ matches and tracks 860+ teams with 2,900+ players. Data refreshes every 5 minutes.
Feature Engineering
For each match, we compute 50+ features that capture different aspects of competitive performance:
| Category | Features | Description |
|---|---|---|
| ELO Ratings | ELO, ELOHQ, TierELO, Tier2ELO, Tier3ELO | Multiple ELO systems with different K-factors and pools. Standard ELO tracks all matches; TierELO variants focus on top-30 teams where signal-to-noise ratio is highest. |
| Form | Win rate (last 10), momentum, streak | Recent performance trajectory. Momentum captures whether a team is trending up or down. Streak detects winning/losing runs. |
| Head-to-Head | H2H win rate, H2H matches count | Historical record between the two specific teams. Weighted toward recent encounters. |
| Strength of Schedule | SOS rating, win rate vs top-10 | How strong a team's recent opponents have been. A team beating top-10 opponents carries more signal than beating unranked teams. |
| Psychology | Tilt factor, recovery rate | Mental state indicators. Teams on losing streaks may underperform their skill level. Some teams recover quickly from losses, others don't. |
| Player-level | Average player rating, rating variance | Individual player performance aggregated at team level. Accounts for roster changes and stand-ins. |
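As a rough illustration of the Form row, the sketch below derives a last-10 win rate and a signed streak from a team's result history. The function names and the boolean encoding of results are assumptions made for this example, not the site's actual feature pipeline.

```python
from typing import Optional, Sequence


def win_rate_last_n(results: Sequence[bool], n: int = 10) -> Optional[float]:
    """Share of wins among the most recent n results (list is oldest first)."""
    recent = results[-n:]
    if not recent:
        return None  # no history: treat the feature as missing, not as 0%
    return sum(recent) / len(recent)


def current_streak(results: Sequence[bool]) -> int:
    """Signed streak: +k for k straight wins, -k for k straight losses."""
    if not results:
        return 0
    last = results[-1]
    length = 0
    for outcome in reversed(results):
        if outcome != last:
            break
        length += 1
    return length if last else -length


history = [True, False, True, True, True]  # made-up results, oldest first
print(win_rate_last_n(history), current_streak(history))
```

Returning `None` for an empty history keeps "no data" distinct from "0% win rate", which matters for models that fall back to other features when an input is missing.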
Model Architecture
CS2PREDICT runs 27 independent models simultaneously. Each model uses a different subset of features and/or a different algorithm to produce a win probability. We deliberately avoid a single "best" model approach because:
- Different models excel in different contexts — ELO-based models work best for well-known teams; form-based models capture hot/cold streaks better
- Model disagreement is informative — when models disagree, the match is likely unpredictable; when they agree, confidence is warranted
- Transparency over accuracy — showing multiple perspectives helps users form their own judgment rather than blindly trusting one number
All models use logistic regression or ensemble methods. We intentionally avoid deep learning — our models are fully interpretable, and every prediction can be traced back to specific input features.
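The point about model disagreement can be made concrete with a small sketch. It assumes each model reports the probability that team A wins; the `disagreement` helper and its output keys are hypothetical and only illustrate one way to summarise agreement across models.

```python
from statistics import mean, pstdev
from typing import Dict, Mapping


def disagreement(probs: Mapping[str, float]) -> Dict[str, float]:
    """Summarise how far a set of models diverge on P(team A wins)."""
    values = list(probs.values())
    return {
        "mean": mean(values),
        "spread": pstdev(values),  # population standard deviation
        "share_picking_a": sum(p > 0.5 for p in values) / len(values),
    }


# Made-up probabilities for a single match
print(disagreement({"ELO": 0.71, "FORM": 0.48, "SOS": 0.62}))
```

A small spread with nearly all models picking the same side corresponds to the "models agree" case; a large spread or a split vote flags a match that is hard to call.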
Model Categories
- Base models (21) — each produces predictions from primary features (ELO, OPEN, PRO, MAJOR, MIND, FORM, SOS, WR10, STREAK, etc.)
- Meta-models (6) — consensus filters that activate only when multiple base models agree strongly (VANGUARD, SENTINEL, STRONGHOLD, HIGHORACLE, NEXUS, TRIBUNAL). These have higher accuracy but lower coverage.
All 27 Models — Detailed Reference
ELO — Classic ELO rating system adapted for CS2. Each team starts at 1000. After each match, ratings shift based on the expected vs actual outcome. Larger upsets cause bigger rating changes. ELO captures overall team strength but reacts slowly to roster changes. Accuracy is stable across all match tiers.
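The update rule described here can be sketched as follows. The starting rating of 1000 comes from the text; the K-factor of 32 and scale of 400 are the conventional chess defaults and are assumptions, since the page does not state the values used for the standard ELO model.

```python
from typing import Tuple

START_RATING = 1000.0  # stated in the text


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, a_won: bool,
           k: float = 32.0, scale: float = 400.0) -> Tuple[float, float]:
    """Return both teams' new ratings after one match."""
    exp_a = expected_score(rating_a, rating_b, scale)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta


# An underdog win moves ratings further than a favourite win
print(update(1000.0, 1200.0, a_won=True))
print(update(1200.0, 1000.0, a_won=True))
```

Because the change is proportional to the gap between actual and expected outcome, an upset produces a larger shift than an expected result, and the two teams' ratings always move by equal and opposite amounts.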
ELOHQ — High-quality ELO variant with K-factor 44 and scale 2000. Reacts faster to upsets than standard ELO and converges quicker for teams with few matches. Designed for scenarios where rapid form changes matter — new rosters, post-break returns, or teams coming off long offline periods.
ELOFACE — ELO enhanced with training activity signals. Teams that practice more before a match tend to perform better. This model blends the standard ELO probability with a training activity logit, giving slight edges to teams with higher recent practice hours.
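One plausible way to blend a probability with a logit term is to add the two in log-odds space. The sketch below assumes that form; the `weight` parameter and the way `activity_logit` would be derived from practice hours are hypothetical, since the page does not specify them.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend_with_activity(elo_prob: float, activity_logit: float,
                        weight: float = 0.2) -> float:
    """Shift the ELO probability in log-odds space by a weighted activity term.

    activity_logit > 0 means team A has been the more active side (assumption).
    """
    return sigmoid(logit(elo_prob) + weight * activity_logit)


print(blend_with_activity(0.60, activity_logit=1.0))
```

Working in log-odds keeps the result inside (0, 1) and means a zero activity signal leaves the ELO probability unchanged, which matches the "slight edge" behaviour described.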
TIERELO — ELO computed only from matches involving top-30 ranked teams. By excluding lower-tier noise, this model provides cleaner signals for premium matchups where both teams have extensive competitive history at the highest level.
TIER2ELO / TIER3ELO — Additional tier-restricted ELO pools with different K-factors and scales. TIER2ELO uses K=32 with scale 2500; TIER3ELO uses scale 4000 for sharper separation. Both focus on top-team matchups with different sensitivity to upsets.
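To show how these parameters interact, the sketch below collects the stated K-factors and scales in one place. It assumes "scale" is the divisor in the standard Elo expectation formula, which the page does not confirm; `EloConfig` and `VARIANTS` are illustrative names, and the K-factor for TIER3ELO is left unset because the text does not give it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EloConfig:
    k: Optional[float]  # None where the text gives no K-factor
    scale: float

    def expected(self, rating_a: float, rating_b: float) -> float:
        """P(A beats B), assuming scale is the divisor in the Elo exponent."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / self.scale))


VARIANTS = {
    "ELOHQ": EloConfig(k=44, scale=2000),
    "TIER2ELO": EloConfig(k=32, scale=2500),
    "TIER3ELO": EloConfig(k=None, scale=4000),
}

# Same 100-point rating gap under each variant
for name, cfg in VARIANTS.items():
    print(name, round(cfg.expected(1100, 1000), 3))
```

Under this reading, a larger scale maps the same rating gap to a probability closer to 50%, so ratings have to drift further apart before the model becomes confident. If the site defines scale differently, the numbers will not match.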
OPEN — The broadest model: combines HLTV ranking, momentum, win rate, and head-to-head record. Works across all match tiers including regional and online matches. Designed as a reliable baseline — not the most accurate for top-tier, but consistent everywhere.
PRO — Tuned for top-30 matchups. Emphasizes ranking and head-to-head record while reducing weight on momentum. Falls back to OPEN when teams lack ranking or H2H data. Best for PGL, ESL, BLAST, and IEM tournament matches.
MAJOR — Optimized for Major tournament matches where head-to-head history carries the most weight. Similar to PRO but with stronger H2H emphasis — teams that historically dominate a rival maintain that edge in high-stakes environments.
MIND — Psychology model analyzing tilt and mental state. Tracks how teams respond to losing streaks, comeback patterns, and performance under pressure. Uses tilt differential and H2H psychology to identify teams likely to underperform or overperform their skill level.
FORM — Pure recent form based on win rate over the last 10 matches. Confidence-weighted: teams with fewer recent matches shrink toward 50%. No ranking, no H2H — purely "who is winning more right now." Captures hot/cold streaks that ELO misses.
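Shrinking a win rate toward 50% when few matches are available can be done by adding pseudo-matches at an even record. This is a minimal sketch of that idea; the `prior_matches` value and the ratio used to turn two rates into a probability are assumptions, not the site's actual formula.

```python
def shrunk_win_rate(wins: int, matches: int, prior_matches: float = 5.0) -> float:
    """Win rate pulled toward 0.5; the pull weakens as matches grow.

    Equivalent to adding prior_matches imaginary games with a 50% record.
    """
    return (wins + 0.5 * prior_matches) / (matches + prior_matches)


def form_probability(rate_a: float, rate_b: float) -> float:
    """Turn two shrunk win rates into P(A wins) with a simple ratio (assumption)."""
    return rate_a / (rate_a + rate_b)


few = shrunk_win_rate(wins=2, matches=2)     # perfect record, tiny sample
many = shrunk_win_rate(wins=10, matches=10)  # perfect record, full window
print(few, many, form_probability(many, few))
```

A team that is 2-0 ends up well below a team that is 10-0 even though both have a raw win rate of 100%, which is the confidence weighting the description refers to.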
SQUAD — Player-level model that aggregates individual player ratings from the last 10 matches per player. A team of 5 players averaging 1.15 rating will outscore a team averaging 0.95, even if the teams have similar ELO. Accounts for stand-ins and recent roster changes.
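The aggregation step might look like the sketch below: average each player's last 10 ratings, then average across the lineup. The logistic mapping from rating gap to win probability and its `slope` are hypothetical, as the page does not say how the gap is converted.

```python
import math
from statistics import mean
from typing import Sequence


def team_rating(player_histories: Sequence[Sequence[float]], window: int = 10) -> float:
    """Average of each player's mean rating over their last `window` matches."""
    per_player = [mean(history[-window:]) for history in player_histories if history]
    return mean(per_player)


def squad_probability(rating_a: float, rating_b: float, slope: float = 5.0) -> float:
    """Logistic map from the rating gap to P(A wins); slope is a made-up value."""
    return 1.0 / (1.0 + math.exp(-slope * (rating_a - rating_b)))


team_a = [[1.15] * 12 for _ in range(5)]  # made-up rating histories
team_b = [[0.95] * 12 for _ in range(5)]
print(squad_probability(team_rating(team_a), team_rating(team_b)))
```

Passing in the histories of the five players expected to start is what lets a stand-in or a recent roster change affect the result, since the team figure is rebuilt from individual players each time.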
SOS — Strength of Schedule model combining ranking, form, and opponent quality. A team with a 70% win rate against top-10 opponents is stronger than one with 70% against unranked teams. SOS normalizes performance by opponent difficulty.
WR10 — Win rate against top-10 opponents combined with ranking and H2H. Focuses on performance in the hardest matches — beating top teams is a stronger signal than beating weaker ones. Particularly effective for predicting upsets.
STREAK — Tracks winning and losing streaks with ranking and H2H context. A team on a 5-match win streak has momentum; a team that just lost 3 in a row may be tilting. Streak thresholds trigger confidence adjustments.
WR10PURE — Pure win rate vs top-10 without any ranking adjustment. The rawest measure of elite performance — how often does this team actually beat the best teams? Low coverage but high signal when both teams have top-10 matchup data.
WR10COMBO — Combines win rate vs top-10 with ELO ratings and recent form. A blended model that uses elite performance data alongside broader metrics. Balances the specificity of WR10 with the generality of ELO.
APEX - Optimized ensemble combining ELO, momentum, streak, and strength of schedule. Walk-forward tuned on historical data to maximize accuracy. One of the strongest individual models for top-tier matches.
PHANTOM - Maximum-accuracy model combining ELO, TierELO, win rate, and tilt. Uses the widest feature set of any single model. Achieves the highest accuracy but requires every input feature to be available, so coverage is lower on matches with unknown teams.
ORACLE — Blends ELO, ELOHQ, momentum, tilt, and strength of schedule. Designed for balanced predictions across multiple match contexts. The most versatile single model in the ensemble.
EAGLE — Combines ranking, ELO, ELOHQ, momentum, and win rate. High accuracy when confident but selective — only produces strong predictions for matches with rich data on both sides.
VANGUARD — Meta-model consensus filter. Activates only when both ELO and WR10 agree strongly (both above 65% for the same team). When two independent approaches converge, confidence is high. Low coverage but historically very accurate.
SENTINEL — Anti-signal consensus model. Activates when SOS strongly favors one team (above 60%) AND the MIND model disagrees (below 48%). The idea: when psychology says one thing but fundamentals say another, trust the fundamentals. The "anti-tilt" model.
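The activation rule can be sketched as a small filter. The 60% and 48% thresholds come from the text; treating both inputs as the probability that team A wins, mirroring the rule for team B, and returning the SOS probability when the filter fires are assumptions.

```python
from typing import Optional


def sentinel(sos_prob: float, mind_prob: float) -> Optional[float]:
    """Both inputs are P(team A wins). Returns None when the filter is inactive."""
    favors_a = sos_prob > 0.60 and mind_prob < 0.48
    favors_b = sos_prob < 0.40 and mind_prob > 0.52  # mirror case for team B
    if favors_a or favors_b:
        return sos_prob  # assumption: side with the fundamentals (SOS)
    return None


print(sentinel(0.65, 0.45))  # SOS favors A, MIND leans the other way
print(sentinel(0.65, 0.55))  # no conflict, so the filter stays silent
```

Returning `None` rather than a neutral 50% keeps inactive matches out of the model's accuracy statistics, which is consistent with a low-coverage filter.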
STRONGHOLD — Consensus filter requiring ELO and SOS to both strongly agree (above 60% same direction). When skill rating and schedule strength align, the prediction is reinforced.
HIGHORACLE — Oracle model filtered to high-confidence predictions only (above 67%). Drops all Oracle predictions below the confidence threshold. Fewer predictions, but the ones it makes carry stronger conviction.
NEXUS — Consensus of APEX and ORACLE when both strongly agree (above 65% same direction). Two of the strongest individual models confirming each other creates a high-confidence signal.
TRIBUNAL — Three-model consensus requiring OPEN, PRO, and ORACLE to all agree (above 55% same direction). The broadest consensus model — when three different analytical approaches converge, the signal is robust across match contexts.
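Several of the meta-models above share one pattern: activate only when every input model clears a threshold for the same team. The sketch below encodes the stated inputs and thresholds; treating probabilities as P(team A wins), mirroring the threshold for team B, and outputting the average of the inputs are assumptions.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence

# (input models, threshold) as stated in the descriptions above
CONSENSUS_RULES = {
    "VANGUARD": (("ELO", "WR10"), 0.65),
    "STRONGHOLD": (("ELO", "SOS"), 0.60),
    "HIGHORACLE": (("ORACLE",), 0.67),  # single-input case of the same filter
    "NEXUS": (("APEX", "ORACLE"), 0.65),
    "TRIBUNAL": (("OPEN", "PRO", "ORACLE"), 0.55),
}


def consensus(probs: Mapping[str, float], inputs: Sequence[str],
              threshold: float) -> Optional[float]:
    """Return a probability only if all inputs agree beyond the threshold."""
    values = [probs.get(name) for name in inputs]
    if any(v is None for v in values):
        return None  # an input model produced no prediction
    if all(v > threshold for v in values) or all(v < 1.0 - threshold for v in values):
        return mean(values)  # assumption: average the agreeing models
    return None


def run_meta(name: str, probs: Mapping[str, float]) -> Optional[float]:
    inputs, threshold = CONSENSUS_RULES[name]
    return consensus(probs, inputs, threshold)


match = {"ELO": 0.70, "WR10": 0.68, "SOS": 0.55, "ORACLE": 0.66}  # made-up values
print({name: run_meta(name, match) for name in CONSENSUS_RULES})
```

Each rule trades coverage for confidence: the stricter the threshold and the more inputs required, the fewer matches produce a prediction.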
Evaluation Metrics
We evaluate model quality using multiple metrics, not just accuracy:
| Metric | What It Measures | Good Value |
|---|---|---|
| Accuracy | Percentage of correct winner predictions, excluding predictions whose win probability falls in the 48-52% range (near coin flips) | > 65% |
| Brier Score | Mean squared error of predicted probabilities vs actual outcomes. Measures calibration quality. | < 0.20 (0.25 = coin flip) |
| Log Loss | Penalizes confident wrong predictions heavily. A model that says 90% and is wrong gets punished much more than one that says 55%. | < 0.55 (0.693 = coin flip) |
| Sharp | Accuracy when model confidence exceeds 80%. Shows whether high-confidence predictions are reliable. | > 75% |
| Calibration | When a model says 70%, does the team actually win 70% of the time? Ideal: predicted probability matches observed frequency. | Diagonal on calibration plot |
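The metrics in the table can be computed along these lines. This is a minimal sketch that assumes `probs` holds P(team A wins) and `outcomes` holds 1 when team A won and 0 otherwise; the function names and the 10-bin calibration layout are choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood (natural log); eps avoids log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def accuracy(probs: Sequence[float], outcomes: Sequence[int],
             min_confidence: float = 0.52) -> Optional[float]:
    """Hit rate over predictions whose confidence max(p, 1 - p) exceeds the cutoff."""
    kept = [(p, y) for p, y in zip(probs, outcomes) if max(p, 1.0 - p) > min_confidence]
    if not kept:
        return None
    return sum((p > 0.5) == (y == 1) for p, y in kept) / len(kept)


def sharp(probs: Sequence[float], outcomes: Sequence[int]) -> Optional[float]:
    """Accuracy restricted to predictions with confidence above 80%."""
    return accuracy(probs, outcomes, min_confidence=0.80)


def calibration_bins(probs: Sequence[float], outcomes: Sequence[int],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed frequency, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for bucket in bins:
        if bucket:
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            freq = sum(y for _, y in bucket) / len(bucket)
            rows.append((mean_p, freq, len(bucket)))
    return rows
```

A model that always predicts 50% scores exactly 0.25 on `brier_score` and about 0.693 on `log_loss`, which gives the coin-flip baselines for comparison. Plotting the first two columns of `calibration_bins` against each other gives the calibration plot, where a well-calibrated model sits near the diagonal.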
Match Tier Classification
Matches are classified into tiers based on tournament importance and team quality:
- Top — Premier events (PGL Major, IEM, BLAST), 5-star matches, top-ranked teams. Model accuracy is evaluated primarily on this tier.
- Major — Major qualifiers, premium events with strong teams.
- Other — Regional online matches, lower-tier tournaments. Less data available, predictions less reliable.
Calibrating Models
New models must accumulate at least 20 evaluated predictions on top-tier matches before they appear in the leaderboard ranking. During calibration, accuracy statistics are displayed but considered preliminary. This prevents ranking models based on insufficient data.
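The calibration gate amounts to a simple filter before ranking. In the sketch below, the threshold of 20 comes from the text, while the `ModelRecord` fields and the choice to rank by accuracy alone are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_EVALUATED_TOP_TIER = 20  # threshold stated in the text


@dataclass
class ModelRecord:
    name: str
    evaluated_top_tier: int  # evaluated predictions on top-tier matches
    correct_top_tier: int

    @property
    def accuracy(self) -> Optional[float]:
        if self.evaluated_top_tier == 0:
            return None
        return self.correct_top_tier / self.evaluated_top_tier

    @property
    def is_calibrating(self) -> bool:
        return self.evaluated_top_tier < MIN_EVALUATED_TOP_TIER


def leaderboard(records: List[ModelRecord]) -> List[ModelRecord]:
    """Rank only models past calibration; others keep preliminary stats."""
    ranked = [r for r in records if not r.is_calibrating]
    return sorted(ranked, key=lambda r: r.accuracy, reverse=True)


records = [  # made-up counts
    ModelRecord("A", 25, 18),
    ModelRecord("B", 10, 9),
    ModelRecord("C", 40, 30),
]
print([r.name for r in leaderboard(records)])
```

Model B has the highest raw accuracy in this made-up data but is held out of the ranking because 10 predictions are too few to trust.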
Limitations
- Models cannot account for roster changes announced after prediction generation
- Map veto/pick strategies are not modeled (we predict bo-series outcomes, not individual maps)
- Player health, motivation, and external factors are not captured
- "Other" tier matches have less historical data and lower prediction reliability
- Past accuracy does not guarantee future performance — models are continuously monitored and updated
For technical questions about our methodology, contact us through the support system. For the complete model leaderboard with current accuracy, see the Models page.