Prediction Methodology
This page explains how CS2PREDICT generates match predictions, what data we use, how models are evaluated, and what the accuracy metrics mean. Our approach prioritizes transparency: every prediction is auditable, and every model's track record is public.
Data Sources
All match data comes from HLTV.org, the authoritative source for professional CS2 statistics. We collect:
- Match schedules — upcoming matches with teams, format (bo1/bo3/bo5), and tournament info
- Match results — final scores, map scores, round-by-round data
- Team rankings — HLTV top-30 world rankings, updated weekly
- Player statistics — individual player ratings, K/D ratios, and recent performance
- Head-to-head records — historical results for specific team pairings
Our database contains 15,000+ matches and tracks 860+ teams with 2,900+ players. Data refreshes every 5 minutes.
Feature Engineering
For each match, we compute 50+ features that capture different aspects of competitive performance:
| Category | Features | Description |
|---|---|---|
| ELO Ratings | ELO, ELOHQ, TierELO, Tier2ELO, Tier3ELO | Multiple ELO systems with different K-factors and pools. Standard ELO tracks all matches; TierELO variants focus on top-30 teams where signal-to-noise ratio is highest. |
| Form | Win rate (last 10), momentum, streak | Recent performance trajectory. Momentum captures whether a team is trending up or down. Streak detects winning/losing runs. |
| Head-to-Head | H2H win rate, H2H matches count | Historical record between the two specific teams. Weighted toward recent encounters. |
| Strength of Schedule | SOS rating, win rate vs top-10 | How strong a team's recent opponents have been. A team beating top-10 opponents carries more signal than beating unranked teams. |
| Psychology | Tilt factor, recovery rate | Mental state indicators. Teams on losing streaks may underperform their skill level. Some teams recover quickly from losses, others don't. |
| Player-level | Average player rating, rating variance | Individual player performance aggregated at team level. Accounts for roster changes and stand-ins. |
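To make the ELO and form categories concrete, here is an illustrative sketch of how such features can be computed. The K-factor, scale constant, and function names are assumptions for illustration, not CS2PREDICT's actual implementation.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B under a standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating: float, expected: float, actual: float, k: float = 32) -> float:
    """Move a rating toward the observed result; K controls the update size."""
    return rating + k * (actual - expected)

def form_features(results: list[int], window: int = 10) -> dict[str, float]:
    """Win rate over the last `window` matches plus the current streak.
    `results` is 1 for a win, 0 for a loss, most recent last."""
    recent = results[-window:]
    win_rate = sum(recent) / len(recent) if recent else 0.0
    streak = 0
    for r in reversed(results):
        if r == results[-1]:
            streak += 1
        else:
            break
    # Positive streak length for wins, negative for losses
    signed = streak if results and results[-1] == 1 else -streak
    return {"win_rate_10": win_rate, "streak": signed}
```

The TierELO variants described above would run the same update loop over a restricted match pool (top-30 opponents only) with their own K-factors.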
Model Architecture
CS2PREDICT runs 27 independent models simultaneously. Each model uses a different subset of features and/or a different algorithm to produce a win probability. We deliberately avoid a single "best" model approach because:
- Different models excel in different contexts — ELO-based models work best for well-known teams; form-based models capture hot/cold streaks better
- Model disagreement is informative — when models disagree, the match is likely unpredictable; when they agree, confidence is warranted
- Transparency over accuracy — showing multiple perspectives helps users form their own judgment rather than blindly trusting one number
All models use logistic regression or ensemble methods. We intentionally avoid deep learning — our models are fully interpretable, and every prediction can be traced back to specific input features.
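As a minimal sketch of what one base model looks like, the following hand-rolled logistic regression maps a feature vector to a win probability. The weights and feature names here are invented for illustration; real models would be fit on historical match data.

```python
import math

def win_probability(features: dict[str, float],
                    weights: dict[str, float],
                    bias: float = 0.0) -> float:
    """Logistic regression: sigmoid of a weighted sum of features."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Example: a positive ELO difference and better recent form favor team A
p = win_probability(
    {"elo_diff": 120.0, "win_rate_10_diff": 0.2},
    {"elo_diff": 0.005, "win_rate_10_diff": 1.5},
)
```

Because the model is a weighted sum, every prediction can be decomposed into per-feature contributions, which is what makes this class of model fully auditable.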
Model Categories
- Base models (21) — each produces predictions from primary features (ELO, OPEN, PRO, MAJOR, MIND, FORM, SOS, WR10, STREAK, etc.)
- Meta-models (6) — consensus filters that activate only when multiple base models agree strongly (VANGUARD, SENTINEL, STRONGHOLD, HIGHORACLE, NEXUS, TRIBUNAL). These have higher accuracy but lower coverage.
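The consensus-filter behavior of the meta-models can be sketched as follows. The agreement threshold, minimum count, and mean-probability aggregation are assumptions, not the actual VANGUARD/SENTINEL logic.

```python
def consensus(base_probs: list[float],
              min_agree: int = 5,
              threshold: float = 0.65):
    """Return an aggregated win probability, or None (no prediction)
    when too few base models are confident about the same team."""
    favor_a = [p for p in base_probs if p >= threshold]
    favor_b = [p for p in base_probs if p <= 1 - threshold]
    if len(favor_a) >= min_agree:
        return sum(favor_a) / len(favor_a)
    if len(favor_b) >= min_agree:
        return sum(favor_b) / len(favor_b)
    return None  # models disagree: skip the match (lower coverage)
```

Returning `None` on disagreement is exactly the accuracy/coverage trade-off described above: the meta-model only speaks when the match looks predictable.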
Evaluation Metrics
We evaluate model quality using multiple metrics, not just accuracy:
| Metric | What It Measures | Good Value |
|---|---|---|
| Accuracy | Percentage of correct winner predictions (excluding 48-52% range) | > 65% |
| Brier Score | Mean squared error of predicted probabilities vs actual outcomes. Measures calibration quality. | < 0.20 (0.25 = coin flip) |
| Log Loss | Penalizes confident wrong predictions heavily. A model that says 90% and is wrong gets punished much more than one that says 55%. | < 0.55 |
| Sharp | Accuracy when model confidence exceeds 80%. Shows whether high-confidence predictions are reliable. | > 75% |
| Calibration | When a model says 70%, does the team actually win 70% of the time? Ideal: predicted probability matches observed frequency. | Diagonal on calibration plot |
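The metrics in the table follow standard definitions and can be computed over a list of `(predicted_probability, outcome)` pairs, where the outcome is 1 if the predicted-on team won. The dead-zone handling mirrors the 48-52% exclusion described for accuracy; function names are our own.

```python
import math

def brier_score(preds: list[tuple[float, int]]) -> float:
    """Mean squared error of probabilities; 0.25 for a constant coin flip."""
    return sum((p - y) ** 2 for p, y in preds) / len(preds)

def log_loss(preds: list[tuple[float, int]], eps: float = 1e-12) -> float:
    """Heavily penalizes confident wrong predictions."""
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in preds) / len(preds)

def accuracy(preds: list[tuple[float, int]],
             dead_zone: tuple[float, float] = (0.48, 0.52)) -> float:
    """Correct-winner rate, skipping near-coin-flip predictions."""
    scored = [(p, y) for p, y in preds
              if not dead_zone[0] < p < dead_zone[1]]
    correct = sum(1 for p, y in scored if (p >= 0.5) == (y == 1))
    return correct / len(scored)
```

A calibration check is the same data sliced differently: bucket predictions by stated probability and compare each bucket's mean probability with its observed win frequency.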
Match Tier Classification
Matches are classified into tiers based on tournament importance and team quality:
- Top — Premier events (PGL Major, IEM, BLAST), 5-star matches, top-ranked teams. Model accuracy is evaluated primarily on this tier.
- Major — Major qualifiers, premium events with strong teams.
- Other — Regional online matches, lower-tier tournaments. Less data available, predictions less reliable.
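The tier rules above might be expressed roughly as follows. The event-name list, star threshold, and qualifier flag are placeholders standing in for the actual classification criteria.

```python
# Hypothetical tier assignment following the rules described above.
PREMIER_EVENTS = ("PGL Major", "IEM", "BLAST")

def match_tier(event: str, stars: int, is_qualifier: bool) -> str:
    if stars >= 5 or any(event.startswith(p) for p in PREMIER_EVENTS):
        return "Top"
    if is_qualifier or stars >= 3:
        return "Major"
    return "Other"
```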
Calibrating Models
New models must accumulate at least 20 evaluated predictions on top-tier matches before they appear in the leaderboard ranking. During calibration, accuracy statistics are displayed but considered preliminary. This prevents ranking models based on insufficient data.
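The 20-prediction minimum exists because a small sample makes observed accuracy noisy. A rough normal-approximation confidence interval illustrates why (this is our illustration, not the site's actual statistics):

```python
import math

def accuracy_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for true accuracy given wins out of n."""
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

# 14 correct out of 20 looks like 70% accuracy, but the interval
# spans roughly 50% to 90% — too wide to rank on.
lo, hi = accuracy_interval(14, 20)
```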
Limitations
- Models cannot account for roster changes announced after prediction generation
- Map veto/pick strategies are not modeled (we predict overall best-of-series outcomes, not individual maps)
- Player health, motivation, and external factors are not captured
- "Other" tier matches have less historical data and lower prediction reliability
- Past accuracy does not guarantee future performance — models are continuously monitored and updated
For technical questions about our methodology, contact us through the support system. For the complete model leaderboard with current accuracy, visit Models.