A transparent look at our machine learning pipeline: what data we use, how we engineer features, and how we validate the model.
Match results, standings, fixtures
Seven seasons of match results, league standings, and fixture schedules across all 8 leagues. This is our primary source for historical outcomes and team data.
Odds from 20+ bookmakers
Real-time odds from 20+ bookmakers worldwide. We compute implied probabilities and detect market inefficiencies where our model disagrees with the consensus.
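Turning odds into implied probabilities takes two steps. First invert each decimal price. Then remove the bookmaker margin, because the raw inverses sum to more than 1. The sketch below makes several assumptions:

- Odds arrive as decimal 1X2 prices.
- The margin is removed by simple proportional normalization. Other methods exist.
- The consensus is a plain average across bookmakers.
- The function names and the 0.05 disagreement threshold are hypothetical.

```python
from statistics import mean

def implied_probabilities(decimal_odds):
    """Turn decimal odds for each outcome into probabilities that sum to 1.

    The raw inverses (1 / odds) add up to more than 1 because of the
    bookmaker's margin (overround); proportional normalization removes it.
    """
    inverses = [1.0 / odds for odds in decimal_odds]
    overround = sum(inverses)
    return [p / overround for p in inverses]

def market_consensus(books):
    """Average the margin-free probabilities across bookmakers, outcome by outcome."""
    per_book = [implied_probabilities(odds) for odds in books]
    return [mean(outcome) for outcome in zip(*per_book)]

def flag_disagreements(model_probs, market_probs, threshold=0.05):
    """List outcomes the model rates more likely than the market by > threshold.

    The 0.05 threshold is an illustrative assumption, not a tuned value.
    """
    labels = ("home", "draw", "away")
    return [
        (label, round(model - market, 3))
        for label, model, market in zip(labels, model_probs, market_probs)
        if model - market > threshold
    ]

# Hypothetical 1X2 odds from three bookmakers for one match.
books = [(2.10, 3.40, 3.60), (2.05, 3.50, 3.70), (2.15, 3.30, 3.50)]
market = market_consensus(books)
print(flag_disagreements([0.52, 0.25, 0.23], market))
```

Averaging after margin removal stops any one bookmaker's overround from skewing the consensus.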
Expected goals (xG) per match
Expected goals, shot data, and advanced match statistics. xG weights every shot by its probability of being scored, so it captures chance quality rather than raw shot volume. It is one of the strongest predictive signals in football analytics.
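A team's match-level xG is the sum of the xG values of all its shots. This sketch uses made-up per-shot values; in practice they come from the data provider's xG model. The `Shot` type and `match_xg` helper are hypothetical. The example shows how a few good chances can outweigh a larger number of poor ones.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    team: str
    xg: float  # probability (0 to 1) that this shot is scored, from an xG model

def match_xg(shots):
    """Match-level xG for each team is the sum of its per-shot xG values."""
    totals = {}
    for shot in shots:
        totals[shot.team] = totals.get(shot.team, 0.0) + shot.xg
    return totals

# Hypothetical match: the home side takes 3 shots including one big chance,
# the away side takes 6 speculative long-range efforts.
shots = [Shot("home", 0.05), Shot("home", 0.04), Shot("home", 0.62)] + [
    Shot("away", 0.03) for _ in range(6)
]
print(match_xg(shots))
```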
Confirmed lineups T-60min
Starting lineups and confirmed team sheets, typically available 60 minutes before kickoff. This lets the model account for key player absences and tactical changes.
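One simple way to turn a confirmed XI into numeric features is to compare it against a pre-computed set of each team's key players. Both the feature names and the definition of "key player" are assumptions of this sketch. For example, a key player could be one of a team's most-used starters.

```python
def lineup_features(confirmed_xi, key_players):
    """Numeric features describing key-player absences from a confirmed XI.

    `key_players` is a hypothetical, pre-computed set per team (for example,
    its most-used starters this season); the definition of "key" is an
    assumption of this sketch.
    """
    missing = key_players - set(confirmed_xi)
    share_starting = 1.0 - len(missing) / len(key_players) if key_players else 1.0
    return {
        "key_players_missing": len(missing),
        "key_players_starting_share": share_starting,
    }

# Placeholder player names for one confirmed starting XI.
xi = ["Keeper", "Back1", "Back2", "Back3", "Back4", "Mid1", "Mid2", "Mid3",
      "Fwd1", "Fwd2", "Fwd3"]
print(lineup_features(xi, {"Mid1", "Fwd1", "Fwd9"}))
```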
Raw data from 4 sources is transformed into over 100 predictive features per match, grouped into 7 families:
- Last 5/10 matches, home vs away split
- Point-in-time, updated after each match
- xG scored and conceded
- Starting XI, key players missing
- Implied probabilities, market efficiency
- Points, position, goal difference
- Last 5 meetings
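Any feature built from past matches must be point-in-time: it may use only information available before kickoff. The sketch below shows this for rolling form over a team's last N matches, split by home and away. It builds each match's features first and records the result only afterwards. It assumes matches are plain dicts sorted by kickoff; the field names are hypothetical. A head-to-head feature could reuse the same pattern keyed by team pair.

```python
from collections import defaultdict, deque

POINTS = {"W": 3, "D": 1, "L": 0}
FLIP = {"W": "L", "D": "D", "L": "W"}  # home result -> away result

def _avg(history):
    return sum(history) / len(history) if history else None

def rolling_form(matches, window=5):
    """Point-in-time form: average points over each team's last `window` games,
    overall and split by venue.

    `matches` must be sorted by kickoff. Each row is built BEFORE the match's
    result is recorded, so no feature can see the outcome it helps predict.
    """
    overall = defaultdict(lambda: deque(maxlen=window))
    venue = defaultdict(lambda: deque(maxlen=window))  # key: (team, "home" | "away")
    rows = []
    for m in matches:
        home, away = m["home"], m["away"]
        rows.append({
            "id": m["id"],
            "home_form": _avg(overall[home]),
            "away_form": _avg(overall[away]),
            "home_form_at_home": _avg(venue[(home, "home")]),
            "away_form_away": _avg(venue[(away, "away")]),
        })
        # Record the result only after this match's features exist.
        hg, ag = m["home_goals"], m["away_goals"]
        home_result = "W" if hg > ag else "D" if hg == ag else "L"
        overall[home].append(POINTS[home_result])
        overall[away].append(POINTS[FLIP[home_result]])
        venue[(home, "home")].append(POINTS[home_result])
        venue[(away, "away")].append(POINTS[FLIP[home_result]])
    return rows

matches = [
    {"id": 1, "home": "A", "away": "B", "home_goals": 2, "away_goals": 0},
    {"id": 2, "home": "B", "away": "A", "home_goals": 1, "away_goals": 1},
    {"id": 3, "home": "A", "away": "B", "home_goals": 0, "away_goals": 1},
]
for row in rolling_form(matches):
    print(row)
```

`deque(maxlen=window)` drops the oldest result automatically, so the same code serves a 5- or 10-match window.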
Gradient boosting algorithm
We use LightGBM, a gradient boosting framework with a strong track record in sports prediction research. We train a separate model for each league, because leagues such as the Bundesliga and La Liga have different dynamics.
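Per-league training amounts to grouping the rows by league and fitting one model per group. The sketch adds an assumed chronological cutoff that holds back later matches for validation; the text does not describe the split. The model factory is injected so the code runs without third-party packages. With LightGBM's scikit-learn API it could be `lambda: lightgbm.LGBMClassifier(objective="multiclass")`. Here a hypothetical majority-class stand-in plays that role. The row layout and function names are illustrative.

```python
from collections import Counter, defaultdict

class MajorityBaseline:
    """Stand-in for LightGBM so the sketch runs with the standard library only:
    always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def train_per_league(rows, make_model, cutoff):
    """Fit one model per league on matches dated before `cutoff`.

    rows:       dicts with "league", "date" (ISO "YYYY-MM-DD"), "features", "label".
    make_model: returns an unfitted estimator with fit/predict, e.g.
                lambda: lightgbm.LGBMClassifier(objective="multiclass")
    cutoff:     ISO date; matches on or after it are held out for validation.
    """
    by_league = defaultdict(list)
    for row in rows:
        by_league[row["league"]].append(row)

    models, holdout = {}, {}
    for league, league_rows in by_league.items():
        train = [r for r in league_rows if r["date"] < cutoff]
        if not train:
            continue  # nothing to learn from for this league yet
        X = [r["features"] for r in train]
        y = [r["label"] for r in train]
        models[league] = make_model().fit(X, y)
        holdout[league] = [r for r in league_rows if r["date"] >= cutoff]
    return models, holdout

rows = [
    {"league": "PL", "date": "2024-08-17", "features": [0.1], "label": "H"},
    {"league": "PL", "date": "2024-08-24", "features": [0.3], "label": "H"},
    {"league": "PL", "date": "2024-08-31", "features": [0.2], "label": "A"},
    {"league": "PL", "date": "2025-03-01", "features": [0.4], "label": "D"},
    {"league": "BL1", "date": "2024-08-23", "features": [0.5], "label": "D"},
]
models, holdout = train_per_league(rows, MajorityBaseline, cutoff="2025-01-01")
print(models["PL"].predict([[0.4]]), len(holdout["PL"]))
```

ISO dates compare correctly as strings, which keeps the cutoff check simple.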
Last updated: 12 April 2026
| League | Accuracy |
|---|---|
| Overall | 52.5% |
| Premier League (PL) | 50.3% |
| La Liga (PD) | 49.0% |
| Bundesliga (BL1) | 55.6% |
| Serie A (SA) | 55.5% |
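The table does not define "accuracy". This sketch makes two assumptions. First, accuracy is the share of matches where the model's most likely outcome (home win, draw, or away win) was the actual result. Second, the overall figure pools every match rather than averaging the league figures. Both are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict

OUTCOMES = ("H", "D", "A")

def accuracy_by_league(predictions):
    """predictions: iterable of (league, (p_home, p_draw, p_away), actual_outcome).

    A match counts as correct when the most likely predicted outcome is the
    actual result. The overall figure pools every match, so leagues with more
    matches carry more weight than in a plain average of league accuracies.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for league, probs, actual in predictions:
        predicted = OUTCOMES[max(range(len(OUTCOMES)), key=lambda i: probs[i])]
        total[league] += 1
        correct[league] += int(predicted == actual)
    per_league = {league: correct[league] / total[league] for league in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_league, overall

# Hypothetical predictions for three matches.
sample = [
    ("PL", (0.50, 0.28, 0.22), "H"),
    ("PL", (0.45, 0.30, 0.25), "A"),
    ("SA", (0.20, 0.30, 0.50), "A"),
]
print(accuracy_by_league(sample))
```

For a three-way outcome, a uniform random guess scores about 33%. That baseline puts the figures above in context.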