How It Works

How our AI predicts football matches

From raw data to actionable predictions in 5 steps. No black box — here's exactly how the pipeline works, what data we use, and how we validate results.

The prediction pipeline

STEP 01

Data collection

Every 3-4 days, our pipeline pulls fresh data from 3 independent sources: match results, standings, and fixture schedules from football-data.org; real-time bookmaker odds from The Odds API (20+ bookmakers); and expected goals (xG), shot data, and advanced stats from understat.com.

STEP 02

Feature engineering

Raw data is transformed into 70+ predictive features per match. These include rolling form metrics (last 5/10 games), ELO ratings updated after every match, expected goals differentials, head-to-head records, home/away performance splits, market odds implied probabilities, and more.

STEP 03

Model training

We train a separate LightGBM gradient-boosted model for each league, because leagues play differently. The Bundesliga tends to be high-scoring, while Serie A tends to be more defensive and lower-scoring. Per-league training captures these differences. Hyperparameters are tuned using Optuna with 100-200 trials on GPU.

STEP 04

Prediction generation

For each upcoming fixture, the league-specific LightGBM model outputs win/draw/loss probabilities, an over/under 2.5 goals prediction, and the most likely exact scoreline.

STEP 05

Validation and delivery

All predictions are evaluated using strict temporal splits: we test only on matches played after the training data ends, so the model has never seen them. Value bets are flagged where the model's probability for an outcome is higher than the probability implied by bookmaker odds. Smart coupons are generated by combining the picks with the highest expected value (EV). Results are pushed to the app.
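
As an illustration of the value-bet check, here is a minimal sketch. It assumes decimal odds and uses the standard expected-value formula (probability times odds, minus the stake). The function names and the 5% `min_ev` threshold are hypothetical and are not taken from the production pipeline.

```python
def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked at the given decimal odds."""
    return model_prob * decimal_odds - 1.0


def flag_value_bets(model_probs, decimal_odds, min_ev=0.05):
    """Return (outcome, EV) pairs whose EV meets the threshold, highest first.

    min_ev is a hypothetical cut-off chosen for this example.
    """
    picks = []
    for outcome, prob in model_probs.items():
        ev = expected_value(prob, decimal_odds[outcome])
        if ev >= min_ev:
            picks.append((outcome, ev))
    return sorted(picks, key=lambda pick: pick[1], reverse=True)


# Example: the model rates the home win at 50%, the market prices it at 2.20.
picks = flag_value_bets(
    {"home": 0.50, "draw": 0.25, "away": 0.25},
    {"home": 2.20, "draw": 3.40, "away": 3.60},
)
```

In this example only the home win is flagged: its EV is roughly +0.10 per unit staked, while the draw and away win both have negative EV.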

Data sources

football-data.org

Match results, standings, fixtures, and team data for all 8 leagues. Our primary source for historical match outcomes and scheduling.

The Odds API

Real-time bookmaker odds from 20+ bookmakers worldwide. We use these to calculate implied probabilities and detect value bets where our model disagrees with the market.
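
For readers unfamiliar with implied probabilities, the conversion can be sketched as follows. This assumes decimal odds and the simplest normalization (dividing out the bookmaker margin proportionally). The function name is illustrative, and other margin-removal methods exist.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for mutually exclusive outcomes into probabilities.

    The raw inverses sum to more than 1 because of the bookmaker margin
    (overround). Dividing each by that sum rescales them to sum to 1.
    Returns (probabilities, margin).
    """
    if any(odds <= 1.0 for odds in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw], total - 1.0


# Example: home / draw / away priced at 2.10 / 3.40 / 3.60.
probs, margin = implied_probabilities([2.10, 3.40, 3.60])
```

For these example odds the bookmaker margin comes out at about 4.8%, and the normalized probabilities sum to 1.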

understat.com

Expected goals (xG), shot data, and advanced match statistics. xG is one of the most powerful predictive features in our model — it measures the quality of chances created.

Feature engineering

We calculate 70+ features per match. Raw data is transformed into signals that capture team strength, momentum, scoring patterns, and market expectations. Here are the main categories:

Form & Momentum

  • Last 5 match results (W/D/L)
  • Last 10 match points percentage
  • Goals scored/conceded rolling average
  • Win streak / losing streak length
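
The form features listed above could be computed along these lines. This is a minimal sketch, assuming a team's history is a chronological list of (goals for, goals against) pairs that contains only matches played before the fixture being predicted. The function and key names are illustrative.

```python
def form_features(history, short=5, long=10):
    """Compute simple form features from past matches (oldest first)."""
    def result(goals_for, goals_against):
        if goals_for > goals_against:
            return "W"
        return "D" if goals_for == goals_against else "L"

    results = [result(gf, ga) for gf, ga in history]
    recent = history[-long:]
    points = sum(3 if r == "W" else 1 if r == "D" else 0 for r in results[-long:])

    # Length of the current run of identical results (win or losing streak).
    streak_type = results[-1] if results else None
    streak_length = 0
    for r in reversed(results):
        if r != streak_type:
            break
        streak_length += 1

    n = len(recent)
    return {
        "last_results": "".join(results[-short:]),
        "points_pct": points / (3 * n) if n else 0.0,
        "avg_goals_for": sum(gf for gf, _ in recent) / n if n else 0.0,
        "avg_goals_against": sum(ga for _, ga in recent) / n if n else 0.0,
        "streak_type": streak_type,
        "streak_length": streak_length,
    }


# Example: five past matches, oldest first.
features = form_features([(2, 0), (1, 1), (0, 1), (3, 1), (2, 1)])
```

Restricting `history` to matches before the fixture is the important part: it keeps the features free of information from the match being predicted.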

Ratings & Rankings

  • Dynamic ELO rating (updated per match)
  • League position and points gap
  • Goal difference ranking
  • Home/away form split
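
A per-match ELO update can be sketched with the standard Elo formula. The K-factor of 20 is an assumption made for this example, and any home-advantage or goal-margin adjustment is left out. The settings used in production are not described here.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation for side A against side B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(home_rating, away_rating, home_score, k=20.0):
    """Return updated (home, away) ratings after one match.

    home_score is 1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    k is a hypothetical K-factor chosen for this example.
    """
    delta = k * (home_score - expected_score(home_rating, away_rating))
    return home_rating + delta, away_rating - delta


# Example: two evenly rated teams, home side wins.
new_home, new_away = update_elo(1500.0, 1500.0, 1.0)
```

Because the two adjustments are equal and opposite, the total rating in the league stays constant. A favourite that only draws loses points, and an underdog that draws gains them.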

Expected Goals (xG)

  • xG for and against (rolling)
  • xG overperformance/underperformance
  • Shot quality differential
  • Non-penalty xG

Market & Head-to-Head

  • Bookmaker implied probabilities
  • Player availability
  • Head-to-head win rate
  • Goals in H2H meetings (avg)

Two models, one prediction

LightGBM

A gradient-boosted decision tree model optimized for tabular data. We train one model per league with Optuna hyperparameter tuning (100-200 GPU-accelerated trials).

Outputs: Win/Draw/Loss probabilities, Over/Under 2.5 goals prediction

Score prediction

Exact scoreline estimation derived from LightGBM probability outputs. Accounts for home/away goal distributions and league-specific scoring patterns.

Outputs: Most likely exact scoreline (e.g. 1-0, 2-1)
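
The exact derivation is not spelled out on this page. As an illustration only, one common way to turn goal expectations into a scoreline is an independent Poisson model, sketched below. The goal rates passed in (`home_rate`, `away_rate`) are assumed inputs, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals given an expected goal rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def scoreline_grid(home_rate, away_rate, max_goals=6):
    """Probability of each (home, away) scoreline up to max_goals per side,
    assuming the two teams' goal counts are independent."""
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def most_likely_scoreline(home_rate, away_rate, max_goals=6):
    grid = scoreline_grid(home_rate, away_rate, max_goals)
    return max(grid, key=grid.get)


def prob_over_2_5(home_rate, away_rate):
    """1 minus the probability of two or fewer total goals."""
    grid = scoreline_grid(home_rate, away_rate, max_goals=2)
    under = sum(p for (h, a), p in grid.items() if h + a <= 2)
    return 1.0 - under


# Example: home side expected to score 2.4 goals, away side 0.7.
score = most_likely_scoreline(2.4, 0.7)
```

With these example rates the most likely scoreline is 2-0. The same grid also yields an over/under 2.5 goals probability, which is why the two outputs are often modelled together.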

Temporal validation

Many prediction sites claim high accuracy but test their models on data those models have already seen. This is a form of data leakage, and it produces misleadingly high accuracy numbers.

We use strict temporal splits: the model trains only on past data and predicts only on future matches. The last 20% of time-ordered data is reserved for evaluation. This mirrors how the model operates in production — it never has access to future information.
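
A temporal split of this kind can be sketched in a few lines. This is a simplified illustration, assuming each match is a record with a `date` field. The function name and record layout are hypothetical.

```python
from datetime import date, timedelta


def temporal_split(matches, holdout_fraction=0.2, date_key="date"):
    """Sort matches by date and reserve the most recent fraction for evaluation.

    Unlike a random split, no match in the training set is played after
    any match in the evaluation set.
    """
    ordered = sorted(matches, key=lambda m: m[date_key])
    holdout_size = round(len(ordered) * holdout_fraction)
    cut = len(ordered) - holdout_size
    return ordered[:cut], ordered[cut:]


# Example: ten weekly matches supplied in reverse order.
example = [
    {"id": i, "date": date(2024, 1, 1) + timedelta(days=7 * i)}
    for i in range(10)
]
train_set, eval_set = temporal_split(example[::-1])
```

Here the first eight matches by date form the training set and the last two form the evaluation set, regardless of the order in which they were supplied.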

All published accuracy numbers come from 140+ days of this genuine out-of-sample evaluation. Check the actual results on our model performance page.

See the model in action

Start with a free 3-day trial. No credit card required.

Try It Free