⚽ Can We Predict the Next Premier League Champion with Binomial Probability?

Originally published at dev.to

What are the chances your favorite EPL team wins the league next season? Time to let math do the talking!


Idea Behind the Madness

Every football fan has asked it:
"Can my team win the league next season?"

Instead of relying on blind hope, I decided to use binomial probability to calculate each team's chances of taking the crown in the next Premier League season, based entirely on how they performed last time.

We’ll:

  1. Fetch last season’s final standings from an API.
  2. Use the binomial distribution to compute two things:
    • The probability of a team repeating its exact win total.
    • The probability of a team reaching a typical championship threshold of about 27.6 wins, rounded up to 28.
  3. Rank them accordingly.
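
For reference, the two quantities in step 2 can be written out as below. This is the standard binomial model; the notation is chosen here for illustration: n is the number of matches, k is last season's win count, p-hat is the estimated per-match win probability, and X is the hypothetical win total next season.

```latex
\hat{p} = \frac{k}{n}, \qquad
P(X = k) = \binom{n}{k}\,\hat{p}^{\,k}\,(1-\hat{p})^{\,n-k}, \qquad
P(X \ge 28) = 1 - \sum_{j=0}^{27} \binom{n}{j}\,\hat{p}^{\,j}\,(1-\hat{p})^{\,n-j}
```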

Step 1: Fetching EPL Data Using an API

I used the football-data.org API to pull the standings. You’ll need a free API token; save it in a .env file like this:

API_TOKEN=your_football_data_token

Now fetch the standings:

import requests
import os
from dotenv import load_dotenv

load_dotenv()

def fetch_epl_standings():
    token = os.getenv("API_TOKEN")
    if not token:
        raise ValueError("API_TOKEN not found in env")
    
    uri = "https://api.football-data.org/v4/competitions/PL/standings?season=2024"
    headers = { 'X-Auth-Token': token }

    # A timeout keeps the script from hanging if the API is unreachable
    response = requests.get(uri, headers=headers, timeout=10)

    if response.status_code != 200:
        raise Exception(f"API request failed with status code {response.status_code}: {response.text}")

    data = response.json()
    return data["standings"][0]["table"]

standings = fetch_epl_standings()

Convert to DataFrame:

import pandas as pd

data_rows = []
for team in standings:
    data_rows.append({
        "Pos": team["position"],
        "Team": team["team"]["name"],
        "Matches": team["playedGames"],
        "Wins": team["won"],
        "Draws": team["draw"],
        "Losses": team["lost"],
        "Points": team["points"],
        "+/-": team["goalDifference"],
        "Goals": f'{team["goalsFor"]}:{team["goalsAgainst"]}'
    })

df = pd.DataFrame(data_rows)
df.to_csv('epl_standings.csv', index=False)
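
If you don't have a token yet, a hand-typed frame is enough to follow along. The sketch below is a stand-in, not the real API response: it keeps only the columns the probability steps need, uses win counts quoted in this post's sample output, and assumes a 38-match season. The `sample` and `WinRate` names are made up for illustration.

```python
import pandas as pd

# Hand-typed stand-in for the API result. Win counts come from the sample
# output in this post; 38 matches per team is an assumption.
sample = pd.DataFrame({
    "Team": ["Liverpool FC", "Arsenal FC", "Ipswich Town FC", "Southampton FC"],
    "Matches": [38, 38, 38, 38],
    "Wins": [25, 20, 4, 2],
})

# Estimated per-match win probability: wins divided by matches played.
sample["WinRate"] = sample["Wins"] / sample["Matches"]

print(sample.to_string(index=False))
```

The same column can be added to the real `df`, since it has the same `Matches` and `Wins` columns.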

Step 2: Binomial Probability of Exact Win Count

Now let's calculate the probability of each team repeating the exact number of wins they had last season.

import math

# Loop through each row and calculate binomial probability
for index, row in df.iterrows():
    team = row['Team']
    n = int(row['Matches'])  # total games
    k = int(row['Wins'])     # wins
    p = k / n                # estimated win probability

    try:
        binom_prob = math.comb(n, k) * (p**k) * ((1 - p)**(n - k))
    except OverflowError:
        binom_prob = 0.0

    print(f"{team}: P( {k} wins)  = {binom_prob:.6f}")

Sample Output:

Liverpool FC: P( 25 wins)  = 0.135388
Arsenal FC: P( 20 wins)  = 0.128761
Ipswich Town FC: P( 4 wins)  = 0.206486
Southampton FC: P( 2 wins)  = 0.278054
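
As a cross-check, the same probabilities can be computed with `scipy.stats.binom.pmf` instead of the hand-written formula. This is a minimal sketch assuming a 38-match season and the four win counts shown above; the `wins_last_season` and `repeat_prob` names are illustrative. It should agree with the sample output up to rounding.

```python
from scipy.stats import binom

n = 38  # matches in a season (assumed)

# Win counts taken from the sample output above.
wins_last_season = {
    "Liverpool FC": 25,
    "Arsenal FC": 20,
    "Ipswich Town FC": 4,
    "Southampton FC": 2,
}

# P(X = k) with the win probability estimated as k / n.
repeat_prob = {
    team: float(binom.pmf(k, n, k / n))
    for team, k in wins_last_season.items()
}

for team, prob in repeat_prob.items():
    print(f"{team}: P({wins_last_season[team]} wins) = {prob:.6f}")
```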

What These Results Tell Us

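  • Teams near the middle and top of the table, like Arsenal and Liverpool, have lower exact-repeat probabilities. The binomial variance n·p·(1 − p) is largest when the win rate is near 50%, so the probability is spread across more possible win totals.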

  • Lower-table teams tend to have higher repeat chances, but don't celebrate just yet...

Step 3: Probability of Title-Winning Season (≥ 28 Wins)

Next, we model the probability of each team reaching 28 or more wins, a common threshold to win the league.

We'll use the cumulative binomial distribution:

from scipy.stats import binom

def title_probability(wins, matches=38, threshold=28):
    p = wins / matches
    return 1 - binom.cdf(threshold - 1, matches, p)

for index, row in df.iterrows():
    team = row['Team']
    wins = int(row['Wins'])
    # Use each team's actual match count, as in Step 2
    prob = title_probability(wins, matches=int(row['Matches']), threshold=28)
    print(f"{team}: P(Wins ≥ 28) = {prob:.2%}")

Sample Output:

Team                    P(Wins ≥ 28)
Liverpool FC            19.78%
Manchester City FC      1.54%
Arsenal FC              0.66%
Chelsea FC              0.66%
Newcastle United        0.66%
Manchester United FC    0.00%
Interpretation

Liverpool is the most likely to hit 28+ wins, based on last season's win rate.

City, Arsenal, Chelsea and Newcastle trail far behind because they won fewer matches. The model only counts wins, so draws and losses look the same to it.

Man United? Their chance rounds to zero. Ouch.

United fans, this model says your 11-win season gives you a statistically negligible shot at the title. You might want to pray harder than you code.

⚠️ Limitations

Let’s be honest: binomial probability isn’t a crystal ball. Here's why:

  1. It ignores real-world dynamics: transfers, injuries, managerial changes.
  2. It assumes matches are independent and identically distributed, which football matches are not.
  3. It is based on a single season, which is too small a sample for deep insight.
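
To get a feel for the third point, here is a rough sketch of how uncertain a win rate estimated from one season is. It uses the simple normal-approximation (Wald) interval, which is only a ballpark tool, and it assumes a 38-match season. The `win_rate_interval` helper is hypothetical.

```python
import math

def win_rate_interval(wins, matches=38, z=1.96):
    """Approximate 95% interval for the win rate (normal approximation)."""
    p = wins / matches
    se = math.sqrt(p * (1 - p) / matches)  # standard error of the estimate
    # Clip to [0, 1], since a probability cannot leave that range.
    return max(0.0, p - z * se), min(1.0, p + z * se)

low, high = win_rate_interval(25)
print(f"25 wins in 38: win rate plausibly between {low:.2f} and {high:.2f}")
```

An interval that wide means the title probabilities above are sensitive to the single number fed into them.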

But hey, it’s fun and statistically grounded!

Want to Take This Further?

Here’s how you can level up the model:

  • Use Poisson regression to simulate goals per match.

  • Integrate Elo ratings or other power metrics.

  • Run full Monte Carlo simulations of future fixtures.

  • Track the model live across the season for dynamic probabilities.
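
As a first step toward the Monte Carlo idea, here is a deliberately simple sketch. It is not a fixture-by-fixture simulation: it just draws whole-season win totals from the same binomial model and counts how often they reach 28, so the estimate should land close to the exact tail probability from Step 3. The function name, seed, and number of simulated seasons are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_title_rate(wins, matches=38, threshold=28, n_seasons=200_000):
    """Share of simulated seasons in which the team reaches the threshold."""
    p = wins / matches
    # Each draw is one simulated season's win total under the binomial model.
    simulated_wins = rng.binomial(matches, p, size=n_seasons)
    return float(np.mean(simulated_wins >= threshold))

estimate = simulate_title_rate(25)
print(f"Simulated P(Wins >= 28) for a 25-win team: {estimate:.4f}")
```

A fuller version would simulate each fixture with opponent-specific probabilities and rank teams on points rather than wins.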

Final Thoughts

While this model won’t help you win your fantasy league, it does give a math-driven glimpse into who’s statistically positioned to succeed. Liverpool fans? You have reason to dream. Southampton? Maybe next year...

Football is unpredictable, and that's what makes it beautiful. But every now and then, it's fun to let the math have a shot at calling the game. ⚽
