Football Odds Prediction

import pandas as pd
from football_odds.models import DoublePoisson, save, load

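# Required columns: fixture_date, home_team_name, away_team_name, goals_home, goals_away
df = pd.read_csv('file/containing/matches/played.csv')

# Fit the double Poisson model on the played matches
dp = DoublePoisson()
dp.fit(df)

# Persist the fitted model to disk
model_pkl = 'path/to/model.pkl'
save(dp, model_pkl)

# Reload the fitted model from the saved file
dp_copy = load(model_pkl)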

The MarketOdds class provides the probabilities for the markets of a match, given the scores of the home and away teams. The snippet below shows how it can be used.

from football_odds.odds_compiler import MarketOdds

mo = MarketOdds(
    home_score=(1.2, 0.7),
    away_score=(1.1, 0.9),
    home_adv=1.2,
)

print(mo)

Output

1x2: (0.48992558472222425, 0.2883851893358551, 0.22168922594192064)

Half Time 1x2: (0.3620983599084602, 0.45043119817708993, 0.1874704419144499)

Correct Score 0-1: 0.09755248246238979
Correct Score 0-2: 0.03755770574802006
Correct Score 0-3: 0.009639811141991816
Correct Score 1-0: 0.1641922302224119
Correct Score 1-2: 0.048674786649434004
Correct Score 1-3: 0.012493195240021394
Correct Score 2-0: 0.10639656518412291
Correct Score 2-1: 0.08192535519177464
Correct Score 2-3: 0.008095590515533864
Correct Score 3-0: 0.04596331615954109
Correct Score 3-1: 0.035391753442846646
Correct Score 3-2: 0.013625825075495958

Over/Under 1.5: (0.6115637516497572, 0.3884362483502428)
Over/Under 2.5: (0.3411814634463568, 0.6588185365536432)
Over/Under 3.5: (0.15497819430361537, 0.8450218056963846)
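
The figures above can be reproduced, to within rounding, by an independent double-Poisson calculation. The sketch below is not the library's implementation. It assumes that each score tuple is read as (attack, defence), that the home goal rate is home attack × away defence × home advantage, and that the away goal rate is away attack × home defence. The helper names and the max_goals truncation are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return exp(-lam) * lam ** k / factorial(k)


def score_grid(home_score, away_score, home_adv, max_goals=10):
    """Joint probabilities of (home goals, away goals), assuming independence.

    Assumption: each score is an (attack, defence) tuple.
    """
    home_rate = home_score[0] * away_score[1] * home_adv
    away_rate = away_score[0] * home_score[1]
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def match_outcome(grid):
    """Return (home win, draw, away win) probabilities."""
    home = sum(p for (h, a), p in grid.items() if h > a)
    draw = sum(p for (h, a), p in grid.items() if h == a)
    away = sum(p for (h, a), p in grid.items() if h < a)
    return home, draw, away


def over_under(grid, line):
    """Return (over, under) probabilities for a total-goals line such as 2.5."""
    over = sum(p for (h, a), p in grid.items() if h + a > line)
    under = sum(p for (h, a), p in grid.items() if h + a < line)
    return over, under


grid = score_grid(home_score=(1.2, 0.7), away_score=(1.1, 0.9), home_adv=1.2)
print("1x2:", match_outcome(grid))
print("Correct Score 1-0:", grid[(1, 0)])
print("Over/Under 2.5:", over_under(grid, 2.5))
```

Truncating the grid at 10 goals per side drops only a negligible amount of probability mass for rates of this size.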

API

The API exposes a single GET endpoint of the form http://127.0.0.1:8000/MATCH_ODDS/{home_team}/{away_team}

For example: http://127.0.0.1:8000/MATCH_ODDS/Arsenal/Leicester

The endpoint calls DoublePoisson.test(home_team, away_team), which returns a MarketOdds object. This class compiles all relevant probabilities from the attacking/defensive scores of the home and away teams, along with the home advantage. For this particular endpoint, only the match outcome probabilities are relevant.
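
A request to this endpoint might look like the sketch below. It uses only the standard library. The helper names are hypothetical, the base URL is the one quoted above, and the response is returned as raw text because the payload format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address quoted in this README


def match_odds_url(home_team, away_team, base_url=BASE_URL):
    """Build the MATCH_ODDS URL, percent-encoding team names that contain spaces."""
    return f"{base_url}/MATCH_ODDS/{quote(home_team)}/{quote(away_team)}"


def fetch_match_odds(home_team, away_team, base_url=BASE_URL, timeout=5):
    """GET the endpoint and return the raw response body as text.

    Requires the API server to be running locally.
    """
    url = match_odds_url(home_team, away_team, base_url)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(match_odds_url("Arsenal", "Leicester"))
# With the server running: print(fetch_match_odds("Arsenal", "Leicester"))
```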

Data

We obtain historical football results of the England Premier League, from the season starting in 2010 to November 2023.

The data was obtained from API-FOOTBALL and is stored in a local QuestDB instance running on Docker. The relevant data in our case is:

Home team name
Away team name
Home goals scored
Away goals scored
Fixture Date

Methodology

Cleaning

Under analysis/Pre-Match Analysis.ipynb, we explore the structure and nature of the data. We first notice that some games erroneously have NULL values under the home/away goals. These are filtered out when querying.
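
The filter is applied in the database query, but the same step can be sketched in pandas for data loaded from a CSV file. The column names follow the required columns listed earlier; the rows are made up for illustration.

```python
import pandas as pd

# Made-up fixtures; the second row has missing goal values
matches = pd.DataFrame({
    "home_team_name": ["Arsenal", "Leicester", "Everton"],
    "away_team_name": ["Leicester", "Everton", "Arsenal"],
    "goals_home": [2, None, 1],
    "goals_away": [0, None, 1],
})

# Drop fixtures where either goal count is missing
clean = matches.dropna(subset=["goals_home", "goals_away"])
print(clean)
```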

Poisson-Distributed goals

The main assumption of the Dixon & Coles model is that goals are Poisson-distributed. Visual inspection of the histogram supports this assumption. Running the Pearson chi-square goodness-of-fit test on the distribution of the home/away goals against the Poisson, we get high p-values, so we fail to reject the Poisson hypothesis and consider modelling with the Poisson distribution adequate.
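
A goodness-of-fit check of this kind can be sketched as follows. This is an illustration using SciPy rather than the notebook's code: the function name, the binning (counts of 5 or more goals pooled together) and the synthetic sample are assumptions.

```python
import numpy as np
from scipy import stats


def poisson_gof(goals, max_bin=5):
    """Pearson chi-square goodness-of-fit test of goal counts against a Poisson.

    Counts of max_bin or more goals are pooled into a single tail bin.
    Returns (statistic, p_value).
    """
    goals = np.asarray(goals)
    n = goals.size
    lam = goals.mean()  # maximum-likelihood estimate of the Poisson rate

    # Observed frequencies for 0, 1, ..., max_bin - 1 goals and the pooled tail
    observed = np.array(
        [(goals == k).sum() for k in range(max_bin)] + [(goals >= max_bin).sum()]
    )
    # Expected frequencies under Poisson(lam); sf(k) is P(X > k)
    probs = np.append(
        stats.poisson.pmf(np.arange(max_bin), lam),
        stats.poisson.sf(max_bin - 1, lam),
    )
    expected = n * probs

    # ddof=1 because one parameter (lam) was estimated from the data
    return stats.chisquare(observed, expected, ddof=1)


# Synthetic goals standing in for the real home-goal column
rng = np.random.default_rng(0)
statistic, p_value = poisson_gof(rng.poisson(1.5, size=2000))
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")
```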

Independent home/away goals

Another feature of the Dixon & Coles model is a dependence between low-scoring results. We instead test for independence between the home and away goals, again using the chi-square test for independence. This resulted in a high p-value, meaning that we found no evidence against independence. This simplifies our model considerably.
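
The independence test can be sketched with SciPy's contingency-table routine. The function name, the cap at 3 goals and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def goals_independence_test(home_goals, away_goals, max_bin=3):
    """Chi-square test of independence between home and away goal counts.

    Goals are capped at max_bin so that sparse high-scoring cells are pooled.
    Returns (statistic, p_value, degrees_of_freedom).
    """
    home = np.minimum(np.asarray(home_goals), max_bin)
    away = np.minimum(np.asarray(away_goals), max_bin)

    # Contingency table: rows are home goals, columns are away goals
    table = np.zeros((max_bin + 1, max_bin + 1), dtype=int)
    np.add.at(table, (home, away), 1)

    statistic, p_value, dof, _expected = stats.chi2_contingency(table)
    return statistic, p_value, dof


# Synthetic, independently drawn goals standing in for the real data
rng = np.random.default_rng(0)
statistic, p_value, dof = goals_independence_test(
    rng.poisson(1.5, size=2000), rng.poisson(1.1, size=2000)
)
print(f"chi2 = {statistic:.2f}, dof = {dof}, p = {p_value:.3f}")
```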

Home Advantage

The summary statistics indicate that the home team scores more goals on average than the away team, suggesting that the home team has an inherent advantage. The Mann-Whitney U test confirms this suspicion, with a p-value < 0.05 in favour of the alternative hypothesis that the average number of home goals is greater than the average number of away goals. Under the Poisson-distributed-goals assumption, we can also use the E-test, which gives the same result.
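
A one-sided Mann-Whitney U test of this kind can be sketched with SciPy. The goal rates of 1.5 and 1.1 are illustrative values for synthetic data, not estimates from the dataset.

```python
import numpy as np
from scipy import stats

# Synthetic goals standing in for the real home and away goal columns
rng = np.random.default_rng(0)
home_goals = rng.poisson(1.5, size=2000)
away_goals = rng.poisson(1.1, size=2000)

# Alternative hypothesis: home goals tend to be greater than away goals
result = stats.mannwhitneyu(home_goals, away_goals, alternative="greater")
print(f"U = {result.statistic:.0f}, p = {result.pvalue:.3g}")
```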

Evaluation

Under analysis/Evaluation.ipynb, we use one year of data to train the model, from 2021-01-01 until 2021-12-31.

Since the model output gives the probability of a HOME/DRAW/AWAY win, we can consider this problem as a multiclass classification problem. We consider the F1, Accuracy, Recall and Precision scores. We also calculate the Root Mean Squared Error (RMSE) between the predicted outcome probability and the binary outcome of the predicted event. We plot the ROC curve and calculate the AUC.

In summary, the following metrics were used:

F1
Accuracy
Precision
Recall
RMSE
AUC
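
As an illustration of two of these metrics, the sketch below computes the accuracy and the RMSE between the probability of the predicted outcome and whether that outcome occurred. The function names and the three example matches are hypothetical, and this reading of the RMSE definition is an assumption.

```python
from math import sqrt

OUTCOMES = ("HOME", "DRAW", "AWAY")


def predict(probabilities):
    """Pick the most likely outcome from a (home, draw, away) probability tuple."""
    best = max(range(len(OUTCOMES)), key=lambda i: probabilities[i])
    return OUTCOMES[best]


def accuracy(all_probabilities, actual):
    """Share of matches where the most likely outcome was the actual result."""
    hits = sum(predict(p) == a for p, a in zip(all_probabilities, actual))
    return hits / len(actual)


def rmse_of_predicted_event(all_probabilities, actual):
    """RMSE between the predicted outcome's probability and its 0/1 occurrence."""
    squared_errors = [
        (max(p) - (1.0 if predict(p) == a else 0.0)) ** 2
        for p, a in zip(all_probabilities, actual)
    ]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up model outputs and results for three matches
probs = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5), (0.4, 0.35, 0.25)]
actual = ["HOME", "AWAY", "DRAW"]
print("accuracy:", accuracy(probs, actual))
print("rmse:", rmse_of_predicted_event(probs, actual))
```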

In a failed attempt to measure concept drift, we split the test data into 31-day periods and calculate all metrics for each period.
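
Splitting into 31-day periods can be sketched with pandas. The fixture_date column comes from the required columns listed earlier. The predicted and actual columns, the function name and the toy data are assumptions, and only accuracy is computed here.

```python
import pandas as pd


def accuracy_by_period(results, days=31):
    """Accuracy of predicted outcomes within consecutive windows of `days` days.

    Assumes hypothetical 'predicted' and 'actual' columns next to 'fixture_date'.
    """
    hit = (results["predicted"] == results["actual"]).astype(float)
    periods = pd.Grouper(key="fixture_date", freq=f"{days}D")
    return results.assign(accuracy=hit).groupby(periods)["accuracy"].mean()


# Toy data: one match per day for 62 days, always predicting HOME
results = pd.DataFrame({
    "fixture_date": pd.date_range("2022-01-01", periods=62, freq="D"),
    "predicted": ["HOME"] * 62,
    "actual": ["HOME"] * 31 + ["AWAY"] * 31,
})
by_period = accuracy_by_period(results)
print(by_period)
```

A sustained fall in a metric from one window to the next would be one possible sign of drift.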

Results

NOTE: Due to the nature of the Poisson model, the probability of a DRAW is usually lower than that of a win, so the model almost never predicts a DRAW as the most likely outcome. This skews the overall weighted F1 score and accuracy negatively. Below is a summary of the results:

Overall accuracy of 52% (keeping in mind that this is a 3-class classification problem)
Weighted F1-Score of 61% for HOME guesses and 51% for AWAY guesses
AUC of 62%
Concept drift not obvious from the plots, and further analysis is required

Improvements

We could apply grid search over the zeta parameter to optimise the decay of old games and choose the best model.
Better methods to evaluate concept drift need to be researched and implemented.
Increasing the pool of teams and thus sample size
Better backtesting framework: simulate betting on the Betfair exchange by backing and laying odds based on the discrepancy between the model's odds and the market odds, with stakes sized using the Kelly criterion.
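
For the last point, the Kelly criterion for a back bet can be sketched as below. The function name is hypothetical, the sketch covers backing only (not laying), and it ignores exchange commission.

```python
def kelly_fraction(probability, decimal_odds):
    """Fraction of the bankroll to stake on a back bet, per the Kelly criterion.

    probability: model probability that the outcome occurs
    decimal_odds: market decimal odds (stake included), must exceed 1
    Returns 0.0 when the bet has no positive edge.
    """
    if decimal_odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    net_odds = decimal_odds - 1  # profit per unit staked if the bet wins
    fraction = (probability * net_odds - (1 - probability)) / net_odds
    return max(fraction, 0.0)


# Model says 55% while the market offers even money (decimal odds 2.0)
print(kelly_fraction(0.55, 2.0))
```

A bet is placed only when the model's probability exceeds the probability implied by the market odds. Otherwise the suggested stake is zero.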
