Predicting EPL Standings Using Poisson Distribution
This project investigates whether goal scoring in the Premier League can be modelled as a Poisson process. A simple Poisson model is a good starting point for modelling the number of goals scored in a 90-minute match. Later, we will use it to simulate an entire season based on data from the 2005-2020 seasons.
The data for this project is scraped from football-data.co.uk, which contains information for each game since the 1993/94 season.
Here is a snapshot of the data
epl.tail(3)
| | Div | season | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | ... | AvgC<2.5 | AHCh | B365CAHH | B365CAHA | PCAHH | PCAHA | MaxCAHH | MaxCAHA | AvgCAHH | AvgCAHA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5698 | E0 | 1920 | 26/07/2020 | Newcastle | Liverpool | 1.0 | 3.0 | A | 1.0 | 1.0 | ... | 2.40 | 1.0 | 1.94 | 1.96 | 1.95 | 1.97 | 2.03 | 2.00 | 1.95 | 1.92 |
5699 | E0 | 1920 | 26/07/2020 | Southampton | Sheffield United | 3.0 | 1.0 | H | 0.0 | 1.0 | ... | 2.01 | -0.5 | 1.97 | 1.93 | 2.00 | 1.93 | 2.03 | 1.96 | 1.98 | 1.89 |
5700 | E0 | 1920 | 26/07/2020 | West Ham | Aston Villa | 1.0 | 1.0 | D | 0.0 | 0.0 | ... | 2.03 | 0.0 | 1.93 | 1.97 | 1.95 | 1.98 | 1.99 | 2.00 | 1.93 | 1.95 |
3 rows × 140 columns
For this project we only need the six columns from season to FTAG: the season, the date, the two teams and the full-time goals. Therefore, let’s get rid of the unwanted columns and rename the rest so that they are easier to index.
# removing unnecessary cols
epl = epl.iloc[:, 1:7]
# rename cols
epl.columns = [i.lower() for i in epl]
epl.rename(columns={'fthg':'homegoals', 'ftag':'awaygoals'}, inplace=True)
Let’s have a look at the average number of home and away goals scored in each game.
# avg home vs away goals (numeric columns only; [1:] skips the season column)
epl.mean(numeric_only=True)[1:]
homegoals 1.537544
awaygoals 1.147544
dtype: float64
This result should be intuitive to anyone who follows sports such as soccer, American football or basketball: the home team is considered to have an advantage, mainly because of the players’ familiarity and comfort with the environment and the presence of their fans.
Let’s compare the actual number of goals scored in a game with the number predicted by the Poisson process.
The Poisson model fits the distribution very well, as shown in the figures above. Hopefully, we’ll be able to adapt this method to simulate individual matches throughout the course of a season.
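A comparison of this kind can be sketched in a few lines. The helper names `poisson_pmf` and `compare_to_poisson` and the toy goal list below are assumptions made for illustration, not the code behind the figures; in the post the input would be the `homegoals` or `awaygoals` column.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def compare_to_poisson(goals, max_goals=8):
    """Return (k, observed share, Poisson share) for k = 0..max_goals."""
    lam = sum(goals) / len(goals)  # the Poisson mean is estimated by the sample mean
    counts = Counter(goals)
    return [
        (k, counts.get(k, 0) / len(goals), poisson_pmf(k, lam))
        for k in range(max_goals + 1)
    ]

# Toy input only; replace with list(epl['homegoals']) to check the real data.
toy_goals = [0, 1, 1, 2, 3, 1, 0, 2, 1, 4]
for k, observed, expected in compare_to_poisson(toy_goals):
    print(f"{k} goals: observed {observed:.3f}, Poisson {expected:.3f}")
```

If the two columns stay close for every goal count, the Poisson assumption is a reasonable first approximation.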
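We will now fit Poisson regression models and use the regression coefficients as each team’s home and away scoring rates. By default, statsmodels uses the team that comes first alphabetically as the reference group, so we replace the intercept term with 0 in the formula to give every team its own coefficient. The link function is log, so a team’s expected goals per game is the exponential of its coefficient.
import statsmodels.api as sm
import statsmodels.formula.api as smf

home = smf.glm(formula="homegoals ~ 0 + hometeam", data=epl,
               family=sm.families.Poisson()).fit()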
home.summary()
Dep. Variable: | homegoals | No. Observations: | 5700 |
---|---|---|---|
Model: | GLM | Df Residuals: | 5661 |
Model Family: | Poisson | Df Model: | 38 |
Link Function: | log | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -8746.5 |
Date: | Thu, 25 Nov 2021 | Deviance: | 6561.9 |
Time: | 10:39:35 | Pearson chi2: | 5.72e+03 |
No. Iterations: | 5 | Pseudo R-squ. (CS): | 0.09657 |
Covariance Type: | nonrobust |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
hometeam[Arsenal] | 0.7527 | 0.041 | 18.515 | 0.000 | 0.673 | 0.832 |
hometeam[Aston Villa] | 0.1874 | 0.060 | 3.108 | 0.002 | 0.069 | 0.306 |
hometeam[Birmingham] | 0.1352 | 0.107 | 1.261 | 0.207 | -0.075 | 0.345 |
hometeam[Blackburn] | 0.3354 | 0.073 | 4.574 | 0.000 | 0.192 | 0.479 |
hometeam[Blackpool] | 0.4568 | 0.183 | 2.502 | 0.012 | 0.099 | 0.815 |
hometeam[Bolton] | 0.3137 | 0.074 | 4.231 | 0.000 | 0.168 | 0.459 |
hometeam[Bournemouth] | 0.3588 | 0.086 | 4.184 | 0.000 | 0.191 | 0.527 |
hometeam[Brighton] | 0.1001 | 0.126 | 0.794 | 0.427 | -0.147 | 0.347 |
hometeam[Burnley] | 0.1236 | 0.088 | 1.404 | 0.160 | -0.049 | 0.296 |
hometeam[Cardiff] | 0.0760 | 0.156 | 0.487 | 0.627 | -0.230 | 0.382 |
hometeam[Charlton] | 0.0760 | 0.156 | 0.487 | 0.627 | -0.230 | 0.382 |
hometeam[Chelsea] | 0.7560 | 0.041 | 18.627 | 0.000 | 0.676 | 0.836 |
hometeam[Crystal Palace] | 0.0864 | 0.083 | 1.040 | 0.298 | -0.076 | 0.249 |
hometeam[Derby] | -0.4595 | 0.289 | -1.592 | 0.111 | -1.025 | 0.106 |
hometeam[Everton] | 0.5024 | 0.046 | 10.903 | 0.000 | 0.412 | 0.593 |
hometeam[Fulham] | 0.3365 | 0.061 | 5.488 | 0.000 | 0.216 | 0.457 |
hometeam[Huddersfield] | -0.3795 | 0.196 | -1.935 | 0.053 | -0.764 | 0.005 |
hometeam[Hull] | 0.1190 | 0.097 | 1.230 | 0.219 | -0.071 | 0.308 |
hometeam[Leicester] | 0.4456 | 0.075 | 5.945 | 0.000 | 0.299 | 0.592 |
hometeam[Liverpool] | 0.7527 | 0.041 | 18.515 | 0.000 | 0.673 | 0.832 |
hometeam[Man City] | 0.8105 | 0.039 | 20.521 | 0.000 | 0.733 | 0.888 |
hometeam[Man United] | 0.7527 | 0.041 | 18.515 | 0.000 | 0.673 | 0.832 |
hometeam[Middlesbrough] | 0.2336 | 0.091 | 2.559 | 0.010 | 0.055 | 0.413 |
hometeam[Newcastle] | 0.3196 | 0.054 | 5.892 | 0.000 | 0.213 | 0.426 |
hometeam[Norwich] | 0.1911 | 0.093 | 2.049 | 0.040 | 0.008 | 0.374 |
hometeam[Portsmouth] | 0.2252 | 0.092 | 2.457 | 0.014 | 0.046 | 0.405 |
hometeam[QPR] | 0.0513 | 0.129 | 0.397 | 0.691 | -0.202 | 0.304 |
hometeam[Reading] | 0.2196 | 0.119 | 1.851 | 0.064 | -0.013 | 0.452 |
hometeam[Sheffield United] | 0.2336 | 0.144 | 1.619 | 0.106 | -0.049 | 0.517 |
hometeam[Southampton] | 0.3652 | 0.068 | 5.404 | 0.000 | 0.233 | 0.498 |
hometeam[Stoke] | 0.2664 | 0.064 | 4.195 | 0.000 | 0.142 | 0.391 |
hometeam[Sunderland] | 0.1173 | 0.065 | 1.797 | 0.072 | -0.011 | 0.245 |
hometeam[Swansea] | 0.2970 | 0.075 | 3.974 | 0.000 | 0.151 | 0.444 |
hometeam[Tottenham] | 0.6071 | 0.044 | 13.884 | 0.000 | 0.521 | 0.693 |
hometeam[Watford] | 0.1983 | 0.085 | 2.338 | 0.019 | 0.032 | 0.365 |
hometeam[West Brom] | 0.2583 | 0.064 | 4.051 | 0.000 | 0.133 | 0.383 |
hometeam[West Ham] | 0.3514 | 0.051 | 6.832 | 0.000 | 0.251 | 0.452 |
hometeam[Wigan] | 0.1060 | 0.077 | 1.378 | 0.168 | -0.045 | 0.257 |
hometeam[Wolves] | 0.2083 | 0.092 | 2.253 | 0.024 | 0.027 | 0.389 |
Similarly for away goals
away = smf.glm(formula="awaygoals ~ 0 + awayteam", data=epl,
family=sm.families.Poisson()).fit()
away.summary()
Dep. Variable: | awaygoals | No. Observations: | 5700 |
---|---|---|---|
Model: | GLM | Df Residuals: | 5661 |
Model Family: | Poisson | Df Model: | 38 |
Link Function: | log | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -7854.0 |
Date: | Thu, 25 Nov 2021 | Deviance: | 6783.8 |
Time: | 10:39:35 | Pearson chi2: | 5.94e+03 |
No. Iterations: | 5 | Pseudo R-squ. (CS): | 0.08115 |
Covariance Type: | nonrobust |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
awayteam[Arsenal] | 0.4634 | 0.047 | 9.863 | 0.000 | 0.371 | 0.555 |
awayteam[Aston Villa] | 0.1197 | 0.062 | 1.919 | 0.055 | -0.003 | 0.242 |
awayteam[Birmingham] | -0.2036 | 0.127 | -1.603 | 0.109 | -0.453 | 0.045 |
awayteam[Blackburn] | 0.0655 | 0.084 | 0.780 | 0.435 | -0.099 | 0.230 |
awayteam[Blackpool] | 0.2744 | 0.200 | 1.372 | 0.170 | -0.118 | 0.666 |
awayteam[Bolton] | -0.0152 | 0.087 | -0.173 | 0.862 | -0.186 | 0.156 |
awayteam[Bournemouth] | 0.1001 | 0.098 | 1.026 | 0.305 | -0.091 | 0.291 |
awayteam[Brighton] | -0.2364 | 0.149 | -1.586 | 0.113 | -0.529 | 0.056 |
awayteam[Burnley] | -0.0918 | 0.098 | -0.936 | 0.349 | -0.284 | 0.100 |
awayteam[Cardiff] | -0.4187 | 0.200 | -2.094 | 0.036 | -0.811 | -0.027 |
awayteam[Charlton] | -0.1112 | 0.171 | -0.649 | 0.517 | -0.447 | 0.225 |
awayteam[Chelsea] | 0.4700 | 0.047 | 10.037 | 0.000 | 0.378 | 0.562 |
awayteam[Crystal Palace] | 0.1269 | 0.081 | 1.560 | 0.119 | -0.033 | 0.286 |
awayteam[Derby] | -0.8650 | 0.354 | -2.447 | 0.014 | -1.558 | -0.172 |
awayteam[Everton] | 0.0937 | 0.057 | 1.658 | 0.097 | -0.017 | 0.204 |
awayteam[Fulham] | -0.1908 | 0.080 | -2.390 | 0.017 | -0.347 | -0.034 |
awayteam[Huddersfield] | -0.4595 | 0.204 | -2.251 | 0.024 | -0.860 | -0.059 |
awayteam[Hull] | -0.2498 | 0.116 | -2.149 | 0.032 | -0.478 | -0.022 |
awayteam[Leicester] | 0.3264 | 0.080 | 4.103 | 0.000 | 0.170 | 0.482 |
awayteam[Liverpool] | 0.4411 | 0.048 | 9.284 | 0.000 | 0.348 | 0.534 |
awayteam[Man City] | 0.4895 | 0.046 | 10.557 | 0.000 | 0.399 | 0.580 |
awayteam[Man United] | 0.4895 | 0.046 | 10.557 | 0.000 | 0.399 | 0.580 |
awayteam[Middlesbrough] | -0.3054 | 0.120 | -2.555 | 0.011 | -0.540 | -0.071 |
awayteam[Newcastle] | -0.0583 | 0.066 | -0.891 | 0.373 | -0.187 | 0.070 |
awayteam[Norwich] | -0.2912 | 0.119 | -2.454 | 0.014 | -0.524 | -0.059 |
awayteam[Portsmouth] | -0.1350 | 0.110 | -1.230 | 0.219 | -0.350 | 0.080 |
awayteam[QPR] | -0.0357 | 0.135 | -0.265 | 0.791 | -0.300 | 0.229 |
awayteam[Reading] | 0.1313 | 0.124 | 1.059 | 0.290 | -0.112 | 0.374 |
awayteam[Sheffield United] | -0.5021 | 0.209 | -2.408 | 0.016 | -0.911 | -0.093 |
awayteam[Southampton] | 0.1178 | 0.076 | 1.540 | 0.124 | -0.032 | 0.268 |
awayteam[Stoke] | -0.2364 | 0.082 | -2.895 | 0.004 | -0.396 | -0.076 |
awayteam[Sunderland] | -0.1006 | 0.073 | -1.383 | 0.167 | -0.243 | 0.042 |
awayteam[Swansea] | -0.0462 | 0.089 | -0.520 | 0.603 | -0.220 | 0.128 |
awayteam[Tottenham] | 0.3925 | 0.049 | 8.063 | 0.000 | 0.297 | 0.488 |
awayteam[Watford] | -0.1112 | 0.099 | -1.123 | 0.261 | -0.305 | 0.083 |
awayteam[West Brom] | -0.1472 | 0.078 | -1.885 | 0.059 | -0.300 | 0.006 |
awayteam[West Ham] | 0.0260 | 0.061 | 0.429 | 0.668 | -0.093 | 0.145 |
awayteam[Wigan] | -0.0334 | 0.082 | -0.406 | 0.685 | -0.195 | 0.128 |
awayteam[Wolves] | 0.0412 | 0.101 | 0.410 | 0.682 | -0.156 | 0.238 |
Now that we have the home and away scoring rates for each team let’s prepare the data for the simulation.
We first combine the two dataframes so that every club plays every other club twice (home and away). A full 20-team season has 380 matchups; the table below covers 19 teams and therefore has 19 × 18 = 342 rows.
Since we used data from the 2005-2020 seasons, some of the teams have since been relegated and are no longer in the EPL. It makes sense to exclude these clubs and include only those that have qualified for the next season.
| | team_home | rate_home | team_away | rate_away |
---|---|---|---|---|
0 | Arsenal | 2.122724 | Aston Villa | 1.127159 |
1 | Arsenal | 2.122724 | Brighton | 0.789465 |
2 | Arsenal | 2.122724 | Burnley | 0.912288 |
3 | Arsenal | 2.122724 | Chelsea | 1.599994 |
4 | Arsenal | 2.122724 | Crystal Palace | 1.135303 |
... | ... | ... | ... | ... |
337 | Wolves | 1.231583 | Sheffield United | 0.605258 |
338 | Wolves | 1.231583 | Southampton | 1.125019 |
339 | Wolves | 1.231583 | Tottenham | 1.480678 |
340 | Wolves | 1.231583 | West Brom | 0.863121 |
341 | Wolves | 1.231583 | West Ham | 1.026341 |
342 rows × 4 columns
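The step from regression coefficients to this fixture table can be sketched as follows. This is a minimal illustration under assumptions: the helper `build_fixtures` is hypothetical, and only three teams are included, with coefficients copied from the summaries above. The rates are `exp(coef)` because the models use a log link.

```python
import itertools
import math
import pandas as pd

# Coefficients copied from the regression summaries (log scale).
home_coef = {"Arsenal": 0.7527, "Aston Villa": 0.1874, "Brighton": 0.1001}
away_coef = {"Arsenal": 0.4634, "Aston Villa": 0.1197, "Brighton": -0.2364}

def build_fixtures(home_coef, away_coef):
    """One row per ordered (home, away) pair, with exp(coef) as the scoring rate."""
    teams = sorted(home_coef)
    rows = [
        {
            "team_home": h,
            "rate_home": math.exp(home_coef[h]),
            "team_away": a,
            "rate_away": math.exp(away_coef[a]),
        }
        for h, a in itertools.permutations(teams, 2)
    ]
    return pd.DataFrame(rows)

fixtures = build_fixtures(home_coef, away_coef)
print(fixtures)
print(len(fixtures))  # n * (n - 1) rows: 6 for three teams, 342 for nineteen
```

Using ordered pairs rather than combinations is what makes each club meet every other club both at home and away.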
We would like to simulate the results 10,000 times, so we duplicate the above table 10,000 times and create a new table with 3,420,000 (342 × 10,000) rows, where each block of 342 rows is the result of one simulation. We feed the scoring rates calculated earlier into the Poisson distribution to generate the number of goals scored by the home and away team in every row of the table.
In addition, we calculate the points for every match outcome from the goals scored: a side gets 3 points if it scores more than its opponent, 1 point for a draw, and 0 points if the opponent scores more.
Here’s a look at the first few rows of the table. The columns sim_home and sim_away hold our simulated match outcome for each matchup: the number of goals for each club, randomly generated from its scoring rate. The home_pts and away_pts columns show the points awarded for the match. For example, in the second row, Arsenal vs Brighton, Arsenal beat Brighton 2-1 on their home pitch and are awarded 3 points for the win, whereas Brighton get no points for their loss.
sim_xG.head(3)
| | team_home | rate_home | team_away | rate_away | sim_home | sim_away | home_pts | away_pts |
---|---|---|---|---|---|---|---|---|
0 | Arsenal | 2.122724 | Aston Villa | 1.127159 | 1 | 0 | 3 | 0 |
1 | Arsenal | 2.122724 | Brighton | 0.789465 | 2 | 1 | 3 | 0 |
2 | Arsenal | 2.122724 | Burnley | 0.912288 | 4 | 1 | 3 | 0 |
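The simulation step described above could look roughly like this. It is a sketch under assumptions, not the original code: the function name `simulate_matches`, the explicit `sim` label column and the fixed seed are choices made here, and only two fixtures and 1,000 runs are used to keep the example small.

```python
import numpy as np
import pandas as pd

def simulate_matches(fixtures, n_sims, seed=0):
    """Repeat the fixture table n_sims times and draw Poisson goals for each row."""
    rng = np.random.default_rng(seed)
    sims = pd.concat([fixtures] * n_sims, ignore_index=True)
    # Label each block of len(fixtures) rows with its simulation number (1-based).
    sims["sim"] = np.repeat(np.arange(1, n_sims + 1), len(fixtures))
    home = rng.poisson(sims["rate_home"].to_numpy())
    away = rng.poisson(sims["rate_away"].to_numpy())
    sims["sim_home"] = home
    sims["sim_away"] = away
    # 3 points for a win, 1 for a draw, 0 for a loss.
    sims["home_pts"] = np.where(home > away, 3, np.where(home == away, 1, 0))
    sims["away_pts"] = np.where(away > home, 3, np.where(home == away, 1, 0))
    return sims

# Two fixtures with rates taken from the table above.
fixtures = pd.DataFrame({
    "team_home": ["Arsenal", "Arsenal"],
    "rate_home": [2.122724, 2.122724],
    "team_away": ["Aston Villa", "Brighton"],
    "rate_away": [1.127159, 0.789465],
})
sim_xG = simulate_matches(fixtures, n_sims=1000)
print(sim_xG.head(3))
```

Drawing all goals in one vectorized call keeps the run time manageable even at 3,420,000 rows.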
Let’s have a look at one of the simulations:
sim_season(7)
| | team | final_pts | gd | sim | rank |
---|---|---|---|---|---|
0 | Man United | 67 | 22 | 7 | 1 |
1 | Arsenal | 67 | 18 | 7 | 2 |
2 | Chelsea | 67 | 16 | 7 | 3 |
3 | Leicester | 61 | 14 | 7 | 4 |
4 | Man City | 59 | 29 | 7 | 5 |
5 | Liverpool | 59 | 21 | 7 | 6 |
6 | Tottenham | 53 | 8 | 7 | 7 |
7 | Newcastle | 50 | -4 | 7 | 8 |
8 | Southampton | 46 | 7 | 7 | 9 |
9 | Fulham | 46 | -7 | 7 | 10 |
10 | Crystal Palace | 45 | -19 | 7 | 11 |
11 | West Brom | 44 | -9 | 7 | 12 |
12 | Everton | 42 | 10 | 7 | 13 |
13 | Wolves | 40 | -14 | 7 | 14 |
14 | Brighton | 40 | -18 | 7 | 15 |
15 | Burnley | 39 | -15 | 7 | 16 |
16 | West Ham | 38 | -10 | 7 | 17 |
17 | Aston Villa | 35 | -22 | 7 | 18 |
18 | Sheffield United | 34 | -27 | 7 | 19 |
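The post does not show how `sim_season` builds this table. One plausible way to aggregate a single simulated season is sketched below; the function `season_table` and the three-team toy results are hypothetical and written by hand, not simulation output.

```python
import pandas as pd

def season_table(matches):
    """Aggregate one simulated season into a league table ranked by points, then gd."""
    home = pd.DataFrame({
        "team": matches["team_home"],
        "pts": matches["home_pts"],
        "gd": matches["sim_home"] - matches["sim_away"],
    })
    away = pd.DataFrame({
        "team": matches["team_away"],
        "pts": matches["away_pts"],
        "gd": matches["sim_away"] - matches["sim_home"],
    })
    table = (
        pd.concat([home, away])
        .groupby("team", as_index=False)
        .agg(final_pts=("pts", "sum"), gd=("gd", "sum"))
        .sort_values(["final_pts", "gd"], ascending=False)
        .reset_index(drop=True)
    )
    # Teams level on both points and goal difference are ordered arbitrarily.
    table["rank"] = table.index + 1
    return table

# Hand-written toy results for three teams A, B and C.
toy = pd.DataFrame({
    "team_home": ["A", "A", "B", "B", "C", "C"],
    "team_away": ["B", "C", "A", "C", "A", "B"],
    "sim_home":  [2, 1, 0, 3, 1, 0],
    "sim_away":  [1, 1, 0, 0, 2, 0],
    "home_pts":  [3, 1, 1, 3, 0, 1],
    "away_pts":  [0, 1, 1, 0, 3, 1],
})
print(season_table(toy))
```

Stacking home and away results into one long frame lets a single groupby produce both the points total and the goal difference.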
We’re now interested in obtaining the result for each of our 10,000 simulations. We combine all the results in a single dataframe, in which the sim column identifies the simulation each row belongs to.
epl_sim.head()
| | team | final_pts | gd | sim | rank |
---|---|---|---|---|---|
0 | Arsenal | 68 | 23 | 1 | 1 |
1 | Man United | 65 | 23 | 1 | 2 |
2 | Tottenham | 64 | 25 | 1 | 3 |
3 | Man City | 63 | 22 | 1 | 4 |
4 | Liverpool | 58 | 21 | 1 | 5 |
Let’s take a look at each team’s chances of finishing in each position.
epl_sim.groupby('team')['rank'].value_counts(normalize=True).unstack().fillna(0)
rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
team | |||||||||||||||||||
Arsenal | 0.1615 | 0.1593 | 0.1540 | 0.1331 | 0.1140 | 0.0852 | 0.0611 | 0.0438 | 0.0321 | 0.0214 | 0.0118 | 0.0080 | 0.0048 | 0.0046 | 0.0026 | 0.0017 | 0.0005 | 0.0004 | 0.0001 |
Aston Villa | 0.0012 | 0.0044 | 0.0063 | 0.0143 | 0.0219 | 0.0328 | 0.0524 | 0.0623 | 0.0707 | 0.0758 | 0.0817 | 0.0841 | 0.0857 | 0.0812 | 0.0814 | 0.0708 | 0.0686 | 0.0620 | 0.0424 |
Brighton | 0.0001 | 0.0002 | 0.0008 | 0.0023 | 0.0029 | 0.0067 | 0.0109 | 0.0187 | 0.0250 | 0.0361 | 0.0465 | 0.0519 | 0.0645 | 0.0760 | 0.0932 | 0.1166 | 0.1340 | 0.1531 | 0.1605 |
Burnley | 0.0004 | 0.0007 | 0.0020 | 0.0031 | 0.0062 | 0.0126 | 0.0223 | 0.0279 | 0.0384 | 0.0467 | 0.0631 | 0.0694 | 0.0788 | 0.0863 | 0.0939 | 0.1087 | 0.1154 | 0.1181 | 0.1060 |
Chelsea | 0.1664 | 0.1697 | 0.1506 | 0.1367 | 0.1100 | 0.0857 | 0.0586 | 0.0416 | 0.0290 | 0.0183 | 0.0119 | 0.0069 | 0.0059 | 0.0046 | 0.0018 | 0.0012 | 0.0009 | 0.0001 | 0.0001 |
Crystal Palace | 0.0006 | 0.0023 | 0.0057 | 0.0085 | 0.0147 | 0.0272 | 0.0363 | 0.0491 | 0.0585 | 0.0671 | 0.0755 | 0.0789 | 0.0834 | 0.0852 | 0.0898 | 0.0853 | 0.0916 | 0.0783 | 0.0620 |
Everton | 0.0091 | 0.0179 | 0.0263 | 0.0430 | 0.0644 | 0.0865 | 0.0935 | 0.1043 | 0.0964 | 0.0883 | 0.0785 | 0.0666 | 0.0575 | 0.0500 | 0.0380 | 0.0341 | 0.0210 | 0.0165 | 0.0081 |
Fulham | 0.0007 | 0.0018 | 0.0041 | 0.0067 | 0.0147 | 0.0256 | 0.0338 | 0.0487 | 0.0563 | 0.0662 | 0.0751 | 0.0863 | 0.0894 | 0.0904 | 0.0918 | 0.0890 | 0.0868 | 0.0756 | 0.0570 |
Leicester | 0.0192 | 0.0393 | 0.0523 | 0.0739 | 0.0836 | 0.1018 | 0.1108 | 0.1009 | 0.0902 | 0.0773 | 0.0582 | 0.0519 | 0.0403 | 0.0331 | 0.0249 | 0.0164 | 0.0128 | 0.0086 | 0.0045 |
Liverpool | 0.1543 | 0.1518 | 0.1543 | 0.1339 | 0.1193 | 0.0848 | 0.0664 | 0.0451 | 0.0345 | 0.0192 | 0.0134 | 0.0079 | 0.0064 | 0.0032 | 0.0021 | 0.0021 | 0.0012 | 0.0001 | 0.0000 |
Man City | 0.2373 | 0.1789 | 0.1497 | 0.1268 | 0.0958 | 0.0744 | 0.0462 | 0.0348 | 0.0203 | 0.0124 | 0.0086 | 0.0066 | 0.0030 | 0.0028 | 0.0012 | 0.0007 | 0.0003 | 0.0002 | 0.0000 |
Man United | 0.1890 | 0.1722 | 0.1469 | 0.1255 | 0.1129 | 0.0820 | 0.0598 | 0.0380 | 0.0269 | 0.0192 | 0.0098 | 0.0056 | 0.0054 | 0.0026 | 0.0024 | 0.0012 | 0.0003 | 0.0003 | 0.0000 |
Newcastle | 0.0007 | 0.0029 | 0.0087 | 0.0127 | 0.0186 | 0.0301 | 0.0444 | 0.0559 | 0.0648 | 0.0790 | 0.0888 | 0.0889 | 0.0843 | 0.0855 | 0.0818 | 0.0747 | 0.0721 | 0.0602 | 0.0459 |
Sheffield United | 0.0002 | 0.0002 | 0.0012 | 0.0016 | 0.0027 | 0.0060 | 0.0086 | 0.0146 | 0.0234 | 0.0306 | 0.0389 | 0.0522 | 0.0610 | 0.0769 | 0.0938 | 0.1085 | 0.1375 | 0.1751 | 0.1670 |
Southampton | 0.0037 | 0.0082 | 0.0179 | 0.0301 | 0.0425 | 0.0614 | 0.0792 | 0.0833 | 0.0921 | 0.0899 | 0.0790 | 0.0818 | 0.0771 | 0.0655 | 0.0558 | 0.0498 | 0.0375 | 0.0297 | 0.0155 |
Tottenham | 0.0571 | 0.0872 | 0.1047 | 0.1184 | 0.1207 | 0.1186 | 0.0999 | 0.0746 | 0.0627 | 0.0428 | 0.0346 | 0.0258 | 0.0168 | 0.0131 | 0.0110 | 0.0066 | 0.0028 | 0.0018 | 0.0008 |
West Brom | 0.0003 | 0.0015 | 0.0042 | 0.0062 | 0.0120 | 0.0193 | 0.0287 | 0.0421 | 0.0512 | 0.0635 | 0.0679 | 0.0787 | 0.0880 | 0.0908 | 0.0956 | 0.0928 | 0.0940 | 0.0944 | 0.0688 |
West Ham | 0.0018 | 0.0054 | 0.0116 | 0.0215 | 0.0364 | 0.0430 | 0.0578 | 0.0769 | 0.0767 | 0.0886 | 0.0897 | 0.0787 | 0.0767 | 0.0758 | 0.0708 | 0.0645 | 0.0539 | 0.0439 | 0.0263 |
Wolves | 0.0009 | 0.0028 | 0.0055 | 0.0107 | 0.0169 | 0.0262 | 0.0395 | 0.0499 | 0.0642 | 0.0739 | 0.0821 | 0.0861 | 0.0881 | 0.0885 | 0.0819 | 0.0865 | 0.0757 | 0.0699 | 0.0507 |
Interpretation example: Arsenal has a 0.1615 (16.15%) probability of finishing first, a 0.1593 (15.93%) probability of finishing second, and so on.
To get the probability of Arsenal finishing at least as high as a given position, we add up the probabilities of all positions up to and including it. For example, the probability of Arsenal finishing in the top 4 is the sum of the probabilities of positions 1, 2, 3 and 4, i.e. 0.1615 + 0.1593 + 0.1540 + 0.1331 = 0.6079 (60.79%).
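The same "position k or better" calculation can be done with a cumulative sum. The sketch below uses only Arsenal's first six probabilities, copied from the table above; the variable names are illustrative.

```python
import pandas as pd

# Arsenal's probabilities for positions 1 to 6, copied from the table above.
arsenal = pd.Series(
    [0.1615, 0.1593, 0.1540, 0.1331, 0.1140, 0.0852],
    index=[1, 2, 3, 4, 5, 6],
    name="Arsenal",
)

# Cumulative sum over positions: entry k is P(finishing in position k or better).
at_least = arsenal.cumsum()
print(at_least.round(4))
```

For the full table, calling `.cumsum(axis=1)` on the unstacked dataframe should give the same quantity for every team at once.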
From the simulation results, it would be interesting to see which team has the highest chances of winning the league.
np.round(epl_sim[epl_sim['rank'] == 1].team.value_counts(normalize=True).nlargest(4) * 100, 2)
Man City 23.62
Man United 18.82
Chelsea 16.57
Arsenal 16.08
Name: team, dtype: float64
This isn’t surprising: Man City has the highest home coefficient (0.8105) and the joint-highest away coefficient (0.4895) in the regressions above, reflecting how much they have dominated the EPL recently.
A top 4 finish at the end of the season implies that the team has qualified for the UEFA Champions League. Let’s have a look at the teams which dominate these rankings.
np.round(epl_sim[epl_sim['rank'].isin([1, 2, 3, 4])].team.value_counts(normalize=True).nlargest(7) * 100, 2)
Man City 17.20
Man United 15.73
Chelsea 15.48
Arsenal 15.10
Liverpool 14.76
Tottenham 9.12
Leicester 4.59
Name: team, dtype: float64
The EPL is dominated by clubs like Man City, Man Utd, Chelsea, Arsenal and Liverpool, so it is no surprise to see them dominating the top 4 positions. Note that these figures are shares of all simulated top-4 finishes, so they sum to 100% across teams; multiplying a share by 4 gives that team’s probability of finishing in the top 4. The list also includes Leicester City. This is due to their magical 2015-2016 season, when they shocked the world by winning the title. Since then, Leicester City has performed consistently well and has secured its position as one of the top clubs in the EPL.
Let’s have a look at the final standings by using the entire simulation data
epl_sim.groupby('team')[['final_pts', 'gd']].median().sort_values(['final_pts', 'gd'], ascending=False)
team | final_pts | gd |
---|---|---|
Man City | 63.0 | 22.0 |
Chelsea | 62.0 | 19.0 |
Man United | 62.0 | 19.0 |
Arsenal | 62.0 | 18.0 |
Liverpool | 61.0 | 18.0 |
Tottenham | 57.0 | 11.0 |
Leicester | 53.0 | 4.0 |
Everton | 50.0 | 0.0 |
Southampton | 48.0 | -3.0 |
West Ham | 46.0 | -6.0 |
Aston Villa | 45.0 | -8.0 |
Newcastle | 44.0 | -8.0 |
Wolves | 44.0 | -9.0 |
Crystal Palace | 43.0 | -10.0 |
Fulham | 43.0 | -10.0 |
West Brom | 42.0 | -11.0 |
Burnley | 40.0 | -13.0 |
Brighton | 38.0 | -16.0 |
Sheffield United | 38.0 | -16.0 |
Limitations
- Our approach assumes that the number of goals scored is well represented by a Poisson distribution, and the results will be inaccurate if that assumption fails. One consequence of the assumption: if goals over a full match follow a Poisson distribution with mean λ, then goals over half a match follow a Poisson distribution with mean λ/2. In soccer terms, the first and second halves of a match should have the same expected number of goals. Regrettably, this does not appear to be the case.
- Moreover, the Poisson distribution is a simple predictive model that ignores many factors. Situational factors (such as club circumstances and game status) and any subjective evaluation of how each team changed during the transfer window are left out entirely.
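The first limitation can be checked directly, because the raw data carries half-time goals (HTHG, HTAG) alongside full-time goals (FTHG, FTAG). The sketch below uses only the three matches from the data snapshot, so its output illustrates the calculation and is not evidence either way; the function name `goals_by_half` is an assumption.

```python
import pandas as pd

# The three matches from the data snapshot (full-time and half-time goals).
snapshot = pd.DataFrame({
    "FTHG": [1, 3, 1], "FTAG": [3, 1, 1],
    "HTHG": [1, 0, 0], "HTAG": [1, 1, 0],
})

def goals_by_half(df):
    """Mean total goals per match in the first half and in the second half."""
    first = df["HTHG"] + df["HTAG"]
    second = (df["FTHG"] + df["FTAG"]) - first
    return first.mean(), second.mean()

first_half, second_half = goals_by_half(snapshot)
print(f"first half: {first_half:.2f}, second half: {second_half:.2f}")
```

Run on the full dataset before the extra columns are dropped, two clearly different means would indicate that the scoring rate is not constant over the 90 minutes.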