Introducing the Shapley Value to Football

September 19, 2013

Photo by Jed Leicester/Action Images

Most measures of performance in soccer depend on a player’s passes, shots, tackles, and the like. There are three big problems with these traditional metrics: 1) they do not always translate into a team’s ultimate objectives, such as points or goal difference; 2) they may be too specific to encompass all of a player’s contributions to results, whether tangible or intangible; and 3) they take no account of the constant interaction between all of the players on the field. A metric borrowed from economics, the Shapley value, has none of these problems.

Invented by the Nobel-winning economist and mathematician Lloyd Shapley, the Shapley value is an attempt to measure the contribution of one actor to the results generated by a group. In economics, it is often used to gauge individual productivity among people who work together in a team.

The formula for Shapley values is essentially a thought experiment. Imagine that you can form the same team many times over by adding its members one by one in every possible order. Each time a given member is added to the team, he has some marginal effect on the team’s results. By averaging all of these marginal effects across all the ways of forming the team, we can get some idea of how pivotal the given member is. For instance, if he always has a big marginal effect, no matter when he joins the team, then he must be very important to the team’s results.
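The thought experiment can be written down almost literally. What follows is a minimal sketch rather than the code behind the figures in this article: `shapley_values` and `toy_value` are names made up for illustration, and the toy game (the team only "works" when A and B are both present) is invented so the answer is easy to check by hand.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal effect over every order of forming the team.

    `value` is any function that maps a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        team = frozenset()
        for p in order:
            with_p = team | {p}
            # Marginal effect of adding p to the team as it stands in this order.
            totals[p] += value(with_p) - value(team)
            team = with_p
        n_orders += 1
    return {p: total / n_orders for p, total in totals.items()}

def toy_value(team):
    # Hypothetical game: the result is 1 only when A and B are both on the team.
    return 1.0 if {"A", "B"} <= team else 0.0

values = shapley_values(["A", "B", "C"], toy_value)
print(values)
```

In this toy game A and B split the credit equally and C, who never changes the result, gets nothing. The values also add up to the result of the full team, which is why a team with a negative overall result tends to produce mostly negative Shapley values.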

Computing Shapley values based on entire XIs requires a huge number of iterations – about 40 million to figure out the value of all the players in a given group – and would take weeks of constant desktop processing for a single season. A more manageable alternative is to compute Shapley values for groups of offensive and defensive players.
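The figure of about 40 million is consistent with the number of orders in which eleven players can be added to a team, which is 11 factorial. That reading is an inference, not something spelled out in the text, but it makes the saving from splitting the XI into a group of five and a group of six easy to see:

```python
from math import factorial

# Orders in which a full XI can be assembled one player at a time.
orders_full_xi = factorial(11)

# Orders for the two smaller groups described in this article:
# goalkeeper plus four defenders, and the six remaining players.
orders_defense = factorial(5)
orders_offense = factorial(6)

print(orders_full_xi)                   # 39916800
print(orders_defense, orders_offense)   # 120 720
```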

I support Newcastle United, so I decided to use its players for my first foray into Shapley values for soccer. I looked at all of Newcastle’s results in the Premier League in 2012-13, classing the goalkeeper and the four deepest outfield players as the defensive group and the other six players on the field (barring red cards) as the offensive group. My output – the result I wanted to parse between the players – was goal difference per minute, adjusted for the strength of opposing teams.

To calculate Shapley values, we only need to know the composition of each XI that played for Newcastle, how long it was on the field, and the goal difference during that time. These data are available all over the Internet, which is another difference versus some of the more complicated metrics based on individual statistics.
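As a rough sketch of what that input might look like, each spell of play can be stored as a group of players, the minutes they spent on the field together, and the goal difference in that time. The player labels and numbers below are invented, `goal_diff_per_minute` is a hypothetical helper, and the adjustment for the strength of the opposition is left out because its details are not given here.

```python
from collections import defaultdict

def goal_diff_per_minute(stints):
    """Pool minutes and goal difference for each distinct group of players."""
    minutes = defaultdict(float)
    goals = defaultdict(float)
    for group, mins, goal_diff in stints:
        key = frozenset(group)  # the order of the names does not matter
        minutes[key] += mins
        goals[key] += goal_diff
    return {key: goals[key] / minutes[key] for key in minutes if minutes[key] > 0}

# Made-up spells: (group on the field, minutes together, goal difference).
stints = [
    ({"P1", "P2", "P3"}, 60, 1),
    ({"P1", "P2", "P3"}, 30, -1),
    ({"P1", "P2", "P4"}, 45, -1),
]

rates = goal_diff_per_minute(stints)
for group, rate in rates.items():
    print(sorted(group), round(rate, 5))
```

Rates like these are the raw material for the value of a group; how to value groups that never actually appeared together is a separate modeling choice that this sketch does not address.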

For a given player, the Shapley value is his expected contribution to goal difference, on average, per minute. On a team with negative goal difference over the entire season – like Newcastle in 2012-13 – we would expect most of the Shapley values to be negative as well. Most players will share some responsibility for the poor results, but some will still rise to the top.

Like almost all metrics in soccer, Shapley values are influenced by luck. A player who happened to be on the field only when one of his teammates made horrific errors will share some of the blame for those errors and have a lower Shapley value. By the same token, a player whose appearances happened to coincide with goals will have a high Shapley value. By focusing on players who accrued a lot of minutes over the course of the season, I hoped to even out these effects.

Defense. In the 2012-13 season as a whole, Newcastle used 42 different defensive formations. These are the Shapley values of the players who spent the equivalent of at least two full matches in defense:

2012-2013 Newcastle Goalkeepers

Player              Shapley Value
Tim Krul            -0.00041
Steve Harper        -0.00260
Rob Elliot          -0.00299

2012-2013 Newcastle outfield players

Player              Shapley Value
James Perch          0.00012
Fabricio Coloccini  -0.00002
Mathieu Debuchy     -0.00009
Mike Williamson     -0.00074
Davide Santon       -0.00090
Steven Taylor       -0.00136
Danny Simpson       -0.00142
Jonas Gutierrez     -0.00235
Mapou Yanga-Mbiwa   -0.00350
Massadio Haidara    -0.00871

Tim Krul was the most valuable goalkeeper. Based on these figures, playing Rob Elliot instead of Krul would generate a worse goal difference by about 0.25 goals per game, or 9.5 goals over the entire season.
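The conversion from a per-minute gap to goals per game can be checked in a few lines. This is a sketch under two assumptions: a match lasts about 93 minutes including stoppage time (the multiplier mentioned in the comments below) and a Premier League season has 38 matches.

```python
MINUTES_PER_MATCH = 93    # assumption: includes stoppage time
MATCHES_PER_SEASON = 38   # Premier League season length

krul = -0.00041
elliot = -0.00299

gap_per_minute = krul - elliot                       # 0.00258
gap_per_match = gap_per_minute * MINUTES_PER_MATCH
gap_per_season = gap_per_match * MATCHES_PER_SEASON

print(round(gap_per_match, 2), round(gap_per_season, 1))
```

On these assumptions the gap is roughly 0.24 goals per match and a little over 9 goals per season. Rounding the per-match figure to 0.25 before multiplying by 38 gives the 9.5 quoted above.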

James Perch was the best defender, though he also played in midfield, followed by Fabricio Coloccini, the captain. The worst performers were two of Newcastle’s January signings, Mapou Yanga-Mbiwa and Massadio Haidara. But not all the signings were busts; Mathieu Debuchy was the third-best defender.

Offense. Here are the Shapley values for players who spent the equivalent of at least two full matches as part of Newcastle’s offensive group during the 2012-13 season:

2012-2013 Newcastle attacking players

Player              Shapley Value
Yoan Gouffran        0.00092
Sylvain Marveaux     0.00083
Gael Bigirimana     -0.00021
Hatem Ben Arfa      -0.00054
Jonas Gutierrez     -0.00068
Shane Ferguson      -0.00072
Vurnon Anita        -0.00087
Demba Ba            -0.00126
Yohan Cabaye        -0.00130
Cheick Tiote        -0.00165
Papiss Cisse        -0.00187
Moussa Sissoko      -0.00194
James Perch         -0.00222
Gabriel Obertan     -0.00324
Sammy Ameobi        -0.00348
Shola Ameobi        -0.00385

Gael Bigirimana was an underused player: in the 413 minutes he played, he detracted much less from the team’s goal difference than several other midfielders. He and Newcastle’s other “creative” players – Yoan Gouffran, Sylvain Marveaux, and Hatem Ben Arfa – were the most productive on the field. Demba Ba, who was sold to Chelsea in midseason, was not especially productive in the offensive groups in which he played, but he was still better than Papiss Cisse, the remaining out-and-out striker. The central midfielders – Vurnon Anita, Yohan Cabaye, Cheick Tiote, and Perch – were not very pivotal, either, and Perch made much more positive contributions when playing in defense.

Aggregation. As I discussed in my previous article, an ideal metric will be aggregable, allowing us to predict the performance of a hypothetical team by adding up the scores of individual players. Shapley values require the same caution in aggregation as most other soccer metrics. They are based on the contribution of players to groups in which they actually played, so combining players who never played together could have unexpected results: clashes of style, overlapping positions, etc.

One way around this is to calculate Shapley values for groups of players rather than for individuals. This is somewhat analogous to computing plus-minus figures for entire offensive lines in ice hockey. In soccer, we might want to calculate the Shapley value for a partnership in central defense, or a core group of strikers and offensive midfielders. But we would always be missing some of the potential interactions between players.

Comparisons with existing metrics. Shapley values are agnostic to how the actions of players on the field translate into goal difference, which is a departure from widely used measures of individual performance. Here is how Newcastle’s Shapley values (in rank order by position) compared with individual player ratings from whoscored.com, a popular website, for 2012-13:

Newcastle Shapley Values compared to existing metrics

Player              Shapley Value   Whoscored.com rating
Tim Krul            -0.00041        6.68
Steve Harper        -0.00260        6.68
Rob Elliot          -0.00299        6.40
James Perch          0.00012        6.58
Fabricio Coloccini  -0.00002        6.75
Mathieu Debuchy     -0.00009        6.83
Mike Williamson     -0.00074        7.30
Davide Santon       -0.00090        6.91
Steven Taylor       -0.00136        6.65
Danny Simpson       -0.00142        6.53
Mapou Yanga-Mbiwa   -0.00350        6.86
Massadio Haidara    -0.00871        6.50
Yoan Gouffran        0.00092        6.70
Sylvain Marveaux     0.00083        6.67
Gael Bigirimana     -0.00021        6.39
Hatem Ben Arfa      -0.00054        7.01
Jonas Gutierrez     -0.00068        6.82
Shane Ferguson      -0.00072        6.32
Vurnon Anita        -0.00087        6.41
Demba Ba            -0.00126        7.11
Yohan Cabaye        -0.00130        7.15
Cheick Tiote        -0.00165        6.80
Papiss Cisse        -0.00187        6.57
Moussa Sissoko      -0.00194        7.01
Gabriel Obertan     -0.00324        6.30
Sammy Ameobi        -0.00348        6.36
Shola Ameobi        -0.00385        6.34

There is only a moderate correlation (roughly 0.3) between the two sets of ratings, suggesting that the Shapley values are picking up factors not considered by the whoscored.com statisticians.
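For readers who want to check the comparison themselves, here is a sketch that computes a Pearson correlation over all 27 players in the table, pooled across positions. Whether the quoted figure was computed this way, or with a rank correlation or by position, is not stated, so treat the choice of method as an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# (Shapley value, whoscored.com rating) pairs, in the order of the table above.
pairs = [
    (-0.00041, 6.68), (-0.00260, 6.68), (-0.00299, 6.40), (0.00012, 6.58),
    (-0.00002, 6.75), (-0.00009, 6.83), (-0.00074, 7.30), (-0.00090, 6.91),
    (-0.00136, 6.65), (-0.00142, 6.53), (-0.00350, 6.86), (-0.00871, 6.50),
    (0.00092, 6.70), (0.00083, 6.67), (-0.00021, 6.39), (-0.00054, 7.01),
    (-0.00068, 6.82), (-0.00072, 6.32), (-0.00087, 6.41), (-0.00126, 7.11),
    (-0.00130, 7.15), (-0.00165, 6.80), (-0.00187, 6.57), (-0.00194, 7.01),
    (-0.00324, 6.30), (-0.00348, 6.36), (-0.00385, 6.34),
]

shapley = [s for s, _ in pairs]
rating = [r for _, r in pairs]
r = pearson(shapley, rating)
print(round(r, 2))
```

With the figures as tabulated, the pooled coefficient lands in the neighborhood of the roughly 0.3 quoted above. A single outlier such as Haidara can move it noticeably, so the exact value should not be over-read.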

I first calculated these Shapley values a few months ago, and since then they have certainly influenced my thinking about Newcastle. I would have been happy for the club to sell Cabaye for a high price in the summer, since he did not seem too pivotal to the team’s results. As the season began, I was aghast to see Yanga-Mbiwa and Sissoko starting together. But I’ve also been gratified by the performances of Gouffran, Marveaux, and Ben Arfa in the past few weeks. Now, if we could only get Bigirimana some more minutes….


    2 Comments

    1. Hi Daniel,

      as mentioned on Twitter, I like this post. Great job and well written! I just thought you may want to multiply your Shapley values with 90. I think a goaldiff/90min scale is easier to read and interpret the numbers like “If Krul plays instead of Harper, expected goal diff will rise about 0.15”. (Although this is literally speaking only true if the rest of the team is randomly selected).

      Cheers, Jörg

      • Hi Jörg,

        I did in fact multiply by 93 in an earlier version of this work that was circulated to some teams. But using that kind of figure can tempt people to sum up Shapley values for an entire team and decide that the answer is the most likely goal difference for a game. As I wrote above, aggregation is not straightforward with Shapley values.

        I think the most useful applications of Shapley values are 1) deciding which player to start in a given position, 2) figuring out which player needs to be stopped on an opposing team, and 3) trying to spot overrated and underrated players in the transfer market.

        Best,
        Dan
