Thursday, 25 July 2013

Review of 2012/13 Season by Chance Type

At the recent 'Football and Data' event in London, I presented (among other things) a review of Premier League performance by team based on chance type.  This is a follow-on from the 'Big Chance' analysis I did a few months ago looking at the relative conversion rates for the top strikers and each PL team.

Basically, not all attempts at goal have an equal likelihood of being scored, and there will be cases where, despite the headline figures showing Team A had more attempts at goal, Team B had better quality attempts and 'deserved' the win.

I've put together a basic way of breaking attempts down into different categories.  It's far from perfect (as will be discussed at the end) but gives a good initial breakdown of chance type.

For each of the 380 Premier League games in the 2012/13 season I've taken details of attempt type (using Opta data from EPL Index) and categorised each attempt into one of the following groups:

Clear Cut Chance (also described as a 'Big Chance') - Where there is a 'reasonable' expectation of scoring
Inside Box Attempts
Outside Box Attempts

The categories above are created as a hierarchy so all Clear Cut Chances exclude Penalties (included normally within Opta definition but stripped out in this analysis) and all Inside Box attempts exclude Clear Cut Chances.
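A minimal sketch of how that hierarchy could be applied to aggregate counts in Python. The field names are hypothetical, and it assumes (as noted in the caveats later on) that every Clear Cut Chance is inside the box, and additionally that penalties are counted within both the raw Clear Cut Chance total and the raw Inside Box total:

```python
from dataclasses import dataclass


@dataclass
class RawAttempts:
    """Hypothetical raw totals for one team in one match."""
    penalties: int
    clear_cut_incl_pens: int   # Clear Cut Chances as supplied, penalties included
    inside_box_total: int      # all attempts inside the box (assumed to include CCC and penalties)
    outside_box_total: int


def split_by_chance_type(raw: RawAttempts) -> dict:
    """Split raw totals into mutually exclusive groups.

    Penalties are stripped out of Clear Cut Chances, and Clear Cut Chances
    (penalties included) are stripped out of Inside Box attempts.
    """
    clear_cut = raw.clear_cut_incl_pens - raw.penalties
    inside_box = raw.inside_box_total - raw.clear_cut_incl_pens
    if clear_cut < 0 or inside_box < 0:
        raise ValueError("raw totals are inconsistent with the assumed hierarchy")
    return {
        "penalty": raw.penalties,
        "clear_cut": clear_cut,
        "inside_box": inside_box,
        "outside_box": raw.outside_box_total,
    }


# Made-up numbers purely for illustration
groups = split_by_chance_type(RawAttempts(1, 4, 10, 6))
```

Because the groups are mutually exclusive, their counts add back up to the total number of attempts, which is a handy sanity check on the input data.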

For all teams the overall figures were:

From the figures above it can be seen that excluding Penalties, Clear Cut Chances (CCC) accounted for around half of all goals last season and that attempts outside the box (excluding CCC) are rarely rewarded (as a result they are the ones that live long in the memory).

Shooting from outside the box may or may not be the most statistically beneficial use of possession but I'm sure when one does go in the player (or the fans) aren't worrying too much about the conversion stats.

Split by Team the figures are as follows:

Colours in each column run from Dark Red (high for that variable compared to other teams) down to Dark Green (low).

Premier League 2012/13 Conversion Rates by Attempt Type - Teams Ordered by League Position
There are a few figures in the table above that stand out: At the top of the table, the two Manchester teams create similar levels of Clear Cut Chances but conversion rates vary greatly (44% for Man Utd and 27% for Man City).

Also of interest is that QPR's Inside Box conversion rate was actually lower than their Outside Box rate, which in part helps explain why they only scored 30 league goals all season.  Scoring just 1 of your 4 penalties doesn't help either when goals are so hard to come by.

For the Man Utd/Man City example it isn't known, of course, whether the difference is due to:

  • Good finishing from Man Utd
  • Bad finishing by Man City
  • Whoever codes Man City games being more generous in coding a chance as a CCC
  • Man Utd creating a 'better class' of CCC
  • Some combination of all of the above

As the data is available for every match, I have also been able to collate the relative conversion rates for the opposition team against each side in the League over the season:
Premier League 2012/13 Opposition Conversion Rates by Attempt Type
The figures that stand out for me in terms of defence are those for teams playing against Wigan.  Wigan didn't concede that many Clear Cut Chances, but their opponents scored from almost half of them.

Expected Goals per Match
By taking each team's average conversion rate over the season along with the type of attempts they had in each match it's possible to determine the 'Expected' number of goals for both teams for all 380 matches based on average performance over the season.

The league average for penalty conversion was applied for all teams but all other figures use that team's average performance.  I have allocated a team the win if their expected goals were at least 0.5 higher than their opposition's (using 1 goal as the differentiator gave too many draws):
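The calculation described above could be sketched roughly as follows. This is an illustration only: the function names are hypothetical and the conversion rates are made-up placeholders, not figures from the tables in this post.

```python
def expected_goals(attempts, team_rates, league_penalty_rate):
    """Expected goals for one team in one match.

    attempts: count of attempts per chance type in the match
    team_rates: the team's season-average conversion rate per chance type
    league_penalty_rate: league-average penalty conversion, used for every team
    """
    rates = dict(team_rates, penalty=league_penalty_rate)
    return sum(count * rates[kind] for kind, count in attempts.items())


def expected_result(home_xg, away_xg, margin=0.5):
    """Allocate a win only when one side's expected goals lead by `margin` or more."""
    diff = home_xg - away_xg
    if diff >= margin:
        return "home"
    if diff <= -margin:
        return "away"
    return "draw"


# Placeholder rates and attempt counts, for illustration only
rates = {"clear_cut": 0.40, "inside_box": 0.10, "outside_box": 0.03}
match_attempts = {"penalty": 0, "clear_cut": 2, "inside_box": 5, "outside_box": 10}
home_xg = expected_goals(match_attempts, rates, league_penalty_rate=0.75)
outcome = expected_result(home_xg, 0.9)
```

Raising `margin` to 1.0 reproduces the stricter rule mentioned above, at the cost of classifying many more matches as expected draws.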

From this it can be seen that a team who 'deserves' to win does so 53% of the time (where actual=expected, the top left to bottom right diagonal).  There are 27 games (7%) where the team who 'deserves' to win loses, and the remaining 40% are where one team 'deserves' to win but draws, or where a draw is expected and one side wins.
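A cross-tabulation like this, and the three shares quoted from it, could be produced along the following lines. This is a sketch with hypothetical names and a tiny made-up set of results, not the actual 380-match data:

```python
from collections import Counter


def cross_tab(results):
    """Count matches by (expected, actual) outcome pair.

    results: iterable of (expected, actual), each 'home', 'draw' or 'away'.
    """
    return Counter(results)


def summarise(table):
    """Share of matches that agree, are fully reversed, or fall in between."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("no matches supplied")
    agree = sum(n for (exp, act), n in table.items() if exp == act)
    # 'Reversed' means the side expected to win actually lost
    flipped = sum(n for (exp, act), n in table.items()
                  if {exp, act} == {"home", "away"})
    return {
        "agree": agree / total,
        "reversed": flipped / total,
        "other": (total - agree - flipped) / total,
    }


# Made-up results purely for illustration
table = cross_tab([("home", "home"), ("home", "away"),
                   ("draw", "home"), ("away", "away")])
shares = summarise(table)
```

The 'agree' share corresponds to the diagonal of the table, 'reversed' to the two opposite corners, and 'other' to every cell involving exactly one draw.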

Applying this to a league table gives the following results:
League Points Actual and Expected
I was a bit surprised to see Manchester United could arguably have taken even more points than they did, but ultimately this is down to how they approach the league.  Stats at a season level are all well and good but ultimately a league campaign is not a 9-month War but a series of separate 90 minute battles.  In any given game, more often than not, Manchester United are the better side.

Liverpool also show a big difference between actual and expected which is possibly down to the fact that they tended to blow hot and cold.

Southampton and Sunderland finished with similar points totals but possibly Sunderland were lucky to get what they did.
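For anyone wanting to build a similar table, one way to turn a list of results (actual or expected) into points, using the usual 3 for a win and 1 for a draw, is sketched below. The function names and the three-team example are hypothetical:

```python
from collections import defaultdict


def points_table(matches):
    """Total league points per team.

    matches: iterable of (home_team, away_team, result),
             where result is 'home', 'draw' or 'away'.
    """
    points = defaultdict(int)
    for home, away, result in matches:
        if result == "home":
            home_pts, away_pts = 3, 0
        elif result == "away":
            home_pts, away_pts = 0, 3
        else:
            home_pts, away_pts = 1, 1
        points[home] += home_pts
        points[away] += away_pts
    return dict(points)


def points_gap(actual, expected):
    """Actual minus expected points; positive means a team got more than it 'deserved'."""
    return {team: actual[team] - expected.get(team, 0) for team in actual}


# Made-up fixtures: the same matches with actual and expected outcomes
actual = points_table([("A", "B", "home"), ("B", "C", "draw"), ("C", "A", "away")])
expected = points_table([("A", "B", "draw"), ("B", "C", "home"), ("C", "A", "away")])
gap = points_gap(actual, expected)
```

Sorting teams by the gap gives a quick view of who finished furthest above or below what their chances suggested.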

The above is a long long way from being a definitive look at who got what they deserved last season and there are a number of areas that need to be taken into consideration:

It is assumed that all Clear Cut Chances are within the box.  This is an over-simplification but a necessary one, as the data is not available at a sufficiently low level, i.e., the CCC total is available, as is total shots inside/outside the box, but not CCC inside/outside the box.

Own Goals are excluded.

The analysis relies on a subjective definition of what a Clear Cut Chance is: some will be obvious, others will be more of a judgement call, e.g., is someone stretching to reach a cross a Clear Cut Chance?

Teams may have differing styles in different matches.  A good example of this would be Liverpool with/without Suarez.  Overall Liverpool have a well below average Inside Box conversion rate due to Suarez liking to shoot regardless of the angle.  Therefore 5 Inside Box shots when he's playing might give a different goal expectation compared to when he isn't.

Game State: There'll be instances where a team scores early with a speculative effort then spends most of the rest of the match defending.  The above analysis would say the other team deserved the win but in reality setting up defensively when winning is a reasonable approach.

Averages look at the conversion rate for a team but ignore the save rate of their opponents.  All other things being equal, you would expect goalkeepers from a top side to save a higher proportion of shots than a keeper for a team further down the table.

Clear Cut Chance is far too broad a grouping to be fully useful (as is discussed below).

Next Stages
In an ideal world you'd have more than 3 types of attempt.  Clear Cut, Inside Box and Outside Box are far too general to be a fair split of the types of chances that players have.

In a business context, if you segmented customers by those who'd bought once, twice and 3+ times then you'd certainly see differences in behaviour in these 3 segments but you'd be losing out on a lot of potential discrimination of the 3+ group.

Similarly, 100 segments might be overkill, with little practical difference between a lot of them causing unnecessary confusion.

Plenty of work has appeared on the web over the last couple of weeks by people such as Kickdex, Colin Trainor and Statsbomb looking at things such as the area of the shot, as just Inside Box is surely too general a grouping.

Ultimately everyone is trying to make the most of the data that's available, but using location of shot misses out on the one thing that Clear Cut Chances attempts to bring in, and that's the context of an attempt.  Finishing from an angle after rounding the keeper is easier than a central attempt from 10 yards out when there's a mass of bodies between you and the goal.

As there are around 260 attempts on goal for each set of 10 Premier League fixtures, it doesn't seem a Herculean task to bundle those into 5-7 segments of conversion difficulty, from Tap-In to 35 yard pile-driver.

It may be that Player A has scored 5 more goals in a season than Player B from the same number of shots purely because Player A has had the better quality chances (this leaves aside the other argument that chances don't always appear on a plate, some will be down to the anticipation of the Player and others will be down to the quality of the team around him).

It's likely that Prozone/Opta already do something along these lines but don't publish it (I don't blame them as anything that gets published gets copied/extracted and used elsewhere).  An example of this may be from this Prozone article looking at 'Expected Goals' for some of the top strikers last season.
One of the charts from Prozone Article of Actual v. Expected Goals 
It'd be interesting to know how much detail is put into calculating the probability of a goal from a particular attempt, as with a more robust approach it'll be far easier to tell the difference between a player scoring lots because they're getting in good positions and one who maybe is performing at a level that might not be sustainable.

Update: Since writing this I've seen an excellent piece by Ted Knutson similarly arguing for context of shots to be thought about.  As I mention above, there's a fair chance this sort of thing already exists within clubs but nobody is going to give away that kind of insight for nothing.

Twitter: @we_r_pl

Match Stats: Created using EPL Index