Basically, not all attempts at goal have an equal likelihood of being scored, and there will be cases where, despite the headline figures showing Team A had more attempts at goal, Team B had better-quality attempts and 'deserved' the win.
I've put together a basic attempt to break down attempts into different categories. It's far from perfect (as will be discussed at the end) but gives a good initial breakdown of chance type.
For each of the 380 Premier League games in the 2012/13 season I've taken details of attempt type (using Opta data from EPL Index) and categorised into one of the following groups:
Clear Cut Chance (also described as a 'Big Chance') - Where there is a 'reasonable' expectation of scoring
Inside Box Attempts
Outside Box Attempts
The categories above form a hierarchy, so each attempt is counted only once: Clear Cut Chances exclude Penalties (normally included within the Opta definition but stripped out in this analysis) and Inside Box attempts exclude Clear Cut Chances.
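As a rough sketch of how that hierarchy could be applied to one team's totals for a match, the snippet below splits the raw counts into non-overlapping groups. The field names are hypothetical, and it assumes that the raw Clear Cut Chance total includes penalties, that penalties are counted among inside-box shots, and that every Clear Cut Chance is inside the box.

```python
from dataclasses import dataclass


@dataclass
class MatchShots:
    """Hypothetical raw totals for one team in one match."""
    total_inside_box: int    # assumed to include penalties and Clear Cut Chances
    total_outside_box: int
    clear_cut_chances: int   # assumed to include penalties, as in the Opta definition
    penalties: int


def categorise(shots):
    """Split raw totals into the non-overlapping groups described above."""
    return {
        "penalty": shots.penalties,
        # Clear Cut Chances with penalties stripped out
        "clear_cut": shots.clear_cut_chances - shots.penalties,
        # Inside Box attempts that are not Clear Cut Chances (or penalties)
        "inside_box": shots.total_inside_box - shots.clear_cut_chances,
        "outside_box": shots.total_outside_box,
    }


# Illustrative numbers only, not taken from the 2012/13 data.
example = categorise(MatchShots(total_inside_box=10, total_outside_box=6,
                                clear_cut_chances=3, penalties=1))
print(example)
```

The four groups add back up to the total number of attempts, which is a quick check that nothing has been counted twice.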
For all teams the overall figures were:
From the figures above it can be seen that excluding Penalties, Clear Cut Chances (CCC) accounted for around half of all goals last season and that attempts outside the box (excluding CCC) are rarely rewarded (as a result they are the ones that live long in the memory).
Shooting from outside the box may or may not be the most statistically beneficial use of possession but I'm sure when one does go in the player (or the fans) aren't worrying too much about the conversion stats.
Split by Team the figures are as follows:
Each column is colour-coded from Dark Red (high for that variable compared to other teams) down to Dark Green (low).
|Premier League 2012/13 Conversion Rates by Attempt Type - Teams Ordered by League Position|
Also of interest: QPR's Inside Box conversion rate was actually lower than their Outside Box rate, which in part helps explain why they only scored 30 league goals all season. Scoring just 1 of your 4 penalties doesn't help either when goals are so hard to come by.
For the Man Utd/Man City example it isn't known, of course, whether the difference is due to:
- Good finishing from Man Utd
- Bad finishing by Man City
- Whoever codes Man City games being more generous in coding a chance as a CCC
- Man Utd creating a 'better class' of CCC
- Some combination of all of the above
As the data is available for every match, I have also been able to collate the relative conversion rates for the opposition team against each side in the League over the season:
|Premier League 2012/13 Opposition Conversion Rates by Attempt Type|
Expected Goals per Match
By taking each team's average conversion rate over the season, along with the type of attempts they had in each match, it's possible to determine the 'Expected' number of goals for both teams for all 380 matches based on average performance over the season.
The league average for penalty conversion was applied for all teams but all other figures use that team's average performance. I have allocated a team the win if their expected goals exceeded their opposition's by 0.5 goals or more (using 1 goal as the differentiator gave too many draws):
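A minimal sketch of that calculation, under stated assumptions: the category keys, function names and all the numbers are made up for illustration, and the penalty rate shown is not the actual league average.

```python
# Hypothetical category keys; penalties are handled separately below.
CATEGORIES = ("clear_cut", "inside_box", "outside_box")


def expected_goals(attempts, team_rates, league_penalty_rate):
    """Expected goals for one team in one match.

    Penalties use the league-average rate; every other category uses
    the team's own season-average conversion rate.
    """
    xg = attempts.get("penalty", 0) * league_penalty_rate
    for category in CATEGORIES:
        xg += attempts.get(category, 0) * team_rates[category]
    return xg


def expected_result(home_xg, away_xg, threshold=0.5):
    """Allocate the win only if one side is ahead by the threshold or more."""
    diff = home_xg - away_xg
    if diff >= threshold:
        return "home"
    if diff <= -threshold:
        return "away"
    return "draw"


# Illustrative numbers only, not taken from the 2012/13 data.
home_attempts = {"penalty": 0, "clear_cut": 2, "inside_box": 7, "outside_box": 6}
home_rates = {"clear_cut": 0.40, "inside_box": 0.10, "outside_box": 0.03}
away_attempts = {"penalty": 1, "clear_cut": 1, "inside_box": 3, "outside_box": 4}
away_rates = {"clear_cut": 0.35, "inside_box": 0.08, "outside_box": 0.02}
league_penalty_rate = 0.75  # assumed value for the example

home_xg = expected_goals(home_attempts, home_rates, league_penalty_rate)
away_xg = expected_goals(away_attempts, away_rates, league_penalty_rate)
print(round(home_xg, 2), round(away_xg, 2), expected_result(home_xg, away_xg))
```

In this made-up match the home side is ahead on expected goals, but by less than 0.5, so the match is allocated as a draw. Raising `threshold` to 1 would push even more matches into the draw column, which is the effect noted above.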
From this it can be seen that the actual result matches the 'deserved' one 53% of the time (where actual = expected, the top-left to bottom-right diagonal). There are 27 games (7%) where the team who 'deserves' to win loses, and the remaining 40% are where one team 'deserves' to win but draws, or where a draw is expected and one side wins.
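The three percentages come from cross-tabulating actual against expected results. A sketch of that tally, assuming each match has been reduced to a hypothetical `(actual, expected)` pair with values `"home"`, `"draw"` or `"away"`:

```python
from collections import Counter


def summarise(results):
    """Cross-tabulate (actual, expected) pairs and return the three shares."""
    table = Counter(results)
    n = len(results)
    # Diagonal: the actual result matches the expected one
    agree = sum(count for (actual, expected), count in table.items()
                if actual == expected)
    # One side 'deserved' the win but the other side won
    reversed_count = sum(count for (actual, expected), count in table.items()
                         if {actual, expected} == {"home", "away"})
    # Everything else: a draw on exactly one of the two sides of the table
    other = n - agree - reversed_count
    return table, agree / n, reversed_count / n, other / n


# Four made-up matches, purely to show the shape of the output.
sample = [("home", "home"), ("away", "home"), ("draw", "home"), ("draw", "draw")]
table, agree_share, reversed_share, other_share = summarise(sample)
print(agree_share, reversed_share, other_share)
```

Run over all 380 matches, the three shares would correspond to the 53%, 7% and 40% quoted above.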
Applying this to a league table gives the following results:
|League Points Actual and Expected|
Liverpool also show a big difference between actual and expected which is possibly down to the fact that they tended to blow hot and cold.
Southampton and Sunderland finished with similar points totals but possibly Sunderland were lucky to get what they did.
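For anyone wanting to reproduce the comparison, a sketch of turning per-match results into points, using the usual 3 points for a win and 1 for a draw. The team names and results are placeholders, and the function is run once on actual results and once on expected ones.

```python
from collections import defaultdict


def points_table(matches):
    """Build a points table from (home_team, away_team, result) tuples,
    where result is "home", "draw" or "away"."""
    points = defaultdict(int)
    for home, away, result in matches:
        if result == "home":
            points[home] += 3
            points[away] += 0   # make sure the losing side still appears
        elif result == "away":
            points[away] += 3
            points[home] += 0
        else:
            points[home] += 1
            points[away] += 1
    return dict(points)


# Placeholder teams and results, not real fixtures.
actual = points_table([("A", "B", "home"), ("B", "C", "draw"), ("C", "A", "away")])
expected = points_table([("A", "B", "draw"), ("B", "C", "home"), ("C", "A", "away")])

# Positive difference: the team took more points than its chances 'deserved'.
difference = {team: actual[team] - expected[team] for team in actual}
print(difference)
```

A large positive or negative difference is only a flag for a closer look, given the caveats about the method listed further down.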
Limitations
The above is a long, long way from being a definitive look at who got what they deserved last season and there are a number of areas that need to be taken into consideration:
It is assumed that all Clear Cut Chances are within the box. This is an over-simplification but a necessary one, as the data is not available at a sufficiently low level: the CCC total is available, as are total shots inside/outside the box, but not CCC inside/outside the box.
Excludes Own Goals
Is left to a subjective definition of what a Clear Cut Chance is: some will be obvious, others will be more of a judgement call e.g., is someone stretching to reach a cross a Clear Cut Chance?
Teams may have differing styles in different matches; a good example of this would be Liverpool with/without Suarez. Overall Liverpool have a well below average Inside Box conversion rate due to Suarez liking to shoot regardless of the angle. Therefore 5 Inside Box shots when he's playing might give a different goal expectation compared to when he isn't.
Game State: There'll be instances where a team scores early with a speculative effort then spends most of the rest of the match defending. The above analysis would say the other team deserved the win but in reality setting up defensively when winning is a reasonable approach.
Averages look at the conversion rate for a team but ignore the save rate of their opponents. All other things being equal, you would expect goalkeepers from a top side to save a higher proportion of shots than a keeper for a team further down the table.
Clear Cut Chance is far too broad a grouping to be fully useful (as is discussed below).
Next Stages
In an ideal world you'd have more than 3 types of attempts. Clear Cut, Inside Box and Outside Box are far too general to be a fair split of the types of chances that players have.
In a business context, if you segmented customers by those who'd bought once, twice and 3+ times then you'd certainly see differences in behaviour in these 3 segments but you'd be losing out on a lot of potential discrimination of the 3+ group.
Similarly 100 segments might be overkill with little practical differences between a lot of them causing unnecessary confusion.
Plenty of work has appeared on the web over the last couple of weeks by people such as Kickdex, Colin Trainor and Statsbomb looking at things such as the area of the shot, as just Inside Box is surely too general a grouping.
Ultimately everyone is trying to make the most of the data that's available, but using location of shot misses out on the one thing that Clear Cut Chances attempts to bring in, and that's the context of an attempt. Finishing from an angle after rounding the keeper is easier than a central attempt from 10 yards out when there's a mass of bodies between you and the goal.
As there are around 260 attempts on goal for each set of 10 Premier League fixtures, it doesn't seem a Herculean task to bundle those into 5-7 segments of conversion difficulty, from Tap-In to 35 yard pile-driver.
It may be that Player A has scored 5 more goals in a season than Player B from the same number of shots purely because Player A has had the better quality chances (this leaves aside the other argument that chances don't always appear on a plate, some will be down to the anticipation of the Player and others will be down to the quality of the team around him).
It's likely that Prozone/Opta already do something along these lines but don't publish it (I don't blame them as anything that gets published gets copied/extracted and used elsewhere). An example of this may be from this Prozone article looking at 'Expected Goals' for some of the top strikers last season.
|One of the charts from Prozone Article of Actual v. Expected Goals|
Update: Since writing this I've seen an excellent piece by Ted Knutson similarly arguing for context of shots to be thought about. As I mention above, there's a fair chance this sort of thing already exists within clubs but nobody is going to give away that kind of insight for nothing.
Twitter: @we_r_pl http://www.twitter.com/we_r_pl
Match Stats: Created using EPL Index