Thursday, 12 June 2014

Beyond Big Chances - Shot Segmentation

This was initially going to be a much bigger piece, but with the World Cup only hours away I wanted to get something out before I get sucked into 3 games a day for the next couple of weeks.

There's a lot of work going on around Expected Goals. The idea is that in any given match you may lose despite having had the better-quality chances, and if the match were replayed infinitely with the same chances for each side, you'd win more often than you'd lose.

The aim is to remove the 'luck' from individual games and focus on longer-term performance. If, on a regular basis, expected goals say you 'deserved' to lose, then even if recent results have gone your way, things are likely to turn sour over the longer term. "The only stat that matters is the score" is fine in a cup final, but over the longer term it's more important to track the underlying numbers and not just results.
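To make the 'replay the match infinitely' idea concrete, here is a minimal sketch. It assumes each shot scores independently with probability equal to its expected-goals value (a simplification, since chances within a match are not truly independent). The shot values are made up, and `replay` is a hypothetical helper rather than any particular model's method.

```python
import random

def simulate_match(home_shots, away_shots, rng):
    """Play the match once: each shot is an independent trial that
    scores with probability equal to its expected-goals value."""
    home_goals = sum(rng.random() < p for p in home_shots)
    away_goals = sum(rng.random() < p for p in away_shots)
    return home_goals, away_goals

def replay(home_shots, away_shots, n_sims=50_000, seed=1):
    """Replay the same chances many times and return the share of
    home wins, draws and away wins."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_sims):
        h, a = simulate_match(home_shots, away_shots, rng)
        if h > a:
            wins += 1
        elif h == a:
            draws += 1
        else:
            losses += 1
    return wins / n_sims, draws / n_sims, losses / n_sims

# Hypothetical match: the home side had two good chances and a few
# speculative efforts; the away side had lots of low-quality shots.
home = [0.40, 0.40, 0.05, 0.05, 0.05]
away = [0.05] * 12
win, draw, loss = replay(home, away)
print(f"win {win:.2f}, draw {draw:.2f}, loss {loss:.2f}")
```

Even though the away side took more shots, the home side wins these replays far more often than it loses, which is exactly the kind of result a single final score can hide.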

How 'long' the long term needs to be will vary. Just as you could theoretically toss a coin 10 times and get heads every time, some players or clubs could be performing above expectation for reasons beyond their control, rather than because of any real above-average ability.
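The coin-toss point can be put into numbers with the binomial distribution. This sketch assumes every shot is an independent trial with a fixed conversion probability. The forward with 40 ordinary shots at 5% is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (the binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Ten heads from ten fair coin tosses: 1 in 1024.
print(prob_at_least(10, 10, 0.5))  # 0.0009765625

# Hypothetical forward: 40 ordinary shots, each converting at about 5%.
# Expected goals = 2, but how often does chance alone deliver 5 or more?
print(round(prob_at_least(5, 40, 0.05), 3))
```

The second figure comes out at roughly one in twenty, so across a league full of forwards a few will look like clinical finishers over a season purely by chance.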

Anyone trying to do this with data in the public domain is usually working with one hand tied behind their back, as a lot of the context around a shot (e.g., defensive pressure) is missing. I'm not moaning that Opta don't give away for free all the data they spend money collecting, but it does make things trickier.

The only stat that gives context to a chance (beyond location and body part) is Opta's Big Chance stat, which they define as a chance where there is a 'reasonable' expectation of a goal. These chances convert at around 40% (excluding penalties), compared with around 5% for other shots.
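Those two conversion rates alone already give a crude expected-goals figure. This sketch assumes the only information per shot is the Big Chance flag and uses the approximate 40% and 5% rates quoted above; `simple_xg` is a hypothetical name.

```python
# Approximate conversion rates quoted above (penalties excluded).
BIG_CHANCE_RATE = 0.40
OTHER_SHOT_RATE = 0.05

def simple_xg(shots):
    """Sum a crude expected-goals value over a list of shots, where each
    shot is just a boolean: True if it was flagged as a Big Chance."""
    return sum(BIG_CHANCE_RATE if is_big else OTHER_SHOT_RATE for is_big in shots)

# Hypothetical match: 2 big chances and 10 other shots.
shots = [True] * 2 + [False] * 10
print(round(simple_xg(shots), 2))  # 1.3
```

Treating all 12 shots as ordinary would give 0.6, so here the Big Chance flag more than doubles the estimate.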

This gives a good split between 'big' and 'not so big' chances, but having only two categories is a limited way of differentiating between shots. To get beyond this, I've combined the Big Chance data with shot type/location data to try to break chances into a few more meaningful segments.

There is more work behind this, but for the sake of brevity I'll leave that for another time. This is just a basic first pass, which looks only at the distance to the nearest point on the goal line rather than at the angle.
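One way to compute a distance-only measure is sketched below. It assumes 'nearest point on the goal line' means the nearest point on the line between the posts, and it assumes a coordinate system in metres with the origin at the centre of the goal. Both are my own conventions, so raw event data would need converting into these units first.

```python
import math

HALF_GOAL_WIDTH = 3.66  # metres; the goal mouth is 7.32 m (8 yards) wide

def distance_to_goal_mouth(x, y):
    """Distance in metres from a shot at (x, y) to the nearest point on
    the goal line between the posts.

    Assumed coordinates: origin at the centre of the goal, x measured
    out from the goal line, y measured across the pitch.
    """
    # Clamp the lateral position onto the goal mouth to find the nearest point.
    nearest_y = max(-HALF_GOAL_WIDTH, min(HALF_GOAL_WIDTH, y))
    return math.hypot(x, y - nearest_y)

print(distance_to_goal_mouth(11.0, 0.0))            # roughly the penalty spot: 11.0
print(round(distance_to_goal_mouth(3.0, 7.66), 2))  # 4 m wide of the post: 5.0
```

Shots from anywhere directly in front of the goal mouth are measured straight out to the line, while shots from wider positions are measured to the nearer post. Angle is ignored entirely, in line with the first pass described above.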

The segments form a hierarchy, so group h, for example, contains shots inside the box that are not included in groups a-g.
Shot Type Summary - Even within Big Chances, there's the opportunity to split further, although this does start to create small segments.
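A hierarchy like this behaves as an ordered list of rules where the first match wins. The sketch below shows that mechanism only. The segment names, the 6 m threshold and the `Shot` fields are all hypothetical and are not the actual group definitions from the table.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    big_chance: bool   # Big Chance flag
    header: bool       # body part
    in_box: bool       # inside the penalty area
    distance: float    # metres to the goal mouth

# Hypothetical rules, checked in order. Because the first match wins,
# a later catch-all such as "other_in_box" only sees shots that no
# earlier rule has already claimed.
RULES = [
    ("big_close", lambda s: s.big_chance and not s.header and s.distance < 6),
    ("big_other", lambda s: s.big_chance and not s.header),
    ("big_header", lambda s: s.big_chance and s.header),
    ("header_in_box", lambda s: s.header and s.in_box),
    ("other_in_box", lambda s: s.in_box),
    ("outside_box", lambda s: True),  # everything left is outside the box
]

def segment(shot):
    """Return the name of the first rule the shot satisfies."""
    for name, rule in RULES:
        if rule(shot):
            return name
    raise ValueError("no rule matched")  # unreachable while a catch-all exists

print(segment(Shot(big_chance=False, header=False, in_box=True, distance=14.0)))  # other_in_box
```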
With 9 different types of shot you're arguably spreading yourself too thin, so I've grouped the chances based on the colour coding above to give a grade to each shot:
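Once each shot has a segment, grading is a lookup followed by a tally of conversion per grade. This sketch uses a hypothetical segment-to-grade mapping and a made-up sample of shots. It shows the mechanics only and does not reproduce the post's actual grades or conversion rates.

```python
from collections import defaultdict

# Hypothetical mapping from segment name to grade.
GRADE = {
    "big_close": "A",
    "big_other": "A",
    "big_header": "B",
    "header_in_box": "C",
    "other_in_box": "C",
    "outside_box": "D",
}

def conversion_by_grade(shots):
    """shots: iterable of (segment_name, scored) pairs.
    Returns {grade: (shots, goals, conversion_rate)}."""
    counts = defaultdict(lambda: [0, 0])
    for seg, scored in shots:
        tally = counts[GRADE[seg]]
        tally[0] += 1
        tally[1] += int(scored)
    return {g: (n, goals, goals / n) for g, (n, goals) in sorted(counts.items())}

# Made-up sample, purely to show the shape of the output.
sample = [("big_close", True), ("big_other", False), ("other_in_box", False),
          ("outside_box", False), ("header_in_box", True), ("outside_box", False)]
print(conversion_by_grade(sample))
```

With real data, the resulting conversion rate per grade is what you would use as that grade's expected-goals value.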

There's plenty more that can be done with this given a bit more time (and a bit more data), but the above shows that all shots are far from equal, and that a qualitative measure such as 'big chance' can give a better understanding of chance quality than location alone. If you were doing this for real, though, you'd do a lot more work grouping the chances based on video.