Monday, 11 April 2016

This model is wrong but it may be useful

Expected Goals (xG/Goal Likelihood) models are increasingly common. They try to add a bit more understanding of what's going on in a game, to get beyond just the scoreline or the top-level shot stats.

None of this is to knock the good work being done in this field (there's a list of useful links at the end of this post), but anyone can build a model. Technically, you could call what Lawro does in his predictions a model, even if it's probably a fairly simple 'mental flowchart' that pops out a score at the other end.

The aim of this piece is to give a very basic introduction to Expected Goals. It's not even that big a leap from when managers talk about restricting the other team's 'big chances'.

All figures that follow use Opta data, but they have been butchered and filleted by my own fair hands, so any errors or omissions are mine, not theirs.

The analysis excludes penalties and free kicks, which are special cases and will be dealt with separately some other time, although there's plenty on free kicks from across the Big 5 Leagues on my blog.

If we start with a very simple model, where we assume all shots are created equal, we get the following for Premier League data:

So if we apply the 2010-2014 conversion rate to 2014/15 activity (excluding set pieces and own goals), we get the following:
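The 'all shots are equal' model boils down to two steps: work out a single historical conversion rate (goals ÷ shots), then multiply each team's shots by it. A minimal Python sketch, assuming made-up totals purely for illustration (these are not the Opta figures behind the tables):

```python
# An "all shots are equal" expected-goals model: one conversion rate for every shot.
# All numbers below are hypothetical placeholders, not real Premier League figures.


def conversion_rate(goals: int, shots: int) -> float:
    """Historical goals per shot (e.g. pooled over several past seasons)."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return goals / shots


def expected_goals(shots: int, rate: float) -> float:
    """Every shot is worth the same fraction of a goal."""
    return shots * rate


if __name__ == "__main__":
    # Hypothetical pooled history: 4,000 goals from 40,000 shots.
    rate = conversion_rate(goals=4_000, shots=40_000)
    # Hypothetical team season: 600 shots, 73 goals (set pieces/own goals excluded).
    xg = expected_goals(shots=600, rate=rate)
    actual = 73
    print(f"rate={rate:.3f} xG={xg:.1f} actual={actual} diff={actual - xg:+.1f}")
```

The gap between actual goals and xG is all this model gives you; it says nothing about why the gap exists.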

This, as I've mentioned before, is a model; just because you call something a model doesn't necessarily make it any good.

As always, the key thing when looking at any numbers is the inferences you draw. You could look at the numbers and say:
  • Chelsea were lucky
  • Chelsea were more skilful in converting chances
  • Chelsea created better than average chances
  • The model's crap

Many a time I've been in a meeting after a marketing campaign and the conversation has gone something like this:

Boss: We forecast sales of £100k but only sold £73k, why was that?
Marketing Wonk 1: The weather hasn’t been very good the last couple of weeks?
Marketing Wonk 2: People’s budgets are stretched post-Christmas?
Me (in my head, as I'm a coward): It's because the £100k figure was a nice round number you pulled out of thin air, with no real basis in fact.

It's natural to try and rationalise figures after the event, but you should always be wary of any explanation you're given and at least run a quick sense test on it.

A nice example of this is the Baresi/Maldini stat of 23 goals conceded in 196 games that was doing the rounds last year, even though it's completely wrong (there's a good piece here on it).
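A sense test on a stat like this needs nothing cleverer than a division. The sketch below flags a 'goals conceded' figure as suspicious if it falls under a plausibility floor; the 0.3 goals-per-game floor is an illustrative threshold chosen for this sketch, not a league statistic.

```python
def sense_check(goals_conceded: int, games: int, floor_per_game: float = 0.3) -> dict:
    """Quick plausibility test for a 'goals conceded' stat.

    floor_per_game is an illustrative assumption: a rough rate below which a
    long-run goals-conceded figure deserves a second look.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    per_game = goals_conceded / games
    games_per_goal = games / goals_conceded if goals_conceded else float("inf")
    return {
        "per_game": per_game,
        "games_per_goal": games_per_goal,
        "plausible": per_game >= floor_per_game,
    }


if __name__ == "__main__":
    result = sense_check(goals_conceded=23, games=196)
    print(f"{result['per_game']:.2f} goals conceded per game, "
          f"one every {result['games_per_goal']:.1f} games, "
          f"plausible: {result['plausible']}")
```

Conceding roughly once every eight and a half games across 196 matches is the kind of figure that should prompt a check before it gets shared.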

Going back to our crude expected goals model: obviously not all shots are equal. If we do a basic inside/outside box split, we get the following:

Shots from inside the box convert at 4-5 times the rate of those from outside it, so this split helps differentiate teams that are shot-heavy outside the box (e.g., QPR, with 47% of their shots coming from outside the box) from those doing their work in better positions (Man City, with only 27% of their shots from outside the box).
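The inside/outside split just swaps the single conversion rate for one rate per zone. In this sketch the two rates are hypothetical placeholders (chosen so inside-box shots convert at 4x the outside rate), and the 500-shot volume is invented; only the 47% and 27% outside-box shares come from the figures above.

```python
# Inside/outside-box expected goals: one conversion rate per zone.
# The rates are hypothetical placeholders, set so inside-box shots convert
# at 4x the outside-box rate (within the 4-5x range quoted in the text).
INSIDE_RATE = 0.12
OUTSIDE_RATE = 0.03


def zone_xg(total_shots: int, outside_share: float,
            inside_rate: float = INSIDE_RATE,
            outside_rate: float = OUTSIDE_RATE) -> float:
    """Expected goals given total shots and the share taken outside the box."""
    if not 0.0 <= outside_share <= 1.0:
        raise ValueError("outside_share must be between 0 and 1")
    inside_shots = total_shots * (1.0 - outside_share)
    outside_shots = total_shots * outside_share
    return inside_shots * inside_rate + outside_shots * outside_rate


if __name__ == "__main__":
    # Same hypothetical shot volume, different shot profiles.
    for team, share in [("QPR-like", 0.47), ("Man City-like", 0.27)]:
        print(f"{team}: {zone_xg(500, share):.2f} xG from 500 shots")
```

Under the flat model both profiles would get identical xG; the split rewards the side that gets into better positions before shooting.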

At Leicester's 'Tactical Insights' performance analysis event a few weeks ago, when asked about stats, Roy Hodgson said that if shots suddenly became the key thing, he'd get people to shoot from halfway. He was being a bit flippant: even a 3-year-old knows a shot at an open goal from 3 yards has a better chance of being a goal than a punt from 40 yards.

What you're left with, then, is a trade-off. Adding more detail (angle of shot, headers, defensive pressure) better explains a team's or player's activity, but it risks over-complicating things and creating a 'black box' that spits out a final number which may be harder to explain to players and management.
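To give a flavour of the 'more detail' end of that trade-off, a common approach is a logistic regression on shot features such as distance, the angle of the goal mouth and whether the shot is a header. The coefficients in this sketch are invented for illustration (a real model would fit them to historical shot data), and defensive pressure is left out entirely.

```python
import math

GOAL_WIDTH_M = 7.32  # width of a full-size goal in metres

# Hypothetical coefficients for illustration only; a real model would be
# fitted to historical shot data (e.g. with logistic regression).
INTERCEPT = -1.0
COEF_DISTANCE = -0.1   # per metre from the centre of the goal
COEF_ANGLE = 1.5       # per radian of visible goal mouth
COEF_HEADER = -0.8     # headers assumed harder to convert than shots with the feet


def goal_angle(x: float, y: float) -> float:
    """Angle (radians) the goal mouth subtends from a shot at (x, y).

    x: distance out from the goal line in metres (must be > 0)
    y: sideways offset from the centre of the goal in metres
    """
    if x <= 0:
        raise ValueError("x must be positive")
    half = GOAL_WIDTH_M / 2
    # Difference between the bearings to the two posts.
    return math.atan2(half - y, x) - math.atan2(-half - y, x)


def shot_xg(x: float, y: float, header: bool = False) -> float:
    """Probability of a goal from a logistic model of distance, angle and header."""
    distance = math.hypot(x, y)
    logit = (INTERCEPT
             + COEF_DISTANCE * distance
             + COEF_ANGLE * goal_angle(x, y)
             + COEF_HEADER * (1.0 if header else 0.0))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    print(f"tap-in from 1m:       {shot_xg(1, 0):.3f}")
    print(f"central, 11m:         {shot_xg(11, 0):.3f}")
    print(f"central, 11m, header: {shot_xg(11, 0, header=True):.3f}")
    print(f"punt from 30m:        {shot_xg(30, 0):.3f}")
```

Even with only three features, explaining to a player why a particular chance came out at the number it did already means talking about logits and coefficients, which is exactly the 'black box' risk.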

Using data is better than not using data; using detailed data (e.g., shot location) is better still; and combining it with video is better again.

In the chance below, Rooney has a simple tap-in, so at the point the shot is taken the likelihood of a goal is close to 1. If you treated that shot the same as all others from that location, Rooney (and Man Utd) would appear to be outperforming xG, but unless you had the video you wouldn't know whether this was luck, good finishing, good positioning by the forward or good chance creation.

Beyond the shot itself, later versions could have a more fluid xG at any given moment (there are a number of people building non-shot models), which basically ask: 'I have the ball in this position; what's the likelihood I score before losing possession?' In the case of the Rooney goal, you could take it several steps back, with the likelihood increasing as each part of the move is successfully completed.
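A crude way to picture the likelihood rising through a move is to treat it as a chain: the chance of scoring from any point is the probability of completing every remaining step multiplied by the xG of the final shot. This is a simplification (it assumes steps are independent and ignores the other routes a move could take), and the completion probabilities below are hypothetical rather than taken from any non-shot model.

```python
from math import prod
from typing import List, Sequence


def chain_values(step_success: Sequence[float], shot_xg: float) -> List[float]:
    """Likelihood of scoring at each point of a move, before possession is lost.

    step_success[i] is the (assumed independent) probability of completing step i;
    shot_xg is the likelihood that the final shot is scored.
    values[i] is the likelihood just before step i; values[-1] is the shot alone.
    """
    return [prod(step_success[i:]) * shot_xg for i in range(len(step_success) + 1)]


if __name__ == "__main__":
    # Hypothetical move: three passes, then a tap-in with xG close to 1.
    passes = [0.8, 0.7, 0.9]
    for i, value in enumerate(chain_values(passes, shot_xg=0.95)):
        print(f"after {i} completed step(s): {value:.3f}")
```

Each completed pass removes one factor from the product, so the value can only rise as the move progresses, ending at the xG of the tap-in itself.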

The example below from basketball shows this in practice:
So does this video from Prozone's Paul Power (the Expected Goals example is about 8 minutes in, but the whole video is well worth a watch).
Performance.LAB Innovation Seminar: Game Intelligence - Paul Power from Prozone Sports on Vimeo.

The idea that some areas are better to shoot from than others isn't a hugely difficult one to grasp, although if the example from American football earlier this year is anything to go by, teams may not be fully optimising their activity.

Useful links:
  • Single Match Expected Goals – Will Gurpinar-Morgan
  • Expected Goals – Martin Eastwood