Sunday, 24 August 2014

Swans 1 - Burnley 0 Stats and Chalkboards

They say that not playing well but still winning is the sign of champions. That's obviously stretching things here, and ultimately the second half was more about Burnley's missed chances than Swansea snuffing out Burnley's threat (Monk saying in an interview that he was pleased Burnley were restricted to only 1 attempt was true, but felt more like a case of putting a positive spin on things).

This was certainly a proverbial 'game of two halves', with Swansea looking to protect their first half lead after Dyer's well taken goal and, through a mixture of luck, bad finishing and some good goalkeeping from Fabianski, ending up with the 3 points.
Shots by Minute - Swans with only 1 shot in last 40 minutes. Despite the last 20 minutes seeming pretty nervous, Burnley only had 1 shot after the 74th Minute
In situations where a team is protecting a 1 goal lead it's always difficult to determine how much of the pressure from the losing team is due to them forcing themselves on the game and how much is the team in the lead taking a more defensive approach.

In terms of pass volumes, Swansea's fall off a cliff from around the 70th minute after a period of domination of the ball either side of half time.
Average passing numbers per minute (averaged over the last 10 minutes at each point) - From 70th minute onwards Burnley has two thirds of possession (122 passes to Swansea's 61)
A good example of the change is looking at Ash's passing by half.  This was 100% in the 1st half (44/44) and 73% in the 2nd half (16/22):
Ash's passing by half, mainly short and lateral in 1st half, only half the volume (and generally longer passes) in the 2nd half
A similar thing can be seen from Fabianski's distribution:
Fabianski with only 1 short pass in the second half as Burnley look to increase pressure, whereas in the 1st half the short ball to Rangel or long to Bony were more successful
Days like this one need to be remembered when we play well but things don't go our way. On another day, the 1st half pressure could have yielded a second goal (and ultimately a 3+ goal win), but equally better Burnley finishing could have seen the points shared.

Update: Another way of looking at the level of pass control would be to look at % possession not just total volume which gives the chart below:
Swansea's possession stats at rolling 10 minute intervals, barely breaking 40% in the last 20 minutes
The 5 minute possession chart is a bit more up and down than the 10 minute version but more accurately reflects swings during a match.
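As a rough sketch of how a rolling possession figure like this could be put together, the Python below takes pass counts per minute for each side and works out one side's share of all passes over a trailing window. The function name, the input format (one pass count per minute) and the use of pass share as a stand-in for possession are assumptions for illustration, not necessarily how the charts here were built.

```python
def rolling_pass_share(team_passes, opp_passes, window=10):
    """Share of all passes made by the team over a trailing window of minutes.

    team_passes / opp_passes: lists with one pass count per minute (assumed format).
    Returns one value per minute; None where nobody passed in the window.
    """
    if len(team_passes) != len(opp_passes):
        raise ValueError("both sides need one entry per minute")
    shares = []
    for minute in range(len(team_passes)):
        # The window is shorter than `window` during the opening minutes
        start = max(0, minute - window + 1)
        team = sum(team_passes[start:minute + 1])
        opp = sum(opp_passes[start:minute + 1])
        total = team + opp
        shares.append(team / total if total else None)
    return shares


# Made-up per-minute counts, purely to show the call
swansea = [5, 6, 4, 7, 3, 2, 1, 2, 3, 2]
burnley = [2, 3, 4, 3, 5, 6, 7, 6, 5, 6]
print(rolling_pass_share(swansea, burnley, window=5))
```

Dropping `window` from 10 to 5 gives the jumpier version of the chart, since each point is based on fewer passes.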

Other Posts: Premier League Shot Location, World Cup Shot Location, World Cup Distance + Sprint Stats
Twitter: @we_r_pl
Match Stats: Created using Statszone, Whoscored and Squawka

Sunday, 17 August 2014

Man Utd 1 - Swans 2 Stats and Chalkboards

As Gylfi Sigurdsson stroked in what turned out to be the winner, the commentator called it a 'Smash and Grab' which was an enormous exaggeration.  Yes, Man Utd might have outshot the Swans by 14-5 but in terms of decent chances there were only 3 and all of them ended up as goals.
Shots by Minute - Quality not Quantity was key here, the near 20 minute period in the middle of the 2nd half where Man Utd didn't manage a shot shows how well Swansea dealt with any threat
MOTD tweeted that the 29 passes in Ki's goal were more than any last season (beating the 24 for Bony's goal v Arsenal):

I mentioned in the preview yesterday, that this was a great time to be playing Man Utd due to a mixture of injuries, Van Persie not yet being available due to his World Cup exploits and the fact that (even more so now), there will likely be a number of new signings coming in before September.

Man Utd had most of the ball but did relatively little with it. Much was made of their over-emphasis on crossing last season under Moyes, but this seemed to be more of the same:
First half crossing came almost exclusively down the Swansea left and the opposite side in the second half. 2 of the 4 successful crosses were from corners.
Crossing from this kind of distance from goal would normally be classed as 'Putting it in the Mixer' and generally speaking it was dealt with easily, predominantly by Ash who made 15 clearances, 9 more than the next player (Amat).
Williams with most clearances and Shelvey with most recoveries.  Jonjo made 4 fouls by the time of Man Utd's equaliser but none afterwards (although there was one incident where the ref played advantage).
For me the central part of the first team when everyone is fit would be Leon/Ki/Sigurdsson and although Jonjo sailed close to the wind at times he put in a good performance as Man Utd created very little through the middle.

Jonjo and Ki's passes received by half - Shelvey far more heavily involved in working with the ball in the first half but focused more on defensive activity in the second
Van Gaal's change to a 4-4-2 in the second half and in particular Januzaj's attacking of Taylor brought some initial results (and led to the corner from which they scored).  I think too much could be made of Taylor's substitution, yes he got skinned a couple of times but once you're on a yellow in that situation, being replaced is a prudent step (even if it is by most people's whipping boy - Dwight Tiendalli).

It was a bit of a surprise to see this tactic greatly reduced after Swansea got their second, but it's probably the kind of thing where you'd need to re-watch the game a couple of times, as it could well be due to the extra protection Dwight received (from Shelvey?).  Also, looking at Januzaj's passes received, he's a lot more involved down Swansea's right after Herrera comes off.
Man Utd's 2nd half take-ons, pre and post Gylfi's winner
Looking at Fabianski's activity, he didn't have to do anything spectacular, but what he did have to do, he did well.  In terms of distribution, after passes to Amat (6) his next highest was to Bony.  The margin of error on this kind of thing is pretty small: a good pass and it's on Bony's chest and nobody is getting it off him, a bit too high and it's 50:50 at best.
Fabianski's passes to Bony (also played 4 to Gomis in the 15 or so minutes he played) and Bony's aerial duels
As Garry Monk mentioned at the Fans Forum, the Villarreal game was more about minutes on the pitch than formation (I'd be surprised if Gylfi plays on the wing that often) and it was a case of going back to a more familiar formation.

The big difference for me this season is the strength of the squad with a variety of options depending on opposition/score e.g.,

  • 2 of Leon/Ki/Shelvey
  • 2 of Bony/Gomis/Sigurdsson
  • 2 of Routledge/Dyer/Montero
There are other players such as Emnes as well. As long as we are not too unlucky with injuries we'll have the option to make substitutions that are like-for-like in terms of quality, which hasn't been the case in previous seasons. That is a credit to the club as it shows how they've used TV and transfer money to build season upon season.

Update: I'm looking at different ways of presenting match stats and the one below shows pass volumes based on an average of the last 5 minutes, this helps to show (roughly) the level of control a team has at any given time.  As has been seen with the rise in counter attacking football over the last couple of years, not having the ball isn't always a bad thing but bearing that in mind it does show who has control at any given time.
Chart shows Man Utd's early control of the ball and Swansea's lack of passing prior to putting the 29 together for Ki's goal around the 31 minute mark.  The impressive things from a Swansea viewpoint are the control at the end of the first half and (at least initially) after Utd's equaliser
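For anyone wanting to try something similar, here is a minimal sketch of the "average of the last 5 minutes" idea: at each minute it averages the pass counts for the most recent minutes. The function name and the per-minute list input are assumptions for illustration; the real figures would come from whichever stats site is being used.

```python
from collections import deque


def rolling_average(passes_per_minute, window=5):
    """Average passes per minute over the last `window` minutes at each point."""
    if window < 1:
        raise ValueError("window must be at least 1 minute")
    recent = deque(maxlen=window)  # the oldest minute drops out automatically
    averages = []
    for count in passes_per_minute:
        recent.append(count)
        # Early in the match there are fewer than `window` minutes to average
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up counts for the first 8 minutes of a match
print(rolling_average([3, 5, 8, 6, 2, 1, 0, 4], window=5))
```

Running this once per team and plotting both lines on the same axes gives a chart along these lines.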

Saturday, 16 August 2014

Swans v Man Utd - Pragmatism v Optimism

Only a couple of hours now to the start of the new season, it was interesting to see comments by Huw Jenkins at certain points last season that he felt there was a bit of a defeatist attitude (see below from Guardian article in Nov 2013):

It is a theme that Jenkins returns to time and again without being able to put his finger on the reasons for the change in mindset. "I don't know [why it has happened]. But I've sensed it this season more than ever. We generally tend to talk about things, for me, in the wrong way. We don't see every game as a winnable game, which is not right and it's not our mentality."

Whenever I heard Laudrup speak, I thought he pretty much always talked perfect sense, the problem is whether you want a manager to have the attitude along the lines of "This side are better than us, let's keep it tight, get to 60 minutes and see what happens".

This attitude meant other than the first day 4-1 defeat to Man Utd, no side gave us a pasting last season even though as can be seen in the chart below, we only got 2 points from 16 games against the top 8.
Of the 2 points, 1 came in the Monk era v Arsenal and the other was 'The Shelvey Game' v Liverpool
With Man Utd still in a state of transition, and today's starting 11 arguably weaker than the one that finished last season, this is a huge opportunity to get some real momentum into the season.  I'm not wanting to run down the Laudrup tenure, but in part (league wise) it was 7 points from the first 3 games followed by average performance thereafter.

I think he'll start on the bench today but I do like the look of Gomis, here's his shot chart from last season with 14 goals (1 pen) from 86 shots.  If Bony can get into double figures again and Gomis and Sigurdsson both get close to that too then we'll have a strike force to be feared.

Wednesday, 13 August 2014

Premier League Shot Location Analysis

Before the World Cup I wrote a quick post on Shot Segmentation with the aim to classify the 10,000 or so shots taken during the 2013/14 season into a smaller number of groups to try and quantify the 'quality' of opportunities each side has (and concedes).

The full background is here, but basically it is a case of using Opta's Big Chance metric, combined with location and shot type (e.g., was the attempt a header), to get beyond just using location data.  It's not as good as having the full video of every shot but is an improvement on just knowing shot volume.
These categories aren't set in stone but are a decent first pass at grouping shot types
I then simplified this down to 5 categories (by the colours above) to give the segments below:

None of the above is perfect, some of the splits are arbitrary and there will be some chances that are on the boundary of two groups where a slight change in location could change it from being a 'Good' to an 'OK' chance.  The aim for now though is to get a rough feel of the type of chances teams are creating.  With a bit more data (and a bit more time), I'd expect what goes in to these groupings to change a bit.

Splitting this out by team gives the following for Goals scored:

Obviously that's a pretty busy table and I'll probably look at individual teams in more detail when they are due to play Swansea, but the key things from it for me are:

  • Liverpool scoring 10 penalties (of 12 attempts). On the latest Game podcast Rory Smith makes the point of how often Suarez had the knack of playing the ball on to an opponent's hand. They'll still have plenty of pace in the team but may find it hard to get that number of penalties again, although I'm sure Liverpool fans could reel off plenty of times they didn't get one
  • Norwich (& Swansea) scoring only 1 goal from a 'Great Chance' all season.  Different teams have different styles and some may lack the close in shots that are the 'Great Chances', but neither created too many of these chances either (5 and 4 respectively).
  • West Ham with only 3 goals from 'Hit and Hope', their lower conversion is partly due to them having the highest number of shots from headers

Looking at actual shot location adds a bit more context, below is Norwich's shot map:
Norwich had below average conversion in each of the groups apart from 'Hit + Hope', there's a lot of chances within the 6 yard box (but few classed as 'Great Chances'); you'd need video to get further into whether it was bad chances or bad finishing
Excluding pens and OGs Chelsea and Arsenal scored 64 and 65 goals respectively but their shot location is quite different:
Chelsea with a high proportion of goals coming from on or just outside the edge of the box (which highlights the issue of having an inside/outside box split) as well as close in within the central part of the 6 yard box
Arsenal's goals coming more from the central area of the penalty box and fewer from outside the box
Having this information by match, means we can also turn it on its head and look at the kind of shots a team conceded.  Everton had a particularly good season with the question being can it be repeated.  Looking at the 'Good Chances' they conceded (i.e., Big Chance headers along with Big Chance shots from more than 6 yards out), of the 34 they conceded only 4 ended up as goals (where you would expect 12 based on average performance).

There's always the risk that by cutting data too many times you're left with something which just exists by chance, but given Howard's performance v Belgium in the World Cup, it's possible he is above average at stopping these kinds of chances (he saved 12 of the 16 on target - 75% compared to a division average of 43%).
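One way to get a feel for how unusual 4 goals from 34 'Good Chances' is would be to treat each chance as an independent event at the average conversion rate (12 expected from 34, so roughly 35%) and ask how often 4 or fewer would go in. The sketch below does that with a binomial tail. The independence assumption and the function name are mine, and it takes no account of the data having been sliced several ways before landing on this group, so it will overstate how surprising the result is.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) where X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


chances = 34
expected_goals = 12  # based on average conversion, as quoted above
p_avg = expected_goals / chances

tail = prob_at_most(4, chances, p_avg)
print(f"P(4 or fewer goals from {chances} chances) = {tail:.4f}")
```

A small number here says the result would be rare for an average defence and keeper, but not whether it is down to skill or a run of luck that won't repeat.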

Fulham's goals conceded map makes uncomfortable viewing, a ton of goals from close in and plenty from range too:

Having this kind of data opens up all sorts of possibilities, the free kick chart below throws up some questions:
Optimal position seems to be around 25 yards out just to the right of goal - follow up could look at foot used by free kick taker.  The expanse of white centrally in front of the D is possibly artificial where players move the ball back to where they think is far enough back to allow them time to get the ball over a wall and back down
Goal Location for Top 4 - Man City prevalent on edge of 6 yard box and Liverpool with a number of goals from narrow angles

Friday, 8 August 2014

World Cup Stats Part 2 - Shot Location

Following on from my previous piece looking at distance+sprint stats at the World Cup, this one takes a look at Shot Location.

All Shot Charts below exclude the 5 own goals and 12 penalties (exc. shoot-outs) that were scored in the tournament.

The overall shot chart is pretty busy, but you can see the lack of goals from outside the box:

Looking at just the goals, makes things a bit clearer:
Goals from outside the box were fairly rare and the number of absolute screamers was limited.

For direct shots from Free Kicks, only 3 of 118 were converted:

These were the long range effort from Switzerland v France:
Messi's Free kick v Nigeria and Luiz's v Colombia:

I had planned (but never got round to) putting a piece together on David Luiz's free kicks at Chelsea and how they were an example of hope over experience, but his one against Colombia in the quarter finals was impressive.  Over a large number of events, shooting from that kind of distance might not be the optimal strategy, but if it comes off in the Quarter Finals of the World Cup you can't really knock it.

In terms of headers, 31 of 272 were successful with Van Persie's being the furthest out by some distance.
Grouped together by location the figures are:

Using the shot segmentation methodology I put together based on last season's Premier League gives the following figures (more details here):

Looking at the two finalists, there's a marked difference in location (as well as volume) of goals.  Even if you took the Brazil game out the overall pattern would be pretty much the same for Germany, although for all the talk of a 'Golden Generation' for Germany and talk of Messi being tired, the difference between the two sides was minimal with Argentina arguably having the better chances even if they weren't as impressive as Germany on the way to the final.
Goals by Team during the World Cup (OGs/Pens excluded)
I'd argue that the Opta stats are a little bit out for Messi's goal (furthest right in the chart above) v Iran but it was still a great finish although closer than the chart suggests (I'm not knocking Opta, the accuracy is pretty good and also I'm cobbling this all from various sites without paying so can't exactly complain!):

The next stage is to repeat the work done on last season's Premier League with more data (other seasons) and other leagues, to get a better understanding of shot outcomes.  For the Swans fans who've got this far, Swans shot stats aplenty will be coming soon(ish).

Match Stats: Created using Statszone and Squawka

Wednesday, 23 July 2014

World Cups Stats pt1 - Distance Run and Sprints

I know pre-season is already underway (check out The Jack Cast podcast for more on that), but for a bit longer I'm planning on looking back at stats from the World Cup.

FIFA's website has a load of detail on individual matches and overall stats, although a lot of the more detailed information is held in a multitude of PDF files (how I've put the data together is too dull for even a stats blog, but I'll probably cover it on my business blog at some stage).

I've done plenty over the last couple of years with Opta data but it was interesting to see FIFA provide information on distance covered and sprints for each player for every game which adds an extra layer of data.

What I've created is a dataset with factors such as time played, distance run and number of sprints, I've reworked the figures to re-calibrate to a 90 minute equivalent where someone has taken part in extra time (if someone has played a full match which with injury time is usually around 95 minutes then this is classed as 90 minutes).
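A minimal sketch of that 90 minute re-calibration is below. It assumes minutes are recorded as nominal match minutes (so a full match is 90 however much injury time was played, and a match going the distance in extra time is 120) and that only players who went beyond 90 are scaled back. The function name and those conventions are assumptions for illustration rather than the exact rules used for the dataset.

```python
def per_90(value, nominal_minutes):
    """Scale a match total (distance, sprints) to a 90 minute equivalent.

    Up to 90 nominal minutes the figure is left alone, which is how a full
    match with injury time ends up classed as 90. Anything beyond that is
    treated as extra time and scaled back proportionally.
    """
    if nominal_minutes <= 0:
        raise ValueError("player must have spent some time on the pitch")
    if nominal_minutes <= 90:
        return value
    return value * 90 / nominal_minutes


# Made-up figures: 12.0 km covered over a full 120 minutes
print(per_90(12.0, 120))
# A normal full match is returned unchanged
print(per_90(10.5, 90))
```

Substitutes are left unscaled here, which matters if you ever rank on a per-minute basis, so a minimum minutes cut-off is worth applying alongside it.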

As with any stat, context is key. Pirlo's stats against England show someone who only sprinted 5 times in the game.  That doesn't make Pirlo lazy but someone who was able to control the game which was particularly useful given the conditions in Manaus.
At 538 recently they had a piece looking at something similar in terms of distance covered by Messi over the tournament and it's certainly true that he's at the lower end of the scale in terms of activity but like Pirlo that doesn't necessarily make him tired or lazy and would need to be compared to his normal output for Barcelona.

Messi's Distance/Sprint stats (dark purple) compared to all other players.  He's certainly at the lower end of the distance scale but no set pattern in terms of games.  Lowest distance covered was against Iran, which will not be because he wasn't trying but because Argentina had most of the ball, so fewer transitions between attack and defence.
In terms of the top sprinters, the top 10 shows it's not always a good attribute as 2 of the top 10 performances came from Brazilians against Germany (all well and good sprinting, but if that's sprinting back to try and recover a 3 on 2 due to you sprinting up and out of position in the first place then that's not so great).
Top sprinters - playing 90+ minutes of a match to avoid cases of high impact subs who might per minute produce higher rates. Could also look at different levels for the cut-off.
For distance, again the USA come out on top, this time with Bradley having 3 of the top 10.
Bradley with 3 of the top 10, most impressive of all is the one against Belgium where he's managed to keep the rate up to include extra time
With anything like this there's always all sorts of caveats such as the type of match (e.g., is one side dominating possession or is it end-to-end, the amount of time the ball is in play - I'd expect a stop-start game like Brazil-Colombia might be low on overall distance run), but it does provide some areas of interest.

One of the biggest criticisms of British football is the love of someone who can 'Just fahking run around a bit' over technique and if 'having a good engine' was everything, then it'd be Mo Farah rather than James Rodriguez being unveiled at Real Madrid.

As always it's a balance between skill and athleticism but I'd imagine it'd be far easier to improve someone's physical condition than their skill level, especially beyond their mid-20s.

Thursday, 12 June 2014

Beyond Big Chances - Shot Segmentation

This was initially going to be a much bigger piece but with the World Cup only hours away, I wanted to get something out before I get sucked into 3 games a day for the next couple of weeks.

There's a lot of work going on around Expected Goals. In any one match you may lose despite having had the better quality chances, and if the match were replayed over and over with the same chances for each side you'd win more than you lose.

The aim of this is to remove the 'luck' from individual games and focus more on longer term performance.  If on a regular basis, based on expected goals you 'deserved' to lose, then even if recent results have gone your way over the longer term things are likely to go sour.  "The only stat that matters is the score" is fine in a cup final but longer term it's more important to track the underlying numbers and not just results.

How 'long' is long term will depend, just as you could theoretically toss a coin 10 times and get heads each time, some players or clubs could be performing above expectation due to reasons beyond their control rather than due to any real above average ability.

Anyone interested in doing this based on data in the public domain is usually working with one hand tied behind their back as a lot of context around a shot (e.g., defensive pressure) is missing.  I'm not moaning that Opta don't go around giving away for free all the data they spend money collecting, but it does make things trickier.

The only stat that gives context to a chance (beyond location and body part) is Opta's Big Chance stat which they define as where there is a 'reasonable' expectation of a goal.  These chances convert at around 40% (excluding penalties) compared to around 5% for other shots.
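To show what even this two-way split buys you, here's a crude expected goals sketch using just the two approximate conversion rates quoted above. The function name and the example match are made up for illustration, and a proper model would use far more than two rates.

```python
BIG_CHANCE_RATE = 0.40  # approximate conversion quoted above, penalties excluded
OTHER_SHOT_RATE = 0.05  # approximate conversion for all other shots


def expected_goals(big_chances, other_shots):
    """Crude expected goals from a count of big chances and other shots."""
    return big_chances * BIG_CHANCE_RATE + other_shots * OTHER_SHOT_RATE


# Hypothetical match: side A has few shots but good ones, side B the reverse
side_a = expected_goals(big_chances=3, other_shots=4)
side_b = expected_goals(big_chances=0, other_shots=14)
print(f"Side A: {side_a:.2f} from 7 shots, Side B: {side_b:.2f} from 14 shots")
```

On raw shot counts side B looks dominant, but on this measure side A created roughly twice as much.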

This gives a good split between 'big' and 'not so big' chances, but just having two types of shot is a bit limiting when it comes to differentiating between shot types.  To get beyond this I've used Big Chance data along with shot type/location data to try and break chances into a few more meaningful segments.

There is more work behind this but for sake of brevity I'll leave that for another time, this is just a basic first pass where only distance to the nearest point on the goal line is looked at rather than angle.

The segments are a hierarchy so group h for example are shots inside the box that are not included in groups a-g.
Shot Type Summary - Even within Big Chances, there's the opportunity to split further although this does start to create small segments
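The hierarchy idea can be sketched as an ordered list of rules where the first match wins, so each later group only catches shots that no earlier group has claimed. The rules, labels and cut-offs below are placeholders to show the mechanism, not the actual segment definitions, and the `Shot` fields are an assumed data format.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    big_chance: bool
    header: bool
    penalty: bool
    inside_box: bool
    distance_yards: float  # to the nearest point on the goal line


# Ordered (label, rule) pairs: placeholder rules, checked top to bottom
RULES = [
    ("penalty", lambda s: s.penalty),
    ("big chance, close in", lambda s: s.big_chance and not s.header and s.distance_yards <= 6),
    ("big chance, header", lambda s: s.big_chance and s.header),
    ("big chance, other", lambda s: s.big_chance),
    ("header", lambda s: s.header),
    ("inside box, other", lambda s: s.inside_box),
]


def segment(shot):
    """Return the label of the first rule the shot satisfies."""
    for label, rule in RULES:
        if rule(shot):
            return label
    return "outside box"  # catch-all for anything left over


tap_in = Shot(big_chance=True, header=False, penalty=False, inside_box=True, distance_yards=4.0)
print(segment(tap_in))
```

Because order matters, moving a rule up or down the list changes which segment borderline shots fall into, which is one reason the groupings are only a first pass.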
With 9 different types of shot you're arguably spreading yourself too thin so I've grouped the chances based on the colour types above to give a grade to each shot:

There's plenty more that can be done with this with a bit more time (and a bit more data) but the above shows that all shots are far from equal and that a qualitative measure such as 'big chance' can help give more understanding of chance quality than location alone, although if you were doing this for real, you'd do a lot more work grouping the chances based on video.