Friday, 1 June 2012

Swansea 3 – Arsenal 2: A detailed analysis using Opta data

I started the We Are Premier League blog at the start of the season because of my interest in both football and data analysis. Swansea’s promotion to the Premier League gave me the opportunity to use the wealth of information available for games at that level.

In the Championship (and below), match stats are limited to top-level information such as total shots, shots on target and possession %. In the Premier League pretty much every action gets recorded, and with tools and data sources such as Stats Zone and EPL Index it’s possible for anyone to tap into this information.

As great as these resources are, I’m an analyst in my day job (usually on less interesting subjects, such as the results of marketing activity), so I like to get down to the lowest-level information possible and build the results I’m interested in myself, rather than relying on sources that have already aggregated the data.

To help with this, Opta have kindly provided the full activity detail for the Swansea 3 – Arsenal 2 game, one of the highlights of the season. The aim of this post is to show the kind of analysis this detailed data makes possible beyond what is publicly available, along with some season-long views that help put the Arsenal game in context.

The detailed Opta data records over 1,800 separate events that took place during the game. These include passes, shots and tackles, with details of the player involved, the time, the location on the pitch and the direction of ball movement.
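To make this concrete, here is a minimal sketch of how one such event might be held in Python. The field names are hypothetical and this is not Opta’s actual feed format; it assumes locations are given on a 0-100 grid measured in the direction the acting team is attacking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One recorded on-ball event (hypothetical field names, not Opta's schema)."""
    period: int                     # 1 = first half, 2 = second half
    minute: int                     # match clock minute
    second: int                     # seconds within that minute
    team: str
    player: str
    kind: str                       # e.g. "pass", "shot", "tackle"
    successful: bool
    x: float                        # start: 0-100 along the pitch (assumed)
    y: float                        # start: 0-100 across the pitch (assumed)
    end_x: Optional[float] = None   # end point, where the event moves the ball
    end_y: Optional[float] = None

    @property
    def clock_seconds(self) -> int:
        """Match clock time in seconds, e.g. 33m 5s -> 1985."""
        return self.minute * 60 + self.second

    @property
    def is_forward(self) -> Optional[bool]:
        """True if the ball moved towards the opponent's goal (None if no end point)."""
        if self.end_x is None:
            return None
        return self.end_x > self.x


pass_event = Event(1, 33, 5, "Swansea", "Britton", "pass", True, 40.0, 50.0, 55.0, 45.0)
print(pass_event.clock_seconds, pass_event.is_forward)  # 1985 True
```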

Below are some examples of the kind of thing that can be done using this data:

Time with Ball
As every event is timestamped, it is possible to know who is in possession of the ball at any given time. I have assigned possession to the team responsible for the ball at that time, e.g., if Arsenal play a long ball forward, the ball stays in their possession until it is touched by a Swansea player.

For dead-ball situations (e.g., free kicks, throw-ins, goal kicks), possession belongs to the team due to restart. For substitutions, possession is with the team making the substitution, since it is up to them how much time the substitution takes.
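The possession rules above can be sketched as a walk through the time-ordered events, crediting the gap until the next event to whichever team is responsible for the ball. This is a simplified sketch, not necessarily the method behind the figures in this post: the `Touch` layout is hypothetical, and it assumes each stoppage (ball out, foul, substitution) is recorded as an event naming the team due to restart.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Touch(NamedTuple):
    t: float                            # seconds since kick-off (hypothetical field)
    team: str                           # team of the player involved
    dead_ball: bool = False             # True for a stoppage: ball out, foul, substitution
    restart_team: Optional[str] = None  # team due to restart (or making the substitution)


def possession_time(events, end_t):
    """Seconds per (team, 'in play' | 'dead ball') for time-ordered events."""
    totals = defaultdict(float)
    for i, cur in enumerate(events):
        stop = events[i + 1].t if i + 1 < len(events) else end_t
        if cur.dead_ball:
            owner, state = cur.restart_team, "dead ball"
        else:
            # The ball stays with the last team to touch it, e.g. a long ball
            # forward remains theirs until an opponent touches it.
            owner, state = cur.team, "in play"
        totals[(owner, state)] += stop - cur.t
    return dict(totals)


events = [
    Touch(0, "Swansea"),
    Touch(10, "Swansea"),                    # long ball forward
    Touch(25, "Arsenal"),                    # first Arsenal touch
    Touch(30, "Arsenal", True, "Swansea"),   # ball out, Swansea throw-in
    Touch(50, "Swansea"),                    # throw-in taken
]
print(possession_time(events, end_t=60))
# {('Swansea', 'in play'): 35.0, ('Arsenal', 'in play'): 5.0, ('Swansea', 'dead ball'): 20.0}
```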

Fig. 1: Time in Possession by Half and Possession Type
The ball was in play for around 30 minutes in each half, which is not a big surprise with two passing teams taking part. Swansea had the ball for the majority of the in-play time in the first half, with almost the reverse figures in the second.

This game was relatively free-flowing, but it would be interesting to compare it with other games such as Stoke 2 - Swansea 0, where the ball seemed to spend most of its time wrapped in a towel, ready to be thrown in by a Stoke player.

Ex-manager Brendan Rodgers has said that the focus on possession is partly to give his forwards a rest, enabling them to play a high-tempo pressing game when Swansea don’t have the ball. Together, the possession and the pressing mean that Swansea tend to dominate possession, if not territory.

Fig. 2: Ball Possession % and Territory (% of time ball in Opposition half) by Game 
Even though the possession figure may vary considerably by game, Swansea’s territory figure is consistently around 40% (i.e., the ball is in the Swansea half 60% of the time).
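Both measures in Fig. 2 can be approximated from the same event stream. The sketch below rests on stated assumptions: the ball is treated as staying at the last event’s location until the next event, the `(seconds, team, x)` layout is hypothetical, and x is assumed to run 0-100 in the attacking direction of whichever team performed the event, so the opponent’s events are mirrored.

```python
from typing import List, Tuple

# (seconds since kick-off, team performing the event, x on a 0-100 grid);
# a hypothetical layout, with x measured towards the acting team's attacking goal.
Sample = Tuple[float, str, float]


def possession_and_territory(events: List[Sample], end_t: float, team: str):
    """Return (possession %, territory %) for `team` over time-ordered events."""
    with_ball = in_opp_half = total = 0.0
    for i, (t, who, x) in enumerate(events):
        stop = events[i + 1][0] if i + 1 < len(events) else end_t
        duration = stop - t
        total += duration
        if who == team:
            with_ball += duration
        # Mirror the opponent's x so that > 50 always means `team`'s attacking half.
        x_for_team = x if who == team else 100.0 - x
        if x_for_team > 50.0:
            in_opp_half += duration
    return 100 * with_ball / total, 100 * in_opp_half / total
```

The second value is the territory figure: around 40% for Swansea would mean the ball spent roughly 60% of the time in their own half.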

Activity Sequences
By creating a chronological list of all the events that occur during the game, it is possible to link events together to determine sequences of events.  One example of this is looking at the number of consecutive passes made without interruption.

Swansea had 10 passing sequences of 10+ completed passes, compared to Arsenal’s 6. The most impressive was a run of 26 passes from 33m 5s to 34m 24s. A sequence is a run of consecutive passes during which possession is not lost, so accidental touches by the opposition and fouls do not end it.
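A minimal sketch of this sequence counting, assuming events arrive in match order as hypothetical `(team, kind, successful)` tuples and that accidental touches and fouls are labelled `"accidental_touch"` and `"foul"` (invented labels, not Opta’s event names):

```python
from collections import Counter

# Event kinds that do not end a passing sequence (invented labels).
IGNORED = {"accidental_touch", "foul"}


def pass_runs(events):
    """Split (team, kind, successful) events into (team, completed passes) runs."""
    runs, team, length = [], None, 0
    for who, kind, ok in events:
        if kind in IGNORED:
            continue                      # e.g. an opponent's deflection
        if kind == "pass" and ok and who == team:
            length += 1                   # same team keeps the ball moving
            continue
        if length:
            runs.append((team, length))   # current run ends here
        if kind == "pass" and ok:
            team, length = who, 1         # a new run starts with this pass
        else:
            team, length = None, 0        # incomplete pass, shot, tackle, etc.
    if length:
        runs.append((team, length))
    return runs


def sequences_of_at_least(events, n=10):
    """Number of runs of n or more completed passes, per team."""
    return Counter(team for team, length in pass_runs(events) if length >= n)
```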

Another example would be identifying the last player on the conceding team to touch the ball before a goal. It may not always be their fault, of course, but it could help identify particular weaknesses, e.g., poor clearances from a full back.
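One way to sketch this, assuming a hypothetical `(team, player, kind)` event layout in match order, with each goal credited to the scoring team (own goals are not handled here, and the backwards search has no time limit):

```python
def last_conceding_touch(events):
    """Return (scorer, last conceding-team player to touch the ball) per goal."""
    result = []
    for i, (team, player, kind) in enumerate(events):
        if kind != "goal":
            continue
        toucher = None
        # Walk backwards from the goal to the most recent event by the other side.
        for prev_team, prev_player, _ in reversed(events[:i]):
            if prev_team != team:
                toucher = prev_player
                break
        result.append((player, toucher))
    return result
```

Aggregated over a season, a player who keeps appearing in the second position might be worth a closer look on video.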

Another example of linking data is looking at pass pairings. Using Stats Zone, I often look at passes received as well as passes made; the advantage of having the full data is being able to link the player making each pass to the player receiving it. It also makes it possible to combine the activity of players, e.g., Nathan Dyer and Wayne Routledge, who replaced him.

Fig. 3: Top 10 passer/receiver combinations for completed passes. Overall, 31% (130/423) of Swansea’s completed passes were either made by or made to Leon Britton. Completed passes that came off an opposition player are excluded.
The figures above show that passing between Britton and Allen accounted for 20 completed passes in the first half but only 9 in the second.
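A pass-pairing count like Fig. 3 could be sketched as below. The event layout is hypothetical, and the receiver is assumed to be the player involved in the next event when he is a team-mate; that is an approximation rather than how Opta itself identifies receivers. The `merge` map shows how a substitute can be combined with the player he replaced.

```python
from collections import Counter


def pass_pairings(events, merge=None):
    """Count completed passes per (half, passer, receiver).

    events: (half, team, player, kind, successful) tuples in match order
    (hypothetical layout). Completed passes whose next event is by the
    opposition are skipped. `merge` maps players onto a combined label.
    """
    merge = merge or {}
    pairs = Counter()
    for cur, nxt in zip(events, events[1:]):
        half, team, player, kind, ok = cur
        if kind == "pass" and ok and nxt[1] == team:
            passer = merge.get(player, player)
            receiver = merge.get(nxt[2], nxt[2])
            pairs[(half, passer, receiver)] += 1
    return pairs
```

Keying on the half makes first- and second-half comparisons, such as the Britton/Allen drop from 20 to 9, a simple lookup.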

The quartet of Britton, Allen, Rangel and Dyer has been key to Swansea’s passing game this season. Over the season, Swansea have also had one of the highest proportions of play down a single flank, with Rangel and Dyer heavily involved down the right, as the chart below shows.

Fig. 4: Proportion of attack along the left/right-hand side of the pitch. Raw data from … Teams below the blue diagonal are more right-sided than left-sided.

In the chart above, Swansea and Stoke favour the right-hand side, and Wigan and Wolves the left. Other notable figures include Fulham, who have low figures for both left- and right-side attack (i.e., they are heavily central), and Everton, who have high percentages down both the left and the right (and therefore relatively little through the middle). There is more on this at Zonal Marking, whose original work on side of attack inspired the view above.
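A rough way to produce left/centre/right shares of the kind plotted in Fig. 4 is to split a team’s attacking actions into thirds of the pitch width. This is a sketch under assumptions: y is taken to run 0-100 across the pitch, which touchline counts as y = 0 is a configurable assumption rather than a documented fact, and what counts as an "attacking action" is left to the caller. It is not necessarily how Zonal Marking or the original data source defined side of attack.

```python
def side_of_attack(ys, right_is_low_y=True):
    """Percentage of actions in the left, central and right thirds of the width.

    ys: y co-ordinates (0-100 across the pitch) of one team's attacking actions.
    Whether y = 0 is the right or left touchline is an assumption; flip
    right_is_low_y if your data uses the other convention.
    """
    n = len(ys)
    low = sum(1 for y in ys if y < 100 / 3)
    high = sum(1 for y in ys if y > 200 / 3)
    centre = n - low - high
    right, left = (low, high) if right_is_low_y else (high, low)
    return {"left": 100 * left / n, "centre": 100 * centre / n, "right": 100 * right / n}


print(side_of_attack([10, 20, 50, 80, 25]))
# {'left': 20.0, 'centre': 20.0, 'right': 60.0}
```

A heavily central side like Fulham would show a large `centre` share, while Everton’s pattern would show up as high `left` and `right` with a small `centre`.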
Pass Distance
Every pass is recorded with x,y co-ordinates on the pitch, from which measures such as pass distance and pass direction can be derived.
Fig. 5: Average Pass distance per player – Ordered by Number of Completed passes made.  
The three main passers in the Swansea team all have an average pass distance of under 15 metres.  Of Britton's 71 successful passes, 21 were under ten metres and 44 under fifteen metres.
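Converting the 0-100 grid into metres needs pitch dimensions. The sketch below assumes a 105 m x 68 m pitch, a common size but not necessarily that of the Liberty Stadium, and takes each completed pass as a hypothetical `(x, y, end_x, end_y)` tuple.

```python
from math import hypot
from statistics import mean

PITCH_LENGTH_M = 105.0  # assumed pitch size; real dimensions vary by ground
PITCH_WIDTH_M = 68.0


def pass_length_m(x, y, end_x, end_y):
    """Straight-line pass length in metres from 0-100 grid co-ordinates."""
    dx = (end_x - x) * PITCH_LENGTH_M / 100
    dy = (end_y - y) * PITCH_WIDTH_M / 100
    return hypot(dx, dy)


def summarise(passes):
    """Average length plus counts under 10 m and 15 m for (x, y, end_x, end_y) tuples."""
    lengths = [pass_length_m(*p) for p in passes]
    return {
        "average_m": mean(lengths),
        "under_10m": sum(d < 10 for d in lengths),
        "under_15m": sum(d < 15 for d in lengths),
    }
```

Scaling the two axes separately matters: 10 grid units along the pitch is 10.5 m here, but across it only 6.8 m.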

Fig. 5b: Britton’s passing.
The actual passing is the ’easy’ bit; it’s the positioning and anticipation that is special. Without wanting to get too pretentious, what is so impressive is his ability, like a chess player, to anticipate a couple of moves ahead and to be in the right place as a result.

This ability has been a consistent factor across the whole season, as his game-by-game figures below show:

Fig. 6: Britton’s pass completion rate per game.  Britton took part in all games except Stoke (H) and Spurs (H).  Overall pass completion was 93.5% and was above 85% in every game.
In comparison, in the Swansea - Arsenal game, Miquel (74.1%) and Ramsey (75.9%), Arsenal’s top two passers, had far lower pass completion rates and higher average pass distances, suggesting that Swansea’s pressing resulted in less controlled passing and less time for the supporting player to get into the perfect position.

Splitting Activity by Time
Having granular information on every event means it can be aggregated in any way required. One example is to look at pass volumes by minute during the match.

Fig. 7: Pass volumes by minute, from which the relative periods of dominance can be seen. Generally Swansea controlled possession, with Arsenal having strong periods either side of half time and between the 65th and 75th minutes (shortly after the introduction of Rosicky and Henry).
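Counting completed passes per minute is a short aggregation once each event carries a minute. A minimal sketch, assuming hypothetical `(minute, team, kind, successful)` tuples:

```python
from collections import Counter


def passes_per_minute(events, team):
    """Completed passes by `team` in each match minute.

    events: (minute, team, kind, successful) tuples (hypothetical layout).
    Returns a list indexed by minute, with 0 for minutes without a pass.
    If first-half stoppage time shares minute numbers with the start of the
    second half in your data, key on (period, minute) instead.
    """
    counts = Counter(
        minute for minute, who, kind, ok in events
        if who == team and kind == "pass" and ok
    )
    last = max((minute for minute, *_ in events), default=-1)
    return [counts[m] for m in range(last + 1)]
```

Returning a list with explicit zeros keeps both teams aligned minute by minute, which suits a side-by-side chart.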
Having the figures at this low level makes it possible to create bespoke bandings of activity that highlight the shifts in dominance in terms of possession:

Fig. 8: Passes completed per team per time period. This makes it easy to see the huge discrepancy in passing volumes between the 11th and 40th minutes.
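Bespoke bands like these can be sketched with Python’s `bisect` module, mapping each pass minute to the band whose start precedes it. The input layout and the example band edges are hypothetical; the edges are assumed to be ascending and to start at 0.

```python
from bisect import bisect_right
from collections import Counter


def passes_by_band(minutes_by_team, edges):
    """Count completed passes per team in custom time bands.

    minutes_by_team: {team: [minute of each completed pass]} (hypothetical).
    edges: ascending band starts beginning at 0, e.g. [0, 11, 41, 46]
    gives bands 0-10, 11-40, 41-45 and 46+.
    """
    labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges, edges[1:])] + [f"{edges[-1]}+"]
    table = {}
    for team, minutes in minutes_by_team.items():
        # bisect_right finds the first edge above the minute; one step back is its band.
        counts = Counter(labels[bisect_right(edges, m) - 1] for m in minutes)
        table[team] = {label: counts[label] for label in labels}
    return table
```

Because the bands are just a list of edges, trying a different split (say, around a substitution) only means changing that list.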
Taking a cumulative view of the same data shows the passing volumes of the two teams diverging at around 10 minutes, as well as Arsenal’s spikes in activity either side of half time and at around 65 minutes.

Fig. 9: Cumulative Number of Passes completed by minute
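The cumulative view is a running total of the per-minute counts. The sketch below uses `itertools.accumulate` and adds a hypothetical helper that reports the first minute at which the two totals are at least a chosen gap apart; the gap threshold is an arbitrary choice, not something defined in the data.

```python
from itertools import accumulate


def cumulative(per_minute):
    """Running total of passes, e.g. [3, 4, 5] -> [3, 7, 12]."""
    return list(accumulate(per_minute))


def first_divergence(a, b, gap):
    """First minute at which two teams' cumulative totals differ by >= gap."""
    for minute, (x, y) in enumerate(zip(cumulative(a), cumulative(b))):
        if abs(x - y) >= gap:
            return minute
    return None
```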

However, splitting the pass completion rates by half shows the increase in completion rates for Arsenal in the second half.

Fig. 10: Pass completion by half, ordered by overall completion rate. Key areas to note are that the top two players by pass completion are both Arsenal substitutes (highlighted in pink), and the change in pass completion rates of the two goalkeepers (highlighted in yellow). Vorm, unable to play as many short passes out to his defenders in the second half because of Arsenal’s pressure, played more long balls, which often resulted in loss of possession.
Goalkeeper passing in each half, shown using Stats Zone. The raw data can be used to highlight key areas of interest, which are then usually best presented graphically to give a better level of insight.
Splitting activity by time period is technically possible using Stats Zone, but it would require a lot of manipulation to return the figures for all players. Having the raw data means being able to look at all players at the same time (and over varying time periods, such as before and after the introduction of Rosicky) to determine what is worth looking at graphically.
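As an illustration of splitting every player at once, here is a minimal sketch of pass completion per player before and after a chosen minute, such as the minute a substitute came on. The `(player, minute, successful)` layout is hypothetical.

```python
from collections import defaultdict


def completion_by_period(passes, split_minute):
    """Pass completion % per (player, 'before' | 'after') around split_minute.

    passes: (player, minute, successful) tuples (hypothetical layout).
    For halves, grouping on a period field may be safer, since first-half
    stoppage time can share minute numbers with the start of the second half.
    """
    tallies = defaultdict(lambda: [0, 0])  # (player, period) -> [completed, attempted]
    for player, minute, ok in passes:
        period = "before" if minute < split_minute else "after"
        tally = tallies[(player, period)]
        tally[0] += int(ok)
        tally[1] += 1
    return {key: 100 * done / tried for key, (done, tried) in tallies.items()}
```

Since every player is tallied in one pass over the data, the whole squad’s before/after figures come out together, which is the advantage over pulling them one player at a time.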

As well as completion rates, the figures can be used to look at overall pass volumes: the Swansea centre-back pairing of Williams and Caulker completed 59 passes in the first half compared to 28 in the second, as Arsenal pressed more and also controlled possession better after the break.

For any single game there is only so much insight that can be gained; the real value of this kind of analysis comes from looking at activity over a season. There are 380 Premier League matches a season, and only so many of them can be watched in any detail. The ability to manipulate data, along with an understanding of what is and isn’t significant, could be an extra tool for a team to better understand itself and, probably more importantly, its opposition.

Pure number crunching can never identify all the nuances of a match that may be spotted when watching it, but it should be seen as a guide pointing towards areas of interest, as well as providing a dispassionate view of activity.

Twitter: @we_r_pl
Match Stats: Created using data supplied by Opta www.optasportspro.com and Stats Zone
Chalkboards: Created using Stats Zone