In the Championship (and below), match
stats are limited to top-level information such as total shots, shots on target
and possession %. In the Premier League
pretty much every activity gets recorded, and with tools and data sources such
as Stats Zone and EPL Index it’s possible for anyone to tap into this
information.
As great as these resources are, I’m an
analyst in my day job (usually on less interesting subjects such as the results of
marketing activity), so I like to get to the lowest level of information possible and
build the results that I’m interested in, rather than having to rely on
sources that have already aggregated the data.
To help with this, Opta have kindly
provided the full activity detail for the Swansea 3 – Arsenal 2 game which was
one of the highlights of the season. The
aim of this post is to show the kind of analysis that is possible with this
detailed data beyond what might be publicly available, along with some season-long
views that help put the Arsenal game in context.
The detailed Opta data contains the
recording of over 1,800 separate events that take place during the game; these
can be things such as pass, shot, tackle etc., with details on the player
involved, time, location on the pitch and direction of ball movement.
Below are some examples of the kind of
thing that can be done using this data:
Time with Ball
As every event is Timestamped, it is
possible to know who is in possession of the ball at any given time. I have assigned possession as the team
responsible for the ball at that time e.g., if Arsenal play a long ball forward
the ball is still in their possession until it is touched by a Swansea
player.
For dead-ball situations (e.g., free kicks,
throw-ins, goal kicks), possession belongs to the team due to
restart play. For substitutions, possession is
with the team making the substitution, since it is up to them how much time
the substitution takes.
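As a rough illustration of the possession rule described above, here is a minimal Python sketch. It is not the code used to produce the figures in this post. The event layout (a time in seconds plus the team judged to be in possession from that moment) and the function name `time_in_possession` are assumptions made purely for the example, and the sample events are invented rather than taken from the Opta file.

```python
from collections import defaultdict

def time_in_possession(events, end_time):
    """Sum the seconds each team holds the ball.

    events: chronological list of (seconds, team). Each event is assumed
    to hand possession to `team` until the next event (hypothetical layout).
    end_time: the second at which the last interval is closed.
    """
    totals = defaultdict(float)
    # Credit the gap between consecutive events to the team of the earlier one.
    for (start, team), (next_start, _) in zip(events, events[1:]):
        totals[team] += next_start - start
    # The final event runs until end_time.
    if events:
        last_start, last_team = events[-1]
        totals[last_team] += end_time - last_start
    return dict(totals)

# Invented events for illustration only.
sample = [(0, "Swansea"), (12, "Swansea"), (20, "Arsenal"), (45, "Swansea")]
possession = time_in_possession(sample, end_time=60)
```

Dead-ball time and substitutions would fit the same shape: the restart or substitution event simply carries the team that the rule above assigns possession to.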
Fig. 1: Time in Possession by Half and Possession Type
Both halves saw the ball in play for around
30 minutes which is not a big surprise with two passing teams taking part. Swansea had the majority of time with the
ball in play in the first half with almost the reverse figures in the second
half.
This game was relatively free flowing but it would be interesting to compare this with other games such as Stoke 2 - Swansea 0 where the ball seemed to spend most of its time wrapped in a towel ready to be thrown in by a Stoke player.
Ex-manager Brendan Rodgers has mentioned that the
focus on possession is partly to give his forwards a rest, enabling them to
play a high-tempo pressing game when Swansea don’t have the ball. Keeping the ball and pressing to win it back together mean
that Swansea tend to dominate possession, if not territory.
Activity Sequences
By creating a chronological list of all the
events that occur during the game, it is possible to link events together to
determine sequences of events. One
example of this is looking at the number of consecutive passes made without
interruption.
Swansea put together a sequence of 10 or more
completed passes 10 times, compared to Arsenal’s 6; the most impressive was 26 passes
from 33m 5s to 34m 24s. Sequences are
consecutive passes where possession is not lost (i.e., accidental touches from
the opposition or fouls are ignored).
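The sequence counting described above could be sketched along the following lines. This is a simplified illustration, not the logic behind the figures quoted here: the event tuples, the event-type labels (`"pass"`, `"foul"`, `"deflection"`, `"tackle"`) and the function name `pass_sequences` are all hypothetical, and the sample events are invented.

```python
def pass_sequences(events, ignored=("foul", "deflection")):
    """Return (team, length) for each run of consecutive completed passes.

    events: chronological list of (team, kind, completed).
    ignored: event kinds that do not break a run (assumed labels).
    """
    sequences = []
    team, length = None, 0
    for ev_team, kind, completed in events:
        if kind in ignored:
            continue  # accidental touches and fouls leave the run intact
        if kind == "pass" and completed and ev_team == team:
            length += 1
            continue
        # Any other event ends the current run, if there is one.
        if length:
            sequences.append((team, length))
        if kind == "pass" and completed:
            team, length = ev_team, 1  # a new run starts with this pass
        else:
            team, length = None, 0
    if length:
        sequences.append((team, length))
    return sequences

# Invented events for illustration only.
sample = [
    ("Swansea", "pass", True),
    ("Swansea", "pass", True),
    ("Arsenal", "deflection", False),
    ("Swansea", "pass", True),
    ("Swansea", "pass", False),
    ("Arsenal", "pass", True),
    ("Swansea", "foul", False),
    ("Arsenal", "pass", True),
    ("Swansea", "tackle", True),
]
runs = pass_sequences(sample)
long_runs = [run for run in runs if run[1] >= 10]  # the "10+" filter
```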
Other examples could look at who was the
last player on the conceding team to touch the ball. It may not always be their fault,
of course, but it could help identify particular weaknesses, e.g., poor clearances
from a full back.
Another example of being able to link data is looking at pass pairings. Using Stats Zone, I often look at passes
received as well as passes made. The
advantage of having the full data is being able to link the person making the
pass to the person receiving it. It also
gives the ability to combine the activity of players, e.g., Nathan Dyer and
Wayne Routledge, who came on for him.
The figures above show that the passing
between Britton and Allen accounted for 20 completed passes in the first half
but only 9 in the second.
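A minimal sketch of how pass pairings might be counted from a chronological event list is shown below. It assumes, purely for illustration, that the receiver of a completed pass is the player on the next event by the same team. The tuple layout, the `combine` mapping and the function name `pass_pairings` are hypothetical, and the sample events are invented rather than taken from the match.

```python
from collections import Counter

def pass_pairings(events, combine=None):
    """Count completed passes between each pair of players.

    events: chronological list of (team, player, completed_pass), where
    completed_pass is True only for a completed pass (assumed layout).
    combine: optional mapping of player name -> shared label, used to
    merge players such as a starter and his substitute.
    """
    combine = combine or {}
    pairs = Counter()
    for (team, passer, completed), (next_team, receiver, _) in zip(events, events[1:]):
        # Assumption: the next event by the same team identifies the receiver.
        if completed and next_team == team and receiver != passer:
            a = combine.get(passer, passer)
            b = combine.get(receiver, receiver)
            pairs[tuple(sorted((a, b)))] += 1  # direction is ignored
    return pairs

# Invented events for illustration only.
sample = [
    ("Swansea", "Britton", True),
    ("Swansea", "Allen", True),
    ("Swansea", "Britton", True),
    ("Swansea", "Dyer", False),
    ("Arsenal", "Ramsey", True),
    ("Arsenal", "Miquel", False),
    ("Swansea", "Rangel", True),
    ("Swansea", "Routledge", False),
]
merged = {"Dyer": "Dyer/Routledge", "Routledge": "Dyer/Routledge"}
pairs = pass_pairings(sample, combine=merged)
```

Sorting each pair means Britton to Allen and Allen to Britton land in the same bucket; dropping the `sorted` call would keep the direction of each pass instead.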
The quartet of Britton/Allen/Rangel/Dyer
have been key to Swansea’s passing game during the season. Over the course
of the season Swansea have had one of the highest proportions of play down a single
flank, with Rangel and Dyer heavily involved down the right, as can be seen in
the chart below.
Fig. 4: Proportion of Attack along Left/Right hand side of pitch. Raw data from Whoscored.com. Those below the blue diagonal are more right than left sided.
In the chart above it can be seen that
Swansea and Stoke favour the right-hand side, Wigan and Wolves the left. Other notable figures include Fulham, who have
low figures for both left- and right-side attack (i.e., are heavily central), and
Everton, who have a high % on both left and right (and therefore relatively little
through the middle). More on this at Zonal Marking, who did the original work on side of attack which inspired the view above.
Pass Distance
Every pass made is given an x,y co-ordinate
on the pitch, from which measures such as pass distance and pass direction can be
derived.
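To show how distance and direction can fall out of the co-ordinates, here is a small Python sketch. Several things in it are assumptions rather than facts taken from the data used in this post: that each pass has both a start and an end co-ordinate, that co-ordinates are expressed as percentages (0 to 100) of the pitch length and width, and that the pitch measures 105m by 68m. The function name `pass_vector` is also made up for the example.

```python
import math

PITCH_LENGTH_M = 105.0  # assumed pitch length, not taken from the data
PITCH_WIDTH_M = 68.0    # assumed pitch width, not taken from the data

def pass_vector(x1, y1, x2, y2):
    """Return (distance in metres, angle in degrees) for one pass.

    Co-ordinates are assumed to be percentages of pitch length (x)
    and width (y). An angle of 0 means straight towards the opposition
    goal, 180 means straight back, and the sign follows increasing y.
    """
    dx = (x2 - x1) / 100 * PITCH_LENGTH_M
    dy = (y2 - y1) / 100 * PITCH_WIDTH_M
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))
    return distance, angle

# A pass from the centre spot, 10% of the pitch length straight forward.
distance, angle = pass_vector(50, 50, 60, 50)
```

Averaging `distance` per player would give an average pass distance figure, which can then be set against pass completion rates.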
This ability has been a consistent factor across the whole of the season as can be seen by his game by game figures below:
Fig. 6: Britton’s pass completion rate per game. Britton took part in all games except Stoke (H) and Spurs (H). Overall pass completion was 93.5% and was above 85% in every game.
In comparison, in the Swansea - Arsenal game, Miquel (74.1%) and Ramsey (75.9%), Arsenal’s
top two passers, had a far lower pass completion rate and a higher average pass
distance, suggesting that Swansea’s pressing resulted in less controlled passing
and less time for the support player to get into the perfect position.
Splitting Activity by Time
Having the granular information for every
event makes it possible to aggregate in any way required; one example
is to look at the pass volumes by minute during the match.
Having the figures at this low level means it is possible to create bespoke bandings of activity which can highlight the shifts in dominance in terms of possession:
Fig. 8: Passes completed per team per Time Period. This makes it easy to see the huge discrepancy in passing volumes between the 11th and 40th minute.
Taking a cumulative view of the same data shows where the passing volumes for the two teams diverge at around 10 minutes and also the two spikes in activity for Arsenal either side of Half Time and also at around 65 minutes.
Fig. 9: Cumulative Number of Passes completed by minute
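Both the banded view and the cumulative view come from the same per-minute counts. The sketch below shows one way this could be done in Python; it is an illustration under assumptions, not the code behind the charts. The input is assumed to be a list of the match minute of each completed pass for one team, the function names are hypothetical, the sample minutes are invented, and only the 11 to 40 band boundary comes from the text above.

```python
from collections import Counter
from itertools import accumulate

def passes_per_band(pass_minutes, bands):
    """Count passes in each bespoke time band.

    pass_minutes: the match minute of each completed pass (assumed input).
    bands: list of (label, first_minute, last_minute), both ends inclusive.
    """
    counts = Counter()
    for minute in pass_minutes:
        for label, first, last in bands:
            if first <= minute <= last:
                counts[label] += 1
                break  # each pass belongs to at most one band
    return [counts[label] for label, _, _ in bands]

def cumulative_by_minute(pass_minutes, last_minute=90):
    """Running total of completed passes at the end of each minute."""
    per_minute = Counter(pass_minutes)
    return list(accumulate(per_minute[m] for m in range(last_minute + 1)))

# Invented minutes for illustration only.
minutes = [1, 3, 12, 12, 15, 38, 41, 44]
bands = [("0-10", 0, 10), ("11-40", 11, 40), ("41-45", 41, 45)]
banded = passes_per_band(minutes, bands)
running = cumulative_by_minute(minutes, last_minute=45)
```

Running the two functions once per team and plotting the `running` lists together is what would show where the passing volumes diverge.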
However, splitting the pass completion
rates by half shows the increase in completion rates for Arsenal in the second
half.
Splitting activity by time period is technically possible
using Stats Zone but would require a lot of manipulation to return the figures
for all players. Having the raw data
means being able to look at all players at the same time (and also over varying
time periods such as pre/post the introduction of Rosicky) to be able to determine what is worth looking at graphically.
As well as completion rates, the figures
can be used to look at overall pass volumes: the Swansea centre-back pairing of
Williams and Caulker completed 59 passes in the first half compared to 28
in the second, as Arsenal pressed more and also controlled possession better in
the second half.
For any single game there is only so much
insight that can be gained; the real value of this kind of analysis comes from
looking at activity over a season. In
the Premier League there are 380 matches a season, so there are only so many of
them that can be watched in any detail. Having the ability to manipulate data, as well as an understanding of what is and isn't significant, could be an extra tool for a team to better understand themselves and, probably more importantly, their opposition.
Pure number crunching can never identify
the nuances present in a match that may be spotted when watching a game, but it should be
seen as a guide pointing towards areas of interest, as well as providing a
dispassionate view of activity.
Twitter: @we_r_pl http://www.twitter.com/we_r_pl
Match Stats: Created using data supplied by Opta www.optasportspro.com, http://eplindex.com/ http://whoscored.com and Stats Zone
Chalkboards: Created using Stats Zone http://fourfourtwo.com/statszone