
Computers & Graphics 84 (2019) 122–133

Contents lists available at ScienceDirect

Computers & Graphics


journal homepage: www.elsevier.com/locate/cag

Special Section on SIBGRAPI 2019

How do soccer teams coordinate consecutive passes? A visual analytics system for analysing the complexity of passing sequences using soccer flow motifs

Jose Luis Sotomayor Malqui a, Noemí Maritza Lapa Romero a, Rafael Garcia a, Hande Alemdar b, João L.D. Comba a,∗

a Instituto de Informática – UFRGS, Brazil
b Department of Computer Engineering, Middle East Technical University (METU), Turkey

Article info

Article history:
Received 16 April 2019
Revised 23 August 2019
Accepted 29 August 2019
Available online 3 September 2019

Keywords: Computers and graphics, Formatting, Guidelines

Abstract

The analysis of passing strategies plays a major role in soccer. Soccer managers use scouting, video footage, and soccer data feed to collect information about tactics and player performance. However, the nature of passing strategies is complex enough to reflect what is happening in the match and makes it hard to understand its dynamics. Furthermore, there exists a growing demand for pattern detection and passing analysis popularized by FC Barcelona’s tiki-taka. In this paper, we describe a visual analytics system to analyze the sequence and trajectory of consecutive passing sequences. We describe a two-phase clustering algorithm that extracts typical trajectory clusters in passing sequences, which result in eight predominant clusters. The combined analysis of the sequence and trajectory clusters allow experts to perform multi or single-game analysis in various ways. We show the potential of our approach in case studies using data from the Brazilian and Turkish leagues and report feedback from soccer experts.
© 2019 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail address: comba@inf.ufrgs.br (J.L.D. Comba).

1. Introduction

Soccer is a widely popular sport and the one with the most revenue in the global sports events market. Managers, advertisers, and club owners follow the performance of a team in detail. To gain competitive advantage and success in local matches and international tournaments, the use of data in soccer has seen a huge growth in recent years [1]. In contrast with other sports, the low probabilities of scoring in soccer and the team strategies add to the complexity of game analysis. The complexity increases due to external variables like weather, home advantage, and team formations, among others. The analysis of soccer matches allows a team to learn about its errors and to study the adversary.

The analysis of different formations has been widely studied since the beginning of soccer [2]. Match statistics benefit both coaches and players by adding performance information to their knowledge. However, apart from key events in a soccer match such as shots, goals, fouls, and number of passes, there exists an interest from the research community to understand the dynamic aspect of the game. To deal with the complexity of the game events, previous works [3–5] study particular sequences of events that might result in real scoring opportunities. These events could be player positions, shots or the positioning of the ball during a passing sequence. Hughes and Franks [6] show that, for successful teams, longer ball possessions were confirmed to produce more goals than shorter passing sequences. Simultaneously, the prevention of ball loss reduces the probability of conceding a goal from counter-attacks. Recent studies reveal a growing interest in studying the complexity structure of passing strategies [5,7] and their impact on crucial plays in matches.

The goal of our work is to complement the analysis of sequences of four consecutive passes proposed by Gyarmati and Anguera [5], also called soccer flow motifs, which confirmed FC Barcelona as having a unique style of play. In addition to their combinatorial analysis of soccer flow motifs, we propose to consider the position of players (trajectory) in passing sequences. As is well known, soccer passing benefits from triangular formations to create opportunities for offensive and defensive plays, and spreading players out on the pitch allows a team to take advantage of space efficiently and move the ball throughout the length of the field. We propose a Visual Analytics (VA) system to allow the exploration of the trajectory of passing sequences and their relationship with the players involved, and validate our proposal with case studies using soccer data from

https://doi.org/10.1016/j.cag.2019.08.010
0097-8493/© 2019 Elsevier Ltd. All rights reserved.
J.L.S. Malqui, N.M.L. Romero and R. Garcia et al. / Computers & Graphics 84 (2019) 122–133 123

premier soccer leagues. In summary, we outline the main research contributions of this paper as follows:

• A Visual Analytics system that supports the interactive analysis of passing sequences using soccer flow motifs.
• An unsupervised approach to discover trajectory patterns in soccer flow motifs based on clustering by trajectory similarity.
• A case study of passing strategy analysis using data from the Brazilian Serie A 2015 dataset.
• A second case study using data from Turkish Super League 2016, with feedback from soccer analysts and training staff.

2. Related work

We report related work focused on the statistical properties of soccer matches and visual analytics of sports data.

2.1. Statistical approaches for understanding soccer

Statistical analysis uses data mining and information discovery research. Gudmundsson and Wolle [8] developed tools to cluster passes and movement of individual players. They calculate all possible passing alternatives in a given time and compute the most frequent pass sequences. Additionally, they computed correlations between sub-trajectory clusters computed from players' movement as an evaluation of common actions. Lucey et al. [9] highlighted the problem of alignment when dealing with multi-agent trajectories and presented a representation based on the player “role” instead of its “identity”. They showed an effective way of discovering team formations and soccer plays using the proposed role representation. Pena and Touchette [10] used tools from network theory to describe soccer team strategies. They defined a passing network with players as nodes and edges weighted by the number of passes completed among them. From the resulting network, they identify soccer play patterns, determine key events and potential weaknesses. Wei et al. [11] explored the “role-representation” and used a feature reduction strategy to create a compact spatiotemporal representation. They found the most likely formation patterns of a team and showed a match segmentation used to detect game phases without manual intervention. Lucey et al. [12] used occupancy maps to make comparisons between each team’s style of play. They visualized the difference of occupancy between home and away matches and provided a method of automatically flagging behavioral differences. This work was followed by Bialkowski et al. [13], which utilized a formation descriptor to determine the identity of a team. To do so, they minimize the entropy of role-specific occupancy maps. Milo et al. [14] introduced the concept of a network motif, which defines that passing networks can be reduced to complex networks to find structure similarities and perform information processing. To quantify soccer chances, Lucey et al. [15] presented a method to estimate the likelihood of chances of scoring. They trained a logistic regressor with strategic features such as defender proximity, speed of play and defensive formation. Regarding large-scale analysis, Bialkowski et al. [16] also worked with large datasets and presented a method to conduct both individual player and team analysis. They discovered player roles from data by utilizing a minimum entropy data partitioning method and automatically detected formations. Gyarmati et al. [17] used “flow motifs” of passes to characterize team behavior and to find similarities and disparities between teams using data from European leagues. We expand on their motif analysis by considering the relation with the trajectory clusters we were able to identify in passing sequences. With an emphasis on the study of passing behavior in soccer, Rein et al. [7] study passing patterns in front of the goal by estimating the control of space and number of defenders using Voronoi diagrams. This proposal complements our approach and offers other insights into the complex study of passing sequences.

2.2. Visual analytics systems for sports data

In combination with statistics, visualization and analytics techniques are used to extract insights from sports data [18,19]. A popular visual design is the heatmap [20], which displays the field of play using a color mapping proportional to the frequency of positions in a given location. Another one is the flow graph [21], where a graph represents a team, with players as nodes and links showing the connections between players. There are several visual analytics systems for sports analytics. Soccer Scoop [22] and MatchPad [23] use glyph-based visualizations to compare soccer players and analyze performances during games. CourtVision [24] and SnapShot [25], respectively designed for basketball and hockey, introduce specific types of heatmaps focused on the ball and puck shots. Legg et al. [26] describe a visual search system for Rugby matches. They used a sketch-based interface to perform a search without semantic annotation. Perin et al. [27] developed a tool that offers different views on soccer match data for event comparison and generating automatic reports. Janetzko et al. [28] detected relevant events and phases semi-automatically by integrating statistical features. Soccer drawn is a visualization that presents an analysis of a soccer game representing continuous movements of the ball as lines [29]. The position of the lines in the same part of the pitch reveals trends in how the game was played. Soccer simulations create visualizations of matches to help managers improve their decision making. Shao et al. [4] propose a novel approach for searching trajectory data in soccer matches in which the user sketches a situation of interest based on two different similarity measures. Stein and Sacha [30] use a parallel coordinate plot for statistical analysis. It includes a density distribution representation of clustered data for activity phases of professional soccer players. Our proposal allows the analysis of soccer trajectories of passing sequences. In follow-up work, Stein et al. [31] propose an integrated VA system that brings video footage to the analysis of soccer matches. The evolution of the spatiotemporal position of players and tactical systems is discussed in the recent works by Machado et al. [32] and Wu et al. [33].

3. Soccer flow motifs

In this Section, we introduce the notion of soccer flow motifs used in the visual analytics system to perform pass analysis.

3.1. Soccer passing sequences

To help introduce the definition of a soccer passing sequence, we explain the contents of the two soccer datasets we use in this work. The first one is the 2015 Brazilian Serie A season F24 Opta feed [1]. The F24 Opta feed consists of XML files with event tags, describing specific match situations along with their position. Some events include passes, fouls, tackles, and corners. Due to data restrictions, we consider only the second half of the season. This dataset consists of 180 games with more than 300,000 events. The analysis of this dataset considered 18 games per team. The second dataset was provided by Sentio Sports [34], which offers a semi-automated computer vision based solution for producing player locations and pass information. It includes 144 games from the 2016/2017 Turkish Super League. Since both datasets come in different formats, we wrote specific parsers to extract the relevant fields for our approach: pass events, player identity, and event position. The format of a pass event $P_n$ is

$$P_n = \langle id, x, y, player_i, player_j, t(n) \rangle$$

where $player_i$ is the player who passed the ball at time $t(n)$ to $player_j$. To identify a pass sequence it is necessary to establish that the team kept ball possession. For this to happen, the following pass

in a sequence starts from the player that received the pass previously. Also, the time interval between passes must be within a given maximal time $T_{max}$. In our tests, we compute ball possessions for each match using a threshold $T_{max}$ of 5 s.

Ball possession is formally defined by Gyarmati et al. [17] as a sequence of passes that fulfill two constraints:

$$player_j(m) = player_i(m+1), \quad \forall m \in \{1, \ldots, n-1\}$$
$$t(m+1) - t(m) \le T_{max}, \quad \forall m \in \{1, \ldots, n-1\}$$

3.2. The sequence of players in a soccer flow motif

A passing sequence is computed when a team has possession of the ball. For each possession, we use the soccer flow motif of passing sequences as described by Gyarmati et al. [17], which considers a sequence of three passes involving up to four distinct players (referenced by letters A, B, C, and D). Each four-letter sequence is called a soccer flow motif. As can be seen in Fig. 1, there are five distinct combinations. For instance, the flow motif ABAC represents a sequence of passes with three different players: player A passes to player B, player B passes back to player A and finally, player A passes to player C. Their approach does not consider the identity of the involved players and focuses on the passing sequence to identify different styles of play. Among the exciting results they present, they were able to identify a unique style of play for FC Barcelona among teams in Europe. As is well known, FC Barcelona uses a passing sequence known as tiki-taka, where players pass the ball back and forth. They demonstrate that FC Barcelona stands out from other teams, in particular, due to reduced use of the ABCD motif.

Fig. 1. Soccer flow motifs: (a) five possible motifs: ABAB, ABAC, ABCA, ABCB and ABCD; (b) example that shows a passing sequence of 8 players and how a sliding window identifies the 5 different motifs.

3.3. The trajectory of soccer flow motifs

Soccer coaches often train their teams to perform triangulations [35], referring to the triangular position of players in short passing sequences, which reveals the importance of the position of players to understand passing sequences. For this purpose, we need to consider the position of players in a passing sequence, also called passing trajectory. A passing trajectory is composed of the positions of players in a passing sequence. Unlike the combinatorial structure of the soccer motifs described above, when looking at the position of players, it is necessary to discuss how this information is extracted from the data available. For example, if we consider a passing sequence with four players, it is necessary to include two additional player positions, since the second and third players in the motif might move after receiving the pass. The resulting six player positions define a polyline that captures the trajectory of the soccer flow motif. The insight given by the combined analysis of the structure and trajectory of soccer flow motifs is exploited in our approach.

4. Clustering the trajectories of soccer flow motifs

The proposal of Gyarmati et al. [17] is fascinating, but it does not take into account the trajectory shape of soccer flow motifs. Clustering passing sequences by trajectory shapes can help identify passing strategies, especially for soccer, where shapes such as triangles and diamonds are very common in the language of the sport. In particular, we posed ourselves the question: is it possible to cluster the shape of passing trajectories formed by the position of players in the order that the passes take place? Several points must be addressed to answer this question. For example, we need to define how to compute the similarity between passing trajectories based on shape. More importantly, we need to choose a clustering algorithm. In this section, we explain how we addressed these questions.

4.1. Computing the similarity of passing trajectories

We need to define a metric that compares the shape of two passing trajectories based on the position of players when passing the ball. As explained above, a passing trajectory is composed of six positions instead of just the expected four positions of players. For each ball possession, we select all three-pass sub-possessions, composed of six player positions.

To compute the trajectory similarity, we use a metric that finds the best alignment among sets of points. We need a similarity metric to compare passing trajectories that is invariant to both rotation and length of passes. For this purpose, in a pre-processing step, we resample passing sequences and apply invariant transformations. The preprocessing step consists of four operations: resampling, rotation, scaling, and translation. To improve the accuracy of similarity comparisons, we increase the resolution in the six-player trajectory by doing an equidistant sampling of $n$ points over the trajectory. We empirically chose $n = 16$ since it represented a reasonable balance between computation speed and accuracy during the similarity comparisons. We sum the distances between consecutive player positions and divide this length by $n - 1$ to obtain an increment between each resampled point. The trajectory is processed from the first pass position adding an increment in X-axis, to create new points using linear interpolation.

For the rotation operation, we use the “indicative angle” approach [36] to approximate the best rotation angle that aligns two trajectories. Since passing trajectories may vary in size depending on the pass distance, we apply a non-uniform scale transformation to a squared domain. Even though horizontal or vertical lines may distort by a non-uniform scaling, we identify trajectories by testing if the smaller dimension of the bounding box exceeds a threshold (0.02% of the pitch length). If it does, we scale them uniformly and translate to a reference point. We rotate each trajectory around its centroid by its indicative angle. All trajectories are represented as a feature vector $v = [x_1, y_1, x_2, y_2, \ldots, x_n, y_n]$ with $n = 16$, where $x, y$ are the 2D coordinates of the sampled trajectory points after preprocessing.

Given the previous alignment, the similarity of passing sub-sequences requires rotating two vector trajectories $v_t$ and $v_g$ by an angle $\theta$ that minimizes the distance between each point. To overcome this problem, we use the approach for gesture recognition described by Li [37]. It uses the optimal angular distance, which computes the cosine distance to find the angle between two vectors in a high-dimensional space. This allows finding the optimal angle much faster, thus reducing the number of computations.
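The four preprocessing operations of Section 4.1 (resampling, rotation, scaling, translation) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the indicative angle is the angle from the centroid to the first point, as is common in gesture-recognition preprocessing, and it replaces the paper's bounding-box test with a hypothetical `min_ratio` parameter that switches to uniform scaling for nearly straight trajectories. All function names are illustrative.

```python
import math

def resample(points, n=16):
    """Return n points spaced equally along the polyline through `points`."""
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(seg)
    if total == 0:                       # all positions coincide
        return [tuple(points[0])] * n
    step = total / (n - 1)               # arc-length increment between samples
    out = [tuple(points[0])]
    i, walked = 0, 0.0                   # current segment, arc length before it
    for k in range(1, n - 1):
        target = k * step
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))  # linear interpolation
    out.append(tuple(points[-1]))
    return out

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def rotate_to_indicative_angle(points):
    """Rotate about the centroid so the first point lies on the positive x direction."""
    cx, cy = centroid(points)
    angle = math.atan2(points[0][1] - cy, points[0][0] - cx)  # assumed definition
    c, s = math.cos(-angle), math.sin(-angle)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_to_square(points, min_ratio=0.1):
    """Non-uniform scale to a unit square; uniform for nearly straight shapes.
    `min_ratio` is a hypothetical stand-in for the paper's threshold."""
    xs, ys = zip(*points)
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    longest = max(w, h) or 1.0
    if min(w, h) / longest < min_ratio:
        sx = sy = 1.0 / longest
    else:
        sx, sy = 1.0 / w, 1.0 / h
    return [(x * sx, y * sy) for x, y in points]

def to_feature_vector(points, n=16):
    """Six player positions -> flat vector [x1, y1, ..., xn, yn] centred at the origin."""
    pts = scale_to_square(rotate_to_indicative_angle(resample(points, n)))
    cx, cy = centroid(pts)
    return [v for x, y in pts for v in (x - cx, y - cy)]
```

Repeated positions (a player who does not move between receiving and passing) produce zero-length segments, which the resampling loop skips over.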

Previous work [37] showed that this optimization problem can be solved in closed form with the equation:

$$\theta_{optimal} = \arg\min_{-\pi \le \theta \le \pi} \arccos\left(\frac{v_t(\theta) \cdot v_g}{|v_t(\theta)|\,|v_g|}\right)$$

where $v_t(\theta)$ is the vector obtained after rotating $v_t$ by $\theta$. We used the optimal angular distance for comparing flow motifs.

4.2. Two-stage clustering of soccer flow motifs

We present a clustering algorithm that separates ball trajectories, regardless of the length of the passes and orientation. We first considered using a spectral clustering algorithm, since it is graph-based and suited for computing similarity of trajectories that are rotation invariant, represented by their optimal angular distance. In our preliminary tests, spectral clustering was able to produce good results for small sets of trajectories but became too slow when processing the entire collection of trajectories. Spectral clustering computes the eigenvalue decomposition of the Laplacian matrix, which is computationally expensive for large matrices. To overcome this problem, we decided to use a preliminary filtering step that relies on a simpler and more efficient clustering algorithm, such as K-means. The main reason to use K-means, and therefore a two-step clustering approach, was to reduce the number of trajectories to be considered by spectral clustering. Also, the K-means algorithm using the Euclidean distance does not allow discriminating trajectories by orientation. Hence, the orientation grouping is performed by spectral clustering.

The resulting two-step clustering algorithm works as follows. In the first step, a K-means algorithm with the Euclidean distance reduces the number of trajectories and discovers global clusters. Since the K-means clustering does not allow discriminating trajectories by orientation, we use a graph-based spectral clustering that uses a similarity distance based on the rotation angle needed to minimize the distance between trajectories. The input for spectral clustering is the similarity matrix computed with the optimal angular distance from the trajectory centroids obtained with K-means. The spectral clustering outputs the most common clusters of passing trajectories.

We ran the clustering algorithm several times to find the number of clusters and prevent loss of information. We first validate our algorithm using the Brazilian league dataset. We select the number of clusters using the Elbow method, a standard approach to finding the number of clusters. In Fig. 2 we show that a good number of representative clusters for K-means is 50. The clusters from K-means are sent to the spectral clustering. The elbow method suggests a good separation using 8 clusters. In the Turkish dataset, we obtained 25 clusters for K-means, and the same 8 clusters for spectral clustering.

Fig. 2. Sum of squared error (SSE) for different K values using the elbow method [38]. We used K = 50 and K = 8 for the K-means and spectral clustering, where the SSE decreased abruptly.

The display of all passing sequences inside each one of the 8 clusters generated by spectral clustering gives an intuition of the shape of the cluster. Since drawing all passing trajectories may lead to overplotting, we used an edge-bundling algorithm [39] to summarize the high-level trajectory (Fig. 3). Using this approach, we visualize the resulting 8 clusters in Fig. 4 along with representative passing sequences displayed in their actual location on the pitch. We display passing sequences using red, blue, and green segments, corresponding to the first, second, and third passes, respectively. For simplicity, we used a nickname for each cluster that encodes the shape of the cluster. Trajectories in clusters 1 and 3 have three collinear players and a single pass with a divergent angle. However, in cluster 3 both sides have a similar length (creating a peak), whereas one of the sides of cluster 1 is shorter (swoosh). Cluster 5 has a bowl shape due to two consecutive passes with a similar right angle. Clusters 2 and 4 present a zig-zag pattern. The angles between trajectories in cluster 4 are similar, but cluster 2 has one smaller, acute angle. The acute angle suggests a small player movement due to a wall pass (side peak). Clusters 6 and 7 represent passes that close a circuit, which returns the ball near the original position with a crossing (cluster 6) or closer to the first point (cluster 7). Finally, cluster 8 contains passes in a straight horizontal or vertical line. Fig. 5 shows the pipeline of our approach.

Fig. 3. Example of ball trajectory bundling. We draw each of the segments of a passing trajectory in different colors. We display passing trajectories without (left) and with bundling (right).

5. Visual analytics system

We designed an interactive visual analytics prototype to support the analysis of soccer flow motifs in single and multiple matches. In the first version of the prototype, we used soccer analysis research questions to formulate preliminary tasks to drive the visualization design and interaction techniques. The goal was to build a user interface that supported multiple filtering capabilities coupled with interactive visualization of the passing sequences. This prototype considered only the Brazilian league dataset. The second version of the prototype was developed after receiving the Turkish league dataset from Sentio Sports and interacting with their staff, as well as feedback from the coaching staff of Turkish teams. This latter interaction was instrumental in revising our list of requirements and improving several aspects of our prototype. Upon request, we also prepared special reports for upcoming matches in the Turkish league. In this section, we present the main components, features, and visualizations of our visual analytics system. In Fig. 6, we show screenshots of several components of the interface.

5.1. Support for single and multiple match analysis

The system is designed to support the analysis of individual or multiple matches at once (Fig. 6(a)). As requested by the experts, we allow the analyst to inspect the behavior of a given team in individual or multiple matches (e.g., all matches, matches at

Fig. 4. Trajectory cluster visualization using edge-bundling. Each cluster is composed of three passes in red (first), blue (second) and green (third). On top we show all 50
clusters generated by K-means, and include along each cluster the index from one of the eight clusters generated by spectral clustering. In the middle we summarize the
shape of the 8 clusters. At the bottom, for each cluster we give examples of passing sequences that occurred during matches. (For interpretation of the references to color
in this figure legend, the reader is referred to the web version of this article.)
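The spectral clusters shown in Fig. 4 rely on the optimal angular distance of Section 4.1. The sketch below illustrates the closed-form solution for two flat vectors $[x_1, y_1, \ldots, x_n, y_n]$ that are already centred at the origin. It follows from expanding the dot product of the rotated vector as $a\cos\theta + b\sin\theta$, which is maximal at $\theta = \operatorname{atan2}(b, a)$ with value $\sqrt{a^2 + b^2}$. The function name and return convention are illustrative assumptions, not the authors' code.

```python
import math

def optimal_angular_distance(vt, vg):
    """Return (theta, distance): the rotation of vt that best aligns it with vg,
    and the remaining angle between the two vectors after that rotation.

    vt, vg: flat lists [x1, y1, ..., xn, yn] of equal length, centred at the origin.
    """
    if len(vt) != len(vg) or len(vt) % 2:
        raise ValueError("vectors must have the same even length")
    a = b = 0.0
    for i in range(0, len(vt), 2):
        xt, yt, xg, yg = vt[i], vt[i + 1], vg[i], vg[i + 1]
        a += xt * xg + yt * yg      # coefficient of cos(theta)
        b += xt * yg - yt * xg      # coefficient of sin(theta)
    norm = math.hypot(*vt) * math.hypot(*vg)   # rotation does not change |vt|
    if norm == 0:
        raise ValueError("zero-length trajectory vector")
    theta = math.atan2(b, a)
    cosine = min(1.0, math.hypot(a, b) / norm)  # clamp against rounding error
    return theta, math.acos(cosine)
```

A distance near 0 means the two trajectories have the same shape up to rotation; a pairwise matrix of such distances between cluster centroids is the kind of input a spectral clustering step would consume.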

Fig. 5. Clustering pipeline. Input: sequences of three passes are resampled and transformed to be invariant to orientation and location. Transformed sequences are processed
by a k-means and a spectral clustering. In the output, we generate clusters that encode the trajectories most used.
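The first stage of the pipeline in Fig. 5, together with the SSE curve used by the elbow method, can be sketched with a plain Lloyd-style K-means. This is a minimal, standard-library illustration under stated assumptions: the `init` argument, the seed handling, and the function names are hypothetical, and the second (spectral) stage is deliberately left out; in practice it would take the centroids returned here and a similarity matrix built from their optimal angular distances.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, init=None, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in (init or rng.sample(vectors, k))]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:          # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:                   # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return centroids, labels, sse

def elbow_curve(vectors, ks, seed=0):
    """SSE for each candidate k; the 'elbow' is where the decrease flattens."""
    return {k: kmeans(vectors, k, seed=seed)[2] for k in ks}
```

Because K-means depends on its initial centroids, running it several times with different seeds and keeping the lowest SSE per k gives a more stable elbow curve.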

Fig. 6. Screenshots from the visual analytics interface: (a) in a first tab, matches can be selected individually or in groups from a search box and a drop-down list; upon
selection, a second tab (b–g) display the details of the selected matches; (b) motifs of selected match(es) can be filtered in several ways: by team, by type of motif, by cluster,
by origin and destination (defense, middle, and attack) and displayed using several parameters (colors of segments, transparency, etc.); (c) a histogram counts the number
of occurrences of players in motifs; (d) counts of motifs by type, by cluster, and by origin and destination; (e) timeline view display motifs in time and also filtering by time
intervals; (f) filtered motifs trajectories displayed over the pitch; (g) heatmap of filtered motifs; (h) a third tab displays the matrix count of motifs by type and cluster, one
for each team and pitch position. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
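The trajectory heatmap in Fig. 6(g), described in Section 5.6 as counting how often trajectories pass through a point on the pitch, can be approximated on a grid. The sketch below is an illustration only: the pitch size (105 by 68), the grid resolution, and the sampling density are assumptions, and each trajectory contributes at most once per cell.

```python
def trajectory_heatmap(trajectories, pitch=(105.0, 68.0), grid=(21, 14), samples=50):
    """Count, for every grid cell, how many trajectories pass through it.

    trajectories: list of polylines, each a list of (x, y) positions on the pitch.
    Returns a rows x cols list of lists of integer counts.
    """
    cols, rows = grid
    heat = [[0] * cols for _ in range(rows)]

    def cell_of(x, y):
        c = min(cols - 1, max(0, int(x / pitch[0] * cols)))
        r = min(rows - 1, max(0, int(y / pitch[1] * rows)))
        return r, c

    for traj in trajectories:
        visited = set()
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            # Sample each pass segment densely and record the cells it crosses.
            for s in range(samples + 1):
                t = s / samples
                visited.add(cell_of(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        for r, c in visited:
            heat[r][c] += 1
    return heat
```

The resulting counts can then be mapped to any colormap; with very long passes or a finer grid, `samples` would need to grow so that no crossed cell is skipped.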

home, away matches, against a specific team, matches of the first or second half of the season, etc.). As the system may support several matches, searching for strings containing parts of the team name is also supported. Once the matches are selected, the interface opens an analysis window in a separate tab.

5.2. Selections of motifs and trajectories

We have five different sequence motifs and eight different trajectory clusters. Therefore it is important to support filtering the analysis by any combination of motifs and trajectories (Fig. 6(b)). The selections are used to configure the passing sequences that will be displayed. One possibility is selecting all passing sequences for a single team, or a single match. Motifs and clusters can all be displayed or selected individually. We can display passing sequences in all regions of the field, or individually (defense, middle, or attack). The segments of a passing sequence (red, blue, or green) can all be selected or displayed individually. Finally, we can specify the region that contains the origin and destination, from all possible combinations to selections (middle to attack, defense to attack, etc.). It is important to stress that the user interface has coordinated views. Therefore changes in the selections reflect in the information displayed (e.g., the passing sequences).

5.3. Histogram of player participation in motifs

A histogram is displayed to reveal how many times a given player participated in the selected motifs (Fig. 6(c)). Since all views are coordinated, any changes in the selection or filtering parameters are automatically updated in the histograms. The user can also select specific players in the histogram to narrow the selection to those players. This was a special feature requested by the expert: a report on the motif participation of a specific player. The motifs of all matches selected in the previous step are displayed in an analysis window that supports narrowing the scope of the study using several filters. By default, all motifs are displayed, but filtering allows choosing a specific sequence motif or trajectory cluster. We can select motifs that start and end at a specific region of the pitch (e.g., passes from defense to middle field), as well as change visual cues used to display motifs (such as colors and transparency of passing segments).
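The filters of Section 5.2 and the player histogram of Section 5.3 amount to simple predicates and counts over a list of motif records. The sketch below is illustrative only: the `MotifRecord` fields, the 0 to 100 coordinate range along the pitch, and the split into equal thirds for defense, middle, and attack are assumptions, not the system's actual data model.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class MotifRecord:
    team: str
    pattern: str      # "ABAB", "ABAC", "ABCA", "ABCB" or "ABCD"
    cluster: int      # trajectory cluster, 1..8
    players: tuple    # player ids in passing order (four entries)
    start_x: float    # x of the first pass, assumed 0..100 along the pitch
    end_x: float      # x where the third pass is received

def zone(x):
    """Map an x coordinate to a pitch third (assumed equal thirds)."""
    if x < 100 / 3:
        return "defense"
    if x < 200 / 3:
        return "middle"
    return "attack"

def filter_motifs(records, team=None, patterns=None, clusters=None,
                  origin=None, destination=None):
    """Keep records matching every filter that is not None."""
    def keep(r):
        return ((team is None or r.team == team)
                and (patterns is None or r.pattern in patterns)
                and (clusters is None or r.cluster in clusters)
                and (origin is None or zone(r.start_x) == origin)
                and (destination is None or zone(r.end_x) == destination))
    return [r for r in records if keep(r)]

def player_histogram(records):
    """Number of selected motifs in which each player takes part."""
    counts = Counter()
    for r in records:
        counts.update(set(r.players))   # count a player once per motif
    return counts
```

Recomputing `player_histogram(filter_motifs(...))` after every selection change is one simple way to keep a histogram consistent with the other views.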

Fig. 7. Soccer motifs frequency for Brazilian Serie A. Top two teams in the league
were also the top two teams on three-passing sequences. Histograms are ordered
by final place in the fixture ranking.
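Motif frequencies such as those in Fig. 7 can be obtained by first splitting a team's passes into ball possessions (the two constraints of Section 3.1) and then sliding a three-pass window over each possession (Section 3.2). The sketch below assumes a simplified `Pass` record with passer, receiver, and time in seconds for a single team, ordered by time; the names are illustrative and do not correspond to fields of the Opta or Sentio feeds.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Pass:
    passer: str
    receiver: str
    t: float          # time of the pass in seconds

def possessions(passes, t_max=5.0):
    """Split time-ordered passes of one team into possessions.

    A possession continues while each pass starts from the previous receiver
    and follows the previous pass within t_max seconds.
    """
    out, current = [], []
    for p in passes:
        if current and (p.passer != current[-1].receiver
                        or p.t - current[-1].t > t_max):
            out.append(current)
            current = []
        current.append(p)
    if current:
        out.append(current)
    return out

def motif_label(three_passes):
    """Label three consecutive passes as ABAB, ABAC, ABCA, ABCB or ABCD."""
    players = [three_passes[0].passer] + [p.receiver for p in three_passes]
    letters = {}
    for player in players:
        if player not in letters:
            letters[player] = "ABCD"[len(letters)]   # next unused letter
    return "".join(letters[player] for player in players)

def motif_counts(passes, t_max=5.0):
    """Count motifs over all three-pass windows of every possession."""
    counts = Counter()
    for poss in possessions(passes, t_max):
        for i in range(len(poss) - 2):
            counts[motif_label(poss[i:i + 3])] += 1
    return counts
```

Normalizing the resulting counts by the number of matches or possessions would make teams with different amounts of data comparable.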

Fig. 8. Top four teams on defense, middle, and offense ordered by cluster 2. Rows correspond to clusters 1–8. The sequence of four columns under each team corresponds to motifs ABAB, ABAC, ABCA, and ABCB. We show the final fixture position in the league after the team name (in parentheses). The first three teams in the league (Corinthians, Atlético-MG and Grêmio) are in the top four ranking in the middle and offense motifs, while Corinthians is the only one to appear on defense (in first place).

5.4. Statistics of motifs and trajectories

We summarize the statistics of motifs and trajectories to allow comparison (Fig. 6(d)). Also, we display statistics of the number of motifs that start and end at specific regions of the pitch. This was a special request of our experts, who want to inspect motifs from the middle to attack.

5.5. Timeline summary and selection

A timeline is essential to summarize when the motifs take place during a match (Fig. 6(e)). Experts often want to focus the analysis on specific periods of the match in more detail (beginning or end of the match). The timeline both displays a summarization of the motifs in time and allows the user to define the start and end of the time interval to consider. In the x-axis, we used the time of the match when the pass sequence occurred. Thus values vary between 0 and 90 min (generally). For the y-axis, we used the real y-coordinate but scaled to the view map. When analyzing small trajectories, this visualization is very useful to know at which minute a team has passing sequences on the right or left of the field. If the team is playing at home, the top of the visualization corresponds to the left band, and the bottom side corresponds to the right band. Selections in the timeline are coordinated and reflect in the other views.

5.6. Display of flow motifs on pitch and heatmap

The current selection of soccer flow motifs is displayed over the pitch. The analyst can inspect individual motifs and check which players are involved (Fig. 6(f) and (g)). It is important to mention that the trajectories drawn over the field correspond to the actual position of players, not the ones used during the clustering process. Each of the three pass segments is displayed using a color mapping. By default, we use, in order, red, green, and blue for the segments. To represent player positions, we use a circle, cross, square, and a triangle mapped to the role of the player on a motif structure: A, B, C or D, respectively. We used a trajectory heatmap as an alternative view to the pitch visualization. The rainbow colormap used in our visualizations is the standard choice for displaying heatmaps in soccer broadcasting and statistical sites. The difference from standard soccer heatmaps is that we use the pass trajectory instead of player positions. The trajectory heatmap is used to represent the passing sequences in a given time window. Trajectory heatmaps solve the problem of occlusion while visualizing multiple motifs. We map colors to the number of times the trajectory passed through a point on the pitch. Thus, yellow and red areas represent parts of the pitch where passes occurred more often.

This visualization design suffers from severe overplotting when there are a great number of trajectories displayed, as is the case in Fig. 6(f) and (g). However, as we will demonstrate with the case studies that follow, the selection in the coordinated interface of specific passing sequences (by motif, cluster, team, time, a region of the field) has the effect of greatly reducing the number

Fig. 9. Display of flow motifs on the pitch and trajectory heatmaps. Top: Corinthians. Bottom: Grêmio. Columns represent the type of the soccer motif structure. The
trajectory heatmaps represent places where players moved the ball more frequently in the side peak cluster. (For interpretation of the references to color in this figure,
the reader is referred to the web version of this article.)

of trajectories. Therefore, the analysis can become more interesting,
even to the point of listing the names of the players involved in the
passing sequence and other details.

5.7. Frequency motif matrix

The interplay of sequence motifs (five) and trajectory clusters
(eight) leads to 40 possible combinations. We use a frequency
motif matrix to display the count of each of these combinations.
The motif ABCD was excluded from the matrix because
its frequency is too high in comparison with the other motifs.
Because we scale the color of each matrix cell by the corresponding count,
it ended up dominating the visualization. Not
having the ABCD motif is a limitation of our system, but it allowed
us to focus the analysis on motifs with at least one repeating player.
For each team, we create separate matrices for each pitch zone in
which the passing sequences occur: defense, middle, and offense.
It is possible to reorder teams by zone, motif structure, and cluster.
The display of different matrices side-by-side is useful for comparing
different passing patterns.

6. Brazilian Serie A 2015 case study

The first case study to validate our approach uses the 2015
Brazilian Serie A dataset. We present three types of analysis: team
characterization based on cluster frequency, a comparison between
teams based on the passing clusters displayed on the pitch, and an
evaluation of individual players based on motifs.

6.1. Overall analysis of cluster frequency

The frequency of the eight discovered trajectory clusters gives an
overall insight into the matches (Fig. 7). At the top we display two
histograms: one including all motifs and one including all but the ABCD motif.
For simplicity when showing these histograms to soccer experts,
we refer to motifs that have at least one repeated player as structured
motifs (assuming a player repetition is a sign of structure
in the passing sequence). Similarly, we call the motif
ABCD (all different players) unstructured.

Cluster 1 has the highest number of passing sequences, while
clusters 2 and 3 each have nearly half the number of cluster 1. In the
histogram without motif ABCD, the order of clusters 1 and 2 as the
most applied sequences changes, and clusters 5 and 6 show a similar
distribution. In the remaining discussion, we
use the two most frequent clusters, 1 and 2, to investigate passing
behavior. Fig. 7 also shows the histogram of clusters per team,
ordered by final fixture place. Top-ranked teams have more passing
motifs than lower-ranked teams.

We use the frequency motif matrix to obtain an overview of
the motif distribution. We display the result for each team side-by-side,
ordered by the total number of passing sequences in cluster 2
(Fig. 8). The matrix was filtered to show the clusters for the
top four teams in different zones (defense, middle, and offense).
Corinthians far exceeds second-placed Vasco on defensive motifs,
largely on clusters 1, 2, 4 and 8 with ABCB motifs. In general,
ABCB and ABAC motifs both combine a simple pass with
a wall pass. They differ in whether the wall pass was made at the
beginning or end of the sequence. Motifs ABAC and ABCB were the

top two used by Corinthians. On the other hand, offensive motifs
have an equal distribution. Corinthians sustained a high frequency
in clusters 1, 2, 4, and 8. We notice that the top three teams from
the final standings appear in top positions in the offensive region.
However, of those three teams, Corinthians is the only one that
appears in the defensive ranking. In the middle, we observe that
teams with more structured passing finished in the top positions
of the final fixture. Although the pattern of motif occurrence in
the top three teams is similar, the biggest difference lies in the first
column (ABAB). Corinthians used more passes of cluster 6, which
corresponds to a crossing shape. Trajectories in this cluster begin and end at
nearby positions, which is suitable to keep ball possession without
compromising the ability to transport the ball forward.
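
The motif labels and the per-zone frequency motif matrix discussed in this section can be illustrated with a short sketch. It is not the implementation used in the system: the function names `motif_label` and `frequency_matrix`, the input layout (a list of four player identifiers plus a cluster number from 1 to 8), and the sample identifiers are assumptions made for illustration. As in the matrix described above, the unstructured motif ABCD is left out of the counts.

```python
from collections import Counter

# The four structured motifs, in the column order used by the matrix.
STRUCTURED = ("ABAB", "ABAC", "ABCA", "ABCB")

def motif_label(players):
    """Map a three-pass sequence (four player identifiers) to its motif label."""
    if len(players) != 4:
        raise ValueError("a three-pass sequence involves exactly four touches")
    letters = {}
    label = ""
    for player in players:
        if player not in letters:
            # First appearance of a player gets the next unused letter.
            letters[player] = "ABCD"[len(letters)]
        label += letters[player]
    return label

def frequency_matrix(sequences):
    """Count structured motifs per trajectory cluster.

    sequences: iterable of (players, cluster) pairs, cluster in 1..8.
    Returns 8 rows (clusters 1..8) by 4 columns (ABAB, ABAC, ABCA, ABCB).
    """
    counts = Counter()
    for players, cluster in sequences:
        motif = motif_label(players)
        if motif in STRUCTURED:  # ABCD is excluded from the matrix
            counts[(cluster, motif)] += 1
    return [[counts[(c, m)] for m in STRUCTURED] for c in range(1, 9)]

# Hypothetical sample data: player identifiers and cluster numbers are made up.
sample = [
    (["p1", "p2", "p1", "p3"], 2),  # ABAC
    (["p1", "p2", "p3", "p2"], 2),  # ABCB
    (["p1", "p2", "p3", "p4"], 1),  # ABCD, ignored
    (["p5", "p6", "p5", "p7"], 2),  # ABAC
]
matrix = frequency_matrix(sample)
```

One such matrix would be built per team and per pitch zone (defense, middle, offense), and the cell counts would then drive the color scale.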

6.2. Analysis of motifs over the pitch

We display the flow motifs over the pitch and heatmap to
find where passing sequences occur. We show results for top
teams Corinthians and Grêmio using the four structured motifs.
Fig. 9 shows a predominance of a vertical pass at the end of the
sequence (shown in green) for motif ABAC and of vertical passes at
the start (shown in red) for motif ABCB. One reason for this behavior
might be that the long pass from cluster 2 depends on the
position of the third different player in the sequence. Results show
that the long pass, after or before the wall pass, is mainly between
the left and right sides, probably due to a change in the direction of
play. In the trajectory heatmaps, we compare the defensive passing
sequences of Corinthians to those of Grêmio. Corinthians uses the center of
the field more consistently than Grêmio, which prefers the left and
right sides.
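
A trajectory heatmap of the kind compared here maps color to how many passes cross each point of the pitch. The sketch below shows one way to accumulate such counts on a grid. It is an illustration under stated assumptions, not the authors' code: the pitch size of 105 by 68 units, the 21 by 17 grid, the sampling of each pass at evenly spaced points, and the name `trajectory_heatmap` are all hypothetical choices, and coordinates are assumed to lie inside the pitch.

```python
def trajectory_heatmap(trajectories, width=105.0, height=68.0,
                       nx=21, ny=17, samples=50):
    """Count, for each grid cell, how many pass segments cross it.

    trajectories: list of trajectories, each a list of (x, y) points in
    pitch coordinates; consecutive points are the two ends of one pass.
    Returns a list of ny rows, each with nx counts.
    """
    grid = [[0] * nx for _ in range(ny)]
    for points in trajectories:
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            cells = set()
            # Approximate the segment by evenly spaced sample points.
            for i in range(samples + 1):
                t = i / samples
                x = x0 + t * (x1 - x0)
                y = y0 + t * (y1 - y0)
                col = min(int(x / width * nx), nx - 1)
                row = min(int(y / height * ny), ny - 1)
                cells.add((row, col))
            # Each pass contributes at most once to a given cell.
            for row, col in cells:
                grid[row][col] += 1
    return grid

# Two made-up single-pass trajectories that cross in one cell.
grid = trajectory_heatmap([
    [(2.5, 2.0), (12.5, 2.0)],   # horizontal pass
    [(7.5, 2.0), (7.5, 10.0)],   # vertical pass
])
```

Because counts are accumulated per cell rather than drawn as overlapping lines, many trajectories can be shown at once without the occlusion that affects the pitch view.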

6.3. Individual player analysis

We perform individual player analysis to find which players
have similar patterns in passing strategies. We compare the player
histogram and the flow motifs displayed on the pitch with glyphs
mapped to player positions. In Fig. 10 we show the histograms
for all games of Corinthians to understand player participation in
passing motifs and how it changed over the course of the tournament.
We show 18 histograms for nine games played at home and nine
away. Also, we highlight the top three players. Renato Augusto
led structured passes in five of the nine home matches. This
pattern does not occur in away games, where Jadson leads in
three games and Renato Augusto leads in two. We also noticed
an absence of top players in the last two matches of Corinthians.
Since Corinthians was already the champion, these matches were played
with reserve players.

Fig. 10. Corinthians players histogram for home and away matches, cluster 1, and
structured motifs. We use colors to map the player names, which are more frequent
in the top places. Renato Augusto was the top Corinthians player at home, but away
Jadson was more important.
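
The per-match player histograms described in this section amount to counting how often each player takes part in structured motifs and keeping the top entries. A minimal sketch follows, assuming each passing sequence is given as a list of four player identifiers and that a player is counted once per sequence. The text does not specify the counting rule, so that choice, the function name `top_players`, and the sample identifiers are assumptions.

```python
from collections import Counter

def top_players(sequences, k=3):
    """Rank players by participation in structured passing sequences.

    sequences: iterable of four-player lists, one per three-pass sequence.
    A sequence is structured when at least one player appears twice.
    Returns the k most frequent (player, count) pairs.
    """
    counts = Counter()
    for players in sequences:
        distinct = set(players)
        if len(distinct) < len(players):  # repeated player: structured motif
            counts.update(distinct)       # count each player once per sequence
    return counts.most_common(k)

# Hypothetical sequences from a single match.
match = [
    ["a", "b", "a", "c"],  # ABAC, structured
    ["a", "b", "c", "d"],  # ABCD, unstructured and ignored
    ["b", "c", "b", "a"],  # ABAC, structured
    ["a", "d", "a", "d"],  # ABAB, structured
]
ranking = top_players(match, k=4)
```

Running the function once per match, separately for home and away games, would give the kind of series shown in Fig. 10.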

7. Turkish Super League 2016–2017 case study

In the second case study, we use the data from the first half of
the 2016/17 Turkish Super League season. The accompanying video
illustrates our system in action for this dataset. The analysis was
carried out together with professional soccer analysts from four
different Turkish teams. They expressed their interest in knowing
which players are more involved in the execution of structured passes
and in which regions of the pitch these passes are performed.
Therefore, we structured our analysis accordingly and refined it
based on the feedback received.

Fig. 11. Report of a Turkish Super League match: heatmaps highlight the regions
where the passes were performed and histograms show which players participated
most in those passes. This match was a 2-1 victory of Basaksehir (playing at home)
over Rizespor.

We analyzed all 16 matches of Basaksehir FC. One of the main
interests for analysts is the regions of the pitch where the structured
and non-structured passes are executed in a match. Our system
helps the analyst with this task by providing tools that
allow the visualization of these regions using heatmaps and histograms
with the number of passes executed by each player. The

user can select the motifs, clusters, or regions of the pitch to visualize.
Fig. 11 shows the histograms and heatmaps with structured
and non-structured passes of Basaksehir players in a home match
against Rizespor that resulted in a 2–1 victory for Basaksehir. A similar
report was produced for all matches of Basaksehir and sent to
their coaching staff.

The analysts recognized an interesting aspect in the participation
of Martin Skrtel in structured passes in the middle. Despite
being a central defender, Skrtel is the player with the most participation
in midfield structured and non-structured passes at Fenerbahçe.
This emphasizes his importance to the play-making of his
team. Most of Skrtel's non-structured passes are performed with
defenders such as Neustädter and Ali Kaldirim. However, when we select
only structured passes, the player that exchanged the most passes
with Skrtel is Souza, a midfielder. The analysts believe this reveals
his importance to the creation system of Fenerbahçe.

We prepared a report to summarize these observations for the
consulted analysts. A highlight of this report is shown in Fig. 12.
We show the most common Skrtel passing partners in home
matches for all, structured and non-structured passes. In our interactive
tool, we can select the passing sequence directly over the
pitch to display the players involved and the time the passing sequence
happened. This allows us to check who the passing
partners of Skrtel are in different locations of the field.

Fig. 12. Top players interacting with the Fenerbahçe player, Skrtel, in the middle field. Souza is the preferred partner of Skrtel in structured passes. Selections of specific
cluster type or zone of the field are useful to identify specific passing sequences, including players involved.

8. Discussion

Understanding soccer strategies is a challenging task due to
the complexity of the game. The approach discussed in this paper
moves one step closer to understanding the key aspect of passing
strategies. The work of Gyarmati et al. [17] brought attention
to the study of the combinatorial structure of passing sequences. In
this work, we extended their work by adding the analysis of soccer
trajectories. We believe that our visual designs can be helpful
to understand global and individual aspects of passing sequences.
For example, the heatmap display is helpful to understand global
strategies, as demonstrated in Fig. 9 where Corinthians blocks the
center of the field better than Grêmio, which prefers the sides of the
pitch. This visualization was particularly useful when filtered by
the type of motifs. The display of motifs on the pitch becomes useful
when performing individual player analysis, as demonstrated
in Fig. 12. In this example, we study the motifs associated with a
given player to understand his trajectory styles by looking at the
different trajectory cluster types. To reduce overplotting, we can
narrow the analysis to specific intervals in the match using the
timeline, allowing us to study individual passing sequences, which
makes the display of motifs on the pitch more useful.

Nevertheless, our approach still has some limitations. For example, our
prototype does not include a link to the video footage. This would
be useful to study a single selected passing sequence. Also, the
study of passing sequences in the current interface relies on exploratory
analysis. We believe that our proposal can be even more
useful if combined with analytics algorithms that suggest passing
sequences. For example, a sketch-based interface could allow the
analyst to draw a given trajectory, and the system would automatically
retrieve similar trajectories for further analysis.

9. Conclusions

In this work, we proposed a visual analytics system to support
the analysis of soccer flow motifs. The combined analysis of soccer
flow motifs and trajectory clusters supports the exploration of
different passing patterns. We combine the notion of a soccer flow
motif with passing trajectories. We describe a two-phase trajectory
clustering technique, which led us to discover eight representative
types of passing trajectories.

We designed a visual analytics system that supports filtering
and coordinated views of the data, which we validated using
matches from both the Brazilian and Turkish leagues. In the Brazilian
league, the frequency motif matrix allowed us to compare the
usage of flow motifs for all teams in a league, separating frequencies
by shape, passing structure and pitch region. Using our approach
we recognized the defensive style of the league champion
Corinthians for Side Peak (single + wall) passes. We provided methods
for space and time exploration of passing sequences using the
flow motifs on pitch and trajectory heatmaps to reduce clutter
while analyzing multiple matches. We also demonstrated
how useful individual player analysis can be for soccer managers
assessing performance.

A positive aspect of our results is the deployment of our prototype
to experts involved with teams in the Turkish league. Important
feedback we received from the coaching staff is the importance
of player analysis. Such feedback helped us improve our
prototype, but more importantly, allowed analysts to reach conclusions
about their matches that were not possible with previous analysis
tools. We prepared, as a special request, reports for a team about
the passing strategies for an upcoming match.

We foresee that our prototype can also be useful for the scouting
analysis used for hiring new players. In the future, we would
like to continue the collaboration with soccer teams from premier
leagues. We would like to add spatiotemporal position data of
players to our tool. Also, we want to add links from our analysis
tool to video footage of the soccer match to improve the analysis
of specific passing sequences, as well as a sketch-based interface
to support searching for specific trajectory patterns.

Declaration of Competing Interest

The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgments

The authors wish to thank the anonymous reviewers for their
valuable comments and suggestions to improve the quality of the
paper. We are especially grateful to Opta Sports and Sentio Sports
for generously providing us with the soccer data sets of the Brazilian
2015 league and the Turkish Super League 2016–2017. This
study was partially supported by the Coordenação de Aperfeiçoamento
de Pessoal de Nível Superior – Brasil (CAPES) – Finance
Code 001, CNPq 308851/2015-3, CNPq 426397/2018-5, TUBITAK under
the grant number 118C019 and by ODTU BAP under project
code YOP-312-2018-2816.

Supplementary material

Supplementary material associated with this article can be
found, in the online version, at doi:10.1016/j.cag.2019.08.010.

References

[1] Opta Sports Pro. Opta F24 feed. http://optasports.com/, 2019.
[2] Wilson J. Inverting The Pyramid: The History of Soccer Tactics. Nation Books; 2013. ISBN 9781568587387.
[3] Link D, Weber H. Using individual ball possession as a performance indicator in soccer. Proceeding of the 2015 KDD workshop on large-scale sports analytics. Sydney, Australia; 2015.
[4] Shao L, Sacha D, Neldner B, Stein M, Schreck T. Visual-interactive search for soccer trajectories to identify interesting game situations. Visualization and Data Analysis; 2016.
[5] Gyarmati L, Anguera X. Automatic extraction of the passing strategies of soccer teams. In: Proceedings of the 2015 KDD workshop on large-scale sports analytics; 2015. p. 0–3.
[6] Hughes M, Franks I. Analysis of passing sequences, shots and goals in soccer. J Sports Sci 2005;23(5):509–14. PMID: 16194998

[7] Rein R, Raabe D, Memmert D. “Which pass is better?” novel approaches to assess passing effectiveness in elite soccer. Hum Movem Sci 2017;55:172–81.
[8] Gudmundsson J, Wolle T. Football analysis using spatio-temporal tools. Comput Environ Urban Syst 2014;47:16–27.
[9] Lucey P, Bialkowski A, Carr P, Morgan S, Matthews I, Sheikh Y. Representing and discovering adversarial team behaviors using player roles. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition (CVPR); 2013. p. 2706–13.
[10] Peña J, Touchette H. A network theory analysis of football strategies. 2012. arXiv:1206.6904.
[11] Wei X, Sha L, Lucey P, Morgan S, Sridharan S. Large-Scale analysis of formations in soccer. In: Proceedings of the 2013 international conference on digital image computing: techniques and applications (DICTA); 2013. p. 1–8.
[12] Lucey P, Oliver D, Carr P, Roth J, Matthews I. Assessing team strategy using spatiotemporal data. In: Proceedings of the nineteenth ACM SIGKDD international conference on knowledge discovery and data mining; 2013. p. 1366–74.
[13] Bialkowski A, Lucey P, Carr P, Yue Y, Sridharan S, Matthews I. Identifying team style in soccer using formations learned from spatiotemporal tracking data. In: Proceedings of the 2014 IEEE international conference on data mining workshop; 2014. p. 9–14.
[14] Milo R, Itzkovitz S, Kashtan N, Chklovskii DMB. Network motifs: simple building blocks of complex networks. Science 2002;298(5594):824–7.
[15] Lucey P, Bialkowski A, Monfort M, Carr P, Matthews I. Quality vs quantity: improved shot prediction in soccer using strategic features from spatiotemporal data. MIT Sloan Sports Analytics Conference; 2015. http://www.sloansportsconference.com/wp-content/uploads/2015/02/SSAC15-RP-Finalist-Quality-vs-Quantity.pdf
[16] Bialkowski A, Lucey P, Carr P, Yue Y, Sridharan S, Matthews I. Large-scale analysis of soccer matches using spatiotemporal tracking data. In: Proceedings of the 2014 IEEE international conference on data mining; 2014. p. 725–30.
[17] Gyarmati L, Kwak H, Rodriguez P. Searching for a unique style in soccer. Soc Inf Netw 2014.
[18] Stein M, Breitkreutz T, Haussler J, Seebacher D, Niederberger C, Schreck T, et al. Revealing the invisible: visual analytics and explanatory storytelling for advanced team sport analysis. In: Proceedings of the 2018 international symposium on big data visual and immersive analytics; 2018. p. 1–9.
[19] Perin C, Vuillemot R, Stolper CD, Stasko JT, Wood J, Carpendale S. State of the art of sports data visualization. Comput Graph Forum 2018;37(3):663–86.
[20] WhoScored. Man. city vs leicester. https://www.whoscored.com/Matches/1285050/Live/, 2019.
[21] Footscope. Footoscope: Fifa world cup south africa. http://www.footoscope.com/worldcup2010/, 2010.
[22] Rusu A, Stoica D, Burns E, Hample B, McGarry K, Russell R. Dynamic visualizations for soccer statistical analysis. In: Proceedings of the 2010 fourteenth international conference information visualisation (IV); 2010. p. 207–12.
[23] Legg P, Chung D, Parry M, Jones M, Long R, Griffiths I, et al. Matchpad: interactive glyph-based visualization for real-time sports performance analysis. Comput Graph Forum 2012;31(3pt4):1255–64.
[24] Goldsberry K. Courtvision: new visual and spatial analytics for the NBA. In: Proceedings of the MIT sloan sports analytics conference; 2012.
[25] Pileggi H, Stolper CD, Boyle JM, Stasko JT. Snapshot: visualization to propel ice hockey analytics. IEEE Trans Vis Comput Graph 2012;18(12):2819–28.
[26] Legg PA, Chung DHS, Parry ML, Bown R, Jones MW, Griffiths IW, et al. Transformation of an uncertain video search pipeline to a sketch-based visual analytics loop. IEEE Trans Vis Comput Graph 2013;19(12):2109–18.
[27] Perin C, Vuillemot R, Fekete JD. Soccerstories: a kick-off for visual soccer analysis. IEEE Trans Vis Comput Graph 2013;19(12):2506–15.
[28] Janetzko H, Sacha D, Stein M, Schreck T, Keim D, Deussen O. Feature-driven visual analytics of soccer data. In: Proceedings of the 2014 IEEE conference on visual analytics science and technology (VAST); 2014. p. 13–22.
[29] Rosenthal S. Football drawings. http://www.susken-rosenthal.de/fussballbilder/, 2019.
[30] Stein M, Sacha D. Enhancing parallel coordinates: statistical visualizations for analyzing soccer data. Electron Imag 2016:1–8.
[31] Stein M, Janetzko H, Lamprecht A, Breitkreutz T, Zimmermann P, Goldlucke B, et al. Bring it to the pitch: combining video and movement data to enhance team sport analysis. IEEE Trans Vis Comput Graph 2018;24(01):13–22.
[32] Machado V, Leite R, Moura F, Cunha S, Sadlo F, Comba JL. Visual soccer match analysis using spatiotemporal positions of players. Comput Graph 2017;68:84–95.
[33] Wu Y, Xie X, Wang J, Deng D, Liang H, Zhang H, et al. Forvizor: visualizing spatio-temporal team formations in soccer. IEEE Trans Vis Comput Graph 2019;25(1):65–75.
[34] Sentio Sports feed. https://sentiosports.com/, 2019.
[35] Sumpter D. Soccermatics: Mathematical adventures in the beautiful game. Bloomsbury Sigma; 2016.
[36] Wobbrock JO, Wilson AD, Li Y. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the twentieth annual ACM symposium on user interface software and technology UIST 07, 85; 2007. p. 159.
[37] Li Y. Protractor: a fast and accurate gesture recognizer. In: Proceedings of the twenty-eighth international conference on human factors in computing systems; 2010. p. 2169–72.
[38] Kodinariya TM, Makwana PR. Review on determining number of cluster in K-means clustering. Int J Adv Res Comput Sci Manag Stud 2013;1(6):2321–7782.
[39] Holten D, van Wijk JJ. Force-directed edge bundling for graph visualization. In: Proceedings of the eleventh Eurographics/IEEE – VGTC conference on visualization EuroVis’09. UK: Chichester; 2009. p. 983–98.
