You are on page 1of 9

PlayeRank: Multi-dimensional and role-aware rating

of soccer player performance


Luca Pappalardo Paolo Cintia Paolo Ferragina
ISTI-CNR, Pisa, Italy ISTI-CNR, Pisa, Italy University of Pisa, Italy
luca.pappalardo@isti.cnr.it paolo.cintia@isti.cnr.it paolo.ferragina@unipi.it

Emanuele Massucco Dino Pedreschi Fosca Giannotti


Wyscout, Chiavari, Italy University of Pisa, Italy ISTI-CNR, Pisa, Italy
emanuele.massucco@wyscout.com dino.pedreschi@unipi.it fosca.giannotti@isti.cnr.it
arXiv:1802.04987v2 [stat.AP] 16 Feb 2018

ABSTRACT so-called soccer-logs [7, 20, 25], which detail all the events players
The problem of rating the performance of soccer players is attract- generate during a game (e.g., tackles, passes, fouls, shots, dribbles,
ing the interest of many companies, websites, and the scientific etc.). However, these solutions suffer from four main limitations.
community, thanks to the availability of massive data capturing First, existing approaches are mono-dimensional, in the sense that
all the events generated during a game (e.g., tackles, passes, shots, they focus on a single aspect of the performance of a player (gener-
etc.). Existing approaches fail to fully exploit the richness of the ally passes or shots [3, 6, 11, 19]), and thus they do not exploit the
available data and lack of a proper validation. In this paper, we richness of events and attached meta-information present in soccer-
design and implement PlayeRank, a data-driven framework that logs. Second, they rate performance without taking into account
offers a principled multi-dimensional and role-aware evaluation the player’s role on the field (e.g., defender, midfielder, forward)
of the performance of soccer players. We validate the framework and their interplay in contributing to the game outcome. Third,
through an experimental analysis advised by soccer experts, based missing a gold standard, all experimental results are not scientifi-
on a massive dataset of millions of events pertaining four seasons cally validated and, in fact, they report judgments consisting just
of the five prominent European leagues. Experiments show that of informal interpretations based on some simplistic metrics (e.g.,
PlayeRank is robust in agreeing with the experts’ evaluation of market value or goals scored [3, 24, 26]). Finally, apart from the
players, significantly improving the state of the art. We also explore few and partial intrinsic evaluations mentioned above, none has
an application of PlayeRank — i.e. searching players — by introduc- considered the extrinsic evaluation of the impact of player ratings
ing a special form of spatial query on the soccer field. This shows onto more sophisticated tools such as specialized search engines,
its flexibility and efficiency, which makes it worth to be used in the which could eventually constitute the backbone of an advanced
design of a scalable platform for soccer analytics. platform for soccer player mining and retrieval.
In this paper we address all these limitations by designing and
CCS CONCEPTS implementing a framework, called PlayeRank, which offers a prin-
cipled multi-dimensional and role-aware evaluation of the perfor-
• Information systems → Data mining; Information retrieval;
mance of soccer players, driven only by the massive and variegate
KEYWORDS amount of standardized soccer-logs currently produced by several
companies (i.e., Wyscout, Opta, Stats). Our framework is designed
Sports Analytics, Clustering, Searching, Ranking, Multi-dimensional around the orchestration of the solution to five intermediate and
analysis, Predictive Modelling, Data Science non trivial sub-problems: (i) we model the performance of a player
in a game as a multidimensional vector of features extracted from
soccer-logs; (ii) since we do not have a ground-truth for “learning”
1 INTRODUCTION the mapping from these features to performance evaluation, we
Data-driven soccer player ratings are used nowadays by companies turn the original problem into a classification problem between the
and websites, such as WhoScored.com and Eplindex.com, as well performance of teams in a game and the result achieved in that
as by television broadcasts and sports commentators, with the game, hence deriving proper features weights; (iii) we use these
purpose of improving fan engagement. Unfortunately, neither the weights to define the rating of a player in a game as a weighted
algorithms behind these tools are publicly available nor they are combination of the features characterizing his performance; (iv)
evaluated by a peer-review process, thus making it impossible to given that there are different player roles in soccer, we identify, in
assess the soundness of the proposed ratings. an unsupervised way, a set of player roles from the game events
The problem is gaining interest in the scientific literature too, available in the soccer-logs; and finally (v) we compute a set of
with solutions that evaluate the performance of players in soccer role-based rankings of soccer players, one per each role, by taking
and other team sports. These works exploit the massive data streams into account all the information derived in the previous steps.
generated by (semi-)automated sensing technologies, such as the The second contribution of this paper is an extensive and sys-
tematic experimental analysis of PlayeRank, advised by a group
of soccer experts that took into account a massive and unique
dataset made available by Wyscout (the leading company for soccer
L. Pappalardo et al.

scouting), capturing 14 millions of events covering 7,304 games next goal. Players are then ranked according to the aggregate value
and 1,192 players in the last four seasons of the five prominent of their actions, and compared with players of the same role. The
European leagues: La Liga (Spain), Premier League (England), Serie choice of linking performance to chance of scoring has several draw-
A (Italy), Bundesliga (Germany) and Ligue 1 (France). Experiments backs. First, SI understates the contribution of defenders, whose
show that PlayeRank is robust in agreeing with a ranking of players main purpose is to stop the opponents’ attack actions. Second, SI it
given by the experts, and it does not require any human interven- is hardly applicable to soccer, where goals are rare events.
tion because it is fully data-driven. We also show that PlayeRank’s A second strand of literature focuses on quantifying the re-
agreement with the soccer experts is up to 16% (relative) and 13% lation between proxies of a player’s quality, like market value,
(absolute) better than the state-of-the-art PSV metric [3]. wage or popularity, and his performance on the field. Stanojevic
Such precious properties make PlayeRank able to automatically and Gyarmati [24] use soccer-logs to infer the relation between a
provide ranking and searching functionalities for a scalable soc- player’s typical performance and his market value as estimated by
cer analytics platform that processes several seasons and leagues. crowd estimations. They find a large discrepancy between estimated
Hence the third contribution of our paper is to explore an extrinsic and real market values, due to the lack of important information
application of PlayeRank in the context of a search engine, which such as injury-proneness and commercialization capacity. Müller
allows the user to efficiently search the “best players” that pitch in et al. [14] develop a similar approach and use soccer-logs, as well as
zones of the soccer field properly defined at query time. players’ popularity data and market values in the preceding years,
In conclusion, PlayeRank offers state-of-the-art performance in to estimate a player transfer fee. They show that for the low- and
rating, ranking and searching soccer players. This foresees its use medium-priced players the estimated market values are comparable
in the design of a scalable platform for mining and retrieving soccer to estimations by the crowd, while the latter performs better for
players, and opens an interesting set of new problems in soccer the high-priced players. Torgler and Schmidt [26] investigate what
analytics that we state and comment at the end of the paper. shapes performance in soccer, represented as a player number of
goals and assists. They find that salary, age and team effects have a
statistically significant impact on a player performance on the field.
2 RELATED WORKS
The availability of massive data portraying soccer performance has 3 EVALUATION OF SOCCER PLAYERS
facilitated recent advances in soccer analytics. The so-called soccer-
logs, capturing all the events occurring during a game, are the most The main goal of this paper is to introduce PlayeRank, a framework
common data format and have been used to analyze many aspects that ranks soccer players according to the quality of their perfor-
of soccer, both at team [4, 12, 16, 27] and individual level [3, 6, 19]. mance in a series of games. PlayeRank consists of five principled
Among all the open problems in soccer analytics, the data-driven algorithmic steps which hinge onto the massive soccer-logs pro-
evaluation of player performance is the most challenging one, given vided to us by Wyscout (the leading company for soccer scouting),
the absence of a ground-truth for that performance evaluation. which we will describe in detail in Section 5.1.
Duch et al. [6] evaluate a player performance as his flow cen- In the first step, described in Section 3.1, PlayeRank models the
trality (FS), defined as the fraction of times a player intervenes in performance of a player in a game as a multidimensional feature
pass chains which end in a shot. Based on this metric, they rank all vector computed from the game events, available in soccer-logs,
players in Euro Cup 2008 and observe that 8 players in their top 20 describing the way that player pitched in that game. Since for all
list belong to the UEFA’s top 20 list released after the competition. team sports the performance of players is inextricably bound to the
Being based on pass centrality, the FS metric is very simplistic and performance of their team as a whole, in the second step (Section
it mostly makes sense for midfielders and forwards. 3.2) PlayeRank determines the importance (i.e., weight) of each
Brooks et al. [3] develop the Pass Shot Value (PSV), a metric one of those features based on its contribution to a game victory.
to estimate the importance of a pass for generating a shot. They In the third step, described in Section 3.3, the performance rating
represent a pass as a vector of 360 features describing the vicinity of a player in a single game is derived as the weighted sum of the
of a field zone to the pass’ origin and destination. Then, they use features describing his performance during that game; whereas,
a supervised machine-learning model to predict whether or not the player rating in a series of games is derived by aggregating the
a given pass results in a shot. The feature weights resulting from performance ratings reported in each single game of that series.
the model training are used to compute PSV as the sum of the A key issue in soccer is that there are different players roles hav-
feature weights associated with the pass’ origin and destination. By ing different distributions of the features [17], leading to possibly
conducting experiments on soccer-logs, they rank players in La Liga different magnitudes of their ratings. Therefore, the fourth step of
2012-13 according to their average PSV, showing that it correlates PlayeRank identifies in an unsupervised way a set of roles, based just
with the rankings based on assists and goals. Unfortunately, as on the game events available in soccer-logs (Section 3.4). Finally, the
the authors highlight in the paper, PSV is strongly biased towards fifth step of PlayeRank computes a set of role-based rankings of play-
offensive-oriented players. Moreover, PSV is a pass-based metric ers, one per role, which take into account the players’ performance
omitting all the other events players can generate during a soccer ratings during a series of games (Section 3.5).
game, and lacks of a proper validation with soccer experts. Problem definition. In PlayeRank, a soccer game consists of a set
Schulte and Zhao [23] propose the Scoring Impact metric (SI) of events encoded as a tuple ⟨id, type, position, timestamp⟩, where
to rank ice hockey players in NHL. SI metric assigns a value to id is the identifier of the player which originated/refers to this event,
each player action, depending on his team’s chance of scoring the type is the event type (i.e., passes, shots, goals, tackles, etc.), position
PlayeRank: role-aware rating of soccer players

game [16], which is the target of the classification problem solved


in the following Step #2.

3.2 Step #2: Weighting performance features


What makes performance evaluation a difficult task is that we do
д
not have an objective evaluation of the performance pu of each
individual player. This technically means that we do not have a
ground-truth to compute or learn a relation between performance
features and performance evaluation. On the other hand, we ob-
serve that the outcome of a game may be considered a natural proxy
for evaluating performance at team level. Therefore, we overcome
the limitation imposed by the missing ground-truth by propos-
ing a supervised approach: we determine the impact of features
onto a player performance by looking in turn at the team-wise
contribution of these features to the game outcome.
PlayeRank implements this syllogism via a two-phase approach.
In the first phase it solves a classification problem between the
д д
team performance vector pT and the outcome oT of game д: where
д д
Figure 1: Events produced by player Lionel Messi (Barcelona) oT = 1 indicates a victory for team T in game д and oT = 0 indicates
during a game in La Liga (Spain), season 2015/2016. Each a non-victory (i.e., a defeat or a draw). The team performance vector
д (T ) (T )
event is shown on the field at the position where it occurs. pT = [x 1 , . . . , x n ] is obtained by summing the corresponding
д
features over all the players UT composing team T in game д:
д
Õ д
and timestamp denote the spatio-temporal coordinates of the event pT [i] = pu [i].
д
over the soccer field. Figure 1 shows a graphical representation of u ∈UT
the events produced by a player during a game. Section 5.1 will This classification problem has been shown in [16] to be meaningful
detail such events for the Wyscout dataset used in our experiments. in that there is a strong relation between the team performance
The key task addressed by PlayeRank is the “evaluation of the vector and the game outcome. We deploy the constructed classifier
performance of a player u in a soccer game д”. This consists of to extract the weights w = [w 1 , . . . , w n ] which quantify the influ-
computing a numerical rating r (u, д) ∈ [0, 1] that aims at capturing ence of the features to game outcome. We then use the features and
the quality of the performance of u in д given only the set of events the corresponding weights to compute the performance ratings of
produced by that player in that game. This is a complex task because players, as specified in the next sub-section.
of the many events played in a game, the interactions among the
players, and the fact observed above that players performance is 3.3 Step #3: Computing performance ratings
inextricably bound to the performance of their team as a whole.
Given the multi-dimensional vector of features and their weights
PlayeRank addresses such complexity by the five steps sketched at
w, PlayeRank evaluates the performance of a player u in a game д
the beginning of this section, that we detail in the following. д
as the linear combination between w and pu = [x 1 , . . . , x n ]:
3.1 Step #1: Modeling player performance n
1 Õ
r (u, д) = wi × xi . (1)
Given that a game д is represented as a set of events, PlayeRank R i=1
models the performance of a player u in д by the n-dimensional
feature vector The quantity r (u, д) is called the performance rating of u in game д,
д where R is a normalization constant such that r (u, д) ∈ [0, 1].
pu = [x 1 , . . . , x n ]
Since the set of features adopted to train the classifier in Step #2
where x i is a feature that describes a specific aspect of u’s behavior does not include the number of goals scored in a game, but goals
in game д and is computed from the set of events played by u in could be important to evaluate the performance of some (offensive)
that game. To guarantee that all the features are expressed in the players, PlayeRank could be adapted to manage goals too via an
same scale, PlayeRank normalizes them in the range [0, 1]. In our adjusted-performance rating, defined as follows:
experiments at Section 5, we will consider a total of n=76 features
r ∗ (u, д) = α × norm_дoals + (1 − α) × r (u, д) (2)
extracted from the Wyscout dataset. Some features count some
events (e.g., number of fouls, number of passes, etc.), some others where norm_дoals indicates the number of goals scored by u in д
are at finer level in that they distinguish the outcome of those normalized in the range [0, 1], and α ∈[0, 1] is a parameter indicating
events — i.e., if they were “accurate” or “not accurate”. More details the weight given to goals into the new rating. Clearly, r ∗ (u, д) =
on feature types and computation are in Section 5.2. r (u, д) when α=0, and r ∗ (u, д) = num_дoals when α=1.
We point out that among the 76 features, we did not consider Finally, PlayeRank computes the rating of a player u over a series
the goals because they are obviously related to the outcome of a of games G = (д1 , . . . , дm ) as the average of his performance ratings
L. Pappalardo et al.

in each individual game:


m
1 Õ
r (u, G) = r (u, дi ). (3)
m i=1
Similarly, the goal-adjusted rating r ∗ (u, G) of u given the game series
G is computed as the average of his adjusted performance ratings.

3.4 Step #4: Classifying player roles


As pointed out in [18, 23], performance ratings are meaningful only
when comparing players with similar roles. In soccer, each role
corresponds to a different area of the playing field where a player
is assigned responsibility relative to his teammates. Different roles
imply different tasks, hence it is mostly meaningless to compare,
for example, a player that is asked to create goal occasions and a
player that is asked to prevent the opponents to score. Furthermore,
a role is not a unique label as a player’s area of responsibility can
change from one game to another. Figure 2: Grouping of the centers of performance in the clus-
Given these premises, to meaningfully compare one player to ters C 1 , . . . , C 8 . Each color identifies a different cluster; gray
another one, we need an algorithm that is able to detect the role as- points indicate hybrid centers of performance.
sociated to a player’s performance in a game. They do exist methods,
originally designed for hockey [23], that describe a performance as
instead at the border between two or more clusters, and so u’s role
a heatmap of presence in predefined zones of the field. But these ap-
in д cannot be characterized by one of the eight clusters. In this
proaches are hardly applicable to soccer due to the density of game
case we say that u has a “hybrid” role and he is assigned to those
events which is much lower than what happens in hockey. And,
two or more clusters. Thus PlayeRank aims at a finer classification
indeed, the approach in [23] produces on our dataset a clustering
of roles via a soft clustering, as follows.
with a low overall quality (i.e., silhouette score < 0.2). д
For every center of performance cu occurring in some cluster Ci ,
So PlayeRank includes an algorithm that assigns one or more д
PlayeRank computes its k-silhouette sk (cu ) with respect to every
roles to players based on the set of events he has participated in
other cluster Ck (k , i) as:
during a game. Technically speaking, given a player u and a game д,
д д
the role of u in д is characterized by means of his average position. д di (cu ) − dk (cu )
This is called the center of performance for u in д and it is denoted sk (cu ) = д д , (4)
д д д д д max(di (cu ), dk (cu ))
as cu = (x u , yu ), where x u and yu are the average coordinates of
д д
u’s events in game д. Then PlayeRank deploys the classic k-means where dz (cu ) is the average distance between cu and all other points
д
algorithm [10] to group the centers of performance of all players u in cluster Cz . PlayeRank assigns cu to every cluster C j for which
д
in all games д. In our experiments on the Wyscout dataset we vary s j (cu ) ≤ δs , where δs is a threshold indicating the tolerance to
д
k = 1, . . . , 20 and observe that k = 8 provides the best clustering in “hybrid” centers. If no such j does exist, cu is assigned to the cluster
terms of silhouette score [22] (s = 0.43). given by the partitioning computed by the k-means algorithm.
Experiments on the Wyscout dataset have shown that the number
cluster description of hybrid centers increases linearly with δs , from none to all centers.
C1 right fielder, right wing, right back The results shown in Section 5 correspond to δs = 0.1, meaning
C2 central forward that 5% of the centers are classified as hybrids.
C3 central midfielder, internal midfielder
For the sake of completeness we mention that in approaching
C4 left fielder, left wing, left back
the task of role classification we have considered other, more so-
C5 left back, left central back
phisticated modeling of players’ performance such as heatmaps (as
C6 right wing, right forward
C7 right back
in [23]) or events direction (as in [2]), but clusters were of lower
C8 left wing, left forward quality in terms of the silhouette score.
Table 1: Description of clusters made by soccer experts.
3.5 Step #5: Ranking players
Figure 2 shows the result of the 8-means clustering and Table 1 Given a player u and a series of games G, the previous steps have
provides an interpretation of the clusters given by soccer experts computed u’s player rating r (u, G) (or its goal-adjusted version
employed by WyScout. While there are 10 players in a team (ex- r ∗ (u, G)) and u’s role in each game д ∈ G. Based on these infor-
cluding the goalkeeper) we find 8 roles, meaning that there is at mation, PlayeRank constructs eight role-based rankings R 1 , . . . , R 8 ,
least one cluster in each team having more than one player. each corresponding to one of the 8 roles identified in the previous
д
In Figure 2 we notice that, while most of players’ centers cu are step. PlayeRank assigns a player u to Ri if he has at least 40% of
concentrated around one of the eight cluster’s centroids, and so the the games in G assigned to role i. This may assign u to at most
role of player u in game д is well characterized, some of them are two roles and thus it allows us to take into account the versatility
PlayeRank: role-aware rating of soccer players

of players. Clearly, this is a parameter that can be chosen by the zone zi , hence mixing a better modeling of players’ performance
user when running PlayeRank, possibly increasing the number of with the efficient ranking of the approach described above.
assigned roles per player (i.e., his versatility). In this paper we make one step further and design a novel rank-
The position of u in a ranking depends on r (u, G). Therefore ing algorithm that deals directly with Gaussian modelling of playing
u can possibly appear in many rankings, with different positions. zones. For this, we adopt the Kullback-Leibler divergence measure
Section 5.3 applies PlayeRank on the Wyscout dataset by deriving (KL) [5] to measure the “distortion” between distributions G Q and
the role-based rankings for 1, 192 soccer players which played in Gu .1 Nicely enough, in the case of multi-variate normal distribu-
the last four seasons of the five main European leagues. tions (as for G Q and Gu ), KL has a closed form solution [15]:
1 −1
= trace {(ΣQ +Σu−1 ) (µ Q −µu ) (µ Q −µu )T +ΣQ Σu−1 +Σu ΣQ
−1
−2I }.
4 RETRIEVING SOCCER PLAYERS 2
One of the most useful applications of PlayeRank is searching By expanding this equation we turn KL(G Q , Gu ) into a sum of prod-
players in a database, such as the Wyscout’s one. The search is ucts, each product consisting of the multiplication of parameters
driven by a query formulated in terms of a suitable query language from G Q and parameters from Gu . Let us denote by VbQ [i] (resp.
that considers the events that occur during a game and their position Vbu [i]) the parameters of G Q (resp. Gu ) forming the i-th product.
on the field. Since we do not want to enter in the formal definition of Then we can turn the computation of KL into a dot product between
the full query language, which is beyond the scope of this paper, we
two vectors, i.e., KL(G Q , Gu ) = VbQ · Vbu . Clearly Vbu can be pre-built
concentrate here only on its specialties that are the most interesting
in advance for all players u, and VbQ can be built at query time for
algorithmically for the issues we have discussed in this paper.
Q, taking costant time because matrices Σu and ΣQ have size 2 × 2.
We propose the efficient solution of a spatial query over the
We have thus reduced the problem of estimating the “distortion”
soccer-field zones which possibly span more roles and have geo-
between two distributions into the problem of computing the dot-
metric forms that differ from the ones identified in Section 3.4.
product between two properly-built vectors of constant (small) size.
We assume a tessellation of the soccer field into h zones of equal
Therefore, we can adopt the many efficient time-space solutions
size z 1 , . . . , zh . The query is modeled as a vector Q = [q 1 , . . . , qh ]
known in the IR literature for dot-product computation, and thus
in which qi expresses how much relevant is the presence of the
searched player in zone zi . Similarly, player u is modeled as a vec- rank players u for increasing value VbQ · Vbu .
tor Vu = [u 1 , . . . , uh ] in which ui expresses how much inclined is
player u to play in zone zi . We can go from binary vectors, that 5 EXPERIMENTAL RESULTS
model interest/no interest for Q and presence/no presence for Vu , 5.1 Wyscout dataset
to the more sophisticated case in which Q expresses a weighted in- We use data provided by Wyscout (https://wyscout.com) which
terest for some specific zones and Vu is finely modeled by counting, records 14 million events capturing 7,304 games and 1, 192 players
for example, the number of events played in each zone zi . in the last four seasons (starting from season 2013/14) of the five
Now given a query Q and the players of some soccer-DB, the prominent European leagues: La Liga (Spain), Premier League (Eng-
goal is to design an algorithm that ranks players according to their land), Serie A (Italy), Bundesliga (Germany) and Ligue 1 (France).
propensity to play in the field zones specified by Q. We follow the Each event records: (i) a time-stamp; (ii) the player who generated
standard practice of Information Retrieval (IR) and thus sort the the event; (iii) the position on the soccer field, specified by a pair of
players u in decreasing order of the dot product of their vectors: integers in the range [0, 100] indicating the percentage from the left
namely, r (u, Q) = Vu · Q. We can compute this ranking by means of corner of the attacking team; (iv) the event type, subtype and tags,
one of the plethora of efficient time-space solutions known in the that enrich the event with additional information (see Table 2). We
IR literature (e.g., [13, 21]). In this respect we point out that known do not consider the goalkeeping events available from the Wyscout
solutions work efficiently over million (and more) dimensions, so APIs [1], as we discard goalkeepers from the analysis.
that they easily scale onto the problem size at hand, because h ≈ 106 In the Wyscout dataset a match generates an average of 1,700
if we assume zones zi of size 1 cm2 . events, and each player generates around 60 events (Figure 3a),
We notice that other more sophisticated models for queries and with an average inter-time between two consecutive events of 1.5
player’s propensity can be implemented by reducing the problem seconds. Passes are the most frequent events, accounting for more
to the one solved above. In fact, let us assume that the zones of than 70% of the total events (Figure 3b). Note that Wyscout soccer-
the field are described by a bidimentional gaussian distribution of logs adhere to a standard format for storing events collected by
average vector µq and covariance matrix Σq , hereafter indicated as semi-automatic systems [7, 20, 25]. Moreover, given the existing
G Q (µq , Σq ). This modeling is natural in capturing fuzzily the way literature on the analysis of soccer games [3, 4, 8, 9, 16, 17], we can
soccer experts specify playing zones of interests in a bidimensional state that the Wyscout dataset we use in our experiments is unique
plane, i.e., via ellipsoids. Similarly, as shown in [2], we assume in the large number of games (7,304) and players (1,192) considered,
that the player’s propensity can be modeled via a bidimentional and for the length of the period of observation (4 seasons).
gaussian distribution of average vector µu and covariance matrix Σu
derived from the events u played in the past, hereafter indicated as GQ Gu
recall that K L(G Q , Gu ) = +
∫ ∫
1 We G Q log Gu log . It is not symmet-
Gu (µu , Σu ). This Gaussian modeling of the players’ propensity can Gu GQ
ric, but it could be turned into a symmetric measure by computing ∆(G Q , Gu ) =
be used to derive a more robust vector Vu . In fact we could compute
K L(G Q , Gu ) + K L(Gu , G Q ). For simplicity of exposition we stick below onto K L ,
Vu (i) by integrating the gaussian density function of player u over even if what we state below holds also for its symmetric variant.
L. Pappalardo et al.

type subtype tags


pass cross, simple pass accurate, not accurate, key pass, op-
of 14, 608 examples, 80% of which are used to train a Linear Sup-
portunity, assist, (goal) port Vector Machine (SVM). We have selected the cost parameters
foul no card, yellow, red, 2nd yellow that had the maximum average Area Under the Receiver Oper-
shot accurate, not accurate, block, oppor-
tunity, assist, (goal)
ating Characteristic Curve (AUROC) on a 5-fold cross validation.
duel air duel, dribbles, tackles, accurate, not accurate We validate SVM on the remaining 20% of the dataset, finding an
ground loose ball AUC=0.92 (F 1=0.85, accuracy=0.70), significantly better than a
free kick corner, shot, goal kick, throw accurate, not accurate, key pass, op-
in, penalty, simple free kick portunity, assist, (goal)
classifier which always predicts the most frequent outcome (i.e.,
offside non-victory, AUC=0.50, F 1=0.48, accuracy=0.62) and a classifier
touch acceleration, clearance, touch counter attack, dangerous ball lost, which chooses the label at random based on the distribution of
missed ball, interception, opportu-
nity, assist, (goal)
victories and non-victories (AUC=0.50, F 1=0.53, accuracy=0.53).
д
Table 2: Event types, with their possible subtypes and tags. We also experimented with different labellings of oT by defining
д д
either oT =0 in the case of defeat and oT =1 otherwise, or by defining
д
a ternary classification problem where oT =1 indicates a victory,
20K д д
oT =0 a defeat and oT =2 a draw. In all these cases we did not find
µ = 59. 89 72.3% passes
# players per match

5.7% tackles any significant difference in the feature weights described below.
15K σ = 36. 22 4.6% clearances Figure 4 shows the top-10 (black bars) and the bottom-10 (grey
3.4% crosses
3.0% aerial duels bars) feature weights w = [w 1 , . . . , w n ] resulting from SVM. We
10K 2.9% dribbles find that assist-based features are the most important ones, followed
2.8% intercepts by accuracy of shots, goal opportunities created and acceleration in
2.3% fouls
5K 2.2% shots counter attacks. In contrast, getting a red/yellow card gets a strong
0.8% goalkeeping
(a) 0.2% (b) goals
negative weight, as well as the inaccurate shots. It is interesting
0K to notice that, though these choices are pretty natural for who is
0 50 100 150 200 250 300 350 0 20 40 60 80 100 skilled in soccer-player evaluations, PlayeRank derived them by
# events per player frequency (%) just looking at the soccer-logs.

Figure 3: (a) Distribution of the number of events produced touch assist


cross assist
by one player per match (µ=average, σ =st. deviation). (b) Fre- pass assist
corner assist
clearance

top­10
quency of events per type.
shot opportunity
shot accurate
accel. counterattack
touch opportunity
5.2 Construction of player performance clearance counterattack
We compute the players’ performance vectors by a two-step proce- cross not accurate
touch not accurate
dure. First, we define a feature for every possible combination of penalty not accurate

bottom­10
cross
type, subtype and tag. For example, given the foul type, we obtain yellow card
shot blocked
four features: foul no card, foul yellow, foul red and foul 2nd yellow. corner kick
corner not accurate
We discard the goal tag since we have implicitly considered the 2nd yellow card
goals in the outcome of a performance during the learning phase.
red card
Nevertheless goals can be still included in the performance rating 0.00 0.05 0.10 0.15 0.20
by Equation (2) in Section 3.3. Our performance vectors consist absolute weight
then of 76 features. We also created more sophisticated vectors by
considering the field zones where events occurred, without finding
Figure 4: Top-10 (black bars) and bottom-10 (gray) features
any significant differences w.r.t. the results presented below.
д according to the absolute value of the weights.
Second, we obtain the performance vector pu of a player u in
game д by counting the number of events of a given type, subtype
and tag combination that player u produced in д. For example, the For the sake of completeness of our experimental results, we also
number of fouls without card made by u in д compose the value of repeated the classification task separately league by league, i.e., we
feature foul no card of u in д. created five SVMs each one trained on the games of one league only.
We then extracted for each of the five leagues the corresponding set
(j) (j)
5.3 Experimenting PlayeRank of weights w(j) = [w 1 , . . . , w n ] (j = 1, . . . , 5) and quantified the
As discussed in Section 3.2, step #3 of PlayeRank turns the problem difference between w and the w(j) by computing the normalized
of estimating feature weights into a classification problem between root-mean-square error:
a team performance vector and game outcome. We instantiate this q
1 Ín (j) 2
n i=1 (w i − w i )
д д
problem by creating, for each game д, two examples pT and pT (j)
1 2 N RMSE(w, w ) =
corresponding to the performance vectors of the two playing teams, w
д
which are properly associated with the game outcome label oT that where w is the average weight of set w. We found that the maximum
is 1 if a team wins and 0 otherwise. The resulting dataset consists N RMSE is 2%, indicating that the difference between any w(j) and w
PlayeRank: role-aware rating of soccer players

r player club r player club


is not significant and hence the relation between team performance .416 Nolito Sevilla .460 L. Messi Barcelona
and game outcome is independent of the specific league considered.

cluster 4

cluster 2
.415 P. Ntep Wolfsburg .459 K. De Bruyne Man. City
As a result, we will just use w in the following processing steps. .412 D. Cheryshev Villarreal .451 L. Suárez Barcelona
.408 Jordi Alba Barcelona .444 C. Ronaldo R. Madrid
Given w, we compute via step #3 of PlayeRank the rating r (u, G) .407 C. Brunt West Brom. .441 Á. Di María PSG
of each player u over the whole series G of games in the dataset. .425 M. Pjanić Juventus .409 Sergi Roberto Barcelona
Figure 5 offers a pictorial representation of the distribution of these

cluster 3

cluster 1
.424 T. Kroos R. Madrid .408 M. Weiser Hertha B.
ratings by grouping the players on the x-axis according to their .423 Fabregas Chelsea .408 M. Risse Köln
.420 M. Dahoud Borussia D. .402 K. Trippier Tottenham
roles. We remind that a player may be assigned to more than one .420 G. Castro Borussia D. .401 Danilo Man. City
role, among the eight roles detected by step #4 of PlayeRank. We .391 Javi Martínez Bayern M. .444 C. Ronaldo R. Madrid
observe a different distribution of ratings according to the players’

cluster 7

cluster 8
.391 N. Maksimović Napoli .442 Neymar PSG
.390 J. Tah Bayer L. .438 F. Ribéry Bayern M.
roles, both in terms of range of values and concentration. This fully .387 David Lopez Espanyol .436 D. Payet O. Marseille
justifies the design of step #5 of PlayeRank in which we have intro- .387 J. Matip Liverpool .430 D. Alli Tottenham
duced role-based rankings. In fact we notice that the top-ranked .409 Bartra Borussia D. .435 T. Müller Bayern M.

cluster 5

cluster 6
player of cluster 7, Javi Martínez, gets a player rating which is .393 H. Maguire Leicester .433 O. Dembélé Barcelona
.392 K. Rekik Hertha B. .430 A. Robben Bayern M.
almost in the bottom of the ratings of clusters 8 or 2 (Figure 5). .390 D. Blind Man. Utd .428 M. Salah Liverpool
The role-based rankings are explored in Table 3 which reports .389 R. Funes Mori Everton .426 G. Bale R. Madrid
the top-5 players grouped by the eight roles. Although PlayeRank Table 3: Top-5 players in each role-based ranking, with the
is fully data-driven, the rankings are strongly consistent with the corresponding player rating (r ) computed on the last four
common sense of soccer fans. For example, Lionel Messi (Barcelona) seasons and the club where they currently play.
is the best player in cluster 2 (see Figure 2), followed by other
renowned players such as Kevin De Bruyne (Manchester City), Luis
Suárez (Barcelona) and Cristiano Ronaldo (Real Madrid). The last player according to PlayeRank by declaring u 1 stronger than u 2 , if
one, instead, is the best player in cluster 8, preceding Neymar (PSG) u 1 precedes u 2 in the ranking. We then discarded from P all pairs for
and Franck Ribéry (Bayern München). Table 3 also shows the util- which there is not a majority among the evaluations of the soccer
ity of the soft-clustering procedure: some players are assigned to experts: namely, either all experts expressed equality or two experts
multiple rankings, indicating that they can play in different roles, disagreed in judging the best player and the third one expressed
highlighting their versatility. For example, this is the case of Cris- equality. As a result, we discarded 8% of P’s pairs.
tiano Ronaldo, which appears both in clusters 2 and 8. On the other Over the remaining P’s pairs, we investigated two types of con-
hand, our classification in 8 roles allows us to distinguish players cordance among the experts’ evaluations: (i) the majority concor-
that were classified together by the classic three-way classification dance c maj defined as the fraction of the pairs for which PlayeRank
in defenders-midfielder-forwards. This is the case of Messi and De agrees with at least two experts; (ii) the unanimous concordance
Bruyne which are in different roles according to the traditional clas- c una defined as the fraction of pairs for which the experts’ choices
sification (because the former is forward, the latter is midfielder), are unanimous and PlayeRank agrees with them. We found that
whereas they appear in the same cluster 2 according to PlayeRank c maj = 68% and c una = 74%, indicating that PlayeRank has in general
given how they play on the field. a good agreement with the soccer experts, compared to the random
What it is surprising in these role-based rankings is that they choice (for which c maj =c una =50%). Figure 6 (left) offers a more de-
have been derived by PlayeRank without considering the number of tailed view on the results of the survey by specializing c maj and c una
goals scored by players when building the performance vector. Ac- on the three ranges of ranking differences: [1, 10], [11, 20], [21, ∞].
tually, we observe that the goal-adjusted ranking r ∗ (u, G) is pretty The bars show a clear and strong correlation between the concor-
much consistent with r (u, G) for all values of α (Eq. 2). dance among experts’ evaluations (per majority or unanimity) and
the difference between the positions in the ranking of the checked
pairs of players: when the ranking difference is ≤ 10 it is c maj = 59%
5.4 Validation of PlayeRank
and c abs = 61%; for larger and larger ranking differences, PlayeRank
We validate PlayeRank by creating and submitting a survey to achieves a much higher concordance with experts which is up to
three experts, employed by Wyscout, that are professional soccer c maj = 86% and c abs = 91% when the ranking difference is ≥ 20.
talent scouts, hence particularly skilled at evaluating and comparing Clearly, the disagreement between PlayeRank with the experts is
soccer players. Our survey consisted of a set of pairs of players less significant when the players are close in the ranking (i.e., their
randomly generated by a two-step procedure, defined as follows. distance < 10). Indeed, the comparison between soccer players is
First, we randomly selected 35% of the players in the dataset. Second, a well-known difficult problem as witnessed by the significant in-
for each selected player u we cyclically iterated over the ranges crease in the fraction of unanimous answers by experts, which goes
[1, 10], [11, 20] and [21, ∞] and selected one value, say x, for each of from a low 58% in the range [1, 10] to a reasonable 71% in the range
them, and then picked the player being x positions above u and the [21, +∞]. This a fortiori highlights the robustness of PlayeRank: the
one being x positions below u in the ranking (if they exist). This experts disagreement decreases as pairs of players are farther and
generated a set P of 211 pairs involving 202 distinct players. farther in the ranking.
For each pair (u 1 , u 2 ) ∈ P, every expert was asked to select the As a final investigation, we compared PlayeRank with the PSV
best player between u 1 and u 2 , or to specify that the two players metric, the current state of the art in soccer-players ranking [3].
were equally strong. For each such pair, we also computed the best This metric is somewhat mono-dimensional because it exploits just
L. Pappalardo et al.

Figure 5: Distribution of player ratings in each role-based ranking. Each boxplot represents a cluster (role) and each circle
indicates the rating of a player, computed across the last four seasons. For each cluster, the name of the players at the top of
the corresponding role-based rankings are shown.

passes to derive the final ranking. Figure 6 (right) shows the results Messi, whose heatmap of positions is drawn in Figure 7b, has the
obtained by PSV over our set of players’ pairs evaluated by the three highest rˆ(u, G, Q 1 ). In the table, it is interesting to note that, though
experts. It is evident that PSV achieves significantly lower concor- the vector of Arjen Robben is more similar to Q 1 (r (Robben, Q 1 ) =
dance than PlayeRank with the experts: the majority concordance 0.61) than Messi’s vector (r (Messi, Q 1 ) = 0.60), Messi has a higher
ranges from 53% to 76%, while the unanimity concordance ranges player rating (r (Messi, G) = 0.46, r (Robben, G) = 0.43). As a result,
from 55% to 78%. So that PlayeRank introduces an improvement the combination rˆ(u, G, Q 1 ) of the two quantities makes Messi the
which is up to 16% (relative) and 13% (absolute). player offering the best trade-off between matching with the user-
specified zones and performing well in those zones.
Note that the function combining the two factors r (u, Q) and
100 PlayeRank PSV r (u, G) could be defined in many other ways, in order to better
majority majority
91% capture the user’s needs. The one above, in its simplicity, shows the
concordance (%)

90 unanimity 86% unanimity novelty and power of the proposed approach. Other combinations
will be investigated in the future.
80 75% 76%78%
70 68% 70%70%

60 59%61%
53%55%
50
[1, 10] [11, 20] [21, + ∞] [1, 10] [11, 20] [21, + ∞]
ranking difference range ranking difference range
Figure 6: Majority (gray bars) and unanimity (red) concor-
dance between PlayeRank and the three experts (left), and
between the PSV metric and the three experts (right).

5.5 Retrieval of players


We use PlayeRank to implement the search engine described in
Section 4 over the Wyscout soccer-DB. We consider a tessellation
of the soccer field into 100 equal-sized zones. Specifically, we define
Figure 7: Visualization of a spatial query Q 1 (red area) and
a query Q as a binary vector of “presence in a zone” selected by
the heatmaps of presence of G. Bale (a) and L. Messi (b). The
the user and compute r (u, Q) as the dot product between Q and the
darker a zone the more likely the player is to play in it.
player vector Vu , and r (u, G) as the player rating over all games of
u. We then sort the players in decreasing order of a combination of
these two measures:
rˆ(u, G, Q) = r (u, Q) ∗ r (u, G). 6 CONCLUSIONS AND FUTURE WORK
Table 4 shows the top-10 players in the DB according to their In this paper we presented PlayeRank, a novel data-driven frame-
rˆ(u, G, Q 1 ) for an exemplar query Q 1 showed in Figure 7. Lionel work that offers a multi-dimensional and role-aware evaluation of
PlayeRank: role-aware rating of soccer players

player rˆ r r club [11] J. López Peña and H. Touchette. A network theory analysis of football strategies.
1 L. Messi 0.28 0.60 0.46 Barcelona ArXiv e-prints, June 2012.
[12] P. Lucey, D. Oliver, P. Carr, J. Roth, and I. Matthews. Assessing team strategy
2 A. Robben 0.26 0.61 0.43 Bayern M. using spatiotemporal data. In Proceedings of the 19th ACM SIGKDD Intl Conference
3 L. Suárez 0.24 0.54 0.45 Barcelona on Knowledge Discovery and Data Mining, pages 1366–1374, 2013.
4 T. Müller 0.24 0.56 0.43 Bayern M. [13] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval.
Cambridge University Press, 2008.
5 Mohamed Salah 0.24 0.56 0.43 Liverpool [14] O. Müller, A. Simons, and M. Weinmann. Beyond crowd judgments: Data-driven
6 R. Lukaku 0.24 0.56 0.42 Man. Utd estimation of market value in association football. European Journal of Operational
7 A. Petagna 0.23 0.55 0.42 Atalanta Research, 263(2):611–624, 2017.
[15] T. A. Myrvoll and F. K. Soong. Optimal clustering of multivariate normal distri-
8 D. Berardi 0.22 0.54 0.41 Sassuolo butions using divergence and its application to HMM adaptation. In Procs of the
9 Aduriz 0.22 0.55 0.40 A. Bilbao 2003 IEEE Intl Conference on Acoustics, Speech, and Signal Processing (ICASSP),
pages 552–555, 2003.
10 G. Bale 0.22 0.52 0.43 R. Madrid [16] L. Pappalardo and P. Cintia. Quantifying the relation between performance and
Table 4: Top-10 players in the Wyscout DB according to their success in soccer. Advances in Complex Systems, 2017.
rˆ(u, G, Q 1 ) with respect to query Q 1 in Figure 7, computed on [17] L. Pappalardo, P. Cintia, D. Pedreschi, F. Giannotti, and A.-L. Barabasi. Human
perception of performance. arXiv preprint arXiv:1712.02224, 2017.
the last four seasons of the five main European leagues. [18] S. Pettigrew. Assessing the offensive productivity of nhl players using in-game
win probabilities. In MIT Sloan Sports Analytics Conference, 2015.
the performance of soccer players. Our experiments on four sea- [19] P. Power, H. Ruiz, X. Wei, and P. Lucey. Not all passes are created equal: Objec-
tively measuring the risk and reward of passes in soccer from tracking data. In
sons of the five main European leagues showed that the resulting Procs of the 23rd ACM SIGKDD Intl. Conference on Knowledge Discovery and Data
rankings are consistent with the common sense of soccer experts. Mining, pages 1605–1613, 2017.
Our work paves the way to many research questions. First of all, [20] R. Rein and D. Memmert. Big data and tactical analysis in elite soccer: future
challenges and opportunities for sports science. SpringerPlus, 5(1):1410, 2016.
step #4 of PlayeRank enables the study of the versatility of players, [21] B. Ribeiro-Neto and R. Baeza-Yates. Modern Information Retrieval. Addison-
i.e., their ability to change role from game to game. While this is an Wesley Longman Publishing Co., Inc. Boston, MA, USA, 1999.
aspect not yet investigated in the literature, versatility is a desirable [22] P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and valida-
tion of cluster analysis. Journal of Computational and Applied Mathematics,
property of a player which can be now quantified to understand, 20(Supplement C):53 – 65, 1987.
for example, how it relates to his market value. Second, in step #1 [23] O. Shulte and Z. Zhao. Apples-to-apples: Clustering and ranking nhl players
using location information and scoring impact. In MIT Sloan Sports Analytics
of PlayeRank more sophisticated graph modelings of player inter- Conference, Hynes Convention Center, Boston, MA, USA, 2017.
actions could be considered to better capture the performance of [24] R. Stanojevic and L. Gyarmati. Towards data-driven football player assessment.
teams. Third, we could imagine a learning-to-rank approach by In Procs of the IEEE 16th Intl Conference on Data Mining, pages 167–172, 2016.
[25] M. Stein, H. Janetzko, D. Seebacher, A. Jäger, M. Nagel, J. Hölsch, S. Kosub,
integrating soccer-logs with the evaluations of the soccer talent T. Schreck, D. A. Keim, and M. Grossniklaus. How to make sense of team sport
scouts, in order to produce more realistic ratings which combine data: From acquisition to data modeling and research aspects. Data, 2(1), 2017.
the power of data-driven algorithms with the experience of human [26] B. Torgler and S. L. Schmidt. What shapes player performance in soccer? empirical
findings from a panel analysis. Applied Economics, 39(18):2355–2369, 2007.
experts. Finally, it would interesting to adapt PlayeRank to other [27] Q. Wang, H. Zhu, W. Hu, Z. Shen, and Y. Yao. Discerning tactical patterns for
team sports — such as basketball, hockey or rugby — for which mas- professional soccer teams: An enhanced topic model with applications. In Procs of
the 21th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining,
sive logs are available in the same format of soccer logs [7, 20, 25]. pages 2197–2206, 2015.
Here, one should re-define in step #2 only the “outcome of a game”
in order to set up properly the classification problem solved there.

ACKNOWLEDGMENTS
This work is funded by EU project SoBigData RI, grant #654024. We
thank Daniele Fadda for support on data visualization.

REFERENCES
[1] Wyscout API, howpublished=https://support.wyscout.com/.
[2] A. Bialkowski, P. Lucey, P. Carr, Y. Yue, S. Sridharan, and I. Matthews. Large-scale
analysis of soccer matches using spatiotemporal tracking data. In Procs of the
IEEE Intl Conference on Data Mining, pages 725–730, Dec 2014.
[3] J. Brooks, M. Kerr, and J. Guttag. Developing a data-driven player ranking in
soccer using predictive model weights. In Procs of the 22nd ACM SIGKDD Intl.
Conference on Knowledge Discovery and Data Mining, pages 49–55, 2016.
[4] P. Cintia, F. Giannotti, L. Pappalardo, D. Pedreschi, and M. Malvaldi. The harsh
rule of the goals: data-driven performance indicators for football teams. In Procs
of the 2015 IEEE Intl. Conference on Data Science and Advanced Analytics, 2015.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley & Son, Inc.,
New York, NY, USA, 1991.
[6] J. Duch, J. S. Waitzman, and L. A. N. Amaral. Quantifying the performance of
individual players in a team activity. PLOS ONE, 5(6):1–7, 2010.
[7] J. Gudmundsson and M. Horton. Spatio-temporal analysis of team sports. ACM
Computing Surveys, 50(2):22:1–22:34, 2017.
[8] L. Gyarmati and M. Hefeeda. Analyzing in-game movements of soccer players at
scale. In MIT SLOAN Sports Analytics Conference, 2016.
[9] L. Gyarmati, H. Kwak, and P. Rodriguez. Searching for a unique style in soccer.
CoRR, abs/1409.0308, 2014.
[10] J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. JSTOR: Applied
Statistics, 28(1):100–108, 1979.

You might also like