
EE 7150

Comparison of Soccer Player Video Tracking Algorithms
Due on Tuesday, December 10, 2013

Cody Lawyer
December 10, 2013


Contents
Abstract
Introduction
Technical Discussion
Results
Conclusion
References
Appendix


Abstract
Tracking players in sporting events using video cameras is an interesting problem. Tracking players allows attributes to be measured that could not otherwise be measured, such as fitness and positioning. It is also a difficult problem because in some sports there are many players on the field, the players occlude each other, and the camera may be moving. A variety of methods have been proposed and implemented to track players. The mean shift algorithm and the continuously adaptive mean shift algorithm both use the color histogram of the target to do the tracking. The particle filter method uses randomly placed particles and state estimation for tracking. This project implements and compares the mean shift algorithm, the continuously adaptive mean shift algorithm, and the particle filter method using MATLAB and a soccer video dataset.


Introduction
Tracking players using video is a commonly studied problem. It is difficult because, among other things, the camera can be non-stationary, many players can be on the field, and the players can occlude each other.
In [1], the mean shift algorithm for tracking objects is described. In [2], a modified mean shift algorithm called the continuously adaptive mean shift algorithm is described. Both methods use a color histogram to represent and track the target. In [3], a multiple hypothesis tracker is described which implements camera parameter detection, player detection, and player tracking using multiple camera angles. In [4], player tracking is implemented using a Kalman filter. Finally, in [5], player tracking is implemented using a particle filter.
In this project, the mean shift algorithm, the continuously adaptive mean shift algorithm, and the particle filter algorithm were selected to be implemented and compared. The mean shift and continuously adaptive mean shift algorithms were chosen because they are similar, so once one is implemented the other is not difficult to implement. The particle filter method was chosen because it is an interesting method and is very different from the other two algorithms.
The rest of this paper is organized as follows. First, a technical discussion describes the details of the three methods. Next, a description of the dataset used for comparison is followed by a description of the MATLAB simulation results. Finally, a conclusion discusses which of the three methods is best for tracking players. The references used, as well as an appendix listing the MATLAB code, are also included.

Technical Discussion
Mean Shift Algorithm
For the mean shift algorithm [1], the user is first presented with the first frame of the video to select which player to track. The frame is converted from RGB color space to HSV color space. This is done because the hue and saturation values in HSV are less susceptible to changes in brightness. A small window around the point the user selects is used to calculate the target hue histogram. The histogram is formed by placing each pixel in a bin based on its value. The histogram is then used to calculate the backprojection of the entire frame. The backprojection is formed by replacing each pixel with the value of the histogram bin it would be placed in. As shown in Figure 1, this gives greater values to areas that match the target to be tracked.


Figure 1: Example Backprojection
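The histogram and backprojection steps can be sketched as follows. This is a minimal NumPy illustration, not the project's MATLAB code; the bin count, the hue scaling to [0, 1], and the toy frame are illustrative assumptions.

```python
import numpy as np

def hue_histogram(hue_window, n_bins=16):
    """Bin the hue values of the selected target window into a
    normalized histogram (hue assumed scaled to [0, 1])."""
    hist, _ = np.histogram(hue_window, bins=n_bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def backproject(hue_frame, hist, n_bins=16):
    """Replace each pixel with the value of the histogram bin it
    would fall into."""
    bins = np.clip((hue_frame * n_bins).astype(int), 0, n_bins - 1)
    return hist[bins]

# Toy frame: a patch in the centre has the target hue 0.5.
frame = np.zeros((20, 20))
frame[8:12, 8:12] = 0.5
hist = hue_histogram(frame[8:12, 8:12])   # histogram of the selected window
bp = backproject(frame, hist)
# Target-hued pixels receive the largest backprojection values.
```

Because the target window is pure target hue here, matching pixels backproject to 1 and everything else to 0; a real frame gives a soft map like Figure 1.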


A window of backprojection values around the target is selected. The following moments
are then computed.
M00 = Σx Σy I(x, y)

M10 = Σx Σy x I(x, y)

M01 = Σx Σy y I(x, y)

Using the moments, the mean x and mean y location of the window can be calculated.

xc = M10 / M00,  yc = M01 / M00

The window is then shifted to the calculated location. If the distance between the previous location and the new location is smaller than a user-defined threshold, the next frame is loaded. If the distance is larger, the moments and location are recalculated until the window shifts less than the user-defined amount. This is done until the target leaves the video frame or until the end of the video.
The mean shift algorithm requires tuning depending on the application. The tracking window size needs to be adjusted to be similar in size to the expected target. Also, the convergence threshold on the shift distance can be adjusted to trade speed against the precision of the track.
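The moment-based shift loop can be sketched as follows. This is a minimal NumPy version of the procedure; the window half-width, convergence threshold, and iteration cap are illustrative assumptions.

```python
import numpy as np

def mean_shift(backprojection, cx, cy, half_w=5, eps=1.0, max_iter=20):
    """Iteratively shift a window to the centroid of backprojection mass."""
    for _ in range(max_iter):
        x0, x1 = int(cx) - half_w, int(cx) + half_w + 1
        y0, y1 = int(cy) - half_w, int(cy) + half_w + 1
        window = backprojection[y0:y1, x0:x1]
        ys, xs = np.mgrid[y0:y1, x0:x1]
        m00 = window.sum()                       # zeroth moment
        if m00 == 0:
            break
        new_cx = (xs * window).sum() / m00       # M10 / M00
        new_cy = (ys * window).sum() / m00       # M01 / M00
        shifted = np.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shifted < eps:                        # converged
            break
    return cx, cy

# Toy backprojection with a bright blob centred at (30, 30); start at (26, 26).
bp = np.zeros((50, 50))
bp[28:33, 28:33] = 1.0
cx, cy = mean_shift(bp, 26, 26)   # converges to the blob centre
```

For simplicity the sketch omits the boundary clipping a real tracker needs when the window nears the frame edge.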
CAMshift Algorithm


The cam shift algorithm [2] is a modified version of the mean shift algorithm. Instead of using a fixed window size, it adjusts the size of the window throughout the tracking to better track through occlusions and changes in size.
The cam shift algorithm is implemented by starting with the standard mean shift algorithm. The window size is adjusted as a function of M00 after it is calculated. This function needs to be tuned to the particular application, but in my use it is set as follows.
s = 2 √(M00)

The window width is set to 1.5s and the window height is set to 2s. The values that s is
multiplied by can be tuned to the particular application.
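The resizing rule can be sketched as follows, assuming s = 2·sqrt(M00) with the width and height multipliers 1.5 and 2 used in this project; the moment value in the example is a toy input.

```python
import numpy as np

def camshift_window(m00, width_scale=1.5, height_scale=2.0):
    """Resize the search window from the zeroth moment: s = 2 * sqrt(M00)."""
    s = 2.0 * np.sqrt(m00)
    return width_scale * s, height_scale * s   # (window width, window height)

w, h = camshift_window(m00=100.0)   # s = 20, so width = 30, height = 40
```

The taller-than-wide window reflects the roughly 2:3 aspect ratio of a standing player; both multipliers can be retuned per application.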
Particle Filter
The particle filter can be used to track objects. It is implemented as described in [5] and [6]. The tracked object is represented by a collection of randomly generated particles. Each particle has a state and a weight. The state is represented by

x_t = [x y r g ẋ ẏ]

where x and y are the location of the particle in the x and y dimensions, r and g are the red and green chromaticity values of the particle, and ẋ and ẏ are the velocity of the particle in the x and y dimensions. The chromaticity values are between 0 and 1.
The particles and their weights are updated each frame using observed data. For this application, the observed data is the extracted player regions of the frame, which are represented by

C = (e, p, c)

where e is the list of edge points for the region, p is the center of mass of the region, and c is the average color of the region.
Player region detection is implemented via various common digital image processing tasks.
Using the first frame of the video, a playing field mask is calculated. This is used to filter
out sideline clutter. First, the hue histogram of the entire image is calculated. Since the
image is mostly grass, the histogram will have a peak at green hues. The backprojection
of the entire frame is calculated using the histogram and is thresholded. This is shown in
Figure 2 and represents the grass areas in the image.


Figure 2: Playing Field Mask Thresholded Backprojection


The largest areas of this backprojection are selected by finding the connected regions and
counting the pixels. These are assumed to be the large areas inside the playing field. Those
areas are combined by taking the convex hull of them. This results in Figure 3 which is the
playing field mask.

Figure 3: Playing Field Mask


The next step is to determine the player regions in each frame. First, the complement of the thresholded backprojection is combined with the playing field mask. This results in Figure 4, which has the players as regions but also has additional clutter such as the playing field markings.

Figure 4: Player Backprojection


To filter out the clutter and separate players that are close together, image erosion is applied to the backprojection. Also, regions that are too large, small, wide, or tall are removed. This results in Figure 5, which shows the player regions. The edge points, center of mass, and average color are calculated for each region and passed to the particle filter each frame.

Figure 5: Player Regions
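The size-based rejection step can be sketched as follows; the region record format and every threshold here are illustrative assumptions, not the values used in the project.

```python
def filter_regions(regions, min_area=20, max_area=500,
                   max_width=30, max_height=60):
    """Keep only regions whose area and bounding box are player-sized.

    Each region is a dict with 'area', 'width', 'height' (hypothetical
    format for this sketch).
    """
    kept = []
    for r in regions:
        if not (min_area <= r["area"] <= max_area):
            continue  # too small (noise) or too large (merged clutter)
        if r["width"] > max_width or r["height"] > max_height:
            continue  # too wide or too tall to be a single player
        kept.append(r)
    return kept

candidates = [
    {"area": 5,   "width": 3,  "height": 3},    # noise speck: rejected
    {"area": 120, "width": 12, "height": 30},   # player-sized: kept
    {"area": 900, "width": 90, "height": 10},   # field marking: rejected
]
players = filter_regions(candidates)  # keeps only the player-sized region
```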


On the first frame, the region to be tracked is selected by the user. The particles are distributed uniformly over the region. For each new frame, the particles are updated using the previous frame's particles and noise, and the weights are updated using the player region data.
The x and y positions of the particles are updated using the following equations.
x_t = x_{t-1} + ẋ_{t-1} + v_x,  v_x ~ N(0, σ_x)
y_t = y_{t-1} + ẏ_{t-1} + v_y,  v_y ~ N(0, σ_y)
The red and green chromacity values are updated using the following equations.
r_t = r_{t-1} + v_r,  v_r ~ N(0, σ_r)
g_t = g_{t-1} + v_g,  v_g ~ N(0, σ_g)
Finally, the x and y velocity values are updated using the following equations.
ẋ_t = α ẋ_{t-1} + (1 − α)(x_t − x_{t-1}),  α ∈ [0, 1]
ẏ_t = α ẏ_{t-1} + (1 − α)(y_t − y_{t-1})
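The state propagation can be sketched as follows. This is a NumPy illustration; the noise standard deviations, the smoothing factor alpha, and the particle array layout [x, y, r, g, vx, vy] are illustrative assumptions.

```python
import numpy as np

def propagate(particles, sigmas=(2.0, 2.0, 0.02, 0.02), alpha=0.7, rng=None):
    """Advance particle states [x, y, r, g, vx, vy] by one frame.

    Positions move by the previous velocity plus Gaussian noise,
    chromaticity values take a small random walk, and velocities are
    smoothed toward the observed displacement with factor alpha.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    p = particles.copy()
    sx, sy, sr, sg = sigmas
    prev_x, prev_y = p[:, 0].copy(), p[:, 1].copy()
    p[:, 0] += p[:, 4] + rng.normal(0, sx, len(p))                 # x_t
    p[:, 1] += p[:, 5] + rng.normal(0, sy, len(p))                 # y_t
    p[:, 2] = np.clip(p[:, 2] + rng.normal(0, sr, len(p)), 0, 1)   # r_t
    p[:, 3] = np.clip(p[:, 3] + rng.normal(0, sg, len(p)), 0, 1)   # g_t
    p[:, 4] = alpha * p[:, 4] + (1 - alpha) * (p[:, 0] - prev_x)   # vx_t
    p[:, 5] = alpha * p[:, 5] + (1 - alpha) * (p[:, 1] - prev_y)   # vy_t
    return p

# Ten identical particles at (100, 50) with velocity (3, -1).
particles = np.tile([100.0, 50.0, 0.4, 0.3, 3.0, -1.0], (10, 1))
stepped = propagate(particles)   # each particle drifts by velocity plus noise
```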
After the particles have been updated for the current frame, the weight of each particle is calculated using the player region data. The error vector for each particle is calculated as follows. e_x and e_y are zero if the particle lies inside a region; otherwise, each is set to the distance to the nearest edge point. e_r = C_r − r_t, where C_r is the average red chromaticity of the region and r_t is the red chromaticity value of the particle for the current frame. Similarly, e_g = C_g − g_t, where C_g is the average green chromaticity of the region and g_t is the green chromaticity value of the particle for the current frame. The velocity errors e_ẋ and e_ẏ are set to zero because the velocities are not updated using any player region data, only the movement of the particles. The error vector is defined as e_d = [e_x e_y e_r e_g e_ẋ e_ẏ]^T.
The weight of each particle is calculated using

w = exp(−e_d^T R^(−1) e_d / 2)

where R is a covariance matrix that weights the errors by different amounts if necessary. For example, if the color data in your application is noisy, those errors can be weighted less.
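The weighting step can be sketched as follows; the diagonal covariance values here are illustrative assumptions.

```python
import numpy as np

def particle_weight(error, R):
    """Weight a particle from its error vector: w = exp(-e^T R^-1 e / 2)."""
    e = np.asarray(error, dtype=float)
    return float(np.exp(-e @ np.linalg.inv(R) @ e / 2.0))

# Diagonal covariance: looser on position errors (pixels), tighter on colour.
R = np.diag([25.0, 25.0, 0.01, 0.01, 1.0, 1.0])
w_good = particle_weight([0, 0, 0.0, 0.0, 0, 0], R)   # perfect match
w_bad = particle_weight([10, 10, 0.2, 0.2, 0, 0], R)  # far-off particle
```

Raising a diagonal entry of R discounts the corresponding error term, which is how noisy colour data would be weighted less.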
A problem with the particle filter is that after a few iterations most of the particles will have very low weights. This problem can be mostly prevented by resampling the particles after the weights are calculated each frame. This is done by replacing the particles that have small weights with copies of particles that have large weights. First, a CDF C(i) is calculated from the weights of the particles. A random starting point u_1 is drawn from U[0, 1/N_s], where N_s is the number of particles. Then, for each j, set u_j = u_1 + (j − 1)/N_s and increment i while u_j > C(i) until a particle with a large enough weight is found. The new particle is assigned x^j = x^i and the new weight is assigned w^j = 1/N_s.
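The resampling walk can be sketched as follows; this is a minimal NumPy version, and the toy particles and weights are illustrative.

```python
import numpy as np

def resample_sir(particles, weights, rng=None):
    """Systematic resampling: walk the weight CDF with evenly spaced
    pointers, copying high-weight particles over low-weight ones."""
    if rng is None:
        rng = np.random.default_rng(0)
    ns = len(weights)
    cdf = np.cumsum(weights / np.sum(weights))
    cdf[-1] = 1.0                        # guard against round-off
    u1 = rng.uniform(0.0, 1.0 / ns)      # random start in U[0, 1/Ns]
    new_particles = np.empty_like(particles)
    i = 0
    for j in range(ns):
        u = u1 + j / ns                  # evenly spaced pointers u_j
        while u > cdf[i]:                # advance to a large-weight particle
            i += 1
        new_particles[j] = particles[i]
    return new_particles, np.full(ns, 1.0 / ns)   # all weights reset to 1/Ns

particles = np.arange(4, dtype=float).reshape(4, 1)   # states 0, 1, 2, 3
weights = np.array([0.0, 0.0, 0.9, 0.1])              # particle 2 dominates
new_p, new_w = resample_sir(particles, weights)
# Copies of particle 2 replace the zero-weight particles.
```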
Finally, the tracked location of the object is the weighted mean of the particle locations.
The particle filter requires a large amount of tuning for a particular application. The number of particles N_s, the noise parameters σ_x, σ_y, σ_r, σ_g, the smoothing factor α, and the player region detection method all need to be tuned. Also, some applications with low process noise are not well suited for particle filters, as described below in the results.

Results
Dataset
The dataset [7] consists of six videos of the same clip of a soccer match. Each video is from a different viewpoint of a stationary camera. The resolution of each camera is 1920 by 1080 pixels. The dataset provides ground truth tracking data for each of the players in the video. An example frame from one of the cameras is shown in Figure 6.

Figure 6: Sample Frame From Dataset


Results

Each of the methods described was tested by tracking the same player and comparing the results to the truth data. The mean shift algorithm did fairly well, with the track getting worse as the player starts to leave the frame. The video of this is available at https://www.youtube.com/watch?v=vcH0O8cc7ho. The cam shift algorithm also performs fairly well, with the track not being lost as much as the player leaves the frame. The video of this is available at https://www.youtube.com/watch?v=k4pXwkEeEU8.
The particle filter did not track the object. A large number of different settings were tried and none appeared to fix the problem. It appeared to be working for a few frames but collapsed to a single particle after that. Upon further reading, this happens when there is a small amount of process noise in the data, which means the data is not well suited for a particle filter. The paper that was used to implement the particle filter used soccer video with moving cameras, while the dataset I was using had stationary cameras. I believe this is the cause of the problem.
To quantify the results, the root mean square error of each track was calculated. The result for the mean shift algorithm was RMSE_MS = 227.8530. The result for the cam shift algorithm was RMSE_CS = 209.0335. Figure 7 shows a plot of the tracking data and the truth data.

Figure 7: Tracking Results
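The RMSE comparison against ground truth can be sketched as follows; the toy tracks are illustrative values, not data from the dataset.

```python
import numpy as np

def track_rmse(track, truth):
    """Root mean square distance between tracked and true (x, y) positions."""
    diff = np.asarray(track, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

truth = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
track = truth + np.array([[3.0, 4.0]] * 3)   # constant 5-pixel offset
rmse = track_rmse(track, truth)              # 5.0
```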

Conclusion
This project was a comparison of soccer player video tracking algorithms. The mean shift algorithm, the cam shift algorithm, and the particle filter method were all detailed and implemented in MATLAB. They were compared using a soccer video tracking dataset. The mean shift algorithm and the cam shift algorithm both did a reasonably good job of tracking the player. The cam shift algorithm did a better job when comparing root mean square errors. This is expected because the cam shift algorithm is an improved version of the mean shift algorithm. The particle filter method did not work for this dataset: the collection of particles collapsed to a single location after a few iterations. This is due to low process noise, possibly caused by the stationary cameras of this dataset.


References
1. L. W. Kheng, "Mean Shift Tracking," lecture notes, CS4243 Computer Vision and Pattern Recognition, National University of Singapore.
2. G. R. Bradski, "Computer Vision Face Tracking for Use in a Perceptual User Interface," Intel Technology Journal, Q2 1998.
3. M. Beetz, S. Gedikli, J. Bandouch, B. Kirchlechner, N. von Hoyningen-Huene, and A. C. Perzylo, "Visually Tracking Football Games Based on TV Broadcasts," in Proc. IJCAI, pp. 2066-2071, January 2007.
4. M. Xu, J. Orwell, L. Lowey, and D. Thirde, "Architecture and algorithms for tracking football players with multiple cameras," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 232-241, 8 April 2005.
5. A. Dearden, Y. Demiris, and O. Grau, "Tracking Football Player Movement From a Single Moving Camera Using Particle Filters," in Proc. CVMP 2006, pp. 29-37, IET Press, 2006.
6. M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, 2002.
7. T. D'Orazio, M. Leo, N. Mosca, P. Spagnolo, and P. L. Mazzeo, "A Semi-Automatic System for Ground Truth Generation of Soccer Video Sequences," in Proc. 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2-4, 2009.


Appendix
The following m-files are attached:
calcHueBackprojection.m
CAMShiftTracking.m
meanShiftTracking.m
ParticleFilterTracking.m
resampleSIR.m
SIRParticleFilter.m
