
Department of Computer Science and Engineering, National Chung Hsing University

Master's Thesis

桌球比賽裁判自動系統
An Automatic Table Tennis Match Umpiring System

Advisor: Shyr-Shen Yu (喻石生)
Advisor: Yung-Kuan Chan (詹永寬)
Graduate Student: Pei-Han Wang (王珮翰)

July 2014

摘要 (Abstract)

At present, the refereeing of table tennis matches relies mainly on the human eye to decide whether a point is scored. However, misjudgments can occur when the ball is hit too fast or because of the referee's personal factors. It is therefore necessary to build a complete and accurate automatic table tennis umpiring system. Among related applications, the Hawk-Eye system used in tennis is the most famous; it tracks and records the path of the ball, displays a graphical image of the recorded actual path, and can also predict the ball's future path.

This thesis applies this general concept to table tennis matches. A top-hat filter is used to sharpen the edges of the table, and morphology together with connected component labeling segments the table from the frame. For net segmentation, morphology combined with linear regression yields the straight-line equation of the net. Next, the background gray levels are subtracted from the foreground gray levels to retain moving objects, and morphology together with local area detection raises the accuracy of detecting the ball center. Finally, the three resulting features (the table region, the centroid of the ball placement, and the straight-line equation of the net) are fed into the proposed table tennis rule system to effectively decide whether a point is scored. The videos were shot with a JVC-GZ-GX1 camcorder and compared against human judgment for table segmentation, ball centroid detection, placement detection, and scoring; the average accuracies are 0.9919, 0.8488, 0.8460, and 0.8322, respectively. The experimental results show that the proposed algorithm achieves a reasonable level of accuracy for ball detection and automatic scoring.

Keywords: automatic table tennis umpiring, Hawk-Eye system, trajectory detection, linear regression, morphology

Abstract
Current refereeing of table tennis matches relies on the human eye to determine the score. However, fast strokes and the referee's personal factors may cause misjudgments. It is therefore necessary to establish a complete and accurate automated refereeing system for table tennis matches. The most famous related application is Hawk-Eye, which visually tracks the trajectory of the ball and displays a record of its statistically most likely path as a moving image.

This thesis applies this general concept to table tennis matches. First, a top-hat filter is used to sharpen the edges of the table tennis table. Morphology and connected component labeling are then applied to detect the table. Net detection uses morphology and linear regression to find the net's linear equation. After that, the background intensities are subtracted from the foreground intensities to retain the moving objects, and morphological and local area detection improve the accuracy of ball center detection. Finally, the above results, namely the table region, the linear equation of the net, and the ball center, are imported into the automated refereeing system to decide whether a point is scored. The videos were shot with a JVC-GZ-GX1 camcorder and compared with human judgment on the following measures: the accuracy of table detection, ball detection, placement detection, and scoring. The average accuracies are 0.9919, 0.8488, 0.8460, and 0.8322, respectively. Experimental results show that the proposed algorithm achieves high accuracy.

Keywords: automated refereeing system for table tennis, Hawk-Eye, trajectory detection, linear regression, morphology

Table of Contents
摘要 (Abstract)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
    1.1 Background
    1.2 Motivation and goal
    1.3 Environment Settings
    1.4 Organization
Chapter 2 Related Works
    2.1 Morphology
        2.1.1 Erosion
        2.1.2 Dilation
        2.1.3 Opening and Closing
    2.2 Area Filling Algorithm
    2.3 Connected Component Labeling
    2.4 Otsu's Thresholding
    2.5 Linear Regression
Chapter 3 ATTMU System
    3.1 Table detection
    3.2 Net detection
    3.3 Ball detection
        3.3.1 Part A-Candidate point detection
        3.3.2 Part B-Local area detection
        3.3.3 Placement detection
    3.4 Regular system
        3.4.1 Effective service judgment
        3.4.2 Effective service return judgment
        3.4.3 Reset serving point
Chapter 4 Results and Discussion
    4.1 Experimental evaluation methods
    4.2 The experiment results of Table Detection
    4.3 The experiment results of Net Detection
    4.4 The experiment results of Ball Detection
    4.5 The experiment result of Regular System
    4.6 The efficacy of ATTMU
Chapter 5 Conclusions and Future Prospects
    5.1 Conclusions
    5.2 Future Prospects
Reference

List of Tables
Table 4-1 The assessment results of table detection
Table 4-2 The assessment results of net detection
Table 4-3 The comparing results of the ball detection
Table 4-4 The comparing results of the placement detection
Table 4-5 The final results of the referee system
Table 4-6 The cost times of ATTMU

List of Figures
Fig.1-1 Three main objects of the referee system of table tennis games
Fig.1-2 The final position of the camera
Fig.1-3 The environment of table tennis games in this thesis
Fig.2-1 Three possible states of B[x]
Fig.2-2 The schematic diagram of erosion
Fig.2-3 The schematic diagram of dilation
Fig.2-4 The schematic diagram of opening
Fig.2-5 The schematic diagram of closing
Fig.2-6 The schematic diagram of the area filling algorithm
Fig.2-7 The relationship between P and the pixels around P
Fig.2-8 Connected component labeling in the binary image
Fig.2-9 A linear regression equation having a variable
Fig.3-1 The flow chart of the ATTMU system
Fig.3-2 The flow chart of table detection
Fig.3-3 The structuring element: disc
Fig.3-4 The process of the table detection
Fig.3-5 The flow chart of net detection
Fig.3-6 The process of the net detection
Fig.3-7 The flow chart of ball detection
Fig.3-8 The flow chart of ball detection in Part A
Fig.3-9 The process of the ball detection in Part A
Fig.3-10 The flow chart of ball detection in Part B
Fig.3-11 The process of the ball detection in Part B
Fig.3-12 The feature of the placement
Fig.3-13 The flow chart of regular system
Fig.3-14 The schematic diagram of effective service
Fig.3-15 The schematic diagram of effective and invalid service return
Fig.4-1 The experiment results of table detection
Fig.4-2 The experiment results of net detection
Fig.4-3 Movie(a)-white, the experiment results of ball detection
Fig.4-4 Movie(a)-yellow, the experiment results of ball detection
Fig.4-5 Movie(b)-white, the experiment results of ball detection
Fig.4-6 Movie(b)-yellow, the experiment results of ball detection
Fig.4-7 Movie(c)-white, the experiment results of ball detection
Fig.4-8 Movie(c)-yellow, the experiment results of ball detection
Fig.4-9 Movie(d)-white, the experiment results of ball detection
Fig.4-10 Movie(d)-yellow, the experiment results of ball detection

Chapter 1 Introduction

1.1 Background

Table tennis is one of the most popular sports in the world. There are no age or

gender barriers. One can play table tennis according to his own capabilities and

limitations, and still be competitive. He does not have to worry about those bruises or

even broken bones that he can get in contact sports. Even many athletes with

disabilities can compete on equal terms with able-bodied athletes at table tennis. He

can also play table tennis all year round, day or night, and don't have to worry about

bad weather or covering up to keep harmful UV rays off him. He does not have to

spend a lot of money to play table tennis either. A huge amount of space is not needed

to have fun playing table tennis at home. Table tennis is easy to play, yet difficult to

master. You will always have another challenge to look forward to, and another

mountain to climb. Table tennis is great for getting up a sweat and getting the heart

rate up. It is also good for the brain, especially for older people. Table tennis is a wonderful sport to

take up for life [1].

Table tennis match umpiring is a very demanding task, since a table tennis umpire needs to make accurate judgments about the moving ball and its environment, which involve a series of fast actions whose legitimacy is strictly governed by the Laws of
Table Tennis [2]. A more pragmatic approach [3] is to employ computerized tools

capable of making accurate and fast measurements of the ball moving, to aid the

umpire in making correct decisions.

An intuitive, non-disruptive way of evaluating the ball-moving environment is to capture the moving ball, table, and net, called objects of interest (OOIs), using a video camera and then to detect and track the OOIs in real time on a frame-wise basis.

However, accurately segmenting and tracking the OOIs in match situations is

extremely challenging for a myriad of reasons, including:

 The ball travels very fast. A high shutter speed camera is necessary; otherwise,

the object can become blurred, color faded, and distorted in shape.

 Besides the ball, many moving objects, such as players, table-tennis bats, and the

crowd, may appear in the video.

 Uneven color and illumination exist since some objects are moving.

 The ball, table, and net may be blocked by the player, the bat, clothing, or other objects; moreover, the ball may disappear from view when it moves too high or too low.

 When the contrast between OOI and others is low, the OOI may become

indistinguishable.

 An OOI and others with similar color, size and shape may be confused.

 The size of the ball may be only a few percent of the frame size.

 In a real time application, detecting and tracking the ball must be as fast as

possible.

The study proposes an automatic table tennis match umpiring system (ATTMU

system) to extract the OOIs from a table tennis competition video, and analyze the

actions of the extracted objects, judge whether all the actions comply with the rules,

and then suggest a recommendation for the umpire to consider. The ATTMU system will assist umpires in making correct rulings quickly and precisely in a table tennis game.

One of the official table tennis rules, the Articles of the 11-point scale (A11) [2], is as follows:

1. When the server serves, the server must hit the ball so that the ball can hit the

table on the server side of the net, and then the ball can go over or around the net

before it hits the table on his opponent's side of the net.

2. When one player serves or fights back, his opponent must hit the ball over the

net and let the ball hit the opponent's court.

3. The winner of a game is the player who first scores 11 points unless both players

score 10 points. If both players score 10 points, then the game is won by the first

player gaining a lead of 2 points.

In this research, the table tennis rules will be used to investigate the performance of

the ATTMU system by experiments.

1.2 Motivation and goal

An automated referee for table tennis games [4] is a difficult task. On the one hand, the shooting angle makes placement judgment difficult, and the athletes' positions may also cover part of the desktop, which can cause misjudgment. On the other hand, because of the ball deformation caused by filming, it is not easy to find the correct ball position at each time point. Furthermore, because the lighting changes in the environment, it is much more difficult to detect the ball. Therefore, choosing the right camcorder and shooting angle is important in the automated detection of table tennis games.

In the approach [4], shooting modes have taken into double and single camcorder

shooting, as follows:
1. Double-camcorder shooting: One camcorder is set up next to the table, and the

other camcorder is set up over the table. The advantage is that we can have the

clearer and actual positions of table tennis balls. Also, the shooting angles of the

camcorder set up next to the table will not affect the placement of judgment.

2. Single-camcorder shooting: The camcorder is set up next to the table. The

advantage is that it only needs to process the images captured by one video

camera, so it has the higher processing speed.

The goal of this thesis is to establish a timely referee system. Detecting from two camcorders simultaneously is a major problem, so we use a single camcorder to record the films. After several experiments, we found the most appropriate way to set up the camcorder, so that a single camcorder can minimize the misjudgment of the actual location of the ball.

In a table tennis tournament video, there are three main OOIs: the ball, the table, and the net; Figure 1-1 shows the three OOIs. First, the ATTMU system extracts the three OOIs from each frame of the table tennis tournament video, and then judges whether the actions comply with the rules of A11 according to the spatial relationship between the ball and the table as well as that between the ball and the net. We determine the placement from where the ball hits the desktop, and then use the placement and the net to determine whether a service return is effective. Finally, we can judge which athlete scores. Thus, detecting these objects accurately is very important.

Fig.1-1 Three main objects of the referee system of table tennis games;

1.3 Environment Settings

Because the system is expected to adapt to a variety of table tennis competition environments, we chose four match environments, each with both the yellow ball and the white ball, the two major ball types specified for international competitions, as shown in Fig.1-3. So we obtain a total of eight films to be tested. This thesis uses a JVC-GZ-GX1 camcorder for shooting. The automatic mode is used, and the shutter speed is set to 1/30, which means thirty images are shot in one second, for the purpose of improving the clarity of the ball and the accuracy. After several tests and comparisons of various setups, the final position of the camcorder is aligned with the net and 175 cm away from the desktop. The camcorder is mounted 200 cm above the desktop, and the angle between the desktop and the tripod is 35 degrees, as shown in Fig.1-2.

Fig.1-2 The final position of the camera

Fig.1-3 The environment of table tennis games in this thesis

1.4 Organization

In subsequent chapters, the second chapter will introduce the existing methods

which this thesis has used, including morphology, area filling algorithm, connected

component labeling, Otsu’s thresholding, and linear regression. The third chapter is

the main method of this thesis, and it can be divided into the table tennis desktop

detection, the linear equation of the table tennis net detection, and the center of the

table tennis ball detection. And we import the results obtained in these steps into the

referee system of the table tennis games in order to judge the scores. The fourth chapter presents the experimental results. The fifth chapter gives the conclusion and future work.

Chapter 2 Related Works

This subsection briefly reviews some techniques which will be applied to this

thesis.

2.1 Morphology

Morphology [5] is a method that analyzes geometric structure based on set algebra. The basic morphological operators are as follows:

1. Erosion

2. Dilation

3. Opening

4. Closing

Morphological image processing is based on these four operators, from which other morphological algorithms are derived. This thesis uses the top-hat filter, whose principle is to subtract the opened image from the original image. The four basic operators are introduced as follows.

2.1.1 Erosion

Erosion [5] makes the objects in a binary image shrink or thin. The way and

extent of the shrinkage is controlled by a structuring element. Using the structural

elements of different sizes can remove the objects of different sizes. In addition, if

there is a small link between two objects, we can separate the two objects by erosion.

First, we have an image A and a structuring element B. Then, B is moving on the

image A. There are three possible states as follows:


1. B[x] ⊆ A;

2. B[x] ⊆ 𝐴𝑐 ;

3. Both B[x] ∩ A and B[x] ∩ 𝐴𝑐 are not null.

Fig.2-1 Three possible states of B[x]

The first condition shows that B[x] has the closest relation with image A. So, the

point x which satisfies the first condition is said to belong to the erosion of A by B, denoted by A ⊖ B. It is defined as follows:

A ⊖ B = { z | (B)z ⊆ A }    (1)

(𝐵)𝑧 means the set of all points z, and the equation represents the displacement of B

contained in A.

Fig.2-2 shows an example of erosion. The solid lines in (b) represent the limit of

movement of the structuring element B. When B moves out over the line, B can’t be

entirely contained in A.

Fig.2-2 The schematic diagram of erosion; (a) image A and structuring element B;

(b) the result that A is eroded by B

2.1.2 Dilation

Dilation [5] makes the objects in a binary image larger or thicker. The way and extent of the expansion is also controlled by a structuring element. Assume that the

target image is called A and a structuring element is B. The condition that A dilates by

B is denoted by A⊕B. It is defined as follows:


(2)
𝐴 ⊕ 𝐵 = { 𝑧|(𝐵̂)𝑧 ∩ 𝐴 ≠ ∅ }

(𝐵̂ ) represents the reflection of B. It is defined as follows:


𝐵̂ = { 𝑤|𝑤 = −𝑏,𝑤ℎ𝑒𝑛 𝑏 ∈ 𝐵 } (3)

Equation (3) means that B̂ is the reflection of B about the origin, and (B̂)z denotes this reflection shifted by z units. Fig.2-3 shows an example of dilation. The

solid lines in (b) represent the limit of movement of the structuring element B. When

B moves out over the line, it will lead to the situation that the intersection of A and B

is an empty set. Thus, the area inside the solid lines represents that A dilates by B.

Fig.2-3 The schematic diagram of dilation; (a) image A and structuring element
B; (b) the result that A is dilated by B

2.1.3 Opening and Closing

Opening [5] is constituted from two basic operators, including erosion and dilation.

It is denoted by A。B. It is defined as follows:


A。B = (A ⊖ B) ⊕ B    (4)

Here we define the opening which is between image A and structuring element B.

First, we erode A by B. Next, we dilate the result by B. Because A is first eroded by B,

the narrow part of A will be cut off. Then, we use dilation to smooth the truncated A.

So, opening can be used to cut off the narrow part of the image. It is shown as Fig.2-4.

Fig.2-4 The schematic diagram of opening; (a) original image; (b) structuring
element; (c) the result after erosion; (d) the result after (c) is dilated by (b)

Closing [5] between image A and structuring element B is denoted by A・B, It is

defined as follows:
A・B = (A ⊕ B) ⊖ B    (5)

Here we dilate A by B. And then, we erode the result A by B. Because A is first dilated

by B, we can fill the small gaps in the image A. Next, the erosion we used can smooth

the image A. The difference between opening and closing is that closing can be used

to fill the narrow gaps in the image. It is shown as Fig.2-5.

Fig.2-5 The schematic diagram of closing; (a) original image; (b) structuring
element; (c) the result after dilation; (d) the result after (c) is eroded by (b)
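To make the operators of Sections 2.1.1-2.1.3 concrete, the following is a minimal sketch using SciPy's binary morphology routines. The toy image, the 3×3 cross-shaped structuring element, and the variable names are assumptions of this example, not values taken from the thesis.

```python
# Minimal sketch of erosion, dilation, opening, and closing on a toy binary image.
import numpy as np
from scipy import ndimage

A = np.zeros((40, 40), dtype=bool)
A[10:30, 10:30] = True                       # a square object
A[19:21, 30:36] = True                       # thin protrusion, removed by opening

B = ndimage.generate_binary_structure(2, 1)  # 3x3 cross-shaped structuring element

eroded  = ndimage.binary_erosion(A, structure=B)   # shrink objects, eq. (1)
dilated = ndimage.binary_dilation(A, structure=B)  # grow objects, eq. (2)
opened  = ndimage.binary_opening(A, structure=B)   # erosion then dilation, eq. (4)
closed  = ndimage.binary_closing(A, structure=B)   # dilation then erosion, eq. (5)

# White top-hat used later in Chapter 3: the original minus its opening
# (for grayscale images, scipy.ndimage.white_tophat plays the same role).
tophat = A & ~opened
```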

2.2 Area Filling Algorithm

A hole can be considered an area of background surrounded by pixels of foreground. The area filling algorithm [6] uses dilation, complementation, and intersection to fill the hole in the image. Assume A is the set of foreground pixels of the image. The iteration is defined as follows:

Xk = (Xk−1 ⊕ B) ∩ Aᶜ,   k = 1, 2, 3, ⋯    (6)

where X0 contains the starting pixel P inside the hole, and B is the structuring element. If Xk = Xk−1, the algorithm stops iterating. Finally, we take the union of Xk and A to fill the hole in the image.

Fig.2-6 The schematic diagram of the area filling algorithm; (a) original image; (b) structuring element; (c) X0 = P; (d) X1; (e) union of X3 and A
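A minimal sketch of this iteration is given below, assuming a seed pixel inside the hole is known and using SciPy's binary dilation; the function name and the toy image are illustrative only.

```python
# Minimal sketch of the area filling iteration in equation (6).
import numpy as np
from scipy import ndimage

def fill_hole(A, seed):
    """Fill the hole of binary image A that contains the given seed pixel."""
    B = ndimage.generate_binary_structure(2, 1)
    X = np.zeros_like(A, dtype=bool)
    X[seed] = True                                       # X0 = P, inside the hole
    while True:
        X_next = ndimage.binary_dilation(X, B) & ~A      # (X ⊕ B) ∩ A^c
        if np.array_equal(X_next, X):                    # stop when Xk = Xk-1
            break
        X = X_next
    return X | A                                         # union of the hole and A

A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True
A[3:6, 3:6] = False                                      # a 3x3 hole inside the square
filled = fill_hole(A, seed=(4, 4))
```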

2.3 Connected Component Labeling

Connected component labeling [7] distinguishes the regions in the binary image.

The connected regions have the same labels. The different regions are given different

labels. Connected component labeling is divided into 4-connected labeling and

8-connected labeling. 4-connected labeling means that if the pixels around a pixel in four directions (up, down, right, and left) have the same intensity as that pixel, we give them the same label. 8-connected labeling additionally includes the four diagonal directions. In this thesis, we use 8-connected labeling. The algorithm is divided into two steps as follows.

Step 1

Scan the image from left to right and top to bottom. Let P be the pixel currently being scanned. If P equals 0, we scan the next point. If P equals 1, we judge as follows:

 If the pixels around P, namely q, r, s, t, are all 0, we give P a new label.

 If exactly one of the pixels q, r, s, t is 1, we give P the same label as that pixel.

 If two or more of the pixels q, r, s, t are 1, no matter whether the labels of these pixels are the same or not, we give P one of their labels, and the other labels of these pixels are recorded in the same equivalence class.

Fig.2-7 The relationship between P and the pixels around P

Step 2

Give each equivalence class a unique label, and then replace the label of each pixel in the image with the label of the class that the pixel belongs to. Fig.2-8(b) shows that the labels of some pixels are not yet correct after step 1; after step 2, we get the correct labels.

Fig.2-8 Connected component labeling in the binary image; (a) binary image; (b) the result after step 1; (c) the result after step 2
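In practice the same result can be obtained with a library routine; the following minimal sketch uses SciPy's 8-connected labeling instead of the two-pass algorithm described above, and the toy image is only an illustration.

```python
# Minimal sketch of 8-connected component labeling (Section 2.3) with SciPy.
import numpy as np
from scipy import ndimage

img = np.array([[1, 1, 0, 0, 1],
                [0, 1, 0, 1, 1],
                [0, 0, 0, 0, 0],
                [1, 0, 1, 1, 0]], dtype=bool)

# A 3x3 all-ones structure makes the labeling 8-connected (diagonals included).
labels, num = ndimage.label(img, structure=np.ones((3, 3)))
print(num)      # number of connected regions
print(labels)   # every region carries its own integer label
```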

2.4 Otsu’s Thresholding

Otsu’s thresholding method [8] specifies the threshold to transform the

grayscale image into a binary image. Here we introduce the process of automatic

threshold selection as follows:

1. Select the initial value T, which is usually the median intensity of the image.

2. Separate all the pixels in the image into two classes by T: 𝐺1 and 𝐺2. 𝐺1 is the set of the pixels whose intensities are less than T, and 𝐺2 is the set of the pixels whose intensities are greater than T.

3. Compute the means of 𝐺1 and 𝐺2.

4. Compute the new threshold T as the average of the two means.

5. Repeat steps 2 to 4 until the difference between successive values of T is smaller than a predefined value.

Otsu’s thresholding is a method based on the histogram. The probability of a pixel with gray-level 𝑟𝑞 in the image can be computed as follows.

p(𝑟𝑞) = 𝑛𝑞 / n,   q = 0, 1, 2, …, L − 1    (7)

Where 𝑛𝑞 denotes the number of pixels with gray-level 𝑟𝑞 and n is the total number of pixels in the gray-level image. Assume that the threshold k has been chosen. 𝐺1 is the set of pixels with gray levels in [0, 1, …, k − 1], and 𝐺2 is the set of pixels with gray levels in [k, k + 1, …, L − 1]. Otsu's thresholding chooses the threshold k that maximizes the between-class variance 𝜎𝐵², which is defined as follows.

𝜎𝐵² = 𝜔1 (𝜇1 − 𝜇𝑟)² + 𝜔2 (𝜇2 − 𝜇𝑟)²    (8)

Where,

𝜔1 = ∑_{q=0}^{k−1} p(𝑟𝑞)    (9)

𝜔2 = ∑_{q=k}^{L−1} p(𝑟𝑞)    (10)

𝜇1 = ∑_{q=0}^{k−1} 𝑟𝑞 p(𝑟𝑞) / 𝜔1    (11)

𝜇2 = ∑_{q=k}^{L−1} 𝑟𝑞 p(𝑟𝑞) / 𝜔2    (12)

𝜇𝑟 = ∑_{q=0}^{L−1} 𝑟𝑞 p(𝑟𝑞)    (13)
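A minimal sketch of this threshold search, computing the between-class variance of equation (8) for every candidate k and keeping the maximizer, is shown below; the 256-bin histogram and the function name are assumptions for 8-bit images.

```python
# Minimal sketch of Otsu's threshold selection following equations (7)-(13).
import numpy as np

def otsu_threshold(gray, L=256):
    hist, _ = np.histogram(gray, bins=L, range=(0, L))
    p = hist / hist.sum()                       # eq. (7): p(r_q) = n_q / n
    r = np.arange(L)                            # gray levels r_q
    mu_r = (r * p).sum()                        # eq. (13): global mean
    best_k, best_var = 0, -1.0
    for k in range(1, L):
        w1 = p[:k].sum()                        # eq. (9)
        w2 = 1.0 - w1                           # eq. (10)
        if w1 == 0 or w2 == 0:
            continue
        mu1 = (r[:k] * p[:k]).sum() / w1        # eq. (11)
        mu2 = (r[k:] * p[k:]).sum() / w2        # eq. (12)
        var_b = w1 * (mu1 - mu_r) ** 2 + w2 * (mu2 - mu_r) ** 2   # eq. (8)
        if var_b > best_var:
            best_var, best_k = var_b, k
    return best_k

gray = (np.random.rand(64, 64) * 255).astype(np.uint8)
binary = gray >= otsu_threshold(gray)
```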

2.5 Linear Regression

Linear regression [9] models a scalar dependent variable as a linear combination of one or more explanatory variables weighted by regression coefficients. A linear regression equation with one variable represents a straight line. Most commonly, linear regression refers to a model constructed from a dataset in which both x and y are given. After developing such a model, if an additional value of x is given without its accompanying value of y, the fitted model can be used to predict the value of y.

Fig.2-9 A linear regression equation having a variable

Here we analyze linear regression as follows. In order to find the regression line, we need a dataset in which x and y are given:

(x1, y1), (x2, y2), …, (xm, ym)

And then, assume a straight line containing the unknowns α and β:

f(x) = α + βx    (14)

Define the error value 𝜀𝑖 and the squared error 𝐸2 (𝛼, 𝛽) of the model:
𝜀𝑖 = 𝑦𝑖 − 𝑓(𝑥𝑖 ) (15)

E2(α, β) = ∑_{i=1}^{m} εi²    (16)

If we can find α and β to let the 𝐸2 be the minimum value, this line is the linear

regression line. The goal of least square method is to find the line which lets 𝐸2 be

the minimum value. It is defined as follows:

1. Assume both partial derivatives of 𝐸2 equal 0. We can get two equations as

follows:
∂E2/∂α = −(2/m) ∑_{i=1}^{m} (yi − α − β xi) = 0
∂E2/∂β = −(2/m) ∑_{i=1}^{m} (yi − α − β xi) xi = 0    (17)

2. We can get α and β by the equations above. And we can also get α and β by

ordinary least squares estimator. It is defined as follows:


β̂m = ∑_{i=1}^{m} (xi − x̄m)(yi − ȳm) / ∑_{i=1}^{m} (xi − x̄m)²    (18)

α̂m = ȳm − β̂m x̄m    (19)
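A minimal sketch of the closed-form estimates in equations (18) and (19) is given below; the function name and the sample data are illustrative only.

```python
# Minimal sketch of the ordinary least squares estimates, eqs. (18)-(19).
import numpy as np

def fit_line(x, y):
    """Return (alpha, beta) such that y ≈ alpha + beta * x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()  # eq. (18)
    alpha = y_bar - beta * x_bar                                          # eq. (19)
    return alpha, beta

alpha, beta = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 7.1])
```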

Chapter 3 ATTMU System

The ATTMU system consists of four approaches: table detection, net detection, ball

detection, and regular system, and the flow chart is shown as Fig.3-1. The table

detection, net detection, and ball detection approaches are to extract the table, net, and

ball from each frame of the table tennis competition video. The regular system

approach is to judge the actions according to the competition rules of A11 based on

the spatial relationship of ball and table as well as that of ball and net. The ATTMU

system first transforms each frame of a color table tennis competition video into a

gray-level frame. Let 𝑅𝐺𝐵 be one frame of a color table tennis competition video, and 𝑅𝐺𝐵(𝑖, 𝑗) be the pixel located at coordinates (𝑖, 𝑗) on 𝑅𝐺𝐵. The corresponding gray-level frame 𝑓0 can be obtained by equation (20).

𝑓0(𝑖, 𝑗) = 0.299 × 𝑅(𝑖, 𝑗) + 0.587 × 𝐺(𝑖, 𝑗) + 0.114 × 𝐵(𝑖, 𝑗)    (20)

where 𝑅(𝑖, 𝑗), 𝐺(𝑖, 𝑗), and 𝐵(𝑖, 𝑗) are the R, G, and B color components of 𝑅𝐺𝐵(𝑖, 𝑗) [10]. The system contains the rules of the table tennis game. We combine

these three features, including the table region, the equation of the net, the centroid

coordinates of the ball and this system to determine which athlete scores.
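As an illustration of equation (20), the following is a minimal sketch of the gray-level conversion with NumPy; the array layout (H×W×3 in RGB channel order) and the variable names are assumptions of this example, not part of the thesis.

```python
# Minimal sketch of the RGB-to-grayscale conversion in equation (20).
import numpy as np

def to_gray(frame_rgb):
    r = frame_rgb[:, :, 0].astype(np.float64)
    g = frame_rgb[:, :, 1].astype(np.float64)
    b = frame_rgb[:, :, 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b      # eq. (20)
    return gray.astype(np.uint8)

frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)   # stand-in frame
gray = to_gray(frame)
```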

A. Table detection: The goal is to cut out the table region in the film, so that we can determine whether the placement of the ball is located in the table region. The balls located in the table region are considered as the placements of the balls, and we can import the placements into the regular system.

B. Net detection: The goal is to find the equation of the net. We can substitute the placements of the balls into the equation of the net and determine whether the ball is located on the right side or the left side of the net, in order to determine whether the service is effective.


C. Ball detection : Detect the location of the ball in the film. Then we use this and

the table region to determine the placement of the ball.

D. Regular system: The system contains the rules of the table tennis game. We

combine these three features, including the table region, the equation of the net,

the centroid coordinates of the ball and this system to determine which athlete

scores.

Fig.3-1 The flow chart of the ATTMU system

3.1 Table detection

Table detection is to accurately cut off the tennis table from the frame of the table

tennis competition video. The step can be divided into two parts as follows:

Fig.3-2 The flow chart of table detection

In a table tennis game, since the camcorder is generally stationary, the ATTMU

system extracts the table only from the first frame 𝑓 of the gray-level table tennis

competition video. Since white lines 2 cm wide are painted along each edge of a

tennis table, and a center line 3 mm wide parallel to the side lines divides the playing

surface in half, the ATTMU system then uses top-hat filter technique to enhance the

brighter edge of the table.

The top-hat filter employs the ranked value from two different size regions: the brightest value in a circular interior region is compared with the brightest value in a surrounding annular region. If the brightness difference is greater than a threshold level, the pixel is kept; otherwise it is erased. Let 𝐸 be a grayscale structuring element. The top-hat transform of 𝑓 is given by equation (21).

top-hat(𝑓) = 𝑓 − 𝑓 。𝐸    (21)

where “。” denotes the opening operation. The white top-hat transform returns an

image, containing those “objects” of an input image that are smaller than the

structuring element and brighter than their surroundings.

Fig. 3-4(a) is a frame of a color table tennis competition video. Fig. 3-4(b) is the
image obtained by executing top-hat filtering operation on the gray-level image of Fig.

3-4(a). From Fig. 3-4(b) one can obviously observe that the top-hat filtering operation

can effectively highlight the boundary of the upper table surface. In the top-hat filtered image, the pixels on the boundary of the upper table surface are generally brighter. Otsu's thresholding method is hence used to decide a suitable threshold 𝑜 to generate a binary image 𝑏 by equation (22).

𝑏(𝑖, 𝑗) = { 0, if the top-hat value at (𝑖, 𝑗) < 𝑜;  1, otherwise }    (22)

where 0-bit represents the black pixel and 1-bit represents the white pixel. Fig. 3-4(c)

is the binary image obtained from the image in Fig. 3-4(b).

Next, closing operation is adopted to connect the disconnected object boundary

on 𝑏. In the closing operation, the ATTMU system executes dilation operation and

then the erosion operation based on the structuring element shown in Fig. 3-3. Then the biggest region in 𝑏 is considered to be the extracted upper table surface. Fig. 3-4(d) depicts the extracted upper table surface on the image in Fig. 3-4(a).
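The table detection steps described above (top-hat filtering, Otsu thresholding, closing, and keeping the biggest region) can be sketched as follows with OpenCV and NumPy; the 15×15 elliptical kernel and the function names are assumptions of this example, since the thesis does not specify implementation details.

```python
# Minimal sketch of the table detection pipeline of Section 3.1.
import cv2
import numpy as np
from scipy import ndimage

def detect_table(gray):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)      # eq. (21)
    _, binary = cv2.threshold(tophat, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # eq. (22)
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)      # reconnect edges
    labels, _ = ndimage.label(closed > 0, structure=np.ones((3, 3)))
    counts = np.bincount(labels.ravel())
    counts[0] = 0                                 # ignore the background label
    biggest = int(np.argmax(counts))
    return labels == biggest                      # binary mask of the table region
```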

Fig.3-3 The structuring element: disc

Fig.3-4 The process of the table detection: (a) original image; (b) the result after the top-hat filter; (c) the binary image after Otsu's thresholding; (d) the result after the closing operation and keeping the biggest region

3.2 Net detection

Fig.3-5 The flow chart of net detection

Net posts are set apart for a doubles court. The net detection approach is to detect

the net location and use an equation to describe the location. To extract the net, the

ATTMU system first cuts off a small region (a rectangle) 𝑅𝑛 which contains the net,

and then derives the location of the net from the small region. Let 𝑏 consist of

M×N pixels. The central pixel of 𝑏 is located at the coordinates (𝑖𝑐 , 𝑗𝑐 ) by the

equation (23).
(𝑖𝑐, 𝑗𝑐) = (⌊(M+1)/2⌋, ⌊(N+1)/2⌋)    (23)
Since the net is located at the middle of the table, the ATTMU system thinks that

the top-left corner of 𝑅𝑛 is located at (𝑖𝑐 − ⌊(M+1)/10⌋, 𝑗𝑐 − ⌊(N+1)/10⌋) and the bottom-right corner is located at (𝑖𝑐 + ⌊(M+1)/10⌋, 𝑗𝑐 + ⌊(N+1)/10⌋). The 𝑅𝑛 cut off from the

image 𝑏 is shown in Fig. 3-6(a).

After that, Otsu’s thresholding method is also applied to convert 𝑅𝑛 into a

binary image 𝑅𝑏 . Fig. 3-6(b) delineates the 𝑅𝑏 obtained from the 𝑅𝑛 in Fig. 3-6(a).
Besides, the thinning operation is used to trim all the lines on 𝑅𝑏 to single pixel

thickness and the result is shown in Fig. 3-6(c). Then, the 1-bit pixels in 𝑅𝑏 are input

to derive the linear equation of the net by linear regression method.
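A minimal sketch of this net detection procedure is given below, under the assumption that the net appears roughly vertical in the frame, so the column coordinate is regressed on the row coordinate; the library choices (OpenCV, scikit-image, NumPy) and the function name are assumptions, not the thesis implementation.

```python
# Minimal sketch of Section 3.2: crop R_n, binarize with Otsu, thin, fit a line.
import cv2
import numpy as np
from skimage.morphology import skeletonize

def detect_net(gray):
    M, N = gray.shape
    ic, jc = (M + 1) // 2, (N + 1) // 2            # central pixel, eq. (23)
    di, dj = (M + 1) // 10, (N + 1) // 10
    Rn = gray[ic - di:ic + di, jc - dj:jc + dj]    # rectangle around the net
    _, Rb = cv2.threshold(Rn, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    thin = skeletonize(Rb > 0)                     # trim lines to one-pixel width
    rows, cols = np.nonzero(thin)
    beta, alpha = np.polyfit(rows, cols, 1)        # col = beta * row + alpha
    # Net line a*x + b*y + c = 0 in full-frame coordinates (x = column, y = row),
    # after shifting back by the crop offsets.
    a, b = 1.0, -beta
    c = -(alpha + (jc - dj) - beta * (ic - di))
    return a, b, c
```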


Fig.3-6 The process of the net detection

3.3 Ball detection

Fig.3-7 The flow chart of ball detection

The goal of this step is to detect the location of the ball in the film. It can be divided

into two parts, including Part A, and Part B. We detect the entire image until we find

the coordinates of the ball in Part A. After that, we only detect the area expanding

from the coordinates of the ball by a range in Part B. The moving range between the
ball in this frame and the ball in the next frame isn’t too wide. So, if we detect around

the coordinates of the ball by a narrow range, we can not only reduce the execution

time but also improve the detection accuracy. If the ball is not detected in Part B, we

step back to Part A. Else, we repeat Part B.

3.3.1 Part A-Candidate point detection

Fig.3-8 The flow chart of ball detection in Part A

The method is to find the objects similar to ball and their centroid coordinates. Next,

they can provide these features to detect the ball by a narrow range in Part B.

 Step 1,Remove the background, and remain moving objects:


The balls in the film are the moving objects, so we only have to remain these

objects to do the subsequent image processing. First we consider the grayscale image 𝐹 of the current frame as the foreground, and the first grayscale image 0 as the background. Because the moving objects are mainly from the foreground image, we have to highlight the high-luminance portion of the foreground. In this case, we keep the areas with the stronger contrast in the image 𝐹. We obtain the image 𝑁𝐹 by using Otsu's thresholding method to create a binary image of 𝐹. The result is shown in Fig.3-9(a).

After that, we subtract the intensities of the foreground from the intensities of the background, which gives the initial background-removed image 𝐹𝐵. Similarly, we obtain the image 𝑁𝐵 by using Otsu's thresholding method to create a binary image of 𝐹𝐵. The result is shown in Fig.3-9(b). We then intersect 𝑁𝐹 and 𝑁𝐵 to get the final background-removed image; the purpose is to enhance the contrast of the initial background-removed image. The result is shown in Fig.3-9(c). In this way we retain the moving objects initially.
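The background-removal step just described can be sketched as follows, assuming the first grayscale frame serves as the static background; the function and variable names are illustrative only and use OpenCV's built-in Otsu thresholding.

```python
# Minimal sketch of Step 1: keep moving objects by double Otsu + intersection.
import cv2

def moving_object_mask(frame_gray, background_gray):
    # Binarize the current frame to keep its high-contrast (bright) areas.
    _, nf = cv2.threshold(frame_gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Foreground-minus-background difference keeps pixels that changed.
    diff = cv2.subtract(frame_gray, background_gray)
    _, nb = cv2.threshold(diff, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Intersection of the two binary images retains the moving objects.
    return cv2.bitwise_and(nf, nb)
```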

 Step 2,Remain the objects similar to ball, and identify the


candidate point coordinates:
The objects similar to the ball are determined by area and shape; we only keep the objects whose area and shape are most similar to the ball. First, we consider the area of the object. The area of the ball and the area of the table show a certain ratio [3], and this ratio is around 1/4500 to 1/4000. So, we use this range as the threshold to remove the regions which are too large or too small. It is defined by equation (24).

region = { 0, if area < (1/4500) × 𝐴𝑖  or  area > (1/4000) × 𝐴𝑖;  1, otherwise }    (24)

where 𝐴𝑖 denotes the area of the table region.

Then, we consider the shape of the object. We use a value d to determine whether the remaining regions are nearly circular. The value d is defined by equation (25).

d = 4π × area / perimeter²    (25)

region = { 0, if d > 2;  1, otherwise }    (26)

If the d is closer to 1, it means the region is nearly circular. Conversely, if the d is

far from 1, it means the region is absolutely not circular. Thus, we remove the region

whose d is greater than 2. The remaining regions are the candidate objects similar to

the ball. The result is shown at Fig. 3-9(c). And we use connected component labeling

to label these regions, where 𝐶𝑧(𝑖, 𝑗) is the zth candidate centroid of the labeled objects similar to the ball.
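The area and circularity tests of equations (24)-(26) can be sketched as follows with scikit-image region properties; the function name, the use of regionprops, and the argument table_area (the pixel count of the table region from Section 3.1) are assumptions of this example.

```python
# Minimal sketch of candidate filtering by area ratio and circularity.
import numpy as np
from skimage.measure import label, regionprops

def ball_candidates(moving_mask, table_area):
    lo, hi = table_area / 4500.0, table_area / 4000.0        # eq. (24) bounds
    candidates = []
    for region in regionprops(label(moving_mask, connectivity=2)):
        if not (lo <= region.area <= hi):
            continue                                          # too large or too small
        if region.perimeter == 0:
            continue
        d = 4.0 * np.pi * region.area / (region.perimeter ** 2)   # eq. (25)
        if d > 2:                                             # eq. (26): not circular
            continue
        candidates.append(region.centroid)                    # (row, col) of C_z
    return candidates
```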

 Step 3,Detection range expansion:


There is usually more than one object similar to the ball, so we have to examine each such object and find the most similar one. We do not need to scan the entire image while examining the objects similar to the ball; instead, we can expand a detection range around the center of each object. This not only reduces the execution time but also improves the detection accuracy. Here, we locate each 𝐶𝑧(𝑖, 𝑗) in the grayscale image 0. Next, we expand from 𝐶𝑧(𝑖, 𝑗) by a range of

𝑚2 × 𝑛2 . It’s defined by equation (27).

𝑚2 = M / 10,   𝑛2 = N / 10    (27)

The size of the detection range only needs to be slightly larger than the size of the ball. The result is shown in Fig.3-9(e). Then, we use Otsu's thresholding method

to create binary images of the range areas. The result is shown at Fig.3-9(f). Similar to

Step 2, we also consider the objects similar to the ball by area and shape. We remove

the regions which are too large or too small by equation (24) in these binary images of

the range areas, and these images are regarded as 𝐵𝑧. We compute the value d of the objects in 𝐵𝑧 by equation (25) and find the image whose d is the closest to 1. The purpose is to find the object which is the most similar to the ball. Finally, we define this object as the ball and find its centroid coordinate 𝐹𝐶(𝑖, 𝑗).


Fig.3-9 The process of the ball detection in Part A

3.3.2 Part B-Local area detection

Fig.3-10 The flow chart of ball detection in Part B

The moving range between the balls in two consecutive images isn’t too wide.

If we continue to use Part A to detect the whole image, it can be time consuming. So,

Part B has been proposed to improve Part A. The method is to narrow the detection

range around the ball location of the previous image. If the ball is detected in the

following image, we can continue to use the procedure to detect. It can not only

reduce the execution time but also improve the detection accuracy.

 Step 1,Detection range expansion:


First, we find the 𝐹𝐶 (𝑖, 𝑗) from the next grayscale image 1 . And expand from

𝐹𝐶 (𝑖, 𝑗) by the range of 𝑚3 × 𝑛3 by equation (28). Because the moving range


between the balls in two consecutive images isn’t too wide, we only have to detect in

this range.

𝑚3 = M / 3,   𝑛3 = N / 3    (28)

 Step 2,Remove the background, and remain moving objects:


This procedure is similar to the first step in Part A. The purpose is to enhance the

contrast of the moving objects, so we can cut out of the ball. And we do the same two

actions in Part A. The first action is that we use Otsu’s thresholding method to create a

binary image of the range area. The result is shown at Fig.3-11(a). In order to remove

the background, the second step is to subtract the intensities of the ball in the range

area from the intensities of the ball of the first frame. Also, we use Otsu’s thresholding

method to create a binary image of the background moving image. Finally, we

intersect both of the binary images to get the moving objects remaining image 𝑀𝐵 .

 Step 3,Remain the objects similar to ball, and identify the


centroid coordinates:
This procedure is similar to the second step in Part A. The purpose is to find the

object which is most similar to the ball. The object similar to the ball can be

considered by area and shape. Considered by area, we remove the regions which are

too large or too small in the image 𝑀𝐵 by equation (24). And considered by shape,

we compute the value d of the objects in 𝑀𝐵 by equation (25) and keep the object whose d is the closest to 1. The result is shown in Fig.3-11(b). Finally, we define

the remaining object as the ball, so we can get the centroid coordinate 𝐹𝐶 (𝑖, 𝑗).


Fig.3-11 The process of the ball detection in Part B

3.3.3 Placement detection

The method is to find the placement of the ball by the centroid coordinate 𝐹𝐶 (𝑖, 𝑗)

in ball detection.

Fig.3-12 The feature of the placement: (a) the (k−1)th centroid coordinate; (b) the kth centroid coordinate; (c) the (k+1)th centroid coordinate

Fig.3-12 shows that the y value of the ball first increases and then declines at a placement, as in (b). Here we use the following steps to determine whether the kth centroid coordinate is a placement or not.

 Step 1, determine whether 𝑰𝑭𝑪 (𝒊, 𝒋) is located in the table region


𝑨𝒊 .

First, we take the intersection of the candidate point 𝐹𝐶(𝑖, 𝑗) and the table region 𝐴𝑖, where 𝐴𝑖 is the set of the coordinates in the table region. If the intersection is empty, 𝐹𝐶(𝑖, 𝑗) is not located on the table and is not taken into account. Then, the remaining centroid coordinates detected in ball detection will be
stored in the array CSet[ ], which means the set of total centroid coordinates of the

ball. It is expressed by the equation (29),

CSet[ ] = [ 𝐹𝐶1^{n1}(𝑥𝑐1, 𝑦𝑐1), 𝐹𝐶2^{n2}(𝑥𝑐2, 𝑦𝑐2), …, 𝐹𝐶𝑘^{nk}(𝑥𝑐𝑘, 𝑦𝑐𝑘) ],   if 𝐹𝐶(𝑖, 𝑗) ∈ 𝐴𝑖    (29)

Where 𝑥𝑘 , 𝑦𝑘 mean the x-coordinates and the y-coordinates of the kth centroid

coordinate and 𝑛𝑘 means the frame number of 𝐹𝐶 (𝑖, 𝑗). And the frame number will

be used in regular system.

 Step 2, determine whether 𝑰𝑭𝑪𝒌 (𝒙𝒌 , 𝒚𝒌 ) is the placement.

Then, we store the remaining centroid coordinates of the ball which meet the

following condition in the array PSet[ ], which means the set of total centroid

coordinates of the placement. It is expressed by equation (30),

PSet[ ] = [ 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1), 𝑃2^{n2}(𝑥𝑃2, 𝑦𝑃2), …, 𝑃𝑚^{nk}(𝑥𝑃𝑚, 𝑦𝑃𝑚) ]    (30)

𝑃𝑚^{nk}(𝑥𝑃𝑚, 𝑦𝑃𝑚) = 𝐹𝐶𝑘^{nk}(𝑥𝑐𝑘, 𝑦𝑐𝑘),   if 𝑦𝑐𝑘 − 𝑦𝑐𝑘−1 > 0 and 𝑦𝑐𝑘+1 − 𝑦𝑐𝑘 < 0

Where 𝑥𝑚 , 𝑦𝑚 mean the x-coordinates and the y-coordinates of the mth


placement. If the conditions are satisfied, we define 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) as a placement.
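A minimal sketch of this placement test, under the assumption that the centroids are given as (frame number, x, y) tuples in temporal order and that the table region is available as a boolean mask, is:

```python
# Minimal sketch of the placement test in equation (30).
def detect_placements(centroids, table_mask):
    """centroids: list of (frame_no, x, y); table_mask[y][x] is True on the table."""
    inside = [(n, x, y) for (n, x, y) in centroids
              if table_mask[int(y)][int(x)]]             # step 1: build CSet, eq. (29)
    placements = []
    for k in range(1, len(inside) - 1):
        _, _, y_prev = inside[k - 1]
        n, x, y = inside[k]
        _, _, y_next = inside[k + 1]
        if y - y_prev > 0 and y_next - y < 0:            # eq. (30): local maximum of y
            placements.append((n, x, y))
    return placements
```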

3.4 Regular system

Fig.3-13 The flow chart of regular system

As shown in Fig.3-13, the regular system combines the three features obtained from the above methods, including the table region 𝐴𝑖, the equation of the net 𝐸𝑞, and the centroid coordinates of the placements 𝑃𝑚^{nk}(𝑥𝑃𝑚, 𝑦𝑃𝑚), with two judgment methods, effective service judgment and effective service return judgment, to determine the scores and the winner. Finally, we reset the serving point to the next centroid coordinate of the ball, and repeat the above actions until the end of the game.

3.4.1 Effective service judgment

The definition of an effective service is that the ball hit by the server must first hit the server's desktop, and the ball must then go over or around the net before it hits the table on the opponent's side. Thus, the features needed by this method are the first centroid coordinate of the ball 𝐹𝐶1^{n1}(𝑥𝑐1, 𝑦𝑐1), the first and the second centroid coordinates of the placements 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1) and 𝑃2^{n2}(𝑥𝑃2, 𝑦𝑃2), and the equation of the net 𝐸𝑞: ax + by + c = 0. We determine an effective service by these three features.

First, we examine the serving side. The first centroid coordinate of the ball 𝐹𝐶1^{n1}(𝑥𝑐1, 𝑦𝑐1) is considered as the serving point 𝑆𝐶(𝑖, 𝑗), and we do the following actions to judge an effective service.

Fig.3-14 The schematic diagram of effective service; (a) serve from the right side;
(b) the first placement hit on the right side of the desktop; (c) the second placement hit
on the left side of the desktop

Fig.3-14 shows that if the first placement 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1) is located on the same side as the serving point 𝑆𝐶(𝑖, 𝑗), and the second placement 𝑃2^{n2}(𝑥𝑃2, 𝑦𝑃2) is located on the other side from the first placement, we can regard this situation as an effective service. Concretely, if the value obtained by substituting 𝑆𝐶(𝑖, 𝑗) into the equation of the net 𝐸𝑞 and the value obtained by substituting the first placement 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1) into 𝐸𝑞 have the same sign, the server has hit his own desktop successfully, by equation (31). Otherwise, it is an invalid service, and the receiver scores.

effective service, if (a𝑥𝐶1 + b𝑦𝐶1 + c) × (a𝑥𝑃1 + b𝑦𝑃1 + c) > 0
invalid service,  if (a𝑥𝐶1 + b𝑦𝐶1 + c) × (a𝑥𝑃1 + b𝑦𝑃1 + c) < 0    (31)

If these results meet the definition of an effective service, we perform the following check. Similarly to the above, if the value obtained by substituting the second placement 𝑃2^{n2}(𝑥𝑃2, 𝑦𝑃2) into the equation of the net 𝐸𝑞 and the value obtained by substituting the first placement 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1) into 𝐸𝑞 have different signs, the server has hit his own desktop and then the opponent's desktop successfully, by equation (32). The situation then fulfills the definition of an effective service, so we can execute the subsequent method of effective service return judgment. Otherwise, it is an invalid service, and the receiver scores.

effective service, if (a𝑥𝑃1 + b𝑦𝑃1 + c) × (a𝑥𝑃2 + b𝑦𝑃2 + c) < 0
invalid service,  if (a𝑥𝑃1 + b𝑦𝑃1 + c) × (a𝑥𝑃2 + b𝑦𝑃2 + c) > 0    (32)
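The sign tests of equations (31) and (32), and of equation (33) in the next subsection, can be sketched as follows; the function names and the tuple representations are assumptions of this example.

```python
# Minimal sketch of the side-of-net sign tests used by the regular system.
def side(point, net):
    """Sign of a*x + b*y + c; net = (a, b, c) from Section 3.2."""
    a, b, c = net
    x, y = point
    return a * x + b * y + c

def effective_service(serve_pt, first_placement, second_placement, net):
    same_side = side(serve_pt, net) * side(first_placement, net) > 0        # eq. (31)
    crossed = side(first_placement, net) * side(second_placement, net) < 0  # eq. (32)
    return same_side and crossed

def effective_return(prev_placement, placement, net):
    return side(prev_placement, net) * side(placement, net) < 0             # eq. (33)
```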

3.4.2 Effective service return judgment

If the service is confirmed to be effective after the steps above, the subsequent placements are imported into the method of effective service return judgment. The definition of an effective service return is that after one player serves or returns, the other player must hit the ball over the net and make it land on the opponent's desktop. The features needed by this method are the subsequent placements 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚), the mth centroid coordinates of the placements, and the equation of the net 𝐸𝑞.

Fig.3-15 The schematic diagram of effective and invalid service return; from (a) to (b) is an effective service return; from (c) to (d) is an invalid service return

Fig.3-15 shows that if the placement 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) is located on the other side from the previous placement 𝑃𝑚−1^{nk−1}(𝑥𝑚−1, 𝑦𝑚−1), we can regard this situation as an effective service return. Conversely, if 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) is located on the same side as 𝑃𝑚−1^{nk−1}(𝑥𝑚−1, 𝑦𝑚−1), it is an invalid service return. The previous method, effective service judgment, uses the first and the second centroid coordinates of the placements 𝑃1^{n1}(𝑥𝑃1, 𝑦𝑃1) and 𝑃2^{n2}(𝑥𝑃2, 𝑦𝑃2). If the server serves successfully, we substitute the subsequent placements into the equation of the net 𝐸𝑞, multiply each pair of consecutive values, and require the product to be negative, which ensures that the ball hits the other player's desktop; we then regard it as an effective service return.

This product is updated as the subsequent placements are substituted into the equation of the net 𝐸𝑞 one after another. Once the product becomes positive, it indicates that the placement 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) is located on the same side as the previous placement 𝑃𝑚−1^{nk−1}(𝑥𝑚−1, 𝑦𝑚−1). This situation is an invalid service return, so the opponent scores. It is expressed by equation (33).


effective service return, if (a𝑥𝑃𝑚−1 + b𝑦𝑃𝑚−1 + c) × (a𝑥𝑃𝑚 + b𝑦𝑃𝑚 + c) < 0
invalid service return,  if (a𝑥𝑃𝑚−1 + b𝑦𝑃𝑚−1 + c) × (a𝑥𝑃𝑚 + b𝑦𝑃𝑚 + c) > 0    (33)

Case 1 and Case 2 in Fig.3-13 show the two situations determined by the effective service return judgment. They are defined as follows.

 Case 1: When the value obtained by substituting 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) into the equation of the net 𝐸𝑞 turns the product positive and the value obtained by substituting 𝑆𝐶(𝑖, 𝑗) into 𝐸𝑞 is greater than 0, the server has made an invalid service return, and the receiver scores.

 Case 2: When the value obtained by substituting 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) into the equation of the net 𝐸𝑞 turns the product positive and the value obtained by substituting 𝑆𝐶(𝑖, 𝑗) into 𝐸𝑞 is less than 0, the receiver has made an invalid service return, and the server scores.

3.4.3 Reset serving point

If either athlete scores, we regard the next placement that has not yet been substituted into the equation of the net as the new first placement. For example, if the placement 𝑃𝑚^{nk}(𝑥𝑚, 𝑦𝑚) has been substituted into the net equation 𝐸𝑞 when an athlete scores, then we set 𝑃𝑚+1^{nk+1}(𝑥𝑚+1, 𝑦𝑚+1) as the new first placement. Also, the corresponding centroid coordinate of the ball 𝐹𝐶𝑘^{nk}(𝑥𝑐𝑘, 𝑦𝑐𝑘) is regarded as the new serving point 𝑆𝐶(𝑖, 𝑗). This completes the reset of the serving point.

Chapter 4 Results and Discussion

4.1 Experimental evaluation methods

In the experiment, the results of each of these methods, including the edge of the

table, the equations of the nets, the centroid coordinates of the balls, the centroid

coordinates of the placements, will be marked on the films of the table tennis games, and we will assess the accuracy by the following methods. We will compare the edge of the

table which is the result in table detection with the experts’ hand-painted desktop. And

compute the true positive (TP), true negative (TN), false positive (FP), false negative

(FN) of the result in table detection. Also the experts make a judgment about the

results of net detection. If the positions of the net detection results are located within a

reasonable margin of error, the experts denote it as True. Otherwise, it’s defined as

False. Then, we compute the accuracy of the centroid coordinates of the balls and

placements by equation (34).


accuracy = 𝑛𝑅 / 𝑛𝐺    (34)

where 𝑛𝑅 is the number of detected balls and placements that the experts judge to be correct, and 𝑛𝐺 is the total number of balls and placements that the experts mark in the films. Finally, we compare the results of the regular system and

the result which is judged by referee on the spot to see whether the system can make a

judgment correctly.

There are five assessment methods of the table detection, including

Misclassification error (ME), relative foreground area error (RFAE), accuracy,

sensitivity, and specificity. They are defined as follows [11].


1. ME = (FN + FP) / (TP + FN + TN + FP) = 1 − (TP + TN) / (TP + FN + TN + FP)    (35)

TP (true positive): the number of target pixels correctly classified as target

FP (false positive): the number of non-target pixels incorrectly classified as target

TN (true negative): the number of non-target pixels correctly classified as non-target

FN (false negative): the number of target pixels incorrectly classified as non-target

The smaller ME value means the error number of samples is fewer, which has the

higher accuracy of classification.

2. RFAE = (FN − FP) / (TP + FN),  if (FP + TP) < (TP + FN)
          (FP − FN) / (FP + TP),  if (FP + TP) ≥ (TP + FN)    (36)

The smaller RFAE value means it has better results of segmentation.

3. Accuracy = (TP + TN) / (TP + TN + FN + FP)    (37)

where the denominator is the total number of pixels, and the numerator is the number of pixels that agree with the experts' classification. Thus, the

larger value means the higher accuracy of classification.

4. Sensitivity = TP / (TP + FN)    (38)

Sensitivity means the ratio of the correct number of samples in target sample

classification. So, the larger value means it has better results of segmentation.

5. Specificity = TN / (FP + TN)    (39)

Specificity means the ratio of the correct number of samples in non-target sample

classification. So, the larger value means it has better results of segmentation.
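A minimal sketch that computes the five measures of equations (35)-(39) from the TP, FP, TN, and FN counts is given below; the function name is illustrative only.

```python
# Minimal sketch of the five assessment measures, eqs. (35)-(39).
def assessment(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    me = (fn + fp) / total                                   # eq. (35)
    if (fp + tp) < (tp + fn):
        rfae = (fn - fp) / (tp + fn)                         # eq. (36), first case
    else:
        rfae = (fp - fn) / (fp + tp)                         # eq. (36), second case
    accuracy = (tp + tn) / total                             # eq. (37)
    sensitivity = tp / (tp + fn)                             # eq. (38)
    specificity = tn / (fp + tn)                             # eq. (39)
    return me, rfae, accuracy, sensitivity, specificity
```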

4.2 The experiment results of Table Detection

Fig.4-1 The experiment results of table detection

Table 4-1 The assessment results of table detection

Table 4-1 shows that the table detection has a very excellent performance in

these five assessment methods. And the accuracy is almost entirely in line with the

correct desktop painted by the experts.

4.3 The experiment results of Net Detection

Fig.4-2 The experiment results of net detection

Table 4-2 The assessment results of net detection

As shown in Table 4-2, after being visually assessed by the experts, the results of net detection are all within a reasonable margin of error and are marked as True.

4.4 The experiment results of Ball Detection

Here the results of ball detection, such as the centroid coordinates and the

placement are marked in the table tennis match films. The centroid coordinates are

marked by the green box, and the placements are marked by the red box. Then, the

experts compute the number of the centroid coordinates and placements which are

within a reasonable margin of error, and also compute the number of the balls and

placements the experts mark in the films. So we can calculate the accuracy by

equation (37). The results are shown at Fig.4-3, Fig.4-4, Fig.4-5, Fig.4-6, Fig.4-7,

Fig.4-8, Fig.4-9, and Fig.4-10.

Fig.4-3 Movie(a)-white, the experiment results of ball detection

Fig.4-4 Movie(a)-yellow, the experiment results of ball detection

Fig.4-5 Movie(b)-white, the experiment results of ball detection

Fig.4-6 Movie(b)-yellow, the experiment results of ball detection

Fig.4-7 Movie(c)-white, the experiment results of ball detection

Fig.4-8 Movie(c)-yellow, the experiment results of ball detection

Fig.4-9 Movie(d)-white, the experiment results of ball detection

Fig.4-10 Movie(d)-yellow, the experiment results of ball detection

Table 4-3 The comparing results of the ball detection
Movie Num. TP+TN+FN+FP TP+TN Accuracy (%)
Movie(a)-white ball 852 783 91.90
Movie(a)-yellow ball 963 892 92.62
Movie(b)-white ball 743 620 83.44
Movie(b)-yellow ball 841 759 90.24
Movie(c)-white ball 925 868 93.83
Movie(c)-yellow ball 864 823 95.25
Movie(d)-white ball 792 721 91.03
Movie(d)-yellow ball 833 793 95.20

Table 4-4 The comparing results of the placement detection


Movie Num. TP+TN+FN+FP TP+TN Accuracy (%)
Movie(a)-white ball 38 35 92.10
Movie(a)-yellow ball 42 39 92.85
Movie(b)-white ball 29 25 86.20
Movie(b)-yellow ball 37 37 100.00
Movie(c)-white ball 48 46 95.83
Movie(c)-yellow ball 43 40 93.02
Movie(d)-white ball 35 33 94.28
Movie(d)-yellow ball 41 39 95.12

As shown in Table 4-3 and Table 4-4, the accuracies of ball detection and placement detection are more than 90% in most of the eight test movies. However, the accuracy of Movie(b) using the white ball is relatively low compared with the other table tennis match films. This is because the white wall occupies a great part of the background in that film; when the white ball is located in front of the white wall, it is difficult to detect the centroid coordinates of the ball. The lower number of accurately detected centroid coordinates leaves fewer samples for placement detection, which leads to lower placement accuracy. Overall, there is still a very high accuracy rate for most of the test films in ball detection and placement detection.


4.5 The experiment result of Regular System

Table 4-5 The final results of the referee system


Movie Num. Actual score Proposed method
Movie(a)-white ball 7:11 7:11
Movie(a)-yellow ball 8:11 8:11
Movie(b)-white ball 11:5 10:5
Movie(b)-yellow ball 9:11 9:11
Movie(c)-white ball 6:11 6:11
Movie(c)-yellow ball 11:13 11:13
Movie(d)-white ball 11:8 11:8
Movie(d)-yellow ball 11:9 11:9

Here we compare the results. We can see that Movie(b) using the white ball is affected by the lower accuracy of placement detection; thus, the result of the regular system differs from the score judged by the referee. Conversely, thanks to the high accuracy of the placements, the experimental results for the other test table tennis match films agree with the actual scores.

4.6 The efficacy of ATTMU

Table 4-6 The cost times of ATTMU


Movie Num. Total cost time (s) Cost time per frame (s)
Movie(a)-white ball 583 0.3993
Movie(a)-yellow ball 712 0.4102
Movie(b)-white ball 636 0.4030
Movie(b)-yellow ball 843 0.3881
Movie(c)-white ball 667 0.4238
Movie(c)-yellow ball 962 0.3957
Movie(d)-white ball 923 0.4112
Movie(d)-yellow ball 879 0.4581

Finally, we assess the efficacy of ATTMU. Table 4-6 shows that ATTMU performs well and spends little time on detection, because we narrow the detection range in ball detection and thus avoid time-consuming full-image searches. Integrating the results in Table 4-5 and Table 4-6, we can confirm that ATTMU has the advantages of high accuracy and low processing time.

Chapter 5 Conclusions and Future Prospects

5.1 Conclusions

This thesis proposes an automatic table tennis match umpiring system (ATTMU) that works in a variety of environments with high accuracy. The system is divided into four stages: table detection, net detection, ball detection, and the regular system. The purpose of table detection is to cut out the table area. Net detection detects the equation of the net from the table area. Ball detection detects the centroid coordinates of the ball, and then uses the centroid coordinates and the table area to identify the placements. Finally, the regular system takes the placements from the previous step and checks them against the rules of A11, so we can get the final scores.

In the experimental results, we use eight films recorded in four different

environments with two major specified table tennis balls in the international

competitions, including the white ball and the yellow ball. Then, we compare the

results, such as the edge of table, the equations of the nets, the centroid coordinates of

the balls, the centroid coordinates of the placements with experts’ judgment. Then we

assess the performance of the ATTMU. The experimental results show that in terms of specificity, ME, RFAE, and accuracy for table detection, as well as the accuracy of ball detection, placement detection, and the regular system, ATTMU performs outstandingly.

5.2 Future Prospects

The experimental results of most of the test films show high accuracy. But if the contrast between the background and the color of the ball is not large enough, the accuracy of ball detection will decline, as with the white wall and the white ball. This can lead the results of ATTMU to be inconsistent with the actual scores. Also, the films we recorded do not include situations such as the ball touching the net or the table edge, or curveballs, and ATTMU currently handles only singles matches. In the future, we will record more films covering more of the circumstances that may occur. We also expect to improve the accuracy of ball detection by selecting a camcorder with higher resolution or by adding more detailed steps to ball detection in future research. We hope to overcome a variety of environmental constraints and achieve the goal of umpiring scores accurately.

Reference

[1] S. T. Rodrigues, J. N. Vickers, and A. M. Williams, "Head, eye and arm coordination in table tennis," Journal of Sports Sciences, vol. 20, pp. 187-200, 2002.

[2] X. Zhang, "Analysis of the effect of new competition rules on table tennis technique," Journal of Anhui Sports Science, vol. 01, 2002.

[3] B. Zhang, W. Chen, W. Dou, Y. Zhang, and L. Chen, "Content-based table tennis games highlight detection utilizing audiovisual clues," Fourth International Conference on Image and Graphics (ICIG 2007), pp. 833-838, 2007.

[4] W. Chen and Y. Zhang, "Tracking ball and players with applications to highlight ranking of broadcasting table tennis video," Computational Engineering in Systems Applications, IMACS Multi-conference, pp. 1896-1903, 2006.

[5] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed., Prentice-Hall, 2002.

[6] C. Ballester, V. Caselles, J. Verdera, M. Bertalmio, and G. Sapiro, "A variational model for filling-in gray level and color images," Proc. Int. Conf. Computer Vision, pp. 10-16, 2001.

[7] S. W. Yang, M. H. Sheu, H. H. Wu, H. E. Chien, P. K. Weng, and Y. Y. Wu, "VLSI architecture design for a fast parallel label assignment in binary image," IEEE International Symposium on Circuits and Systems, pp. 2393-2396, 2005.

[8] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.

[9] H. Tanaka, S. Uejima, and K. Asai, "Linear regression analysis with fuzzy model," IEEE Transactions on Systems, Man, and Cybernetics, vol. 12, pp. 903-907, 1982.

[10] T. Horiuchi and S. Hirano, "Colorization algorithm for grayscale image by propagating seed pixels," Proc. IEEE International Conference on Image Processing (ICIP), vol. 1, pp. 457-460, 2003.

[11] ITU-R Rec. BT.1210-3, "Test materials to be used in subjective assessment," February 2004.

