Data Science (Data Mining)
204423
2559
1
1.1
(Information) (Knowledge)
(Raw Data) (Relation-
ship) (Pattern) (Concept)
(Non-trivial)
(Knowledge Discovery in Databases)
(Knowledge Extraction) (Data Analysis)
7
(Query) SELECT
(SQL)
(Data Stream)
(Remote Sensing) [Richards, 1999, Lillesand et al., 2014]
(Graphical data)
1.2
3
(Data Preprocessing) (Data Processing) (Post Processing) 1.1
1.1: The three main steps of the data mining process: Data → Data pre-processing → Data processing → Post-processing → Knowledge
8
(Data Cleaning) (Data Integration)
(Data Selection) (Data Normalising)
(Dimensionality Reduction) 2
3
(Association Rule) (Classification)
(Clustering) (Algorithm)
(Machine Learning)
4 5 6
(Multidisciplinary)
(Database system)
(Data Visualisation) [Cleveland, 1993, Fayyad et al., 2002]
1.2.1
(Image) (Text)
(Sound) (Time Series)
(Transactional Data)
(Frequent Itemset)
4
(Temporal Data)
-
(Spatial Data)
( )
10
Spatio-temporal Data
(Graphical Data)
World-Wide-Web
(Node) (Edge) (Hyperlink)
WWW WWW
AdSense Google
Facebook
Facebook
11
Facebook
1.2.2
2
(Descriptive) (Predictive)
(Class) (Concept)
(Concept/Class Descriptions)
(Data Charac-
terisation)
(Data Discrimination) (Manually)
Feature Extraction
Frequent Patterns
12
buys(X, bike) ⇒ buys(X, helmet) [support = 3%, confidence = 60%]
(Measure of Interestingness)
(Confidence) (Support) 60%
60%
(Single-dimension Association Rule)
(Minimum Support Threshold) (Minimum Confidence Threshold)
(Model)
(Labelled Training Data)
(Classier)
(Decision Tree), (Artificial Neural Network), (Support Vector Machine) (Logistic Regression)
5
13
(Similarity Measure)
6
(Outlier Analysis)
(Outlier)
(Noise)
[Jindal and Liu, 2007]
[Fawcett and Provost, 1997]
1.2.3
[Han et al., 2011]
14
15
1.
2.
3.
16
2
(Data)
(Data Point),
(Example), (Instance) (Input Vector) x
(Data Set) S
(Feature)
3
3
n N
(Euclidean Space) M
x_n = (x_n^1, x_n^2, \ldots, x_n^m, \ldots, x_n^M)
17
n N m M
m
N M
\begin{pmatrix}
x_1^1 & \cdots & x_1^m & \cdots & x_1^M \\
\vdots & \ddots & \vdots & & \vdots \\
x_n^1 & \cdots & x_n^m & \cdots & x_n^M \\
\vdots & & \vdots & \ddots & \vdots \\
x_N^1 & \cdots & x_N^m & \cdots & x_N^M
\end{pmatrix}
(Row Vector)
(Column Vector)
2.0.1. A data set S of 5 examples, each described by three numeric features:

13   39   157
23   56   157
15   60   177
32   72   187
21   43   162

The first example is x_1 = (13, 39, 157); the value of the second feature of the third example is 60.
2.1
18
2.1.1
(Nominal Feature)
(Domain) (Finite Set)
{,,,,,}
2.1.2
(Binary Feature)
2 ( 2)
2 (Symmetric) (Asymmetric)
2.1.3
(Ordinal Feature)
5
4 1
2.1.4
(Numeric Feature)
(Integer) (Real Number)
19
2.2
2
3 (Mean) (Median) (Mode)
2.2.1
(Arithmetic Mean)

\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n   (2.1)

\bar{x} = [\bar{x}^1, \bar{x}^2, \ldots, \bar{x}^M] = \left[ \frac{1}{N}\sum_{n=1}^{N} x_n^1,\; \frac{1}{N}\sum_{n=1}^{N} x_n^2,\; \ldots,\; \frac{1}{N}\sum_{n=1}^{N} x_n^M \right]   (2.2)

(Weighted Average)

\bar{x} = \frac{1}{N} \sum_{n=1}^{N} w_n x_n   (2.3)

where w_n is the weight of x_n and \sum_{n=1}^{N} w_n = 1 (Expected Value)
20
(Extreme Value) (Outlier)
(Trimmed Mean)
k
2.2.1. 6 1 5
10000, 15000, 13000, 20000, 30000
2000000
\frac{10000 + 15000 + 13000 + 20000 + 30000 + 2000000}{6} = 348000
2.2.2
(Median)
\mathrm{median} = \begin{cases} x_{(\frac{N+1}{2})} & \text{if } N \text{ is odd} \\ \frac{1}{2}\left( x_{(\frac{N}{2})} + x_{(\frac{N}{2}+1)} \right) & \text{if } N \text{ is even} \end{cases}   (2.4)

where x_{(i)} denotes the i-th value after sorting.
2.2.3
(Mode)
(Distribution)
(Unimodal)
(Multimodal)
21
70
( )
2.2.4
(Variance)
\mathrm{Var}(x) = \sigma^2 = \frac{\sum_{n=1}^{N} (x_n - \mu)^2}{N}   (2.5)
(Standard Deviation)
(Covariance) For a pair of features i and j:

\mathrm{Cov}(x^i, x^j) = \sigma_{ij} = \frac{\sum_{n=1}^{N} (x_n^i - \mu^i)(x_n^j - \mu^j)}{N}   (2.6)

where \mu^i and \mu^j are the means of features i and j. (Covariance Matrix)

\mathrm{Cov}(x) = \Sigma = \frac{\sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T}{N}   (2.7)
22
where (x_n - \mu)^T denotes the transposition (Transposition). The covariance matrix \Sigma is an M \times M matrix:

\Sigma = \begin{pmatrix}
\sigma_1^2 & \cdots & \sigma_{1m} & \cdots & \sigma_{1M} \\
\vdots & \ddots & \vdots & & \vdots \\
\sigma_{m1} & \cdots & \sigma_m^2 & \cdots & \sigma_{mM} \\
\vdots & & \vdots & \ddots & \vdots \\
\sigma_{M1} & \cdots & \sigma_{Mm} & \cdots & \sigma_M^2
\end{pmatrix}

It is a symmetric (Symmetric) square (Square) matrix whose diagonal (Diagonal) contains the variances, and \sigma_{ij} = \sigma_{ji}.
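Below is a minimal sketch (not from the original text) of the summary statistics of equations (2.1)-(2.7), computed with NumPy on the small data set of Example 2.0.1.

import numpy as np

# N = 5 examples, M = 3 features (rows are examples, columns are features)
X = np.array([[13, 39, 157],
              [23, 56, 157],
              [15, 60, 177],
              [32, 72, 187],
              [21, 43, 162]], dtype=float)

mean = X.mean(axis=0)                      # per-feature mean, eq. (2.1)/(2.2)
var = ((X - mean) ** 2).mean(axis=0)       # per-feature variance, eq. (2.5)
cov = (X - mean).T @ (X - mean) / len(X)   # covariance matrix, eq. (2.7)

print(mean, var)
print(cov)   # symmetric M x M matrix; the diagonal holds the variances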
2.2.5
(Quantile) Q (Range) n
1 |Q| = n 1 n
n = 2
n = 4
n = 100
3 (Q3) 1 (Q1) (Inter-
Quartile Range (IQR)) Q3 Q1 IQR
IQR
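A small sketch (assumed, not from the original) of the quartiles, the IQR, and the mild/extreme outlier bounds described above, using the salary values from Example 2.2.1.

import numpy as np

x = np.array([10000, 15000, 13000, 20000, 30000, 2000000], dtype=float)

q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1
mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr       # mild outlier bounds
extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr     # extreme outlier bounds

outliers = x[(x < mild_low) | (x > mild_high)]
print(q1, median, q3, iqr, outliers)   # the 2,000,000 salary is flagged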
23
2.3
(Visualisation)
100000
(Outlier)
[Hoffman and Grinstein, 2002]
2.3.1
(Boxplot) 5 (Five Numbers Summary)
1. (Minimum)
2. 1
3.
4. 3
5. (Maximum)
2.1
(Mild Outlier) Q1 1.5 IQR
Q3 + 1.5 IQR (Extreme Outlier)
Q1 3 IQR Q3 + 3 IQR
24
2.1: A boxplot summarising the five-number summary: Min, Q1, Median, Q3, Max. The box spans the IQR (Q1 to Q3); mild outliers lie more than 1.5·IQR beyond the box and extreme outliers more than 3·IQR beyond it.
2.3.2
(Histogram)
M 1
2 M 2.2
2.2: A histogram of 1,000 data points (X on the horizontal axis, frequency on the vertical axis)
25
2.3.3
(Scatter Plot)
2 3 3
2
2.3
2.3: A scatter plot of two features, X and Y
2.4
(Similarity)
(Similarity Measure)
26
(Dissimilarity)
2
(Similarity Matrix) ( )
(Dissimilarity Matrix)
\begin{pmatrix}
0 & & & & \\
d(x_2, x_1) & 0 & & & \\
d(x_3, x_1) & d(x_3, x_2) & 0 & & \\
\vdots & \vdots & \vdots & \ddots & \\
d(x_N, x_1) & d(x_N, x_2) & \cdots & d(x_N, x_{N-1}) & 0
\end{pmatrix}   (2.8)
d(xi , xj ) xi xj .
d()
2.4.1
(Manhattan
Distance) (Euclidean Distance)
(Minkowski Distance)
xi , xj RM
3
(Positive Definiteness): d(x_i, x_j) > 0 if i \neq j, and d(x_i, x_i) = 0
(Symmetry): d(x_i, x_j) = d(x_j, x_i)
(Triangle Inequality): d(x_i, x_j) \leq d(x_i, x_k) + d(x_k, x_j)
With h = 1 the distance is the L1 or cityblock distance (Cityblock Distance); with h = 2 it is the L2 or Euclidean distance:

d(x_i, x_j) = \sqrt{(x_i^1 - x_j^1)^2 + (x_i^2 - x_j^2)^2 + \cdots + (x_i^M - x_j^M)^2}   (2.11)
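The following is a hedged sketch of the Manhattan, Euclidean, and general Minkowski distances discussed here; the helper function name is mine, not the book's.

import numpy as np

def minkowski(xi, xj, h):
    """Minkowski distance of order h; h=1 is Manhattan (L1), h=2 is Euclidean (L2)."""
    return np.sum(np.abs(xi - xj) ** h) ** (1.0 / h)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
print(minkowski(a, b, 1))   # L1 / cityblock distance = 5.0
print(minkowski(a, b, 2))   # L2 / Euclidean distance, eq. (2.11) = sqrt(13)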
2.4.2
2
0 1
4 1
0 1 0
- (Contingency
Table)
28
              x_j = 1    x_j = 0
x_i = 1           q          r
x_i = 0           s          t

2.1: Contingency table for two binary feature vectors x_i and x_j

d(x_i, x_j) = \frac{r + s}{q + r + s + t}   (2.13)
1 0
1
t q,r,s t t
d(x_i, x_j) = \frac{r + s}{q + r + s}   (2.14)
2.4.1. P Positive (
1 2 3 4
P N P N N N
P N P N P N
P P N N N N
) N Negative ( )
d(x_1, x_2) = \frac{0 + 1}{2 + 0 + 1} = 0.33

d(x_1, x_3) = \frac{1 + 1}{1 + 1 + 1} = 0.67

d(x_2, x_3) = \frac{1 + 2}{1 + 1 + 2} = 0.75
(
)
2.4.3
2
1
d(x_i, x_j) = \frac{M - P}{M}   (2.15)

where M is the total number of features and P is the number of features on which x_i and x_j have the same value.
2
(Binary Feature Encoding) (State)
( ) (Binary Code)
3 110, 011, 101
2 2 (Encoded Binary
Feature)
30
2.4.4
1
1
(Absolute Value) 2
0 ( ) (Unbounded)
2
0 1 (Rank)
Z = \frac{r - 1}{K - 1}   (2.16)

where r is the rank of the value and K is the number of ordered states.
2.4.5
(Cosine Similarity)
0
1 ( )
-1
180
0
\cos(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\| \, \|x_j\|}   (2.17)

where \|x_i\| is the L2 norm of x_i.
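A small sketch (assumed, not the book's code) of the cosine similarity in eq. (2.17); the example vectors are made up.

import numpy as np

def cosine_similarity(xi, xj):
    return xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj))

doc1 = np.array([3, 0, 1, 2], dtype=float)   # e.g. word counts of one document
doc2 = np.array([1, 0, 0, 1], dtype=float)
print(cosine_similarity(doc1, doc2))   # 1 means identical direction, 0 means orthogonal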
2.5
4
1. (Accuracy)
2. (Completeness)
3. (Consistency)
4. (Timeliness)
4
4
2.5.1
A B
A cust.id B
customer.id
A B
3
32
2
2
(Random Variable)
p(X, Y ) p(X)
p(Y ) p(X, Y ) = p(X) p(Y )
(Null Hypothesis)
2
1000 X { , }
Y { , }
             Y = y_1    Y = y_2    Total
X = x_1         250        200       450
X = x_2          50       1000      1050
Total           300       1200      1500
2
R
C
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}   (2.18)
33
oij (Observed Frequency) eij
(Expected Frequency)
e_{ij} = \frac{\left(\sum_{k=1}^{C} o_{ik}\right)\left(\sum_{l=1}^{R} o_{lj}\right)}{N}   (2.19)
R C
(2.19) eij
X = i Y = j
eij i
Y = j
e_{11} = 300 \times 450 / 1500 = 90

             Y = y_1        Y = y_2        Total
X = x_1     250 (90)       200 (360)        450
X = x_2      50 (210)     1000 (840)       1050
Total          300           1200          1500
Substituting the observed o_{ij} and expected e_{ij} values into (2.18) gives the \chi^2 statistic, which is compared against the \chi^2 distribution: the hypothesis that X and Y are independent is rejected when P(\chi^2 > \chi^2_{\mathrm{observed}}) falls below the chosen significance level.
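Below is a sketch (not from the original) of the chi-square independence test on the 2x2 table above, computed both by hand following eqs. (2.18)-(2.19) and with SciPy for comparison.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[250.0, 200.0],
                     [50.0, 1000.0]])

row_sums = observed.sum(axis=1, keepdims=True)
col_sums = observed.sum(axis=0, keepdims=True)
expected = row_sums * col_sums / observed.sum()          # eq. (2.19)
chi2 = ((observed - expected) ** 2 / expected).sum()     # eq. (2.18)

chi2_scipy, p_value, dof, _ = chi2_contingency(observed, correction=False)
print(chi2, chi2_scipy, p_value)   # a tiny p-value -> the two features are not independent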
2.6
xi , xj
Cov(xi , xj ) = 0
Cov(xi , xj ) = 0
R(x^i, x^j) = \frac{\mathrm{Cov}(x^i, x^j)}{\sigma_{x^i} \sigma_{x^j}}   (2.21)
R(X, Y ) > 0
R(X, Y ) < 0 (
) R(X, Y ) = 0
35
2.4
2.4: Scatter plots of feature pairs with positive, negative, and near-zero correlation
36
correlation doesn't imply causation
2.5.2
(Data reduction)
(Para-
metric) (Non-parametric)
(Model)
(Models Parameters)
Gaussian Mix-
ture Model (GMM)
GMM k
37
GMM(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x; \mu_i, \Sigma_i)   (2.22)

where \pi_i \geq 0 and \sum_{i=1}^{k} \pi_i = 1 are the mixing weights (Mixing Weight) of the GMM
i
Expectation-Maximisation (EM)
[Dempster et al., 1977]
GMM
(Poisson Distribution)
(Equal-
width Histogram) (Equal-frequency Histogram)
GMM
38
(k-means)
6
(Sampling)
(Random Sampling)
(Skewed Distribution)
(Stratied Sampling)
4
(Simple Random Sampling)
(Random Sampling with Replacement)
(Random Sampling without Replacement)
(Stratied Sampling)
2.5.3
39
(Incomplete)
(Missing Feature)
(Missing Value)
(Incorrect) (Noise),
(Outlier) (Extreme Value)
-1
(Random Error) (Random Deviation)
3
(Quantisation) 1
(Bin)
1
40
( )
()
0 (Zero-mean Noise)
Let X' denote the noisy observation of X:

X' = X + \epsilon   (2.23)

where \epsilon is zero-mean noise drawn from \mathcal{N}(0, \sigma). Averaging the noisy values X' gives

\frac{\sum_{n=1}^{N} X'_n}{N} = \frac{\sum_{n=1}^{N} X_n}{N} + \frac{\sum_{n=1}^{N} \epsilon_n}{N} \approx \frac{\sum_{n=1}^{N} X_n}{N} + 0   (2.24)

so the zero-mean noise averages out to approximately 0.
f () a b b = f (a)
f ()
b a b
b
41
b
b
2.5
(
)
2.5: A relationship between X and Y estimated from noisy observations
2.5.4
(Mapping Function) z
x
x = f (z) (2.25)
3
42
(Feature Construction)
(Pixel) 3
1
100 100 1
10000 10000 1
(Feature
Vector)
3
(Texture Extraction)
Local
Binary Pattern [Ojala et al., 2002]
(Shape Extraction)
[Ke and Sukthankar, 2004]
(Colour Extraction)
[Manjunath et al., 2001]
(Image Processing) [Sonka et al., 2014]
43
3
1. Min-Max normalisation maps a value v from the original range [min, max] to a new range [min_n, max_n]:

   v' = \frac{v - \min}{\max - \min} (\max_n - \min_n) + \min_n   (2.26)

2. Z-score normalisation rescales a feature so that it has \mu = 0 and \sigma = 1:

   v' = \frac{v - \mu}{\sigma}   (2.27)

3. Decimal scaling (Decimal Scaling) divides every value by a power of 10:

   v' = \frac{v}{10^j}   where j is the smallest integer such that \max(|v'|) < 1   (2.28)
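A minimal sketch (assumed, not from the original) of the three normalisation methods in equations (2.26)-(2.28); the sample values are made up.

import numpy as np

v = np.array([200.0, 300.0, 400.0, 600.0, 1000.0])

# 1. Min-max normalisation to a new range [0, 1], eq. (2.26)
new_min, new_max = 0.0, 1.0
v_minmax = (v - v.min()) / (v.max() - v.min()) * (new_max - new_min) + new_min

# 2. Z-score normalisation, eq. (2.27)
v_zscore = (v - v.mean()) / v.std()

# 3. Decimal scaling, eq. (2.28): divide by the smallest power of 10 so that |v'| < 1
j = int(np.ceil(np.log10(np.abs(v).max() + 1)))
v_decimal = v / (10 ** j)

print(v_minmax, v_zscore, v_decimal)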
(Discretisation)
44
45
1.
2.
3.
2, 3, 4, 1, 11, 10, 7, 5, 2, 3, 6, 0, 4, 2
4.
5. 1000 1.
2.
50
550
250
150
46
3
(k-means
clustering) k (k-Nearest Neighbours)
[Indyk and Motwani, 1998]
(Hyperplane)
(Curse of Dimensionality) [Friedman, 1997]
3
3
3 (Principal Component Analysis) (Feature Subset Selection)
(Random Projection)
3.1
M N x1 , . . . , xN
x0
x0
x0
x0 {xn }N
n=1 x0
f0 (x0 )
f_0(x_0) = \sum_{n=1}^{N} \|x_0 - x_n\|^2   (3.1)

where \|x\| = \sqrt{(x^1)^2 + (x^2)^2 + \cdots + (x^M)^2}
(Convex Function) x0 x0
x0 0
f_0 = \sum_{n=1}^{N} (x_0 - x_n)^2

\frac{\partial f_0}{\partial x_0} = \sum_{n=1}^{N} 2(x_0 - x_n) = 0   (3.2)

x_0 = \frac{1}{N} \sum_{n=1}^{N} x_n = \mu   (3.3)
x0
48
3.1: Representing the whole data set by a single point (the mean)
3.1
(Point) (Line)
(Project)
(Projection) x = (x1 , x2 , . . . , xM )
e = (e1 , e2 , . . . , eM ) (Linear Combination)
e^T x = \sum_{i=1}^{M} e_i x_i   (3.4)

3.2: Projection of a data point x_n onto the line y = \mu + a e
e ( 3.2)
y = \mu + a e   (3.5)
a x (
)
For each x_n, its projection onto the line (3.5) is y_n = \mu + a_n e
xn
f_1(a_1, \ldots, a_N, e) = \sum_{n=1}^{N} \|(\mu + a_n e) - x_n\|^2   (3.6)

= \sum_{n=1}^{N} a_n^2 \|e\|^2 - 2 \sum_{n=1}^{N} a_n e^T (x_n - \mu) + \sum_{n=1}^{N} \|x_n - \mu\|^2
50
an e e
an
an
f1 () (Partial Derivative) f1 () an
\frac{\partial f_1}{\partial a_n} = 2 a_n \|e\|^2 - 2 e^T (x_n - \mu) = 0

a_n = e^T (x_n - \mu) = \tilde{x}_n   (3.7)
f1 xn e
xn e xn
an 3.7 3.6
f_1(e) = \sum_{n=1}^{N} a_n^2 - 2 \sum_{n=1}^{N} a_n^2 + \sum_{n=1}^{N} \|x_n - \mu\|^2   (3.8)

= -\sum_{n=1}^{N} [e^T (x_n - \mu)]^2 + \sum_{n=1}^{N} \|x_n - \mu\|^2   (3.9)

= -e^T \left[\sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T\right] e + \sum_{n=1}^{N} \|x_n - \mu\|^2   (3.10)

= -e^T S e + \sum_{n=1}^{N} \|x_n - \mu\|^2   (3.11)

where the scatter matrix (Scatter Matrix) is S = \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T
f1 eT Se
eT Se e
e e e
e ( )
e
e 1 e eT Se
51
To maximise e^T S e subject to \|e\| = 1, introduce a Lagrange multiplier (Lagrange Multiplier) \lambda:

u = e^T S e - \lambda (e^T e - 1)   (3.12)

\frac{\partial u}{\partial e} = 2 S e - 2 \lambda e = 0   (3.13)

This is an eigensystem (Eigensystem): the e that maximises e^T S e is an eigenvector (Eigenvector) of S,

S e = \lambda e   (3.14)
(Eigenvalue) e
(Linear Algebra)
Since e^T S e = \lambda e^T e = \lambda, the value of e^T S e is largest when e is the eigenvector with the largest eigenvalue. Collecting the k most important eigenvectors (e_i, \lambda_i) as columns gives the basis (Basis) matrix E:

E = \begin{pmatrix}
e_1^1 & e_2^1 & \cdots & e_k^1 \\
e_1^2 & e_2^2 & \cdots & e_k^2 \\
\vdots & \vdots & \ddots & \vdots \\
e_1^M & e_2^M & \cdots & e_k^M
\end{pmatrix}   (3.15)
52
A data point x is transformed from M dimensions to k dimensions by

E^T (x - \mu) = \begin{pmatrix}
e_1^1 & e_1^2 & \cdots & e_1^M \\
e_2^1 & e_2^2 & \cdots & e_2^M \\
\vdots & \vdots & \ddots & \vdots \\
e_k^1 & e_k^2 & \cdots & e_k^M
\end{pmatrix} \begin{pmatrix} x^1 - \mu^1 \\ x^2 - \mu^2 \\ \vdots \\ x^M - \mu^M \end{pmatrix} = \tilde{x}   (3.16)

and can be approximately reconstructed from its k coefficients by

y = \mu + \sum_{m=1}^{k} a_m e_m   (3.17)

= \mu + E a   (3.18)

a = E^T (x - \mu) = \tilde{x}   (3.19)

The coefficients a (or \tilde{x}) are the principal components (Principal Components) of x: k values for each data point x.
x E e i
k S
k
E (Pre-multiply)
(Data Matrix) E M k
Algorithm 1 Principal Component Analysis (PCA)
1: Centre the data matrix X so that its mean is 0
2: S = cov(X)
3: T = eig(S): compute the eigenvalues and eigenvectors of S
4: Select the k most important principal components and put them in matrix E
5: X_pca = E^T X, i.e. project X onto E
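The following is a sketch of Algorithm 1 in NumPy (my own code, not the book's): centre the data, take the eigenvectors of the covariance matrix, and project onto the top k of them.

import numpy as np

def pca(X, k):
    """X has one example per row (N x M); returns the N x k projected data."""
    mu = X.mean(axis=0)
    Xc = X - mu                                  # step 1: centre the data
    S = np.cov(Xc, rowvar=False)                 # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # step 3: eigen-decomposition
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    E = eigvecs[:, order[:k]]                    # step 4: top-k principal components
    return Xc @ E                                # step 5: X_pca = (X - mu) E

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, 2).shape)   # (100, 2)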
53
3.1.1
S e_i = \lambda_i e_i   (3.20)

where \lambda_i is the eigenvalue associated with eigenvector e_i of S. The fraction of information lost (Error) when only the k largest components are kept in E is

\mathrm{error} = \frac{\sum_{i=k+1}^{M} \lambda_i}{\sum_{i=1}^{M} \lambda_i}   (3.21)
3.2
(Feature Selection)
3 (Fil-
ter Method) (In-
54
3.3: The filter approach to feature selection: Set of features → Filter method → Learning algorithm → Model
3.2.1
3.3
1
Information Theory [MacKay, 2003])
55
(Signal-to-Noise Ratio)

S2N = \frac{\text{signal}}{\text{noise}}   (3.22)

S2N = \frac{|\mu_0^k - \mu_1^k|}{\sigma_0^k + \sigma_1^k}   (3.23)

where \mu_0^k and \sigma_0^k are the mean and standard deviation of feature k over the examples of class 0, and \mu_1^k, \sigma_1^k are the same quantities for class 1:

\mu_0^k = \frac{\sum_{n=1}^{N_0} x_n^k}{N_0}   (3.24)

\sigma_0^k = \sqrt{\frac{\sum_{n=1}^{N_0} (x_n^k - \mu_0^k)^2}{N_0}}   (3.25)
N0 0 k1 1k
S2N S2N
S2N M
56
(Information Gain) X, Y
Y X
Y Y
X X
Y
(Entropy)
1
0
(Expected Value)
(Information Content)
I(X) = \log_2 \left( \frac{1}{P(X)} \right)   (3.26)

H(X) = E[I(X)]   (3.27)

= E[-\log_2 P(X)]   (3.28)

= -\sum_{n=1}^{N} P(x_n) \log_2 P(x_n)   (3.29)

For a fair (Fair) distribution over two outcomes:

H(X) = -\sum_{n=1}^{N} P(x_n) \log_2 P(x_n)   (3.30)

= -[0.5 \log_2(2^{-1}) + 0.5 \log_2(2^{-1})]   (3.31)

= 1   (3.32)

For a completely biased (Biased) distribution:

H(X) = -\sum_{n=1}^{N} P(x_n) \log_2 P(x_n)   (3.33)

= -[1 \log_2(1) + 0 \log_2(0)]   (3.34)

= 0   (3.35)
X X
(Discrete Event) Y
H(Y) = -\sum_{i=1}^{k} P(Y = y_i) \log_2 P(Y = y_i)   (3.37)

H(Y|X) = \sum_{j=1}^{r} P(X = x_j) H(Y | X = x_j)   (3.38)

where H(Y | X = x_j) is the entropy of Y restricted to the examples with X = x_j:

H(Y | X = x_j) = -\sum_{i=1}^{k} P(Y = y_i | X = x_j) \log_2 P(Y = y_i | X = x_j)   (3.39)
58
(Mutual Information)
X Y
I(x, y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}   (3.40)
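A hedged sketch (mine, with made-up variable names and toy data) of entropy, conditional entropy, and information gain following eqs. (3.29), (3.38)-(3.39).

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))                      # eq. (3.29)

def information_gain(feature, labels):
    h_y = entropy(labels)                               # H(Y)
    h_y_given_x = 0.0
    for value in np.unique(feature):                    # eq. (3.38)
        mask = feature == value
        h_y_given_x += mask.mean() * entropy(labels[mask])
    return h_y - h_y_given_x                            # IG = H(Y) - H(Y|X)

x = np.array(['sunny', 'sunny', 'rain', 'rain', 'rain', 'sunny'])
y = np.array([0, 0, 1, 1, 0, 0])
print(entropy(y), information_gain(x, y))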
1.
x1 x2
2. X Y
M < M
( )
[Guyon and Elisseeff, 2003]
59
3.2.2
(Search Strategy)
(Mean Squared
Error)
(Brute Force)
(Greedy Approach) (Evolutionary Algorithm)
2 2
(Forward Selection) (Backward Elimination)
( )
[Guyon and Elisseeff, 2003]
2
[Eiben and Smith, 2003]
60
[Guyon and Elisseeff, 2003]
3.2.3
(Regularisation)
L1
f_{obj} = \arg\min_{w} \underbrace{\sum_{i=1}^{N} L\big(y_i, w^T x_i + b\big)}_{\text{objective function}} + \underbrace{\lambda \sum_{j=1}^{M} |w_j|}_{\text{regularisation}}   (3.42)

where L(\cdot,\cdot) is the loss of the model's prediction and \lambda is the regularisation parameter (Regularisation Parameter)
L1
(w) wj ( 0 ) wj
0 j
j
[Ng, 2004]
61
3.3
(PCA)
PCA
PCA
(Random Projection) Johnson
Lindenstrauss Lemma
Lemma 1 ([Johnson et al., 1986]). Let \epsilon \in (0, 1). Let k, M, N \in \mathbb{N} such that k \geq C \epsilon^{-2} \log N, for a large enough absolute constant C. Let V \subset \mathbb{R}^M be a set of N points. Then there exists a linear mapping R : \mathbb{R}^M \to \mathbb{R}^k, such that for all u, v \in V:

(1 - \epsilon) \|u - v\|^2 \leq \|Ru - Rv\|^2 \leq (1 + \epsilon) \|u - v\|^2

That is, for a set V of N points in M dimensions, a random matrix (Random Matrix) R maps the points down to k dimensions (k much smaller than M) while preserving every relative distance (Relative distance) to within a factor of (1 \pm \epsilon).
62
1.
2.
3.
4.
5.
63
4
(Recommendation Systems)
4.1
(Association Rule)
. . . A = B A B
2
(Frequent Pattern)
(Frequent Itemset)
64
(Market Basket Analysis)
4.1.1
I = {I1 , I2 , . . . , Im } D =
{t1 , t2 , . . . , tm } (Transaction Set) ti
ti I A
ti A A ti
(Implication)
A \Rightarrow B   (4.1)

where A \subset I, B \subset I, and A \cap B = \emptyset
D A B (A B)
(Support) s
(Condence) c
D A B (A B)
D A
support(A \Rightarrow B) = P(A \cup B) = \frac{\#\text{ of } t_i \text{ containing } A \cup B}{\#\text{ of } t_i}   (4.2)

confidence(A \Rightarrow B) = P(B|A) = \frac{support(A \cup B)}{support(A)} = \frac{\#\text{ of } t_i \text{ containing } A \cup B}{\#\text{ of } t_i \text{ containing } A}   (4.3)
, A (Antecedence) B (Consequence) s
(Minimum Support Threshold) c (Minimum
Condence Threshold) ( Strong)
An itemset F = A \cup B containing k items whose support is at least s is called a frequent k-itemset (Frequent k-itemset); the set of all frequent k-itemsets is denoted L_k.
4.1.1. 4
4.1
I = { , , , }
A = B A = { }
B = {}
support(A \Rightarrow B) = \frac{2}{5} = 0.4 = 40\%

confidence(A \Rightarrow B) = \frac{0.4}{0.4} = 1 = 100\%

4.1: The five transactions of Example 4.1.1 (1 means the item appears in the transaction)

Transaction   Item 1   Item 2   Item 3   Item 4
    1            1        1        0        0
    2            0        0        1        0
    3            0        0        0        1
    4            1        1        1        0
    5            0        1        0        0
2.
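A small sketch (assumed) of computing support and confidence, eqs. (4.2)-(4.3), from the 0/1 transaction table above; the column order follows Table 4.1 and the item names are hypothetical.

import numpy as np

# rows = transactions 1..5, columns = the four items of Table 4.1
T = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 0, 0]])

A, B = [0], [1]                           # rule: item 1 => item 2 (column indices)
has_A = T[:, A].all(axis=1)
has_AB = T[:, A + B].all(axis=1)

support = has_AB.mean()                   # 2/5 = 0.4
confidence = has_AB.sum() / has_A.sum()   # 2/2 = 1.0
print(support, confidence)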
4.1.2
(Apriori Algorithm) (Breadth-first-search) k (k+1)
1 (1-itemset)
1 L1 L2 L1 L2
2 (Frequent 2-itemsets) L2 L3
k
k
(k+1) (k+1)
67
(Apriori Property)
Apriori Property:
I
I A I
I A I I
I A
Algorithm 2
1: Find all strong 1-itemsets
2: while Lk1 is non-empty set do
3: Ck = apriori-gen(Lk1 )
4: For each c in Ck , initialise c.count to zero
5: for records r in the database do
6: Cr = subset(Ck , r) ; for each c in Cr , c.count++
7: Set Lk to all c in Ck whose support is greater than minimum support
8: end for
9: end while
10: Return all of the Lk sets
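Below is a compact sketch of the Apriori idea (my own code, not the book's): grow frequent itemsets level by level, and only count candidates whose subsets are all frequent, per the Apriori property. The sample transactions are made up.

from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}                                    # itemset -> support
    level = [frozenset([i]) for i in items]          # candidate 1-itemsets
    k = 1
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        Lk = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(Lk)
        # join step: merge frequent k-itemsets into (k+1)-candidates;
        # prune step: keep only candidates whose k-subsets are all frequent
        prev = list(Lk)
        level = []
        for a, b in combinations(prev, 2):
            cand = a | b
            if len(cand) == k + 1 and all(frozenset(s) in Lk for s in combinations(cand, k)):
                level.append(cand)
        level = list(set(level))
        k += 1
    return frequent

data = [{'bread', 'milk'}, {'bread', 'beer'}, {'bread', 'milk', 'beer'}, {'milk'}]
print(apriori(data, min_support=0.5))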
apriori-gen
(Joining) (Pruning)
k ( k ) Lk
Lk1 Lk1 Ck
68
Lk1 Lk1
(k 2) l1 l2 Lk1
l1 l2
Ck Lk k
Ck Ck
Lk
Ck Ck
Ck
Ck (k-1)
k (k-1) k
Lk1
l, l.
69
For every non-empty subset s of a frequent itemset l, output the rule

s \Rightarrow (l - s) \quad \text{if } \frac{\mathrm{support\_count}(l)}{\mathrm{support\_count}(s)} \geq \mathrm{min\_confidence}
A
A
A 100
2100
(Closed Frequent Itemset) (Maximal
Frequent Itemset)
A S (Superset) B
A A = { }
B = { , } A A
B B
S A
( ) A
A S A
(Frequent) B A B B
B A
4.2
70
Netflix
1
Netflix
BellKor's Pragmatic Chaos AT&T
1 Amazon
(Content-
based System)
(Collaborative Filter-
ing)
(Long-tail Phenomenon) 4.1
4.1: The long-tail phenomenon
71
1 2 1 2 3
3 5 1
1 5 2
2 1 1
4 1 3
4.2: A utility matrix of 4 users and 7 items
4.2.1
(Utility Matrix)
4 1-5 4.2
1 K
0 1
72
4.2.2
(Item Prole) (User)
(User Prole)
4.3
73
/ ( ) S.Johansson C.Evans R.Downey
Ant man 1
Avenger 1 1 1
Iron man 1 1
4.3:
4.4
1.
2.
3.
74
(Rating) j i
p_{i,j} = \Big(\sum_{k \in S_i} \delta(j, k)\Big) / |S_i|   (4.4)
Si
Si i (j k) 1
k j 0 k
i j
p_{i,j} = \sum_{c \in C_j} (v_c - \bar{v}_i) / |C_j|   (4.5)
Cj i j
vi i |Cj |
Cj
4.
4.2.1.
Iron man
( )
75
/ Ant man Avenger Iron man
1
1
1 1
4.6:
0
1
( ) 1 5
i
j pi,j = cCj (vc vi )/|Cj |
76
/ Ant man Avenger Iron man
3
2
5 5
4.9: (Score-based)
4.2.3
(Collaborative Filtering)
vi,j
i
j Ii i
i
\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}   (4.6)

The predicted rating of the active user a for item j is

p_{a,j} = \bar{v}_a + \sum_{i=1}^{n} w(a, i)(v_{i,j} - \bar{v}_i)   (4.7)
pa,j w(a, i)
a i
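The following is a sketch (mine) of the user-based collaborative-filtering prediction of eqs. (4.6)-(4.7), with Pearson correlation used as the weight w(a, i); the normalisation by the sum of |w(a, i)| and the toy ratings matrix are my additions, not taken from the text.

import numpy as np

R = np.array([[5, 3, 4, 4, 0],        # ratings matrix, 0 = not rated
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [1, 5, 5, 2, 1]], dtype=float)

def predict(R, a, j):
    rated_a = R[a] > 0
    v_a = R[a][rated_a].mean()                              # eq. (4.6) for the active user
    num, den = 0.0, 0.0
    for i in range(len(R)):
        if i == a or R[i, j] == 0:
            continue
        common = rated_a & (R[i] > 0)                       # items rated by both users
        if common.sum() < 2:
            continue
        v_i = R[i][R[i] > 0].mean()
        w = np.corrcoef(R[a][common], R[i][common])[0, 1]   # Pearson weight w(a, i)
        num += w * (R[i, j] - v_i)
        den += abs(w)
    return v_a + num / den if den > 0 else v_a              # eq. (4.7), normalised

print(predict(R, a=0, j=4))   # predicted rating of user 0 for item 4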
77
(Minkowski distance-based Nearest
Neighbour)
(Pearsons Correlation Coefcient)
(Jaccard Distance)
(Cosine Distance)
w(a, i) = \begin{cases} 1, & \text{if } i \in \mathrm{neighbour}(a) \\ 0, & \text{otherwise} \end{cases}

where neighbour(a) is the set of the k users nearest to a, for example under the Minkowski distance

d(u_i, u_j) = \left( |u_i^1 - u_j^1|^h + \cdots + |u_i^M - u_j^M|^h \right)^{\frac{1}{h}}

w(a, i) = \begin{cases} d(u_a, u_i), & \text{if } i \in \mathrm{neighbour}(a) \\ 0, & \text{otherwise} \end{cases}

w(a, i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}
78
0
(Finite Set)
S T
1 - |S \cap T| / |S \cup T|   (4.8)
S T (
) S T (Intersection)
(Union)
4.2.2. Let S = {dog, cat, parrot, monkey} and T = {dog, monkey, snake}. The Jaccard similarity of S and T is SIM_Jaccard(S, T) = |S \cap T| / |S \cup T| = 2/5 = 0.4, and the Jaccard distance is 1 - SIM_Jaccard = 0.6.
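A one-function sketch (mine) of the Jaccard similarity and distance of eq. (4.8), applied to the sets from Example 4.2.2.

def jaccard_similarity(S, T):
    return len(S & T) / len(S | T)

S = {'dog', 'cat', 'parrot', 'monkey'}
T = {'dog', 'monkey', 'snake'}
print(jaccard_similarity(S, T))        # 2/5 = 0.4
print(1 - jaccard_similarity(S, T))    # Jaccard distance = 0.6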
(Rating Vector)
79
SIM_{\cos}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}   (4.9)
A
A B
B
0
(Discretise)
( )
1-5 3,4,5 1 1 2
0
80
acc = \frac{\sum_{i=1}^{n} |\hat{r}_i - r_i|}{n}   (4.10)
1 K
M.S.E = \frac{\sum_{i=1}^{N} (p_{i,j} - v_{i,j})^2}{N}   (4.11)
N
81
1.
2.
3.
4.
5.
1 1 1 1 1
2 1 1 1
3 1 1 1 1
4 1 1 1
5 1 1 1 1 1
6 1 1 1
7 1 1 1 1 1
[, ] [ , , ]
IF & THEN
82
5
(E-mail)
(Spam E-mail)
(Classication) h: X Y
S = {xi , yi }N i=1 D (xi , yi ) xi X
(Input Vector) yi Y (Class Label) xi
D
(Supervised Learning)
h y x
y y x
h (Classier)
83
5.1
(Bayesian Learning)
(Bayes Rule)
p(y|x) = \frac{p(x|y)\, p(y)}{p(x)}   (5.1)
(Joint Probability) p(x, y)
p(x)
5.1
y x
1. p(y|x) x y
(Posterior Probability)
2. p(x|y) x y
(Likelihood)
3. p(y) y
(Prior Probability)
4. p(x) x (Evidence)
2
84
(Binary
Classication Problem) 0 1
p(y = 1|x) p(y = 0|x)
p(y=1|x) = \frac{p(x|y=1)\, p(y=1)}{p(x)}   (5.3)

p(y=0|x) = \frac{p(x|y=0)\, p(y=0)}{p(x)}   (5.4)
p(x|y)
(Probability Density Function (PDF))
5.1.1. (Positive)
98% (True Positive)
97%
x = positive y = 0
y = 1
p(y = 0|positive) p(y = 1|positive)
85
5.1.1
p(y=0|\text{positive}) = \frac{p(\text{positive}|y=0)\, p(y=0)}{p(\text{positive})}   (5.5)

= \frac{p(\text{positive}|y=0)\, p(y=0)}{p(\text{positive}|y=0)\, p(y=0) + p(\text{positive}|y=1)\, p(y=1)}   (5.6)

p(y=1|\text{positive}) = \frac{p(\text{positive}|y=1)\, p(y=1)}{p(\text{positive})}   (5.7)

= \frac{p(\text{positive}|y=1)\, p(y=1)}{p(\text{positive}|y=0)\, p(y=0) + p(\text{positive}|y=1)\, p(y=1)}   (5.8)
From the problem statement, p(positive|y=1) = 0.98 and p(negative|y=0) = 0.97.
p(y)
y
( ) p(y)
86
p(y)
p(y)
p(y=1|x) = \frac{p(x|y=1)\, p(y=1)}{p(x)}   (5.14)

= \frac{p(x|y=1)\, p(y=1)}{p(x|y=0)\, p(y=0) + p(x|y=1)\, p(y=1)}   (5.15)

= \frac{p(x|y=1) \cdot 0.5}{p(x|y=0) \cdot 0.5 + p(x|y=1) \cdot 0.5}   (5.16)

= \frac{p(x|y=1)}{p(x|y=0) + p(x|y=1)}   (5.17)
87
Assuming equal priors p(y = 0) = p(y = 1) = 0.5, the maximum-likelihood (ML) decision is

p(y=0|\text{positive}) = \frac{p(\text{positive}|y=0) \cdot 0.5}{p(\text{positive}|y=0) \cdot 0.5 + p(\text{positive}|y=1) \cdot 0.5}   (5.19)

= \frac{0.03 \cdot 0.5}{0.03 \cdot 0.5 + 0.98 \cdot 0.5}   (5.20)

\approx 0.03   (5.21)

p(y=1|\text{positive}) = \frac{p(\text{positive}|y=1) \cdot 0.5}{p(\text{positive}|y=0) \cdot 0.5 + p(\text{positive}|y=1) \cdot 0.5}   (5.22)

= \frac{0.98 \cdot 0.5}{0.03 \cdot 0.5 + 0.98 \cdot 0.5}   (5.23)

\approx 0.97   (5.24)

> p(y=0|\text{positive})   (5.25)
ML
5.1.2
0.008 %
p(y = 1) =
0.008 p(y = 1) = 0.992
(Maximum A Posterior (MAP))
88
With the MAP rule the priors are taken into account:

p(y=0|\text{positive}) = \frac{p(\text{positive}|y=0)\, p(y=0)}{p(\text{positive}|y=0)\, p(y=0) + p(\text{positive}|y=1)\, p(y=1)}   (5.27)

= \frac{0.03 \cdot 0.992}{0.03 \cdot 0.992 + 0.98 \cdot 0.008}   (5.28)

\approx 0.79   (5.29)

p(y=1|\text{positive}) = \frac{p(\text{positive}|y=1)\, p(y=1)}{p(\text{positive}|y=0)\, p(y=0) + p(\text{positive}|y=1)\, p(y=1)}   (5.30)

= \frac{0.98 \cdot 0.008}{0.03 \cdot 0.992 + 0.98 \cdot 0.008}   (5.31)

\approx 0.21   (5.32)

< p(y=0|\text{positive})   (5.33)
MAP
ML MAP
(Bayes
Classier)
5.1.3
14 5.1 xi = {x1i , x2i , . . . , xM i }
89
15
x15 = {=,=, = , =}
5.1: A training set of 14 examples
p(y|x_{15}) \propto p(\{x_{15}^1, x_{15}^2, \ldots, x_{15}^M\} | y)\, p(y)   (5.34)
y = { , } p({x1 , x2 , . . . , xM }|y)
M x 1 p(x|y)
p(x1 = |y = )
y = x1 = y =
90
p(x1 = |y = ) = 29 x
p({x1 , x2 , . . . , xM }|y) x
x1 , x2 , . . . , xM
(Joint Probability) p({x1 , x2 , . . . , xM }|y)
p(\{x^1, x^2, \ldots, x^M\}|y) = p(x^1|y)\, p(x^2, \ldots, x^M | y, x^1)   (5.35)

= p(x^1|y)\, p(x^2|y, x^1)\, p(x^3, \ldots, x^M | y, x^1, x^2)   (5.36)

= \ldots   (5.37)
(Conditional Probabil-
ity)
(Naive
Assumption)
5.35
p(\{x^1, x^2, \ldots, x^M\}|y) = p(x^1|y)\, p(x^2, \ldots, x^M | y)   (5.38)

= p(x^1|y)\, p(x^2|y)\, p(x^3, \ldots, x^M | y)   (5.39)

= \prod_{i=1}^{M} p(x^i|y)   (5.40)
(Naive Bayes Classier)
15
\hat{y} = \arg\max_{y} p(x_{15}|y)\, p(y)   (5.41)

= \arg\max_{y} p(y) \prod_{i=1}^{4} p(x_{15}^i|y)   (5.42)
91
y =
v {, , , , , }
92
(
)
( : )
x = {1,3,1,0,0,4}
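Below is a hedged sketch of a categorical naive Bayes classifier following eqs. (5.40)-(5.42). The toy data is made up; it is not the book's 14-example table, and the function names are mine.

import numpy as np

def fit_naive_bayes(X, y):
    classes, priors, likelihoods = np.unique(y), {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)                        # p(y)
        likelihoods[c] = [                                   # p(x^i = v | y) per feature
            {v: np.mean(Xc[:, i] == v) for v in np.unique(X[:, i])}
            for i in range(X.shape[1])
        ]
    return priors, likelihoods

def predict(x, priors, likelihoods):
    scores = {c: priors[c] * np.prod([likelihoods[c][i].get(v, 0.0)
                                      for i, v in enumerate(x)])
              for c in priors}                               # eq. (5.42)
    return max(scores, key=scores.get)

X = np.array([['sunny', 'hot'], ['sunny', 'cool'], ['rain', 'cool'], ['rain', 'hot']])
y = np.array(['no', 'yes', 'yes', 'no'])
print(predict(np.array(['sunny', 'cool']), *fit_naive_bayes(X, y)))   # -> 'yes'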
5.2
(Normal Discrimi-
nant Analysis)
50
93
100 xq = 173.2
p(xq = 173.2|y = )
0 173.2
p(y = |xq = 173.2)
p(y = |xq = 173.2)
(
)
(Data Distribution
Model)
(Normally Distributed)
Central Limit Theorem [Wasserman, 2013]
(Independent Feature)
PDF 1
PDF (Univariate Normal Distribution)
p(x|y=k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left\{ -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right\}   (5.47)
94
0 , 1 , 0 , 1
0 , 1 , 0 , 1
y = 0
0 , 0 y = 1 1 , 1
\mu_k = \frac{\sum_{n=1}^{N_k} x_n}{N_k}   (5.54)

\sigma_k^2 = \frac{\sum_{n=1}^{N_k} (x_n - \mu_k)^T (x_n - \mu_k)}{N_k}   (5.55)

\pi_k = p(y=k) = \frac{N_k}{N}   (5.56)
(Probabilistic Classier) (
)
(Generative Classier)
95
5.2.1
(Discriminant Function) x
y
f_1(x) = \mathbb{1}\!\left( \frac{p(y=1|x)}{p(y=0|x)} > 1 \right)   (5.57)

which predicts class 1 when p(y=1|x) > p(y=0|x) and class 0 when p(y=0|x) is larger. Equivalently, f_1(x) can be written using the log of the ratio:

f_2(x) = \mathbb{1}\!\left( \log \frac{p(y=1|x)}{p(y=0|x)} > 0 \right)   (5.58)
(Prediction Threshold) 0
f2 (x)
5.2.2
1
PDF
p(x|y=k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right\}   (5.59)

where |\Sigma| denotes the determinant of the covariance matrix
\mu_k = \frac{\sum_{n=1}^{N_k} x_n}{N_k}   (5.60)

\Sigma_k = \frac{\sum_{n=1}^{N_k} (x_n - \mu_k)(x_n - \mu_k)^T}{N_k}   (5.61)
96
M
O(M 2 )
2
(Common Covariance Matrix)
\Sigma = \frac{\Sigma_0 + \Sigma_1}{2}   (5.62)
(Diagonal Covariance Matrix)
0
0
97
5.3
1 [Vapnik and Vapnik, 1998]
(Discriminative)
(Logistic Regression) [Ng and Jordan, 2002]
(Support
Vector Machine) (Articial Neural Network)
(Logistic Regression)
(Linear Regression)
x y
y = w^T x + \epsilon   (5.63)
w (Weight Vector)
w
|y y| y y
w
1
Vladimir Vapnik Support Vector Machine (SVM)
98
(Decision Hyperplane)
(Threshold) 0 y 0
1 y < 0 0
5.58
f_2(x) = \mathbb{1}\!\left( \log \frac{p(y=1|x)}{p(y=0|x)} > 0 \right) = \mathbb{1}\left( y > 0 \right)   (5.64)

Setting the log-odds equal to the linear score y = w^T x:

\log \frac{p(y=1|x)}{p(y=0|x)} = y = w^T x   (5.65)

\log \frac{p(y=1|x)}{p(y=0|x)} = w^T x   (5.66)

\frac{p(y=1|x)}{p(y=0|x)} = \exp(w^T x)   (5.67)

\frac{p(y=1|x)}{1 - p(y=1|x)} = \exp(w^T x)   (5.68)

p(y=1|x) = \exp(w^T x) - p(y=1|x)\exp(w^T x)   (5.69)

p(y=1|x) = \frac{\exp(w^T x)}{1 + \exp(w^T x)}   (5.70)

= \frac{1}{1 + \exp(-w^T x)}   (5.71)

and therefore

p(y=0|x) = \frac{\exp(-w^T x)}{1 + \exp(-w^T x)}   (5.72)

= \frac{1}{1 + \exp(w^T x)}   (5.73)
(Sigmoid Function)
w w x
0 p(y = 0|x) > p(y = 1|x)
1 p(y = 1|x) > p(y = 0|x)
0 p(y = 0|x)
1 p(y = 1|x)
L_1 = \prod_{n=1}^{N} p(y_n = 1|x_n, w)^{y_n} \,(1 - p(y_n = 1|x_n, w))^{1 - y_n}   (5.74)
(Likelihood Function)
w
L_2 = \sum_{n=1}^{N} y_n \log p(y_n = 1|x_n, w) + (1 - y_n) \log(1 - p(y_n = 1|x_n, w))   (5.75)
100
L2 (Convex Function)
L2 = 0
(Optimisation Theory)
[Boyd and Vandenberghe, 2004]
(Newtons Method)
f (w) w w
f (w) = 0
w
w_{i+1} = w_i - \frac{f(w_i)}{f'(w_i)}   (5.76)

To find the w where \nabla L_2 = 0, the update becomes

w_{i+1} = w_i - \eta \frac{\nabla L_2(w_i)}{\nabla^2 L_2(w_i)}   (5.77)
(Learning Rate)
L2 L2 (First Derivative) (Second
Derivative) L2
\nabla L_2 = \sum_{n=1}^{N} \left[ y_n\, p(y_n=0|x_n, w) - (1 - y_n)\, p(y_n=1|x_n, w) \right] x_n   (5.78)

\nabla^2 L_2 = -\sum_{n=1}^{N} x_n\, p(y_n=1|x_n, w)\, p(y_n=0|x_n, w)\, x_n^T   (5.79)
w
wT x
(Discriminative Classifier)
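The following is a sketch (mine) of logistic regression trained with Newton's method, following the likelihood of eq. (5.75) and the derivatives of eqs. (5.78)-(5.79); it uses the standard matrix form of the update rather than the scalar fraction in (5.77), and the synthetic data is made up.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, iters=10):
    N, M = X.shape
    w = np.zeros(M)
    for _ in range(iters):
        p = sigmoid(X @ w)                    # p(y=1 | x, w)
        grad = X.T @ (y - p)                  # gradient of the log-likelihood, cf. (5.78)
        hess = -(X.T * (p * (1 - p))) @ X     # Hessian, cf. (5.79)
        w = w - np.linalg.solve(hess, grad)   # Newton update, cf. (5.77)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
w = fit_logistic_newton(np.hstack([X, np.ones((200, 1))]), y)
print(w)   # learned weights (last entry is the bias term)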
101
5.4
(Parametric Model)
(Non-parametric)
k (k-
Nearest Neighbour (kNN)
kNN
x x x
xq kNN
xq kNN
xq xq
k
kNN
(x, y)
y = h(x) kNN
hknn (xq ) = majority(h1 (xq ), h2 (xq ), . . . , hk (xq )) (5.80)
k xq
kNN (Lazy Learner)
kNN
( )
(Training Time)
xq kNN
(Eager Learner) kNN
xq
kNN
102
(Real Time)
(Global Approximation)
(Local Approximation) ( )
kNN
kNN kNN
kNN k k
(Cross Validation) k kNN
(Distance-weighted kNN) kNN
h_{knn}(x_q) = \frac{\sum_{n \in NN(x_q)} w_n h(x_n)}{\sum_{n \in NN(x_q)} w_n}   (5.81)

w_n = \frac{1}{d(x_q, x_n)}   (5.82)

where d(x_q, x_n) is the distance between x_q and x_n
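A minimal sketch (assumed, not from the original) of plain and distance-weighted kNN classification following eqs. (5.80)-(5.82); the toy points are made up.

import numpy as np

def knn_predict(X_train, y_train, x_q, k=3, weighted=False):
    d = np.linalg.norm(X_train - x_q, axis=1)          # distances to every training example
    nn = np.argsort(d)[:k]                             # indices of the k nearest neighbours
    if not weighted:
        vals, counts = np.unique(y_train[nn], return_counts=True)
        return vals[np.argmax(counts)]                 # majority vote, eq. (5.80)
    w = 1.0 / (d[nn] + 1e-12)                          # eq. (5.82), guarded against division by zero
    scores = {c: w[y_train[nn] == c].sum() for c in np.unique(y_train[nn])}
    return max(scores, key=scores.get)                 # weighted vote, eq. (5.81)

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([4.5, 5.0]), k=3, weighted=True))   # -> 1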
103
5.5
= (5.83)
(Training Error)
(Generalisation Error)
(
)
(Test Error)
1. (Training Dataset)
x y
2. (Testing Dataset)
x y
1.
104
2.
5.5.1
(Overtting)
(Undertting)
5.5.2
1
2
(Confusion Matrix)
Accuracy = \frac{a + d}{a + b + c + d} = 1 - error

                      Predicted Negative        Predicted Positive
Actual Negative       a (true negative)         b (false positive)
Actual Positive       c (false negative)        d (true positive)

5.2: The confusion matrix (Confusion Matrix)

True Positive Rate (Recall) = \frac{d}{c + d}

True Negative Rate (Specificity) = \frac{a}{a + b}

False Positive Rate (False Alarm) = \frac{b}{a + b}

False Negative Rate = \frac{c}{c + d}
(False Alarm) (Recall)
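A small sketch (mine) of the rates defined above, computed from the a, b, c, d cells of the confusion matrix in Table 5.2; the sample counts are made up.

def confusion_rates(a, b, c, d):
    """a = true negatives, b = false positives, c = false negatives, d = true positives."""
    return {
        'accuracy': (a + d) / (a + b + c + d),
        'recall (TPR)': d / (c + d),
        'specificity (TNR)': a / (a + b),
        'false alarm (FPR)': b / (a + b),
        'false negative rate': c / (c + d),
    }

print(confusion_rates(a=50, b=10, c=5, d=35))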
5.5.3
(Receiver Operation Characteristic Analysis)
2 x False Positive Rate y
True Positive Rate (ROC Graph)
5.1
106
5.1: The ROC space: False Positive Rate on the horizontal axis and True Positive Rate on the vertical axis
(0,1) False Positive Rate
0 True Positive Rate 1
(0.0) (1,1)
(1,0) ( )
107
1 p(y = 1|x) > 0.5
0 0.5 (Decision
Threshold) (Cost)
False Positive ( 1 (Type I Error))
False Negative ( 2 (Type II
Error))
0.5
(False Positive)
1
0.5
p(y = 1|x)
2 0.5
False
Positive Rate True Positive Rate
(ROC Curve)
(Classication Model)
5.2a
(Area Under Curve (AUC))
108
1 0 AUC
1 AUC
5.2 AUC
5.2: ROC curves and the corresponding Area Under Curve (AUC); panel (c) shows a classifier with AUC = 50%
109
[Fawcett, 2006]
5.5.4
(x, y)
1
(Hold-out Method)
k
k
5.3: 5-fold cross validation: the data is divided into five folds (#1 to #5), and each fold serves once as the test set
(
)
110
k k 1
1 k
k
(k-fold Cross Validation) 5 5.3
k = N
Leave-one-out Cross Validation
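Below is a sketch (not the book's code) of k-fold cross validation: split the data into k folds, train on k-1 of them and test on the remaining one, then average the k scores. The baseline "classifier" is only a placeholder to make the example runnable.

import numpy as np

def k_fold_cv(X, y, train_and_score, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[test], y[test]))
    return float(np.mean(scores))

def majority_baseline(X_tr, y_tr, X_te, y_te):
    vals, counts = np.unique(y_tr, return_counts=True)
    return float(np.mean(y_te == vals[np.argmax(counts)]))   # accuracy of predicting the majority class

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 12 + [1] * 8)
print(k_fold_cv(X, y, majority_baseline, k=5))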
111
1.
2.
3.
4.
5. k
6. k
7.
112
6
(Supervised Learning)
(Unlabelled Data)
(Class) (Cluster)
(Clustering)
(Partitioning Clustering)
(Hierachical
Clustering)
(Tree)
113
6.1
D = (x1 , . . . , xN ) xi = [x1i , . . . , xM i ]
M (Distance Function)
xi xj d(xi , xj )
d(x_i, x_j) = \sqrt{\sum_{m=1}^{M} (x_i^m - x_j^m)^2} = \|x_i - x_j\|_2
z
z
z = [z1 , . . . , zN ] {1, . . . , k}
F_{obj} = \frac{1}{2} \sum_{n=1}^{N} \|x_n - m_{z_n}\|^2   (6.1)
(k-means Algorithm)
6.1.1
(Mean)
114
Algorithm 3 k-means clustering
1: Randomly choose the means \mu_{1:k}
2: Initialise z
3: do
4:   z_n = \arg\min_{i \in \{1,\ldots,k\}} d(x_n, \mu_i) for every n
5:   \mu_k = \frac{1}{N_k} \sum_{n: z_n = k} x_n for every cluster k
6: while z does not change
7: return z
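The following is a compact NumPy sketch of Algorithm 3 (my own code); initialising the means by sampling k points from the data is a common choice, not one prescribed by the text. The synthetic data mirrors the 80-point, four-cluster example described below.

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]          # step 1: choose mu_1..mu_k
    z = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        z_new = d.argmin(axis=1)                               # step 4: assign to nearest mean
        if np.array_equal(z_new, z):                           # step 6: stop when z is unchanged
            break
        z = z_new
        for j in range(k):                                     # step 5: recompute the means
            if np.any(z == j):
                mu[j] = X[z == j].mean(axis=0)
    return z, mu

rng = np.random.default_rng(1)
centres = np.array([[4, 4], [-4, 4], [4, -4], [-4, -4]], dtype=float)
X = np.vstack([c + rng.normal(size=(20, 2)) for c in centres])   # 80 points, 4 clusters
z, mu = kmeans(X, k=4)
print(mu.round(1))   # recovered cluster means, close to the four centres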
Algorithm 3 k
z
z
z
6.1-6.3 6.1
80 4 20
(4,4), (-4,4), (4,-4), (-4,-4)
1
4
6.2
6.3
6.1: The 80 sample points drawn around the four centres
6.2: Cluster assignments produced by k-means after one iteration
6.3: Cluster assignments after k-means has converged
6.1
F(z_{1:N}, \mu_{1:k}) = \frac{1}{2} \sum_{n=1}^{N} \|x_n - \mu_{z_n}\|^2   (6.2)
1.
F z
2. z F
z F
(Local Minimum) (Coordinate
Descend Algorithm)
117
6.1.2
3
=1 =2 =3
2.2
(k-medoids)
Algorithm 4 k-medoids clustering
1: Randomly choose the medoids m_{1:k}
2: Initialise z
3: do
4:   z_n = \arg\min_{i \in \{1,\ldots,k\}} d(x_n, m_i) for every n
5:   m_k = median(x_n) where z_n = k
6: while z does not change
7: return z
6.1.3
k k
(Heuristic) k
k=1 (
6.1) k k k
( k) 1 9
6.4 k=1
800 k
k 3 k=4 690 300
118
k
4
6.4: Objective function value of k-means as a function of k; the curve flattens sharply after k = 4
k=4
6.2
(Hierachical Clustering)
k
119
(Agglomerative
Clustering) (Bottom-up)
6.2.1
(Dendrogram)
6.5
(Monotonic)
6.5: A dendrogram of five data points a, b, c, d, e
c d
2 e c d
120
a b
6.2.2
3
(Single Linkage)
G H
(Chaining) (
)
(Complete Linkage)
121
(Group Average)
d_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{j \in H} d(x_i, x_j)   (6.5)

where N_G and N_H are the numbers of points in clusters G and H
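A sketch (mine) comparing the three linkage criteria using SciPy's agglomerative clustering; method='average' corresponds to the group-average distance of eq. (6.5), and the five points are the ones from the exercise below.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[5, 10], [6, 19], [8, 15], [12, 5], [7, 13]], dtype=float)

for method in ('single', 'complete', 'average'):     # average = group average, eq. (6.5)
    Z = linkage(X, method=method)                    # the merge tree (dendrogram data)
    labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
    print(method, labels)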
122
1.
2.
3.
4.
5.
5 10
6 19
8 15
12 5
7 13
123
[Blei et al., 2003] Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent dirichlet allocation. Journal of
machine Learning research, 3(Jan):9931022.
[Boyd and Vandenberghe, 2004] Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cam-
bridge university press.
[Cleveland, 1993] Cleveland, W. S. (1993). Visualizing data. Hobart Press.
[Dempster et al., 1977] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the em algorithm. Journal of the royal statistical society. Series B (methodological),
pages 138.
[Eiben and Smith, 2003] Eiben, A. E. and Smith, J. E. (2003). Introduction to evolutionary computing,
volume 53. Springer.
[Fawcett, 2006] Fawcett, T. (2006). An introduction to roc analysis. Pattern recognition letters, 27(8):
861874.
[Fawcett and Provost, 1997] Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Data mining
and knowledge discovery, 1(3):291316.
[Fayyad et al., 2002] Fayyad, U. M., Wierse, A., and Grinstein, G. G. (2002). Information visualization
in data mining and knowledge discovery. Morgan Kaufmann.
[Friedman, 1997] Friedman, J. H. (1997). On bias, variance, 0/1-loss, and the curse-of-dimensionality.
Data mining and knowledge discovery, 1(1):5577.
124
[Guyon and Elisseeff, 2003] Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature
selection. Journal of machine learning research, 3(Mar):11571182.
[Han et al., 2011] Han, J., Pei, J., and Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
[Hoffman and Grinstein, 2002] Hoffman, P. E. and Grinstein, G. G. (2002). A survey of visualizations for
high-dimensional data mining. Information visualization in data mining and knowledge discovery, pages
4782.
[Indyk and Motwani, 1998] Indyk, P. and Motwani, R. (1998). Approximate nearest neighbors: towards
removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory
of computing, pages 604613. ACM.
[Jindal and Liu, 2007] Jindal, N. and Liu, B. (2007). Review spam detection. In Proceedings of the 16th
international conference on World Wide Web, pages 11891190. ACM.
[Johnson et al., 1986] Johnson, W. B., Lindenstrauss, J., and Schechtman, G. (1986). Extensions of
lipschitz maps into banach spaces. Israel Journal of Mathematics, 54(2):129138.
[Ke and Sukthankar, 2004] Ke, Y. and Sukthankar, R. (2004). Pca-sift: A more distinctive representation
for local image descriptors. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings
of the 2004 IEEE Computer Society Conference on, volume 2, pages II506. IEEE.
[Lillesand et al., 2014] Lillesand, T., Kiefer, R. W., and Chipman, J. (2014). Remote sensing and image
interpretation. John Wiley & Sons.
[MacKay, 2003] MacKay, D. J. (2003). Information theory, inference and learning algorithms. Cambridge
university press.
[Manjunath et al., 2001] Manjunath, B. S., Ohm, J.-R., Vasudevan, V. V., and Yamada, A. (2001). Color
and texture descriptors. IEEE Transactions on circuits and systems for video technology, 11(6):703715.
[Ng, 2004] Ng, A. Y. (2004). Feature selection, l 1 vs. l 2 regularization, and rotational invariance. In
Proceedings of the twenty-first international conference on Machine learning, page 78. ACM.
125
[Ng and Jordan, 2002] Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classiers:
A comparison of logistic regression and naive bayes. In Advances in Neural Information Processing
Systems, pages 841848.
[Ojala et al., 2002] Ojala, T., Pietikainen, M., and Maenpaa, T. (2002). Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns. IEEE Transactions on pattern analysis
and machine intelligence, 24(7):971987.
[Pang and Lee, 2008] Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations
and trends in information retrieval, 2(1-2):1135.
[Richards, 1999] Richards, J. A. (1999). Remote sensing digital image analysis, volume 3. Springer.
[Sonka et al., 2014] Sonka, M., Hlavac, V., and Boyle, R. (2014). Image processing, analysis, and ma-
chine vision. Cengage Learning.
[Vapnik and Vapnik, 1998] Vapnik, V. N. and Vapnik, V. (1998). Statistical learning theory, volume 1.
Wiley New York.
[Wasserman, 2013] Wasserman, L. (2013). All of statistics: a concise course in statistical inference.
Springer Science & Business Media.
126