
Speaking in Melbo’s mother tongue

[Figure: a model consuming features encoded as strings of numbers]

Features as vectors
A vector is an ordered list of numbers. The order matters:
(1.0, -1.5, 2.0, -2.5, 0.0) ≠ (2.0, 1.0, 0.0, -1.5, -2.5)
Since we have become such good friends, let me teach you a bit more about my mother tongue — a geometry crash course.
Can time series models predict how long we need to wait till the next video gets uploaded?? … just asking for a friend …
Vectors
[Figure: 2-D and 3-D coordinate plots of the vectors below]
It does not matter in which order we move along the coordinate axes.
𝐮 = (3.5, 4.0)   𝐯 = (-1.5, 2.5)   𝐰 = (3.5, 2.5, 4.0)
How to stretch, shrink and flip vectors
Scalar multiplication: multiply every coordinate by the scalar
𝐮 = (1.5, 2.0)
2 ⋅ 𝐮 = (3.0, 4.0)
0.5 ⋅ 𝐮 = (0.75, 1.0)
−1.5 ⋅ 𝐮 = (-2.25, -3.0)
−0.75 ⋅ 𝐮 = (-1.125, -1.5)
The sign of the scalar decides whether the vector gets flipped. A magnitude of less than one shrinks the vector and more than one stretches it.
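In code, scalar multiplication is just coordinate-wise multiplication. A minimal sketch in plain Python (the helper name `scale` is ours, not standard):

```python
def scale(c, u):
    """Multiply every coordinate of the vector u by the scalar c."""
    return [c * x for x in u]

u = [1.5, 2.0]
print(scale(2.0, u))    # stretch:          [3.0, 4.0]
print(scale(0.5, u))    # shrink:           [0.75, 1.0]
print(scale(-1.5, u))   # flip and stretch: [-2.25, -3.0]
```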
How to add vectors
Vector addition/subtraction: add/subtract coordinate-wise (“complete the parallelogram”)
𝐮 = (1.5, 2.0)
𝐯 = (2.0, -2.5)
𝐮 + 𝐯 = (3.5, -0.5)
𝐮 − 𝐯 = (-0.5, 4.5)
−0.5 ⋅ 𝐮 + 0.5 ⋅ 𝐯 = (0.25, -2.25)
The coordinate-wise rule remains the same even if adding/subtracting more than 2 vectors in more than 2 dimensions.
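The coordinate-wise rule translates directly to code, and the same functions work in any dimension. A sketch:

```python
def add(u, v):
    """Coordinate-wise sum of two vectors of equal dimension."""
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    """Coordinate-wise difference of two vectors of equal dimension."""
    return [a - b for a, b in zip(u, v)]

u, v = [1.5, 2.0], [2.0, -2.5]
print(add(u, v))  # [3.5, -0.5]
print(sub(u, v))  # [-0.5, 4.5]
```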
How to measure the length of a vector
Euclidean length: ‖𝐮‖₂ ≝ √(u₁² + u₂² + … + u_d²)
𝐮 = (-3.0, 4.0), so ‖𝐮‖₂ = √(9 + 16) = 5
𝐰 = (6.0, -2.0, -3.0), so ‖𝐰‖₂ = √(36 + 4 + 9) = 7
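The Euclidean length is a one-liner in code. A sketch:

```python
import math

def euclidean_length(u):
    """L2 norm: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in u))

print(euclidean_length([-3.0, 4.0]))        # 5.0
print(euclidean_length([6.0, -2.0, -3.0]))  # 7.0
```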
How to measure the length of a vector
Taxicab/Manhattan length: ‖𝐮‖₁ ≝ |u₁| + |u₂| + … + |u_d|
𝐮 = (-3.0, 4.0), so ‖𝐮‖₁ = 3 + 4 = 7
𝐰 = (6.0, -2.0, -3.0), so ‖𝐰‖₁ = 6 + 2 + 3 = 11
These notions of length are also called norms. There is an entire family of so-called Lₚ norms defined as ‖𝐮‖ₚ ≝ (Σᵢ |uᵢ|ᵖ)^(1/p). Notice that the Euclidean length is just the L₂ norm whereas the Manhattan length is the L₁ norm.
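The whole norm family fits in one line of code, with p = 1 giving the Manhattan length and p = 2 the Euclidean length. A sketch:

```python
def p_norm(u, p):
    """L_p norm: (sum of |u_i|^p) raised to the power 1/p."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)

u = [-3.0, 4.0]
print(p_norm(u, 1))  # Manhattan length: 7.0
print(p_norm(u, 2))  # Euclidean length: 5.0
```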
How to measure distances
Metrics satisfy three nice properties:
d(𝐮, 𝐯) = d(𝐯, 𝐮) (symmetry)
d(𝐮, 𝐯) = 0 if and only if 𝐮 = 𝐯 (identity)
d(𝐮, 𝐯) ≤ d(𝐮, 𝐰) + d(𝐰, 𝐯) for any vector 𝐰 (triangle inequality)
𝐮 = (1.5, 2.0)
𝐯 = (2.0, -2.5)
𝐮 − 𝐯 = (-0.5, 4.5)
Euclidean distance: ‖𝐮 − 𝐯‖₂ = √(0.25 + 20.25) ≈ 4.53
Manhattan distance: ‖𝐮 − 𝐯‖₁ = 0.5 + 4.5 = 5
These notions of distance are also called metrics. We can use any of the norms to define a metric as d(𝐮, 𝐯) ≝ ‖𝐮 − 𝐯‖.
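Any norm of the difference vector gives a metric. A sketch computing both distances for the vectors above:

```python
def euclidean_distance(u, v):
    """d(u, v) = L2 norm of u - v."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def manhattan_distance(u, v):
    """d(u, v) = L1 norm of u - v."""
    return sum(abs(a - b) for a, b in zip(u, v))

u, v = [1.5, 2.0], [2.0, -2.5]
print(manhattan_distance(u, v))            # 5.0
print(round(euclidean_distance(u, v), 2))  # 4.53
```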
How to measure angles
Two vectors are at an obtuse angle if their dot product is negative. Two vectors are at an acute angle if their dot product is positive.
Dot product: 𝐮ᵀ𝐯 ≝ Σᵢ uᵢvᵢ = ‖𝐮‖₂ ‖𝐯‖₂ cos θ
𝐮 = (1.0, 2.0)
𝐯 = (2.0, -2.0)
𝐮ᵀ𝐯 = 1.0 × 2.0 + 2.0 × (−2.0) = −2 < 0, so the angle θ between 𝐮 and 𝐯 is obtuse.
The definition of the dot product and this way of using it to calculate angles can be used in higher dimensions as well.
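The obtuse/acute test is just a sign check on the dot product. A sketch (the helper `dot` is ours):

```python
def dot(u, v):
    """Dot product: sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(u, v))

u, v = [1.0, 2.0], [2.0, -2.0]
d = dot(u, v)
print(d)                               # -2.0
print("obtuse" if d < 0 else "acute")  # obtuse
```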
How to measure angles
Two vectors are perpendicular if their dot product is zero.
𝐮 = (-2.0, -1.0)
𝐯 = (-2.0, 4.0)
𝐮ᵀ𝐯 = (−2.0) × (−2.0) + (−1.0) × 4.0 = 0, so θ = π/2.
Let us give you a simple proof of why dot products can be used to calculate angles.
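Combining the dot product with the two lengths recovers the angle itself, and a perpendicular pair gives exactly π/2. A sketch:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    """Angle between u and v, from cos(theta) = u.v / (|u| |v|)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.acos(cos_theta)

u, v = [-2.0, -1.0], [-2.0, 4.0]
print(dot(u, v))    # 0.0: the vectors are perpendicular
print(angle(u, v))  # 1.5707963267948966, i.e., pi / 2
```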
Dot product helps us measure angles
Let 𝐮 = (a, b) and 𝐯 = (p, q), so that ‖𝐯‖₂ = √(p² + q²).
Let’s rotate both vectors so that 𝐮 lies along the x-axis. This doesn’t change the angle θ between them or the dot product (proof of the latter later).
Claim: 𝐮ᵀ𝐯 = ‖𝐮‖₂ ‖𝐯‖₂ cos θ
Proof: after the rotation, 𝐮 = (a, 0) with a = ‖𝐮‖₂, so 𝐮ᵀ𝐯 = a ⋅ p + 0 ⋅ q = ‖𝐮‖₂ ⋅ p.
Also, cos θ = p/‖𝐯‖₂ (cos = base/hyp), i.e., p = ‖𝐯‖₂ cos θ.
Clearly, we do have 𝐮ᵀ𝐯 = ‖𝐮‖₂ ‖𝐯‖₂ cos θ.
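The proof leaned on rotations changing neither lengths nor dot products. A quick numeric sanity check of that fact in 2-D (any angle works; 0.7 radians is an arbitrary choice of ours):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rotate(u, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * u[0] - s * u[1], s * u[0] + c * u[1]]

u, v = [1.0, 2.0], [2.0, -2.0]
ur, vr = rotate(u, 0.7), rotate(v, 0.7)
print(abs(dot(u, v) - dot(ur, vr)) < 1e-9)  # True: dot product preserved
print(abs(dot(u, u) - dot(ur, ur)) < 1e-9)  # True: squared length preserved
```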
An application of norms in ML
ℬ₂(𝐜, r) ≝ {𝐮 : ‖𝐮 − 𝐜‖₂ ≤ r}
ℬ₁(𝐝, s) ≝ {𝐯 : ‖𝐯 − 𝐝‖₁ ≤ s}
Anomaly or attack detection: enclose the normal data points in a ball; points that fall outside the ball get flagged as anomalous. There are algorithms to find the smallest enclosing ball for a given set of data points.
Aoudi et al. Truth will out: Departure-based process-level detection of stealthy attacks on control systems, CCS 2018.
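A toy sketch of the idea. Real smallest-enclosing-ball algorithms are cleverer; here the center is simply the mean of the normal points and the radius their largest distance from it — simplifications of ours, not the cited method:

```python
def euclidean_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def fit_ball(points):
    """Toy enclosing ball: center = coordinate-wise mean, radius = farthest point."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(euclidean_distance(p, center) for p in points)
    return center, radius

def is_anomaly(x, center, radius):
    """Flag points that fall outside the ball B2(center, radius)."""
    return euclidean_distance(x, center) > radius

normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c, r = fit_ball(normal)
print(is_anomaly([0.5, 0.5], c, r))    # False: inside the ball
print(is_anomaly([10.0, 10.0], c, r))  # True: flagged as an attack/anomaly
```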
An application of dot products in ML
Binary classification is often solved using a linear model (in 2D, just a line): w₁x + w₂y + b = 0.
Changing 𝐰 changes the slope of the line but keeps the intercept unchanged.
Changing b does not change the slope of the line. It just changes the intercept.
Can rewrite this as 𝐰ᵀ𝐱 + b = 0 where 𝐰 = (w₁, w₂) and 𝐱 = (x, y).
𝐰ᵀ𝐱 + b > 0 ⇒ predict one class; 𝐰ᵀ𝐱 + b < 0 ⇒ predict the other.
The set {𝐱 : 𝐰ᵀ𝐱 + b > 0} is called a halfspace. The other set {𝐱 : 𝐰ᵀ𝐱 + b < 0} is also a halfspace. Linear models solve binary classification using a model that divides the entire space into two halfspaces, one for each of the two classes. A line or hyperplane separates the two halfspaces.
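A minimal sketch of the decision rule. The weights, bias, and class names below are made up for illustration, not learned:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(w, b, x):
    """Which halfspace does x lie in?"""
    return "green" if dot(w, x) + b > 0 else "red"

w, b = [1.0, -1.0], 0.5  # hypothetical model
print(predict(w, b, [2.0, 1.0]))  # green: 1*2 - 1*1 + 0.5 = 1.5 > 0
print(predict(w, b, [0.0, 3.0]))  # red:   1*0 - 1*3 + 0.5 = -2.5 < 0
```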
Linear models in higher dimensions
The same trick works in higher dimensions by learning a hyperplane classifier 𝐰ᵀ𝐱 + b = 0 where 𝐰, 𝐱 ∈ ℝᵈ.
The hyperplane itself is often called the “decision boundary”.
The vector 𝐰 is the normal or perpendicular vector of the hyperplane.
Consider any two vectors 𝐱¹, 𝐱² on the hyperplane, i.e., 𝐰ᵀ𝐱¹ + b = 0 = 𝐰ᵀ𝐱² + b.
Note that this means 𝐰ᵀ(𝐱¹ − 𝐱²) = 0.
The vector 𝐱¹ − 𝐱² is parallel to the hyperplane, and 𝐰 is perpendicular to all such vectors.
Changing 𝐰 rotates the hyperplane. Changing b shifts the hyperplane. Note that if we decrease b but keep 𝐰 the same, then fewer points may satisfy 𝐰ᵀ𝐱 + b > 0, i.e., decreasing the bias makes the model more picky about classifying points as green!
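The perpendicularity of 𝐰 can be checked numerically: take two points that satisfy the hyperplane equation and dot 𝐰 with their difference. The hyperplane below is a made-up example:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, b = [1.0, 2.0], -4.0  # hypothetical hyperplane w.x + b = 0

x1 = [4.0, 0.0]  # on the hyperplane: 1*4 + 2*0 - 4 = 0
x2 = [0.0, 2.0]  # on the hyperplane: 1*0 + 2*2 - 4 = 0

diff = [p - q for p, q in zip(x1, x2)]  # a direction lying within the hyperplane
print(dot(w, diff))  # 0.0: w is perpendicular to it
```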
To 𝑏 or not to 𝑏 – that is the question
Sometimes, ML algos are simpler if we do not have a bias term
However, having a bias term is often critical, so we cheat a bit and hide it
Create another dimension in the feature vector and fill it with 1, i.e., 𝐱̃ = (𝐱, 1)
Note that features are now (d+1)-dimensional, so must be the model
Learn a (d+1)-dimensional linear model 𝐰̃ but without a bias term
If we denote 𝐰̃ = (𝐰, c), then 𝐰̃ᵀ𝐱̃ = 𝐰ᵀ𝐱 + c
Thus, c effectively acts as a bias term for us
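The hiding trick in code: append a constant 1 to every feature vector and absorb the bias into the weights. A sketch with made-up numbers:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, b = [1.0, -1.0], 0.5   # model with an explicit bias term
w_tilde = w + [b]         # bias hidden as one extra weight

x = [2.0, 1.0]
x_tilde = x + [1.0]       # feature vector with the extra constant-1 dimension

print(dot(w, x) + b)          # 1.5
print(dot(w_tilde, x_tilde))  # 1.5: same score, no explicit bias needed
```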
Exercise
 Given a hyperplane 𝐰ᵀ𝐱 + b = 0 where 𝐰 ∈ ℝᵈ, show that the distance of the origin, i.e., the vector 𝟎, from the hyperplane is |b|/‖𝐰‖₂.
 The distance of a point 𝐱 from a hyperplane is defined as the shortest distance of that point from any point on the hyperplane, i.e., dist(𝐱) ≝ min { ‖𝐱 − 𝐲‖₂ : 𝐰ᵀ𝐲 + b = 0 }
 Prove the famous Cauchy-Schwarz inequality that states that for any two vectors 𝐮, 𝐯, we have |𝐮ᵀ𝐯| ≤ ‖𝐮‖₂ ‖𝐯‖₂
 Hint: try using the claim we just proved
Convex Sets
A set 𝒞 ⊆ ℝᵈ is convex if ∀ 𝐱, 𝐲 ∈ 𝒞 and ∀ λ ∈ [0, 1], the point 𝐳 = λ ⋅ 𝐱 + (1 − λ) ⋅ 𝐲 ∈ 𝒞.
[Figure: a convex set contains the line segment between any two of its points; a non-convex set does not]
Think about which common shapes and objects are convex and which are not – balls, cuboids, stars, rectangles?
Convex Functions
If you “fill up” the space above the function curve and that set looks convex, then the function is convex too.
A function f : ℝᵈ → ℝ is convex if ∀ 𝐱, 𝐲 and ∀ λ ∈ [0, 1], with 𝐳 = λ ⋅ 𝐱 + (1 − λ) ⋅ 𝐲, we have f(𝐳) ≤ λ ⋅ f(𝐱) + (1 − λ) ⋅ f(𝐲).
A convex function must lie below all its chords.
[Figure: a convex function lies below its chords; a non-convex function need not]
Think of common functions that are convex. 1-D example: f(x) = x². High-D example: f(𝐱) = ‖𝐱‖₂².
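The chord condition can be checked numerically for a candidate function, here f(x) = x². This is a check at sampled points, not a proof:

```python
def f(x):
    return x * x  # a standard convex function

x, y = -1.0, 3.0
below_chord = True
for i in range(11):
    lam = i / 10.0
    z = lam * x + (1.0 - lam) * y
    # convexity demands f(z) <= lam * f(x) + (1 - lam) * f(y)
    below_chord = below_chord and f(z) <= lam * f(x) + (1.0 - lam) * f(y)
print(below_chord)  # True: f stays below the chord at every sampled lambda
```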
Some handy tips for checking convexity
All constant functions are convex
All linear functions are convex
Sums of convex functions are convex
Positive multiples of convex functions are convex
If g is convex and h is convex and non-decreasing, i.e., x ≥ y ⇒ h(x) ≥ h(y), then the function defined as f(𝐱) ≝ h(g(𝐱)) is convex
If f(𝐱) is convex, then g(𝐱) ≝ f(a ⋅ 𝐱 + 𝐛) is also convex for any scalar a and vector 𝐛
The norm function, i.e., f(𝐱) ≝ ‖𝐱‖ₚ, is convex
Exercise
 Show that the intersection of two convex sets is always convex
 Find an example of two convex sets whose union is not convex
 Show that every halfspace is a convex set
 Show that the ball ℬ₂(𝐜, r) is a convex set
 Show that the sum of two convex functions is always convex
 Find an example of two convex functions whose difference is not a convex function
Summary
Vectors offer an expressive language to describe features and outputs
Can be seen as an arrow with a length and a direction
Can also be seen as an ordered list of numbers
Can be scaled and added/subtracted together
Norms (Euclidean, Manhattan, etc.) allow us to calculate lengths of vectors
Metrics allow us to calculate distances between two points in space
The dot product allows us to calculate the angle between two vectors
Norms and dot products are used in ML to create classifier models
The linear model does binary classification by dividing the space into two halfspaces
The bias term in a linear model is often hidden inside the normal vector itself
Convex sets contain their chords; convex functions lie below their chords
Stay Marvelous!
Chat with you next time
