
MATHS ML CHEAT-SHEET

01. Straight Line

i. Slope-intercept form: y = mx + c
   m = slope, c = intercept

ii. Point-Slope form: y - y1 = m(x - x1)
   m = slope, (x1, y1) a point on the line

iii. Two-Point Form: y - y1 = ((y2 - y1) / (x2 - x1)) (x - x1)
   where (x1, y1) and (x2, y2) are two points on the line

iv. Intercept Form: x/a + y/b = 1
   a, b are the intercepts on the x and y-axis respectively

v. General form: ax + by + c = 0
   a, b, c real numbers
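A minimal Python sketch tying these forms together (the helper name line_through is illustrative, not from the sheet): it derives the slope-intercept and general forms of the line through two given points.

```python
def line_through(p1, p2):
    """Line through two points (assumes the line is not vertical).

    Returns (m, c) for y = mx + c and (A, B, C) for Ax + By + C = 0."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)      # slope from the two-point form
    c = y1 - m * x1                # point-slope rearranged gives the intercept
    return (m, c), (m, -1.0, c)    # general form: m*x - y + c = 0

(slope, intercept), general = line_through((1, 2), (3, 6))
print(slope, intercept, general)   # 2.0 0.0 (2.0, -1.0, 0.0)
```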
02. Hyperplane

i. Vector Form: w^T x + b = 0, where w is the vector normal to the hyperplane and b is the offset term.

ii. Half Spaces: either of the two parts into which a plane divides the 3-D Euclidean space (more generally, into which a hyperplane divides the space). Points with w^T x + b > 0 lie in the +ve half space; points with w^T x + b < 0 lie in the -ve half space.

iii. Distance from the origin: |b| / ||w||

iv. Distance from a Point x0: |w^T x0 + b| / ||w||
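A small NumPy sketch of the two distance formulas, reusing the w, b notation above (the example values are assumptions):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane w.x + b = 0
b = -5.0                   # offset term

def distance_from_hyperplane(x0):
    """Perpendicular distance of point x0 from the hyperplane."""
    return abs(w @ x0 + b) / np.linalg.norm(w)

print(distance_from_hyperplane(np.array([0.0, 0.0])))  # distance from the origin = |b|/||w|| = 1.0
print(distance_from_hyperplane(np.array([1.0, 2.0])))  # |3 + 8 - 5| / 5 = 1.2
```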
03. Vectors

i. Dot Product: a . b = sum(a_i * b_i) = ||a|| ||b|| cos θ

ii. Unit Vector: a_hat = a / ||a||, a vector of length 1 in the direction of a.

iii. Distance between two points: ||a - b|| = sqrt(sum((a_i - b_i)^2))

iv. Norm (Magnitude): ||a|| = sqrt(sum(a_i^2)); it represents the length of a vector.

v. Angle between two vectors: cos θ = (a . b) / (||a|| ||b||)

vi. Lines:
   Parallel: a . b = ||a|| ||b|| (θ = 0)
   Perpendicular: a . b = 0 (θ = 90°)

vii. Projection: the projection of a on b has length (a . b) / ||b||, along the unit vector b / ||b||.
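These operations map directly onto NumPy; a short sketch with assumed example vectors:

```python
import numpy as np

a, b = np.array([1.0, 2.0, 2.0]), np.array([2.0, 0.0, 1.0])

dot = a @ b                                      # i. dot product = 4.0
norm_a = np.linalg.norm(a)                       # iv. norm (magnitude) = 3.0
unit_a = a / norm_a                              # ii. unit vector
dist = np.linalg.norm(a - b)                     # iii. distance between the two points
cos_theta = dot / (norm_a * np.linalg.norm(b))   # v. angle between the two vectors
proj_len = dot / np.linalg.norm(b)               # vii. length of the projection of a on b

print(dot, norm_a, round(dist, 3), round(cos_theta, 3), round(proj_len, 3))
```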

04. Limit

A limit is a value toward which an expression converges as one or more variables approach certain values.

f(x) is continuous at a point x = a if the limit exists there and equals the function value:
lim(x -> a-) f(x) = lim(x -> a+) f(x) = f(a)

05. Equation of a Circle

(x - a)^2 + (y - b)^2 = r^2, where (a, b) are the centre coordinates and r is the radius.

Points inside the circle give -ve values when substituted in the circle equation (x - a)^2 + (y - b)^2 - r^2, and points outside the circle give +ve values.
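A tiny sketch of the sign test implied by the circle equation (the function name is illustrative):

```python
def circle_value(x, y, a, b, r):
    """Negative inside the circle, zero on the circle, positive outside."""
    return (x - a) ** 2 + (y - b) ** 2 - r ** 2

print(circle_value(0.5, 0.0, 0.0, 0.0, 1.0))  # -0.75 -> inside
print(circle_value(2.0, 0.0, 0.0, 0.0, 1.0))  #  3.0  -> outside
```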
06. Rotation of Coordinate system

Let's say we have a coordinate system x-y initially and a point P with coordinates (x, y). If the coordinate system is rotated by an angle θ in the anticlockwise direction, P in the new system (x'-y') would be:

x' = x cos θ + y sin θ
y' = -x sin θ + y cos θ
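A direct NumPy translation of the rotation formulas above:

```python
import numpy as np

def rotate_axes(x, y, theta):
    """Coordinates of the same point after rotating the axes anticlockwise by theta (radians)."""
    x_new = x * np.cos(theta) + y * np.sin(theta)
    y_new = -x * np.sin(theta) + y * np.cos(theta)
    return x_new, y_new

print(rotate_axes(1.0, 0.0, np.pi / 2))  # approx (0.0, -1.0)
```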
07. Differentiation

Using first principles: f'(x) = lim(h -> 0) [f(x + h) - f(x)] / h

Rules of differentiation:
   Sum/Difference rule: (f ± g)' = f' ± g'
   Constant multiple rule: (c f)' = c f'
   Product rule: (f g)' = f' g + f g'
   Quotient rule: (f / g)' = (f' g - f g') / g^2
   Chain rule: (f(g(x)))' = f'(g(x)) g'(x)

Partial derivative: derivative with respect to one variable, with the others held constant.

Common Derivatives: d/dx(x^n) = n x^(n-1), d/dx(e^x) = e^x, d/dx(ln x) = 1/x, d/dx(sin x) = cos x, d/dx(cos x) = -sin x

1. The first derivative gives the slope of the tangent line to the function at a point.
   +ve first derivative ⇒ the function is increasing; -ve first derivative ⇒ the function is decreasing.

2. The second derivative of a function represents its concavity.
   If the second derivative is +ve ⇒ concave upwards; if the second derivative is -ve ⇒ concave downwards.

Steps to find the optima (stationary points can be local or absolute maxima/minima):
   Given a function f(x), calculate its derivative f'(x).
   Put f'(x) = 0 to obtain the stationary points x = c.
   Calculate f''(x) at each stationary point x = c (i.e. f''(c)) and choose one of the following:
   i. If f''(c) > 0, then f(x) has a minimum value at x = c.
   ii. If f''(c) < 0, then f(x) has a maximum value at x = c.
   iii. If f''(c) = 0, then f(x) may or may not have a maxima or minima at x = c.
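A SymPy sketch of these optima-finding steps on an assumed example function f(x) = x^3 - 3x:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x                              # example function (an assumption)

f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)    # first and second derivatives
stationary = sp.solve(sp.Eq(f1, 0), x)      # f'(x) = 0  ->  x = -1, 1

for c in stationary:
    curvature = f2.subs(x, c)
    kind = "minimum" if curvature > 0 else "maximum" if curvature < 0 else "undetermined"
    print(c, kind)                          # -1 maximum, 1 minimum
```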
08. Gradient descent

Iterative algorithm to reach the optima of a function.

[Figure: cost curve with gradient steps from an initial weight down to the global cost minimum.]

GD algorithm to optimize a cost function f(θ). Steps to find the optima:

Step 1: Initially, pick the parameter values (e.g. the weights) randomly.

Step 2: Compute the gradient of the cost with respect to each parameter at the current values.

Step 3: The new values of the parameters, which are closer to the optima, are given as:
        θ_new = θ_old - η ∇f(θ_old)
        Here, η ⇒ learning rate. If the η value is very small, then the updates will happen very slowly. If it is a large value, it may overshoot the minima.

Step 4: Repeat step 3 until convergence (the updates become negligibly small).

Variants of Gradient descent:

Batch GD: calculates the partial derivatives using the full training set at each step.

Mini-Batch GD: calculates the partial derivatives using only a few data points (random) from the data set.

Stochastic Gradient descent: updates the parameters for each training example one by one, where the example index k is a random number from 1 to n.
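A minimal plain-Python sketch of the Step 3 update rule, minimising an assumed quadratic cost f(θ) = (θ - 3)^2:

```python
def grad(theta):
    """Gradient of the example cost f(theta) = (theta - 3)**2."""
    return 2 * (theta - 3)

theta, eta = 10.0, 0.1          # Step 1: an arbitrary initial value and learning rate
for _ in range(1000):           # Steps 2-4: repeat the update until convergence
    step = eta * grad(theta)    # eta too small -> very slow; too large -> overshoots the minimum
    theta -= step
    if abs(step) < 1e-8:
        break

print(round(theta, 4))          # approx 3.0, the global cost minimum
```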

09. Method of Lagrange multipliers

A method of finding the local minima or local maxima of a function f subject to constraints g = 0.

Optima for f subject to the constraints can be found by putting ∇f - λ∇g equal to a null matrix of the same dimensions as ∇f, where λ is the Lagrange multiplier.

The problem can be rewritten in terms of the Lagrangian function L(x, λ) = f(x) - λ g(x); setting ∇L = 0 gives the candidate optima.

(∇ is the del operator: the vector of partial derivatives ∂/∂x1, ..., ∂/∂xn.)
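A SymPy sketch of the method on an assumed example: optimise f(x, y) = x y subject to g(x, y) = x + y - 10 = 0:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x * y                      # objective function (example assumption)
g = x + y - 10                 # constraint g = 0 (example assumption)

L = f - lam * g                # Lagrangian function
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)               # [{x: 5, y: 5, lam: 5}] -> stationary point at x = y = 5
```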


10. Eigenvector and Eigenvalue

For a square matrix A, there can exist a vector v such that when this vector is multiplied with the matrix, we get a new vector in the same direction having a different magnitude:

A v = λ v, where v is an eigenvector of matrix A and λ is the corresponding eigenvalue of matrix A.

There can be multiple eigenvectors; for a symmetric matrix (such as a covariance matrix) they are always orthogonal to each other. The eigenvector associated with the largest eigenvalue of the data's covariance matrix indicates the direction in which the data has the most variance.
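A NumPy check of A v = λ v on an assumed symmetric 2x2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                      # symmetric example matrix

eigvals, eigvecs = np.linalg.eigh(A)            # eigh handles symmetric matrices
v = eigvecs[:, np.argmax(eigvals)]              # eigenvector with the largest eigenvalue

print(eigvals)                                          # [1. 3.]
print(np.allclose(A @ v, eigvals.max() * v))            # True: A v = lambda v
print(np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0))   # True: eigenvectors are orthogonal
```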
11. Principal Component Analysis (PCA)

The act of finding a new axis to represent the data so that a few principal components may contain most of the information or variance.

Consider a vector x in the space representing one of the points in our data and a unit vector u representing the direction of the new axis. The best u will be where the summation of the lengths of the projections of all such points (x_j) on the vector u is maximum (equivalently, where the variance of those projections is maximum).

Explained variance: the amount of variability in a data set that can be attributed to each individual principal component.
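A compact NumPy sketch of PCA via the covariance eigen-decomposition from section 10, on assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])   # assumed toy data set
Xc = X - X.mean(axis=0)                    # centre the data

cov = np.cov(Xc, rowvar=False)             # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]             # principal axes (the directions u)
explained_variance_ratio = eigvals[order] / eigvals.sum()

projection = Xc @ components[:, 0]         # coordinates along the first principal component
print(explained_variance_ratio)            # share of variance explained by each component
```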
