
MATHS ML CHEAT-SHEET

01. Straight Line

i. Slope-intercept form: y = mx + c
   m = slope, c = intercept

ii. Point-Slope form: y - y1 = m(x - x1)
   m = slope, (x1, y1) a point on the line

iii. Two-Point Form: y - y1 = ((y2 - y1) / (x2 - x1)) (x - x1)
   where (x1, y1) and (x2, y2) are two points on the line

iv. Intercept Form: x/a + y/b = 1
   a, b are the intercepts on the x and y-axis respectively

v. General form: ax + by + c = 0
   a, b, c real numbers
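A minimal Python sketch tying these forms together (the helper name line_through is illustrative, not from the sheet): it derives the slope-intercept and general forms of the line through two given points.

```python
def line_through(p1, p2):
    """Line through two points (assumes the line is not vertical).

    Returns (m, c) for y = mx + c and (A, B, C) for Ax + By + C = 0."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)      # slope from the two-point form
    c = y1 - m * x1                # point-slope rearranged gives the intercept
    return (m, c), (m, -1.0, c)    # general form: m*x - y + c = 0

(slope, intercept), general = line_through((1, 2), (3, 6))
print(slope, intercept, general)   # 2.0 0.0 (2.0, -1.0, 0.0)
```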
02. Hyperplane

i. Vector Form: w^T x + b = 0, where w is the vector normal to the hyperplane and b is the offset term.

ii. Half Spaces: either of the two parts into which a plane divides the 3-D Euclidean space (more generally, into which a hyperplane divides the space). Points with w^T x + b > 0 lie in the +ve half space; points with w^T x + b < 0 lie in the -ve half space.

iii. Distance from the origin: |b| / ||w||

iv. Distance from a Point x0: |w^T x0 + b| / ||w||
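A small NumPy sketch of the two distance formulas, reusing the w, b notation above (the example values are assumptions):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane w.x + b = 0
b = -5.0                   # offset term

def distance_from_hyperplane(x0):
    """Perpendicular distance of point x0 from the hyperplane."""
    return abs(w @ x0 + b) / np.linalg.norm(w)

print(distance_from_hyperplane(np.array([0.0, 0.0])))  # distance from the origin = |b|/||w|| = 1.0
print(distance_from_hyperplane(np.array([1.0, 2.0])))  # |3 + 8 - 5| / 5 = 1.2
```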
03. Vectors

i. Dot Product: a . b = sum(a_i * b_i) = ||a|| ||b|| cos θ

ii. Unit Vector: a_hat = a / ||a||, a vector of length 1 in the direction of a.

iii. Distance between two points: ||a - b|| = sqrt(sum((a_i - b_i)^2))

iv. Norm (Magnitude): ||a|| = sqrt(sum(a_i^2)); it represents the length of a vector.

v. Angle between two vectors: cos θ = (a . b) / (||a|| ||b||)

vi. Lines:
   Parallel: a . b = ||a|| ||b|| (θ = 0)
   Perpendicular: a . b = 0 (θ = 90°)

vii. Projection: the projection of a on b has length (a . b) / ||b||, along the unit vector b / ||b||.
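These operations map directly onto NumPy; a short sketch with assumed example vectors:

```python
import numpy as np

a, b = np.array([1.0, 2.0, 2.0]), np.array([2.0, 0.0, 1.0])

dot = a @ b                                      # i. dot product = 4.0
norm_a = np.linalg.norm(a)                       # iv. norm (magnitude) = 3.0
unit_a = a / norm_a                              # ii. unit vector
dist = np.linalg.norm(a - b)                     # iii. distance between the two points
cos_theta = dot / (norm_a * np.linalg.norm(b))   # v. angle between the two vectors
proj_len = dot / np.linalg.norm(b)               # vii. length of the projection of a on b

print(dot, norm_a, round(dist, 3), round(cos_theta, 3), round(proj_len, 3))
```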

04. Limit

A limit is a value toward which an expression converges as one or more variables approach certain values.

f(x) is continuous at a point x = a if the limit exists there and equals the function value:
lim(x -> a-) f(x) = lim(x -> a+) f(x) = f(a)

05. Equation of a Circle

(x - a)^2 + (y - b)^2 = r^2, where (a, b) are the centre coordinates and r is the radius.

Points inside the circle give -ve values when substituted in the circle equation (x - a)^2 + (y - b)^2 - r^2, and points outside the circle give +ve values.
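A tiny sketch of the sign test implied by the circle equation (the function name is illustrative):

```python
def circle_value(x, y, a, b, r):
    """Negative inside the circle, zero on the circle, positive outside."""
    return (x - a) ** 2 + (y - b) ** 2 - r ** 2

print(circle_value(0.5, 0.0, 0.0, 0.0, 1.0))  # -0.75 -> inside
print(circle_value(2.0, 0.0, 0.0, 0.0, 1.0))  #  3.0  -> outside
```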
06. Rotation of Coordinate system

Let's say we have a coordinate system x-y initially and a point P with coordinates (x, y). If the coordinate system is rotated by an angle θ in the anticlockwise direction, P in the new system (x'-y') would be:

x' = x cos θ + y sin θ
y' = -x sin θ + y cos θ
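A direct NumPy translation of the rotation formulas above:

```python
import numpy as np

def rotate_axes(x, y, theta):
    """Coordinates of the same point after rotating the axes anticlockwise by theta (radians)."""
    x_new = x * np.cos(theta) + y * np.sin(theta)
    y_new = -x * np.sin(theta) + y * np.cos(theta)
    return x_new, y_new

print(rotate_axes(1.0, 0.0, np.pi / 2))  # approx (0.0, -1.0)
```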
07. Differentiation

Using first principles: f'(x) = lim(h -> 0) [f(x + h) - f(x)] / h

Rules of differentiation:
   Sum/Difference rule: (f ± g)' = f' ± g'
   Constant multiple rule: (c f)' = c f'
   Product rule: (f g)' = f' g + f g'
   Quotient rule: (f / g)' = (f' g - f g') / g^2
   Chain rule: (f(g(x)))' = f'(g(x)) g'(x)

Partial derivative: derivative with respect to one variable, with the others held constant.

Common Derivatives: d/dx(x^n) = n x^(n-1), d/dx(e^x) = e^x, d/dx(ln x) = 1/x, d/dx(sin x) = cos x, d/dx(cos x) = -sin x

1. The first derivative gives the slope of the tangent line to the function at a point.
   +ve first derivative ⇒ the function is increasing; -ve first derivative ⇒ the function is decreasing.

2. The second derivative of a function represents its concavity.
   If the second derivative is +ve ⇒ concave upwards; if the second derivative is -ve ⇒ concave downwards.

Steps to find the optima (stationary points can be local or absolute maxima/minima):
   Given a function f(x), calculate its derivative f'(x).
   Put f'(x) = 0 to obtain the stationary points x = c.
   Calculate f''(x) at each stationary point x = c (i.e. f''(c)) and choose one of the following:
   i. If f''(c) > 0, then f(x) has a minimum value at x = c.
   ii. If f''(c) < 0, then f(x) has a maximum value at x = c.
   iii. If f''(c) = 0, then f(x) may or may not have a maxima or minima at x = c.
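A SymPy sketch of these optima-finding steps on an assumed example function f(x) = x^3 - 3x:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x                              # example function (an assumption)

f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)    # first and second derivatives
stationary = sp.solve(sp.Eq(f1, 0), x)      # f'(x) = 0  ->  x = -1, 1

for c in stationary:
    curvature = f2.subs(x, c)
    kind = "minimum" if curvature > 0 else "maximum" if curvature < 0 else "undetermined"
    print(c, kind)                          # -1 maximum, 1 minimum
```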
08. Gradient descent

Iterative algorithm to reach the optima of a function.

[Figure: cost curve with gradient steps from an initial weight down to the global cost minimum.]

GD algorithm to optimize a cost function f(θ). Steps to find the optima:

Step 1: Initially, pick the parameter values (e.g. the weights) randomly.

Step 2: Compute the gradient of the cost with respect to each parameter at the current values.

Step 3: The new values of the parameters, which are closer to the optima, are given as:
        θ_new = θ_old - η ∇f(θ_old)
        Here, η ⇒ learning rate. If the η value is very small, then the updates will happen very slowly. If it is a large value, it may overshoot the minima.

Step 4: Repeat step 3 until convergence (the updates become negligibly small).

Variants of Gradient descent:

Batch GD: calculates the partial derivatives using the full training set at each step.

Mini-Batch GD: calculates the partial derivatives using only a few data points (random) from the data set.

Stochastic Gradient descent: updates the parameters for each training example one by one, where the example index k is a random number from 1 to n.
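A minimal plain-Python sketch of the Step 3 update rule, minimising an assumed quadratic cost f(θ) = (θ - 3)^2:

```python
def grad(theta):
    """Gradient of the example cost f(theta) = (theta - 3)**2."""
    return 2 * (theta - 3)

theta, eta = 10.0, 0.1          # Step 1: an arbitrary initial value and learning rate
for _ in range(1000):           # Steps 2-4: repeat the update until convergence
    step = eta * grad(theta)    # eta too small -> very slow; too large -> overshoots the minimum
    theta -= step
    if abs(step) < 1e-8:
        break

print(round(theta, 4))          # approx 3.0, the global cost minimum
```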

09. Method of Lagrange multipliers

A method of finding the local minima or local maxima of a function f subject to constraints g = 0.

Optima for f subject to the constraints can be found by putting ∇f - λ∇g equal to a null matrix of the same dimensions as ∇f, where λ is the Lagrange multiplier.

The problem can be rewritten in terms of the Lagrangian function L(x, λ) = f(x) - λ g(x); setting ∇L = 0 gives the candidate optima.

(∇ is the del operator: the vector of partial derivatives ∂/∂x1, ..., ∂/∂xn.)
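A SymPy sketch of the method on an assumed example: optimise f(x, y) = x y subject to g(x, y) = x + y - 10 = 0:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x * y                      # objective function (example assumption)
g = x + y - 10                 # constraint g = 0 (example assumption)

L = f - lam * g                # Lagrangian function
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)               # [{x: 5, y: 5, lam: 5}] -> stationary point at x = y = 5
```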


10. Eigenvector and Eigenvalue

For a square matrix A, there can exist a vector v such that when this vector is multiplied with the matrix, we get a new vector in the same direction having a different magnitude:

A v = λ v, where v is an eigenvector of matrix A and λ is the corresponding eigenvalue of matrix A.

There can be multiple eigenvectors; for a symmetric matrix (such as a covariance matrix) they are always orthogonal to each other. The eigenvector associated with the largest eigenvalue of the data's covariance matrix indicates the direction in which the data has the most variance.
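A NumPy check of A v = λ v on an assumed symmetric 2x2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                      # symmetric example matrix

eigvals, eigvecs = np.linalg.eigh(A)            # eigh handles symmetric matrices
v = eigvecs[:, np.argmax(eigvals)]              # eigenvector with the largest eigenvalue

print(eigvals)                                          # [1. 3.]
print(np.allclose(A @ v, eigvals.max() * v))            # True: A v = lambda v
print(np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0))   # True: eigenvectors are orthogonal
```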
11. Principal Component Analysis (PCA)

The act of finding a new axis to represent the data so that a few principal components may contain most of the information or variance.

Consider a vector x in the space representing one of the points in our data and a unit vector u representing the direction of the new axis. The best u will be where the summation of the lengths of the projections of all such points (x_j) on the vector u is maximum (equivalently, where the variance of those projections is maximum).

Explained variance: the amount of variability in a data set that can be attributed to each individual principal component.
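A compact NumPy sketch of PCA via the covariance eigen-decomposition from section 10, on assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])   # assumed toy data set
Xc = X - X.mean(axis=0)                    # centre the data

cov = np.cov(Xc, rowvar=False)             # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]             # principal axes (the directions u)
explained_variance_ratio = eigvals[order] / eigvals.sum()

projection = Xc @ components[:, 0]         # coordinates along the first principal component
print(explained_variance_ratio)            # share of variance explained by each component
```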
