20CS254

FOUNDATIONS OF
DATA SCIENCE
MODULE II
RELATION TO STATISTICS
Representations of relations between data and linear algebraic operations on matrices
ML models Leveraging Linear Algebra
Contd..

 Machines or computers only understand numbers, and these numbers need to be represented and processed in a way that lets machines solve problems by learning from data instead of following predefined instructions (as in traditional programming).
Contd..

 All types of programming use mathematics at some level. Machine learning involves writing programs that learn, from data, the function that best describes that data.

 The problem (or process) of finding the best parameters of a function using data is called model training in ML.
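A minimal sketch of what this means in practice (the toy data, learning rate, and iteration count below are made-up assumptions for illustration): the loop fits the parameters w and b of the function f(x) = w*x + b to noisy data by gradient descent on the mean squared error.

    import numpy as np

    # Toy data generated from y = 2x + 1 with a little noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 2 * x + 1 + rng.normal(scale=0.5, size=50)

    # "Model training": find the parameters (w, b) of f(x) = w*x + b
    # that minimize the mean squared error on the data
    w, b = 0.0, 0.0
    lr = 0.01  # learning rate
    for _ in range(2000):
        error = (w * x + b) - y
        # Gradients of the mean squared error with respect to w and b
        w -= lr * 2 * np.mean(error * x)
        b -= lr * 2 * np.mean(error)

    print(w, b)  # close to the true parameters 2 and 1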
Contd..

 Therefore, in a nutshell, machine learning is programming to optimize for the best possible solution, and we need math to understand how that problem is solved.

 The first step towards learning math for ML is to learn linear algebra.

Linear Algebra

 Linear Algebra is the mathematical foundation that solves the problem of representing data as well as computations in machine learning models.

 It is the math of arrays, technically referred to as vectors, matrices, and tensors.

 In the ML context, all major phases of developing a model have linear algebra running behind the scenes.
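For a quick illustration in NumPy (the values below are made up), the three array shapes look like this:

    import numpy as np

    # A vector: a 1-D array, e.g. one data sample with 3 features
    v = np.array([5.0, 3.2, 1.1])

    # A matrix: a 2-D array, e.g. a dataset of 4 samples x 3 features
    M = np.array([[5.0, 3.2, 1.1],
                  [4.7, 3.0, 1.3],
                  [6.1, 2.8, 4.7],
                  [5.9, 3.1, 5.1]])

    # A tensor: an n-D array, e.g. a batch of 2 grayscale 28x28 images
    T = np.zeros((2, 28, 28))

    print(v.shape, M.shape, T.shape)  # (3,) (4, 3) (2, 28, 28)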
Common Areas of Application — Linear Algebra in Action
Data Representation

 The fuel of ML models, that is data, needs to be converted into arrays before you can feed it into your models. The computations performed on these arrays include operations like matrix multiplication (dot product), which returns an output that is also represented as a transformed matrix/tensor of numbers.
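A minimal sketch of this idea (the batch size, feature count, and random weights below are arbitrary assumptions):

    import numpy as np

    # A hypothetical mini-batch of 3 samples with 4 features each
    X = np.array([[1.0, 0.5, 2.0, 0.1],
                  [0.3, 1.2, 0.7, 0.9],
                  [2.1, 0.0, 1.5, 0.4]])

    # A weight matrix mapping 4 input features to 2 outputs
    W = np.random.default_rng(0).normal(size=(4, 2))

    # Matrix multiplication transforms the (3, 4) input array
    # into a new (3, 2) output array
    out = X @ W
    print(out.shape)  # (3, 2)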
Word embeddings

 Representing high-dimensional data (think of a huge number of variables in your data) with a smaller-dimensional vector.
Contd..

 Natural Language Processing (NLP) deals with textual data. Dealing with text means comprehending the meaning of a large corpus of words, where each word carries its own meaning that might be similar to that of another word. Vector embeddings in linear algebra allow us to represent these words more efficiently.
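For instance, with the hypothetical 4-dimensional embeddings below (real embeddings are learned and typically have hundreds of dimensions; these numbers are made up), similar words end up pointing in similar directions:

    import numpy as np

    # Made-up 4-dimensional word vectors
    king  = np.array([0.9, 0.8, 0.1, 0.3])
    queen = np.array([0.9, 0.7, 0.2, 0.9])
    apple = np.array([0.1, 0.1, 0.9, 0.5])

    def cosine_similarity(a, b):
        # Dot product of unit vectors: near 1 = similar direction
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity(king, queen))  # high: related words
    print(cosine_similarity(king, apple))  # low: unrelated words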
Eigenvectors (SVD)

 Finally, concepts like eigenvectors allow us to reduce the number of features or dimensions of the data while keeping the essence of all of them, using something called principal component analysis (PCA).
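A minimal PCA sketch via the eigenvectors of the covariance matrix (the toy dataset below is a made-up assumption):

    import numpy as np

    # Toy dataset: 100 samples with 3 features, two of them correlated
    rng = np.random.default_rng(0)
    base = rng.normal(size=100)
    X = np.column_stack([base,
                         2 * base + rng.normal(scale=0.1, size=100),
                         rng.normal(size=100)])

    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

    # Project onto the 2 eigenvectors with the largest eigenvalues:
    # 3 features reduced to 2 while keeping most of the variance
    X_reduced = Xc @ eigvecs[:, -2:]
    print(X_reduced.shape)  # (100, 2)

np.linalg.eigh is used here because a covariance matrix is symmetric; in practice an SVD of the centered data yields the same components.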
Contd..

 Linear algebra basically deals with vectors and matrices (different shapes of arrays) and operations on these arrays. In NumPy, vectors are basically a 1-dimensional array of numbers, but geometrically they have both magnitude and direction.
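Both properties are a one-liner in NumPy, as in this small sketch:

    import numpy as np

    v = np.array([3.0, 4.0])

    magnitude = np.linalg.norm(v)  # length of the vector: 5.0
    direction = v / magnitude      # unit vector: [0.6, 0.8]

    print(magnitude, direction)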
Contd..

 Our data can be represented using a vector. In the figure above, one row in this data is represented by a feature vector which has 3 elements or components, representing 3 different dimensions. N entries in a vector make it an n-dimensional vector space; in this case, we can see 3 dimensions.
Deep Learning — Tensors Flowing Through a Neural Network

 We can see linear algebra in action across all the major applications today. Examples include sentiment analysis on a LinkedIn or Twitter post (embeddings), detecting a type of lung infection from X-ray images (computer vision), or any speech-to-text bot (NLP).
 All of these data types are represented by numbers in tensors. We run vectorized operations on them to learn patterns using a neural network, which then outputs a processed tensor that is in turn decoded to produce the final inference of the model.
 Each phase performs mathematical operations on those data arrays.
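A rough sketch of one such forward pass (the layer sizes and random weights below are arbitrary assumptions, and training is omitted):

    import numpy as np

    rng = np.random.default_rng(0)

    # A batch of 2 inputs, e.g. images flattened into 784-pixel vectors
    X = rng.random((2, 784))

    # One dense hidden layer (784 -> 16) and an output layer (16 -> 3)
    W1, b1 = rng.normal(size=(784, 16)) * 0.01, np.zeros(16)
    W2, b2 = rng.normal(size=(16, 3)) * 0.01, np.zeros(3)

    h = np.maximum(0, X @ W1 + b1)  # ReLU activations, shape (2, 16)
    logits = h @ W2 + b2            # raw class scores, shape (2, 3)

    # Softmax decodes the output tensor into class probabilities
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(probs.shape)  # (2, 3)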
Dimensionality Reduction — Vector Space Transformation
Recommendation Engines — Making use of embeddings
Vector Representation based on the features
Contd..

 Here, this 4×5 matrix of 4 rows and 5 features was broken down into two matrices, one that is 4×2 and the other 2×5. We basically have new smaller-dimensional vectors for users and movies.
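One common way to obtain such a factorization is a truncated SVD; the ratings matrix below is a made-up example:

    import numpy as np

    # Hypothetical user-movie ratings: 4 users x 5 movies
    R = np.array([[5, 4, 0, 1, 0],
                  [4, 5, 1, 0, 0],
                  [0, 1, 5, 4, 5],
                  [1, 0, 4, 5, 4]], dtype=float)

    # Keep k=2 latent factors
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = 2
    user_vecs  = U[:, :k] * s[:k]  # 4x2: one 2-D vector per user
    movie_vecs = Vt[:k, :]         # 2x5: one 2-D vector per movie

    # Their product approximates the original 4x5 matrix
    print(user_vecs.shape, movie_vecs.shape, (user_vecs @ movie_vecs).shape)

These 2-D user and movie vectors are the kind of points that can then be plotted in a 2D vector space.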
Plotting in 2D vector space
Contd..

 The concept of a dot product (matrix multiplication) of vectors tells us more about the similarity of two vectors. It has applications in correlation/covariance calculation, linear regression, logistic regression, PCA, convolutions, PageRank, and numerous other algorithms.
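For example, the Pearson correlation of two variables is just the dot product of their centered, unit-length vectors (the data below is randomly generated for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.8 * x + rng.normal(scale=0.5, size=200)  # correlated with x

    xc = (x - x.mean()) / np.linalg.norm(x - x.mean())
    yc = (y - y.mean()) / np.linalg.norm(y - y.mean())

    print(xc @ yc)                  # correlation as a dot product
    print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in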
Industries where Linear Algebra is used heavily

 Statistics
 Chemical Physics
 Genomics
 Word Embeddings — neural networks/deep learning
 Robotics
 Image Processing
 Quantum Physics
How much Linear Algebra should you know to get started with ML / DL?

 Now, the important question is how you can learn to program these concepts of linear algebra. The answer is that you don’t have to reinvent the wheel; you just need to understand the basics of vector algebra computationally, and then learn to program those concepts using NumPy.

 NumPy is a scientific computation package that gives us access to all the underlying concepts of linear algebra. It is fast because it runs compiled C code, and it has a large number of mathematical and scientific functions that we can use.
