FOUNDATIONS OF
DATA SCIENCE
MODULE II
RELATION TO STATISTICS
Representations of relations between data and linear
algebraic operations on matrices
ML models Leveraging Linear Algebra
In the ML context, all major phases of developing a model have linear algebra
running behind the scenes.
Common Areas of Application — Linear Algebra in
Action
Data Representation
The fuel of ML models, that is data, needs to be converted into numeric arrays before you
can feed it into your models. The computations performed on these arrays include
operations such as matrix multiplication (the dot product), which in turn returns output
that is also represented as a transformed matrix or tensor of numbers.
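A minimal sketch of this idea in NumPy, using made-up toy numbers: a small data matrix is multiplied by a weight matrix, and the result is itself a new matrix.

```python
import numpy as np

# A toy "dataset": 3 samples, each with 4 numeric features (illustrative values).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 0.0, 1.0, 2.0]])

# A weight matrix mapping 4 input features to 2 outputs.
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])

# Matrix multiplication (dot product) transforms the data:
# a (3, 4) array times a (4, 2) array gives a new (3, 2) array.
Y = X @ W
print(Y.shape)  # (3, 2)
```

The output `Y` is the "transformed matrix of numbers" the text describes: each row of the data has been mapped to a new, 2-component representation.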
Word embeddings
Natural Language Processing (NLP) deals with textual data. Dealing with
text means comprehending the meaning of a large corpus of words. Each
word carries a meaning that may be similar to that of other words.
Vector embeddings from linear algebra allow us to represent these words
efficiently, with similar words mapped to nearby vectors.
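To make "similar words map to nearby vectors" concrete, here is a small sketch using hand-picked toy vectors (not real trained embeddings) and cosine similarity, a standard way to compare the directions of two embedding vectors.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (illustrative values, not from a trained model).
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.75, 0.70, 0.12]),
    "apple": np.array([0.10, 0.05, 0.90]),
}

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| |v|): near 1 means the vectors point
    # in almost the same direction, near 0 means unrelated directions.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower
```

In a real system the vectors would come from a trained model (word2vec, GloVe, or a neural network's embedding layer), but the comparison step is the same linear-algebra operation.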
Eigenvectors (SVD)
Linear algebra deals primarily with vectors and matrices (arrays of
different shapes) and operations on these arrays. In NumPy, a vector is
simply a one-dimensional array of numbers, but geometrically it has both
a magnitude and a direction.
Our data can be represented using vectors. In the figure above, one row of the
data is represented by a feature vector with 3 elements, or components, each
corresponding to a different dimension. A vector with n entries lives in an
n-dimensional vector space; in this case, we can see 3 dimensions.
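The magnitude and direction of such a feature vector can be computed directly with NumPy; the values below are a made-up 3-component example.

```python
import numpy as np

# A single row of data as a 3-component feature vector (toy values).
v = np.array([3.0, 4.0, 12.0])

# Magnitude: the Euclidean norm, sqrt(3^2 + 4^2 + 12^2) = 13.0.
magnitude = np.linalg.norm(v)

# Direction: the unit vector pointing the same way (its norm is 1).
direction = v / magnitude

print(magnitude)                   # 13.0
print(np.linalg.norm(direction))   # 1.0
```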
Deep Learning — Tensors Flowing Through a
Neural Network
We can see linear algebra in action across all the major applications today.
Examples include sentiment analysis on a LinkedIn or Twitter post (embeddings),
detecting a type of lung infection from X-ray images (computer vision), and any
speech-to-text bot (NLP).
All of these data types are represented as numbers in tensors. We run vectorized
operations on them to learn patterns using a neural network, which then outputs a
processed tensor; that tensor is in turn decoded to produce the final inference of
the model.
Each phase performs mathematical operations on those data arrays.
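As a minimal sketch of "tensors flowing through a network", here is one dense layer written in plain NumPy, with random toy inputs and weights. Each step is a linear-algebra operation on arrays: a matrix multiply, a broadcasted add, and an element-wise non-linearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input batch: 4 samples, 3 features each, stored as a 2-D tensor.
x = rng.normal(size=(4, 3))

# One dense layer: a weight matrix, a bias vector, and a ReLU activation.
W = rng.normal(size=(3, 5))
b = np.zeros(5)

def dense_relu(x, W, b):
    # x @ W is matrix multiplication, + b is a broadcasted vector add,
    # and np.maximum(..., 0) applies ReLU element-wise.
    return np.maximum(x @ W + b, 0.0)

out = dense_relu(x, W, b)
print(out.shape)  # (4, 5): a processed tensor, ready for the next layer
```

A deep network is essentially this pattern repeated layer after layer, which is why the document says every phase performs mathematical operations on data arrays.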
Dimensionality Reduction — Vector Space
Transformation
Recommendation Engines — Making use of
embeddings
Vector Representation based on the features
Here, this 4×5 matrix, with 4 rows (users) and 5 features, was broken down into
two matrices: one that is 4×2 and the other 2×5. We now have new, lower-dimensional
vectors for the users and the movies.
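One way to obtain such a factorization is the truncated SVD mentioned earlier. The sketch below uses a hypothetical 4×5 user-movie rating matrix with made-up values and keeps only the top 2 singular values, yielding exactly the 4×2 and 2×5 shapes described above.

```python
import numpy as np

# Hypothetical user-movie rating matrix: 4 users x 5 movies (toy values).
R = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 0],
              [0, 1, 5, 4, 5],
              [1, 0, 4, 5, 4]], dtype=float)

# SVD factors R into U, singular values s, and Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top 2 components: a rank-2 factorization R ~ U2 @ V2.
U2 = U[:, :2] * s[:2]   # 4x2: one 2-D vector per user
V2 = Vt[:2, :]          # 2x5: one 2-D vector per movie

print(U2.shape, V2.shape)  # (4, 2) (2, 5)
```

The product `U2 @ V2` approximates the original ratings, and the rows of `U2` (and columns of `V2`) are the small user and movie vectors that a recommendation engine can compare.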
Plotting in 2D vector space
Statistics
Chemical Physics
Genomics
Word Embeddings — neural networks/deep learning
Robotics
Image Processing
Quantum Physics
How much Linear Algebra should you know to get
started with ML / DL?
Now, the important question is how you can learn to program these concepts of linear algebra.
The answer is that you don't have to reinvent the wheel: you just need to understand the basics
of vector algebra computationally, and then learn to program those concepts using NumPy.
NumPy is a scientific computing package that gives us access to all the underlying operations
of linear algebra. It is fast because it runs compiled C code, and it offers a large number of
mathematical and scientific functions that we can use.
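As a starting point, the core linear-algebra operations the document discusses are all one-liners in NumPy; the matrix below is a toy example.

```python
import numpy as np

# A small toy matrix and vector.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([1.0, 2.0])

print(A @ v)                 # matrix-vector product -> [4. 7.]
print(A.T)                   # transpose
print(np.linalg.det(A))      # determinant: 2*3 - 1*1 = 5.0
print(np.linalg.inv(A) @ A)  # inverse times A is (numerically) the identity

w, V = np.linalg.eig(A)      # eigenvalues and eigenvectors
print(w)
```

These few calls (`@`, `.T`, `np.linalg.det`, `np.linalg.inv`, `np.linalg.eig`, plus `np.linalg.svd` and `np.linalg.norm` seen earlier) cover most of the linear algebra needed to get started with ML.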