
Applications of Linear Algebra

Lecture 4
"Linear" → "Straight line" (impression)

[Figure: examples of a linear and a non-linear graph]
Coverage
• Minimum knowledge in Linear Algebra for Machine Learning
• Topics
• Vectors and Matrices
• Matrix/Vector operations
• Addition
• Multiplication
• Inverse and Transpose
• …
• Linear Regression: equation and visualization
• Supporting matrices in data analytics
• Introduction to NumPy
Background Check

• Given an equation 3x - y + 1 = 0, how many solutions are there?
A. 0
B. 1
C. More than 1 but limited
D. Infinite
• Now, guess the number of solutions again for each of the following two systems of equations:
3x - y + 1 = 0
4x - y - 4 = 0

3x - y + 1 = 0
3x - y + 10 = 0
Visualization of math equations
y = 3x + 1, i.e. 3x - y + 1 = 0

from sympy import *
x = symbols("x")
plot(3*x + 1)             # y = 3x + 1
plot(3*x + 1, 4*x - 4)    # y = 3x + 1 and y = 4x - 4 on the same axes

[Figures: plots of the systems 3x - y + 1 = 0, 4x - y - 4 = 0 and 3x - y + 1 = 0, 3x - y + 10 = 0]
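The plots suggest the answers to the Background Check. The same answers can also be checked symbolically. What follows is a minimal sketch assuming SymPy is installed. It uses SymPy's `Eq` and `solve`, and the result names `single`, `sol1`, `sol2` are only illustrative:

```python
from sympy import Eq, solve, symbols

x, y = symbols("x y")

# One equation, two unknowns: y can be written in terms of x,
# so every value of x gives a solution (infinitely many points on the line)
single = solve(Eq(3*x - y + 1, 0), y)
print(single)    # [3*x + 1]

# Two lines with different slopes cross at exactly one point
sol1 = solve([Eq(3*x - y + 1, 0), Eq(4*x - y - 4, 0)], [x, y])
print(sol1)      # {x: 5, y: 16}

# Two parallel lines (same slope, different intercepts) never meet,
# so solve returns an empty result
sol2 = solve([Eq(3*x - y + 1, 0), Eq(3*x - y + 10, 0)], [x, y])
print(sol2)
```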
Key for a sound foundation in math for data science

• Able to describe the meaning of (relevant) math formulas
• Able to characterize some real-world problems in math notation
• Able to represent certain math problems in programs
• Visualization helps understanding/interpretation of problems
• Software tools (modules, libraries, …) relieve us of the tedious procedure of finding solutions
Scalars
A scalar is a single number
• Integers, real numbers, rational numbers, etc.
• We may denote it as a, n, x

Matrices
A matrix is a 2-D array of numbers

$m = \begin{bmatrix} m_{0,0} & m_{0,1} \\ m_{1,0} & m_{1,1} \\ m_{2,0} & m_{2,1} \\ m_{3,0} & m_{3,1} \end{bmatrix}$

This is a matrix of 4 rows, 2 columns, i.e. 4x2
In Math, we name rows as row1, row2, …
Python names rows as row0, row1, …

Vectors
A vector is a 1-D array of numbers

$v = \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_n \end{bmatrix}$
Tensors

A tensor is an array of numbers that may have


• zero dimensions, and be a scalar
• one dimension, and be a vector
• two dimensions, and be a matrix
• or higher dimensions.
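In NumPy, which is introduced later in this lecture, an array's `ndim` attribute is its number of dimensions. The sketch below shows one array of each kind from the list above. The variable names are illustrative:

```python
import numpy as np

scalar = np.array(5)                  # 0 dimensions
vector = np.array([1, 2, 3])          # 1 dimension
matrix = np.array([[1, 2], [3, 4]])   # 2 dimensions
tensor3 = np.zeros((2, 3, 4))         # 3 dimensions

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("3-D tensor", tensor3)]:
    # ndim = number of dimensions, shape = size along each dimension
    print(name, "ndim =", t.ndim, "shape =", t.shape)
```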
Vectors: Python representation
height_weight_age = [70, 170, 40]

#with Python comments
height_weight_age = [70,   # inches,
                     170,  # pounds,
                     40]   # years

grades = [95, 80, 75, 62]

grades = [95,   # exam1
          80,   # exam2
          75,   # exam3
          62]   # exam4
Vector Addition
V1 = [1, 2]
V2 = [2, 1]
V3 = V1 + V2 = [3, 3]   (math notation; in Python, + on two lists concatenates them instead)

def vector_add(Va, Vb):
    if len(Va) != len(Vb):
        return None  # alternatively: assert len(Va) == len(Vb)
    else:
        return [vj + wj for vj, wj in zip(Va, Vb)]

V3 = vector_add(V1, V2)   # [3, 3]

Vector subtraction:
V1 = [4, 5, 6]
V2 = [1, 1, 2]
vector_sub(V1, V2) => [3, 4, 4]
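The slide calls `vector_sub` without defining it. Here is one possible definition, a sketch that mirrors `vector_add` and only swaps `+` for `-`:

```python
from typing import List, Optional

Vector = List[float]

def vector_sub(Va: Vector, Vb: Vector) -> Optional[Vector]:
    """Subtracts corresponding elements of Va and Vb."""
    if len(Va) != len(Vb):
        return None  # vectors of different lengths cannot be subtracted
    return [vj - wj for vj, wj in zip(Va, Vb)]

print(vector_sub([4, 5, 6], [1, 1, 2]))  # [3, 4, 4]
```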
Advanced: use operator overloading in Python
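With operator overloading, a class defines `__add__` and `__sub__` so that `V1 + V2` means element-wise addition instead of list concatenation. The `Vec` class below is a hypothetical minimal sketch for illustration. It is not a standard library type:

```python
class Vec:
    """Hypothetical tiny vector class: + and - work element-wise."""

    def __init__(self, *components):
        self.components = list(components)

    def __add__(self, other):
        assert len(self.components) == len(other.components), "length mismatch"
        return Vec(*[a + b for a, b in zip(self.components, other.components)])

    def __sub__(self, other):
        assert len(self.components) == len(other.components), "length mismatch"
        return Vec(*[a - b for a, b in zip(self.components, other.components)])

    def __eq__(self, other):
        return isinstance(other, Vec) and self.components == other.components

    def __repr__(self):
        return f"Vec{tuple(self.components)}"

V1 = Vec(1, 2)
V2 = Vec(2, 1)
print(V1 + V2)                            # Vec(3, 3)
print(Vec(4, 5, 6) - Vec(1, 1, 2))        # Vec(3, 4, 4)
```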
More vector operations
• Vector sum: create a new vector whose first element is the sum of all the first elements, whose second element is the sum of all the second elements, and so on:
vector_sum([[1, 2], [3, 4], [5, 6], [7, 8]]) → [16, 20]
• Vector-scalar multiplication:
2 * [1, 2, 3] → [2, 4, 6]   (math notation; in Python, 2 * a list repeats the list)
scalar_mult(2, [1, 2, 3]) → [2, 4, 6]
• Vector mean: component-wise mean value
vector_mean([1, 5], [2, 9], [6, 10]) → [3, 8]

Advanced: Python Arbitrary Arguments
def f(*args):   # collects any number of arguments into a tuple
    aL = []
    for a in args:
        aL.append(a)
    # aL now contains a list of the arguments
    # may zip the list, then perform the operation
    return aL
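The three operations above can be sketched as plain Python functions. This assumes the calling conventions the slide shows: `vector_sum` takes one list of vectors, and `vector_mean` takes the vectors as separate arguments through `*args`:

```python
from typing import List

Vector = List[float]

def vector_sum(vectors: List[Vector]) -> Vector:
    """Component-wise sum of a list of equal-length vectors."""
    assert vectors, "no vectors provided"
    n = len(vectors[0])
    assert all(len(v) == n for v in vectors), "different sizes!"
    # zip(*vectors) groups all first elements together, then all second elements, ...
    return [sum(components) for components in zip(*vectors)]

def scalar_mult(c: float, v: Vector) -> Vector:
    """Multiplies every element of v by c."""
    return [c * v_i for v_i in v]

def vector_mean(*vectors: Vector) -> Vector:
    """Component-wise mean; *vectors gathers any number of vector arguments."""
    n = len(vectors)
    return [s / n for s in vector_sum(list(vectors))]

print(vector_sum([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [16, 20]
print(scalar_mult(2, [1, 2, 3]))                     # [2, 4, 6]
print(vector_mean([1, 5], [2, 9], [6, 10]))          # [3.0, 8.0]
```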
Dot product
• Sum of component-wise products
• Example
dot([1, 2, 3], [4, 5, 6]) → 32   # 1*4 + 2*5 + 3*6
• If w has length 1, the dot product v · w gives the length of the projection of v onto w

from typing import List
Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    # Computes v_1 * w_1 + ... + v_n * w_n
    assert len(v) == len(w), "vectors must be same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))
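The projection remark above can be checked with small vectors. `projection_length` is a hypothetical helper. It divides by |w|, so it also works when w is not a unit vector. When |w| = 1 it reduces to the plain dot product:

```python
import math
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    assert len(v) == len(w), "vectors must be same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def projection_length(v: Vector, w: Vector) -> float:
    """Signed length of the projection of v onto the direction of w (hypothetical helper)."""
    w_len = math.sqrt(dot(w, w))   # |w|
    return dot(v, w) / w_len       # equals dot(v, w) when |w| == 1

print(dot([3, 4], [1, 0]))                # 3: w is a unit vector along the x-axis
print(projection_length([3, 4], [2, 0]))  # 3.0: same direction, but |w| = 2
```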
Distance between two vectors
Calculation of distance between two vectors v = (v1, …, vn) and w = (w1, …, wn):
$d(v, w) = \sqrt{(v_1 - w_1)^2 + \cdots + (v_n - w_n)^2}$

#code skeleton
def squared_distance(v: Vector, w: Vector) -> float:
    # Computes (v_1 - w_1) ** 2 + ... + (v_n - w_n) ** 2
    return sum_of_squares(subtract(v, w))

def distance(v: Vector, w: Vector) -> float:
    # Computes the distance between v and w
    return math.sqrt(squared_distance(v, w))

Example: v = [1, 2, 3], w = [4, 5, 6]
Distance = sqrt((1-4)**2 + (2-5)**2 + (3-6)**2) = sqrt(27) ≈ 5.196
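The skeleton above relies on two helpers it does not define, `subtract` and `sum_of_squares`. Below is a self-contained sketch that fills them in under the obvious assumptions: element-wise difference, and the dot product of a vector with itself.

```python
import math
from typing import List

Vector = List[float]

def subtract(v: Vector, w: Vector) -> Vector:
    """Subtracts corresponding elements."""
    assert len(v) == len(w), "vectors must be same length"
    return [v_i - w_i for v_i, w_i in zip(v, w)]

def dot(v: Vector, w: Vector) -> float:
    assert len(v) == len(w), "vectors must be same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sum_of_squares(v: Vector) -> float:
    """Returns v_1 * v_1 + ... + v_n * v_n."""
    return dot(v, v)

def squared_distance(v: Vector, w: Vector) -> float:
    """Computes (v_1 - w_1) ** 2 + ... + (v_n - w_n) ** 2."""
    return sum_of_squares(subtract(v, w))

def distance(v: Vector, w: Vector) -> float:
    """Computes the (Euclidean) distance between v and w."""
    return math.sqrt(squared_distance(v, w))

print(squared_distance([1, 2, 3], [4, 5, 6]))    # 27
print(round(distance([1, 2, 3], [4, 5, 6]), 3))  # 5.196
```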
Practice
• Given two vectors
v1 = [1, 2, 5, 3]
v2 = [3, 1, 4, 2]
Calculate
v1 + v2
v1 – v2
dot product of (v1, v2)
distance between v1 and v2
Matrix: Python Representation

A = [[1, 2, 3, 4], [5, 6, 7, 8]]

M = [[1,2,3], [2,3,4], [4,5,6]]

Better representation: NumPy arrays (to be discussed soon)
Matrix Shape
• Dimension of a matrix: m x n matrix
m: number of rows
n: number of columns
(m, n): shape of the matrix

from typing import List, Tuple
Matrix = List[List[float]]

def shape(A: Matrix) -> Tuple[int, int]:
    """Returns (# of rows of A, # of columns of A)"""
    num_rows = len(A)
    num_cols = len(A[0]) if A else 0   # number of elements in first row
    return num_rows, num_cols

• Example
shape([[1, 2, 3], [4, 5, 6]]) → (2, 3)
Identity Matrix
Identity matrix: with 1s on the diagonal and 0s elsewhere

Example 5 x 5 identity matrix:
[[1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 1]]

M = []
for r in range(5):
    row = []
    for c in range(5):
        if r == c:
            row.append(1)
        else:
            row.append(0)
    M.append(row)

import numpy as np
np.eye(5)   # or, np.eye(5, dtype=int)
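The nested loop above can also be written as one nested list comprehension. This is an equivalent sketch, not the slide's own code. The result can be compared with NumPy's `np.eye`:

```python
import numpy as np

n = 5
# 1 when the row index equals the column index (the diagonal), 0 elsewhere
I = [[1 if r == c else 0 for c in range(n)] for r in range(n)]

for row in I:
    print(row)

# Same matrix as NumPy's built-in identity
print(np.array_equal(np.array(I), np.eye(n, dtype=int)))  # True
```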
Matrices: Application to binary relation
• Binary relationship example
friendship = [(0, 1),(0, 2), (1, 2), (1, 3), (2, 3), (3, 4),(4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]
if (j, k) in friendship, then j is a friend of k
Questions:
(1) if j is a friend of k, could we infer that k is a friend of j?
(2) should you be considered as a friend of yourself?
• Matrix representation of binary relationship
if M[j, k] == 1, j is a friend of k
if M[j, k] == 0, j is not a friend of k
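The friendship pairs can be turned into this 0/1 matrix with a short loop. This sketch assumes users are numbered 0 to 9 and that each pair is stored in one direction only, as listed. A list of lists uses `M[j][k]` indexing, while the `M[j, k]` form shown above is NumPy syntax:

```python
friendship = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
              (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

n = 10  # users are numbered 0..9
# Build a fresh row for each user; [[0] * n] * n would make every row the SAME list object
M = [[0] * n for _ in range(n)]
for j, k in friendship:
    M[j][k] = 1   # j is a friend of k

for row in M:
    print(*row)
```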
Matrix Representation of Friend Relationship

0 1 1 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0

Sparse matrix: a lot of entries with 0s
Any binary relationship can be represented by a 2-D matrix
Symmetric Matrix
Symmetric Relation
If, whenever j is a friend of k, k is also a friend of j, the relationship is called a symmetric relation.

Symmetric Matrix (must be of shape (m, m), i.e. a square matrix)
If M[j, k] = M[k, j] for every j and k, then M is a symmetric matrix (the diagonal entries, where j = k, always satisfy this).

Should friendship be a symmetric relation?
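A matrix is symmetric exactly when it equals its own transpose (M = M^T, covered under Matrix Transposition). The sketch below uses NumPy, which is introduced later in this lecture, to test the friendship matrix. It then makes the matrix symmetric by adding M^T. `is_symmetric` is a hypothetical helper:

```python
import numpy as np

friendship = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
              (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

M = np.zeros((10, 10), dtype=int)
for j, k in friendship:
    M[j, k] = 1

def is_symmetric(A: np.ndarray) -> bool:
    """True if A is square and A[j, k] == A[k, j] for every j, k."""
    return A.shape[0] == A.shape[1] and np.array_equal(A, A.T)

print(is_symmetric(M))   # False: each pair is stored in one direction only

S = M + M.T              # add the missing k -> j direction of every pair
print(is_symmetric(S))   # True
```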


Symmetric Matrix
Symmetric friend relationship (no one is a friend of themselves):
0 1 1 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0
0 0 0 1 0 1 0 0 0 0
0 0 0 0 1 0 1 1 0 0
0 0 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 0 1 0
0 0 0 0 0 0 1 1 0 1
0 0 0 0 0 0 0 0 1 0

Should you be a friend of yourself?
If yes, the matrix would look like:
1 1 1 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1 0 0
0 0 0 0 0 1 1 0 1 0
0 0 0 0 0 1 0 1 1 0
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 1 1
Practice
• Identify whether each of the following matrices is symmetric or not
Matrix Operations
• Transposition
• Addition/Subtraction
• Multiplication
• Scalar * matrix
• matrix * vector
• matrix * matrix
• Row or column operations
• Inversion
•…
Matrix Transposition

If A is an m x n matrix, the transpose of A, written A^T, is an n x m matrix

A^T[k, j] = A[j, k]: each row of A becomes a column of A^T
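In plain Python, `zip(*A)` groups the k-th element of every row, which is column k of A. That gives a compact sketch of transposition for a list-of-lists matrix:

```python
from typing import List

Matrix = List[List[float]]

def transpose(A: Matrix) -> Matrix:
    """Row j of A becomes column j of the result: result[k][j] == A[j][k]."""
    # zip(*A) yields the columns of A as tuples
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
print(transpose(A))      # [[1, 4], [2, 5], [3, 6]]  (3 x 2)
```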


Matrix Addition/Subtraction
A + B or A - B: A and B must be of the same shape, say, both m x n
Add/subtract corresponding elements, e.g. C[j, k] = A[j, k] + B[j, k]
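Element-wise addition can be sketched for list-of-lists matrices as shown here. `matrix_add` is an illustrative name. It first checks that both matrices have the same shape:

```python
from typing import List

Matrix = List[List[float]]

def matrix_add(A: Matrix, B: Matrix) -> Matrix:
    """C[j][k] = A[j][k] + B[j][k]; A and B must have the same shape."""
    same_shape = len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    assert same_shape, "matrices must have the same shape"
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

print(matrix_add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # [[11, 22], [33, 44]]
```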
Scalar multiply Matrix
k*M: k is a scalar, M is a matrix
Multiply every element of M by k

Example:
k= 2
M = [[1,2,3], [4,5,6], [7,8,9]]
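For this example, scaling every element by k = 2 can be sketched with a nested list comprehension:

```python
k = 2
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Multiply every element of every row by k
kM = [[k * m for m in row] for row in M]
print(kM)  # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```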
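Matrix and Vector Multiplication
M • V or MV
If M is n x m, V must be m x 1, and the result is an n x 1 vector
if M is 3 x 2 and V is 3 x 1 → MV is not defined (2 ≠ 3)
if M is 3 x 4 and V is 4 x 1 → MV is okay (result is 3 x 1)
Several ways of calculating MV:
• based on dot product: entry j of MV is the dot product of row j of M with V
• based on scalar multiplication: MV is the sum of the columns of M, each multiplied by the corresponding entry of V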
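Both ways of computing MV listed above can be sketched in plain Python. This assumes the "scalar multiplication" view means scaling each column of M by the matching entry of V and adding the results. The function names are illustrative. The two views must give the same answer:

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def matvec_rows(M: Matrix, V: Vector) -> Vector:
    """Row view: entry j of MV is the dot product of row j of M with V."""
    assert all(len(row) == len(V) for row in M), "columns of M must equal length of V"
    return [dot(row, V) for row in M]

def matvec_columns(M: Matrix, V: Vector) -> Vector:
    """Column view: MV = V[0]*column0 + V[1]*column1 + ..."""
    assert all(len(row) == len(V) for row in M), "columns of M must equal length of V"
    result = [0] * len(M)
    for k, scalar in enumerate(V):
        column_k = [row[k] for row in M]
        result = [r + scalar * c for r, c in zip(result, column_k)]
    return result

M = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
V = [1, 2, 1]     # 3 x 1
print(matvec_rows(M, V))     # [8, 20]
print(matvec_columns(M, V))  # [8, 20]
```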
How to perform matrix multiplication?

https://www.youtube.com/watch?v=OMA2Mwo0aZg
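The row-times-column rule can be sketched directly: entry (i, j) of AB is the dot product of row i of A with column j of B. `matmul` below is an illustrative pure-Python helper. It is not NumPy's function of the same name:

```python
from typing import List

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """C = AB, where C[i][j] is the dot product of row i of A and column j of B."""
    n_cols_A = len(A[0])
    assert n_cols_A == len(B), "columns of A must equal rows of B"
    n_cols_B = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n_cols_A))
             for j in range(n_cols_B)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(matmul(B, A))  # [[23, 34], [31, 46]]  (AB and BA differ here)
```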
Practice
Given matrix A as follows:
1  2  5
3  4  9
10 20 30
7  6  1
Assume B is the transpose of A, and the first element is marked as B[0][0].
• What is B[1][2]?
• Can we multiply A * B? If yes, what is the shape of the result matrix?
• Can we multiply B * A? If yes, what is the shape of the result matrix?

Practice
A is       B is
1 0 1      3 2 1
0 1 1      1 0 1
1 0 0      0 2 3
• What is A * B?
• Is A * B equal to B * A?
Solving System of Equations
Solving systems of equations by elimination and substitution, …
Examples
3x - y = 7
2x + y = 8
Solution: x = 3, y = 2

How to solve?
4x - 3y + z = -10
2x + y + 3z = 0
-x + 2y - 5z = 17
https://www.khanacademy.org/math/algebra-home/alg-system-of-equations/alg-systems-with-three-variables/v/systems-of-three-variables
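Elimination followed by back substitution can be automated. This is a minimal sketch of Gaussian elimination that assumes the system has exactly one solution. It uses Python's `fractions.Fraction` so the answers are exact rather than rounded floats. `gaussian_solve` is a hypothetical helper name:

```python
from fractions import Fraction
from typing import List

def gaussian_solve(A: List[List[float]], b: List[float]) -> List[Fraction]:
    """Solve Ax = b by elimination, then back substitution (exact fractions).

    Assumes a unique solution exists (no row of zeros after elimination).
    """
    n = len(A)
    # Augmented matrix [A | b]
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Use a row with a non-zero entry in this column as the pivot
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this variable from every row below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]

    # Back substitution: solve for the last variable first
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x

print(gaussian_solve([[3, -1], [2, 1]], [7, 8]))  # [Fraction(3, 1), Fraction(2, 1)]
print(gaussian_solve([[4, -3, 1], [2, 1, 3], [-1, 2, -5]], [-10, 0, 17]))
```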
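Solving System of Equations
Solving systems of equations by matrix inversion
Examples
3x - y = 7
2x + y = 8
Solution: x = 3, y = 2

$A = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \end{bmatrix}, \quad B = \begin{bmatrix} 7 \\ 8 \end{bmatrix}$

$AX = B$
$X = A^{-1} B$
How to find $A^{-1}$?

How to solve?
4x - 3y + z = -10
2x + y + 3z = 0
-x + 2y - 5z = 17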
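For a 2 x 2 matrix there is a closed-form inverse. The worked derivation below applies it to the example, using the same A, X and B as above. Larger matrices are usually inverted with software, as on the following slides:

```latex
\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}
  = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix},
  \qquad ad - bc \neq 0

A = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}:\quad
ad - bc = 3 \cdot 1 - (-1) \cdot 2 = 5,\quad
A^{-1} = \frac{1}{5} \begin{bmatrix} 1 & 1 \\ -2 & 3 \end{bmatrix}

X = A^{-1} B
  = \frac{1}{5} \begin{bmatrix} 1 & 1 \\ -2 & 3 \end{bmatrix}
    \begin{bmatrix} 7 \\ 8 \end{bmatrix}
  = \frac{1}{5} \begin{bmatrix} 15 \\ 10 \end{bmatrix}
  = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
```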
Using numpy for matrix operations

Supplement lecture: Numpy.pptx


Solving System of Equations: numpy.linalg.solve
Examples
3x - y = 7
2x + y = 8
Solution: x = 3, y = 2

import numpy as np
a = np.array([[3, -1], [2, 1]])
b = np.array([7, 8])
x = np.linalg.solve(a, b)
print(x)    # [3. 2.]

How to solve?
4x - 3y + z = -10
2x + y + 3z = 0
-x + 2y - 5z = 17

a = np.array([[4, -3, 1], [2, 1, 3], [-1, 2, -5]])
b = np.array([-10, 0, 17])   # right-hand sides, including the -10
x = np.linalg.solve(a, b)
print(x)    # [ 1.  4. -2.]
Challenge:
Linear Regression
What is regression?
Regression searches for relationships among variables,
e.g. how employee salaries depend on years of experience, level of education, etc.

https://realpython.com/linear-regression-in-python/
https://realpython.com/linear-regression-in-python/#simple-linear-regression

https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py
Introduction to Numpy

Link: http://numpy.org
NumPy
• Stands for Numerical Python
• Introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
• Many other Python libraries are built on NumPy
• It provides
• ndarray for creating multidimensional arrays
• Standard math functions for fast operations on entire arrays of data without having to write loops
• Internally stores data in a contiguous block of memory, independent of other built-in Python objects, and uses much less memory than built-in Python sequences
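"Vectorization" means one expression is applied to a whole array at once, with no explicit Python loop. A small sketch compares the two styles on the same data. It shows that they give the same result; it does not measure speed:

```python
import numpy as np

values = list(range(1, 6))      # plain Python list: [1, 2, 3, 4, 5]

# Pure Python: an explicit loop over every element
squares_loop = [v ** 2 for v in values]

# NumPy: one vectorized expression applied to the whole array
arr = np.array(values)
squares_vec = arr ** 2

print(squares_loop)        # [1, 4, 9, 16, 25]
print(squares_vec)         # [ 1  4  9 16 25]
print(squares_vec.sum())   # 55
```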
NumPy ndarray
• We will focus on
• Concept: ndarray vs list
• Array creation
• Array access and operations
Array: Contiguous storage
Storing 2-d arrays as ndarray

Location of data[1, 2] (row-major order): starting address + (1 × number of columns + 2) × size of one element
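The address formula can be checked with NumPy's `strides` attribute. `strides` gives the number of bytes to step to reach the next row and the next column. This sketch assumes a default C-ordered (row-major) array like the one built here:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # 3 rows x 4 columns, stored row by row
print(data)

n_cols = data.shape[1]
flat_index = 1 * n_cols + 2          # position of data[1, 2] in the contiguous block
print(data.ravel()[flat_index], data[1, 2])   # 6 6

# strides: bytes to move one row down, and one column right
print(data.strides, data.itemsize)
byte_offset = 1 * data.strides[0] + 2 * data.strides[1]
print(byte_offset == flat_index * data.itemsize)  # True
```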


Numpy Array vs. Python List

NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
ndarray Creation and Matrix Shape

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)                  # create 1-d array from a list

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]    # list of lists
arr2 = np.array(data2)                  # 2-d array
print(arr2.ndim)                        # 2
print(arr2.shape)                       # (2, 4)

array = np.array([[0, 1, 2], [2, 3, 4]])
[[0 1 2]
 [2 3 4]]

array = np.zeros((2, 3))
[[0. 0. 0.]
 [0. 0. 0.]]

array = np.ones((2, 3))
[[1. 1. 1.]
 [1. 1. 1.]]

array = np.eye(3)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

array = np.arange(0, 10, 2)
[0 2 4 6 8]

array = np.random.randint(0, 10, (3, 3))   # random values; yours will differ
[[6 4 3]
 [1 5 6]
 [9 8 5]]

arange is an array-valued version of the built-in Python range function
Slicing
• One-dimensional arrays are simple; they are indexed and sliced similarly to Python lists

arr = np.arange(10)
print(arr) # [0 1 2 3 4 5 6 7 8 9]
print(arr[5]) #5
print(arr[5:8]) #[5 6 7]
arr[5:8] = 12 #assignment
print(arr) #[ 0 1 2 3 4 12 12 12 8 9]
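One important difference from lists is that a NumPy slice is a view of the original array, not a copy. Changing the slice therefore changes the original array. List slicing always builds a new list. A short sketch of both behaviors:

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[5:8]        # a view into arr, not a copy
arr_slice[1] = 12345        # modifying the view modifies arr itself
print(arr[6])               # 12345

arr_copy = arr[5:8].copy()  # an explicit copy is independent of arr
arr_copy[:] = 0
print(arr[5:8])             # still 5, 12345, 7

lst = list(range(10))
lst_slice = lst[5:8]        # list slicing creates a new list
lst_slice[1] = 12345
print(lst[6])               # 6: the original list is unchanged
```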
Arithmetic with NumPy Arrays
• Any arithmetic operation between equal-size arrays applies the operation element-wise:
M1 = np.array([[1, 2, 3], [4, 5, 6]])
print(M1)
[[1 2 3]
 [4 5 6]]
M2 = np.array([[1, 1, 1], [2, 2, 2]])

print(M1 * M2)   #This is NOT matrix multiplication
[[ 1  2  3]      #multiply corresponding elements only
 [ 8 10 12]]
print(M1 + M2)   #matrix addition
[[2 3 4]
 [6 7 8]]
Operations on Arrays: Python List vs. Numpy Array
#Python list
>>> L = [1,2,3,4,5]
>>> print(L*2)                 #repeats the list
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> M = [[1,2,3], [4,5,6]]     #2x3 matrix
>>> print(M**2)
#Error! (TypeError: ** is not defined for lists)

#Numpy Arrays
>>> arr = np.array([1,2,3,4,5])
>>> print(arr*2)               #multiplies every element
[ 2  4  6  8 10]
>>> NM = np.array([[1,2,3],[4,5,6]])
>>> print(NM**2)
[[ 1  4  9]
 [16 25 36]]
>>> print(NM**0.5)             #same as print(np.sqrt(NM))
[[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]
Numpy Matrix Operations

#randomly generate a 3x3 matrix
import numpy as np
print(np.random.rand(3, 3))   # np.matlib.rand(3, 3) also works, but returns the legacy numpy.matrix type
[[0.1609588  0.13422134 0.42905032]   # random values; yours will differ
 [0.615976   0.6464524  0.29792046]
 [0.10608082 0.21442143 0.28125886]]

#matrix transposition
M1 = np.array([[1,2,3],[4,5,6]])
M2 = M1.T
print(M2)
[[1 4]
 [2 5]
 [3 6]]

#scalar multiply matrix
M1 = np.array([[1,2,3],[4,5,6]])
a = 5
M2 = a * M1
print(M2)
[[ 5 10 15]
 [20 25 30]]

Tips: use the following to print all floats in ndarrays with only 2 decimal places
np.set_printoptions(formatter={'float_kind': '{:.2f}'.format})
Numpy Matrix Operations

#vector dot product
import numpy as np
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v = np.dot(v1, v2)
print("v =", v)
v = 32

#matrix multiplication
import numpy as np
a = np.array([[1,0],[0,1]])
b = np.array([[4,1],[2,2]])
print(np.matmul(a, b))
[[4 1]
 [2 2]]

#matrix vector multiplication
M1 = np.array([[1,2,3],[4,5,6]])
V = np.array([1,2,1])
M2 = np.matmul(M1, V)
print(M2)
[ 8 20]

Question: what will be the output of print(a*b)?
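Python's `@` operator is shorthand for `np.matmul` on NumPy arrays. The sketch below uses two non-identity matrices to contrast `@` (matrix product) with `*` (element-wise product):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)     # matrix product, same as np.matmul(A, B)
# [[19 22]
#  [43 50]]
print(A * B)     # element-wise product, NOT matrix multiplication
# [[ 5 12]
#  [21 32]]
print(np.array_equal(A @ B, np.matmul(A, B)))  # True
```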
Practice
Given matrix A as follows:
1  2  5
3  4  9
10 20 30
7  6  1
• Write code to calculate the transpose of A

Practice
A is       B is
1 0 1      3 2 1
0 1 1      1 0 1
1 0 0      0 2 3
• What is AB?
• Is AB equal to BA?
Numpy: Inverse Matrix

import numpy as np
A = np.array([[4,3],[3,2]])
A_Inv = np.linalg.inv(A)
print(A)
print(A_Inv)
print(np.matmul(A, A_Inv))

[[4 3]         #This is A
 [3 2]]
[[-2.  3.]     #This is the inverse of A
 [ 3. -4.]]
[[1. 0.]       #A multiplied by the inverse of A
 [0. 1.]]
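The inverse leads directly to the solution X = A^-1 B from the earlier matrix-inversion slide. Below is a sketch for the 2 x 2 example 3x - y = 7, 2x + y = 8. In practice `np.linalg.solve` is usually preferred over forming the inverse explicitly, so the sketch checks that both give the same answer:

```python
import numpy as np

A = np.array([[3, -1], [2, 1]])   # coefficients of 3x - y = 7 and 2x + y = 8
B = np.array([7, 8])

X = np.linalg.inv(A) @ B          # X = A^-1 B
print(X)                          # approximately [3. 2.]

# Same answer without computing A^-1 explicitly
print(np.allclose(X, np.linalg.solve(A, B)))  # True
```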
Practice
• Use the matrix inverse to solve the following system of linear equations

Write code (use numpy) to solve:

4x - 3y + z = - 10
2x + y + 3z = 0
- x + 2y - 5z = 17
Solving System of Equations: numpy.linalg.solve

Examples
3x - y = 7
2x + y = 8
Solution: x = 3, y = 2

import numpy as np
a = np.array([[3, -1], [2, 1]])
b = np.array([7, 8])
x = np.linalg.solve(a, b)
print(x)    # [3. 2.]

How to solve?
4x - 3y + z = -10
2x + y + 3z = 0
-x + 2y - 5z = 17

a = np.array([[4, -3, 1], [2, 1, 3], [-1, 2, -5]])
b = np.array([-10, 0, 17])
x = np.linalg.solve(a, b)
print(x)    # [ 1.  4. -2.]
Numpy: linear algebra

More to explore
References
https://www.tutorialspoint.com/numpy/numpy_linear_algebra.htm
https://www.tutorialspoint.com/numpy/numpy_matrix_library.htm

Note: some Python 2 syntax in above references


Math Models for Data Analytics
How can we reveal more information with advanced math knowledge?
Digging for more information
• Scenarios
• Are we able to find some correlations among several “variables”?
• e.g. scores of a series of quizzes vs. how many hours you slept the night before?
• More examples?
• Are we able to forecast some future trends?
• e.g. if a company's sales have increased steadily every month for the past few years, then by conducting a linear analysis on the monthly sales data, the company could forecast sales in future months.
• Let’s limit our discussion to some simple “linear” models
• Using the linear algebra we just reviewed.
Correlation
• Types of Correlation
• Negative correlation
• e.g. when the price of a product increases, its demand will decrease.
• Positive correlation
• e.g. conversely, the quantity supplied will increase as the price increases.
• Details: see next slides
• How to find and visualize the correlation?
• Find correlation: linear regression
Example

$f(x) = b_0 + b_1 x$

Or, $y = b_0 + b_1 x$

$b_0$: intercept

$b_1$: slope
Error Calculation
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
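With actual values y_i and predictions ŷ_i over n points, the standard definitions are: MAE = (1/n) Σ |y_i - ŷ_i|, MSE = (1/n) Σ (y_i - ŷ_i)², and RMSE = √MSE. The sketch below computes all three with NumPy. The numbers are made up for illustration only:

```python
import numpy as np

# Illustrative actual values y and predictions y_hat (made-up numbers)
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

errors = y - y_hat
mae = np.mean(np.abs(errors))   # average size of the errors
mse = np.mean(errors ** 2)      # average squared error (large errors count more)
rmse = np.sqrt(mse)             # back in the same units as y

print(mae, mse, rmse)           # MAE = 0.5, MSE = 0.375, RMSE ≈ 0.612
```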


Implementation – Basic Steps
• Steps for Linear Regression Implementation
1. Import the packages and classes you need.
2. Prepare dataset(s)
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is
satisfactory.
5. Apply the model for predictions.
Python Implementation
• Step 1: import modules
import numpy as np
from sklearn.linear_model import LinearRegression

• Step 2: provide dataset(s)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))   # 2-D: one row per sample, one column
y = np.array([5, 20, 14, 32, 22, 38])
>>> print(x) #[[ 5] [15] [25] [35] [45] [55]]
>>> print(y) # [ 5 20 14 32 22 38]
• Step 3: create a linear regression model and fit it with existing data
model = LinearRegression()
model.fit(x, y) #simply: model = LinearRegression().fit(x, y)

• Step 4a: get the results
>>> print('intercept:', model.intercept_)
intercept: 5.633333333333329
>>> print('slope:', model.coef_)
slope: [0.54]
• Step 4b: validate the model
>>> r_sq = model.score(x, y) #coefficient of determination, i.e. R2 (not the mean squared error)
>>> print('coefficient of determination:', r_sq)
coefficient of determination: 0.715875613747954 #what is an acceptable R2 value?
• Step 5: perform prediction
>>> y_pred = model.predict(x)
>>> print('predicted response:', y_pred, sep='\n')
predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]

#Another, equivalent way of predicting results (use the linear equation directly)
>>> y_pred = model.intercept_ + model.coef_ * x   #2-D result, because x is 2-D
>>> print('predicted response:', y_pred, sep='\n')
predicted response:
[[ 8.33333333] [13.73333333] [19.13333333] [24.53333333] [29.93333333] [35.33333333]]

#Predicting new values
>>> x_new = np.arange(5).reshape((-1, 1)) #[[0] [1] [2] [3] [4]]
>>> y_new = model.predict(x_new)
>>> print(y_new)
[5.63333333 6.17333333 6.71333333 7.25333333 7.79333333]
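The same intercept and slope can be found with plain linear algebra, without scikit-learn. Put a column of 1s next to x to form a matrix X, then find the b = [b0, b1] that minimizes ||Xb - y||². This is an ordinary least-squares sketch using `np.linalg.lstsq`. It also solves the equivalent normal equations (XᵀX)b = Xᵀy with `np.linalg.solve`:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

# Design matrix: a column of 1s (multiplies b0) next to the x values (multiply b1)
X = np.column_stack([np.ones(len(x)), x])

# Least squares: b = [b0, b1] minimizing ||X b - y||^2
b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = b
print(b0, b1)   # approximately 5.6333 and 0.54, like model.intercept_ and model.coef_

# Same result from the normal equations (X^T X) b = X^T y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(b, b_normal))  # True
```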
How good is the fit?
• R-squared (R2) is a statistical measure of how close the data are to the fitted regression line.

Many YouTube videos on calculating R2
Details omitted
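For readers who want to check the number themselves, here is a short sketch of the usual definition: R² = 1 - SS_res / SS_tot. SS_res is the sum of squared residuals, and SS_tot is the sum of squared deviations of y from its mean. It plugs in the intercept and slope printed for the example model:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])
b0, b1 = 5.633333333333329, 0.54   # intercept and slope printed for the example model
y_pred = b0 + b1 * x

ss_res = np.sum((y - y_pred) ** 2)     # variation the line fails to explain
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation of y around its mean
r_sq = 1 - ss_res / ss_tot
print(r_sq)   # about 0.7159, matching model.score(x, y)
```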
What is an acceptable R2 value?
• Different scholars have different opinions on what constitutes a good R-squared (R2) value
• Some have suggested that in scholarly research focusing on marketing issues, R2 values of 0.75, 0.50, or 0.25 for endogenous latent variables can, as a rough rule of thumb, be described as substantial, moderate, or weak, respectively.
• In other fields, the standards for a good R-squared reading can be much higher, such as 0.9 or above.
• In finance, an R-squared above 0.7 would generally be seen as showing a high level of correlation, whereas a measure below 0.4 would show a low correlation.
Examples of how math (e.g. linear algebra) is used in
data analytics (more in advanced DS courses)

Important message here: Math, very useful!
