4. Linear Regression
Thien Huynh-The
HCM City Univ. Technology and Education
Jan, 2023
Course Contents
1. Introduction (1 week)
2. Intelligent Agents (1 week)
3. Machine Learning: Fundamental Concepts (3 weeks)
1. Feature Engineering
2. Linear Regression
3. Overfitting
4. Machine Learning Algorithms (3 weeks)
1. K-nearest neighbors
2. K-means clustering
3. Naive Bayes classifier
5. Neural Networks (2 weeks)
6. Deep Learning (4 weeks)
1. Convolutional Neural Networks
2. Applications with codes
Note: These contents are tentative and may be changed depending on the teaching progress.
Note: There is no final exam for this course; therefore, one reserved week is kept for marking.
• Homework assignments:
• Send your code to the teacher by email before the deadline
• Late submissions receive zero (no negotiation)
• Comment your code in detail (used to assess how well students understand their own code)
• Run the code and record your laptop screen to demonstrate it (even if it raises errors)
• Final project:
• Send your code to the teacher by email before the deadline
• Late submissions receive zero (no negotiation)
• Comment your code in detail (used to assess how well students understand their own code)
• Run the code directly on your own laptop and show it to the teacher
• Answer the teacher's questions individually
• There are many names for a regression’s dependent variable. It may be called
an outcome variable, criterion variable, endogenous variable, or regressand.
• The independent variables can be called exogenous variables, predictor
variables, or regressors.
• In regression, a set of records is presented with X and Y values, and these values
are used to learn a function so that, given a new X, the learned function can be
used to predict the corresponding Y.
• In regression we have to find the value of Y; therefore, it is necessary to learn a
mapping function in this case.
• Easy implementation
• The linear regression model is computationally simple to implement, as it does not demand
much engineering overhead, either before the model launch or during its maintenance.
• Interpretability
• Unlike deep learning models such as neural networks, linear regression is relatively
straightforward. As a result, this algorithm stands ahead of black-box models that fall short
in justifying which input variable causes the output variable to change.
• Scalability
• Linear regression is not computationally heavy and, therefore, fits well in cases where
scaling is essential. For example, the model can scale well regarding increased data
volume (big data).
• Let's consider a dataset that covers RAM sizes and their corresponding costs:

RAM capacity (GB) | Cost
2                 | 12
4                 | 17
8                 | 31
16                | 68

• If we plot RAM on the X-axis and its cost on the Y-axis, a line from the lower-left
corner of the graph to the upper right represents the relationship between X and Y.
• Mathematically, these slant lines follow the equation

y = m*X + b

• where X: independent variable (input)
• Y: dependent variable (target)
• m: slope of the line; b: y-axis intercept
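As a sketch of the idea above, the slope m and intercept b for the RAM table can be estimated with NumPy's np.polyfit (a least-squares line fit); the variable names here are illustrative, not from the lecture:

```python
import numpy as np

# RAM capacity (GB) and cost, taken from the table above
ram = np.array([2, 4, 8, 16])
cost = np.array([12, 17, 31, 68])

# Fit a degree-1 polynomial: cost ~ m*ram + b (least squares)
m, b = np.polyfit(ram, cost, 1)
print(f"y = {m:.3f}*X + {b:.3f}")
```

The fitted line slopes upward, matching the plot described above: larger RAM sizes correspond to higher costs.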
• Variable x represents the input information provided to the model at any given
time; it is termed the independent variable in statistics.
• p0 = y-axis intercept (or the bias term).
• The cost function is the mean squared error (MSE):

MSE = (1/N) * Σ_i (y_i - (m*x_i + b))^2

where
• N: total number of observations (data points)
• y_i: actual value of observation i
• m*x_i + b: predicted value
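A minimal sketch of this cost computation in NumPy (the data values are illustrative):

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error between actual y and the line m*x + b."""
    pred = m * x + b              # predictions m*x_i + b
    return np.mean((y - pred) ** 2)

x = np.array([2.0, 4.0, 8.0, 16.0])
y = np.array([12.0, 17.0, 31.0, 68.0])
print(mse(x, y, 4.0, 1.5))        # error of one candidate line
```

A line that passes exactly through every point has an MSE of zero; any other candidate line scores higher.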
• Along with the cost function, a ‘Gradient
Descent’ algorithm is used to minimize MSE
and find the best-fit line for a given training
dataset in fewer iterations
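The gradient-descent idea can be sketched as follows; the learning rate and iteration count are illustrative choices, not values from the lecture:

```python
import numpy as np

def gradient_descent(x, y, lr=0.005, n_iters=5000):
    """Minimize MSE = (1/N) * sum((y - (m*x + b))**2) over m and b."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        pred = m * x + b
        # Partial derivatives of the MSE with respect to m and b
        dm = (-2.0 / n) * np.sum(x * (y - pred))
        db = (-2.0 / n) * np.sum(y - pred)
        # Step against the gradient
        m -= lr * dm
        b -= lr * db
    return m, b

x = np.array([2.0, 4.0, 8.0, 16.0])
y = np.array([12.0, 17.0, 31.0, 68.0])
m, b = gradient_descent(x, y)
print(f"y = {m:.3f}*X + {b:.3f}")
```

Each iteration nudges m and b in the direction that reduces the MSE, so the result approaches the same best-fit line that a direct least-squares solve would give.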
where w is the vector of weights (the model parameters) that we need to determine,
y is the ground truth, and y^ is the predicted output; b is the bias, which can be
absorbed into w by appending a column of ones to the data matrix X.
• With regression, we expect the error between y and y^ to be minimal, that means y ≈ y^ = Xw
• Loss function

L(w) = (1/2) * Σ_i (y_i - x_i*w)^2

• This function can be written shortly with matrix, vector, and norm notation as follows

L(w) = (1/2) * ||y - Xw||_2^2

where y = (y_1, ..., y_N)^T and X is the matrix whose rows are the x_i
• The loss function is differentiable
• Finding the optimal w can be achieved by solving the derivative equation

∇_w L(w) = X^T (Xw - y) = 0  ⇒  w = (X^T X)^(-1) X^T y = X^† y

• For every matrix X, there exists exactly one minimum-norm w minimizing
||y - Xw||_2^2, namely w = X^† y, where X^† is the pseudo-inverse of X
• The pseudo-inverse of a matrix always exists
• When the matrix X is square and invertible, the pseudo-inverse coincides with the inverse
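This closed-form solution can be sketched with NumPy's np.linalg.pinv; the toy data here are illustrative, chosen so the true line is known:

```python
import numpy as np

# Toy data: y is exactly 3*x + 2, so the fit should recover m = 3, b = 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 2.0

# Augment X with a column of ones so the bias is part of w
Xbar = np.column_stack([x, np.ones_like(x)])

# w = pinv(Xbar) @ y  (equivalent to (X^T X)^(-1) X^T y when X^T X is invertible)
w = np.linalg.pinv(Xbar) @ y
m, b = w
print(f"m = {m:.4f}, b = {b:.4f}")
```

Using the pseudo-inverse rather than an explicit matrix inverse also covers the case where X^T X is singular, in line with the note above that the pseudo-inverse always exists.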
import numpy as np
import matplotlib.pyplot as plt

# height (cm)
X = np.array([[147, 150, 153, 158, 163, 165, 168, 170, 173, 175, 178, 180, 183]]).T
# weight (kg)
y = np.array([[49, 50, 51, 54, 58, 59, 60, 62, 63, 64, 66, 67, 68]]).T
# Visualize data
plt.plot(X, y, 'ro')
plt.axis([140, 190, 45, 75])
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
Now use the model to predict the weights of two people with heights of 155 cm and 160 cm.
The obtained function is y = 0.5592*x - 33.7354.
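Continuing the height/weight example, the fit and the two predictions can be sketched with the pseudo-inverse formulation described earlier:

```python
import numpy as np

# Same height/weight data as the plotting example above
X = np.array([147, 150, 153, 158, 163, 165, 168, 170, 173, 175, 178, 180, 183], dtype=float)
y = np.array([49, 50, 51, 54, 58, 59, 60, 62, 63, 64, 66, 67, 68], dtype=float)

# Build Xbar = [x, 1] and solve w = pinv(Xbar) @ y
Xbar = np.column_stack([X, np.ones_like(X)])
m, b = np.linalg.pinv(Xbar) @ y
print(f"y = {m:.4f}*x + {b:.4f}")        # slope and intercept of the fitted line

# Predict weights for heights of 155 cm and 160 cm
for h in (155.0, 160.0):
    print(f"{h} cm -> {m*h + b:.2f} kg")
```

The printed coefficients match the function stated above, giving predictions of roughly 52.94 kg at 155 cm and 55.74 kg at 160 cm.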