
Linear Regression
We use ML for estimating relationships.

Problem statement:
Prepare a marketing plan for a car company.

The manufacturers of the car want a marketing plan. Recommend which info of the car to highlight to make the most of sales:
- model of the car
- engine size of the car
- horsepower of the car

Questions to ask:
- Is there a relationship between info of the car and sales? (linearity!)
- How strong is the relationship between info of the car and sales?
- Which info contributes to sales? (the data can be represented as a graph)
- How accurately can we estimate the effect of each info on sales?
- How accurately can we predict the sales?
- Is the relationship linear? (a clear graph will help here)
Geometrical understanding: building a regression model
We try to find a straight line which best fits the data, so that we can make predictions.
(Figure: scattered data points on x–y axes with a candidate straight line through them.)
Why find the best fit line?
So let's say a person has 4 YOE (years of experience); his predicted salary is 20K USD. Basically, we were given an input x = 4 and we found ŷ = 20: we are mapping x to y using this best fit line. Our data already tells us that a person with 3 YOE has a salary of around 10K USD, but the model says his salary would be around 15K USD. That's why we want our resulting straight line to be as close as possible to the data.
So how do we find this best fit line?
First solution: use your drawing skills to draw a best fit line. How to get this? There are major defects, and we can't really make predictions from it.
Second solution: how is this straight line formed mathematically?

y = mx + b is the mathematical equation of a straight line, where m is the slope of the line and b is the intercept. You have already gone through this in your earlier classes.

So, your straight line is dependent on m and b. If we're able to find the best slope and best intercept that fit the data, we've got our line. We try to find the best "m" and "b".

(Figure: several candidate lines, each with its own m and b, most of which do not fit the data perfectly.)

Dummy Example
Horse Power | Sales
140 | 16
225 | 39
210 | 8
150 | 20
200 | 18

f(HorsePower) = m·(HorsePower) + intercept, where m is the slope and the intercept is where the line intersects the y-axis. We also call these the weights or parameters.

We need to find the best fit line which fits the data; the resulting line is y = mx + b, so we have to find m and b. Let's assume m = 3 and intercept = 4 (dummy values), so we take our function as f(x) = 3(x) + 4. With f(x) = mx + b, we have built a function that maps "x" (horsepower) to "y" (sales): we simply put in x and we get our prediction, which is the sales.

Notice: here we are multiplying m with x, so we can say that there is some weight m by which we are multiplying x.
How do we find the best m & b?
- One possible way: hit and trial. Try different m and b values and see what works well. But it will be very hectic, isn't it? And even impossible to do by hand.
- We have an optimization algorithm which does something similar to this trial process, but automatically (we'll get to it).
- How to see what works well? We had a graphical view; just generalize the graphical view to a numerical one.

Numerical Representation
(Figure: scatter plot with the fitted line f(x) = mx + b; for each input value the line gives a model-predicted value. The residual is the error between the correct value & the approximated value.)

Take one point: the ground truth is y = 38, but the approximated value (model-predicted value) is ŷ = 47. So the error term, or error between the two, is (ŷ − y) = (47 − 38) = 9. So the error term is 9. The higher your error, the worse your approximation is!

Similarly, we can take out the error eᵢ for every data point in the data:
(y₂ − ŷ₂) = (44 − 45) = −1
(y₁ − ŷ₁) = (39 − 43) = −4
(y₃ − ŷ₃) = (43 − 43) = 0
:
In general, for all the data points we have n error terms: eᵢ = (yᵢ − ŷᵢ).

Improving the error measure
We have negative signs; we can get rid of them by:
- squaring the number: (yᵢ − ŷᵢ)²
- taking the absolute value: |yᵢ − ŷᵢ|
We have m error terms, so we can take the average of the m terms (we can also take the average in the absolute-value form):
(e₁ + e₂ + e₃ + ... + eₘ)/m = (1/m) Σᵢ eᵢ = (1/m) Σᵢ (yᵢ − ŷᵢ)²
squaring to remove the "−" sign and dividing by m for averaging.
Cost function (Mean Squared Error)
Now, our error formula becomes (by formula convention):
J(m, b) = (1/2m) Σᵢ (ŷᵢ − yᵢ)²
We add the 2 out of convention; we will take out the real reason for the '2' later on.
Why is it useful?
- It's a way to evaluate the best fit line.
- It's a way to measure performance: how well your model performs on the data.
Worked example of MSE
For a fitted line ŷ = 0.8x + 9.2: at x = 47 the prediction is 0.8(47) + 9.2 = 46.8, while the observed value is 44, so the error term is 44 − 46.8 = −2.8. Squaring all the errors like this, summing the terms, and averaging over the data points gives the MSE, which works out to 6.08 here.
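To sanity-check this computation in Python (a minimal sketch; the helper name `mse` and the `halved` flag are mine, not from the notes):

```python
import numpy as np

def mse(y_true, y_pred, halved=True):
    """Mean squared error; halved=True applies the 1/(2m) convention used above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    m = len(y_true)
    total = ((y_pred - y_true) ** 2).sum()
    return total / (2 * m) if halved else total / m

# the single point worked above: line ŷ = 0.8x + 9.2 at x = 47, observed y = 44
y_hat = 0.8 * 47 + 9.2        # 46.8
print(44 - y_hat)             # error term: -2.8
print(mse([44], [y_hat]))     # its contribution under the 1/(2m) convention
```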

Linear regression is a statistical method that allows us to summarize and study relationships between two quantitative (continuous) variables/features.

Explanatory / independent / predictor variable → response variable.
E.g., in the car data, the predictors (model of the car, engine size, horsepower) are used to predict the response (sales).

Types of relationships
- Deterministic relationship: F = (9/5)C + 32 is the relationship between degrees F and degrees C. If you know the value of C, you know the value of F exactly. Other examples: circumference = π × diameter. These are deterministic relationships; the equation exactly describes the relationship between the two variables.
(Figure: Fahrenheit against Celsius, a perfect straight line.)
- We are interested in statistical relationships, in which the relationship between the variables is not perfect.

Skin Cancer Mortality versus State Latitude
Find a relationship between skin cancer mortality & latitude. Fitted line: ŷ = 389.2 − 5.98x.
At the higher latitudes of the more northern U.S. states, you'd be less exposed to the harmful rays of the sun, and therefore at less risk.
- But the relationship isn't perfect: we don't get exact values.
- It shows some trend.
- It also shows scatter around that trend.
(Figure: mortality against latitude at the center of each state, with the fitted line through the scatter.)

Statistical relationships
We are interested in finding statistical relationships.

Estimating the coefficients
Given: (x₁, y₁), (x₂, y₂), (x₃, y₃), ..., (xₙ, yₙ), we try to find the line that best fits the given data.
Why is it the best fitting line? The line which best fits the data has errors that are as small as possible in some overall sense. We have a way to achieve this: the least squares criterion.

The form the best fit line will take: h(xᵢ) = mxᵢ + b.

So, how do we get a handle on the errors? We can evaluate our best fit line via the MSE, or least squares, criterion. It depends on a straight line.

Notational changes: h(x) = mx + b becomes h(x) = β₁x + β₀, where β₁ is the slope and β₀ is the intercept (or bias term); β₁ and β₀ are the parameters or feature weights.

Goal: the best fit line depends on the two values m & b (a slope & an intercept), so J(m, b) depends on these. To find a good slope & intercept, we need to find the m & b that minimize the cost function.
Given: ŷᵢ = β₁xᵢ + β₀ (for i = 1, ..., n), we want to find β₁ & β₀ so that the resulting line is as close as possible to the data points.

How do we take out β₀ & β₁?
Intercept: β₀ = ȳ − β₁x̄
Slope: β₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²   (WTH!!!)
where the xᵢ are the values of the variable in a sample and x̄ is the mean of those values (likewise ȳ for the yᵢ).
Given: (1,1), (2,3), (4,5), (5,7), so x̄ = 3 & ȳ = 4.

β₁ = [(1−3)(1−4) + (2−3)(3−4) + (4−3)(5−4) + (5−3)(7−4)] / [(1−3)² + (2−3)² + (4−3)² + (5−3)²]
   = [(−2)(−3) + (−1)(−1) + (1)(1) + (2)(3)] / [4 + 1 + 1 + 4]
   = (6 + 1 + 1 + 6) / 10
   = 14/10 = 1.4

β₀ = ȳ − β₁x̄ = 4 − 1.4 × 3 = 4 − 4.2 = −0.2

h(x) = β₁x + β₀ = 1.4x + (−0.2) = 1.4x − 0.2

But what do β₁ & β₀ tell us?
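A small sketch of these two formulas in Python, checked against the worked example above (nothing here beyond the formulas in the notes; the function name is mine):

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least squares fit: β1 = Σ(x-x̄)(y-ȳ)/Σ(x-x̄)², β0 = ȳ - β1·x̄."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b1, b0

print(fit_simple_ols([1, 2, 4, 5], [1, 3, 5, 7]))  # (1.4, -0.2)
```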

Interpreting β₁:
- If β₁ > 0, the independent variable & response variable have a '+' relationship: as x (the predictor) increases, y (the dependent variable) increases.
- If β₁ < 0, the independent variable & response variable have a '−' relationship.
E.g., house price prediction: f(size) → price; an increase in the size of the house will lead to an increase in the price.
Also note that with h(x) = β₁x + β₀, h(0) = β₁ × 0 + β₀ = β₀.

Interpreting β₀:
The average value of y when x is zero. (More on this in the assignment.)

Recap on functions (what we have):
- Hypothesis: h(x) = β₀ + β₁x
- MSE: J(β₀, β₁) = (1/2m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
- Goal: minimise the MSE in order to find β₀ & β₁.
We will see an approach that is more common for finding the best values of the parameters β₀ & β₁. So, a simple question: how to find β₀ & β₁?

Idea #1:
Try out different values of β₀ & β₁ and see whether, with those particular parameters, your cost function is decreasing or not.
- Initialize β₀ = 0 & β₁ = 0 and evaluate there; the value of the cost function will be super high, right?
- Now you try out other values of β₀ & β₁, like 1.4 & 2.4 respectively, & see if the cost function is decreasing.
- And keep on doing this until & unless we get a low error.
But this is a big ordeal. What to do? We can keep on trying, but we should know a strategy for how to keep on trying!! Right?

What do we want?
We want a strategic system which automatically tries out the values at every iteration, in such a way that the parameter's value at the nth iteration is better than at the iteration before.

What is the strategy?
What is the plot of x²? A parabola. And since J = (1/2m) Σ(...)² is a sum of squares, the plot of the cost function against β₁ will also look like a parabola.
(Figure: error J plotted against β₁, a U-shaped curve with one minimum.)

So, how is this possible? Using the concept of differentiation:
- How much does y change when x changes?
- How much does J(β) change when β changes?
This is what we want!! In a nutshell, we are looking for this strategy only!!

dJ(β₀)/dβ₀ tells us exactly that: how much J(β₀) changes when β₀ changes (and the same holds for β₁). So:

β₀ := β₀ − α · dJ(β₀)/dβ₀

Let's expand the equation: the derivative at the current point gives the exact direction; we subtract the value, scaled by a factor of α, because we want to minimise the function.

Gradient descent steps are:
- Initialize β₀ & β₁.
- Calculate the derivative (the gradient) dJ(βⱼ)/dβⱼ at this point (for j = 0, 1).
- Scale it by a factor of α and minimise by subtracting (for j = 0, 1). Both parameters get the same sort of update!
- Repeat steps 2 & 3.
For how long can we repeat? Until:
- the max. no. of iterations is reached, or
- the step size is smaller than the tolerance.

For a single training example:
dJ(β₀)/dβ₀ = (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)
dJ(β₁)/dβ₁ = (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
Doing the same for the other parameter gives the update rule:
βⱼ := βⱼ − α · dJ(βⱼ)/dβⱼ   (for j = 0, 1)
We can technically do this for any other parameter. Over the whole training set:
Repeat until some condition {
    βⱼ := βⱼ − α · (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾   (for every j)
}
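A compact sketch of this update loop in Python; the learning rate, iteration cap, and tolerance are illustrative choices, not values from the notes:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=10_000, tol=1e-9):
    """Repeats bj := bj - α·(1/m)·Σ(h(x)-y)·xj for simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = len(x)
    b0 = b1 = 0.0                              # step 1: initialize
    for _ in range(iters):
        err = (b0 + b1 * x) - y                # h(x) - y for every example
        step0 = alpha * err.sum() / m          # scaled gradient w.r.t. b0
        step1 = alpha * (err * x).sum() / m    # scaled gradient w.r.t. b1
        b0, b1 = b0 - step0, b1 - step1        # steps 2 & 3: subtract
        if max(abs(step0), abs(step1)) < tol:  # stop: step below tolerance
            break
    return b0, b1

print(gradient_descent([1, 2, 4, 5], [1, 3, 5, 7]))  # approaches (-0.2, 1.4)
```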
The story behind α: α is the learning rate.
- The derivative gives the direction to move in.
- α decides how big a step to take in that direction.
If the slope is large, we want to take a large step because we are far from the minimum. If the slope is small, we want to take a smaller step.

Hand-worked example
Look at the reference Google Sheet provided in the resources.

Types of Regression
Univariate linear regression: you have only one input feature variable, and based on that we map it to the target variable: f(x) → y (only one input feature).

Multivariate linear regression: say you want to build house price prediction:
- size of the house (X₁)
- no. of bedrooms (X₂)
- no. of fans (X₃)
Map the information (X₁, X₂, X₃) → price of the house!!

How can we extend it to multivariate linear regression?
If we have n features then the hypothesis becomes h(x₁, x₂, ..., xₙ):
h(X) = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ
Vectorization: with X = [X₁, X₂, ..., Xₙ], we can convert the written-out scalar sum β₀ + β₁X₁ + ... + βₙXₙ into a vector program: β₀ + β·X.
X,mn

risspl*( 2-:
-in,xn... imp
Y. Bu B,Xi, B2Xiz
=
+

=
+

X xxx,... X,p Bo
X2, Xx ....

Xii Xie.... Xip


X2p
...
+

BpXip
(X()
:
(x(m))
GIBlIm(XB
+

-
z

jT (XB-
T

Im
al
we take this out?

no of nows
IxP no of columns
Internet... generals.:::
* .....i =====
..
Matrix Distribution
P, P2. Bi.. Bp
-

E.g., cat recognition: features eyes (X₁), ears (X₂), mouth (X₃), with weights β₁, β₂, β₃ and X₀ = 1 for the intercept:
ŷ = h(x) = β₀X₀ + β₁X₁ + β₂X₂ + β₃X₃

In general, y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ... + βₚXₚ is the general regression equation, and writing it out for each observation:
ŷ₁ = β₀ + β₁X₁₁ + β₂X₁₂ + ... + βₚX₁ₚ
ŷ₂ = β₀ + β₁X₂₁ + β₂X₂₂ + ... + βₚX₂ₚ
...
ŷₙ = β₀ + β₁Xₙ₁ + β₂Xₙ₂ + ... + βₚXₙₚ

In matrix form this is ŷ = Xβ, where:
- ŷ = (ŷ₁, ..., ŷₙ)ᵀ is n×1,
- X is the n×(p+1) design matrix (the data matrix), whose i-th row is (1, Xᵢ₁, Xᵢ₂, ..., Xᵢₚ),
- β = (β₀, β₁, ..., βₚ)ᵀ is the (p+1)×1 vector of regression coefficients.
J(β) = (1/2m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)². Let's break it down (showing one feature for readability):

Xβ stacks all the predictions, so subtracting y gives the vector of errors:
Xβ − y = (β₀ + β₁X₁₁ − y₁, β₀ + β₁X₂₁ − y₂, β₀ + β₁X₃₁ − y₃, ...)ᵀ

The dot product of this vector with itself is exactly the sum of squared errors:
(Xβ − y)ᵀ(Xβ − y) = Σᵢ (β₀ + β₁Xᵢ₁ − yᵢ)²

so the cost can be written compactly as
J(β) = (1/2m) (Xβ − y)ᵀ(Xβ − y).

Multiple Linear Regression

Hypothesis: h(X) = Xβ, where X is the input matrix and β the parameter vector.

Optimization: J(β) = (1/2m) (Xβ − y)ᵀ(Xβ − y); the goal is to minimise the cost function: min over β of J(β).

Solve the optimization problem:
a. Initialize the parameters β to 0.
b. Repeat an algorithm to optimize the problem — vectorized gradient descent!!
   β_new := β_old − α · (1/m) Xᵀ(Xβ_old − y)
(Deriving this matrix-vector form of dJ/dβ is left as H.W.!)
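A vectorized sketch of that loop in NumPy (step count and learning rate are illustrative; the function name is mine):

```python
import numpy as np

def fit_gd(x, y, alpha=0.01, iters=5000):
    """Vectorized gradient descent: β := β - α·(1/m)·Xᵀ(Xβ - y)."""
    X = np.column_stack([np.ones(len(x)), x])  # prepend the column of 1s
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])                # a. initialize β to 0
    m = len(y)
    for _ in range(iters):                     # b. repeat the update
        beta -= alpha * X.T @ (X @ beta - y) / m
    return beta

print(fit_gd([1, 2, 4, 5], [1, 3, 5, 7]))      # approaches [-0.2, 1.4]
```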
Analytical solution of Linear Regression
What if we obtain β by simply plugging X and y into a formula?
β = (XᵀX)⁻¹Xᵀy

Gradient descent:
- Requires multiple iterations.
- Need to choose α.
- Works well even when n is large.
- Can support incremental learning.

Closed form solution:
- Non-iterative.
- No need to choose α.
- Slow if n is large: computing (XᵀX)⁻¹ is roughly O(n³).
- XᵀX may not be invertible; you may need to remove extra (redundant) features.
Problem: fit ŷ = β₀ + β₁X using the closed form solution on 5 observations:

X | y
10 | 15
20 | 17
30 | …
40 | …
50 | 20

With the design matrix X (a leading column of 1s plus the X column, so X is 5×2 and y is 5×1):
Step 1: compute XᵀX (a 2×2 matrix).
Step 2: invert it: (XᵀX)⁻¹ = adj(XᵀX) / det(XᵀX).
Step 3: compute Xᵀy (a 2×1 vector).
Step 4: β = (XᵀX)⁻¹Xᵀy, giving β₀ and β₁ and hence
ŷ = β₀ + β₁X₁, here ŷ = 2.70 + 1.77·X (the regression equation).
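The same four steps in NumPy (the y values are illustrative stand-ins; `np.linalg.solve` is used instead of an explicit inverse, which is the standard, numerically safer way to apply the normal equation):

```python
import numpy as np

x = np.array([10., 20., 30., 40., 50.])     # inputs from the table
y = np.array([15., 17., 18., 19., 20.])     # illustrative targets
X = np.column_stack([np.ones_like(x), x])   # 5x2 design matrix [1, x]

beta = np.linalg.solve(X.T @ X, X.T @ y)    # solves (XᵀX)β = Xᵀy
print(beta)                                  # [β0, β1]
```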

Evaluating the Model
- RSE (residual standard error): RSE = sqrt( (1/(n−2)) Σᵢ (yᵢ − ŷᵢ)² ). It measures the lack of fit of the model: the higher it is, the worse the model!!
- R² statistic:
  - Say you want to predict y without a corresponding x value. We can simply predict the average ȳ of all the y values. This flat line is not a best fit line; its squared prediction error here is 41.1879.
  - We can instead use linear regression to best fit the data; using L.R. we removed a considerable amount of the prediction error (down to 13.7627). But how much, exactly?
  - R² measures how much prediction error we eliminated:
    total reduction = 41.1879 − 13.7627 = 27.4252
    R² = 27.4252 / 41.1879 = 66.59%
  - It tells us what % of the prediction error in the y variable is eliminated when we use least squares regression on the x variable.
  - Khan Academy definition: R² = variance explained by the model / total variance.
  - 0% represents a model that doesn't explain any of the variation in the response variable around its mean; 100% represents a model that explains all of it.
Limitations of this method:
- It doesn't tell you whether the model you have chosen is good or bad.
- It won't tell you whether the data or predictions are biased.
- A high or low R² isn't automatically good or bad.
Multiple Linear Regression:
Q1. Is at least one of the predictors X₁, X₂, ..., Xₚ useful in predicting the response?
> Is there any relationship between response & predictor?
- Correlation is used to measure the strength of the relationship between 2 variables.
- The correlation coefficient r takes values between −1 & 1, inclusive:
  r = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / sqrt( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )
- The sign indicates the direction of the linear relationship: positively or negatively related.
- A correlation of 0 means that there's no linear relationship.
E.g., look at the worked equations in the lecture reading material.
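A sketch of r in Python, mirroring the formula above (the sample data is invented):

```python
import numpy as np

def pearson_r(x, y):
    """r = Σ(x-x̄)(y-ȳ) / sqrt(Σ(x-x̄)² · Σ(y-ȳ)²)"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(pearson_r([1, 2, 4, 5], [1, 3, 5, 7]))  # ≈ 0.99: strong positive relationship
```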
Returning back to our question: is at least one of the predictors X₁, X₂, ..., Xₚ useful in predicting the response?
- The null hypothesis states that the model with no independent variables fits the data.
- The alternative hypothesis states that the model fits the data better than the intercept-only model.

Steps of doing hypothesis tests with L.R. models:
1. Formulate a hypothesis about the relationship between the predictor and response variables; e.g., hypothesize that there's a positive relationship between the predictor & response variable.
2. Specify a null hypothesis, which is a statement about the value of a population parameter that is assumed to be true. E.g., the null hypothesis might be that the slope of the regression line is zero, which means that the predictor has no effect on the response.
Hypothesis formulation for the test
- In the case of L.R., the claim made is that there exists a relationship between the response and predictor variables. The claim is represented using non-zero coefficients for the predictor variables in the L.R. equation: this is the alternative hypothesis.
- Thus, the null hypothesis is set as: there is no relationship between the response & the predictor variables, hence the coefficient for each feature is zero.
If y = β₀ + β₁X₁ + β₂X₂, then the null hypothesis is y = β₀ + 0·X₁ + 0·X₂.
For all predictor variables, individual hypothesis testing is done to determine whether the relationship between the response & that particular predictor variable is significant.
Types of hypothesis tests which can be carried out in regression models ("tests for significance of regression"), e.g. for h(x) = β₀ + β₁X₁ + β₂X₂:
- t-test: "this test checks the significance of an individual regression coefficient."
- F-test: "this test can be used to simultaneously check the overall significance of a number of regression coefficients, i.e., of the regression equation as a whole. It can also be used to test individual coefficients."
Hypothesis formulation for the F-test
- The hypothesis test is done around the claim that there is a linear regression model representing the response variable & all the predictor variables.
- The null hypothesis is that the L.R. model does not exist, which means that the values of all the coefficients are equal to 0.

Determining the test statistic
- t-statistic for testing hypotheses related to individual coefficients: test for significance by performing a t-test for the regression slope, H₀: β₁ = 0.

P-value:
- A p-value is a statistical measurement used to validate a hypothesis against observed data.
- A p-value measures the probability of obtaining the observed results assuming the null hypothesis is true. (Research the conventional cutoff values by yourself!)
Hypothesis Testing Readings

There are several ways to test the significance of a regression model, including the t-test, the F-test, and the p-value.

The t-test is used to determine whether the regression coefficients for each predictor variable are significantly different from zero. A coefficient that is significantly different from zero indicates that there is a relationship between the predictor and response variables, and that the predictor can be used to predict the response. The t-test is calculated as t = bᵢ / SE(bᵢ), where bᵢ is the coefficient for the i-th predictor variable and SE(bᵢ) is the standard error of the coefficient.

The F-test is used to determine whether the overall regression model is significant compared to a model that does not use any predictor variables (i.e., one that uses the mean of the response values as the predicted value for all observations). The F-test is calculated as F = MSR / MSE, where MSR is the mean square of the regression, MSE is the mean square error, p is the number of predictor variables in the model, and n is the total number of observations.

The p-value is the probability that the observed relationship between the predictor and response variables could have occurred by chance, given that there is no actual relationship. A small p-value (typically less than 0.05) indicates that the observed relationship is statistically significant, and that the regression model can be used to make predictions about the response variable. The p-value is calculated using the test statistic and the degrees of freedom for the regression model.

In summary, the t-test, F-test and p-value are all used to evaluate the significance of a regression model, and to determine whether the predictor variables can be used to accurately predict the response variable.

Here is an example of calculating the sum of squares regression (SSR) in a linear regression model with two predictor variables (x₁ and x₂) and one response variable (y):

x₁ x₂ y
1  2  5
2  3  7
3  4  9
4  5  11

The regression equation for this model is ŷ = b₀ + b₁x₁ + b₂x₂, where b₀, b₁ and b₂ are the regression coefficients. Assume that the estimated values for the coefficients are b₀ = 1, b₁ = 2, b₂ = 3. The predicted response values for each observation can be calculated using the regression equation:
ŷ₁ = 1 + 2×1 + 3×2 = 9
ŷ₂ = 1 + 2×2 + 3×3 = 14
ŷ₃ = 1 + 2×3 + 3×4 = 19
ŷ₄ = 1 + 2×4 + 3×5 = 24

The mean of the observed response values is (5 + 7 + 9 + 11) / 4 = 8.

The SSR is calculated as the sum of the squared differences between the predicted response values and the mean of the observed response values:
SSR = (9 − 8)² + (14 − 8)² + (19 − 8)² + (24 − 8)² = 1 + 36 + 121 + 256 = 414

Therefore, the SSR for this regression model is 414. This indicates that the model explains a significant amount of the variation in the response variable compared to using the mean of the response values as the predicted value for all observations.
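The same SSR calculation as a short Python check (data and assumed coefficients are taken from the example above):

```python
import numpy as np

x1 = np.array([1, 2, 3, 4])
x2 = np.array([2, 3, 4, 5])
y  = np.array([5, 7, 9, 11])

b0, b1, b2 = 1, 2, 3                 # assumed coefficient estimates
y_hat = b0 + b1 * x1 + b2 * x2       # [9, 14, 19, 24]
ssr = ((y_hat - y.mean()) ** 2).sum()
print(y.mean(), ssr)                  # 8.0 414
```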

What is the sum of squares (regression)?
In regression analysis, the sum of squares regression (SSR) is a measure of the variation in the response variable that is explained by the regression model. It is calculated as the sum of the squared differences between the predicted response values from the regression model and the mean of the observed response values:
SSR = Σᵢ (ŷᵢ − ȳ)²
where ŷᵢ is the predicted response value for the i-th observation, and ȳ is the mean of the observed response values.

The SSR is used to evaluate the improvement in the model's fit compared to using the mean of the response values as the predicted value for all observations. It can be used in conjunction with the sum of squares total (SST) and the sum of squares error (SSE) to assess the model. The SST is a measure of the total variation in the response variable, and the SSE is a measure of the residual variation in the response variable that is not explained by the regression model. They are calculated as:
SST = Σᵢ (yᵢ − ȳ)², where yᵢ is the observed response value for the i-th observation,
SSE = Σᵢ (yᵢ − ŷᵢ)², where ŷᵢ is the predicted value from the regression model.

In summary, the SSR is a measure of the variation in the response variable that is explained by the regression model, calculated as the sum of the squared differences between the predicted and mean response values. It is used to evaluate the improvement in the model's fit compared to using the mean alone.

What are the degrees of freedom associated with the SSR?
In regression analysis, the degrees of freedom associated with the residual sum of squares are the number of observations in the data set minus the number of parameters in the regression model. For example, if a data set contains 10 observations and a regression model has two parameters (the intercept and the slope), then the residual degrees of freedom are 10 − 2 = 8. This means that the residual variation in the response variable is measured over the 8 remaining observations' worth of freedom, after the effect of the regression model has been removed.

These degrees of freedom are used in calculating the mean square error (MSE), which is the average of the squared differences between the observed and predicted values:
MSE = SSE / df, where df is the residual degrees of freedom.

In summary, the residual degrees of freedom are the number of observations in the data set minus the number of parameters in the regression model, and are used in calculating the MSE. More generally, degrees of freedom describe the number of independent ways in which a system can move or change.
What is the t-test for testing hypotheses in linear regression?
In regression analysis, the t-test is used to determine whether the regression coefficients for each predictor variable are significantly different from zero. A coefficient that is significantly different from zero indicates that there is a relationship between the predictor and response variables, and that the predictor can be used to predict the response.

The t-test is calculated as t = bᵢ / SE(bᵢ), where bᵢ is the coefficient for the i-th predictor variable and SE(bᵢ) is the standard error of the coefficient.

The t statistic is then compared to a critical value from the t-distribution, with a specified level of significance, to determine whether the coefficient is significantly different from zero. For example, if the level of significance is 0.05 and the t statistic for a predictor variable is greater than 2 (or less than −2), then the coefficient for that variable is considered to be significantly different from zero.

In summary, the t-test is used to test the hypothesis that the regression coefficient for each predictor variable is significantly different from zero, and to determine whether the predictor variables can be used to accurately predict the response variable.

What is a critical value?
In statistics, a critical value is the value of a test statistic that is used to determine the significance of the observed results. In a hypothesis test, the critical value is compared to the calculated value of the test statistic to determine whether the null hypothesis should be rejected in favour of the alternative hypothesis.

For example, in a t-test for testing the significance of regression coefficients in a linear regression model, the critical value is the value of the t statistic that is used to determine whether a coefficient is significantly different from zero. The critical value is determined by the level of significance chosen for the hypothesis test (e.g., 0.05), the degrees of freedom for the test, and the type of alternative hypothesis (one-sided or two-sided).

In summary, a critical value is the value of a test statistic that is used to determine the significance of the observed results in a hypothesis test. It is compared to the calculated value of the test statistic to determine whether the null hypothesis should be rejected in favor of the alternative hypothesis.

Here is an example of a t-test for testing the significance of regression coefficients in a linear regression model with two predictor variables (x₁ and x₂) and one response variable (y):

x₁ x₂ y
1  2  5
2  3  7
3  4  9
4  5  11

The regression equation for this model is ŷ = b₀ + b₁x₁ + b₂x₂, where b₀, b₁ and b₂ are the regression coefficients. Assume that the estimated values for the coefficients are b₀ = 1, b₁ = 2 and b₂ = 3.

The standard errors for the coefficients can be calculated using the following formulas:
SE(b₀) = sqrt( SSE / (n − p − 1) × (1/n + x̄²/SSxx) )
SE(b₁) = sqrt( SSE / (n − p − 1) × (1/SSxx) )
SE(b₂) = sqrt( SSE / (n − p − 1) × (1/SSxx) )
where SSE is the sum of squares error, n is the total number of observations, p is the number of predictor variables, x̄ is the mean of the predictor values, and SSxx is the sum of squares of the predictor variables.

For example, suppose the standard errors work out to SE(b₀) = 0.45, SE(b₁) = 0.15 and SE(b₂) = 0.15. The t-statistics for the coefficients can then be calculated as:
t(b₀) = 1/0.45 = 2.22
t(b₁) = 2/0.15 = 13.33
t(b₂) = 3/0.15 = 20.00

The critical value for the t statistic, with a level of significance of 0.05 and the appropriate degrees of freedom (df = n − p − 1), is 2.01 for a two-sided alternative hypothesis (and somewhat lower for a one-sided alternative hypothesis). Since the calculated t statistics for b₁ and b₂ are greater than the critical value, the coefficients for these predictor variables are considered to be significantly different from zero. This indicates that x₁ and x₂ can be used to predict the response variable.

In summary, the t-test is used to determine whether the regression coefficients for each predictor variable are significantly different from zero, and to determine whether the predictor variables can be used to accurately predict the response variable. The t statistics for the coefficients are calculated and compared to the critical value from the t-distribution to determine their significance.
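A minimal sketch of the comparison step in Python; the coefficients and standard errors are the illustrative values from the example above, and SciPy's `stats.t.ppf` supplies the two-sided critical value:

```python
from scipy import stats

coefs = {"b0": (1.0, 0.45), "b1": (2.0, 0.15), "b2": (3.0, 0.15)}
df = 4 - 2 - 1                        # n - p - 1 residual degrees of freedom
crit = stats.t.ppf(1 - 0.05 / 2, df)  # two-sided critical value at α = 0.05

for name, (b, se) in coefs.items():
    t = b / se                        # t = b / SE(b)
    verdict = "significant" if abs(t) > crit else "not significant"
    print(name, round(t, 2), verdict)
```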
What are one-sided and two-sided alternative hypotheses?
In a hypothesis test, the alternative hypothesis is a statement that is tested against the null hypothesis. The alternative hypothesis is usually the opposite of the null hypothesis, and is typically used to express the research hypothesis of interest.

The alternative hypothesis can be one-sided or two-sided. A one-sided alternative hypothesis is a statement that specifies the direction of the difference or relationship between the variables, while a two-sided alternative hypothesis is a statement that does not specify the direction of the difference or relationship.

For example, in a t-test for testing the significance of regression coefficients in a linear regression model, a one-sided alternative hypothesis might be "the coefficient for variable x₁ is significantly greater than zero," while a two-sided alternative hypothesis might be "the coefficient for variable x₁ is significantly different from zero."

In summary, the alternative hypothesis is a statement that is tested against the null hypothesis in a hypothesis test. It can be one-sided or two-sided, depending on whether it specifies the direction of the difference or relationship between the variables.

What are the null hypothesis and alternative hypothesis for the F statistic?
In regression analysis, the null hypothesis for the F statistic is that none of the predictor variables in the model are significantly related to the response variable. The alternative hypothesis is that at least one of the predictor variables is significantly related to the response variable.

The F statistic is calculated by dividing the mean square of the regression (MSR) by the mean square error (MSE). If the F statistic is large, it indicates that the model explains a significant amount of the variation in the response variable, and the null hypothesis can be rejected in favor of the alternative hypothesis. This means that at least one of the predictor variables is significantly related to the response variable.

To summarize, the null hypothesis for the F statistic in regression analysis is that the predictor variables are not significantly related to the response variable, while the alternative hypothesis is that at least one of the predictor variables is significantly related to the response variable. The F statistic evaluates the overall significance of the regression model by comparing the mean square of the regression to the mean square error.

What is the F statistic for estimating significance in regression?
In regression analysis, the F statistic can be used to evaluate the overall significance of the regression model. This is done by conducting an F-test, which is a hypothesis test that compares the mean square of the regression (MSR) to the mean square error (MSE).

The null hypothesis for the F-test is that none of the predictor variables in the model are significantly related to the response variable. If the p-value for the F statistic is less than a pre-determined significance level (e.g., 0.05), then the null hypothesis can be rejected and it can be concluded that at least one of the predictor variables is significantly related to the response variable.

In other words, the F statistic is used to determine whether the regression model as a whole is significantly related to the response variable. This is useful because it allows you to evaluate the overall significance of the model, rather than just the significance of individual predictor variables. By using the F statistic, you can determine whether the model is a good fit for the data and whether it can be used to make reliable predictions about the response variable.
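A small sketch of the F computation under assumed values (SSR, SSE, n and p below are invented to show the mechanics; SciPy's `stats.f.sf` gives the upper-tail p-value):

```python
from scipy import stats

n, p = 20, 2              # assumed observations and predictors
ssr, sse = 120.0, 45.0    # assumed sums of squares

msr = ssr / p             # mean square of the regression
mse = sse / (n - p - 1)   # mean square error
F = msr / mse
p_value = stats.f.sf(F, p, n - p - 1)  # P(F-dist > F)
print(F, p_value)         # a large F / tiny p rejects the null hypothesis
```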

What is the p-value?
The p-value is a probability measure that is used in statistical hypothesis testing to determine the significance of the observed results. It is the probability of obtaining a sample statistic that is at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

For example, suppose a researcher is conducting a study to examine the effectiveness of a new drug in treating a certain condition. The researcher collects data on the effectiveness of the drug in a sample of individuals and calculates a sample statistic (e.g., the mean improvement in symptoms). The researcher then compares the sample statistic to the hypothesized value under the null hypothesis (e.g., no improvement in symptoms) and calculates the p-value.

If the p-value is less than a pre-determined significance level (e.g., 0.05), then the null hypothesis can be rejected and it can be concluded that the observed results are statistically significant: the sample statistic is unlikely to have occurred by chance, and the drug is likely to be effective in treating the condition. On the other hand, if the p-value is greater than the significance level, then the null hypothesis cannot be rejected and it cannot be concluded that the observed results are statistically significant.

In summary, the p-value is a measure of the statistical significance of the observed results in a hypothesis test: the probability of obtaining a sample statistic at least as extreme as the one actually observed, assuming the null hypothesis is true.


Assumptions of linear regression

Linearity assumption
The linearity assumption in linear regression refers to the assumption that the relationship between the dependent variable and the independent variables is linear. This means that if you plot the data on a graph, the points should form a straight line. If the points do not form a straight line, then the relationship between the variables is non-linear, and a different type of statistical model might be more appropriate.

For example, suppose you want to model the relationship between a person's age (the independent variable) and their income (the dependent variable). If you plot the data on a graph and the points form a straight line, then you can use linear regression to model the relationship between age and income. However, if the points do not form a straight line, then the relationship between age and income is non-linear, and you might need to use a different type of statistical model.

In summary, the linearity assumption in linear regression refers to the assumption that the relationship between the dependent variable and the independent variables is linear. If this assumption is not met, then a different type of statistical model might be more appropriate.

(Table: a small set of sample (X, Y) pairs used to eyeball whether the relationship looks linear.)

Assumption tests
- Visual inspection: one approach is to create a scatterplot of the predictor variables and the response variable, and visually inspect the plots to see if there is a linear relationship. If the relationship appears to be non-linear, you can try transforming the predictor variables and re-plotting the scatterplots to see if the relationship becomes more linear.
- Correlation matrix: another approach is to use statistical tests to assess linearity. One common method is to use the Pearson correlation coefficient to measure the strength of the linear relationship between the predictor variables and the response variable. A correlation coefficient of 0 indicates no linear relationship, while a coefficient of 1 indicates a perfect linear relationship.
- Linearity test: you can also use a test, such as the Goldfeld–Quandt test, to formally test the linearity assumption.
Remedies
- Transforming the predictor variables: if the relationship between the predictor variables and the response variable appears to be non-linear, you can try transforming the predictor variables to see if it improves the linearity assumption. For example, you can try taking the log, square root, or square of the predictor variables.
- Adding higher-order terms or interaction terms: you can try adding higher-order terms or interaction terms to your model to see if it improves the linearity assumption. For example, if you have a predictor variable that appears to have a non-linear relationship with the response variable, you can try adding a quadratic term (x²) to your model to see if it improves the fit.
- Using a different model: if the linearity assumption cannot be satisfied using the above methods, you may need to consider using a different type of model, such as a nonlinear regression model or a generalized linear model.

It's important to note that it's not always necessary to satisfy the linearity assumption in order to build a good predictive model. However, if the linearity assumption is significantly violated, it can lead to biased and inaccurate estimates of the model parameters, so it's worth considering these remedies to see if they can improve the model fit. For example, if the linearity assumption is violated and the model includes a predictor variable that has a non-linear relationship with the response variable, the estimated coefficients for that predictor variable may not accurately reflect the true relationship between the variables. This can lead to incorrect predictions and a poorer fit of the model to the data.
Independence assumption
The independence assumption for linear regression refers to the assumption that the errors (the differences between the predicted values and the observed values) are independent of each other. In other words, it assumes that the errors are not related to one another and are not influenced by previous errors.

Violating the independence assumption can affect the validity of the model and the conclusions drawn from it. If the errors are not independent, it can result in biased estimates of the model coefficients and inflated standard errors, which can affect the precision and accuracy of the model.

One way to test for independence is to plot the residuals (the differences between the predicted values and the observed values) against the predicted values. If the residuals are randomly scattered around a horizontal line, it suggests that the errors are independent of the predicted values. This can provide some evidence that the independence assumption has not been violated.

Another way to test for independence is to use a statistical test, such as the Durbin–Watson test. This test compares the autocorrelation of the residuals to a known value. If the test statistic is close to the known value, it suggests that the errors are independent.

It's important to note that these tests are not foolproof, and it is still possible for the independence assumption to be violated even if the tests do not detect any issues. It is always a good idea to carefully examine the data and consider the context in which the data was collected to ensure that the independence assumption has not been violated.
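A sketch of that check in Python; the Durbin–Watson statistic is computed directly from its textbook definition on invented residuals (values near 2 suggest little autocorrelation):

```python
import numpy as np

def durbin_watson(residuals):
    """DW = Σ(e_t - e_{t-1})² / Σ e_t²; values near 2 suggest independent errors."""
    e = np.asarray(residuals, float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

resid = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2])  # illustrative residuals
print(durbin_watson(resid))
```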

An example of the independence assumption being violated in a linear regression model might be a study examining the relationship between the amount of time a student spends studying and their exam performance. If the researchers collected data on the amount of time each student spent studying and their corresponding exam score, but the students were allowed to collaborate and share notes, this could violate the independence assumption. In this case, the errors (differences between the predicted exam scores and the actual exam scores) would not be independent, because the students' exam scores would be influenced by each other's study habits and notes.

Another example might be a study examining the relationship between the price of a product and the number of units sold. If the data was collected over a period of time, and the price of the product was changed during that period, the errors (differences between the predicted number of units sold and the actual number of units sold) would not be independent. This is because the number of units sold would be influenced by the price of the product, which changed over time.

In both of these examples, violating the independence assumption could lead to incorrect conclusions about the relationship between the predictor and outcome variables, and could result in inaccurate predictions or estimates. It is important to carefully consider the context in which the data was collected and to check for violations of the independence assumption before drawing conclusions from the data.

Homoscedasticity
Homoscedasticity is a statistical assumption that states that the variance of the error term (also known as the residuals) in a regression model is constant across all values of the predictor variables. In other words, it assumes that the amount of error in the model is the same regardless of the value of the predictor variables. This is an important assumption because it allows us to use statistical techniques that rely on the assumption of constant variance, such as t-tests and F-tests, to test hypotheses about the relationships between the predictor and response variables in the model.

In layman's terms, homoscedasticity can be thought of as the assumption that the spread or dispersion of the residuals (errors) is constant across all values of the predictor variables. If the variance of the residuals is not constant, it is said to be heteroscedastic, which can lead to problems when using certain statistical techniques to analyze the model.

It's important to note that homoscedasticity is not always present in real-world data, and it is often necessary to check for and correct for heteroscedasticity in order to properly analyze a regression model. There are various techniques that can be used to check for and address heteroscedasticity, such as transforming the data or using weighted least squares regression.
Normality assumption
The normality assumption in linear regression refers to the assumption that the errors (the differences between the predicted values and the observed values) are normally distributed, with most errors close to the mean and fewer errors at the extreme ends of the distribution.

Violating the normality assumption can affect the validity of the model and the conclusions drawn from it. If the errors are not normally distributed, it can result in biased estimates of the model coefficients and inflated standard errors, which can affect the precision and accuracy of the model.

One way to test for normality is to plot the residuals (the differences between the predicted values and the observed values) and look for a bell-shaped curve. If the residuals are normally distributed, it suggests that the normality assumption has not been violated.

Another way to test for normality is to use a statistical test, such as the Anderson–Darling test or the Shapiro–Wilk test. These tests compare the distribution of the residuals to a normal distribution and provide a p-value, which can be used to determine whether the normality assumption has been violated.
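A quick sketch of such a check with SciPy's Shapiro–Wilk test on invented residuals (a small p-value would argue against normality):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0, 1, size=50)  # illustrative, normally distributed residuals

stat, p = stats.shapiro(residuals)     # Shapiro-Wilk normality test
print(stat, p)                         # a large p: no evidence against normality
```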

No multicollinearity assumption
In the context of multiple linear regression, the no-multicollinearity assumption refers to the assumption that there is no strong linear relationship between the predictor variables. In other words, it assumes that the predictor variables are not highly correlated with each other. This is an important assumption because it allows us to interpret the coefficients of the predictor variables in the regression model as measures of the unique effect of each predictor on the response variable, holding all other predictors constant.

In layman's terms, the no-multicollinearity assumption can be thought of as the assumption that the predictor variables are independent of each other, or that they do not have a strong linear relationship. If the predictor variables are highly correlated, it can be difficult to disentangle their individual effects on the response variable, which can make it difficult to interpret the results of the regression analysis.

It's important to note that multicollinearity is a common problem in regression analysis, and it is often necessary to check for and address multicollinearity in order to properly analyze a multiple linear regression model. There are various techniques that can be used to check for and address multicollinearity, such as examining the variance inflation factor (VIF) or using ridge regression or LASSO regression.

No multicollinearity assumption: example
Suppose you are a marketing research analyst and you want to understand the factors that influence sales of a particular product. You collect data on several potential predictor variables, including the price of the product, the advertising budget for the product, and the number of stores that carry the product. You then fit a multiple linear regression model to predict sales as a function of these predictor variables.

In this case, the no-multicollinearity assumption would be important because you want to be able to interpret the coefficients of the predictor variables as measures of their unique effect on sales, holding all other predictors constant. If there is a strong linear relationship between the predictor variables (e.g., if the advertising budget is highly correlated with the number of stores that carry the product), it can be difficult to disentangle their individual effects on sales, which can make it difficult to interpret the results of the regression analysis.

To check for multicollinearity in this case, you might examine the correlation matrix of the predictor variables and look for strong correlations between the variables. You could also compute the variance inflation factor (VIF) for each predictor variable, which measures the extent to which multicollinearity is present in the model. If the VIF values are all below a certain threshold (e.g., 5 or 10), it is likely that multicollinearity is not a significant problem in the model.
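A sketch of that VIF check using statsmodels (the column names and data are invented for illustration, with `adverts` deliberately correlated with `stores`):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
price   = rng.normal(10, 2, n)
stores  = rng.normal(50, 10, n)
adverts = 0.8 * stores + rng.normal(0, 3, n)   # correlated with stores on purpose

X = pd.DataFrame({"const": 1.0, "price": price, "adverts": adverts, "stores": stores})
for i in range(1, X.shape[1]):                 # skip the constant column
    print(X.columns[i], variance_inflation_factor(X.values, i))  # VIF > 5-10 flags trouble
```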
