
Regression (OLS)

Exercise 3.7  Find the values $\hat\alpha$ and $\hat\beta$ that minimize the sum of squares, i.e. determine the parameters that minimize the sum of squared deviations

$$(\hat\alpha, \hat\beta) = \arg\min_{(\alpha,\beta)} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2. \qquad (3.1)$$

We remark that this calculation holds for any finite value of $s_{XX}$. The quantities $\hat\alpha$ and $\hat\beta$ are actually estimates of an intercept and a slope, respectively, of the line fitted to the data $\{(x_i, y_i)\}_{i=1}^{n}$ by the least squares method. One has to understand that $\hat\alpha$ and $\hat\beta$ are random variables, since they can be expressed as functions of the random observations $x_i$ and $y_i$. The random variables $\hat\alpha$ and $\hat\beta$ are called estimators of the true unknown (fixed) parameters $\alpha$ and $\beta$.

The estimators can be obtained by differentiating the sum of squares (3.1) with respect to $\alpha$ and $\beta$ and by looking for a zero point of the derivative. We obtain
$$\frac{\partial \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0, \qquad (3.2)$$

$$\alpha = n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \beta \sum_{i=1}^{n} x_i, \qquad (3.3)$$

and

$$\frac{\partial \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{\partial \beta} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) x_i = 0. \qquad (3.4)$$

Substituting (3.3) for $\alpha$ into (3.4) leads to

$$0 = \sum_{i=1}^{n} y_i x_i - n^{-1} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i + n^{-1} \beta \Big( \sum_{i=1}^{n} x_i \Big)^2 - \beta \sum_{i=1}^{n} x_i^2.$$

Solving the above equation in $\beta$ gives the following estimate:

$$\hat\beta = \frac{\sum_{i=1}^{n} y_i x_i - n^{-1} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - n^{-1} \big( \sum_{i=1}^{n} x_i \big)^2} = \frac{\sum_{i=1}^{n} y_i x_i - n \bar y \bar x}{\sum_{i=1}^{n} x_i^2 - n \bar x^2} = \frac{s_{XY}}{s_{XX}}.$$

Hence, the sum of squares is minimized for

$$\alpha = \hat\alpha = \bar y - \hat\beta \bar x \qquad \text{and} \qquad \beta = \hat\beta = \frac{s_{XY}}{s_{XX}}.$$
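As a quick illustration of these closed-form estimators, the following is a minimal R sketch (the data vectors x and y are simulated placeholders, not the pullover data): it computes $\hat\beta = s_{XY}/s_{XX}$ and $\hat\alpha = \bar y - \hat\beta \bar x$ directly and compares them with the coefficients returned by lm().

  set.seed(123)
  n <- 50
  x <- rnorm(n, mean = 100, sd = 10)
  y <- 200 - 0.5 * x + rnorm(n, sd = 5)

  s_xy <- sum((x - mean(x)) * (y - mean(y)))   # proportional to s_XY (the 1/n factor cancels)
  s_xx <- sum((x - mean(x))^2)                 # proportional to s_XX

  beta_hat  <- s_xy / s_xx                     # beta_hat  = s_XY / s_XX
  alpha_hat <- mean(y) - beta_hat * mean(x)    # alpha_hat = ybar - beta_hat * xbar

  rbind(manual = c(alpha_hat, beta_hat), lm = coef(lm(y ~ x)))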
An unbiased estimator $\hat\sigma^2$ of $\sigma^2$ is given by $\mathrm{RSS}/(n-2)$. The following relation holds between (3.35) and (3.37):

$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2, \qquad (3.38)$$

total variation = explained variation + unexplained variation.
The coefficient of determination is $r^2$:

$$r^2 = \frac{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\text{explained variation}}{\text{total variation}}. \qquad (3.39)$$

The coefficient of determination increases with the proportion of explained variation by the linear relation (3.27). In the extreme case where $r^2 = 1$, all of the variation is explained by the linear regression (3.27). The other extreme, $r^2 = 0$, is where the empirical covariance is $s_{XY} = 0$. The coefficient of determination can be rewritten in terms of the residual sum of squares, the minimum in (3.28),

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat y_i)^2, \qquad (3.37)$$

as

$$r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}. \qquad (3.40)$$

From this it can be seen that, in the linear regression (3.27), $r^2 = r_{XY}^2$ is the square of the correlation between $X$ and $Y$.
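A small R check, in the same style of simulated data as above, that the two expressions (3.39) and (3.40) for $r^2$ agree and equal the squared correlation $r_{XY}^2$ as well as the R-squared reported by summary(lm()).

  set.seed(42)
  x <- runif(40, 80, 120)
  y <- 150 + 0.8 * x + rnorm(40, sd = 6)

  fit   <- lm(y ~ x)
  y_hat <- fitted(fit)

  r2_explained <- sum((y_hat - mean(y))^2) / sum((y - mean(y))^2)  # (3.39): explained / total
  r2_rss       <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)    # (3.40): 1 - RSS / total

  c(r2_explained, r2_rss, cor(x, y)^2, summary(fit)$r.squared)     # all four coincide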
Linear model for two variables. Example: sales of pullovers, with $y$ = sales ($X_1$), $x$ = price ($X_2$); data set: pullover.dat. For the above pullover example, we estimate the regression of sales ($X_1$) on price ($X_2$).

[Figure: regression of sales (X1) on price (X2) for the pullover data; the dashed line is the overall mean, and the total, explained and unexplained variation are indicated.]
[Figure: regression of X5 (upper inner frame) on X4 (lower inner frame) for the genuine Swiss bank notes.]
Multiple linear regression

The simple linear model and the analysis of variance model can be viewed as particular cases of a more general linear model where the variations of one variable $y$ are explained by $p$ explanatory variables $x$. Let $y$ ($n \times 1$) be the vector of observations on the response variable and $X$ ($n \times p$) the data matrix on the $p$ explanatory variables (predictors). An important application of the developed theory is least squares fitting. The idea is to approximate $y$ by a linear combination $\hat y$ of columns of $X$, i.e. $\hat y \in C(X)$. The problem is to find $\hat\beta \in \mathbb{R}^p$ such that $\hat y = X \hat\beta$ is the best fit of $y$ in the least-squares sense. The linear model can be written as

$$y = X \beta + \varepsilon, \qquad (3.50)$$

where $\varepsilon$ is the vector of errors. To estimate the parameters by least squares we must solve the problem

$$\hat\beta = \arg\min_{\beta} (y - X\beta)^\top (y - X\beta) = \arg\min_{\beta} \varepsilon^\top \varepsilon. \qquad (3.51)$$

To differentiate such expressions, recall the gradient vector definition from the introductory section on terminology. It follows immediately that

$$\frac{\partial a^\top x}{\partial x_k} = a_k,$$

which is just the $k$th element of the vector $a$, so that

$$\frac{\partial a^\top x}{\partial x} = a.$$

Similarly, differentiating

$$\frac{\partial x^\top A x}{\partial x_k} = \frac{\partial \big( \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_i x_j \big)}{\partial x_k} = \frac{\partial \, a_{kk} x_k^2}{\partial x_k} + \frac{\partial \sum_{i \neq k} a_{ik} x_i x_k}{\partial x_k} + \frac{\partial \sum_{j \neq k} a_{kj} x_k x_j}{\partial x_k} = 2 \sum_{j=1}^{p} a_{kj} x_j,$$

which is just the $k$th element of the vector $2Ax$, so that

$$\frac{\partial x^\top A x}{\partial x} = 2Ax.$$

Using the above two properties, we have the following for the last formula:

$$\frac{\partial^2 x^\top A x}{\partial x \, \partial x^\top} = \frac{\partial \, 2Ax}{\partial x^\top} = 2A.$$

Exercise 2.6  Show that a projection (idempotent) matrix has eigenvalues only in the set $\{0, 1\}$.

Proof: $A$ is a projection matrix if $A = A^2 = A^\top$. Let $\lambda_i$ be an eigenvalue of $A$ and $\varepsilon_i$ its corresponding eigenvector:

$$A \varepsilon_i = \lambda_i \varepsilon_i \;\Rightarrow\; A^2 \varepsilon_i = \lambda_i A \varepsilon_i \;\Rightarrow\; A \varepsilon_i = \lambda_i^2 \varepsilon_i \;\Rightarrow\; \lambda_i \varepsilon_i = \lambda_i^2 \varepsilon_i \;\Rightarrow\; \lambda_i = \lambda_i^2.$$

It is obvious that $\lambda_i = \lambda_i^2$ only if $\lambda_i$ is equal to 1 or 0.

We now solve (3.51). Consider the function $f(\beta) = (y - X\beta)^\top (y - X\beta)$, i.e.

$$f(\beta) = y^\top y - 2 \beta^\top X^\top y + \beta^\top X^\top X \beta.$$

The minimum of $f(\beta)$ can be found by searching for the zero of its derivative:

$$\frac{\partial f(\beta)}{\partial \beta} = \frac{\partial \, (y^\top y - 2 \beta^\top X^\top y + \beta^\top X^\top X \beta)}{\partial \beta} = -2 X^\top y + 2 X^\top X \beta = 0.$$

It follows that the solution $\hat\beta$ has to satisfy $\hat\beta = (X^\top X)^{-1} X^\top y$. Let us now verify that we have indeed found a minimum by calculating the second derivative of the function $f(\beta)$ at the point $\hat\beta$:

$$\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta^\top} = \frac{\partial \, (-2 X^\top y + 2 X^\top X \beta)}{\partial \beta} = 2 X^\top X.$$

The matrix $X$ has full rank, therefore the matrix $X^\top X$ is positive definite and, hence, $\hat\beta$ is indeed the location of the minimum of the residual square function $f(\beta)$. The least squares solution is thus

$$\hat\beta = (X^\top X)^{-1} X^\top y. \qquad (3.52)$$

The fitted (estimated) value of $y$ is $\hat y = X \hat\beta = X (X^\top X)^{-1} X^\top y = Py$, which is the projection of $y$ onto $C(X)$ as computed in (2.47). The least squares residuals are

$$e = y - \hat y = y - X \hat\beta = Qy = (I_n - P) y.$$

The vector $e$ is the projection of $y$ onto the orthogonal complement of $C(X)$.

Remark 3.5  A linear model with an intercept $\alpha$ can also be written in this framework. The approximating equation is

$$y_i = \alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n.$$

This can be written as

$$y = X_* \beta_* + \varepsilon,$$

where $X_* = (1_n \; X)$ (we add a column of ones to the data). We have by (3.52):

$$\hat\beta_* = (\hat\alpha, \hat\beta^\top)^\top = (X_*^\top X_*)^{-1} X_*^\top y.$$

Example 3.15  Let us come back to the "classic blue" pullovers example.
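The matrix formulas above translate directly into R. The sketch below uses simulated data (all names are placeholders): it builds $X_* = (1_n \; X)$, computes $\hat\beta_* = (X_*^\top X_*)^{-1} X_*^\top y$ and the projection matrix $P$, and also illustrates Exercise 2.6 by checking that the eigenvalues of $P$ are only 0 and 1.

  set.seed(7)
  n  <- 30
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 2 + 1.0 * x1 - 0.7 * x2 + rnorm(n)

  Xs <- cbind(1, x1, x2)                        # X* = (1_n  X): add a column of ones
  beta_hat <- solve(t(Xs) %*% Xs, t(Xs) %*% y)  # beta_hat = (X*'X*)^{-1} X*'y

  P     <- Xs %*% solve(t(Xs) %*% Xs) %*% t(Xs) # projection ("hat") matrix
  y_hat <- P %*% y                              # fitted values, y_hat = P y
  e     <- y - y_hat                            # residuals,     e = (I_n - P) y

  round(range(eigen(P)$values), 10)             # eigenvalues of a projection matrix: 0 and 1
  round(crossprod(Xs, e), 10)                   # e is orthogonal to C(X*): all (near) zero
  rbind(manual = as.numeric(beta_hat), lm = coef(lm(y ~ x1 + x2)))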
Determining the significance of the variables

(For the advertising data, for example, such interval estimates say that sales will, on average, fall somewhere between 6,130 and 7,940 units in the absence of advertising and that, for each $1,000 increase in television advertising, there will be an average increase in sales of between 42 and 53 units.)

Standard errors can also be used to perform hypothesis tests on the coefficients. Suppose we want to know whether the variable $X$ influences $Y$; for this we use a hypothesis test. The most common hypothesis test involves testing the null hypothesis

$$H_0: \text{there is no relationship between } X \text{ and } Y \qquad (3.12)$$

versus the alternative hypothesis

$$H_a: \text{there is some relationship between } X \text{ and } Y. \qquad (3.13)$$

In more operational terms, this corresponds to testing

$$H_0: \beta_1 = 0 \quad \text{versus} \quad H_a: \beta_1 \neq 0, \qquad (3.14)$$

since if $\beta_1 = 0$ then the model (3.5) reduces to $Y = \beta_0 + \epsilon$, and $X$ is not associated with $Y$. To test the null hypothesis, we need to determine whether $\hat\beta_1$, our estimate for $\beta_1$, is sufficiently far from zero that we can be confident that $\beta_1$ is non-zero. How far is far enough? This of course depends on the accuracy of $\hat\beta_1$, that is, it depends on $SE(\hat\beta_1)$. If $SE(\hat\beta_1)$ is small, then even relatively small values of $\hat\beta_1$ may provide strong evidence that $\beta_1 \neq 0$, and hence that there is a relationship between $X$ and $Y$. In contrast, if $SE(\hat\beta_1)$ is large, then $\hat\beta_1$ must be large in absolute value for us to reject the null hypothesis. In practice, we compute a t-statistic,

$$t = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)},$$

which measures the number of standard deviations that $\hat\beta_1$ is away from 0, and we use the $t$ distribution with $n-2$ degrees of freedom to determine the probability that $|t|$ takes such a value.

Recall the assumptions on the errors: the expectation is zero, $E(e_i) = 0$; the variance is $\mathrm{Var}(e_i) = \sigma^2$ (and therefore the same for all $e_i$); and $\mathrm{Cov}(e_i, e_{i'}) = 0$ for all $i \neq i'$. The assumption of a normal distribution, $e \sim N(0, \sigma^2 I)$, is required to construct confidence intervals and tests of hypotheses. More details about these assumptions and the procedures to test their validity on the basis of a given sample of data are explained in Sect. 11.9.

Under the normality assumption of the errors, the t-test for the hypothesis $\beta = 0$ works as follows. One computes the statistic

$$t = \frac{\hat\beta}{SE(\hat\beta)}, \qquad (3.33)$$

where the standard error (SE) of the estimator $\hat\beta$ is the square root of (3.31),

$$SE(\hat\beta) = \{\mathrm{Var}(\hat\beta)\}^{1/2} = \frac{\hat\sigma}{(n \, s_{XX})^{1/2}}, \qquad (3.32)$$

and one rejects the hypothesis at a 5 % significance level if $|t| \geq t_{0.975;\,n-2}$, where the 97.5 % quantile of the Student's $t_{n-2}$ distribution is clearly the 95 % critical value for the two-sided test. For $n \geq 30$, this quantile can be replaced by 1.96, the 97.5 % quantile of the normal distribution. An estimator $\hat\sigma^2$ of $\sigma^2$ will be given in the following.

Example 3.10  Let us apply the linear regression model (3.27) to the "classic blue" pullovers. The sales manager believes that there is a strong dependence of the sales on the price.

In the multiple linear model the least squares estimate of $\beta$ is obtained by

$$\hat\beta = (X^\top X)^{-1} X^\top y. \qquad (11.25)$$

It can be shown that $\hat\beta$ follows a normal distribution with mean $E(\hat\beta) = \beta$ and covariance matrix $\mathrm{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$, i.e.

$$\hat\beta \sim N\big(\beta, \sigma^2 (X^\top X)^{-1}\big). \qquad (11.26)$$

$\hat\beta$ is unbiased (since $E(\hat\beta) = \beta$); more details about (11.26) can be found in Appendix C.6. An unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = \frac{(y - X\hat\beta)^\top (y - X\hat\beta)}{n - (p+1)} = \frac{\hat e^\top \hat e}{n - (p+1)} = \frac{1}{n - (p+1)} \sum_{i=1}^{n} \hat e_i^2, \qquad (11.27)$$

where the errors are estimated from the data as $\hat e = y - X\hat\beta$ and are called residuals. The standard error of each coefficient is given by the square root of the corresponding diagonal element of the covariance matrix $\mathrm{Var}(\hat\beta)$. In testing $H_0: \beta_j = 0$ one computes the statistic

$$t = \frac{\hat\beta_j}{SE(\hat\beta_j)}$$

and rejects the hypothesis at the significance level $\alpha$ if $|t| \geq t_{1-\alpha/2;\,n-(p+1)}$. More general issues on testing linear models are addressed in Chap. 7.

Sometimes one is interested in whether the regression model is useful globally, i.e. in testing whether any of the $\beta_i$'s is different from zero (this global test does not include the intercept). The null hypothesis is

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$

and can be tested by the overall F-test

$$F = \frac{(\hat y - \bar y)^\top (\hat y - \bar y)/p}{(y - \hat y)^\top (y - \hat y)/(n - p - 1)} = \frac{\sum_i (\hat y_i - \bar y)^2 / p}{\sum_i \hat e_i^2 / (n - p - 1)}.$$

The null hypothesis is rejected if $F > F_{1-\alpha;\,p,\,n-p-1}$. If it is rejected, we conclude that not all $\beta_i$'s are zero and therefore that there is an association between $y$ and the $x_i$. Note that the null hypothesis in this case tests only the equality of the slope parameters and does not include the intercept term.

Example 11.7.2  In this chapter, we have already explored the associations between branch, operator, and bill with delivery time for the pizza data (Appendix A.4). If we fit a multiple linear model including all of the three variables, we obtain the following results:

summary(lm(time∼branch+bill+operator))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.19138    0.78752  33.258  < 2e-16 ***
branchEast  -3.03606    0.42330  -7.172 1.25e-12 ***
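To make the link between the Estimate, Std. Error, t value and Pr(>|t|) columns explicit, here is a minimal R sketch on simulated data (only an excerpt of the pizza output is reproduced above, so the numbers below are purely illustrative): it recomputes one coefficient's t-statistic and two-sided p-value by hand and compares them with summary(lm()).

  set.seed(1)
  n  <- 60
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 3 + 1.5 * x1 + rnorm(n)                    # x2 has no real effect here

  fit <- lm(y ~ x1 + x2)
  s   <- summary(fit)$coefficients

  t_manual <- (s["x1", "Estimate"] - 0) / s["x1", "Std. Error"]   # t = (beta_hat - 0) / SE
  p_manual <- 2 * pt(-abs(t_manual), df = fit$df.residual)        # two-sided, n - (p+1) df

  c(t_manual = t_manual, t_lm = s["x1", "t value"],
    p_manual = p_manual, p_lm = s["x1", "Pr(>|t|)"])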
This probability is called the p-value.

In general terms, we interpret the p-value as follows: a small p-value indicates that it is unlikely to observe such a substantial association between the predictor and the response variable by chance alone; in other words, such an association between the predictor and the response is real. If we see a small p-value, we can infer that there is an association between the predictor and the response. We reject the null hypothesis, that is, we declare that a relationship exists between X and Y, if the p-value is small enough. Typical p-value cutoffs for rejecting the null hypothesis are 5 % or 1 %.
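Returning to the overall F-test introduced above, the following R sketch (again on simulated data with illustrative names) computes the F statistic by hand from the explained and unexplained variation and compares it with the one reported by summary(lm()), together with the critical value $F_{1-\alpha;\,p,\,n-p-1}$.

  set.seed(2)
  n  <- 80
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 1 + 0.8 * x1 - 0.5 * x2 + rnorm(n)

  fit   <- lm(y ~ x1 + x2 + x3)
  p     <- 3                                        # number of slope parameters
  y_hat <- fitted(fit)

  F_manual <- (sum((y_hat - mean(y))^2) / p) /
              (sum((y - y_hat)^2) / (n - p - 1))    # explained/p over unexplained/(n-p-1)
  F_crit   <- qf(0.95, df1 = p, df2 = n - p - 1)    # reject H0: beta_1 = ... = beta_p = 0 if F > F_crit

  c(F_manual = F_manual, F_lm = unname(summary(fit)$fstatistic["value"]), F_crit = F_crit)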
Under the model assumptions on the errors (independence, zero mean and constant variance $\sigma^2$), inference can be conducted on $\beta$. Using the properties of Chap. 4, it is easy to prove:

$$E(\hat\beta) = \beta, \qquad \mathrm{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1}.$$

The last relation is written for the case in which the variables are centred, i.e. $\bar x = 0$.

Before giving a detailed data example, we would like to outline how to draw conclusions about the population of interest from the linear model. As we have seen, the $\beta$ are unknown in the model and are estimated from the data using (11.25). These are our point estimates. We note that if $\beta_j = 0$, then $\beta_j x_j = 0$, and the model will not contain the term $\beta_j x_j$. This means that the covariate $x_j$ does not contribute to explaining the variation in $y$; testing the hypothesis $\beta_j = 0$ therefore checks whether $x_j$ contributes to explaining the variation in $y$.

The computation of eigenvalues and eigenvectors is an important issue in the analysis of matrices. The spectral decomposition, or Jordan decomposition, links a matrix to its eigenvalues and eigenvectors.

Theorem 2.1 (Jordan Decomposition)  Each symmetric matrix $A$ ($p \times p$) can be written as

$$A = \Gamma \Lambda \Gamma^\top = \sum_{j=1}^{p} \lambda_j \gamma_j \gamma_j^\top, \qquad (2.18)$$

where

$$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$$

is the diagonal matrix of eigenvalues of $A$, and

$$\Gamma = (\gamma_1, \gamma_2, \ldots, \gamma_p)$$

is an orthogonal matrix consisting of the eigenvectors $\gamma_j$ of $A$.

Example 2.4  Suppose that $A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}$. The eigenvalues are found by solving $|A - \lambda I| = 0$. This is equivalent to

$$\begin{vmatrix} 1-\lambda & 2 \\ 2 & 3-\lambda \end{vmatrix} = (1-\lambda)(3-\lambda) - 4 = 0.$$

Variable selection: the Lasso in the orthonormal design case

The effect of keeping fewer ($m < p$) variables in the regression (the Lasso discards variables by setting coefficients that are close to 0 exactly to 0 for a given $\lambda$) is a decrease in the variance of the estimated coefficients, which makes the estimation more stable.

The Lasso solution (9.11) in the orthonormal design case can be calculated as a usual unconstrained minimisation problem. Note that in this case the least squares solution is given by

$$\hat\beta^0 = (X^\top X)^{-1} X^\top y = X^\top y.$$

Then the minimisation problem is written as

$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 = \arg\min_{\beta \in \mathbb{R}^p} (y - X\beta)^\top (y - X\beta) + \lambda \sum_{j=1}^{p} |\beta_j| = \arg\min_{\beta \in \mathbb{R}^p} \sum_{j=1}^{p} \big( -2 \beta_j \hat\beta_j^0 + \beta_j^2 + \lambda |\beta_j| \big),$$

where the constant term $y^\top y$ has been dropped. The objective function can now be minimised by separate minimisation of each of its $j$th elements. The penalty term is different from zero, which indicates that the resulting $\hat\beta$ is a biased estimator: in this way the variance is decreased, but the bias of the estimator is increased.
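A minimal sketch of the componentwise minimisation just described, assuming an orthonormal design ($X^\top X = I_p$) and the parameterisation $\|y - X\beta\|^2 + \lambda\|\beta\|_1$ used above, under which each component is obtained by soft-thresholding the least squares coefficient at $\lambda/2$; the data and the value of $\lambda$ are illustrative only.

  soft_threshold <- function(b0, lambda) {
    # componentwise Lasso solution in the orthonormal design case:
    # beta_j = sign(b0_j) * max(|b0_j| - lambda/2, 0)
    sign(b0) * pmax(abs(b0) - lambda / 2, 0)
  }

  set.seed(11)
  n <- 100; p <- 4
  X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))     # columns are orthonormal: t(X) %*% X = I_p
  beta <- c(2, -1, 0.2, 0)
  y <- X %*% beta + rnorm(n, sd = 0.3)

  b0 <- drop(t(X) %*% y)                        # least squares solution when X'X = I_p
  rbind(ols = b0, lasso = soft_threshold(b0, lambda = 0.8))  # small coefficients are set to 0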
EAE250A  Rodrigo Troncoso O.

An intuitive view of multicollinearity

[Figura 1: three panels relating Y to X1 and X2, showing (a) no collinearity, (b) moderate collinearity, (c) high collinearity.]
13.3. Consequences of Multicollinearity

1. The estimators have large variances and covariances, even though they remain the best linear unbiased estimators.

2. Confidence intervals tend to be wider, making it easier to accept the null hypothesis that a parameter is zero.

3. The t statistic of one or more coefficients tends to be statistically non-significant.


If a principal component (PC) has zero variance, this implies a constant linear relation among the variables. This in turn implies that one of the variables is redundant, since its values can be determined from the values of the other variables. We can then reduce the number of variables from p to p − q, where q is the number of PCs with zero variance.

In regression, if there are constant relations among the variables, the variances of the regression coefficients grow, making the estimation unstable. This phenomenon is called multicollinearity. One way to avoid it is to use regression on principal components: eliminating the principal components with the lowest variance removes the constant relations among the variables and, therefore, the multicollinearity. The problem is that it is sometimes difficult to give an interpretation to the principal components.

Multicollinearities are often, but not always, indicated by large correlations between subsets of the variables; if multicollinearities exist, the variances of some of the estimated regression coefficients can become very large, leading to unstable and potentially misleading estimates of the regression equation. To overcome this problem, several approaches have been proposed. One possibility is to use only a subset of the predictor variables, where the subset is chosen so that it does not contain multicollinearities.

Because of the lack of correlation, the contribution and the estimated coefficient of a PC are not affected by the other PCs that are also included in the regression, whereas for the original variables both the contributions and the coefficients can change drastically when another variable is added to, or removed from, the equation.
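A short R sketch of regression on principal components as just described (simulated collinear data; keeping the first two components is an illustrative choice, not a rule): the low-variance component carrying the near-constant relation is dropped and y is regressed on the retained scores.

  set.seed(5)
  n  <- 100
  x1 <- rnorm(n)
  x2 <- x1 + rnorm(n, sd = 0.05)      # nearly a constant linear relation with x1
  x3 <- rnorm(n)
  X  <- cbind(x1, x2, x3)
  y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)

  pc <- prcomp(X, center = TRUE, scale. = TRUE)
  round(pc$sdev^2, 4)                 # the last PC has near-zero variance (the collinearity)

  scores  <- pc$x[, 1:2]              # keep only the high-variance components
  pcr_fit <- lm(y ~ scores)           # regression on the retained principal components
  summary(pcr_fit)$coefficients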
Since $(X^\top X)$ is of full rank and thus invertible, minimising the expression (3.51) with respect to $\beta$ yields

$$\hat\beta = (X^\top X)^{-1} X^\top y \qquad (3.52)$$

for the model $y = X\beta + \varepsilon$ of (3.50). The fitted value $\hat y = X\hat\beta = X(X^\top X)^{-1} X^\top y = Py$ is the projection of $y$ onto $C(X)$, and the least squares residuals are $e = y - \hat y = y - X\hat\beta = Qy = (I_n - P)y$; as before, a model with an intercept is handled by adding a column of ones to the data, $X_* = (1_n \; X)$.

The diagonal elements of the projection matrix $P$ measure the leverage of the individual observations. Cook's distance compares the fit with the estimates of the dependent variable $\hat y_j$ obtained without considering observation $i$; it is the usual criterion to determine whether an observation is influential (an atypical data point that can affect the entire regression).
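Since leverage and Cook's distance are only named above, here is a minimal R sketch (simulated data; the 4/n cutoff is one common rule of thumb, stated here as an assumption) showing how both diagnostics are obtained from a fitted model.

  set.seed(9)
  n <- 40
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  x[n] <- 6; y[n] <- -10              # plant one atypical, influential observation

  fit <- lm(y ~ x)
  h   <- hatvalues(fit)               # leverage: diagonal elements of the projection matrix P
  d   <- cooks.distance(fit)          # Cook's distance: effect of deleting observation i on the fit

  which(d > 4 / n)                    # flag potentially influential observations (rule of thumb)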