Regression (OLS)

Consider the simple linear model

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n.$$

The ordinary least squares (OLS) estimates are

$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{(\alpha,\beta)} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2. \quad (3.1)$$

We remark that $\hat{\alpha}$ and $\hat{\beta}$ are actually estimates of an intercept and a slope, respectively, of a line fitted to the data $\{(x_i, y_i)\}_{i=1}^{n}$ by the least squares method.

Exercise 3.7 Find the values $\hat{\alpha}$ and $\hat{\beta}$ that minimize the sum of squares $\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$.
One has to understand that $\hat{\alpha}$ and $\hat{\beta}$ are random variables, since they can be expressed as functions of the random observations $x_i$ and $y_i$. The random variables $\hat{\alpha}$ and $\hat{\beta}$ are called estimators of the true unknown (fixed) parameters $\alpha$ and $\beta$.

The estimators can be obtained by differentiating the sum of squares (3.1) with respect to $\alpha$ and $\beta$ and by looking for a zero point of the derivative. We obtain
$$\frac{\partial \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0, \quad (3.2)$$

$$\alpha = n^{-1} \sum_{i=1}^{n} y_i - n^{-1} \beta \sum_{i=1}^{n} x_i, \quad (3.3)$$

and

$$\frac{\partial \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{\partial \beta} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) x_i = 0. \quad (3.4)$$
Substituting for $\alpha$ from (3.3) into (3.4) leads to

$$0 = \sum_{i=1}^{n} y_i x_i - n^{-1} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i + n^{-1} \beta \left( \sum_{i=1}^{n} x_i \right)^2 - \beta \sum_{i=1}^{n} x_i^2.$$

Solving the above equation in $\beta$ gives the following estimate:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i x_i - n^{-1} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - n^{-1} \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{\sum_{i=1}^{n} y_i x_i - n \bar{y} \bar{x}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} = \frac{s_{XY}}{s_{XX}}.$$
Hence, the sum of squares is minimized for $\alpha = \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}$ and $\beta = \hat{\beta} = s_{XY}/s_{XX}$.
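These closed-form estimates are straightforward to compute directly. Below is a minimal sketch in Python; the data vectors are made up for illustration, and `numpy.linalg.lstsq` serves only as a cross-check of the formulas above.

```python
import numpy as np

# Illustrative data; any paired observations (x_i, y_i) would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
s_xy = np.sum(x * y) - n * x.mean() * y.mean()  # sum of cross-products about the means
s_xx = np.sum(x ** 2) - n * x.mean() ** 2       # sum of squares of x about its mean

beta_hat = s_xy / s_xx                          # slope estimate: s_XY / s_XX
alpha_hat = y.mean() - beta_hat * x.mean()      # intercept estimate: ybar - beta_hat * xbar

# Cross-check against a generic least-squares solver.
A = np.column_stack([np.ones(n), x])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose([alpha_hat, beta_hat], coef)
```

For these particular numbers the estimates come out to approximately $\hat{\beta} = 1.99$ and $\hat{\alpha} = 0.05$.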
An unbiased estimator $\hat{\sigma}^2$ of $\sigma^2$ is given by $\mathrm{RSS}/(n-2)$. The following relation holds between (3.35) and (3.37):

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad (3.38)$$

total variation = explained variation + unexplained variation.
The coefficient of determination is $r^2$:

$$r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\text{explained variation}}{\text{total variation}}. \quad (3.39)$$

The coefficient of determination increases with the proportion of variation explained by the linear relation (3.27). In the extreme case where $r^2 = 1$, all of the variation is explained by the linear regression (3.27). The other extreme, $r^2 = 0$, is where the empirical covariance is $s_{XY} = 0$. The coefficient of determination can be rewritten as

$$r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

The residual sum of squares, the minimum in (3.28), is given by:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \quad (3.37)$$
[Figure: Sales (X1) versus Price (X2), illustrating the total, explained, and unexplained variation around the fitted regression line.]

[Figure: Regression of X5 (upper inner frame) on X4 (lower inner frame) for genuine Swiss bank notes.]
The simple linear model and the analysis of variance model can be viewed as a particular case of a more general linear model where the variations of one variable $y$ are explained by $p$ explanatory variables $x$. Let $y$ $(n \times 1)$ be the vector of observations on the response variable and $X$ $(n \times p)$ the data matrix on the $p$ explanatory variables (predictors). An important application of the developed theory is least squares fitting. The idea is to approximate $y$ by a linear combination $\hat{y}$ of columns of $X$, i.e., $\hat{y} \in C(X)$. The problem is to find $\hat{\beta} \in \mathbb{R}^p$ such that $\hat{y} = X\hat{\beta}$ is the best fit of $y$ in the least-squares sense. The linear model can be written as

$$y = X\beta + \varepsilon, \quad (3.50)$$

where $\varepsilon$ are the errors. To estimate the parameters by least squares we must solve the problem

$$\hat{\beta} = \arg\min_{\beta} (y - X\beta)^\top (y - X\beta) = \arg\min_{\beta} \varepsilon^\top \varepsilon. \quad (3.51)$$
Using the gradient vector definition from the introductory section on terminology, it follows immediately that

$$\frac{\partial a^\top x}{\partial x} = a,$$

since the $k$th element of the vector of partial derivatives is $\partial a^\top x / \partial x_k = a_k$. Similarly, differentiating

$$\frac{\partial x^\top A x}{\partial x} = \frac{\partial \left( \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_i x_j \right)}{\partial x}$$

with respect to $x_k$ gives

$$\frac{\partial(\cdot)}{\partial x_k} = \frac{\partial\, a_{kk} x_k^2}{\partial x_k} + \frac{\partial \sum_{i \neq k} a_{ik} x_i x_k}{\partial x_k} + \frac{\partial \sum_{j \neq k} a_{kj} x_k x_j}{\partial x_k} = 2 \sum_{j=1}^{p} a_{kj} x_j,$$

which is just the $k$th element of the vector $2Ax$, so that $\partial x^\top A x / \partial x = 2Ax$. Using the above two properties, we have the following for the last formula:

$$\frac{\partial^2 x^\top A x}{\partial x \partial x^\top} = \frac{\partial\, 2Ax}{\partial x^\top} = 2A.$$

Exercise 2.6 Show that a projection (idempotent) matrix has eigenvalues only in the set $\{0, 1\}$.

Proof: $A$ is a projection matrix if $A = A^2 = A^\top$. Let $\lambda_i$ be an eigenvalue of $A$ and $\varepsilon_i$ its corresponding eigenvector, so $A \varepsilon_i = \lambda_i \varepsilon_i$. Then

$$A^2 \varepsilon_i = \lambda_i A \varepsilon_i = \lambda_i^2 \varepsilon_i, \quad \text{and since } A^2 = A, \quad \lambda_i \varepsilon_i = \lambda_i^2 \varepsilon_i,$$

hence $\lambda_i = \lambda_i^2$. It is obvious that $\lambda_i = \lambda_i^2$ only if $\lambda_i$ is equal to 1 or 0.

We can express $\hat{\beta}$ as $\hat{\beta} = (X^\top X)^{-1} X^\top Y$. We define the function $f(\beta) = (Y - X\beta)^\top (Y - X\beta)$, i.e.,

$$f(\beta) = Y^\top Y - 2\beta^\top X^\top Y + \beta^\top X^\top X \beta.$$

The minimum of $f(\beta)$ can be found by searching for the zero of its derivative:

$$\frac{\partial f(\beta)}{\partial \beta} = \frac{\partial (Y^\top Y - 2\beta^\top X^\top Y + \beta^\top X^\top X \beta)}{\partial \beta} = -2X^\top Y + 2X^\top X \beta = 0,$$

from which we find that the solution, $\hat{\beta}$, has to satisfy $\hat{\beta} = (X^\top X)^{-1} X^\top Y$.

Let us now verify that we have found the minimum by calculating the second derivative of the function $f(\beta)$ in the point $\hat{\beta}$:

$$\frac{\partial^2 f(\beta)}{\partial \beta \partial \beta^\top} = \frac{\partial (-2X^\top Y + 2X^\top X \beta)}{\partial \beta} = 2X^\top X.$$

The matrix $X$ has full rank; therefore the matrix $X^\top X$ is positive definite and, hence, $\hat{\beta}$ is indeed the location of the minimum of the residual square function $f(\beta)$.
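Both the zero-gradient condition and the second-derivative check can be verified numerically. The sketch below uses simulated matrices (not data from the text's examples); a Gaussian design has full column rank with probability 1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))   # simulated full-column-rank design
Y = rng.normal(size=20)

# beta_hat = (X'X)^{-1} X'Y via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# First derivative of f vanishes at beta_hat: -2 X'Y + 2 X'X beta_hat = 0.
grad = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat
assert np.allclose(grad, 0)

# Second derivative 2 X'X is positive definite, so beta_hat is a minimum.
hessian = 2 * X.T @ X
assert np.all(np.linalg.eigvalsh(hessian) > 0)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse $(X^\top X)^{-1}$, which is the numerically preferable route.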
The fitted value $\hat{y} = X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Py$ is the projection of $y$ onto $C(X)$, as computed in (2.47):

$$\hat{\beta} = (X^\top X)^{-1} X^\top y. \quad (3.52)$$

The least squares residuals are

$$e = y - \hat{y} = y - X\hat{\beta} = Qy = (I_n - P)y.$$

The vector $e$ is the projection of $y$ onto the orthogonal complement of $C(X)$.
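The projection structure of $\hat{y}$ and $e$, together with the $\{0, 1\}$ eigenvalues of a projection matrix from Exercise 2.6, can be illustrated in a few lines (simulated data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
X = rng.normal(size=(n, 2))
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projector onto C(X)
Q = np.eye(n) - P                      # projector onto the orthogonal complement

assert np.allclose(P @ P, P)           # idempotent: P^2 = P
eig = np.linalg.eigvalsh(P)
assert np.all(np.isclose(eig, 0) | np.isclose(eig, 1))  # eigenvalues in {0, 1}

y_hat = P @ y                          # fitted values
e = Q @ y                              # residuals

assert np.allclose(y_hat + e, y)       # orthogonal decomposition of y
assert np.allclose(X.T @ e, 0)         # e is orthogonal to every column of X
```

The sum of the eigenvalues of $P$ equals its trace, which is $\mathrm{rank}(X) = 2$ here.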
Remark 3.5 A linear model with an intercept $\alpha$ can also be written in this framework. The approximating equation is:

$$y_i = \alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \ldots, n.$$

This can be written as:

$$y = X^* \beta^* + \varepsilon^*,$$

where $X^* = (1_n \; X)$ (we add a column of ones to the data). We have by (3.52):

$$\hat{\beta}^* = \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = (X^{*\top} X^*)^{-1} X^{*\top} y.$$

Example 3.15 Let us come back to the "classic blue" pullovers example.
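Remark 3.5 amounts to prepending a column of ones to the data matrix. A quick sketch with simulated data (not the pullover data; the true intercept and slope are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 + 0.1 * rng.normal(size=n)  # true intercept 2.0, slope 0.5

X_star = np.column_stack([np.ones(n), x1])     # X* = (1_n  X)
beta_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)

alpha_hat, beta_hat = beta_star                # first component is the intercept
assert abs(alpha_hat - 2.0) < 0.2              # close to the true values
assert abs(beta_hat - 0.5) < 0.2               # (noise is small here)
```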
In the absence of any advertising, sales will, on average, fall somewhere between 6,130 and 7,940 units. Furthermore, for each $1,000 increase in television advertising, there will be an average increase in sales of between 42 and 53 units.

Determining the significance of the variables. Standard errors can also be used to perform hypothesis tests on the coefficients. Suppose we want to know whether the variable $X$ influences $Y$; we use a hypothesis test. The most common hypothesis test involves testing the null hypothesis

$$H_0: \text{There is no relationship between } X \text{ and } Y \quad (3.12)$$

versus the alternative hypothesis

$$H_a: \text{There is some relationship between } X \text{ and } Y. \quad (3.13)$$

More operationally, this corresponds to testing

$$H_0: \beta_1 = 0 \quad \text{versus} \quad H_a: \beta_1 \neq 0,$$

since if $\beta_1 = 0$ then the model (3.5) reduces to $Y = \beta_0 + \epsilon$, and $X$ is not associated with $Y$. To test the null hypothesis, we need to determine whether $\hat{\beta}_1$, our estimate for $\beta_1$, is sufficiently far from zero that we can be confident that $\beta_1$ is non-zero. How far is far enough? This of course depends on the accuracy of $\hat{\beta}_1$, that is, it depends on $\mathrm{SE}(\hat{\beta}_1)$. If $\mathrm{SE}(\hat{\beta}_1)$ is small, then even relatively small values of $\hat{\beta}_1$ may provide strong evidence that $\beta_1 \neq 0$, and hence that there is a relationship between $X$ and $Y$. In contrast, if $\mathrm{SE}(\hat{\beta}_1)$ is large, then $\hat{\beta}_1$ must be large in absolute value for us to reject the null hypothesis. In practice, we compute the t-statistic

$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)}, \quad (3.14)$$

which measures the number of standard deviations that $\hat{\beta}_1$ is away from zero, and we use the t distribution with $n - 2$ degrees of freedom to determine the probability of observing a value of $|t|$ this large or larger.
It follows from $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$ that the errors satisfy $\mathrm{E}(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ (the same for all $\varepsilon_i$), and $\mathrm{Cov}(\varepsilon_i, \varepsilon_{i'}) = 0$ for all $i \neq i'$. The assumption of a normal error distribution is required to construct confidence intervals and tests of hypotheses; more details about these assumptions and the procedures to test their validity on the basis of a given sample of data are explained in Sect. 11.9.

The variance $\sigma^2$ has to be estimated by an estimator $\hat{\sigma}^2$ that will be given below. Under the normality assumption on the errors, the t-test for the hypothesis $\beta = 0$ works as follows. The standard error (SE) of the estimator $\hat{\beta}$ is the square root of (3.31):

$$\mathrm{SE}(\hat{\beta}) = \{\mathrm{Var}(\hat{\beta})\}^{1/2} = \hat{\sigma}\,(n\, s_{XX})^{-1/2}. \quad (3.32)$$

One computes the statistic

$$t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} \quad (3.33)$$

and rejects the hypothesis at a 5% significance level if $|t| \geq t_{0.975;\,n-2}$, where $t_{0.975;\,n-2}$, the 97.5% quantile of the Student's $t_{n-2}$ distribution, is clearly the 95% critical value for the two-sided test. For $n \geq 30$, this can be replaced by 1.96, the 97.5% quantile of the normal distribution. An unbiased estimator $\hat{\sigma}^2$ of $\sigma^2$ is given by $\mathrm{RSS}/(n-2)$.

Example 3.10 Let us apply the linear regression model (3.27) to the "classic blue" pullovers. The sales manager believes that there is a strong dependence of the number of sold pullovers on the price.

For the multiple linear model $y = X\beta + \varepsilon$ with $p$ explanatory variables, the least squares estimate is

$$\hat{\beta} = (X'X)^{-1} X'y.$$

It can be shown that $\hat{\beta}$ follows a normal distribution with mean $\mathrm{E}(\hat{\beta}) = \beta$ and covariance matrix $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$, i.e.,

$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1}); \quad (11.26)$$

in particular, $\hat{\beta}$ is unbiased (since $\mathrm{E}(\hat{\beta}) = \beta$). More details about (11.26) can be found in Appendix C.6. The errors are estimated from the data as $\hat{e} = y - X\hat{\beta}$ and are called residuals. An unbiased estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{\hat{e}'\hat{e}}{n - (p+1)} = \frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n - (p+1)} = \frac{\sum_{i=1}^{n} \hat{e}_i^2}{n - (p+1)}. \quad (11.27)$$

The standard error of each coefficient $\hat{\beta}_j$ is given by the square root of the corresponding diagonal element of $\mathrm{Var}(\hat{\beta})$. In testing $H_0: \beta_j = 0$, we reject the hypothesis at significance level $\alpha$ if $|t| \geq t_{1-\alpha/2;\,n-(p+1)}$. More general issues on testing linear hypotheses are addressed in Chap. 7.
dependence on the
at all
are βi ’s are
different from different
zero, and from we zero, andcan
therefore we conclude
thereforethat canthere
conclude
is an that there is an
etweenSometimes,
El any
test global
x
ssociation betweeni and one
(no
y. is interested
incluye
The null in
el intercepto)
hypothesis whether
is
any xi and y. The null hypothesis is a regression model is useful in the sen
that all βi ’s H0are: βdifferent
= β = from
· · · = zero,
β = and
0 we therefore can conclude that there is a
1 2H0 : β1 = pβ2 = · · · = β p = 0
association
sted bybe
between
overall
thetested F-test
any x i and y. The null hypothesis is
nd can by the overall F-test
: = "
′ H
(ŷ − ȳ) (ŷ − ȳ)/( p) ′ 0 n1 − p − β β 2 1 · · ·(=
= ŷi
β
− p =2 0 "
ȳ)
(ŷ −=ȳ)/(
ȳ) overall i − 2
= and can′ be tested(ŷby−the p)
F-test n
" − p − .1 (
i i ŷ ȳ)
(y − ŷ)F (y=− ŷ)/(n −′ p − 1) p = i ei 2
" " 2 .
(y − ŷ) (ŷ(y− −ȳ)ŷ)/(n
′ (ŷ −−ȳ)/(
p −p)1) n −p p−1 (i e
ŷ − ȳ)2
othesis is rejected if F > F1−α; p,n− p−1 . Note that the null hypothesisi in i i
he null F = is rejected
hypothesis if F > F = . Note that the " null .
hypothesis in
s only the equality of (yslope
− ŷ)parameters
′ (y − ŷ)/(n and does
−1−α; not
p −p,n− include the
1) p−1 i i p intercept e2
is case tests only the equality of slope parameters and does not include the intercept
The Lanull hypothesis
hipótesis is rejectedsiif F > F1−α; p,n− p−1 . Note that the null hypothesis
nula es rechazada
rm.
7.2this
In case tests only
this chapter, wethe equality
have alreadyofexplored
slope parameters and does
the associations not include the interce
between
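The overall F-statistic can be computed directly from the fitted values. A sketch with simulated data follows (`scipy.stats.f` supplies the critical value; none of this is the text's pizza data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.7, size=n)

X1 = np.column_stack([np.ones(n), X])          # design with intercept
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
y_hat = X1 @ beta_hat

explained = np.sum((y_hat - y.mean()) ** 2)    # (y_hat - ybar)'(y_hat - ybar)
rss = np.sum((y - y_hat) ** 2)                 # (y - y_hat)'(y - y_hat)

F = (explained / p) / (rss / (n - p - 1))      # overall F-statistic
F_crit = stats.f.ppf(0.95, p, n - p - 1)       # F_{1-alpha; p, n-p-1}, alpha = 0.05
reject = F > F_crit                            # strong simulated signal: H0 is rejected
```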
Example 11.7.2 In this chapter, we have already explored the associations between branch, operator, and bill with delivery time for the pizza data (Appendix A.4). If we fit a multiple linear model including all of the three variables, we obtain the following results:

summary(lm(time∼branch+bill+operator))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.19138    0.78752  33.258  < 2e-16 ***
branchEast  -3.03606    0.42330  -7.172 1.25e-12 ***
This probability is called the p-value.
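The whole t-test, from standard errors through the p-value, can be sketched in a few lines using `scipy.stats.t`. The data below are simulated (not the advertising or pizza examples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - 2)             # unbiased estimate RSS/(n - 2)
cov = sigma2_hat * np.linalg.inv(X.T @ X)        # estimated covariance of beta_hat
se = np.sqrt(np.diag(cov))                       # standard errors

t_stat = beta_hat[1] / se[1]                     # t-statistic for H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value
reject = abs(t_stat) >= stats.t.ppf(0.975, df=n - 2)  # decision at the 5% level
```

Since the simulated slope is far from zero relative to its standard error, the p-value here is tiny and the null hypothesis is rejected.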
[Figure 1: three panels (a)-(c) showing the variation shared by Y, X1, and X2: (a) no collinearity, (b) moderate collinearity, (c) high collinearity.]
Consequences of collinearity:

1. The estimators have large variances and covariances, despite being the best linear unbiased estimators.
2. Confidence intervals tend to be wider, which makes it easier to accept the null hypothesis that a parameter is zero.
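Point 1 is easy to demonstrate numerically: fixing $\sigma^2 = 1$ so that $\mathrm{Var}(\hat{\beta}) = (X'X)^{-1}$, making two predictors nearly collinear inflates the slope variances dramatically. A sketch with simulated predictors:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)             # case (a): no collinearity
x2_coll = x1 + 0.05 * rng.normal(size=n)  # case (c): high collinearity with x1

def slope_variances(x2):
    """Diagonal of (X'X)^{-1} for the two slope coefficients (sigma^2 = 1)."""
    X = np.column_stack([np.ones(n), x1, x2])
    return np.diag(np.linalg.inv(X.T @ X))[1:]

v_indep = slope_variances(x2_indep)
v_coll = slope_variances(x2_coll)

# Collinearity inflates the variance of both slope estimators by a large factor.
assert np.all(v_coll > 10 * v_indep)
```

The inflation factor here corresponds to the variance inflation factor $1/(1 - r^2)$ between the two predictors, which is on the order of several hundred for this simulated pair.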