
INSTITUTE OF SOCIAL AND MEDICAL STUDIES (ISMS)

Applied Regression Analysis


Nguyễn Trương Nam

Copyright: this material belongs to the author and thongke.info. When using
part or all of this lecture, please cite the author's name and
thongke.info. Example: Nguyễn Thị Linh, Thongke.info.

Contents

- Why is regression analysis needed?
- Steps in building a regression model
- Multiple linear regression
- Logistic regression





This is an example of multiple regression: we estimate
Y = first-year university grade average from X1 = high-school
rank, X2 = university entrance exam score, and X3 = sex.
Why is regression analysis needed?
In quasi-experimental research, the researcher cannot manipulate the
independent variables, so confounding variables are usually present.
We try to address this statistically, specifically by using multiple
regression.
In multiple regression, the relationship between the dependent variable
(outcome) and an independent variable (exposure) is assessed while
controlling for the other confounding variables.
Purposes of multiple regression: 1) prediction: identify the factors
that can predict a phenomenon (the outcome variable); 2) explanation:
identify the systems/processes or causes that lead to a phenomenon.
James Cotter (2001) HUMD5122 - Applied Regression Analysis
Why multivariate analysis is needed: an example
ISMS study (2010) evaluating the impact of an intervention program
(lasting 2 years) on the knowledge and behavior of street children in
Hai Phong (HP) and Ho Chi Minh City (HCMC).
Two groups: those who took part in the project (intervention group) and
those who did not (control group).
Changes in the knowledge, attitudes and behavior of the intervention
group are affected not only by the project itself but also by other
factors, for example other HIV prevention programs in the area, living
environment, age, sex, drug use, selling sex, and livelihood.
The association between project participation and behavior change must
therefore be analyzed while controlling for the effects of these
confounding factors. This is the principle of multivariate analysis.
Indicator                               Male (N=403)                    Female (N=179)                  Total (N=582)
                                        Reached         Not reached     Reached         Not reached     Reached         Not reached     All
Correct knowledge of HIV/AIDS (%)       57.6***         29.8            54.9**          28.9            56.9***         29.5            39.7
HIV knowledge score, mean (SD)          7.49 (1.25)***  5.77 (2.32)     7.56 (1.18)***  6.33 (1.85)     7.50 (1.23)***  5.97 (1.18)     6.55 (2.02)

Reached / Not reached = reached / not reached by the project.
* P < 0.05, ** P < 0.01, *** P < 0.001
Outcome: correct knowledge of HIV vs. no correct knowledge (N=582)

Independent variable                         Odds ratio (OR)   95% CI
Access to the NAM project
  Not reached (reference group, ref)         -                 -
  Reached                                    3.13***           (2.15, 4.54)
Education
  < Primary school (ref)                     -                 -
  Primary to lower secondary school          1.54              (0.85, 2.79)
  > Lower secondary school                   2.16*             (1.01, 4.65)
Drug use
  No use (ref)                               -                 -
  Uses drugs, does not inject                1.61*             (1.02, 2.53)
  Injects drugs                              2.02**            (1.19, 3.45)
Livelihood
  Street vending or shoe shining (ref)       -                 -
  Illegal work                               0.80              (0.46, 1.40)
  Sex work                                   1.04              (0.57, 1.90)
  Manual labor                               0.78              (0.48, 1.26)
  Unemployed/no income                       0.42*             (0.19, 0.92)

Control variables: sex, age, MSM and livelihood.

* P < 0.05, ** P < 0.01, *** P < 0.001

Table: Frequency of condom use with female sex workers in the past 6 months,
comparing the groups reached and not reached by the project

Frequency of condom use          Reached by project   Not reached by project   Total
with female sex workers          (N=29) %             (N=48) %                 (N=77) %
Always                           58.6*                35.4                     38.2
Often                            10.3                 14.6                     13.0
Sometimes                        31.0                 22.9                     26.0
Rarely                           0                    27.9                     16.0

Some evidence suggests that the NAM project had a positive effect on condom
use with female sex workers among male street youth. The proportion who
always used a condom in the past 6 months was 59% in the group reached by the
NAM project, compared with only 35% in the group not reached.
* P < 0.05, ** P < 0.01, *** P < 0.001
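
The unadjusted comparison in this table can be summarized as a crude odds ratio. The sketch below is only an illustration: the cell counts are not reported in the slides and are reconstructed here from the percentages and group sizes (an assumption), and the Woolf log-based interval is one common choice, not necessarily the method used in the study.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf (log-based) 95% CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Assumption: counts rebuilt from the reported percentages and group sizes.
reached_always = round(0.586 * 29)       # "always" among those reached
not_reached_always = round(0.354 * 48)   # "always" among those not reached

crude = odds_ratio_with_ci(
    reached_always, 29 - reached_always,
    not_reached_always, 48 - not_reached_always,
)
print("crude OR = %.2f, 95%% CI (%.2f, %.2f)" % crude)
```

This crude estimate ignores every other factor; the multivariate model that follows is what adjusts for them.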


Table: Logistic regression of condom use with female sex workers on access to the NAM project (N=82)
Outcome: used a condom with female sex workers every time (vs. not every time)

Independent variable                     OR (odds ratio)   95% CI
Access to the NAM project
  Not reached (ref)                      -                 -
  Reached                                3.55              (0.89, 14.11)
Province/city
  Hai Phong (ref)                        -                 -
  Ho Chi Minh                            0.06***           (0.01, 0.28)
Street youth category
  Other groups (ref)                     -                 -
  Group A                                0.19*             (0.04, 0.94)
Drug use
  No use (ref)                           -                 -
  Uses drugs but does not inject         0.74              (0.16, 3.41)
  Injects drugs                          0.13*             (0.02, 0.91)

In the multivariate analysis, the association between access to the project
and condom use is no longer statistically significant (P > 0.05; the 95% CI
includes 1). This differs from the bivariate analysis because the
multivariate model controls for potential confounders.
* P < 0.05, ** P < 0.01, *** P < 0.001
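
To see how an odds ratio and its confidence interval relate to the significance stars, the following sketch recovers an approximate Wald p-value from a reported OR and 95% CI. It assumes the interval was computed as exp(b ± 1.96·SE) on the log-odds scale, which is the usual convention but is not stated in the slides; the function name is illustrative.

```python
import math

def p_value_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided Wald p-value from an OR and its 95% CI.

    Assumes the CI is exp(b - z_crit*SE) to exp(b + z_crit*SE).
    Returns (b, se, z, p) where b is the logistic coefficient ln(OR).
    """
    b = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return b, se, z, p

# Access to the project in the condom-use model: OR 3.55 (0.89, 14.11)
b_access, se_access, z_access, p_access = p_value_from_or_ci(3.55, 0.89, 14.11)
print("access: z = %.2f, p = %.3f" % (z_access, p_access))

# Access to the project in the knowledge model: OR 3.13 (2.15, 4.54)
_, _, z_know, p_know = p_value_from_or_ci(3.13, 2.15, 4.54)
print("knowledge: z = %.2f, p = %.2g" % (z_know, p_know))
```

Because the published values are rounded, the recovered p-values are approximate, but they are enough to check which side of 0.05 a result falls on.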


Building regression models: multiple and logistic
Multiple regression
Y = dependent variable, continuous.
X1, X2, ...: the independent variables may be
- all continuous, or
- a mix of continuous and binary (dummy) variables, or
- all binary (dummy) variables.
Logistic regression
Y: the dependent variable is binary (the outcome variable).
X1, X2, ...: the independent variables may be
- all continuous, or
- a mix of continuous and binary (dummy) variables, or
- all binary (dummy) variables.
Example
Multiple regression
Number of antenatal visits = b0 + b1(age) + b2(ethnicity) +
b3(education) + b4(marital status).
Logistic regression
Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) +
b4(marital status) + b5(occupation) + b6(age at first sexual
intercourse).
(In logistic regression the right-hand side models the log-odds of the
outcome.)
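
The two model forms differ only in how the linear combination b0 + b1·X1 + ... is turned into a prediction. A minimal sketch follows; the coefficient values are made up for illustration and are not estimates from the study.

```python
import math

def linear_prediction(intercept, coefs, x):
    """Multiple regression: predicted value of a continuous outcome."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def logistic_probability(intercept, coefs, x):
    """Logistic regression: the same linear combination is the log-odds,
    which is converted to a probability between 0 and 1."""
    log_odds = linear_prediction(intercept, coefs, x)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for (age, ethnicity dummy); illustration only.
hypothetical_coefs = [0.05, 0.40]
respondent = [30, 1]  # age 30, ethnicity dummy = 1

visits = linear_prediction(1.0, hypothetical_coefs, respondent)
probability = logistic_probability(-3.0, hypothetical_coefs, respondent)
print(visits, probability)
```

In the logistic model, exp(b) for a coefficient is the odds ratio associated with a one-unit increase in that variable.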

Steps in building a regression model
1) Specify the model
- Translate the research question into a regression equation.
- Identify the independent (predictor) variables.
2) Assess the risks to valid statistical inference
- Sample size: at least 20 observations per independent variable.
- Normal distribution of the variables.
- Make sure the independent variables are not correlated with one another.
- Check for and remove outliers.
3) Fit the model (fitting models)
- Fit the full model (theory-driven).
- Fit models step by step by adding variables, assess model fit using
statistical criteria, and choose the best-fitting model (exploratory).
- Automatic selection, for example stepwise regression.
4) Run the regression model and present the results
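
The sample-size rule in step 2 is easy to check before fitting. The sketch below counts each dummy variable as one independent variable, which is an assumption on my part; the slides do not say how dummies should be counted.

```python
def max_independent_variables(n_observations, per_variable=20):
    """Largest number of independent variables allowed by the rule of thumb."""
    return n_observations // per_variable

def sample_size_ok(n_observations, n_variables, per_variable=20):
    """True if there are at least `per_variable` observations per variable."""
    return n_observations >= n_variables * per_variable

# N=582 with 8 terms, and N=82 with 5 terms (sample sizes from the tables above).
print(max_independent_variables(582))   # 29
print(sample_size_ok(582, 8))           # True
print(sample_size_ok(82, 5))            # False: 82 < 5 * 20
```

A model that fails this check is not necessarily wrong, but its estimates and confidence intervals deserve extra caution.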
Building models: how to choose the variables?


Y: dependent variable (outcome).
X1, X2, X3: independent variables (predictors).
The ideal multiple regression model is one in which the independent
variables are strongly (linearly) related to Y (the dependent variable)
and relatively independent of one another.
This raises the general question of how to design a good multiple
regression model. When multiple regression is used to test a hypothesis,
it is best to let the hypothesis itself decide which independent
variables go into the model. Even so, in designing a good model to test a
hypothesis, we also need to use some of the statistical criteria already
mentioned when deciding how to build the model.
James Cotter (2001) HUMD5122 - Applied Regression Analysis
Principles for designing a good multiple regression model

- Try to include all important, relevant variables in the regression
model (otherwise the estimated parameters may be biased). In
quasi-experimental research, try to include all important uncontrolled
confounders in the model.
- Balance parsimony (a model with few parameters) against good fit
(which can be increased by adding parameters).
- Do not use too many independent variables for a limited number of
study subjects. A rule of thumb is at least 20 observations for each
independent variable in the model (preferably 40-50 observations per
independent variable, especially when building theory).
- Use independent variables that are not correlated with one another
(avoid multicollinearity). The independent variables must be relatively
independent of each other.
- Do not put redundant independent variables in the same model. For
example, do not use both (X1) grade in the final exam and (X2) rank in
the final exam in the same model predicting some change in student
outcomes, because both variables rest on nearly the same concept and
are statistically redundant.
James Cotter (2001) HUMD5122 - Applied Regression Analysis
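
One common way to quantify the trade-off between parsimony and fit is the adjusted R², which penalizes R² for the number of independent variables. The slides do not name a specific criterion, so this is offered only as an illustration, with made-up R² values.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p independent variables
    (intercept not counted) fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: five extra variables raise R2 only from 0.200 to 0.205.
small_model = adjusted_r2(0.200, n=200, p=3)
large_model = adjusted_r2(0.205, n=200, p=8)
print(small_model, large_model)
```

Here the larger model has the higher R² but the lower adjusted R², so the added parameters do not pay for themselves.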
Example
Multiple regression
Number of antenatal visits = b0 + b1(age) + b2(ethnicity) +
b3(education) + b4(marital status) + b5(occupation).
Logistic regression
Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) +
b4(marital status) + b5(occupation) + b6(age at first sexual
intercourse).

Variables used in the models
1/ Multiple regression model
Dependent variable: number of antenatal visits (Q83)
Independent variables:
1. Age (Q2)
2. Ethnicity (q3)
3. Education (q10)
4. Marital status (q5)
5. Occupation (Q8)

2/ Logistic regression model
Dependent variable: ever had an abortion? (Q40_recode)
Independent variables:
1. Age (Q2)
2. Ethnicity (q3)
3. Education (q10)
4. Marital status (q5)
5. Occupation (Q8)
6. Age at first sexual intercourse (Q27)
Preparing the variables for the regression model
(variable transformation for regression)
1/ Check that the dependent variable is normally distributed.
2/ Check that the independent variables are independent of (not
correlated with) one another.
3/ Create dummy variables by recoding the categorical independent
variables.
- Variables with 2 response options: recode to 1 and 0.
- Variables with 3 or more response options: create a dummy (1/0)
variable for each option and enter (n-1) of them in the model; the one
left out of the model is the reference category for the others.
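
The (n-1) dummy rule can be sketched in a few lines. The function and the sample data below are hypothetical and only mirror the logic of the recoding; the deck itself does this in SPSS syntax.

```python
def make_dummies(values, reference):
    """Return one 0/1 list per category except the reference category.

    A variable with n categories therefore yields n-1 dummy variables;
    rows in the reference category are 0 on all of them.
    """
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical education codes 1-4, with category 1 as the reference.
educat = [1, 2, 3, 4, 2]
dummies = make_dummies(educat, reference=1)
for category, column in dummies.items():
    print(category, column)
```

Each coefficient on a dummy is then read as the difference between that category and the reference category.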
Example: multiple regression

Multiple regression
Number of antenatal visits = b0 + b1(age) + b2(ethnicity) +
b3(education) + b4(marital status) + b5(occupation).
Could also be added: economic status.

Checking the normal distribution of the dependent variable: number of antenatal visits
[Figure: distribution of the dependent variable]
The independent variables in the model must be independent of, and
uncorrelated with, one another
1. Based on results from other studies.
2. Based on computation:
- Check with the correlate command:
Analyze/Correlate/Bivariate.
- Check directly in linear regression through
collinearity diagnostics: tolerance (< 0.1 is
bad) and VIF (1/tolerance).
Method 1: check with the correlate command:
Analyze/Correlate/Bivariate.
CORRELATIONS
/VARIABLES=Q2 Q3 Q10 Q5 Q8
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.






Pearson correlation values above 0.7 (in absolute value) indicate that
the variables are highly correlated with each other, so one of them must
be removed from the model.
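
The same screening can be sketched outside SPSS. The data and variable names below are invented for illustration; only the 0.7 cut-off comes from the slide.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(data, threshold=0.7):
    """List variable pairs whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(data.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical data: exam score and exam rank carry the same information.
data = {
    "exam_score": [50, 60, 70, 80, 90],
    "exam_rank": [5, 4, 3, 2, 1],
    "age": [20, 35, 25, 40, 22],
}
flagged = highly_correlated_pairs(data)
print(flagged)
```

Only the score/rank pair is flagged, so one of those two variables would be dropped before fitting.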



Method 2: check directly when running the regression, through
collinearity diagnostics.
Tolerance (< 0.1 is bad)
VIF (1/tolerance)
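
Tolerance and VIF are two views of the same quantity. In the sketch below, r2_j stands for the R² obtained when independent variable j is regressed on all the other independent variables; the value 0.95 is made up for illustration.

```python
def tolerance_and_vif(r2_j):
    """Tolerance = 1 - R2_j and VIF = 1 / tolerance for one predictor."""
    tolerance = 1.0 - r2_j
    return tolerance, 1.0 / tolerance

def is_collinear(r2_j, min_tolerance=0.1):
    """Apply the slide's rule: tolerance below 0.1 is bad (VIF above 10)."""
    tolerance, _ = tolerance_and_vif(r2_j)
    return tolerance < min_tolerance

tol, vif = tolerance_and_vif(0.95)   # hypothetical R2_j
print(tol, vif, is_collinear(0.95))
```

A tolerance of 0.1 corresponds to a VIF of 10, so the two cut-offs flag the same variables.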

Converting the (categorical) independent variables to
dummy variables.
*********independent vars*******.
***ethnicity****.

Recode Q3 (1=1) (2 thru 8 =0) into ethnicre.
var label ethnicre "Ethnicity-Kinh and other".
value label ethnicre 1"Kinh" 0"Other".
missing value ethnicre(9).

*******education********

RECODE q7 (SYSMIS=SYSMIS) (0 thru 5 = 1) (6 thru 9 = 2) (10 thru 12= 3) (13 thru 15=4) (99=SYSMIS) INTO educat.
VARIABLE LABEL educat 'educat - Education completed, categorized'.
VALUE LABEL educat
1 'Primary/Under primary School'
2 'Secondary School'
3 'High/vocational school'
4 'College/University and above'.
EXECUTE.
FREQUENCIES educat.
****Create the dummy variables.
compute edu2=9.
if (educat=2) edu2=1.
if (educat=1 or educat=3 or educat=4) edu2=0.
VARIABLE LABELS edu2 "Edu2-Secondary school".
value labels edu2 1"Secondary school" 0 "Other".
missing values edu2 (9).

Compute edu3=9.
if (educat=3) edu3=1.
if (educat=1 or educat=2 or educat=4) edu3=0.
VARIABLE LABELS edu3 "Edu3-High school".
value labels edu3 1"High school" 0 "Other".
missing values edu3 (9).

Compute edu4=9.
if (educat=4) edu4=1.
if (educat=1 or educat=2 or educat=3) edu4=0.
VARIABLE LABELS edu4 "Edu4-College/higher".
value labels edu4 1"college/higher" 0 "Other".
missing values edu4 (9).
***************Marital status

RECODE q5 (1=1) (2=2) (3=1) (4=2) (5=3) INTO q5recode.
VARIABLE LABEL Q5RECODE 'Q5recode-Marital status recategorized'.
VALUE LABELs Q5recode
1 'Married or lives with partner'
2 'Divorced/widowed/separated/not living with spouse'
3 'Single (never married)'.
execute.
******Create the dummy variables.
compute mar1=9.
if (q5recode=1) mar1=1.
if (q5recode=2 or q5recode=3) mar1=0.
VARIABLE LABELS mar1 "Mar1-Married or lives with a partner".
value labels mar1 1"Married/live with a partner" 0"Other".
missing values mar1(9).
execute.

compute mar2=9.
if (q5recode=2) mar2=1.
if (q5recode=1 or q5recode=3) mar2=0.
VARIABLE LABELS mar2 "Mar2-divorced/widowed".
value labels mar2 1"widowed/divorced" 0"Other".
missing values mar2(9).
execute.
***********Occupation

Recode Q8 (1=0) (2 thru 8 =1) into occunew.
var label occunew "Occupation-Farmer and other".
value label occunew 0"Farmer" 1"Other".
missing value occunew(9).

Fitting the full model.
NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) +
b3(secondary school) + b4(high school) +
b5(college or higher) + b6(marital status, entered as the dummies
mar1 and mar2) + b7(occupation)

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Q83_Re
/METHOD=ENTER Q2 ethnicre edu2 edu3 edu4 mar1 mar2 occunew.

Fitting models step by step by adding variables, and assessing
model fit using statistical criteria.

*****Model with 3 independent vars
NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) +
b3(occupation)

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Q83_Re
/METHOD=ENTER Q2 ethnicre occunew.


*******Model with 5 independent vars

NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) + b3(occupation) +
b4(secondary school) + b5(high school) + b6(college or higher) + b7(marital status).

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Q83_Re
/METHOD=ENTER Q2 ethnicre edu2 edu3 edu4 occunew mar1 mar2.


Change in R² after adding the occupation and education variables.
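
Whether a change in R² is more than noise is usually judged with an F test for the added variables (SPSS reports it when CHANGE is requested on /STATISTICS). A minimal sketch of that statistic, with made-up R² values since the slide's output is not reproduced here:

```python
def r2_change_f(r2_reduced, r2_full, n, p_full, q):
    """F statistic for adding q variables to a nested linear model.

    n      = number of observations used in both models
    p_full = number of independent variables in the larger model
    q      = number of variables added to the smaller model
    Degrees of freedom are (q, n - p_full - 1).
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1 - r2_full) / (n - p_full - 1)
    return numerator / denominator

# Hypothetical values: R2 rises from 0.10 to 0.15 when 5 dummies are added.
f_change = r2_change_f(0.10, 0.15, n=500, p_full=8, q=5)
print(round(f_change, 2))
```

The statistic is only valid when both models are fitted to the same cases, which LISTWISE deletion does not guarantee if the added variables have missing values.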
Stepwise.
*******Backward

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Q83_Re
/METHOD=BACKWARD Q2 ethnicre mar1 mar2 occunew edu2 edu3
edu4.


******Forward

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Q83_Re
 /METHOD=FORWARD Q2 ethnicre mar1 mar2 occunew edu2 edu3
 edu4.


Example
Logistic regression

Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) +
b4(marital status) + b5(occupation) + b6(age at first sexual
intercourse).

The independent variables in the model must be uncorrelated with, or
independent of, one another

1. Based on results from other studies.
2. Based on computation:
- Check with the correlate command: Analyze/Correlate/Bivariate.
- Check directly in logistic regression through the correlation
matrix.


Recoding independent variables into dummy variables
*********Age of first intercourse.
Recode Q27 (14 thru 17=1) (18 thru 24=2) (25 thru highest=3) into agefirstsex_cat.
***Create the dummy variables**.
compute agefirstsex2=9.
if (agefirstsex_cat=2) agefirstsex2=1.
if (agefirstsex_cat=1 or agefirstsex_cat=3) agefirstsex2=0.
VARIABLE LABELS agefirstsex2 "18-24 had first intercourse".
value labels agefirstsex2 1 "had first sex at 18-24" 0"Other".
missing values agefirstsex2(9).

compute agefirstsex3=9.
if (agefirstsex_cat= 3) agefirstsex3=1.
if (agefirstsex_cat=1 or agefirstsex_cat=2) agefirstsex3=0.
VARIABLE LABELS agefirstsex3 ">24 had first intercourse".
value labels agefirstsex3 1 "had first sex at >24" 0"Other".
missing values agefirstsex3 (9).
Fitting the full model.
Ever had an abortion = a + b1(age) + b2(ethnicity) + b3(marital
status) + b4(occupation) + b5(secondary school) + b6(high school)
+ b7(college or higher) + b8(age at first sex)

LOGISTIC REGRESSION VARIABLES everabor
/METHOD=ENTER Q2 ethnicre mar1 mar2 occunew edu2
edu3 edu4 agefirstsex2 agefirstsex3
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20)
CUT(0.5).
Fitting models step by step by adding variables, and assessing
model fit using statistical criteria.

*****Model with 2 independent vars
Ever had an abortion = a + b1(age) + b2(ethnicity)

LOGISTIC REGRESSION VARIABLES everabor
/METHOD=ENTER Q2 ethnicre
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

*******Model with 4 independent vars

Ever had an abortion = a + b1(age) + b2(ethnicity) + b3(marital
status) + b4(occupation)

LOGISTIC REGRESSION VARIABLES everabor
/METHOD=ENTER Q2 ethnicre mar1 mar2 occunew
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20)
CUT(0.5).

Model with 6 independent vars
Ever had an abortion = a + b1(age) + b2(ethnicity) +
b3a(married) + b3b(divorced/widowed) + b4(occupation) +
b5a(secondary school) + b5b(high school) +
b5c(college or higher) + b6a(age at first sex 18-24) +
b6b(age at first sex > 24)
LOGISTIC REGRESSION VARIABLES everabor
/METHOD=ENTER Q2 ethnicre mar1 mar2 occunew
edu2 edu3 edu4 agefirstsex2 agefirstsex3
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20)
CUT(0.5).

Change in -2 log likelihood
Model with 2 independent variables: 1509
Model with 4 variables: 1492
Model with 6 variables: 1469
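
A drop in -2 log likelihood between nested models can be tested with a likelihood-ratio chi-square test. The sketch below assumes the three models were fitted to the same cases and takes the degrees of freedom from the number of terms added in the syntax above (3 for mar1, mar2, occunew; 5 for the education and age-at-first-sex dummies); both points are my reading of the slides, not something they state.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    built up from the closed forms for df = 1 and df = 2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # df = 1
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q

def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, df):
    """Return the chi-square statistic and its p-value."""
    statistic = neg2ll_reduced - neg2ll_full
    return statistic, chi2_sf(statistic, df)

step1 = likelihood_ratio_test(1509, 1492, df=3)  # 2 -> 4 variables
step2 = likelihood_ratio_test(1492, 1469, df=5)  # 4 -> 6 variables
print("step 1: chi2 = %d, p = %.4f" % step1)
print("step 2: chi2 = %d, p = %.4f" % step2)
```

Under these assumptions, each block of added variables improves the fit by more than chance alone would explain.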
Stepwise.
*******Backward

LOGISTIC REGRESSION VARIABLES everabor
/METHOD=BSTEP(COND) Q2 ethnicre mar1 mar2
occunew edu2 edu3 edu4 agefirstsex2 agefirstsex3
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20)
CUT(0.5).


******Forward
LOGISTIC REGRESSION VARIABLES everabor
/METHOD=FSTEP(COND) Q2 edu2 edu3 edu4 ethnicre
mar1 mar2 occunew agefirstsex2 agefirstsex3
/PRINT=GOODFIT CORR
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20)
CUT(0.5).
