
INSTITUTE OF SOCIAL AND MEDICAL STUDIES (ISMS)

Applied Regression Analysis


Nguyễn Trường Nam

Copyright © The author and thongke.info. When using part or all of this lecture, please cite the author's name and thongke.info. Example: Nguyễn Thị Linh and Thongke.info.

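Contents
- Why is regression analysis needed?
- Steps in building a regression model
- Multiple linear regression
- Logistic regression

An example of multiple regression: we estimate Y = overall grade for the first year of university from X1 = high-school rank, X2 = university entrance exam score, and X3 = gender.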

Why is regression analysis needed?

In a quasi-experiment the researcher cannot manipulate the independent variables, so confounding variables are usually present. We try to correct for this statistically, specifically by using multiple regression. In multiple regression, the relationship between the dependent variable (the outcome) and an independent variable (the exposure) is assessed while controlling for the other, confounding variables.

Purposes of multiple regression:
1) Prediction: identify the factors that can predict a phenomenon (the outcome variable).
2) Explanation: identify the systems, processes, or causes that lead to a phenomenon.

James Cotter (2001) HUMD5122-Applied Regression Analysis

Why multivariable analysis is needed: an example

ISMS: a 2010 study evaluating the impact of a two-year intervention program on the knowledge and behavior of street children in Hai Phong (HP) and Ho Chi Minh City (HCMC).

Two groups: those who took part in the project (intervention group) and those who did not (control group).

Changes in the knowledge, attitudes, and behavior of the intervention group are driven not only by the project itself but also by other factors, for example other HIV prevention programs in the area, living environment, age, sex, drug use, selling sex, and livelihood. The association between project participation and behavior change must therefore be analyzed while controlling for the effects of these confounding factors. This is the core principle of multivariable analysis.

                                  Male (N=403)                      Female (N=179)                    Total (N=582)
Indicator                         Reached          Not reached      Reached          Not reached      Reached          Not reached      Total
Correct HIV/AIDS knowledge (%)    57.6***          29.8             54.9**           28.9             56.9***          29.5             39.7
Mean HIV knowledge score ± SD     7.49 ± 1.25***   5.77 ± 2.32      7.56 ± 1.18***   6.33 ± 1.85      7.50 ± 1.23***   5.97 ± 1.18      6.55 ± 2.02

Reached = reached by the project; Not reached = not reached by the project.
* P < 0.05, ** P < 0.01, *** P < 0.001

Logistic regression: having correct knowledge of HIV versus not having correct knowledge (N=582)

Independent variable                              Odds ratio (OR)   95% CI
Reached by the NAM project
  Not reached (reference group, ref)              (ref)
  Reached                                         3.13***           (2.15, 4.54)
Education
  Below primary school (ref)                      (ref)
  Primary to lower secondary school               1.54              (0.85, 2.79)
  Above lower secondary school                    2.16*             (1.01, 4.65)
Drug use
  Does not use drugs (ref)                        (ref)
  Uses drugs, does not inject                     1.61*             (1.02, 2.53)
  Injects drugs                                   2.02**            (1.19, 3.45)
Livelihood
  Street vending or shoe shining (ref)            (ref)
  Illegal work                                    0.80              (0.46, 1.40)
  Sex work                                        1.04              (0.57, 1.90)
  Unskilled manual labor                          0.78              (0.48, 1.26)
  Unemployed / no income                          0.42*             (0.19, 0.92)

* P < 0.05, ** P < 0.01, *** P < 0.001
Control variables: sex, age, MSM, and livelihood.

Table: Frequency of condom use with female sex workers (FSW) in the past 6 months, comparing those reached and not reached by the project

Frequency of condom use with FSW    Reached (N=29) %    Not reached (N=48) %    Total (N=77) %
Always                              58.6*               35.4                    38.2
Often                               10.3                14.6                    13.0
Sometimes                           31.0                22.9                    26.0
Rarely                              0                   27.9                    16.0

* P < 0.05, ** P < 0.01, *** P < 0.001

There is some evidence that the NAM project had a positive effect on condom use with female sex workers among male street youth. The proportion who always used a condom in the past 6 months was 59% among those reached by the NAM project, compared with only 35% among those not reached.
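The comparison of 59% against 35% can be checked with a two-proportion z-test. The sketch below is only an illustration: the counts (17 of 29 and 17 of 48) are assumptions back-calculated from the reported percentages, and the slides do not say which test produced the asterisk in the table.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p) for H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Assumed counts: 58.6% of 29 is about 17, and 35.4% of 48 is about 17
z, p = two_proportion_z_test(17, 29, 17, 48)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumed counts the p-value falls just below 0.05, which agrees with the single asterisk on the "Always" row.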

Logistic regression of condom use with FSW on being reached by the NAM project

Outcome: used a condom with FSW every time (versus not every time), N=82

Independent variable                              Odds ratio (OR)   95% CI
Reached by the NAM project
  Not reached (ref)                               (ref)
  Reached                                         3.55              (0.89, 14.11)
Province/city
  Hai Phong (ref)                                 (ref)
  Ho Chi Minh                                     0.06***           (0.01, 0.28)
Street youth category
  Other groups (ref)                              (ref)
  Group A                                         0.19*             (0.04, 0.94)
Drug use
  Does not use drugs (ref)                        (ref)
  Uses drugs but does not inject                  0.74              (0.16, 3.41)
  Injects drugs                                   0.13*             (0.02, 0.91)

* P < 0.05, ** P < 0.01, *** P < 0.001

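In the multivariable analysis, the association between being reached by the project and condom use is no longer statistically significant (P > 0.05; the 95% CI of 0.89 to 14.11 includes 1). This differs from the bivariate analysis because the multivariable model controls for potential confounders.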
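An approximate p-value can be recovered from a reported odds ratio and its 95% confidence interval. This is a minimal sketch assuming a Wald interval that is symmetric on the log scale; the function name is illustrative, and rounding in the published figures limits the precision.

```python
import math

def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided Wald p-value from an odds ratio and its 95% CI."""
    # Width of the CI on the log scale is 2 * z_crit * SE(log OR)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))

p_access = p_value_from_or_ci(3.55, 0.89, 14.11)  # reached by the NAM project
p_city = p_value_from_or_ci(0.06, 0.01, 0.28)     # Ho Chi Minh vs Hai Phong
print(f"project access: p = {p_access:.3f}")
print(f"city:           p = {p_city:.4f}")
```

For project access the result lies between 0.05 and 0.10, so it is not significant at the 5% level. The city effect stays well below 0.01.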

Building regression models: multiple and logistic

Multiple regression
Y: the dependent variable, continuous.
X1, X2, ...: the independent variables may be
- all continuous, or
- a mix of continuous and binary (dummy) variables, or
- all binary (dummy) variables.

Logistic regression
Y: the dependent (outcome) variable, binary.
X1, X2, ...: the independent variables may be
- all continuous, or
- a mix of continuous and binary (dummy) variables, or
- all binary (dummy) variables.

Example.

Multiple regression

Number of antenatal visits = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status).

Logistic regression

Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status) + b5(occupation) + b6(age at first sexual intercourse).
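The logistic equation above is schematic. In a logistic model the right-hand side is the log-odds of the outcome, which the logistic function converts to a probability. The sketch below shows that conversion; the coefficient values are hypothetical and are not estimates from the survey.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: P(Y=1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients for illustration only
b0 = -2.0
b = [0.05, 0.30]  # b1 for age in years, b2 for an ethnicity dummy

p = predicted_probability(b0, b, [30, 1])  # a 30-year-old with the dummy set to 1
odds_ratio_ethnicity = math.exp(b[1])      # exp(coefficient) gives the odds ratio
print(f"predicted probability = {p:.3f}, OR for ethnicity = {odds_ratio_ethnicity:.2f}")
```

Exponentiating a coefficient yields the odds ratio, which is the quantity reported in the OR tables earlier in the lecture.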

Steps in building a regression model

1) Specify the model
- Turn the research question into a regression equation.
- Identify the independent (predictor) variables.

2) Assess the threats to valid statistical inference
- Sample size: ensure at least 20 observations per independent variable.
- Normality of the variables.
- Make sure the independent variables are not correlated with one another.
- Check for and exclude outliers.

Steps in building a regression model (continued)

3) Fit the models
- Fit the full model (theory-driven).
- Fit successive models by adding variables, assess model fit with statistical criteria, and choose the best-fitting model (exploratory).
- Automatic selection, for example stepwise regression.

4) Run the regression model and present the results

Building the models: how do we choose the variables?

Y: dependent (outcome) variable. X1, X2, X3: independent (predictor) variables.

The ideal multiple regression model has independent variables that are strongly (linearly) related to Y and relatively independent of one another. This raises the general question of how to design a good multiple regression model.

When multiple regression is used to test a hypothesis, it is best to let the hypothesis itself determine which independent variables enter the model. Even so, designing a good model for hypothesis testing also requires the statistical criteria mentioned above.

James Cotter (2001) HUMD5122-Applied Regression Analysis

Principles for designing a good multiple regression model

- Try to include all important relevant variables in the regression model; otherwise the estimated parameters may be biased. In a quasi-experiment, try to include all important uncontrolled confounders.
- Balance parsimony (a model with few parameters) against good fit (which can be improved by adding parameters).
- Do not use too many independent variables for a limited number of study subjects. A rule of thumb is at least 20 observations for each independent variable in the model (ideally 40 to 50 per independent variable, especially when building theory).
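The rule of thumb above is simple arithmetic. The sketch below encodes it in two helper functions whose names are illustrative. Note that each dummy variable counts as one term in the model.

```python
def required_sample_size(n_predictors, per_predictor=20):
    """Minimum number of observations for a given number of model terms."""
    return n_predictors * per_predictor

def max_predictors(n_observations, per_predictor=20):
    """Largest number of model terms the sample can support under the rule."""
    return n_observations // per_predictor

# A model with 8 terms (for example age, ethnicity, 3 education dummies,
# 2 marital-status dummies, occupation)
print(required_sample_size(8))       # minimum under the 20-per-variable rule
print(required_sample_size(8, 40))   # lower end of the preferred 40-50 range
print(max_predictors(582))           # terms supported by a sample of 582
```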

Principles for designing a good multiple regression model (continued)

- Use independent variables that are not correlated with one another (avoid multicollinearity). The independent variables should be relatively independent.
- Do not put redundant independent variables in the same model. For example, do not use both (X1) grade on the final exam and (X2) rank on the final exam in one model predicting changes in student outcomes, because the two variables measure almost the same concept.

James Cotter (2001) HUMD5122-Applied Regression Analysis

Example.

Multiple regression

Number of antenatal visits = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status) + b5(occupation).

Logistic regression

Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status) + b5(occupation) + b6(age at first sexual intercourse).

Variables used in the models

1/ Multiple regression model
Dependent variable: number of antenatal visits (Q83)
Independent variables:
1. Age (Q2)
2. Ethnicity (q3)
3. Education (q10)
4. Marital status (q5)
5. Occupation (Q8)

2/ Logistic regression model
Dependent variable: ever had an abortion? (Q40_recode)
Independent variables:
1. Age (Q2)
2. Ethnicity (q3)
3. Education (q10)
4. Marital status (q5)
5. Occupation (Q8)
6. Age at first sexual intercourse (Q27)

Preparing the variables for the regression model (variable transformation for regression)

1/ Check that the dependent variable is normally distributed.
2/ Check that the independent variables are independent of (not correlated with) one another.
3/ Create or recode the categorical independent variables as dummy variables.
- For a variable with 2 response options, recode it to 1 and 0.
- For a variable with 3 or more response options, create a dummy (1/0) variable for each option and enter n-1 of them into the model. The option left out of the model is the reference category for the others.
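The n-1 dummy-coding rule can be sketched in a few lines. The helper below is illustrative and is not part of the lecture's SPSS syntax. It mirrors the idea of the edu2/edu3/edu4 variables, with education level 1 as the reference category and missing values kept missing.

```python
def make_dummies(values, reference, prefix):
    """Return one 0/1 column per category except the reference category.

    Missing values (None) stay missing in every dummy column.
    """
    categories = sorted({v for v in values if v is not None})
    dummies = {}
    for category in categories:
        if category == reference:
            continue  # the reference category gets no column of its own
        dummies[f"{prefix}{category}"] = [
            None if v is None else int(v == category) for v in values
        ]
    return dummies

# Hypothetical education codes: 1=primary, 2=secondary, 3=high school, 4=college
educat = [1, 2, 3, 4, 2, None]
edu = make_dummies(educat, reference=1, prefix="edu")
for name, column in edu.items():
    print(name, column)
```

A respondent in the reference category has 0 on every dummy, so each coefficient is read as a contrast against that category.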

Example: multiple regression

Number of antenatal visits = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status) + b5(occupation).

Economic status could also be added.

Dependent variable

Check whether the dependent variable, number of antenatal visits, is normally distributed.
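Alongside a histogram, skewness and excess kurtosis give a quick numeric check of normality. The sketch below uses simple moment-based formulas and a hypothetical list of visit counts. It is one possible check, not the procedure used in the lecture.

```python
def skewness_and_excess_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (both are 0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical antenatal visit counts: most values are small, one is large
visits = [0, 1, 1, 2, 2, 2, 3, 3, 4, 9]
skew, kurt = skewness_and_excess_kurtosis(visits)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

Count variables such as number of visits are often right-skewed. A clearly positive skewness suggests that a transformation or a different model may be worth considering.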

The independent variables in the model must be independent of, and uncorrelated with, one another

1. Rely on results from other studies.
2. Rely on computational methods.
- Check with the correlate command: Analyze/Correlate/Bivariate.
- Check directly in linear regression through the collinearity diagnostics: tolerance (<0.1 is bad) and VIF (1/tolerance).

Method 1: check with the correlate command: Analyze/Correlate/Bivariate.

    CORRELATIONS
      /VARIABLES=Q2 Q3 Q10 Q5 Q8
      /PRINT=TWOTAIL NOSIG
      /MISSING=PAIRWISE.

A Pearson correlation above 0.7 indicates that two variables are highly correlated, so one of them must be removed from the model.
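The 0.7 screening rule can be sketched as follows. The data are hypothetical and the variable names are illustrative. Unlike the pairwise deletion requested in the SPSS command, this sketch assumes complete data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged

# Hypothetical data: years_married tracks age closely, education does not
data = {
    "age": [20, 25, 30, 35, 40],
    "years_married": [1, 5, 11, 14, 21],
    "education": [3, 1, 4, 2, 3],
}
flagged = highly_correlated_pairs(data)
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.2f} (keep only one of them)")
```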

Method 2: check directly when running the regression, through the collinearity diagnostics.
- Tolerance (<0.1 is bad)
- VIF (1/tolerance)
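Tolerance and VIF are linked by simple arithmetic. For a given predictor, tolerance is 1 - R², where R² comes from regressing that predictor on all the other predictors, and VIF is its reciprocal. The sketch below takes R² as an input, and the values are hypothetical.

```python
def tolerance_and_vif(r_squared):
    """r_squared: R-squared from regressing one predictor on all other predictors."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

for r2 in (0.20, 0.50, 0.95):  # hypothetical R-squared values
    tol, vif = tolerance_and_vif(r2)
    verdict = "problematic" if tol < 0.1 else "acceptable"
    print(f"R2 = {r2:.2f}: tolerance = {tol:.2f}, VIF = {vif:.1f} ({verdict})")
```

A tolerance below 0.1 corresponds to a VIF above 10.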

Converting the categorical independent variables to dummy variables

    * Independent variables.
    * Ethnicity: Kinh (1) versus all other groups (0).
    RECODE Q3 (1=1) (2 THRU 8=0) INTO ethnicre.
    VARIABLE LABELS ethnicre "Ethnicity-Kinh and other".
    VALUE LABELS ethnicre 1 "Kinh" 0 "Other".
    MISSING VALUES ethnicre (9).

    * Education: group the q7 codes into four levels.
    RECODE q7 (SYSMIS=SYSMIS) (0 THRU 5=1) (6 THRU 9=2) (10 THRU 12=3)
      (13 THRU 15=4) (99=SYSMIS) INTO educat.
    VARIABLE LABELS educat 'educat - Education completed, categorized'.
    VALUE LABELS educat 1 'Primary/Under primary School' 2 'Secondary School'
      3 'High/vocational school' 4 'College/University and above'.
    EXECUTE.

    FREQUENCIES VARIABLES=educat.

    * Education dummies; level 1 (primary or below) is the reference category.
    * Each dummy starts at 9 (declared missing) and is overwritten when educat is valid.
    COMPUTE edu2=9.
    IF (educat=2) edu2=1.
    IF (educat=1 OR educat=3 OR educat=4) edu2=0.
    VARIABLE LABELS edu2 "Edu2-Secondary school".
    VALUE LABELS edu2 1 "Secondary school" 0 "Other".
    MISSING VALUES edu2 (9).

    COMPUTE edu3=9.
    IF (educat=3) edu3=1.
    IF (educat=1 OR educat=2 OR educat=4) edu3=0.
    VARIABLE LABELS edu3 "Edu3-High school".
    VALUE LABELS edu3 1 "High school" 0 "Other".
    MISSING VALUES edu3 (9).

    COMPUTE edu4=9.
    IF (educat=4) edu4=1.
    IF (educat=1 OR educat=2 OR educat=3) edu4=0.
    VARIABLE LABELS edu4 "Edu4-College/higher".
    VALUE LABELS edu4 1 "college/higher" 0 "Other".
    MISSING VALUES edu4 (9).

Creating the dummy variables

    * Marital status: collapse q5 into three categories.
    RECODE q5 (1=1) (2=2) (3=1) (4=2) (5=3) INTO q5recode.
    VARIABLE LABELS q5recode 'Q5recode-Marital status recategorized'.
    VALUE LABELS q5recode 1 'Married or lives with partner'
      2 'Divorced/widowed/separated/not living with spouse'
      3 'Single (never married)'.
    EXECUTE.

    * Marital status dummies; category 3 (single, never married) is the reference.
    COMPUTE mar1=9.
    IF (q5recode=1) mar1=1.
    IF (q5recode=2 OR q5recode=3) mar1=0.
    VARIABLE LABELS mar1 "Mar1-Married or lives with a partner".
    VALUE LABELS mar1 1 "Married/live with a partner" 0 "Other".
    MISSING VALUES mar1 (9).
    EXECUTE.

    COMPUTE mar2=9.
    IF (q5recode=2) mar2=1.
    IF (q5recode=1 OR q5recode=3) mar2=0.
    VARIABLE LABELS mar2 "Mar2-divorced/widowed".
    VALUE LABELS mar2 1 "widowed/divorced" 0 "Other".
    MISSING VALUES mar2 (9).
    EXECUTE.

    * Occupation: farmer (0) versus all other occupations (1).
    RECODE Q8 (1=0) (2 THRU 8=1) INTO occunew.
    VARIABLE LABELS occunew "Occupation-Farmer and other".
    VALUE LABELS occunew 0 "Farmer" 1 "Other".
    MISSING VALUES occunew (9).

Fitting the full model

NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) + b3(secondary school) + b4(high school) + b5(college or higher) + b6(marital status) + b7(occupation)

Marital status enters the model as the two dummies mar1 and mar2.

    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT Q83_Re
      /METHOD=ENTER Q2 ethnicre edu2 edu3 edu4 mar1 mar2 occunew.

Fitting successive models by adding variables and assessing model fit with statistical criteria

Model with 3 independent variables:
NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) + b3(occupation)

    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT Q83_Re
      /METHOD=ENTER Q2 ethnicre occunew.

Model with 5 independent variables (education and marital status enter as dummies):
NUMBER OF ANTENATAL VISITS = a + b1(age) + b2(ethnicity) + b3(occupation) + b4(secondary school) + b5(high school) + b6(college or higher) + b7(marital status)

    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT Q83_Re
      /METHOD=ENTER Q2 ethnicre edu2 edu3 edu4 occunew mar1 mar2.

Examine the change in R² after adding the occupation and education variables.
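Whether an increase in R² is more than noise can be judged with the F test for nested models. The sketch below shows the arithmetic. All numbers are hypothetical, because the slides do not report the R² values.

```python
def r2_change_f(r2_reduced, r2_full, n, k_full, q):
    """F statistic for the R-squared increase when q predictors are added.

    n: number of cases; k_full: number of predictors in the larger model.
    Degrees of freedom are (q, n - k_full - 1).
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Hypothetical example: R-squared rises from 0.10 to 0.15 after adding
# 5 dummy variables, giving a model with 8 predictors fitted on 500 cases
f_change = r2_change_f(0.10, 0.15, n=500, k_full=8, q=5)
print(f"F change = {f_change:.2f} on (5, 491) degrees of freedom")
```

The statistic is compared with the F distribution on (q, n - k_full - 1) degrees of freedom. Both models must be fitted on the same cases for the comparison to be valid.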

Stepwise selection

    * Backward elimination.
    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT Q83_Re
      /METHOD=BACKWARD Q2 ethnicre mar1 mar2 occunew edu2 edu3 edu4.

    * Forward selection.
    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT Q83_Re
      /METHOD=FORWARD Q2 ethnicre mar1 mar2 occunew edu2 edu3 edu4.

Example.

Logistic regression

Ever had an abortion = b0 + b1(age) + b2(ethnicity) + b3(education) + b4(marital status) + b5(occupation) + b6(age at first sexual intercourse).

The independent variables in the model must be uncorrelated with (independent of) one another

1. Rely on results from other studies.
2. Rely on computational methods.
- Check with the correlate command: Analyze/Correlate/Bivariate.
- Check directly in logistic regression through the correlation matrix.

Recoding the independent variable into dummy variables

    * Age at first intercourse: three age groups.
    RECODE Q27 (14 THRU 17=1) (18 THRU 24=2) (25 THRU HIGHEST=3) INTO agefirstsex_cat.

    * Dummy variables; the 14-17 group is the reference category.
    COMPUTE agefirstsex2=9.
    IF (agefirstsex_cat=2) agefirstsex2=1.
    IF (agefirstsex_cat=1 OR agefirstsex_cat=3) agefirstsex2=0.
    VARIABLE LABELS agefirstsex2 "18-24 had first intercourse".
    VALUE LABELS agefirstsex2 1 "had first sex at 18-24" 0 "Other".
    MISSING VALUES agefirstsex2 (9).

    COMPUTE agefirstsex3=9.
    IF (agefirstsex_cat=3) agefirstsex3=1.
    IF (agefirstsex_cat=1 OR agefirstsex_cat=2) agefirstsex3=0.
    VARIABLE LABELS agefirstsex3 ">24 had first intercourse".
    VALUE LABELS agefirstsex3 1 "had first sex at >24" 0 "Other".
    MISSING VALUES agefirstsex3 (9).

Fitting the full model

Ever had an abortion = a + b1(age) + b2(ethnicity) + b3(marital status) + b4(occupation) + b5(secondary school) + b6(high school) + b7(college or higher) + b8(age at first sex)

    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=ENTER Q2 ethnicre mar1 mar2 occunew edu2 edu3 edu4 agefirstsex2 agefirstsex3
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Fitting successive models by adding variables and assessing model fit with statistical criteria

Model with 2 independent variables:
Ever had an abortion = a + b1(age) + b2(ethnicity)

    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=ENTER Q2 ethnicre
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Model with 4 independent variables:
Ever had an abortion = a + b1(age) + b2(ethnicity) + b3(marital status) + b4(occupation)

    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=ENTER Q2 ethnicre mar1 mar2 occunew
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Model with 6 independent variables:

Ever had an abortion = a + b1(age) + b2(ethnicity) + b3a(married) + b3b(divorced/widowed) + b4(occupation) + b5a(secondary school) + b5b(high school) + b5c(college or higher) + b6a(age at first sex 18-24) + b6b(age at first sex >24)

    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=ENTER Q2 ethnicre mar1 mar2 occunew edu2 edu3 edu4 agefirstsex2 agefirstsex3
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Change in the -2 log likelihood

- Model with 2 independent variables: 1509
- Model with 4 independent variables: 1492
- Model with 6 independent variables: 1469
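The drop in -2 log likelihood between nested models is a likelihood-ratio chi-square statistic. Its degrees of freedom equal the number of terms added. The sketch below compares each drop with the standard 5% critical value. It assumes the three models were fitted on the same cases, which the slides do not confirm. The degrees of freedom are read off the variable lists in the syntax above.

```python
# Upper 5% critical values of the chi-square distribution (standard table values)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def lr_test(neg2ll_reduced, neg2ll_full, df):
    """Return (LR statistic, True if significant at the 5% level)."""
    statistic = neg2ll_reduced - neg2ll_full
    return statistic, statistic > CHI2_CRIT_05[df]

# 2 -> 4 variables adds mar1, mar2, occunew (3 terms)
step1 = lr_test(1509, 1492, df=3)
# 4 -> 6 variables adds edu2, edu3, edu4, agefirstsex2, agefirstsex3 (5 terms)
step2 = lr_test(1492, 1469, df=5)
print("adding marital status and occupation:", step1)
print("adding education and age at first sex:", step2)
```

Under these assumptions both drops exceed their critical values, so each block of added variables improves the fit.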

Stepwise selection

    * Backward elimination (conditional).
    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=BSTEP(COND) Q2 ethnicre mar1 mar2 occunew edu2 edu3 edu4 agefirstsex2 agefirstsex3
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

    * Forward selection (conditional).
    LOGISTIC REGRESSION VARIABLES everabor
      /METHOD=FSTEP(COND) Q2 edu2 edu3 edu4 ethnicre mar1 mar2 occunew agefirstsex2 agefirstsex3
      /PRINT=GOODFIT CORR
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
