
Using Stata to analyze data from the Household Living Standards Survey (VLSS)

Contents

Chapter I: Overview of Stata
1. Organization of data in Stata (Dataset in Stata)
2. Starting and exiting Stata (Open and exit)
3. The Stata 7 interface (Stata interface)
4. Session logs (log file)
5. Opening, entering, and saving data (Use, input and save)

Chapter II: Working with data
1. Stata command syntax
2. Operators and functions
3. Describing data (Data reporting)
4. Editing and modifying data (Data manipulation)
5. Weights in the VHLSS (Weight)

Chapter III: Hypothesis testing and regression analysis
1. Estimation and hypothesis testing
2. Correlation and regression analysis

Chapter IV: Graphs
1. Drawing graphs (graph)
2. Some commonly used graph types
3. Saving and displaying graphs (Saving and graph using)

Chapter V: Programming in Stata
1. Introduction to do-files
2. Local and global macros
3. Scalars and matrices (scalar and matrix)
4. Conditional statements and loops
5. Introduction to ado-files

References

Appendix

Chapter I: Overview of Stata

1. Organization of data in Stata (Dataset in Stata)

Stata is a statistical software package used to manage data, analyze data, and draw graphs. Stata stores information on the characteristics of the units under study. Data stored in Stata can be displayed as a table, as in the following example:
hhcode    headname        hhsize    incomepc
101       Nguyen Van A         6        2100
102       Le Thi B             5        3210
103       Tran Van C          10        1200
Observations (records)

Each row of the data table is called an observation, or a record, and stores the data for one unit of study. The example above has 3 observations, which store the household code (hhcode), the name of the household head (headname), the household size (hhsize), and the per capita income (incomepc) of 3 households.

Variables (fields; attributes)
Information on the units of study is collected and stored characteristic by characteristic. These characteristics are called variables, or fields. Variables are the columns of the data table. The example above has 4 variables, named hhcode, headname, hhsize, and incomepc. A variable name is 1 to 32 characters long and begins with a letter or an underscore (_). It may contain only letters, digits, and underscores; other special characters cannot be used in variable names.

Identifying variables
Usually some of the variables serve to identify the observations; these are called identifying variables. They make it possible to tell observations apart, because each observation has its own value of these variables. In the example above the identifying variable is hhcode: it takes a different value for each observation.

Characteristics of variables
Variables can be given labels (annotations). For example, the variable hhcode can be labelled "Household code".
A variable is either numeric or string, and each kind has several storage types. Numeric variables can be stored as byte, int, long, float, or double. String variables can be stored as str1 to str80, depending on their length.

Numeric         Size       Minimum           Maximum          Kind
storage type    (bytes)    value             value
byte            1          -127              126              Integer
int             2          -32,767           32,766           Integer
long            4          -2,147,483,647    2,147,483,646    Integer
float           4          -10^36            10^36            Real
double          8          -10^308           10^308           Real

Numeric variables can be discrete or continuous. Variables such as household size, sex of the household head, geographic region, and education level are discrete variables (also called categorical variables). They can be stored as byte, int, or long. Continuous variables, such as household income and expenditure, are stored as float or double.
String variables store text. For example, headname is a string variable that stores the name of the household head.
String          Bytes      Maximum
storage type               length
str1            1          1
str2            2          2
...
str80           80         80
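
To illustrate the storage types above, the following minimal sketch re-creates the three-household example with an explicit type for each variable. The chosen types (int, str15, byte, float) and the label texts are assumptions made for this sketch and are not taken from the VLSS files.

```stata
* Minimal sketch: the 3-household example with explicit storage types.
* The types and labels below are illustrative assumptions.
clear
input int hhcode str15 headname byte hhsize float incomepc
101 "Nguyen Van A"  6 2100
102 "Le Thi B"      5 3210
103 "Tran Van C"   10 1200
end
label variable hhcode   "Household code"
label variable headname "Name of household head"
* describe lists each variable with its storage type and label
describe
```

The describe output should show hhcode as int, headname as str15, hhsize as byte, and incomepc as float, which matches the kinds of variables discussed above (integer codes, text, small counts, continuous amounts).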

2. Starting and exiting Stata (Open and exit)

Stata is started like any other application: double-click the icon of the file wstata.exe in Windows Explorer, or choose Start -> Program -> Stata. To quit, type exit in the Stata Command window, or choose Exit from the File menu.
3. The Stata 7 interface (Stata interface)
Once Stata has started, its interface appears. It consists of the menu bar at the top, the toolbar below it, and the windows.

Note: the Stata 8 interface is similar to that of Stata 7. The main difference is that Stata 8 adds a Statistics item to the menu bar. It lets a number of statistical commands be run through dialog boxes instead of being typed in the Command window.

Stata windows
The Stata windows are opened by choosing the corresponding items in the Windows menu on the menu bar. The windows are:

  Results           Displays commands and their results
  Graph             Displays graphs
  Viewer            Displays help and the contents of text files
  Command           Used to type commands
  Review            Displays the commands already run
  Variables         Displays the list of variables in the data file
  Data editor       Displays and edits the data in table form
  Do-file editor    Displays the window for writing programs

Menu bar

Clicking the menu bar and choosing one of its items makes Stata carry out the corresponding command. The menu bar contains the following groups of commands:

File
  Open                  Open a data file
  View                  View Stata files in the Viewer window
  Save                  Save the data file
  Save as               Save the data file under a new name
  File name             Choose a file name to insert into the Command window
  Log                   Close, open, or review a log file
  Save graph            Save a graph file
  Print graph           Print a graph
  Print results         Print the results
  Exit                  Exit Stata

Edit
  Copy text             Copy text
  Copy tables           Copy tables
  Paste                 Paste
  Table copy options    Options for copying data tables
  Graph copy options    Options for copying graphs (not available in Stata 7)

Prefs
  Options for colors, fonts, and sizes

Windows
  Results               Open the Results window
  Graph                 Open the Graph window
  Log                   Open the log file window
  Viewer                Open the window for help and for viewing file contents
  Command               Open the Command window
  Review                Open the window listing the commands already run
  Variables             Open the window listing the variables in the data file
  Help/Search           Open the help window
  Data editor           Open the window showing the data in table form
  Do-file editor        Open the window for writing programs

Help
  Help on using Stata

Toolbar

The buttons on the toolbar are shortcuts for frequently used Stata commands. Moving the pointer over a button displays a short description. The buttons are:

  Open (use)                      Open a Stata data file
  Save                            Save the data file to disk
  Print results                   Print the contents of the Results window
  Begin log                       Open, close, or view a log file
  Start viewer                    Open the help window
  Bring Dialog Window to front    Bring the dialog window to the front
  Bring Result Window to front    Bring the Results window to the front
  Bring Graph Window to front     Bring the Graph window to the front
  Do-file editor                  Open the window for writing programs
  Data editor                     Open the window for editing data
  Data browser                    Open the window for viewing data
  Clear -more- condition          Dismiss the -more- prompt
  Break                           Stop the command or program that is running

4. Session logs (log file)

When working with Stata, users usually want to keep a record of the session, including the commands, the messages, and the results obtained. Stata records the session with the log using command.
Syntax:
log using (path\filename) [, append replace [ text | smcl ] ]
Options:
  append     Append the session record to an existing file
  replace    Overwrite an existing file with the session record
  text       Create the log in text format (extension .log)
  smcl       Create the log in SMCL format (extension .smcl); this is the default

Example:
log using baitap1
    Creates the file baitap1 in the current directory to record the session; the default extension is smcl.

. log using baitap1
-------------------------------------------------------------------------------
       log:  C:\baitap1.smcl
  log type:  smcl
 opened on:  17 Feb 2004, 15:32:03

log using baitap1, replace
    Creates the file baitap1, overwriting the existing file baitap1.
log using d:\baitap2, text
    Creates the file baitap2 on drive D in text format (extension log).
log using d:\baitap2, append
    Continues recording the session in the file baitap2 on drive D.
Files with the extension smcl can be converted to text files with the translate command.
Example:
translate baitap1.smcl exercise1.log

log off
This command temporarily stops recording the session in the open log/smcl file.
log on
This command resumes recording the session in the open log file. It is used after log using or log off.
log close
This command closes and saves the open log file.
Notes:
- Stata can also record only what the user types in the Command window. This is useful later for writing programs based on the session record. Syntax:
  cmdlog using (path\filename) [, append replace]
  cmdlog {off | on | close}
- To view log/smcl files, use the menu File/Log/View (or type view (filename) in the Command window). They can also be opened with other text editors such as MS-Word or Notepad.
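
The log commands above can be combined into one short session, sketched below. The file name session1 is hypothetical, and the sketch assumes that no log is already open and that the current directory is writable.

```stata
* Minimal sketch of a logged session; session1 is a hypothetical file name.
* Start a text log (session1.log), replacing any earlier copy
log using session1, replace text
display "this line is recorded"
* Pause recording
log off
display "this line is not recorded"
* Resume recording
log on
display "this line is recorded again"
* Close and save the log
log close
```

Choosing the text option here means the resulting file can be opened directly in Notepad without running translate first.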

5. Opening, entering, and saving data (Use, input and save)

Opening an existing data file:
Syntax:
use (path\filename)
This command opens the Stata file, with extension .dta, named by filename.
Example:
use ho1.dta
    Opens the file ho1.dta in the current directory.
use "D:\VHLSS 2004\ho1.dta", clear
    Opens the file ho1.dta in the directory VHLSS 2004 on drive D.

A Stata data file can also be opened with Open in the File menu, or with the Open (use) button on the toolbar.
If the data file is large, the memory available to Stata must first be set with the command:
set memory #[k|m]
Example:
set mem 32m
set mem 32000k

Entering data
There are several ways to enter data from the keyboard into Stata's memory.
- Use the Stata editor window, or type edit in the Command window. Then enter the data in the spreadsheet-like grid of this window.
- Use the command: input [variable list + formats if needed]
  Then type the values of the variables, one observation at a time. Values are separated by a space. Data entry ends with the command end.
Example:
. input hhcode str15 name income

        hhcode             name     income
  1. 101 "Nguyen Van A" 1200
  2. 102 "Nguyen Van B" 1350
  3. 103 "Tran Thi C" 2310
  4. end
Stata can read data from other database files. These files must first be saved as text (for example from Excel), with one observation per line and values separated by commas or by tabs. The insheet command then reads the data into Stata.
Syntax:
insheet [varlist] using (text filename) [, [no]names comma tab clear]
This command reads the observations of the text file into Stata's memory; the variable list gives the names of the variables to be created.
Options:
  [no]names    States whether variable names are given in the first line of the text file
  comma        States that the values in the text file are separated by commas
  tab          States that the values in the text file are separated by tabs
  clear        The data read in replace the data currently in Stata's memory

Example:
. insheet using c:\income.txt
(3 vars, 4 obs)
. insheet maho hoten thunhap using c:\income.txt
(note: variable names in file ignored)
(3 vars, 4 obs)
Saving data
Syntax:
save (path\filename) [, replace]
This command saves the data in Stata's memory to the file named by filename. If the replace option is specified, the data overwrite the existing file of the same name.
Data can also be saved with Save and Save as in the File menu, or with the Save button on the toolbar.
Note: see also the infile and outfile commands.
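
The input, save, and use commands described in this section fit together as in the following minimal sketch. The file name toy_income and the two observations are made up for illustration; the file is written to the current directory.

```stata
* Minimal sketch: enter two observations, save them, and read them back.
* toy_income is a hypothetical file name.
clear
input hhcode income
101 1200
102 1350
end
* replace lets the command be rerun without an "already exists" error
save toy_income, replace
clear
* The .dta extension is added automatically
use toy_income
list
```

The clear between save and use empties memory, so the final list shows that the data really come from the saved file.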

Chapter II: Working with data

1. Stata command syntax

The basic structure of a Stata command is, in words:
[by variable list:] command [variable list] [expression] [condition] [range] [weight] [, options]
In Stata's Help, the same syntax is written as:
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
Square brackets denote optional elements.


Notes:
- Stata commands are written in lower case.
- Variable names are case-sensitive. For example, within one data file, the variables Ho_ten and ho_ten are two different variables.
- Optional elements are written in square brackets [ ]. They may or may not appear in a command. Required arguments (such as variable names) are written in angle brackets < >. A command will not run if its required arguments are missing.
- Some Stata commands can be abbreviated. For example, summarize can be abbreviated to sum. In this document, the underlined part of a command's syntax is its abbreviation.
- The examples in this document use data from the 1998 Living Standards Survey conducted by the General Statistics Office. The aggregate expenditure file Hhexp98n.dta is used most often.

by variable list (by varlist): Stata runs the command separately for each group of observations defined by the values of the variables in varlist. The data must be sorted by these variables before the command is run.
Example:
. sort sex
. by sex: sum rlpcex1

-> sex = 1
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    4375    2980.906    2430.648   357.318   45801.71

-> sex = 2
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    1624    3748.368    3231.241  376.9805   30624.77

. sort sex urban98
. by sex urban98: sum rlpcex1

-> sex = 1, urban98 = Rural
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    3344    2308.134    1345.671   357.318   24386.43

-> sex = 1, urban98 = Urban
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    1031     5163.01    3602.245  682.9575   45801.71

-> sex = 2, urban98 = Rural
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |     925    2553.448    1776.178  376.9805   25527.95

-> sex = 2, urban98 = Urban
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |     699    5329.628    3962.946  1057.797   30624.77
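
The sort step and the by prefix can also be written as one command. The sketch below uses a made-up four-row dataset (the variable pcexp is hypothetical) rather than the VLSS file, so that it runs on its own.

```stata
* Minimal sketch: bysort sorts the data and applies the by prefix in one step.
* The data and the variable name pcexp are illustrative assumptions.
clear
input sex pcexp
1 10
2 30
1 20
2 50
end
* Equivalent to:  sort sex   followed by   by sex: summarize pcexp
bysort sex: summarize pcexp
```

The same command can be written as by sex, sort: summarize pcexp. Either form avoids the "not sorted" error that by gives when the sort step is forgotten.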

Variable list (varlist)

Lists the variables on which the command acts. If no variable is specified, the Stata command acts on all variables.
Example:
. sum hhsize sex reg7

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292    1.954292         1         19
         sex |    5999    1.270712    .4443645         1          2
        reg7 |    5999     4.01917    2.145305         1          7

. sum

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    househol |    5999    19617.86    11201.92       101      38820
        year |    5999    97.94666    .2247337        97         98
       month |    5999    6.340723    3.011082         1         12
--Break--
r(1);

This sum command displays basic statistics for all the variables in the data file (in the output above it was interrupted with Break after the first three).
Condition (if exp)
Stata runs the command only on the observations for which the expression is true.
Example:
. sum poor if reg7==1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |     859    .4982538    .5002882         0          1

This command acts only on the observations for which reg7 equals 1.
Range (in range)
Specifies the range of observations on which the command acts. The range can take the following forms:

sum poor in 10
    Computes the mean of poor for observation 10 (which is simply the value of poor in the 10th observation)
sum poor in 10/100
    Computes the mean of poor for observations 10 to 100
sum poor in f/100
    Computes the mean of poor from the first observation to observation 100
sum poor in 100/l
    Computes the mean of poor from observation 100 to the last observation
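
The if and in qualifiers can be combined in one command. The sketch below builds a small made-up dataset of ten observations (x holds the squares 1, 4, 9, ..., 100) to show how the two restrictions interact; none of these names come from the VLSS files.

```stata
* Minimal sketch combining if and in on a made-up dataset.
clear
set obs 10
generate id = _n
generate x = id^2
* Observations 3 to 5 only (x = 9, 16, 25)
summarize x in 3/5
* Among the first 8 observations, keep those with x above 20
summarize x if x > 20 in f/8
* Counts the same subset: x = 25, 36, 49, 64, so the result is 4
count if x > 20 in f/8
```

The in range is applied by position in the current sort order, so sorting the data differently changes which observations in f/8 refers to, while the if condition does not depend on the order.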

Weight (weight)
Allows calculations to use weights. Weights are described in detail in section 5 of this chapter.

Options (Options)
Many Stata commands have their own options. Options are given after a comma.
Example:
The sum command has the option detail, which computes a number of statistics in addition to the mean and the standard deviation.
. sum rlpcex1, detail

               comp.M&Reg price adj.pc tot exp
-------------------------------------------------------------
      Percentiles      Smallest
 1%     682.9575        357.318
 5%     1012.433       366.2792
10%     1238.088       376.9805       Obs                5999
25%     1671.054       381.3502       Sum of Wgt.        5999

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398

Notes:
- Stata allows commands and options to be abbreviated. In this document, the underlined part of a command is the abbreviation that may be typed in its place. For example, the command use can be abbreviated to u.
- The syntax of commands in this document is written in English, so that readers can compare it with the help pages in Stata.

2. Operators and functions

Operators
Stata operators are written as follows:

Symbol    Meaning

Arithmetic
+         Addition
-         Subtraction
*         Multiplication
/         Division
^         Power

Relational
>         Greater than
<         Less than
>=        Greater than or equal to
<=        Less than or equal to
==        Equal to
~=        Not equal to
!=        Not equal to

Logical
~         Not
|         Or
&         And

Note:
In expressions, the double equals sign == tests for equality, for example after if. The single equals sign = assigns a value, as in the commands that create variables.
Example:
gen RRD=0
replace RRD=1 if reg8==1
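
The difference between = and ==, and the logical operators, can be seen together in the following minimal sketch. The dataset and the variable names (reg, urban, and the three indicators) are made up for illustration.

```stata
* Minimal sketch: = assigns, == compares; &, | and ~ combine conditions.
* All data and variable names here are illustrative assumptions.
clear
input reg urban
1 1
1 0
2 1
3 0
end
* A comparison in parentheses evaluates to 1 (true) or 0 (false)
generate byte reg1_urban = (reg == 1 & urban == 1)
generate byte reg1or2    = (reg == 1 | reg == 2)
generate byte not_reg1   = ~(reg == 1)
list
```

Writing the comparison inside parentheses gives 0 for observations that fail the test, whereas the gen/replace pair in the example above reaches the same result in two steps.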
Functions
Functions are normally used in the expression (exp) of a Stata command. If Y is regarded as a function f(X1, X2, ..., Xn), a Stata function computes the value of Y from given values of the Xi. Stata has 8 types of functions:
  Mathematical functions
  Statistical functions
  Random numbers
  String functions
  Special functions
  Date functions
  Time-series functions
  Matrix functions
Example:
gen absx=abs(x)
gen log_exp=log(rlpcex1)
The notation for each of these functions can be found under help functions.
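
A few mathematical functions applied to a made-up variable are sketched below. The data are illustrative only. Note that log() in Stata is the natural logarithm (ln() is a synonym).

```stata
* Minimal sketch of some mathematical functions on made-up data.
clear
input x
-4
2.5
100
end
* Absolute value
generate absx  = abs(x)
* Square root of the absolute value
generate rootx = sqrt(absx)
* Natural logarithm of the absolute value
generate lnx   = log(absx)
* Integer part (truncates toward zero)
generate intx  = int(x)
list
```

Taking abs(x) first avoids missing values, since sqrt() and log() of a negative number return missing.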
3. Describing data (Data reporting)
3.1. Clearing Stata's memory
Syntax:
clear
This command removes the data from Stata's memory, ready for work on a new file.
3.2. Help on Stata commands
Syntax:
help <Stata command>
This command displays the help for a Stata command. The command name must be typed in full and exactly.
Example:
. help sum
help for sum not found
try help contents or search sum
. help summarize
-------------------------------------------------------------------------------
help for summarize                                     (manual:  [R] summarize)
-------------------------------------------------------------------------------
Summary statistics
.
Note:
Help can also be searched by keyword with the search command. The search command is also available through Search in the Help menu.
3.3. Describing data
Syntax:
describe [varlist]
This command displays general information, such as the name, format, and label, for the variables in varlist in the open data file. If no variable is specified, describe displays the information for all variables.
Example:
. des househol year month vlssmphs

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
househol        long   %12.0g                 household code
year            float  %9.0g                  Year of interview
month           float  %9.0g                  Month of interview
vlssmphs        byte   %8.0g                  1 if vlss, 2 if mphs source

3.4. Listing the values of variables

Syntax:
list [varlist] [if exp] [in range] [, nolabel]
This command displays the values of the variables in varlist. The nolabel option displays the numeric values rather than the value labels.
Example:
. list househol farm in 1/5

       househol       farm
  1.      36307       farm
  2.      28002       farm
  3.      36017       farm
  4.      32418   non farm
  5.      15215   non farm

. list househol farm in 1/5, nolabel

       househol       farm
  1.      36307          1
  2.      28002          1
  3.      36017          1
  4.      32418          0
  5.      15215          0

3.5. Displaying strings and expressions

Syntax:
display ["string"] [expression]
This command displays a string or the value of an expression.
Example:
. dis "So lieu VLSS 1998"
So lieu VLSS 1998

. dis 120*100/30
400
3.6. Editing and viewing data
Syntax:
edit   [varlist] [if exp] [in range] [, nolabel]
browse [varlist] [if exp] [in range] [, nolabel]

The edit command opens the Data editor window, in which the user can edit and enter data. The nolabel option displays the numeric values rather than the value labels. The same window can be opened with Data editor in the Windows menu.
The browse command is like edit but does not allow the data to be changed.
3.7. Counting observations
Syntax:
count [if exp] [in range]
This command counts the observations that satisfy the condition (exp) and fall within the range. If neither a condition nor a range is specified, it displays the number of observations in the data file.
Example:
. count
5999
. count if reg7==1
859
. count if reg7==1 & urban98==1
187
. count if reg7==1 & urban98==0
672
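
In the example above the urban and rural counts for region 1 add up to the regional total (187 + 672 = 859). The sketch below shows how such a check can be done with r(N), the result that count leaves behind. The four-row dataset and its variable names are made up for illustration.

```stata
* Minimal sketch: reuse the result of count through r(N).
* The data and variable names are illustrative assumptions.
clear
input reg urban
1 1
1 0
1 0
2 1
end
count if reg == 1
display r(N)
count if reg == 1 & urban == 1
* Keep the first count before the next command overwrites r(N)
local n_urban = r(N)
count if reg == 1 & urban == 0
* Urban plus rural should equal the count for reg == 1
display `n_urban' + r(N)
```

Storing r(N) in a local macro is needed because each new count replaces the previous stored result.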
3.8. Basic statistics
Syntax:
summarize [varlist] [weight] [if exp] [in range] [, detail]

This command computes and displays basic statistics for the variables in varlist. The detail option displays additional statistics such as the kurtosis, the skewness, and the percentiles.
Example:
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667    2692.567   357.318   45801.71

. sum rlpcex1, detail

               comp.M&Reg price adj.pc tot exp
-------------------------------------------------------------
      Percentiles      Smallest
 1%     682.9575        357.318
 5%     1012.433       366.2792
10%     1238.088       376.9805       Obs                5999
25%     1671.054       381.3502       Sum of Wgt.        5999

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398
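
The statistics shown by summarize are also kept as stored results, which can be used in later calculations. The sketch below uses a made-up variable x and computes a coefficient of variation as an illustration; that statistic is not part of the original example.

```stata
* Minimal sketch: stored results of summarize on made-up data.
clear
input x
2
4
6
8
end
summarize x
display "mean = " r(mean)
* Coefficient of variation, computed from the stored results
display "cv   = " r(sd)/r(mean)
* With detail, the percentiles are stored as well
summarize x, detail
display "median = " r(p50)
```

The command return list, typed after summarize, shows every stored result that is available.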

3.9. General information on variables

Syntax:
inspect [varlist] [if exp] [in range]
This command summarizes the values of numeric variables. It reports the numbers of negative, positive, integer, and missing values of each variable.
Example:
. gen x=invnorm(uniform())
. inspect x

x:                                   Number of Observations
---                                                   Non-
                                     Total  Integers  Integers
|      #              Negative        2964         -      2964
|      #              Zero               -         -         -
|      #              Positive        3035         -      3035
|   #  #  #                          -----     -----     -----
| . #  #  # .         Total           5999         -      5999
+----------------     Missing            -
-3.918931  3.641588                  -----
                                      5999
(More than 99 unique values)

Note: see also the codebook command.


3.10. Frequency tables
One-way frequency tables
Syntax:
tabulate <varname> [weight] [if exp] [in range] [, missing nolabel]
tab1 <varlist> [weight] [if exp] [in range] [, missing nolabel]

The tabulate command creates a one-way frequency table of the specified variable. It accepts only one variable for this purpose; if two variables are given, Stata produces a two-way frequency table instead. The tab1 command produces a separate one-way table for each variable in varlist.
Options:
  missing    Observations with missing values are treated as a category of their own
  nolabel    Displays the numeric values of the variable rather than its value labels

Example:
. tab sex

  Gender of |
    HH.head |
  (1:M;2:F) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4375       72.93       72.93
          2 |       1624       27.07      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab1 urban98 reg7

-> tabulation of urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

-> tabulation of reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

Two-way frequency tables

Syntax:
tabulate <varname1> <varname2> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
tab2 <varlist> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
The tabulate command computes and displays the two-way frequency table of the two specified variables. The tab2 command creates a two-way frequency table for each pair of variables in varlist.
Example:
. tab urban98 farm

   1:urban | Type of HH (1:farm;
       98; |      0:nonfarm)
0:rural 98 |  non farm       farm |     Total
-----------+----------------------+----------
     Rural |      1021       3248 |      4269
     Urban |      1540        190 |      1730
-----------+----------------------+----------
     Total |      2561       3438 |      5999

Options:
  chi2       Tests the hypothesis that the two variables are independent
  missing    Observations with missing values are treated as a category of their own
  nofreq     Does not display the frequencies
  cell       Displays the relative frequency (%) of each cell
  column     Displays the relative frequency (%) of each cell within its column
  row        Displays the relative frequency (%) of each cell within its row

Example:
. tab reg7 urban98, cell nof

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |     11.20       3.12 |     14.32
   region2 |     13.05       6.53 |     19.59
   region3 |     10.00       1.80 |     11.80
   region4 |      8.37       4.20 |     12.57
   region5 |      6.13       0.00 |      6.13
   region6 |      8.57       8.48 |     17.05
   region7 |     13.84       4.70 |     18.54
-----------+----------------------+----------
     Total |     71.16      28.84 |    100.00

. tab farm urban98, column row

Type of HH | 1:urban 98; 0:rural
  (1:farm; |          98
0:nonfarm) |     Rural      Urban |     Total
-----------+----------------------+----------
  non farm |      1021       1540 |      2561
           |     39.87      60.13 |    100.00
           |     23.92      89.02 |     42.69
-----------+----------------------+----------
      farm |      3248        190 |      3438
           |     94.47       5.53 |    100.00
           |     76.08      10.98 |     57.31
-----------+----------------------+----------
     Total |      4269       1730 |      5999
           |     71.16      28.84 |    100.00
           |    100.00     100.00 |    100.00
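
The chi2 option is listed above but not shown in an example. The following minimal sketch rebuilds the farm by urban table from the four cell counts printed above and adds the test. Entering the counts by hand with a frequency weight (the variable n) is an assumption of this sketch; with the survey file itself one would simply add chi2 to the original command.

```stata
* Minimal sketch: two-way table with a chi-squared test of independence.
* The four cell counts are copied from the farm by urban98 table above;
* n is a helper variable introduced for this sketch.
clear
input farm urban n
0 0 1021
0 1 1540
1 0 3248
1 1  190
end
* fweight treats each row as n identical observations
tabulate farm urban [fweight = n], chi2 row
```

A small p-value reported under the table would indicate that household type and urban/rural location are not independent.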

3.11. Summary tables with tabulate, summarize

Syntax:
tabulate <varname1> <varname2> [weight] [if exp] [in range], summarize(varname3) [means standard freq missing]
This command creates a one-way or two-way table defined by variable 1 and variable 2, in which each cell reports the mean, the standard deviation, and the frequency of variable 3.
Example:
. tab farm urban98, sum(poor)

        Means, Standard Deviations and Frequencies of poor

Type of HH | 1:urban 98; 0:rural
  (1:farm; |          98
0:nonfarm) |     Rural      Urban |     Total
-----------+----------------------+----------
  non farm |  .2791381  .06168831 | .14837954
           | .44879538  .24066673 | .35554523
           |      1021       1540 |      2561
-----------+----------------------+----------
      farm | .42302956  .12105263 |  .4063409
           |  .4941161  .32705022 | .49122109
           |      3248        190 |      3438
-----------+----------------------+----------
     Total |  .3886156  .06820809 | .29621604
           | .48749275  .25217555 | .45662551
           |      4269       1730 |      5999

Options:
  means       Displays only the means
  standard    Displays only the standard deviations
  freq        Displays only the frequencies
  missing     Observations with missing values are treated as a category of their own

Example:
. replace poor=poor*100
(1777 real changes made)
. format poor %4.2f
. tab reg7 urban98, sum(poor) means

                 Means of poor

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |     61.46       8.02 |     49.83
   region2 |     32.57       5.87 |     23.66
   region3 |     44.83      10.19 |     39.55
   region4 |     37.25      11.51 |     28.65
   region5 |     47.28          . |     47.28
   region6 |     12.45       2.16 |      7.33
   region7 |     35.78      10.28 |     29.32
-----------+----------------------+----------
     Total |     38.86       6.82 |     29.62

3.12. Summary tables with tabstat

Syntax:
tabstat <varlist> [weight] [if exp] [in range] [, statistics(statname [...]) by(varname) missing format[(%fmt)]]
This command computes statistics for the variables in varlist, for each value of the categorical variable given in by(varname).
Example:
. tabstat rlfood rlhhex1, stats(mean median) by(reg7)

Summary statistics: mean, p50
  by categories of: reg7 (Code by 7 regions)

    reg7 |    rlfood   rlhhex1
---------+--------------------
 region1 |  5595.556  9560.349
         |  5350.916  8536.373
---------+--------------------
 region2 |  6419.427  12951.14
         |  5664.145  9997.146
---------+--------------------
 region3 |  5692.201  10885.38
         |  5369.411  9022.334
---------+--------------------
 region4 |  6512.576  13525.41
         |  5790.046  11077.51
---------+--------------------
 region5 |  5894.983  11217.05
         |  5380.505  9421.447
---------+--------------------
 region6 |  9746.158  23515.01
         |  8428.743  18514.39
---------+--------------------
 region7 |  6556.616  13068.11
         |  6066.128  11043.99
---------+--------------------
   Total |  6787.898  14010.74
         |  5951.567  10733.19
------------------------------

Options:
  statistics(statname [...])    Specifies the statistics to compute for the variables in varlist
  by(varname)                   Specifies the categorical variable
  missing                       Missing values of the categorical variable are treated as a category of their own
  format[(%fmt)]                Specifies the display format of the results

The statistics that can be named in statistics(statname [...]) are:

  Statname    Meaning
  mean        Mean
  count       Number of observations
  n           Same as count (number of observations)
  sum         Sum
  max         Maximum
  min         Minimum
  range       Range = maximum - minimum
  sd          Standard deviation
  sdmean      Standard deviation of the mean = standard deviation / (number of observations)^0.5
  skewness    Skewness of the distribution
  kurtosis    Kurtosis
  median      Median (same as p50)
  p1          1st percentile
  p5          5th percentile
  p10         10th percentile
  p25         25th percentile
  p50         50th percentile (median)
  p75         75th percentile
  p90         90th percentile
  p95         95th percentile
  p99         99th percentile
  iqr         Interquartile range = p75 - p25
  q           Equivalent to "p25 p50 p75"

Example:
. tabstat rlpcex1, stats(mean sd q) by(reg7) format(%5.1f)

Summary for variables: rlpcex1
     by categories of: reg7 (Code by 7 regions)

    reg7 |      mean        sd       p25       p50       p75
---------+--------------------------------------------------
 region1 |    2174.8    1265.1    1328.0    1792.1    2710.8
 region2 |    3294.0    2511.9    1816.7    2532.5    3822.0
 region3 |    2503.3    1918.0    1489.7    2001.2    2808.1
 region4 |    2933.7    2260.5    1697.9    2362.2    3471.4
 region5 |    2087.3    1285.4    1217.3    1850.8    2700.5
 region6 |    5257.5    4005.7    2676.7    4154.1    6431.8
 region7 |    2931.1    2137.2    1680.1    2321.9    3414.7
---------+--------------------------------------------------
   Total |    3188.7    2692.6    1671.1    2397.0    3711.9
------------------------------------------------------------
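
For readers without the survey file, the same kind of table can be tried on a small made-up dataset. The variables reg and x below are hypothetical; only the statistics mean, sd, and n are requested.

```stata
* Minimal sketch: tabstat by group on made-up data.
clear
input reg x
1 10
1 20
2 30
2 50
end
* One row of statistics per value of reg, plus a Total row
tabstat x, statistics(mean sd n) by(reg)
```

With two observations per group, the group means (15 and 40) are easy to check by hand against the table.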

3.13. Summary tables with table

Syntax:
table <row variable> [column variable [supercolumn variable]] [if exp] [in range] [weight] [, contents(clist) row col format(%fmt) missing]
This command computes the statistics named in contents and arranges them in a table whose rows are defined by the row variable and whose columns are defined by the column variable (and the supercolumn variable). The row and column variables are categorical variables.
Example:
. table reg7 urban98 farm, contents(mean poor)

----------------------------------------------------
          |   Type of HH (1:farm; 0:nonfarm) and
          |         1:urban 98; 0:rural 98
Code by 7 | ---- non farm ----    ------ farm ------
regions   |    Rural     Urban       Rural     Urban
----------+-----------------------------------------
  region1 | 19.35484  6.015038     65.7377  12.96296
  region2 | 26.66667  4.624278    33.96524  15.21739
  region3 | 40.98361  10.11236     45.8159  10.52632
  region4 |     21.6  11.63793    42.44032        10
  region5 | 30.76923              49.24012
  region6 | 15.04065  2.195609    10.07463         0
  region7 | 38.62816  10.04184    34.35805  11.62791
----------------------------------------------------

Options:
  contents(clist)    Lists the variables and the statistics to compute. The statistic names are similar to those of tabstat
  row                Displays totals over the rows
  col                Displays totals over the columns
  format(%fmt)       Specifies the display format of the results
  missing            Missing values of the categorical variables are treated as a category of their own

Example:
. table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

------------------------------------------------------
          | Type of HH (1:farm; 0:nonfarm) and 1:urban
          |              98; 0:rural 98
Code by 7 | ----- non farm -----    ------- farm ------
regions   | Rural  Urban  Total     Rural  Urban  Total
----------+-------------------------------------------
  region1 | 19.35   6.02  10.26     65.74  12.96  61.45
  region2 | 26.67   4.62  11.29     33.97  15.22  32.70
  region3 | 40.98  10.11  27.96     45.82  10.53  44.47
  region4 | 21.60  11.64  15.13     42.44  10.00  40.81
  region5 | 30.77         30.77     49.24         49.24
  region6 | 15.04   2.20   6.43     10.07   0.00   9.78
  region7 | 38.63  10.04  25.39     34.36  11.63  32.72
          |
    Total | 27.91   6.17  14.84     42.30  12.11  40.63
------------------------------------------------------

. table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

----------------------------------------
1:urban   |
98;       |     Type of HH (1:farm;
0:rural   |         0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |    27.91     42.30     38.86
          |    44.88     49.41     48.75
          |
    Urban |     6.17     12.11      6.82
          |    24.07     32.71     25.22
          |
    Total |    14.84     40.63     29.62
          |    35.55     49.12     45.66
----------------------------------------

. table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col format(%4.2f)

----------------------------------------
1:urban   |
98;       |     Type of HH (1:farm;
0:rural   |         0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |  2835.83   2212.12   2361.29
          | 13242.03  10120.89  10867.36
          |
    Urban |  5476.86   3232.17   5230.33
          | 22984.44  11903.19  21767.43
          |
    Total |  4423.95   2268.49   3188.67
          | 19100.41  10219.39  14010.74
----------------------------------------

4. Editing and modifying data (Data manipulation)

4.1. Creating new variables
Creating variables with generate
Syntax:
generate <new variable> = exp [if exp] [in range]
This command creates a new variable whose values are those of the specified expression.
Example:
. gen poor = 1 if rlpcex1 < 1790
(4222 missing values generated)
. gen nonpoor=1 if rlpcex1 >= 1790
(1777 missing values generated)
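
In the example above, poor is 1 for poor households and missing for all others, which is why 4222 missing values are generated. A 0/1 indicator is often more convenient for means and tabulations. The sketch below shows one way to build it on made-up expenditure values; the threshold 1790 is taken from the example, the four data rows are not from the survey.

```stata
* Minimal sketch: a 0/1 indicator instead of a 1/missing variable.
* The four expenditure values are illustrative assumptions.
clear
input rlpcex1
1500
1790
2500
.
end
* (rlpcex1 < 1790) is 1 when true and 0 when false;
* the if qualifier keeps poor missing where rlpcex1 itself is missing
generate byte poor = (rlpcex1 < 1790) if rlpcex1 < .
tabulate poor, missing
```

With this coding the mean of poor is directly the poverty rate, as in the sum poor examples earlier in the chapter.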
Creating dummy variables with tabulate, generate
Syntax:
tabulate <categorical variable>, generate(new variable)
The generate option can be combined with tab to create dummy variables. The new variables are named new variable 1, new variable 2, new variable 3, and so on. They are the dummy variables derived from the categorical variable.
Example:

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab1 region1 region2

-> tabulation of region1

reg7==regio |
         n1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       5140       85.68       85.68
          1 |        859       14.32      100.00
------------+-----------------------------------
      Total |       5999      100.00

-> tabulation of region2

reg7==regio |
         n2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4824       80.41       80.41
          1 |       1175       19.59      100.00
------------+-----------------------------------
      Total |       5999      100.00

y bin reg7 c 7 gi tr t 1 n 7 t-ng ng vi 7 bin gi


t region1 n region7 s -c to ra. Bin region1 nhn gi tr
bng 1 nu nh- bin reg7 nhn gi tr 1, nu khng th bng 0.
T-ng t bin region7 nhn gi tr 1 nu nh- bin reg7 bng 7.
v d trn lnh tabulategenerate t-ng -ng vi 7 lnh sau:
gen region1=(reg7==1)
gen region2=(reg7==2)

gen region7=(reg7==7)
Creating a variable with the egen command
Syntax:
egen <new variable> = fcn(arguments) [if condition] [in range] [, by(varlist)]
This command creates a new variable from the function named by fcn.
For functions that summarise over observations, the new variable
holds the same value for every observation (or for every observation
within a by() group). The function can be:
count(exp)     Number of non-missing observations of the expression
mean(exp)      Mean of the expression
median(exp)    Median of the expression
sd(exp)        Standard deviation of the expression
For other functions, see help egen.


Example:
. egen sumexp=sum(rlpcex1)
. sum sumexp
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
      sumexp |    5999    1.91e+07          0   1.91e+07   1.91e+07
. egen g=median( food+ nonfood1)
. sum g
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
           g |    5999     11063.6          0    11063.6    11063.6
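
The egen syntax lists a by() option that the examples above do not
use. The sketch below shows it on a small hypothetical household file
(the variable names ma_ho and thunhap follow the collapse example in
section 4.7; the values are made up): each member's row receives the
mean income and the member count of its own household.

```stata
clear
input ma_ho thunhap
101  200
101 1200
102 3200
102  200
102 1100
end
* Household mean income, repeated on every member's row
egen tb_ho = mean(thunhap), by(ma_ho)
* Number of non-missing income values in each household
egen so_tv = count(thunhap), by(ma_ho)
list
```

Unlike collapse, this keeps one row per member, which is convenient
when household totals are needed alongside individual data.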

Replacing the values of a variable
Syntax:
replace <variable> = expression [if condition] [in range]
This command replaces the values of an existing variable with the new
values given by the expression.
Example:
replace poor=poor*100
replace pcexp = hhexp/hhsize
Creating a categorical variable with the encode command
Syntax:
encode <variable> [if condition] [in range], generate(<new variable>)
This command creates a new numeric categorical variable from the
values of the given string variable. Codes are assigned in
alphabetical order of the string values.
Example:
. gen str15 mucsong = "Kha"
. drop mucsong
. gen mucsong="Rat ngheo"
type mismatch
r(109);
. gen str15 mucsong="Rat ngheo"
. replace mucsong="Ngheo" if rlpcex1<1790 & rlpcex1>1290
(1087 real changes made)
. replace mucsong="Khong ngheo" if rlpcex1>=1790
(4222 real changes made)
. tab mucsong
        mucsong |      Freq.     Percent        Cum.
----------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
----------------+-----------------------------------
          Total |       5999      100.00
. sum mucsong
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     mucsong |       0
. encode mucsong, gen(ma_ms)
. tab ma_ms
      ma_ms |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1087       18.12       88.50
  Rat ngheo |        690       11.50      100.00
------------+-----------------------------------
      Total |       5999      100.00
. sum ma_ms
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
       ma_ms |    5999    1.411235   .6871957          1          3

Creating a variable with the xtile command
Syntax:
xtile <new variable> = expression [weight] [if condition] [in range] [,
nquantiles(#)]
This command creates a variable that groups the expression by
quantile, where nquantiles(#) gives the number of quantiles.

Example: create expenditure quintiles

. xtile quinexp= rlpcex1, nq(5)
. tab quinexp
5 quantiles |
 of rlpcex1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       1200       20.00       20.00
          2 |       1200       20.00       40.01
          3 |       1200       20.00       60.01
          4 |       1200       20.00       80.01
          5 |       1199       19.99      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab quinexp, sum( rlpcex1)
            | Summary of comp.M&Reg price adj.pc
5 quantiles |              tot exp
 of rlpcex1 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   1184.3975   261.20537        1200
          2 |   1803.6331   151.66604        1200
          3 |   2408.4867    211.5407        1200
          4 |   3390.1065   403.08913        1200
          5 |    7160.021   3690.3672        1199
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999

4.2. Renaming variables
Syntax:
rename <old variable name> <new variable name>
This command changes the name of a variable from the old name to the
new name.
Example:
rename poor nguoingheo
rename rpcexp1 chitieu
4.3. Dropping variables and observations
Syntax:
drop <varlist>            Deletes the variables in the list.
drop if <condition>       Deletes the observations that satisfy the
                          condition.
drop in <range> [if <condition>]
                          Deletes the observations in the range (and,
                          if given, satisfying the condition).
keep <varlist>            Keeps the variables in the list; all other
                          variables are deleted.
keep if <condition>       Keeps the observations that satisfy the
                          condition; all others are deleted.
keep in <range> [if <condition>]
                          Keeps the observations in the range (and, if
                          given, satisfying the condition); all others
                          are deleted.
Example:
drop poor urban98         Deletes the two variables poor and urban98
drop if sex==1            Deletes observations where sex equals 1
drop in 1/20              Deletes observations 1 to 20
keep househol             Keeps only the variable househol; all other
                          variables are deleted
keep in f/50              Keeps the first through 50th observations;
                          all others are deleted
4.4. Recoding the values of a categorical variable
Syntax:
recode <variable name> old value = new value [if condition] [in range]
This command changes the values of a categorical variable according
to the rules listed after the variable name.
Example:
. recode sex 0=1
(0 changes made)
. recode sex . = 0
(0 changes made)
. recode hhsize 1/5=1 6/10 = 2 * = 3
(5785 changes made)
. tab hhsize
  Household |
       size |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4164       69.41       69.41
          2 |       1786       29.77       99.18
          3 |         49        0.82      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab urban98
1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. recode urban98 0=1 1=0
(5999 changes made)
. tab urban98
1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       1730       28.84       28.84
      Urban |       4269       71.16      100.00
------------+-----------------------------------
      Total |       5999      100.00

4.5. Labelling variables

Labelling a variable
Syntax:
label variable <variable name> "variable label"
This command attaches a label, given as a character string, to a
variable.
Example:
. gen ngheo=poor
. des ngheo
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g
. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. label var ngheo "Nguoi co thu nhap duoi chuan ngheo"
. tab ngheo
   Nguoi co |
   thu nhap |
 duoi chuan |
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. des ngheo
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g                  Nguoi co thu nhap duoi chuan
                                                ngheo

Attaching value labels to a categorical variable

label define <label name> # "label" [# "label" ...] [, add
modify]
label dir
label list <label name>
label drop {label name [label name ...] | _all}
label values <variable name> [label name]
The label define command assigns labels to a set of numeric values.
The name of the label set follows the keyword define, # is a numeric
value, and "label" is the character string that corresponds to that
value. There are two options: add appends new values and their labels
to an existing label set, and modify changes the values and labels of
an existing label set.
The label dir command lists the label sets that exist, label list
displays the contents of the named label set, and label drop deletes
existing label sets.
Example:
Create a label set named nngheo in which the value 1 means poor and
0 means non-poor.
. label define nngheo 0 "Khong ngheo" 1 "Ngheo"
. label dir
nngheo
region
loaiho
diploma
urban
agegroup
. label list nngheo
nngheo:
           0 Khong ngheo
           1 Ngheo
. label drop _all
. label dir

The label values command attaches the labels of a label set to the
numeric values of a categorical variable.
Example:
. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. list ngheo in 1/5
         ngheo
  1.         1
  2.         0
  3.         1
  4.         1
  5.         0

. label values ngheo nngheo
. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. list ngheo in 1/5
               ngheo
  1.           Ngheo
  2.     Khong ngheo
  3.           Ngheo
  4.           Ngheo
  5.     Khong ngheo
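
The define-then-attach sequence can be tried on a few made-up
observations. The sketch below assumes, as in the text, that 1 means
poor ("Ngheo") and 0 means non-poor ("Khong ngheo"); the three data
values are hypothetical.

```stata
clear
input ngheo
1
0
1
end
* Define the label set, then attach it to the variable
label define nngheo 0 "Khong ngheo" 1 "Ngheo"
label values ngheo nngheo
* The stored values are still 0 and 1; only the display changes
tab ngheo
list
```

Because only the display changes, conditions such as "if ngheo==1"
continue to work after the labels are attached.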

4.6. Sorting data

Syntax:
sort <varlist> [in range]
gsort [+|-]variable name [[+|-]variable name [...]]
The sort command sorts the observations in ascending order of the
values of the variables in the list.
The gsort command sorts the observations in ascending order of a
variable if it is preceded by + (the default), or in descending order
if it is preceded by -.
Example:
sort reg7 hhsize       Sorts the observations in ascending order of
                       the region variable reg7; within each region,
                       observations are sorted in ascending order of
                       household size hhsize.
gsort reg7 -hhsize     Sorts the observations in ascending order of
                       the region variable reg7, but within each
                       region observations are sorted in descending
                       order of household size hhsize.
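
A small sketch of the mixed ordering described for gsort, on four
hypothetical observations (the variable names reg7 and hhsize are
those of the example; the values are made up):

```stata
clear
input reg7 hhsize
2 4
1 3
1 6
2 7
end
* Ascending by region, descending by household size within region
gsort reg7 -hhsize
list
```

After the sort, the largest household of region 1 is listed first and
the smallest household of region 2 last.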

4.7. Combining data

Collapsing data: the collapse command
Syntax:
collapse <statistic expressions> [weight] [if condition] [in range]
[, by(varlist)]
where:
The statistic expressions are a list of statistics and the variables
they apply to. The statistics are written as in section 3.12 of this
chapter.
The collapse command creates a new dataset containing the listed
variables, with values computed by the corresponding statistics. The
observations of the old dataset are grouped by the values of the
variables named in by(varlist).
Example:
We have a data file on the income and expenditure of the members of
each household:
ma_tv   ma_ho   thunhap   chitieu
    1     101       200       500
    2     101      1200       400
    3     101         0       200
    4     101         0       200
    1     102      3200       500
    2     102      1200       320
    3     102       200       200
    1     103       300       500
    2     103      2100       250
    3     103         0       300
    4     103         0       300
    1     104      4300       800
    2     104      3500       500
    3     104       300       500
    4     104         0       300
    5     104         0       200
    6     104         0       200

We use the collapse command to create a file of average income and
expenditure for each household, and add a variable for household
size.
. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)

The new dataset looks like this:
ma_ho   thunhap   chitieu   quimo
  101       350       325       4
  102   1533.33       340       3
  103       600     337.5       4
  104      1350   416.667       6

Joining data: the merge command

Syntax:
merge [varlist] using <file name> [, update replace]

The merge command joins the observations of the dataset currently
open in Stata (the master dataset) with the corresponding observations
of another dataset named after the keyword using (the using dataset)
into one new dataset. The variables in varlist are called identifying
variables, and both datasets must be sorted on them with the sort
command (or gsort) before merge is run.
Example:
We have the following two datasets:
thunhap.dta
ma_ho   thunhap   chitieu   quimo
  101       350       325       4
  102   1533.33       340       3
  103       600     337.5       4
  104      1350   416.667       6

dialy.dta
ma_ho   thanhthi   vung
  204          0      1
  102          1      4
  103          0      3
  104          0      6

The merge is carried out as follows:

. use "C:\dialy.dta", clear
. sort ma_ho
. save "C:\dialy.dta", replace
file C:\dialy.dta saved
. use "C:\thunhap.dta", clear
. sort ma_ho
. merge ma_ho using "C:\dialy.dta"
ma_ho was byte now int
. edit
The resulting dataset looks like this:
ma_ho   thunhap   chitieu   quimo   thanhthi   vung   _merge
  101       350       325       4          .      .        1
  102   1533.33       340       3          1      4        3
  103       600     337.5       4          0      3        3
  104      1350   416.667       6          0      6        3
  204         .         .       .          0      1        2

The result contains an additional variable named _merge, which takes
the following values:
_merge==1    The observation comes from the master dataset only
_merge==2    The observation comes from the using dataset only
_merge==3    The observation comes from both the master and the
             using dataset

Options:
When the two datasets have variables in common, the following options
control how their values are handled:

update     If the common variable is missing in the master dataset,
           the missing value is filled with the value from the using
           dataset.
replace    The value of the common variable in the master dataset is
           replaced by the value from the using dataset (specified
           together with update).

If no option is given, by default the values of the master dataset's
variables are left unchanged.
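
The merge syntax shown above is that of the Stata release used in this
document. Newer releases (Stata 11 and later) expect the type of match
to be stated, for example merge 1:1. The sketch below rebuilds the two
example files from the values listed above and merges them with the
newer syntax; the temporary file name dialy is an illustrative choice,
and the sketch assumes a release that supports merge 1:1.

```stata
clear
input ma_ho thanhthi vung
204 0 1
102 1 4
103 0 3
104 0 6
end
tempfile dialy
save `dialy'

clear
input ma_ho thunhap chitieu quimo
101 350     325     4
102 1533.33 340     3
103 600     337.5   4
104 1350    416.667 6
end
* 1:1 states that ma_ho identifies a single row in each file;
* this syntax does not require the files to be sorted first.
merge 1:1 ma_ho using `dialy'
sort ma_ho
list
```

The values of _merge have the same meaning as described above:
household 101 appears only in the master data, 204 only in the using
data, and the other three in both.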
Appending data: the append command
Syntax:
append using <file name>
This command appends the dataset named after using to the dataset
currently open, matching variables that have the same name and
format. The number of observations in the new dataset is the sum of
the numbers of observations in the two datasets.
Example: the file thunhap2.dta is as follows
ma_ho   thunhap   chitieu   gioitinh
  105      1350       425          1
  106      1500       370          0
  107       800       556          0
  108      1500       417          0
  109      2500       540          1

The two files are joined with the append command as follows:

. use "C:\thunhap.dta", clear
. append using "C:\thunhap2.dta"
. edit
The resulting dataset looks like this:
ma_ho   thunhap   chitieu   quimo   gioitinh
  101       350       325       4          .
  102   1533.33       340       3          .
  103       600     337.5       4          .
  104      1350   416.667       6          .
  105      1350       425       .          1
  106      1500       370       .          0
  107       800       556       .          0
  108      1500       417       .          0
  109      2500       540       .          1

Note: see also the expand command, which creates duplicate
observations.
4.8. Reshaping data

Syntax:
reshape wide <variable names>, i(varlist) j(variable name [values]) ...
reshape long <variable names>, i(varlist) j(variable name [values]) ...
reshape wide
reshape long
This command converts data from wide form to long form (reshape long)
and from long form to wide form (reshape wide). i(varlist) names the
identifying variables that distinguish the observations from one
another in the wide form (level-1 observations). j(variable name)
names the variable that distinguishes the level-2 observations in the
long form.
Example 1:
Data in wide form can be viewed as a matrix like this:
 - i -    -------------------- xj --------------------
 maho     quimo   thunhap95   thunhap96   thunhap97
  101         5        4500        4400        5400
  102         4        3400        3300        3700
  103         6        5000        5400        5500

These data are converted to long form as follows:

 - i -             - j -    - xji -
 maho     quimo     nam     thunhap
  101         5      95        4500
  101         5      96        4400
  101         5      97        5400
  102         4      95        3400
  102         4      96        3300
  102         4      97        3700
  103         6      95        5000
  103         6      96        5400
  103         6      97        5500

The reshape command is written as follows:

. reshape long thunhap, i(maho) j(nam)
(note: j = 95 96 97)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                   5   ->       4
j variable (3 values)                     ->   nam
xij variables:
          thunhap95 thunhap96 thunhap97   ->   thunhap
-----------------------------------------------------------------------------
* And convert back from long form to wide form as follows
. reshape wide thunhap, i(maho) j(nam)
(note: j = 95 96 97)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                                thunhap   ->   thunhap95 thunhap96 thunhap97
-----------------------------------------------------------------------------
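
To reproduce Example 1 without the original file, the wide table can
be typed in with input. The sketch below uses the values listed above
and performs the round trip from wide to long and back.

```stata
clear
input maho quimo thunhap95 thunhap96 thunhap97
101 5 4500 4400 5400
102 4 3400 3300 3700
103 6 5000 5400 5500
end
* Wide to long: one row per household and year
reshape long thunhap, i(maho) j(nam)
list
* Long back to wide: one row per household
reshape wide thunhap, i(maho) j(nam)
list
```

quimo is constant within each household, so it is carried along
unchanged in both directions.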

Example 2:
We have data in the following table:
maho   sotien1   nguon1         sotien2   nguon2
 101      1200   Ngan hang A       2000   Ngan hang A
 102      1300   Ngan hang B
 103      2500   Ngan hang A       1000   Ngan hang C
 104      3000   Ngan hang A       2000   Ngan hang B

This table is converted to long form as follows:

. reshape long sotien nguon, i(maho) j(lanvay)
(note: j = 1 2)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   5   ->       4
j variable (2 values)                     ->   lanvay
xij variables:
                        sotien1 sotien2   ->   sotien
                          nguon1 nguon2   ->   nguon
-----------------------------------------------------------------------------

The long table looks like this:

maho   lanvay   sotien   nguon
 101        1     1200   Ngan hang A
 101        2     2000   Ngan hang A
 102        1     1300   Ngan hang B
 102        2
 103        1     2500   Ngan hang A
 103        2     1000   Ngan hang C
 104        1     3000   Ngan hang A
 104        2     2000   Ngan hang B

5. Weights in the VHLSS (Weight)

5.1. Weights in sample surveys
In a sample survey, observations are selected at random, but they
usually have different selection probabilities. The weight equals the
inverse of the probability of being selected into the sample. If
observation i has weight wi, then observation i in the sample
represents wi units in the population. Estimates that make inferences
about the population must take the sampling weights into account;
otherwise the results will be biased.
Example:
Suppose the Red River Delta consists of 2 provinces, Ha Noi and Nam
Dinh, with populations of 4.5 million and 500 thousand respectively.
We want to draw a random sample of 500 observations to study income
in the Red River Delta as well as in the 2 provinces. If we sampled
in proportion to population, we would obtain 450 households in Ha Noi
and 50 households in Nam Dinh. However, if the sample is drawn at
random over the whole region, we may end up with a sample containing
no observations from Nam Dinh, or only very few. For the sample to be
representative of each province, it is better to select 400
observations in Ha Noi and 100 observations in Nam Dinh.
If average income is 900 thousand per month in Ha Noi and 300
thousand per month in Nam Dinh, the average income of the whole Red
River Delta cannot be computed as (900 + 300)/2, because the sample
observations were not selected in proportion to the provinces. Each
observation in Ha Noi represents 11250 units in the region
(4500000/400). This is the weight of the observation, equal to the
inverse of the probability of selection into the sample. Each
observation in Nam Dinh represents 5000 units in the region
(500000/100). The income of the Red River Delta is computed as
follows:

\[
\text{Income} = \frac{900 \times 400 \times 11250 + 300 \times 100 \times 5000}{400 \times 11250 + 100 \times 5000} = 840
\]
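
The weighted average can be checked in Stata itself. The sketch below
is an illustration on two summary rows, one per province, using the
figures from the example; the variable names (tinh, n_mau, danso,
quyenso, tong_qs) are hypothetical. Each weight is taken as population
divided by sample size, that is 4500000/400 = 11250 and
500000/100 = 5000.

```stata
clear
input str8 tinh thunhap n_mau danso
"Ha Noi"   900 400 4500000
"Nam Dinh" 300 100  500000
end
* Weight of one sampled unit = population / sample size
gen quyenso = danso / n_mau
* Total weight carried by each province's sample
gen tong_qs = n_mau * quyenso
* Weighted mean of the two provincial incomes
summarize thunhap [aweight = tong_qs]
scalar tb_vung = r(mean)
display "Regional mean income: " tb_vung
```

The unweighted sample mean, (900*400 + 300*100)/500 = 780, differs
from the weighted result because Nam Dinh is over-represented in the
sample.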

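The VLSS 1998 has 2 weights. The first is the household weight, the
variable wt, which is the number of households in Viet Nam that each
sampled household represents. The second is the household member
weight, hhsizewt, which is the number of people in Viet Nam that each
household member represents. The household member weight equals the
household weight multiplied by household size.
Example: weights in the VLSS 1998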
. tab reg7, sum(wt)
  Code by 7 |      Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999
. tab reg7, sum(hhsizewt)
  Code by 7 |       Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999
. di 2688.5003*5999
16128313
. di 12636.546*5999
75806639

5.2. Weight options

Stata supports the following 4 types of weights:
fweights    frequency weights: Stata treats the weight as the number
            of times each observation is repeated in the calculation.
pweights    sampling weights: Stata treats the weight as the inverse
            of the probability of selection into the sample, that is,
            the number of population units that each sample
            observation represents.
aweights    analytical weights: Stata treats the weight as inversely
            proportional to the variance of the observation.
iweights    importance weights: the weight indicates the importance
            of each observation.

For the living standards survey, commands use pweights and fweights.
Example:
. sum poor
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        poor |    5999     29.6216   45.66255          0        100
. sum poor [fw=hhsize]
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        poor |   28509    34.17517   47.43051          0        100

. tab reg7 urban98
           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |       672        187 |       859
   region2 |       783        392 |      1175
   region3 |       600        108 |       708
   region4 |       502        252 |       754
   region5 |       368          0 |       368
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999

. tab reg7 urban98 [fw= hhsizewt]
           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |  11993763    1570583 |  13564346
   region2 |  11057932    3812871 |  14870803
   region3 |   9582621     906048 |  10488669
   region4 |   5618709    2520372 |   8139081
   region5 |   2783821          0 |   2783821
   region6 |   4545303    5119702 |   9665005
   region7 |  13220727    3074190 |  16294917
-----------+----------------------+----------
     Total |  58802876   17003766 |  75806642


. tab reg7 urban98 , sum(hhsize) means
        Means of Household size
           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1205357  3.7326203 | 4.8183935
   region2 |  4.045977  4.0459184 | 4.0459574
   region3 | 4.6666667  4.6759259 | 4.6680791
   region4 | 4.8027888  5.1190476 | 4.9084881
   region5 | 5.7065217          . | 5.7065217
   region6 | 5.0719844  4.7131631 | 4.8934506
   region7 | 5.1373494  4.3971631 | 4.9496403
-----------+----------------------+----------
     Total | 4.8702272  4.4612717 |  4.752292
. tab reg7 urban98 [fw=wt], sum(hhsize) means
 Means and Number of Observations of Household size
           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1328749  3.6698008 | 4.9063857
           |   2336656     427975 |   2764631
-----------+----------------------+----------
   region2 | 4.0564115   3.987975 | 4.0386415
           |   2726038     956092 |   3682130
-----------+----------------------+----------
   region3 | 4.6508908  4.6530097 | 4.6510738
           |   2060384     194723 |   2255107
-----------+----------------------+----------
   region4 | 4.8136253   5.132367 | 4.9080132
           |   1167251     491074 |   1658325
-----------+----------------------+----------
   region5 | 5.6609112          . | 5.6609112
           |    491762          0 |    491762
-----------+----------------------+----------
   region6 | 5.0486426  4.6174858 | 4.8106956
           |    900302    1108764 |   2009066
-----------+----------------------+----------
   region7 | 5.1494132  4.3925283 | 4.9872852
           |   2567424     699868 |   3267292
-----------+----------------------+----------
     Total | 4.8003065  4.3841133 | 4.7002214
           |  12249817    3878496 |  16128313


. table reg7 urban98 , c(mean poor) col row format(%4.1f)
-------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+--------------------
  region1 |  61.5    8.0   49.8
  region2 |  32.6    5.9   23.7
  region3 |  44.8   10.2   39.5
  region4 |  37.3   11.5   28.6
  region5 |  47.3          47.3
  region6 |  12.5    2.2    7.3
  region7 |  35.8   10.3   29.3
          |
    Total |  38.9    6.8   29.6
-------------------------------
. table reg7 urban98 [pw=hhsizewt], c(mean poor) col row format(%4.1f)
-------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+--------------------
  region1 |  65.2    8.3   58.6
  region2 |  36.1    7.0   28.7
  region3 |  51.3   14.3   48.1
  region4 |  43.6   16.6   35.2
  region5 |  52.4          52.4
  region6 |  13.0    2.9    7.6
  region7 |  42.0   15.3   36.9
          |
    Total |  45.5    9.2   37.4
-------------------------------

Chapter III: Hypothesis testing and regression analysis

1. Estimation and hypothesis testing

1.1. Estimating the mean with a confidence interval
Syntax:
ci [varlist] [weight] [if condition] [in range] [, level(#) binomial
poisson exposure(variable name) total]

This command computes the standard error and confidence interval for
the sample mean under the normal, binomial, and Poisson
distributions.
Options:
level(#)          sets the confidence level for the interval
                  estimate. # takes values from 10 to 99; the default
                  is 95.
binomial          applies the binomial distribution
poisson           applies the Poisson distribution
exposure(variable name)
                  used with the Poisson distribution; the variable
                  gives the exposure (usually time or area) during
                  which the events recorded in varlist occur
total             used with the by prefix; requests a confidence
                  interval for all groups combined.

Example:
. ci poor
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |    5999     29.6216    .5895501      28.46587    30.77733

. sort reg7
. by reg7: ci poor, total
_______________________________________________________________________________
-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |     859    49.82538    1.706961      46.47507    53.17569
_______________________________________________________________________________
-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |    1175    23.65957    1.240357      21.22601    26.09314
_______________________________________________________________________________
-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |     708    39.54802    1.838899      35.93767    43.15838
_______________________________________________________________________________
-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |     754    28.64721     1.64759       25.4128    31.88163
_______________________________________________________________________________
-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |     368    47.28261    2.606121       42.1578    52.40741
_______________________________________________________________________________
-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |    1023    7.331378    .8153306      5.731465    8.931292
_______________________________________________________________________________
-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |    1112    29.31655    1.365709      26.63689    31.99621
_______________________________________________________________________________
-> Total
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
        poor |    5999     29.6216    .5895501      28.46587    30.77733
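
The interval reported by ci for the whole sample can be reproduced by
hand from the mean, standard deviation and number of observations
shown by sum poor. The sketch below is an illustration only: the
scalar names are hypothetical, and it assumes the usual t-based
interval, mean plus or minus t times the standard error.

```stata
* Values taken from the "sum poor" output
scalar n   = 5999
scalar tb  = 29.6216
scalar dlc = 45.66255
* Standard error of the mean
scalar se  = dlc / sqrt(n)
* invttail(df, p): t value with upper-tail probability p
scalar t   = invttail(n - 1, 0.025)
scalar duoi = tb - t * se
scalar tren = tb + t * se
display "Std. Err. = " se
display "95% CI    = [" duoi ", " tren "]"
```

The results agree with the ci output up to rounding of the inputs.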

Note:
The estimation commands can also be used when only the sample
parameters are known. These are called commands using immediate
arguments. They are very useful when we do not have the original
data for the variable.
cii <#obs> <mean> <standard deviation> [, level(#) ]
(normal distribution)
cii <#obs> <#succ> [, level(#) ]
(binomial distribution)

#obs is the number of observations and #succ is the number of times
the variable takes the value corresponding to a success (usually the
value 1).
cii <exposure> <number of events>, poisson [ level(#) ]
(Poisson distribution)

Example:
. cii 5999 1777, level(90)
                                                         -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.     [90% Conf. Interval]
-------------+-----------------------------------------------------------
             |    5999     .296216     .005895      .2865107    .3060676
. cii 12 27, poisson
                                                         -- Poisson  Exact --
    Variable | Exposure        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
             |       12        2.25    .4330127      1.483144    3.273587

1.2. Statistical hypothesis testing

1.2.1. Testing the sample mean
Zero-one distribution
Syntax:
prtest <variable> = # [if condition] [in range] [, level(#)]
This command tests a hypothesis about the proportion of a variable
that follows the zero-one distribution (Ho: p = p0).
Example:
. prtest poor=0.44 if reg7==1
One-sample test of proportion                   poor: Number of obs =      859
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
    poor |  .4982538   .0170597    29.2065    0.0000     .4648174    .5316901
------------------------------------------------------------------------------
                      Ho: proportion(poor) = .44
   Ha: poor < .44           Ha: poor ~= .44            Ha: poor > .44
      z =  3.440               z =  3.440                 z =  3.440
  P < z = 0.9997         P > |z| = 0.0006             P > z = 0.0003
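
The z statistic in this output can be recomputed by hand. The sketch
below is an illustration with hypothetical scalar names; it assumes
the standard one-sample proportion test, in which the standard error
under Ho is computed from the hypothesised proportion p0. The function
normal() is the standard normal distribution function in current
Stata; older releases may provide it under a different name.

```stata
* Observed proportion and sample size for region 1, from the output
scalar p_mau = .4982538
scalar p0    = 0.44
scalar n     = 859
* Under Ho the standard error uses p0, not the observed proportion
scalar z  = (p_mau - p0) / sqrt(p0 * (1 - p0) / n)
* Two-sided p-value from the standard normal distribution
scalar p2 = 2 * (1 - normal(abs(z)))
display "z = " %6.3f z "   two-sided p = " %6.4f p2
```

This explains why the z value next to the hypotheses (3.440) differs
from the 29.2065 in the table row, which is based on the standard
error of the observed proportion.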

prtest <variable 1> = <variable 2> [if condition] [in range] [, level(#)]
This command tests the hypothesis that the proportions of the two
named variables are equal (Ho: pX = pY).
Example: test whether the poverty rate differs between region 2 and
region 4:
. gen poor2=poor if reg7==2
(4824 missing values generated)
. gen poor4=poor if reg7==4
(5245 missing values generated)
. prtest poor2 = poor4
Two-sample test of proportion                  poor2: Number of obs =     1175
                                               poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |  .2365957   .0123983    19.0829    0.0000     .2122955    .2608959
   poor4 |  .2864721    .016465    17.3989    0.0000     .2542014    .3187429
---------+--------------------------------------------------------------------
    diff | -.0498764    .020611                         -.0902732   -.0094796
         | under Ho:   .0203666   -2.44893    0.0143
------------------------------------------------------------------------------
          Ho: proportion(poor2) - proportion(poor4) = diff = 0
   Ha: diff < 0              Ha: diff ~= 0               Ha: diff > 0
      z = -2.449               z = -2.449                 z = -2.449
  P < z = 0.0072         P > |z| = 0.0143             P > z = 0.9928

prtest <variable> [if condition] [in range], by(group variable)
[level(#)]
This command tests the hypothesis that the proportions in the two
groups defined by the group variable are equal (Ho: pX1 = pX2).
Example:
. prtest poor, by(sex)
Two-sample test of proportion                      1: Number of obs =     4375
                                                   2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |     .3248     .00708    45.8755    0.0000     .3109234    .3386766
       2 |  .2192118   .0102661     21.353    0.0000     .1990906     .239333
---------+--------------------------------------------------------------------
    diff |  .1055882   .0124708                          .0811459    .1300304
         | under Ho:   .0132673    7.95855    0.0000
------------------------------------------------------------------------------
              Ho: proportion(1) - proportion(2) = diff = 0
   Ha: diff < 0              Ha: diff ~= 0               Ha: diff > 0
      z =  7.959               z =  7.959                 z =  7.959
  P < z = 1.0000         P > |z| = 0.0000             P > z = 0.0000

Binomial distribution

Syntax:
bitest <variable> = #p [weight] [if condition] [in range]

This command tests a hypothesis about the parameter p of the binomial
distribution (the probability of success in a trial) for the named
variable (Ho: p = p0).
Example:
. bitest poor=0.44 if reg7==1
    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
        poor |      859          428       377.96     0.44000      0.49825
  Pr(k >= 428)             = 0.000344  (one-sided test)
  Pr(k <= 428)             = 0.999732  (one-sided test)
  Pr(k <= 328 or k >= 428) = 0.000660  (two-sided test)

. bitesti 859 428 0.44
        N   Observed k   Expected k   Assumed p   Observed p
------------------------------------------------------------
      859          428       377.96     0.44000      0.49825
  Pr(k >= 428)             = 0.000344  (one-sided test)
  Pr(k <= 428)             = 0.999732  (one-sided test)
  Pr(k <= 328 or k >= 428) = 0.000660  (two-sided test)

Normal distribution

Syntax:
ttest <variable> = # [if condition] [in range] [, level(#) ]
This command tests a hypothesis about the mean of the named random
variable, which is assumed to follow the normal distribution
(Ho: μ = μ0).
Example:
. ttest rlpcex1=3200

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
Degrees of freedom: 5998
                        Ho: mean(rlpcex1) = 3200
   Ha: mean < 3200          Ha: mean ~= 3200           Ha: mean > 3200
      t = -0.3260              t = -0.3260                t = -0.3260
  P < t =  0.3722        P > |t| =  0.7444            P > t =  0.6278
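
The t statistic and two-sided p-value in this output follow directly
from the reported mean and standard error. A minimal sketch, with
hypothetical scalar names, that recomputes them:

```stata
* Sample mean, hypothesised mean and standard error from the output
scalar tb  = 3188.667
scalar mu0 = 3200
scalar se  = 34.76379
scalar t   = (tb - mu0) / se
* ttail(df, t) returns the upper-tail probability of the t distribution
scalar p2  = 2 * ttail(5998, abs(t))
display "t = " %7.4f t "   two-sided p = " %6.4f p2
```

Since the two-sided p-value is about 0.74, the hypothesis that mean
expenditure equals 3200 is not rejected at conventional levels.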


ttest <variable 1> = <variable 2> [if condition] [in range] [, unpaired
unequal level(#) ]
This command tests the hypothesis that two variables have equal means
(Ho: μX = μY).
Options:
unpaired    The data of the two variables are not paired
unequal     The variances of the two variables are not equal

Example:
. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  1532.64

                Ho: mean(poor2) - mean(poor4) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -2.4185                t =  -2.4185              t =  -2.4185
   P < t =   0.0079          P > |t| =   0.0157          P > t =   0.9921
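
The standard error of the difference, the t statistic and Satterthwaite's degrees of freedom can be recomputed from the two group rows of this output. This is a sketch only: the scalar names are illustrative, and because the inputs are the rounded figures printed above, the result matches Stata's 1532.64 only approximately.

```stata
* Standard errors and sample sizes copied from the output above
scalar se1 = .0124036
scalar se2 = .0164759
scalar n1  = 1175
scalar n2  = 754

* With unequal variances the squared standard errors add
scalar se_d  = sqrt(se1^2 + se2^2)
scalar tstat = (.2365957 - .2864721)/se_d

* Satterthwaite's approximation to the degrees of freedom
scalar dfs = (se1^2 + se2^2)^2 / (se1^4/(n1 - 1) + se2^4/(n2 - 1))

display "se(diff)         = " %9.7f se_d
display "t                = " %7.4f tstat
display "Satterthwaite df = " %8.2f dfs
```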

ttest <variable> [if condition] [in range], by(group variable)
[ unequal level(#) ]
This command tests the hypothesis that the means of the two
groups defined by the group variable are equal
(Ho: mean(X1) = mean(X2)).
Example:
. ttest rlpcex1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |    4375    2980.906    36.74795    2430.648    2908.862    3052.951
       2 |    1624    3748.368    80.18189    3231.241    3591.097    3905.638
---------+--------------------------------------------------------------------
combined |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
---------+--------------------------------------------------------------------
    diff |           -767.4613     77.6155               -919.6156   -615.3071
------------------------------------------------------------------------------
Degrees of freedom: 5997

                  Ho: mean(1) - mean(2) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -9.8880                t =  -9.8880              t =  -9.8880
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000

1.2.2. Testing the standard deviation

Syntax:
sdtest <variable> = # [if condition] [in range] [, level(#) ]

sdtest <variable 1> = <variable 2> [if condition] [in range]
[, level(#) ]

sdtest <variable> [if condition] [in range], by(group variable)
[ level(#) ]
This command tests the standard deviation of a normally
distributed random variable, named by <variable>. Its syntax
parallels that of the ttest command.
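
For the one-sample form, the statistic reported by sdtest is chi2 = (n-1)*s^2/sigma0^2 with n-1 degrees of freedom. The sketch below recomputes it from the summary figures for rlpcex1 used in this section's example (5999 observations, standard deviation 2692.567, hypothesised value 2700). It assumes the cumulative function chi2() and the immediate command sdtesti are available in your Stata version; the scalar names are illustrative.

```stata
* Summary figures for rlpcex1 and the hypothesised standard deviation
scalar n      = 5999
scalar s      = 2692.567
scalar sigma0 = 2700

* chi2 statistic with n-1 degrees of freedom
scalar c2 = (n - 1)*s^2/sigma0^2
display "chi2(" n - 1 ") = " %9.3f c2

* Lower-tail probability, i.e. P < chi2
display "P < chi2 = " %6.4f chi2(n - 1, c2)

* Same test from summary statistics: #obs #mean #sd #value
sdtesti 5999 3188.667 2692.567 2700
```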
Example:
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
                        Ho: sd(rlpcex1) = 2700
                        chi2(5998) = 5965.022

 Ha: sd(rlpcex1) < 2700    Ha: sd(rlpcex1) ~= 2700    Ha: sd(rlpcex1) > 2700
  P < chi2 = 0.3838         2*(P < chi2) = 0.7676       P > chi2 = 0.6162

2. Correlation and regression analysis (Correlation and regression)

2.1. Correlation analysis

Syntax:
correlate [variable list] [weight] [if condition] [in range]
[, means covariance _coef wrap]
This command computes the matrix of correlation coefficients, or
of covariances, for the variables in the variable list. An
observation is used only if none of the listed variables is
missing, so the sample can be no larger than that of the variable
with the fewest observations.
Options:
means         Also display summary statistics: mean, standard
              deviation, minimum and maximum
covariance    Display the covariance matrix instead of the
              correlation coefficients
_coef         Display the correlation matrix of the coefficients
              of the most recent estimation
wrap          Display the rows of the matrix unbroken even when
              many variables are listed

Example:
. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

. corr hhsize poor rlpcex1 sex, means cov
(obs=5999)

    Variable |       Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      hhsize |   4.752292   1.954292          1         19
        poor |    .296216   .4566255          0          1
     rlpcex1 |   3188.667   2692.567    357.318   45801.71
         sex |   1.270712   .4443645          1          2

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746
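
The two matrices are linked by corr(x,y) = cov(x,y)/(sd(x)*sd(y)). As a quick check, the sketch below recovers the correlation between hhsize and poor from the covariance and standard deviations printed above; the scalar names are illustrative only.

```stata
* Covariance and standard deviations copied from the output above
scalar cov_hp = .216435
scalar sd_h   = 1.954292
scalar sd_p   = .4566255

* Correlation = covariance / product of standard deviations
scalar r_hp = cov_hp/(sd_h*sd_p)
display "corr(hhsize, poor) = " %6.4f r_hp

* The diagonal of the covariance matrix is the variance
display "var(hhsize) = " %7.5f sd_h^2
```

The first value should reproduce the 0.2425 shown in the correlation matrix, and the second the 3.81926 on the diagonal of the covariance matrix.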


pwcorr [variable list] [weight] [if condition] [in range]
[, obs sig print(#) star(#)]
This command computes the correlation coefficient for each pair
of variables in the variable list.
Options:
obs         Display the number of observations used to compute
            each correlation coefficient
sig         Display the significance level of each correlation
            coefficient
print(#)    Display only the correlation coefficients whose
            significance level is below #
star(#)     Mark with a star the correlation coefficients whose
            significance level is below #

Example:
. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |   0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 |  -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex |  -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999
             |

pcorr <variable> <variable list> [weight] [if condition]
[in range]
This command computes the partial correlation coefficients of
the named variable with each variable in the variable list.
Example:
. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000

2.2. Regression analysis

Ordinary least squares (OLS)
Syntax:
regress <dependent variable> [variable list] [weight]
[if condition] [in range] [, options]
This command estimates, by ordinary least squares, the
coefficients of the function relating the dependent variable to
the independent variables (the variable list).
Example:
. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633     15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984    77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382    17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201    155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------
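
The summary statistics at the top right of this output follow directly from the sums of squares in the Source table. The sketch below recomputes them from the printed figures; the scalar names are illustrative, and because the sums of squares are shown rounded to five significant digits the results match the output only approximately.

```stata
* Sums of squares and degrees of freedom copied from the output above
scalar mss = 3.8639e+09
scalar rss = 3.9621e+10
scalar dfm = 3
scalar dfr = 5995

* R-squared = model SS / total SS
scalar r2 = mss/(mss + rss)

* F = model mean square / residual mean square
scalar Fstat = (mss/dfm)/(rss/dfr)

* Root MSE = square root of the residual mean square
scalar rmse = sqrt(rss/dfr)

display "R-squared   = " %6.4f r2
display "F(3, 5995)  = " %8.2f Fstat
display "Root MSE    = " %8.1f rmse
```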

Options:
level(#)      Set the confidence level for the confidence
              intervals of the coefficients
noconstant    Omit the constant term (intercept) from the
              regression
noheader      Display only the coefficient table
beta          Display standardized coefficients, used to compare
              the relative influence of the regressors

Maximum likelihood

Syntax:
probit <dependent variable> [variable list] [weight]
[if condition] [in range] [, options]
This command regresses the dependent variable on the variables
in the variable list by maximum likelihood. The dependent
variable is usually a dummy variable taking the values 0 and 1.
Example:
. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -3364.8025                       Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342    .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525    .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115    .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731    .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------

Predicting the dependent variable and the residuals
Syntax:
predict <new variable> [if condition] [in range]
[, xb stdp resid]
This command is run after regress (or probit) and creates a new
variable whose values depend on the option given.
Options:
xb       Fitted value of the dependent variable from the
         regression:
         $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

stdp     Standard error of the fitted value:
         $SE(\hat{Y}_i) = \sqrt{Var(\hat{\beta}_0) + X_i^2 Var(\hat{\beta}_1) + 2 X_i Cov(\hat{\beta}_0, \hat{\beta}_1)}$

resid    Residual:
         $e_i = Y_i - \hat{Y}_i$

Example:
predict exphat, xb
Creates the new variable exphat holding the fitted value of the
dependent variable, computed from the estimated coefficients.

predict expres, resid
Creates the variable expres holding the residuals.
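
The relation between the xb and resid options can be checked on a small simulated dataset. This is a sketch under stated assumptions: the variables x, y, yhat, ehat and chk are hypothetical and unrelated to the VLSS data, and the older function names uniform() and invnorm() are used to match the Stata version of this document.

```stata
* Simulated data: y depends linearly on x plus normal noise
clear
set obs 200
set seed 12345
gen x = uniform()
gen y = 1 + 2*x + invnorm(uniform())

regress y x

* Fitted values and residuals from the regression just estimated
predict yhat, xb
predict ehat, resid

* The residual is the observed value minus the fitted value
gen double chk = y - yhat
assert abs(ehat - chk) < 1e-5
summarize ehat
```

The assert line stops with an error if any residual differs from y minus the fitted value by more than rounding.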
Testing the regression coefficients
Syntax:
test [expression]
test [variable list]
testparm <variable list> [, equal ]
The test command tests hypotheses about the coefficients of the
regression that has just been estimated.
Example:
test urban98 =2000
Tests the hypothesis that the coefficient of urban98 equals 2000
test region1 = region2
Tests the hypothesis that the coefficient of region1 equals the
coefficient of region2
test region1 = (region2+region3)/2
Tests a hypothesis about the relation among the coefficients of
region1, region2 and region3
test region1 region2 region3
Tests the hypothesis that the coefficients of region1, region2
and region3 are all equal to 0
testparm region*
Tests the hypothesis that the coefficients of region1 through
region7 are all equal to 0

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00
. reg rlpcex1 urban98 region* sex educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163    66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066    132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047    130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354    137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586    135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688    131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854    128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142    66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281    6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691    14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355    178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

. test urban98 =2000

 ( 1)  urban98 = 2000.0

       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2

 ( 1)  region1 - region2 = 0.0

       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2

 ( 1)  region1 - .5 region2 - .5 region3 = 0.0

       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0

       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped

       F(  6,  5988) =  148.55
            Prob > F =    0.0000
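
For a single restriction on one coefficient, the F statistic reported by test is the square of the corresponding t ratio. The sketch below checks this for the hypothesis that the coefficient of urban98 equals 2000, using the coefficient and standard error printed in the regression table of this example. It assumes the upper-tail function Ftail() is available in your Stata version; the scalar names are illustrative.

```stata
* Coefficient and standard error of urban98 from the regression table
scalar b  = 1995.163
scalar se = 66.46943

* F(1, df) = ((estimate - hypothesised value)/standard error)^2
scalar Fstat = ((b - 2000)/se)^2
display "F(1, 5988) = " %6.4f Fstat

* Upper-tail probability of the F distribution
display "Prob > F   = " %6.4f Ftail(1, 5988, Fstat)
```

The statistic comes out at roughly 0.005, which test displays as 0.01 after rounding to two decimals.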

Chapter IV: Drawing graphs

1. Drawing graphs (graph)
Syntax:
graph [variable list] [weight] [if condition] [in range]
[, graph_type specific_options common_options]
where:
graph_type          The type of graph to draw

specific_options    Options that apply to a particular graph
                    type

common_options      Options that can be used with every graph
                    type, such as those that draw and label the
                    axes of the graph

Stata can draw the following 8 types of graph (graph_type):

(1) Two-way scatterplots
. graph rlpcex1 age
[Figure: scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7) against Age of household head (16 to 95)]

(2) Two-way scatterplot matrices

. gr rlpcex1 age educyr98 hhsize, matrix
[Figure: scatterplot matrix of comp.M&Reg price adj.pc tot exp, Age of household head, schooling year of HH.head and Household size]

(3) Histograms

. gr rlpcex1, bin(50) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Fraction (0 to .329888)]

(4) One-way scatterplots

. gr rlpcex1, oneway
[Figure: one-way scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.71)]

(5) Box-and-whisker plots

[Figure: box plot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7)]

(6) Bar charts

. sort reg7
. gr poor, bar means by(reg7)
[Figure: bar chart of the mean of poor by reg7; vertical axis maximum .498254]

(7) Pie charts

. for num 1/7: gen poorX=poor if reg7==X

->  gen poor1=poor if reg7==1
(5140 missing values generated)

->  gen poor2=poor if reg7==2
(4824 missing values generated)

->  gen poor3=poor if reg7==3
(5291 missing values generated)

->  gen poor4=poor if reg7==4
(5245 missing values generated)

->  gen poor5=poor if reg7==5
(5631 missing values generated)

->  gen poor6=poor if reg7==6
(4976 missing values generated)

->  gen poor7=poor if reg7==7
(4887 missing values generated)

. graph poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]

(8) Star charts

graph_type is star
[Figure: star charts for a list of car models (Audi 5000 through Volvo 260); rays: Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk space (cu. ft.), Weight (lbs.), Length (in.), Turn Circle (ft.), Displacement (cu. in.)]

Common options (common_options)

* Creating the dataset
. tabulate hhsize, sum(rlpcex1)

            | Summary of comp.M&Reg price adj.pc
  Household |              tot exp
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   4696.0254   4619.5012         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999
. tab hhsize, sum(educyr98)

            | Summary of schooling year of
  Household |              HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999
. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp "Chi tieu binh quan"
. label var meanedu "So nam hoc"
. label var ahhsize "Quy mo ho"
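
The source builds this 16-row dataset from the two tabulate tables. An alternative that avoids retyping the means is the collapse command, which replaces the data in memory with group statistics. The sketch below runs on simulated data so that it is self-contained: the generated values are hypothetical, and only the variable names follow the example above.

```stata
* Simulated household data (hypothetical values, not the VLSS data)
clear
set obs 300
set seed 2024
gen hhsize   = 1 + int(6*uniform())
gen rlpcex1  = 5000 - 400*hhsize + 500*invnorm(uniform())
gen educyr98 = int(12*uniform())

* One observation per household size, holding the group means
collapse (mean) meanexp=rlpcex1 meanedu=educyr98, by(hhsize)
rename hhsize ahhsize

* Express mean expenditure in thousands, as in the example above
replace meanexp = meanexp/1000
list
```

Because collapse discards the original observations, save the full dataset first if it is still needed.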

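* Options for titles and axes

Take a two-way graph as an example: the vertical axis shows mean
expenditure and the mean years of schooling of the household
head, and the horizontal axis shows household size.
. gr meanexp meanedu ahhsize
[Figure: meanexp and meanedu (1.57978 to 8.25783) plotted against ahhsize (1 to 19)]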

* Title options:
title("text") t1title("text") t2title("text")
b1title("text") b2title("text")
l1title("text") l2title("text") r1title("text") r2title("text")
These options place titles at the top, bottom, left and right of
the graph.

Example:
gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc
cua chu ho) b2title (Quy mo ho gia dinh)

[Figure: Chi tieu binh quan and So nam hoc (1.57978 to 8.25783) plotted against Quy mo ho gia dinh (1 to 19); left titles "Chi tieu binh quan (tr dong)" and "So nam hoc cua chu ho"; title "Do thi chi tieu va hoc van chu ho"]

* Displaying values on the axes
xlabel[(values)] ylabel[(values)] rlabel[(values)]
tlabel[(values)]
Example:
gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc
cua chu ho) b2title (Quy mo ho gia dinh) xlabel ylabel
[Figure: the same graph with labelled values on both axes; title "Do thi chi tieu va hoc van chu ho"]

Note: for other options, see the help with the command: help
graxes

Options for lines
xline[(values)] yline[(values)] rline[(values)]
tline[(values)]

connect(c[[p]] ... c[[p]])

Example:
. gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc
cua chu ho) b2title (Quy mo ho gia dinh) xlabel ylabel xline (5
10 to 20) yline(2 4 to 8) connect(ll)

[Figure: the same graph with reference lines drawn by xline and yline and the points of each series joined by lines; title "Do thi chi tieu va hoc van chu ho"]


2. Some commonly used graph types
2.1. Two-way graphs
Syntax:
graph [variable list] [weight] [if condition] [in range], twoway
[common_options rescale]
The rescale option displays two vertical axes with different
scales.
. gen meanexp1=meanexp*1000
. label var meanexp1 "Chi tieu binh quan"


. gr meanexp1 meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho) l1title(Chi tieu binh quan (nghin dong)) b2title (Quy mo
ho gia dinh) xlabel ylabel rlabel(2 4 to 8) connect(ll) rescale
[Figure: Chi tieu binh quan on the left axis (nghin dong, 1000 to 5000) and So nam hoc on the right axis (2 to 8), plotted against Quy mo ho gia dinh; title "Do thi chi tieu va hoc van chu ho"]

2.2. Histograms
Syntax:
graph [variable] [weight] [if condition] [in range], histogram
[common_options bin(#) freq normal[(#,#)] density(#)]

Options:
bin(#)           Number of intervals (bins) in the graph; the
                 default is bin(5)
freq             Show frequencies on the vertical axis
normal[(#,#)]    Overlay a normal density curve
density(#)       Used with normal; number of points at which the
                 normal density is evaluated

Example:
Histogram of per capita expenditure
. gr rlpcex1, hist bin(20) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Fraction (0 to .56026)]

. gr rlpcex1, hist bin(50) normal freq
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Frequency (0 to 1979)]

. gr rlpcex1, hist bin(50) normal freq by(reg7)
[Figure: "Histograms by Code by 7 regions", one panel each for region1 to region7, of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Frequency (0 to 415)]

2.3. Bar charts
Syntax:
graph [variable list] [weight] [if condition] [in range], bar
[common_options [no]alt means stack]
Example:
Bar chart of the mean schooling of the household head and mean
household size in each of the 7 regions
. gr educyr98 hhsize, bar means by(reg7)
[Figure: bars of schooling year of HH.head and Household size by reg7; vertical axis maximum 8.64426]

. label define region 1 "region1" 2 "region2" 3 "region3" 4 "region4" 5 "region5" 6 "region6" 7 "region7"
. label values reg7 region
. tab reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. gr educyr98 hhsize, bar means by(reg7) ylabel( 2 4 to 10) alt
[Figure: bars of schooling year of HH.head and Household size for region1 to region7, with region names on alternating rows; vertical axis up to 10]

The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt
[Figure: stacked bars of persons and "1:urban 98; 0:rural 98" for region1 to region7; vertical axis 500 to 1500]

Example:
Draw the following graph:
[Figure: bars of foodpoor and poor for region1 to region7; vertical axis 200 to 600]

2.4. Pie charts

Syntax:
graph [variable list] [weight] [if condition] [in range], pie
[common_options]
This command draws a pie chart. Each variable takes one slice of
the pie, and the size of the slice is determined by the sum of
that variable over the observations.
Example:
Draw the share of each region in the total number of poor people
in the country.

. gr poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]

. gen nonfpood=poor- foodpoor
. label var nonfpood "poor but still above food poverty line"
. gen nonpoor=( rlpcex1>=1790)
. gr foodpoor nonfpood nonpoor, pie
. set textsize 90
[Figure: pie chart with slices 12% foodpoor, 18% poor but still above food povert, 70% nonpoor]

. set textsize 100
. gr foodpoor nonfpood nonpoor, pie by(reg7) total
[Figure: one pie chart each for region1 to region7 plus Total; legend 12% foodpoor, 18% poor but still above food povert, 70% nonpoor]

3. Saving and displaying graphs (Saving and graph using)

To save the graph shown in the Graph window, open the File menu
and choose Save graph, then select the path and file name for
the graph; the default extension is gph.
A graph can also be saved with the option
saving(file name [, replace]) written after the graph command.

Example:
. gr educyr98 hhsize, bar means by(reg7) ylabel( 2 4 to 10) alt saving("c:\do thi 1")
. gr persons urban98, bar ylabel by(reg7) stack alt saving("c:\do thi 2")

To keep graphs from being displayed, graph display can be turned
off with the command
set graphics { on | off }
. set graphics off
. gr poor1-poor7, pie saving ("c:\do thi 3", replace)
(note: file c:\do thi 3.gph not found)
Saved graphs can be displayed with the command:
graph using <graph file 1> [graph file 2 ...] [, margin(#)]
margin(#) sets the margin around each graph as a percentage of
the graph area. The default is 0.
Example:
. set graphics on
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")
[Figure: the three saved graphs shown together (bar chart by region, stacked bar chart of persons and "1:urban 98; 0:rural 98" by region, pie chart of poor1 to poor7); title "Mot so dac diem cua ho gia dinh"]

Note:
saving() can be combined with using to store the result as a new
graph. Example:
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh") saving("c:\do thi tong hop")
. graph using "c:\do thi tong hop"

Chapter V: Programming in Stata

1. General introduction to do-files

1.1. Opening and saving do-files
Stata lets you write files, called do-files, that contain Stata
commands. Instead of running each command one at a time from the
Command window, a do-file runs its commands in sequence.
Stata programs are written in the Do-file Editor window. Open it
by clicking the Windows menu and choosing Do-file editor.
Alternatively, type doedit in the Command window.
Example:
A program might be written in the Do-file Editor as follows:
----------------
clear
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear
tab urban98
sum hhsize
gen new=hhsizet
gen new=hhsize
----------------

After editing, save the do-file with Save as in the File menu of
the Do-file Editor. The name of the do-file can also be given
directly to the doedit command:
doedit (do-file name)
Do-files have the extension do.

In the example above, the program can be saved under the name
"chuong trinh 1" in the folder Vlss98 on drive C.
1.2. Running do-files
To run a do-file, type one of the following two commands in the
Command window:
do filename [, nostop]
run filename [, nostop]

The run command executes the commands in the do-file but does
not display the output on screen.
If a command fails while the do-file is running, Stata reports
the error and stops executing the remaining commands. If the
nostop option is given, Stata skips the failed command and
continues with the commands that follow it.
Example:
. do "c:\vlss98\chuong trinh 1"
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
end of do-file
r(111);

With the nostop option
. do "c:\vlss98\chuong trinh 1", nostop
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
. gen new=hhsize
. end of do-file
Running with the run command
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found
Do-files can also be run with the Do option in the File menu, or
directly from the Do-file Editor window with the Do or Run
option in the Tools menu.
1.3. Notes on writing do-files
version #
Put this command at the top of a do-file to declare the version
of Stata used to write it. For example, if the do-file is
written with Stata 7.0, the top of the program looks like this:
version 7.0
clear
use Hhexp98n.dta
tab reg7
.
Different versions of Stata may differ in the syntax or meaning
of commands. The version command lets the running Stata
interpret correctly a do-file written for another version.
set memory #[k|m]
If the data file needs more memory than Stata is currently
using, allocate more memory to Stata with the command above. Do
not allocate more than the RAM of the computer.
Example:
. use "C:\Hhexp98n.dta", clear
no room to add more observations
r(901);
. set mem 32m
(32768k)
. use "C:\Hhexp98n.dta", clear
set more off/on
By default, when the output of a command is longer than the
results window (Stata Results), the screen pauses and a key
(for example Enter or the Space bar) must be pressed for the
output to continue. The command set more off lets the output
scroll without pausing until the command or do-file finishes.
The command set more on restores the default.
The characters * and /* */

Stata does not execute lines that begin with the character * or
text placed between the character pairs /* and */. These
characters are used to write comments in do-files.
Example:
-------------------
version 7.0
set mem 32m
use "C:\Hhexp98n.dta", clear
* Create the household income variable
/* This variable equals mean income
multiplied by household size */
gen hhexp = rlpcex1 * hhsize
#delimit ;
When a command in the Do-file Editor is too long, this command
declares that each command ends with the character (;). By
default a command ends at the line break made with the Enter
key. To restore the default, use #delimit cr
Example: the graph command from the previous chapter:
graph meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc
cua chu ho) b2title (Quy mo ho gia dinh) xlabel ylabel xline (5
10 to 20) yline(2 4 to 8) connect(ll)
is equivalent to:
#delimit ;
graph meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van
chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu
ho)
b2title (Quy mo ho gia dinh) xlabel ylabel xline (5 10 to 20)
yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
..
Afterwards, if the commands that follow each fit on one line,
restore the default with:
#delimit cr
Note:
- The characters /* */ can also be used to write a long command
  across several lines, by commenting out each line break:

graph meanexp meanedu ahhsize, title (Do thi chi tieu va /*
*/ hoc van chu ho) /*
*/ l1title(Chi tieu binh quan (tr dong)) l2title(So nam /*
*/ hoc cua chu ho) /*
*/ b2title (Quy mo ho gia dinh) xlabel ylabel xline (5 10 /*
*/ to 20) /*
*/ yline(2 4 to 8) connect(ll)

- The #delimit command and the /* */ way of writing long
  commands work only in do-files, not in the Command window.

2. Local and global macros

Macros are the variables used in Stata programs. A macro pairs
one string, called the macro name, with another string, called
the macro contents.
There are two kinds of macros: local macros and global macros.
2.1. Local macros
If we type:
. local hogd "age hhsize rlpcex1"
(The double quotes can be omitted, that is, we can type: local
hogd age hhsize rlpcex1)
then `hogd' is understood as equivalent to: age hhsize rlpcex1.
hogd is the macro name, and age hhsize rlpcex1 is the macro
contents. To use the contents of the macro, type its name
between a left quote ( ` ), found at the top left of the
keyboard, and a right quote ( ' ), found at the lower right of
the keyboard.
So if we type:
. summarize `hogd'
it is equivalent to typing:
. summarize age hhsize rlpcex1

If we type:
. local tb summarize
then we can run the command summarize age hhsize rlpcex1 by
typing:
. `tb' `hogd'

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         age |    5999    48.01284    13.7702         16         95
      hhsize |    5999    4.752292   1.954292          1         19
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71
To display the contents of a local macro, type
macro list _(local macro name)
Example:
. macro list _hogd
_hogd:          age hhsize rlpcex1

To delete a local macro, use
macro drop _(local macro name)
Example:
. macro drop _hogd
. macro list _hogd
local macro `hogd' not found
r(111);
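
A common use of a local macro is to hold a variable list and loop over it. The sketch below is an illustration on simulated data, not part of the original text: the variable values are hypothetical, and it assumes a Stata version that provides the foreach command.

```stata
* Simulated data with two of the variables named above
clear
set obs 50
set seed 7
gen age    = 20 + int(40*uniform())
gen hhsize = 1 + int(6*uniform())

* Store the variable list in a local macro
local hogd "age hhsize"

* `v' takes each name in `hogd' in turn
foreach v of local hogd {
    quietly summarize `v'
    display "`v': mean = " %6.2f r(mean)
}
```

Run this as a do-file in one go: a local macro defined by one run of a do-file is not available to a later, separate run.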
2.2. Global macros
If we type:
. global diaban "reg7 province commune"
(or, omitting the double quotes: global diaban reg7 province
commune)
then $diaban is equivalent to: reg7 province commune.
diaban is the macro name, and reg7 province commune is the macro
contents. To use the contents of a global macro, type the symbol
$ immediately before the macro name.
So if we type:
. describe $diaban
it is equivalent to typing:
. describe reg7 province commune
. describe $diaban
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands
. global mota "describe"
. $mota $diaban
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands
To display the contents of a global macro, type
macro list (global macro name)
Example:
. global diaban "reg7 province commune"
. macro list diaban
diaban:         reg7 province commune

To delete a global macro, use
macro drop (global macro name)
Example:
. macro drop diaban
. macro list diaban
global macro $diaban not found
r(111);
2.3. The difference between local and global macros
A local macro exists only within one program. One program
cannot see the local macros used in other programs. A global
macro, by contrast, once declared is visible to all programs
and stays in Stata's memory for the whole session.

Example:
Run a do-file that declares the local macro a, then ask Stata to
display the contents of this macro: the macro no longer exists,
neither for other programs nor in Stata's memory.
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. local a "chuong trinh thong ke Stata"
. end of do-file
. macro list _a
local macro `a' not found
r(111);
By contrast, for a global macro:
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
. end of do-file
. macro list b
b:              chuong trinh thong ke Stata
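
The same difference in scope can be seen with a small program. The sketch below is illustrative: the program name demo_scope and the macro contents are hypothetical, not taken from the original text.

```stata
* Define a program that sets one local and one global macro
capture program drop demo_scope
program define demo_scope
    local a "set inside demo_scope"
    global b "set inside demo_scope"
end

demo_scope

* Outside the program the local macro is empty, the global is not
display "local a  = [`a']"
display "global b = [$b]"
```

After the program has run, only the global macro still holds its contents, so the first display shows empty brackets.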

3. Tch v h-ng v ma trn (scalar and matrix)


3.1. Ma trn (matrix)
Stata nh ngha ma trn A[r, c] l mt mng hnh ch nht gm r
hng (row) v c ct (column).
V d:
87

Nu ma trn A -c to ra th chng ta c th xem ni dung


ca ma trn nh- sau:
. matrix list A

A[3,3]
c1

c2

r1

r2

r3

10

11

c3

14

Here matrix A has 9 elements: 1, 2, 4, 3, 4, 7, 10, 11, 14. The
columns are named c1, c2, and c3, and the rows r1, r2, and r3. The
element at the intersection of row 1 and column 2 is written A[1, 2].
In this example A[1, 2] holds the value 2.
3.2. Scalars (scalar)
A scalar holds a single numeric value. A scalar is defined with the
command:
scalar scalar_name = expression
Example:
. scalar a = 10
. scalar list a
         a =         10
. scalar b = a*2
. scalar list b
         b =         20


To some extent, a scalar can be regarded as a special case of a
matrix that has only one element (one row and one column).
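The relationship between a scalar and a one-element matrix can be checked directly. In this sketch the names M and c are illustrative only.

```stata
* A scalar and a 1 x 1 matrix holding the same number
scalar a = 10
matrix M = (10)

* The single matrix element can be used like a scalar in expressions
scalar c = M[1,1] * 2
scalar list c

* Both objects hold the same value
display "a equals M[1,1]: " (a == M[1,1])
```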
3.3. Some commands for working with matrices
Setting the matrix size
By default, a matrix can have at most 40 rows and 40 columns. This
maximum can be changed with the command:
. set matsize 500
After this command, matrices can be created with up to 500 rows and
500 columns.
Creating matrices
Matrices can be created directly with commands.
Example:
matrix mymat = (1,2\3,4)       Elements are separated by commas,
                               rows by backslashes
matrix myvec = (1,5,3,1,3)     Creates a row vector
matrix mycol = (1\5\3\1\3)     Creates a column vector


Matrices can also be created from the data with the command:
mkmat <variable list> [if condition] [in range] [, matrix(matrix
name)]
Example:
. input maho quymo thunhap

          maho      quymo    thunhap
  1. 101 6 1200
  2. 103 5 1400
  3. 105 5 3200
  4. 107 9 1000
  5. 109 4 2500
  6. end
. mkmat maho quymo thunhap, matrix(A)
. matrix list A

A[5,3]
       maho    quymo  thunhap
r1      101        6     1200
r2      103        5     1400
r3      105        5     3200
r4      107        9     1000
r5      109        4     2500
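The mkmat syntax also accepts a condition, which restricts the observations copied into the matrix. The sketch below reuses the five households from the example and keeps only those with quymo equal to 5; the matrix name B is an arbitrary choice.

```stata
* Re-enter the example data
clear
input maho quymo thunhap
101 6 1200
103 5 1400
105 5 3200
107 9 1000
109 4 2500
end

* Copy two variables into matrix B, only for households with quymo == 5
mkmat maho thunhap if quymo == 5, matrix(B)
matrix list B
```

Only two observations satisfy the condition, so B has two rows and two columns.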

Matrix calculations
matrix D = B                   Creates matrix D equal to matrix B
matrix C = (C+C')/2            Recomputes matrix C from its own
                               values (the average of C and its
                               transpose)
matrix D = A*A'                Creates matrix D as the product of
                               matrix A and the transpose of A
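A small numeric case makes the product of a matrix and its transpose easier to follow. The 2 x 2 matrix below is made up for illustration; the apostrophe is Stata's transpose operator.

```stata
* A small example matrix (hypothetical values)
matrix A = (1,2 \ 3,4)

* Product of A and its transpose; the result is always symmetric
matrix D = A*A'
matrix list D

* Average of A and its transpose, another way to obtain a symmetric matrix
matrix C = (A + A')/2
matrix list C

display "D symmetric: " (D[1,2] == D[2,1])
```

Working by hand, the first row of D is 1*1 + 2*2 = 5 and 1*3 + 2*4 = 11, which is what matrix list reports.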
Deleting matrices
Matrices and scalars can be removed from memory with the commands:
matrix drop <matrix>
scalar drop <scalar>
Example:
. matrix drop A
. scalar drop b
4. Conditional commands and loops
4.1. The if...else command
Syntax:
if (logical condition) {
    command group 1
}
else command

Stata evaluates the logical condition (expression). If the condition
is true, the commands in command group 1 are executed. If it is
false, the command following else is executed. If no else is given,
Stata continues with the commands that follow the if { } block.
Example:
local a=invnorm(uniform())
if `a'>=0 {
    display "The random number generated is greater than or equal to 0"
}
else di "The random number generated is less than 0"
macro list _a
Notes:
- Braces { } allow several commands to follow else:
  if (condition) {
      commands 1
  }
  else {
      commands 2
  }
- if...else commands can be nested:
  if (condition) {
      command group 1
  }
  else if (condition) {
      ...
  }
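A complete nested if...else chain might look like the sketch below, which classifies a number as negative, zero, or positive. The macro names x and sign are illustrative.

```stata
* Classify a value with a chain of if / else if / else
local x = 7

if `x' < 0 {
    local sign "negative"
}
else if `x' == 0 {
    local sign "zero"
}
else {
    local sign "positive"
}

display "`x' is `sign'"
```

Exactly one branch runs: Stata tests the conditions from top to bottom and stops at the first one that is true.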
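4.2. The while command
Syntax:
while <logical condition> {
    command group
}
Stata evaluates the logical condition (expression). As long as the
condition is true, the commands in the command group are executed and
the condition is then checked again. Once the condition is false, the
commands are no longer executed and the loop ends.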
Example:
local i=1
while `i' <= 10 {
    if mod(`i',2) {
        display "`i' is odd"
    }
    else {
        display "`i' is even"
    }
    local i=`i'+1
}
Notes:
A loop can be interrupted by placing the following command inside
it:
continue [, break]
When Stata reaches continue, it skips the remaining commands in the
loop body and returns to the top of the loop. If the break option is
specified, Stata exits the loop instead.

Example: Find the least common multiple of 2, 3, and 5

local i=1
while `i' <= 1000 {
    if mod(`i',2)==0 & mod(`i',3)==0 & mod(`i',5)==0 {
        di "The least common multiple of 2, 3, and 5 is `i'"
        continue, break
    }
    local i=`i'+1
}
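For comparison, continue without the break option skips only the rest of the current pass. The sketch below adds up the odd numbers from 1 to 9; the macro names are illustrative. The counter is incremented before continue, because otherwise the loop would never advance.

```stata
* Sum the odd numbers from 1 to 9, skipping even numbers with continue
local i = 0
local total = 0

while `i' < 10 {
    local i = `i' + 1
    if mod(`i', 2) == 0 {
        * Even number: skip the addition and go back to the top of the loop
        continue
    }
    local total = `total' + `i'
}

display "Sum of the odd numbers from 1 to 9: `total'"
```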
5. Introduction to ado-files
Creating a program
A program in Stata can be defined with the command:
program define <program name>
    commands
end
The program is written in the Do-file Editor window. Once it has
been run, the program is stored in Stata's memory and can be called
simply by typing its name (progname).
Example:
quietly program define povline
    display as text _col(3) "Poverty line" _col(16) "{c |}" _col(20) "Food" _col(30) "Overall"
    di as text _col(2) "{hline 14}{c +}{hline 26}"
    di as text _col(8) "Value" _col(16) "{c |}" as result _col(20) "1380" _col(33) "1920"
end
After running this code with run or do, type the following in the
Command window:
. povline
  Poverty line |   Food      Overall
 --------------+--------------------------
       Value   |   1380         1920

Notes:
If we run the command program define povline again and receive the
message:
povline already defined
r(110);
then the program povline has already been created. To delete this
program, use the command:
program drop povline
or, to delete all programs:
program drop _all
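A common way to avoid the r(110) error when a do-file is run repeatedly is to drop the program first and ignore the error if it does not yet exist. The sketch below uses a hypothetical program named hello.

```stata
* Drop any earlier definition; capture suppresses the error if none exists
capture program drop hello

program define hello
    display "Hello from a stored program"
end

* Call the program by typing its name
hello
```

With this pattern the do-file can be executed any number of times in one session without stopping on "already defined".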
Ado-file
Ado-files are used to create Stata commands. Stata has two kinds of
commands. The first kind is built into Stata, for example summarize.
The second kind is defined by ado-files, for example ci.
To find out which kind a Stata command is, use the which command:
. which sum
built-in command:  summarize

. which ci
C:\STATA\ado\base\c\ci.ado
*! version 3.3.4  04sep2000


Ado-files are simply programs defined with the program define
command and saved with the file extension .ado. Stata searches for
ado-files in the following directories:
. sysdir
   STATA:  C:\STATA\
 UPDATES:  C:\STATA\ado\updates\
    BASE:  C:\STATA\ado\base\
    SITE:  C:\STATA\ado\site\
 STBPLUS:  c:\ado\stbplus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\


Example:
We can save the povline command as an ado-file and store it in the
directory C:\STATA\ado\base\
The command is then executed whenever we type povline, without first
having to run the do-file.
Exercise: Write the povline command with options for the years 1993,
1998, and 2002.

References

User guide included with the Stata 7.0 software (on-line help), under
the Contents option of the Help menu.

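Appendix
Basic statistics of a sample drawn from a normal distribution
Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Standard deviation:
$$s = \sqrt{s^2}$$
Mean absolute deviation:
$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$
Skewness:
$$\mathrm{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3}$$
Kurtosis:
$$\mathrm{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{s^4}$$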
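The mean, variance, standard deviation, and mean absolute deviation can be computed step by step in Stata and checked against summarize. The sketch below uses five made-up values; the variable and scalar names are illustrative.

```stata
* Five hypothetical observations
clear
input x
2
4
4
5
10
end

* Mean and sample size
quietly summarize x
scalar xbar = r(mean)
scalar n = r(N)

* Variance: sum of squared deviations divided by n - 1
generate double dev2 = (x - xbar)^2
quietly summarize dev2
scalar s2 = r(sum)/(n - 1)
scalar s = sqrt(s2)

* Mean absolute deviation: sum of absolute deviations divided by n
generate double absdev = abs(x - xbar)
quietly summarize absdev
scalar mad = r(sum)/n

* Compare the hand calculation with the values summarize stores
quietly summarize x
display "mean = " xbar "   s2 = " s2 "   s = " s "   MAD = " mad
display "summarize: Var = " r(Var) "   sd = " r(sd)
```

The sketch does not compare skewness and kurtosis with summarize, detail, because the moment definitions Stata uses there may differ from the appendix formulas in the divisor used for s.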
