Basic Stata For Biostatistics

Basic Stata and biostatistics for DNCD
THANANAN JIVARAMONAIKUL, MD.
Outline
1. Import file, open file, save file, save command, customizing the appearance

2. Type of data and data management
3. Study design : cross-sectional, case-control, cohort
4. Type of statistic :
► Descriptive: mean, S.D., median, quartiles, IQR
► Analytic : comparison, association
5. Measure in epidemiology
► Frequency : prevalence, incidence
► Comparison : Chi2 test, t-test, Wilcoxon test
► Association : RR, OR, Regression (linear, logistic, Poisson)
6. Confounding & effect modification
Opening the program for the first time
Importing xls/csv files
Customizing the Stata appearance (adding colors) before starting work
Data management
Type of data (1)
NOMINAL DATA
► values that the data may have do not have specific order
► values act as labels with no real meaning
► categories, states
► Binomial: two possible values
► Multinomial: more than two possible values
e.g. Health status healthy =1 sick=2
e.g. Treatment new regimen = 1 standard regimen = 2
e.g. hair color brown =1 blond =2 black =100
ORDINAL DATA
► values with some kind of ordering
► data that has been measured or counted
e.g. social class: upper=1 middle = 2 working = 3
e.g. glioblastoma tumor grade: 1 2 3 4 5
e.g. position in a race: 1st 2nd 3rd
Type of data (3)
DISCRETE
► countable, separate values with nothing in between
e.g. children in family
CONTINUOUS
► between any two values, there would be a third
e.g. between meters there are centimeters
INTERVAL
► equal intervals between values and an arbitrary zero on the scale
e.g. temperature in degrees Celsius
RATIO
► equal intervals between values and an absolute zero
e.g. weight, body mass index
Examples of coding
Cat. 1 2 99
Type of data
► In the Data Editor, string variables (e.g. str2) are shown in red and numeric variables (e.g. byte) in black

Save command in Stata
Rename variable / destring
Destring data from string to numeric
Generate new variable
Label data in variable (1)
Deleting variables
► Right-click the variable name, then choose drop
► Or type the command drop followed by the variable name
Speaking the Stata language (1)
(Stata is case-sensitive, so commands are typed in lowercase.)
► gen (generate): create a new variable
► replace: replace the values of an existing variable
► recode: reassign (recode) values
► if: "if"; an equals sign after if must be written as ==
► &: and
► | (the key labelled ข on a Thai keyboard): or
► . (numeric) or "" (string): missing value
► !=: not equal to
► + - * / ^: add, subtract, multiply, divide, raise to a power
► clear: clear all data from memory
► use: use (open) a dataset
► disp or display: display a result
► tab or tabulate: make a table
Ex. gen bmi = weight/(height^2)
Create a new variable bmi equal to weight divided by (height squared).
Ex. gen diff = pre-post if age != . & (pre-post) > 0
Create a new variable diff equal to pre minus post when age is not missing and pre minus post is greater than 0.
Ex. replace age = . if age == 999 | age > 100
Set age to missing if age equals 999 or is greater than 100.
Ex. recode age min/20=1 21/60=2 61/max=3, gen(agegr)
Recode age from the minimum to 20 as 1, 21 to 60 as 2, and 61 to the maximum as 3, storing the result in a new variable agegr.
Ex. disp 52/5*100
Display 52 divided by 5, multiplied by 100.
Ex. tab var1 var2
Create a table with var1 as rows and var2 as columns.
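To make the replace and recode cut-points above concrete, the same rules can be mirrored in plain Python. This is only an illustration of the logic, not Stata code. The function names `clean_age`, `age_group` and `bmi` are hypothetical, `None` stands in for Stata's missing value `.`, and height is assumed to be in metres.

```python
from typing import Optional, Union

Number = Union[int, float]


def clean_age(age: Optional[Number]) -> Optional[Number]:
    """Mirror of: replace age = . if age == 999 | age > 100"""
    if age is None:              # already missing: stays missing
        return None
    if age == 999 or age > 100:  # code 999 and implausible ages become missing
        return None
    return age


def age_group(age: Optional[Number]) -> Optional[Number]:
    """Mirror of: recode age min/20=1 21/60=2 61/max=3, gen(agegr)"""
    if age is None:              # missing values are not recoded
        return None
    if age <= 20:
        return 1
    if 21 <= age <= 60:
        return 2
    if age >= 61:
        return 3
    return age                   # values matched by no rule (e.g. 20.5) are copied unchanged


def bmi(weight_kg: float, height_m: float) -> float:
    """Mirror of: gen bmi = weight/(height^2), assuming height in metres"""
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    raw_ages = [15, 20, 21, 60, 61, 999, 120, None]
    cleaned = [clean_age(a) for a in raw_ages]
    print(cleaned)                          # [15, 20, 21, 60, 61, None, None, None]
    print([age_group(a) for a in cleaned])  # [1, 1, 2, 2, 3, None, None, None]
```

Note the gap between 20 and 21 in the recode rule. Non-integer ages such as 20.5 fall through every rule, which is why the sketch returns them unchanged.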
Terminology
► Constants: constant (fixed) values
► Variable: a quantity that varies
► Independent: independent (predictor) variable
► Dependent: dependent (outcome) variable
► Extraneous: extraneous variable
► Confounding
► Associated with both the predictor and the outcome variable. It must be a cause of the outcome but must not itself be caused by the predictor (i.e. it is not an intermediate). After adjustment, the estimate differs from the crude estimate.
► Effect modifier/Interaction
► Makes the association between the predictor and the outcome lose homogeneity
► The stratum-specific risk ratios differ
Study Design
Type of study
► Observational
► Descriptive (Parameter Estimation)
► Cross-sectional
► Analytic (Hypothesis testing)
► Cross-sectional
► Case-control
► Cohort
► Experimental
► True experimental >> RCT, non-RCT
► Quasi experimental
Observational study
Cohort
► Exposure → Outcome (from the factor to the outcome)
► Can establish causality; can be conducted prospectively or retrospectively
► Used for outcomes measured as incidence
► The study starts while participants do not yet have the disease
► e.g. Smoking (y/n) → Cancer (y/n); Diabetes (y/n) → Chronic kidney disease
Case-Control
► Outcome → Exposure
► Can indicate causality
► e.g. Cancer (y/n) → Smoking (y/n); Stroke after surgery (y/n) → surgery time
Cross-sectional
► Exposure <-> Outcome
► Used for outcomes measured as prevalence
► Can only tell whether exposure and outcome are associated, not which one is the cause
► e.g. Smoking (y/n) <-> Cancer (y/n); Depression <-> Quality of life
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Total | A+C | B+D | A+B+C+D
Cross-sectional
► Relationship of Exposure <-> Outcome
► Prevalence Odds ratio
[Diagram: population at risk → sampling → representative sample]
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Cross-sectional
► Single point in time (snapshot): exposure and outcome are measured together
► Measured at one point in time or over a period
► No follow-up
► Easy to do and takes little time
► Individual-level data
► Measure of frequency: Prevalence
► Measure of association: Prevalence ratio (PR), Prevalence odds ratio (POR)
► Suited to diseases whose onset is unknown, e.g. chronic diseases
► The POR overestimates risk more than the PR does
► No temporal sequence: it only shows an association, not which factor is the cause
► Difficult to use for rare exposures or rare outcomes
► Prevalence-incidence bias (length bias / survival bias): diseases of long duration are detected more often
Case-control
► causality
► Outcome → Exposure
► Odds ratio
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Total | A+C | B+D | A+B+C+D
[Diagram: target population → source population → eligible population (characteristics) → sampling of case population and non-case population → study participants]
Case-control
► Pros
► Can examine multiple exposures
► Suitable for rare diseases
► No follow-up needed
► Cons
► Cannot clearly establish causal temporality (uncertain temporal sequence)
► Requires a comparison group
► Measure of frequency: Prevalence
► Measure of association: Odds Ratio
► Common biases
► Selection bias
► Participants do not represent the population
► Information bias
► Memory bias -> participants cannot remember -> misclassification of exposure
► Recall bias -> cases try harder to recall than controls, e.g. mothers of children with a disease recall more than mothers of healthy children
► Interviewer bias -> e.g. cases are reviewed more thoroughly than controls
Cohort
► causality
► Exposure → outcome
► Relative risk ratio
[Diagram: prospective or retrospective time sequence. Population → sampling → representative at-risk sample (people who already have the disease are excluded) → with risk factors: disease / no disease; without risk factors: disease / no disease]
Cohort
► Disease-free population followed up over time
► Can track incidence (new cases) / AR
► Exposure -> Follow-up -> Outcome
► Fixed exposure (e.g. blood group)
► Time-dependent exposure (e.g. blood sugar)
► Hard outcome, e.g. death, disease
► Intermediate outcome, e.g. CD4
► A comparison group is required, and it should be as similar as possible
► Ideally use an internal comparison (comparison within the same cohort); if that is not possible, use a comparison group (external comparison)
► Pros
► Suitable for rare exposures and multiple outcomes
► Shows the temporal sequence (exposure occurs before outcome)
► Shows the cause -> effect relation (temporality) better than case-control
► Cons
► Expensive and time-consuming
► Loss to follow-up
► Measure of frequency: Incidence
► Measure of association: Relative Risk
Types of statistics
[Figure: a sampling technique draws a sample from the population; population parameters π and μ are estimated by sample statistics P, X̄ and s]
By Level of Generalization
► Descriptive Statistics
► Inferential Statistics (generalization)
▪ Parameter Estimation
▪ Hypothesis Testing (e.g. π1 = π2, μ1 = μ2)
• Comparison
• Association
• Multivariable data analysis
• Multivariate data analysis
Epidemiological study design
► Descriptive: distribution by time, place and person
► Analytic: determinants (causality, risk factors)
Epidemiological study design
► Observational
► Descriptive: case report, case series, cross-sectional
► Analytic: cross-sectional, case-control, cohort
Descriptive study
Describe sample statistics

► Categorical Variables
▪ Ratio, Proportion, Percent (%)
► Continuous Variable
▪ Normal Distribution
⮚ Mean, SD
▪ Not normal Distribution
⮚ Median, Range/IQR
Normal distribution testing: Shapiro-Wilk test
If P-value > 0.05, normality is not rejected (the data are consistent with a normal distribution)
Normal distribution testing: Histogram
Not a normal curve; skewed to the right
Continuous Variable
► Not normal distribution
• Median = 23
• Q1 = 21
• Q3 = 26
• IQR = Q3 - Q1 = 5
► Normal distribution
• Mean = 23.58
• S.D. = 3.40
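The summary measures above can be reproduced with Python's standard `statistics` module. This is a minimal sketch on hypothetical values, not the data behind the slide. Quartiles use Python's "inclusive" interpolation, and Stata's `summarize, detail` uses its own percentile definition, so Q1/Q3 may differ slightly on small samples.

```python
import statistics


def describe(values):
    """Mean/SD for roughly normal data; median, quartiles and IQR otherwise."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample SD (n - 1 denominator)
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": (min(values), max(values)),
    }


if __name__ == "__main__":
    bmi_values = [21, 22, 23, 23, 24, 26, 30]   # hypothetical sample
    summary = describe(bmi_values)
    print(summary["median"], summary["q1"], summary["q3"], summary["iqr"])
    # 23.0 22.5 25.0 2.5
```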
Measures in epidemiology
Aims of Epidemiologic Research
1. Describe → Measure of Frequency

a. How common is CHD among adults in Province A?
b. What is the frequency of CHD among males and females?
2. Explain → Measure of Association
a. Why are men more likely than women to develop CHD?
b. Does smoking increase the risk of CHD?
b. Does smoking increase the risk of CHD?
3. Predict → Measure of Impact
a. How many CHD cases would occur if we provided a specific
intervention?
b. How many new CHD cases will occur in province A next year?
4. What could be done to prevent new cases? And how?
Measures of frequency (frequency/distribution)
Measure of FREQUENCY
► Ratio
► relative magnitude of two quantities or a comparison of any two values
► The numerator and denominator need not be related
► =A/B
► Proportion
► the comparison of a part to the whole
► It is a type of ratio in which the numerator is included in the denominator
► =A/(A+B)
► Rate
► Measures how often an event occurs in a defined population over a specified period of time
► = A/time
Proportion: the numerator is part of the denominator
Ratio: the numerator is not part of the denominator
Female : Male = 1.12 : 1
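The three definitions differ only in what goes into the denominator. The sketch below uses hypothetical counts (560 females, 500 males; 12 events over 400 person-years) chosen so that the ratio matches the 1.12 : 1 example.

```python
def ratio(a: float, b: float) -> float:
    """A/B: numerator and denominator need not be related (e.g. females : males)."""
    return a / b


def proportion(a: float, b: float) -> float:
    """A/(A+B): the numerator is part of the denominator."""
    return a / (a + b)


def rate(events: float, person_time: float) -> float:
    """Events per unit of time at risk."""
    return events / person_time


if __name__ == "__main__":
    print(round(ratio(560, 500), 2))       # 1.12  -> Female : Male = 1.12 : 1
    print(round(proportion(560, 500), 3))  # 0.528 -> share of females
    print(rate(12, 400))                   # 0.03 events per person-year
```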

Rate, Ratio or Proportion?
Indicator | Numerator | Denominator | Rate/Ratio/Proportion
General fertility rate | Number of live births | Total number of women aged 15-49 years | Ratio
Infant mortality rate | Number of infants who died in the first year of life | Total number of live births in that year | Ratio
Case-fatality rate | Number of patients who died | Total number of patients in that period | Proportion
Mortality rate | Number of deaths in that period | Total population in that period | Proportion
Attack rate | Number of new cases in that period | Total population at the start of the study | Proportion
Prevalence
Prevalence (1)
Prevalence (2)
Point Prevalence
= 4/8 = 0.5 = 50%
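Point prevalence divides everyone who has the disease at one moment (old and new cases) by the whole population at that moment. This minimal sketch assumes the figure behind the slide shows 4 existing cases among 8 people.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Existing (old + new) cases at a point in time / total population at that time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


if __name__ == "__main__":
    p = prevalence(4, 8)
    print(f"{p:.2f} = {p:.0%}")   # 0.50 = 50%
```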

Prevalence (3)
Period Prevalence
Incidence
Incidence (1)
► Measures the happening or occurrence of “events/processes” during a specified period of time
► Counts only new cases, i.e. new events
► Measures the proportion or rate of people who develop the disease as a “new case” within a specified period
► The new cases arise from a population at risk of the disease that has never had the disease before (population at risk)
1. Incidence proportion
2. Incidence rate: time is in the denominator
Incidence (2)
Incidence proportion
=4/(8-1) = 4/7 = 57.14%

= 2/6 = 33.33%
Incidence (3)
Example: during the coronavirus disease 2019 (COVID-19) outbreak, 50 of 2,000 patients died. Find the risk of death in this group of patients.
= 50/2000 = 0.025 = 25 per 1,000 = 2.5%
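Incidence proportion (risk) counts only new cases. People who already had the disease at baseline are removed from the denominator because they are not at risk. This sketch assumes, as the 4/(8-1) working suggests, that one of the 8 people was already a case at the start. The `already_ill` parameter is a hypothetical name.

```python
def incidence_proportion(new_cases: int, population: int, already_ill: int = 0) -> float:
    """New cases during the period / people at risk at the start of the period."""
    at_risk = population - already_ill   # prevalent cases are not at risk
    return new_cases / at_risk


if __name__ == "__main__":
    print(f"{incidence_proportion(4, 8, already_ill=1):.2%}")   # 57.14%
    print(f"{incidence_proportion(2, 6):.2%}")                  # 33.33%
    risk = incidence_proportion(50, 2000)                       # COVID-19 deaths
    print(f"{risk} = {risk * 1000:.0f} per 1,000 = {risk:.1%}")
    # 0.025 = 25 per 1,000 = 2.5%
```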
Incidence (4)
Incidence rate
= 3/(3+3+4+1+2+2+4) = 3/19 ≈ 0.16 per person-year

[Figure: follow-up timeline for each person over the years B.E. 2560 to 2564 (2017 to 2021)]
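An incidence rate puts total person-time at risk in the denominator, so it helps to sum the follow-up explicitly. This sketch assumes the person-time values in the working above are person-years contributed by seven people and that 3 new cases occurred.

```python
def incidence_rate(new_cases: int, person_time: list[float]) -> float:
    """New cases / total person-time at risk (here assumed to be person-years)."""
    return new_cases / sum(person_time)


if __name__ == "__main__":
    follow_up = [3, 3, 4, 1, 2, 2, 4]   # person-years per person
    ir = incidence_rate(3, follow_up)
    print(sum(follow_up), round(ir, 3))   # 19 0.158
```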
Other measures of frequency
► Attack rate = incidence proportion

► Crude mortality rate
► Case-fatality rate
Attack rate (AR) (1)
Male AR = 25.25%
Female AR = 27.32%
Overall AR = 26.04%
Pr > 0.05: no difference in AR between males and females
Find the attack rates of the groups that did and did not receive the vaccine
AR in the placebo group = 30.58%
AR in the vaccine group = 21.43%
Overall AR = 26.04%
Pr < 0.05: the AR differs between the placebo group and the vaccine group
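The p-values above come from a chi-square test on a 2×2 table. This sketch assumes the underlying counts are 51 infected of 238 vaccinated and 74 infected of 242 on placebo, which reproduce the 21.43%, 30.58% and 26.04% attack rates. It computes the Pearson chi-square (no continuity correction) and its 1-degree-of-freedom p-value using only the standard library. In Stata the same test is requested with the `chi2` option of a two-way `tabulate`.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square for the table
        exposed:   a cases, b non-cases
        unexposed: c cases, d non-cases
    Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Assumed counts: vaccine 51/238 infected, placebo 74/242 infected
    stat, p = chi2_2x2(51, 238 - 51, 74, 242 - 74)
    print(f"chi2 = {stat:.2f}, p = {p:.3f}")   # chi2 = 5.22, p = 0.022
```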
Crude mortality rate
Case-fatality rate
Comparison testing
Comparison between groups
► Categorical outcome
• Chi-square (χ2) test
► Continuous outcome
❖ Normal Distribution
• t-test (2 groups), F-test/ANOVA (>= 2 groups)
❖ Not normal Distribution
• Wilcoxon test, Mann-Whitney U test
Chi-square (χ2) test
t-test (2 groups)
F-test/ANOVA (>= 2 groups)
Wilcoxon test
Measures of association
Measures of association
► Measure the statistical relationship between the independent and dependent variables
► The relationship may or may not be causal
► Commonly used in observational studies
► Which measure to use depends on the study design
► Two types, by the kind of comparison:
► Ratio scale
▪ Risk ratio: ratio of incidence proportions (risks)
▪ Rate ratio: ratio of incidence rates
▪ Odds ratio: ratio of odds
▪ Prevalence ratio: ratio of prevalences
► Difference scale
▪ Risk difference
▪ Rate difference
▪ Prevalence difference
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Prevalence ratio
► PR = [A/(A+B)] / [C/(C+D)]
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Prevalence difference (PD)
► A difference of two prevalences
► Cross-sectional study
► PD = PE - PU (prevalence in the exposed minus prevalence in the unexposed)
► Interpretation: the prevalence of disease in the exposed group is PD (percentage points) higher than that in the unexposed group
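The ratio, difference and odds forms of the prevalence comparison all come from the same 2×2 table. This sketch uses a hypothetical survey (30 of 100 exposed and 15 of 100 unexposed have the disease). It also shows the POR lying further from 1 than the PR, as noted in the cross-sectional slide.

```python
def prevalence_measures(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """2x2 cross-sectional table:
        exposed:     a with disease, b without
        not exposed: c with disease, d without
    """
    p_exposed = a / (a + b)
    p_unexposed = c / (c + d)
    return {
        "PR": p_exposed / p_unexposed,   # prevalence ratio
        "PD": p_exposed - p_unexposed,   # prevalence difference
        "POR": (a * d) / (b * c),        # prevalence odds ratio
    }


if __name__ == "__main__":
    m = prevalence_measures(30, 70, 15, 85)   # hypothetical counts
    print({k: round(v, 2) for k, v in m.items()})
    # {'PR': 2.0, 'PD': 0.15, 'POR': 2.43}
```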
Risk Ratio
Risk ratio / Relative risk (RR) (1)
Exposure | Diarrhea | No diarrhea | Total
Ate boiled eggs | 60 | 40 | 100
Did not eat boiled eggs | 10 | 90 | 100
Total | 70 | 130 | 200
RR = (60/100) / (10/100) = 6
The risk of diarrhea in the group that ate boiled eggs is 6 times that of people who did not eat them.
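The risk ratio divides the risk (incidence proportion) in the exposed row by the risk in the unexposed row. A minimal sketch using the boiled-egg table above:

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cohort 2x2 table: exposed a cases / b non-cases, unexposed c / d."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


if __name__ == "__main__":
    # 60/100 egg eaters and 10/100 non-eaters had diarrhea
    print(round(risk_ratio(60, 40, 10, 90), 2))   # 6.0
```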
► The risk of infection in vaccinated people is 0.7 times that of unvaccinated people.
► Conversely, the risk of infection in unvaccinated people is 1.43 times that of vaccinated people.
Vaccine Efficacy (VE) = 1 - RR = 1 - 0.7 = 0.3 = 30%
The vaccine is 30% effective in preventing infection.
Vaccine Efficacy (VE)
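Vaccine efficacy is the proportional reduction in risk, 1 - RR. This sketch assumes the trial counts are those used in the risk-difference example (51 infected of 238 vaccinated, 74 infected of 242 on placebo). It recovers the 0.7, 1.43 and 30% figures above.

```python
def vaccine_efficacy(cases_vacc: int, n_vacc: int, cases_unvacc: int, n_unvacc: int):
    """Return (RR, 1/RR, VE), where RR = risk in vaccinated / risk in unvaccinated."""
    rr = (cases_vacc / n_vacc) / (cases_unvacc / n_unvacc)
    return rr, 1 / rr, 1 - rr


if __name__ == "__main__":
    rr, inverse_rr, ve = vaccine_efficacy(51, 238, 74, 242)   # assumed counts
    print(f"RR = {rr:.2f}, 1/RR = {inverse_rr:.2f}, VE = {ve:.0%}")
    # RR = 0.70, 1/RR = 1.43, VE = 30%
```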
Risk difference (RD, attributable risk)
► Difference of two incidence proportions (exposed vs unexposed group)
► Cohort study
► RD = IE - IU
► = (51/238) – (74/242)
► = 21.43% – 30.58%
► = -9.15 percentage points (≈ -0.09)
► The risk of infection in vaccinated people is about 9 percentage points lower than in unvaccinated people
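The risk difference is computed on the probability scale and becomes percentage points when multiplied by 100. A minimal sketch with the vaccine counts above:

```python
def risk_difference(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    """RD = I_E - I_U (multiply by 100 for percentage points)."""
    return cases_exp / n_exp - cases_unexp / n_unexp


if __name__ == "__main__":
    rd = risk_difference(51, 238, 74, 242)   # vaccinated vs unvaccinated
    print(f"RD = {rd:.4f} ({rd * 100:.2f} percentage points)")
    # RD = -0.0915 (-9.15 percentage points)
```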
Rate ratio / Incidence rate ratio (IRR) (1)
► A ratio of two incidence rates
► Used in RCTs or cohort studies (with person-time data)
► Rate ratio (IRR) = [A/TE] / [B/TU]
Group | No. of sick | Person-time | Incidence rate
Exposed | A | TE | A/TE
Unexposed | B | TU | B/TU
Total | A+B | TE+TU | (A+B)/(TE+TU)
Odds Ratio (OR)

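Odds ratio
► OR = (A/B) / (C/D) = AD/BC
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
There is no "odds difference", because the odds in the two groups have different denominators.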
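Disease odds ratio (cohort study)
► Disease OR = (A/B) / (C/D) = AD/BC
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D
Exposure odds ratio (case-control study)
► Exposure OR = (A/C) / (B/D) = AD/BC
Exposure | Case | Non-case | Total
Exposed | A | B | A+B
Not exposed | C | D | C+D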
Odds ratio
► In a case-control study
► If the cases and controls in the study represent the population well
► Exposure OR = Disease OR
► For a rare disease, OR ≈ RR
► So the OR can be interpreted like an RR
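Both points on this slide can be checked numerically. The exposure OR and the disease OR reduce to the same cross-product AD/BC. The OR approximates the RR only when the outcome is rare. This sketch uses the boiled-egg table (a common outcome) and a hypothetical rare-disease table (10/1000 vs 5/1000).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio AD/BC (exposed: a cases, b non-cases; unexposed: c, d)."""
    return (a * d) / (b * c)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    a, b, c, d = 60, 40, 10, 90            # boiled-egg outbreak (common outcome)
    disease_or = (a / b) / (c / d)         # odds of disease, exposed vs unexposed
    exposure_or = (a / c) / (b / d)        # odds of exposure, cases vs non-cases
    print(round(disease_or, 2), round(exposure_or, 2), round(risk_ratio(a, b, c, d), 2))
    # 13.5 13.5 6.0  -> OR is far from RR when the outcome is common

    # Hypothetical rare outcome: 10/1000 exposed vs 5/1000 unexposed
    print(round(odds_ratio(10, 990, 5, 995), 3), round(risk_ratio(10, 990, 5, 995), 3))
    # 2.01 2.0  -> OR ≈ RR when the disease is rare
```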
Odds ratio in Logistic regression
► Binary logistic regression model

► Binary outcome, e.g. case vs non-case, ill vs not ill
► OR = exp(β)
Odds ratio in Logistic regression
Report odds ratios / Report coefficients
OR = exp(β) = exp(-0.479) = e^(-0.479) = 0.619

Coef. = ln(OR) = ln(0.619) = -0.479
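With a single binary exposure, the odds ratio from a logistic model equals the crude 2×2 odds ratio, so the coefficient/OR relationship can be checked by hand. In Stata, `logistic` reports odds ratios and `logit` reports coefficients. This sketch assumes the vaccine-trial counts that reproduce the 21.43% and 30.58% attack rates (51 infected and 187 not infected among the vaccinated; 74 and 168 on placebo).

```python
import math

# Assumed counts (consistent with the 21.43% and 30.58% attack rates)
infected_vacc, well_vacc = 51, 187
infected_placebo, well_placebo = 74, 168

odds_ratio = (infected_vacc / well_vacc) / (infected_placebo / well_placebo)
beta = math.log(odds_ratio)   # the coefficient a logistic model reports
print(f"OR = {odds_ratio:.3f}, coef = ln(OR) = {beta:.3f}, exp(coef) = {math.exp(beta):.3f}")
# OR = 0.619, coef = ln(OR) = -0.479, exp(coef) = 0.619
```

The negative sign of the coefficient is what tells you the OR is below 1 (a protective effect).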
Regression model
❖ Prediction of outcome from exposures

❖ Hypothesis testing adjusted (controlled) for other confounding variables
► Simple: Y → X; Outcome → Factors
► BP → Drug (A/B); BP = a + b(Drug)
► BP → Age; BP = a + b(Age)
► Multiple: Y → X1 X2 X3
► BP → Drug (A/B), Sex (M/F), Age
BP = a + b1(Drug) + b2(Sex) + b3(Age)
BP = 100 + 3 Drug + 5 Sex - 7 Age
Coding: Drug A = 0 / B = 1; Sex M = 0 / F = 1
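Each coefficient in a multiple regression is the change in the outcome for a one-unit change in that variable while the others are held fixed. The sketch below plugs values into the illustrative model on this slide. The function name is hypothetical, and the slide does not state the unit of Age.

```python
def predict_bp(drug_b: int, female: int, age: float) -> float:
    """Illustrative model from the slide: BP = 100 + 3 Drug + 5 Sex - 7 Age.
    drug_b: 0 = drug A, 1 = drug B; female: 0 = male, 1 = female."""
    return 100 + 3 * drug_b + 5 * female - 7 * age


if __name__ == "__main__":
    # Same sex and age, different drug: the difference equals b1 = 3
    print(predict_bp(1, 1, 2) - predict_bp(0, 1, 2))   # 3
```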
Multi-variables analysis
► Linear Regression
❖ Y = continuous + normal distribution
❖ e.g. Injury severity score
► Logistic Regression: OR = exp(β)
❖ Y = Categorical
❖ e.g. Dead / severity of injury / bone fracture
► Poisson Regression: IRR = exp(β)
❖ Y = Incidence/count
❖ e.g. Dead
► Cox's Proportional Hazard Regression: HR = exp(β)
❖ Y = Time to event
❖ e.g. Time from injury to death
X = continuous / categorical; e.g. sex, age, alcohol drinking
Note: If P0 (T) is small, when comparing two groups
Principles for studying association
⮚ Start with graphical display: scatterplots
■ Display the relationship between two quantitative variables.
■ The values of one variable appear on the horizontal axis (the x axis) and the values of the other variable on the vertical axis (the y axis).
■ Each individual appears as the point in the plot fixed by the values of both variables for that individual.
■ In regression, we usually call the explanatory variable x and the response variable y.
⮚ Look for overall patterns and for striking deviations from the pattern: interpreting scatterplots
■ Overall pattern: the relationship has ...
⬥ form (linear relationships, curved relationships, clusters)
⬥ direction (positive/negative association)
⬥ strength (how closely do the points follow a clear form?)
⬥ outliers
■ For a categorical x and quantitative y, show the distributions of y for each category of x.
⮚ When the overall pattern is quite regular, use a compact mathematical model to describe it.
Linear Regression
Y = CONTINUOUS + NORMAL DISTRIBUTION
Normal distribution of Y
Scatterplots
Linear regression
Y = a + bx
For every 1-unit increase in x, y increases by b units.
Bwt = 2657.33 + 12.36 (age)
If the mother's age increases by 1 year, the baby's birth weight increases by 12.36 grams.
Linear regression
Y = a + bx
For every 1-unit increase in x, y increases by b units.
Bwt = 3054.957 – 281.7133 (smoke)
If the mother smokes, the baby's birth weight is 281.7133 grams lower.
Univariate
Multivariate
Bwt = 3172.59 – 234.41(smoke) – 95.53(ptl) – 522.2454(ht) – 568.97(ui)
Baby birth weight is 234.41 g lower for smoking mothers after adjusting for history of premature labor, hypertension, and uterine irritability; the association is statistically significant.
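A fitted linear model is just an equation, so predictions and "holding other variables fixed" comparisons can be computed directly from the coefficients above. This sketch assumes smoke, ht and ui are 0/1 indicators and ptl is a count of previous premature labors. The function names are hypothetical.

```python
def bwt_simple_age(age: float) -> float:
    """Simple model: Bwt = 2657.33 + 12.36 (age)."""
    return 2657.33 + 12.36 * age


def bwt_multiple(smoke: int, ptl: int, ht: int, ui: int) -> float:
    """Multivariable model from the slide (grams)."""
    return 3172.59 - 234.41 * smoke - 95.53 * ptl - 522.2454 * ht - 568.97 * ui


if __name__ == "__main__":
    # One extra year of maternal age adds the slope, 12.36 g
    print(round(bwt_simple_age(26) - bwt_simple_age(25), 2))               # 12.36
    # Smoking effect with ptl, ht and ui held at 0 (the adjusted coefficient)
    print(round(bwt_multiple(1, 0, 0, 0) - bwt_multiple(0, 0, 0, 0), 2))   # -234.41
```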
Logistic Regression
Y = CATEGORICAL
Logistic Regression
► Binary outcome
► Outcome = yes/no
► Ordinal outcome
► Outcome = level
► Ex. Tumor grade
► Multinomial outcome
► Outcome = category
Binary Logistic regression
Report odds ratios / Report coefficients
Interpretation (exposure OR): the odds of having been vaccinated among the infected group are 0.62 times those among the uninfected group.
Interpreted as a disease OR: the odds of infection in the vaccinated group are 0.62 times those in the unvaccinated group.
Interpreted as an RR: vaccinated people have 0.62 times the chance of infection of unvaccinated people (this reading is only approximately valid when the outcome is rare).
Ordinal Logistic regression
► Outcome = level
► e.g. 1 2 3 4 5
► Compares higher versus lower levels at each cut point: 1 vs 2-5, 1-2 vs 3-5, 1-3 vs 4-5, 1-4 vs 5
Falling from the seat gives 4.5 times the odds of a more severe injury.
Sitting in an area where the vehicle body collapsed gives 13.54 times the odds of a more severe injury.
Falling from the seat gives 4.69 times the odds of a more severe injury after accounting for confounding by sitting in an area where the vehicle body collapsed.
Multinomial Logistic regression
Heavy smokers are 0.75 times as likely to want to quit but never have tried, compared with not wanting to quit.
People who want to quit but have never tried are 0.75 times as likely to be heavy smokers, compared with people who do not want to quit.
Heavy smokers are 0.65 times as likely to want to quit and have tried to quit, compared with people who do not want to quit.
Confounding vs effect modification
Confounding
EXPOSURE (alcohol drinking) → DISEASE (heart disease)
CONFOUNDING VARIABLE (cigarette smoking)
1. Is a risk factor for the outcome
2. Is associated with the exposure
3. Is not an intermediate
Report the adjusted measure
Control of confounding
► Study design: Restriction, Matching
► Analytic: Stratification, Multivariate analysis
► Crude measure (OR/RR)
► Adjusted measure (OR/RR)
Degree of Confounding (crude ÷ adjusted)
⮚ >1 overestimation
⮚ <1 underestimation
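Stratification, one of the analytic controls listed above, can be sketched with the Mantel-Haenszel summary odds ratio. This is a standard pooled estimator; it is named here as an assumption, because the slide does not say which adjustment method it used. The data are hypothetical: alcohol (exposure) and heart disease, stratified by smoking. Each stratum has an OR of 1, so the crude OR is inflated only by confounding. The degree of confounding is taken as crude ÷ adjusted.

```python
from typing import List, Tuple

# (a, b, c, d) = exposed cases, exposed non-cases, unexposed cases, unexposed non-cases
Stratum = Tuple[int, int, int, int]


def crude_or(strata: List[Stratum]) -> float:
    """Collapse all strata into one 2x2 table and take AD/BC."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """OR_MH = sum(a*d/n) / sum(b*c/n) over the strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


if __name__ == "__main__":
    smokers = (40, 60, 20, 30)      # stratum OR = 1
    non_smokers = (5, 45, 10, 90)   # stratum OR = 1
    strata = [smokers, non_smokers]
    crude = crude_or(strata)
    adjusted = mantel_haenszel_or(strata)
    print(round(crude, 2), round(adjusted, 2), round(crude / adjusted, 2))
    # 1.71 1.0 1.71 -> degree of confounding > 1: the crude OR overestimates
```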
Effect modification / Interaction
Report the crude measure of each stratum from the stratified analysis (stratum-specific measures)

Poisson Regression
Y = INCIDENCE / COUNT
Univariate analysis
• poisson ais fal4, irr
• poisson ais slept, irr
• poisson ais hit, irr

Multivariate analysis; IRR
• People who fell from the seat have an expected injury severity score 4.18 times that of people who did not fall, after controlling for the confounders of being asleep at the time of the incident, sitting in an area where the vehicle body collapsed, and being hit by vehicle parts.
• People sitting in an area where the vehicle body collapsed have an expected injury severity score 3.86 times higher, after controlling for being asleep at the time of the incident, falling from the seat, and being hit by vehicle parts.
Multivariate analysis; Coef.
• For people who fell from the seat, the log of the expected injury severity score is 1.43 higher (exp(1.43) ≈ 4.18), after controlling for being asleep at the time of the incident, sitting in an area where the vehicle body collapsed, and being hit by vehicle parts.
• For people sitting in an area where the vehicle body collapsed, the log of the expected injury severity score is 1.35 higher (exp(1.35) ≈ 3.86), after controlling for being asleep at the time of the incident, falling from the seat, and being hit by vehicle parts.
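Poisson coefficients are on the log scale. Exponentiating them gives the incidence rate ratios, which is what Stata's `poisson ..., irr` displays. This sketch converts the two coefficients above.

```python
import math

# Coefficients from the multivariable Poisson model on the slides
coefficients = {"fell from seat": 1.43, "seated where the body collapsed": 1.35}

for name, coef in coefficients.items():
    irr = math.exp(coef)   # IRR = exp(beta)
    print(f"{name}: coef = {coef:.2f} -> IRR = {irr:.2f}")
# fell from seat: coef = 1.43 -> IRR = 4.18
# seated where the body collapsed: coef = 1.35 -> IRR = 3.86
```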
Data management: Poisson Regression
Rate ratio / Incidence rate ratio (IRR) (1)
► A ratio of two incidence rates
► Used in RCTs or cohort studies (with person-time data)
► Rate ratio (IRR) = [A/TE] / [B/TU]
Group | No. of sick | Person-time | Incidence rate
Exposed | A | TE | A/TE
Unexposed | B | TU | B/TU
Total | A+B | TE+TU | (A+B)/(TE+TU)
Group | No. of better | Person-time | Incidence rate
Estrogen | 25 | 178 | 0.1404 (14.04 per 100 person-time)
Placebo | 9 | 117 | 0.0769 (7.69 per 100 person-time)
Total | 34 | 295 | 0.1153 (11.53 per 100 person-time)
► IRR = (25/178)/(9/117) = 1.83
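The IRR divides the event rate per person-time in one group by the rate in the other. A minimal sketch with the estrogen vs placebo counts above:

```python
def incidence_rate_ratio(cases_exp: int, pt_exp: float, cases_unexp: int, pt_unexp: float) -> float:
    """IRR = (A/TE) / (B/TU)."""
    return (cases_exp / pt_exp) / (cases_unexp / pt_unexp)


if __name__ == "__main__":
    irr = incidence_rate_ratio(25, 178, 9, 117)   # estrogen vs placebo
    print(round(irr, 3))   # 1.826
```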

Conclusion
Type of study design
Determine which type our study is
► Qualitative / quantitative
► Descriptive
► Analytic?
► Cohort / Case control / Cross-sectional
Type of data
1. What type is our data?
Categorical / ordinal / continuous
Normal distribution?
2. Which variable is the outcome / exposure?

Descriptive study
► Categorical Variables
▪ Ratio, Proportion, Percent (%)
► Continuous Variable
▪ Normal Distribution
⮚ Mean, SD
▪ Not normal Distribution
⮚ Median, Range/IQR
Analytic
► Comparison
► Categorical outcome
► Chi-square (χ2) test
► Continuous outcome
❖ Normal Distribution
► t-test (2 groups), F-test/ANOVA (>= 2 groups)
❖ Not normal Distribution
► Wilcoxon test, Mann-Whitney U test
► Association
► Study design
Association by study design
Study design | Measure of frequency | Measure of association
Cross-sectional | Prevalence | Prevalence ratio; Prevalence odds ratio
Case-control | Prevalence | Odds ratio
Cohort | Incidence | Risk ratio; Odds ratio
Regression model (Predict / Odds ratio)
Y = outcome, X = exposure; X = continuous / categorical
► Linear Regression
❖ Y = continuous + normal distribution
► Logistic Regression
❖ Y = Categorical
► Poisson Regression
❖ Y = Incidence/count
► Cox's Proportional Hazard Regression
❖ Y = Time to event
