
Name : M. Agung Prasetya Adnyana Yoga
NIM : 21/475855/PKU/19330
Country : Indonesia
Assignment : Exploratory Data Analysis
Question
1. Repeat the class exercise on creating graphs, but examine the variables systolic blood
pressure and HDL cholesterol (mg/dL), exam 3 (hdlc3), i.e. the third examination.
2. Create a balance table as presented in Table 1 of article number 2 by Pencina, Michael J.
et al. 2009. Compare the first and third examinations for the following variables:
death angina hospmi stroke sysbp1 diabp bmi totchol hdlc ldlc cursmoke.
3. Please interpret your results.

Answers
1. Codebook
- The codebook command provides diverse information about each variable, including its
type, range, most frequent values, and number of missing values. In Stata it describes
the contents of a data set in great detail. Here I use the Framingham data.
- In the Stata command window I typed commands such as “codebook sysbp3” or
“codebook hdlc3” to find the number of missing values and the total sample size for
each variable of interest.
- For example, for the variable sex3 Stata reports 1,171 missing values out of a total
sample size of 4,434. Similarly, for the total cholesterol variable, 1,385 of the 4,434
observations are missing.
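
The missing-value counts described above can be obtained in a few ways. The following is a minimal sketch, assuming the Framingham data set is already loaded in memory; the variable list is only an illustration, and the compact and misstable variants are alternatives not used in the original work.

```stata
* Assumption: the Framingham data set is already open in Stata.

* Full codebook entry (type, range, missing count) for the two variables of interest
codebook sysbp3 hdlc3

* One line per variable: observations, unique values, mean, min, max
codebook sysbp3 hdlc3, compact

* Table of missing-value counts for several exam 3 variables at once
misstable summarize sex3 totchol3 age3 sysbp3 diabp3 hdlc3 ldlc3
```

The "missing .:" line of the codebook output gives the pair reported in Table 1 (missing values / total sample size).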

Table 1. Missing values and sample size in the Framingham data, examination 3

No  Variable                               Missing values / Sample size
1   Sex3                                   1,171/4,434
2   Total cholesterol (mg/dl)              1,385/4,434
3   Age (years)                            1,171/4,434
4   Systolic blood pressure (mmHg)         1,171/4,434
5   Diastolic blood pressure (mmHg)        1,171/4,434
6   Current smoker                         1,171/4,434
7   Number of cigarettes per day           1,185/4,434
8   Heart rate                             1,175/4,434
9   Glucose level (mg/dl)                  1,733/4,434
10  Prevalent CHD (coronary heart disease) 1,171/4,434
11  Body mass index                        1,188/4,434
12  Diabetic                               1,171/4,434
13  Use of anti-hypertension medication    1,617/4,434
14  LDL3                                   1,408/4,434
15  HDL3                                   1,407/4,434
16  Prevalent angina pectoris              1,171/4,434
17  Prevalent stroke                       1,171/4,434
18  Prevalent myocardial infarction        1,171/4,434

2. Describe
- The describe command gives information about how each variable is stored in Stata
and about the data set as a whole: the number of observations, the size of the data,
the total number of variables, and the storage type and value label of each variable.
- I typed “describe” in the Stata command window.
- The Framingham data contain 74 variables and 4,434 observations.

Table 2. Number of observations, size, and total variables in the whole Framingham data

No  Describe                 Content
1   Number of observations   4,434
2   Size                     900,102
3   Total variables          74
4   Variable names           randid, death, angina, hospmi, stroke, cvd, hyperten,
                             timeap, timemi, age1, diabetes1, sex1, bmi1, hearrate,
                             timechd, timehyp, cursmoke2, cigpday2, hdlc1, ldlc1,
                             and so on, up to 74 variables.
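
The figures in Table 2 come from the header of the describe output. A short sketch, assuming the data set is already loaded, of how to get only that header or only selected variables:

```stata
* Assumption: the Framingham data set is already open in Stata.

* Header only: number of observations, number of variables, size
describe, short

* Storage type, display format, value label and variable label for two variables
describe sysbp3 hdlc3
```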

3. Summarize
- summarize is the basic descriptive statistics command in Stata; it reports the number
of observations, mean, standard deviation, minimum, and maximum.
- I typed “summarize” followed by the variable names in the Stata command window to show
the mean, standard deviation, minimum, and maximum of each variable.
Table 3. Mean and standard deviation in the Framingham data, examination 3

No  Variable     Obs     Mean     Standard deviation   Min     Max
1   Sex3         3,263   1.57     0.494                1       2
2   Totchol3     3,049   236.7    44.45                112     625
3   Age3         3,263   60.6     8.3                  44      81
4   Sysbp3       3,263   140.2    22.9                 86      267
5   Diabp3       3,263   81.8     11.3                 30      130
6   Cursmoke3    3,263   0.34     0.475                0       1
7   Cigpday3     3,249   6.77     11.6                 0       80
8   Bmi3         3,246   25.89    4.08                 14.43   56.8
9   Diabetes3    3,263   0.078    0.268                0       1
10  Bpmeds3      2,817   0.152    0.359                0       1
11  Heartrate3   3,259   77.35    12.4                 37      150
12  Glucose3     2,701   89.77    28.15                46      478
13  Prevchd3     3,263   0.110    0.313                0       1
14  Prevap3      3,263   0.079    0.270                0       1
15  Prevmi3      3,263   0.048    0.214                0       1
16  Prevstrk3    3,263   0.021    0.143                0       1
17  Prevhyp3     3,263   0.599    0.490                0       1
18  Hdlc3        3,027   49.3     15.6                 10      189
19  Ldlc3        3,026   176.46   46.8                 20      565
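
For the two variables the assignment focuses on, the same descriptive statistics can be produced with a little more detail. This is a sketch only, assuming the data set is loaded; the detail and tabstat variants are additions, not the commands used for Table 3.

```stata
* Assumption: the Framingham data set is already open in Stata.

* Basic statistics plus percentiles, skewness and kurtosis
summarize sysbp3 hdlc3, detail

* Compact table with one row per variable and the same columns as Table 3
tabstat sysbp3 hdlc3, statistics(n mean sd min max) columns(statistics)
```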

4. Make a balance table as presented in Elok journal article 2


- I summarize a few variables at exam 3, for instance: death
angina hospmi stroke sysbp1 diabp bmi totchol hdlc ldlc cursmoke.
- To obtain the mean and standard deviation of each continuous variable by sex, I typed
commands such as “tabulate sex3, sum(sysbp3)” or “tabulate sex3, sum(bmi3)” in the
Stata command window.
- To obtain the number of incident events by sex, I typed commands such as
“tabulate sex3 stroke” or “tabulate sex3 angina”.

Table 4. Baseline characteristics and incident events by sex, examination 3
(continuous variables are mean ± standard deviation; other rows are counts)

No  Variable (clinical feature)        Men (n = 1,387)   Women (n = 1,876)
    Incident events
1   Angina                             264               264
2   Stroke                             133               150
3   Hospitalized MI                    213               95
    Baseline characteristics
4   Systolic blood pressure (mmHg)     139.26 ± 21.15    140.92 ± 24.14
5   Body mass index                    26.22 ± 3.49      25.65 ± 4.45
6   Total cholesterol (mg/dl)          225.74 ± 41.13    245 ± 45.08
7   HDL cholesterol (mg/dl)            43.7 ± 13.2       53.6 ± 15.90
8   LDL cholesterol (mg/dl)            170.54 ± 44.65    180.94 ± 47.99
9   Current smoker                     539               582
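
The rows of Table 4 can also be produced in fewer steps. The sketch below assumes the data set is loaded and uses tabstat and a foreach loop, which are alternatives to the one-variable-at-a-time tabulate commands shown in the appendix.

```stata
* Assumption: the Framingham data set is already open in Stata.

* Mean, standard deviation and n of the continuous variables, by sex
tabstat sysbp3 bmi3 totchol3 hdlc3 ldlc3, by(sex3) statistics(mean sd n) nototal

* Counts and row percentages of each event or behaviour, by sex
foreach v of varlist angina stroke hospmi cursmoke3 {
    tabulate sex3 `v', row
}
```

The row option adds the percentage within each sex, which makes the two groups comparable despite their different sizes.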

Interpretation:
Table 4 is based on examination 3 of the Framingham data, which has a total sample size
of 3,263 people, divided into two groups: 1,387 men and 1,876 women. Below I briefly
describe each variable.
1. Angina: Among men, 264 of 1,387 (19.0%) had incident angina pectoris and 1,123 did
not. Among women, 264 of 1,876 (14.1%) also had incident angina pectoris and 1,612
did not.

2. Stroke: Among men, 133 of 1,387 (9.6%) had an incident stroke and 1,254 did not.
Among women, 150 of 1,876 (8.0%) had an incident stroke and 1,726 did not.
3. Incident hospitalized MI: Among men, 213 of 1,387 (15.4%) were hospitalized for
myocardial infarction and 1,174 were not. Among women, 95 of 1,876 (5.1%) were
hospitalized and 1,781 were not.

4. Current smoker: Smoking is a risk behaviour for cardiovascular disease. Among men,
539 of 1,387 (38.9%) were current smokers and 848 were non-smokers. Among women,
582 of 1,876 (31.0%) were current smokers and 1,294 were non-smokers.

5. Body mass index: The mean ± standard deviation of body mass index is 26.22 ± 3.49
in men and 25.65 ± 4.45 in women. In both groups the standard deviation is small
relative to the mean (about 13% of the mean in men and 17% in women), which
suggests that the body mass index data tend to be homogeneous. The average body mass
index is slightly higher in men (26.22) than in women (25.65).

6. Systolic blood pressure: The mean ± standard deviation of systolic blood pressure is
139.26 ± 21.15 in men and 140.92 ± 24.14 in women. In both groups the standard
deviation is small relative to the mean (about 15% of the mean in men and 17% in
women), which suggests that the systolic blood pressure data tend to be homogeneous.
The average systolic blood pressure is about 139.26 in men and 140.92 in women.

7. HDL cholesterol: The mean ± standard deviation of HDL cholesterol is 43.7 ± 13.2 in
men and 53.6 ± 15.90 in women. In both groups the standard deviation is smaller than
the mean (about 30% of the mean), which suggests that the HDL cholesterol data tend
to be homogeneous. The average HDL cholesterol is about 43.7 in men and 53.6 in
women.

8. LDL cholesterol: The mean ± standard deviation of LDL cholesterol is 170.54 ± 44.65
in men and 180.94 ± 47.99 in women. In both groups the standard deviation is smaller
than the mean (about 26% of the mean in men and 27% in women), which suggests that
the LDL cholesterol data tend to be homogeneous. The average LDL cholesterol is
about 170.54 in men and 180.94 in women.
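
The comparison of standard deviation with the mean made in the interpretation can be expressed as a coefficient of variation (standard deviation divided by mean). The sketch below assumes the data set is loaded; the t test and chi-squared test are additions not carried out in the original work, shown only as one possible way to check whether the differences between men and women are larger than chance.

```stata
* Assumption: the Framingham data set is already open in Stata.

* Mean, standard deviation and coefficient of variation (sd/mean), by sex
tabstat bmi3 sysbp3 hdlc3 ldlc3, by(sex3) statistics(mean sd cv)

* Not part of the original analysis: two-sample t test of HDL cholesterol by sex
ttest hdlc3, by(sex3)

* Not part of the original analysis: chi-squared test of hospitalized MI by sex
tabulate sex3 hospmi, row chi2
```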

5. Histograms

a. Histogram, normal distribution, for variable sysbp3 (systolic blood pressure, exam 3)

[Figure: histogram of sysbp3; x-axis: Systolic blood pressure (mmHg), exam 3; y-axis: Density]

b. Histogram, normal distribution, for variable hdlc3 (HDL cholesterol, exam 3)

[Figure: histogram of hdlc3; x-axis: HDL cholesterol (mg/dL), exam 3; y-axis: Density]

c. Histogram for variable hdlc3 (HDL cholesterol) for each sex category

[Figure: histograms of hdlc3 in two panels (Male, Female), graphs by Sex, exam 3; x-axis: HDL cholesterol (mg/dL), exam 3; y-axis: Frequency]

d. Histogram for variable sysbp3 (systolic blood pressure, exam 3) for each sex category

[Figure: histograms of sysbp3 in two panels (Male, Female), graphs by Sex, exam 3; x-axis: Systolic blood pressure (mmHg), exam 3; y-axis: Frequency]

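
Graphs of this kind can be drawn with the histogram command. This is a minimal sketch, assuming the data set is loaded; the graph names given in name() are hypothetical and only keep each graph from overwriting the previous one, and the exact options used for the original figures are not known.

```stata
* Assumption: the Framingham data set is already open in Stata.

* a and b: density histograms with a normal density curve overlaid
histogram sysbp3, normal name(hist_sysbp3, replace)
histogram hdlc3, normal name(hist_hdlc3, replace)

* c and d: frequency histograms in separate panels for each sex
histogram hdlc3, frequency by(sex3) name(hist_hdlc3_sex, replace)
histogram sysbp3, frequency by(sex3) name(hist_sysbp3_sex, replace)
```
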
APPENDIX

1. Example of the “codebook” command in Stata.

. codebook sex3 totchol3 age3 sysbp3 diabp3

sex3                                                  Sex, exam 3

                  type:  numeric (byte)
                 label:  mf

                 range:  [1,2]                 units:  1
         unique values:  2                 missing .:  1,171/4,434

            tabulation:  Freq.   Numeric  Label
                         1,387         1  Male
                         1,876         2  Female
                         1,171         .

totchol3                                              Total cholesterol (mg/dL), exam 3

                  type:  numeric (int)

                 range:  [112,625]             units:  1
         unique values:  243               missing .:  1,385/4,434

                  mean:  236.713
              std. dev:  44.4495

           percentiles:  10%    25%    50%    75%    90%
                         184    206    234    264    293

age3                                                  Age (years), exam 3

                  type:  numeric (byte)

                 range:  [44,81]               units:  1
         unique values:  38                missing .:  1,171/4,434

                  mean:  60.6482
              std. dev:  8.29677

           percentiles:  10%    25%    50%    75%    90%
                         50     54     60     67     73

2. Excerpt of the “describe” command output in Stata.

. describe

Contains data from C:\Users\ASUS\Downloads\framingham03.dta
  obs:         4,434
 vars:            74                          30 Aug 2012 14:39
 size:       900,102

variable name   storage type   display format   value label   variable label
randid          long           %12.0g                         Random ID
death           byte           %12.0g           yesno         Death indicator
angina          byte           %12.0g           yesno         Incident Angina Pectoris
hospmi          byte           %12.0g           yesno         Incident Hospitalized MI
mi_fchd         byte           %12.0g           yesno         Incident Hosp MI-Fatal CHD
anychd          byte           %12.0g           yesno         Incident Hosp MI, AP, CI, Fatal CHD
stroke          byte           %12.0g           yesno         Incident Stroke Fatal/non-fatal
cvd             byte           %12.0g           yesno         Incident Hosp MI or Stroke, Fatal or Non
hyperten        byte           %12.0g           yesno         Incident Hypertension
timeap          double         %12.0g                         Time (years) to Angina
timemi          double         %12.0g                         Time (years) to Hosp MI
timemifc        double         %12.0g                         Time (years) to MI-Fatal CHD
timechd         double         %12.0g                         Time (years) to CHD
timestrk        double         %12.0g                         Time (years) to Stroke
timecvd         float          %12.0g                         Time (years) to CVD
timedth         double         %12.0g                         Time (years) to Death
timehyp         double         %12.0g                         Time (years) to Hypertension
sex1            byte           %12.0g           mf            Sex, exam 1
totchol1        int            %12.0g                         Total cholesterol (mg/dL), exam 1

3. Summarize output for the Framingham data, examination 3

. summarize sex3 totchol3 age3 sysbp3 diabp3 cursmoke3 cigpday3 bmi3 diabetes3

    Variable        Obs        Mean    Std. Dev.       Min        Max
        sex3      3,263    1.574931    .4944292          1          2
    totchol3      3,049    236.7133    44.44948        112        625
        age3      3,263    60.64818    8.296766         44         81
      sysbp3      3,263    140.2158    22.92764         86        267
      diabp3      3,263    81.79298    11.27143         30        130
   cursmoke3      3,263    .3435489    .4749655          0          1
    cigpday3      3,249    6.771622    11.62963          0         80
        bmi3      3,246    25.89478    4.080655      14.43       56.8
   diabetes3      3,263    .0778425    .2679646          0          1

4. Tabulation of the sex3 variable.

. tabulate sex3

Sex, exam 3        Freq.     Percent        Cum.
       Male        1,387       42.51       42.51
     Female        1,876       57.49      100.00
      Total        3,263      100.00

5. Tabulation of incident events (angina, stroke, hospitalized MI) and current smoking
by sex3.

. tabulate sex3 angina

              Incident Angina Pectoris
Sex, exam 3         No        Yes      Total
       Male      1,123        264      1,387
     Female      1,612        264      1,876
      Total      2,735        528      3,263

. tabulate sex3 stroke

              Incident Stroke Fatal/non-fatal
Sex, exam 3         No        Yes      Total
       Male      1,254        133      1,387
     Female      1,726        150      1,876
      Total      2,980        283      3,263

. tabulate sex3 hospmi

              Incident Hospitalized MI
Sex, exam 3         No        Yes      Total
       Male      1,174        213      1,387
     Female      1,781         95      1,876
      Total      2,955        308      3,263

. tabulate sex3 cursmoke3

              Current smoker, exam 3
Sex, exam 3         No        Yes      Total
       Male        848        539      1,387
     Female      1,294        582      1,876
      Total      2,142      1,121      3,263

6. Tabulation of body mass index, systolic blood pressure, total cholesterol, LDL
cholesterol, and HDL cholesterol by sex3.

. tabulate sex3, sum( bmi3)

              Summary of Body mass index, exam 3
Sex, exam 3         Mean    Std. Dev.      Freq.
       Male    26.224246     3.493956      1,380
     Female    25.651125    4.4504957      1,866
      Total    25.894781    4.0806554      3,246

. tabulate sex3, sum ( sysbp3)

              Summary of Systolic blood pressure (mmHg), exam 3
Sex, exam 3         Mean    Std. Dev.      Freq.
       Male    139.25775    21.151674      1,387
     Female    140.92404    24.138008      1,876
      Total    140.21575    22.927642      3,263

. tabulate sex3, sum ( totchol3)

              Summary of Total cholesterol (mg/dL), exam 3
Sex, exam 3         Mean    Std. Dev.      Freq.
       Male    225.74238    41.127538      1,312
     Female          245    45.076676      1,737
      Total    236.71335    44.449481      3,049
