DATA ANALYSIS: SMOKING AND DEATH
Abstract
The purpose of this project is to analyze the Framingham Heart Data and to find a possible
correlation between the cause of death and smoking status as well as the number of cigarettes
smoked. This particular study is one of the best known for determining the risk factors of
cardiovascular disease such as heart failure, stroke, and peripheral artery disease.
Data Analysis Project: Smoking and Cause of Death
The aim of my project is to analyze the Framingham Heart Data set in order to detect any
correlation between the cause of death, smoking status, and possibly the number of cigarettes
smoked. My hypothesis is that smokers are more prone to specific causes of death than
nonsmokers; the corresponding null hypothesis is that cause of death does not differ between
smokers and nonsmokers. If possible, I will also attempt to see whether the number of
cigarettes smoked plays a role. A t-test will be used to test the null hypothesis, that is, to
assess whether smoking contributes to the different causes of death.
Data Pull
To begin with, I did a data pull for all of the Heart Data set. It gives the different variables
and attributes. The variables that I am working with are: Smoking, Smoking Status, and Death
Cause. These were the results for my three variables:
Alphabetic List of Variables and Attributes
#    Variable         Type   Len   Label
     DeathCause       Char   26    Cause of Death
11   Smoking          Num
17   Smoking_Status   Char   17    Smoking Status
The causes of death in this study are: Cancer, Cerebral Vascular Disease, Coronary Heart
Disease, Other, and Unknown. These are the categories of the discrete variable DeathCause.
Cause of Death
DeathCause                  Frequency   Percent   Cumulative Frequency   Cumulative Percent
Cancer                      539         27.07     539                    27.07
Cerebral Vascular Disease   378         18.99     917                    46.06
Coronary Heart Disease      605         30.39     1522                   76.44
Other                       357         17.93     1879                   94.37
Unknown                     112         5.63      1991                   100.00
According to the above frequency distribution, Coronary Heart Disease is the most
common cause of death at 30.39%. It is followed by Cancer at 27.07% and Cerebral Vascular
Disease at 18.99%. Other as well as unknown death causes make up 17.93% and 5.63% of the
total respectively.
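The percentages quoted above follow directly from the frequency counts. As a minimal sketch in Python (the variable names are my own and are not part of the SAS output), the Percent and Cumulative Percent columns can be recomputed from the reported frequencies:

```python
# Frequency counts as reported in the Cause of Death table.
death_counts = {
    "Cancer": 539,
    "Cerebral Vascular Disease": 378,
    "Coronary Heart Disease": 605,
    "Other": 357,
    "Unknown": 112,
}

total = sum(death_counts.values())  # total number of recorded deaths

percents = {}
cumulative_percents = {}
running = 0
for cause, count in death_counts.items():
    running += count  # cumulative frequency up to and including this row
    percents[cause] = round(100 * count / total, 2)
    cumulative_percents[cause] = round(100 * running / total, 2)

# The cause with the largest frequency.
most_common = max(death_counts, key=death_counts.get)

for cause in death_counts:
    print(f"{cause:<26} {percents[cause]:6.2f} {cumulative_percents[cause]:7.2f}")
print("Most common cause:", most_common)
```

Recomputing the columns this way is a quick check that the percentages cited in the text agree with the counts in the table.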
Smoking: Basic Statistical Measures
The Smoking variable gives the number of cigarettes smoked per day by each participant
in the data set. Not all of the participants smoke (about 48% in this study do not), and those
who do smoke anywhere from 1 to 60 cigarettes per day. The mean is 9.37 with a standard
deviation of 12.03, a range of 60, and an interquartile range of 20.
Location              Variability
Mean     9.366518     Std Deviation         12.03145
Median   1.000000     Variance              144.75582
Mode     0.000000     Range                 60.00000
                      Interquartile Range   20.00000
The histogram of Smoking is right-skewed and non-normal. The mode (0 in this study) is
therefore not the best summary of typical consumption, and I report the mean instead, keeping
in mind that in a right-skewed distribution the mean is pulled upward by the heaviest smokers
(the median here is only 1). The mean is 9.37, which indicates that, on average, people in this
study smoke 9.37 cigarettes per day.
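According to Student's t-test, the p-value is less than .0001. The reported statistic
(t = 55.99) comes from a one-sample test of whether the mean number of cigarettes smoked
differs from zero, not from a two-sample comparison of groups. A p-value this small means that
the null hypothesis of that test is rejected: the mean is significantly different from zero. The
test does not compare causes of death between smokers and nonsmokers.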
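One way to see what this test measures is to reproduce the reported Student's t statistic of 55.9927 from the summary statistics. The sketch below rests on two assumptions of mine: that the test is a one-sample test against a mean of zero, and that it uses 5173 non-missing observations (the total in the Smoking Status frequency table, which reports 36 missing records).

```python
import math

# Summary statistics reported for Smoking in the SAS output.
mean = 9.366518
std_dev = 12.03145

# Assumption: 5173 non-missing observations, the total shown in the
# Smoking Status frequency table (36 records are missing).
n = 5173

# One-sample t statistic for the null hypothesis "population mean = mu0".
mu0 = 0.0  # assumption: the test compares the mean against zero
standard_error = std_dev / math.sqrt(n)
t_statistic = (mean - mu0) / standard_error

# The reported variance should equal the standard deviation squared.
variance = std_dev ** 2

print(f"t = {t_statistic:.4f}, variance = {variance:.4f}")
```

Under these assumptions the computed value comes out at about 55.99, in line with the reported statistic, which supports reading the output as a test of whether the mean differs from zero.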
Test          Statistic   p Value
Student's t   55.9927     Pr > |t|    <.0001
Sign          1336        Pr >= |M|   <.0001
Signed Rank   1785564     Pr >= |S|   <.0001
Smoking Status
The Smoking Status variable groups participants by the number of cigarettes smoked per
day; its frequency distribution is below. 20.22% are categorized as heavy smokers, who smoke
16 to 25 cigarettes per day. 11.19% are light smokers at 1-5 cigarettes per day and 11.13% are
moderate smokers at 6-15 cigarettes per day. 9.10% are very heavy smokers, who smoke more
than 25 cigarettes per day. Interestingly, 48.35% of the study participants are classified as
nonsmokers.
Smoking Status
Smoking_Status      Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)       1046        20.22     1046                   20.22
Light (1-5)         579         11.19     1625                   31.41
Moderate (6-15)     576         11.13     2201                   42.55
Non-smoker          2501        48.35     4702                   90.90
Very Heavy (> 25)   471         9.10      5173                   100.00
Frequency Missing = 36
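The Smoking Status categories correspond to ranges of the Smoking variable (cigarettes per day). As a rough sketch of that mapping in Python, the function below assigns a category label to a daily cigarette count. The function name is hypothetical, the "Very Heavy (> 25)" label is my own wording for the group that smokes more than 25 per day, and I am assuming that a missing Smoking value yields a missing status; the data set itself may have derived the variable differently.

```python
from typing import Optional


def smoking_status(cigarettes_per_day: Optional[float]) -> Optional[str]:
    """Map cigarettes per day to a Smoking Status label (illustrative only)."""
    if cigarettes_per_day is None:
        return None  # assumption: missing Smoking value gives a missing status
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes per day cannot be negative")
    if cigarettes_per_day == 0:
        return "Non-smoker"
    if cigarettes_per_day <= 5:
        return "Light (1-5)"
    if cigarettes_per_day <= 15:
        return "Moderate (6-15)"
    if cigarettes_per_day <= 25:
        return "Heavy (16-25)"
    return "Very Heavy (> 25)"  # hypothetical label for more than 25 per day


# Example: classify a few daily counts, including a missing value.
for value in (0, 3, 10, 20, 40, None):
    print(value, "->", smoking_status(value))
```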
Analysis
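Unfortunately, I was unable to run any SAS code that would test for an association
between smoking and cause of death. For Smoking, the t-test shows only that the mean number
of cigarettes smoked is significantly different from zero (p < .0001). It does not test the
hypothesis about cause of death, which therefore remains untested in this project.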
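For reference, an association between two categorical variables such as Smoking Status and Death Cause is commonly examined with a chi-square test of independence on their cross-tabulation. This is a different technique from the t-test planned for this project, and I did not run it on the Framingham data. The sketch below is a minimal Python version under that assumption; the function name is hypothetical, and the counts in the example are made up purely to exercise the function.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy 2x2 counts, invented for illustration only; NOT Framingham data.
toy_table = [[10, 20], [20, 10]]
statistic, dof = chi_square_independence(toy_table)
print(f"chi-square = {statistic:.4f}, degrees of freedom = {dof}")
```

With five Smoking Status groups and five causes of death, the real cross-tabulation would be a 5 x 5 table with 16 degrees of freedom. The statistic would then be compared against the chi-square distribution to obtain a p-value.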
In this particular study, the largest Smoking Status group was Non-smokers at 48.35%;
however, this means that the other 51.65% are cigarette smokers who fall into the ranges
mentioned previously (light, moderate, heavy, and very heavy smokers). On average, people in
the study smoke 9.37 cigarettes per day. Of the causes of death, coronary heart disease is the
most common at 30.39%. According to prominent studies, coronary heart disease is attributed
to multiple risk factors, which include emotional stress and Type A personality, heredity, high
cholesterol, high blood pressure, obesity, diabetes, and tobacco abuse. Tobacco abuse includes
not only the cigarettes looked at in our study but also cigars, pipes, and chewing tobacco. The
nicotine in tobacco products and smoke reduces the amount of oxygen that can reach the heart,
raises blood pressure, increases heart rate, impairs the blood vessels, and increases the
likelihood of blood clots, which may lead to strokes or heart attacks.
Coronary heart disease remains the leading cause of death in the United States, and
smoking-caused heart disease results in more deaths per year than smoking-caused lung
cancer. Approximately 30% of all heart disease deaths are caused by cigarette smoking.
Smoking is commonly referred to as the single largest preventable cause of heart disease as
well as the most preventable cause of premature death in the United States. Cigarette smoking
alone accounts for more than 440,000 of the more than 2.4 million annual deaths in the
United States.
Conclusion
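What is the correlation between smoking status and causes of death? This analysis could
not measure it directly, because the association was never tested. The published estimate that
30% of all heart disease deaths are caused by cigarette smoking is consistent with smoking
being an important factor in this cohort, where coronary heart disease accounts for 30.39% of
deaths. The two figures measure different things, however (the share of heart disease deaths
attributed to smoking versus the share of all deaths due to coronary heart disease), so their
similarity is not in itself evidence of a correlation.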
References
Smoking & Cardiovascular Disease (Heart Disease). (n.d.). Retrieved August 29, 2015.
Sullivan, L. (2012). Essentials of Biostatistics in Public Health (2nd ed.). Burlington, MA: Jones
& Bartlett Learning.
Tobacco and Cardiovascular Disease. (n.d.). Retrieved August 29, 2015.