

DATA ANALYSIS: SMOKING AND DEATH

Data Analysis Project: Smoking and Cause of Death


Ashley Winans
National University

COH 611 Biostatistics


Instructor: Professor Sara Tweeten
August 31, 2015


Abstract
The purpose of this project is to analyze the Framingham Heart Data and to look for a possible
correlation between cause of death and both smoking status and the number of cigarettes
smoked. The Framingham Heart Study is one of the best-known studies of the risk factors for
cardiovascular disease, which includes heart failure, stroke, and peripheral artery disease.

Data Analysis Project: Smoking and Cause of Death
The aim of my project is to analyze the Framingham Heart Data set in order to detect any
correlation between the cause of death, smoking status, and possibly the number of cigarettes
smoked. My null hypothesis is that smokers and nonsmokers do not differ in their causes of death;
my alternative hypothesis is that smokers are more prone to specific causes of death than
nonsmokers. If possible, I will also examine whether the number of cigarettes smoked plays a role.
A t-test will be used to test the null hypothesis that smoking neither causes nor contributes to
the different causes of death.
Data Pull
To begin with, I pulled the contents of the entire Heart data set, which lists its variables
and their attributes. The variables that I am working with are Smoking, Smoking Status, and Death
Cause. These were the results for my three variables:
Alphabetic List of Variables and Attributes

#     Variable          Type    Len    Label
      DeathCause        Char    26     Cause of Death
11    Smoking           Num
17    Smoking_Status    Char    17     Smoking Status


Causes of Death
The causes of death in this study are Cancer, Cerebral Vascular Disease, Coronary Heart
Disease, Other, and Unknown. These are the categories of the discrete variable DeathCause.

The FREQ Procedure

Cause of Death
DeathCause                   Frequency   Percent   Cumulative Frequency   Cumulative Percent
Cancer                             539     27.07                    539                27.07
Cerebral Vascular Disease          378     18.99                    917                46.06
Coronary Heart Disease             605     30.39                   1522                76.44
Other                              357     17.93                   1879                94.37
Unknown                            112      5.63                   1991               100.00
Frequency Missing = 3218

According to the above frequency distribution, which covers the 1,991 participants with a
recorded cause of death (3,218 are missing), Coronary Heart Disease is the most common cause of
death at 30.39%. It is followed by Cancer at 27.07% and Cerebral Vascular Disease at 18.99%.
Other and Unknown causes make up 17.93% and 5.63% of the total, respectively.
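The Percent and Cumulative columns in the table above are simple functions of the frequencies. The short Python sketch below (an illustration, not the SAS code used in this project) recomputes them from the DeathCause counts; the function name `frequency_table` is hypothetical. Only non-missing values enter the denominator, which is why the 3,218 missing values do not affect the percentages.

```python
from itertools import accumulate

# DeathCause counts copied from the PROC FREQ table above (missing values excluded).
death_counts = {
    "Cancer": 539,
    "Cerebral Vascular Disease": 378,
    "Coronary Heart Disease": 605,
    "Other": 357,
    "Unknown": 112,
}

def frequency_table(counts):
    """Return rows of (level, frequency, percent, cumulative frequency, cumulative percent)."""
    total = sum(counts.values())               # denominator = non-missing count only
    cum_freqs = accumulate(counts.values())    # running totals in table order
    rows = []
    for (level, freq), cum in zip(counts.items(), cum_freqs):
        rows.append((level, freq, round(100 * freq / total, 2),
                     cum, round(100 * cum / total, 2)))
    return rows

for row in frequency_table(death_counts):
    print(row)
```

Rounding to two decimals mirrors the precision SAS prints; the percentages it produces match the table.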
Smoking: Basic Statistical Measures
The Smoking variable records the number of cigarettes smoked per day by each participant
in the data set. Not all of the participants smoke (about 48% in this study do not), and those
who do smoke anywhere from 1 to 60 cigarettes per day. The mean is 9.37 with a standard deviation
of 12.03, a range of 60, and an interquartile range of 20.

Basic Statistical Measures
Location                   Variability
Mean      9.366518         Std Deviation          12.03145
Median    1.000000         Variance              144.75582
Mode      0.000000         Range                  60.00000
                           Interquartile Range    20.00000

Based upon the histogram, the distribution of cigarettes smoked per day is right-skewed rather
than normal. The mode (0 in this study) reflects only the large group of nonsmokers, and the mean
is pulled upward by the heavy smokers in the right tail, so for skewed data like these the median
(1 cigarette per day) is usually considered the more representative measure of center. The mean
is 9.37, which indicates that, on average, people in this study smoke 9.37 cigarettes per day.
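The effect of right skew on the location measures can be seen with a small sketch using Python's standard `statistics` module. The sample below is hypothetical (it is not Framingham data); it only mimics the shape described above, with many zeros and a long right tail. Like SAS's default, `statistics.stdev` uses the sample (n - 1) divisor. The interquartile range is left out because quartile definitions differ between software packages.

```python
import statistics

# Hypothetical cigarettes-per-day values (NOT the Framingham data),
# chosen to be right-skewed: many zeros and a few heavy smokers.
sample = [0, 0, 0, 0, 0, 1, 3, 10, 20, 40]

def describe(values):
    """Return the location and variability measures discussed above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),     # sample SD, n - 1 divisor
        "variance": statistics.variance(values), # square of the sample SD
        "range": max(values) - min(values),
    }

summary = describe(sample)
print(summary)
```

In this toy sample the mean (7.4) sits well above the median (0.5), the same pattern as the 9.37 mean and 1.0 median in the SAS output.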
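The Student's t-test in the Tests for Location output is a one-sample test of whether the mean
number of cigarettes smoked per day differs from zero (Mu0=0); it is not a two-sample t-test. Its
p value is less than .0001, so the mean is significantly different from zero and the null
hypothesis of a zero mean is rejected. This test does not compare smokers with nonsmokers or
involve cause of death.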
Tests for Location: Mu0=0
Test           Statistic      p Value
Student's t    55.9927        Pr > |t|     <.0001
Sign           1336           Pr >= |M|    <.0001
Signed Rank    1785564        Pr >= |S|    <.0001
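The three statistics in the Tests for Location table can be recomputed from the reported summary measures, which shows what each test is measuring. The sketch below rests on two assumptions not stated in the SAS output: (1) the number of non-missing Smoking values equals the 5,173 non-missing Smoking_Status values, and (2) every non-smoker has Smoking = 0, leaving 5,173 - 2,501 = 2,672 nonzero values, all positive. It follows the PROC UNIVARIATE definitions of the sign statistic, M = (n+ - n-)/2, and the signed-rank statistic, S = (sum of positive ranks) - n(n+1)/4. The function names are hypothetical.

```python
import math

# Summary values reported in the SAS output above.
mean = 9.366518
std_dev = 12.03145
# Assumption: non-missing Smoking count = non-missing Smoking_Status count.
n = 5173
# Assumption: non-smokers (2,501) have Smoking = 0; all other values are positive.
n_positive = n - 2501
n_negative = 0

def one_sample_t(mean, sd, n, mu0=0.0):
    """Student's t for H0: population mean == mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))

def sign_statistic(n_pos, n_neg):
    """Sign statistic M = (n+ - n-) / 2; values equal to mu0 are dropped."""
    return (n_pos - n_neg) / 2

def signed_rank_all_positive(n_nonzero):
    """Signed-rank S = T+ - n(n+1)/4. When every nonzero value is positive,
    T+ is the sum of all ranks, n(n+1)/2, so S = n(n+1)/4."""
    return n_nonzero * (n_nonzero + 1) / 4

print(round(one_sample_t(mean, std_dev, n), 2))
print(sign_statistic(n_positive, n_negative))
print(signed_rank_all_positive(n_positive))
```

The recomputed values agree with the SAS output (t of about 55.99, M = 1336, S = 1785564), which is consistent with both assumptions. All three tests ask only whether the center of the Smoking distribution is zero.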

Smoking Status
The Smoking_Status variable groups participants by the number of cigarettes they smoke per
day; its frequency distribution is below. Of the 5,173 participants with a recorded status,
20.22% are heavy smokers who smoke 16 to 25 cigarettes per day, 11.19% are light smokers at 1-5
cigarettes per day, and 11.13% are moderate smokers at 6-15 cigarettes per day. 9.10% are very
heavy smokers who smoke more than 25 cigarettes per day. Interestingly, 48.35% of the study
participants are classified as nonsmokers.

Smoking Status
Smoking_Status       Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)             1046     20.22                   1046                20.22
Light (1-5)                579     11.19                   1625                31.41
Moderate (6-15)            576     11.13                   2201                42.55
Non-smoker                2501     48.35                   4702                90.90
Very Heavy (> 25)          471      9.10                   5173               100.00
Frequency Missing = 36
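The Smoking_Status categories are cut points on the number of cigarettes per day. The sketch below maps a cigarettes-per-day value to the labels in the table above. How the data set's authors actually derived Smoking_Status is not shown in this paper, so the function `smoking_status` is a hypothetical illustration based only on the category labels; it also assumes whole-number values.

```python
from collections import Counter

def smoking_status(cigs_per_day):
    """Map cigarettes per day to the Smoking_Status labels used in the table.
    Cut points are taken from the labels; missing input stays missing."""
    if cigs_per_day is None:
        return None
    if cigs_per_day == 0:
        return "Non-smoker"
    if cigs_per_day <= 5:
        return "Light (1-5)"
    if cigs_per_day <= 15:
        return "Moderate (6-15)"
    if cigs_per_day <= 25:
        return "Heavy (16-25)"
    return "Very Heavy (> 25)"

# Hypothetical values, not Framingham records.
example = [0, 3, 10, 20, 30, 0, None]
counts = Counter(s for s in map(smoking_status, example) if s is not None)
print(counts)
```

Dropping `None` before counting matches how PROC FREQ reports missing values separately ("Frequency Missing") instead of as a category.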

Analysis
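Unfortunately, I was unable to run the SAS code that would test for an association between
smoking and cause of death. In regards to smoking, the one-sample t-test showed only that the
mean number of cigarettes smoked per day is significantly different from zero (p < .0001). It
does not test the relationship between smoking and cause of death, so these results can neither
reject nor support the null hypothesis about causes of death.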
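Because both Smoking_Status and DeathCause are categorical, one standard way to test for an association between them is Pearson's chi-square test of independence (in SAS, a request such as `tables Smoking_Status*DeathCause / chisq;` in PROC FREQ). This is a swapped-in technique, not the t-test planned in this paper. The minimal sketch below computes the chi-square statistic and degrees of freedom for any table of observed counts; the function name and the 2x2 counts are hypothetical toy values, not Framingham results. A p-value would come from the chi-square distribution with those degrees of freedom (for example, via `scipy.stats.chi2_contingency`).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table given as a list of rows of observed counts. Assumes every row and
    column total is positive, so no expected count is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Made-up toy counts (NOT Framingham results):
# rows = smoker / non-smoker, columns = coronary heart disease / all other causes.
toy = [[10, 20],
       [20, 10]]
print(chi_square_independence(toy))
```

Applied to the full 5 x 5 table of Smoking_Status by DeathCause, the test would have (5 - 1)(5 - 1) = 16 degrees of freedom; a common rule of thumb is that most expected counts should be at least 5 for the approximation to be reliable.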
In this particular study, the largest single Smoking Status group was non-smokers (48.35%);
the other 51.65% are cigarette smokers who fall into the various ranges mentioned previously
(i.e. light, moderate, heavy, and very heavy smokers). On average, people in the study smoke 9.37
cigarettes per day. Of the causes of death, coronary heart disease is the most common at 30.39%.
According to prominent studies, coronary heart disease is attributed to multiple risk factors,
including emotional stress and Type A personality, heredity, high cholesterol, high blood
pressure, obesity, diabetes, and tobacco abuse. Tobacco abuse includes not only the cigarettes
examined in this study but also cigars, pipes, and chewing tobacco. The nicotine and other
chemicals in tobacco products and smoke reduce the amount of oxygen that can reach the heart,
raise blood pressure, increase heart rate, damage blood vessels, and increase the likelihood of
blood clots, which may lead to strokes or heart attacks.
Coronary heart disease remains the leading cause of death in the United States and
smoking-caused heart disease results in more deaths per year than smoking-caused lung
cancer. Approximately 30% of all heart disease deaths are caused by cigarette smoking.
Smoking is commonly referred to as the single largest preventable cause of heart disease as
well as the most preventable cause of premature death in the United States. Cigarette smoking
alone accounts for more than 440,000 of the more than 2.4 million annual deaths.
Conclusion
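What is the correlation between smoking status and causes of death? About 30% of all heart
disease deaths are caused by cigarette smoking, and coronary heart disease accounted for a similar
30.39% of deaths in this study. The two figures measure different things, however: the first is
the share of heart disease deaths caused by smoking, while the second is the share of all recorded
deaths due to coronary heart disease. Their similarity is consistent with smoking being an
important contributor but does not by itself establish a correlation between smoking status and
cause of death; that would require comparing causes of death across the Smoking_Status groups.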


References
Smoking & Cardiovascular Disease (Heart Disease). (n.d.). Retrieved August 29, 2015.
Sullivan, L. (2012). Essentials of Biostatistics in Public Health (2nd ed.). Burlington, MA: Jones
& Bartlett Learning.
Tobacco and Cardiovascular Disease. (n.d.). Retrieved August 29, 2015.
