Group 7
INTRODUCTION
Cardiovascular diseases, especially sudden cardiac arrest, are widely recognized as the
world’s leading cause of death. Considering such implications, it is crucial for such
according to the World Health Organization. Thus, a research study led by Alan Garfinkel was
conducted in order to test the effectiveness of the drug “dobutamine” in measuring a patient’s risk
of suffering a cardiac event. A dobutamine stress echocardiogram is a test that assesses
the heart’s function under stress: the drug is injected into a vein to make the heart beat
faster, mimicking the effect of exercise (Johns Hopkins Medicine, 2019). However, its
effectiveness in predicting cardiac events in older patients, when stress is induced by the
drug rather than by exercise, has yet to be determined. Moreover, the researchers
events over a certain period of time. According to the U.S. Department of Health and Human
Services, older people are more likely to be affected by cardiovascular diseases than younger
people, which underscores the importance of early diagnosis before complications become too
advanced to treat. In summary, this study intends to determine which measurements and variables
are most effective in predicting cardiac events, specifically myocardial infarction (MI),
coronary artery bypass grafting surgery (CABG), and cardiac death, over the next year.
DESCRIPTION OF THE DATASET
A total of 558 participants were included in the study, of whom 338 were female
and 220 were male, with ages ranging from 26 to 93 years. Prior to the
Dobutamine Stress Echocardiogram test itself, all of the participants were first examined and
questioned to acquire relevant information that was later used in predicting cardiac events.
This included basic information such as gender and age. In addition, recent
experiences of myocardial infarction, angioplasty, and coronary artery bypass grafting surgery
were recorded. Moreover, measurements such as blood pressure, heart
rate, and ejection fraction were incorporated as variables relating to a patient’s well-being. Lastly,
other necessary details, including recent experiences of chest pain, signs of heart attack, ECG
and echocardiogram readings, past surgeries, and the patient’s history of disease and smoking, were
also collected to determine which set of data was the most effective in predicting
cardiac events over the next year. Among all of these variables, we chose to analyze
the total population’s basal heart rate, basal blood pressure, maximum heart rate, maximum blood
OBJECTIVES
The research study sought to determine which among the thirty-one (31) variables presented in
the dataset file were the most effective in predicting cardiac events a year after utilizing the
presented below are to be highlighted in this laboratory report in line with the purpose of the study:
patient’s risk of suffering from cardiac events over the next year:
2. To apply and utilize all applicable descriptive statistical methods on the quantitative
3. To analyze the set of empirical data and measurements on vital signs using various
STATISTICAL METHODS
Various descriptive statistical methods were used in this report in order to describe and
analyze the total population through a set of data points in line with the chosen variables. The
specified variables were all quantitative in nature; all of them are
continuous variables except for age, which is a discrete variable. Specifically, the following
1. Measures of Central Tendency - describes the whole population by identifying the central
a. Arithmetic Mean - the sum of all data values divided by the number of total
observations.
b. Median - the value of the middle number when the number of observations is odd
or the arithmetic mean of the two middle values when the number of observations is even.
2. Measures of Variation - represent the degree of dispersion or spread of data points around
their mean.
observations.
c. Cumulative frequency - the difference between the total frequency of values and
the classes as the x-axis and the corresponding frequencies/counts per class as
4. Percentiles and Box Plots - used to compare the score of a variable in reference to the
a. Upper Quartile - the value dividing the third and fourth quartile which is also
b. Lower Quartile - the value dividing the first and second quartile which is also
Based on the computations made with the empirical data provided, the following results
and findings are interpreted below through the use of tables and figures. Given the nature of the
measurements and observations presented, various descriptive statistical methods were used in
order to yield meaningful information about the general population. These variables
were examined in the study to determine whether or not such measurements would be capable of
predicting a patient’s probability of suffering a cardiac event over the next year.
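The measures reported in Table 1 (mean, median, mode, variance, and standard deviation) can be sketched in a few lines of Python as a cross-check on the spreadsheet computations. This is only an illustration: the values in `sample_bhr` are hypothetical and are not taken from the study's dataset, and the use of population formulas (dividing by N) is an assumption, since the report does not state whether population or sample formulas were applied.

```python
import statistics

# Hypothetical basal heart rates in bpm; NOT taken from the study's dataset.
sample_bhr = [62, 67, 67, 72, 72, 74, 80, 91]


def describe(values):
    """Return the descriptive measures of the kind reported in Table 1."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # multimode returns every most-frequent value, so a bimodal
        # variable yields a list with two entries.
        "modes": statistics.multimode(values),
        # Population formulas (divide by N) are an assumption; use
        # statistics.variance / statistics.stdev for sample formulas.
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
    }


summary = describe(sample_bhr)
print(summary)
```

The standard deviation is the square root of the variance, which matches the reported pairs in the text (for example, the square root of 1,003.448 is approximately 31.677).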
Table 1 presents the measures of central tendency and variation for the specified variables:
a patient’s basal heart rate (BHR), basal blood pressure (baseBP), maximum heart rate
(MaxHR), maximum blood pressure (MBP), and age. In terms of the arithmetic mean, the
total population had an average basal heart rate of 75.29 bpm, compared with an average
maximum heart rate of 119.369 bpm. Basal blood pressure
averaged 135.324 mmHg, while maximum blood pressure
averaged 156 mmHg. In terms of demographic profile, the average age of
respondents was approximately 67 years. As for the measure of dispersion among variables,
the population’s maximum blood pressure had the widest spread of data measurements (V =
1,003.448 and STD = 31.677), while age had the narrowest spread of values among all
variables (V = 144.928 and STD = 12.039). When the data measurements are ranked in
ascending order, the medians for basal heart rate, basal blood pressure, maximum heart rate,
maximum blood pressure, and age are 74, 133, 120, 150, and 69, respectively. By definition,
about half of the observations lie above each of these values and the other half lie below.
As for the mode, all of the aforementioned variables are unimodal except for basal heart rate,
which was revealed to be bimodal (Mode = 67 and 72). Such values indicate that they are the
Table 2 presents the corresponding values of the aforementioned variables from the 10th
up to the 90th percentile, as well as the lower quartile and the upper quartile (the 25th and 75th
percentiles, respectively). As for the interpretation of the quantities shown,
the value at the nth percentile is the value below which n% of the observations fall, with (100 - n)% of them falling
above it. For instance, 63 is the value for the 20th percentile of the basal heart rate, which means
that 20% of the basal heart rate measurements taken within the total population are less than
63 bpm. Moreover, the value for the 50th percentile is equal to the
median of its corresponding variable, since half of the quantities fall below it while the other half
Figure 1 presents the histogram of measurements taken within the total population for
basal heart rate (in bpm) in order to show the frequency distribution of data points. There were
ten (10) class intervals defined in the table, each having a width of 16.8 given that the maximum
and minimum values are 210 and 42, respectively. The class interval with the highest
count ranges from 58.8 to 75.6 (f = 245), with a relative frequency of 0.439 or 43.9%, while the
class intervals with the lowest count span 142.8 to 193.2 (f = 0), having a relative
frequency of 0 or 0%. The distribution of measurements appears to be positively
skewed since most of the values are clustered on the left side while the right tail of the distribution
is longer. This indicates the presence of outliers, found in the interval (109.2, 210], that have a
greater value than the arithmetic mean of the population. Moreover, this can be supported by the
fact that the modes (Mo = 67 and 72) are less than both the median (Md = 74) and mean (μ =
75.29).
Figure 2 presents the box plot of data values taken within the total population for basal
heart rate (in bpm) in order to determine the skewness of its distribution as well as outliers outside
the given range. The lower quartile (the 25th percentile) has a value of 64, which
implies that 25% of all observations fall below it, while the upper quartile (the 75th
percentile) has a value of 84, meaning that 75% of measurements fall below it. The
median (also referred to as the 50th percentile) has a value of 74; thus, it is
the midpoint of the data set. Additionally, there are three (3) data points that
are considered to be outliers beyond the upper whisker, specifically 115, 127, and 210.
These contribute to a positively skewed distribution, which is further supported by
the fact that the mean is shown in the box plot to have a greater value than the median.
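The outliers listed above are consistent with the common 1.5 × IQR rule for box plot whiskers. The report does not state which rule was used, so treating it as the rule here is an assumption. A minimal sketch using the quartiles reported for basal heart rate (the function name `tukey_fences` is illustrative):

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for a box plot.

    Assumes the common k * IQR whisker rule; the report does not
    state which rule its box plots used.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported in the text for basal heart rate (bpm).
lower, upper = tukey_fences(64, 84)

# Data points the text reports as outliers beyond the upper whisker.
reported_outliers = [115, 127, 210]
flagged = [x for x in reported_outliers if x < lower or x > upper]

print(lower, upper)  # fences
print(flagged)       # reported points that fall outside the fences
```

Under this assumption the interquartile range is 20 bpm, so the fences sit at 34 and 114 bpm, and all three reported points exceed the upper fence. The minimum reported value of 42 bpm lies inside the lower fence, which agrees with the absence of low outliers.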
Figure 3 presents the histogram of measurements taken within the total population for
basal blood pressure (in mmHg) in order to show the frequency distribution of data points. There
were ten (10) class intervals defined in the table, each having a width of 11.8 given that the
maximum and minimum values are 203 and 85, respectively. The class interval with the highest
count ranges from 120.4 to 132.2 (f = 121), with a relative frequency of 0.217 or 21.7%,
while the class interval with the lowest count ranges from 191.2 to 203 (f = 6), having a
relative frequency of 0.011 or 1.1%. The distribution of
measurements appears to be positively skewed since most of the values are clustered on the left side while
the right tail of the distribution is longer. This also indicates the presence of an outlier, found in
the interval (191.2, 203], that has a greater value than the arithmetic mean of the population.
Moreover, this can be supported by the fact that the mode (Mo = 120) is less than both the median
(Md = 133) and mean (μ = 135.324), and the mean is greater than the median.
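The class width and relative frequencies quoted above follow from simple arithmetic: the width is the range divided by the number of classes, and each relative frequency is the class count divided by the 558 participants. A minimal sketch using the figures reported for basal blood pressure (the helper name `class_edges` is illustrative, and rounding the edges to one decimal place is an assumption made to match the way the intervals are printed in the text):

```python
def class_edges(minimum, maximum, n_classes=10):
    """Return the class width and the list of class boundaries."""
    width = (maximum - minimum) / n_classes
    # Rounding to one decimal mirrors how the report prints its
    # intervals; this is a presentational assumption.
    edges = [round(minimum + i * width, 1) for i in range(n_classes + 1)]
    return width, edges


N_PARTICIPANTS = 558  # total number of participants stated in the report

width, edges = class_edges(85, 203)
modal_rel_freq = round(121 / N_PARTICIPANTS, 3)  # busiest class, f = 121
last_rel_freq = round(6 / N_PARTICIPANTS, 3)     # last class, f = 6

print(width)
print(edges)
print(modal_rel_freq, last_rel_freq)
```

The same two steps reproduce the widths given for the other histograms, for example (210 - 42) / 10 = 16.8 for basal heart rate.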
Figure 4 presents the box plot of data values taken within the total population for basal
blood pressure (in mmHg) in order to determine the skewness of its distribution as well as outliers
outside the given range. The lower quartile (the 25th percentile) has a value of
120, which implies that 25% of all observations fall below it, while the upper quartile
(the 75th percentile) has a value of 150, meaning that 75% of measurements fall below
it. The median (also referred to as the 50th percentile) has a value of 133;
thus, it is the midpoint of the data set. Additionally, there is only one (1) data
point that is considered to be an outlier beyond the upper whisker, namely the
data value of 201, which is consistent with the positively skewed distribution. This is
further supported by the fact that the mean is shown in the box plot to have a greater value
than the median.
Figure 5 presents the histogram of measurements taken within the total population for
maximum heart rate (in bpm) in order to show the frequency distribution of data points. There
were ten (10) class intervals defined in the table, each having a width of 14.2 given that the
maximum and minimum values are 200 and 58, respectively. The class interval with the highest
count ranges from 114.8 to 129 (f = 156), with a relative frequency of 0.28 or 28%,
while the class interval with the lowest count ranges from 185.8 to 200 (f = 1), having a
relative frequency of 0.002 or 0.2%. Moreover, it could be inferred that the distribution of
measurements is nearly symmetrical since most
of the values are evenly distributed on either side of the histogram. However, there are also indications
of outliers on both sides of the distribution. Furthermore, the near-symmetrical distribution of this
histogram could be associated with the fact that the mean is approximately equal to the median.
Figure 6 presents the box plot of data values taken within the total population for maximum
heart rate (in bpm) in order to determine the skewness of its distribution as well as outliers outside
the given range. The lower quartile (the 25th percentile) has a value of 104.25,
which implies that 25% of all observations fall below it, while the upper quartile
(the 75th percentile) has a value of 133, meaning that 75% of measurements fall below it.
The median (also referred to as the 50th percentile) has a value of 120; thus,
it is the midpoint of the data set. Additionally, there are two (2) data points that
are considered to be outliers beyond the upper whisker, specifically 182 and 200, and
only one (1) outlier below the lower whisker, which is 58. Furthermore, it
could also be inferred that the mean is approximately equal to the median value.
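A fractional quartile such as 104.25 typically arises when the quartile position falls between two ranked observations and the value is obtained by linear interpolation. The sketch below illustrates this with Python's `statistics.quantiles` and its `"inclusive"` method, which interpolates between the minimum and maximum of the data. Whether the report's spreadsheet used this exact convention is an assumption, and the six values in `sample_max_hr` are hypothetical, chosen only so that the interpolation is easy to follow.

```python
import statistics

# Hypothetical maximum heart rates in bpm; NOT taken from the study's dataset.
sample_max_hr = [100, 104, 105, 120, 133, 140]

# n=4 asks for the three cut points that split the data into quartiles.
# method="inclusive" treats the data as spanning its own min and max and
# interpolates linearly between neighbouring ranked values.
q1, q2, q3 = statistics.quantiles(sample_max_hr, n=4, method="inclusive")

print(q1, q2, q3)
```

With six observations, the lower quartile position falls one quarter of the way between the second and third ranked values (104 and 105), which gives 104 + 0.25 × (105 - 104) = 104.25.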
Figure 7. Histogram of Measurements Taken for Maximum Blood Pressure
Figure 7 presents the histogram of measurements taken within the total population for
maximum blood pressure (in mmHg) in order to show the frequency distribution of data points.
There were ten (10) class intervals defined in the table, each having a width of 22.5 given that the
maximum and minimum values are 309 and 84, respectively. The class interval with the highest
count ranges from 129 to 151.5 (f = 192), with a relative frequency of 0.344 or 34.4%,
while the class intervals with the lowest count are 241.5 to 264 (f = 1) and
286.5 to 309 (f = 1), each having a relative frequency of 0.002 or 0.2%. The
distribution of measurements appears to be positively skewed since most of the values are
clustered on the left side while the right tail of the distribution is longer. This also indicates the
presence of outliers, found in the interval (241.5, 309], that have a greater value than the
arithmetic mean of the population. Moreover, this can be supported by the fact that the mode (Mo = 140) is
less than both the median (Md = 150) and mean (μ = 156), and the mean is greater than the
median.
Figure 8. Box Plot of Measurements Taken for Maximum Blood Pressure
Figure 8 presents the box plot of data values taken within the total population for maximum
blood pressure (in mmHg) in order to determine the skewness of its distribution as well as outliers
outside the given range. The lower quartile (the 25th percentile) has a value of
133.25, which implies that 25% of all observations fall below it, while the upper quartile
(the 75th percentile) has a value of 175.75, meaning that 75% of measurements fall below
it. The median (also referred to as the 50th percentile) has a value of 150;
thus, it is the midpoint of the data set. Additionally, there are five (5) data points
that are considered to be outliers beyond the upper whisker, specifically 240, 250, 274,
283, and 309. These contribute to a positively skewed distribution, which is further
supported by the fact that the mean is shown in the box plot to have a greater value
than the median.
Figure 9. Histogram of Measurements Taken for Age
Figure 9 presents the histogram of measurements taken within the total population for age
(in years) in order to show the frequency distribution of data points. There were ten (10) class
intervals defined in the table, each having a width of 6.7 given that the maximum and minimum
values are 93 and 26, respectively. The class interval with the highest count ranges from
66.2 to 72.9 (f = 133), with a relative frequency of 0.238 or 23.8%, while the class interval with the
lowest count ranges from 26 to 32.7 (f = 5), having a relative frequency of 0.009 or 0.9%.
The distribution of measurements appears to be negatively skewed since
most of the values are clustered on the right side while the left tail of the distribution is longer.
This also indicates the presence of outliers, found in the interval (26, 32.7], that have a lesser
value than the arithmetic mean of the population. Moreover, this can be supported by the fact that
the mean (μ = 67.344) is less than the median value (Md = 69).
Figure 10. Box Plot of Measurements Taken for Age
Figure 10 presents the box plot of data values taken within the total population for age (in
years) in order to determine the skewness of its distribution as well as outliers outside the given
range. The lower quartile (the 25th percentile) has a value of 60, which implies that
25% of all observations fall below it, while the upper quartile (the 75th percentile)
has a value of 75, meaning that 75% of measurements fall below it. The
median (also referred to as the 50th percentile) has a value of 69; thus, it is the
midpoint of the data set. Additionally, there are four (4) data points that are considered to be
outliers below the lower whisker, specifically 26, 28, 30, and 33. These contribute to a
negatively skewed distribution, which is further supported by the fact that the
mean is shown in the box plot to have a lesser value than the median.
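The skewness judgments made throughout this section rest on comparing the mean with the median. The sketch below applies that comparison to the means and medians reported in the text. It is a rough heuristic rather than a formal skewness statistic, and the `tolerance` of one unit used to call a distribution approximately symmetric is an arbitrary assumption introduced here for illustration.

```python
# (mean, median) pairs as reported in the text.
reported = {
    "basal heart rate": (75.29, 74),
    "basal blood pressure": (135.324, 133),
    "maximum heart rate": (119.369, 120),
    "maximum blood pressure": (156, 150),
    "age": (67.344, 69),
}


def skew_direction(mean, median, tolerance=1.0):
    """Classify skew from the sign of (mean - median).

    The tolerance is a hypothetical cut-off, expressed in the units
    of the variable, below which the two are treated as equal.
    """
    difference = mean - median
    if abs(difference) <= tolerance:
        return "approximately symmetric"
    return "positive" if difference > 0 else "negative"


directions = {
    name: skew_direction(mean, median)
    for name, (mean, median) in reported.items()
}

for name, direction in directions.items():
    print(f"{name}: {direction}")
```

With this cut-off, the classification agrees with the interpretations given for the figures: positive skew for basal heart rate and both blood pressure variables, near symmetry for maximum heart rate, and negative skew for age.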
REFERENCES
Cardiovascular diseases (cvds). World Health Organization. [accessed 2021 Sep 15].
https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
Daniel WW, Cross CL. Biostatistics: A foundation for analysis in the health sciences. 8th ed.
Dobutamine stress echocardiogram. Johns Hopkins Medicine. [accessed 2021 Sep 15].
https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/dobutamine-
stress-echocardiogram
Garfinkel A, Krivokapich J, Child JS, Walter DO. Prognostic value of dobutamine stress
coronary artery disease. Journal of the American College of Cardiology. 1999 Mar
Samuels ML, Witmer JA, Schaffner AA. Statistics for the Life Sciences. 4th ed. Harlow: Pearson
U.S. Department of Health and Human Services. Heart health and aging. National Institute on
https://drive.google.com/drive/folders/1dUQWM1jbVj4sO1EV2IEWm4dZMIP8RjFR?usp=sharin
NOTE: MS Excel is the application used to compile all the computations and solutions provided