
Module 1 Computation

Group 7

BIO 31.02 Biostatistics


Section D
1st Semester AY 2021-2022

INTRODUCTION

Cardiovascular diseases, especially sudden cardiac arrest, have notoriously been labelled the world's leading cause of death. Given this burden, it is crucial for such diseases to be detected early so that patients can receive timely management and preventive treatment, according to the World Health Organization. Thus, a research study led by Alan Garfinkel was conducted to test how well stress induced by the drug dobutamine can be used to measure a patient's risk of suffering a cardiac event. A dobutamine stress echocardiogram is a test used to assess the heart's function under stress: the drug is injected into a vein to make the heart beat faster, similar to the effect of exercise (Johns Hopkins Medicine, 2019). However, how well the test predicts cardiac events in older patients when stress is induced by the drug rather than by exercise has yet to be determined. The researchers therefore sought to ascertain the efficacy of dobutamine stress echocardiography in predicting cardiac events over a given period of time. According to the U.S. Department of Health and Human Services, older people are more likely than younger people to be affected by cardiovascular disease, which underscores the importance of diagnosing it before complications become too advanced to treat. Ultimately, this study aims to determine which measurements and variables are most useful in predicting cardiac events, specifically myocardial infarction (MI), revascularization by percutaneous transluminal coronary angioplasty (PTCA), coronary artery bypass grafting surgery (CABG), and cardiac death, over the next year.

DESCRIPTION OF THE DATASET

A total of 558 participants were included in the study, 338 of whom were female and 220 male, with ages ranging from 26 to 93 years. Before undergoing the dobutamine stress echocardiogram, all participants were examined and interviewed to collect information that was later used in predicting cardiac events. This included basic information such as gender and age, as well as any recent myocardial infarction, angioplasty, or coronary artery bypass grafting surgery. Measurements such as blood pressure, heart rate, and ejection fraction were also recorded as variables relating to a patient's well-being. Lastly, other details, including recent chest pain, signs of heart attack, ECG and echocardiogram readings, past surgeries, and history of disease and smoking, were recorded to help determine which variables were most effective in predicting cardiac events over the next year. Among all of these variables, we chose to analyze basal heart rate, basal blood pressure, maximum heart rate, maximum blood pressure, and age across the total population.

OBJECTIVES

The research study sought to determine which of the thirty-one (31) variables in the dataset were most effective in predicting cardiac events within a year of dobutamine stress echocardiography. In line with this purpose, this laboratory report focuses on the following objectives:

1. To determine which of the following measurements could be further assessed in predicting a patient's risk of suffering a cardiac event over the next year:

a. Basal heart rate (in bpm),

b. Basal blood pressure (in mmHg),

c. Maximum heart rate (in bpm),

d. Maximum blood pressure (in mmHg), and

e. Age (in years);

2. To apply appropriate descriptive statistical methods to the quantitative variables in order to obtain meaningful information about the total population; and

3. To analyze the empirical data on vital signs using various statistical approaches in order to provide context and background information on the observations gathered.

STATISTICAL METHODS

Various descriptive statistical methods were used in this report to describe and analyze the total population through the data points of the chosen variables. All of the specified variables are quantitative and continuous, except for age, which is a discrete variable. Specifically, the following methods were used in the discussion that follows:

1. Measures of Central Tendency - describe the whole population by identifying the central or typical position of a set of data.

a. Arithmetic Mean - the sum of all data values divided by the total number of observations.

b. Median - the middle value of the ordered data when the number of observations is odd, or the arithmetic mean of the two middle values when it is even.

c. Mode - the most frequently occurring value in a set of observations.

2. Measures of Variation - represent the degree of dispersion, or spread, of data points from each other in a given set of values.

a. Range - the difference between the maximum and minimum values in a dataset.

b. Variance - the average of the squared deviations of the values from their mean (dividing by N for a population or by n - 1 for a sample).

c. Standard Deviation - the square root of the variance, which measures the dispersion of values about the mean in the original units of measurement.

3. Frequency Distribution and Histogram - tabular and visual representations of how the observations of each variable are distributed.

a. Frequency Distribution - a table showing the number of observations (frequency) that fall within each value or class interval.

b. Relative Frequency - the frequency of a class divided by the total number of observations.

c. Cumulative Frequency - the running total of frequencies: the number of observations in a given class interval plus those in all classes below it, that is, the number of observations up to its upper class boundary.

d. Histogram - a graph of adjacent bars with the class intervals on the x-axis and the corresponding frequency (count) of each class on the y-axis.

4. Percentiles and Box Plots - used to locate a measurement relative to the remaining measurements and to assess the skewness of a given distribution.

a. Upper Quartile - the 75th percentile (Q3), which separates the third and fourth quarters of the ordered data; it is also referred to as the median of the upper half of the values.

b. Lower Quartile - the 25th percentile (Q1), which separates the first and second quarters of the ordered data; it is also referred to as the median of the lower half of the values.

c. Box Plots - graphs that summarize a distribution through its quartiles, median, whiskers, and outliers, making its skewness visible.

d. Outliers - observations that lie an abnormal distance from the other values of a given dataset.
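The frequency-distribution quantities defined above can be sketched in Python as follows. This is a minimal illustration, not the Excel workbook actually used for this report. It makes three assumptions: there are k equal-width classes, the class width equals range / k, and each class is treated as (lower, upper] with the first class also including the minimum. That last convention is assumed from the interval notation used later, e.g. (109.2, 210]. The function name `frequency_table` and the sample values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# (lower, upper, frequency, relative frequency, cumulative frequency)
Row = Tuple[float, float, int, float, int]


def frequency_table(values: List[float], k: int = 10) -> List[Row]:
    """Equal-width frequency distribution with k classes.

    Classes are treated as (lower, upper], except that the first class also
    includes the minimum value (an assumed convention).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k                        # class width = range / k
    uppers = [lo + (i + 1) * width for i in range(k)]
    uppers[-1] = hi                              # pin the last boundary to the maximum
    counts = [0] * k
    for v in values:
        counts[bisect_left(uppers, v)] += 1      # first class whose upper bound >= v
    n = len(values)
    rows: List[Row] = []
    running = 0
    for i, f in enumerate(counts):
        running += f                             # cumulative frequency: this class plus all below it
        lower = lo if i == 0 else uppers[i - 1]
        rows.append((lower, uppers[i], f, f / n, running))
    return rows


# Hypothetical basal heart rates (bpm), for illustration only
sample = [42, 50, 60, 61, 70, 74, 80, 115, 127, 210]
for lower, upper, f, rel, cum in frequency_table(sample):
    print(f"{lower:6.1f} to {upper:6.1f}  f={f}  rel={rel:.3f}  cum={cum}")
```

The cumulative column of the last class always equals the number of observations, and the relative frequencies sum to 1.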

RESULTS AND DISCUSSION

In light of the computations made with the empirical data, the results are presented and interpreted below through tables and figures. Given the nature of the measurements, various descriptive statistical methods were used to obtain meaningful information about the general population. These variables were examined in the study to determine whether they are capable of predicting a patient's probability of suffering a cardiac event over the next year.

Table 1. The Measures of Central Tendency and Variation among Variables

Table 1 presents the measures of central tendency and variation for the specified variables: basal heart rate (BHR), basal blood pressure (baseBP), maximum heart rate (MaxHR), maximum blood pressure (MBP), and age. In terms of the arithmetic mean, the total population had an average basal heart rate of 75.29 bpm, compared with an average maximum heart rate of 119.369 bpm. Similarly, the average basal blood pressure was 135.324 mmHg, while the average maximum blood pressure was 156 mmHg. In terms of demographic profile, the average age of the respondents was approximately 67 years. As for the measures of dispersion, maximum blood pressure had the widest spread of measurements (V = 1,003.448; STD = 31.677), while age had the least spread among all variables (V = 144.928; STD = 12.039). These comparisons are in absolute terms, since the variables are measured in different units. When the measurements are arranged in ascending order, the medians for basal heart rate, basal blood pressure, maximum heart rate, maximum blood pressure, and age are 74 bpm, 133 mmHg, 120 bpm, 150 mmHg, and 69 years, respectively. Hence, half of the observations of each variable lie at or below its median and the other half lie at or above it. As for the mode, all of the aforementioned variables are unimodal except for basal heart rate, which is bimodal (Mode = 67 and 72). These modes are the most frequently recurring measurements of their respective variables.
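The Table 1 measures can be reproduced with Python's standard `statistics` module. This is a sketch only, since the report's actual computations were done in Excel. The report does not say whether a population (divide by N) or sample (divide by n - 1) variance was used, so both are shown. The function name and the sample values below are hypothetical.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, object]:
    """Measures of central tendency and variation for one variable."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),      # two or more entries => multimodal
        "range": max(values) - min(values),
        "pvariance": statistics.pvariance(values),  # population variance, divides by N
        "variance": statistics.variance(values),    # sample variance, divides by n - 1
        "stdev": statistics.stdev(values),          # sample standard deviation
    }


heart_rates = [67, 72, 67, 72, 80, 90, 64]  # hypothetical values, not the study data
for name, value in summarize(heart_rates).items():
    print(name, value)
```

`statistics.multimode` returns every value tied for the highest count. It therefore reports a bimodal variable, such as basal heart rate here, as a two-element list.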

Table 2. The Percentile Values of Specified Variables

Table 2 presents the values of the aforementioned variables from the 10th to the 90th percentile, including the lower and upper quartiles (the 25th and 75th percentiles, respectively). For the nth percentile, approximately n% of the measurements fall below it and (100 - n)% fall above it. For instance, the 20th percentile of basal heart rate is 63 bpm, which means that about 20% of the measurements within the total population are lower than 63 bpm. Moreover, the 50th percentile of each variable is equal to its median, since half of the values fall below it and the other half above it.
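Percentile values depend on the interpolation rule used, which the report does not state. One common choice is linear interpolation between order statistics. This is the inclusive method behind Excel's PERCENTILE.INC and NumPy's default `numpy.percentile`. With n = 558, the fractional quartiles reported for maximum heart rate (104.25) and maximum blood pressure (133.25 and 175.75) are consistent with this kind of interpolation. That is an inference, not something the report confirms. A minimal sketch follows; the function name and sample are hypothetical.

```python
import math
from typing import List


def percentile_inc(values: List[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100) by linear interpolation.

    Uses the 0-based rank h = (n - 1) * p / 100 and interpolates between the
    two neighbouring order statistics (the inclusive method).
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    h = (len(xs) - 1) * p / 100
    below = math.floor(h)
    above = min(below + 1, len(xs) - 1)
    return xs[below] + (h - below) * (xs[above] - xs[below])


data = [58, 64, 70, 74, 78, 84, 127]  # hypothetical values
print([percentile_inc(data, p) for p in (10, 25, 50, 75, 90)])
```

At p = 50 this rule returns the median, matching the observation above that each variable's 50th percentile equals its median.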


Figure 1. Histogram of Measurements Taken for Basal Heart Rate

Figure 1 presents the histogram of basal heart rate (in bpm) for the total population, showing the frequency distribution of the data points. Ten (10) class intervals were defined, each with a width of 16.8 bpm, given that the maximum and minimum values are 210 and 42, respectively. The class interval with the highest count ranges from 58.8 to 75.6 (f = 245), with a relative frequency of 0.439 or 43.9%. In contrast, the class intervals from 142.8 to 193.2 contain no observations (f = 0, relative frequency = 0). The distribution is positively skewed, since most of the values are clustered on the left side while the right tail is longer. This long right tail reflects a few outlying values well above the population mean, found in the interval (109.2, 210]. This is further supported by the fact that the modes (Mo = 67 and 72) are less than both the median (Md = 74) and the mean (μ = 75.29), with the mean being greater than the median.
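The ten class boundaries can be reproduced from the minimum, maximum, and class count stated above. This also shows that the zero-count span from 142.8 to 193.2 covers three adjacent classes. The sketch assumes equal-width classes and rounding to one decimal place, as in the report's tables. `class_boundaries` is a hypothetical helper name.

```python
from typing import List, Tuple


def class_boundaries(minimum: float, maximum: float, k: int) -> List[float]:
    """Boundaries of k equal-width classes spanning [minimum, maximum]."""
    width = (maximum - minimum) / k
    # rounded to one decimal place, the precision used in the report
    return [round(minimum + i * width, 1) for i in range(k + 1)]


bounds = class_boundaries(42, 210, 10)  # basal heart rate (bpm), k = 10
print(bounds)

# The zero-count span 142.8 to 193.2 covers the 7th, 8th and 9th classes
empty: List[Tuple[float, float]] = [(bounds[i], bounds[i + 1]) for i in range(6, 9)]
print(empty)

print(round(245 / 558, 3))  # relative frequency of the modal class, 58.8 to 75.6
```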

Figure 2. Box Plot of Measurements Taken for Basal Heart Rate

Figure 2 presents the box plot of basal heart rate (in bpm) for the total population, used to assess the skewness of its distribution and to identify outliers. The lower quartile (25th percentile) is 64, which implies that 25% of all observations fall below it. The upper quartile (75th percentile) is 84, meaning that 75% of measurements fall below it. The median (50th percentile) is 74, the midpoint of the data. Additionally, three (3) data points lie beyond the upper whisker and are considered outliers: 115, 127, and 210. These high values stretch the upper tail and contribute to the positive skew of the distribution. This is further supported by the box plot showing the mean to be greater than the median.
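The outliers flagged in this box plot are consistent with the common 1.5 × IQR rule (Tukey's fences), although the report does not state which rule its plotting tool applied. The following minimal sketch assumes that rule and uses the basal heart rate quartiles from the text. The helper names and the non-outlying sample values are hypothetical.

```python
from typing import List, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fences, Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: List[float], q1: float, q3: float) -> List[float]:
    """Values lying strictly outside the 1.5 x IQR fences."""
    lower, upper = tukey_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Quartiles of basal heart rate reported in the text (bpm)
low, high = tukey_fences(64, 84)
print(low, high)  # 34.0 114.0

# 74, 110 and 114 are hypothetical in-range values; 115, 127 and 210 are the reported outliers
print(flag_outliers([74, 110, 114, 115, 127, 210], 64, 84))  # [115, 127, 210]
```

Under the same assumption, the maximum heart rate quartiles (Q1 = 104.25, Q3 = 133) give fences of 61.125 and 176.125. This is consistent with 58 being flagged below the lower whisker and 182 and 200 above the upper whisker in Figure 6.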

Figure 3. Histogram of Measurements Taken for Basal Blood Pressure

Figure 3 presents the histogram of basal blood pressure (in mmHg) for the total population, showing the frequency distribution of the data points. Ten (10) class intervals were defined, each with a width of 11.8 mmHg, given that the maximum and minimum values are 203 and 85, respectively. The class interval with the highest count ranges from 120.4 to 132.2 (f = 121), with a relative frequency of 0.217 or 21.7%. The class interval with the lowest count ranges from 191.2 to 203 (f = 6), with a relative frequency of 0.011 or 1.1%. The distribution is positively skewed, since most of the values are clustered on the left side while the right tail is longer. This right tail contains values well above the population mean, in the interval (191.2, 203], including an outlier. This is further supported by the fact that the mode (Mo = 120) is less than both the median (Md = 133) and the mean (μ = 135.324), with the mean being greater than the median.

Figure 4. Box Plot of Measurements Taken for Basal Blood Pressure

Figure 4 presents the box plot of basal blood pressure (in mmHg) for the total population, used to assess the skewness of its distribution and to identify outliers. The lower quartile (25th percentile) is 120, which implies that 25% of all observations fall below it. The upper quartile (75th percentile) is 150, meaning that 75% of measurements fall below it. The median (50th percentile) is 133, the midpoint of the data. Additionally, only one (1) data point lies beyond the upper whisker and is considered an outlier: 201, which is consistent with the positively skewed distribution. This is further supported by the box plot showing the mean to be greater than the median.


Figure 5. Histogram of Measurements Taken for Maximum Heart Rate

Figure 5 presents the histogram of maximum heart rate (in bpm) for the total population, showing the frequency distribution of the data points. Ten (10) class intervals were defined, each with a width of 14.2 bpm, given that the maximum and minimum values are 200 and 58, respectively. The class interval with the highest count ranges from 114.8 to 129 (f = 156), with a relative frequency of 0.28 or 28%. The class interval with the lowest count ranges from 185.8 to 200 (f = 1), with a relative frequency of 0.002 or 0.2%. The distribution is approximately normal, since the histogram is roughly bell-shaped, with values distributed nearly symmetrically on either side of the center. However, there are also signs of outliers on both sides of the distribution. The near-symmetry of this histogram is consistent with the mean being approximately equal to the median (119.369 ≈ 120).


Figure 6. Box Plot of Measurements Taken for Maximum Heart Rate

Figure 6 presents the box plot of maximum heart rate (in bpm) for the total population, used to assess the skewness of its distribution and to identify outliers. The lower quartile (25th percentile) is 104.25, which implies that 25% of all observations fall below it. The upper quartile (75th percentile) is 133, meaning that 75% of measurements fall below it. The median (50th percentile) is 120, the midpoint of the data. Additionally, two (2) data points lie beyond the upper whisker and are considered outliers, namely 182 and 200, while one (1) outlier, 58, lies below the lower whisker. Furthermore, the box plot shows the mean to be approximately equal to the median.

Figure 7. Histogram of Measurements Taken for Maximum Blood Pressure

Figure 7 presents the histogram of maximum blood pressure (in mmHg) for the total population, showing the frequency distribution of the data points. Ten (10) class intervals were defined, each with a width of 22.5 mmHg, given that the maximum and minimum values are 309 and 84, respectively. The class interval with the highest count ranges from 129 to 151.5 (f = 192), with a relative frequency of 0.344 or 34.4%. The two class intervals with the lowest counts, 241.5 to 264 (f = 1) and 286.5 to 309 (f = 1), each have a relative frequency of 0.002 or 0.2%. The distribution is positively skewed, since most of the values are clustered on the left side while the right tail is longer. This long right tail reflects outlying values well above the population mean, found in the interval (241.5, 309]. This is further supported by the fact that the mode (Mo = 140) is less than both the median (Md = 150) and the mean (μ = 156), with the mean being greater than the median.

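The skew directions in this report are judged from the ordering of the mean, median, and mode. Pearson's second (median) skewness coefficient, 3(mean - median)/SD, puts a number on the same comparison. It is not part of the original analysis and is added here only as a companion check. The text reports standard deviations only for maximum blood pressure and age, so only those two variables are shown. The function name is hypothetical.

```python
def pearson_median_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's second skewness coefficient, 3 * (mean - median) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd


# Summary values from Table 1 as quoted in the text
mbp = pearson_median_skewness(156, 150, 31.677)    # maximum blood pressure (mmHg)
age = pearson_median_skewness(67.344, 69, 12.039)  # age (years)
print(f"max BP: {mbp:.3f}, age: {age:.3f}")
```

The coefficient is positive for maximum blood pressure and negative for age. This agrees with the skew directions read from Figures 7 and 9. Because the reported mean of 156 mmHg may be rounded, the value for maximum blood pressure should be read as approximate.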
Figure 8. Box Plot of Measurements Taken for Maximum Blood Pressure

Figure 8 presents the box plot of maximum blood pressure (in mmHg) for the total population, used to assess the skewness of its distribution and to identify outliers. The lower quartile (25th percentile) is 133.25, which implies that 25% of all observations fall below it. The upper quartile (75th percentile) is 175.75, meaning that 75% of measurements fall below it. The median (50th percentile) is 150, the midpoint of the data. Additionally, five (5) data points lie beyond the upper whisker and are considered outliers: 240, 250, 274, 283, and 309. These high values stretch the upper tail and contribute to the positive skew of the distribution. This is further supported by the box plot showing the mean to be greater than the median.

Figure 9. Histogram of Measurements Taken for Age

Figure 9 presents the histogram of age (in years) for the total population, showing the frequency distribution of the data points. Ten (10) class intervals were defined, each with a width of 6.7 years, given that the maximum and minimum values are 93 and 26, respectively. The class interval with the highest count ranges from 66.2 to 72.9 (f = 133), with a relative frequency of 0.238 or 23.8%. The class interval with the lowest count ranges from 26 to 32.7 (f = 5), with a relative frequency of 0.009 or 0.9%. The distribution is negatively skewed, since most of the values are clustered on the right side while the left tail is longer. This long left tail reflects outlying values well below the population mean, such as those in the lowest class interval (26 to 32.7). This is further supported by the fact that the mean (μ = 67.344) is less than the median (Md = 69).

Figure 10. Box Plot of Measurements Taken for Age

Figure 10 presents the box plot of age (in years) for the total population, used to assess the skewness of its distribution and to identify outliers. The lower quartile (25th percentile) is 60, which implies that 25% of all observations fall below it. The upper quartile (75th percentile) is 75, meaning that 75% of measurements fall below it. The median (50th percentile) is 69, the midpoint of the data. Additionally, four (4) data points lie below the lower whisker and are considered outliers: 26, 28, 30, and 33. These low values stretch the lower tail and contribute to the negative skew of the distribution. This is further supported by the box plot showing the mean to be less than the median.

REFERENCES

Cardiovascular diseases (CVDs). World Health Organization. [accessed 2021 Sep 15]. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)

Daniel WW, Cross CL. Biostatistics: A foundation for analysis in the health sciences. 8th ed. Hoboken, NJ: Wiley; 2019.

Dobutamine stress echocardiogram. Johns Hopkins Medicine. [accessed 2021 Sep 15]. https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/dobutamine-stress-echocardiogram

Garfinkel A, Krivokapich J, Child JS, Walter DO. Prognostic value of dobutamine stress echocardiography in predicting cardiac events in patients with known or suspected coronary artery disease. Journal of the American College of Cardiology. 1999 Mar [accessed 2021 Sep 15]. https://pubmed.ncbi.nlm.nih.gov/10080472/

Samuels ML, Witmer JA, Schaffner AA. Statistics for the Life Sciences. 4th ed. Harlow: Pearson Education Limited; 2016.

U.S. Department of Health and Human Services. Heart health and aging. National Institute on Aging. [accessed 2021 Sep 15]. https://www.nia.nih.gov/health/heart-health-and-aging

LINK TO THE GOOGLE DRIVE ANALYSIS FOLDER

https://drive.google.com/drive/folders/1dUQWM1jbVj4sO1EV2IEWm4dZMIP8RjFR?usp=sharin

NOTE: MS Excel was used to compile all the computations and solutions in the worksheet. Kindly use it to access the file. Thank you!


GROUP CONTRIBUTION

Contributor Degree of Contribution

Lyandrei Duero 100%

Alfonso Duque 100%

Paul Paniza 100%
