Quantitative Analysis for Decision Making (MHM530): Assignment 3

Question 1

Define and compare the mean, median and mode as measures of central tendency.

Summary measure: MEAN, MEDIAN, MODE

Definition
  Mean:   The average of all values of the data set (the sum of all values divided by the total number of values).
  Median: The middle number of an ordered data set; the 50th percentile of a set of measurements.
  Mode:   The observation that occurs most frequently.

Type of data that can be described
  Mean:   Both continuous and discrete data.
  Median: Ordinal data as well as continuous and discrete data.
  Mode:   All types of data.

Advantage
  Mean:   Includes every value in the data set as part of the calculation. It is the only measure of central tendency for which the sum of the deviations of each value from it is always zero.
  Median: Robust, i.e. less sensitive to unusual data points.
  Mode:   Best used for categorical data, where we wish to know which is the most common category.

Drawback
  Mean:   Considers only the magnitude of every observation in a set of data, so it is extremely sensitive to unusual (extremely high or low) values and is pulled in the direction of outlying data values. Cannot be used for nominal data unless the data are assigned dichotomous values.
  Median: Considers only the ordering and relative magnitude of the observations in the data set. It does not use all the information in the data, so it can be shown to be less efficient than the mean, which does use all values of the data.
  Mode:   Not unique, and not useful when two or more values share the highest frequency (especially for continuous data). Does not provide a very good measure of central tendency when the most common value is far away from the rest of the data in the data set.

Question 2

Under what conditions is use of the mean preferred? The median? The mode?

The choice of summary measure depends on the type of data and on the shape of its distribution.

If the data are symmetric and unimodal, the mean, median and mode are roughly the same. The mean is then
the preferred measure of central tendency, particularly when the data are continuous.

If the distribution is symmetric but bimodal, i.e. the population from which the values were taken
consists of two distinct subgroups that differ in the characteristic being measured, there can be two
modes, and reporting both modes is more informative than reporting the mean or the median.

When the data are not symmetric, it is better to report the median as the measure of central tendency,
because outlying observations may pull the mean too high or too low.

TYPE OF VARIABLE               BEST MEASURE OF CENTRAL TENDENCY

Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
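
The sensitivity of the mean to outlying values can be illustrated with a small sketch. It reuses the onset-to-seizure times listed in Question 6 and then, purely as a hypothetical, makes the largest value ten times larger (96 becomes 960); the inflated value is an assumption for illustration and is not part of the study data.

```python
import statistics

# Onset-to-seizure times in months, copied from Question 6.
onset = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]

# Hypothetical variant: the largest observation made ten times larger (96 -> 960).
inflated = onset[:-1] + [960]

mean_before = statistics.mean(onset)
mean_after = statistics.mean(inflated)
median_before = statistics.median(onset)
median_after = statistics.median(inflated)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

The median stays at 24 months because only the ordering of the observations matters to it, while the mean is pulled strongly towards the inflated value.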


Question 3

Define and compare three common measures of dispersion: the range, the interquartile range and the
standard deviation.

Measure of dispersion: RANGE, INTERQUARTILE RANGE, STANDARD DEVIATION

Definition
  Range:               The difference between the largest and the lowest value of a group of measurements. Describes variability in a set of data.
  Interquartile range: The difference between the 75th percentile and the 25th percentile. Encompasses the middle 50 percent of observations.
  Standard deviation:  A measure of the differences of each observation from the mean (how dispersed the data are in relation to the mean).

Drawback
  Range:               Considers only the extreme values of a data set rather than the majority of observations. Does not give much indication of the spread of observations about the mean.
  Interquartile range: Limited because it does not take into account every score in a group of data.
  Standard deviation:  Provides a useful basis for interpreting the data in terms of probability only when the distribution of the (continuous) data is symmetric. Not suitable for categorical data.

Advantage
  Range:               Useful when measuring a variable that has either a critical low or high threshold (or both) that should not be crossed. Helps detect errors made when entering data.
  Interquartile range: Quartiles are a useful measure of spread because they are much less affected by outliers or a skewed data set than the equivalent measures of mean and standard deviation.
  Standard deviation:  A useful measure of the scatter of the observations: if the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it includes about 68% of the observations.

Question 6

A study was conducted investigating long-term prognosis of children who have suffered an acute
episode of bacterial meningitis, an inflammation of the membranes enclosing the brain and spinal
cord. Listed below are the times of onset of seizure for 13 children who took part in the study [10]. In
months, the measurements are:

0.10 0.25 0.50 4 12 12 24 24 31 36 42 55 96

a) Find the following numerical summary measures of data

Mean, Median, Mode, Range, Interquartile range, Standard Deviation

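Descriptives: Onset to seizure (months)

                       Statistic    Std. Error
Mean                   25.9115      7.59133
Median                 24.0000
Std. Deviation         27.37094
Minimum                .10
Maximum                96.00
Range                  95.90
Interquartile Range    36.75

Mode: 12 and 24 months (each value occurs twice in the data; the mode is not part of the output above).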
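
As a cross-check, the same summary measures can be recomputed from the 13 listed values with Python's standard `statistics` module. This is a minimal sketch rather than the procedure used to produce the output above; in particular, the choice of the "exclusive" quartile method is an assumption, picked because it reproduces the reported interquartile range for these data. Other quartile conventions can give different values.

```python
import statistics

# Onset-to-seizure times in months, as listed in the question.
onset = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]

mean = statistics.mean(onset)
median = statistics.median(onset)
modes = statistics.multimode(onset)          # all values sharing the highest frequency
data_range = max(onset) - min(onset)

# Quartiles by the "exclusive" method (assumed convention, see lead-in).
q1, _, q3 = statistics.quantiles(onset, n=4, method="exclusive")
iqr = q3 - q1

sd = statistics.stdev(onset)                 # sample standard deviation (divisor n - 1)

print(f"mean={mean:.4f} median={median} modes={modes}")
print(f"range={data_range:.2f} IQR={iqr:.2f} SD={sd:.5f}")
```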

b) Show that $\sum_{i=1}^{13} (x_i - \bar{x})$ is equal to 0.

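The sum of the deviations of all observations below $\bar{x}$ is equal in magnitude to the sum of the
deviations of all observations above $\bar{x}$; consequently these two sums cancel each other out.
Algebraically, since $\bar{x} = \frac{1}{13}\sum_{i=1}^{13} x_i$, we have
$\sum_{i=1}^{13} (x_i - \bar{x}) = \sum_{i=1}^{13} x_i - 13\bar{x} = 0$.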

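xi       xi − x̄

0.10     -25.81
0.25     -25.66
0.50     -25.41
4        -21.91
12       -13.91
12       -13.91
24       -1.91
24       -1.91
31       5.09
36       10.09
42       16.09
55       29.09
96       70.09

Sum      0.02

The deviations above were computed with the mean rounded to 25.91. The residual of 0.02 is this
rounding error accumulated over 13 observations; with the unrounded mean (336.85/13) the sum is exactly 0.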
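
The role of rounding in the table above can be checked with exact arithmetic. The sketch below uses Python's `fractions` module so that no floating-point error enters; the variable names are illustrative only.

```python
from fractions import Fraction

# Onset-to-seizure times in months, entered as strings so they convert exactly.
raw = ["0.10", "0.25", "0.50", "4", "12", "12", "24", "24",
       "31", "36", "42", "55", "96"]
onset = [Fraction(v) for v in raw]

# Deviations from the exact mean sum to zero.
exact_mean = sum(onset) / len(onset)
exact_total = sum(x - exact_mean for x in onset)

# Deviations from the mean rounded to two decimals, as used in the table.
rounded_mean = Fraction("25.91")
rounded_total = sum(x - rounded_mean for x in onset)

print(exact_total)           # sum of deviations about the exact mean
print(float(rounded_total))  # sum of deviations about the rounded mean
```

With the exact mean the deviations sum to zero, whereas the rounded mean leaves the small residual seen in the table.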

Question 7

In Massachusetts, eight individuals experienced an unexplained episode of vitamin D intoxication
that required hospitalization; it was thought that these unusual occurrences might be the result of
excessive supplementation of dairy milk [11]. Blood levels of calcium and albumin – a type of protein
– for each subject at the time of hospital admission are provided below.

a) Find the mean, median, mode, standard deviation and range of the recorded calcium levels.

b) Compute the mean, median, standard deviation and range of the given albumin levels.

Statistics: Blood level of calcium in mmol/L

N Valid           8
N Missing         0
Mean              3.1425
Median            3.0800
Mode              2.37 (a)
Std. Deviation    .51068
Range             1.47

a. Multiple modes exist. The smallest value is shown.

Statistics: Blood level of albumin in g/L

N Valid           8
N Missing         0
Mean              40.3750
Median            42.0000
Mode              42.00
Std. Deviation    3.02076
Range             9.00

c) For healthy individuals, the normal range of calcium values is 2.12 to 2.74 mmol/L, while the
range of albumin levels is 32 to 55 g/L. Do you believe that patients suffering from vitamin D
intoxication have normal blood levels of calcium and albumin?

Patients suffering from vitamin D intoxication appear to have normal albumin levels: the mean
(40.4 g/L), the median (42.0 g/L) and the individual observations all lie within the range of 32 to 55 g/L.
Their calcium levels, however, are elevated. Both the mean (3.14 mmol/L) and the median (3.08 mmol/L)
lie above the upper limit of the normal range (2.74 mmol/L), and overall 6 of the 8 patients
have calcium levels that are above normal.
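
The comparison in part (c) can be written out as a short check. This is only a sketch that uses the reported summary statistics and the normal ranges quoted in the question; the function name `within` and the dictionary layout are illustrative choices, and the individual patient values are not used.

```python
def within(value, low, high):
    """Return True if value lies inside the closed interval [low, high]."""
    return low <= value <= high

# Normal ranges quoted in part (c).
CALCIUM_RANGE = (2.12, 2.74)   # mmol/L
ALBUMIN_RANGE = (32, 55)       # g/L

# Summary statistics reported in parts (a) and (b).
reported = {
    "calcium mean": (3.1425, CALCIUM_RANGE),
    "calcium median": (3.0800, CALCIUM_RANGE),
    "albumin mean": (40.3750, ALBUMIN_RANGE),
    "albumin median": (42.0000, ALBUMIN_RANGE),
}

results = {name: within(value, *limits) for name, (value, limits) in reported.items()}

for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} the normal range")
```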

Question 8

A study was conducted comparing female adolescents who suffer from bulimia to healthy females
with similar body compositions and levels of physical activity. Listed below are measures of daily
caloric intake, recorded in kilocalories per kilogram, for samples of adolescents from each group [12].

a) Find the median daily caloric intake for both the bulimic adolescents and the healthy ones.

Statistics

              Daily caloric intake for      Daily caloric intake for
              bulimic females (kcal/kg)     healthy females (kcal/kg)
N Valid       23                            15
N Missing     0                             8
Median        21.600                        30.600

b) Compute the interquartile range for each group

Descriptives

                                                      Interquartile Range
Daily caloric intake for bulimic females (kcal/kg)    5.1
Daily caloric intake for healthy females (kcal/kg)    12.8

c) Is a typical value of daily caloric intake larger for the individuals suffering from bulimia or for the
healthy adolescents? Which group has a greater amount of variability in the measurements?

A typical daily caloric intake is larger for the healthy adolescents (median 30.6 kcal/kg) than for those
suffering from bulimia (median 21.6 kcal/kg). The healthy group also shows greater variability: its
interquartile range is 12.8 kcal/kg, compared with 5.1 kcal/kg for the bulimic group.

Question 12

The percentages of low birth weight infants – defined as those weighing less than 2500 grams – for a
number of nations around the world are saved under the name lowbwt in the data set unicef [13]
(Appendix B, Table B.2).

a) Compute the mean and median of these observations

Statistics

              Low Birth Weight    life92    life60
N Valid       108                 141       127
N Missing     33                  0         14
Mean          12.09               63.64     51.47
Median        11.00               67.00     47.00

b) Compute the 5% trimmed mean.

Descriptives

                                       Statistic    Std. Error
Low Birth Weight    Mean               12.09        .632
                    5% Trimmed Mean    11.51
life60              Mean               51.42        1.189
                    5% Trimmed Mean    51.28
life92              Mean               63.57        1.035
                    5% Trimmed Mean    63.89

c) For this data set, which of these numbers would you prefer as a measure of central tendency?
Explain.

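The mean is a suitable measure of central tendency for the low birth weight data because the distribution
is approximately normal, as seen in the histogram obtained from these values (Assignment 2). The mean
(12.09), the 5% trimmed mean (11.51) and the median (11.00) are also close to one another.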
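
The idea behind the trimmed mean in part (b) can be sketched as follows. Because the lowbwt values are not reproduced in this assignment, the example reuses the onset-to-seizure times from Question 6. The function `trimmed_mean` is a hypothetical helper that drops a whole number of observations from each tail; statistical packages may treat the fractional part of the trimmed count differently, so their results need not match this simple version.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) observations from each tail."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))      # whole observations cut per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Onset-to-seizure times in months, copied from Question 6.
onset = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]

# With 13 values, 5% per tail is less than one observation, so nothing is cut.
print(round(trimmed_mean(onset, 0.05), 4))

# 10% per tail cuts one observation from each end (0.10 and 96).
print(round(trimmed_mean(onset, 0.10), 4))
```

Trimming removes the most extreme observations before averaging, which is why a trimmed mean is less affected by outliers than the ordinary mean.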