Chapter 1:

Describing Data
Lesson 8: More on Describing Data: Summary Measures and Graphs on
Heights and Weights
(MAY BE OMITTED OR USED AS INTEGRATION LESSON AT END OF THE
COURSE)
TIME FRAME: 1 hour session
OVERVIEW OF LESSON: In this activity, students work with the heights and weights data
they generated in lesson 3. They will construct box plots and calculate measures of center and
spread to determine if there is a difference between two groups, say boys and girls. They will
also compare the proportion of male and female heights (and weights) that are within one
standard deviation of their group mean.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to

generate and interpret box plots that compare two groups.


interpret and compare measures of center and spread.

LESSON OUTLINE:
1. Introduction
2. Data Analysis
3. Interpretation
DEVELOPMENT OF THE LESSON
(A) Introduction
Begin the lesson by asking the students if they think that boys and girls have the same
heights, weights and BMI. Have them guess what the distribution of heights, weights and
BMI might look like for the whole class and whether the distribution of heights, weights and
BMI for boys and girls would be the same.

Possible questions to ask:

Are the heights, weights, and BMI of boys and girls the same or different?
What methods can we use to compare the heights, weights and BMI of boys and girls?
What are some other factors besides sex that might affect heights, weights and BMI?
(Possible factors that could be studied are age, location where person resides, and year
the data was collected.)

Chapter 1 Describing Data Lesson 8 Page 1


What proportion of male and female heights (and weights) is within one
standard deviation of the group mean?

(B) Data Analysis


Divide the class into groups: some groups can analyze heights, others weights, and others
BMI.
There are various ways to analyze the collected data. For example, the class can calculate
measures of center and spread for the boys and girls in their class. Box plots can be
constructed from the class data to compare boys and girls. Table 1 provides some sample
data.
Table 1. Example Class Data.

Sex   Height (in meters)   Weight (in kg)   BMI
F     1.64                 40               14.8721
F     1.52                 50               21.64127
F     1.52                 49               21.20845
F     1.65                 45               16.52893
F     1.02                 60               57.67013
F     1.626                45               17.02046
F     1.5                  38               16.88889
F     1.6                  51               19.92188
F     1.42                 42.2             20.92839
F     1.52                 54               23.37258
F     1.48                 46               21.00073
F     1.62                 54               20.57613
F     1.5                  36               16
F     1.54                 50               21.08281
F     1.67                 63               22.58955
M     1.72                 55               18.59113
M     1.65                 61               22.40588
M     1.56                 60               24.65483
M     --                   52               --
M     1.7                  90               31.14187
M     1.53                 50               21.35931
M     1.62                 90               34.29355
M     1.79                 80               24.96801
M     1.57                 58               23.53037
M     1.7                  68               23.52941
M     1.77                 27               8.618213
M     1.478                50               22.8887
M     1.727                94               31.51688
M     1.56                 66               27.12032
M     1.75                 50               16.32653
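
Table 1 does not state how the BMI column was obtained. The listed values are consistent with the usual definition, weight in kilograms divided by the square of height in meters, so students can check entries with a short Python sketch such as the one below. The function name `bmi` is illustrative, and returning `None` when the height is missing is an assumption about how to treat the male with no recorded height.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m is None:
        return None  # assumption: no height recorded, so no BMI
    return weight_kg / height_m ** 2

# A few rows from Table 1: (sex, height in meters, weight in kg)
rows = [("F", 1.64, 40), ("F", 1.5, 36), ("M", None, 52), ("M", 1.72, 55)]
for sex, height, weight in rows:
    value = bmi(weight, height)
    print(sex, height, weight, "n/a" if value is None else round(value, 4))
```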

Some selected descriptive statistics measuring the center and variability for this set of data
are provided in Table 2.
Table 2. Descriptive Statistics for Data from Table 1

                 Height                        Weight                         BMI
         Mean   Median  Range  SD      Mean    Median  Range  SD       Mean    Median  Range   SD
Male     1.652  1.675   0.312  0.095   63.4    60      67     17.824   23.639  23.530  25.675  6.317
Female   1.522  1.52    0.65   0.151   48.213  49      27     7.403    22.087  20.928  42.798  9.839

For this example data set, males have higher measures of center for heights, weights
and BMIs. Measures of variation for height are higher for females than for males. For
weights, measures of variation are higher for males. For BMI, both the range and the
standard deviation are higher for females.
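
The weight columns of Table 2 can be reproduced with Python's standard `statistics` module, as in the sketch below. One assumption is worth flagging: the weight standard deviations in Table 2 match the population formula (dividing by n), so the sketch uses `statistics.pstdev`; `statistics.stdev` (dividing by n - 1) would give slightly larger values. The helper name `summarize` is illustrative.

```python
import statistics

# Weights in kg, copied from Table 1
male_weights = [55, 61, 60, 52, 90, 50, 90, 80, 58, 68, 27, 50, 94, 66, 50]
female_weights = [40, 50, 49, 45, 60, 45, 38, 51, 42.2, 54, 46, 54, 36, 50, 63]

def summarize(values):
    """Return the mean, median, range, and population SD of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),  # assumption: divide by n
    }

for label, weights in [("Male", male_weights), ("Female", female_weights)]:
    summary = summarize(weights)
    print(label, {name: round(value, 3) for name, value in summary.items()})
```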
Box plots for the two groups are shown in Figures 1, 2 and 3 for heights, weights and BMI, respectively.
Note to Teachers: A very simple but useful pictorial representation of the data distribution is a
box plot. We construct this plot by drawing a box extending from the lower quartile to the
upper quartile, with a line inside the box marking the median. The standard box-and-whisker
plot has whiskers that extend only as far as the smallest and largest data values that are not
outliers, that is, the smallest and largest values within 1.5 IQR (interquartile range) of
the lower and upper quartiles, respectively. Outliers are shown in the standard box-and-whisker
plot by means of circles. Outliers within 3 IQR of the lower or upper quartile are said to be
mild outliers; those farther away are said to be extreme outliers. Box plots are easily and
readily generated by statistical software.
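
The whisker and outlier rules in the note above can be sketched in Python as follows. Quartile conventions differ between textbooks and software; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, so other tools may report slightly different quartiles. The function name `box_plot_summary` is illustrative, and the data are the female heights from Table 1.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 IQR / 3 IQR rules."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)  # assumed quartile method
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    # Whiskers stop at the most extreme values that are not outliers
    inside = [x for x in data if inner_low <= x <= inner_high]
    mild = [x for x in data
            if (outer_low <= x < inner_low) or (inner_high < x <= outer_high)]
    extreme = [x for x in data if x < outer_low or x > outer_high]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

# Female heights in meters, copied from Table 1
female_heights = [1.64, 1.52, 1.52, 1.65, 1.02, 1.626, 1.5, 1.6,
                  1.42, 1.52, 1.48, 1.62, 1.5, 1.54, 1.67]
summary = box_plot_summary(female_heights)
print(summary)
```

With these data, the height of 1.02 m lies more than 3 IQR below the lower quartile, so under this quartile convention it is flagged as an extreme outlier.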
For heights, the distribution for the girls has a larger range (and an outlier), and a smaller
median.
For weights, females have a lower median weight than males, as well as less variability. The
middle 50% of the female weight distribution also lies mostly below the middle 50% of the
male weight data, with only a small overlap.
As regards BMI, females also have a lower median BMI and, apart from one extreme value, less
variability than males. There is at least one extremely obese female and one severely
underweight male.

Figure 1. Comparative box plot for heights, by sex. (Box plots for Female and Male; horizontal axis: Height, from 1 to 1.8.)

Figure 2. Comparative box plot for weights, by sex. (Box plots for Female and Male; horizontal axis: Weight, from 20 to 100.)

Figure 3. Comparative box plot for BMI, by sex. (Box plots for Female and Male; horizontal axis: BMI, from 10 to 60.)

(C) Interpretation of Results


In addition to interpretive points made in analyzing the data, the students can be prompted to
generate and answer further data analysis questions such as:

On average, how much do the heights, weights and BMIs of males and females
differ? How much do they vary?
What percentage of the males is within one standard deviation of the mean for their data?
What percentage of the females is within one standard deviation of the mean for their data?
What generalizations, if any, can we make using the results from this data?

To determine the proportion of male and female heights (and weights and BMIs) that are
within one standard deviation of their group mean, we first need to obtain, for each group,
the interval within one standard deviation of the mean. These intervals are:

          Heights                       Weights                       BMI
Males     1.556441384 to 1.747130616    45.57567206 to 81.22432794    17.32187356 to 29.95598644
Females   1.37025342 to 1.67321258      40.81081996 to 55.61584004    12.24735986 to 31.92628014

With regard to heights and BMIs, nine out of fourteen of the males are within one
standard deviation of their mean (one male had no recorded height and therefore no
BMI), while fourteen out of fifteen of the females are within one standard deviation
of their mean.

As regards weights, while males have a larger median and a greater amount of spread
than females, the proportions of males and females within one standard deviation of
their group mean are similar (eleven of fifteen males and ten of fifteen females).
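
The height counts above can be checked with a short Python sketch. It assumes the population standard deviation (`statistics.pstdev`), which reproduces the height intervals listed earlier, and it drops the male with no recorded height before computing. The function name `within_one_sd` is illustrative.

```python
import statistics

def within_one_sd(values):
    """Return (low, high, count, n) for the interval mean - SD to mean + SD."""
    data = [x for x in values if x is not None]  # skip missing measurements
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population SD (divide by n)
    low, high = mean - sd, mean + sd
    count = sum(low <= x <= high for x in data)
    return low, high, count, len(data)

# Heights in meters, copied from Table 1 (None marks the unrecorded height)
male_heights = [1.72, 1.65, 1.56, None, 1.7, 1.53, 1.62, 1.79,
                1.57, 1.7, 1.77, 1.478, 1.727, 1.56, 1.75]
female_heights = [1.64, 1.52, 1.52, 1.65, 1.02, 1.626, 1.5, 1.6,
                  1.42, 1.52, 1.48, 1.62, 1.5, 1.54, 1.67]

for label, heights in [("Males", male_heights), ("Females", female_heights)]:
    low, high, count, n = within_one_sd(heights)
    print(f"{label}: {low:.3f} to {high:.3f} m, {count} of {n} within one SD")
```

The same function can be applied to the weight or BMI columns by swapping in those lists.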

Possible Extensions

To investigate the distribution of heights, weights, and BMIs, larger samples could be taken
using classes at the same grade level, or data from several years.

Other graphs may also be helpful, such as stem-and-leaf displays (which were devised by the
late John W. Tukey, one of the greatest statisticians of the 20th century). Histograms
provide easy-to-grasp summaries of the distribution, but they do not show the data values
themselves. A stem-and-leaf display is like a histogram, but it shows the individual values.
Explain to students that a stem-and-leaf display is constructed by splitting each data point into
two parts, a stem (one or more of the leading digits) and a leaf (which consists of the remaining
digits). For instance, the stem-and-leaf display for the male weights data in this lesson is given
below, with the tens digit as the stem and the units digit as the leaf:

2 | 7
3 |
4 |
5 | 000258
6 | 0168
7 |
8 | 0
9 | 004

Students might note that although the data in the display are grouped, the data may still be
recovered, unlike in the case of a histogram (or frequency distribution). This makes the
stem-and-leaf diagram a better picture of the data distribution, as information is not lost.
Here, the stems on the left side together with the leaves on the right side of the diagram
represent the actual data. For instance, the number 7 beside stem 2 pertains to a weight of
27 kg. Furthermore, the diagram itself serves as a histogram turned on its side.
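
The construction described above can be sketched in Python. This minimal version assumes whole-number data, uses the tens digit(s) as the stem and the units digit as the leaf, and keeps stems that have no leaves so that gaps in the data stay visible. The function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for whole-number data."""
    leaves = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        leaves.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Stems with no data are kept, with an empty list of leaves
        lines.append(f"{stem} | {''.join(leaves.get(stem, []))}".rstrip())
    return lines

# Male weights in kg, copied from Table 1
male_weights = [55, 61, 60, 52, 90, 50, 90, 80, 58, 68, 27, 50, 94, 66, 50]
display = stem_and_leaf(male_weights)
print("\n".join(display))
```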

You may want to reinforce to students that when a distribution is described, it is important to
mention the shape, center and spread of the distribution. If there is a single peak in the
histogram, then we have a unimodal distribution. Histograms with two peaks are called bimodal,
while if the histogram does not appear to have any mode, i.e. when the bars appear to be the
same height (or nearly the same height), then the distribution is called uniform.

It is also important to check whether the two halves of the histogram, when folded along a
vertical line through the middle, are more or less mirror images; if so, we have a symmetric
distribution. Otherwise, when one tail of the distribution (as in income distributions, or the
distribution of life of equipment) stretches much farther than the other tail, we have a skewed
distribution.

It will also be important to look for unusual observations that stand apart from the rest of the
distribution. These extremely small or extremely large data values are called outliers.

In the next chapter, students will be shown why the mean and standard deviation are the most
important measures of the center and spread.

Finally, stress to students that just as there are dos, there are also don'ts in describing data:

Do not make histograms of a categorical variable;

Do not overuse pie charts, even for categorical variables, as they tend to convey less
information;

Do not use inconsistent scales in figures.

ASSESSMENT
Suppose that Jeremy's class obtained the following results when the heights of its members
were measured:

Our Class Data: Heights (meters)


Imelda 1.54 F
Frederick 1.45 M
Gerald 1.42 M
Jose 1.52 M
Ana 1.56 F
Isidoro 1.34 M
Roberto 1.36 M
Katherine 1.43 F
Barbara 1.49 F
Jocie 1.58 F
Maria 1.64 F
Kenneth 1.56 M
Ofelia 1.56 F
Amparo 1.49 F
James 1.42 M

Use the approaches below to compare the heights of males and females in this class.

(a) Find the mean, median, range and standard deviation for the heights of the males in this
class. Find the mean, median, range and standard deviation for the heights of the females in this
class. Compare the two distributions using these measures of center and spread. Which group
has a larger height on average? Which group varies more?

(b) Sort the data within each group then determine what proportion in each group is within one
standard deviation of that group's mean. Are the proportions similar?

(c) Produce box plots of the data for the males and females. Compare the distributions of
heights.

Answers:

(a) The measures of center and spread are provided in the table below:

Mean Median Range Standard Deviation
Males 1.439 1.42 0.22 0.080
Females 1.536 1.55 0.21 0.065

The girls have a larger mean: on average, the girls' heights are about 0.10 m greater. Within
each group, the mean and median are close in value, which suggests that the distributions
of heights are fairly symmetric. The two groups have similar amounts of variability; the ranges
differ by just 0.01 meters and the standard deviations differ by 0.015 meters.
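
The table in answer (a) can be reproduced with Python's `statistics` module. The values there match the sample standard deviation (dividing by n - 1), so this sketch uses `statistics.stdev`; that choice is inferred from the numbers rather than stated in the lesson. Variable names are illustrative.

```python
import statistics

# (name, height in meters, sex) from the class data in the assessment
class_data = [
    ("Imelda", 1.54, "F"), ("Frederick", 1.45, "M"), ("Gerald", 1.42, "M"),
    ("Jose", 1.52, "M"), ("Ana", 1.56, "F"), ("Isidoro", 1.34, "M"),
    ("Roberto", 1.36, "M"), ("Katherine", 1.43, "F"), ("Barbara", 1.49, "F"),
    ("Jocie", 1.58, "F"), ("Maria", 1.64, "F"), ("Kenneth", 1.56, "M"),
    ("Ofelia", 1.56, "F"), ("Amparo", 1.49, "F"), ("James", 1.42, "M"),
]

results = {}
for sex, label in [("M", "Males"), ("F", "Females")]:
    heights = sorted(h for _, h, s in class_data if s == sex)
    results[label] = {
        "mean": statistics.mean(heights),
        "median": statistics.median(heights),
        "range": max(heights) - min(heights),
        "sd": statistics.stdev(heights),  # assumption: sample SD (n - 1)
    }
    print(label, {k: round(v, 3) for k, v in results[label].items()})
```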

(b) The sorted data is provided below:

Males: 1.34, 1.36, 1.42, 1.42, 1.45, 1.52, 1.56

Females: 1.43, 1.49, 1.49, 1.54, 1.56, 1.56, 1.58, 1.64



For males, the interval within one standard deviation of the mean is 1.439 - 0.08 to
1.439 + 0.08, or 1.359 meters to 1.519 meters. Four of the seven males have a height that
is within one standard deviation of the mean.

Among the females, the interval within one standard deviation of the mean is 1.536 -
0.065 to 1.536 + 0.065, or 1.471 meters to 1.601 meters. Six of the eight females in this
class have a height that is within one standard deviation of the mean.

(c) The box plots are shown below:

(Comparative box plot of heights for Female and Male; horizontal axis: height, from 1.3 to 1.7.)

The box plots have a similar shape. They are both symmetric, with approximately the same range
and interquartile range. The median and quartiles for the males are below the corresponding
summary values for the females. This suggests that, on average, males in this class are
shorter than females.

Explanatory Note:

Teachers have the option to pose this assessment orally to the entire class, to group
students and ask them to identify answers, to give it as homework, or to use some of its
items in a chapter examination.

REFERENCES
Adapted from Armspans in STatistics Education Web (STEW)
http://www.amstat.org/education/stew/pdfs/Armspans.docx

GROUP ACTIVITY SHEET 1-8

1. Organize the data set that your group is asked to analyze (whether heights, weights or
BMIs), by sex.

Males:

Females:

2. Sort the data set for each group from smallest to largest and then find the mean, median, range
and standard deviation:

Males:

Females:

3. Using the sorted data, find the values needed to construct box plots of the data:

Males:

Females:

4. Construct the box plots:

5. Compare the distributions of heights, weights and BMI for males and females.
