
Statistical Analysis: S1 Data Analysis

Summary Statistics

Statistical Analysis: S1 Data Analysis – Analysing Data

New Century Textbook Chapter 10

Lesson | Topic | Textbook exercise | Minimum questions | Additional revision tasks / Excel pages
1 | Mean, Median & Mode | Page 411: Ex 10:01 | | Pages 145–147
2 | Quartiles, deciles & percentiles | Page 420: Ex 10:02 | | Page 148
3 | Range & interquartile range | Page 427: Ex 10:03 | | Pages 148–149
4 | The effect of outliers | Page 431: Ex 10:04 | | Page 150
5 | Cumulative frequency graphs (S1.1) | Page 437: Ex 10:05 | | Page 136
6 | Box plots | Page 442: Ex 10:06 | | Pages 154–155
7 | Standard deviation | Page 448: Ex 10:07 | | Page 149
8 | Shape of a distribution | Page 456: Ex 10:08 | | Pages 151–152

S1.2 Syllabus
Lesson 1 Mean, Median & Mode Ex 10:01
The mean, median and mode are three summary statistics that represent the centre or average of a set of data. They
are called measures of location (measures of central tendency).

The mean (or average) has the symbol x̄ and is the sum of all the scores divided by the number of scores:

$$\bar{x} = \frac{\text{sum of the scores}}{\text{number of scores}}$$

When the scores are ordered from lowest to highest, the median is:

• The middle score, for an odd number of scores


Example:


• The average of the two middle scores, for an even number of scores:

$$\text{median} = \frac{\text{sum of the two middle scores}}{2}$$
Example:

The mode is the most common score or category, that is, the score with the highest frequency. A set of data can have more than one mode (two modes = bimodal, three modes = trimodal) or no mode at all.
Example 1:
Example 2:

Example 3:
Example 4:
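The three measures of location can be checked with a short Python sketch. It uses the standard `statistics` module for the mean and median; `modes` is a hypothetical helper written for these notes, and it assumes the convention that a data set in which every score occurs exactly once has no mode. The data sets are made up for illustration.

```python
from collections import Counter
from statistics import mean, median


def modes(scores):
    """Return every mode of the scores, sorted.

    Assumption: if every score occurs exactly once, the data set is
    treated as having no mode and an empty list is returned.
    """
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(score for score, count in counts.items() if count == highest)


odd_scores = [7, 2, 9, 4, 4]        # hypothetical data, 5 scores
even_scores = [3, 8, 1, 6, 8, 3]    # hypothetical data, 6 scores

print(mean(odd_scores))     # sum 26 / 5 scores = 5.2
print(median(odd_scores))   # sorted 2, 4, 4, 7, 9 -> middle score 4
print(median(even_scores))  # sorted 1, 3, 3, 6, 8, 8 -> (3 + 6) / 2 = 4.5
print(modes(even_scores))   # 3 and 8 both occur twice -> bimodal
print(modes([1, 2, 3]))     # every score occurs once -> no mode
```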

Outliers are scores that are separated from the majority of the data. Outliers have the potential to significantly affect
the mean.
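A quick Python check shows how a single outlier pulls the mean but leaves the median almost unchanged. The scores are hypothetical.

```python
from statistics import mean, median

# Hypothetical data: five typical scores, then the same scores plus one outlier.
scores = [12, 14, 15, 15, 16]
with_outlier = scores + [60]

# Mean rises from 14.4 to 22; median stays at 15.
print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))
```

This is why the median is often the better measure of location when a data set contains outliers.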

Examples:
1. Which measure of location is most appropriate for describing the following averages?
a. The average price of a new car
b. The most common number of bedrooms in a house
c. A cricket player's batting average
d. Average weekly income
e.
2. Ten houses were sold this week at Westvale Lakes for the following prices.

a. Calculate the mean house price.

b. Calculate the median house price.

c. Which measure of location is higher: the mean or the median?

d. Which measure of location is more appropriate for describing the average house price?
Lesson 2 Quartiles, deciles & percentiles Ex 10:02
Deciles, quartiles and percentiles are different ways of dividing data. The data must first be sorted in order
(ascending or descending) before it can be divided.

Percentiles
Percentile: the value below which a percentage of data falls.

Example: You are the fourth tallest person in a group of 20

80% of people are shorter than you:

That means you are at the 80th percentile.

If your height is 1.85m then "1.85m" is the 80th percentile height in that group.
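The idea in this example can be sketched in Python, assuming "percentile" here means the percentage of the group strictly below a given value (other percentile-rank conventions exist). The helper `percentile_rank` and the list of heights are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of the values that are strictly below x.

    Assumption: this matches the example's idea ("80% of people are
    shorter than you"); other conventions give slightly different results.
    """
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical group of 20 people with distinct heights in cm (150 to 169).
heights = list(range(150, 170))
fourth_tallest = sorted(heights, reverse=True)[3]   # 169, 168, 167, then 166

# 16 of the 20 heights are below 166, so this is the 80th percentile.
print(fourth_tallest, percentile_rank(heights, fourth_tallest))
```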

In Order
The data needs to be in order! So percentiles of height need to be in height order (sorted by height). If they
were percentiles of weight, they would need to be in weight order.

Deciles
A related idea is Deciles (sounds like decimal and percentile together), which splits the data into 10%
groups:

• The 1st decile is the 10th percentile (the value that divides the data so that 10% is below it)
• The 2nd decile is the 20th percentile (the value that divides the data so that 20% is below it)
• and so on, up to the 9th decile (the 90th percentile)

Example: (continued)

You are at the 8th decile (the 80th percentile).


Quartiles
Another related idea is Quartiles, which splits the data into quarters:

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are in order. Cut the list into quarters:

In this case, Quartile 2 is halfway between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

a) Quartile 1 (Q1) = 3
b) Quartile 2 (Q2) = 5.5
c) Quartile 3 (Q3) = 7

The quartiles divide the data into four groups of 25%, so:

• Quartile 1 (Q1) can be called the 25th percentile
• Quartile 2 (Q2) can be called the 50th percentile
• Quartile 3 (Q3) can be called the 75th percentile

Example: (continued)

For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:

• The 25th percentile = 3
• The 50th percentile = 5.5
• The 75th percentile = 7
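The quartile method used above (Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves) can be sketched in Python. `quartiles` is a hypothetical helper; it assumes that, for an odd number of scores, the middle score is left out of both halves. Library functions such as `statistics.quantiles` use other conventions and can give slightly different Q1 and Q3.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half method.

    Assumption: for an odd number of scores, the middle score is
    excluded from both halves.
    """
    s = sorted(data)              # the data must be in order first
    n = len(s)
    lower = s[: n // 2]           # lower half
    upper = s[(n + 1) // 2 :]     # upper half
    return median(lower), median(s), median(upper)


# Same data as the example: Q1 = 3, Q2 = 5.5, Q3 = 7.
print(quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8]))
```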
Lesson 3 Range & interquartile range Ex 10:03
While the mean, median and mode describe the centre of a data set, the range, interquartile range and the
standard deviation are called measures of spread and describe the spread of the data set.
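The range is the highest score minus the lowest score, and the interquartile range is IQR = Q3 − Q1. A minimal Python sketch, assuming quartiles are found as the medians of the lower and upper halves; `spread` is a hypothetical helper, applied here to the data set 1, 3, 3, 4, 5, 6, 6, 7, 8, 8.

```python
from statistics import median


def spread(data):
    """Return (range, interquartile range) for a list of scores.

    Assumption: quartiles use the median-of-each-half method; other
    quartile conventions can give a slightly different IQR.
    """
    s = sorted(data)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return s[-1] - s[0], q3 - q1


# Range = 8 - 1 = 7, IQR = 7 - 3 = 4.
print(spread([1, 3, 3, 4, 5, 6, 6, 7, 8, 8]))
```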

Examples:

1.

2.

3.
Lesson 4 The effect of outliers Ex 10:04

From the formula sheet:
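As a sketch only, assuming the common rule that a score is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, the check can be written in Python. `outliers` is a hypothetical helper, quartiles are assumed to be the medians of the lower and upper halves, and the scores are made up.

```python
from statistics import median


def outliers(data, k=1.5):
    """Return the scores outside [Q1 - k * IQR, Q3 + k * IQR].

    Assumption: the common 1.5 x IQR rule, with median-of-halves quartiles.
    """
    s = sorted(data)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in s if x < low or x > high]


# Q1 = 14, Q3 = 16, IQR = 2, so anything below 11 or above 19 is an outlier.
print(outliers([12, 14, 15, 15, 16, 60]))
```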


Lesson 5 Cumulative frequency graphs (S1.1) Ex 10:05
A cumulative frequency histogram is a column graph of cumulative frequency. A cumulative frequency polygon,
also called an ogive, is drawn by joining the top right-hand corners of the columns in a cumulative frequency
histogram.
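Reading the median, quartiles or deciles from an ogive amounts to linear interpolation between its corner points. A Python sketch, assuming the ogive starts at (lowest class boundary, 0); the helpers `ogive_points` and `read_ogive` and the temperature classes are hypothetical, not the Campbelltown data.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Return the ogive's corner points as (class boundary, cumulative frequency).

    Assumption: the ogive starts at (lowest boundary, 0), then joins the top
    right-hand corner of each column of the cumulative frequency histogram.
    """
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))


def read_ogive(points, fraction):
    """Estimate the score that has `fraction` of the data below it.

    Reads across from fraction * n on the vertical axis and interpolates
    linearly along the ogive (0.5 gives the median, 0.4 gives D4, and so on).
    """
    target = fraction * points[-1][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("fraction must be between 0 and 1")


# Hypothetical temperature classes 8-<10, 10-<12, ..., 16-<18 (in °C).
boundaries = [8, 10, 12, 14, 16, 18]
freqs = [5, 10, 10, 10, 5]
points = ogive_points(boundaries, freqs)

median_est = read_ogive(points, 0.5)                            # 13.0
iqr_est = read_ogive(points, 0.75) - read_ogive(points, 0.25)   # 15.0 - 11.0
d4 = read_ogive(points, 0.4)                                    # about 12.2
top_20_cut = read_ogive(points, 0.8)                            # about 15.4
print(median_est, iqr_est, round(d4, 2), round(top_20_cut, 2))
```

On paper, the same steps are: find the required position on the cumulative frequency axis, read across to the ogive, then read down to the score axis.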

Examples:

1. The maximum daily temperatures (in °C) in Campbelltown in June were recorded and grouped into the
frequency table shown.
a) Draw a cumulative frequency histogram and polygon for the data.
b) Use the cumulative frequency polygon (ogive) to find the median and calculate the interquartile range.

2. Use the cumulative frequency histogram above to find:

a) i) the 4th decile, D4

ii) the 7th decile, D7

b) What value cuts off the top 20% of temperatures?

c) Between which two deciles would you find a temperature of 14°C?


3. a) Construct a cumulative frequency histogram and ogive for the data below.

b) Estimate the median using the ogive.

c) Estimate the upper and lower quartiles (QU and QL) and the IQR.

Example:

1. How many students completed this university exam?


2. What was the frequency of 30?
3. What was the frequency of 50?
4. Estimate the median.
5. Estimate the 20th percentile.
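The total number of scores is the last cumulative frequency, and the frequency of any score is its cumulative frequency minus the cumulative frequency of the score before it. A Python sketch with hypothetical exam marks; `frequencies` is a helper written for illustration.

```python
def frequencies(scores, cum_freqs):
    """Recover each score's frequency from a cumulative frequency table."""
    previous = 0
    result = {}
    for score, cf in zip(scores, cum_freqs):
        result[score] = cf - previous   # frequency = this cf - previous cf
        previous = cf
    return result


# Hypothetical exam marks and their cumulative frequencies.
marks = [10, 20, 30, 40, 50, 60]
cum = [2, 7, 15, 26, 33, 36]
freq = frequencies(marks, cum)

print(cum[-1])              # total number of students: 36
print(freq[30], freq[50])   # 15 - 7 = 8 and 33 - 26 = 7
```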
Lesson 6 Box plots Ex 10:06

Box-and-whisker plots are always drawn to scale.

Examples:

1.
2.

3.

4. Using the stem and leaf plot, create a five-number summary.
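A five-number summary is (minimum, Q1, median, Q3, maximum). A Python sketch that first expands a stem-and-leaf plot into whole numbers, assuming a stem unit of 10 and median-of-halves quartiles; the plot and both helper functions are hypothetical.

```python
from statistics import median


def stem_leaf_values(plot):
    """Expand a stem-and-leaf plot {stem: [leaves]} into sorted whole numbers.

    Assumption: stem unit of 10, so stem 3 with leaf 7 means 37.
    """
    return sorted(stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves)


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    s = sorted(data)
    n = len(s)
    return (s[0], median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :]), s[-1])


# Hypothetical stem-and-leaf plot: 23, 25, 28, 30, 34, 34, 37, 41, 46, 52.
plot = {2: [3, 5, 8], 3: [0, 4, 4, 7], 4: [1, 6], 5: [2]}
values = stem_leaf_values(plot)
print(five_number_summary(values))   # minimum 23, Q1 28, median 34, Q3 41, maximum 52
```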


Lesson 7 Standard deviation Ex 10:07
Standard deviation is often preferred to the range and the interquartile range as a measure of spread because, like the mean,
it depends on every score in the data set.

The standard deviation indicates how far, typically, the scores lie from the mean.

The standard deviation is a measure of spread about the mean.
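The population standard deviation is the square root of the average squared distance of the scores from the mean, $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$. A Python sketch computes it directly and compares it with `statistics.pstdev` from the standard library; the data are hypothetical and `population_sd` is a helper written for illustration.

```python
from math import sqrt
from statistics import pstdev


def population_sd(scores):
    """Population standard deviation: sqrt of the mean squared distance from the mean."""
    n = len(scores)
    x_bar = sum(scores) / n
    return sqrt(sum((x - x_bar) ** 2 for x in scores) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean 5
print(population_sd(data))        # squared distances total 32, 32 / 8 = 4, sqrt(4) = 2.0
print(pstdev(data))               # same value from the statistics module
```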

Examples:
1. Find the mean and the population standard deviation of the data sets below.

a.

b.

c.
Method: Cross out the class interval column and replace it with a class centre column.

i. Be careful to enter the whole number, not just the leaf.

d. It may be easier to draw up a quick frequency table before calculating.
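For grouped data, each class is replaced by its class centre and each centre is weighted by its frequency. A Python sketch, with hypothetical class intervals and frequencies; `grouped_mean_sd` is a helper written for illustration.

```python
from math import sqrt


def grouped_mean_sd(classes, freqs):
    """Mean and population standard deviation of grouped data, using class centres.

    classes: list of (lower, upper) class limits; each class is replaced by
    its centre (lower + upper) / 2.
    """
    centres = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, centres)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, centres)) / n
    return mean, sqrt(variance)


# Hypothetical classes 1-5, 6-10, 11-15 (centres 3, 8, 13) with frequencies 2, 5, 3.
classes = [(1, 5), (6, 10), (11, 15)]
freqs = [2, 5, 3]
print(grouped_mean_sd(classes, freqs))   # mean 8.5, population SD 3.5
```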

2.

Note: When comparing sets of data, the smaller standard deviation indicates
more consistent results or a smaller spread.
Lesson 8 Shape of a distribution Ex 10:08
S1.2 Summary
Answers
