
MA150

Statistics Notes: Chapter 1 - Chapter 3

Chapter 1: The Nature of Statistics


CH1.1: Statistics Basics

Statistics: A branch of mathematics that deals with the
• Collection
• Organization
• Presentation
• Analysis
• Interpretation
of information in order to draw conclusions or answer questions.

Statistics and mathematics have similarities but are different


Mathematics
• Solves problems with 100% certainty
• Has only one correct answer

Statistics, because of variability
• Does not solve problems with 100% certainty (95% certainty is much more
common)
• Frequently has multiple reasonable answers

The Process of Statistics


Step 1: Identify a Research Objective
The researcher must determine a detailed question he/she wants answered and
identify the group to be studied.

Step 2: Collect the information needed to answer the questions.


In conducting research, we typically look at a subset of the population, called a sample.

Step 3: Organize and summarize the information.


Step 4: Draw conclusions from the information.
The information collected from the sample is generalized to the population.

Population
• Is the group to be studied
• Includes all of the individuals in the group
Sample
• Is a subset of the population
• Is often used in analyses because getting access to the entire population is
impractical
Individual
• A person or object that is a member of the population being studied.

Descriptive statistics consists of organizing and summarizing the information
collected. It consists of charts, tables, and numerical summaries. (CH 2 – CH 3)

Inferential statistics uses methods that generalize results obtained from a sample to the
population and measure their reliability. (CH 8-CH 9)

CH1.2: Simple Random Sampling


Four Sources of Data
A census is a list of all individuals in a population along with certain characteristics of
each individual.

Existing Sources – use of already collected and documented data.

An observational study measures the characteristics of a population by studying
individuals in a sample, but does not attempt to manipulate or influence the individuals.
It is used to learn about a variable of interest as it already exists.

A designed experiment applies a treatment to individuals (referred to as experimental
units or subjects). It is used to determine a response to a treatment.

A sample of size n from a population of size N is obtained through simple random
sampling if every possible sample of size n has an equally likely chance of occurring.
The sample is then called a simple random sample.

Example: How might you choose a random sample of 5 people from a group of 40
people?..............................................................................................................................

Steps for Obtaining a Simple Random Sample


1) Obtain a frame, which is a list of all the individuals in the population of interest.
2) Number the individuals in the frame 1 - N.

3) Use a random number table, graphing calculator, or statistical software to randomly
generate n numbers where n is the desired sample size.
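
The three steps above can be sketched in software. This is only an illustration, assuming Python's standard `random` module stands in for the "statistical software" of step 3; the function name `simple_random_sample` and the seed value are made up for the example. It draws 5 people from a frame of 40, as in the example question above.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals from the frame, without replacement.

    random.sample gives every possible subset of size n the same chance,
    which is the defining property of a simple random sample.
    """
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(frame, n)

# Steps 1 and 2: the frame is the list of individuals, numbered 1 to N (N = 40).
frame = list(range(1, 41))

# Step 3: randomly generate n = 5 of those numbers.
sample = simple_random_sample(frame, 5, seed=1)
print(sorted(sample))
```

Leaving out the seed gives a different sample on each run, which is what you would want in practice.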

Activity: Use the following table and simple random sampling to obtain a sample of
size 10 from the class.

Chapter 2: Organizing Data


CH 2.1: Variables and Data

Variables are the characteristics of the individuals within the population. The suggested
approaches to analyzing problems vary by the type of variable.
• Qualitative or categorical: variables whose values are attributes or
characteristics and cannot be ordered, added, subtracted, etc. For instance:
o Gender
o Zip code
o Blood type
o States in the United States
• Quantitative: variables whose values are numeric and can be ordered,
added, and subtracted, such as:
o Temperature
o Height and weight
o Sales of a product
o Number of children in a family

How to check if a variable is quantitative or qualitative: Consider two subjects
in the population and their values of the variable. Can you compare their size? Can you
add them? Does it make sense? If yes, the variable is quantitative. If no, the variable is
qualitative.
Example: Is eye color qualitative or quantitative? ……………………………………

Quantitative variables can be either discrete or continuous

• Discrete variables have a finite or a countable number of possibilities and are
frequently counts. Examples of discrete variables:
o The number of heads obtained in 5 coin flips
o The number of cars arriving at a McDonald’s between 12:00 and 1:00
o The number of students in class
• Continuous variables have an infinite, uncountable number of possibilities
and are frequently measurements. Examples of continuous variables:
o The distance that a particular model car can drive on a full tank of gas
o Heights of college students
Sometimes the variable is discrete but has so many close values that it could be
considered continuous. For example:
• The number of DVDs rented per year at video stores
• The number of ants in an ant colony

Example: Researching a bag of M&M’s. What characteristics could we investigate?

Variable                   Qualitative or Quantitative   If Quantitative: Discrete or Continuous
Color                      Qualitative
Count (# of whole M&Ms)    Quantitative                  Discrete
Diameter                   Quantitative                  Continuous

Data: values of a variable

CH 2.2: Organizing Qualitative Data

Example: Introduction to organizing Data
Given the following random sample of patients’ blood types, answer the following
questions.
(A) According to the data which blood type is the most common? …………………

(B) What percent of the sample has blood type B?........................................................


O B AB O AB O O O O O O O B O O
A A B O A A B A A A A O A O O
A A B A B A AB A A A O O AB O A
A A O O O O B AB O AB O O O O O
O O B O O A A B A B O AB A A A
O O AB O A A A O O O O B AB O AB
O O O O O O O B O O A A B O A
A B A A A A O A O O A A B A B
O AB A A A O O AB O A A A O O O
O O B O O A A B O A A B A A A

Consider the ordered data. Are the questions easier to answer now?

(A) According to the data which blood type is the most common? ……………………

(B) What percent of the sample has blood type B?..........................................................

O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O
O O O O O O A A A A A A A A A
A A A A A A A A A A A A A A A
A A A A A A A A A A A A A A A
A A A A A A A A A A A A A A B
B B B B B B B B B B B B B B B
B B B AB AB AB AB AB AB AB AB AB AB AB AB
Data that is not organized is referred to as raw data.

Ways to Organize Data

• Tables
• Graphs
• Numerical Summaries (CH 3)

SPSS is software that can be used to organize the data:

Frequency Table

Blood Type   Frequency   Relative Frequency
A            53          35.3%
AB           12          8.0%
B            19          12.7%
O            66          44.0%
Total        150         100%

Bar Chart

Pie Chart

(A) According to the data which blood type is the most common?
“From the table we look for the _____________ count”
“From the bar graph we look for the _____________ bar”
“From the pie chart we look for the _____________ wedge”
(B) What percent of the sample has blood type B?

“From the table we calculate:____________”

Some formal definitions:


A frequency table is a table that lists the number of occurrences for each category of
data.
The relative frequency is the proportion or percent of observations within a category
and is found using the formula:

$\text{relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$

A relative frequency distribution is a table that lists the relative frequencies for each
category of data.
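
The definitions above can be checked against the blood type table. The sketch below is illustrative only: it rebuilds the 150 observations from the counts in the frequency table (A 53, AB 12, B 19, O 66) rather than retyping the raw data, and uses Python's `collections.Counter` in place of SPSS.

```python
from collections import Counter

# Observations rebuilt from the counts in the frequency table (150 patients).
blood_types = ["A"] * 53 + ["AB"] * 12 + ["B"] * 19 + ["O"] * 66

# Frequency table: number of occurrences of each category.
frequency = Counter(blood_types)

# Relative frequency = frequency / sum of all frequencies.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

for category in sorted(frequency):
    print(f"{category:<3}{frequency[category]:>5}{relative_frequency[category]:>8.1%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the arithmetic.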

A bar graph is constructed by labeling each category of data on a horizontal axis and
the frequency or relative frequency of the category on the vertical axis. A rectangle of
equal width is drawn for each category whose height is equal to the category's
frequency or relative frequency.

Note: In a bar graph, the rectangles do not touch, to reinforce the idea that the data are
qualitative and the categories are distinct.
Activity: Divide the class into 4 groups. Each group will investigate one variable.

Group Variable

1 Gender

2 Eye color

3 Resident/ Non-resident

4 Major

1. Collect data into the frequency table:

Variable Name Tally Frequency Relative Frequency

2. Draw a bar graph:


When producing a bar graph make sure that
• the axes are labeled
• the vertical scale accommodates the maximum frequency
• the bars are of equal width
• the bars are not touching

CH 2.3: Organizing Quantitative Data

A histogram is constructed by drawing rectangles for each class of data whose height is
the frequency, relative frequency, or percent of the class. The width of each rectangle
should be the same and they should touch each other.

Example: Histogram for Discrete Data


The following data represent the number of available cars in a household based on a
random sample of 20 households. Construct a frequency and relative frequency
distribution. Write each relative frequency with 2 decimal places.

3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4

Number of Cars Tally Frequency Relative Frequency

total:

Frequency Histogram Relative Frequency Histogram

To summarize continuous data in a table, categories are created using intervals of
numbers called classes.

Example: Histograms for Continuous Data


The highest temperature of each day in August in Chicago has been recorded since
1872. The following table shows a random sample of these temperatures. (Source:
NOAA)

There are many perspectives about how to handle the upper class limit for continuous
data. In this text, the author chooses to have the upper cutpoints end in a trailing “.9”.

Degrees           Days
50° – 59.9°       1
60° – 69.9°       31
70° – 79.9°       152
80° – 89.9°       163
90° – 99.9°       50
100° – 100.9°     2

The number of classes in this table? _________

The lower class limit of a class is the smallest (possible) value within the class.

The upper class limit of a class is the largest (possible) value within the class.

For the first class the lower class limit is _______ the upper class limit is _______

For the second class the lower class limit is ______the upper class limit is _______

The class width is the difference between consecutive lower class limits.

For this example the class width is ______

What is going on with the last class?

Example: The use of classes with discrete data

The following data represent the number of people admitted to local hospitals in the last
year. The size of a hospital is defined by its number of beds.

Number of beds Admissions (in thousands)


100 – 199 6,826
200 – 299 6,800
300 – 399 5,607
400 – 499 3,593

What is the lower limit of the third class? _____


What is the upper limit of the third class? ______
What is the class width?_______

Construction of a Stem-and-Leaf Plot

When a stem-and-leaf plot is constructed, the quantitative data is organized into classes
in a unique table that has a bar graph appearance when completed.

Example: The following table displays the number of days to maturity for 24 short-
term investments.
Construct a stem-and-leaf diagram.

70 64 99 55 64 89 87 65
62 38 67 70 60 69 78 39
75 56 71 51 99 68 95 86

The leaf of a data item will be the rightmost digit.

The stem of a data item will consist of the remaining leading digits. (The numbers below
are not from a single data set; each would go on a separate stem-and-leaf plot.)
For example:
For the number 47 Leaf = _____ Stem = ____
For the number 148 Leaf = _____ Stem = ____
For the number 3 Leaf = _____ Stem = _____
For the number 2.9 Leaf = _____ Stem = _____

To construct a stem and leaf plot:

Step 1: Find the smallest item and largest item in the data. These will determine the
smallest and largest stems.
Step 2: Construct the column of stems. If the stems are split, write each stem 2 times.
Otherwise, write each stem 1 time. Draw a vertical line to the right of the stems.
Step 3: Write each leaf corresponding to the stems to the right of the vertical line. The
leaves are to line up vertically.
In an ordered stem-and-leaf diagram the leaves are written in ascending order.
Make the two stem-and-leaf plots ordered.
For split stems:
The stem’s 1st row is for the leaves 0, 1, 2, 3 and 4
The stem’s 2nd row is for the leaves 5, 6, 7, 8 and 9
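
The construction steps above can also be carried out in code. The sketch below is one possible way to do it, not a prescribed method: it handles non-negative whole numbers only, does not split stems, and the function name `stem_and_leaf` is made up for the example. It uses the 24 days-to-maturity values from the example above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (rightmost digit)."""
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    smallest, largest = min(leaves), max(leaves)
    # Step 2: list every stem from smallest to largest, even one with no leaves.
    # Step 3 (ordered plot): sort the leaves of each stem in ascending order.
    return {stem: sorted(leaves[stem]) for stem in range(smallest, largest + 1)}

days = [70, 64, 99, 55, 64, 89, 87, 65,
        62, 38, 67, 70, 60, 69, 78, 39,
        75, 56, 71, 51, 99, 68, 95, 86]

plot = stem_and_leaf(days)
for stem, row in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in row)}")
```

A stem with no leaves (here, 4) is kept as an empty row so that gaps in the data stay visible in the shape of the plot.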

Example: Construct a stem and leaf diagram for the following data: Cholesterol levels
for 20 high-level patients. Make one graph with and one without splitting the stems.
Which do you like better?
210 209 212 208

217 207 210 203

208 210 210 199

215 221 213 218

202 218 200 214

Advantages of Stem-and-Leaf Plots over Histograms

• The data is organized automatically into classes.
• You can see the shape of the distribution as you create the stem-and-leaf diagram.
• The raw data can be retrieved from the stem-and-leaf plot. However, once a
frequency histogram of continuous data is created, the raw data is lost.

Disadvantages of Stem-and-Leaf Plots Compared with Histograms

• You must count the leaves to find the frequency of a class.
• Stem-and-leaf plots are not well suited to very large data sets.
• You do not have flexibility in the choice of class widths.

Retrieving information from a stem and leaf plot:

Example: The following table is a stem-and-leaf plot. The stem represents the tens digit
and the leaf represents the ones digit.

 8 | 1 4 7
 9 | 2 2 3 3
10 | 3 4 5 8
11 | 0 1 9

(A) What is the smallest value in this plot? _______

(B) Are the stems split? ________


What are the possible leaves for the first row? ____________

(C) The lower class limit for the first row = _______
The lower class limit for the second row = ________
The class width = __________________________

Dot Plot: Back to the days to maturity data


Example: The following table displays the number of days to maturity for 40 short-
term investments.

[Figure: Dotplot for DAYS (horizontal axis: DAYS, 40 to 100)]

To construct a dotplot
Step 1 – Draw a horizontal line that displays all possible values
Step 2 – Record each observation by placing a dot over the appropriate value on the
horizontal axis

Exercise: Construct a dotplot for the ages of the students in this class:

Construct a parallel dotplot of ages of the students in a statistics class for 2011. They
are:
22, 42, 28, 30, 20, 18, 26, 49, 39, 23, 21, 42
What time do you think this class was offered?

CH 2.4: Distribution Shapes

CH 2.5: Misleading Graphs
Some charts have a vertical scale that is unclear
• The scale is possibly not labeled
• The zero point of the scale is unclear

In these graphs, the order of the sizes is accurate, but the relative comparisons can be
misleading.
In this graph, it is unclear
• Where the vertical scale begins (bottom of or top of the shirts)
• What the scale increments are

Some charts have a vertical scale that is truncated (the vertical scale does not start at 0).
When the vertical scale starts at a higher number, the differences between the bars are
exaggerated.
• For some data, magnifying the differences is important
• For some data, magnifying the differences is misleading

The two graphs show the same data … the difference seems larger on the first graph
The vertical scale is truncated on the first graph.

One more example:

a. Construct a misleading time-series plot that indicates that the life expectancy has
risen sharply over time.

b. Construct a time-series plot that is not misleading.

Misleading Pie Charts

Or even more subtle:

● Some charts are made visually more attractive by using symbols and graphics
instead of plain bars and lines
● If one category has twice the frequency of another, that graphic is doubled in size
● If the graphic is a three dimensional graphic, then doubling each dimension
increases the volume by eight times which is misleading
● The gazebo on the right is twice as large in each dimension as the one on the left
● However, it is much more than twice as large as the one on the left

A couple more – Just for fun http://www.math6.org/graphing/6.8_quiz.htm

Chapter 3: Descriptive Measures


CH 3.1: Measures of Center

A measure of center gives a general idea of the “size” of the data.

Example: Consider data on weights of people in a certain population. The “center” of
the data is near 8.5 pounds. Can you describe the population?

Consider two basketball teams, Team 1 and Team 2. The players’ heights are listed
below:

Team 1: 76, 72, 78, 76, 73


Team 2: 84, 76, 72, 76, 67

Mean of a Data Set: The mean of a data set is the average, the sum of the observations
divided by the number of observations.

Team 1: $\dfrac{76+72+78+76+73}{5} = \dfrac{375}{5} = 75$

Team 2: ________________

Median of a Data Set: The median of the data set is the middle value. To calculate
this:
1. Sort data in increasing order.
2. Determine n = number of observations
3. Determine the observation in the middle of the data set:
a. If n is odd, the median is in position $\frac{n+1}{2}$.
b. If n is even, then $\frac{n+1}{2}$ is not an integer. The median is the average of the
two observations that fall in the positions on either side of $\frac{n+1}{2}$.
Height data: sorted
Team 1: 72, 73, 76, 76, 78
Median is the center value
Place of median: Since n = 5, $\frac{n+1}{2} = \frac{5+1}{2} = 3$
Median = 76
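
The mean and the position rule for the median can be written out directly. This is a minimal sketch that follows the steps in these notes; the helper names `mean` and `median` are chosen for the example, and Python's built-in `statistics` module would give the same results.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value, located with the (n + 1) / 2 position rule."""
    ordered = sorted(values)                 # 1. sort in increasing order
    n = len(ordered)                         # 2. number of observations
    position = (n + 1) / 2                   # 3. 1-based position of the median
    if n % 2 == 1:
        return ordered[int(position) - 1]    # odd n: position is a whole number
    below = ordered[int(position) - 1]       # even n: the observation just below
    above = ordered[int(position)]           #         and just above the position
    return (below + above) / 2

team1 = [76, 72, 78, 76, 73]
print(mean(team1))    # 75.0
print(median(team1))  # 76
```

The `- 1` appears because the notes count positions starting at 1, while Python lists start at index 0.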

Find the medians of team 2 and team 3.

Team 2: 67, 72, 76, 76, 84

Team 3: 70, 71, 72, 75, 76, 77

Mode of a Data Set: The value that occurs most often.

• If no value occurs more than once then the data set has no mode.
• If 2 values occur most often then there are 2 modes.

Example:
Team 1: Mode = 76
Team 2: Mode =
Team 3: Mode =

We presented 3 measures of center: mean, median and mode. Which makes sense if the
data is qualitative?

Parameters and Statistics:


Numerical summaries of populations are called parameters. Numerical summaries of
samples are called statistics.

Population Mean and Sample Mean:

Mean of population data set: population mean = $\mu$
Mean of sample data set: sample mean = $\bar{x}$

Population Mean: (parameter)

For a variable x in a population of size N:

$\mu = \dfrac{\sum x}{N}$

Sample Mean: (statistic)

For a variable x in a sample of size n:

$\bar{x} = \dfrac{\sum x}{n}$

Identifying the Shape of a Distribution
● Symmetric – the mean will usually be close to the median

● Skewed left – the mean will usually be smaller than the median
● Skewed right – the mean will usually be larger than the median

The skew pulls the mean.

● Many variables, such as birth weights below, are approximately symmetric

Relating Mean, Median and Shape of Distribution

Asking price of homes for sale in Lincoln, NE.
79,995 128,950 149,900 189,900
99,899 130,950 151,350 203,950
105,200 131,800 154,900 217,500
111,000 132,300 159,900 260,000
120,000 134,950 163,300 284,900
121,700 135,500 165,000 299,900
125,950 138,500 174,850 309,900
126,900 147,500 180,000 349,900

The mean asking price is $168,320 and the median asking price is $148,700. Therefore,
the distribution is skewed right.
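
The two summary values quoted above can be reproduced from the table. The sketch below uses Python's `statistics` module; the final comparison is only the rule of thumb from these notes (mean larger than median suggests right skew), not a formal test of skewness.

```python
import statistics

# Asking prices of the 32 homes listed in the table above.
prices = [
    79995, 128950, 149900, 189900,
    99899, 130950, 151350, 203950,
    105200, 131800, 154900, 217500,
    111000, 132300, 159900, 260000,
    120000, 134950, 163300, 284900,
    121700, 135500, 165000, 299900,
    125950, 138500, 174850, 309900,
    126900, 147500, 180000, 349900,
]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
print(f"mean = {mean_price:,.0f}, median = {median_price:,.0f}")

# Rule of thumb: the skew pulls the mean toward the long tail.
if mean_price > median_price:
    print("mean > median: consistent with a right-skewed distribution")
```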

● What if one value is extremely different from the others?


● What if we made a mistake and 6, 1, 2 was recorded as 6000, 1, 2?

Data Mean Median


6, 1, 2
6000, 1, 2

The median is “resistant” to outliers, the mean is reactive.
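
A quick way to see this resistance is to compute both measures for the correct and the mistyped data. This is a small illustrative sketch using Python's `statistics` module and the two data sets from the table above.

```python
import statistics

correct = [6, 1, 2]
mistyped = [6000, 1, 2]   # the recording mistake described above

for data in (correct, mistyped):
    print(data, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The single wrong value moves the mean from 3 to 2001, while the median stays at 2.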

CH 3.2: Measures of Variation

Range of a Data Set: difference between its maximum and minimum
Team 1: 72, 73,76,76,78
Team 2: 67, 72,76,76,84
Range = Max – Min
Team 1: Range = 78-72 = 6
Team 2: _______________

Sample Standard Deviation: Measures variation by indicating how far, on average,
the observations are from the mean.
Sample Variance: The square of the sample standard deviation.
A large amount of variation => observations far from mean

Small amount of variation => observations close to mean

Sample Variance:

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample Standard Deviation

$s = \sqrt{s^2}$

Calculating sample standard deviation for Team 1: (recall mean = 75)

Height $x$   Deviation from mean $x - \bar{x}$   Squared deviation from mean $(x - \bar{x})^2$
72           72 – 75 = -3                        9
73           73 – 75 = -2                        4
76           76 – 75 = 1                         1
76           76 – 75 = 1                         1
78           78 – 75 = 3                         9
Sum of squared deviations = 24

Sample Variance for team 1: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{24}{5 - 1} = \dfrac{24}{4} = 6$

Sample Standard Deviation for team 1: $s = \sqrt{6} \approx 2.4$
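
The same calculation can be laid out step by step in code, mirroring the columns of the table. This is a sketch for checking the arithmetic; the variable names are chosen for the example.

```python
import math

team1 = [72, 73, 76, 76, 78]
n = len(team1)
x_bar = sum(team1) / n                                   # sample mean, 75

deviations = [x - x_bar for x in team1]                  # x - x_bar column
squared_deviations = [d ** 2 for d in deviations]        # (x - x_bar)^2 column

sample_variance = sum(squared_deviations) / (n - 1)      # 24 / 4 = 6
sample_sd = math.sqrt(sample_variance)                   # sqrt(6), about 2.45

print(sum(squared_deviations), sample_variance, round(sample_sd, 2))
```

Note the divisor `n - 1` for a sample; the population formulas divide by N instead.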

Calculate s.d. for Team 2 (mean = )


Height $x$   Deviation from mean $x - \bar{x}$   Squared deviation from mean $(x - \bar{x})^2$

Sum of Squared deviations =


Population Variance

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Population Standard Deviation

$\sigma = \sqrt{\sigma^2}$

Exercise: Susan scored 60, 75, 80, and 65 on four statistics tests. Find the population
variance and the population standard deviation of her test scores.

Suppose she receives 5 points of extra credit on each test. Find the variance and the
standard deviation of her test scores now.

Note: If every value in the dataset is increased or decreased by the same value, the
standard deviation remains unchanged. (Think about the definition of standard
deviation)
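
The note can be verified numerically. The sketch below uses Susan's four scores and the population formulas via `statistics.pvariance` and `statistics.pstdev`; treating the four tests as the whole population follows the wording of the exercise.

```python
import statistics

scores = [60, 75, 80, 65]
shifted = [x + 5 for x in scores]   # 5 points of extra credit on every test

for label, data in (("original", scores), ("shifted", shifted)):
    print(label,
          "mean =", statistics.mean(data),
          "variance =", statistics.pvariance(data),
          "sd =", round(statistics.pstdev(data), 2))
```

Adding 5 to every score also adds 5 to the mean, so each deviation from the mean is unchanged, and so are the variance and the standard deviation.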

CH 3.3: Chebyshev’s Rule and Empirical Rule
The Empirical Rule

The standard deviation is very useful for estimating probabilities

If the distribution is roughly bell shaped, then

• Approximately 68% of the data will lie within 1 standard deviation of the
mean
• Approximately 95% of the data will lie within 2 standard deviations of the
mean
• Approximately 99.7% of the data (i.e. almost all) will lie within 3 standard
deviations of the mean.

Example: The number of cars that pass a given intersection in a day is known to be
roughly bell shaped with a mean = 375 cars and sd = 25 cars. Interpret the empirical
rule for the number of cars passing the intersection on a given day.

• 1 sd interval: 350 – 400: 68% of the data, 16% above 400, 16% below 350
• 2 sd interval: 325 – 425: 95% of the data, 2.5% above 425, 2.5% below 325
• 3 sd interval: 300 – 450: 99.7% of the data, 0.15% below 300, 0.15% above 450
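
The three intervals come from adding and subtracting 1, 2, and 3 standard deviations from the mean. A short sketch of that arithmetic follows; the function name `empirical_rule` is made up for the example, and the percentages are the approximate values stated by the rule.

```python
def empirical_rule(mean, sd):
    """Return {k: (low, high, percent inside)} for k = 1, 2, 3 standard deviations."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}   # approximate percentages from the rule
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}

intervals = empirical_rule(375, 25)
for k, (low, high, pct) in intervals.items():
    tail = (100 - pct) / 2                   # what is left over, split between two tails
    print(f"{k} sd: {low} to {high}, about {pct}% inside, about {tail:.2f}% in each tail")
```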

What would you think if it was reported that 550 cars passed the intersection?

Example:
If the average age of retirement for the entire population in a country is 64 years and the
distribution is bell shaped with a standard deviation of 3.5 years:
What is the approximate age range in which 95% of people retire?

What percentage of the population retire at 57 years or older?

CH 3.4: The Five Number Summary; Boxplots

Quartiles:
Quartiles divide data sets into fourths, or four equal parts.
• The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%.
Therefore, the 1st quartile is equivalent to the 25th percentile.

• The 2nd quartile divides the bottom 50% of the data from the top 50% of the data,
so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to
the median.

• The 3rd quartile divides the bottom 75% of the data from the top 25% of the data,
so that the 3rd quartile is equivalent to the 75th percentile.

How to calculate the Quartiles:


1. Sort the data and calculate the median, Q2.
2. Consider the sub-data set to the left of Q2. Q1 is the median of this sub-data set.
3. Consider the sub-data set to the right of Q2. Q3 is the median of this sub-data set.

Example: Calculating quartiles


A group of Brigham Young University—Idaho students (Matthew Herring, Nathan
Spencer, Mark Walker, and Mark Steiner) collected data on the speed of vehicles
traveling through a construction zone on a state highway, where the posted speed was
25 mph.

The recorded speed of 14 randomly selected vehicles is given below:


20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40
Find and interpret the quartiles for speed in the construction zone
1. Median: n = 14,
(n+1)/2 = (14+1)/2 = 7.5 which means Q2 is the average of the 7th and 8th
number in ordered data set. Q2= (32+33)/2 = 32.5
2. Q1: sub-data set to the left of 32.5:
20, 24, 27, 28, 29, 30, 32
n = 7,
(n+1)/2 = (7+1)/2 = 4
Q1 = 28

3. Q3: sub-data set to the right of 32.5:
33, 34, 36, 38, 39, 40, 40
n = 7,
(n+1)/2 = (7+1)/2 = 4
Q3 = 38
Interpretation:
• 25% of the speeds are less than or equal to the first quartile, 28 miles per hour,
and 75% of the speeds are greater than 28 miles per hour.
• 50% of the speeds are less than or equal to the second quartile, 32.5 miles per
hour, and 50% of the speeds are greater than 32.5 miles per hour.
• 75% of the speeds are less than or equal to the third quartile, 38 miles per hour,
and 25% of the speeds are greater than 38 miles per hour.

The interquartile range (IQR) is the difference between the third and first quartiles
IQR = Q3 – Q1

For the speed data: Q1=28, Q3=38


IQR = Q3 – Q1
= 38 – 28
= 10

The range of the middle 50% of the speed of cars traveling through the construction
zone is 10 miles per hour.

The Five Number Summary of a data set consists of the minimum, the quartiles, and the
maximum in increasing order:
Min, Q1, Q2, Q3, Max = 20, 28, 32.5, 38, 40
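
The quartile method, the IQR, and the five number summary can be sketched together in code. One assumption is labeled in the comments: when n is odd, the median itself is left out of both halves, which is how "to the left of Q2" and "to the right of Q2" are read here. Statistical software often uses other quartile conventions and may give slightly different values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method from these notes."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # sub-data set to the left of Q2
    upper = ordered[half + n % 2:]      # to the right of Q2 (assumption: skip the median when n is odd)
    return median(lower), median(ordered), median(upper)

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
q1, q2, q3 = quartiles(speeds)
iqr = q3 - q1
five_number_summary = (min(speeds), q1, q2, q3, max(speeds))

print(five_number_summary, "IQR =", iqr)
```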

Draw a boxplot for the speed data:

To draw a boxplot:

1. Determine the 5 number summary

2. Draw a horizontal axis on which the numbers obtained in Step 1 can be located.
Above the axis mark the quartiles and the Min and Max with vertical lines.

3. Connect the quartiles to make a box and then connect the box to the Min and Max
with a line.

Suppose a 15th car travels through the construction zone at 100 miles per hour. How
does this value impact the mean, median, standard deviation, range and interquartile
range?
Original Data: 20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40
Data with outlier: 20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40, 100
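
One way to explore this question is to compute the measures for both data sets side by side. The sketch below uses Python's `statistics` module (sample standard deviation via `statistics.stdev`); the IQR is left out because its value depends on the quartile method used. The helper name `summarize` is made up for the example.

```python
import statistics

original = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
with_outlier = original + [100]     # the 15th car at 100 mph

def summarize(values):
    """Mean, median, sample standard deviation, and range of a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }

before = summarize(original)
after = summarize(with_outlier)
for measure in before:
    print(f"{measure:<7}{before[measure]:>8.1f}{after[measure]:>8.1f}")
```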

[Figure: Histogram of speed - original data (vertical axis: Frequency; horizontal axis: speed, 20 to 40)]

[Figure: Histogram of speed - with 100 mph data point (vertical axis: Frequency; horizontal axis: speed, 20 to 100)]

What can we do to make the graphs easier to compare?


To compare graphs always use the same size axis.
[Figure: Histogram of speed - original data (vertical axis: Frequency; horizontal axis: speed, 20 to 100)]

[Figure: Histogram of speed - with 100 mph data point (vertical axis: Frequency; horizontal axis: speed, 20 to 100)]

                     Without 15th car   With 15th car
Mean                 32.1 mph           36.7 mph
Median               32.5 mph           33 mph
Standard deviation   6.2 mph            18.5 mph
IQR                  10 mph             11 mph
Range                20 mph             80 mph

The mean, range and the standard deviation are reactive to outliers. The median
and IQR are resistant to outliers.

Example:
TV watching data: in hours per week for 20 children

5 15 16 20 21
25 26 27 30 30
31 32 32 34 35
38 38 41 43 66

Q1:

Q2:

Q3:

IQR:

Five number summary:


Draw a Boxplot:

Use your eyes to identify outliers, not the upper and lower fence method.

[Figure: Histogram of TV TIMES (vertical axis: Frequency, 0 to 9; horizontal axis: TV TIMES, 10 to 70)]

Identifying Potential Outliers by Finding Lower Limit and Upper Limit.

Observations that are below the lower limit or above the upper limit are potential
outliers.

Use lower and upper limit to find potential outliers in above data (TV watching data).
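
The notes do not state the formulas for the two limits at this point. A widely used convention, assumed here, is lower limit = Q1 − 1.5 × IQR and upper limit = Q3 + 1.5 × IQR. The sketch below applies that assumed convention to the speed data with the 15th car, using Q1 = 28 and Q3 = 39 from the median-of-halves method (IQR = 11, matching the comparison table). The function name `outlier_limits` is made up for the example.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits, assuming the common 1.5 * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Speed data including the 15th car at 100 mph.
speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40, 100]
q1, q3 = 28, 39   # quartiles by the notes' median-of-halves method, IQR = 11

lower, upper = outlier_limits(q1, q3)
potential_outliers = [x for x in speeds if x < lower or x > upper]

print("lower limit =", lower, "upper limit =", upper)
print("potential outliers:", potential_outliers)
```

The same function can be applied to the TV watching data once its quartiles have been found.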

Distribution shape and boxplot

Describing and Comparing Distributions

Month   Average Temperature Worcester   Average Temperature San Francisco
Jan     24                              48
Feb     25                              52
Mar     34                              54
Apr     44                              56
May     56                              58
Jun     64                              62
Jul     70                              64
Aug     68                              64
Sep     60                              65
Oct     50                              61
Nov     38                              55
Dec     27                              48

[Figure: Boxplot of Average Temperature Worcester, Average Temperature SFO (vertical axis: Data, 20 to 70)]

To describe a distribution, comment on:


1. measure of center (mean, median or mode)

2. measure of spread (range, IQR, standard deviation)

3. shape (example: symmetric, right skewed, bell shaped)

Describe the average temperature in Worcester:


1. Median about 47 degrees
2. IQR about 62 – 28 = 34 degrees
3. Shape: Somewhat symmetric

Describe the average temperature in SFO:


1. Median:
2. IQR:
3. Shape:

Compare the distribution of average temp in Worcester to the distribution of the average
temperature in SFO
1.
2.
3.

CH 3.5 Descriptive Measures for Populations

Population Mean: (parameter)

For a variable x in a population of size N:

$\mu = \dfrac{\sum x}{N}$

Standard Deviation of Population: (Parameter)

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Parameter: A descriptive measure for a population


Statistic: A descriptive measure for a sample

Z-scores and Percentiles
Measures of Position: Precise ways to describe the relative position of a data value
within the entire set of data:

The Z-score is the number of standard deviations a data value is away from the mean.

The population z-score is calculated using the population mean and population
standard deviation:

$z = \dfrac{x - \mu}{\sigma}$

The sample z-score is calculated using the sample mean and sample standard deviation:

$z = \dfrac{x - \bar{x}}{s}$

A z-score is unitless. The z-scores of all the values in a data set have a mean of zero
and a standard deviation of 1.

According to the Empirical rule, almost all Z scores are between -3 and 3.

Example:
If the mean of a population is 20, the standard deviation of the population is 6, and
the data value is 26, what is the z-score and what does it mean?

The value 26 would have a z-score of 1.0:

$z = \dfrac{x - \mu}{\sigma} = \dfrac{26 - 20}{6} = 1$

It means the data point is 1 standard deviation higher than the mean.

• What if the data value was 14? Find the z-score. How many standard deviations
above or below the mean is the data point?

• What if the data value was 20? Find the z-score. How many standard deviations
above or below the mean is the data point?
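
The z-score formula is a one-line calculation. A minimal sketch follows, with the helper name `z_score` chosen for the example; it reproduces the worked value for 26 and can be reused for the other data values.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(26, mean=20, sd=6))   # 1.0, one standard deviation above the mean
```

A negative result means the data value is below the mean, and a result of zero means it equals the mean.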

Example: The mean height of males 20 years or older is 69.1 inches with a standard
deviation of 2.8 inches. The mean height of females 20 years or older is 63.7 inches
with a standard deviation of 2.7 inches.

Who is relatively taller? (Relative with respect to their reference distribution.)

• Kevin Garnett whose height is 83 inches
or
• Candace Parker whose height is 76 inches

Bringing z-scores and the empirical rule together:


The IQ distribution is bell shaped and has a population mean of 100 and a population
standard deviation of 16. What is the probability that a randomly selected person has an
IQ score greater than 132?

The z-score will tell us how many standard deviations 132 is from the population mean,
100.

$z = \dfrac{x - \mu}{\sigma} = \dfrac{132 - 100}{16} = 2$

From the Empirical Rule, we know that 95% of the observations are within 2 standard
deviations of the mean. Since the total percentage of observations is 100%,

if the center has 95%,
the two tails have 100% - 95% = 5%.
Each tail has an equal amount, 5% ÷ 2 = 2.5%.

The probability that a randomly selected person has an IQ greater than 132 is
approximately 2.5%.
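
The reasoning above can be followed in code and compared with an exact calculation. The comparison rests on an assumption that goes beyond the notes: that "bell shaped" means exactly normal, so that Python's `statistics.NormalDist` applies. The Empirical Rule value is a rounded approximation, so the two results differ slightly.

```python
from statistics import NormalDist

mean, sd, x = 100, 16, 132

z = (x - mean) / sd                  # 2.0 standard deviations above the mean

# Empirical Rule: about 95% within 2 sd, so the rest is split between two tails.
empirical_tail = (1 - 0.95) / 2      # about 0.025

# Assumption: an exactly normal IQ distribution.
exact_tail = 1 - NormalDist(mean, sd).cdf(x)

print(f"z = {z}, Empirical Rule tail = {empirical_tail:.3f}, normal model tail = {exact_tail:.4f}")
```

Under the normal model the tail probability is about 2.3%, close to the 2.5% obtained from the Empirical Rule.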
