
Chapter 3 Central Tendency

Learning Objectives
 Compute and interpret mean
 Compute and interpret median
 Compute and interpret mode

Measures of Central Tendency
1. mean
2. median
3. mode

Descriptive statistics - review
Presenting data:
1. tables
2. graphs & charts (e.g. pie chart)
3. statistical measures
 measures of central tendency: show the location of the center of a distribution
 measures of variability: show how spread out a distribution is
tables & graphs: data distribution (clear and detailed)
statistical measures: data characteristics

Descriptive statistics
 Descriptive statistics often involves using a few
numbers to summarize a distribution.
 Two important aspects of a distribution:
 where its center is located. Measures of central
tendency are used to do so.
 how spread out it is, i.e. how much the numbers in the
distribution vary from one another. Measures of
variability are used to do so.


3-1 Distribution - discrete variable


 Distributions can differ in shape. Some distributions are symmetric
whereas others have long tails in just one direction.
 Example #1. The distribution of a bag of Plain M&M's. (The
M&M's were in six different colors.)

[Figure: frequency table (frequency distribution) for the bag of M&M's; the six color frequencies are 18, 17, 7, 7, 4, and 2]

Distribution
 The distribution of all M&M's of the six different colors (left).
 Since every M&M is one of the six familiar colors, the six proportions
shown in the figure add to one.
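
The same "adds to one" property holds for any frequency table once counts are turned into proportions. The sketch below uses the six frequencies from the 55-M&M sample; the color labels are placeholders (an assumption), since the pairing of colors to counts is not given in the text.

```python
# Frequencies from the sample frequency table (55 M&M's in total).
# The color names are hypothetical placeholders, not taken from the slides.
counts = {
    "color_1": 18,
    "color_2": 17,
    "color_3": 7,
    "color_4": 7,
    "color_5": 4,
    "color_6": 2,
}

n = sum(counts.values())  # total number of M&M's in the sample

# Proportion (relative frequency) of each color
proportions = {color: f / n for color, f in counts.items()}

for color, p in proportions.items():
    print(f"{color}: frequency = {counts[color]:2d}, proportion = {p:.3f}")

print("n =", n)
print("sum of proportions =", round(sum(proportions.values()), 10))
```

Dividing every count by the same total is what forces the proportions to sum to one.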

[Figures: the proportions for all M&M's (a probability distribution) and the distribution in a sample of 55 M&M's]

Distribution - continuous variables
 Example #2. An experiment to measure the time needed to move
the cursor over a small target in a series of 20 trials. The variable
‘time to respond’ is a continuous variable.
 Measuring time in milliseconds (thousandths of a second) is often
precise enough to approximate a continuous variable in psychology.

[Table: grouped frequency distribution of the response times, n = 20]

Distribution - continuous variables
 Grouped frequency distributions can be portrayed graphically.
 Distributions for continuous variables are called continuous
distributions; they are described by probability densities.

[Figure: histogram of the grouped frequency distribution]

Shapes of distributions (1) - Normal distribution
 Some probability densities have particular importance in statistics. A
very important one is shaped like a bell and is called the normal
distribution.
 Many naturally occurring phenomena can be approximated
surprisingly well by the normal distribution. It will serve to
illustrate some features of all continuous distributions.

The normal probability density is higher in the middle than in its two tails.
Do you see the "bell"?



Shapes of distributions (2) - a positive skew
 Distributions have different shapes.

A distribution with the longer tail extending in the positive direction
is said to have a positive skew. It is also described as "skewed to the
right."
a positive skew = a right skew

Shapes of distributions (3) - a negative skew

When the tail of the distribution extends to the left, the distribution
is skewed to the left; it has a negative skew.
a negative skew = a left skew

Shapes of distributions (4) - bimodal distribution
The distributions shown so far all have one distinct high point or peak.
The distribution in Figure 10 has two distinct peaks. A distribution
with two peaks is called a bimodal distribution.

Shapes of distributions (5) - Kurtosis
The top distribution has long tails. It is called "leptokurtic."

The bottom distribution has short tails. It is called "platykurtic."

3-1 Self review: Q1 out of 7.
1. A frequency distribution contains the
frequency of every value in the distribution.
 true
 false

3-1 Self review: Q2 out of 7.
2. A grouped frequency distribution should be
used instead of a frequency distribution
when the
 distribution is bimodal.
 distribution is skewed.
 variable is continuous.

3-1 Self review: Q3 out of 7.
3. A symmetric distribution
 has equal positive and negative skews.
 has no skew.
 can have either positive or negative skew,
but not both.

3-1 Self review: Q4 out of 7.
4. The following distribution has

 a positive skew.
 a negative skew.
 no skew.

3-1 Self review: Q5 out of 7.
5. The area under the curve of a
probability distribution is
1
0
 10

19
3-1 Self review: Q6 out of 7.
6. A normal or bell-shaped distribution has its
greatest probability density in its tails.
 true
 false

The distribution is higher, and therefore denser, in the middle of the
distribution.

3-1 Self review: Q7 out of 7.
7. Which of the following distributions is/are
symmetric?
A
B
C
D

3-2 Central tendency
 Central tendency has to do with the location of the center of
a distribution.
 In statistics, averages are often referred to as ‘measures
of central tendency’.
 This idea of comparing individual scores to a distribution of scores
is fundamental to statistics.

A: at the exact center of the distribution
B: below the center of the distribution
C: above the center of the distribution

Central tendency
 Central tendency definitions:
1. One definition of central tendency is the point at which the
distribution is in balance. Example: for the numbers 2, 3, 4, 9, 16 the
balance point is 6.8.
2. A second definition of the center of a distribution is the number
for which the sum of the absolute deviations is smallest.
3. A third definition is the target that minimizes the sum of
squared deviations.
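
The three definitions can be checked numerically on the numbers 2, 3, 4, 9, 16. This is a minimal sketch, not a proof: it scans candidate targets in steps of 0.1 (the grid and the helper names `sum_abs_dev` and `sum_sq_dev` are assumptions made for illustration) and reports which target gives the smallest sum under each definition.

```python
from statistics import mean, median

data = [2, 3, 4, 9, 16]

def sum_abs_dev(values, target):
    """Sum of absolute deviations of the values from a target."""
    return sum(abs(x - target) for x in values)

def sum_sq_dev(values, target):
    """Sum of squared deviations of the values from a target."""
    return sum((x - target) ** 2 for x in values)

balance_point = mean(data)  # definition 1: the balance point is the mean
middle_value = median(data)

# Candidate targets 0.0, 0.1, ..., 20.0 (a coarse grid, chosen for illustration)
candidates = [i / 10 for i in range(0, 201)]
best_abs = min(candidates, key=lambda t: sum_abs_dev(data, t))  # definition 2
best_sq = min(candidates, key=lambda t: sum_sq_dev(data, t))    # definition 3

print("balance point (mean):", balance_point)
print("median:", middle_value)
print("target with smallest sum of absolute deviations:", best_abs)
print("target with smallest sum of squared deviations:", best_sq)
```

On this data the absolute-deviation target coincides with the median and the squared-deviation target coincides with the mean, which is why the definitions can disagree when a distribution is not symmetric.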

Average
 The Center of the Data
 An average is a measure of where most of the values in the data
are located.
 The center of the data is where most of the values in the data are
located. There are different types of averages. The most commonly
used are:
 Mean
 Median
 Mode
Exercise #1
Read the text to find the definitions for Mean, Median, and Mode. (3.4)
3-2 Self review: Q1 out of 3.
1. A frequency distribution contains the
frequency of every value in the distribution.
 true
 false

The distribution of empirical data is called a frequency distribution
and consists of a count of the number of occurrences of each value.

3-2 Self review: Q2 out of 3.
2. For the numbers 10, 12, 16, and 20, the sum of the
absolute deviations from 15 is:
 14
 15
 16

Subtract 15 from each number, take the absolute value of the
differences, and add them together:
|10-15| + |12-15| + |16-15| + |20-15| = 5 + 3 + 1 + 5 = 14
3-2 Self review: Q3 out of 3.
3. For the numbers 3, 6, 9, and 10, the sum of the squared
deviations from 8 is:
 35
 34
 33

Subtract 8 from each number, square the differences, and add
them together:
$(3-8)^2 + (6-8)^2 + (9-8)^2 + (10-8)^2 = 25 + 4 + 1 + 4 = 34$
3-3 The Center of the Data
 The center of the data is where most of the values in the data
are located. Averages are measures of the location of the
center.
 There are different types of averages. The most commonly
used are:
1. Mean
2. Median
3. Mode

3-3-1 Mean (1) (μ; M / X̄)
 The mean is usually referred to as ‘the average’.
 There are multiple types of mean values. The most common type
of mean is the arithmetic mean.
 The mean is the sum of all the values in the data divided by the
total number of values in the data (the sum of the numbers divided
by the number of numbers).
 The symbol "μ" ([mju]) is used for the mean of a
population. The symbol "M" is used for the mean of a
sample. (Σ = summation)
Population: μ = ΣX/N, where ΣX = the sum of all the numbers in the
population and N = the number of numbers in the population.
Sample: M = ΣX/N, where ΣX = the sum of all the numbers in the
sample and N = the number of numbers in the sample.
Summation Notation
 Many statistical formulas involve summing numbers. Fortunately
there is a convenient notation for expressing summation. This
section covers the basics of this summation notation.

$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4.6 + 5.1 + 4.9 + 4.4 = 19.0$

• The Greek letter capital sigma (Σ) indicates summation.
• The "i = 1" at the bottom indicates that the summation is to start
with X1, and the 4 at the top indicates that the summation will end
with X4.
• The "Xi" indicates that X is the variable to be summed as i goes
from 1 to 4.
Summation Notation

$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 4.6 + 5.1 + 4.9 = 14.6$

$\sum X$ = the sum of all the values of X

$\sum X = X_1 + X_2 + X_3 + X_4 = 4.6 + 5.1 + 4.9 + 4.4 = 19.0$

$\sum X^2 = 4.6^2 + 5.1^2 + 4.9^2 + 4.4^2 = 21.16 + 26.01 + 24.01 + 19.36 = 90.54$

$(\sum X)^2 = (X_1 + X_2 + X_3 + X_4)^2 = 19.0^2 = 361$

$\sum XY = 3 + 4 + 21 = 28$
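
The difference between summing squares and squaring a sum is easy to mix up, so here is a small sketch using the four X values from this slide. The variable names are chosen for illustration; results are rounded to two decimals because floating-point sums are not exact.

```python
X = [4.6, 5.1, 4.9, 4.4]

sum_first_three = sum(X[:3])            # sum of X_i for i = 1..3
sum_x = sum(X)                          # sum of all X values
sum_x_squared = sum(x ** 2 for x in X)  # square each value first, then sum
square_of_sum = sum(X) ** 2             # sum first, then square the total

print("sum of first three:", round(sum_first_three, 2))
print("sum of X:", round(sum_x, 2))
print("sum of X squared:", round(sum_x_squared, 2))
print("square of the sum:", round(square_of_sum, 2))
```

The last two results differ a great deal (90.54 versus 361), so the placement of the parentheses matters.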

3-3 Self review: Q1 out of 1.
1. For the data on the right, compute the following.
A. ΣX =
B. ΣY =
C. (ΣX)² =
D. ΣXY =
E. =

The Sample Mean

 x is a variable. There are n observations of x: x1, x2, …, xn.

 The sample mean is written X̄ (X bar):
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

 Sample mean is sometimes written as M (Mean).

Example #3.
Sample data set: 40, 21, 55, 21, 48, 13, 72
X̄ (X bar) = sample mean
n = number of observations
Σ (summation) = total of observations

X̄ = (40 + 21 + 55 + 21 + 48 + 13 + 72) / 7 = 38.57
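
As a quick check of Example #3, the sample mean can be computed directly from the seven values listed in the sample data set. This is a minimal sketch; the variable names are illustrative.

```python
sample = [40, 21, 55, 21, 48, 13, 72]  # sample data set from Example #3

n = len(sample)       # number of observations
total = sum(sample)   # sum of the observations
x_bar = total / n     # sample mean

print("n =", n)
print("sum =", total)
print("sample mean =", round(x_bar, 2))
```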


Symbols used for Sample Statistics
& Population Parameters

Parameter: a characteristic of a population
Statistic: a characteristic of a sample

Exercise #2
Calculate the mean for each data set with the function.

Data set 2-1.


9, 3, 8, 5, 7, 6, 5, 8, 5, 4
Data set 2-2.
32, 35, 38, 40, 40, 42, 43, 44, 46, 49, 53, 56
Data set 2-3.
65, 70, 75, 78, 78, 83, 84, 86, 86, 94, 99
Round the results to the nearest tenth (one decimal place). Use the Excel function.
3-3-1 Mean (2) - Grouped Data

 Continuous data is normally displayed with a grouped frequency
distribution.
 Grouped data is data that has been sorted into categories (groups).
The mean is the average of the data. For grouped data the exact mean
may be difficult to find, but we can always estimate it.
 Steps:
1. Calculate the midpoint of each group. (midpoint = (lower limit + upper limit) / 2)
2. Multiply each group's frequency by its midpoint. (fx = f × midpoint)
3. Calculate the total of these products (Σfx) and the total frequency (N = Σf).
4. Calculate the mean for the grouped data: x̄ = Σfx / N
x̄ = the mean value of the set of given data
x = the midpoint of each group
f = frequency of each group
N = sum of frequencies
Exercise #3
Calculate the mean for each data set with the function.
 Example #4. Math scores for 50 students. Calculate the mean.
Scores   Midpoint (X)   Frequency (f)   fX
95~99    97             2
90~94    92             6
85~89    87             8
80~84    82             12
75~79    77             7
70~74    72             7
65~69    67             5
60~64    62             3
total                   N = 50
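
The grouped-mean steps can be sketched in a few lines for the Example #4 table. The midpoint formula used here, (lower limit + upper limit) / 2, is an assumption that reproduces the midpoints printed in the table (for example 95~99 gives 97).

```python
# (lower limit, upper limit, frequency) for each score group in Example #4
groups = [
    (95, 99, 2), (90, 94, 6), (85, 89, 8), (80, 84, 12),
    (75, 79, 7), (70, 74, 7), (65, 69, 5), (60, 64, 3),
]

N = 0        # total frequency
sum_fx = 0   # total of frequency x midpoint

for lower, upper, f in groups:
    midpoint = (lower + upper) / 2   # step 1: midpoint of the group
    fx = f * midpoint                # step 2: frequency times midpoint
    N += f
    sum_fx += fx
    print(f"{lower}~{upper}: midpoint = {midpoint:.0f}, f = {f:2d}, fX = {fx:.0f}")

grouped_mean = sum_fx / N            # estimate of the mean
print("N =", N, " sum of fX =", sum_fx, " mean =", grouped_mean)
```

The result is an estimate because every score in a group is treated as if it sat at the group midpoint.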

Exercise #4
 Fill in the cells needed for this exercise: a frequency table for
the Physics scores of some students.
1. N
2. Σfx
3. mean
4. draw a histogram for relative
frequency.
5. draw a cumulative frequency polygon.

Exercise #5
Homework 1
Calculate the mean for each data set with the function.
 The following are the times some students spent studying per week.
Please transfer the data into Excel and answer the questions below:
17.5, 18.5, 17.0, 16.5, 18.5, 20.0, 19.0, 19.5, 15.0, 14.5,
16.0, 15.5, 16.5, 17.0, 16.5, 18.0, 17.5, 19.0, 17.0, 19.5,
15.5, 16.5, 19.0, 16.0, 17.0, 19.0, 19.5, 18.0, 18.0, 18.5,
17.5, 19.0, 21.0, 18.0, 20.5, 25.5, 24.5, 25.0, 18.5, 21.0

5-1 How many students are there?
5-2 What’s the longest study time?
5-3 What’s the shortest study time?
5-4 What’s the average study time? (18.4)
5-5 What’s the median of study time?
5-6 What’s the mode of study time?
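
The exercise asks for Excel, but the same six questions can be cross-checked with Python's standard library. This is only a sketch for checking your spreadsheet results; `multimode` (Python 3.8 or later) returns every value tied for the highest count.

```python
from statistics import mean, median, multimode

study_time = [
    17.5, 18.5, 17.0, 16.5, 18.5, 20.0, 19.0, 19.5, 15.0, 14.5,
    16.0, 15.5, 16.5, 17.0, 16.5, 18.0, 17.5, 19.0, 17.0, 19.5,
    15.5, 16.5, 19.0, 16.0, 17.0, 19.0, 19.5, 18.0, 18.0, 18.5,
    17.5, 19.0, 21.0, 18.0, 20.5, 25.5, 24.5, 25.0, 18.5, 21.0,
]

n_students = len(study_time)               # 5-1
longest = max(study_time)                  # 5-2
shortest = min(study_time)                 # 5-3
average = round(mean(study_time), 1)       # 5-4, rounded to one decimal place
middle = median(study_time)                # 5-5
most_common = multimode(study_time)        # 5-6, a list of all modes

print(n_students, longest, shortest, average, middle, most_common)
```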
Properties of Arithmetic Mean
1. It’s the point at which the distribution is in balance.
2. The arithmetic mean is simple to understand and easy to
calculate.
3. It is influenced by the value of every item in the series.
4. The sum of deviations of the items from their arithmetic mean is
always zero, i.e. ∑(x – x̄) = 0.
5. The sum of the squared deviations of the items from the arithmetic
mean (A.M.) is a minimum: it is less than the sum of the
squared deviations of the items from any other value.

Demerits of Arithmetic Mean
1. It is strongly affected by extreme items, i.e. very small or very large values.
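
Properties 4 and 5 and the demerit can be illustrated on data set 2-1 from Exercise #2. This is a sketch on one data set, not a general proof; the comparison targets 5 and 7 and the extreme value 100 are arbitrary choices made for illustration.

```python
data = [9, 3, 8, 5, 7, 6, 5, 8, 5, 4]  # data set 2-1 from Exercise #2

x_bar = sum(data) / len(data)

# Property 4: the deviations from the mean sum to zero
sum_of_deviations = sum(x - x_bar for x in data)

def sum_sq_dev(values, target):
    """Sum of squared deviations of the values from a target."""
    return sum((x - target) ** 2 for x in values)

print("mean:", x_bar)
print("sum of deviations from the mean:", sum_of_deviations)

# Property 5: the mean gives a smaller sum of squares than other targets
for target in (5, x_bar, 7):
    print("target", target, "-> sum of squared deviations", sum_sq_dev(data, target))

# Demerit: a single extreme item pulls the mean away from the bulk of the data
with_extreme = data + [100]
mean_with_extreme = sum(with_extreme) / len(with_extreme)
print("mean after adding 100:", round(mean_with_extreme, 2))
```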
3-3-1 Mean (3) Weighted Mean

 The weighted mean is an average computed by giving different weights
to some of the individual values.
 Data elements with a high weight contribute more to the weighted
mean than the elements with a low weight.
 The weights cannot be negative. Some may be zero, but not all of
them, since division by zero is not allowed. Weighted means play
an important role in the systems of data analysis, weighted
differential and integral calculus.
 If all the weights are equal, then the weighted mean is the same as
the arithmetic mean.

$\bar{x}_w = \frac{\sum w x}{\sum w}$
w = weight
x = value
Exercise #6
Calculate the weighted mean. Fill in all cells in gray.
 Example #5. Weighted mean for a college grade card for John.
Subject            Credits (weight)   Scores   GP    Weighted scores   GPA
Literature         3                  A-       3.7
English 1          3                  A-       3.7
Health & Fitness   1                  A+       4.3
Calculus           3                  C+       2.3
Economics          4                  B+       3.3
Accounting         3                  B        3
Speech             1                  A        4
Total

GP: grade point
Weighted score = credit × GP
GPA: grade point average
GPA (1~5) = weighted total ÷ credit total
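
The grade card can be worked through as a weighted mean, with credits as the weights and grade points as the values. This is a sketch of the calculation described above (weighted score = credit × GP, GPA = weighted total ÷ credit total); the tuple layout is an illustrative choice.

```python
# (subject, credits, grade point) rows from John's grade card in Example #5
grade_card = [
    ("Literature", 3, 3.7),
    ("English 1", 3, 3.7),
    ("Health & Fitness", 1, 4.3),
    ("Calculus", 3, 2.3),
    ("Economics", 4, 3.3),
    ("Accounting", 3, 3.0),
    ("Speech", 1, 4.0),
]

credit_total = 0
weighted_total = 0.0

for subject, credits, gp in grade_card:
    weighted_score = credits * gp   # weighted score = credit x GP
    credit_total += credits
    weighted_total += weighted_score
    print(f"{subject:<18} credits = {credits}, GP = {gp}, weighted = {weighted_score:.1f}")

gpa = weighted_total / credit_total  # weighted mean of the grade points
plain_mean = sum(gp for _, _, gp in grade_card) / len(grade_card)

print("credit total =", credit_total)
print("weighted total =", round(weighted_total, 1))
print("GPA (weighted mean) =", round(gpa, 2))
print("unweighted mean of GP =", round(plain_mean, 2))
```

The weighted GPA comes out lower than the plain mean of the grade points because the one-credit courses carry the highest grades and count for less.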
3-3-2 Median (Me / Md)
 The median is the ‘middle value’ of the data.
 The median is found by ordering all the values in the data and
picking the value in the middle position. (The median is determined by
position, not by value.)
Example #6. 13, 21, 21, 40, 48, 55, 72 (the median is the 4th value, 40)
Computation of the Median
1. When there is an odd number of numbers, the median is
simply the middle number. E.g. the median of 2, 4, 7 is 4.
2. When there is an even number of numbers, the median
is the mean of the two middle numbers. Thus, the median of the
numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.

3-3-2 Median
 The median is less influenced by extreme values
in the data than the mean.
 Changing the last value to 356 does not change the
median: the median is still 40.
Example #6. 13, 21, 21, 40, 48, 55, 356
 Changing the last value to 356 changes the mean a lot:
 (13 + 21 + 21 + 40 + 48 + 55 + 72)/7 = 38.57
 (13 + 21 + 21 + 40 + 48 + 55 + 356)/7 = 79.14

Exercise (median)
Calculate the median for data set 2-1.

Steps:
1. Sort the data from the smallest to the largest:
3, 4, 5, 5, 5, 6, 7, 8, 8, 9
2. n = 10, an even number.
3. Find the position of the median: the two middle positions are
n/2 = 10/2 = 5 and n/2 + 1 = 6.
4. The median is the average of the 5th and 6th values:
Me = (5 + 6) ÷ 2 = 5.5
For comparison, the mean is X̄ = 6.
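
The position rule can be written out explicitly. The function name `median_by_position` is hypothetical; it is a sketch of the odd/even rule described in this chapter, using 0-based list indexes, so the 5th and 6th values sit at indexes 4 and 5.

```python
def median_by_position(values):
    """Median found by sorting and picking the middle position(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the single middle value
        return ordered[n // 2]
    # even n: average of the (n/2)-th and (n/2 + 1)-th values
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

data_2_1 = [9, 3, 8, 5, 7, 6, 5, 8, 5, 4]

print(median_by_position(data_2_1))       # even n = 10
print(median_by_position([2, 4, 7]))      # odd n = 3
print(median_by_position([2, 4, 7, 12]))  # even n = 4
```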

Exercise #2 & #5
Calculate the median for each data set with the function.

Exercise #2
Data set 2-1.
9, 3, 8, 5, 7, 6, 5, 8, 5, 4
Data set 2-2.
32, 35, 38, 40, 40, 42, 43, 44, 46, 49, 53, 56
(1) Mean: 43.17  Median: 42.5
Data set 2-3.
65, 70, 75, 78, 78, 83, 84, 86, 86, 94, 99
(2) Mean: 81.64  Median: 83

Exercise #5
17.5, 18.5, 17.0, 16.5, 18.5, 20.0, 19.0, 19.5, 15.0, 14.5,
16.0, 15.5, 16.5, 17.0, 16.5, 18.0, 17.5, 19.0, 17.0, 19.5,
15.5, 16.5, 19.0, 16.0, 17.0, 19.0, 19.5, 18.0, 18.0, 18.5,
17.5, 19.0, 21.0, 18.0, 20.5, 25.5, 24.5, 25.0, 18.5, 21.0

Properties of Median
1. The median does not depend on all the data values in a dataset.
2. The median is fixed by its position and does not reflect the
individual values.
3. The sum of the absolute distances between the median and the other
values is minimal: no other point gives a smaller sum.
4. Every array has a single median.
5. The median cannot be manipulated algebraically. It cannot be weighted and
combined.
6. In a grouping procedure, the median is stable.
7. The median is not applicable to qualitative data.
8. The values must be arranged in order for computation.
9. Outliers and skewed data have less impact on the median.
10. If the distribution is skewed, the median is a better measure than
the mean.
Should we tell our rival the mean or the median of the heights of our
players?

3-3-3 Mode (Mo)
 The mode is the value(s) that appears most often in the data:
 There can be more than one mode if multiple values appear the
same number of times in the data.
Example #7. 40, 21, 55, 21, 48, 13, 72
 Here, 21 appears two times, and the other values only once. The
mode of this data is 21.
 We can have more than one mode or no mode at all.

Mode
 The mode is also used for categorical data, unlike the median
and mean. Categorical data can’t be described directly with
numbers, like names:
 Alice, John, Bob, Maria, John, Julia, Carol
 Here, John appears two times, and the other values only once. The
mode of this data is John.
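
Counting occurrences works the same way for numbers and for categories. The helper `modes` below is a hypothetical name; it is a sketch that returns every value tied for the highest count and, by the convention assumed here, an empty list when no value repeats (no mode).

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest count; [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once: treated as "no mode"
    return [value for value, count in counts.items() if count == top]

print(modes([40, 21, 55, 21, 48, 13, 72]))                                  # numeric data
print(modes(["Alice", "John", "Bob", "Maria", "John", "Julia", "Carol"]))  # categorical data
print(modes([1, 1, 2, 2, 3]))                                               # two modes
print(modes([1, 2, 3]))                                                     # no mode
```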

Mode
 The mode of continuous data is normally computed
from a grouped frequency distribution.
 Table 2 shows a grouped frequency distribution for the
target response time data. Since the interval with the
highest frequency is 600-700, the mode is the middle of
that interval (650).

Exercise (mode)
Calculate the mode for data set 2-1.

Steps:
1. Sort the data from the smallest to the largest:
3, 4, 5, 5, 5, 6, 7, 8, 8, 9
2. Find the number that appears most often.
3. The mode is 5.
Mo = 5
Me = (5 + 6) ÷ 2 = 5.5
X̄ = 6

Exercise #2 & #5
Calculate the mode for each data set with the function: data sets 2-1,
2-2, and 2-3 from Exercise #2 and the study-time data from Exercise #5.

Exercise #7
Heights of female college students (cm)
151 154 154 164 158 146 162 151
154 162 152 158 151 166 167 156
160 156 161 150 155 161 159 166
160 162 160 155 155 143 153 159
163 157 160 157 165 156 146 157
156 162 153 161 165 156 156 156
158 162 155 168 154 149 160 159
156 169 163 162 148 162 151 156
154 150 160 153 169 159 151 156
160 162 159 154 158 164 157 161
7-1 how many students
7-2 mean height
7-3 median of height
7-4 mode of height
7-5 max
7-6 min
7-7 frequency table + graphs (histogram + cumulative polygon)
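
For item 7-7, a grouped frequency table can be built by counting how many heights fall in each group. This is a sketch that assumes 5 cm groups from 141~145 up to 166~170; the group limits are a choice made for illustration, and other groupings are equally valid.

```python
heights = [
    151, 154, 154, 164, 158, 146, 162, 151,
    154, 162, 152, 158, 151, 166, 167, 156,
    160, 156, 161, 150, 155, 161, 159, 166,
    160, 162, 160, 155, 155, 143, 153, 159,
    163, 157, 160, 157, 165, 156, 146, 157,
    156, 162, 153, 161, 165, 156, 156, 156,
    158, 162, 155, 168, 154, 149, 160, 159,
    156, 169, 163, 162, 148, 162, 151, 156,
    154, 150, 160, 153, 169, 159, 151, 156,
    160, 162, 159, 154, 158, 164, 157, 161,
]

# Assumed group limits (inclusive): 141~145, 146~150, ..., 166~170
bins = [(141, 145), (146, 150), (151, 155), (156, 160), (161, 165), (166, 170)]

n = len(heights)
cumulative = 0
rows = []
for lower, upper in bins:
    frequency = sum(lower <= h <= upper for h in heights)  # count heights in the group
    cumulative += frequency
    midpoint = (lower + upper) // 2
    rows.append((f"{lower}~{upper}", midpoint, frequency, frequency / n,
                 cumulative, cumulative / n))

print("group     mid    f   rel.f   cum.f   cum.rel.f")
for label, midpoint, frequency, rel, cum, cum_rel in rows:
    print(f"{label}  {midpoint}  {frequency:3d}  {rel:6.1%}  {cum:5d}  {cum_rel:9.1%}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 100%, which is a useful check on the table.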

Exercise #7
frequency table
group     midpoint   frequency   relative frequency %   cumulative frequency   cumulative frequency %
141~145   143        1           1%                     1                      1%
146~150   148        6           8%                     7                      9%
151~155   153        19          24%                    26                     33%
156~160   158        30          38%                    56                     70%
161~165   163        18          23%                    74                     93%
166~170   168        6           8%                     80                     100%

[Figure: histogram of frequency (1, 6, 19, 30, 18, 6) with cumulative frequency polygon (1, 7, 26, 56, 74, 80) over the six height groups]

English score of junior high school
students in Taiwan 2005-2006

S.No / Mean / Median / Mode

1
Mean: The average taken of a given set of observations is called the mean.
Median: The middle number in a given set of observations is called the median.
Mode: The most frequently occurring number in a given set of observations is called the mode.

2
Mean: Add up all the numbers and divide by the total number of terms.
Median: Place all the numbers in ascending or descending order.
Mode: The mode is the number that occurs most frequently in a series.

3
Mean: Once the above step is finished, what we get is the mean.
Median: After arranging everything from smallest to biggest, take out the middle number, which is your median.
Mode: The mode can be one or more than one. It is also possible to have no mode at all.

4
Mean: The mean is the arithmetic mean; put simply, it can be a simple average or a weighted average.
Median: When the series has an even number of values, the median is the simple average of the middle pair of numbers.
Mode: If every value in the data set is unique, there is no mode at all.

5
Mean: When data is normally distributed, the mean is widely preferred.
Median: When the data distribution is skewed, the median is the best representative.
Mode: When there is a nominal distribution of data, the mode is preferred.
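Comparison of mean, median, and mode
Statistical measure / Advantages / Disadvantages

Mean
Advantages:
1. It is the center of gravity of the data. It is representative when the data have no extreme values or skew.
2. It is readily accepted.
3. Every data point enters the calculation, so it is highly sensitive.
4. It can be manipulated algebraically.
5. The sum of squared deviations of the observations from the mean is minimal.
Disadvantages:
1. It is easily affected by extreme values; when extreme values are present it is not representative.
2. It is less representative when the data are skewed.

Median
Advantages:
1. Simple and easy to understand.
2. Suitable for data with extreme values.
3. Suitable for skewed data.
4. The sum of absolute deviations of the observations from the median is minimal.
Disadvantages:
1. It considers only the middle value(s) and ignores the others, so it is less sensitive.
2. It is not suited to algebraic manipulation.

Mode
Advantages:
1. Simple and easy to understand.
2. Suitable for data with extreme values.
3. Suitable for skewed data.
4. Suitable for qualitative data.
Disadvantages:
1. It considers only a few values, is not suited to algebraic manipulation, and is less sensitive.
2. There may be more than one mode or none at all. With two or more modes it is hard to choose among them.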
Relative position of mean, median, and mode

Chapter Exercises
 This is the end of this week’s lesson.
 Please finish the exercises and turn
them in on time.
 Keep up the good work.
