ES Lecture Note

ES 2023-24
Elementary Statistics
This course focuses on the teaching of quantitative skills that can be applied to problems commonly
encountered in daily life. It aims to cultivate students' interest in the quantitative techniques that are
necessary for further studies and to strengthen their ability to apply these techniques in different areas.
Upon completion of the course, students should have acquired a solid training in fundamental statistical skills.
They should be able to apply these skills in practice and to analyse and present data using basic statistical
methods.
Topics
1. Introduction to Survey and Statistics
2. Probability Distributions
3. Sampling Distributions and Central Limit Theorem
4. Estimation
5. Time Series
6. Price Index
Assessment
Individual assignments: 60 marks
Final examination: 40 marks
References
1. Haeussler, Ernest F., Paul, Richard S. & Wood, R. J. Introductory Mathematical Analysis for Business,
Economics, and the Life and Social Sciences, 14th edition, Pearson Education Limited.
2. Berenson, Mark L., Levine, David M. & Szabat, Kathryn A. Basic Business Statistics: Concepts and
Applications, 14th edition, Pearson Education Limited.
Intended Learning Outcomes and Grade Descriptors

A document setting out the intended learning outcomes and grade descriptors of this course has been
uploaded to the SOUL course link for your reference.
Class Lecturer
Name:
Email contact:
SOUL account: Course link: CCMA4008 - ____________________
Class link: CCMA4008 - ____________________

General Reminders:
1. Remember to bring an HKEAA-approved calculator with an SD (statistics) function to classes and to the
examination. Calculators with graphical display will not be allowed in the examination.
2. Check the SOUL course link and class link frequently for updated information about the course and class
management.
3. Correct your answers to 4 decimal places when necessary.

(e.g. 23.56474024 = 23.5647 (correct to 4 d.p.))
Chapter 1 Introduction to Survey and Statistics

In this chapter, we study the basic workflow of conducting a simple survey. Basic concepts and
terminology related to surveys and statistics will be introduced.
Major learning objectives of this chapter:

• Understand the basic workflow of conducting a survey
• Understand the difference between conducting a census and a sample survey
• Be able to collect a good representative sample by using the simple random sampling method,
  systematic sampling method, or stratified sampling method
• Understand the difference between numerical and categorical variables
• Be able to summarize a numerical dataset by using: mean, mode, percentile, range, interquartile
  range, variance, standard deviation, and skewness
• Be able to summarize the findings in a simple paragraph
• Be able to work out the summary statistics of a linear function of a variable
We start this chapter with Example 1: suppose the manager of a large company with 3000
employees wants to collect information about the employees' level of satisfaction with the company.
How should the manager plan this survey?
There are many things the manager has to consider:

• Why do we need to conduct this survey?
• Who is eligible to participate in the study?
• How should the level of satisfaction be measured?
• How many employees should be involved in the study?
• How should the collected data be summarized to present the results?
For this survey:

• Objective: understand the employees' level of satisfaction with the company
• Subjects: every employee in the company
• Variable: satisfaction score on a 0-to-10 scale (10: very satisfied ... 0: very unsatisfied)
• Ideal: conduct a census of all employees (3000 employees),
  or, more conveniently, conduct a sample survey of a portion of the employees (e.g. 500 employees)
• Summary statistics: e.g. mean, standard deviation, 25th percentile, 75th percentile, ...
Difference between Census and Survey

Let's visualize the situation with the following diagram:
[Figure: two ovals. Left oval, the population: every employee's score, e.g. Johnny: 9, May: 7, Peter: 8, David: 9, ... Right oval, the sample: the scores of the selected employees only, e.g. May: 7, Amy: 9, ...]
The left oval represents the population of this study; it includes every employee in the company.
There are 3000 employees in the company, and each of them gives a score according to his or
her level of satisfaction with the company. All scores range from 0 to 10, where 0 means
totally unsatisfied and 10 means totally satisfied.
When conducting a census, data are collected from every employee (3000 observations in total).
Once the data collection is completed, we try to understand the current situation by doing some data
analysis (for example, calculating the mean and standard deviation of the scores). A mean satisfaction
score of 9.5 and one of 2.4 would represent very different situations.
However, it is sometimes not practical to conduct a census because of limitations such as limited
time and budget; in that case, a sample survey is conducted. A sample is selected using a fair and
random procedure (as far as possible), and data are then collected from it and analysed. This is
represented by the right oval.
After conducting a survey, we also analyse the data in order to draw conclusions about the current
situation. However, as the data collection is incomplete, we need to be very careful when drawing
conclusions. The reliability of a conclusion drawn from a sample survey depends very much on how
well the sample represents the population.
Key Concepts in Statistics

A population is the totality of elements (also called items or objects) under consideration.
An investigation based on data from the whole population is called a census. Sometimes it is too
expensive or impossible to obtain data on every object of the population. In this case, we conduct a
survey by selecting some objects of the population for analysis in order to infer the characteristics
of the whole population. A sample is a portion of the population that is selected for analysis.
Types of Survey Sampling Methods

When a survey is to be conducted, we need to take a sample from the population for detailed study.
There are two main types of samples: probability samples (where selection is based on chance) and
non-probability samples (where selection is made for convenience and to save time). As far as
possible, probability samples should be used in order to minimize the chance of obtaining biased
results.
Probability Sampling
When selecting a probability sample, we need to ensure that every element has a chance to be selected.
To do so, an up-to-date sampling frame has to be prepared. A sampling frame is a data file that
lists the objects of the population together with information about them. An up-to-date sampling frame
is particularly important for the selection of probability samples, even though preparing one can
sometimes be quite difficult. A sampling frame may be a telephone directory, a student registration
list, an employment record, etc. For example, a sampling frame for the 3000 employees might be:
Employee identity number   Name of employee   Department            Contact number
0001                       Johnny             Sales and marketing   12341234
0002                       May                Sales and marketing   23232323
0003                       Peter              Sales and marketing   25242524
0004                       David              Sales and marketing   12481248
...
...
3000                       Alex               Accounting            21252125
Method 1: Simple Random Sampling

Simple random sampling selects objects such that every object of the population has an equal
chance of being selected. Each element in the sampling frame is identified (usually by being given a
unique identity number), and the sampling is conducted so that every element has the same
chance of being chosen. Simple random sampling can be done by using a random number
table, such as the following excerpt:
4928088924357790028381163072758986302348
6187041657074680861298083973492077545091
4389865923250788612978496976539155008078
6299393912304548459856095206641287264647
Example 1:
Select a sample of size 500 from 3000 employees by simple random sampling method.
Solution:
• Sampling steps:
1. Assign a unique identity number to each employee, 0001 – 3000.
2. Read the table in blocks of 4 digits (4 digits because the population size, 3000, has 4
digits); each block is a candidate employee number.
3. Numbers outside the range 0001 – 3000 have to be discarded.
4. If the same number appears again, it also has to be discarded.
5. Suppose we start reading from row 1, column 1 of the random number table. The first blocks read are
4928 0889 2435 7790 0283 8116 3072 7589 8630 2348 ... ...
Discarding 4928, 7790, 8116, 3072, 7589 and 8630 (outside 0001 – 3000), the employees selected so far
are 0889, 2435, 0283 and 2348.
6. Continue the selection until 500 different employees are selected.
7. The employees with these identity numbers are chosen for the survey.
Using computer software to generate random numbers can be much faster; the underlying logic is
the same as using the random number table.
Advantage: Easy to operate

Disadvantage: Time consuming for a large scale survey
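The table-reading steps above can be sketched in Python. This is a minimal illustration, assuming the excerpt is read row by row starting at row 1, column 1; `TABLE` and `srs_from_table` are hypothetical names of our own, and the standard-library `random.sample` stands in for the "computer software" route.

```python
import random

# The random number table excerpt above, rows joined in reading order
TABLE = (
    "4928088924357790028381163072758986302348"
    "6187041657074680861298083973492077545091"
    "4389865923250788612978496976539155008078"
    "6299393912304548459856095206641287264647"
)

def srs_from_table(digits, population_size, sample_size):
    """Read fixed-width blocks; keep numbers in 1..N that have not been chosen yet."""
    width = len(str(population_size))  # 3000 has 4 digits -> 4-digit blocks
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)  # steps 3 and 4: discard out-of-range and repeats
        if len(chosen) == sample_size:
            break
    return chosen

print(srs_from_table(TABLE, 3000, 4))  # -> [889, 2435, 283, 2348]

# Software route: 500 distinct identity numbers drawn from 1..3000
sample_ids = random.sample(range(1, 3001), 500)
```

Because the sketch reads only the 160 digits shown, it cannot reach 500 employees by itself; a full table (or software) is needed for that.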
Method 2: Systematic Sampling

Systematic sampling selects the first object, a, randomly and selects the rest at a fixed interval k,
where $k = \dfrac{N}{n}$ (the population size divided by the sample size).
Example 1:
Select a sample of size 500 from 3000 employees by systematic sampling method.
Solution:
• Sampling steps:
1. Assign a unique identity number to each employee, 0001 – 3000.
2. Compute the interval $k = \dfrac{N}{n} = \dfrac{3000}{500} = 6$.
3. Select the first subject, a, randomly from the first k employees (the random number table may
help). Suppose we read the first digit from row 1, column 1 of the random number table: it is 4,
which lies between 1 and 6, so a = 4.
4. Then select a, a + k, a + 2k, …, and so on, until 500 employees are selected:
4, 10, 16, 22, 28, ..., 2998
5. The employees with these identity numbers are chosen for the survey.
Advantage: Time-saving

Disadvantage: The sample may be biased when the data have a periodic pattern
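The systematic selection rule can be sketched as follows. This assumes N is an exact multiple of n (as with 3000 and 500), and `systematic_sample` is a hypothetical helper name; the start a = 4 is the value read from the table above.

```python
def systematic_sample(population_size, sample_size, start):
    """Return IDs a, a + k, a + 2k, ... with k = N // n (assumes N is a multiple of n)."""
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("the first subject must come from the first k IDs")
    return list(range(start, population_size + 1, k))[:sample_size]

ids = systematic_sample(3000, 500, start=4)
print(ids[:5], ids[-1], len(ids))  # -> [4, 10, 16, 22, 28] 2998 500
```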
Method 3: Stratified Sampling

Stratified sampling divides the whole population into subgroups (called strata), which are mutually
exclusive and exhaustive, according to some characteristic shared within each stratum. The strata are
formed so that individuals in different strata may differ considerably in the information relevant to the
objective of the survey. The key idea of stratified sampling is that each stratum is represented in the
sample in the same proportion as in the population. Individuals are then selected randomly from each
stratum.
Example 1:
Select a stratified sample of size 500 from 3000 employees, of whom 600 are managers and the
other 2400 are junior staff.
Solution:
• Sampling steps:
1. Compute the sample size for each subgroup, which should be proportional to the population
size of that subgroup. Sample size for managers: $500 \times \dfrac{600}{3000} = 100$, and sample size for
junior staff: $500 \times \dfrac{2400}{3000} = 400$.
2. Randomly select individual samples of 100 managers and 400 junior staff.
Advantage: Can avoid unbalanced selection

Disadvantage: More complicated
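Proportional allocation followed by simple random sampling within each stratum can be sketched as below. The ID scheme (0001–0600 managers, 0601–3000 junior staff) is a hypothetical assumption for illustration, as are the helper names; when allocations are not whole numbers, the rounded sizes may need a manual adjustment so that they add up to n.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Sample size per stratum = n x (stratum size / N), rounded to whole people."""
    population_size = sum(strata_sizes.values())
    return {name: round(sample_size * size / population_size)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size):
    """strata maps each stratum name to the list of member IDs in that stratum."""
    sizes = {name: len(ids) for name, ids in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    # Simple random sampling within each stratum
    return {name: random.sample(ids, allocation[name]) for name, ids in strata.items()}

# Hypothetical ID scheme: 1-600 are managers, 601-3000 are junior staff
strata = {"managers": list(range(1, 601)), "junior staff": list(range(601, 3001))}
allocation = proportional_allocation({name: len(ids) for name, ids in strata.items()}, 500)
print(allocation)  # -> {'managers': 100, 'junior staff': 400}
sample = stratified_sample(strata, 500)
```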
Non-probability Samples
Samples are selected in a convenient way (e.g. street interviews). This is practical when no sampling
frame is available.
Example 2:
How should a sample of 500 teenagers be selected in order to study their level of satisfaction with a
brand of cola?
Solution:
As the population (all teenagers in Hong Kong) is very large, it is practically impossible to prepare a
sampling frame. A more practical way is to invite 500 teenagers to join the survey by convenience.
Types of Data
A single survey may deal with a variety of variables. The data, which are the observed outcomes
of these variables, almost always differ from person to person. There are two types of
variables: numerical variables and categorical variables.
Data
   Numerical: Discrete, Continuous
   Categorical: Nominal, Ordinal
Numerical variable: data consist of numbers that represent counts or measurements
   Discrete: data take only particular, separate values (usually integers)
   Continuous: data can take any value within a range
Categorical variable: data consist of names that represent categories
   Nominal: there is no natural order between categories
   Ordinal: there is a natural order between categories
Example 3:
The following table shows part of the survey results. How many variables are there? What is the data
type of each variable?
Gender   Number of previous      Highest education   Total working hours
         full-time employments   level               on 1/9/2019
Female   2                       HKCEE graduate      10.8
Male     4                       Undergraduate       8.8
Female   0                       Undergraduate       9.4
Female   4                       HKALE graduate      10.6
Male     3                       HKCEE graduate      10.5
Solution:
There are four variables.
Gender is a categorical variable with two categories, "Female" ("F") and "Male" ("M"). It is
nominal, as there is no natural order between "F" and "M".
Number of previous full-time employments is a numerical variable, and it is discrete.
Highest education level is a categorical variable. It is ordinal: "Undergraduate",
"HKALE graduate" and "HKCEE graduate" represent decreasing levels of education.
Total working hours on 1/9/2019 is a numerical variable, and it is continuous.
We usually use a capital letter, e.g. X, to denote the variable and a small letter, x, to denote the
collected data. For example, if X represents the gender of an employee, then x1 = "F", x2 = "M",
x3 = "F", x4 = "F", x5 = "M". The sample size is usually denoted by n (here n = 5) and the population
size by N (here N = 3000).
Summary Measures
To extract useful information from a messy numerical dataset, we review a set of commonly
used summary measures. There are three major types of descriptive measures that help to
describe a set of numerical data: central tendency, variation, and shape.
Measures of Central Tendency / Location

(a) Mean / Average
With μ denoting the population mean and 𝑥̅ denoting the sample mean, the formulae for this
most commonly used measure are as follows:
Population mean: $\mu = \dfrac{x_1 + x_2 + \cdots + x_N}{N} = \dfrac{\sum x}{N}$
Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x}{n}$
The mean shares the total equally among all the data values.
Example 4:
The sales records (number of items sold in August 2020) of a sample of 15 salespersons selected
from a company were as follows (in ordered array):
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Solution:
Sample mean = $\bar{x} = \dfrac{300 + 345 + \cdots + 990}{15} = \dfrac{9731}{15} = 648.7333$ items
→ Some salespersons had better sales records and some performed less well than others. If
we put all the records together and share them equally, on average each person sold
648.73 items. Multiplying the mean by 15 gives back the total of 9731 items sold in August
2020.
Remarks:
• The mean is the most common measure of central tendency.
• It is affected by extreme values.
• Mean × number of data values = total value.
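The sample mean in Example 4 can be checked with a few lines of plain Python (no libraries assumed); the variable names are our own.

```python
sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]

total = sum(sales)   # total items sold in August 2020
n = len(sales)       # sample size
x_bar = total / n    # sample mean

print(total, round(x_bar, 4))  # -> 9731 648.7333
print(round(x_bar * n))        # mean x number of data values gives back the total -> 9731
```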
What is the difference between population mean and sample mean?
Consider the following example. In a university, all year 1 students (e.g. N = 2000) have to take the
course “General Statistics”. The examination results (marks) of all students are:
25, 28, 29, …, 95, 95, 96, 97, 98, 99

The population mean result = $\mu = \dfrac{25 + 28 + \cdots + 99}{2000} = 78.65$ marks
The population mean is a unique value: it is the exact mean of the whole population.
It can only be calculated when a census is conducted.
When students are randomly assigned to different classes of 30 students each, the
average result of each class can be calculated:
Class 1: 28, 32, …, 95, 97; mean result = $\dfrac{28 + 32 + \cdots + 97}{30} = 75.3$ marks
... (and similarly for each of the other classes)
The sample mean is not unique. Its value depends on which data are selected in the sample. The
idea of selecting a “good representative” sample is to avoid subjective selection of data so that the
sample mean is hopefully reasonably close to the unknown population mean.
There is an interesting relationship between random sample means and the population mean, which we
will discuss in Chapter 3. In Chapter 4, we will use the mean of a randomly selected sample to
estimate the population mean with a reasonably high level of accuracy.
In this chapter, we highlight the difference between the population mean and sample mean and how
to calculate them correctly.
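To see that the population mean is unique while sample means vary, here is a small simulation. The 2000 marks are randomly generated, hypothetical values (not the actual course results), so the numbers printed will differ from 78.65 and 75.3; only the pattern matters.

```python
import random
import statistics

random.seed(2023)  # fixed seed so the illustration is repeatable

# Hypothetical population of N = 2000 marks (randomly generated, NOT the real results)
population = [random.randint(25, 99) for _ in range(2000)]
mu = statistics.mean(population)  # one unique value for the whole population

# Randomly assign students to classes of 30 and compute the first five class (sample) means
shuffled = random.sample(population, len(population))
class_means = [statistics.mean(shuffled[start:start + 30]) for start in range(0, 150, 30)]

print(f"population mean = {mu:.2f}")
for number, mean in enumerate(class_means, start=1):
    print(f"class {number}: sample mean = {mean:.2f}")
```

Each class mean is usually close to, but not equal to, the population mean, and the class means differ from one another.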
(b) Mode
The mode is the value that occurs most frequently in the dataset. Unlike the mean, the mode is not
affected by extreme values.
Mode = most frequently observed value
Example 4:
For the sales data above, with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
→ There is no mode, as every value occurs exactly once.
Remark:
• The mode can also be used for a categorical dataset.
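A small helper that follows this section's convention (no mode when every value is equally frequent) is sketched below; `modes` is our own name. Note that the standard library's `statistics.multimode` uses a different convention and would return every value in that case.

```python
from collections import Counter

def modes(data):
    """Most frequent value(s); [] when every distinct value is equally frequent."""
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # no mode: all values share the same frequency
    return [value for value, c in counts.items() if c == top]

sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]
print(modes(sales))                      # -> []
print(modes(["F", "M", "F", "F", "M"]))  # categorical data also work -> ['F']
```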
(c) pth percentile (1 ≤ p ≤ 99)

The pth percentile is the value below which (about) p% of the observations lie.
Procedure to find the pth percentile
Step 1: Arrange the data in an ordered array.*
Step 2: Compute the index $i = \dfrac{p}{100} \times n$ as the number of data values in group 1, so that
the number of data values in group 2 is $n - \dfrac{p}{100} \times n$.
Step 3: Use i as the position of the pointer:
(a) If i is not an integer, round i up. The pth percentile is the value
of the data in the ith position.
(b) If i is an integer, the pth percentile is the average of the values of
the data in the ith position and the (i+1)th position.
(Think about how these two cases handle the median when the number of data values is odd and
when it is even.)
* The data must be arranged in an ordered array. Checking positions in a raw (unsorted) dataset does
not give any information about percentiles.
Example 4:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the 10th percentile, 40th percentile, and 90th percentile.
Solution:
10th percentile = 345 items ($i = \dfrac{10}{100} \times 15 = 1.5$, rounded up to 2)
40th percentile = $\dfrac{522 + 627}{2} = 574.5$ items ($i = \dfrac{40}{100} \times 15 = 6$, an integer)
90th percentile = 904 items ($i = \dfrac{90}{100} \times 15 = 13.5$, rounded up to 14)
→ The bottom 10% of salespersons sold fewer than 345 items and the top 10% sold more
than 904 items. 40% of the salespersons sold fewer than 574.5 items.
Remark:
Some percentiles have special names:
25th percentile = first quartile (Q1)
50th percentile = second quartile (Q2) = median
75th percentile = third quartile (Q3)
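The three-step procedure can be sketched in Python, assuming p is a whole number; `percentile` is our own helper name. Other conventions (for example, the default linear interpolation used by NumPy or by spreadsheet functions) can give slightly different answers, so this helper is meant only to mirror the rule above.

```python
def percentile(data, p):
    """pth percentile (integer p, 1 <= p <= 99) by the round-up / average rule above."""
    x = sorted(data)                   # Step 1: ordered array
    n = len(x)
    i, remainder = divmod(p * n, 100)  # Step 2: i = p/100 * n, kept exact with integers
    if remainder:                      # Step 3(a): i not an integer -> round up
        return x[i]                    # rounded-up position (1-based) == index i (0-based)
    return (x[i - 1] + x[i]) / 2       # Step 3(b): average of the ith and (i+1)th values

sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]
for p in (10, 25, 40, 50, 75, 90):
    print(p, percentile(sales, p))
# 10 -> 345, 25 -> 420 (Q1), 40 -> 574.5, 50 -> 668 (median), 75 -> 890 (Q3), 90 -> 904
```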
Measures of Variation
Variation is the amount of dispersion, or spread, in the data. Two datasets with the same mean may
have completely different spreads (for example, classes A and B may both have a mean test score of
82.5, yet class A's scores are tightly clustered while class B's are widely spread). Measures of
central tendency and variation together give a good picture of a dataset.
(a) Range
The range is the difference between the largest and smallest observations in a set of data.
Range = maximum value – minimum value

Example 4:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the range of the data.
Solution:
Range = 990 – 300 = 690 items.
Remarks:
The range is affected by extreme values.
(b) Interquartile Range

The interquartile range is the difference between the third quartile and the first quartile in a set of
data.
IQR = Q3 – Q1
Example 4:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the interquartile range of the data.
Solution:
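Q1 = 25th percentile = 420 items ($i = \dfrac{25}{100} \times 15 = 3.75$, rounded up to 4)
Q3 = 75th percentile = 890 items ($i = \dfrac{75}{100} \times 15 = 11.25$, rounded up to 12)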
IQR = 890 – 420 = 470 items
Remarks:
1. The interquartile range measures the spread of the middle 50% of the data.
2. It is not affected by extreme values.
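Range and IQR for Example 4 can be checked together. The compact `percentile` helper below is our own restatement of the round-up / average rule from the percentile section, assuming integer p.

```python
def percentile(data, p):
    """Round-up / average percentile rule (integer p), as in the percentile procedure."""
    x = sorted(data)
    i, remainder = divmod(p * len(x), 100)
    return x[i] if remainder else (x[i - 1] + x[i]) / 2

sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]

data_range = max(sales) - min(sales)                 # 990 - 300
iqr = percentile(sales, 75) - percentile(sales, 25)  # Q3 - Q1 = 890 - 420
print(data_range, iqr)  # -> 690 470
```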
(c) Variance and Standard Deviation

Variance and standard deviation evaluate how the values fluctuate about the mean.
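Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$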
Consider $(x - \mu)^2$ as a new variable, which measures the squared difference between a data point
and the mean. The population variance is the average of these squared differences. When the variance
is small, the data points differ little from the mean, which also means that they are located
closely together. A more commonly used measure of variation is the standard deviation, which is
simply the square root of the variance.
Besides using the formula to calculate the sample variance and then the sample standard deviation,
you can enter the dataset into the calculator to obtain the sample standard deviation directly
(see the appendix "Calculator Usage on Descriptive Statistics").
Example 4:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the sample variance and standard deviation.
Solution:
$\bar{x} = \dfrac{300 + 345 + \cdots + 990}{15} = 648.7333$ items
Sample variance:
$s^2 = \dfrac{(300 - 648.7333)^2 + (345 - 648.7333)^2 + \cdots + (990 - 648.7333)^2}{15 - 1} = 52690.50$ items²
Sample standard deviation:
$s = \sqrt{s^2} = \sqrt{52690.50} = 229.54$ items
Alternatively,
sample standard deviation s = 229.5441 items (from calculator)
sample variance s² = 52690.4952 items² (obtained by squaring the full, unrounded calculator value of s).
→ On average, a salesperson sold 648.73 items. However, no one sold exactly
648.73 items; some sold more and some sold less. Using 648.73 items as a reference point,
the data typically lie about 229.54 items away from this mean.
Remarks:
1. Both variance and standard deviation are non-negative.
2. Variance is in the squared units of the original units of the data, e.g. squared dollars. Thus,
the standard deviation is more commonly used as it is in the original units of the data, e.g.
dollars.
3. When working with sample survey data, we compute the sample standard deviation as
the summary. We cannot find the population standard deviation, because data on the rest of the
population have not been collected.
4. The denominator in the sample variance is n − 1 instead of n, which makes the sample
variance an unbiased estimator of the population variance. (We will study the concept of an
estimator in more detail in Chapter 4.)
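The hand calculation above can be reproduced in Python as a check. The first part applies the sample formula directly (dividing by n − 1); the second uses the standard-library `statistics.variance` and `statistics.stdev`, which also divide by n − 1 (`statistics.pvariance` and `statistics.pstdev` would divide by N instead).

```python
import math
import statistics

sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]
n = len(sales)
x_bar = sum(sales) / n

# Sample variance: squared deviations from the mean, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in sales) / (n - 1)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))  # -> 52690.4952 229.5441

# The standard library agrees
print(round(statistics.variance(sales), 4), round(statistics.stdev(sales), 4))
```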
Skewness of data
When the relative frequencies of a variable at different data values are plotted, the shape of the
variable's distribution (its density function) can be visualized. A distribution can have many
different shapes. We can classify distributions according to their skewness. A distribution is
symmetric if the parts above and below its centre are mirror images of each other. A distribution is
skewed to the right if its right tail is longer, and skewed to the left if its left tail is longer.
Below are two simple ways to identify skewness: by comparing Q1, Q2 and Q3, or by reviewing the
shape of the density function.
(a) Symmetric: Q2 – Q1 = Q3 – Q2
(example: heights of 10-year-old boys)
(b) Left-skewed (negatively skewed): Q2 – Q1 > Q3 – Q2
(example: examination results)
(c) Right-skewed (positively skewed): Q2 – Q1 < Q3 – Q2
(example: monthly incomes of fresh graduates)
[Figure: density curves of a symmetric, a left-skewed and a right-skewed distribution]
In summary, when
Q2 – Q1 = Q3 – Q2 symmetric distribution
Q2 – Q1 > Q3 – Q2 left-skewed distribution
Q2 – Q1 < Q3 – Q2 right-skewed distribution
Example 4:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Comment on the skewness with reason.
Solution:
As Q2 – Q1 = 248 (668-420) > Q3 – Q2 = 222 (890-668), the distribution is left skewed.
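The quartile comparison rule can be written as a tiny function; `skewness_by_quartiles` is a hypothetical name, and the quartiles 420, 668 and 890 are those found earlier for Example 4.

```python
def skewness_by_quartiles(q1, q2, q3):
    """Classify shape by comparing Q2 - Q1 with Q3 - Q2."""
    lower, upper = q2 - q1, q3 - q2
    if lower == upper:
        return "symmetric"
    return "left-skewed" if lower > upper else "right-skewed"

print(skewness_by_quartiles(420, 668, 890))  # 248 > 222 -> left-skewed
```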
Simple summary of a variable

The reason for calculating all these summary statistics is to study the characteristics of a
numerical variable in different dimensions. Instead of just presenting a list of numbers, it is
better to summarize the characteristics of the variable in a simple paragraph.
There is no standard rule for constructing the summary. However, naming the variable,
presenting the central tendency supported by a measure of variation, and noting any special
observations give a simple picture of the basic characteristics of the variable.
Example 4:
The variable of this study was the sales (number of items sold in August 2020) of a salesperson. 15
salespersons were selected from the company to form a sample. On average, a salesperson sold
648.73 items, with a standard deviation of 229.54 items. The range was 690 items. The median was
668 items; the bottom 10% of salespersons sold fewer than 345 items, while the top 10% sold more
than 904 items.
Y as a linear function of X
Sometimes, instead of focusing only on the given variable, we are also interested in analysing a
function of it. A simple linear function, which involves multiplying by a constant, adding a
constant, or both, is often seen in daily applications.
Y = a + bX
Think about the following applications. How can we express Y in terms of X?

X                                  Y
Number of items sold in a month    Monthly salary, calculated with a basic salary of $20000 and an
                                   allowance of $30 for each item sold: Y = 20000 + 30X
Weight of a boy (in kg)            Weight of the boy (in pounds)
Original price                     Discounted price with 10% off
Original monthly salary            Adjusted monthly salary with a 4% increment
Once the summary statistics of variable X have been calculated, the summary statistics of variable Y
can be calculated directly, without regenerating the dataset, using the following relationships:
Summary statistic    Y = a + bX
Mean                 Mean(Y) = a + b Mean(X)
Percentile           pth(Y) = a + b pth(X)   (for b > 0; if b < 0, pth(Y) = a + b (100 − p)th(X))
Range                Range(Y) = |b| Range(X)
IQR                  IQR(Y) = |b| IQR(X)
Standard deviation   SD(Y) = |b| SD(X)
Variance             Variance(Y) = b² Variance(X)
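The relationships can be checked numerically by generating Y = 20000 + 30X for the Example 4 sales data, which is done here only for verification (the whole point of the table is that you do not need to). Names are our own.

```python
import statistics

sales = [300, 345, 368, 420, 488, 522, 627, 668, 731, 774, 802, 890, 902, 904, 990]
a, b = 20000, 30
salary = [a + b * x for x in sales]  # Y = 20000 + 30X, generated only to check the rules

# Each pair should agree (up to floating-point rounding)
print(statistics.mean(salary), a + b * statistics.mean(sales))
print(max(salary) - min(salary), abs(b) * (max(sales) - min(sales)))
print(statistics.stdev(salary), abs(b) * statistics.stdev(sales))
print(statistics.variance(salary), b ** 2 * statistics.variance(sales))
```

Computed from unrounded values, the mean salary comes out as exactly $39462; a figure of $39461.90 arises from using the rounded mean 648.73.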
Example 4:
After reviewing the summary of the number of items sold by a salesperson in August 2020, the
senior management also wants a summary of the monthly salary of a salesperson.
Without regenerating the dataset of monthly salaries, find the mean, 10th percentile, median, 90th
percentile, range, standard deviation, and variance of the monthly salary, where the monthly salary is
calculated with a basic salary of $20000 and an allowance of $30 for each item sold.
Solution:
Using X to denote the number of items sold in a month and Y to denote the monthly salary of that
salesperson, Y = 20000 + 30X.
It is convenient to generate the summary by applying the relationships directly:
Summary statistic    X                 Y = 20000 + 30X
Mean                 648.73 items      20000 + 30(648.73) = $39461.90
10th percentile      345 items         20000 + 30(345) = $30350
Median               668 items         20000 + 30(668) = $40040
90th percentile      904 items         20000 + 30(904) = $47120
Range                690 items         690(30) = $20700
Standard deviation   229.54 items      229.54(30) = $6886.20
Variance             52690.50 items²   52690.50(30²) = 47421450.00 ($²)
The variable is the monthly salary earned by a salesperson in August 2020. On average, a
salesperson earned $39461.90, with a standard deviation of $6886.20. The range was $20700. The
median was $40040; the bottom 10% of salespersons earned less than $30350, while the top 10%
earned more than $47120.
Calculator Usage on Descriptive Statistics

(For Casio fx-50FH / fx-50FH II)
Data Set:
163.6 156.2 166.3 179.3 157.8 165.4 159.5 161.7 160.4
1. Change to “SD” mode

MODE MODE SD
2. Clear previous data

SHIFT CLR Stat EXE
3. Input data
163.6 DT 156.2 DT 166.3 DT 179.3 DT
157.8 DT 165.4 DT 159.5 DT 161.7 DT
160.4 DT
4. Calculate descriptive statistics

Mean (𝑥̅ = 163.3555556): SHIFT 2 1 EXE
Population standard deviation (σ = 6.459637417): SHIFT 2 2 EXE
Sample standard deviation (s = 6.851480132): SHIFT 2 3 EXE
No. of data input (n = 9): SHIFT 1 3 EXE
5. Change Data
Example: change the first data value ‘163.6’ to ‘183.6’
▲/▼ (until you see x1=163.6) 183.6 EXE
6. Delete Data
Example: delete the second data value ‘156.2’
▲/▼ (until you see x2=156.2) SHIFT DT
7. Frequency (more than 1 observation)

Example: 5, 5, 5, 5
5 SHIFT , 4 DT
8. Return to normal mode

MODE 1
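As a cross-check on the calculator output, Python's standard `statistics` module gives the same summary values: `pstdev` corresponds to σ (divide by n) and `stdev` to s (divide by n − 1). The list name `data` is our own.

```python
import statistics

data = [163.6, 156.2, 166.3, 179.3, 157.8, 165.4, 159.5, 161.7, 160.4]

print(len(data))                # n
print(statistics.mean(data))    # x-bar, about 163.3556
print(statistics.pstdev(data))  # population SD, about 6.4596
print(statistics.stdev(data))   # sample SD, about 6.8515
```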
Revision of Basic Probability before Chapter 2

Before we start Chapter 2, you need to revise basic probability. Make sure you can handle the
following topics:
Probability
Probability is the likelihood or chance that a particular event will occur.
Sample Space and Event

Experiment: a situation with an uncertain outcome
Sample Space (S): the collection of all possible outcomes
Event (A): a collection of one or more possible outcomes (e.g. "observing an odd number")
Classical probability
When tossing a fair die, the probability of observing 1, P(1) = 1/6
When tossing a fair die, the probability of observing an odd number, P(odd) = 3/6 = 1/2
Combination
When 3 students are randomly selected from 10 students, there should be 10C3 = 120 possible
combinations.
Empirical probability
For an unfair die, the probabilities can be estimated from the following frequency table (120 tosses
in total):
Observation   1    2    3    4    5    6
Frequency     12   20   18   19   22   29
P(1) = $\dfrac{12}{120} = 0.1$
P(odd) = $\dfrac{12 + 18 + 22}{120} = \dfrac{52}{120} = 0.4333$
If the die is tossed twice independently, then based on the above frequency table, the probability that
the first toss results in 1 and the second toss results in 6 is:
P(first 1 and second 6) = $\dfrac{12}{120} \times \dfrac{29}{120} = 0.0242$
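These revision calculations can be reproduced with Python's standard library: `math.comb` counts combinations and `fractions.Fraction` keeps the empirical probabilities exact. The dictionary names are our own.

```python
from fractions import Fraction
from math import comb

print(comb(10, 3))  # ways to choose 3 students from 10 -> 120

freq = {1: 12, 2: 20, 3: 18, 4: 19, 5: 22, 6: 29}
total = sum(freq.values())  # 120 tosses
p = {face: Fraction(count, total) for face, count in freq.items()}

p_one = p[1]                   # 12/120
p_odd = p[1] + p[3] + p[5]     # 52/120
p_one_then_six = p[1] * p[6]   # independent tosses: multiply the probabilities

print(float(p_one), round(float(p_odd), 4), round(float(p_one_then_six), 4))
# -> 0.1 0.4333 0.0242
```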
Chapter 2 Probability Distributions

In Chapter 1, we looked at methods to describe the characteristics of a numerical dataset. In this
chapter, we will look at methods to describe the characteristics of a numerical random variable.
Understanding the probability distribution of a variable gives us insight for predicting the outcome
in an uncertain situation.
Major learning objectives of this chapter:

• Be able to summarize a numerical discrete variable by using: probability distribution function,
  expected value, variance, and standard deviation
• Be able to calculate the expected value, variance, and standard deviation of a function of a
  variable
• Be able to summarize a Binomial variable by using: probability distribution function, expected
  value, variance, and standard deviation
• Be able to read the probability density function of a numerical continuous variable
• Be able to calculate probabilities for a normal variable
• Be able to locate the normal score of a normal variable
• Be able to analyse a function of a normal variable
Numerical Random Variable

A numerical random variable has the following characteristics:
1. its value is uncertain and unpredictable;
2. it is (or can be converted to) a numerical value.
In general, when the outcome of “something” is unpredictable and can be expressed numerically, it is
considered a numerical random variable.
Example 1:
Which side may face up when you toss a die?
Example 2:
Imagine you are a librarian working in the borrowing counter in a public library. How many books
may the next reader borrow?
Example 3:
Imagine you are a tour guide. Every day you need to take care of a group of 10 tourists. Each tourist
would choose to visit one and only one of the two theme parks, Ocean Park or Disneyland. Today,
how many out of 10 tourists may go to Ocean Park?
Example 4:
Imagine you are a researcher doing analysis for a telecommunication company. You are required to
collect information about the duration of long-distance calls. How long may a long-distance call
last?
Discrete vs Continuous
In Chapter 1, we discussed the difference between discrete data and continuous data:
Discrete: data take only particular, separate values
Continuous: data can take any value within a range
Decide which of the variables in Examples 1 to 4 are discrete and which are
continuous:
Discrete variable:
Continuous variable:
Probability Distribution Function for a Numerical Discrete Random Variable

A probability distribution function for a discrete random variable is a mutually exclusive (no
intersection) listing of all possible numerical outcomes of that random variable, such that a
particular probability of occurrence, p(xn), is associated with each outcome xn. Reviewing the
probability distribution of a discrete random variable gives us important information about the
variable.
The probability distribution function (pdf) of a discrete variable X can be represented by a formula,
a table, or a graph, which provides the probabilities P(X = x) = p(x) corresponding to each value of x,
where capital X is the random variable and small x is a specific number. It has the following properties:
1. $0 \le p(x) \le 1$ for every x;
2. $\sum_x p(x) = 1$, where the summation is over all possible values of x with non-zero probability.
Remarks:
1. Summarizing the discrete variable as a probability distribution function helps us to
a. understand the possible range of the outcome
b. evaluate which outcome has a relatively higher chance of happening than the others
2. There are many ways to prepare the probability distribution function, e.g. by a theoretical
approach, experiments, observation, surveys, …
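The two properties can be turned into a quick validity check. This is a sketch with a hypothetical helper name, `is_valid_pdf`; it allows a small tolerance in the sum because tabulated probabilities are usually rounded decimals. The test tables are the fair die and the library-borrowing distribution from Example 2 of this section.

```python
import math
from fractions import Fraction

def is_valid_pdf(pdf):
    """Check 0 <= p(x) <= 1 for every x, and that all p(x) sum to 1."""
    values = list(pdf.values())
    each_in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = math.isclose(sum(values), 1)  # tolerance for rounded decimals
    return each_in_range and sums_to_one

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
borrowing = {1: 0.02, 2: 0.07, 3: 0.15, 4: 0.28, 5: 0.33, 6: 0.10, 7: 0.03, 8: 0.02}
print(is_valid_pdf(fair_die), is_valid_pdf(borrowing))  # -> True True
print(is_valid_pdf({0: 0.5, 1: 0.6}))                   # sums to 1.1 -> False
```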
Let's review the probability distribution functions for our discrete random variables.
Example 1:
Variable X: result of tossing a die
If the die is fair, it is expected that the chance of happening for each possible outcome is the same, so
the probability distribution function should be:
x 1 2 3 4 5 6
p(x) 1/6 1/6 1/6 1/6 1/6 1/6
However, if the die is unfair, we cannot simply assume that each outcome has the same chance of happening. An alternative way to find the probability distribution function is by experiment. Suppose we carry out a series of experiments (for example, toss this single die 100 times) and find the empirical probability of each possible outcome from its observed relative frequency. Suppose the following is the result of tossing this single die 100 times:
x 1 2 3 4 5 6
Frequency 18 12 32 16 12 10
Then the probability distribution function is:

x 1 2 3 4 5 6
p(x) 0.18 0.12 0.32 0.16 0.12 0.10
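The relative-frequency calculation can be sketched in Python as below; `empirical_pmf` is a hypothetical helper name, and the frequencies are the 100 tosses in the table above.

```python
def empirical_pmf(frequencies):
    """Estimate p(x) by the relative frequency f(x) / (total number of trials)."""
    total = sum(frequencies.values())
    return {x: f / total for x, f in frequencies.items()}


# Observed frequencies from tossing the (possibly unfair) die 100 times
frequencies = {1: 18, 2: 12, 3: 32, 4: 16, 5: 12, 6: 10}
pmf = empirical_pmf(frequencies)
print(pmf)  # {1: 0.18, 2: 0.12, 3: 0.32, 4: 0.16, 5: 0.12, 6: 0.1}
```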
Example 2:
Variable X: number of books a reader borrows from the public library
We can generate the probability distribution function from the book borrowing record in the library system. The idea is to use the relative frequency to estimate the probability, as in Example 1.
Instead of presenting the whole borrowing record, we present only the resulting probability distribution function:
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
29
Example 3:
Variable Y: how many tourists, out of a group of 10 tourists, will go to Ocean Park
There are two ways to prepare this probability distribution function.
Method 1: check the past record of the number of tourists who went to Ocean Park each day over a long period, and use the relative frequency as an estimate.
Method 2: suppose that, based on previous experience, 60% of individual tourists visit Ocean Park while the other 40% visit Disneyland.
In a later section, we will derive the probability distribution function of the number of tourists (among 10) who visit Ocean Park theoretically, using the binomial distribution. The result is summarized here:
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
From the table, it is easy to see that most likely around 4 to 8 of the 10 tourists will visit Ocean Park (with a probability of 0.111 + 0.201 + 0.251 + 0.215 + 0.121 = 0.899).
30
Mathematical Expectation for a Discrete Random Variable

Once the probability distribution function of a discrete random variable X is known, the expectation (long-run mean, or weighted mean), the variance and the standard deviation (measures of dispersion) of X can be computed as a summary of the variable.
Definitions
For a discrete random variable X with probability distribution p(x),
1. The expectation (or expected value, or mean) of X, denoted by $\mu_X$ or E(X), may be considered as its weighted average over all possible outcomes, the “weights” being the probabilities p(x) associated with the outcomes, i.e.
$E(X) = x_1p_1 + x_2p_2 + \cdots + x_kp_k = \sum_{x} x\,p(x)$.
The mean of X can be interpreted as the average value of X in the long run.
Example 2:
In this table, X represents the number of books a reader borrows in one visit to the public library.
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
On average, how many books does a reader borrow in one visit to the public library?
Solution:
E(X) = 1(0.02) + 2(0.07) + 3(0.15) + 4(0.28) + 5(0.33) + 6(0.10) + 7(0.03) + 8(0.02) = 4.35 books
⇒ On average, a reader borrows 4.35 books per visit to the public library.
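The weighted-average definition of E(X) translates directly into code. A minimal Python sketch, with the hypothetical helper name `expectation`, applied to the Example 2 table:

```python
def expectation(pmf):
    """E(X) = sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


# Example 2: number of books borrowed in one visit
books = {1: 0.02, 2: 0.07, 3: 0.15, 4: 0.28, 5: 0.33, 6: 0.10, 7: 0.03, 8: 0.02}
print(round(expectation(books), 4))  # 4.35
```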
Of course, not every reader borrows the same number of books, and in fact no reader can borrow exactly 4.35 books. How much does the number of books borrowed vary between readers?
31
2. The variance of X, denoted by $\sigma_X^2$ or Var(X), may be defined as the weighted average of the squared discrepancies (i.e. differences) between each possible outcome and the mean:
$Var(X) = \sum_{x} (x - \mu_X)^2\,p(x) = E(X^2) - [E(X)]^2$
The positive square root of the variance is the standard deviation of X:
$\sigma_X = \sqrt{Var(X)}$
Example 2:
In this table, X represents the number of books a reader borrows in one visit to the public library.
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
Find the variance and standard deviation of the number of books a reader borrows from the public
library.
Solution:
$Var(X) = (1 - 4.35)^2(0.02) + (2 - 4.35)^2(0.07) + (3 - 4.35)^2(0.15) + (4 - 4.35)^2(0.28)$
$\qquad + (5 - 4.35)^2(0.33) + (6 - 4.35)^2(0.10) + (7 - 4.35)^2(0.03) + (8 - 4.35)^2(0.02)$
$\qquad = 1.8075$ (formula similar to that in Chapter 1)
Expanding the above formula, it can be rewritten as $Var(X) = E(X^2) - [E(X)]^2$:
$Var(X) = 1^2(0.02) + 2^2(0.07) + 3^2(0.15) + 4^2(0.28) + 5^2(0.33) + 6^2(0.10) + 7^2(0.03) + 8^2(0.02) - 4.35^2$
$\qquad = 20.73 - 18.9225 = 1.8075$
The standard deviation of X

σ(X) = √1.8075 = 1.3444 books
⇒ On average, a reader borrows 4.35 books from the public library, with a standard deviation of 1.34 books.
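Both variance formulas can be checked numerically. The Python sketch below (all helper names are hypothetical) computes Var(X) from the definition and from the shortcut E(X²) − [E(X)]², and then the standard deviation, for the Example 2 table:

```python
import math


def expectation(pmf, f=lambda x: x):
    """E(f(X)) = sum of f(x) * p(x); with the default f this is E(X)."""
    return sum(f(x) * p for x, p in pmf.items())


def variance_by_definition(pmf):
    """Weighted average of the squared deviations (x - mean)^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** 2)


def variance_by_shortcut(pmf):
    """E(X^2) - [E(X)]^2."""
    return expectation(pmf, lambda x: x ** 2) - expectation(pmf) ** 2


books = {1: 0.02, 2: 0.07, 3: 0.15, 4: 0.28, 5: 0.33, 6: 0.10, 7: 0.03, 8: 0.02}
var_def = variance_by_definition(books)
var_short = variance_by_shortcut(books)
print(round(var_def, 4), round(var_short, 4))  # 1.8075 1.8075
print(round(math.sqrt(var_def), 4))            # 1.3444
```

The two functions agree up to floating-point rounding, which is the numerical counterpart of expanding the squared deviations.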
32
A Function of a Random Variable

Sometimes the random variable is not studied directly; instead, a function Y = f(X) of the random variable is studied. For example, X is the number of photocopiers sold in a week and Y = f(X) is the profit gained in a week by selling photocopiers. By understanding the mathematical relationship between Y and X, we can perform the basic analysis for the variable Y.
Example 5:
We need to evaluate the profit made by selling a particular brand of photocopier “PhotoC”. We want
to check if it meets the target of having an average weekly profit of $5000.
Based on the limited information, suppose this is all we know about “PhotoC”:
1. Below is the probability distribution function of the number of “PhotoC” sold in a week, X, based
on past record.
x 0 1 2 3 4 5
p(x) 0.05 0.22 0.38 0.24 0.10 0.01
2. the basic weekly expense attributable to selling “PhotoC” is $3500,
3. the cost of each “PhotoC” is $1000,
4. the selling price of each “PhotoC” is $5340.
A direct analysis of the weekly sales X, in terms of its expected value and standard deviation, gives:
$E(X) = 0(0.05) + 1(0.22) + 2(0.38) + 3(0.24) + 4(0.10) + 5(0.01) = 2.15$ (items)
$Var(X) = 0^2(0.05) + 1^2(0.22) + 2^2(0.38) + 3^2(0.24) + 4^2(0.10) + 5^2(0.01) - 2.15^2 = 1.1275$ (items²)
σ(X) = √1.1275 = 1.0618 (items)
⇒ On average, the weekly sales of PhotoC are 2.15 items, with a standard deviation of 1.06 items.
33
As our target is to evaluate the weekly profit, we link sales and profit by denoting the weekly profit by Y and treating Y as a function of X. Using the financial formula Profit = Revenue − Cost,
Y = f(X) = (5340 − 1000)X − 3500 = 4340X − 3500
How can the expected weekly profit be calculated?
1st method for the calculation of the expected value of Y = f(X): by substitution
Substituting each value of x into the formula Y = 4340X − 3500 gives the probability distribution of Y:
x 0 1 2 3 4 5
y=f(x) -3500 840 5180 9520 13860 18200
p(y) 0.05 0.22 0.38 0.24 0.10 0.01
E(Y) = f(x1)p1 + f(x2)p2 + f(x3)p3 + … + f(x6)p6

= (-3500)(0.05) + 840(0.22) + 5180(0.38) + 9520(0.24) + 13860(0.1) + 18200(0.01)
= 5831($)
$Var(Y) = (-3500)^2(0.05) + 840^2(0.22) + 5180^2(0.38) + 9520^2(0.24) + 13860^2(0.10)$
$\qquad + 18200^2(0.01) - 5831^2 = 21237139$ ($²)
σ(Y) = √21237139 = 4608.38 ($)
⇒ On average, the weekly profit gained by selling PhotoC is $5831, with a standard deviation of $4608.38. Since $5831 > $5000, the target average weekly profit is met.
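The substitution method can be sketched in Python as follows. It assumes the profit formula Y = 4340X − 3500 from above; `profit` is a hypothetical helper name. Probabilities are accumulated per value of y, so the same code would still work if two values of x gave the same y.

```python
import math
from collections import defaultdict

sales_pmf = {0: 0.05, 1: 0.22, 2: 0.38, 3: 0.24, 4: 0.10, 5: 0.01}


def profit(x):
    """Weekly profit Y = f(X) = (5340 - 1000)X - 3500."""
    return (5340 - 1000) * x - 3500


# Substitute each x into f and carry its probability over to y = f(x)
profit_pmf = defaultdict(float)
for x, p in sales_pmf.items():
    profit_pmf[profit(x)] += p

e_y = sum(y * p for y, p in profit_pmf.items())
var_y = sum(y ** 2 * p for y, p in profit_pmf.items()) - e_y ** 2
sd_y = math.sqrt(var_y)
print(dict(profit_pmf))  # {-3500: 0.05, 840: 0.22, 5180: 0.38, 9520: 0.24, 13860: 0.1, 18200: 0.01}
print(round(e_y, 2), round(var_y), round(sd_y, 2))  # 5831.0 21237139 4608.38
```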
34
Manipulation of Expected Value and Variance – Linear Function

When Y is a linear function of X, there is a faster way to obtain the expected value and standard deviation of Y from the summary already computed for X.
If Y = a + bX, where a and b are constants, then
E(Y) = a + bE(X),
$Var(Y) = b^2\,Var(X)$,
σ(Y) = |b|σ(X)
2nd method for the calculation of the expected value of Y (applies only to linear functions):
Y = 4340X – 3500 = – 3500 + 4340X
With a = – 3500, b = 4340
With E(X) = 2.15, Var(X) = 1.1275, σ(X) = 1.0618
E(Y) = – 3500 + 4340E(X) = – 3500 + 4340(2.15) = $5831

$Var(Y) = 4340^2\,Var(X) = 4340^2(1.1275) = 21237139$ ($²)
σ(Y) = |4340|σ(X) = $4608.38
⇒ On average, the weekly profit gained by selling PhotoC is $5831 with a standard deviation of $4608.38.
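A short Python sketch of the linear shortcut: X is summarized once, and the summary of Y is then obtained from a and b alone (variable names are illustrative; the numbers are those of the PhotoC example).

```python
import math

sales_pmf = {0: 0.05, 1: 0.22, 2: 0.38, 3: 0.24, 4: 0.10, 5: 0.01}

# Summary of X computed once from its distribution
e_x = sum(x * p for x, p in sales_pmf.items())
var_x = sum(x ** 2 * p for x, p in sales_pmf.items()) - e_x ** 2

# Y = a + bX with a = -3500 and b = 4340
a, b = -3500, 4340
e_y = a + b * e_x                 # E(Y) = a + bE(X)
var_y = b ** 2 * var_x            # Var(Y) = b^2 Var(X)
sd_y = abs(b) * math.sqrt(var_x)  # sigma(Y) = |b| sigma(X)

print(round(e_x, 4), round(var_x, 4))               # 2.15 1.1275
print(round(e_y, 2), round(var_y), round(sd_y, 2))  # 5831.0 21237139 4608.38
```

The results match the substitution method, but no table of y values is needed.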
35
Special Numerical Discrete Probability Distribution - Binomial Distribution

Many types of statistical problems have only two outcomes, or can be converted into problems with only two outcomes. In Example 1, tossing a die has six possible outcomes, which can be converted into either an odd number or an even number, or into an observed number greater than 4 or less than or equal to 4. In Example 2, the number of books a reader borrows can be classified as a lot (4 or more books) or a few (3 or fewer books).
If we call one of the two outcomes a success and the other a failure, then one experiment results in either one success or one failure. If you repeatedly conduct (or observe) a series of identical experiments, the number of successes obtained is uncertain. We are often interested in the probability distribution function of the number of successes in a series of identical experiments.
Just as in Example 3 (page 30), if we ask just one tourist, the tourist may go to either Ocean Park or Disneyland. If we define visiting Ocean Park as a success, then visiting Disneyland is a failure. By asking 10 tourists, you may get 10 successes, 9 successes, 8 successes, …, 1 success or 0 successes, so there are 11 possibilities for the number of tourists who go to Ocean Park. In this section, we derive the probability of each of these 11 possibilities theoretically.
Binomial distribution and its probability distribution function

When the discrete random variable of interest is the number of successes x among n observations, the binomial distribution is used to model the situation when:
(a) The experiment consists of n identical trials.
(b) Each trial results in one of two possible outcomes. We call one outcome a “success” and the other a “failure” (you decide which is which).
(c) The probability of success in a single trial is p, which remains constant from trial to trial. Thus, the probability of failure is 1 − p, which is also constant over all trials.
(d) The trials are independent.
36
Denote by X the number of successes among n identical trials, where the probability of success in each trial is p. The variable X is said to follow the binomial distribution, commonly written as
X ~ Bin(n, p)
The probability distribution function of X is
$P(X = x) = p(x) = {}_{n}C_{x}\, p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n$,
where ${}_{n}C_{x} = \dfrac{n!}{x!\,(n - x)!}$.
Our interest is the total number of successes in n trials. The possible number of successes must be an integer ranging from 0 to n. The product $p^x(1 - p)^{n - x}$ gives the probability of obtaining exactly x successes out of n observations in one particular sequence, where n − x is the number of failures, while the term ${}_{n}C_{x}$ tells us how many sequences (arrangements) of the x successes among the n observations are possible.
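The binomial formula can be written as a small Python function. This is a sketch, not a library routine: `binomial_pmf` is a hypothetical name, and `math.comb` (Python 3.8+) supplies nCx.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p), i.e. nCx * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # outside 0, 1, ..., n the probability is zero
    # comb(n, x) counts the arrangements; p**x * (1 - p)**(n - x) is the
    # probability of any one particular arrangement
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Bin(2, 0.6): two tourists, each visiting Ocean Park with probability 0.6
print([round(binomial_pmf(x, 2, 0.6), 4) for x in range(3)])  # [0.16, 0.48, 0.36]
```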
37
Example 3:
Suppose the probability that a tourist visits Ocean Park is 0.6 (p = 0.6).
If you ask only 2 tourists about their preferences (O = Ocean Park, D = Disneyland), you may get the following answers:
OO, OD, DO, DD
Converting each outcome into the number of tourists who visit Ocean Park, Y, a simple mapping tells you that
OO → y = 2
OD → y = 1
DO → y = 1
DD → y = 0
(Do you remember how to construct the two-level tree diagram?)
Y is a random variable and y = 0, 1 or 2.
p(0) = P(DD) = (0.4)(0.4) = 0.16
p(1) = P(OD) + P(DO) = 0.6(0.4) + (0.4)(0.6) = 2(0.6)(0.4) = 0.48
p(2) = P(OO) = 0.6(0.6) = 0.36
In summary, let Y be the number of tourists, among 2 tourists, who go to Ocean Park. Then
Y ~ Bin(2, 0.6),
y 0 1 2
P(Y = y) 0.16 0.48 0.36
⇒ Most likely, one of the 2 tourists will go to Ocean Park.
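The tree-diagram reasoning can be imitated by brute force: list every sequence of answers, multiply the branch probabilities along each sequence, and group the sequences by the number of O's. This Python sketch (helper names are hypothetical) also counts the sequences with six O's out of ten, which should equal 10C6.

```python
from itertools import product

# Branch probabilities for one tourist: O = Ocean Park (success), D = Disneyland (failure)
BRANCH_PROB = {"O": 0.6, "D": 0.4}


def distribution_by_enumeration(n):
    """Build the distribution of the number of O's among n tourists by listing
    every branch of the tree diagram and multiplying along each branch."""
    dist = {y: 0.0 for y in range(n + 1)}
    for branch in product("OD", repeat=n):
        prob = 1.0
        for answer in branch:
            prob *= BRANCH_PROB[answer]
        dist[branch.count("O")] += prob
    return dist


two = distribution_by_enumeration(2)
print({y: round(q, 4) for y, q in two.items()})  # {0: 0.16, 1: 0.48, 2: 0.36}

# Number of branches with exactly 6 O's among 10 tourists; this is 10C6
six_of_ten = sum(1 for branch in product("OD", repeat=10) if branch.count("O") == 6)
print(six_of_ten)  # 210
```

Enumeration grows as 2ⁿ, which is why the formula with nCx is used in practice.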
38
Now we have a group of 10 tourists, and we need to know how many of them prefer visiting Ocean Park. As discussed, the number who go to Ocean Park can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10. To find the probabilities, we do the same mapping as before on a larger scale.
p(0) = P(DDDDDDDDDD) = (0.4)(0.4)⋯(0.4)
= $(0.4)^{10}$ = 0.0001
p(1) = P(ODDDDDDDDD) + P(DODDDDDDDD) + P(DDODDDDDDD) + … + P(DDDDDDDDDO)
= $(0.6)(0.4)^{9}\,{}_{10}C_{1}$ = 0.002
p(2) = P(OODDDDDDDD) + P(ODODDDDDDD) + P(ODDODDDDDD) + … + P(DDDDDDDDOO)
= $(0.6)^{2}(0.4)^{8}\,{}_{10}C_{2}$ = 0.011
…
p(6) = P(OOOOOODDDD) + P(OOOOODODDD) + …
= $(0.6)^{6}(0.4)^{4}\,{}_{10}C_{6}$ = 0.251
…
p(10) = P(OOOOOOOOOO)
= $(0.6)^{10}$ = 0.006
In summary, let Y be the number of tourists, among 10, who go to Ocean Park, where the probability that each tourist goes to Ocean Park is 0.6. Then Y ~ Bin(10, 0.6).
For y = 0 to y = 10, we can calculate each probability with the formula
$P(Y = y) = {}_{10}C_{y}\,(0.6)^y (1 - 0.6)^{10 - y}$
and summarize the probability distribution function of Y as:
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
⇒ Most likely, 6 out of 10 tourists will go to Ocean Park. There is also a relatively high chance that around 4 to 8 of the 10 tourists go to Ocean Park.
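The whole table for Bin(10, 0.6) can be generated from the formula. The Python sketch below uses `math.comb` and should agree with the table above after rounding; it also finds the most likely value and P(4 ≤ Y ≤ 8).

```python
from math import comb

n, p = 10, 0.6
table = {y: comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)}

for y, prob in table.items():
    print(y, round(prob, 4))

most_likely = max(table, key=table.get)
p_4_to_8 = sum(table[y] for y in range(4, 9))
print("most likely y:", most_likely)           # 6
print("P(4 <= Y <= 8):", round(p_4_to_8, 3))  # 0.899
```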
39
Example 5:
In a casino, the probability of winning a certain game is 0.3. Suppose you are going to play the game 4 times. What is the probability that
(a) you will win exactly two games?
(b) you will win at least two games?
Solution:
Let X be the number of games you will win, X ~ Bin(4, 0.3)
(a) $P(X = 2) = {}_{4}C_{2}(0.3)^2(0.7)^2 = 0.2646$
(b) $P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4)$
$= {}_{4}C_{2}(0.3)^2(0.7)^2 + {}_{4}C_{3}(0.3)^3(0.7)^1 + {}_{4}C_{4}(0.3)^4(0.7)^0 = 0.2646 + 0.0756 + 0.0081$
$= 0.3483$
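A Python check of the casino example. The helper name `binomial_pmf` is hypothetical; the complement route 1 − P(X = 0) − P(X = 1) is an alternative that gives the same answer for part (b).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.3  # 4 games, probability 0.3 of winning each

p_exactly_2 = binomial_pmf(2, n, p)
p_at_least_2 = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))
# Same answer through the complement: 1 - P(X = 0) - P(X = 1)
p_via_complement = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)

print(round(p_exactly_2, 4), round(p_at_least_2, 4), round(p_via_complement, 4))  # 0.2646 0.3483 0.3483
```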
40
Expectation and Variance of Binomial Distribution

What are the expectation and standard deviation of a binomial variable?
In Example 3, we focused on one group of 10 tourists and tried to predict how many of them, Y, would go to Ocean Park, which gave the following probability distribution function:
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
Now suppose we have many groups of tourists, each with 10 tourists. What is the average number of tourists per group who go to Ocean Park?
There are simple formulae for this question.
For X ~ Bin(n, p),

E(X) = np
Var(X) = np(1 − p)
$\sigma(X) = \sqrt{np(1 - p)}$
As in Example 3, if each group has 10 tourists and the probability that each tourist goes to Ocean Park is 0.6, then
E(Y) = 10(0.6) = 6
Var(Y) = 10(0.6)(0.4) = 2.4
$\sigma(Y) = \sqrt{10(0.6)(0.4)} = 1.549$
Remark:
The expectation of a binomial variable gives us some insight into the typical number of successes in a group. In our example, with 10 tourists in a group, the expected number of tourists who go to Ocean Park is 6, and the most likely outcome is indeed that about 6 tourists go to Ocean Park. The standard deviation of 1.549 helps us widen this to a range of neighbouring values that also have relatively high probability. (You may compare this with the probability table of Y on page 39.)
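The shortcut formulas can be compared with the general definitions from earlier in this chapter. A Python sketch for Y ~ Bin(10, 0.6) (variable names are illustrative):

```python
from math import comb, sqrt

n, p = 10, 0.6
pmf = {y: comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)}

# Long way: the general definitions applied to the binomial distribution
mean_long = sum(y * q for y, q in pmf.items())
var_long = sum(y ** 2 * q for y, q in pmf.items()) - mean_long ** 2

# Short way: E(Y) = np and Var(Y) = np(1 - p)
mean_short = n * p
var_short = n * p * (1 - p)

print(round(mean_long, 4), round(mean_short, 4))  # 6.0 6.0
print(round(var_long, 4), round(var_short, 4))    # 2.4 2.4
print(round(sqrt(var_short), 3))                  # 1.549
```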
41
Graphic Presentation of Binomial Distribution
[Figure: probability distribution Bin(6, 0.2), p(x) plotted against x = 0, 1, …, 6]
When the probability of success is less than 0.5, the distribution is right-skewed. This means there is a higher chance of getting just a few successes.
[Figure: p(x) plotted against x = 0, 1, …, 6]
When the probability of success equals 0.5, the distribution is symmetric.
[Figure: p(x) plotted against x = 0, 1, …, 6]
When the probability of success is greater than 0.5, the distribution is left-skewed. This means there is a higher chance of getting many successes.
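The shapes can also be checked numerically. The Python sketch below assumes n = 6 for all three graphs; only the first title, Bin(6, 0.2), is stated, so p = 0.5 and p = 0.8 for the other two are assumptions. It reports the most likely value and the standard binomial skewness coefficient (1 − 2p)/√(np(1 − p)), which is positive for a right-skewed shape, zero for a symmetric one and negative for a left-skewed one.

```python
from math import comb, sqrt

n = 6
results = {}
for p in (0.2, 0.5, 0.8):  # p = 0.5 and p = 0.8 are assumed values for illustration
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mode = max(range(n + 1), key=lambda x: pmf[x])  # most likely number of successes
    skewness = (1 - 2 * p) / sqrt(n * p * (1 - p))
    results[p] = (mode, skewness)
    print(f"p = {p}: most likely x = {mode}, skewness = {skewness:+.4f}")
```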
42
Probability Density Function for a Numerical Continuous Random Variable

For a discrete random variable, we can explicitly write down all possible outcomes and find (compile or estimate) the probability of each outcome, and the probabilities of all possible outcomes must add up to 1. However, a continuous random variable has infinitely many possible values, so it is impossible to list them all. In this case, the probability density function is used to describe the relative likelihood that the random variable takes values near a given value; probabilities are given by areas under the curve. Very often, the probability density function is presented graphically. Below are some graphical examples of probability density functions:
(a) This graph shows the time a baby needs to finish a simple task in a regular body check. From the graph, a baby takes 1 to 5 minutes to finish the task. Unlike a discrete random variable, there are infinitely many possible values between 1 and 5 minutes. A horizontal (flat) probability density function means that all finishing times between 1 and 5 minutes are equally likely.
(b) This graph shows the time a student spends on revision in a week. This random variable takes any value greater than 0, and the curve shows a decreasing (exponential decay) pattern. It shows that most students do not spend much time on revision.
43
Normal Distribution
The normal distribution is the most widely used continuous distribution in statistics. If a random variable X is normally distributed with population mean μ and population variance σ², we write
X ~ N(μ, σ²).
[Figure: bell-shaped normal curve with the X axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ and μ + 3σ]
A normal distribution has the following characteristics:
• It is bell-shaped and symmetric about the mean μ.
• Because of its symmetry, P(X < μ) = P(X > μ) = 0.5.
• About 34.1% of the data lie between μ and μ + σ, and another 34.1% lie between μ − σ and μ:
P(μ − σ < X < μ) = 0.341 = P(μ < X < μ + σ)
Example 6:
The waiting time to check in to a hotel room is normally distributed with mean 18 minutes and standard deviation 4 minutes. Denoting the waiting time by X, X ~ N(18, 4²). A simple review of the waiting time gives the following:
• Half of the customers have to wait more than 18 minutes to check in.
• There is about a 34.1% chance that a customer has to wait 18 to 22 minutes, and another 34.1% chance that a customer has to wait 14 to 18 minutes.
44
Example 7:
The spending on a cup of coffee in a café follows a normal distribution with mean $50 and standard deviation $10. Denoting a customer's spending on a cup of coffee by X, X ~ N(50, 10²). A simple review of the spending tells us:
• Half of the customers spend more than $50 on a cup of coffee.
• About 68.2% of customers spend between $40 and $60.
• About 2.3% of customers spend less than $30.
Besides this basic information, can we do further analysis, such as
(a) What is the probability that a customer spends more than $53 on a cup of coffee?
(b) What is the minimum spending on a cup of coffee for the top 10% of customers?
Finding probability for a normal variable
The probability that X lies between a and b is written as
P(a < X < b)
In this course, we find the probability for a normal variable by
1. standardizing the normal variable to a standard normal variable (standard score $z = \frac{x - \mu}{\sigma}$), and
2. looking up the probability in the standard normal table.
45
Standard Normal Distribution and Standard Normal Table

Let Z denote a random variable with the standard normal distribution, i.e. its mean is 0 and its variance is 1: Z ~ N(0, 1²). The variable Z is simply any normal variable after each data point is transformed into its standard score with the formula
$Z = \frac{X - \mu}{\sigma}$
where μ is the mean and σ is the standard deviation of the original variable.
For X ~ N(μ, σ²) and $Z = \frac{X - \mu}{\sigma}$, we have Z ~ N(0, 1²).
[Figure: normal curve of X ~ N(μ, σ²) with axis marks μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ, aligned with the standard normal curve of Z ~ N(0, 1²) with axis marks −3, −2, −1, 0, 1, 2, 3]
46
Below are the first few rows of the standard normal table, which gives the probability P(0 < Z < z), where z is any positive number correct to 2 decimal places. Let's see how to use it to read the probability for z = 0.32.
[Figure: standard normal curve with the region between 0 and 0.32 marked]
The entries in Table I are the probabilities that a random variable having the standard
normal distribution will take on a value between 0 and z. They are given by the area of
the gray region under the curve in the figure.
TABLE I NORMAL-CURVE AREAS

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
According to the table, P(0 < Z < 0.32) = 0.1255.
Because the distribution is symmetric about 0,
• P(0 < Z < 0.32) = 0.1255
• P(−0.32 < Z < 0) = 0.1255
• P(Z > 0.32) = 0.5 − 0.1255 = 0.3745
• P(Z < 0.32) = 0.5 + 0.1255 = 0.6255
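The table entries can be reproduced with the error function from Python's standard library, using the standard identity P(0 < Z < z) = ½ erf(z/√2). This is a sketch for checking table look-ups; `table_area` is a hypothetical name.

```python
from math import erf, sqrt


def table_area(z):
    """Table I entry P(0 < Z < z) for z >= 0, using P(0 < Z < z) = 0.5 * erf(z / sqrt(2))."""
    return 0.5 * erf(z / sqrt(2))


a = table_area(0.32)
print(round(a, 4))        # 0.1255  P(0 < Z < 0.32), and by symmetry P(-0.32 < Z < 0)
print(round(0.5 - a, 4))  # 0.3745  P(Z > 0.32)
print(round(0.5 + a, 4))  # 0.6255  P(Z < 0.32)
```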
47
The entries in Table I are the probabilities that a random variable having the standard
normal distribution will take on a value between 0 and z. They are given by the area of
the gray region under the curve in the figure.
TABLE I NORMAL-CURVE AREAS

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4648 0.4656 0.4664 0.4671 0.4678 0.4685 0.4692 0.4699 0.4706
1.9 0.4713 0.4719 0.4725 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.
48
Standardization and finding probability for a normal distribution

To calculate a probability for a normal variable, you may follow this procedure:
1. Write down the probability statement for the variable X:
P(a < X < b)
2. Write down the corresponding probability statement for the variable Z:
$P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$
3. Look up P(0 < Z < z) in the standard normal table.
4. Complete the calculation using the symmetry of the distribution.
Example 7:
(a) What is the probability that a customer spends more than $53 on a cup of coffee, when the spending is normally distributed with mean $50 and standard deviation $10?
Solution:
For X ~ N(50, 10²), standardize the variable X using $Z = \frac{X - 50}{10}$.
[Figure: normal distribution of X (μ = 50, σ = 10) with the area 0.3821 to the right of X = 53, and the standardized normal distribution of Z (μ = 0, σ = 1) with the same area to the right of Z = 0.30]
$P(X > 53) = P\left(Z > \frac{53 - 50}{10}\right) = P(Z > 0.3)$
= 0.5 − 0.1179 = 0.3821
⇒ In conclusion, the probability that a customer spends more than $53 on a cup of coffee is 0.3821.
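Python's standard library provides `statistics.NormalDist` (Python 3.8+), which can serve as a check on the table method. A minimal sketch for Example 7(a):

```python
from statistics import NormalDist

spending = NormalDist(mu=50, sigma=10)

z = (53 - 50) / 10              # standard score of 53
p_table = 0.5 - 0.1179          # table method: P(Z > 0.30)
p_exact = 1 - spending.cdf(53)  # P(X > 53) straight from the normal model
print(z, round(p_table, 4), round(p_exact, 4))  # 0.3 0.3821 0.3821
```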
49
Finding normal score for a normal distribution

By reversing the previous procedure, we can locate the normal score in a normal distribution that satisfies a specified probability requirement.
To find the value k that satisfies the specified probability requirement, we can follow this procedure:
1. Sketch the position of the unknown normal score k on the normal curve, and decide whether k should be smaller or larger than the mean.
2. Rewrite the probability statement relative to the mean:
P(k < X < μ) for a normal score less than the mean
or P(μ < X < k) for a normal score greater than the mean
3. Rewrite the probability statement for the variable Z and find the value of a from the standard normal table:
P(a < Z < 0), where a should be negative
or P(0 < Z < a), where a should be positive
4. Transform a back to k:
k = μ + aσ
50
Example 7:
(b) In the café, the spending on a cup of coffee, X, is known to follow a normal distribution with mean $50 and standard deviation $10. The manager of the café wants to know the minimum spending on a cup of coffee for the top 10% of customers. For P(X > k) = 0.1, what is the value of k?
Solution:
[Figure: normal curve of X with 50% of the area below the mean 50 (Z = 0), 40% between 50 and k, and 10% above k (Z = a)]
P(X > k) = 0.1 (k should be greater than the mean)
P(50 < X < k) = 0.4 (the above statement rewritten relative to the mean)
P(0 < Z < 1.28) ≈ 0.4 (from the table, 0.3997), so a = 1.28
k = 50 + 10(1.28) = 62.8
⇒ The minimum spending for the top 10% of customers is $62.8.
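Finding a normal score is the inverse problem, which `statistics.NormalDist.inv_cdf` handles directly. A minimal sketch for Example 7(b):

```python
from statistics import NormalDist

spending = NormalDist(mu=50, sigma=10)

# P(X > k) = 0.1 is the same as P(X < k) = 0.9, so k is the 90th percentile
a = NormalDist().inv_cdf(0.9)  # standard normal score, about 1.2816
k = spending.inv_cdf(0.9)      # equivalently 50 + 10 * a
print(round(a, 4), round(k, 2))  # 1.2816 62.82
```

The small difference from $62.8 comes from rounding a to 1.28 when using the table.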
51
Example 8:
Besides coffee, the café also sells slices of cake. The spending on a piece of cake is known to follow a normal distribution with mean $32 and standard deviation $6.
(a) What is the probability that a customer spends less than $30 on a piece of cake?
(b) 60% of customers spend more than $k on a piece of cake. What is the value of k?
Solution:
Use Y to denote the spending on a piece of cake, Y ~ N(32, 6²).
[Figure (a): normal curve of Y indicating the region Y < 30. Figure (b): normal curve of Y indicating that 60% of the spending is more than $k]
(a) $P(Y < 30) = P\left(Z < \frac{30 - 32}{6}\right) = P(Z < -0.33) = 0.5 - 0.1293 = 0.3707$
(b) P(Y > k) = 0.6, so k is less than the mean and
P(k < Y < 32) = 0.6 − 0.5 = 0.1
As P(−0.25 < Z < 0) ≈ 0.1 (from the table, 0.0987), a = −0.25 and
k = 32 + (−0.25)(6) = 30.5
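The same two answers from `statistics.NormalDist`, as a sketch for checking Example 8. Because z is not rounded to 2 decimal places here, the results differ slightly from the table answers.

```python
from statistics import NormalDist

cake = NormalDist(mu=32, sigma=6)

# (a) P(Y < 30); the table method rounds z = -0.333... to -0.33
p_a = cake.cdf(30)

# (b) P(Y > k) = 0.6 means P(Y < k) = 0.4
k = cake.inv_cdf(0.4)
print(round(p_a, 4), round(k, 2))
```

This gives roughly 0.369 for (a) and 30.48 for (b), close to the table answers 0.3707 and 30.5.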
52
Function of a normal variable – (I) Linear function

Suppose a variable X follows a normal distribution with known μ and σ, X ~ N(μ, σ²).
If Y = a + bX is a linear function of X, then Y also follows a normal distribution:
$Y \sim N(a + b\mu, (b\sigma)^2)$
Example 7:
In the café, the spending on a cup of coffee, X, is known to follow a normal distribution with mean $50 and standard deviation $10. Suppose the owner of the café is considering adjusting the selling price of each cup of coffee by first marking up the original price by 8% and then applying a discount of $2.
(a) What are the (i) mean and (ii) standard deviation of the selling price of a cup of coffee after the adjustment?
(b) After the adjustment, what is the probability that someone buys a coffee that costs $54 or more?
Solution:
(a) Let X denote the original price of a cup of coffee and Y the adjusted price. Then Y = 1.08X − 2.
(i) Mean of Y = 1.08E(X) – 2 = 1.08(50) – 2 = $52
(ii) Standard deviation of Y = 1.08 σ (X) = 1.08(10) = $10.8
(b) $P(Y \ge 54) = P\left(Z \ge \frac{54 - 52}{10.8}\right) = P(Z \ge 0.19) = 0.5 - 0.0753 = 0.4247$
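`statistics.NormalDist` supports multiplying by and adding or subtracting a constant, which mirrors the rule for Y = a + bX. A sketch for the price-adjustment example:

```python
from statistics import NormalDist

original = NormalDist(mu=50, sigma=10)

# Y = 1.08X - 2: scaling and shifting a normal variable keeps it normal
adjusted = original * 1.08 - 2
print(round(adjusted.mean, 4), round(adjusted.stdev, 4))  # 52.0 10.8

# (b) P(Y >= 54) without rounding z to 2 decimal places
p_at_least_54 = 1 - adjusted.cdf(54)
print(round(p_at_least_54, 4))
```

The unrounded probability comes out slightly above the table answer 0.4247, because z = 2/10.8 ≈ 0.185 was rounded up to 0.19.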
53
Function of a normal variable – (II) Sum of Two Independent Normal Variables

The sum of two independent normal variables is also normally distributed. If X₁ and X₂ are independent with
X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²), then
$X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
Example 9:
In the café, the spending on a cup of coffee, X, is known to follow a normal distribution with mean $50 and standard deviation $10. The spending on a piece of cake, Y, is known to follow a normal distribution with mean $32 and standard deviation $6:
X ~ N(50, 10²);
Y ~ N(32, 6²)
Every day, many customers buy one cup of coffee and one piece of cake. Assume that a customer's spending on coffee and spending on cake are independent.
Suppose you want to review the total spending of a customer on a cup of coffee and a piece of cake:
(a) What is the distribution of the total spending?
(b) What is the probability that a customer spends more than $80 when ordering a cup of coffee and a piece of cake?
Solution:
(a) Let T be the total spending, T = X + Y.
T ~ N(50 + 32, 10² + 6²)
T ~ N(82, 136) = N(82, 11.6619²)
On average, a customer spends $82 when ordering a cup of coffee and a piece of cake. The standard deviation of the total spending on the two items is $11.6619.
(b) [Figure: normal curve of T indicating the region T > 80]
$P(T > 80) = P\left(Z > \frac{80 - 82}{11.6619}\right) = P(Z > -0.17)$
= 0.5 + 0.0675 = 0.5675
⇒ The probability that a customer spends more than $80 when ordering a cup of coffee and a piece of cake is 0.5675.
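Adding two `statistics.NormalDist` objects treats them as independent normal variables (means add, variances add), so it can check Example 9 under the independence assumption stated above:

```python
from statistics import NormalDist

coffee = NormalDist(mu=50, sigma=10)
cake = NormalDist(mu=32, sigma=6)

# Sum of two independent normal variables: means add, variances add
total = coffee + cake
print(total.mean, round(total.variance, 4), round(total.stdev, 4))  # 82.0 136.0 11.6619

p_more_than_80 = 1 - total.cdf(80)
print(round(p_more_than_80, 4))
```

The unrounded probability is about 0.568, close to the table answer 0.5675.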
54
Chapter 3
Sampling Distributions and Central Limit Theorem
In Chapter 2, we looked at the distribution function of a variable. In Chapter 3, we treat the sample mean as a variable and study its distribution function, which we call the sample mean distribution. We will also look at the distribution function of the sample proportion.
Understanding sampling distributions prepares us for the study of inferential statistics in the next chapter.
• Understand the normal distribution characteristic of the sample mean (as a variable)
• Understand the normal distribution characteristic of the sample proportion (as a variable)
55
Sample mean as a random variable
In Chapter 1 (page 13), we briefly reviewed the idea that the sample mean is not unique, but a variable that depends on the data in the sample.
In a university, all Year 1 students have to take “General Statistics”. The population mean and standard deviation of the results of all N students are:
population mean score = $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = 68.65$
and population standard deviation = $\sigma = \sqrt{\frac{(x_1 - 68.65)^2 + (x_2 - 68.65)^2 + \cdots + (x_N - 68.65)^2}{N}} = 12.4$
When students are randomly assigned to different classes, each of size 30, the average score of each class can be calculated:
Class 1: 28, 32, …, 95, 97 mean score = $\frac{28 + 32 + \cdots + 95 + 97}{30} = 65.3$
Class 2: 33, 35, …, 96, 98 mean score = $\frac{33 + 35 + \cdots + 96 + 98}{30} = 61.4$
Class 3: 30, 31, …, 88, 91 mean score = $\frac{30 + 31 + \cdots + 88 + 91}{30} = 64.2$
From the above example, it is easy to see that the sample mean is not unique, but a variable.
From now on, we can consider the sample mean as
(i) a single data value, if we focus on one particular sample, or
(ii) a variable, if we record the sample means of many different samples repeatedly.
If the sample mean is a random variable, what are the characteristics of this random variable? Is it discrete or continuous? What are its mean and standard deviation?
56
Sampling distribution of Sample Mean
First, make sure you understand that the sample mean, $\bar{X}$, is a random variable. Then we can look at the characteristics of this random variable.
[Figure: samples 1, 2, 3, … drawn from the population of variable X (population mean μ, population variance σ²), giving sample means x̄₁, x̄₂, x̄₃, …]
Imagine that we repeatedly select samples of size n from a population whose mean μ and standard deviation σ are given, and calculate the sample mean $\bar{x}$ of each sample. The characteristics of the sample mean distribution can be summarized as follows:
1. The mean of the sample means equals the population mean:
$E(\bar{X}) = \mu$
2. The variance of the sample means equals the population variance divided by the sample size:
$Var(\bar{X}) = \frac{\sigma^2}{n}$
3. The standard error, the positive square root of this variance, measures the typical deviation of an individual sample mean from the population mean:
$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
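These properties can be seen in a simulation. The Python sketch below uses assumed settings chosen only for illustration: the population is a fair die (μ = 3.5, σ² = 35/12), each sample has n = 30 tosses, and 20 000 samples are drawn with a fixed random seed. The mean and standard deviation of the simulated sample means should come out close to μ and σ/√n.

```python
import math
import random
import statistics

random.seed(2023)  # fixed seed so the run is reproducible

faces = range(1, 7)          # fair die: each face has probability 1/6
mu = 3.5                     # population mean
sigma = math.sqrt(35 / 12)   # population standard deviation
n = 30                       # sample size
repetitions = 20_000         # number of samples drawn

# One sample mean per simulated sample of n tosses
sample_means = [statistics.fmean(random.choices(faces, k=n)) for _ in range(repetitions)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
print("mean of sample means:", round(mean_of_means, 4), "vs mu =", mu)
print("SD of sample means:", round(sd_of_means, 4), "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 4))
```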
57
Central Limit Theorem
If the sample size is reasonably large (n ≥ 30), the distribution of the sample mean is well approximated by a normal distribution (Central Limit Theorem):
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Sample mean as a normal variable
The sample mean can be treated as a normal variable if either
(i) the original variable follows a normal distribution, or
(ii) the sample size is reasonably large (n ≥ 30).
When requirement (i) or (ii) (or both) is fulfilled, further analysis can be conducted using the characteristics of the normal distribution.
Let’s look at some examples to better understand the distribution of the sample mean.
Example 1:
As mentioned earlier, suppose the examination results of General Statistics are as follows:
Population mean score: 68.65
Population standard deviation: 12.4
If students are randomly assigned to classes of 30, the distribution of the class mean (sample mean) is as follows:
Mean of class mean: 68.65
Variance of class mean: $\frac{12.4^2}{30} = 5.1253$
Standard error of class mean: $\frac{12.4}{\sqrt{30}} = 2.2639$
Because the sample size is reasonably large, $\bar{X} \sim N(68.65, 2.2639^2)$.
If you compare the performance of individual students, the mean score is 68.65 and the standard deviation is 12.4. However, if you compare the performance of different classes, using the class mean to represent each class, the mean of the class means is still 68.65 but the standard deviation is only 2.2639. This is not surprising: each class has some students who perform well and some who do not, and the class mean balances the high marks against the low marks, so comparisons between classes are more stable than comparisons between individual students.
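The figures in Example 1 come straight from the three formulas. A minimal Python sketch (the function name `sample_mean_distribution` is hypothetical) that returns E, Var and SE of the sample mean, rounded to 4 decimal places as required in this course:

```python
import math

def sample_mean_distribution(mu, sigma, n):
    """Return (mean, variance, standard error) of the sample mean for
    samples of size n, rounded to 4 decimal places."""
    mean = mu                    # E(X-bar) = mu
    variance = sigma ** 2 / n    # Var(X-bar) = sigma^2 / n
    se = sigma / math.sqrt(n)    # SE(X-bar) = sigma / sqrt(n)
    return round(mean, 4), round(variance, 4), round(se, 4)

# Example 1: class means for classes of 30 students
print(sample_mean_distribution(68.65, 12.4, 30))  # (68.65, 5.1253, 2.2639)
```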
Example 2:
A report indicates that, on average, a tourist spends $5000 on a 3-day trip to Taiwan. The standard deviation of the spending is $600, so the variance is 360000 ($²). Imagine you are a tour guide who takes care of a group (sample) of 40 tourists every day. If you keep a long-term record of the mean spending of each day's group of 40 tourists, you should be aware that the mean spending of each group is not constant but a variable. Use X to denote the spending of an individual and $\bar{X}$ to denote the mean spending of a sample of 40 tourists; then
Mean of sample mean spending: $E(\bar{X}) = \$5000$
Variance of sample mean spending: $Var(\bar{X}) = \frac{600^2}{40} = 9000$ ($²)
Standard error of sample mean spending: $SE(\bar{X}) = \frac{600}{\sqrt{40}} = \$94.87$
Because the sample size is large (n = 40 > 30), the sample mean spending approximately follows a normal distribution:
$$\bar{X} \sim N(5000, 94.87^2)$$
Example 3:
Samples of 25 light bulbs are regularly selected at random from the factory for quality control checking. The machine has been set up so that the mean lifetime of the light bulbs is 1000 hours with a standard deviation of 80 hours. It is reasonable to assume that the lifetime of the light bulbs follows a normal distribution. Use X to denote the lifetime of a light bulb and $\bar{X}$ to denote the mean lifetime of a sample of 25 light bulbs; then
Mean of sample mean lifetime: $E(\bar{X}) = 1000$ hours
Variance of sample mean lifetime: $Var(\bar{X}) = \frac{80^2}{25} = 256$ (hours²)
Standard error of sample mean lifetime: $SE(\bar{X}) = \frac{80}{\sqrt{25}} = 16$ hours
Because the lifetimes of light bulbs follow a normal distribution, the sample mean lifetimes also follow a normal distribution, even though n = 25 < 30:
$$\bar{X} \sim N(1000, 16^2)$$
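Examples 2 and 3 reach normality for different reasons: Example 2 relies on n ≥ 30, Example 3 on a normal population. The sketch below applies the rule stated in these notes ("population normal, or n ≥ 30") before returning the mean and standard error of $\bar{X}$. The function name `sample_mean_normal` and the choice to raise an error when neither condition holds are hypothetical design decisions for illustration.

```python
import math

def sample_mean_normal(mu, sigma, n, population_is_normal):
    """Return (mean, standard error) of X-bar when it may be treated as normal,
    using the rule in the notes: population normal OR n >= 30.
    Raises ValueError when neither condition holds."""
    if not (population_is_normal or n >= 30):
        raise ValueError("sample too small and population not known to be normal")
    return mu, sigma / math.sqrt(n)

# Example 2: tourist spending, n = 40 (large sample)
mean2, se2 = sample_mean_normal(5000, 600, 40, population_is_normal=False)
print(mean2, round(se2, 2))   # 5000 94.87

# Example 3: light bulbs, n = 25 but lifetimes are normal
mean3, se3 = sample_mean_normal(1000, 80, 25, population_is_normal=True)
print(mean3, se3)             # 1000 16.0
```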
Sampling Distribution of Sample Proportion
When the population variable is a numerical variable (e.g. examination result), the sample is usually
summarized by the calculation of the sample mean. When the population variable is a categorical
variable (e.g. gender of a student), the sample is then summarized by the calculation of the sample
proportion.
Example 1:
Imagine that, in the same group of 2000 students taking the course “General Statistics”, there are 1500 male and 500 female students. Gender is a categorical variable. Here we use p to denote the population proportion of male students, so p = 1500/2000 = 0.75.
If a class of 30 students has 24 male and 6 female students, we use $\hat{p}$ to denote the class proportion of male students (the sample proportion), so $\hat{p}$ = 24/30 = 0.8.
Now imagine we select another class of 30 students. The proportion of male students in this class may or may not be the same as in the previous class. There are many classes, each with its own proportion of male students, so again we should treat the sample proportion as a random variable.
Now consider all possible sample proportions and their probability distribution.
When p denotes the given population proportion, the distribution of the sample proportion $\hat{p}$ has the following characteristics:
1. The mean of the sample proportions equals the population proportion.
$$E(\hat{p}) = p$$
2. Variance of the sample proportions:
$$Var(\hat{p}) = \frac{p(1-p)}{n}$$
3. Standard error, the positive square root of the variance of the sample proportions:
$$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
Central Limit Theorem
When the sample size is reasonably large, the distribution of the sample proportion is well approximated by a normal distribution (Central Limit Theorem):
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$
for n ≥ 30, np > 5 and n(1 − p) > 5.
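The normal approximation for $\hat{p}$ needs all three conditions at once. A minimal Python sketch (the function name `sample_proportion_distribution` is hypothetical) that returns the mean, variance and standard error of $\hat{p}$ together with a flag for the conditions; the first call uses the figures of Example 1 (p = 0.75, n = 30):

```python
import math

def sample_proportion_distribution(p, n):
    """Return (mean, variance, SE) of the sample proportion p-hat and a flag
    saying whether the normal-approximation conditions in these notes hold:
    n >= 30, n*p > 5 and n*(1 - p) > 5. Values are rounded for display."""
    variance = p * (1 - p) / n        # Var(p-hat) = p(1 - p) / n
    se = math.sqrt(variance)          # SE(p-hat) = sqrt(p(1 - p) / n)
    normal_ok = n >= 30 and n * p > 5 and n * (1 - p) > 5
    return p, round(variance, 6), round(se, 4), normal_ok

print(sample_proportion_distribution(0.75, 30))  # (0.75, 0.00625, 0.0791, True)
print(sample_proportion_distribution(0.1, 30))   # n*p = 3, so the flag is False
```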
Example 1:
For all year 1 students taking the course “General Statistics”, it is known that the population proportion of male students is p = 0.75.
If students are randomly assigned to classes of 30, the distribution of the proportion of male students in a class is as follows:
Mean of class proportion of male = 0.75
Variance of class proportion of male = $\frac{0.75 \times 0.25}{30} = 0.00625$
Standard error of class proportion of male = $\sqrt{\frac{0.75 \times 0.25}{30}} = 0.0791$
As the sample size is reasonably large (and np = 22.5 > 5, n(1 − p) = 7.5 > 5), $\hat{p} \sim N(0.75, 0.0791^2)$.
On average about 75% of the students in each class are male, but this proportion is not fixed: the proportion of male students in a class has a standard deviation of 7.91% around the true level of 75%.
Example 4:
Assume that among all customers of a jewelry shop, 40% are classified as “high spending”. Random samples of size 70 are selected, and each time the sample proportion of customers classified as “high spending” is calculated. Let
p denote the population proportion of “high spending” customers: p = 0.4,
$\hat{p}$ denote the proportion of “high spending” customers in a sample of 70 customers.
Then
$E(\hat{p}) = 0.4$
$Var(\hat{p}) = \frac{0.4(1 - 0.4)}{70} = 0.0034$
$SE(\hat{p}) = \sqrt{\frac{0.4(1 - 0.4)}{70}} = 0.05855$
As the sample size n = 70 is reasonably large (and np = 28 > 5, n(1 − p) = 42 > 5), the sample proportion is approximately normally distributed:
$$\hat{p} \sim N(0.4, 0.05855^2)$$
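The figures in Example 4 can be checked by simulation. This sketch assumes, as a modelling simplification that is not data from the shop, that each sampled customer is independently “high spending” with probability 0.4. It draws many samples of 70 customers and compares the mean and standard deviation of the simulated sample proportions with p and $\sqrt{p(1-p)/n}$. The function name, seed and number of repetitions are hypothetical.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, repetitions, seed=2):
    """Simulate `repetitions` samples of n customers, each 'high spending'
    with probability p independently, and return the sample proportions."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    props = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        props.append(successes / n)
    return props

props = simulate_sample_proportions(0.4, 70, repetitions=20000)
print(round(statistics.fmean(props), 4))       # close to p = 0.4
print(round(statistics.pstdev(props), 4))      # close to the standard error
print(round(math.sqrt(0.4 * 0.6 / 70), 4))     # theoretical SE(p-hat)
```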
Chapter 4 Estimation
In the previous chapter, for a continuous random variable X with given population mean μ and population standard deviation σ, the distribution of the sample mean for samples of size n has the following characteristics:
(i) $E(\bar{X}) = \mu$
(ii) $Var(\bar{X}) = \frac{\sigma^2}{n}$
(iii) $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
(iv) $\bar{X}$ is normally distributed if either n ≥ 30 or X itself is normally distributed
Similarly, for a categorical random variable X where the given population proportion in favour of one particular option is denoted by p, the distribution of the sample proportion for samples of size n has the following characteristics:
(i) $E(\hat{p}) = p$
(ii) $Var(\hat{p}) = \frac{p(1-p)}{n}$
(iii) $SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
(iv) $\hat{p}$ is approximately normally distributed when n ≥ 30, np > 5 and n(1 − p) > 5
In this chapter, building on these sampling distribution characteristics, we study how to estimate an unknown population mean (population proportion) using the sample mean (sample proportion) obtained from a survey.
• Be able to estimate the unknown population mean (with known population standard deviation) by calculating the point estimate, sampling error and confidence interval estimate
• Be able to estimate the unknown population mean (with unknown population standard deviation) by calculating the point estimate, sampling error and confidence interval estimate
• Be able to estimate the unknown population proportion by calculating the point estimate, sampling error and confidence interval estimate
Example 1:
You are asked to review the lifetime of the light bulbs produced in a factory by reporting the population mean lifetime. Lifetime is a continuous random variable. According to the information provided by the factory, the population mean lifetime μ is unknown, while the population standard deviation σ is known to be 80 hours. How can we estimate the population mean lifetime without doing a census, by conducting only a survey with sample size n = 50?
Estimation of the Population Mean (σ is known)
(i) Point Estimation
When we estimate, the error (bias) of the estimation is the difference between the estimate and the true population parameter. We use the sample mean as the point estimate of the population mean because, if we define error = $\bar{x} - \mu$, the average error over all possible samples equals 0 (since $E(\bar{X}) = \mu$). The sample mean is therefore said to be an unbiased point estimator of the population mean.
Example 1:
To estimate the population mean lifetime, a random sample of 50 light bulbs is selected. The sample mean lifetime is calculated as 680 hours.
Solution:
The point estimate of the population mean lifetime is 680 hours.
The sample mean is only a point estimate of the population mean, so some error in the estimation is expected. The question is: can the error in the estimation be calculated?
Estimation of the Population Mean (σ is known)
(ii) Sampling Error at 100(1 − α)% confidence
As the survey sample is smaller than the whole population, there must be some error when we use the sample mean to estimate the population mean. This error is called the sampling error and is defined as the difference between the estimate and the population parameter:
Error = $\bar{x} - \mu$
How large is this sampling error? We cannot calculate the sampling error of a particular sample, because the population mean is unknown (you must remember this point). However, we can derive the sampling error at a certain confidence level, e.g. the 95% confidence level (some statisticians call this maximum sampling error the margin of error). To derive it, we must be familiar with the normal characteristics of the sampling distribution.
Example 1:
As mentioned, the lifetime of the light bulbs produced in the factory has the following characteristics:
population mean μ, which is unknown
population standard deviation σ = 80 hours
To estimate the population mean lifetime, a random sample of 50 light bulbs is selected. If, instead of focusing on one particular sample, we consider repeatedly selecting many samples, each of size n = 50, then the distribution of the sample mean is
$$\bar{X} \sim N\left(\mu, \frac{80^2}{50}\right)$$
As 95% of z-scores lie between −1.96 and 1.96, 95% of sample means lie between $\mu - 1.96 \times \frac{\sigma}{\sqrt{n}}$ and $\mu + 1.96 \times \frac{\sigma}{\sqrt{n}}$.
Proof:
$$P(-1.96 < Z < 1.96) = P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = P\left(-1.96 \times \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \times \frac{\sigma}{\sqrt{n}}\right)$$
$$= P\left(\mu - 1.96 \times \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96 \times \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
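[Figure: standard normal curve with tail areas of 0.025 below z = −1.96 and above z = 1.96, corresponding on the $\bar{X}$ axis to $\mu - 1.96\frac{\sigma}{\sqrt{n}}$, $\mu$ and $\mu + 1.96\frac{\sigma}{\sqrt{n}}$]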
As the sampling error is defined as $\bar{x} - \mu$, the sampling error at 95% confidence level is $1.96 \times \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{80}{\sqrt{50}} = 22.1749$ hours.
That means in 95% of cases the size of the estimation error, $|\bar{x} - \mu|$, is less than 22.1749 hours.
In general, the sampling error at 100(1 − α)% confidence level is
$$z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
We call $z_{\alpha/2}$ the critical value, where α/2 is the upper tail area under the normal curve.
Commonly used confidence levels include:
Confidence level   Critical value $z_{\alpha/2}$
90%   1.645
95%   1.96
98%   2.326
99%   2.576
(The critical values can be found from the standard normal table.)
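The critical values in the table can also be computed instead of looked up. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library (Python 3.8 or later). Because α/2 is the upper tail area, $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution. The helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z_{alpha/2} for a two-sided confidence level such as 0.95.
    alpha/2 is the upper tail area, so z is the (1 - alpha/2) quantile."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 98%: 2.326
# 99%: 2.576
```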
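Referring to different confidence levels, the sampling error would be:
Sampling error at 90% confidence = $1.645 \times \frac{80}{\sqrt{50}} = 18.6111$ hours
Sampling error at 95% confidence = $1.96 \times \frac{80}{\sqrt{50}} = 22.1749$ hours
Sampling error at 98% confidence = $2.326 \times \frac{80}{\sqrt{50}} = 26.3157$ hours
Sampling error at 99% confidence = $2.576 \times \frac{80}{\sqrt{50}} = 29.1441$ hours
There is a 95% chance that the difference between the calculated sample mean and the true population mean is no more than 22.1749 hours, and only a 5% chance that the difference is more than 22.1749 hours.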
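The sampling errors above differ only in the critical value: a higher confidence level needs a larger $z_{\alpha/2}$ and therefore gives a wider margin. A minimal sketch (the function name `sampling_error` is hypothetical) reproducing the four figures for σ = 80 and n = 50, using the critical values from the table:

```python
import math

def sampling_error(z, sigma, n):
    """Sampling error at a given confidence level: z_{alpha/2} * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

# Light bulb example: sigma = 80 hours, n = 50, critical values from the table
for level, z in (("90%", 1.645), ("95%", 1.96), ("98%", 2.326), ("99%", 2.576)):
    print(level, round(sampling_error(z, 80, 50), 4))
```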
Estimation of the Population Mean (σ is known)
(iii) Confidence Interval Estimation
Combining (i) the point estimate and (ii) the sampling error, a confidence interval estimate can be constructed as
$$\left(\bar{x} - z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}, \ \bar{x} + z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\right)$$
Let’s look at how to construct the 95% confidence interval estimate. As the confidence level is set at 95%, the sampling error is $1.96 \times \frac{\sigma}{\sqrt{n}}$. Suppose repeated sampling is conducted and each time an interval is calculated using the formula
$$\left(\bar{x} - 1.96 \times \frac{\sigma}{\sqrt{n}}, \ \bar{x} + 1.96 \times \frac{\sigma}{\sqrt{n}}\right)$$
[Figure: 95% Confidence Intervals. The intervals calculated from sample 1, sample 2, …, sample 10, … are drawn against the sampling distribution of $\bar{X}$, which has tail areas of 0.025 beyond $\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$.]
The diagram shows that most of the constructed intervals cover the true unknown population mean and only a few do not. In fact, of all such intervals, 95% cover the true population mean.
In practice, if only one random sample is selected, there is a 95% chance that the confidence interval constructed in this way includes the unknown population mean.
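The repeated-sampling interpretation can be imitated by simulation. This sketch assumes a hypothetical true mean of 700 hours (in practice μ is unknown), σ = 80 and n = 50, draws many samples from a normal population, builds the interval $\bar{x} \pm 1.96\sigma/\sqrt{n}$ for each sample, and reports the fraction of intervals that contain μ. The function name `ci_coverage`, the seed and the number of repetitions are hypothetical.

```python
import math
import random
import statistics

def ci_coverage(mu, sigma, n, z, repetitions, seed=3):
    """Fraction of intervals x-bar +/- z * sigma / sqrt(n) that contain mu,
    when samples of size n are drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    error = z * sigma / math.sqrt(n)      # sampling error, same for every sample
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - error < mu < x_bar + error:
            hits += 1
    return hits / repetitions

# Hypothetical true mean of 700 hours; sigma = 80 and n = 50 as in Example 1
coverage = ci_coverage(700, 80, 50, 1.96, repetitions=10000)
print(coverage)  # close to 0.95
```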
Example 1:
As the sample mean lifetime of 50 light bulbs is 680 hours and the 95% sampling error is 22.1749 hours, the 95% confidence interval estimate of the population mean lifetime is:
$$\left(680 - 1.96 \times \frac{80}{\sqrt{50}}, \ 680 + 1.96 \times \frac{80}{\sqrt{50}}\right) = (657.8251, 702.1749) \text{ hours}$$
In summary,
The unbiased point estimate of the population mean is $\bar{x}$.
The sampling error at 100(1 − α)% confidence level is $z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$.
The 100(1 − α)% confidence interval estimate of the population mean is
$$\left(\bar{x} - z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}, \ \bar{x} + z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\right)$$
Example 2:
The manager of a beauty counter wants to review customers' spending. The population mean spending is unknown and the population standard deviation is $180. He estimates the population mean spending by randomly selecting 60 customers. The sample mean spending of these 60 customers is $880.
(a) What is the point estimate of the population mean?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population mean?
Solution:
(a) The point estimate of the population mean spending is $880.
(b) With σ = 180 and n = 60,
the sampling error at 90% confidence level = $1.645 \times \frac{180}{\sqrt{60}} = \$38.2263$
(c) The 90% confidence interval estimate of the population mean is:
$$\left(880 - 1.645 \times \frac{180}{\sqrt{60}}, \ 880 + 1.645 \times \frac{180}{\sqrt{60}}\right) = \$(841.77, 918.23)$$
The population mean spending is point estimated as $880 with a 90% sampling error of $38.2263.
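The three-step procedure (point estimate, sampling error, interval) fits in one small function. A minimal sketch, with the hypothetical name `z_interval`, assuming σ is known and the critical value z is supplied from the table:

```python
import math

def z_interval(x_bar, sigma, n, z):
    """Point estimate, sampling error and confidence interval for the
    population mean when sigma is known: x-bar +/- z * sigma / sqrt(n)."""
    error = z * sigma / math.sqrt(n)
    return x_bar, error, (x_bar - error, x_bar + error)

# Example 1: x-bar = 680, sigma = 80, n = 50, 95% confidence (z = 1.96)
point, error, (low, high) = z_interval(680, 80, 50, 1.96)
print(point, round(error, 4), round(low, 4), round(high, 4))  # 680 22.1749 657.8251 702.1749

# Example 2: x-bar = 880, sigma = 180, n = 60, 90% confidence (z = 1.645)
point, error, (low, high) = z_interval(880, 180, 60, 1.645)
print(point, round(error, 4), round(low, 2), round(high, 2))  # 880 38.2263 841.77 918.23
```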
Estimation of the Population Mean (σ is unknown)
In the previous section, we estimated the population mean in 3 steps:
(i) the point estimate: $\bar{x}$
(ii) the sampling error at 100(1 − α)% confidence level: $z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$
(iii) the 100(1 − α)% confidence interval estimate: $\left(\bar{x} - z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}, \ \bar{x} + z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\right)$
What if, in practice, the population standard deviation is unknown?
If the random variable X is normally distributed, then
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
follows a standard normal distribution, and
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
follows a t-distribution with n − 1 degrees of freedom,
where σ is the population standard deviation and s is the sample standard deviation, which is used to estimate the unknown population standard deviation.
What is the difference between the z and t transformations?
The t-value is calculated in almost the same way as the standard score z, except that the population standard deviation σ is replaced by the sample standard deviation s. The sample standard deviation is reasonably close to the population standard deviation, but it is itself a variable that differs from sample to sample. As a result, the t-distribution looks similar to the standard normal distribution but has “fatter” tails. The t-distributions for different sample sizes are different: as the sample size increases, the t-distribution gets closer to the standard normal distribution.
[Figure: Comparison between the standardized normal distribution and t-distributions with df = 13 and df = 5. All three curves are bell-shaped and symmetric about 0; the t-distributions have ‘fatter’ tails.]
The t-distribution is very similar to the standard normal distribution, but it has relatively fatter tails. As the degrees of freedom (defined as the sample size minus 1, df = n − 1) increase, the t-distribution becomes more similar to the standard normal distribution, because a larger sample makes the sample standard deviation a more accurate estimator of the population standard deviation. It is commonly accepted that when the degrees of freedom are greater than 29, the t-distribution is well approximated by the standard normal distribution.
Let’s use the standard normal table and the t-table to look up the bounds of the middle 95% of values:
Standard normal distribution: −1.96 to 1.96
t-distribution with 5 degrees of freedom (sample size = 6): −2.571 to 2.571
t-distribution with 13 degrees of freedom (sample size = 14): −2.160 to 2.160
t-distribution with more than 29 degrees of freedom: approximately −1.96 to 1.96
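The fatter tails can be seen by simulation. This sketch assumes a standard normal population (the assumption behind the t-distribution), draws samples of size 6 (df = 5), computes $t = (\bar{x} - \mu)/(s/\sqrt{n})$ with the sample standard deviation, and counts how often |t| exceeds 1.96 and 2.571. If t behaved like z, about 5% would exceed 1.96; the simulation should show clearly more, while about 5% should exceed the t-table value 2.571. The function name, seed and number of repetitions are hypothetical.

```python
import math
import random

def simulate_t_statistics(n, repetitions, seed=4):
    """Simulate t = (x-bar - mu) / (s / sqrt(n)) for samples of size n drawn
    from a standard normal population (mu = 0, sigma = 1), which is the
    assumption behind the t-distribution with n - 1 degrees of freedom."""
    rng = random.Random(seed)
    ts = []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        # sample standard deviation with divisor n - 1
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        ts.append(x_bar / (s / math.sqrt(n)))   # mu = 0
    return ts

ts = simulate_t_statistics(n=6, repetitions=20000)        # df = 5
outside_z = sum(abs(t) > 1.96 for t in ts) / len(ts)      # noticeably above 0.05
outside_t = sum(abs(t) > 2.571 for t in ts) / len(ts)     # close to 0.05
print(round(outside_z, 3), round(outside_t, 3))
```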
The entries in Table II are the values for which the area to their right under the t-distribution with the given degrees of freedom (the shaded upper-tail area) is equal to α.
TABLE II VALUE OF t
d.f. t0.050 t0.025 t0.010 t0.005 d.f.
1 6.314 12.706 31.821 63.657 1

2 2.920 4.303 6.965 9.925 2
3 2.353 3.182 4.541 5.841 3
4 2.132 2.776 3.747 4.604 4
5 2.015 2.571 3.365 4.032 5
6 1.943 2.447 3.143 3.707 6

7 1.895 2.365 2.998 3.499 7
8 1.860 2.306 2.896 3.355 8
9 1.833 2.262 2.821 3.250 9
10 1.812 2.228 2.764 3.169 10
11 1.796 2.201 2.718 3.106 11

12 1.782 2.179 2.681 3.055 12
13 1.771 2.160 2.650 3.012 13
14 1.761 2.145 2.624 2.977 14
15 1.753 2.131 2.602 2.947 15
16 1.746 2.120 2.583 2.921 16

17 1.740 2.110 2.567 2.898 17
18 1.734 2.101 2.552 2.878 18
19 1.729 2.093 2.539 2.861 19
20 1.725 2.086 2.528 2.845 20
21 1.721 2.080 2.518 2.831 21

22 1.717 2.074 2.508 2.819 22
23 1.714 2.069 2.500 2.807 23
24 1.711 2.064 2.492 2.797 24
25 1.708 2.060 2.485 2.787 25
26 1.706 2.056 2.479 2.779 26

27 1.703 2.052 2.473 2.771 27
28 1.701 2.048 2.467 2.763 28
29 1.699 2.045 2.462 2.756 29
Inf. 1.645 1.960 2.326 2.576 Inf.
By using the t-distribution in place of the standard normal distribution, we can now estimate the population mean with the 3-step procedure:
The unbiased point estimate of the population mean is $\bar{x}$.
The sampling error at 100(1 − α)% confidence level is $t_{\alpha/2} \times \frac{s}{\sqrt{n}}$.
The 100(1 − α)% confidence interval estimate of the population mean is
$$\left(\bar{x} - t_{\alpha/2} \times \frac{s}{\sqrt{n}}, \ \bar{x} + t_{\alpha/2} \times \frac{s}{\sqrt{n}}\right)$$
where $t_{\alpha/2}$ is the critical value with α/2 as the upper tail area and n − 1 degrees of freedom.
Remark:
The t-distribution is derived under the assumption that the random variable X follows a normal distribution. In practice, we can also use the t-distribution to estimate the population mean when the sample size is large enough (n > 30).
Example 3:
In order to estimate the population mean age of a dentist's patients, a random sample of 20 patients is selected. The sample mean age is 37.4 and the sample standard deviation is 7.8. Assume that the ages of all patients follow a normal distribution.
(a) What is the point estimate of the population mean?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population mean?
Solution:
With $\bar{x}$ = 37.4, s = 7.8, n = 20, d.f. = 19, $t_{19,\,0.05}$ = 1.729
(a) The point estimate of the population mean age is 37.4.
(b) The 90% sampling error is $1.729 \times \frac{7.8}{\sqrt{20}} = 3.0156$
(c) The 90% C.I. of the population mean is
$$\left(37.4 - 1.729 \times \frac{7.8}{\sqrt{20}}, \ 37.4 + 1.729 \times \frac{7.8}{\sqrt{20}}\right) = (34.3844, 40.4156)$$
The population mean age of all patients is point estimated as 37.4 with a 90% sampling error of 3.0156.
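The σ-unknown procedure has the same shape as before, with s in place of σ and a t critical value in place of z. The Python standard library has no t-distribution quantile function, so in this sketch the critical value is passed in after being read from Table II (here $t_{19,\,0.05}$ = 1.729). The function name `t_interval` is hypothetical.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Point estimate, sampling error and confidence interval for the
    population mean when sigma is unknown: x-bar +/- t * s / sqrt(n).
    t_crit is read from the t-table with n - 1 degrees of freedom."""
    error = t_crit * s / math.sqrt(n)
    return x_bar, error, (x_bar - error, x_bar + error)

# Example 3: x-bar = 37.4, s = 7.8, n = 20, t_{19, 0.05} = 1.729 (90% confidence)
point, error, (low, high) = t_interval(37.4, 7.8, 20, 1.729)
print(point, round(error, 4), round(low, 4), round(high, 4))  # 37.4 3.0156 34.3844 40.4156
```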
Estimation of the Population Proportion
We develop the 3-step estimation of the population proportion using an approach similar to the estimation of the population mean. First, recall the distribution of the sample proportion from Chapter 3:
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$
[Figure: normal curve of $\hat{p}$ with tail areas α/2 beyond $p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$ and $p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$, corresponding to $-z_{\alpha/2}$ and $z_{\alpha/2}$ on the Z axis]
Review this example again:
For all year 1 students taking the course “General Statistics”, it is known that the population proportion of male students is p = 0.75. If students are randomly assigned to classes of 30, the distribution of the proportion of male students in a class is
$$\hat{p} \sim N\left(0.75, \frac{0.75(1 - 0.75)}{30}\right)$$
When the population proportion is unknown, we use the sample proportion, together with the characteristics of the normal distribution, to do the estimation. Because p is unknown, $\hat{p}$ also replaces p in the standard error.
The unbiased point estimate of the population proportion is $\hat{p}$.
The sampling error at 100(1 − α)% confidence level is $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
The 100(1 − α)% confidence interval estimate of the population proportion is
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
Example 4:
A large insurance company is conducting a survey to find the proportion of employees who have taken the professional examination in the past two years. A survey of 200 employees shows that 125 of them have taken the professional examination in the past two years.
(a) What is the point estimate of the population proportion of employees who have taken the examination?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population proportion of employees who have taken the examination?
Solution:
With $\hat{p} = \frac{125}{200} = 0.625$, n = 200, $z_{0.05}$ = 1.645
(a) The point estimate of the population proportion of employees who have taken the examination = 0.625
(b) The sampling error at 90% confidence level = $1.645\sqrt{\frac{0.625(1 - 0.625)}{200}} = 0.0563$
(c) The 90% C.I. of p is
$$\left(0.625 - 1.645\sqrt{\frac{0.625(1 - 0.625)}{200}}, \ 0.625 + 1.645\sqrt{\frac{0.625(1 - 0.625)}{200}}\right) = (0.5687, 0.6813)$$
The population proportion of employees who have taken the professional examination in the past two years is point estimated as 62.5% with a 90% sampling error of 5.63%.
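The proportion version of the three steps, with $\hat{p}$ used inside the standard error, can be sketched in the same way. The function name `proportion_interval` is hypothetical, and the critical value z is supplied from the table:

```python
import math

def proportion_interval(successes, n, z):
    """Point estimate, sampling error and confidence interval for a
    population proportion: p-hat +/- z * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    error = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, error, (p_hat - error, p_hat + error)

# Example 4: 125 of 200 employees, 90% confidence (z = 1.645)
p_hat, error, (low, high) = proportion_interval(125, 200, 1.645)
print(p_hat, round(error, 4), round(low, 4), round(high, 4))  # 0.625 0.0563 0.5687 0.6813
```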
Chapter 5 Time Series
In this chapter, we study time series data, which is a sequence of measurements of the same variable collected over time. We try to understand time series data by reading the time series plot and by calculating the moving average and the seasonal variation to determine the direction of the trend.
• Be able to describe patterns in a time series plot
• Be able to calculate the trend by the method of moving averages
• Be able to calculate the seasonal variation
A time series is a sequence of measurements of the same variable collected over time. Most commonly, the measurements are made at regular time intervals, which can be daily, weekly, monthly, quarterly or yearly.
Component Factors of Time Series
A time series can be described by the following components:
• Trend
- A trend is an overall upward or downward movement that describes the long-term behaviour of the series.
• Cyclical
- The cyclical effect describes the up-and-down swings or movements throughout the series.
• Seasonal
- The patterns of change in a time series within a year.
- These patterns tend to repeat themselves each year.
- For example, for quarterly data, this involves the effect of spring, summer, autumn and winter on the time series.
• Random
- Factors that affect the time series and occur at random.
- Unpredictable factors such as strikes, exceptionally bad weather, sudden shortages of materials, etc.
Time Series Plot
The figure below shows a general graph of quarterly data covering several years.
From the graph, we can identify the trend, cyclical and seasonal changes of the variable over time.
1. The nearly straight line through the middle of the graph shows the overall direction of
movement of the variable with time. There is a general upward movement. This type of
variation is called the trend. The trend maintains the general direction for a long time.
2. The gently curving line which moves from side to side of the trend line represents cyclical
variation in the time series. This is an approximately periodic variation in the data values, and
the period, in as much as it can be discerned, will be of several years’ duration. The
amplitude of the cycles may also vary from one part of the time series to another.
3. The variation highlighted by the lines joining the plotted points together is the seasonal
variation. This is the variation from one part of the year to another. Seasonal variation is
usually regular in its period and its amplitude. In the example shown, we have larger values
in the second and third quarters of the year than in the first and fourth quarters.
4. A fourth kind of variation, which cannot be effectively represented on the graph, is called
random variation. Random variation is associated with random one-off occurrences such as
strikes or natural disasters, with sampling errors that occur in data collection, or with
rounding errors in processing and presentation of the data.
Our attention will be restricted to short time series, in which cyclical variation does not show itself clearly and is usually absorbed into the trend, so we concentrate on three components only: trend, seasonal and random.
Additive Model
Using the observed past data, it is possible to build a time series model (e.g. a moving average model) to smooth out short-term fluctuations and focus on longer-term trends.
Let’s use t, s and r to denote the trend, the seasonal variation and the random variation respectively. The additive model expresses the time series data as the sum of the three components, assuming that the seasonal and random factors are independent of the trend.
y = t + s + r
where
y : time series data
t : trend
s : seasonal variation
r : random variation
Besides the additive model, there are also the multiplicative model and other models. In this course, we start from the basics and assume that the time series data follow the additive model.
Analysis Approach
We will follow the procedures below to analyse the time series data:
1. Find the trend by moving average
2. Find the seasonal variation
1. Finding the trend by method of Moving Average
The moving average method obtains the trend of a time series by calculating a set of averages, each one giving the trend value t at one time point of the series. For example, when calculating a set of moving averages with period p = 5,
The first five-term moving average = $\frac{y_1 + y_2 + y_3 + y_4 + y_5}{5}$, which is centered on the middle value, that is, the 3rd position.
The second five-term moving average = $\frac{y_2 + y_3 + y_4 + y_5 + y_6}{5}$, which is centered on the 4th position.
Remark:
Notice that each average is aligned with the middle value of the set being averaged. Be aware that there are no trend values corresponding to the first few and last few original values (the first two and the last two when p = 5).
To demonstrate the technique, a set of moving averages with period p = 5 is calculated below for a set of values.
Example 1:
Below is the number of applications received at the reception counter of a driving school during the past two weeks (the counter opens on weekdays only).
Week 1 Week 2
Mon Tue Wed Thu Fri Mon Tue Wed Thu Fri
Number of applications 12 10 11 11 9 11 10 10 11 10
Find the trend by calculating the 5-period moving average.
Solution:
The first average, located at the 3rd position, is $\frac{12 + 10 + 11 + 11 + 9}{5} = 10.6$.
The second average, located at the 4th position, is $\frac{10 + 11 + 11 + 9 + 11}{5} = 10.4$.
The third average, located at the 5th position, is $\frac{11 + 11 + 9 + 11 + 10}{5} = 10.4$.
The fourth average, located at the 6th position, is $\frac{11 + 9 + 11 + 10 + 10}{5} = 10.2$.
The fifth average, located at the 7th position, is $\frac{9 + 11 + 10 + 10 + 11}{5} = 10.2$.
The sixth average, located at the 8th position, is $\frac{11 + 10 + 10 + 11 + 10}{5} = 10.4$.
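The sliding-window calculation in Example 1 can be written as a short function. A minimal sketch, with the hypothetical name `moving_average`, that handles odd periods only (even periods need the centered version that follows) and reports each average at the 1-based position of its middle value:

```python
def moving_average(values, period):
    """Return a list of (position, average) pairs for an odd period,
    where position is the 1-based index of the middle value of the window."""
    if period % 2 == 0:
        raise ValueError("use a centered moving average for an even period")
    half = period // 2
    result = []
    for start in range(len(values) - period + 1):
        window = values[start:start + period]
        result.append((start + half + 1, sum(window) / period))
    return result

# Example 1: applications Mon to Fri over two weeks
applications = [12, 10, 11, 11, 9, 11, 10, 10, 11, 10]
for position, trend in moving_average(applications, 5):
    print(position, trend)
```

The first two and last two positions get no trend value, matching the remark above.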
The Centered Moving Average
When moving averages are calculated with an even period (e.g. 4, 6 or 8), each moving average falls between two time points, so centered moving averages are calculated to bring the trend values back in line with the time points.
Example 2:
The following data shows the sales revenue (in $100,000) of a company:
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1 79 48 68 107
2 97 66 85 134
3 113 91 100 148
4 136 105 125 174
(a) Describe the trend and seasonal variation by reading the time series plot.
(b) Find the trend by calculating the 4-period moving average.
Solution:
(a) The sales revenue shows a general upward trend. Within a year, the best performance is observed in Quarter 4, followed by Quarter 1. Revenue is relatively weaker in Quarter 3 and weakest in Quarter 2.
(b) The first average, located at the 2.5th position, is $\frac{79 + 48 + 68 + 107}{4} = 75.50$.
The second average, located at the 3.5th position, is $\frac{48 + 68 + 107 + 97}{4} = 80.00$.
The third average, located at the 4.5th position, is $\frac{68 + 107 + 97 + 66}{4} = 84.50$.
Similarly for the others.
Then re-center the averages on whole positions by averaging each pair of adjacent moving averages, so that
the trend at the 3rd position is $\frac{75.50 + 80.00}{2} = 77.75$
the trend at the 4th position is $\frac{80.00 + 84.50}{2} = 82.25$
Similarly for the others.
Quarter   Original data y   4-period Moving Average   Centered Moving Average t
Year 1 1 79
2 48
75.50
3 68 77.75
80.00
4 107 82.25
84.50
Year 2 1 97 86.625
88.75
2 66 92.125
95.50
3 85 97.5
99.50
4 134 102.625
105.75
Year 3 1 113 107.625
109.50
2 91 111.25
113.00
3 100 115.875
118.75
4 148 120.5
122.25
Year 4 1 136 125.375
128.50
2 105 131.75
135.00
3 125
4 174
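The two-stage calculation in the table (4-period moving averages, then the mean of each adjacent pair) can be sketched as follows. The function name `centered_moving_average` and the choice to return the centered values as a dictionary keyed by 1-based position are hypothetical design decisions:

```python
def centered_moving_average(values, period):
    """Moving averages and centered moving averages for an even period.
    Returns (moving, centered): moving[i] is the average of
    values[i : i + period] and sits between two time points; centered maps
    a 1-based time position to its trend value t."""
    if period % 2 != 0:
        raise ValueError("period must be even")
    moving = [sum(values[i:i + period]) / period
              for i in range(len(values) - period + 1)]
    centered = {}
    for i in range(len(moving) - 1):
        # the mean of two adjacent moving averages lands on a whole position
        centered[i + period // 2 + 1] = (moving[i] + moving[i + 1]) / 2
    return moving, centered

# Example 2: quarterly sales revenue, years 1 to 4
revenue = [79, 48, 68, 107, 97, 66, 85, 134,
           113, 91, 100, 148, 136, 105, 125, 174]
moving, trend = centered_moving_average(revenue, 4)
print(moving[:3])          # [75.5, 80.0, 84.5]
print(trend[3], trend[4])  # 77.75 82.25
```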
2. Finding the seasonal variation
Seasonal variation is present in many time series. Winter sportswear tends to sell well in autumn and winter and badly in spring and summer; supermarket sales are higher at the end of the week than at the beginning; unemployment figures are always seasonally inflated in early summer owing to school leavers.
The values obtained to describe seasonal variation are sometimes known as seasonal values or factors, and are expressed as deviations (i.e. ‘+’ or ‘–’) from the underlying trend. They show, on average, by how much a particular season tends to increase or decrease the underlying trend.
Procedures for Finding the Seasonal Variation:
Step 1: For each time point, calculate the value of y − t (the difference between the original value and the trend).
Step 2: For each season in turn, find the average of the y − t values.
Step 3: If the total of the averages differs from 0, adjust each average using the following formula so that their total is zero.
$$\text{Adjustment} = \frac{-\,\text{Total of the averages}}{\text{Number of seasons}}$$
For each season,
$$\text{Seasonal Factor} = \text{Average of } (y - t) + \text{Adjustment}$$
Example 2:
The following data shows the sales revenue of a small company:
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1 79 48 68 107
2 97 66 85 134
3 113 91 100 148
4 136 105 125 174
Continuing with the previous calculation, find the seasonal variation for Quarters 1, 2, 3 and 4 respectively.
Solution:
Quarter   Original data y   4-period Moving Average   Centered Moving Average t   y − t
Year 1 1 79
2 48
75.50
3 68 77.75 -9.75
80.00
4 107 82.25 24.75
84.50
Year 2 1 97 86.625 10.375
88.75
2 66 92.125 -26.125
95.50
3 85 97.5 -12.5
99.50
4 134 102.625 31.375
105.75
Year 3 1 113 107.625 5.375
109.50
2 91 111.25 -20.25
113.00
3 100 115.875 -15.875
118.75
4 148 120.5 27.5
122.25
Year 4 1 136 125.375 10.625
128.50
2 105 131.75 -26.75
135.00
3 125
4 174
Quarter   Average of y − t   Adjustment   Seasonal Factor
1   $\frac{10.375 + 5.375 + 10.625}{3} = 8.7917$   0.1042   8.8959
2   $\frac{-26.125 - 20.25 - 26.75}{3} = -24.375$   0.1042   -24.2708
3   $\frac{-9.75 - 12.5 - 15.875}{3} = -12.7083$   0.1042   -12.6041
4   $\frac{24.75 + 31.375 + 27.5}{3} = 27.875$   0.1042   27.9792
Checking whether the total of the averages of y − t is zero:
8.7917 + (−24.375) + (−12.7083) + 27.875 = −0.4166 ≠ 0
$$\text{Adjustment} = \frac{-(-0.4166)}{4} = 0.1042$$
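Steps 2 and 3 can be sketched as a small function that takes the y − t values of each season from the solution table. The function name `seasonal_factors` is hypothetical. Because it carries full precision instead of adding 4-decimal-place averages, it gives 8.8958 and −12.6042 for Quarters 1 and 3, which differ in the last decimal place from the table above; the factors then sum to exactly zero.

```python
def seasonal_factors(differences_by_season):
    """Seasonal factors for the additive model y = t + s + r.
    differences_by_season maps each season to its list of y - t values.
    Steps: average y - t per season, then shift every average by
    -(total of averages) / (number of seasons) so the factors sum to zero."""
    averages = {season: sum(d) / len(d)
                for season, d in differences_by_season.items()}
    adjustment = -sum(averages.values()) / len(averages)
    factors = {season: avg + adjustment for season, avg in averages.items()}
    return factors, adjustment

# y - t values from the solution table (Example 2)
y_minus_t = {
    1: [10.375, 5.375, 10.625],
    2: [-26.125, -20.25, -26.75],
    3: [-9.75, -12.5, -15.875],
    4: [24.75, 31.375, 27.5],
}
factors, adjustment = seasonal_factors(y_minus_t)
print(round(adjustment, 4))                             # 0.1042
print({q: round(f, 4) for q, f in factors.items()})
```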
Remarks:
1. The seasonal factors agree with our earlier observation: revenue is highest in Quarter 4, followed by Quarter 1 and then Quarter 3, while Quarter 2 performs worst.
2. Time series analysis can be taken much further, for example to forecast future values or to support decision making in the stock market.
Chapter 6 Price Index
In this chapter, we study the calculation and interpretation of price indices. A price index measures changes in the prices of a basket of goods and services over time. There are many types of price index and many ways to calculate them; in this chapter, we study the Laspeyres price index, the Paasche price index and the Consumer Price Index.
• Be able to calculate the Laspeyres price index
• Be able to calculate the Paasche price index
• Be able to calculate the Consumer Price Index
• Be able to interpret the findings
Laspeyres Price Index

Laspeyres Price Index is a price index used to reflect the change of prices of a basket of goods
relative to the selected base year. When calculating the index number for a year, the prices of goods
are updated while the quantities of goods remain the same over the years.
$$PI_L = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100$$
$p_0$ : price in base year
$p_t$ : price in year t
$q_0$ : quantity in base year
The index number in the base year is 100. A higher price level results in an index greater than 100
while a lower price level results in an index lower than 100.
Change in price level

The percentage change in price level from the base year to the current year can be read directly
from the index:

$$\text{Change in price level} = (PI - 100)\%$$
Example 1
By using the information in the table below, calculate and interpret the Laspeyres price index for
the years 2012 and 2013, using 2011 as the base year.
Goods 2011 2012 2013

Price Quantity Price Quantity Price Quantity
Coffee $20.5 13.1 $23.9 12.8 $25.6 11.2
Sugar $5.8 17.1 $6.2 17.7 $7.1 16.2
Milk $10.2 48.9 $12.1 45.4 $13.3 39.2
Solution
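$$PI_L(2011) = \frac{20.5(13.1) + 5.8(17.1) + 10.2(48.9)}{20.5(13.1) + 5.8(17.1) + 10.2(48.9)} \times 100 = 100$$

$$PI_L(2012) = \frac{23.9(13.1) + 6.2(17.1) + 12.1(48.9)}{20.5(13.1) + 5.8(17.1) + 10.2(48.9)} \times 100 = 116.6519$$

$$PI_L(2013) = \frac{25.6(13.1) + 7.1(17.1) + 13.3(48.9)}{20.5(13.1) + 5.8(17.1) + 10.2(48.9)} \times 100 = 127.7700$$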
The price level increased by 16.65% from 2011 to 2012 and increased by 27.77% from 2011 to 2013.
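A minimal Python sketch of the Laspeyres calculation for this example, assuming the table's (price, quantity) pairs are stored in a dictionary keyed by year. The function name `laspeyres` is hypothetical; the loop also applies the (PI − 100)% rule for the change in price level.

```python
# Example 1 data: (price, quantity) for each good in each year
data = {
    2011: {"Coffee": (20.5, 13.1), "Sugar": (5.8, 17.1), "Milk": (10.2, 48.9)},
    2012: {"Coffee": (23.9, 12.8), "Sugar": (6.2, 17.7), "Milk": (12.1, 45.4)},
    2013: {"Coffee": (25.6, 11.2), "Sugar": (7.1, 16.2), "Milk": (13.3, 39.2)},
}


def laspeyres(base, current):
    """Hypothetical helper: PI_L = sum(p_t * q_0) / sum(p_0 * q_0) * 100.

    Quantities are always taken from the base year.
    """
    numerator = sum(current[g][0] * base[g][1] for g in base)
    denominator = sum(base[g][0] * base[g][1] for g in base)
    return numerator / denominator * 100


for year in (2011, 2012, 2013):
    pi = laspeyres(data[2011], data[year])
    # Change in price level from the base year = (PI - 100)%
    print(f"{year}: PI_L = {pi:.4f}, change = {pi - 100:.2f}%")
```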
Paasche Price Index

The Paasche Price Index also reflects the change in the prices of a basket of goods relative to a
selected base year, but the basket is weighted by the quantities of the current year. When calculating
the index number for a year, we therefore need updated information on both the prices and the
quantities of the goods.
$$PI_P = \frac{\sum p_t q_t}{\sum p_0 q_t} \times 100$$
$p_0$ : price in base year
$p_t$ : price in year t
$q_t$ : quantity in year t
The index number in the base year is 100. A higher price level results in an index greater than 100
while a lower price level results in an index lower than 100.
Example 1
By using the information in the table below, calculate and interpret the Paasche price index for the
years 2012 and 2013, using 2011 as the base year.
Goods 2011 2012 2013

Price Quantity Price Quantity Price Quantity
Coffee $20.5 13.1 $23.9 12.8 $25.6 11.2
Sugar $5.8 17.1 $6.2 17.7 $7.1 16.2
Milk $10.2 48.9 $12.1 45.4 $13.3 39.2
Solution
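$$PI_P(2011) = \frac{20.5(13.1) + 5.8(17.1) + 10.2(48.9)}{20.5(13.1) + 5.8(17.1) + 10.2(48.9)} \times 100 = 100$$

$$PI_P(2012) = \frac{23.9(12.8) + 6.2(17.7) + 12.1(45.4)}{20.5(12.8) + 5.8(17.7) + 10.2(45.4)} \times 100 = 116.5262$$

$$PI_P(2013) = \frac{25.6(11.2) + 7.1(16.2) + 13.3(39.2)}{20.5(11.2) + 5.8(16.2) + 10.2(39.2)} \times 100 = 127.6058$$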
The price level increased by 16.53% from 2011 to 2012 and increased by 27.61% from 2011 to 2013.
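A minimal Python sketch of the Paasche calculation for this example, assuming the same (price, quantity) layout keyed by year. The function name `paasche` is hypothetical; note that both sums use the current-year quantities q_t.

```python
# Example data: (price, quantity) for each good in each year
data = {
    2011: {"Coffee": (20.5, 13.1), "Sugar": (5.8, 17.1), "Milk": (10.2, 48.9)},
    2012: {"Coffee": (23.9, 12.8), "Sugar": (6.2, 17.7), "Milk": (12.1, 45.4)},
    2013: {"Coffee": (25.6, 11.2), "Sugar": (7.1, 16.2), "Milk": (13.3, 39.2)},
}


def paasche(base, current):
    """Hypothetical helper: PI_P = sum(p_t * q_t) / sum(p_0 * q_t) * 100.

    Quantities are always taken from the current year.
    """
    numerator = sum(current[g][0] * current[g][1] for g in current)
    denominator = sum(base[g][0] * current[g][1] for g in current)
    return numerator / denominator * 100


for year in (2011, 2012, 2013):
    pi = paasche(data[2011], data[year])
    print(f"{year}: PI_P = {pi:.4f}, change = {pi - 100:.2f}%")
```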
Remark:
The Laspeyres Price Index and the Paasche Price Index are very similar; the main difference is the
quantities used. The Laspeyres Price Index uses base-year quantities, while the Paasche Price Index
uses current-year quantities.
Consumer Price Index

Economists and statisticians use different kinds of price indices to explain inflation/deflation
and/or the cost of living. One commonly used statistic is the Consumer Price Index (CPI), an
indicator of the inflation/deflation affecting consumers.
The CPI measures changes in the price levels of consumer goods and services generally purchased by
households. With the quantity and quality of the items in the basket held fixed, the CPI measures the
relative change over time in the total cost of a specified basket of consumer goods and services.
When compiling the CPI, two types of data are required:

1. expenditure weights of consumer goods and services; and
2. price movements of consumer goods and services.
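$$CPI = \sum (w \times I)$$
where w is the expenditure weight of the item and I is the item index:
$$I = \frac{\text{price of the item in the current year}}{\text{price of the item in the base year}} \times 100$$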
In Hong Kong, the Census and Statistics Department compiles separate CPI series for households in
different expenditure ranges. CPI(A) relates to households in the relatively low expenditure range,
CPI(B) to households in the medium expenditure range, and CPI(C) to households in the relatively high
expenditure range. The Composite CPI relates to all of the above households taken together. Separate
series are compiled because the expenditure patterns of households in different expenditure ranges
vary, as shown in the graph.
For more information about the Consumer Price Index in Hong Kong, you can refer to Introduction to
the Consumer Price Index: https://www.statistics.gov.hk/pub/B8XX0021.pdf
Example 2
Referring to the prices and expenditure weights indicated in the following table:
Item Expenditure weight Price in 2010 ($) Price in 2012 ($) Price in 2014 ($)
A 30% 50 55 58
B 60% 48 50.4 52.8
C 10% 30 39 42
(a) Compile the CPIs for 2012 and 2014, using 2010 as the base year.
(b) Find the percentage change in prices from 2010 to 2012.
(c) Find the percentage change in prices from 2012 to 2014.
Solution
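(a)
$$\text{CPI in 2012} = 0.3 \times \frac{55}{50} \times 100 + 0.6 \times \frac{50.4}{48} \times 100 + 0.1 \times \frac{39}{30} \times 100 = 109$$

$$\text{CPI in 2014} = 0.3 \times \frac{58}{50} \times 100 + 0.6 \times \frac{52.8}{48} \times 100 + 0.1 \times \frac{42}{30} \times 100 = 114.8$$

(b) Percentage change in prices from 2010 to 2012:
$$\frac{109 - 100}{100} \times 100\% = 9\%$$

(c) Percentage change in prices from 2012 to 2014:
$$\frac{114.8 - 109}{109} \times 100\% = 5.32\%$$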
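Example 2 can be reproduced with a short Python sketch, assuming the expenditure weights are entered as proportions (0.30, 0.60, 0.10) that sum to 1. The helper names `cpi` and `percentage_change` are illustrative, not from any library.

```python
# Example 2 data: expenditure weights (as proportions) and prices by year
weights = {"A": 0.30, "B": 0.60, "C": 0.10}
prices = {
    2010: {"A": 50, "B": 48, "C": 30},
    2012: {"A": 55, "B": 50.4, "C": 39},
    2014: {"A": 58, "B": 52.8, "C": 42},
}


def cpi(base_year, year):
    """Illustrative helper: CPI = sum(w * I), item index I = p_t / p_0 * 100."""
    return sum(
        w * prices[year][item] / prices[base_year][item] * 100
        for item, w in weights.items()
    )


def percentage_change(old, new):
    """Illustrative helper: percentage change from index `old` to index `new`."""
    return (new - old) / old * 100


cpi_2012 = cpi(2010, 2012)
cpi_2014 = cpi(2010, 2014)
print(f"CPI 2012 = {cpi_2012:.4f}, CPI 2014 = {cpi_2014:.4f}")
print(f"2010 to 2012: {percentage_change(100, cpi_2012):.2f}%")
print(f"2012 to 2014: {percentage_change(cpi_2012, cpi_2014):.2f}%")
```

Part (c) divides by the 2012 index, not by 100, because the change is measured from 2012.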
Example 3
The table below shows the Consumer Price Indices for group A (relatively low expenditure range),
group B (medium expenditure range) and group C (relatively high expenditure range) in 2017, 2018
and 2019.
CPI (A) CPI (B) CPI (C)

Year Index Index Index
2017 105.1 104.3 104.2
2018 107.9 106.7 106.5
2019 111.5 109.6 109.3
(a) Find the percentage change in prices from 2017 to 2018 for CPI(A), CPI(B), and CPI(C)
respectively.
(b) Find the percentage change in prices from 2018 to 2019 for CPI(A), CPI(B), and CPI(C)
respectively.
(c) Which household group experienced the highest percentage rise in prices from 2018 to 2019?
Solution
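(a) Percentage change in prices from 2017 to 2018 for
$$\text{CPI(A)} = \frac{107.9 - 105.1}{105.1} \times 100\% = 2.66\%$$
$$\text{CPI(B)} = \frac{106.7 - 104.3}{104.3} \times 100\% = 2.30\%$$
$$\text{CPI(C)} = \frac{106.5 - 104.2}{104.2} \times 100\% = 2.21\%$$

(b) Percentage change in prices from 2018 to 2019 for
$$\text{CPI(A)} = \frac{111.5 - 107.9}{107.9} \times 100\% = 3.34\%$$
$$\text{CPI(B)} = \frac{109.6 - 106.7}{106.7} \times 100\% = 2.72\%$$
$$\text{CPI(C)} = \frac{109.3 - 106.5}{106.5} \times 100\% = 2.63\%$$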
(c) Households in the relatively low expenditure range (CPI(A)) experienced the highest percentage
rise in prices (3.34%) from 2018 to 2019.
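A minimal Python sketch for Example 3, assuming each CPI series is stored as a dictionary of index values keyed by year. `percentage_change` is an illustrative helper, not a library function; part (c) is answered by picking the series with the largest 2018 to 2019 change.

```python
# Example 3 data: CPI(A), CPI(B) and CPI(C) index values by year
indices = {
    "CPI(A)": {2017: 105.1, 2018: 107.9, 2019: 111.5},
    "CPI(B)": {2017: 104.3, 2018: 106.7, 2019: 109.6},
    "CPI(C)": {2017: 104.2, 2018: 106.5, 2019: 109.3},
}


def percentage_change(old, new):
    """Illustrative helper: percentage change in prices between two index values."""
    return (new - old) / old * 100


changes = {}
for series, idx in indices.items():
    changes[series] = (percentage_change(idx[2017], idx[2018]),
                       percentage_change(idx[2018], idx[2019]))
    print(f"{series}: 2017-2018 {changes[series][0]:.2f}%, "
          f"2018-2019 {changes[series][1]:.2f}%")

# Part (c): series with the largest rise from 2018 to 2019
highest = max(changes, key=lambda s: changes[s][1])
print("Highest rise from 2018 to 2019:", highest)
```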