Yabebal Ayalew
Statistics Department, Addis Ababa University
1 Introduction
— Definition and Classification of Statistics
— Stages in Statistical Investigation
— Definition of Some Basic Terms
— Application, uses and limitations of statistics
— Types of variables and measurement scales
2 Method of Data Collection and Presentation
— Method of Data Collection
— Source and Type of Data
— Methods of Data Presentation
• Frequency Distribution
• Diagrammatic and/or Graphical Presentation of Data
• In the modern world of computers and information technology, the importance of statistics is widely recognized in all disciplines
• Statistics originated as a science of statehood and has gradually found applications in agriculture, economics, commerce, biology, medicine, industry, planning, education and so on
4 Statistics Department Probability and Statistics 22.2.2022
Introduction
Definition of Statistics
• Today there is hardly any walk of human life in which statistics cannot be applied. Hence, we are constantly bombarded with statistics and statistical information
• The words statistics and statistical are both derived from the Latin word status, which means a political state∗
— In earlier times, the application of statistics was limited to state affairs
— In the 19th century, the field of statistics came to include data analysis as its major component
• Over time, the applications of statistics have grown and its definition has changed accordingly
• The American Heritage Dictionary defines statistics as:
The mathematics of the collection, organization and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.
∗
From the New Latin statisticum collegium, meaning council of state, and the Italian word statista, meaning statesman or politician.
A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.
• Jon Kettenring, a former president of the American Statistical Association, defined statistics as:
...the science of learning from data ... It presents exciting opportunities for those who work as professional statisticians. Statistics is essential for the proper running of government, central to decision making in industry and a core component of modern educational curricula at all levels.
• In addition, the word statistics has two different senses, depending on whether it is used as a plural or a singular noun.
• A population is the collection of all objects possessing a common characteristic that is to be studied
— A population is defined with respect to time and space
— Example: Economics students of Addis Ababa University; more specifically, 2nd year Economics students of Addis Ababa University registered for the 2016/17 AY
— By this definition, a population does not necessarily refer to human beings. It can consist of chairs, oceans, bacteria, etc.
• A sample is a small portion (subset) of the population
— Small in terms of size
— Should be highly representative of the population
— Saves time and money, and can give greater accuracy
• A parameter is a number that summarizes some aspect of the population as a whole. A statistic is
a number computed from the sample data.
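To make the distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the 44 observations of the grouped frequency distribution example later in this chapter form the entire population; the sample size of 10 and the seed are arbitrary choices. The parameter µ is computed once from all values, while the statistic X̄ depends on which sample happens to be drawn.

```python
import random
import statistics

# Illustrative "population" (an assumption): the 44 observations used in
# the grouped frequency distribution example of this chapter.
population = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 56, 61,
    51, 49, 64, 50, 48, 65, 52, 56, 46, 52, 69,
    54, 49, 51, 47, 55, 55, 54, 42, 51, 64, 46,
    56, 55, 51, 54, 51, 60, 62, 43, 55, 54, 47,
]

# Parameter: summarizes the whole population, so it is a fixed number.
mu = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
rng = random.Random(2022)                # seeded so the run is reproducible
sample = rng.sample(population, k=10)    # simple random sample, no replacement
x_bar = statistics.mean(sample)

print(f"population mean (parameter) mu    = {mu:.2f}")
print(f"sample mean (statistic)     x_bar = {x_bar:.2f}")
```

Re-running with a different seed changes `x_bar` but never `mu`.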
• Statistics has applications in every scientific field. Some of the uses of statistics are:
1 Statistics presents facts in the form of numerical data
2 It condenses and summarizes a mass of data into a few presentable and precise figures
3 It facilitates comparison of data
4 It helps in formulating and testing hypotheses
5 It helps in predicting future trends
6 It helps in formulating policies
Limitations of Statistics
• Statistics is not suitable for the study of qualitative phenomena
— Unless there is an indirect method to quantify such phenomena, statistics cannot be applied to them
• Statistics does not study individuals
— Statistics does not give any specific importance to individual items; it deals with aggregates of objects
†
An approximate answer to the right question is worth a great deal more than a precise answer to the wrong
question.—The first golden rule of applied mathematics.
Introduction
Stages in Statistical Investigation
4 Presentation of data: The purpose of putting the organized data in graphs, charts and tables is twofold
— First, it is a visual way to look at the data, see what happened and make interpretations
— Second, it is usually the best way to show the data to others
5 Analysis of data
— It is the process of examining and summarizing data with the intent to extract useful information and develop conclusions
— In this stage, different types of inferential statistical methods are applied
6 Interpretation of results‡
— Interpretation means drawing valid conclusions from data, which form the basis of decision making
— Correct interpretation requires a high degree of skill and experience
‡
Analysis and interpretation of data are two sides of the same coin
Introduction
Scale of Measurement
• A variable is an attribute of a physical or abstract system whose value can vary among the units under consideration. Variables can be classified as quantitative or qualitative
• Quantitative variables are variables whose values are naturally expressed as numbers, e.g., weight, salary
• A quantitative variable can be either discrete or continuous
— A discrete variable is a variable whose possible values are separated by predefined gaps. We don't need a measuring device to know the next possible value, e.g., the number of students in a class
— A continuous variable is a variable whose values have no predefined gaps. We need a measuring device to determine the value of the variable, e.g., weight
• A qualitative variable§ is a variable whose values are not naturally expressed as numbers, e.g., gender, religion, political affiliation, military rank
— The four basic mathematical operations (addition, subtraction, multiplication and division) should not be applied to them
— Their values can be labeled with pseudo-numbers, e.g., bus numbers
§
It is also called a categorical variable
• According to Wikipedia¶, level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables
1 Nominal
2 Ordinal
3 Interval
4 Ratio
• The first two apply to categorical variables and the last two to quantitative variables
• Nominal Scale of Measurement
— The variable values are not ordered
— The only permissible mathematical operation is counting
— Examples: gender, religious affiliation and race
• Ordinal Scale of Measurement
— Rank order is possible
— The real difference between categories is not known
— Count, >, <, ≥ and ≤ are permissible operations
— Examples: thesis grade, military rank, academic rank, etc.
¶
https://en.wikipedia.org/wiki/Level_of_measurement
Methods of Data Collection and Presentation
Methods of Data Collection—Census Vs Sample Survey
• The word census is derived from the Latin verb censere, which, contrary to what one might expect, means not to count but rather to assess or, in terms closer to the world of statistics, to estimate ‖
Census
A census is the complete process of extracting information from each element of a population. A population census is the complete process of collecting, receiving, assessing, analyzing, publishing and distributing demographic, economic and social data that relate, at a given moment in time, to all the residents of a country or of a well-defined part of it
• A typical example of a census in our country is the population and housing census∗∗
— It is conducted every 10 years (Ethiopian Constitution, Article 103(4))
— Ethiopia has conducted three censuses (1984, 1994 and 2007)
— The 4th census was scheduled for 2017 but has not yet been conducted
‖
Americana Corporation of Canada. 1951. The Encyclopedia Americana. Montreal: Americana Corp. of
Canada.
∗∗
Countries like Japan, Canada and Australia conduct a population census every 5 years
• Census activities can be divided into three main stages: planning, data collection and producing the results††
• Planning: the end justifies the means
— The purpose and methodology of the census are determined
— The main strategic decisions are made
— Intermediate goals are defined
— Methods and means designed to achieve the goals of the census are developed
• Data collection: the most intensive stage
— Data are collected by direct contact with residences
— It requires complex logistical preparation
— It requires a public campaign to enlist the cooperation of the public, and a high level of skill in the field operation
††
http://www.cbs.gov.il/census/census/pnimi_sub_page_e.html?id_topic=1&id_subtopic=1
Advantages of Census
• Benchmark data may be obtained for future studies
• Detailed information about small sub-groups within the population is more likely to be available
• Provides a true measure of the population (no sampling error)
Disadvantages of Census
• It is costly in terms of money and time
• Not possible when the population is infinite
• Its reliability is compromised in areas with low literacy
• Reading Assignment: An important aspect of census enumerations is determining which individuals
can be counted. Broadly, three definitions can be used: de facto residence; de jure residence; and
permanent residence.
• A sample survey is a study that obtains data from a subset of a population in order to estimate population attributes.
• The definition makes it clear that a sample survey is a study of sample elements with the intention of estimating population parameters
• A sample survey is preferable when
— The population is infinite
— The budget is too small to cover all elements in the population
— Census results need to be updated
— The population is homogeneous
• To conduct a sample survey, we have to select elements from the population as sample elements
• The process of selecting sample elements from a population is called sampling. There are two types of sampling techniques: probability and non-probability sampling techniques
• Exercise: DKT Ethiopia has decided to conduct a survey about attitudes towards contraception. The target population for this study is residents of Addis Ababa who are older than 15. What sampling technique is appropriate? Why?
• Systematic random sampling: It is a means of selecting every kth element in the population as a sample element
— First, you need to have a complete list of the elements in the population
— Then you have to decide the sampling interval k such that k > 1, where N is the population size and n is the sample size:
$$k = \frac{N}{n}$$
rounded to the nearest integer
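The steps above can be sketched in Python as follows. Choosing a random start between 1 and k is the usual way to make the procedure random, but it is an assumption here since the steps above only define k; the function name `systematic_sample` and the frame of 500 numbered households are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select every k-th element from a complete sampling frame.

    frame: complete list of population elements (length N)
    n: desired sample size
    The random start in 1..k is an assumed (conventional) step.
    """
    N = len(frame)
    k = round(N / n)               # sampling interval, nearest integer
    if k <= 1:
        raise ValueError("k must exceed 1; n is too large for this frame")
    start = rng.randint(1, k)      # random start position (1-based)
    # Positions start, start + k, start + 2k, ... converted to 0-based indices
    return [frame[i - 1] for i in range(start, N + 1, k)]

# Usage: hypothetical frame of 500 numbered households, sample of 50 (k = 10)
frame = list(range(1, 501))
sample = systematic_sample(frame, 50, rng=random.Random(1))
print(len(sample), sample[:5])
```

When N/n is not an integer, rounding k can make the realized sample size differ slightly from n. Note also that Python's built-in `round` sends exact halves to the nearest even integer.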
• Bias in probability sampling: Errors in sample estimates are mostly caused by the following types of bias
1 Non-response bias: Occurs when respondents fail to answer the questions in the survey
2 Response bias: Occurs when respondents provide inaccurate answers
3 Selection bias: Occurs when some elements in the population have a higher chance of selection than others
4 Self-selection bias: A type of bias in which individuals voluntarily select themselves into a group, thereby potentially biasing the response of that group
5 Coverage bias: Occurs when population elements do not appear in the sampling frame
• Purposive Sampling: The elements are selected by the judgment of the researcher
— It is also known as judgment, selective or subjective sampling
• Quota Sampling: The population is segmented into groups just as in stratified sampling. Then judgment is used to select elements in each segment
— Quota sampling is the non-probability version of stratified sampling
• The disadvantage of non-probability sampling is that, since the sample may not be representative, it is hard to generalize the results to the population
• Once we have decided to conduct either a census or a sample survey, we have to think about the possible ways of collecting data
— Good data are essential for reaching sound conclusions
• We have two methods of data collection
— Primary Data Collection Methods: questionnaire, interview, observation, focus group
discussion, etc
— Secondary Data Collection Methods
• Primary data are data collected for the first time by the researcher, i.e., first-hand information
• Secondary data are data that have been collected and analyzed by somebody else and are given to a third party for further analysis
• All methods of primary data collection depend on a set of questions, i.e., the questions need to be formulated first
• A questionnaire is a set of printed or written questions with a choice of answers, devised for the purposes of a survey or statistical study
• A questionnaire can contain both closed-ended and open-ended questions
— An open-ended question is one in which you do not provide any standard answers to choose from
— A closed-ended question is one in which you provide the response categories, and the respondent just chooses one
• Having a good questionnaire improves the quality of the data collected
• Questionnaire design is a planned, thoughtful process based on systematic principles
Advantages
• Faster, less expensive, and requires fewer activities (e.g., field trips)
Disadvantages
• Not easily available, or not adequate
• May not meet the needs of the researcher
• May contain outdated, inaccurate or biased information
Methods of Data Collection and Presentation
Methods of Data Presentation—Frequency Distribution
• Frequency distribution is the organization of raw data in table form, using classes and frequencies
• Objectives of Frequency Distribution
— To organize the data in a meaningful, intelligible way
— To enable the reader to determine the nature or shape of the distribution
— To facilitate computational procedures for measures of average and spread
— To enable the researcher to draw charts and graphs for the presentation of data
— To enable the reader to make comparisons between different data sets
• Based on the type and nature of the variable, frequency distributions can be categorized as categorical, ungrouped and grouped frequency distributions
• Categorical Frequency Distribution is used to tabulate categorical variables
— The major components are class, tally, frequency and percentage
$$\% = \frac{f}{n} \times 100\%$$
where f is the frequency and n is the total number of values
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
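A categorical frequency distribution can be tabulated with Python's `collections.Counter`. The sketch below assumes the letters in the table above are ABO blood types (the wording of that example is not shown here) and prints the class, frequency and percentage using % = f/n × 100%.

```python
from collections import Counter

# Categorical data from the table above (25 observations)
data = """A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A""".split()

counts = Counter(data)   # frequency of each class
n = len(data)            # total number of values

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_type in ["A", "B", "O", "AB"]:
    f = counts[blood_type]
    print(f"{blood_type:<6}{f:>10}{100 * f / n:>8.0f}%")
print(f"{'Total':<6}{n:>10}{100:>8}%")
```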
• Exercise: A survey was taken on how much trust people place in the information they read on the Internet. Construct a categorical frequency distribution for the data, where A = trust in everything they read, M = trust in most of what they read, H = trust in about one-half of what they read, and S = trust in a small portion of what they read
M M M A H M S M H M
S M M M M A M M A M
M M H M M M H M H M
A M M M H M M M M M
• Cumulative frequency tells us how many observations are accumulated up to and including a particular distinct value
— There are two varieties of cumulative frequencies: less than type and more than type
• Exercise: The following data represent the number of hours of TV viewing per week (X) for 75
people
0 1 5 4 2 3 1 0 1 5 2 3 1 4 5
2 1 2 2 3 4 6 5 7 1 2 0 1 0 1
5 4 2 1 4 5 6 3 2 1 5 8 6 3 1
6 3 0 1 0 2 5 4 3 4 1 4 0 2 5
0 2 4 8 3 5 4 7 5 0 1 2 3 1 4
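A sketch of both cumulative frequency types for the TV viewing data, using only the standard library: the less than type is a running total from the smallest value upward, and the more than type is a running total from the largest value downward.

```python
from collections import Counter
from itertools import accumulate

# Hours of TV viewing per week for 75 people (data above)
hours = [
    0, 1, 5, 4, 2, 3, 1, 0, 1, 5, 2, 3, 1, 4, 5,
    2, 1, 2, 2, 3, 4, 6, 5, 7, 1, 2, 0, 1, 0, 1,
    5, 4, 2, 1, 4, 5, 6, 3, 2, 1, 5, 8, 6, 3, 1,
    6, 3, 0, 1, 0, 2, 5, 4, 3, 4, 1, 4, 0, 2, 5,
    0, 2, 4, 8, 3, 5, 4, 7, 5, 0, 1, 2, 3, 1, 4,
]

counts = Counter(hours)
values = sorted(counts)                 # distinct values of X
freq = [counts[x] for x in values]

# Less than type: number of observations <= x
less_than = list(accumulate(freq))
# More than type: number of observations >= x
more_than = list(accumulate(reversed(freq)))[::-1]

print("X   f  less than cf  more than cf")
for x, f, lcf, mcf in zip(values, freq, less_than, more_than):
    print(f"{x}  {f:>2}  {lcf:>12}  {mcf:>12}")
```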
57 61 57 57 58 57 61 54 68 56 61
51 49 64 50 48 65 52 56 46 52 69
54 49 51 47 55 55 54 42 51 64 46
56 55 51 54 51 60 62 43 55 54 47
Range = 69 − 42 = 27
$$LCL_i = LCL_{i-1} + W, \quad i = 2, 3, \ldots, K$$
• Unit of measurement is the smallest possible difference between one observation in the data set and the value that could come next
— If the data set consists of integer values, the unit of measurement is one
— Suppose the data set is 14.2, 12, 26. Since values are recorded to one decimal place, the unit of measurement is 14.3 − 14.2 = 0.1
UCL1 = 45, UCL2 = 49, UCL3 = 53, UCL4 = 57, UCL5 = 61, UCL6 = 65, UCL7 = 69
$$UCL_{i+1} = UCL_i + W, \quad i = 1, 2, \ldots, K-1$$
• Note that
$$W = |LCL_i - LCL_{i+1}| = |UCL_i - UCL_{i+1}| = UCL_i - LCL_i + U$$
For instance,
$$UCB_1 = UCL_1 + U/2 = 45 + 0.5 = 45.5$$
Class boundaries
41.5 − 45.5
45.5 − 49.5
49.5 − 53.5
53.5 − 57.5
57.5 − 61.5
61.5 − 65.5
65.5 − 69.5
$$W = UCB_i - LCB_i, \quad i = 1, 2, \ldots, K$$
For instance,
$$CM_1 = \frac{42 + 45}{2} = \frac{41.5 + 45.5}{2} = 43.5$$
$$CM_2 = CM_1 + W = 43.5 + 4 = 47.5$$
Note that the class mark is the representative value of each class
Class limit | Frequency | Less than cumulative frequency | More than cumulative frequency
42 − 45 | 2 | 2 | 44
46 − 49 | 7 | 9 | 42
50 − 53 | 8 | 17 | 35
54 − 57 | 16 | 33 | 27
58 − 61 | 5 | 38 | 11
62 − 65 | 4 | 42 | 6
66 − 69 | 2 | 44 | 2
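The construction above can be reproduced in Python. The sketch takes the first lower class limit (42), the class width W = 4, the number of classes K = 7 and the unit of measurement U = 1 directly from the worked example, and applies LCL_i = LCL_{i−1} + W, UCL_i = LCL_i + W − U, class boundaries at the limits ± U/2, and class marks at the midpoints.

```python
# The 44 observations above
data = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 56, 61,
    51, 49, 64, 50, 48, 65, 52, 56, 46, 52, 69,
    54, 49, 51, 47, 55, 55, 54, 42, 51, 64, 46,
    56, 55, 51, 54, 51, 60, 62, 43, 55, 54, 47,
]
LCL1, W, K, U = 42, 4, 7, 1   # first lower limit, width, classes, unit

rows = []
running = 0
for i in range(K):
    lcl = LCL1 + i * W                   # LCL_i = LCL_{i-1} + W
    ucl = lcl + W - U                    # from W = UCL_i - LCL_i + U
    lcb, ucb = lcl - U / 2, ucl + U / 2  # class boundaries
    mark = (lcl + ucl) / 2               # class mark
    f = sum(lcl <= x <= ucl for x in data)  # observations in the class
    running += f                         # less than cumulative frequency
    rows.append((lcl, ucl, lcb, ucb, mark, f, running))

n = len(data)
mcf = n                                  # more than cf of the first class
print("Limits  Boundaries   Mark   f  lcf  mcf")
for lcl, ucl, lcb, ucb, mark, f, lcf in rows:
    print(f"{lcl}-{ucl}   {lcb}-{ucb}  {mark}  {f:>2}  {lcf:>3}  {mcf:>3}")
    mcf -= f                             # more than cf of the next class
```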
112 100 127 120 134 118 105 110 109 112 110 118 117 116 118 122
107 112 114 115 118 117 118 122 106 110 114 114 105 109 116 108
110 121 113 120 119 111 104 111 120 113 120 117 105 110 118 112
114 114
Methods of Data Collection and Presentation
Methods of Data Presentation—Diagrammatic Presentation
• Although tables of numbers can be very informative, they can lack visual impact. Diagrams and graphs:
— Deliver the message instantly
— Summarize the key features of the data and represent them as a picture
• Pie chart is a type of graph in which a circle is divided into sectors that each represent a proportion of the whole
— It is best used to present the proportions of a sample
— It is most useful where one or two results dominate the findings
— It can represent the data summary as actual numbers or percentages
— Do not use it when there are a large number of categories
• Example: Consider the following data of employment category of 474 employees of a company.
Category | Frequency | Percent
Clerical | 363 | 76.6
Custodial | 27 | 5.7
Manager | 84 | 17.7
Total | 474 | 100
[Figure: pie chart of employment category (Clerical, Custodial, Manager)]
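To draw a pie chart by hand, each category needs its sector angle, which is its share of 360 degrees, 360 × f/n. A minimal sketch for the employment data above:

```python
# Employment category frequencies from the table above
freq = {"Clerical": 363, "Custodial": 27, "Manager": 84}
n = sum(freq.values())   # 474 employees

percent = {c: 100 * f / n for c, f in freq.items()}
angle = {c: 360 * f / n for c, f in freq.items()}  # sector angle in degrees

for c in freq:
    print(f"{c:<10}{freq[c]:>5}{percent[c]:>7.1f}%{angle[c]:>8.1f} deg")
```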
https://data.worldbank.org/indicator/ST.INT.ARVL?end=2019&locations=ET&start=2015
• Keep in mind that one tourist icon represents 50,000 tourists
• Bar Graph: a graph with rectangular bars, where the length or height of each bar is proportional to the value it represents
— Simple bar graph: This type of graph is appropriate for representing one variable
— Cluster bar graph: Used to present two categorical variables. The bars for the second variable are placed adjacent to each other within each category of the variable on the X axis
— Stacked bar graph: It is used to present two categorical variables. The graph looks like a simple bar graph, but each bar is partitioned into components of the second variable
• Example: Consider the following data presented in the table below. Draw simple, cluster, and
stacked bar graphs
Employment Category
Gender | Clerical | Custodial | Manager | Total
Male | 157 | 27 | 74 | 258
Female | 206 | 0 | 10 | 216
Total | 363 | 27 | 84 | 474
[Figure: simple, cluster and stacked bar graphs of employment category (Clerical, Custodial, Manager) by gender (Male, Female)]
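A stacked bar graph is drawn by placing each segment on top of the previous one, so every segment needs a bottom and a top height. The sketch below computes these for the employment-by-gender table, stacking Male below Female (an arbitrary ordering choice); a cluster bar graph would instead draw each count as its own bar starting at zero.

```python
# Employment category by gender, from the table above
table = {
    "Clerical": {"Male": 157, "Female": 206},
    "Custodial": {"Male": 27, "Female": 0},
    "Manager": {"Male": 74, "Female": 10},
}

# One bar per category, split into gender segments stacked from the bottom.
segments = {}
for category, by_gender in table.items():
    bottom = 0
    for gender, count in by_gender.items():
        segments[(category, gender)] = (bottom, bottom + count)
        bottom += count
    print(f"{category:<10} total bar height = {bottom}")
```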
Education in years | Average salary
8 | 13064.15
12 | 13241.87
14 | 15625.00
15 | 15610.60
16 | 22338.47
17 | 26904.55
18 | 32240.00
19 | 34764.07
20 | 36240.00
21 | 37500.00
[Figure: line graph of average salary against education in years]
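• Frequency Polygon: It is a line graph where the X axis is the class mark and the Y axis is either frequency or relative frequency
— The graph touches the X axis at the beginning and at the end: a class mark with zero frequency is added one class width before the first class and one class width after the last class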
• Example: Consider the following grouped frequency distribution and draw frequency polygon.
[Figure: frequency polygon (left) and relative frequency polygon (right); X axis: class mark, from 39.5 to 71.5]
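The points of the frequency polygon can be computed as below, assuming the class marks and frequencies of the grouped frequency distribution table in this section (class width 4). The two added zero-frequency points at 39.5 and 71.5 are what make the polygon touch the X axis at both ends.

```python
# Class marks and frequencies of the grouped distribution (44 observations)
marks = [43.5, 47.5, 51.5, 55.5, 59.5, 63.5, 67.5]
freq = [2, 7, 8, 16, 5, 4, 2]
W = 4  # class width

# Close the polygon with zero-frequency class marks one width
# below the first class and one width above the last class.
xs = [marks[0] - W] + marks + [marks[-1] + W]
ys = [0] + freq + [0]
n = sum(freq)
rel = [y / n for y in ys]   # relative frequency polygon uses the same xs

for x, y, r in zip(xs, ys, rel):
    print(f"{x:>5}  {y:>2}  {r:.3f}")
```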
• Ogive (oh-jive), sometimes called a cumulative frequency polygon, is a line graph where the X axis represents the class boundaries and the Y axis represents either the less than type or the more than type cumulative frequency
• Example: Consider the previous grouped frequency distribution and draw Ogive
Class limit | Class boundary | Class mark | Frequency | Relative frequency | lcf | mcf
42 − 45 | 41.5 − 45.5 | 43.5 | 2 | 0.045 | 2 | 44
46 − 49 | 45.5 − 49.5 | 47.5 | 7 | 0.159 | 9 | 42
50 − 53 | 49.5 − 53.5 | 51.5 | 8 | 0.182 | 17 | 35
54 − 57 | 53.5 − 57.5 | 55.5 | 16 | 0.364 | 33 | 27
58 − 61 | 57.5 − 61.5 | 59.5 | 5 | 0.114 | 38 | 11
62 − 65 | 61.5 − 65.5 | 63.5 | 4 | 0.091 | 42 | 6
66 − 69 | 65.5 − 69.5 | 67.5 | 2 | 0.045 | 44 | 2
Table: Grouped frequency distribution (lcf = less than cumulative frequency, mcf = more than cumulative frequency)
[Figure: ogive, cumulative frequency against class boundary (41.5 to 69.5)]
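For the ogive, the less than cumulative frequencies are plotted against the class boundaries, starting from 0 at the first lower boundary; the more than ogive plots n minus those values. A sketch using the grouped frequency distribution table above:

```python
boundaries = [41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5, 69.5]
freq = [2, 7, 8, 16, 5, 4, 2]

# Less than ogive: cumulative frequency up to each boundary, starting at 0
less_than = [0]
for f in freq:
    less_than.append(less_than[-1] + f)

# More than ogive: observations above each boundary, ending at 0
n = sum(freq)
more_than = [n - cf for cf in less_than]

for b, lc, mc in zip(boundaries, less_than, more_than):
    print(f"{b:>5}  less than: {lc:>2}  more than: {mc:>2}")
```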
• Histogram: It is a special type of bar graph in which there are no gaps between adjacent bars
— The X axis is the class boundary and the Y axis is either frequency or relative frequency
• Example: Consider the grouped frequency distribution that was used for ogive example and draw
histogram.
[Figure: histogram, frequency against class boundary (41.5 to 69.5)]
77 34 45 49 67 87 45 44 98 89
55 65 67 87 99 66 77 45 84 40
50 69 80 68 55 65 43 68 66 87
Stem | Leaf
3 | 4
4 | 0345559
5 | 055
6 | 556677889
7 | 77
8 | 047779
9 | 89
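A stem-and-leaf display for two-digit data can be generated by splitting each value into its tens digit (stem) and its units digit (leaf). This sketch assumes all values are two-digit integers, as in the data above:

```python
from collections import defaultdict

data = [77, 34, 45, 49, 67, 87, 45, 44, 98, 89,
        55, 65, 67, 87, 99, 66, 77, 45, 84, 40,
        50, 69, 80, 68, 55, 65, 43, 68, 66, 87]

# Sorting first keeps the leaves of each stem in increasing order
leaves = defaultdict(list)
for x in sorted(data):
    stem, leaf = divmod(x, 10)   # tens digit, units digit
    leaves[stem].append(str(leaf))

display = {stem: "".join(leaves[stem]) for stem in sorted(leaves)}
print("Stem | Leaf")
for stem, leaf_str in display.items():
    print(f"{stem:>4} | {leaf_str}")
```

Stems with no leaves are simply omitted by this sketch; a fuller version would print empty rows for them.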
• Variables are denoted by capital letters. Suppose X is a variable having n observations, i.e., x1, x2, · · · , xn
• Let xi represent the ith observation of variable X; i is called the index or subscript. The sum of all observations of variable X is denoted by the Greek capital letter sigma (Σ)
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
• Note: When no confusion can result, we often denote the sum of all observations of X simply by
$$\sum x = \sum_i x_i = \sum x_i$$
Summarizing Data
Measures of Central Tendency—Introduction
• $$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
• Suppose X has n observations and let α be an arbitrary non-zero constant; then
$$\sum_{i=1}^{n} \alpha x_i = \alpha x_1 + \alpha x_2 + \cdots + \alpha x_n = \alpha(x_1 + x_2 + \cdots + x_n) = \alpha \sum_{i=1}^{n} x_i$$
• Note: If a particular measure of central tendency or dispersion fails to satisfy some of these characteristics, the failure is considered a disadvantage
• The simple arithmetic mean is defined as the sum of all observations divided by the total number of observations.
• The simple arithmetic mean is the most familiar measure of central tendency. It is the first measure of central tendency (MCT) that comes to mind when we think of an average
• The simple arithmetic mean computed from a sample is denoted by a bar over the variable
— If the variable is denoted by X, then the mean is denoted as X̄ (pronounced X-bar)
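$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$
• The population mean is denoted by µ:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where N is the total number of observations in the population. Note that X̄ is a sample statistic and µ is a population parameter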
• Example: The numbers of tourists who visited Ethiopia from 2009 to 2014 were 427000, 468000, 523000, 597000, 681000 and 770000, respectively. Compute the average number of tourists per year.
• Solution: Let X be the number of tourists; then the average number of tourists is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{6}(x_1 + x_2 + x_3 + x_4 + x_5 + x_6) = \frac{1}{6}(427000 + 468000 + 523000 + 597000 + 681000 + 770000) = \frac{1}{6}(3466000) \approx 577667$$
Summarizing Data
Measures of Central Tendency—Types of MCT
• Interpretation: On average, about 577,667 tourists visited Ethiopia every year from 2009 to 2014.
• Example: The grades of a student on six examinations were 84, 91, 72, 68, 87 and 78. Find the arithmetic mean of the grades.
• Solution: Suppose Y represents the student's grades
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{6}(y_1 + y_2 + \cdots + y_6) = \frac{1}{6}(84 + 91 + 72 + 68 + 87 + 78) = \frac{1}{6}(480) = 80$$
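Both examples apply X̄ = (1/n)Σxi. A minimal Python check, where `arithmetic_mean` is just an illustrative helper name:

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: sum of observations / number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

tourists = [427000, 468000, 523000, 597000, 681000, 770000]  # 2009 to 2014
grades = [84, 91, 72, 68, 87, 78]

print(round(arithmetic_mean(tourists)))  # average tourists per year
print(arithmetic_mean(grades))           # mean examination grade
```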
• Exercise: The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean.
• Suppose the data have k distinct values, where x1 occurs f1 times, x2 occurs f2 times, ..., and xk occurs fk times. Then
$$\bar{X} = \frac{1}{f_1 + f_2 + \cdots + f_k}(f_1 x_1 + f_2 x_2 + \cdots + f_k x_k) = \frac{1}{\sum_{i=1}^{k} f_i}\left(\sum_{i=1}^{k} f_i x_i\right)$$
• Example: The numbers of cylinders of 32 automobiles were recorded and the following table was constructed
Number of cylinders (x) | Number of automobiles (f)
4 | 11
6 | 7
8 | 14
$$\bar{X} = \frac{1}{\sum_{i=1}^{k} f_i}\left(\sum_{i=1}^{k} f_i x_i\right) = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3}{f_1 + f_2 + f_3} = \frac{11(4) + 7(6) + 14(8)}{11 + 7 + 14} = \frac{198}{32} \approx 6.2 \approx 6$$
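The same calculation for the cylinder table, as a short Python sketch of the frequency-based mean:

```python
# Frequency table: number of cylinders (x) and number of automobiles (f)
x = [4, 6, 8]
f = [11, 7, 14]

total_f = sum(f)                                       # 32 automobiles
x_bar = sum(fi * xi for fi, xi in zip(f, x)) / total_f
print(total_f, x_bar)
```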
• Exercise: The hourly compensation costs (in U.S. dollars) for production workers in selected
countries are represented below. Compute the simple arithmetic mean
Class frequency
02.48 − 07.48 7
07.49 − 12.49 3
12.50 − 17.50 1
17.51 − 22.51 7
22.52 − 27.52 5
27.53 − 32.53 5
• The sum of the squares of the deviations of a set of numbers xi from their mean is always the least, i.e.,
$$\sum_{i=1}^{n} (x_i - \bar{X})^2 < \sum_{i=1}^{n} (x_i - A)^2, \quad A \neq \bar{X}$$
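A quick numerical check of this property using the grades example (mean 80). It holds because Σ(xi − A)² = Σ(xi − X̄)² + n(X̄ − A)², and the last term is positive whenever A ≠ X̄. The trial values of A below are arbitrary.

```python
grades = [84, 91, 72, 68, 87, 78]
mean = sum(grades) / len(grades)   # 80.0

def sum_sq_dev(data, a):
    """Sum of squared deviations of the data from the constant a."""
    return sum((x - a) ** 2 for x in data)

ss_mean = sum_sq_dev(grades, mean)
for a in [70, 78, 79.5, 80.5, 85]:
    # Any A different from the mean gives a strictly larger sum
    assert sum_sq_dev(grades, a) > ss_mean
print(ss_mean)
```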
• Suppose the data are partitioned into k groups, where the first group has n1 observations with mean X̄1, the second group has n2 observations with mean X̄2, ..., and the kth group has nk observations with mean X̄k. Then the combined (pooled) mean is computed as
$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} = \frac{1}{\sum_{i=1}^{k} n_i}\left(\sum_{i=1}^{k} n_i\bar{X}_i\right)$$
• Example: The average age of 45 female students in a class is 25 years and the average age of 35 male students is 30 years. What is the average age of the class?
• Solution:
$$\bar{X}_c = \frac{n_f \bar{X}_f + n_m \bar{X}_m}{n_f + n_m} = \frac{45(25) + 35(30)}{45 + 35} = \frac{2175}{80} \approx 27.2 \text{ years}$$
• Example: A company pays different hourly wage rates based on the experience of its employees. If 60 employees are paid 15.23 Birr per hour and 23 employees are paid 25 Birr per hour, what is the average hourly payment in this company?
• Solution: Let wage be denoted by X, n1 = 60, X̄1 = 15.23, n2 = 23, X̄2 = 25
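The combined mean formula can be sketched as a small Python function (`pooled_mean` is an illustrative name) and applied to both the age example and the wage example:

```python
def pooled_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Age: 45 female students (mean 25) and 35 male students (mean 30)
age = pooled_mean([45, 35], [25, 30])
# Wage: 60 employees at 15.23 Birr/hour and 23 employees at 25 Birr/hour
wage = pooled_mean([60, 23], [15.23, 25])
print(f"{age:.4f} years, {wage:.2f} Birr per hour")
```

For the wage example this gives (60 × 15.23 + 23 × 25)/83 = 1488.8/83, about 17.94 Birr per hour.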
• Exercise: Four groups of students, consisting of 15, 20, 10, and 18 individuals, reported mean
weights of 162, 148, 153, and 140 pounds (lb), respectively. Find the mean weight of all the students.
• Example: Suppose the mean of variable X is 23. A new variable Y has been created as
yi = xi + 3, i = 1, 2, · · · , n. Show that the mean of Y is 26
• Solution: By definition, the mean of Y is
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n}(x_i + 3) = \frac{1}{n}\left[\sum_{i=1}^{n} x_i + 3n\right] = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{3n}{n} = \bar{X} + 3 = 23 + 3 = 26$$
• Exercise: Suppose the mean of X is X̄. Show that the mean of Z = aX + b is aX̄ + b, where a and b are non-zero constants
• The observations of a variable X may not all be equally important; in that case we assign weights to the observations
• Suppose variable X has n observations, where x1 has a weight of w1, x2 has a weight of w2, ..., and xn has a weight of wn; then the weighted mean of X is
$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{1}{\sum_{i=1}^{n} w_i}\left(\sum_{i=1}^{n} w_i x_i\right)$$
• When all observations have equal weights, the simple arithmetic and weighted means are the same, i.e., if w1 = w2 = · · · = wn, then X̄w = X̄
— The simple arithmetic mean is a special case of the weighted arithmetic mean
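A minimal sketch of the weighted mean (`weighted_mean` is an illustrative name). With equal weights it reproduces the simple mean of the grades example, and using the frequencies as weights reproduces the cylinder example, since the frequency-based mean is a weighted mean with wi = fi.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_w

grades = [84, 91, 72, 68, 87, 78]
# Equal weights: reduces to the simple arithmetic mean
print(weighted_mean(grades, [1] * len(grades)))
# Frequencies as weights: the cylinder example (198 / 32)
print(weighted_mean([4, 6, 8], [11, 7, 14]))
```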
$$G = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
• The geometric mean is often used in business and economics to find average rates of change, average rates of growth, or average ratios
• Example: Suppose a variable X has the following three observations: 5, 10, 6. Find the geometric mean
Solution:
$$G = \sqrt[3]{x_1 \cdot x_2 \cdot x_3} = (5 \times 10 \times 6)^{1/3} = 6.69$$
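$$y_1 = y_0 + 0.05y_0 = 1.05y_0 \;\Rightarrow\; \frac{y_1}{y_0} = 1.05$$
$$y_2 = y_1 + 0.08y_1 = 1.08y_1 \;\Rightarrow\; \frac{y_2}{y_1} = 1.08$$
$$y_3 = y_2 + 0.77y_2 = 1.77y_2 \;\Rightarrow\; \frac{y_3}{y_2} = 1.77$$
The geometric mean of the growth factors y1/y0, y2/y1 and y3/y2 is
$$G = \sqrt[3]{\frac{y_1}{y_0} \times \frac{y_2}{y_1} \times \frac{y_3}{y_2}} = \sqrt[3]{1.05 \times 1.08 \times 1.77} \approx 1.26$$
i.e., an average growth of about 26% per period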
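A sketch of the geometric mean computed through logarithms (the mean of the logs, exponentiated), which avoids overflow when many values are multiplied. It is applied to the two examples above; `geometric_mean` here is an illustrative helper.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geometric_mean([5, 10, 6]), 2))   # first example
factors = [1.05, 1.08, 1.77]                   # y1/y0, y2/y1, y3/y2
g = geometric_mean(factors)
print(round(g, 2), f"average growth rate = {100 * (g - 1):.1f}%")
```

Python 3.8 and later also provide `statistics.geometric_mean`, which computes the same quantity.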
• Exercise: Suppose you have an investment which earns 10% the first year, 50% the second year, and
30% the third year. What is its average rate of return?
• The harmonic mean of a variable X with n observations is the reciprocal of the arithmetic mean of the reciprocals of the observations
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
• Example: Find the simple arithmetic mean, geometric mean and harmonic mean for the following data: 2, 4, 8, 6
$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4}} = \frac{4}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{6}} = \frac{4}{25/24} = 3.84$$
X̄ = 5 and G = 4.43
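The example can be checked with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The sketch also asserts the ordering H ≤ G ≤ X̄, which holds for positive data.

```python
import statistics

data = [2, 4, 8, 6]
x_bar = statistics.mean(data)
g = statistics.geometric_mean(data)   # Python 3.8+
h = statistics.harmonic_mean(data)    # n / sum(1 / x_i)
print(f"mean = {x_bar}, G = {g:.2f}, H = {h:.2f}")
assert h <= g <= x_bar                # H <= G <= arithmetic mean
```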
• Exercise: Four students drive from Jimma to Addis Ababa at a speed of 40 km/hr. Since they need
to reach statistics class on time, they return at a speed of 60 km/hr. What is their average speed for
the round trip?
• The geometric mean of positive numbers x1, x2, · · · , xn is less than or equal to their arithmetic mean but greater than or equal to their harmonic mean. In symbols,
$$H \le G \le \bar{X}$$
The equality signs hold when all the numbers x1, x2, · · · , xn are identical, i.e., x1 = x2 = · · · = xn
• Exercise: Show that H = G = X̄ if x1 = x2 = · · · = xn
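• Mode is the most frequently observed value of a variable, i.e., the value of X with the highest frequency
— Mode may not exist; even if it does exist, it may not be unique
— Mode is denoted by a hat over the variable, like X̂
• For grouped data,
$$\hat{X} = LCB_m + \frac{\Delta_1}{\Delta_1 + \Delta_2} W$$
where LCB_m is the lower class boundary of the modal class, Δ1 and Δ2 are the differences between the frequency of the modal class and the frequencies of the preceding and following classes, respectively, and W is the class width. For instance,
$$\hat{X} = 17.5 + \frac{5}{5 + 3}(6) = 21.25$$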
Median
The median of a data set is the measure of center that is the middle value when the original data
values are arranged in order of increasing (or decreasing) magnitude.
• The median is denoted by a tilde over the variable (like X̃)
• Median for Grouped Data: The median can be obtained by using the following formula
$$\tilde{X} = LCB_{med} + \frac{\frac{n}{2} - C}{f_{med}} W$$
where LCB_med is the lower class boundary of the median class, C is the lcf of the class that comes before the median class, f_med is the frequency of the median class, W is the class width and n is the total frequency. For instance,
$$\tilde{X} = LCB_{med} + \frac{\frac{n}{2} - C}{f_{med}} W = 17.5 + \frac{10 - 4}{7}(6) \approx 22.6$$
• For a negatively skewed distribution, the median and the mode typically lie to the right of the mean, i.e.,
$$\bar{X} < \tilde{X} < \hat{X}$$
• In a positively skewed frequency distribution, the median and the mode typically lie to the left of the mean, i.e.,
$$\bar{X} > \tilde{X} > \hat{X}$$
• For a symmetric (unimodal) distribution, the mean, median and mode are equal
$$\bar{X} = \tilde{X} = \hat{X}$$
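The grouped mode and median formulas can be sketched as two small functions. They first reproduce the worked examples (Δ1 = 5, Δ2 = 3, W = 6 for the mode; n/2 = 10, C = 4, f_med = 7, W = 6 for the median), then are applied to the 44-observation grouped table of this chapter, where the class 53.5 − 57.5 (f = 16) is both the modal and the median class. The grouped mean uses class marks (Σ f·CM / n), the usual grouped-data approximation; for that table the three measures come out in the order X̄ < X̃ < X̂, the pattern described above for a negatively skewed distribution.

```python
def grouped_mode(lcb, d1, d2, w):
    """Mode = LCB_m + d1 / (d1 + d2) * W for the modal class."""
    return lcb + d1 / (d1 + d2) * w

def grouped_median(lcb, n, c, f_med, w):
    """Median = LCB_med + (n/2 - C) / f_med * W for the median class."""
    return lcb + (n / 2 - c) / f_med * w

# Worked examples in the text
mode_ex = grouped_mode(17.5, 5, 3, 6)
median_ex = grouped_median(17.5, 20, 4, 7, 6)   # n/2 = 10 means n = 20

# 44-observation table: neighbouring frequencies 8 and 5, C = 17, W = 4
marks = [43.5, 47.5, 51.5, 55.5, 59.5, 63.5, 67.5]
freq = [2, 7, 8, 16, 5, 4, 2]
mean44 = sum(f * m for f, m in zip(freq, marks)) / sum(freq)
median44 = grouped_median(53.5, 44, 17, 16, 4)
mode44 = grouped_mode(53.5, 16 - 8, 16 - 5, 4)

print(mode_ex, round(median_ex, 2))
print(round(mean44, 2), median44, round(mode44, 2))
```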
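$$k = \frac{i(n+1)}{100}$$
For example, for the 50th percentile of n = 10 observations,
$$k = \frac{50(10+1)}{100} = 5.5 = 5 + 0.5$$
so I = 5 (the integer part of k) and f = 0.5 (the fractional part)
Compute I + 1
4 Find the Ith and (I + 1)th observations in the sorted data set
Then the percentile is obtained by interpolating between these two observations:
$$P_i = x_{(I)} + \left(x_{(I+1)} - x_{(I)}\right) f$$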
• Example: Find P25, P30, P46 and P85 for the following data
— For P25, i = 25
$$k = \frac{25(16+1)}{100} = 4.25, \quad I = 4, \; f = 0.25$$
The 4th and 5th observations are 5 and 6, respectively. Then,
$$P_{25} = 5 + (6 - 5)(0.25) = 5.25$$
— For P30, i = 30
$$k = \frac{30(16+1)}{100} = 5.1, \quad I = 5, \; f = 0.1$$
The 5th and 6th observations are 6 and 6, respectively. Then,
$$P_{30} = 6 + (6 - 6)(0.1) = 6$$
— For P46, i = 46
$$k = \frac{46(16+1)}{100} = 7.82, \quad I = 7, \; f = 0.82$$
The 7th and 8th observations are 7 and 8, respectively. Then,
$$P_{46} = 7 + (8 - 7)(0.82) = 7.82$$
— For P85, i = 85
$$k = \frac{85(16+1)}{100} = 14.45, \quad I = 14, \; f = 0.45$$
The 14th and 15th observations are 23 and 36, respectively. Then,
$$P_{85} = 23 + (36 - 23)(0.45) = 28.85$$
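A sketch of the (n + 1)-based percentile rule, k = i(n + 1)/100 with linear interpolation between the Ith and (I + 1)th sorted observations. The 16-observation data set of the example is not reproduced here, so the sketch is exercised on the six grades from the arithmetic mean example. Clamping k to the range [1, n] is an assumption for extreme percentiles, which the steps above do not cover.

```python
def percentile(data, i):
    """P_i with k = i(n + 1)/100 = I + f and linear interpolation.

    Takes the I-th sorted value and moves a fraction f of the way to the
    (I + 1)-th value. Clamping k to [1, n] is an assumed edge-case rule.
    """
    x = sorted(data)
    n = len(x)
    k = i * (n + 1) / 100
    if k <= 1:
        return x[0]
    if k >= n:
        return x[-1]
    I = int(k)          # integer part (1-based position)
    f = k - I           # fractional part
    return x[I - 1] + f * (x[I] - x[I - 1])

grades = [84, 91, 72, 68, 87, 78]
print(percentile(grades, 25), percentile(grades, 50))
```

With six grades, P50 gives k = 3.5 and lands halfway between the 3rd and 4th sorted values, which matches the median.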
$$P_i = l_i + \frac{\frac{i n}{100} - C}{f_{p_i}} W$$
where l_i is the lower class boundary of the ith percentile class, f_{p_i} is the frequency of the ith percentile class, C is the lcf that comes before the ith percentile class and W is the class width
• Example: Find the 20th, 75th and 60th percentiles for the following data set
• Solution:
— The 20th percentile
$$\frac{i n}{100} = \frac{20(45)}{100} = 9$$
The 9th observation lies in the 50 − 59 class. Thus, C = 6, W = 10, f_{P20} = 5 and l_{20} = 49.5
$$P_{20} = l_{20} + \frac{\frac{20n}{100} - C}{f_{P_{20}}} W = 49.5 + \frac{9 - 6}{5}(10) = 55.5$$
— The 75th percentile
$$P_{75} = l_{75} + \frac{\frac{75n}{100} - C}{f_{P_{75}}} W = 69.5 + \frac{33.75 - 20}{25}(10) = 75$$
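A sketch of the grouped percentile formula (`grouped_percentile` is an illustrative name). The caller must first locate the class containing the (in/100)th observation and read off l, C, f and W, exactly as in the worked example; those values are reused here. As a consistency check, P50 of the 44-observation table in this chapter should equal its grouped median, 54.75.

```python
def grouped_percentile(i, n, l, c, f, w):
    """P_i = l + (i*n/100 - C) / f * W.

    l: lower class boundary of the percentile class
    c: less than cumulative frequency before that class
    f: frequency of the percentile class; w: class width
    """
    # Multiply by w before dividing by f to keep the arithmetic exact here
    return l + (i * n / 100 - c) * w / f

p20 = grouped_percentile(20, 45, 49.5, 6, 5, 10)
p75 = grouped_percentile(75, 45, 69.5, 20, 25, 10)
# 44-observation table: 22nd value is in class 53.5-57.5 (C = 17, f = 16)
p50_44 = grouped_percentile(50, 44, 53.5, 17, 16, 4)
print(p20, p75, p50_44)
```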
Note: The 50th percentile is equal to the median, i.e., P50 = X̃
Decile
Deciles divide a data set into ten equal parts. They are denoted by Di, i = 1, 2, · · · , 9
Quartile
Quartiles divide a data set into four equal parts. They are denoted by Qi, i = 1, 2, 3
Q1 = P25, Q2 = P50, Q3 = P75
• Note: As you may have noticed, deciles and quartiles are special cases of percentiles; for example, D1 = P10
Temperature Frequency
96.5 − 96.8 1
96.9 − 97.2 8
97.3 − 97.6 14
97.7 − 98.0 22
98.1 − 98.4 19
98.5 − 98.8 32
98.9 − 99.2 6
99.3 − 99.6 4
Compute
1 D1 and D3
2 35th percentile
3 Q1 and Q3
4 50th and 69th percentiles
Summarizing Data
Measure of Dispersion
• Dispersion is the scatter or spread of the individual items in a given series. The term dispersion is generally used in two senses
— Dispersion refers to the variation of the items among themselves
— Dispersion refers to the variation of the items around an average
• If the differences between the values of the items and the average are large, the dispersion is high; if these differences are small, the dispersion is low
• Objectives of Measures of Dispersion
1 To determine the reliability of an average: If the variation is small, the average closely represents the individual values and is highly representative. On the other hand, if the variation is large, the average will be quite unreliable
2 To compare the variability of two or more data sets: A high degree of variation means less consistency or less uniformity compared to data having less variation
Range
It is the simplest measure of dispersion. It is defined as the difference between the largest and smallest values in the data set
$$R = Max - Min$$
Relative Range
The relative measure of range, also called the coefficient of range, is defined as
$$RR = \frac{Max - Min}{Max + Min}$$
Note: For a grouped frequency distribution, Max is the upper class limit of the last class and Min is the lower class limit of the first class
• Example: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the range and the coefficient of range
• Solution: Max = 35, Min = 15
$$R = Max - Min = 35 - 15 = 20$$
$$RR = \frac{Max - Min}{Max + Min} = \frac{R}{Max + Min} = \frac{20}{35 + 15} = 0.4$$
• Example: Compute the range for the following grouped data set
Size 5 − 10 11 − 15 16 − 20 21 − 25 26 − 30
Frequency 4 9 15 30 40
Max = 30, Min = 5, R = Max − Min = 30 − 5 = 25
Inter-Quartile Range
The IQR is the range of the middle 50% of the observations, i.e., the difference between the upper quartile and the lower quartile
$$IQR = Q_3 - Q_1$$
Quartile deviation
The quartile deviation, also called the semi-inter-quartile range, is half of the difference between the upper and lower quartiles
$$QD = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2}$$
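• Example: Find the inter-quartile range, quartile deviation and coefficient of quartile deviation from the following data.
15, 18, 20, 24, 27, 28, 30
• Solution: With n = 7, Q1 is the (7 + 1)/4 = 2nd observation and Q3 is the 3(7 + 1)/4 = 6th observation of the sorted data, so Q1 = 18 and Q3 = 28
$$IQR = Q_3 - Q_1 = 28 - 18 = 10$$
$$QD = \frac{IQR}{2} = 5$$
$$Coeff.QD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{10}{46} = 0.22$$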
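The quartiles in this solution are the (n + 1)/4th and 3(n + 1)/4th ordered observations. Python's standard `statistics.quantiles` with `method="exclusive"` uses the same (n + 1)-based positions, so the whole calculation might be sketched as follows (the function name `quartile_summary` is our own; other books and software use different quartile conventions and can give slightly different values):

```python
import statistics

def quartile_summary(data):
    """IQR, quartile deviation and coefficient of QD from (n + 1)-based quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,                        # Q3 - Q1
        "QD": iqr / 2,                     # semi-inter-quartile range
        "coef_QD": (q3 - q1) / (q3 + q1),  # (Q3 - Q1) / (Q3 + Q1)
    }

s = quartile_summary([15, 18, 20, 24, 27, 28, 30])
print(s["Q1"], s["Q3"], s["IQR"], s["QD"], round(s["coef_QD"], 2))
# 18.0 28.0 10.0 5.0 0.22
```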
• Exercise: Compute quartile deviation, inter-quartile range and coefficient of quartile deviation for the
following grouped frequency distributions.
Mean Deviation
Consider a set of observations of variable X, x1, x2, · · · , xn. The mean or average deviation (MD), also called the mean absolute deviation, is defined as
$$MD = \frac{1}{n}\sum_{i=1}^{n}|x_i - A|$$
where A is a measure of central tendency: the mean, the median or the mode.
For a grouped frequency distribution,
$$MD = \frac{1}{\sum_{i=1}^{k} f_i}\sum_{i=1}^{k} f_i\,|x_i - A|$$
where fi and xi are the frequency and class mark of the ith class, respectively
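Both mean deviation formulas translate directly into code. A minimal sketch; the function names and the small data set and frequency table in the usage lines are hypothetical illustrations, not taken from the slides:

```python
from statistics import mean, median

def mean_deviation(xs, center):
    """MD = (1/n) * sum |x_i - A| for ungrouped data, A being the chosen center."""
    return sum(abs(x - center) for x in xs) / len(xs)

def grouped_mean_deviation(marks, freqs, center):
    """MD = sum f_i |x_i - A| / sum f_i, where x_i are the class marks."""
    return sum(f * abs(x - center) for x, f in zip(marks, freqs)) / sum(freqs)

data = [2, 4, 6, 8]                           # hypothetical data
print(mean_deviation(data, mean(data)))       # about the mean (5): 2.0
print(mean_deviation(data, median(data)))     # about the median (5.0): 2.0
print(grouped_mean_deviation([10, 20, 30], [1, 2, 1], 20))  # hypothetical table: 5.0
```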
28, 23, 56, 89, 55, 63, 47, 56, 41, 46, 22
• The variance computed from a population is denoted by the Greek letter sigma squared (σ²)
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
• Reading Assignment: In some statistics books, the sample variance is defined as the mean of the squared deviations of the observations from the center (mean), i.e.,
$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2$$
This formula is not advised as an estimator of the population variance. Why?
Summarizing Data
Measure of Dispersion—Variance and Standard Deviation
• The standard deviation of a statistical data set is defined as the positive square root of the variance, i.e.,
$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2}$$
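• Example: For the data 2, 4, 6, 8, the mean is X̄ = (2 + 4 + 6 + 8)/4 = 5. The variance is
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2 = \frac{(2-5)^2 + (4-5)^2 + (6-5)^2 + (8-5)^2}{4-1} = \frac{20}{3} = 6.67$$
The standard deviation is
$$S = \sqrt{6.67} = 2.58$$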
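Python's standard `statistics` module offers both divisors, which also makes the Reading Assignment concrete: `variance`/`stdev` divide by n − 1 as in these slides, while `pvariance` divides by n. A short check on the same four values:

```python
import statistics

data = [2, 4, 6, 8]
s2 = statistics.variance(data)     # sample variance, divisor n - 1
s = statistics.stdev(data)         # positive square root of the sample variance
s2_n = statistics.pvariance(data)  # divisor n (the "mean of squared deviations")
print(round(s2, 2), round(s, 2), float(s2_n))  # 6.67 2.58 5.0
```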
• Example: Compute the variance and standard deviation of the following data
• Exercise: The number of highway miles per gallon of the 10 worst vehicles is shown.
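For a grouped frequency distribution, the sample variance is
$$S^2 = \frac{1}{\sum_{i=1}^{k} f_i - 1}\sum_{i=1}^{k} f_i(x_i - \bar{X})^2$$
where xi and fi are the class mark and frequency of the ith class, and k is the number of classes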
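A minimal sketch of the grouped formula, computing the grouped mean first and then the weighted squared deviations. The function name and the three-class table are hypothetical:

```python
def grouped_variance(marks, freqs):
    """Sample variance of grouped data: sum f_i (x_i - mean)^2 / (sum f_i - 1)."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(marks, freqs)) / n          # grouped mean
    ss = sum(f * (x - xbar) ** 2 for x, f in zip(marks, freqs))  # weighted squared deviations
    return ss / (n - 1)

var = grouped_variance([10, 20, 30], [1, 2, 1])  # hypothetical table
print(round(var, 2), round(var ** 0.5, 2))       # 66.67 8.16
```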
• Exercise: Compute the variance and standard deviation for the following grouped frequency
distributions and identify which frequency distribution is more dispersed.
Empirical Rule
For moderately symmetrical data
$$QD \approx \frac{2}{3}S, \qquad MD \approx \frac{4}{5}S$$
• Exercise: Suppose the variance of X is S². Show that the variance of Z = X + a, where a is a constant, is also S².
• In a bell-shaped (normal) distribution
1 About 68.27% of the observations lie within one standard deviation of the mean
2 About 95.45% of the observations lie within two standard deviations of the mean, i.e., in X̄ ± 2S
3 About 99.73% of the observations lie within three standard deviations of the mean
Chebyshev’s Theorem
The proportion of values from a data set that will fall within k standard deviations of the mean will be
at least 1 − (1/k2 ), where k is a number greater than 1 (k is not necessarily an integer).
• For k = 2, 1 − 1/2² = 3/4, so the theorem states that at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean of the data set
• Example: The mean house rent in a certain neighborhood is 4,000 Br, and the standard deviation is 200 Br. Find the rent range within which at least 75% of the houses fall.
• Solution: By Chebyshev’s theorem, at least three-fourths, or 75%, of the data values fall within 2 standard deviations of the mean. Thus,
$$\bar{X} \pm 2S = 4000 \pm 2(200) = (3600, 4400)$$
Hence, at least 75% of all homes rented in the area will have a rent between 3600 Br and 4400 Br.
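The bound and the interval from the example can be sketched in a few lines. The function names are our own; the only inputs taken from the slides are k = 2, the mean 4,000 Br and the standard deviation 200 Br:

```python
def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the proportion within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def k_sd_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

print(chebyshev_min_proportion(2))    # 0.75
print(k_sd_interval(4000, 200, 2))    # (3600, 4400)
print(chebyshev_min_proportion(1.5))  # k need not be an integer
```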
• Exercise: A survey of local companies found that the mean travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30
Coefficient of Variation
The coefficient of variation, denoted by CV, is the standard deviation divided by the mean. The result is expressed as a percentage
$$CV = \frac{S}{\bar{X}} \times 100\%$$
• Example: Compare the variation in heights of men to the variation in weights of men, using these sample results obtained from a data set. For men, the heights yield X̄ = 68.34 in. and Sx = 3.02 in.; the weights yield Ȳ = 172.55 lb and Sy = 26.33 lb.
Source: Mario F. Triola (2010). Elementary Statistics Using Excel, 4th ed. Pearson Education, Inc., USA
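• Solution: Height
$$CV = \frac{S_x}{\bar{X}} \times 100\% = \frac{3.02\text{ in}}{68.34\text{ in}} \times 100\% = 4.42\%$$
Weight
$$CV = \frac{S_y}{\bar{Y}} \times 100\% = \frac{26.33\text{ lb}}{172.55\text{ lb}} \times 100\% = 15.26\%$$
We can see that heights have considerably less variation than weights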
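Because CV is unit-free, the two percentages can be compared directly. A one-function sketch (the name `coefficient_of_variation` is our own) reproducing the numbers above:

```python
def coefficient_of_variation(sd, mean):
    """CV = S / mean * 100, in percent (unit-free)."""
    return sd / mean * 100

cv_height = coefficient_of_variation(3.02, 68.34)    # inches / inches
cv_weight = coefficient_of_variation(26.33, 172.55)  # pounds / pounds
print(round(cv_height, 2), round(cv_weight, 2))      # 4.42 15.26
```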
• Exercise: The weekly income (in Birr) of 10 men and 15 women workers are listed below. Whose
weekly income is more dispersed?
Women
Men 254 250 123 352 142 22
14 19 20 30 100
458 100 200 235 224 162
125 236 300 142 63
364 122 12 32
3, 8, 6, 14, 4, 12, 7, 10
For these data, X̄ = 8 and S = 3.82. Then, for the value 14,
$$z = \frac{x_i - \bar{X}}{S} = \frac{14 - 8}{3.82} = 1.57$$
Interpretation: The value 14 lies 1.57 standard deviations above the mean
Summarizing Data
Measure of Dispersion—Z-score (Standardized value)
• Example: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of
10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her
relative positions on the two tests.
• Solution:
$$Z_{calculus} = \frac{\text{Calculus result} - \text{Mean}}{S} = \frac{65 - 50}{10} = 1.5$$
$$Z_{history} = \frac{\text{History result} - \text{Mean}}{S} = \frac{30 - 25}{5} = 1.0$$
Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class
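Both z-score examples can be checked with a small helper (the name `z_score` is our own). The sample standard deviation here uses the n − 1 divisor, which is what gives S = 3.82 for the eight values above:

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

data = [3, 8, 6, 14, 4, 12, 7, 10]
xbar, s = statistics.mean(data), statistics.stdev(data)  # 8 and about 3.82
print(round(s, 2), round(z_score(14, xbar, s), 2))       # 3.82 1.57

# Calculus vs. history test scores from the example
print(z_score(65, 50, 10), z_score(30, 25, 5))           # 1.5 1.0
```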
Outliers
An outlier is an extremely high or an extremely low data value when compared with the rest of the data
values.
• Note: An outlier can strongly affect the mean and standard deviation of a variable
• A number of rules of thumb have been proposed to identify unusual observations in a data set
— If we know the mean and standard deviation of the collected data, we can roughly estimate the minimum and maximum usual sample values as X̄ − 2S and X̄ + 2S, respectively:
— Ordinary values: −2 ≤ z ≤ 2
— Using the quartiles, any observation that is less than Q1 − 1.5IQR or greater than Q3 + 1.5IQR will be a potential outlier
• Example: Check the following dataset for outliers
• Solution:
• The graphical technique that uses 1.5IQR to detect the presence of outliers is called the Box and Whisker plot, or box plot for short
• A box plot displays the 5-number summary, i.e., the minimum, Q1, Q2 = median, Q3 and the maximum
• Procedure to Construct a Box Plot:
1 Find the 5-number summary
2 Construct a scale with values that include the minimum and maximum
3 Construct a box (rectangle) extending from Q1 to Q3 and draw a line in the box at the median value
4 Draw lines (whiskers) extending outward from the box to the minimum and maximum data values
• In the modified box plot, the whiskers extend only to the most extreme observations lying within Q1 − 1.5IQR and Q3 + 1.5IQR; observations beyond these limits are plotted individually as potential outliers
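The 5-number summary and the 1.5IQR fences behind the modified box plot can be sketched as below. Two assumptions: the quartiles use the (n + 1)-based convention of `statistics.quantiles(..., method="exclusive")`, and the data set is hypothetical (the nicotine data are not reproduced here). Drawing the plot itself is left out.

```python
import statistics

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) with (n + 1)-based quartiles."""
    xs = sorted(data)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return xs[0], q1, q2, q3, xs[-1]

def iqr_outliers(data):
    """Observations below Q1 - 1.5 IQR or above Q3 + 1.5 IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data
print(five_number_summary(sample))  # (1, 2.75, 5.5, 8.25, 100)
print(iqr_outliers(sample))         # [100]
```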
• Example: For the 24 amounts of nicotine (in mg per cigarette), find the 5-number summary and draw a box plot
The two dots at the left side of the box plot are outliers
• Skewness is a measure of symmetry, or more precisely, of the lack of symmetry or departure from symmetry, i.e., it measures whether the distribution is stretched out more to the right or to the left
• Skewness can be measured in absolute terms by taking the difference between the arithmetic mean and the mode
— The absolute measure of skewness is
$$Sk = Mean - Mode$$
• If the arithmetic mean is greater than the mode, the skewness is positive, and if the mode is greater than the mean, the skewness is negative
$$Sk \begin{cases} > 0, & \text{the distribution is positively skewed} \\ = 0, & \text{the distribution is symmetric} \\ < 0, & \text{the distribution is negatively skewed} \end{cases}$$
[Figure: symmetric, positively skewed and negatively skewed distributions]
• Pearsonian Coefficient of Skewness: Karl Pearson’s formula for skewness indicates the direction as well as the extent of skewness.
$$Sk = \frac{Mean - Mode}{Standard\ Deviation}$$
• Bowley’s Coefficient of Skewness: Bowley’s coefficient of skewness is based on quartiles. This measure of skewness, based on the distances of the quartiles from the median, is defined as follows:
$$Sk = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$
• Example: The sum of fifteen observations, whose mode is 8, was found to be 150 with coefficient of
variation of 20%
1 Calculate the Pearsonian coefficient of skewness and give appropriate conclusion
2 Are smaller values more or less frequent than bigger values for this distribution?
3 If a constant 4 was added on each observation, what will be the new Pearsonian coefficient of
skewness? Show your steps. What do you conclude from this?
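• Solution: X̂ = 8, n = 15, Σxi = 150, so X̄ = 150/15 = 10
1 Since
$$CV = 20\% = \frac{S}{\bar{X}} \times 100\%, \quad S = 0.2 \times 10 = 2$$
$$Sk = \frac{\bar{X} - \hat{X}}{S} = \frac{10 - 8}{2} = 1$$
The distribution is positively skewed.
2 According to the Sk value, smaller values are more frequent than bigger values
3 If a constant is added to each observation, then the mean and mode shift by that constant. However, the standard deviation doesn’t change. Therefore, X̄ = 14, X̂ = 12 and S = 2
$$Sk = \frac{\bar{X} - \hat{X}}{S} = \frac{14 - 12}{2} = 1$$
Hence, adding a constant to every observation does not change the Pearsonian coefficient of skewness.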
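The two coefficients as small functions (the names are our own), checked against the example just solved and, for Bowley’s coefficient, against the quartiles Q1 = 18, Q2 = 24, Q3 = 28 of the data 15, 18, 20, 24, 27, 28, 30 used earlier:

```python
def pearson_skewness(mean, mode, sd):
    """Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 - 2 Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

xbar = 150 / 15          # mean from the sum of 15 observations
s = 0.20 * xbar          # CV = 20% gives S = 0.2 * mean
print(pearson_skewness(xbar, 8, s))          # 1.0
print(pearson_skewness(xbar + 4, 8 + 4, s))  # 1.0: adding 4 leaves Sk unchanged
print(bowley_skewness(18, 24, 28))           # -0.2
```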
$$Sk = \frac{M_3'}{(M_2')^{3/2}}$$
where
$$M_3' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^3, \qquad M_2' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2$$
• Exercise: The median and the mode of a mesokurtic distribution are 32 and 34, respectively. The 4th moment about the mean is 243. Compute the Pearsonian coefficient of skewness and identify the type of skewness. Assume n ≈ n − 1. The rth moment about the mean is
$$M_r' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^r, \quad r = 0, 1, 2, \cdots$$
• Exercise: For a moderately skewed frequency distribution, the mean is 10 and the median 8.5. If the
coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the probable mode of
the distribution
Summarizing Data
Shape of Distribution—Kurtosis
• Kurtosis (a Greek word meaning bulginess) measures the peakedness or flatness of a curve. Three terms are used to indicate flatness:
— Mesokurtic stands for a normal curve,
— Leptokurtic for a curve more peaked than normal and
— Platykurtic for a curve less peaked than normal
[Figure: leptokurtic (β > 0), mesokurtic (β = 0) and platykurtic (β < 0) curves]
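The coefficient of kurtosis is
$$\beta = \frac{M_4'}{(M_2')^2} - 3$$
where
$$M_4' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^4, \qquad M_2' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2$$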
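The moment-based skewness and kurtosis coefficients share one building block, the rth moment about the mean with divisor n. A minimal sketch (function names and the three-point data set are hypothetical):

```python
def central_moment(xs, r):
    """r-th moment about the mean, M'_r = (1/n) * sum (x_i - mean)^r."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** r for x in xs) / len(xs)

def moment_skewness(xs):
    """Sk = M'_3 / (M'_2)^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis_beta(xs):
    """beta = M'_4 / (M'_2)^2 - 3 (0 for a normal curve)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

data = [1, 2, 3]                      # hypothetical symmetric data
print(moment_skewness(data))          # 0.0
print(round(kurtosis_beta(data), 2))  # -1.5, i.e. platykurtic
```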
• Example: If the standard deviation of a symmetric distribution is 10, what should be the value of the fourth moment for the distribution to be
1 Leptokurtic
2 Platykurtic
3 Mesokurtic
Ω = {1, 2, 3, 4, 5, 6}
Ω = {HH, HT, T H, T T }
1 2 3 4 5 6
1 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
2 (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
3 (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
4 (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
5 (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
6 (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
• Example: List all possible outcomes of tossing a fair coin three times
[Figure: tree diagram branching on the 1st, 2nd and 3rd tosses]
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
• Suppose we have two sets A and B. A is a subset of B, denoted A ⊆ B, if every element of A is also an element of B, i.e.,
$$A \subseteq B \iff (\forall x)(x \in A \Rightarrow x \in B)$$
• Event: An event is a subset of the sample space (it contains one or more outcomes of the sample space) and is defined for a particular purpose
— A simple event is an event having only a single outcome
— A compound event consists of more than one outcome (simple event)
— Events are denoted by capital letters other than S, such as A, B, F, etc.
• Example: Let B be the event of getting at least one head in the experiment of tossing a coin two times.
B = {HH, HT, TH}
• The union of two sets A and B is the collection of all objects that are in either set. It is written A ∪ B. Using curly brace notation,
A ∪ B = {x : x ∈ A or x ∈ B}
• The intersection of two sets A and B is the collection of all objects that are in both sets. It is written A ∩ B. Using curly brace notation,
A ∩ B = {x : x ∈ A and x ∈ B}
[Figure: Venn diagrams of A ∪ B and A ∩ B within Ω]
• Mutually exclusive events are events which have no element in common
A ∩ B = {}
• The complement of a set A is the collection of objects in the universal set that are not in A. The complement is written as Ac. In curly brace notation
Ac = {x : (x ∈ Ω) and (x ∉ A)}
Ω = {1, 2, 3, 4, 5, 6}
• Equally-Likely Events are those events which have equal chance of occurrence
• Null event is the event which doesn’t have element (outcome) in it. i.e., it is an empty set denoted
by either ∅ or {}
• Exercise: Write the shaded region in the following Venn diagrams using complement, union and intersection
[Figure: two Venn diagrams in Ω, one with sets A and B and one with sets A, B and C]
— Ωc = ∅, ∅c = Ω — ∅∪A=A — ∅∩A=∅
— (Ac )c = A — A ∪ Ac = Ω — A ∩ Ac = ∅
— Ω∪A=Ω — A∪A=A — Ω∩A=A
• Let A, B and C be events (sets), then
— Associative law
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
— Commutative law
A∪B = B∪A
A∩B = B∩A
— Distributive law
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Elementary Probability
Review of Set Theory
• Solution:
1 Ac = {s4 , s5 , s6 , s7 , s8 }, B c = {s1 , s6 , s7 , s8 } and C c = {s1 , s2 , s6 , s7 }
2 A ∪ B = {s1, s2, s3, s4, s5}, A ∪ C = {s1, s2, s3, s4, s5, s8}, and (A ∩ B) ∪ C = {s2, s3, s4, s5, s8}
3 A ∩ B = {s2 , s3 }, A ∩ C = {s3 } and A ∩ B ∩ C = {s3 }
4 Exercise
• De Morgan Law:
$$\Big(\bigcup_j A_j\Big)^c = \bigcap_j A_j^c, \qquad \Big(\bigcap_j A_j\Big)^c = \bigcup_j A_j^c$$
Suppose we have two sets, A and B. The De Morgan law says that
(A ∪ B)c = Ac ∩ B c
(A ∩ B)c = Ac ∪ B c
• Example: Consider the above example and show that (A ∪ B)c = Ac ∩ B c and (A ∩ B)c = Ac ∪ B c
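Python sets make this check mechanical. The sketch assumes Ω = {s1, ..., s8} and recovers A, B and C from the complements listed in the solution above (A = {s1, s2, s3}, B = {s2, s3, s4, s5}, C = {s3, s4, s5, s8}); the helper `complement` is our own name:

```python
# Sample space assumed to be {s1, ..., s8}; A, B, C recovered from their complements
omega = {f"s{i}" for i in range(1, 9)}
A = omega - {"s4", "s5", "s6", "s7", "s8"}
B = omega - {"s1", "s6", "s7", "s8"}
C = omega - {"s1", "s2", "s6", "s7"}

def complement(event):
    """Complement relative to the sample space omega."""
    return omega - event

print(sorted(A | B), sorted(A & B & C))  # ['s1', 's2', 's3', 's4', 's5'] ['s3']
print(complement(A | B) == complement(A) & complement(B))  # True
print(complement(A & B) == complement(A) | complement(B))  # True
```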
• The probability of an event A is denoted by P(A); in general, the probability of any event · is denoted by P(·)
• Computing the probability of the occurrence of an event requires knowing all outcomes in the event and in the sample space. When all outcomes are equally likely,
$$P(A) = \frac{\#A}{\#\Omega}$$
• In probability, there are four basic principles of counting
— Addition
— Multiplication
— Permutation
— Combination
Addition
Suppose one of two experiments, say E1 or E2, is performed (but not both). If E1 has n1 possible outcomes and E2 has n2 possible outcomes, then we will have a total of n1 + n2 possible outcomes
• Example: Suppose you have 6 roads and 3 railways from Addis Ababa to Ambo. How many possible ways do you have to go to Ambo?
Multiplication
Suppose two experiments, say E1 and E2, are both performed. The first experiment has n1 possible outcomes and the second experiment has n2 outcomes. For each possible outcome of the first experiment, we will have n2 possible outcomes in the second experiment. We will have a total of n1 × n2 possible outcomes
• Example: Say the only clean clothes you’ve got are 2 t-shirts and 4 pairs of jeans. How many different outfits can you choose?
• Example: A small community consists of 10 women, each of whom has three children. If one woman and one of her children are to be chosen as mother and child of the year, how many different choices are possible?
• Example: How many different 7-place license plates are possible if the first 3 places are to be
occupied by letters and the final 4 by numbers? Ans. 175,760,000
• Exercise: In the above example, how many license plates would be possible if repetition among
letters or numbers were prohibited? Ans. 78,624,000
• Exercise: You want to buy a car: you have 2 choices of body style, 5 colors and 3 models (standard model, sports model with bigger engine and luxury model with leather seats). How many possible choices do you have to buy one car? Ans. 30
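Each of these counts is a product of the numbers of choices at each stage. A short check, assuming 26 letters and 10 digits for the license plates (the stated answers confirm this); `math.perm(n, k)` counts ordered selections without repetition and requires Python 3.8 or later:

```python
import math

outfits = 2 * 4                                         # t-shirts x jeans
choices = 10 * 3                                        # mother x child
plates = 26 ** 3 * 10 ** 4                              # letters and digits may repeat
plates_no_repeat = math.perm(26, 3) * math.perm(10, 4)  # no repetition allowed
cars = 2 * 5 * 3                                        # body style x color x model
print(outfits, choices, plates, plates_no_repeat, cars)
# 8 30 175760000 78624000 30
```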
• If we have three objects, say A, B and C, we can arrange them in 6 different ways:
ABC, ACB, BAC, BCA, CAB, CBA
• In general, n distinct objects can be arranged in n! different ways. If n is a positive integer, then
$$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$$
• Note that
n! = n(n − 1)!
0! = 1
4! = 4(3)(2)(1) = 24 ways
• We shall now determine the number of permutations of a set of n objects when some of the objects are indistinguishable from each other. The formula is
$$\frac{n!}{n_1! \times n_2! \times \cdots \times n_r!}$$
where, among the n objects, n1 are alike, n2 are alike, ..., nr are alike, and
$$\sum_{i=1}^{r} n_i = n$$
For example, with n = 6, n1 = 3, n2 = 2 and n3 = 1,
$$\frac{6!}{3!\,2!\,1!} = \frac{6(5)(4)3!}{3!(2)(1)} = 60 \text{ ways}$$
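The number of permutations of n objects taken r at a time is ${}^{n}P_r = \frac{n!}{(n-r)!}$; for example,
$${}^{5}P_3 = \frac{5!}{(5-3)!} = \frac{5(4)(3)2!}{2!} = 60 \text{ ways}$$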
• Example: A license plate begins with three letters. If the possible letters are A, B, C, D and E,
how many different permutations of these letters can be made if no letter is used more than once?
$${}^{5}P_3 = 60 \text{ ways}$$
Combination
Combination is a way of selecting items from a collection, such that (unlike permutations) the order of
selection does not matter
• Example: From a class of 20 students we need to select 3 for a committee. How many possibilities
do we have to form a committee?
$$\binom{20}{3} = \frac{20!}{3!(20-3)!} = \frac{20(19)(18)17!}{3!\,17!} = 1140 \text{ possibilities}$$
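• Example: From a group of 5 women and 7 men, how many different committees consisting of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse to serve on the committee together?
$$\binom{5}{2}\binom{7}{3} = 350$$
If the 2 feuding men cannot serve together, the 3 men must include at most one of them:
$$\binom{5}{2}\left[\binom{2}{0}\binom{5}{3} + \binom{2}{1}\binom{5}{2}\right] = 300$$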
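The permutation and combination results of this part can be reproduced with Python's `math.factorial`, `math.perm` and `math.comb` (the latter two need Python 3.8 or later):

```python
from math import comb, factorial, perm

alike = factorial(6) // (factorial(3) * factorial(2) * factorial(1))  # 6!/(3!2!1!)
p_5_3 = perm(5, 3)                       # 5P3 = 5!/(5-3)!
committee_3 = comb(20, 3)                # 3 of 20 students
women_men = comb(5, 2) * comb(7, 3)      # 2 of 5 women, 3 of 7 men
# at most one of the 2 feuding men: choose 0 or 1 of them, the rest from the other 5 men
no_feud = comb(5, 2) * (comb(2, 0) * comb(5, 3) + comb(2, 1) * comb(5, 2))
print(alike, p_5_3, committee_3, women_men, no_feud)  # 60 60 1140 350 300
```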
• There are three approaches to calculate a probability of an event. These are:
1 The classical approach
2 The frequentist approach
3 The subjective approach
• Classical Approach: If a procedure has n different simple events, each with an equal chance of occurring, and event A can occur in s of these ways, then
$$P(A) = \frac{s}{n} = \frac{\#(A)}{\#(\Omega)}$$
• Frequentist Approach: This approach is also called the empirical approach, i.e., the probability is estimated from data
• If after n repetitions of an experiment, where n is very large, an event is observed to occur in h of these, then the probability of the event is approximately
$$P(A) \approx \frac{h}{n}$$
• Conduct an experiment a large number of times, and count the number of times event A actually occurs; then an estimate of P(A) is
$$P(A) \approx \frac{\text{number of times } A \text{ occurred}}{\text{number of times the experiment was repeated}}$$
• Example: Suppose a coin was tossed 1000 times and the result was 587 tails. The relative frequency of tails is 587/1000. Another 1000 tosses lead to 511 tails. Then the relative frequency of tails is (587 + 511)/(1000 + 1000) = 1098/2000. Proceeding in this manner, we obtain a sequence of numbers which gets closer and closer to the number defined as the probability of a tail in a single toss.
$$P(A) = \lim_{n\to\infty}\frac{\#A}{n}$$
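The limiting relative frequency can be illustrated by simulation. This sketch assumes a fair coin (tail probability 0.5) simulated with Python's `random` module and a fixed seed; the exact printed proportions depend on the seed, but they should settle near 0.5 as n grows:

```python
import random

def relative_frequency_of_tails(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of tails."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return tails / n_tosses

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_tails(n))
```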
• Axioms of Probability: The probability of an event, say A, must satisfy the following axioms
1 Axiom 1: The probability of any event A must be non-negative, that is, P(A) ≥ 0
2 Axiom 2: The probability of the sample space is 1, that is, P(Ω) = 1
3 Axiom 3: Given mutually exclusive events A1, A2, A3, · · ·, that is, where Ai ∩ Aj = {} for i ≠ j,
— The probability of a finite union of the events is the sum of the probabilities of the individual events, that is:
$$P(A_1 \cup A_2 \cup \cdots \cup A_k) = P(A_1) + P(A_2) + \cdots + P(A_k)$$
— The probability of a countably infinite union of the events is the sum of the probabilities of the individual events, that is:
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)$$
• Example: Suppose you throw two dice. We are interested in the sum of the upper face of the dice.
Let E be the event that the sum of the dice is odd. Find P (E)
• Example: In the experiment of tossing a coin three times, what is the probability of getting at most
one head?
• Example: If two dice are rolled, what is the probability that the sum of the upturned faces will equal
7?
• Example: If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black balls, what is
the probability that one of the balls is white and the other two black?
• Example: A committee of 5 is to be selected from a group of 6 men and 9 women. If the selection is
made randomly, what is the probability that the committee consists of 3 men and 2 women?
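For small experiments, the classical formula #(A)/#(Ω) can be evaluated by listing the equally likely outcomes or by counting with combinations. This sketch computes exact answers to the five examples above using `fractions.Fraction`; the helper name `classical` is our own:

```python
from fractions import Fraction
from itertools import product
from math import comb

def classical(event_count, total):
    """Classical probability #(A) / #(Omega) as an exact fraction."""
    return Fraction(event_count, total)

dice = list(product(range(1, 7), repeat=2))     # 36 equally likely outcomes
p_odd = classical(sum((a + b) % 2 == 1 for a, b in dice), len(dice))
p_seven = classical(sum(a + b == 7 for a, b in dice), len(dice))

tosses = list(product("HT", repeat=3))          # 8 equally likely outcomes
p_at_most_one_head = classical(sum(t.count("H") <= 1 for t in tosses), len(tosses))

p_one_white = classical(comb(6, 1) * comb(5, 2), comb(11, 3))  # 1 white, 2 black
p_committee = classical(comb(6, 3) * comb(9, 2), comb(15, 5))  # 3 men, 2 women

print(p_odd, p_seven, p_at_most_one_head, p_one_white, p_committee)
# 1/2 1/6 1/2 4/11 240/1001
```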
• Exercise: An urn contains n balls, one of which is special. If k of these balls are withdrawn one at a
time, with each selection being equally likely to be any of the balls that remain at the time, what is
the probability that the special ball is chosen?
• Exercise: Suppose that A and B are mutually exclusive events for which P (A) = 0.3 and
P (B) = 0.5. What is the probability that
1 either A or B occurs?
2 A occurs but B does not?
3 Both A and B occur?
• Exercise: Sixty percent of the students at a certain school wear neither a ring nor a necklace.
Twenty percent wear a ring and 30 percent wear a necklace. If one of the students is chosen
randomly, what is the probability that this student is wearing a ring or a necklace?
• Rule 2: For every event A, 0 ≤ P (A) ≤ 1. i.e., the probability is a number between 0 and 1
• Rule 3: For ∅, the empty set, P (∅) = 0. i.e., impossible event has zero probability
• Rule 4: If Ac is the complement of A, then
$$P(A^c) = 1 - P(A)$$
• Rule 5: If A and B are two events, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Elementary Probability
Derived Theorems of Probability
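• More generally, if A, B and C are three events, then
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$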
• Example: From a group of 3 freshmen, 4 sophomores, 4 juniors, and 3 seniors a committee of size 4
is randomly selected. Find the probability that the committee will consist of
— 1 from each class;
— 2 sophomores and 2 juniors
— only sophomores or juniors
• Example: A customer visiting the suit department of a certain store will purchase a suit with
probability 0.22, a shirt with probability 0.30, and a tie with probability 0.28. The customer will
purchase both a suit and a shirt with probability 0.11, both a suit and a tie with probability 0.14, and
both a shirt and a tie with probability 0.10. A customer will purchase all 3 items with probability
0.06. What is the probability that a customer purchases
— none of these items?
— exactly 1 of these items?
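Both questions follow from the three-event inclusion-exclusion formula. For "exactly one", summing P(only A) = P(A) − P(A∩B) − P(A∩C) + P(A∩B∩C) over the three items gives the expression in the code. Exact fractions avoid floating-point noise:

```python
from fractions import Fraction as Fr

suit, shirt, tie = Fr("0.22"), Fr("0.30"), Fr("0.28")
suit_shirt, suit_tie, shirt_tie = Fr("0.11"), Fr("0.14"), Fr("0.10")
all_three = Fr("0.06")

# inclusion-exclusion for three events
at_least_one = suit + shirt + tie - suit_shirt - suit_tie - shirt_tie + all_three
none = 1 - at_least_one
# sum over the items of P(only that item)
exactly_one = (suit + shirt + tie
               - 2 * (suit_shirt + suit_tie + shirt_tie)
               + 3 * all_three)
print(float(none), float(exactly_one))  # 0.49 0.28
```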
1 Conditional probability
2 Multiplication theorem
3 Bayes’ theorem
4 Total probability theorem
5 Independent events
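Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
Multiplication theorem:
$$P(A \cap B) = P(A)P(B \mid A) = P(B)P(A \mid B)$$
Independent events: A and B are independent if
$$P(A \cap B) = P(A)P(B)$$
Special cases of conditional probability:
$$P(\Omega \mid B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$
$$P(A \mid \Omega) = \frac{P(A \cap \Omega)}{P(\Omega)} = \frac{P(A)}{1} = P(A)$$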
Conditional Probability and Independence
Conditional Probability
• Example: Suppose that an office has 100 calculating machines. Some of these machines are electric
(E) while others are manual (M ). And some of the machines are new (N ) while others are used (U ).
E M Total
N 40 30 70
U 20 10 30
Total 60 40 100
A person enters the office, picks a machine at random and discovers that it is manual. What is the probability that it is new?
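With a contingency table, P(N | M) is the count in the (N, M) cell divided by the column total for M. A minimal sketch using the table above (the dictionary layout is our own choice):

```python
from fractions import Fraction

# counts from the table: rows new (N) / used (U), columns electric (E) / manual (M)
table = {("N", "E"): 40, ("N", "M"): 30, ("U", "E"): 20, ("U", "M"): 10}
total = sum(table.values())

p_new_and_manual = Fraction(table[("N", "M")], total)             # P(N and M)
p_manual = Fraction(table[("N", "M")] + table[("U", "M")], total)  # P(M)
p_new_given_manual = p_new_and_manual / p_manual
print(p_new_given_manual)  # 3/4
```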
• Example: If P (A) = 0.5, P (B) = 0.6, and P (A ∩ B c ) = 0.4, compute
1 P (A ∩ B) and P (A|B)
2 P (A ∪ B c )
3 P (B|A ∪ B c )
• Example: A jar contains black and white marbles. Two marbles are chosen without replacement.
The probability of selecting a black marble and then a white marble is 0.34, and the probability of
selecting a black marble on the first draw is 0.47. What is the probability of selecting white marble
on the second draw, given that the first marble drawn was black?
• Example: The probability that it is Friday and that a student is absent is 0.03. Since there are 5
schooldays in a week, the probability that it is Friday is 0.2. What is the probability that a student is
absent given that today is Friday?
• Exercise: Suppose that we roll a pair of fair dice, so each of the 36 possible outcomes is equally likely. Let A denote the event that the first die lands on 3, and let C be the event that the sum of the dice is 7. Are A and C independent?
P (∩ni=1 Ai ) = P (An |A1 ∩ A2 ∩ · · · ∩ An−1 )P (An−1 |A1 ∩ A2 ∩ · · · ∩ An−2 ) · · · P (A2 |A1 )P (A1 )
• Example: An urn contains 10 identical balls, of which 5 are black, 3 are red and 2 are white. Four
balls are drawn one at a time without replacement. Find the probability that the first ball is black,
the second is red, the third is white and the fourth is black
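The general multiplication theorem multiplies one conditional probability per draw, updating the urn after each draw. A sketch (the function name `sequence_probability` is our own) applied to this example:

```python
from fractions import Fraction

def sequence_probability(counts, sequence):
    """P(draws come out in this order) without replacement, via the multiplication
    theorem P(A1) P(A2 | A1) P(A3 | A1 A2) ..."""
    remaining = dict(counts)
    total = sum(remaining.values())
    p = Fraction(1)
    for colour in sequence:
        p *= Fraction(remaining[colour], total)  # conditional probability of this draw
        remaining[colour] -= 1                   # remove the drawn ball
        total -= 1
    return p

urn = {"black": 5, "red": 3, "white": 2}
print(sequence_probability(urn, ["black", "red", "white", "black"]))  # 1/42
```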
• Exercise: Consider a lot consisting of 20 defective and 80 non-defective items. If we choose two items at random without replacement, what is the probability that both items are defective?
• The events B1, B2, · · · , Bk represent a partition of the sample space Ω if
1 Bi ∩ Bj = ∅, ∀i ≠ j
2 ∪i Bi = Ω
3 P(Bi) > 0, ∀i
• Let A be an event in Ω and let B1, B2, · · · , Bk be a partition of Ω. Then
A = (A ∩ B1) ∪ (A ∩ B2) ∪ · · · ∪ (A ∩ Bk)
[Figure: the sample space Ω partitioned into B1, B2, · · · , B9]
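Since the events A ∩ Bi are mutually exclusive, applying the multiplication theorem to each of them gives the theorem of total probability:
$$P(A) = \sum_{i=1}^{k} P(A \mid B_i)P(B_i)$$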
• Example: A certain item is manufactured by three factories, say 1, 2, and 3. It is known that 1 turns out twice as many items as 2, and that 2 and 3 turn out the same number of items. It is also known that 2% of the items produced by 1 and 2 are defective, while 4% of those manufactured by 3 are defective. All the items produced are put into one stockpile and then one item is chosen at random. What is the probability that this item is defective?
• Solution: Let us introduce the following events: A = {the item is defective}, B1 = {the item came from 1}, B2 = {the item came from 2}, and B3 = {the item came from 3}. The required probability is
$$P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + P(A \mid B_3)P(B_3)$$
where P(B1) = 1/2, P(B2) = P(B3) = 1/4, P(A|B3) = 0.04 and P(A|B1) = P(A|B2) = 0.02. Thus,
$$P(A) = (0.02)(0.5) + (0.02)(0.25) + (0.04)(0.25) = 0.025$$
• Example: Consider the previous example. Suppose that one item is chosen from the stockpile and is found to be defective. What is the probability that it was produced in factory 1?
$$P(B_1 \mid A) = \frac{P(A \mid B_1)P(B_1)}{P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + P(A \mid B_3)P(B_3)} = \frac{(0.02)(0.5)}{(0.02)(0.5) + (0.02)(0.25) + (0.04)(0.25)} = 0.40$$
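The total probability theorem and Bayes’ theorem share the same products P(A | Bi)P(Bi): their sum is P(A), and each product divided by that sum is a posterior P(Bi | A). A sketch with exact fractions for the factory example:

```python
from fractions import Fraction as Fr

prior = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]              # P(B1), P(B2), P(B3)
p_defective = [Fr(2, 100), Fr(2, 100), Fr(4, 100)]  # P(A | Bi)

joint = [p * q for p, q in zip(prior, p_defective)]  # P(A | Bi) P(Bi)
p_a = sum(joint)                                     # total probability theorem
posterior = [j / p_a for j in joint]                 # Bayes' theorem: P(Bi | A)
print(p_a, *posterior)  # 1/40 2/5 1/5 2/5
```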
Conditional Probability and Independence
Multiplicative, Bayes’ and Total Probability Theorems
• Exercise: Suppose that the probability that both of a pair of twins are boys is 0.30 and that the
probability that they are both girls is 0.26. The probability of the first child being a boy is 0.52, what
is the probability that:
1 The second twin is a boy, given that the first is a boy?
2 The second twin is a girl, given that the first is a girl?
3 The second twin is a boy?
4 The first is a boy and the second is girl?
• Exercise: Let Bi, i = 1, · · · , 5 be a partition of the sample space Ω and suppose that:
$$P(B_i) = \frac{i}{15} \quad\text{and}\quad P(A \mid B_i) = \frac{5 - i}{15}, \quad i = 1, 2, \cdots, 5$$
Compute the probabilities P(Bi|A), i = 1, 2, · · · , 5
[Figure: X maps each outcome ω ∈ Ω to a real number X(ω)]
• The above definition makes it clear that a random variable X is a function which maps each outcome of a probability experiment to a real number
— A random variable is a variable whose values are associated with chance
Ω = {HH, HT, T H, T T }
Define the random variable X as follows: X is the number of heads. Hence, X(HH) = 2,
X(HT ) = 1 = X(T H) and X(T T ) = 0
• Solution: Now the possible values of the random variable X are 0, 1, 2.
$$P(X = 0) = P(\{TT\}) = \frac{1}{4}, \quad P(X = 1) = P(\{HT, TH\}) = \frac{2}{4}, \quad P(X = 2) = P(\{HH\}) = \frac{1}{4}$$
Based on the definition of X, P(X = 2) means the probability of getting two heads.
• The collection of pairs (xi, P(xi)), i = 1, 2, · · · , is sometimes called the probability distribution of X.
The probability distribution of the first example is
X = xi 0 1 2
P (X = xi ) 0.25 0.50 0.25
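The same distribution can be obtained by listing the sample space and counting, for each value x, the outcomes with X(ω) = x. A minimal sketch for X = number of heads in two tosses:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT
heads = Counter(o.count("H") for o in outcomes)           # X = number of heads
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(heads.items())}
for x, p in pmf.items():
    print(x, p)  # prints 0 1/4, then 1 1/2, then 2 1/4
```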
• Exercise: Roll a four-sided die twice, and let X equal the larger of the two outcomes if they are
different and the common value if they are the same. What are the possible values of X and the
corresponding probabilities?
• Random variables can be classified as discrete or continuous
Discrete Random Variable
Let X be a random variable. If the number of possible values of X is finite or countably infinite, we call
X a discrete random variable
• Let X be a discrete random variable. The probability mass function P(X = xi) must satisfy the following conditions
1 P(X = xi) ≥ 0 for all i
2 $\sum_{i=1}^{\infty} P(X = x_i) = 1$
• Example: Consider the experiment of tossing a die two times. Let Y be the random variable which
denotes the absolute difference of the upturned faces. What is the probability mass function of Y ?
y -1 0 1 2
P (y) c 2c 0.5c 3c
$$P(X = x) = \frac{1 + |x - 3|}{11}, \quad x = 1, 2, 3, 4, 5$$
Find
1 P (X > 2)
2 P (X < 1)
3 P (2 ≤ X < 4)
4 P (X ≥ 4|X ≥ 2)
• Exercise: For each of the following, determine the constant c so that f(x) satisfies the conditions for being a pmf of a random variable X
1 $f(x) = \frac{x}{c}$, x = 1, 2, 3, 4
2 $f(x) = cx$, x = 1, 2, · · · , 10
3 $f(x) = c(0.25)^x$, x = 1, 2, 3, · · ·
4 $f(x) = \frac{c}{(x+1)(x+2)}$, x = 0, 1, 2, 3, · · ·
• Exercise: Let X be the number of accidents per week in a factory. Let the pmf of X be
$$f(x) = \frac{c}{(x + 1)(x + 2)}, \quad x = 0, 1, 2, 3, \cdots$$
The value of c is 1.
One-Dimensional Random Variables
Continuous Random Variable
For any value of x ∈ [0, ∞), f(x) ≥ 0. Both conditions are satisfied; therefore, f(x) is a pdf
• Exercise: The percentage of alcohol in a certain compound may be considered as a random variable X, 0 < X < 1, with the following pdf:
The evaluation has revealed that a = 2
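$$P\left(\frac{1}{3} < X < \frac{1}{2}\right) = \int_{1/3}^{1/2} 6x(1-x)\,dx = \Big[x^2(3 - 2x)\Big]_{1/3}^{1/2} = 0.2407$$
$$P\left(\frac{1}{3} < X < \frac{2}{3}\right) = \int_{1/3}^{2/3} 6x(1-x)\,dx = \Big[x^2(3 - 2x)\Big]_{1/3}^{2/3} = 0.4815$$
Therefore,
$$P\left(X \le 0.5 \,\Big|\, \frac{1}{3} < X < \frac{2}{3}\right) = \frac{0.2407}{0.4815} = 0.5$$
(exactly 1/2, since the two probabilities are 13/54 and 13/27)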
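Because x²(3 − 2x) is an antiderivative of 6x(1 − x) that equals 0 at x = 0 and 1 at x = 1, the probabilities can be computed exactly with fractions, which confirms that the conditional probability is exactly 1/2 rather than 0.4999:

```python
from fractions import Fraction

def F(x):
    """Antiderivative x^2 (3 - 2x) of f(x) = 6x(1 - x); it is the cdf on [0, 1]."""
    return x * x * (3 - 2 * x)

def prob(a, b):
    """P(a < X < b) = F(b) - F(a) for 0 <= a <= b <= 1."""
    return F(b) - F(a)

p1 = prob(Fraction(1, 3), Fraction(1, 2))  # 13/54, about 0.2407
p2 = prob(Fraction(1, 3), Fraction(2, 3))  # 13/27, about 0.4815
print(p1, p2, p1 / p2)                     # 13/54 13/27 1/2
```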
• Exercise: Consider the pdf of the above example. Determine a number b such that
P (X < b) = 2P (X > b)
$$f(x) = \frac{1}{2a}, \quad -a \le x \le a$$
where a > 0. Whenever possible, determine a so that the following are satisfied
1 P(X > 1) = 1/3
2 P(X < 1/2) = 0.3
3 P(X > 1) = 0.5
4 P(|X| < 1) = P(|X| > 1)
• Exercise: The continuous random variable X has pdf
$$f(x) = 3x^2, \quad -1 \le x \le 0$$
If b is a number satisfying −1 < b < 0, compute $P\left(X > b \mid X < \frac{b}{2}\right)$
1 For n = 2, determine k
2 For n = 3, determine k
3 For general n, determine k
• Exercise: The claims submitted to an insurance company over a specified period of time t form a random variable (RV) X with pdf
$$f(x) = \frac{c}{(1+x)^4}, \quad x > 0,\ c > 0$$
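The question attached to this exercise did not survive extraction. Assuming it asks for c, the normalization condition of a pdf determines it (our own worked step):

```latex
\int_0^\infty \frac{c}{(1+x)^4}\,dx
  = c \left[ -\frac{1}{3(1+x)^3} \right]_0^\infty
  = \frac{c}{3} = 1
  \quad \Longrightarrow \quad c = 3
```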
Note that
$$P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3)$$
The graph of the cdf of a discrete random variable is always a step graph (a staircase that jumps at each possible value).
One-Dimensional Random Variables
Cumulative Distribution Function
• Exercise: Suppose that the random variable X assumes the three values 0, 1 and 2 with probabilities 1/3, 1/6 and 1/2, respectively. Derive and construct the cdf F(x).
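The cdf of a discrete random variable is built by accumulating the probabilities of the support values up to x, which produces the step graph described above. A minimal sketch for this exercise, using Python's standard `fractions` and `itertools.accumulate`; the function name `F` is illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2]                                    # support, in increasing order
probs = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
cum = list(accumulate(probs))                         # running sums: 1/3, 1/2, 1

def F(x):
    """Step-function cdf: P(X <= x) = sum of P(X = v) over support values v <= x."""
    total = Fraction(0)
    for v, c in zip(values, cum):
        if v <= x:
            total = c                                 # last cumulative sum with v <= x
    return total

for x in (-0.5, 0, 0.5, 1, 1.5, 2, 3):
    print(x, F(x))
```

The resulting cdf is 0 for x < 0, 1/3 for 0 ≤ x < 1, 1/2 for 1 ≤ x < 2 and 1 for x ≥ 2, constant between support values and jumping at each of them.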
[Figure: graph of F(x) against x]
The figure shows the graph of
$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x^2 & \text{if } 0 < x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$
• $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$. These are usually written as $F(-\infty) = 0$ and $F(\infty) = 1$.
• For the continuous case,
$$F(-\infty) = \lim_{x\to-\infty} \int_{-\infty}^{x} f(s)\,ds = 0, \qquad F(\infty) = \lim_{x\to\infty} \int_{-\infty}^{x} f(s)\,ds = 1$$
For the cdf $F(x) = x^2$ above, $\lim_{x\to 0} x^2 = 0$ and $\lim_{x\to 1} x^2 = 1$.
• Exercise: Let
$$F(x) = e^{3x}, \quad -\infty < x \le 0$$
Show that $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to 0} F(x) = 1$.
$$f(x) = \frac{dF(x)}{dx}$$
for all x at which F is differentiable.
• Example: Suppose that a continuous random variable has cdf F given by
$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 - e^{-x} & \text{if } x > 0 \end{cases}$$
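Differentiating each piece gives the pdf (our own worked step):

```latex
f(x) = \frac{dF(x)}{dx} =
\begin{cases}
  0       & \text{if } x < 0 \\
  e^{-x}  & \text{if } x > 0
\end{cases}
```

F is not differentiable at x = 0 (the left derivative is 0 and the right derivative is 1), but a single point carries no probability for a continuous random variable, so one can take $f(x) = e^{-x}$ for $x > 0$ and $f(x) = 0$ otherwise.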
• Example: Determine the pdf f for each of the following cdfs. Also verify that f is a pdf.
1 $F(x) = \dfrac{x}{5}, \quad 0 \le x \le 5$
2 $F(x) = e^{3x}, \quad -\infty < x \le 0$
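Differentiating gives the candidate pdfs f(x) = 1/5 on [0, 5] and f(x) = 3e^{3x} on (−∞, 0]; these are our own worked answers. The sketch below cross-checks them numerically: a central difference approximates F′(x), and a composite trapezoidal rule checks that each candidate integrates to 1. The helpers `derivative` and `trapezoid` are hypothetical names, and the infinite range of item 2 is truncated at −20, where the omitted tail mass is e^{−60}.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# Item 1: F(x) = x/5 on [0, 5]; candidate pdf f(x) = 1/5 there
def F1(x):
    return x / 5

def f1(x):
    return 1 / 5

# Item 2: F(x) = e^{3x} on (-inf, 0]; candidate pdf f(x) = 3 e^{3x} there
def F2(x):
    return math.exp(3 * x)

def f2(x):
    return 3 * math.exp(3 * x)

print(derivative(F1, 2.0), f1(2.0))    # both close to 0.2
print(derivative(F2, -1.0), f2(-1.0))  # both close to 3 e^{-3}
print(trapezoid(f1, 0, 5), trapezoid(f2, -20, 0))  # both close to 1
```

Both candidates are also nonnegative on their ranges, so together with the unit total area they satisfy the two conditions of a pdf.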
One-Dimensional Random Variables
Cumulative Distribution Function—Properties
• Let X be a discrete random variable with possible values $x_1, x_2, \dots$, and suppose that it is possible to label these values so that $x_1 < x_2 < \cdots$. Let F be the cdf of X. Then
$$P(X = x_i) = F(x_i) - F(x_{i-1})$$
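• Example: Let X be a discrete random variable with cdf
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ \tfrac{1}{3} & \text{if } 0 \le x < 1 \\ \tfrac{1}{2} & \text{if } 1 \le x < 2 \\ 1 & \text{if } x \ge 2 \end{cases}$$
We know that
$$P(X = 0) = F(0) - 0 = \tfrac{1}{3}$$
$$P(X = 1) = F(1) - F(0) = \tfrac{1}{2} - \tfrac{1}{3} = \tfrac{1}{6}$$
$$P(X = 2) = F(2) - F(1) = 1 - \tfrac{1}{2} = \tfrac{1}{2}$$
The pmf is
x      0     1     2
P(x)   1/3   1/6   1/2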
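Recovering the pmf from the cdf amounts to reading off the jump sizes at the ordered support points. A minimal sketch of $P(X = x_i) = F(x_i) - F(x_{i-1})$ for this example, with the value before the first jump taken as 0; the variable names are illustrative.

```python
from fractions import Fraction

# Ordered support points x1 < x2 < x3 and the cdf values F(x_i) from the example
support = [0, 1, 2]
F_at = [Fraction(1, 3), Fraction(1, 2), Fraction(1)]

# Jump sizes: P(X = x_i) = F(x_i) - F(x_{i-1}), with 0 before the first support point
previous = [Fraction(0)] + F_at[:-1]
pmf = {x: Fx - prev for x, Fx, prev in zip(support, F_at, previous)}
print(pmf)  # {0: Fraction(1, 3), 1: Fraction(1, 6), 2: Fraction(1, 2)}
```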
Chapter Six
Functions of Random Variables
Outline
1 Equivalent events
2 Functions of discrete random variables and their distributions
3 Functions of continuous random variables and their distributions
$$B = \{x \in R_X : H(x) \in A\}$$
If A and B are related in this way, then they are equivalent events.
• Equivalent events A and B have equal chances of occurrence: $P(A) = P(B)$.
• Example: Suppose H(x) = x², where X takes only nonnegative values. Then the events B: $\{X > \sqrt{2}\}$ and C: $\{Y > 2\}$ are equivalent, and
$$P(Y > 2) = P(X^2 > 2) = P(X > \sqrt{2})$$
• If X is a discrete random variable and Y = H(X), then it follows immediately that Y is also a
discrete random variable
• Example: Consider the probability distribution of X (the number of heads) obtained by tossing a fair coin two times, and let Y = 2X + 1:
X      0     1     2
P(X)   0.25  0.50  0.25
Y      1     3     5
P(Y)   0.25  0.50  0.25
For instance, P(Y = 1) = P(2X + 1 = 1) = P(X = 0) = 0.25.
• Example: Suppose that the random variable X assumes the three values −1, 0 and 1 with probabilities 1/3, 1/2 and 1/6, respectively. Let Y = X². Construct the probability distribution of Y.
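For a discrete X, the pmf of Y = H(X) is obtained by adding P(X = x) over all x that map to the same value y. The sketch below applies that rule to both examples above; `pmf_of_function` is a hypothetical helper name, and it uses only Python's standard `collections` and `fractions` modules.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(pmf_x, H):
    """pmf of Y = H(X): P(Y = y) is the sum of P(X = x) over all x with H(x) = y."""
    pmf_y = defaultdict(Fraction)      # missing keys start at Fraction(0)
    for x, p in pmf_x.items():
        pmf_y[H(x)] += p
    return dict(pmf_y)

# Coin example: X = number of heads in two tosses, Y = 2X + 1 (one-to-one)
coin = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(pmf_of_function(coin, lambda x: 2 * x + 1))

# Second example: X in {-1, 0, 1} with probabilities 1/3, 1/2, 1/6, and Y = X^2
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 2), 1: Fraction(1, 6)}
print(pmf_of_function(pmf_x, lambda x: x * x))
```

Because x = −1 and x = 1 both map to y = 1, their probabilities combine: P(Y = 1) = 1/3 + 1/6 = 1/2 and P(Y = 0) = 1/2.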
• The pdf of Y is
$$f(y) = \frac{1}{3} e^{-(y-1)/3}, \quad y > 1$$
• Example: Suppose that the pdf of a random variable X is
$$f(x) = \frac{1}{2}, \quad 1 \le x \le 3$$
Derive the pdf of $Y = e^X$.
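Since $y = e^x$ is increasing, one route (our own worked sketch) goes through the cdf of Y and then differentiates:

```latex
F_Y(y) = P(e^X \le y) = P(X \le \ln y) = \frac{\ln y - 1}{2},
  \qquad e \le y \le e^3

f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{2y},
  \qquad e \le y \le e^3
```

As a check, $\int_e^{e^3} \frac{dy}{2y} = \frac{3 - 1}{2} = 1$.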
• Example: Suppose that the pdf of a random variable X is
$$f(x) = \frac{1}{2}, \quad -1 \le x \le 1$$
Derive the pdf of $Y = X^2$.
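Because $y = x^2$ is not one-to-one on [−1, 1], both $\pm\sqrt{y}$ contribute: $F_Y(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \sqrt{y}$ for $0 \le y \le 1$, so $f_Y(y) = \frac{1}{2\sqrt{y}}$ for $0 < y \le 1$. This is our own worked answer. The sketch below compares it with a seeded Monte Carlo simulation from Python's standard `random` module; the sample size and seed are arbitrary choices.

```python
import math
import random

rng = random.Random(2022)               # fixed seed so the check is reproducible
n = 200_000
ys = [rng.uniform(-1, 1) ** 2 for _ in range(n)]   # samples of Y = X^2

def cdf_Y(y):
    """Candidate cdf: P(X^2 <= y) = P(-sqrt(y) <= X <= sqrt(y)) = sqrt(y), 0 <= y <= 1."""
    return math.sqrt(y)

for y in (0.04, 0.25, 0.81):
    empirical = sum(v <= y for v in ys) / n
    print(y, round(empirical, 3), cdf_Y(y))
```

The empirical proportions should agree with $\sqrt{y}$ to within sampling error, which is about 0.001 at this sample size.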
f (x) = 1, x ∈ (0, 1)
• Example: Suppose that the random variable X has the following pdf
$$f(x) = \frac{1}{2}, \quad -1 < x < 1$$
Find the pdf of each of the following random variables:
1 $Y = \sin(\pi X/2)$
2 $Y = \cos(\pi X/2)$
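One possible worked answer (ours, not from the slides): $\sin(\pi x/2)$ is increasing on (−1, 1), so $F_Y(y) = \tfrac{1}{2} + \tfrac{1}{\pi}\arcsin y$ and $f_Y(y) = \frac{1}{\pi\sqrt{1-y^2}}$ for $-1 < y < 1$. In contrast, $\cos(\pi x/2)$ is two-to-one, and $Y \le y$ exactly when $|X| \ge \tfrac{2}{\pi}\arccos y$, so $F_Y(y) = 1 - \tfrac{2}{\pi}\arccos y$ and $f_Y(y) = \frac{2}{\pi\sqrt{1-y^2}}$ for $0 < y < 1$. The sketch below checks both cdfs against a seeded simulation; the seed, sample size and function names are arbitrary choices.

```python
import math
import random

rng = random.Random(7)                  # fixed seed for reproducibility
n = 200_000
xs = [rng.uniform(-1, 1) for _ in range(n)]

def cdf_sin(y):
    """Candidate cdf of Y = sin(pi X / 2), -1 <= y <= 1."""
    return 0.5 + math.asin(y) / math.pi

def cdf_cos(y):
    """Candidate cdf of Y = cos(pi X / 2), 0 <= y <= 1."""
    return 1 - 2 * math.acos(y) / math.pi

def empirical(samples, y):
    """Fraction of samples that are <= y."""
    return sum(s <= y for s in samples) / len(samples)

ys_sin = [math.sin(math.pi * x / 2) for x in xs]
ys_cos = [math.cos(math.pi * x / 2) for x in xs]
for y in (-0.5, 0.0, 0.5):
    print("sin", y, round(empirical(ys_sin, y), 3), round(cdf_sin(y), 3))
for y in (0.2, 0.5, 0.9):
    print("cos", y, round(empirical(ys_cos, y), 3), round(cdf_cos(y), 3))
```

Differentiating the two candidate cdfs gives the pdfs stated above. The cos density is twice the sin density on (0, 1) because each y in that range comes from two values of x.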