
Lecture notes on statistics

CHAPTER 1
1 CHAPTER ONE: INTRODUCTION
1.1 Definition and classification of statistics
1.1.1 Definition:
• Plural sense (layman's definition): Statistics is a collection of numerical facts and data.
• Singular sense (formal definition): Statistics is a mathematical science dealing with the methods of collecting data, organizing the collected data, and presenting, analyzing and interpreting the data.
• Statistics is a subject that deals with numbers and figures describing certain situations. It primarily deals with numerical data collected through surveys. It summarizes these data so that the summary gives a good indication of the nature of the data.
The word “statistics” is derived from the Latin for “state”. This reflects the historical importance of governmental data gathering, which related to demographic information (for military recruitment and tax collection). Thus, in ancient times the scope of statistics was primarily limited to governments collecting demographic, property and wealth data about a country in order to frame military and fiscal policies.

1.1.2 Classification:
Statistics is broadly divided into two categories based on how the collected data are used.
1. Descriptive Statistics
• It deals with describing data without attempting to infer anything that goes beyond the given set of data.
• It consists of the collection, organization, summarization and presentation of data.
• It is concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics
• It deals with making inferences and/or drawing conclusions about a population based on data obtained from a limited sample of observations.
• It consists of performing hypothesis tests, determining relationships among variables and making predictions.
• It is important because statistical data usually arise from samples.
• Statistical techniques based on probability theory are required.
For example,
a) The average income of all families (the population) in Ethiopia can be estimated from figures obtained
from a few hundred (the sample) families.
b) The average age of a student in Dilla University is 20.1 years.
c) There is a relationship between smoking tobacco and an increased risk of developing cancer.

1.2 Stages in statistical investigation


There are five stages or steps in any statistical investigation.
1. Collection of data: the process of measuring, gathering and assembling the raw data upon which the statistical investigation is to be based.
• Data can be collected in a variety of ways, and one of the most common is through surveys. Surveys can in turn be conducted in different ways. Three of the most common are:
o Telephone survey
o Mailed questionnaire
o Personal interview.
Exercise: Discuss the advantages and disadvantages of the above three methods relative to each other.
2. Organization of data: summarization of data in some meaningful way, e.g. in table form.


3. Presentation of the data: the process of re-organizing, classifying, compiling and summarizing data to present them in a meaningful form.
4. Analysis of data: the process of extracting relevant information from the summarized data, mainly through the use of elementary mathematical operations.
5. Inference from data: the interpretation and further examination of the various statistical measures obtained through the analysis of the data. This uses methods by which conclusions are formed and inferences made.
• Statistical techniques based on probability theory are required.

1.3 Definition of some basic terms


a) Statistical population: the collection of all possible observations of a specified characteristic of interest (possessing certain common properties) that is under study. An example is all students at HU taking the Stat 1043 course this term.
b) Sample: a subset of the population, selected using some sampling technique in such a way that it represents the population.
c) Sampling: the process or method of selecting a sample from the population.
d) Sample size: the number of elements or observations to be included in the sample.
e) Parameter: a descriptive measure of a population, or a summary value calculated from a population. Examples: average, range, proportion, variance.
f) Statistic: a descriptive measure of a sample, or a summary value calculated from a sample.
g) Census: complete enumeration or observation of the elements of the population, that is, the collection of data from every element in a population.
h) Variable: an item of interest that can take on many different numerical values.
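To make the distinction between a parameter and a statistic concrete, here is a minimal Python sketch. The population of ten exam scores is hypothetical and not taken from these notes. `random.sample` is used as one possible way of drawing a simple random sample. The mean of the whole population is a parameter; the same summary computed from the sample is a statistic.

```python
import random
from statistics import mean

# Hypothetical population: exam scores of every student in a made-up class of 10.
population = [62, 75, 58, 90, 71, 66, 84, 77, 69, 80]

# Parameter: a summary value computed from the whole population.
population_mean = mean(population)

# Sampling: draw a simple random sample of size n = 4 without replacement.
random.seed(1)  # fixed seed so the run is reproducible
sample = random.sample(population, k=4)

# Statistic: the same summary computed only from the sample.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("sample:", sample)
print("statistic (sample mean):", sample_mean)
```

A different sample usually gives a different statistic, while the parameter stays fixed. This is why inferential statistics needs probability theory.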
1.4 Application and limitation of statistics

Statistics can be applied in any field of study that seeks quantitative evidence. For instance, in engineering it can be used:
• To compare the breaking strength of two types of materials.
• To determine the reliability of a product.
• To control the quality of products in a given production process.
• To compare the improvement in yield due to certain additives (fertilizers, herbicides, weedicides, etc.).
However, statistics has the following limitations.
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: the population size of a country for a single given year does not, on its own, help us in comparative studies.
c) Statistical results are true only on average.
d) It is liable to misuse. Example: in a particular year, women drivers in a city were involved in 10 car accidents, while men drivers were involved in 40. Concluding from this that women are safer drivers is a misuse, because it ignores how many women and men drive in the city.
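The misuse in d) comes from comparing raw counts for groups of different sizes. A fairer comparison divides each count by the number of drivers in the group. Here is a minimal Python sketch of that idea. The driver numbers below are invented purely for illustration, so the printed rates say nothing about real drivers.

```python
# Accident counts from example d) above.
accidents = {"women": 10, "men": 40}

# HYPOTHETICAL numbers of drivers in the city; the notes give no such figures.
drivers = {"women": 500, "men": 8000}

rates = {}
for group, count in accidents.items():
    # Accidents per 1000 drivers puts both groups on the same footing.
    rates[group] = count / drivers[group] * 1000
    print(f"{group}: {count} accidents, {rates[group]:.1f} per 1000 drivers")
```

With these made-up driver numbers, the group with fewer accidents has the higher rate. This is exactly the possibility that the raw counts hide.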
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The following are
some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variation in data.
4. It furnishes techniques of comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.

1.5 TYPES OF VARIABLES AND MEASUREMENT SCALES


1.5.1 TYPES OF VARIABLES
A variable is a characteristic of an object that can have different possible values.
There are two types of variables.
a) Quantitative variables: variables that can be quantified, i.e. that take numerical values. Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly. Examples: color, beauty, sex, location. Qualitative variables are also called categorical variables. Hence we have two types of data: quantitative and qualitative data.
Quantitative variables can be further classified as
• Discrete variables, and
• Continuous variables
I. Discrete variables are variables whose values are counts.
Examples: number of students, number of households, family size, number of pages of a book.
II. Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.

1.5.2 MEASUREMENT SCALES


• Measurement scale refers to the properties of the values assigned to the data, based on the properties of order, distance and fixed zero.
• The goal of a measurement system is to structure the rule for assigning numbers to objects so that the relationship between the objects is preserved in the numbers assigned to them.
MEASUREMENT
Order
• The property of order exists when an object that has more of the attribute than another object is given a bigger number by the rule system. This relationship must hold for all objects in the "real world".
Distance
• The property of distance is concerned with the relationship of differences between objects. The unit of measurement means the same thing throughout the scale of numbers; that is, an inch is an inch.
Fixed Zero
• A measurement system possesses a rational zero (fixed zero) if an object that has none of the attribute in question is assigned the number zero by the system of rules.
• The property of fixed zero is necessary for ratios between numbers to be meaningful.

There are four types of measurement scales for variables


1. Nominal scale: “Nominal” comes from the Latin word for “name”. This is a scale for grouping individuals into different categories.
• Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
• On this scale, one category is simply different from another.
• +, −, ×, ÷ are impossible (no arithmetic or order-relational operations can be applied).
• Ranking comparison is impossible.
Examples:
o Blood type
o Political party preference (Republican, Democrat or other)
o Sex (male or female)
o Marital status (married, single, widowed, divorced)
o Country code
o Regional differentiation of Ethiopia
o red, brown, black, short, tall, pass, fail


2. Ordinal scale: “Ordinal” comes from the Latin word for “order”.
• It is a scale for grouping and ordering individuals into different categories.
• Data consisting of an ordering or ranking of measurements are said to be on an ordinal scale of measurement.
• Level of measurement which classifies data into categories that can be ranked. Differences between the ranks are not meaningful.
• One is different from, and greater than/better than/less than, the other.
• +, −, ×, ÷ are impossible (arithmetic operations are not applicable).
• Comparison is possible (relational operations are applicable).
Examples:
• Letter grades (A, B, C, D, F).
• Rating scales (Excellent, Very good, Good, Fair, Poor).
• Military status.
• Man A weighs more than man B.
• Faster, taller, shorter, military ranks, ranks in a race, etc.
Ordinal scale data contain and convey more information than nominal scale data, because relative magnitudes are known. However, quantitative comparisons are impossible.
3. Interval scale: a measurement scale in which:
• There is no physical significance to the zero point.
• There is a constant interval size between any adjacent units on the measurement scale.
• Interval scales are measurement systems that possess the properties of order and distance, but not the property of fixed zero.
• Level of measurement which classifies data that can be ranked and whose differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
• All arithmetic operations except division are applicable.
• Relational operations are also possible.
• Interval scale data convey more information than nominal and ordinal scale data.
• On this scale, zero is an arbitrary point rather than an indication of “nothing”.
Examples:
• IQ
• Temperature in °F.
4. Ratio Scales:
• Ratio scales are measurement systems that possess all three properties: order, distance and fixed zero.
• The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted. For example, we can say that the ratio of Bekele's height to Martha's height is 1.32, whereas this is not possible with interval scales.
• Level of measurement which classifies data that can be ranked, whose differences are meaningful, and which has a true zero. True ratios exist between the different units of measure.
• +, −, ×, ÷ are possible on this scale, and relational operations are applicable.
• This measurement scale provides more information than the interval scale of measurement.
• A zero measurement indicates absence of the quantity being measured.
Examples:
Weight
Height
Number of students
Age
Typical units: m, cm, kg, km/hr, cm/sec, year, hour, second, m³, etc.
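A short Python sketch, not from the notes, shows why ratios are meaningless on an interval scale such as °F but meaningful on a ratio scale. It uses the standard conversions °C = (°F − 32) × 5/9 and K = °C + 273.15. The two readings are arbitrary illustrative values.

```python
def f_to_c(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def f_to_k(f):
    """Convert degrees Fahrenheit to kelvins, a ratio scale with a true zero."""
    return f_to_c(f) + 273.15


a_f, b_f = 80.0, 40.0  # two illustrative readings in degrees Fahrenheit

# Differences survive a change of unit (they are simply rescaled by 5/9).
print(a_f - b_f, f_to_c(a_f) - f_to_c(b_f))  # 40.0 and about 22.2

# Ratios do not: "twice as hot" in °F is not twice as hot in °C or in kelvins.
print(a_f / b_f)                  # 2.0
print(f_to_c(a_f) / f_to_c(b_f))  # about 6.0
print(f_to_k(a_f) / f_to_k(b_f))  # about 1.08
```

The same pair of readings gives three different "ratios" depending on where zero is placed. Only the kelvin ratio has a physical meaning, because 0 K is a true zero.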


CHAPTER 2
2 Methods of data collection and Presentation
2.1 Methods of data collection
Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is statistical data when the numbers are:
• Comparable,
• Meaningful, and
• Collected for a well-defined objective.
Raw data: collected data that have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
• It lets us find the range of the data set easily, and it also gives us some idea of the general characteristics of the distribution.
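Arraying the raw data from the example above takes one call to Python's built-in `sorted`. Once the data are sorted, the range can be read off the two ends of the array. This is a minimal sketch.

```python
raw_data = [25, 10, 32, 18, 6, 93, 4]   # raw data from the example above

ascending = sorted(raw_data)                 # array in ascending order
descending = sorted(raw_data, reverse=True)  # array in descending order

# In an array the extremes sit at the two ends, so the range is immediate.
data_range = ascending[-1] - ascending[0]

print(ascending)    # [4, 6, 10, 18, 25, 32, 93]
print(descending)   # [93, 32, 25, 18, 10, 6, 4]
print(data_range)   # 89
```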
Any scientific investigation requires data related to the study. The required data can be obtained from
either a primary source or a secondary source.
Primary source: a source of data that supplies first-hand information for the immediate purpose.
1. Primary data: data originally collected for the immediate purpose.
• Data measured or collected by the investigator or the user directly from the source.
• Primary data are more expensive than secondary data.
• Two activities are involved: planning and measuring.
• Planning:
o Identify the source and elements of the data.
o Decide whether to consider a sample or a census.
o If sampling is preferred, decide on the sample size, selection method, etc.
o Decide on the measurement procedure.
o Set up the necessary organizational structure.
• Measuring: there are different options.
o Focus group
o Mall intercept
o Telephone interview
o New product registration
o Mail questionnaires
o Personal interview
o Door-to-door survey
o Experiments
These are some of the sources for collecting primary data.
• The process of data collection from a primary source may involve:
a) Field trials
b) Laboratory experiments
c) Surveys (census survey or sample survey).
2. Secondary data: data collected from a secondary source.
Secondary sources: individuals or agencies that supply data originally collected, by them or by others, for other purposes.
• Usually they are published or unpublished materials, records, reports, etc.
• When our source is secondary data, check:
I. The type and objective of the situation.
II. That the purpose for which the data were collected is compatible with the present problem.
III. That the nature and classification of the data are appropriate to our problem.
IV. That there are no biases or misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
2.2 METHODS OF DATA PRESENTATION
Having collected and edited the data, the next important step is to organize them. That is, we present them in a readily comprehensible, condensed form that helps us draw inferences from them. It is also necessary that like items be separated from unlike ones.

The presentation of data is broadly classified into the following two categories:

• Tabular presentation

• Diagrammatic and graphic presentation.

The process of arranging data into classes or categories according to their similarities is technically called classification. Classification is a preliminary step that prepares the ground for proper presentation of data. It eliminates inconsistency and brings out the points of similarity and/or dissimilarity among the collected items. It is necessary because it would not be possible to draw inferences and conclusions directly from a large set of unorganized (raw) data.

2.2.1 Frequency Distributions


Definitions:
• Raw data: recorded information in its original collected form, whether counts or measurements, is referred to as raw data.
• Frequency: the number of times a certain value or set of values occurs in a specific group.
• Frequency distribution: the organization of raw data in table form using classes and frequencies.
Example: A frequency distribution presenting the number of males and females in a class
Sex       Frequency
Male      57
Female    39
There are three basic types of frequency distributions:
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
There are specific procedures for constructing each type.
1) Categorical frequency distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.
Example: a social worker collected the following data on marital status for 25 persons.
(M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D


Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S, D and W. These types will be used as the classes for the distribution. We follow the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M
S
D
W

Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$\% = \frac{f}{n} \times 100$
where f = frequency of the class and n = total number of values. Percentages are not normally part of a frequency distribution. However, they can be added, since they are used in certain types of diagrams such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.

Class (1)   Tally (2)   Frequency (3)   Percent (4)
M           //// /      6               24
S           //// //     7               28
D           //// //     7               28
W           ////        5               20
Total                   25              100
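Steps 2 to 5 amount to counting and dividing, so they are easy to check by machine. Here is a minimal Python sketch using the standard library's `collections.Counter` on the 25 marital-status codes, read row by row from the data table.

```python
from collections import Counter

# Marital status of the 25 persons, read row by row from the data table
# (M = married, S = single, W = widowed, D = divorced).
data = """M S D W D
S S M M M
W D S M M
W D D S S
S W W D D""".split()

counts = Counter(data)   # Steps 2-3: tally and count each class
n = len(data)            # total number of values

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ["M", "S", "D", "W"]:
    f = counts[cls]
    pct = f / n * 100    # Step 4: % = (f / n) * 100
    print(f"{cls:<6}{f:>10}{pct:>9.0f}")
print(f"{'Total':<6}{n:>10}{100:>9}")   # Step 5: column totals
```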

2) Ungrouped frequency distribution


An ungrouped frequency distribution is a table of all potential raw score values that could possibly occur in the data, along with their corresponding frequencies. Ungrouped frequency distributions are often constructed for small sets of data or for discrete variables.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the collected data. Then make a columnar table of all potential raw score values arranged in order of magnitude, together with the number of times each value is repeated, i.e., the frequency of that value. Tallies can be used to make counting easier.

Example: The following data are the ages in years of 20 women who attended health education last year: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30 and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
Range = Maximum observation − Minimum observation = 42 − 29 = 13


STEP 2. Construct a table, tally the data and complete the frequency column. The frequency distribution becomes
as follows.

Age Tally Frequency


29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
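The same table can be produced with a few lines of Python. This sketch lists only the ages that actually occur, as the table above does. The slash tallies it prints are plain strings and carry no five-bar grouping.

```python
from collections import Counter

ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

data_range = max(ages) - min(ages)  # Step 1: 42 - 29 = 13

freq = Counter(ages)                # Step 2: frequency of each observed age
for age in sorted(freq):            # values in order of magnitude
    print(age, "/" * freq[age], freq[age])
```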
3) Grouped frequency distribution
When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution is a frequency distribution in which several values are grouped into one class.
Some Important Definitions
• Class: one of the different, non-overlapping groups of data.
• Class limits: separate one class in a grouped frequency distribution from another. The limits can actually appear in the collected data, and there are gaps between the upper limit of one class and the lower limit of the next class.

• Unit of measurement (U): the distance between two consecutive possible measurements. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
• Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the collected data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary (LCB) is found by subtracting 0.5 units of measurement from the lower class limit (LCL). The upper class boundary (UCB) is found by adding 0.5 units of measurement to the upper class limit (UCL).
That is, $LCB = LCL - \tfrac{1}{2}U$ and $UCB = UCL + \tfrac{1}{2}U$.
• Class width (W): the difference between the upper and lower boundaries of any class, or between the lower limits of two consecutive classes, or between the upper limits of two consecutive classes.
o N.B. The class width is not equal to the difference between the UCL and LCL of the same class.
• Class mark (M): the midpoint of a class interval, i.e.
$M_i = \frac{UCB_i + LCB_i}{2}$
• Cumulative frequency (Cf), less than type: the total frequency of all values (observations) less than or equal to the upper class boundary of the given class.
• Cumulative frequency (Cf), more than type: the total frequency of all values (observations) greater than or equal to the lower class boundary of the given class.
• A tabular arrangement of class intervals together with their corresponding cumulative frequencies (either less than or more than type, as defined above) is called a cumulative frequency distribution.
• Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all frequencies). Multiplied by 100, it gives the percent of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$
Note:
• The relative frequency shows what fractional part or proportion of the total frequency belongs to the corresponding class.


• The sum of all the relative frequencies in a frequency distribution is always 1.
• Relative cumulative frequency (less than type/more than type): the total of the relative frequencies of a class and all the classes above/below it. Equivalently, it is the cumulative frequency (less than type/more than type) divided by the total frequency. Multiplied by 100, it gives the percent of values that are less than/more than the upper/lower class boundary.
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two different classes.
3. The classes must be all-inclusive or exhaustive. This means that all data values must be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception is the first or last class: it is possible to have a "below ..." or "... and above" class. This is often used with ages.
Guidelines to construct a grouped frequency distribution
STEP 1. Determine the unit of measurement, U.
STEP 2. Find the maximum (Max) and the minimum (Min) observation, and then compute their range, R:
$R = \text{Max} - \text{Min}$
STEP 3. Fix the number of classes desired (k). There are two ways to fix k:
• Fix k arbitrarily between 5 and 20, or
• Use Sturges' formula:
$k = 1 + 3.322 \log_{10} N$
where N is the total frequency. Round this value of k up to get an integer.
STEP 4. Find the class width (W) by dividing the range by the number of classes, and round the result up to get an integer value:
$W = \frac{R}{k}$
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point is the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of measurement from the lower limit of the second class. Then continue to add the class width to this upper limit to get the rest of the upper limits.
STEP 7. Compute the class boundaries as
$LCB = LCL - \tfrac{1}{2}U$ and $UCB = UCL + \tfrac{1}{2}U$
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper class boundary. The class boundaries also lie halfway between the upper limit of one class and the lower limit of the next class.
STEP 8. Tally the data and find the frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
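Steps 3 to 7 are mechanical enough to sketch in Python. This is a minimal illustration under two assumptions: the data are whole numbers (U = 1), and the minimum itself is the starting point in Step 5. The function names `sturges_classes` and `build_classes` are hypothetical helpers, not part of any library.

The sketch departs from Step 4 in one small way. It computes W as floor(R/k) + 1, which equals R/k rounded up except when R/k is already a whole number. In that case, plain rounding would leave Max just outside the last class.

```python
import math


def sturges_classes(n):
    """Step 3, second option: k = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def build_classes(minimum, maximum, k):
    """Steps 4 to 7 for whole-number data (U = 1), starting at the minimum.

    Returns a list of (LCL, UCL, LCB, UCB) tuples, one per class.
    """
    # Step 4: floor(R/k) + 1 equals R/k rounded up, except when R/k is already
    # a whole number; then it adds 1 so that Max still falls in the last class.
    width = (maximum - minimum) // k + 1
    classes = []
    for i in range(k):
        lcl = minimum + i * width       # Step 5: lower limits
        ucl = lcl + width - 1           # Step 6: next lower limit minus U
        classes.append((lcl, ucl, lcl - 0.5, ucl + 0.5))  # Step 7: boundaries
    return classes


print(sturges_classes(40))              # 7
for row in build_classes(26, 65, 8):    # Min, Max and k of the example below
    print(row)
```

For the example that follows (N = 40), Sturges' formula suggests 7 classes. The example instead fixes k = 8 arbitrarily, which is the first option in Step 3.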
Example: The numbers of hours that 40 employees spent on their jobs over the last 7 working days are given below.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
Construct a suitable frequency distribution for these data using 8 classes.
STEP 1. Unit of measurement: U = 1 hour
STEP 2. Max = 65 and Min = 26, so R = 65 − 26 = 39
STEP 3. It has already been decided to construct a frequency distribution with k = 8 classes.
STEP 4. Class width: $W = \frac{R}{k} = \frac{39}{8} = 4.875 \approx 5$
STEP 5. Starting point = 26 = lower limit of the first class. Hence the lower class limits are
26 31 36 41 46 51 56 61
STEP 6. Upper limit of the first class = 31 − 1 = 30. Hence the upper class limits are
30 35 40 45 50 55 60 65
The lower and upper class limits (Steps 5 and 6) can be written as follows.
Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65
STEP 7. We subtract 0.5 units of measurement from the lower class limits and add 0.5 units of measurement to the upper class limits. This gives the lower and upper class boundaries as follows.
Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5 – 40.5
40.5 – 45.5
45.5 – 50.5
50.5 – 55.5
55.5 – 60.5
60.5 – 65.5
STEPS 8 and 9 are displayed in the following table (columns 3 and 4, and columns 5 and 6, respectively).
Class limits   Class boundaries   Tally       Frequency   Cf (less than type)   Cf (more than type)
26 – 30        25.5 – 30.5        ////        5           5                     40
31 – 35        30.5 – 35.5        ////        5           10                    35
36 – 40        35.5 – 40.5        ////        5           15                    30
41 – 45        40.5 – 45.5        //// ////   9           24                    25
46 – 50        45.5 – 50.5        //// //     7           31                    16
51 – 55        50.5 – 55.5        /           1           32                    9
56 – 60        55.5 – 60.5        //          2           34                    8
61 – 65        60.5 – 65.5        //// /      6           40                    6
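As a cross-check on Steps 8 and 9, the following Python sketch recounts the 40 observations into the eight classes and rebuilds both cumulative frequency columns. It also prints each class's relative frequency, as defined earlier in this section. It is an illustration only and hard-codes the class limits found in Steps 5 and 6.

```python
hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

limits = [(26 + 5 * i, 30 + 5 * i) for i in range(8)]  # Steps 5-6: W = 5, start at 26
n = len(hours)

freqs, cf_less, cf_more = [], [], []
running = 0  # observations in the classes processed so far
for lcl, ucl in limits:
    f = sum(lcl <= x <= ucl for x in hours)  # Step 8: frequency of the class
    cf_more.append(n - running)              # observations >= this class's LCB
    running += f
    cf_less.append(running)                  # observations <= this class's UCB
    freqs.append(f)

for (lcl, ucl), f, cl, cm in zip(limits, freqs, cf_less, cf_more):
    # Columns: limits, boundaries, frequency, Cf (less than), Cf (more than), relative frequency
    print(f"{lcl}-{ucl}  {lcl - 0.5}-{ucl + 0.5}  {f:2d}  {cl:3d}  {cm:3d}  {f / n:.3f}")
```

The relative frequencies printed in the last column add up to 1, as noted above.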

2.2.2 Diagrammatic and Graphic Presentation of Data


The data presented by a frequency distribution can also be displayed diagrammatically or graphically.
Diagrams and graphs:
• are techniques for presenting data in visual displays using geometric figures;
• are visual aids which give a bird's-eye view of a given set of numerical data;
• are more attractive than mere figures (numbers);
• facilitate comparison of data;
• are easily understood, even by someone with no statistical background.

Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for presenting continuous data.
There are three common diagrammatic presentations of data: bar-diagrams/charts, pie-charts and pictograms. There are also three common graphic presentations: the histogram, the frequency polygon and the cumulative frequency polygon (ogive).
I. Bar-diagrams/Bar-charts
• A bar-diagram is a series of equally spaced bars of equal width. The height of each bar represents the magnitude or frequency of observations in each group.
• Bar-diagrams are usually used to represent a one-way or simple frequency distribution.
• Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams are used for qualitatively classified data, whereas vertical bar-diagrams are used for quantitatively classified data.
Example: Horizontal bar-diagram.
[Figure: horizontal bar-diagram; vertical axis: Blood Type, horizontal axis: Frequency.]

There are a number of bar-diagrams. The most common are:
• Simple bar-diagrams
• Deviation (two-way) bar-diagrams
• Broken bar-diagrams
• Component (subdivided) bar-diagrams
• Multiple bar-diagrams
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single variable (one-way data).
Example: The following frequency distribution shows the sales (in million birr) of four products for the 2004 production year.
Product   Sales (in million birr)
A         14
B         21
C         9
D         17
The bar-diagram presentation for these data is given below.
[Figure: simple vertical bar-diagram; horizontal axis: Product (A, B, C, D), vertical axis: Sales (in million birr).]

2. Deviation bar-diagrams
When the data take both positive and negative values (for instance, data on profit, net exports, percent change, etc.), deviation bar-diagrams are appropriate.
Example: Present the following data using a suitable bar-diagram.
Data: Net profit (in thousands of birr) in oil sales for five years
Year   Profit (in thousands)
1997   12
1998   -5
1999   14
2000   9
2001   -6
The deviation bar-diagram for the data looks like the following.
[Figure: deviation bar-diagram; horizontal axis: Year (1997 to 2001), vertical axis: Profit (in thousands), with bars above and below zero.]
3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values, where it would be difficult to fit the bars for these values within the graph paper. In this case we use pieces of bars, with each piece starting with a jump on the numerical scale.
Example: Data: Amount of production per day for four products of a factory.
Product   Quantity produced (kg/day)
A         14
B         35
C         23
D         109

4. Component (subdivided) bar-diagrams
When we want to show how a total (an aggregate) is divided into component parts, we use a component bar-diagram. In this type of bar-diagram, the bars represent aggregate values of a variable, and each aggregate is broken into its component parts. Different colors or designs are used for identification.
Example: Represent the following data using bar-charts.
Data: Yields of production of farmers in Southern Ethiopia.
Crop     1990 EC   1991 EC   1992 EC   1993 EC
Barley   14        15        26        19
Wheat    10        15        14        25
Maize    2         6         10        3
Total    26        36        50        47

The component bar-diagram for this table is as follows.
[Figure: component bar-diagram; horizontal axis: Year (1990 to 1993), vertical axis: Production, each bar divided into barley, wheat and maize.]

5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for comparing different
variables at the same time.
Example: The data given in the above example can be presented using a multiple bar-diagram as below.
[Figure: multiple bar-diagram; horizontal axis: Year (1990 to 1993), vertical axis: Production, with adjacent bars for barley, wheat and maize.]

II. Pie-charts
A pie-chart is a circle that is divided into sectors according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$\text{Sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses. Below is the breakdown of the various expenditure items. Draw an appropriate chart to portray the data.

Expenditure item        Amount (in birr)
Fuel                    603
Interest on car loan    279
Repairs                 930
Insurance and license   646
Depreciation            492
Total                   2,950
How to draw a pie-chart
• First find the percentage of each class.
• Next calculate the degree measure for each class.
• Finally, using a protractor, draw each sector (degree measure) in a circle and give a key for explanation.
Expenditure item Amount (in birr) Percentage (approx) Degree (approx)
Fuel 603 20 74
Interest on car loan 279 9 34
Repairs 930 32 113
Insurance and license 646 22 79
Depreciation 492 17 60
Total 2,950 100 360
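The percentage and degree columns follow directly from the sector-angle formula. Here is a minimal Python sketch that reproduces them. The table above rounds to whole numbers, while the sketch keeps one decimal place.

```python
items = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}
total = sum(items.values())  # 2950 birr

angles = {}
for item, amount in items.items():
    share = amount / total          # ratio of the class to the total
    angles[item] = share * 360      # sector angle in degrees
    print(f"{item:<22} {share * 100:5.1f}%  {angles[item]:6.1f} degrees")
```

Rounded to whole degrees, the angles are 74, 34, 113, 79 and 60. They sum to the full 360°.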

Now we can draw the pie-chart for the data.
[Figure: pie-chart of car operating expenses with key: Repairs 32%, Insurance and license 22%, Fuel 20%, Depreciation 17%, Interest on car loan 9%.]

III. Pictograms
In pictograms, we represent the data by means of picture symbols. We choose a suitable picture to represent a definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data (number of students in a certain school for four years).
Year              1992   1993   1994   1995
No. of students   2000   3000   5000   7000
Let a single picture represent one thousand students. The pictogram then shows 2 pictures for 1992, 3 for 1993, 5 for 1994 and 7 for 1995.
[Figure: pictogram of the number of students by year; Key: one picture = 1000 students.]
IV. Histogram

Page 16 of 68
ecture notes on statistics

A histogram is another way of data presentation which is more suitable for frequency distributions with
continuous classes. In drawing a histogram, we put the class boundaries of each class on the horizontal
axis and its respective frequency on the vertical axis.

Example: Draw a histogram presenting the following data.


Class         Class   Frequency   Cumulative frequency   Cumulative frequency
boundaries    mark                (less than type)       (more than type)
5.5 – 11.5    8.5     2           2                      20
11.5 – 17.5   14.5    2           4                      18
17.5 – 23.5   20.5    7           11                     16
23.5 – 29.5   26.5    4           15                     9
29.5 – 35.5   32.5    3           18                     5
35.5 – 41.5   38.5    2           20                     2
[Figure: histogram of the distribution above; horizontal axis: class mid-points; vertical axis: frequency]

VI. Cumulative Frequency Polygon (Ogive)


A cumulative frequency polygon can be drawn on either a less than or a more than cumulative frequency basis. Place the class boundaries along the horizontal axis (upper class boundaries for the less than type, lower class boundaries for the more than type) and the corresponding cumulative frequencies (either less than or more than cumulative frequencies) along the vertical axis. Then join the plotted points by a freehand curve.
Example: The data in the previous example can be presented using either a less than or a more than cumulative frequency polygon, as shown in (i) and (ii) below respectively.
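The points to be plotted can be computed directly from the class boundaries and frequencies of the table above. A minimal sketch using Python's standard `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

lower = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5]   # lower class boundaries
upper = [11.5, 17.5, 23.5, 29.5, 35.5, 41.5]  # upper class boundaries
freq = [2, 2, 7, 4, 3, 2]

less_than = list(accumulate(freq))                  # running totals from the first class
more_than = list(accumulate(reversed(freq)))[::-1]  # running totals from the last class

less_than_points = list(zip(upper, less_than))  # points of the less than ogive
more_than_points = list(zip(lower, more_than))  # points of the more than ogive
print(less_than_points)
print(more_than_points)
```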
(i) Less than type cumulative frequency polygon
[Figure: horizontal axis: upper class boundaries (11.5 to 41.5); vertical axis: less than type cumulative frequencies (0 to 30)]

(ii) More than type cumulative frequency polygon
[Figure: horizontal axis: lower class boundaries (5.5 to 35.5); vertical axis: more than type cumulative frequencies (0 to 30)]

CHAPTER 3
3 Measures of Central Tendency
3.1 Introduction
The most important aspect of studying the distribution of sample measurements is the position of its central value, that is, a representative value about which the measurements are distributed. It is often convenient to have one figure that is representative of each group; this figure is known as the average of the group. If the members of the group are arranged in order of magnitude, the averages tend to fall around the central position in the group, so averages are called measures of central tendency. In short, any measure intended to represent the center of a data set is called a measure of location or central tendency.
Objectives
The most important objectives of measuring central tendency are:
• to determine a single value around which the other data concentrate,
• to summarize (reduce the volume of) the data, and
• to facilitate comparison within one group or between groups of data.
Desirable properties of a good measure of central tendency
We say a measure of central tendency is good if it possesses most of the following properties. It should:
• be simple to understand and easy to calculate/interpret,
• exist and be unique,
• be rigidly defined by a mathematical formula,
• be based on all observations,
• not be seriously affected by extreme observations, and
• be capable of further statistical analysis and/or algebraic manipulation.

3.2 The Summation Notation (∑)


Let a data set consist of n observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the sum is written
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
For instance, a data set consisting of the six measurements 21, 13, 54, 46, 32 and 37 is represented by $x_1, x_2, \ldots, x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$. Their sum is
$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203.$$
Similarly,
$$x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$$
Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = n c$, where c is a constant.
2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where b is a constant.
3. $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$, where a and b are constants.
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
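The four properties can be checked numerically on the six measurements above. In this sketch the second series `y` and the constants `a`, `b`, `c` are arbitrary illustrative values, not from the notes:

```python
x = [21, 13, 54, 46, 32, 37]        # the six measurements x_1, ..., x_6
y = [1, 2, 3, 4, 5, 6]              # hypothetical second series y_i (for property 4)
n = len(x)
a, b, c = 3, 2, 5                   # arbitrary constants

total = sum(x)                      # sum of x_i
sum_sq = sum(xi ** 2 for xi in x)   # sum of x_i squared

assert sum(c for _ in x) == n * c                              # property 1
assert sum(b * xi for xi in x) == b * total                    # property 2
assert sum(a + b * xi for xi in x) == n * a + b * total        # property 3
assert sum(xi + yi for xi, yi in zip(x, y)) == total + sum(y)  # property 4 (+)
assert sum(xi - yi for xi, yi in zip(x, y)) == total - sum(y)  # property 4 (-)
print(total, sum_sq)
```

Note that the sum of squares (8035 here) is not the square of the sum (203² = 41209).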


3.3 Types of Measures of Central Tendency


Several types of averages or measures of central tendency can be defined; the most common are
- the arithmetic mean or the mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the property under
discussion.

3.3.1 The Arithmetic Mean (The Mean)


The arithmetic mean is defined as the sum of the measurements of the items divided by the total number of items.
Arithmetic Mean for Ungrouped Frequency Distribution
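When the data are given in the form of an ungrouped frequency distribution, in which the distinct values $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, the formula for the mean is
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Note that $\sum_{i=1}^{k} f_i = n$, the total number of observations.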
Example: Obtain the mean of the following numbers:
2, 7, 8, 2, 7, 3, 7
Solution:
Xi fi Xifi
2 2 4
3 1 3
7 3 21
8 1 8
Total 7 36
$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$$
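The frequency table in this example can be built and used mechanically. A minimal sketch with the standard `collections.Counter` (variable names are illustrative):

```python
from collections import Counter

data = [2, 7, 8, 2, 7, 3, 7]
freq = Counter(data)                                   # value x_i -> frequency f_i
n = sum(freq.values())                                 # sum of f_i
sum_fx = sum(value * f for value, f in freq.items())   # sum of f_i * x_i
mean = sum_fx / n
print(dict(sorted(freq.items())), sum_fx, n, round(mean, 2))
```

The result equals the plain mean of the raw list, since the frequency form only groups repeated values.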


Exercise 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record the following:
17.5 19.5 17.5 19 20
21 18 19.5 18 10.75
Compute the sample mean length of the infants for these data.
Exercise 2: Monthly incomes of fourth year regular students are given in the following frequency distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
Arithmetic Mean for Grouped Frequency Distribution
If the data are given in the form of a continuous (grouped) frequency distribution, the sample mean can be computed as
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $x_i$ is the class mark of the ith class (i = 1, 2, ..., k), $f_i$ is the frequency of the ith class and k is the number of classes. Note that $\sum_{i=1}^{k} f_i = n$, the total number of observations.
Example: Calculate the mean for the following age distribution.
Class Frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
Solution:
• First find the class marks.
• Find the product of each frequency and class mark.
• Find the mean using the formula.
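The three steps can be sketched in Python for the age distribution above (the function name `grouped_mean` is illustrative; class marks are taken as the midpoints of the stated class limits):

```python
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freq = [35, 23, 15, 12, 9, 6]

def grouped_mean(classes, freq):
    """Mean of a grouped distribution, using class marks as representative values."""
    marks = [(low + high) / 2 for low, high in classes]   # step 1: class marks
    products = [f * m for f, m in zip(freq, marks)]       # step 2: f_i * x_i
    return sum(products) / sum(freq)                      # step 3: the formula

print(grouped_mean(classes, freq))
```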
Class    fi     Xi    Xifi
6-10     35     8     280
11-15    23     13    299
16-20    15     18    270
21-25    12     23    276
26-30    9      28    252
31-35    6      33    198
Total    100          1575
$$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
Exercises:
1. Marks of 75 students are summarized in the following frequency distribution:

Marks No. of students


40-44 7
45-49 10
50-54 22
55-59 f4
60-64 f5
65-69 6
70-74 3
If 20% of the students have marks between 55 and 59
i. Find the missing frequencies f4 and f5.


ii. Find the mean.


2. The following table gives the daily wages of laborers. Calculate the average daily wages paid to a laborer.
Wages in birr 11-13 13-15 15-17 17-19 19-21 21-23 23-25
Number of laborers 3 4 5 6 6 4 3
Properties of the Arithmetic Mean
• The sum of the deviations of the items from their arithmetic mean is zero. That is, the algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$ is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
• The sum of the squares of the deviations of a set of observations from any number, say A, is least only when $A = \bar{x}$. That is,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$$
• When a set of observations is divided into k groups, where $\bar{x}_1$ is the mean of the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, ..., and $\bar{x}_k$ is the mean of the $n_k$ observations of group k, the combined mean $\bar{x}_c$ of all observations taken together is given by
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
• If a wrong figure has been used in calculating the mean, we can correct the mean if we know the correct figure that should have been used. Let
  - $X_{wr}$ denote the wrong figure used in calculating the mean,
  - $X_c$ be the correct figure that should have been used, and
  - $\bar{X}_{wr}$ be the wrong mean calculated using $X_{wr}$.
  Then the correct mean is
$$\bar{X}_{correct} = \frac{n \bar{X}_{wr} + X_c - X_{wr}}{n}$$
• If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then
  a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ is $\bar{x} \pm k$;
  b) the mean of $k x_1, k x_2, \ldots, k x_n$ is $k \bar{x}$.
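The first two properties can be checked numerically. This sketch reuses the data 2, 7, 8, 2, 7, 3, 7 from the earlier example and compares the mean against an illustrative grid of candidate values of A (a numerical check, not a proof):

```python
data = [2, 7, 8, 2, 7, 3, 7]
mean = sum(data) / len(data)

def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from the number a."""
    return sum((v - a) ** 2 for v in values)

dev_sum = sum(v - mean for v in data)   # property 1: zero up to rounding error
print(round(dev_sum, 10))

# Property 2: no candidate A on the grid 0.0, 0.1, ..., 10.0 beats A = mean.
candidates = [i / 10 for i in range(101)]
print(all(sum_sq_dev(data, mean) <= sum_sq_dev(data, a) for a in candidates))
```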
Example 1: Last year there were three sections taking Stat 273 course in Alemaya University. At the end of the
semester, the three sections got average marks of 80, 83 and 76. There were 28, 32 and 35 students in each section
respectively. Find the mean mark for the entire students.
Solution:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} \approx 79.54$$
Example 2: The average weight of 10 students was calculated to be 65 kg, but later it was discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected average weight.
Solution:
$$\bar{X}_{correct} = \frac{n \bar{X}_{wr} + X_c - X_{wr}}{n} = \frac{10(65) + 80 - 40}{10} = 69 \text{ kg}$$
Example 3: The mean of n tetracycline capsules $X_1, X_2, \ldots, X_n$ is known to be 12 gm. A new set of capsules of another drug is obtained by the linear transformation $Y_i = 2X_i - 0.5$ (i = 1, 2, ..., n). What will be the mean of the new set of capsules?
Solution:
$$\bar{Y} = 2\bar{X} - 0.5 = 2(12) - 0.5 = 23.5 \text{ gm}$$
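The three worked examples can be reproduced with a few small helper functions. This is a sketch; the function names are illustrative:

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Replace one wrongly used figure in a mean of n observations."""
    return (n * wrong_mean - wrong_value + correct_value) / n

def transformed_mean(old_mean, scale, shift):
    """Mean of Y = scale * X + shift, given the mean of X."""
    return scale * old_mean + shift

print(round(combined_mean([28, 32, 35], [80, 83, 76]), 2))  # Example 1
print(corrected_mean(10, 65, 40, 80))                        # Example 2
print(transformed_mean(12, 2, -0.5))                         # Example 3
```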
Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100. After the mid-term
exam, however, a student whose score was 41 out of 100 dropped the course. What is the average/mean score
among the 24 students?
NOTE


If the values in a series or the mid-values of the classes are large, coding the values is a good device to simplify the calculations.
• For raw data, suppose we use the following coding system:
$$d_i = X_i - A \;\Rightarrow\; X_i = d_i + A$$
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum_{i=1}^{n} (d_i + A)}{n} = A + \frac{\sum_{i=1}^{n} d_i}{n} = A + \bar{d}$$
where A is an assumed mean and $\bar{d}$ is the mean of the coded data.
• If the data are expressed in terms of an ungrouped frequency distribution,
$$d_i = X_i - A \;\Rightarrow\; X_i = d_i + A$$
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n} = \frac{\sum_{i=1}^{k} f_i (d_i + A)}{n} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n} = A + \bar{d}$$
• In both cases the true mean is the assumed mean plus the average of the deviations from the assumed mean.


• Suppose the data are given in the form of a continuous frequency distribution with a constant class width w. Then the following coding is appropriate:
$$d_i = \frac{X_i - A}{w} \;\Rightarrow\; X_i = w d_i + A$$
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n} = \frac{\sum_{i=1}^{k} f_i (w d_i + A)}{n} = A + w\,\frac{\sum_{i=1}^{k} f_i d_i}{n} = A + w \bar{d}$$
where $X_i$ is the original class mark of the ith class, $d_i$ is the transformed class mark of the ith class (i = 1, 2, ..., k), and A is an assumed mean, usually one of the class marks.
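As an illustration, the width coding can be applied to the age distribution from the grouped-mean example. The choice A = 18 (the class mark of the 16-20 class) is an assumption made here for the sketch; any class mark gives the same mean.

```python
marks = [8, 13, 18, 23, 28, 33]    # class marks of the age distribution
freq = [35, 23, 15, 12, 9, 6]
A, w = 18, 5                       # assumed mean (a class mark) and class width

d = [(x - A) / w for x in marks]   # coded marks: -2, -1, 0, 1, 2, 3
n = sum(freq)
d_bar = sum(f * di for f, di in zip(freq, d)) / n
mean = A + w * d_bar               # true mean = A + w * d_bar
print(d, d_bar, mean)
```

The coded values are small integers, which is the whole point of the device when computing by hand.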
Example:
1. Suppose the deviations of the observations from an assumed mean of 7 are: 1, -1, -2, -2, 0, -3,
-2, 2, 0, -3.
a) Find the true mean
b) Find the original observation.
Solution:
$$A = 7, \quad \sum_{i=1}^{10} d_i = -10 \;\Rightarrow\; \bar{d} = \frac{-10}{10} = -1$$
a) $\bar{X} = A + \bar{d} = 7 - 1 = 6$. The true mean is 6.
b) Using Xi=A+di we obtain the following original observations:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
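Both parts of the example can be checked in a few lines (variable names are illustrative):

```python
A = 7
d = [1, -1, -2, -2, 0, -3, -2, 2, 0, -3]   # deviations from the assumed mean
d_bar = sum(d) / len(d)
true_mean = A + d_bar                      # part a)
original = [A + di for di in d]            # part b): X_i = A + d_i
print(true_mean, original)
```

As a consistency check, the plain mean of the recovered observations equals the true mean found via coding.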
Weighted Arithmetic Mean
In finding the arithmetic mean, all items were assumed to be of equal importance. When the items differ in importance, we find a weighted average instead: each item is assigned a weight in proportion to its relative importance.
If $x_1, x_2, \ldots, x_k$ represent the values of the items and $w_1, w_2, \ldots, w_k$ are the corresponding weights, then the weighted mean $\bar{X}_w$ is given by
$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are 82, 80, 90 and 70 respectively. If the respective credits for these courses are 3, 5, 3 and 1, determine the approximate average mark the student obtained per course.
Solution: We use a weighted arithmetic mean, taking the weight of each course to be the number of credits for that course.
x_i    82    80    90    70
w_i    3     5     3     1
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$$
Therefore, the average mark of the student per course is approximately 82.
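• Example: Suppose that a student obtained the following grades in the first semester of the freshman program.
Course          Mats   Bio   Chem   Phy   Flen
Credit hours    4      3     3      4     3
Grade           A      C     B      B     C
Taking the credit hours as weights and the grade points A = 4, B = 3 and C = 2, the weighted mean grade point is
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{4(4) + 3(2) + 3(3) + 4(3) + 3(2)}{4 + 3 + 3 + 4 + 3} = \frac{49}{17} \approx 2.88$$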
Merits of Arithmetic Mean
• The arithmetic mean is rigidly defined by a mathematical formula, so its value is always definite.
• It is calculated based on all observations.
• It is simple to calculate and easy to understand. It does not require arraying (arranging in increasing or decreasing order) the data.
• It is capable of further algebraic treatment.
• It affords a good standard of comparison.
Demerits of Arithmetic Mean
• It is highly affected by extreme (abnormal) observations in the series. For instance, if the monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026 birr, the average income of the four people is (37 + 53 + 48 + 1026)/4 = 291 birr, which is not at all a representative figure.
• It can be a number that does not exist in the series.
• It sometimes gives results that appear almost meaningless. For example, we may get an average of '3.6 children' per family.
• It gives greater importance to bigger items of a series and lesser importance to smaller items; that is, it is an upward-biased measure.
• It cannot be calculated for open-ended classes.

THE GEOMETRIC MEAN


• The geometric mean of a set of n observations is the nth root of their product.
• The geometric mean of $X_1, X_2, \ldots, X_n$ is denoted by G.M and given by
$$G.M = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$$
• Taking the logarithms of both sides,
$$\log(G.M) = \log\left( (X_1 \cdot X_2 \cdots X_n)^{1/n} \right) = \frac{1}{n} (\log X_1 + \log X_2 + \cdots + \log X_n) = \frac{1}{n} \sum_{i=1}^{n} \log X_i$$
⇒ The logarithm of the G.M of a set of observations is the arithmetic mean of their logarithms, so
$$G.M = \operatorname{antilog}\left( \frac{1}{n} \sum_{i=1}^{n} \log X_i \right)$$

Example: Find the G.M of the numbers 2, 4 and 8.
Solution:
$$G.M = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4$$
Remark: The Geometric Mean is useful and appropriate for finding averages of ratios.
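The logarithm route above is also how the G.M is usually computed in practice, since it avoids forming a very large product. A minimal sketch (the function name is illustrative; the positivity check is added because logarithms of zero or negative values are undefined), cross-checked against Python's standard `statistics.geometric_mean`:

```python
import math
import statistics

def geometric_mean(values):
    """G.M = antilog of the mean of the base-10 logarithms; all values must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

gm = geometric_mean([2, 4, 8])
print(round(gm, 6))                                     # 4.0
print(round(statistics.geometric_mean([2, 4, 8]), 6))   # standard-library cross-check
```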

THE HARMONIC MEAN


The harmonic mean of $X_1, X_2, \ldots, X_n$ is denoted by H.M and given by
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
This is called the simple harmonic mean.
In the case of a frequency distribution,
$$H.M = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}, \qquad n = \sum_{i=1}^{k} f_i$$
If observations $X_1, X_2, \ldots, X_n$ have weights $W_1, W_2, \ldots, W_n$ respectively, then their harmonic mean is given by
$$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} W_i / X_i}$$
This is called the weighted harmonic mean.
Remark: The harmonic mean is useful and appropriate for finding average speeds and average rates.
Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back from the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is the same in both directions, so the simple H.M is appropriate for this problem. With $X_1 = 10$ km/hr and $X_2 = 15$ km/hr,
$$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr}$$
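One function covers all three forms: with no weights it is the simple H.M, and passing the frequencies $f_i$ as weights gives the frequency-distribution form. A minimal sketch (the function name is illustrative; Python's standard library also offers `statistics.harmonic_mean`):

```python
def harmonic_mean(values, weights=None):
    """Simple H.M when weights is None; otherwise sum(W) / sum(W / X)."""
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

speed = harmonic_mean([10, 15])   # average speed in km/hr over equal distances
print(round(speed, 6))
```

The arithmetic mean of the two speeds (12.5 km/hr) would overstate the average, because the cyclist spends more time at the slower speed.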

3.3.2 The Median


The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array) is the middle value, or the arithmetic mean of the two middle values. We shall denote the median of $x_1, x_2, \ldots, x_n$ by $\tilde{X}$.
For ungrouped data, with the ordered observations written $x_{[1]} \le x_{[2]} \le \cdots \le x_{[n]}$, the median is obtained by
$$\tilde{x} = \begin{cases} x_{[\frac{n+1}{2}]} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left( x_{[\frac{n}{2}]} + x_{[\frac{n}{2}+1]} \right) & \text{if } n \text{ is even} \end{cases}$$
For grouped data, the median obtained by the interpolation method is given by
$$\tilde{X} = L_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - C \right)$$
where $L_{med}$ = lower class boundary of the median class,
C = sum of the frequencies of all classes lower than the median class (in other words, the cumulative frequency preceding the median class),
$f_{med}$ = frequency of the median class, and w = class width.
The median class is the class with the smallest cumulative frequency greater than or equal to n/2. It can be located by counting n/2 of the frequencies beginning from the lowest class.
Example: Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4.    b) 2, 1, 8, 3, 5.
Solution:
a) First order the data: 2, 4, 5, 6, 8, 9. Here n = 6 (even), so
$$\tilde{X} = \frac{1}{2}\left( X_{[3]} + X_{[4]} \right) = \frac{1}{2}(5 + 6) = 5.5$$
b) Order the data: 1, 2, 3, 5, 8. Here n = 5 (odd), so
$$\tilde{X} = X_{[\frac{5+1}{2}]} = X_{[3]} = 3$$
Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: Ordering the weights gives 6.4, 7.8, 8.1, 9.2, 10.5, so the median is the third value, 8.1 pounds.
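The odd/even rule for ungrouped data translates directly into code. A minimal sketch (the function name is illustrative; Python's 0-based indexing shifts the positions by one):

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # the ((n+1)/2)th item
    return (s[mid - 1] + s[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th items

print(median([6, 5, 2, 8, 9, 4]))            # example a)
print(median([2, 1, 8, 3, 5]))               # example b)
print(median([9.2, 6.4, 10.5, 8.1, 7.8]))    # birth weights
```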


Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
• First find the less than cumulative frequencies.
• Identify the median class.
• Find the median using the formula.

Class    Frequency   Cumulative frequency (less than type)
40-44    7           7
45-49    10          17
50-54    22          39
55-59    15          54
60-64    12          66
65-69    6           72
70-74    3           75
n/2 = 75/2 = 37.5, and 39 is the first cumulative frequency greater than or equal to 37.5
⇒ 50-54 is the median class.
$L_{med} = 49.5$, w = 5, n = 75, C = 17, $f_{med} = 22$
$$\tilde{X} = L_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - C \right) = 49.5 + \frac{5}{22}(37.5 - 17) \approx 54.16$$
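The three solution steps (cumulate, locate the median class, interpolate) can be sketched as one function; the name `grouped_median` and the boundary list layout are illustrative:

```python
def grouped_median(boundaries, freq):
    """Interpolated median; boundaries is a list of (lower, upper) class boundaries."""
    n = sum(freq)
    half = n / 2
    cum = 0                               # cumulative frequency before the current class (C)
    for (low, high), f in zip(boundaries, freq):
        if cum + f >= half:               # first class whose cumulative frequency reaches n/2
            return low + (high - low) / f * (half - cum)
        cum += f
    raise ValueError("empty distribution")

boundaries = [(39.5 + 5 * i, 44.5 + 5 * i) for i in range(7)]   # 39.5-44.5, ..., 69.5-74.5
freq = [7, 10, 22, 15, 12, 6, 3]
print(round(grouped_median(boundaries, freq), 2))
```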
Exercise 1: The following table gives the distribution of the weekly wages of employees of a small firm.
Wages in birr No. of employees
126 and below 3
127 – 135 5
136 – 144 9
145 – 153 12
154 – 162 5
163 – 171 4
172 and above 2
a) Find the median weekly wage.
b) Why is the median a more suitable measure of central tendency than the mean in this case?
Merits of median
• The median is a positional average and hence is not influenced by extreme values.
• The median is rigidly defined, so its value is always definite.
• The median can be calculated even in the case of open-ended intervals.
• It gives the best result in studies of phenomena that are incapable of direct quantitative measurement, for example intelligence.
Demerits of median
• It is not capable of further algebraic treatment.
• It is not a good representative of the data if the number of items (data) is small.
• Arranging the items in order of magnitude is sometimes a very tedious process if the number of items is very large.


3.3.3 The Mode


The mode or modal value is the most frequently occurring score/observation in a series and is denoted by $\hat{x}$. Note that the mode may not exist in the series or, even if it does exist, it may not be unique.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5.
2. Find the mode of 8, 9, 9, 7, 8, 2 and 5. The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6 and 7. This data set has no mode.
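The three cases (one mode, several modes, no mode) can be handled by counting frequencies. In this sketch the function name is illustrative, and the rule "no mode when every value occurs only once" follows example 3 above:

```python
from collections import Counter

def modes(values):
    """All most-frequent values; empty list if every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                     # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))         # unimodal
print(modes([8, 9, 9, 7, 8, 2, 5]))   # bimodal
print(modes([4, 12, 3, 6, 7]))        # no mode
```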
For grouped data, the mode is found by the following formula:
$$\hat{x} = L_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) w$$
where
$L_{mod}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = the difference between the frequency of the modal class and that of the next higher class, and
w = the class width.
The modal class is the class with the highest frequency in the distribution.
Example: Following is the distribution of the size of certain farms selected at random from a district.
Calculate the mode of the distribution.

Size of farms No. of farms


5-15 8
15-25 12
25-35 17
35-45 29
45-55 31
55-65 5
65-75 3
Solution: 45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, w = 10, $f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$, so $\Delta_1 = f_{mo} - f_1 = 2$ and $\Delta_2 = f_{mo} - f_2 = 26$.
$$\hat{X} = 45 + 10\left( \frac{2}{2 + 26} \right) \approx 45.71$$
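The grouped-mode formula can be sketched as follows. The names are illustrative, and two conventions are assumptions not stated in the notes: if several classes tie for the highest frequency the first is taken, and a missing neighbour (modal class at either end) is treated as having frequency 0. The formula breaks down (division by zero) if both differences are zero.

```python
def grouped_mode(lower_limits, width, freq):
    """Interpolated mode: L_mod + d1 / (d1 + d2) * w."""
    k = freq.index(max(freq))                   # modal class (first one if tied)
    f_before = freq[k - 1] if k > 0 else 0      # assumption: missing neighbour counts as 0
    f_after = freq[k + 1] if k + 1 < len(freq) else 0
    d1 = freq[k] - f_before
    d2 = freq[k] - f_after
    return lower_limits[k] + d1 / (d1 + d2) * width

lower_limits = [5, 15, 25, 35, 45, 55, 65]
freq = [8, 12, 17, 29, 31, 5, 3]
print(round(grouped_mode(lower_limits, 10, freq), 2))
```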
Exercise 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70, 75, 73, 80, 70, 83
and 86. Find the mode of the students’ marks.
Exercise 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30 children given below.
Weight 1.9-2.3 2.3-2.7 2.7-3.1 3.1-3.5 3.5-3.9 3.9-4.3
No. of children 5 5 9 4 4 3
Merits of mode
• The mode is not affected by extreme values.
• The mode can be calculated even in the case of open-ended intervals, and it is not necessary to know all the observations.
Demerits of mode
• The mode may not exist in the series, and if it exists it may not be unique.
• It does not fulfill most of the requirements of a good measure of central tendency.
• It may be unrepresentative in many cases.


3.3.4 Quantiles
Quantiles are values that divide a data set, arranged in order of magnitude, into a certain number of equal parts. They are averages of position (measures of non-central location). Quartiles, deciles and percentiles, whose values depend on their positions in the distribution, are collectively called quantiles.
I. Quartiles are values that divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The first quartile is also called the lower quartile and the third quartile the upper quartile. The second quartile is the median.
$Q_1$ is a value such that 25% of the items are less than or equal to it. Similarly, 50% of the items are less than or equal to $Q_2$, and 75% of the items are less than or equal to $Q_3$.
To find $Q_i$ (i = 1, 2, 3), we count iN/4 of the frequencies beginning from the lowest class.

For ungrouped data: Let $Q_j$ be the jth quartile for j = 1, 2, 3. Then
$$Q_j = \text{the value of the } \left( \frac{j(n+1)}{4} \right)^{\text{th}} \text{ item of the ordered data}, \quad j = 1, 2, 3.$$
For grouped data: We can apply the following formula:
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}} \left( \frac{iN}{4} - c \right), \quad i = 1, 2, 3$$
where $L_{Q_i}$ = lower class boundary of the quartile class, w = the width of the quartile class, N = total number of observations, $f_{Q_i}$ = the frequency of the quartile class, and c = the cumulative frequency (less than type) preceding the quartile class.
Remark:
The quartile class (the class containing $Q_i$) is the class with the smallest cumulative frequency (less than type) greater than or equal to iN/4.
II. Deciles are measures that divide the frequency distribution into ten equal parts.
The values of the variable corresponding to these divisions are denoted $D_1, D_2, \ldots, D_9$ and are called the first, the second, ..., the ninth decile respectively. The fifth decile is the median.
To find $D_i$ (i = 1, 2, ..., 9), we count iN/10 of the frequencies beginning from the lowest class.
For ungrouped data: Let $D_j$ be the jth decile for j = 1, 2, ..., 9. Then
$$D_j = \text{the value of the } \left( \frac{j(n+1)}{10} \right)^{\text{th}} \text{ item of the ordered data}, \quad j = 1, 2, \ldots, 9.$$
For grouped data: We can apply the following formula:
$$D_i = L_{D_i} + \frac{w}{f_{D_i}} \left( \frac{iN}{10} - c \right), \quad i = 1, 2, \ldots, 9$$
where $L_{D_i}$ = lower class boundary of the decile class, w = the width of the decile class, N = total number of observations, $f_{D_i}$ = the frequency of the decile class, and c = the cumulative frequency (less than type) preceding the decile class.
Remark:
The decile class (the class containing $D_i$) is the class with the smallest cumulative frequency (less than type) greater than or equal to iN/10.
III. Percentiles are measures that divide the frequency distribution into one hundred equal parts.
The values of the variable corresponding to these divisions are denoted $P_1, P_2, \ldots, P_{99}$ and are called the first, the second, ..., the ninety-ninth percentile respectively. The fiftieth percentile is the median.
To find $P_i$ (i = 1, 2, ..., 99), we count iN/100 of the frequencies beginning from the lowest class.
For ungrouped data: Let $P_j$ be the jth percentile for j = 1, 2, 3, ..., 99. Then
$$P_j = \text{the value of the } \left( \frac{j(n+1)}{100} \right)^{\text{th}} \text{ item of the ordered data}, \quad j = 1, 2, \ldots, 99.$$
For grouped data: We can use the following formula:
$$P_i = L_{P_i} + \frac{w}{f_{P_i}} \left( \frac{iN}{100} - c \right), \quad i = 1, 2, \ldots, 99$$
where $L_{P_i}$ = lower class boundary of the percentile class, w = the width of the percentile class, N = total number of observations, $f_{P_i}$ = the frequency of the percentile class, and c = the cumulative frequency (less than type) preceding the percentile class.
Remark:
The percentile class (the class containing $P_i$) is the class with the smallest cumulative frequency (less than type) greater than or equal to iN/100.
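The ungrouped rule "the value of the (j(n+1)/parts)th item" covers quartiles (parts = 4), deciles (10) and percentiles (100). The notes do not say what to do when that position is not a whole number; this sketch assumes linear interpolation between the two neighbouring ordered items (the function name is illustrative). Python's `statistics.quantiles`, whose default method also uses (n+1)-based positions, gives the same quartiles here.

```python
import math
import statistics

def positional_quantile(values, j, parts):
    """Value of the (j(n+1)/parts)th item of the ordered data.

    A fractional position is linearly interpolated (an assumption; the notes
    do not specify this case). Positions outside 1..n are clamped.
    """
    s = sorted(values)
    n = len(s)
    pos = j * (n + 1) / parts          # 1-based position in the ordered data
    pos = min(max(pos, 1), n)
    lower = math.floor(pos)
    frac = pos - lower
    if lower == n:
        return s[-1]
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])

data = [6, 5, 2, 8, 9, 4]
print([positional_quantile(data, j, 4) for j in (1, 2, 3)])
print(statistics.quantiles(data, n=4))
```

The second quartile equals the median 5.5 found for these numbers earlier.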
Interpretations
1. $Q_j$ is the value below which (j × 25) percent of the observations in the series are found (j = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the observations in the given series are found.
2. $D_j$ is the value below which (j × 10) percent of the observations in the series are found (j = 1, 2, ..., 9). For instance, $D_4$ is the value below which 40 percent of the values in the series are found.
3. $P_j$ is the value below which j percent of the total observations are found (j = 1, 2, 3, ..., 99). For example, 73 percent of the observations in a given series are below $P_{73}$.
Example: Consider the following distribution.
Calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
Values Frequency CF(less than type)
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493


Solution:
• First find the less than cumulative frequencies (given in the table above).
• Use the formula to calculate the required quantile.
A) Quartiles:
I. $Q_1$
• Determine the class containing the first quartile: N/4 = 493/4 = 123.25, so 170-180 is the class containing the first quartile.
$L_{Q_1} = 170$, w = 10, N = 493, c = 88, $f_{Q_1} = 72$
$$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}} \left( \frac{N}{4} - c \right) = 170 + \frac{10}{72}(123.25 - 88) \approx 174.90$$
II. $Q_2$
• Determine the class containing the second quartile: 2N/4 = 246.5, so 190-200 is the class containing the second quartile.
$L_{Q_2} = 190$, w = 10, N = 493, c = 244, $f_{Q_2} = 107$
$$Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}} \left( \frac{2N}{4} - c \right) = 190 + \frac{10}{107}(246.5 - 244) \approx 190.23$$
III. $Q_3$
• Determine the class containing the third quartile: 3N/4 = 369.75, so 200-210 is the class containing the third quartile.
$L_{Q_3} = 200$, w = 10, N = 493, c = 351, $f_{Q_3} = 49$
$$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}} \left( \frac{3N}{4} - c \right) = 200 + \frac{10}{49}(369.75 - 351) \approx 203.83$$
B) $D_7$
• Determine the class containing the 7th decile: 7N/10 = 345.1, so 190-200 is the class containing the seventh decile.
$L_{D_7} = 190$, w = 10, N = 493, c = 244, $f_{D_7} = 107$
$$D_7 = L_{D_7} + \frac{w}{f_{D_7}} \left( \frac{7N}{10} - c \right) = 190 + \frac{10}{107}(345.1 - 244) \approx 199.45$$
C) $P_{90}$
• Determine the class containing the 90th percentile: 90N/100 = 443.7, so 220-230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, w = 10, N = 493, c = 434, $f_{P_{90}} = 31$
$$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}} \left( \frac{90N}{100} - c \right) = 220 + \frac{10}{31}(443.7 - 434) \approx 223.13$$
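All five results follow from one interpolation routine, since the quartile, decile and percentile formulas differ only in the fraction iN/4, iN/10 or iN/100. A minimal sketch (the function name and the `fraction` parameter are illustrative):

```python
def grouped_quantile(boundaries, freq, fraction):
    """Interpolated quantile for grouped data.

    fraction = i/4 for Q_i, i/10 for D_i, i/100 for P_i;
    boundaries is a list of (lower, upper) class boundaries.
    """
    n = sum(freq)
    target = fraction * n                 # iN/4, iN/10 or iN/100
    cum = 0                               # cumulative frequency preceding the class (c)
    for (low, high), f in zip(boundaries, freq):
        if cum + f >= target:             # the quantile class
            return low + (high - low) / f * (target - cum)
        cum += f
    raise ValueError("fraction must lie between 0 and 1")

boundaries = [(140 + 10 * i, 150 + 10 * i) for i in range(11)]   # 140-150, ..., 240-250
freq = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

for label, fraction in [("Q1", 1 / 4), ("Q2", 2 / 4), ("Q3", 3 / 4), ("D7", 7 / 10), ("P90", 90 / 100)]:
    print(label, round(grouped_quantile(boundaries, freq, fraction), 2))
```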


CHAPTER 4
4 Measures of Dispersion (Variation)
4.1 Introduction
Variation (dispersion) is the scatter or spread of observations (values) in a distribution. The average or central value
is of little use unless the degree of variation, which occurs about it, is given. If the scatter about the measure of
central tendency is very large, the average is not a typical value. Therefore it is necessary to develop a quantitative
measure of the dispersion (or variation) of the values about the average.
Measures of variation are statistical measures, which provide ways of measuring the extent to which the data are
dispersed or spread out.
Objectives: Measures of variation are needed for the following basic objectives.
• To judge the reliability of a measure of central tendency
• To compare two or more sets of data with regard to their variability
• To control variability itself, as in quality control, body temperature, etc.
• To make further statistical analysis possible or to facilitate the use of other statistical measures
Properties of a good measure of dispersion
A good measure of dispersion should:
• be rigidly defined by a mathematical formula,
• be simple to understand and easy to calculate,
• be unique,
• be based on all observations in the series,
• not be affected by a few extreme values in the series,
• have the sampling stability property, and
• be capable of further algebraic treatment as well as further statistical analysis.

4.2 Absolute and Relative Measures of Dispersion


Measures of dispersion (variation) may be either absolute or relative. Absolute measures of dispersion are expressed in the same unit of measurement as the original data. These values may be used to compare the variation in two distributions provided that the variables are in the same units and of the same average size.
If the two sets of data are expressed in different units, however, such as quintals of sugar versus tonnes of sugarcane, or if their average sizes are very different, such as managers' salaries versus workers' salaries, the absolute measures of dispersion are not comparable. In such cases measures of relative dispersion should be used.
A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate measure of central tendency. It is sometimes called a coefficient of dispersion, because the word "coefficient" denotes a pure number (one that is independent of any unit of measurement). Note that when computing the relative dispersion, the average (the measure of central tendency) used as the base should be the same one from which the absolute deviations were measured. Note also that a relative dispersion is a unitless quantity.

4.3 Types of Measures of Dispersion

4.3.1 The Range and Relative Range


Range (R) is defined as the difference between the largest and the smallest observations in a given set of data. That is,
$$R = x_{max} - x_{min}$$
where $x_{max}$ and $x_{min}$ are the largest and the smallest observations in the series respectively.
In the case of grouped data, the range is found by taking the difference between the class mark of the last class and that of the first class. That is,
$$R = M_{last} - M_{first}$$
where $M_{last}$ and $M_{first}$ are the class marks of the last class and of the first class respectively.
A relative range (RR), also known as the coefficient of range, is given by
$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$
$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data}$$


Properties of Range and Relative Range


• Range and relative range are easy to calculate and simple to understand.
• Neither can be computed for grouped data with open-ended classes.
• They do not tell us anything about the distribution of values within the series.
Example 1: Find the range and relative range for the monthly salaries of ten workers in a certain paint factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
$x_{max} = 624$ birr, $x_{min} = 462$ birr
$$R = x_{max} - x_{min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$$
$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 - 462}{624 + 462} = \frac{162}{1086} \approx 0.149$$
Example 2: Find the range and relative range for the following frequency distribution, which shows the distribution of the maximum loads supported by a certain number of cables.
Maximum load (in kilo-Newton)    Number of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
$M_{first} = 95$ kN, $M_{last} = 130$ kN
$$R = M_{last} - M_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$$
$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 - 95}{130 + 95} = \frac{35}{225} \approx 0.156$$
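Both examples can be reproduced in a few lines. In this sketch the function name is illustrative, and the class marks are computed as the midpoints of the first (93-97) and last (128-132) classes:

```python
def relative_range(high, low):
    """Coefficient of range: (high - low) / (high + low)."""
    return (high - low) / (high + low)

# Example 1: ungrouped salaries (birr)
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r = max(salaries) - min(salaries)
rr = relative_range(max(salaries), min(salaries))

# Example 2: grouped loads (kN), using class marks of the first and last classes
first_mark = (93 + 97) / 2
last_mark = (128 + 132) / 2
rg = last_mark - first_mark
rrg = relative_range(last_mark, first_mark)
print(r, round(rr, 3), rg, round(rrg, 3))
```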
4.3.2 The Quartile Deviation (Semi-interquartile Range), Q.D
The interquartile range is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range is half of the interquartile range:
$$Q.D = \frac{Q_3 - Q_1}{2}$$
Coefficient of Quartile Deviation (C.Q.D)
$$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2 \cdot Q.D}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
• Q.D gives the average amount by which the two quartiles differ from the median.

Example: Compute Q.D and its coefficient for the following distribution.
Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49


210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solution:
In the previous chapter we obtained the values of all the quartiles: $Q_1 = 174.90$, $Q_2 = 190.23$, $Q_3 = 203.83$.
$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} \approx 14.47$$
$$C.Q.D = \frac{2 \cdot Q.D}{Q_3 + Q_1} = \frac{2(14.47)}{203.83 + 174.90} \approx 0.076$$
Remark: Q.D and C.Q.D use only the middle 50% of the observations.
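A short check of this example. Instead of the rounded quartiles, this sketch recomputes $Q_1$ and $Q_3$ from the interpolation formula with the class values of the quantile example; for these data the rounded answers come out the same:

```python
# Unrounded quartiles from the grouped-data formula (N = 493)
q1 = 170 + 10 / 72 * (493 / 4 - 88)        # quartile class 170-180
q3 = 200 + 10 / 49 * (3 * 493 / 4 - 351)   # quartile class 200-210

qd = (q3 - q1) / 2                         # quartile deviation
cqd = (q3 - q1) / (q3 + q1)                # coefficient of quartile deviation
print(round(qd, 2), round(cqd, 3))
```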

4.3.3 The Mean Deviation and Coefficient of Mean Deviation


The mean deviation (MD) measures the average deviation of a set of observations about their central
value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ is given as
$$MD = \frac{\sum |x_i - A|}{n}$$
where A is a central measure (the mean or the median).
In the case of grouped data, the formula for MD becomes
$$MD = \frac{\sum f_i |x_i - A|}{n}$$
where $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
- The mean deviation about the arithmetic mean is, therefore, given by
$$MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$
$$MD = \frac{\sum f_i |x_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution,}$$
where $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
- The mean deviation about the median is similarly given by
$$MD = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$
$$MD = \frac{\sum f_i |x_i - \tilde{x}|}{n} \quad \text{for a grouped frequency distribution,}$$
where $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
- The mean deviation about the mode is denoted by $M.D(\hat{X})$ and given by
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n} \quad \text{for ungrouped data.}$$
For the case of a frequency distribution it is given as
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{k} f_i |X_i - \hat{X}|}{n}$$
Coefficient of mean deviation (CMD)


The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to the measure of central tendency about which it was computed. In general,
$$CMD = \frac{MD}{A},$$
where A is a measure of central tendency: the arithmetic mean, the median or the mode. That is, the CMD about the arithmetic mean is
$$CMD = \frac{MD}{\bar{x}},$$
where MD is the mean deviation calculated about the arithmetic mean. On the other hand, the CMD about the median is
$$CMD = \frac{MD}{\tilde{x}},$$
in which case MD is calculated about the median of the observations. Likewise, the CMD about the mode is
$$CMD = \frac{MD}{\hat{x}},$$
in which case MD is calculated about the mode of the observations.
Properties of Mean Deviation and coefficient of mean deviation
- It is easy to understand and compute
- It is based on all observations
- It is not affected very much by extreme values.
- It is not well suited to further mathematical treatment (it ignores signs by taking absolute values), and it is not a very accurate measure of dispersion.
Examples:
1. The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, the median and the mode.
Solutions:
First calculate the three averages:
$$\bar{X} = 6, \qquad \tilde{X} = 5.5, \qquad \hat{X} = 5$$
Then take the absolute deviations of each observation from these averages.
X_i            4     4     5     5     5     6     7     7     8     9     Total
|X_i − 6|      2     2     1     1     1     0     1     1     2     3     14
|X_i − 5.5|    1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|X_i − 5|      1     1     0     0     0     1     2     2     3     4     14
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{10} |X_i - 6|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\tilde{X}) = \frac{\sum_{i=1}^{10} |X_i - 5.5|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{10} |X_i - 5|}{10} = \frac{14}{10} = 1.4$$
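The worked example can be reproduced with Python's standard `statistics` module (`mean`, `median`, `mode`). The helper `mean_deviation` is a name chosen for this sketch, not a library function; the loop also prints the coefficient of mean deviation about each average.

```python
from statistics import mean, median, mode


def mean_deviation(data, centre):
    """Average absolute deviation of the observations about `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

averages = {"mean": mean(visits), "median": median(visits), "mode": mode(visits)}
for name, centre in averages.items():
    md = mean_deviation(visits, centre)
    cmd = md / centre  # coefficient of mean deviation about the same average
    print(f"about the {name} ({centre}): MD = {md:.2f}, CMD = {cmd:.3f}")
```

Note that `statistics.mode` returns a single most common value; for data with several modes (Python 3.8+), it returns the first one encountered, so a multimodal data set needs `statistics.multimode` instead.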

4.3.4 The Variance, the Standard Deviation and Coefficient of Variation


The Variance
Variance is the arithmetic mean of the squares of the deviations of the observations from their arithmetic mean.
- Population Variance ($\sigma^2$)
For ungrouped data
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \cdots = \frac{1}{N}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right)$$
where μ is the population arithmetic mean and N is the total number of observations in the population.
For grouped data


$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \cdots = \frac{1}{N}\left(\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{N}\right)$$
where μ is the population arithmetic mean, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $N = \sum f_i$.
- Sample Variance ($S^2$)
For ungrouped data
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n-1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right)$$
where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.


For grouped data


$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n-1}\left(\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right)$$
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
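The following Python sketch checks that the definitional and shortcut forms of the variance agree. The function names `variance` and `variance_by_definition` are hypothetical helpers for this illustration; the data are the visits from the mean deviation example and the frequency table used in the moments example of Section 4.4.1.

```python
def variance(xs, freqs=None, sample=True):
    """Variance by the shortcut formula.

    With `freqs`, xs are class marks (or distinct values) and freqs their
    frequencies. Divides by n - 1 for a sample and by N for a population.
    """
    if freqs is None:
        freqs = [1] * len(xs)
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(xs, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, freqs))
    ss = sum_fx2 - sum_fx ** 2 / n  # sum of squared deviations from the mean
    return ss / (n - 1) if sample else ss / n


def variance_by_definition(xs, sample=True):
    """Mean of squared deviations from the mean (n - 1 divisor for a sample)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1) if sample else ss / n


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
print(variance(visits), variance_by_definition(visits))  # both equal 26/9
print(variance([5, 15, 25, 35], [1, 3, 4, 2], sample=False))  # 81.0
```

The shortcut form needs only the running sums of $f_i x_i$ and $f_i x_i^2$, which is why it is convenient for hand calculation; for very large values, the definitional form is numerically safer in floating-point arithmetic.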
The Standard Deviation
Standard deviation is the positive square root of the variance.
- Population Standard Deviation (σ)
$$\sigma = \sqrt{\sigma^2},$$
where $\sigma^2$ is the population variance.
- Sample Standard Deviation (S)
$$S = \sqrt{S^2},$$
where $S^2$ is the sample variance.
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is known as the
coefficient of variation (CV).
The coefficient of variation is used when we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage:
$$CV = \frac{S}{\bar{x}} \times 100,$$
where S is the standard deviation of the observations.
A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, more uniform or more homogeneous.
Example: Last semester, the students of the Hydraulics and Civil Departments took the Stat 273 course. At the end of the semester, the following information was recorded.
Department            Hydraulics    Civil
Mean score            79            64
Standard deviation    23            11
Compare the relative dispersions of the two departments' scores using an appropriate measure.
Solution:
Hydraulics Department:
$$CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 = 29.11\%$$
Civil Department:
$$CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 = 17.19\%$$
Interpretation: Since the CV of Hydraulics Department students is greater than that of Civil Department students,
we can say that there is more dispersion relative to the mean in the distribution of Hydraulics students’ scores
compared with that of Civil students.
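A small Python sketch of the same comparison. `coefficient_of_variation` is a helper named for this illustration; when only a variance is reported (as in the firm example that follows), the sketch assumes we first take its square root to get the standard deviation.

```python
import math


def coefficient_of_variation(sd, mean):
    """CV in percent: the standard deviation as a percentage of the mean."""
    return sd / mean * 100


# (mean, standard deviation) for each department
departments = {"Hydraulics": (79, 23), "Civil": (64, 11)}
cvs = {name: coefficient_of_variation(sd, m) for name, (m, sd) in departments.items()}
for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.2f}%")
print("More relative dispersion:", max(cvs, key=cvs.get))

# If a variance is given instead, convert it to a standard deviation first
cv_firm_a = coefficient_of_variation(math.sqrt(100), 52.5)
```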
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms, A and B, belonging to the same industry gives the following results:
Value          Firm A    Firm B
Mean wage      52.5      47.5
Median wage    50.5      45.5
Variance       100       121
In which firm, A or B, is there greater variability in individual wages?
Solutions:
Calculate coefficient of variation for both firms.


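$S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$, so
$$C.V_A = \frac{S_A}{\bar{X}_A} \times 100 = \frac{10}{52.5} \times 100 = 19.05\%, \qquad C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 = 23.16\%$$
Since $C.V_A < C.V_B$, there is greater variability in individual wages in firm B.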
2. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for the five days of the week in the three cities were:
City 1:    25    24    23    26    17
City 2:    22    21    24    22    20
City 3:    32    27    35    24    28
Which city has the most consistent temperature, based on these data? (Exercise)
Properties of the Variance and the Standard Deviation
Variance
- It removes most of the demerits (drawbacks) of the measures of dispersion discussed so far.
- Its unit is the square of the unit of measurement of the values. For example, if the variable is measured in kg, the unit of the variance is kg².
- It is calculated from all the observations in the series.
- It gives more weight to extreme values and less to values near the mean.
Standard Deviation
- It is generally considered to be the best measure of dispersion.
- [Demerit] If two series are measured in different units, we cannot compare their variability just by comparing their standard deviations.
- It is calculated from all the observations in the series, and it is capable of further algebraic treatment.
- The standard deviation is neither easy to calculate nor easy to understand.
- Like the variance, the standard deviation gives more weight to extreme values and less to values near the mean.
The Standard Scores (Z-Scores)
A standard score describes the relative position of a single score in the entire distribution of scores in terms of the mean and standard deviation. It gives the number of standard deviations a particular observation lies above or below the mean.
Population standard score:
$$Z = \frac{x - \mu}{\sigma},$$
where x is the value of the observation, and μ and σ are the mean and standard deviation of the population, respectively.
Sample standard score:
$$Z = \frac{x - \bar{x}}{S},$$
where x is the value of the observation, and $\bar{x}$ and S are the mean and standard deviation of the sample, respectively.
Interpretation: If Z is
- positive, the observation lies above the mean;
- negative, the observation lies below the mean;
- zero, the observation equals the mean.
Example: Two sections were given an exam in a course. The average score was 72 with standard deviation of 6 for
section 1 and 85 with standard deviation of 5 for section 2. Student A from section 1 scored 84 and student B from
section 2 scored 90. Who performed better relative to his/her group?
Solution:
Section 1: $\bar{x}_1 = 72$, $S_1 = 6$, and the score of student A is $x_A = 84$.
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$, and the score of student B is $x_B = 90$.
Z-score of student A:
$$Z_A = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$$
Z-score of student B:
$$Z_B = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$$
From these two standard scores, we conclude that student A performed better relative to his/her section, because his/her score is two standard deviations above the mean score of section 1, while the score of student B is only one standard deviation above the mean score of section 2.
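The comparison reduces to computing two standard scores, as in this minimal Python sketch (`z_score` is a helper named for the illustration).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(84, mean=72, sd=6)  # student A, section 1
z_b = z_score(90, mean=85, sd=5)  # student B, section 2
better = "A" if z_a > z_b else "B"
print(f"Z_A = {z_a:.2f}, Z_B = {z_b:.2f}; student {better} did better relative to their section")
```

This assumes that a higher score is better. When smaller values are better, as with the task times in Example 2 below, the comparison flips: the more negative Z indicates the better performance.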
Examples 1: Two sections were given introduction to statistics examinations. The following information was given.
Value                 Section 1    Section 2
Mean                  78           90
Standard deviation    6            5
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?
Solutions: Calculate the standard score of both students.
$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2, \qquad Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$
⇒ Student A performed better relative to his section, because the score of student A is two standard deviations above the mean score of his section, while the score of student B is only one standard deviation above the mean score of his section.
Examples 2: Two groups of people were trained to perform a certain task and tested to find out which group learns the task faster. For the two groups, the following information was given:
Value                 Group one    Group two
Mean                  10.4 min     11.9 min
Standard deviation    1.2 min      1.3 min
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B from group two takes 9.3 minutes. Who was faster in performing the task? Why?
Solutions:
a) Use the coefficient of variation.
$$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100 = \frac{1.2}{10.4} \times 100 = 11.54\%, \qquad C.V_2 = \frac{S_2}{\bar{X}_2} \times 100 = \frac{1.3}{11.9} \times 100 = 10.92\%$$
Since $C.V_2 < C.V_1$, group two is more consistent.
b) Calculate the standard scores of A and B.
$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1, \qquad Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$$
Person B is faster, because the time taken by person B is two standard deviations shorter than the average time taken by group two, while the time taken by person A is only one standard deviation shorter than the average time taken by group one.


4.4 Measures of Skewness and Kurtosis


After going through this topic, you will be able to:
- distinguish between a symmetrical and a skewed distribution;
- compute various coefficients to measure the extent of skewness in a distribution;
- distinguish between platykurtic, mesokurtic and leptokurtic distributions; and
- compute the coefficient of kurtosis.

4.4.1 Moments:
The kth raw moment about the origin for n observations $x_1, x_2, \ldots, x_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$ is defined as
$$M'_k = \frac{1}{N} \sum_{i=1}^{n} f_i x_i^k, \qquad \text{where } N = \sum_{i=1}^{n} f_i, \; k = 1, 2, \ldots$$
- For k = 1, we have
$$M'_1 = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$$
Thus the 1st raw moment about the origin is the arithmetic mean.
- For k = 2, we have
$$M'_2 = \frac{1}{N} \sum_{i=1}^{n} f_i x_i^2$$
- The kth central moment about the arithmetic mean for n observations is denoted by $M_k$ and defined as
$$M_k = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \mu)^k, \qquad \text{where } N = \sum_{i=1}^{n} f_i, \; k = 1, 2, \ldots$$
and μ is the arithmetic mean.
- For k = 1, $M_1 = 0$.
- For k = 2, $M_2$ is the population variance, i.e.
$$M_2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \mu)^2 = \sigma^2$$
Example: Find the first three moments about the mean from the following data:
Value            5     15    25    35    Total
Frequency        1     3     4     2     10
Solution
Value x_i            5        15       25     35      Total
Frequency f_i        1        3        4      2       10
f_i x_i              5        45       100    70      220
x_i − μ              −17      −7       3      13
f_i(x_i − μ)         −17      −21      12     26      0
f_i(x_i − μ)²        289      147      36     338     810
f_i(x_i − μ)³        −4913    −1029    108    4394    −1440
$$\mu = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{220}{10} = 22, \qquad M_1 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)}{\sum_{i=1}^{4} f_i} = \frac{0}{10} = 0$$
$$M_2 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^2}{\sum_{i=1}^{4} f_i} = \frac{810}{10} = 81 \qquad \text{and} \qquad M_3 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^3}{\sum_{i=1}^{4} f_i} = \frac{-1440}{10} = -144$$

When the deviations are raised to an odd power (k = 1, 3, 5, …) and the sum of the negative terms equals the sum of the positive terms, the distribution is symmetrical; otherwise it is skewed. That is, the distribution is symmetrical if $M_3 = 0$, $M_5 = 0$, $M_7 = 0$, etc., but if, for example, $M_3 \ne 0$, the distribution is skewed.

4.4.2 Skewness

The skewness of a distribution is defined as the lack of symmetry. In a symmetrical distribution, the
Mean, Median and Mode are equal to each other and the ordinate at mean divides the distribution
into two equal parts such that one part is mirror image of the other. If some observations, of very high
(low) magnitude, are added to such a distribution, its right (left) tail gets elongated.

The presence of extreme observations on the right-hand side of a distribution makes it positively skewed, and the three averages (mean, median and mode) will no longer be equal. That is, Mean > Median > Mode. On the other hand, the presence of extreme observations on the left-hand side of a distribution makes it negatively skewed, and the relationship between mean, median and mode is Mean < Median < Mode.
Measures of Skewness:

Karl Pearson's Measure of Skewness


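Karl Pearson's coefficient of skewness, $S_K$, is given by
$$S_K = \frac{\text{Mean} - \text{Mode}}{S.D} \approx \frac{3(\text{Mean} - \text{Median})}{S.D}$$
The second form uses the empirical relation Mean − Mode ≈ 3(Mean − Median), which holds approximately for moderately skewed distributions.
$S_K$ lies between −3 and 3, i.e. $-3 \le S_K \le 3$.
- If $S_K = 0$, the distribution is symmetrical, since $\bar{X} = \tilde{X}$.
- If $S_K > 0$, the distribution is positively skewed, since $\bar{X} > \tilde{X}$.
- If $S_K < 0$, the distribution is negatively skewed, since $\bar{X} < \tilde{X}$.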
Bowley's Measure of Skewness
- In a symmetrical distribution, the first and third quartiles are equidistant from the median ($Q_2$), i.e. $Q_2 - Q_1 = Q_3 - Q_2$; in other words, $\tilde{X} = \frac{Q_1 + Q_3}{2}$.
- If $Q_2 - Q_1 \ne Q_3 - Q_2$, the data are skewed.

Bowley's quartile coefficient of skewness is denoted by $S_B$:
$$S_B = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$
Since $Q_2$ = median, we can rewrite this as
$$S_B = \frac{(Q_3 - \text{median}) - (\text{median} - Q_1)}{(Q_3 - \text{median}) + (\text{median} - Q_1)}$$
$S_B$ lies between −1 and 1.
- If $S_B = 0$, the distribution is symmetrical.
- If $S_B > 0$, the distribution is positively skewed.
- If $S_B < 0$, the distribution is negatively skewed.

Kelly's Measure of Skewness


Bowley's measure of skewness is based on the middle 50% of the observations, because it leaves out 25% of the observations at each extreme of the distribution. As an improvement over Bowley's measure, Kelly suggested a measure based on $P_{10}$ and $P_{90}$, so that only 10% of the observations at each extreme are ignored.

Kelly's coefficient of skewness, denoted by $S_P$, is given by
$$S_P = \frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{(P_{90} - P_{50}) + (P_{50} - P_{10})} = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$$
Note that $P_{50}$ = Md (the median).

Although the coefficients $S_K$, $S_B$ and $S_P$ are not comparable with one another, each of them equals zero in the absence of skewness.
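The three coefficients translate directly into small Python functions. This is a sketch with hypothetical helper names (`pearson_skewness`, `bowley_skewness`, `kelly_skewness`); the usage line applies Bowley's formula to the quartiles of the grouped distribution from the quartile deviation example in 4.3.2.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Karl Pearson's S_K: (mean - mode)/sd, or 3(mean - median)/sd if no mode is given."""
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("provide either the mode or the median")


def bowley_skewness(q1, q2, q3):
    """Bowley's S_B = (Q1 + Q3 - 2 Q2) / (Q3 - Q1); lies between -1 and 1."""
    return (q1 + q3 - 2 * q2) / (q3 - q1)


def kelly_skewness(p10, p50, p90):
    """Kelly's S_P = (P90 - 2 P50 + P10) / (P90 - P10)."""
    return (p90 - 2 * p50 + p10) / (p90 - p10)


# Quartiles of the grouped distribution from the quartile-deviation example (4.3.2)
s_b = bowley_skewness(174.90, 190.23, 203.83)
print(f"S_B = {s_b:.3f}")
```

For those quartiles, $S_B$ comes out slightly negative, i.e. the median lies a little closer to $Q_3$ than to $Q_1$.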

The coefficient of skewness in terms of moments is denoted by $\alpha_3$:
$$\alpha_3 = \frac{M_3}{(M_2)^{3/2}} = \frac{M_3}{\sigma^3}, \qquad \text{where } M_2 = \sigma^2$$
- If $\alpha_3 = 0$, the distribution is symmetrical.
- If $\alpha_3 < 0$, the distribution is negatively skewed.
- If $\alpha_3 > 0$, the distribution is positively skewed.
4.4.3 KURTOSIS

Kurtosis is another measure of the shape of a distribution. Whereas skewness measures the lack of
symmetry of the frequency curve of a distribution, kurtosis is a measure of the relative peakedness of its
frequency curve. Various frequency curves can be divided into three categories depending upon the shape
of their peak.

The three shapes are termed leptokurtic, mesokurtic and platykurtic, as shown in the figure below.

- Mesokurtic (normal curve): the frequency distribution is unimodal and the curve is bell-shaped and symmetrical.

- Leptokurtic: the frequency distribution is more peaked than the normal curve, i.e. a large number of observations have high frequencies concentrated around the peak.

- Platykurtic: the frequency distribution is less peaked than the normal curve, i.e. a large number of observations have low frequencies spread more evenly around the centre.

Measures of Kurtosis:
The moment coefficient of kurtosis:
Use the following formula for calculating the measure of kurtosis:
$$\alpha_4 = \frac{M_4}{(M_2)^2} = \frac{M_4}{\sigma^4}$$
- If $\alpha_4 > 3$, the curve is leptokurtic (more peaked).
- If $\alpha_4 < 3$, the curve is platykurtic (less peaked).
- If $\alpha_4 = 3$, the curve is mesokurtic (normal curve).


Example: The standard deviation of a symmetrical distribution is 3. What must be the value of the fourth moment about the mean in order that the distribution be mesokurtic?
Solution: For a mesokurtic distribution, $\alpha_4 = 3$, and here $\sigma^4 = 3^4 = 81$, so
$$\alpha_4 = \frac{M_4}{\sigma^4} = \frac{M_4}{81} = 3 \;\Rightarrow\; M_4 = 3 \times 81 = 243$$
So the 4th moment about the mean should be equal to 243.
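A Python sketch of the kurtosis calculation, under the same definitions. It reproduces the worked example ($M_4 = 3\sigma^4$) and, as an extra illustration, applies $\alpha_4$ to the frequency table from the moments example in 4.4.1. The helper names are hypothetical, and the small tolerance in `classify_kurtosis` is an assumption to cope with floating-point round-off.

```python
def central_moment(values, freqs, k):
    """kth central moment of a frequency distribution (divisor N = sum of frequencies)."""
    n = sum(freqs)
    mu = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mu) ** k for x, f in zip(values, freqs)) / n


def moment_kurtosis(values, freqs):
    """alpha_4 = M4 / M2**2; equals 3 for a mesokurtic (normal) curve."""
    m2 = central_moment(values, freqs, 2)
    return central_moment(values, freqs, 4) / m2 ** 2


def classify_kurtosis(a4, tol=1e-9):
    """Compare alpha_4 with 3, allowing a small floating-point tolerance."""
    if abs(a4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if a4 > 3 else "platykurtic"


# Worked example: sigma = 3, so a mesokurtic curve needs M4 = 3 * sigma**4
sigma = 3
required_m4 = 3 * sigma ** 4
print(required_m4)  # 243

# Frequency table from the moments example in 4.4.1
a4 = moment_kurtosis([5, 15, 25, 35], [1, 3, 4, 2])
print(f"alpha_4 = {a4:.3f} -> {classify_kurtosis(a4)}")
```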

EXERCISE
i) Some characteristics of the annual family income distribution (in Birr) in two regions are as follows:
Region    Mean    Median    Standard deviation
A         6250    5100      960
B         6980    5500      940
a) Calculate the coefficient of skewness for each region.
b) For which region is the income distribution more skewed? Give your interpretation.
ii) For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the probable mode of the distribution.
iii) The sum of fifteen observations, whose mode is 8, was found to be 150, with a coefficient of variation of 20%.
(a) Calculate the Pearsonian coefficient of skewness and give an appropriate conclusion.
(b) Are smaller values more or less frequent than bigger values for this distribution?
(c) If a constant k were added to each observation, what would be the new Pearsonian coefficient of skewness? Show your steps. What do you conclude from this?


CHAPTER 5

5 ELEMENTARY PROBABILITY
5.1 Introduction
- Probability theory is the foundation upon which the logic of inference is built.
- It helps us to cope with uncertainty.
- In general, probability is the chance of an outcome of an experiment. It is a measure of how likely an outcome is to occur.

5.2 Definitions of some probability terms


1. Experiment: Any process of observation or measurement, or any process that generates well-defined outcomes.
2. Probability Experiment: An experiment that can be repeated any number of times under similar conditions, for which it is possible to enumerate the total number of outcomes without being able to predict any individual outcome. It is also called a random experiment.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes, i.e. 1, 2, 3, 4, 5, 6, but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment.
4. Sample Space: The set of all possible outcomes of a probability experiment.
5. Event: A subset of the sample space. It is a statement about one or more outcomes of a random experiment. Events are denoted by capital letters.
Example: Considering the above experiment, let A be the event of odd numbers, B the event of even numbers, and C the event of the number 8.
⇒ A = {1, 3, 5}, B = {2, 4, 6}, C = { } (the empty set, i.e. an impossible event)
Remark:
If S (the sample space) has n members, then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events that have the same chance of occurring.
7. Complement of an Event: The complement of an event A means the non-occurrence of A. It is denoted by A', $A^c$ or Ā and contains those points of the sample space that do not belong to A.
8. Elementary Event: An event having only a single element or sample point.
9. Mutually Exclusive Events: Two events that cannot happen at the same time.
10. Independent Events: Two events are independent if the occurrence of one does not affect the probability of the other occurring.
11. Dependent Events: Two events are dependent if the occurrence of the first event changes the probability of the second event.
Example: What is the sample space for the following experiments?
a) Toss a die one time.
b) Toss a coin two times.
c) A light bulb is manufactured and tested for its life length (time).
Solution
a) S = {1, 2, 3, 4, 5, 6}
b) S = {(HH), (HT), (TH), (TT)}
c) S = {t / t ≥ 0}
- A sample space can be
  - countable (finite or infinite), or
  - uncountable.

5.3 Counting Rules


In order to calculate probabilities, we have to know
- the number of elements of an event, and
- the number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
- To determine the number of outcomes, one can use several rules of counting:
  - the addition rule
  - the multiplication rule
  - the permutation rule
  - the combination rule
- To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee or milk with bread, cake or a sandwich. How many possibilities does he have?
Solutions:
Tea    → Bread, Cake, Sandwich
Coffee → Bread, Cake, Sandwich
Milk   → Bread, Cake, Sandwich
⇒ There are nine possibilities.


The Multiplication Rule:
If a choice consists of k steps, of which the first can be made in $n_1$ ways, the second in $n_2$ ways, …, and the kth in $n_k$ ways, then the whole choice can be made in
$$n_1 \times n_2 \times \cdots \times n_k \text{ ways.}$$
Example: The digits 0, 1, 2, 3 and 4 are to be used in a 4-digit identification card. How many different cards are possible if
a) Repetitions are permitted.
b) Repetitions are not permitted.
Solutions
a)
1st digit 2nd digit 3rd digit 4th digit
5 5 5 5
There are four steps
1. Selecting the 1st digit, this can be made in 5 ways.
2. Selecting the 2nd digit, this can be made in 5 ways.
3. Selecting the 3rd digit, this can be made in 5 ways.
4. Selecting the 4th digit, this can be made in 5 ways.
⇒ 5 × 5 × 5 × 5 = 625 different cards are possible.
b)
1st digit 2nd digit 3rd digit 4th digit
5 4 3 2
There are four steps
1. Selecting the 1st digit, this can be made in 5 ways.
2. Selecting the 2nd digit, this can be made in 4 ways.
3. Selecting the 3rd digit, this can be made in 3 ways.
4. Selecting the 4th digit, this can be made in 2 ways.
⇒5∗4∗3∗2=120 different cards are possible.
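Both counts can be confirmed by brute-force enumeration with Python's `itertools.product` and `itertools.permutations`. As in the solution above, the sketch assumes that a card may start with the digit 0.

```python
from itertools import permutations, product

digits = "01234"

# a) repetition allowed: each of the 4 positions can be any of the 5 digits
with_repetition = sum(1 for _ in product(digits, repeat=4))

# b) no repetition: ordered selections of 4 distinct digits
without_repetition = sum(1 for _ in permutations(digits, 4))

print(with_repetition, without_repetition)  # 625 120
```

Enumeration is only practical for small problems like this one; the multiplication rule gives the same numbers without listing anything.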
Permutation
An arrangement of n objects in a specified order is called permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!, where
$$n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$$
2. An arrangement of n objects in a specified order using r objects at a time is called a permutation of n objects taken r objects at a time. It is written as ${}_nP_r$ and the formula is
$${}_nP_r = \frac{n!}{(n-r)!}$$
3. The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, …, $k_m$ are alike is
$$\frac{n!}{k_1! \, k_2! \cdots k_m!}$$
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word "CORRECTION"?
Solutions:
1. Here n = 4; there are four distinct objects.
a) ⇒ There are 4! = 24 permutations.
b) Here n = 4, r = 2.
$$\Rightarrow {}_4P_2 = \frac{4!}{(4-2)!} = \frac{24}{2} = 12 \text{ permutations.}$$
2. Here n = 10, of which 2 are C, 2 are O, 2 are R, 1 is E, 1 is T, 1 is I and 1 is N.
⇒ $k_1 = 2, k_2 = 2, k_3 = 2, k_4 = k_5 = k_6 = k_7 = 1$
Using the 3rd rule of permutation, there are
$$\frac{10!}{2! \, 2! \, 2! \, 1! \, 1! \, 1! \, 1!} = 453600 \text{ permutations.}$$
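The three permutation rules map onto Python's `math.factorial` and `math.perm` (available from Python 3.8). The function `arrangements_with_repeats` is a hypothetical helper implementing the third rule by counting repeated letters with `collections.Counter`.

```python
from collections import Counter
from math import factorial, perm


def arrangements_with_repeats(word):
    """n! / (k1! k2! ... km!) where k1, ..., km count each repeated letter."""
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)  # exact: every partial quotient is an integer
    return count


print(factorial(4))                             # 24: all four of A, B, C, D
print(perm(4, 2))                               # 12: two letters at a time
print(arrangements_with_repeats("CORRECTION"))  # 453600
```

Using integer division keeps the result exact, whereas dividing with `/` would return a float that can lose precision for long words.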
Exercises:
1. Six different statistics books, seven different physics books and three different economics books are arranged on a shelf. How many different arrangements are possible if
i. the books in each particular subject must all stand together?
ii. only the statistics books must stand together?
2. If a permutation of the letters of the word WHITE is selected at random, how many of the permutations
i. begin with a consonant?
ii. end with a vowel?
iii. have consonants and vowels alternating?
Combination
A selection of objects without regard to order is called combination.
Example: Given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
Solutions:
Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC (12 in total)
Combinations: AB, AC, AD, BC, BD, CD (6 in total)
Note that in a permutation AB is different from BA, but in a combination AB is the same as BA.
Combination Rule
The number of combinations of r objects selected from n objects is denoted by ${}_nC_r$ or $\binom{n}{r}$ and is given by the formula:
$$\binom{n}{r} = \frac{n!}{(n-r)! \, r!}$$
Examples 1: In how many ways can a committee of 5 people be chosen out of 9 people?
Solutions:
n = 9, r = 5
$$\binom{n}{r} = \frac{n!}{(n-r)! \, r!} = \frac{9!}{4! \, 5!} = 126 \text{ ways}$$
Examples 2: Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection so that:
a) there is no restriction;
b) none of the defective clocks is included;
c) only one of the defective clocks is included;
d) both of the defective clocks are included?
Solutions:
n = 15, of which 2 are defective and 13 are non-defective; r = 3.
a) If there is no restriction, select three clocks from 15 clocks; this can be done in
$$\binom{15}{3} = \frac{15!}{12! \, 3!} = 455 \text{ ways.}$$
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defective clocks, which can be done in
$$\binom{2}{0}\binom{13}{3} = 286 \text{ ways.}$$
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective clocks, which can be done in
$$\binom{2}{1}\binom{13}{2} = 156 \text{ ways.}$$
d) Both of the defective clocks are included.
This is equivalent to two defective and one non-defective clock, which can be done in
$$\binom{2}{2}\binom{13}{1} = 13 \text{ ways.}$$
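Python's `math.comb` (Python 3.8+) evaluates $\binom{n}{r}$ directly. The sketch below reproduces both examples and adds a consistency check that follows from the solution: the three restricted cases (0, 1 or 2 defective clocks) partition the unrestricted selections.

```python
from math import comb

committee = comb(9, 5)  # Example 1: 5 people out of 9

# Example 2: 15 clocks, 2 defective and 13 non-defective, choose 3
no_restriction = comb(15, 3)
none_defective = comb(2, 0) * comb(13, 3)
one_defective = comb(2, 1) * comb(13, 2)
two_defective = comb(2, 2) * comb(13, 1)

# Every selection contains 0, 1 or 2 defective clocks, so the cases add up
assert none_defective + one_defective + two_defective == no_restriction
print(committee, no_restriction, none_defective, one_defective, two_defective)
```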
Exercises:
1. Out of 5 mathematicians and 7 statisticians, a committee consisting of 2 mathematicians and 3 statisticians is to be formed. In how many ways can this be done if
a) there is no restriction?
b) one particular statistician must be included?
c) two particular mathematicians cannot be included on the committee?
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems and a dictionary, in how many ways can this be done if
a) there is no restriction?
b) the dictionary is selected?
c) 2 novels and 1 book of poems are selected?
5.4 Approaches to measuring Probability
There are four different conceptual approaches to the study of probability theory. These are:
- the classical approach;
- the frequentist approach;
- the axiomatic approach;
- the subjective approach.


5.4.1 The classical approach
This approach is used when:
- All outcomes are equally likely.
- The total number of outcomes is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted, and $N_A$ of these outcomes are favourable to the event A, then the probability that event A occurs, denoted P(A), is defined as:
$$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)}$$
Examples:
1. A fair die is tossed once. What is the probability of getting
a) the number 4?
b) an odd number?
c) an even number?
d) the number 8?
Solutions:
First identify the sample space, say S:
S = {1, 2, 3, 4, 5, 6} ⇒ N = n(S) = 6
a) Let A be the event of the number 4: A = {4} ⇒ $N_A$ = n(A) = 1, so P(A) = n(A)/n(S) = 1/6.
b) Let A be the event of odd numbers: A = {1, 3, 5} ⇒ $N_A$ = n(A) = 3, so P(A) = n(A)/n(S) = 3/6 = 0.5.
c) Let A be the event of even numbers: A = {2, 4, 6} ⇒ $N_A$ = n(A) = 3, so P(A) = n(A)/n(S) = 3/6 = 0.5.
d) Let A be the event of the number 8: A = Ø ⇒ $N_A$ = n(A) = 0, so P(A) = n(A)/n(S) = 0/6 = 0.
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
Solutions:
Total number of selections: $N = n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
The number of ways in which A can occur is $N_A = n(A) = \binom{30}{10}\binom{50}{0}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} = 0.00001825$$
b) Let A be the event that 6 will be non-defective (and hence 4 defective).
The number of ways in which A can occur is $N_A = n(A) = \binom{30}{4}\binom{50}{6}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} = 0.265$$
c) Let A be the event that all will be non-defective.
The number of ways in which A can occur is $N_A = n(A) = \binom{30}{0}\binom{50}{10}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} = 0.00624$$
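The three classical probabilities can be evaluated exactly with `math.comb`; this is a minimal sketch that simply mirrors the counts $n(A)$ and $n(S)$ used in the solution.

```python
from math import comb

total = comb(80, 10)  # n(S): ways to choose 10 of the 80 candles

# n(A) = (ways to pick the defective ones) * (ways to pick the non-defective ones)
p_all_defective = comb(30, 10) * comb(50, 0) / total
p_six_non_defective = comb(30, 4) * comb(50, 6) / total
p_all_non_defective = comb(30, 0) * comb(50, 10) / total

print(f"a) {p_all_defective:.8f}  b) {p_six_non_defective:.3f}  c) {p_all_non_defective:.5f}")
```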
Exercises:
1. What is the probability that a waitress will refuse to serve alcoholic beverages to only three
minors if she randomly checks the I.D’s of five students from among ten students of which four
are not of legal age?
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems and a dictionary, what is the probability that
a) the dictionary is selected?
b) 2 novels and 1 book of poems are selected?
Shortcomings of the classical approach:
This approach is not applicable when:
- The total number of outcomes is infinite.
- Outcomes are not equally likely.
5.4.2 The Frequentist Approach
This is based on the relative frequencies of outcomes belonging to an event.
Definition: The probability of an event A is the proportion of outcomes favourable to A in the long run, when the experiment is repeated under the same conditions:
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}$$
Example: Records show that 60 out of 100,000 bulbs produced are defective. What is the probability that a newly produced bulb is defective?
Solution:
Let A be the event that the newly produced bulb is defective. Since N = 100,000 is large, the relative frequency approximates the limit:
$$P(A) \approx \frac{N_A}{N} = \frac{60}{100{,}000} = 0.0006$$
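To see the long-run idea behind the frequentist definition, the following sketch simulates a fair die with Python's `random` module and tracks the relative frequency of rolling a 4. The fair-die model and the function `relative_frequency` are assumptions made for this illustration; the bulb example itself only uses the observed ratio.

```python
import random


def relative_frequency(trials, seed=0):
    """Relative frequency of rolling a 4 in `trials` simulated rolls of a fair die."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
    return hits / trials


for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))  # drifts towards 1/6 ≈ 0.1667 as n grows

# The bulb example: the observed relative frequency is the probability estimate
p_defective = 60 / 100_000
```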
5.4.3 Axiomatic Approach:
Let E be a random experiment and S a sample space associated with E. With each event A we associate a real number P(A), called the probability of A, which satisfies the following properties, called the axioms (or postulates) of probability.
1. $P(A) \ge 0$
2. $P(S) = 1$, where S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e.
$$P(A \cup B) = P(A) + P(B)$$
4. $P(A') = 1 - P(A)$
5. $0 \le P(A) \le 1$
6. $P(\emptyset) = 0$, where ∅ is the impossible event.
Remark: Venn diagrams can be used to solve probability problems.
[Figure: Venn diagrams showing A ∪ B, A ∩ B and A]
In general, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
5.4.4 Subjective Probability
- It is based on personal beliefs, experiences, prejudices, intuition and judgment.
- It differs from one observer to another (hence "subjective").
- Examples: elections, new product introductions, snowfall.
Example: From a group of 5 men and 7 women, a committee of 5 persons is to be formed. If the selection is made randomly,
- what is the probability that 2 men and 3 women will be on the committee? Answer: 350/792
- what is the probability that all members of the committee will be men? Answer: 1/792
- what is the probability that at least three members will be women? Answer: 546/792
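The committee answers follow from the classical definition with combinations. This sketch uses `fractions.Fraction` to keep the probabilities exact; note that `Fraction` reduces automatically (350/792 is stored as 175/396), which is the same value.

```python
from fractions import Fraction
from math import comb

total = comb(12, 5)  # n(S): choose 5 people from 5 men and 7 women

p_2m_3w = Fraction(comb(5, 2) * comb(7, 3), total)
p_all_men = Fraction(comb(5, 5), total)
# "At least three women" means exactly 3, 4 or 5 women (the rest men)
p_3plus_women = Fraction(sum(comb(7, w) * comb(5, 5 - w) for w in (3, 4, 5)), total)

print(total, p_2m_3w, p_all_men, p_3plus_women)
```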
Example: Suppose that an office has 100 calculating machines. Some of them use electric power (E)
while others are manual (M); and some machines are old brand (O) while others are new brands (N). The
table below gives numbers of machines in each category.

Power
Bran E M Total
d
O 40 30 70
N 20 10 30
Total 60 40 100

A person picks one of the machines at random. Calculate the probability that the selected machine:
a) is a new brand;
b) is manual;
c) is old and uses electric power;
d) operates manually and is a new brand;
e) is old or uses electric power;
f) uses electric power or is a new brand.
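Each of these probabilities is a count of machines divided by 100. The Python sketch below assumes the counts in the table above; the dictionary layout and the `prob` helper are illustrative choices, not part of the notes. It also confirms the addition rule for one of the "or" events.

```python
from fractions import Fraction

# Counts from the table, keyed by (brand, power): O/N = old/new brand, E/M = electric/manual
counts = {("O", "E"): 40, ("O", "M"): 30, ("N", "E"): 20, ("N", "M"): 10}
TOTAL = sum(counts.values())  # 100 machines

def prob(event) -> Fraction:
    """P(event) for one machine picked at random; `event` is a predicate on (brand, power)."""
    favourable = sum(n for (brand, power), n in counts.items() if event(brand, power))
    return Fraction(favourable, TOTAL)

p_new = prob(lambda brand, power: brand == "N")
p_manual = prob(lambda brand, power: power == "M")
p_old_and_electric = prob(lambda brand, power: brand == "O" and power == "E")
p_manual_and_new = prob(lambda brand, power: power == "M" and brand == "N")
p_old_or_electric = prob(lambda brand, power: brand == "O" or power == "E")
p_electric_or_new = prob(lambda brand, power: power == "E" or brand == "N")

# Addition rule check: P(O or E) = P(O) + P(E) - P(O and E)
p_old = prob(lambda brand, power: brand == "O")
p_electric = prob(lambda brand, power: power == "E")
assert p_old_or_electric == p_old + p_electric - p_old_and_electric

for name, value in [("new", p_new), ("manual", p_manual),
                    ("old and electric", p_old_and_electric),
                    ("manual and new", p_manual_and_new),
                    ("old or electric", p_old_or_electric),
                    ("electric or new", p_electric_or_new)]:
    print(f"P({name}) = {value} = {float(value):.2f}")
```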

5.5 Conditional Probability and Independence


5.5.1 Conditional probability of an event
Conditional events: If the occurrence of one event affects the probability of occurrence of another event, the two events are conditional (dependent) events.
Example: Suppose a bag contains two red and three white balls, and two balls are drawn one after the other.
1. Drawing with replacement: let A = the event that the first draw is red, so \( P(A) = 2/5 \), and B = the event that the second draw is red, so \( P(B) = 2/5 \). Thus A and B are independent.
2. Drawing without replacement: let A = the event that the first draw is red, so \( P(A) = 2/5 \), and B = the event that the second draw is red. Now the chance of B depends on the result of the first draw, so the problem is conditional. Given that the first draw is red, one red ball remains among four, so the probability that the second draw is red given that the first draw is red is \( 1/4 \).
The conditional probability of an event A given that B has already occurred, denoted \( P(A/B) \), is

\[ P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \ne 0 \]

Remark: (1) \( P(A'/B) = 1 - P(A/B) \) (2) \( P(B'/A) = 1 - P(B/A) \)
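The two-ball example can be verified by listing every equally likely ordered pair of draws and applying the definition \( P(B/A) = P(A \cap B)/P(A) \). The Python sketch below uses `itertools.product` for drawing with replacement and `itertools.permutations` for drawing without replacement; the function names are illustrative. Without replacement it gives \( P(B/A) = 1/4 \), while the unconditional \( P(B) \) is still 2/5.

```python
from fractions import Fraction
from itertools import permutations, product

BALLS = ["R", "R", "W", "W", "W"]  # two red and three white balls

def ordered_draws(replace: bool):
    """All equally likely ordered pairs of ball positions for two draws."""
    positions = range(len(BALLS))
    if replace:
        return list(product(positions, repeat=2))
    return list(permutations(positions, 2))  # the same ball cannot be drawn twice

def draw_probs(replace: bool):
    """Return (P(A), P(B), P(B/A)) with A = first draw red, B = second draw red."""
    outcomes = ordered_draws(replace)
    n = len(outcomes)
    first_red = [d for d in outcomes if BALLS[d[0]] == "R"]
    second_red = [d for d in outcomes if BALLS[d[1]] == "R"]
    both_red = [d for d in first_red if BALLS[d[1]] == "R"]
    p_a = Fraction(len(first_red), n)
    p_b = Fraction(len(second_red), n)
    p_a_and_b = Fraction(len(both_red), n)
    return p_a, p_b, p_a_and_b / p_a  # conditional probability by definition

print("with replacement:   ", draw_probs(replace=True))
print("without replacement:", draw_probs(replace=False))
```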
Example 1: For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.20 that he/she will get a scholarship and also graduate, what is the probability that a student who gets a scholarship will graduate?
Solution: Let A = the event that a student will get a scholarship and B = the event that a student will graduate.
Given \( P(A) = 0.25 \), \( P(B) = 0.75 \), \( P(A \cap B) = 0.20 \). Required: \( P(B/A) \).
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.20}{0.25} = 0.80 \]
Example 2: If the probability that a research project will be well planned is 0.60 and the probability that it will be well planned and well executed is 0.54, what is the probability that it will be well executed given that it is well planned?
Solution: Let A = the event that a research project will be well planned and B = the event that a research project will be well executed.
Given \( P(A) = 0.60 \), \( P(A \cap B) = 0.54 \). Required: \( P(B/A) \), then
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.54}{0.60} = 0.90 \]
Example 3: Suppose that an office has 100 calculating machines. Some of them use electric power (E) while others are manual (M), and some machines are new (N) while others are used (U). The table below gives the number of machines in each category. A person enters the office, picks a machine at random and discovers that it is new. What is the probability that it uses electric power?

         E     M     Total
N        40    30    70
U        20    10    30
Total    60    40    100

Solution: \( P(E/N) = P(E \cap N)/P(N) = (40/100) \div (70/100) = 4/7 \)
Example 4: A lot consists of 20 defective and 80 non-defective items, from which two items are chosen without replacement. Events A and B are defined as A = the first item chosen is defective, B = the second item chosen is defective.
a. What is the probability that both items are defective?
b. What is the probability that the second item is defective?
Solution: Exercise
Note: For any two events A and B the following relation (the law of total probability) holds:
\[ P(B) = P(B/A)\,P(A) + P(B/A')\,P(A') \]
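The relation above is what part (b) of Example 4 needs. The following is a hedged Python sketch of one way to work the exercise with exact fractions, assuming the lot and the draw-without-replacement setup described in Example 4; the variable names are ours.

```python
from fractions import Fraction

DEFECTIVE, GOOD = 20, 80
N = DEFECTIVE + GOOD  # 100 items in the lot

p_a = Fraction(DEFECTIVE, N)   # P(A): first item defective
p_not_a = 1 - p_a              # P(A'): first item non-defective

# Second draw is without replacement, so 99 items remain
p_b_given_a = Fraction(DEFECTIVE - 1, N - 1)   # a defective item was removed
p_b_given_not_a = Fraction(DEFECTIVE, N - 1)   # a good item was removed

p_both = p_b_given_a * p_a                             # (a) P(A and B)
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a   # (b) law of total probability

print(p_both, float(p_both))
print(p_b)
```

The answer to (b), 1/5, equals P(A): before anything is known about the first draw, the second item is no more or less likely to be defective than the first.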
5.5.2 Theorem on probability
Theorem 1.1: Let {E1, E2, ..., En} be a partition of the sample space S, and suppose each Ei has non-zero probability, that is, P(Ei) ≠ 0 for i = 1, 2, ..., n. Let E be any event; then

\[ P(E) = P(E_1)P(E/E_1) + P(E_2)P(E/E_2) + \dots + P(E_n)P(E/E_n) = \sum_{i=1}^{n} P(E_i)\,P(E/E_i) \]

Theorem 1.2 (Bayes' theorem):

Let {E1, E2, ..., En} be a partition of the sample space S with P(Ei) ≠ 0 for i = 1, 2, ..., n, and let E be any event with P(E) > 0. Then for each integer k, 1 ≤ k ≤ n, we have

\[ P(E_k/E) = \frac{P(E_k)\,P(E/E_k)}{\sum_{i=1}^{n} P(E_i)\,P(E/E_i)} \]

Example: Suppose that three machines A1, A2 and A3 produce 60%, 30% and 10%, respectively, of the total production, and that the percentages of defective items produced by these machines are 2%, 4% and 6%, respectively. If an item is selected at random, find the probability that the item is defective. Assuming that an item selected at random is found to be defective, find the probability that the item was produced on machine A1.
Solution: Let B be the event of selecting a defective item at random, and let E1, E2 and E3 be the events that the item was produced on machines A1, A2 and A3, respectively. Then
• P(E1) = 0.6, P(E2) = 0.3, P(E3) = 0.1 and P(B/E1) = 2% = 0.02, P(B/E2) = 4% = 0.04, P(B/E3) = 6% = 0.06

Thus \( P(B) = P(B \cap [E_1 \cup E_2 \cup E_3]) \)

\( = P([B \cap E_1] \cup [B \cap E_2] \cup [B \cap E_3]) = P(B \cap E_1) + P(B \cap E_2) + P(B \cap E_3) \)

\( = P(E_1)P(B/E_1) + P(E_2)P(B/E_2) + P(E_3)P(B/E_3) \)
\( = 0.6 \times 0.02 + 0.3 \times 0.04 + 0.1 \times 0.06 \)
\( = 0.03 \)

We use Bayes' formula:

\[ P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1)\,P(B/E_1)}{\sum_{i=1}^{3} P(E_i)\,P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4 \]
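Theorems 1.1 and 1.2 reduce to a few lines of arithmetic. The Python sketch below works the three-machine example, assuming the production shares 0.6, 0.3 and 0.1 and the defect rates 0.02, 0.04 and 0.06 used in the worked solution; the dictionary names are illustrative.

```python
from fractions import Fraction

# P(E_i): share of total production from each machine
priors = {"A1": Fraction(60, 100), "A2": Fraction(30, 100), "A3": Fraction(10, 100)}
# P(B/E_i): probability that an item from that machine is defective
defect_rate = {"A1": Fraction(2, 100), "A2": Fraction(4, 100), "A3": Fraction(6, 100)}

# Theorem 1.1 (total probability): P(B) = sum of P(E_i) * P(B/E_i)
p_defective = sum(priors[m] * defect_rate[m] for m in priors)

# Theorem 1.2 (Bayes): P(E_k/B) = P(E_k) * P(B/E_k) / P(B)
posterior = {m: priors[m] * defect_rate[m] / p_defective for m in priors}

print("P(B) =", float(p_defective))
for machine, p in posterior.items():
    print(f"P({machine}/B) = {p} = {float(p):.2f}")
```

The posterior probabilities sum to 1, as they must, since a defective item came from exactly one machine. A1 and A2 turn out to be equally likely sources (0.4 each), even though A1 makes twice as many items, because A2's defect rate is twice as high.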

5.5.3 Probability of Independent Events

Two events E1 and E2 are said to be independent if the occurrence of E1 has no bearing on the occurrence of E2; that is, knowing that E1 has occurred gives no information about whether E2 occurs. Formally, two events A and B are independent if and only if

\[ P(A \cap B) = P(A)\,P(B) \]

In that case \( P(A/B) = P(A) \) and \( P(B/A) = P(B) \) (provided \( P(B) \ne 0 \) and \( P(A) \ne 0 \), respectively).
Example: A box contains four black and six white balls. What is the probability of getting two black balls when drawing one after the other under the following conditions?
a. The first ball drawn is not replaced.
b. The first ball drawn is replaced.
Solution: Let A = the first ball drawn is black and B = the second ball drawn is black. Required: \( P(A \cap B) \).
a. \( P(A \cap B) = P(A)\,P(B/A) = (4/10)(3/9) = 2/15 \)
b. Since the first ball is replaced, the draws are independent: \( P(A \cap B) = P(A)\,P(B) = (4/10)(4/10) = 4/25 \)
Example: Consider the experiment of drawing a card from a well-shuffled deck of 52 cards.
Let A: a spade is drawn
B: an honor (10, J, Q, K, A) is drawn
Are the two events independent?
Solution: \( P(A) = \frac{13}{52} = \frac{1}{4} \), \( P(B) = \frac{20}{52} = \frac{5}{13} \) and \( P(A \cap B) = \frac{5}{52} \).

By the independence theorem, two independent events must satisfy the condition

\[ P(A \cap B) = P(A)\,P(B) = \frac{13}{52} \times \frac{20}{52} = \frac{5}{52} \]

This holds here. Thus A and B are independent.
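Independence can also be checked by brute force over the 52 equally likely cards. In the Python sketch below, the deck encoding as (rank, suit) pairs and the `prob` helper are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))   # 52 equally likely (rank, suit) cards
HONORS = {"10", "J", "Q", "K", "A"}

def prob(event) -> Fraction:
    """Probability that a randomly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

p_a = prob(lambda card: card[1] == "spades")   # A: a spade
p_b = prob(lambda card: card[0] in HONORS)     # B: an honor
p_ab = prob(lambda card: card[1] == "spades" and card[0] in HONORS)

independent = p_ab == p_a * p_b   # the product rule for independence
print(p_a, p_b, p_ab, independent)
```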

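Two events are independent only if all these statements are true (when P(A) and P(B) are non-zero, any one of them implies the other two). It is important to be aware that the terms independent and mutually exclusive do not mean the same thing: mutually exclusive events with non-zero probabilities can never be independent, since then \( P(A \cap B) = 0 \ne P(A)\,P(B) \).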

Let us illustrate the concept of independence by means of the following additional example.

Example: In a certain high school class consisting of 60 girls and 40 boys, it is observed that 24 girls and 16 boys wear eyeglasses. If a student is picked at random from this class, the probability that the student wears eyeglasses, P(E), is 40/100.

a) What is the probability that the student picked at random wears eyeglasses, given that the student is a boy?
b) What is the probability of the joint occurrence of the events of wearing eyeglasses and being a boy?

Solution:

a) By using the formula for computing conditional probability, we find this to be:
\[ P(E/B) = \frac{P(E \cap B)}{P(B)} = \frac{16/100}{40/100} = 0.4 \]
Thus the additional information that the student is a boy does not alter the probability that the student wears eyeglasses, and \( P(E) = P(E/B) \). We say that the events of being a boy and wearing eyeglasses are independent for this group. We may also show that the event of wearing eyeglasses, E, and the event of not being a boy, B', are independent, as follows:
\[ P(E/B') = \frac{P(E \cap B')}{P(B')} = \frac{24/100}{60/100} = 0.4 \]
b) Using the rule of multiplication, we have \( P(E \cap B) = P(B)\,P(E/B) \). Since we have shown that events E and B are independent, we may replace \( P(E/B) \) by \( P(E) \) to obtain
\[ P(E \cap B) = P(B)\,P(E) = \frac{40}{100} \times \frac{40}{100} = 0.16 \]


CHAPTER 6
6 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
6.1 RANDOM VARIABLES
Definition: A random variable is a numerical description of the outcome of an experiment, i.e. a real-valued function defined on the sample space. Random variables are usually denoted by capital letters.
Example: If X is a random variable, then it is a function from the elements of the sample space to the set of real numbers, i.e.

\( X: S \to \mathbb{R} \)

A random variable takes a possible outcome and assigns a number to it.

Example: Flip a coin three times, let X be the number of heads in three tosses.

⇒ S={ ( HHH ) , ( HHT ) , ( HTH ) , ( HTT ) , ( THH ) , ( THT ) , ( TTH ) , ( TTT ) }


⇒ X ( HHH )=3 ,
X ( HHT )=X ( HTH )= X ( THH )=2 ,
X ( HTT )=X ( THT )= X (TTH ) =1
X (TTT )=0
Thus X takes the values {0, 1, 2, 3}.

X assumes each of these values with a certain probability.

Random variables are of two types:

1. Discrete random variables are variables that can assume only a finite or countably infinite number of distinct values; their values can be counted.

Examples:

• Tossing a coin n times and counting the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items produced by a given company.
• Number of bacteria per two cubic centimeters of water.

2. Continuous random variables are variables that can assume all values in an interval, i.e. any value between any two given values.

Examples:

• Height of students at a certain college.
• Mark of a student.
• Lifetime of light bulbs.
• Length of time required to complete a given training.

6.2 Probability Distribution

Definition: A probability distribution consists of the values a random variable can assume together with the corresponding probabilities of those values.

Example: Consider the experiment of tossing a coin three times. Let X be the number of heads. Construct the
probability distribution of X.

Solution:

• First identify the possible values that X can assume.
• Calculate the probability of each distinct value of X and present the results as a table, in the form of a frequency distribution.

X = x          0     1     2     3
P(X = x)       1/8   3/8   3/8   1/8

A probability distribution is denoted by P for a discrete random variable and by f for a continuous random variable.
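The construction above (list S, apply X to each outcome, count) can be written out directly. The Python sketch below assumes a fair coin, so that all eight outcomes are equally likely; `X` is written as an ordinary function to mirror the definition \( X: S \to \mathbb{R} \).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**3 equally likely sequences of three fair-coin tosses
S = list(product("HT", repeat=3))

def X(outcome) -> int:
    """Random variable X: number of heads in one outcome of the experiment."""
    return outcome.count("H")

counts = Counter(X(s) for s in S)
distribution = {x: Fraction(counts[x], len(S)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, p)
```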

6.3 Properties of Probability Distribution:


Let P(x) be a discrete probability distribution and f(x) a continuous probability distribution. Then
1. \( P(x) \ge 0 \) if X is discrete; \( f(x) \ge 0 \) if X is continuous.
2. \( \sum_{x} P(X = x) = 1 \) if X is discrete; \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \) if X is continuous.

Note:
1. If X is a continuous random variable, then \( P(a < X < b) = \int_a^b f(x)\,dx \).
2. The probability that a continuous random variable takes any single fixed value is zero, so
\( P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) \).
3. If X is a discrete random variable taking integer values (with a and b integers), then
 i. \( P(a < X < b) = \sum_{x=a+1}^{b-1} P(x) \)
 ii. \( P(a \le X < b) = \sum_{x=a}^{b-1} P(x) \)
 iii. \( P(a < X \le b) = \sum_{x=a+1}^{b} P(x) \)
 iv. \( P(a \le X \le b) = \sum_{x=a}^{b} P(x) \)
4. For a continuous random variable, probability means area under the curve f(x).


6.4 Introduction to Mean and Variance of a random variable (expectation)

Definition:

1. Let a discrete random variable X assume the values \( X_1, X_2, \dots, X_n \) with probabilities \( P(X_1), P(X_2), \dots, P(X_n) \), respectively. Then the expected value of X, denoted E(X), is defined as:
\[ E(X) = X_1 P(X_1) + X_2 P(X_2) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i) \]

2. Let X be a continuous random variable assuming values in the interval (a, b) such that \( \int_a^b f(x)\,dx = 1 \); then
\[ E(X) = \int_a^b x\, f(x)\,dx \]

Example 1: What is the expected value of a random variable X, where X is the number of heads obtained by tossing a coin three times?

Solution: First construct the probability distribution of X:

X = x          0     1     2     3
P(X = x)       1/8   3/8   3/8   1/8

\[ \Rightarrow E(X) = X_1 P(X_1) + X_2 P(X_2) + \dots + X_n P(X_n) = 0 \times \tfrac{1}{8} + 1 \times \tfrac{3}{8} + 2 \times \tfrac{3}{8} + 3 \times \tfrac{1}{8} = 1.5 \]

Example 2: Suppose a charity organization is mailing printed return-address stickers to over one million homes in Ethiopia. Each recipient is asked to donate either $1, $2, $5, $10, $15 or $20. Based on past experience, the amount a person donates is believed to follow the probability distribution below. How much is an average donor expected to contribute?

X = x          $1    $2    $5    $10    $15    $20
P(X = x)       0.1   0.2   0.3   0.2    0.15   0.05

Solution:

X = x          $1    $2    $5    $10    $15    $20    Total
P(X = x)       0.1   0.2   0.3   0.2    0.15   0.05   1
xP(X = x)      0.1   0.4   1.5   2      2.25   1      7.25

\[ \Rightarrow E(X) = \sum_{i=1}^{6} x_i P(X = x_i) = 7.25 \]
That is, an average donor is expected to contribute $7.25.
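Both expected values can be checked with a small helper. The Python sketch below uses exact fractions so that the check that the probabilities sum to 1 is not upset by floating-point rounding; `expected_value` is an illustrative name, not a library function.

```python
from fractions import Fraction

def expected_value(dist: dict) -> Fraction:
    """E(X) = sum of x * P(X = x) for a discrete distribution given as {x: P(X = x)}."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
donation = {1: Fraction("0.1"), 2: Fraction("0.2"), 5: Fraction("0.3"),
            10: Fraction("0.2"), 15: Fraction("0.15"), 20: Fraction("0.05")}

print(expected_value(coin))              # 3/2
print(float(expected_value(donation)))   # 7.25
```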

NOTE: Let X be a given random variable.

1. The expected value of X is its mean: Mean of \( X = E(X) \).


2. The variance of X is given by \( \operatorname{Var}(X) = E(X^2) - [E(X)]^2 \),

where
\[ E(X^2) = \sum_{i=1}^{n} x_i^2 \, P(X = x_i) \quad \text{if X is discrete} \]
\[ E(X^2) = \int x^2 f(x)\,dx \quad \text{if X is continuous.} \]

Example 1: Find the mean and the variance of the random variable X in Example 2 above.

Solution:

X = x          $1    $2    $5    $10    $15     $20    Total
P(X = x)       0.1   0.2   0.3   0.2    0.15    0.05   1
xP(X = x)      0.1   0.4   1.5   2      2.25    1      7.25
x²P(X = x)     0.1   0.8   7.5   20     33.75   20     82.15

\[ \Rightarrow E(X) = 7.25 \]
\[ \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = 82.15 - 7.25^2 \approx 29.59 \]
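The shortcut \( \operatorname{Var}(X) = E(X^2) - [E(X)]^2 \) can be checked for the donation example with exact arithmetic. The Python sketch below assumes the same distribution table; `mean_and_variance` is an illustrative helper. Keeping exact fractions shows that the variance is 29.5875 before rounding to 29.59.

```python
from fractions import Fraction

donation = {1: Fraction("0.1"), 2: Fraction("0.2"), 5: Fraction("0.3"),
            10: Fraction("0.2"), 15: Fraction("0.15"), 20: Fraction("0.05")}

def mean_and_variance(dist: dict):
    """Return (E(X), E(X^2), Var(X)) using Var(X) = E(X^2) - [E(X)]^2."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x * x * p for x, p in dist.items())
    return mean, mean_sq, mean_sq - mean ** 2

mean, mean_sq, var = mean_and_variance(donation)
print(float(mean), float(mean_sq), float(var))  # 7.25 82.15 29.5875
```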
Example 2: Two dice are rolled. Let X be a random variable denoting the sum of the numbers on the two dice.
i) Give the probability distribution of X.
ii) Compute the expected value of X and its variance. Solution (exercise)
6.4.1 Rules of expectation and variance

There are some general rules for mathematical expectation.

Let X and Y be random variables and k a constant.

RULE 1: \( E(k) = k \)    RULE 2: \( E(kX) = kE(X) \)

RULE 3: \( E(X + Y) = E(X) + E(Y) \)    RULE 4: \( \operatorname{Var}(k) = 0 \)


CHAPTER 7
7 Common Discrete and Continuous Probability Distributions
7.1 Common Discrete Probability Distributions
7.1.1 Binomial Distribution

A binomial experiment is a probability experiment that satisfies the following four requirements, called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only two possible mutually exclusive outcomes, a success or a failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent; thus, when sampling from a finite population, we must sample with replacement.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people whether they watch BBC news.
• Registering each newly produced product as defective or non-defective.
• Asking 100 people whether they favor the ruling party.
• Rolling a die to see whether a 5 appears.

Definition: The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
Let p = the probability of success
q = 1 − p = the probability of failure on any given trial
Then the probability of getting x successes in n trials is
\[ P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \dots, n \]
This is sometimes written as \( X \sim \text{Bin}(n, p) \).
When using the binomial formula to solve problems, we have to identify three things:
• the number of trials (n),
• the probability of a success on any one trial (p), and
• the number of successes desired (x).
Example 1: What is the probability of getting three heads by tossing a fair coin four times?
Solution: Let X be the number of heads in four tosses of a fair coin, so \( X \sim \text{Bin}(n = 4, p = 0.50) \).
\[ \Rightarrow P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x} 0.5^x \, 0.5^{4-x} = \binom{4}{x} 0.5^4, \quad x = 0, 1, 2, 3, 4 \]
\[ \Rightarrow P(X = 3) = \binom{4}{3} 0.5^4 = 4 \times 0.0625 = 0.25 \]
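The binomial formula translates directly into code. Below is a minimal Python sketch using the standard-library `math.comb` (Python 3.8+) for \( \binom{n}{x} \); the function name `binom_pmf` is our own.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

print(binom_pmf(3, 4, 0.5))  # 0.25
```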
Example 2: Suppose that an examination consists of six true/false questions, and assume that a student has no knowledge of the subject matter. The probability that the student will guess the correct answer to the first question is 30%. Likewise, the probability of guessing each of the remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting fewer than five correct answers?
Solution
Let X = the number of correct answers that the student gets, so \( X \sim \text{Bin}(n = 6, p = 0.30) \) and
\[ P(X = x) = \binom{6}{x} 0.3^x \, 0.7^{6-x}, \quad x = 0, 1, 2, \dots, 6 \]
a) \[ P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6) = 0.060 + 0.010 + 0.001 = 0.071 \]
Thus, if each answer is a guess with a 30% chance of being correct, the probability is 0.071 (or 7.1%) that more than three of the questions are answered correctly.
b) \[ P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = 0.324 + 0.185 + 0.060 + 0.010 + 0.001 = 0.58 \]
c) \[ P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.118 + 0.303 + 0.324 + 0.185 = 0.93 \]
d) \[ P(X < 5) = 1 - P(X \ge 5) = 1 - \{P(X = 5) + P(X = 6)\} = 1 - (0.010 + 0.001) = 0.989 \]
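All four answers come from one table of \( P(X = x) \), x = 0, ..., 6. Below is a Python sketch assuming \( X \sim \text{Bin}(6, 0.30) \) as above, with an illustrative `binom_pmf` helper. Computed without intermediate rounding, P(X > 3) is about 0.0705; the 0.071 above comes from adding terms already rounded to three decimals.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.30
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]  # P(X = 0), ..., P(X = 6)

more_than_three = sum(pmf[4:])      # a) P(X > 3)
at_least_two = sum(pmf[2:])         # b) P(X >= 2)
at_most_three = sum(pmf[:4])        # c) P(X <= 3)
fewer_than_five = 1 - sum(pmf[5:])  # d) P(X < 5), via the complement

for label, value in [("a", more_than_three), ("b", at_least_two),
                     ("c", at_most_three), ("d", fewer_than_five)]:
    print(label, round(value, 4))
```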
Exercises:
1. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If eight of these TVs are
randomly selected from across the country and tested, what is the probability that exactly three of
them are defective? Assume that each TV is made independently of the others.
2. An allergist claims that 45% of the patients she tests are allergic to some type of weed. What is the
probability that
a) Exactly 3 of her next 4 patients are allergic to weeds?
b) None of her next 4 patients are allergic to weeds?
3. Explain why the following experiments are not binomial:

a) Rolling a die until a 6 appears.


b) Asking 20 people how old they are.
c) Drawing 5 cards from a deck for a poker hand.

Remark: If X is a binomial random variable with parameters n and p, then

\[ E(X) = np, \qquad \operatorname{Var}(X) = npq \]


7.1.2 Poisson Probability Distribution

A random variable X is said to have a Poisson distribution if its probability distribution is given by

\[ P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots \]
where \( \lambda \) = the average number of occurrences.

- The Poisson distribution depends only on the average number of occurrences per unit time or space.
- The Poisson distribution is used as a distribution of rare events, such as:
• Number of misprints.
• Hereditary.
• Natural disasters like earthquakes.
• Arrivals.
• Accidents.

- The process that gives rise to such events is called a Poisson process.

Example: If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there will be 3 accidents on any given day?

Solution: Let X = the number of accidents, \( \lambda = 1.6 \).
\[ X \sim \text{Poisson}(1.6) \Rightarrow P(X = x) = \frac{1.6^x e^{-1.6}}{x!} \]
\[ P(X = 3) = \frac{1.6^3 e^{-1.6}}{3!} = 0.1378 \]
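The Poisson formula needs only `math.exp` and `math.factorial`. Below is a minimal Python sketch; `poisson_pmf` is an illustrative name. The second line sums the probabilities over a long range of x as a sanity check that they total 1.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam**x * e**(-lam) / x! for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 1.6  # average number of accidents per day
print(round(poisson_pmf(3, lam), 4))                            # 0.1378
print(round(sum(poisson_pmf(x, lam) for x in range(50)), 6))    # 1.0
```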
Example: On average, five smokers pass a certain street corner every ten minutes. What is the probability that during a given 10 minutes the number of smokers passing will be
a. 6 or fewer
b. 7 or more
c. Exactly 8? (Exercise)

If X is a Poisson random variable with parameter \( \lambda \), then

\[ E(X) = \lambda, \qquad \operatorname{Var}(X) = \lambda \]


Note: The Poisson probability distribution provides a close approximation to the binomial probability distribution when n is large and p is quite small or quite large, with \( \lambda = np \):

\[ P(X = x) = \frac{(np)^x e^{-np}}{x!}, \quad x = 0, 1, 2, \dots \]
where \( \lambda = np \) = the average number.


Usually we use this approximation if \( np \le 5 \). In other words, if \( n > 20 \) and \( np \le 5 \) [or \( n(1 - p) \le 5 \)], then we may use the Poisson distribution as an approximation to the binomial distribution.
Example: Find the binomial probability P(X = 3) by using the Poisson distribution, if p = 0.01 and n = 200.
Solution: Using Poisson, \( \lambda = np = 0.01 \times 200 = 2 \):
\[ \Rightarrow P(X = 3) = \frac{2^3 e^{-2}}{3!} = 0.1804 \]
Using binomial, n = 200, p = 0.01:
\[ \Rightarrow P(X = 3) = \binom{200}{3} (0.01)^3 (0.99)^{197} = 0.1814 \]
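The comparison above can be reproduced side by side. Below is a Python sketch assuming n = 200, p = 0.01 and x = 3 as in the example; note that the exact binomial term uses the exponent n − x = 197.

```python
from math import comb, exp, factorial

n, p, x = 200, 0.01, 3
lam = n * p  # Poisson parameter: lambda = np = 2

binomial = comb(n, x) * p**x * (1 - p) ** (n - x)  # exact, with (0.99)**197
poisson = lam**x * exp(-lam) / factorial(x)        # Poisson approximation

print(f"binomial {binomial:.4f}, poisson {poisson:.4f}, error {binomial - poisson:.4f}")
```

The approximation error here is about 0.001, which is why the rule of thumb requires n to be large and np small.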

7.2 Common Continuous Probability Distributions


7.2.1 Normal Distribution
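A random variable X is said to have a normal distribution if its probability density function is given by

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty, \; -\infty < \mu < \infty, \; \sigma > 0 \]

where \( \mu = E(X) \) and \( \sigma^2 = \operatorname{Var}(X) \); \( \mu \) and \( \sigma^2 \) are the parameters of the normal distribution.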
Properties of Normal Distribution:
1. It is bell-shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at \( x = \mu \) and is given by
\[ f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \]
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution. Thus, the normal distribution is completely described by two parameters: mean and standard deviation.
5. The total area under the curve is 1, and the area on each side of the mean is 0.5:
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode = \( \mu \)
8. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the standard normal distribution, was derived by using the transformation

\[ Z = \frac{X - \mu}{\sigma} \Rightarrow f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^2} \]
Properties of the Standard Normal Distribution:

Same as a normal distribution, but also...

• Mean is zero
• Variance is one
• Standard deviation is one

- Areas under the standard normal distribution curve have been tabulated in various ways. The most common tables give the areas between \( Z = 0 \) and a positive value of Z.
- Given a normally distributed random variable X with mean \( \mu \) and standard deviation \( \sigma \),

\[ P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) \Rightarrow P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) \]

Note: \( P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) \)

Examples:
1. Find the area under the standard normal distribution which lies

a) Between Z = 0 and Z = 0.96

Solution: \( \text{Area} = P(0 < Z < 0.96) = 0.3315 \)

b) Between Z = −1.45 and Z = 0

Solution: by symmetry,
\[ \text{Area} = P(-1.45 < Z < 0) = P(0 < Z < 1.45) = 0.4265 \]

c) To the right of Z = −0.35

Solution:
\[ \text{Area} = P(Z > -0.35) = P(-0.35 < Z < 0) + P(Z > 0) = P(0 < Z < 0.35) + P(Z > 0) = 0.1368 + 0.50 = 0.6368 \]

d) To the left of Z = −0.35

Solution:
\[ \text{Area} = P(Z < -0.35) = 1 - P(Z > -0.35) = 1 - 0.6368 = 0.3632 \]
e) Between Z = −0.67 and Z = 0.75

Solution:
\[ \text{Area} = P(-0.67 < Z < 0.75) = P(-0.67 < Z < 0) + P(0 < Z < 0.75) = P(0 < Z < 0.67) + P(0 < Z < 0.75) = 0.2486 + 0.2734 = 0.5220 \]

f) Between Z = 0.25 and Z = 1.25

Solution:
\[ \text{Area} = P(0.25 < Z < 1.25) = P(0 < Z < 1.25) - P(0 < Z < 0.25) = 0.3944 - 0.0987 = 0.2957 \]
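The table lookups above can be cross-checked with the standard library's `statistics.NormalDist` (Python 3.8+). Its `cdf` method gives the area to the left of z, \( \Phi(z) \), whereas the table in these notes gives the area between 0 and z, i.e. \( \Phi(z) - 0.5 \). The Python sketch below expresses each area as a difference of left-tail areas. The labels are only for printing, and results agree with the four-decimal table values up to rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
Phi = Z.cdf       # Phi(z) = P(Z < z), the area to the left of z

areas = {
    "a) 0 < Z < 0.96": Phi(0.96) - Phi(0),
    "b) -1.45 < Z < 0": Phi(0) - Phi(-1.45),
    "c) Z > -0.35": 1 - Phi(-0.35),
    "d) Z < -0.35": Phi(-0.35),
    "e) -0.67 < Z < 0.75": Phi(0.75) - Phi(-0.67),
    "f) 0.25 < Z < 1.25": Phi(1.25) - Phi(0.25),
}
for label, area in areas.items():
    print(f"{label}: {area:.4f}")
```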
2. Find the value of z if
a) The normal curve area between 0 and z (positive) is 0.4726.

Solution: \( P(0 < Z < z) = 0.4726 \), and from the table \( P(0 < Z < 1.92) = 0.4726 \), so \( z = 1.92 \) (by uniqueness of areas).

b) The area to the left of z is 0.9868.

Solution:
\[ P(Z < z) = P(Z < 0) + P(0 < Z < z) = 0.50 + P(0 < Z < z) = 0.9868 \]
\[ \Rightarrow P(0 < Z < z) = 0.9868 - 0.50 = 0.4868 \]
and from the table \( P(0 < Z < 2.22) = 0.4868 \), so \( z = 2.22 \).
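Reading the table backwards corresponds to the inverse of the cumulative area, which `statistics.NormalDist.inv_cdf` computes (Python 3.8+ standard library). In the Python sketch below, a table area from 0 to z is first converted to a left-tail area by adding 0.5. It gives z ≈ 1.92 for (a) and z ≈ 2.22 for (b).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# a) area between 0 and z is 0.4726, so the area to the left of z is 0.5 + 0.4726
z_a = Z.inv_cdf(0.5 + 0.4726)
# b) area to the left of z is 0.9868
z_b = Z.inv_cdf(0.9868)

print(round(z_a, 2), round(z_b, 2))
```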
3. A random variable X has a normal distribution with mean 80 and standard deviation 4.8. What is the probability that it will take a value
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
Solution
X is normal with mean \( \mu = 80 \) and standard deviation \( \sigma = 4.8 \).
\[ \text{A. } P(X < 87.2) = P\left(\frac{X - \mu}{\sigma} < \frac{87.2 - \mu}{\sigma}\right) = P\left(Z < \frac{87.2 - 80}{4.8}\right) = P(Z < 1.5) \]
\[ = P(Z < 0) + P(0 < Z < 1.5) = 0.50 + 0.4332 = 0.9332 \]

\[ \text{B. } P(X > 76.4) = P\left(\frac{X - \mu}{\sigma} > \frac{76.4 - \mu}{\sigma}\right) = P\left(Z > \frac{76.4 - 80}{4.8}\right) = P(Z > -0.75) \]
\[ = P(Z > 0) + P(0 < Z < 0.75) = 0.50 + 0.2734 = 0.7734 \]

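\[ \text{C. } P(81.2 < X < 86.0) = P\left(\frac{81.2 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{86.0 - \mu}{\sigma}\right) = P\left(\frac{81.2 - 80}{4.8} < Z < \frac{86.0 - 80}{4.8}\right) \]
\[ = P(0.25 < Z < 1.25) = P(0 < Z < 1.25) - P(0 < Z < 0.25) = 0.3944 - 0.0987 = 0.2957 \]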
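The standardization step and the three answers can be checked with `statistics.NormalDist`, which accepts the mean and standard deviation directly. The Python sketch below assumes \( X \sim N(\mu = 80, \sigma = 4.8) \) as in the problem; `z_score` is an illustrative helper that reproduces the z values used in the table lookups.

```python
from statistics import NormalDist

X = NormalDist(mu=80, sigma=4.8)

def z_score(x: float) -> float:
    """Standardize: z = (x - mu) / sigma."""
    return (x - X.mean) / X.stdev

p_a = X.cdf(87.2)                 # A. P(X < 87.2)
p_b = 1 - X.cdf(76.4)             # B. P(X > 76.4)
p_c = X.cdf(86.0) - X.cdf(81.2)   # C. P(81.2 < X < 86.0)

print([round(z_score(x), 2) for x in (87.2, 76.4, 81.2, 86.0)])  # [1.5, -0.75, 0.25, 1.25]
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```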

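4. A normal distribution has mean 62.4. Find its standard deviation if 20.05% of the area under the normal curve lies to the right of 72.9.

Solution:
\[ P(X > 72.9) = 0.2005 \Rightarrow P\left(\frac{X - \mu}{\sigma} > \frac{72.9 - \mu}{\sigma}\right) = 0.2005 \]
\[ \Rightarrow P\left(Z > \frac{72.9 - 62.4}{\sigma}\right) = P\left(Z > \frac{10.5}{\sigma}\right) = 0.2005 \]
\[ \Rightarrow P\left(0 < Z < \frac{10.5}{\sigma}\right) = 0.50 - 0.2005 = 0.2995 \]
And from the table \( P(0 < Z < 0.84) = 0.2995 \), so
\[ \frac{10.5}{\sigma} = 0.84 \Rightarrow \sigma = 12.5 \]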
5. A random variable has a normal distribution with \( \sigma = 5 \). Find its mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.
Solution
\[ P(Z < z) = P\left(Z < \frac{52.5 - \mu}{5}\right) = 0.6915 \]
\[ \Rightarrow P(0 < Z < z) = 0.6915 - 0.50 = 0.1915 \]
But from the table, \( P(0 < Z < 0.5) = 0.1915 \), so
\[ z = \frac{52.5 - \mu}{5} = 0.5 \Rightarrow \mu = 50 \]
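Problems 4 and 5 both run the table in reverse: find z from an area, then solve \( z = (x - \mu)/\sigma \) for the unknown parameter. The Python sketch below does this with `statistics.NormalDist.inv_cdf`, assuming the areas 0.2005 and 0.6915 used in the solutions. The small differences from 12.5 and 50 arise because the table rounds z to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Problem 4: mean 62.4 and P(X > 72.9) = 0.2005; solve (72.9 - 62.4) / sigma = z
z4 = Z.inv_cdf(1 - 0.2005)   # z with 20.05% of the area to its right
sigma = (72.9 - 62.4) / z4

# Problem 5: sigma = 5 and P(X < 52.5) = 0.6915; solve (52.5 - mu) / 5 = z
z5 = Z.inv_cdf(0.6915)
mu = 52.5 - 5 * z5

print(round(z4, 3), round(sigma, 2))
print(round(z5, 3), round(mu, 2))
```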
6. Of a large group of men, 5% are less than 60 inches tall and 40% are between 60 and 65 inches. Assuming a normal distribution, find the mean and standard deviation of the heights.
Solution (Exercise)
