
Introduction to Statistical Concepts

Chapter one introduces the fundamentals of statistics, defining it as the science of collecting, organizing, and analyzing data to aid decision-making. It covers the classification of statistics into descriptive and inferential branches, stages of statistical investigation, and basic terminologies such as population, sample, and variable. Additionally, it discusses methods of data collection, presentation techniques, and the limitations of statistics.

Uploaded by

Mohammed Sultan
Copyright
© © All Rights Reserved

Chapter one

1 Basic Concepts, Methods of data collection and presentation


1.1 Introduction
1.1.1 Definition and classification of Statistics
Definition of Statistics

Statistics is the science of decision making in a world full of uncertainties.


Statistics is defined as the science of collecting, organizing, presenting, analyzing and
interpreting numerical data for the purpose of assisting in making a more effective decision. Or
from a scientific point of view, it is the science, technology and art of extracting information
from observational data, with an emphasis on solving real world problems.
The field of statistics deals with the collection, presentation, analysis, and use of data to make
decisions, solve problems, and design products and processes. Because many aspects of
engineering practice involve working with data, obviously some knowledge of statistics is
important to any engineer. Specifically, statistical techniques can be a powerful aid in designing
new products and systems, improving existing designs, and designing, developing, and
improving production processes.
Classification of Statistics
Depending on how the data are used, statistics is divided into two main areas or
branches.
1. Descriptive Statistics: is concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions.
For example, the average car battery life can be estimated from a sample of a few hundred car
batteries.
• Inferential statistics is important because statistical data usually arise from samples.
• It requires statistical techniques based on probability theory.

1.1.2 Stages in statistical investigation


There are five stages or steps in any statistical investigation.

1. Collection of data: the process of measuring, gathering, and assembling the raw data upon which
the statistical investigation is to be based.
2. Organization of data: summarization of data in some meaningful way, e.g. in table form.
3. Presentation of the data: the process of re-organization, classification, compilation, and
summarization of data to present it in a meaningful form.
4. Analysis of data: the process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.
5. Inference from data: the interpretation of the statistical measures obtained in the analysis,
using methods by which conclusions are formed and inferences made.

1.1.3 Definition of Some Basic terms


a. Statistical Population: It is the collection of all possible observations of a specified
characteristic of interest (possessing certain common property) and being under study.
b. Sample: It is a subset of the population, selected using some sampling technique in such a
way that they represent the population.
c. Sampling: The process or method of sample selection from the population.
d. Sample size: The number of elements or observation to be included in the sample.
e. Census: Complete enumeration or observation of the elements of the population. Or it is the
collection of data from every element in a population
f. Parameter: Characteristic or measure obtained from a population.
g. Statistic: Characteristic or measure obtained from a sample.
h. Variable: It is a characteristic of interest that can take on different values.
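The terms above can be illustrated with a minimal sketch. The list of marks, the sample size and the random seed are assumptions made up for this illustration; they do not come from the notes.

```python
import random
from statistics import mean

# Hypothetical population: marks of all 10 students in a small class.
population = [60, 62, 65, 70, 70, 74, 76, 80, 85, 90]

# Census: every element is observed, so the resulting mean is a parameter.
parameter = mean(population)

# Sampling: select a sample of size 4 by simple random sampling.
rng = random.Random(1)            # fixed seed so the run is repeatable
sample_size = 4
sample = rng.sample(population, sample_size)

# The same measure computed from the sample is a statistic.
statistic = mean(sample)

print("parameter (population mean):", parameter)
print("sample:", sample, "statistic (sample mean):", statistic)
```

The statistic changes from sample to sample, while the parameter is fixed for a given population.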

1.1.4 Applications, uses and limitations of Statistics


Applications of statistics:
• Statistics is applied in almost all fields of human endeavor.
• Almost all human beings encounter numerical facts in their daily life, e.g. about prices.
• It is applicable in processes such as designing new products, improving existing designs,
developing and improving production processes, and assessing the extent of environmental pollution.
• It is used in industry, especially in the area of quality control.

Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena.
The following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishes a technique of comparison
5. Estimating unknown population characteristics
6. Testing and formulating of hypothesis
7. Studying the relationship between two or more variable
8. Forecasting future events.
Limitations of statistics
As a science statistics has its own limitations.
The following are some of the limitations:
• It deals only with quantitative information.
• It deals only with aggregates of facts and not with individual data items.
• Statistical results are only approximately, not mathematically, correct.
• Statistics can easily be misused and therefore should be used by experts.

1.1.5 Types of variables and measurement scales


Types of variables
1. Qualitative Variables are nonnumeric variables and can't be measured numerically. Examples
include categorizing products as defective or non-defective or as light or heavy, gender, religious
affiliation, state of birth, bearing material (aluminum or copper/lead) and type of wood (pine, oak,
or walnut).
2. Quantitative Variables are numerical variables and can be measured. For example,
temperature, pressure, distance, speed, and voltage are quantitative variables.
Note that quantitative variables are either discrete (which can assume only certain values,
and there are usually "gaps" between the values, such as the number of bedrooms in your
house or the number of cars in Wachemo University) or continuous (which can assume any
value within a specific range, such as the air pressure in a tire).

Measurement scales
Measurement is the assignment of numbers to objects or events in a systematic fashion.
A measurement scale describes the values assigned to the data in terms of three properties:
order, distance and fixed zero. Four levels of measurement scales are commonly distinguished:
nominal, ordinal, interval, and ratio, and each possesses different properties of measurement
systems.
1. Nominal Scales
Nominal scales are measurement systems that possess none of the three properties stated above.
• Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
• No arithmetic or relational operations can be applied.
Examples:
o Sex (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code
o Regional differentiation of Ethiopia
o Motor type
2. Ordinal Scales
Ordinal scales are measurement systems that possess the property of order, but not the property
of distance. The property of fixed zero is not important if the property of distance is not satisfied.
• Level of measurement which classifies data into categories that can be ranked.
Differences between the ranks are not meaningful.
• Arithmetic operations are not applicable but relational operations are applicable.
• Examples:
o Letter grades (A, B, C, D, F).
o Rating scales (Excellent, Very good, Good, Fair, Poor).
o Military rank.
3. Interval Scales
Interval scales are measurement systems that possess the properties of order and distance, but
not the property of fixed zero.
• Level of measurement which classifies data that can be ranked and for which differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
• Addition and subtraction are applicable; division is not, because ratios are meaningless.
• Relational operations are also possible.
• Examples: IQ, temperature in °F
4. Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted;
e.g. the ratio of Bekele's height to Martha's height is 1.32, whereas this is not possible with
interval scales.
• Level of measurement which classifies data that can be ranked, for which differences are
meaningful, and for which there is a true zero. True ratios exist between the different units of measure.
• All arithmetic and relational operations are applicable.
• Examples: weight, height, distance, age

1.2 Methods of data collection and presentation


1.2.1 Methods of data collection
There are three major methods of data collection
i. Observation or measurement
ii. Interviews and questionnaires
iii. The use of documentary sources
I. Observation or measurement: In this method, data are obtained through direct
observation or measurement.
• It requires training of the persons who measure in order to ensure the use of a standard
procedure.
• It provides accurate information but it is expensive and inconvenient.
II. Interviews and Questionnaires
Questionnaires are written documents which instruct the readers or listeners to answer the
questions written on them.
There are three ways of collecting information under this method:
a) Face-to-face interviews (questionnaires in the charge of interviewers)
b) Telephone interviews
c) Mailed questionnaires (self-administered questionnaires returned by mail)
III. The use of documentary sources
It is the extraction of information from existing sources (e.g. hospital, municipality, …)
1.2.2 Source and types of data
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: is a source of data that supplies firsthand information for the
immediate purpose.
Primary data: are data originally collected for the immediate purpose, i.e. data measured or
collected by the investigator or the user directly from the source. Primary data are more
expensive than secondary data.
Secondary source: individuals or agencies which supply data originally collected for other
purposes by them or by others. Usually these are published or unpublished materials, records,
reports, documents, etc.
Secondary data: data collected from a secondary source, i.e. data gathered or compiled from
published and unpublished sources or files.
Note: Data which are primary for one user may be secondary for another.
1.2.3 Methods of data presentation
Having collected and edited the data, the next important step is to organize it, that is, to present it
in a readily comprehensible, condensed form that helps in drawing inferences from it. It is
also necessary that like items be separated from unlike ones. The process of arranging data into
classes or categories according to similarities is technically called classification.
Classification is a preliminary step, and it prepares the ground for proper presentation of data.
The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and graphic presentation.

1.2.4 Frequency distributions
Frequency distribution: is the organization of raw data in table form using classes and
frequencies.
Frequency: is the number of values in a specific class of the distribution.
There are three basic types of frequency distributions:
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
1. Categorical frequency Distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.
Example: a social worker collected the following data on marital status for 25 persons (M=married,
S=single, W=widowed, D=divorced).
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M,
S, D, and W. These types will be used as the classes for the distribution. We follow the procedure below
to construct the frequency distribution.
Step 1: Make a table as shown.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M
S
D
W

Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using $\% = \frac{f}{n} \times 100$, where f = frequency of
the class and n = total number of values.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M           ///// /     6               24
S           ///// //    7               28
D           ///// //    7               28
W           /////       5               20
Total                   25              100
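The tallying steps above can be checked by counting directly from the raw data. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative assumptions.

```python
from collections import Counter

# Marital status of the 25 persons from the example, entered row by row.
data = ("M S D W D "
        "S S M M M "
        "W D S M M "
        "W D D S S "
        "S W W D D").split()

freq = Counter(data)      # frequency f of each class
n = len(data)             # total number of values

print("Class  Frequency  Percent")
for cls in ["M", "S", "D", "W"]:
    percent = freq[cls] / n * 100     # % = f / n * 100
    print(f"{cls:<6} {freq[cls]:<10} {percent:.0f}")
```

Counting by program is a useful cross-check on hand tallies, which are easy to miscount.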

2. Ungrouped frequency Distribution:


- Is a table of all the potential raw score values that could possibly occur in the data, along with the
number of times each actually occurred.
- Is often constructed for a small data set or for data on a discrete variable.
Constructing ungrouped frequency distribution:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting one may include a column of tallies.
Example:-
The following data represent the mark of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct a frequency distribution, which is ungrouped.
Solution:
Step 1: Find the range, Range = Max - Min = 90 - 60 = 30.
Step 2: Make a table as shown.
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
Each individual value is presented separately; that is why it is named an ungrouped frequency
distribution.
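As a cross-check of the tally, the same ungrouped distribution can be built by counting each distinct mark. This is a small sketch with `collections.Counter`; the names are illustrative.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)   # Range = Max - Min
freq = Counter(marks)                  # frequency of each distinct mark

print("Range =", data_range)
print("Mark  Tally  Frequency")
for mark in sorted(freq):
    print(f"{mark:<5} {'/' * freq[mark]:<6} {freq[mark]}")
```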
3. Grouped frequency Distribution:
- When the range of the data is large, the data must be grouped into classes that are more than one
unit in width.
Definitions:
• Grouped Frequency Distribution: a frequency distribution in which several numbers are
grouped in one class.
• Class limits: separate one class in a grouped frequency distribution from another. The
limits could actually appear in the data, and there are gaps between the upper limit of one class
and the lower limit of the next.
• Units of measurement (U): the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries: separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in
the data. There is no gap between the upper boundary of one class and the lower boundary of the
next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit and the upper class boundary is found by adding U/2 to the corresponding
upper class limit.
• Class width: the difference between the upper and lower class boundaries of any class. It is
also the difference between the lower limits of any two consecutive classes or the difference
between any two consecutive class marks.
• Class mark (mid point): the average of the lower and upper class limits or the average
of the upper and lower class boundaries.
• Cumulative frequency: the number of observations less than/more than or equal to a
specific value.
• Cumulative frequency above: the total frequency of all values greater than or equal to
the lower class boundary of a given class.
• Cumulative frequency below: the total frequency of all values less than or equal to the
upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the more than or less than
type, depending on the type of cumulative frequency used.
• Relative frequency (rf): the frequency divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the total
frequency.
Steps for constructing Grouped frequency Distribution

1. Find the largest and smallest values.
2. Compute the range: R = Maximum - Minimum.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.32 \log n$ (logarithm to base 10), where k is the number of classes desired and n is the
total number of observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not
off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value. The starting point
is called the lower limit of the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper
limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units
to the upper limits. The boundaries are also half-way between the upper limit of one
class and the lower limit of the next class. It may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
Example:- Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.32 \log n = 1 + 3.32 \log(20) = 5.32 \approx 6$ (rounding up)
Step 4: Find the class width: $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)
Step 5: Select the starting point; let it be the minimum observation.
Thus 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
Thus 11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries; e.g. for class 1, lower class boundary = 6 - U/2 = 5.5 and
upper class boundary = 11 + U/2 = 11.5.
Then continue adding w to both boundaries to obtain the rest of the boundaries. By doing so
one can obtain the following classes.
Class boundary
5.5 - 11.5
11.5 - 17.5
17.5 - 23.5
23.5 - 29.5
29.5 - 35.5
35.5 - 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find the cumulative frequencies.
Step 11: Find the relative frequencies and/or relative cumulative frequencies.
The complete frequency distribution follows:
Class limit   Class boundary   Class mark   Tally      Freq.   Cf (less than type)   Cf (more than type)   rf     rcf (less than type)
6 - 11        5.5 - 11.5       8.5          //         2       2                     20                    0.10   0.10
12 - 17       11.5 - 17.5      14.5         //         2       4                     18                    0.10   0.20
18 - 23       17.5 - 23.5      20.5         ///// //   7       11                    16                    0.35   0.55
24 - 29       23.5 - 29.5      26.5         ////       4       15                    9                     0.20   0.75
30 - 35       29.5 - 35.5      32.5         ///        3       18                    5                     0.15   0.90
36 - 41       35.5 - 41.5      38.5         //         2       20                    2                     0.10   1.00
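The eleven steps can also be sketched in code. The following is a minimal illustration for this data set, assuming whole-number data (U = 1) and the minimum observation as the starting point; the dictionary keys and variable names are assumptions chosen for readability, not part of the notes.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1                                      # unit of measurement (whole numbers)

n = len(data)
R = max(data) - min(data)                  # step 2: range
k = math.ceil(1 + 3.32 * math.log10(n))    # step 3: Sturges' rule, rounded up
w = math.ceil(R / k)                       # step 4: class width, rounded up

rows, cum = [], 0
for j in range(k):
    lower = min(data) + j * w              # step 5: lower class limit
    upper = lower + w - U                  # step 6: upper class limit
    f = sum(lower <= x <= upper for x in data)   # steps 8-9: frequency
    cum += f                               # step 10: less-than cumulative frequency
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),   # step 7
        "mark": (lower + upper) / 2,
        "f": f,
        "cf_less": cum,
        "cf_more": n - cum + f,            # observations in this class or above
        "rf": f / n,                       # step 11
        "rcf": cum / n,
    })

for r in rows:
    print(r)
```

For other data sets the starting point and the rounding of k and w may need adjusting so that the last class still covers the maximum value.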

1.2.5 Diagrammatic and graphical presentation of data


These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
1.2.5.1 Diagrammatic presentation of data: Bar charts, pie-chart and pictogram
- Diagrams are appropriate for presenting discrete data.
- The three most commonly used diagrammatic presentations for discrete as well as qualitative data are:
• Pie charts, pictograms, bar-graphs
Pie chart
- A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector is obtained using:
$$\text{Angle of sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ$$
(Multiplying the same fraction by 100 instead gives the percentage of the part.)
Example: Draw a suitable diagram to represent the following population in a town.

Class        Men    Women   Girls   Boys
Population   2500   2000    4000    1500
Solutions: Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and label it with its name and corresponding percentage.
Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
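The Percent and Degree columns can be reproduced with a short calculation. In this sketch the share of the whole is multiplied by 100 to get the percentage and by 360 to get the sector angle in degrees; the dictionary layout is an illustrative assumption.

```python
# Town population by class, from the example.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

shares = {}
for cls, value in population.items():
    percent = value / total * 100      # share of the whole, in percent
    degrees = value / total * 360      # angle of the sector, in degrees
    shares[cls] = (percent, degrees)
    print(f"{cls:<6} {value:>5} {percent:>5.0f}% {degrees:>5.0f} deg")
```

The angles of all sectors add up to 360 degrees, which is a quick sanity check before drawing.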

[Figure: pie chart titled "CLASS" with four sectors labelled Boys, Men, Girls and Women]

Pictogram (Reading assignment, read from books and plot it)


In a pictogram, we represent data by means of picture symbols. We decide on a suitable
picture to represent a definite number of units in which the variable is measured.
Example: draw a pictogram to represent the following population of a town.
Year         1989   1990   1991   1992
Population   2000   3000   5000   7000
Bar-graphs:
- A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar graphs. The most common are:
• Simple bar-graph
• Component or sub-divided bar-graph
• Multiple bar-graph
Simple Bar-graphs

- Are used to display data on one variable.
- They are thick lines (narrow rectangles) having the same breadth. The magnitude of a quantity is
represented by the height or length of the bars.
Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B, and C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Solutions:
[Figure: simple bar graph titled "Sales by product in 1957"; x-axis: product (A, B, C); y-axis: sales in $, scaled from 0 to 30]


Component Bar-graphs
- When there is a desire to show how a total (or aggregate) is divided into its component parts, we use
a component bar-graph.
- The bars represent the total value of a variable, with each total broken into its component parts;
different colours or designs are used for identification.
Example: Draw a component bar graph to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: component bar graph titled "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $, scaled from 0 to 100; each bar is divided into Product A, Product B and Product C]

Multiple Bar-graphs
- It is used to display data on more than one variable, for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: multiple bar graph titled "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $, scaled from 0 to 60; separate bars for Product A, Product B and Product C in each year]

1.2.6 Graphical presentation of data: Histogram, Frequency polygon, and Ogive curve
- The histogram, frequency polygon and cumulative frequency graph or ogive are the most commonly
applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
• Represent the class boundaries for the histogram or ogive, or the mid points for the frequency
polygon, on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies. Class
boundaries are placed along the horizontal axis. Class marks and class limits are sometimes used as
the quantity on the X axis.
Example: Construct a histogram to represent the previous data (example *).
Frequency Polygon: It is a line graph. The frequency of the data is placed along the vertical axis and
class mid points are placed along the horizontal axis. It is customary to add the next higher and lower class
intervals with a corresponding frequency of zero; this makes it a complete polygon.

Example: Draw a frequency polygon for the above data (example *).
Solutions:
[Figure: frequency polygon; x-axis: class mid points (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5); y-axis: frequency]

Ogive (cumulative frequency polygon)


- A graph showing the cumulative frequency (less than or more than type) plotted against the upper or
lower class boundaries respectively. That is, class boundaries are plotted along the horizontal axis and
the corresponding cumulative frequencies are plotted along the vertical axis. The points are joined by a
freehand curve.
Example: Draw an ogive curve(less than type) for the above data. (Example *)

Chapter two
2 Summarizing of Data
2.1 Measures of central Tendency: objectives of measuring central tendency
Introduction: When we want to make comparisons between groups of numbers, it is good to have a single
value that is considered to be a good representative of each group. This single value is called
the average of the group. Averages are also called measures of central tendency. An average
which is representative is called a typical average, and an average which is not representative and
has only a theoretical value is called a descriptive average. A typical average should possess
the following properties:
• It should be rigidly defined.
• It should be based on all observations under investigation.
• It should be affected as little as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.

Objectives:
• To comprehend (understand) the data easily.
• To facilitate comparison.
• To make further statistical analysis.
The Summation Notation:
• Let $X_1, X_2, X_3, \ldots, X_N$ be a number of measurements, where N is the total number of
observations and $X_i$ is the $i$-th observation.
• Very often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \ldots + X_N$ is used in a
formula to compute a statistic. It is tedious to write an expression like this very often, so
mathematicians have developed a shorthand notation to represent a sum of scores, called
the summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \ldots + X_N$.

Properties of Summation
1. $\sum_{i=1}^{n} k = nk$, where k is any constant
2. $\sum_{i=1}^{n} kX_i = k\sum_{i=1}^{n} X_i$, where k is any constant
3. $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where a and b are any constants
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
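The four properties of summation can be verified numerically on any small data set. The values of X, Y, k, a and b below are made up purely for illustration, and property 4 is checked in its additive form.

```python
# Hypothetical values chosen only for illustration.
X = [2, 7, 8, 3]
Y = [1, 4, 6, 5]
n = len(X)
k, a, b = 3, 10, 2

assert sum(k for _ in range(n)) == n * k                       # property 1
assert sum(k * x for x in X) == k * sum(X)                     # property 2
assert sum(a + b * x for x in X) == n * a + b * sum(X)         # property 3
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)     # property 4
print("all four properties hold for this data")
```

A numerical check on one data set is not a proof, but it helps catch a misremembered rule.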

2.2 Types of measures of central tendency


There are several different measures of central tendency; each has its advantage and
disadvantage.
2.2.1 The mean (Arithmetic, weighted, Geometric and Harmonic)
Arithmetic Mean: is defined as the sum of the magnitudes of the items divided by the number of
items.
Arithmetic Mean for ungrouped data: The mean of $X_1, X_2, X_3, \ldots, X_n$ is denoted by A.M, m or
$\bar{X}$ and is given by:
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
if each item occurs only once.
• If $X_1$ occurs $f_1$ times, $X_2$ occurs $f_2$ times, ..., $X_k$ occurs $f_k$ times, then the mean will be
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.

Example: Obtain the mean of the following number 2, 7, 8, 2, 7, 3, 7


Solution:
Xi      fi   Xifi
2       2    4
3       1    3
7       3    21
8       1    8
Total   7    36

$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} = 5.14$$
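The frequency-table calculation can be sketched as follows: count each distinct value, then divide the sum of the products f_i X_i by the sum of the frequencies. The variable names are illustrative.

```python
from collections import Counter

numbers = [2, 7, 8, 2, 7, 3, 7]
freq = Counter(numbers)                          # maps each X_i to its f_i

sum_fx = sum(x * f for x, f in freq.items())     # sum of f_i * X_i
sum_f = sum(freq.values())                       # sum of f_i, equal to n
mean = sum_fx / sum_f

print(sum_fx, sum_f, round(mean, 2))
```

The result is the same as adding the seven numbers directly and dividing by 7; the frequency form only saves repeated additions.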

Arithmetic Mean for Grouped Data: If data are given in the shape of a continuous frequency
distribution, then the mean is obtained as:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where $X_i$ = the class mark of the $i$-th class and $f_i$ = the frequency of the $i$-th class.


Example: calculate the mean for the following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
Solutions:
• First find the class marks.
• Find the product of frequency and class mark for each class.
• Find the mean using the formula.
Class     fi    Xi   Xifi
6 - 10    35    8    280
11 - 15   23    13   299
16 - 20   15    18   270
21 - 25   12    23   276
26 - 30   9     28   252
31 - 35   6     33   198
Total     100        1575

$$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
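The three solution steps translate directly into code. This sketch computes each class mark as the average of the class limits and then applies the grouped-mean formula; the list names are illustrative.

```python
# Class limits and frequencies from the age distribution example.
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [35, 23, 15, 12, 9, 6]

marks = [(lo + hi) / 2 for lo, hi in classes]         # class marks X_i
sum_fx = sum(f * x for f, x in zip(freqs, marks))     # sum of f_i * X_i
mean = sum_fx / sum(freqs)

print(marks, sum_fx, mean)
```

Because every observation in a class is replaced by the class mark, the grouped mean is in general only an approximation of the mean of the raw data.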

Exercises: Life times (in months) of 75 bulbs are summarized in the following frequency
distribution:
Life time (months)   No. of bulbs
40-44                7
45-49                10
50-54                22
55-59                f4
60-64                f5
65-69                6
70-74                3
If 20% of the bulbs have life times between 55 and 59 months,
i. Find the missing frequencies f4 and f5.
ii. Find the mean.
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2$, $A \neq \bar{X}$.

3. If $\bar{X}_1$ is the mean of $n_1$ observations,
$\bar{X}_2$ is the mean of $n_2$ observations,
...,
$\bar{X}_k$ is the mean of $n_k$ observations,
then the mean of all the observations in all groups, often called the combined mean, is given
by:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \ldots + \bar{X}_k n_k}{n_1 + n_2 + \ldots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$$
Example: In a class there are 30 females and 70 males. If females averaged 60 in an
examination and males averaged 72, find the mean for the entire class.

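Solutions:
Females: $\bar{X}_1 = 60$, $n_1 = 30$
Males: $\bar{X}_2 = 72$, $n_2 = 70$
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i}$$
$$\Rightarrow \bar{X}_c = \frac{30(60) + 70(72)}{30 + 70} = \frac{6840}{100} = 68.40$$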
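The combined mean is a size-weighted average of the group means. The helper below is a small sketch; the function name `combined_mean` is an assumption introduced for this illustration.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups, given each group's mean and size."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Females: mean 60, n = 30; males: mean 72, n = 70.
result = combined_mean([60, 72], [30, 70])
print(result)
```

Simply averaging the two group means would give 66, which is wrong here because the groups are of unequal size.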

4. The effect of transforming the original series on the mean.
a) If a constant k is added to (or subtracted from) every observation, then the new mean
will be the old mean plus (or minus) k.
b) If every observation is multiplied by a constant k, then the new mean will be k times the old
mean.
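Properties 1, 2 and 4 can be checked numerically. The sketch below reuses the numbers 2, 7, 8, 2, 7, 3, 7 from the earlier example and uses exact fractions to avoid rounding error; the constant k = 5 and the trial values of A are arbitrary assumptions.

```python
from fractions import Fraction

X = [2, 7, 8, 2, 7, 3, 7]
k = 5                              # arbitrary constant, for illustration only

def mean(values):
    """Exact arithmetic mean as a fraction."""
    return Fraction(sum(values), len(values))

def ss(values, A):
    """Sum of squared deviations of the values from A."""
    return sum((x - A) ** 2 for x in values)

m = mean(X)                                  # 36/7

dev_sum = sum(x - m for x in X)              # property 1: should be 0
ss_mean = ss(X, m)                           # property 2: smallest possible
ss_other = [ss(X, A) for A in (4, 5, 6)]     # any A different from the mean
shifted = mean([x + k for x in X])           # property 4a: mean + k
scaled = mean([k * x for x in X])            # property 4b: k * mean

print(dev_sum, ss_mean, ss_other, shifted, scaled)
```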

Weighted Mean
• When a proper importance is desired to be given to different data, a weighted mean is
appropriate. Weights are assigned to each item in proportion to its relative importance.
• Let $X_1, X_2, \ldots, X_n$ be the values of the items of a series and $W_1, W_2, \ldots, W_n$ their corresponding
weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:
$$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

Example: A student obtained the following percentage in an examination: English 60, Biology
75, Mathematics 63, Physics 59, and chemistry 55.Find the students weighted arithmetic mean if
weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solutions:
$$\bar{X}_w = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$
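The same weighted mean can be computed with a few lines of code. The function name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Sum of X_i * W_i divided by the sum of the weights W_i."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# English, Biology, Mathematics, Physics, Chemistry
scores = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

result = weighted_mean(scores, weights)
print(result)
```

With equal weights the function reduces to the ordinary arithmetic mean, which for these scores is 62.4.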

Merits and Demerits of Arithmetic Mean


Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is relatively little affected by fluctuations of sampling.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It cannot be used in the case of open-ended classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty,
beauty.
• It can be a number which does not exist in the series.
• Sometimes it leads to wrong conclusions if the details of the data from which it is obtained are
not available.
• It gives high weight to high extreme values and less weight to low extreme values.
The Geometric Mean

• The geometric mean of a set of n observations is the nth root of their product.
• The geometric mean of X1, X2, X3, …, Xn is denoted by G.M and given by:

$$G.M = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$$

• Taking the logarithms of both sides:

$$\log(G.M) = \log\left(\sqrt[n]{X_1 \cdot X_2 \cdots X_n}\right) = \log\left[(X_1 \cdot X_2 \cdots X_n)^{1/n}\right]$$

$$\Rightarrow \log(G.M) = \frac{1}{n}\log(X_1 \cdot X_2 \cdots X_n) = \frac{1}{n}(\log X_1 + \log X_2 + \cdots + \log X_n)$$

$$\Rightarrow \log(G.M) = \frac{1}{n}\sum_{i=1}^{n} \log X_i$$

• The logarithm of the G.M of a set of observations is the arithmetic mean of their logarithms.

$$\Rightarrow G.M = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$

Example: Find the G.M of the numbers 2, 4, 8.

Solutions: $G.M = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4$
Remark: The Geometric Mean is useful and appropriate for finding averages of ratios.
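The logarithmic form of the geometric mean translates directly into code. The sketch below is only an illustration: the name `geometric_mean` is an assumption, and natural logarithms are used in place of base-10 logarithms, which gives the same result because the antilog is taken in the same base.

```python
import math

def geometric_mean(values):
    """Geometric mean as exp(mean(log x)); requires strictly positive values."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("values must be non-empty and strictly positive")
    mean_log = sum(math.log(x) for x in values) / len(values)
    return math.exp(mean_log)

print(round(geometric_mean([2, 4, 8]), 6))  # 4.0
```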
The Harmonic Mean
The harmonic mean of X1, X2, X3, …, Xn is denoted by H.M and given by:

$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

This is called the simple harmonic mean.

In the case of a frequency distribution:

$$H.M = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}, \quad n = \sum_{i=1}^{k} f_i$$

If observations X1, X2, …, Xn have weights W1, W2, …, Wn respectively, then their harmonic mean is given by

$$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} \frac{W_i}{X_i}}$$

This is called the Weighted Harmonic Mean.

Remark: The Harmonic Mean is useful and appropriate in finding average speeds and average
rates.
Example: A cyclist pedals from his house to his college at speed of 10 km/hr and back from the
college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr

$$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr}$$
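A small Python sketch can confirm why the harmonic mean is the right average here. The function name `harmonic_mean` and the 30 km leg length are hypothetical choices for illustration; any leg length gives the same average speed.

```python
def harmonic_mean(values, weights=None):
    """Simple harmonic mean, or the weighted form sum(w) / sum(w / x)."""
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

speeds = [10, 15]  # km/hr, out and back over the same distance
print(round(harmonic_mean(speeds), 6))  # 12.0

# Cross-check: total distance / total time for a hypothetical 30 km leg
leg = 30
total_time = leg / 10 + leg / 15
print(2 * leg / total_time)  # 12.0
```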

2.2.2 The mode
The mode is the value which occurs most frequently in a set of values.
The mode may not exist, and even if it does exist, it may not be unique.
In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5.
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5. The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7. There is no mode for these data.
- The mode of a set of numbers X1, X2, …, Xn is usually denoted by $\hat{X}$.
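The three cases above (one mode, two modes, no mode) can be reproduced with a short sketch. The function name `modes` is illustrative, and the rule that data have no mode when no value repeats is an assumption adopted here to match the third example.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of most frequent values; [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))        # [5]
print(modes([8, 9, 9, 7, 8, 2, 5]))  # [8, 9]
print(modes([4, 12, 3, 6, 7]))       # []
```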

Mode for Grouped data


If data are given in the form of a continuous frequency distribution, the mode is defined as:

$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$

Where:
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class

Note: The modal class is a class with the highest frequency.


Example: Following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
Size of farms No. of farms
5-15 8
15-25 12
25-35 17
35-45 29
45-55 31
55-65 5
65-75 3

Solutions:
45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $w = 10$
$f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$

$$\Rightarrow \hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$$

Merits and Demerits of Mode


Merits:
 It is not affected by extreme observations.
 Easy to calculate and simple to understand.
 It can be calculated for distribution with open end class
Demerits:
• It is not rigidly defined.
• It is not based on all observations.
• It is not suitable for further mathematical treatment.
• It is not a stable average, i.e. it is affected by fluctuations of sampling to some extent.
• Often its value is not unique.
Note: being the point of maximum density, mode is especially useful in finding the most popular
size in studies relating to marketing, trade, business, and industry. It is the appropriate average to
be used to find the ideal size.
2.2.3 The Median
In a distribution, the median is the value of the variable which divides it into two equal halves.
In an ordered series of data, the median is the observation lying exactly in the middle of the series. It is the middle-most value in the sense that the number of values less than the median is equal to the number of values greater than it.
- If X1, X2, …, Xn are the observations, then the numbers arranged in ascending order will be X[1], X[2], …, X[n], where X[i] is the ith smallest value.
$\Rightarrow X_{[1]} \le X_{[2]} \le \dots \le X_{[n]}$
- The median is denoted by $\tilde{X}$.
Median for ungrouped data

$$\tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right), & \text{if } n \text{ is even} \end{cases}$$

Example: Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4.
b) 2, 1, 8, 3, 5.
Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9. Here n = 6 (even).

$$\tilde{X} = \frac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right) = \frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{1}{2}(5 + 6) = 5.5$$

b) Order the data: 1, 2, 3, 5, 8. Here n = 5 (odd).

$$\tilde{X} = X_{[(n+1)/2]} = X_{[3]} = 3$$

Median for grouped data


If data are given in the form of a continuous frequency distribution, the median is defined as:

$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$

Where:
$L_{med}$ = lower class boundary of the median class.
$w$ = the size of the median class.
$n$ = total number of observations.
$c$ = the cumulative frequency (less than type) preceding the median class.
$f_{med}$ = the frequency of the median class.

Remark: The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
 First find the less than cumulative frequency.
 Identify the median class.
 Find median using formula.
Class Frequency Cum.Freq(less than type)
40-44 7 7
45-49 10 17
50-54 22 39
55-59 15 54
60-64 12 66
65-69 6 72
70-74 3 75

$$\frac{n}{2} = \frac{75}{2} = 37.5$$

39 is the first cumulative frequency to be greater than or equal to 37.5
$\Rightarrow$ 50-54 is the median class.

$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$

$$\Rightarrow \tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$

Merits and Demerits of Median


Merits:
 Median is a positional average and hence not influenced by extreme observations.
 Can be calculated in the case of open end intervals.
 Median can be located even if the data are incomplete.
Demerits:
 It is not a good representative of data if the number of items is small.
 It is not amenable to further algebraic treatment.
 It is susceptible to sampling fluctuations.
2.3 Measures of location: Quantiles (Quartiles, Deciles and Percentiles)
When a distribution is arranged in order of magnitude of items, the median is the value of the middle term. Other measures that likewise depend on position in the ordered distribution, namely quartiles, deciles, and percentiles, are collectively called quantiles.
Quartiles: are measures that divide the frequency distribution into four equal parts.
The values of the variable corresponding to these divisions are denoted Q1, Q2, and Q3, often called the first, the second and the third quartile respectively.
Q1 is a value which has 25% of the items less than or equal to it. Similarly, Q2 has 50% of the items with value less than or equal to it and Q3 has 75% of the items whose values are less than or equal to it.
- To find Qi (i = 1, 2, 3) we count $\frac{iN}{4}$ observations, beginning from the lowest class.
- For grouped data: we have the following formula

$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{iN}{4} - c\right), \quad i = 1, 2, 3$$

Where:
$L_{Q_i}$ = lower class boundary of the quartile class.
$w$ = the size of the quartile class.
$N$ = total number of observations.
$c$ = the cumulative frequency (less than type) preceding the quartile class.
$f_{Q_i}$ = the frequency of the quartile class.

Remark: The quartile class (class containing Qi) is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{iN}{4}$.
Deciles: are measures that divide the frequency distribution into ten equal parts.
The values of the variable corresponding to these divisions are denoted D1, D2, …, D9, often called the first, the second, …, the ninth decile respectively.
- To find Di (i = 1, 2, …, 9) we count $\frac{iN}{10}$ observations, beginning from the lowest class.
- For grouped data: we have the following formula

$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{iN}{10} - c\right), \quad i = 1, 2, \dots, 9$$

Where:
$L_{D_i}$ = lower class boundary of the decile class.
$w$ = the size of the decile class.
$N$ = total number of observations.
$c$ = the cumulative frequency (less than type) preceding the decile class.
$f_{D_i}$ = the frequency of the decile class.

Remark: The decile class (class containing Di) is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{iN}{10}$.
Percentiles: are measures that divide the frequency distribution into a hundred equal parts.
The values of the variable corresponding to these divisions are denoted P1, P2, …, P99, often called the first, the second, …, the ninety-ninth percentile respectively.
- To find Pi (i = 1, 2, …, 99) we count $\frac{iN}{100}$ observations, beginning from the lowest class.
- For grouped data: we have the following formula

$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{iN}{100} - c\right), \quad i = 1, 2, \dots, 99$$

Where:
$L_{P_i}$ = lower class boundary of the percentile class.
$w$ = the size of the percentile class.
$N$ = total number of observations.
$c$ = the cumulative frequency (less than type) preceding the percentile class.
$f_{P_i}$ = the frequency of the percentile class.

Remark: The percentile class (class containing Pi) is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{iN}{100}$.

Example: Consider the following distribution and calculate:


a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:
 First find the less than cumulative frequency.
 Use the formula to calculate the required quantile.
Values Frequency Cum.Freq(less than type)
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493

a) Quartiles:
i. Q1
- determine the class containing the first quartile.

$$\frac{N}{4} = 123.25$$

$\Rightarrow$ 170-180 is the class containing the first quartile.
$L_{Q_1} = 170$, $w = 10$, $N = 493$, $c = 88$, $f_{Q_1} = 72$

$$\Rightarrow Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{N}{4} - c\right) = 170 + \frac{10}{72}(123.25 - 88) = 174.90$$

ii. Q2
- determine the class containing the second quartile.

$$\frac{2N}{4} = 246.5$$

$\Rightarrow$ 190-200 is the class containing the second quartile.
$L_{Q_2} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{Q_2} = 107$

$$\Rightarrow Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2N}{4} - c\right) = 190 + \frac{10}{107}(246.5 - 244) = 190.23$$

iii. Q3
- determine the class containing the third quartile.

$$\frac{3N}{4} = 369.75$$

$\Rightarrow$ 200-210 is the class containing the third quartile.
$L_{Q_3} = 200$, $w = 10$, $N = 493$, $c = 351$, $f_{Q_3} = 49$

$$\Rightarrow Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3N}{4} - c\right) = 200 + \frac{10}{49}(369.75 - 351) = 203.83$$

b) D7
- determine the class containing the 7th decile.

$$\frac{7N}{10} = 345.1$$

$\Rightarrow$ 190-200 is the class containing the seventh decile.
$L_{D_7} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{D_7} = 107$

$$\Rightarrow D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7N}{10} - c\right) = 190 + \frac{10}{107}(345.1 - 244) = 199.45$$

c) P90
- determine the class containing the 90th percentile.

$$\frac{90N}{100} = 443.7$$

$\Rightarrow$ 220-230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, $w = 10$, $N = 493$, $c = 434$, $f_{P_{90}} = 31$

$$\Rightarrow P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90N}{100} - c\right) = 220 + \frac{10}{31}(443.7 - 434) = 223.13$$

2.4 Measures of Dispersion/Variation


Introduction and objectives of measuring Variation
- The scatter or spread of the items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
- Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.
Objectives of measuring Variation:
• To judge the reliability of measures of central tendency.
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.
Absolute and Relative Measures of Dispersion
The measures of dispersion which are expressed in terms of the original unit of a series are
termed as absolute measures. Such measures are not suitable for comparing the variability of two
distributions which are expressed in different units of measurement and different average size.
Relative measures of dispersions are a ratio or percentage of a measure of absolute dispersion to
an appropriate measure of central tendency and are thus pure numbers independent of the units
of measurement. For comparing the variability of two distributions (even if they are measured in
the same unit), we compute the relative measure of dispersion instead of absolute measures of
dispersion.
Types of Measures of Dispersion
Various measures of dispersions are in use. The most commonly used measures of dispersions
are:
1) Range and relative range

2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.
2.4.1 Range, Variance, Standard deviation and coefficient of variation
The Range (R): is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture
of the scores. The following two distributions have the same range, 13, yet appear to differ
greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.

$$R = L - S$$

where L = largest observation and S = smallest observation.

Range for grouped data:

If data are given in the form of a continuous frequency distribution, the range is computed as:

$$R = UCL_k - LCL_1$$

where $UCL_k$ is the upper class limit of the last class and $LCL_1$ is the lower class limit of the first class.
This is sometimes expressed as:

$$R = X_k - X_1$$

where $X_k$ is the class mark of the last class and $X_1$ is the class mark of the first class.
Merits and Demerits of range
Merits:
 It is rigidly defined.
 It is easy to calculate and simple to understand.
Demerits:
 It is not based on all observation.
 It is highly affected by extreme observations.
 It is affected by fluctuation in sampling.

 It is not liable to further algebraic treatment.
 It cannot be computed in the case of open end distribution.
 It is very sensitive to the size of the sample.
The Variance
Population Variance: If we divide the sum of squared deviations from the mean (the variation) by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".

$$\text{Population Variance} = \sigma^2 = \frac{1}{N}\sum (X_i - \mu)^2, \quad i = 1, 2, \dots, N$$

For the case of a frequency distribution it is expressed as:

$$\text{Population Variance} = \sigma^2 = \frac{1}{N}\sum f_i (X_i - \mu)^2, \quad i = 1, 2, \dots, k$$

Sample Variance: One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate the corresponding parameter, and with divisor n this formula tends to underestimate the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size.

$$\text{Sample Variance} = S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2, \quad i = 1, 2, \dots, n$$

For the case of a frequency distribution it is expressed as:

$$\text{Sample Variance} = S^2 = \frac{1}{n-1}\sum f_i (X_i - \bar{X})^2, \quad i = 1, 2, \dots, k$$

We usually use the following shortcut formulas.

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}, \quad \text{for raw data.}$$

$$S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - n\bar{X}^2}{n-1}, \quad \text{for a frequency distribution.}$$

Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back to those of the original data values, the square root must be taken.

$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2}$$

$$\text{Sample standard deviation} = S = \sqrt{S^2}$$

The following steps are used to calculate the sample variance:


1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the number of
observations minus one, i.e., n-1 (where n is equal to the number of observations in the
data set).
Examples: Find the variance and standard deviation of the following sample data
1. 5, 17, 12, 10.
2. The data is given in the form of frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3

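Solutions:
1. $\bar{X} = 11$
Xi 5 10 12 17 Total
(Xi - X̄)² 36 1 1 36 74

$$\Rightarrow S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{74}{3} = 24.67$$

$$\Rightarrow S = \sqrt{S^2} = \sqrt{24.67} = 4.97$$

2. $\bar{X} = 55$
Xi (C.M) 42 47 52 57 62 67 72 Total
fi(Xi - X̄)² 1183 640 198 60 588 864 867 4400

$$\Rightarrow S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1} = \frac{4400}{74} = 59.46$$

$$\Rightarrow S = \sqrt{S^2} = \sqrt{59.46} = 7.71$$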
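Both solutions follow the same steps, so they can be checked with one function. This is a minimal sketch: the name `sample_variance` and the optional `freqs` argument are assumptions made for illustration, with class marks standing in for the grouped observations.

```python
import math

def sample_variance(values, freqs=None):
    """Sample variance with divisor n - 1; freqs optionally gives each value's frequency."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared_dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared_dev / (n - 1)

raw = sample_variance([5, 17, 12, 10])
grouped = sample_variance([42, 47, 52, 57, 62, 67, 72], [7, 10, 22, 15, 12, 6, 3])
print(round(raw, 2), round(math.sqrt(raw), 2))          # 24.67 4.97
print(round(grouped, 2), round(math.sqrt(grouped), 2))  # 59.46 7.71
```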
Special properties of Standard deviations

1. $\dfrac{\sum (X_i - \bar{X})^2}{n-1} < \dfrac{\sum (X_i - A)^2}{n-1}$, for any $A \ne \bar{X}$
2. For a normal (symmetric) distribution the following holds.
• Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e. within $(\bar{X} - S, \bar{X} + S)$.
• Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$.
• Approximately 99.73% of the data values fall within three standard deviations of the mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.

3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the values that fall within k standard deviations of the mean, i.e. within $(\bar{X} - kS, \bar{X} + kS)$, will be at least $1 - \frac{1}{k^2}$, where k is any number greater than 1; i.e. the proportion of items falling beyond k standard deviations of the mean is at most $\frac{1}{k^2}$.
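Example: Suppose a distribution has mean 50 and standard deviation 6. What percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.
Solutions:
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
$\Rightarrow kS = 12 \Rightarrow k = \frac{12}{S} = \frac{12}{6} = 2$
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$ of the numbers lie between 38 and 62.
b) Similarly, 32 and 68 are each 18 from the mean, so $k = \frac{18}{6} = 3$ and at least $\left(1 - \frac{1}{9}\right) \times 100\% \approx 88.89\%$ of the numbers lie between 32 and 68.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = 25\%$ of the numbers are less than 38 or more than 62.
d) It is the complement of b), i.e. at most $\frac{1}{9} \times 100\% \approx 11.11\%$ of the numbers are less than 32 or more than 68.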
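Chebyshev's bound for an interval symmetric about the mean can be computed with a few lines of Python. The function name `chebyshev_lower_bound` is a hypothetical helper introduced only for this sketch; it returns the guaranteed minimum proportion inside the interval, and one minus that value is the maximum proportion outside it.

```python
def chebyshev_lower_bound(mean, sd, low, high):
    """Minimum proportion of data inside [low, high], symmetric about the mean."""
    k = (high - mean) / sd
    if abs((mean - low) / sd - k) > 1e-12 or k <= 1:
        raise ValueError("interval must be symmetric about the mean with k > 1")
    return 1 - 1 / k ** 2

inside_a = chebyshev_lower_bound(50, 6, 38, 62)   # k = 2
inside_b = chebyshev_lower_bound(50, 6, 32, 68)   # k = 3
print(inside_a, 1 - inside_a)                     # 0.75 0.25
print(round(inside_b, 4), round(1 - inside_b, 4))
```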
Example 2: The scores on a special test of knowledge of wood refinishing have a mean of 53 and a standard deviation of 6. Find the range of values in which at least 75% of the scores will lie.
(Exercise)
4. If the standard deviation of $X_1, X_2, \dots, X_n$ is $S$, then the standard deviation of

a) $X_1 + k, X_2 + k, \dots, X_n + k$ will also be $S$

b) $kX_1, kX_2, \dots, kX_n$ would be $|k|S$

c) $a + kX_1, a + kX_2, \dots, a + kX_n$ would be $|k|S$

Exercise: Verify each of the above relationship, considering k and a as constants.
Examples:
1. The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a. If 10 is added to each of the numbers in the set, then what will be the
variance and standard deviation of the new set?
b. If each of the numbers in the set are multiplied by -5, then what will be
the variance and standard deviation of the new set?
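Solutions:
1. a. They will remain the same: the variance is 100 and the standard deviation is 10.

b. New standard deviation $= |k|S = 5 \times 10 = 50$, so the new variance is $50^2 = 2500$.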
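The effect of shifting and scaling on the standard deviation can be checked numerically. The data set below is hypothetical and chosen only for illustration; `statistics.stdev` from the Python standard library computes the sample standard deviation with divisor n - 1.

```python
import math
import statistics

data = [4, 8, 15, 16, 23, 42]            # hypothetical sample
s = statistics.stdev(data)

shifted = statistics.stdev([x + 10 for x in data])      # add a constant
scaled = statistics.stdev([-5 * x for x in data])       # multiply by k = -5
linear = statistics.stdev([3 - 5 * x for x in data])    # a + k * x

print(math.isclose(shifted, s))          # True: adding a constant changes nothing
print(math.isclose(scaled, 5 * s))       # True: scaled by |k|
print(math.isclose(linear, 5 * s))       # True: only |k| matters
```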


Coefficient of Variation (C.V): is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.

$$C.V = \frac{S}{\bar{X}} \times 100$$
 The distribution having less C.V is said to be less variable or more consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to the
same industry gives the following results.
Value Firm A Firm B
Mean wage 52.5 47.5
Median wage 50.5 45.5
Variance 100 121
In which firm, A or B, is there greater variability in individual wages?
Solutions: Calculate the coefficient of variation for both firms.

$$C.V_A = \frac{S_A}{\bar{X}_A} \times 100 = \frac{10}{52.5} \times 100 = 19.05\%$$

$$C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 = 23.16\%$$

Since C.VA < C.VB, in firm B there is greater variability in individual wages.
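The comparison of the two firms can be reproduced with a short sketch. The function name `coefficient_of_variation` is an illustrative assumption; it takes the mean and the variance, since the table reports variances rather than standard deviations.

```python
import math

def coefficient_of_variation(mean, variance):
    """C.V as a percentage: standard deviation divided by mean, times 100."""
    return math.sqrt(variance) / mean * 100

cv_a = coefficient_of_variation(52.5, 100)   # firm A
cv_b = coefficient_of_variation(47.5, 121)   # firm B
print(round(cv_a, 2), round(cv_b, 2))        # 19.05 23.16
print("B" if cv_b > cv_a else "A")           # firm with greater relative variability
```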
2. A meteorologist interested in the consistency of temperatures in three cities during a given week
collected the following data. The temperatures for the five days of the week in the three cities were

City 1 25 24 23 26 17
City2 22 21 24 22 20
City3 32 27 35 24 28
Exercise: Which city has the most consistent temperature, based on these data?
2.4.2 Standard Scores (Z-scores)

• If X is a measurement from a distribution with mean $\bar{X}$ (or $\mu$ for a population) and standard deviation S (or $\sigma$ for a population), then its value in standard units is

$$Z = \frac{X - \mu}{\sigma}, \quad \text{for a population.}$$

$$Z = \frac{X - \bar{X}}{S}, \quad \text{for a sample.}$$

• Z gives the deviation from the mean in units of standard deviation.
• Z gives the number of standard deviations a particular observation lies above or below the mean.
• It is used to compare two observations coming from different groups.
Examples:
1. Two sections were given introduction to statistics examinations. The following information
was given.
Value Section 1 Section 2
Mean 78 90
Stan.deviation 6 5
Student A from section 1 scored 90 and student B from section 2 scored 95.Relatively speaking
who performed better?
Solutions: Calculate the standard score of both students.

$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2$$

$$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$

$\Rightarrow$ Student A performed better relative to his section, because the score of student A is two standard deviations above the mean score of his section, while the score of student B is only one standard deviation above the mean score of his section.
2. Two groups of people were trained to perform a certain task and tested to find out which
group is faster to learn the task. For the two groups the following information was given:

Value Group one Group two
Mean 10.4 min 11.9 min
Stan.dev. 1.2 min 1.3 min

Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B from group two takes 9.3 minutes. Who was faster in performing the task? Why?
Solutions:
a) Use the coefficient of variation.

$$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100 = \frac{1.2}{10.4} \times 100 = 11.54\%$$

$$C.V_2 = \frac{S_2}{\bar{X}_2} \times 100 = \frac{1.3}{11.9} \times 100 = 10.92\%$$

Since C.V2 < C.V1, group 2 is more consistent.
b) Calculate the standard scores of A and B.

$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1$$

$$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$$

Person B is faster, because the time taken by person B is two standard deviations shorter than the average time taken by group 2, while the time taken by person A is only one standard deviation shorter than the average time taken by group 1.
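Standard scores for both examples can be computed with a one-line function. The name `z_score` is an illustrative choice; the rounding only removes floating-point noise from the decimal inputs.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 1: exam scores in two sections
print(z_score(90, 78, 6), z_score(95, 90, 5))      # 2.0 1.0

# Example 2: task times in minutes (a lower z means relatively faster)
z_a = round(z_score(9.2, 10.4, 1.2), 6)
z_b = round(z_score(9.3, 11.9, 1.3), 6)
print(z_a, z_b)                                    # -1.0 -2.0
```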

Chapter Three
3 Elementary Probability
3.1 Deterministic and non-deterministic models
A model is an approximate representation of a physical situation. It attempts to explain observed
behavior using a set of simple and understandable rules. These rules can be used to predict the
outcome of experiments involving the given physical situation. A useful model explains all
relevant aspects of a given situation. Such models can be used instead of experiments to answer
questions regarding the given situation. Models therefore allow the engineer to avoid the costs of
experimentation, namely, labor, equipment, and time.
The two basic types of models are deterministic models and non-deterministic models
(probability models)
Deterministic models are used when the observed phenomenon has measurable properties. In this type of model, the conditions under which an experiment is carried out determine the exact outcome of the experiment. The solution of a set of mathematical equations specifies the exact outcome of the experiment. As a simple example, suppose we are measuring the flow of current in a thin copper wire. Our model for this phenomenon might be Ohm's law:
Current = Voltage/Resistance
or

$$I = \frac{V}{R}$$
We call this deterministic model because it assumes that the interaction between these idealized
components is completely described by this formula or laws.
If an experiment involving the measurement of a set of voltages is repeated a number of times
under the same conditions, this theory predicts that the observations will always be exactly the
same.
However, if we performed this measurement process more than once, perhaps at different times,
or even on different days, the observed current could differ slightly because of small changes or
variations in factors that are not completely controlled, such as changes in ambient temperature,

fluctuations in performance of the gauge, small impurities present at different locations in the wire, and drifts in the voltage source. Consequently, a more realistic model of the observed current might be

$$I = \frac{V}{R} + \varepsilon$$

where $I$ is the current, $V$ the voltage, $R$ the resistance, and $\varepsilon$ is a term added to the model to account for the fact that the observed values of current flow do not perfectly conform to the deterministic model. We can think of $\varepsilon$ as a term that includes the effects of all of the unmodeled sources of variability that affect this system.
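The contrast between the two kinds of model can be illustrated with a small simulation. Everything specific in this sketch is an assumption: the 12 V and 4 ohm values, the normally distributed disturbance, and its standard deviation of 0.05 are hypothetical choices, not values from these notes.

```python
import random

def deterministic_current(voltage, resistance):
    """Ohm's law: the same inputs always give the same current."""
    return voltage / resistance

def observed_current(voltage, resistance, noise_sd, rng):
    """Ohm's law plus a random disturbance (assumed Gaussian for illustration)."""
    return voltage / resistance + rng.gauss(0.0, noise_sd)

rng = random.Random(0)                    # fixed seed so the run is repeatable
readings = [observed_current(12.0, 4.0, 0.05, rng) for _ in range(5)]

print(deterministic_current(12.0, 4.0))   # 3.0 on every call
print([round(r, 3) for r in readings])    # five slightly different values near 3.0
```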

3.2 Review of set theory: sets, union, intersection, complementation, De Morgan’s rules

The term set refers to a well-defined collection of objects that share a certain property or certain properties. The term "well-defined" here means that the set is described in such a way that one can decide whether or not a given object belongs in the set. If $A$ is a set, then the objects of the collection are called the elements or members of the set $A$. If $x$ is an element of the set $A$, we write $x \in A$. If $x$ is not an element of the set $A$, we write $x \notin A$.
As a convention, we use capital letters to denote the names of sets and lowercase letters for elements of a set.
Note that for each object $x$ and each set $A$, exactly one of $x \in A$ or $x \notin A$, but not both, must be true.
Example:
a. The set of counting numbers less than ten.
b. The set of letters in the word "Addis Ababa."
c. The set of all countries in Africa.
Union
The union of two events is the event that consists of all outcomes that are contained in either of the two events. We denote the union as: $E_1 \cup E_2$.

Example: Consider an experiment in which you select a molded plastic part, such as a connector, and measure its thickness. The possible values for thickness depend on the resolution of the measuring instrument, and they also depend on upper and lower bounds for thickness. If the objective of the analysis is to consider only whether or not the parts conform to the manufacturing specifications, either part may or may not conform. We abbreviate yes and no as y and n. If the ordered pair yn indicates that the first connector conforms and the second does not, the set of all outcomes can be represented by the four outcomes:

$$S = \{yy, yn, ny, nn\}$$

Suppose that the set of all outcomes for which at least one part conforms is denoted as E1. Then, $E_1 = \{yy, yn, ny\}$. The event in which both parts do not conform, denoted as $E_2$, contains only the single outcome, $E_2 = \{nn\}$. Other examples of events are $E_3 = \emptyset$, the null set, and $E_4 = S$, the sample space. If $E_5 = \{yn, ny, nn\}$, then $E_1 \cup E_5 = S$.

Intersection
The intersection of two events is the event that consists of all outcomes that are contained in both of the two events. We denote the intersection as: $E_1 \cap E_2$. For the events above, $E_1 \cap E_5 = \{yn, ny\}$.

Complementation
The complement of an event in a sample space is the set of outcomes in the sample space that are not in the event. We denote the complement of the event $E$ as $E'$. For the events above, $E_1' = \{nn\}$.

De Morgan's rules
De Morgan's laws imply that

$$(A \cup B)' = A' \cap B' \quad \text{and} \quad (A \cap B)' = A' \cup B'$$

Example: $(E_1 \cap E_5)' = E_1' \cup E_5' = \{nn\} \cup \{yy\} = \{yy, nn\}$
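Python's built-in `set` type supports these operations directly, so De Morgan's rules can be checked on a small sample space. In this sketch the sample space is the two-connector space {yy, yn, ny, nn}; the event `E5` is a hypothetical second event (at least one part fails) introduced only for illustration.

```python
S = {"yy", "yn", "ny", "nn"}      # sample space for two connectors
E1 = {"yy", "yn", "ny"}           # at least one part conforms
E5 = {"yn", "ny", "nn"}           # hypothetical event: at least one part fails

union = E1 | E5
intersection = E1 & E5
complement_E1 = S - E1            # complement taken relative to S

print(sorted(union))              # ['nn', 'ny', 'yn', 'yy']
print(sorted(intersection))       # ['ny', 'yn']
print(sorted(complement_E1))      # ['nn']

# De Morgan's rules, with complements taken relative to S
de_morgan_union = S - (E1 | E5) == (S - E1) & (S - E5)
de_morgan_intersection = S - (E1 & E5) == (S - E1) | (S - E5)
print(de_morgan_union, de_morgan_intersection)   # True True
```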

3.3 Random experiments, sample space and events


3.3.1 Random Experiments
An experiment that can result in different outcomes, even though it is repeated in the same
manner every time, is called a random experiment.
Example: If we measure the current in a thin copper wire, we are conducting an experiment.
However, in day-to-day repetitions of the measurement the results can differ slightly because of
small variations in variables that are not controlled in our experiment, including changes in
ambient temperatures, slight variations in gauge and small impurities in the chemical
composition of the wire if different locations are selected, and current source drifts.

However, no matter how carefully our experiment is designed and conducted, the variation is
almost always present, and its magnitude can be large enough that the important conclusions
from our experiment are not obvious.

3.3.2 Sample space and events


To model and analyze a random experiment, we must understand the set of possible outcomes
from the experiment.

The set of all possible outcomes of a random experiment is called the sample space of the
experiment. The sample space is denoted as S.

Example: Consider an experiment in which you select a molded plastic part, such as a
connector, and measure its thickness. The possible values for thickness depend on the resolution
of the measuring instrument, and they also depend on upper and lower bounds for thickness.
However, it might be convenient to define the sample space as simply the positive real line

$$S = R^+ = \{x \mid x > 0\}$$

because a negative value for thickness cannot occur. If it is known that all connectors will be between 10 and 11 millimeters thick, the sample space could be

$$S = \{x \mid 10 < x < 11\}$$

If the objective of the analysis is to consider only whether a particular part is low, medium, or high for thickness, the sample space might be taken to be the set of three outcomes:

$$S = \{low, medium, high\}$$

If the objective of the analysis is to consider only whether or not a particular part conforms to the manufacturing specifications, the sample space might be simplified to the set of two outcomes

$$S = \{yes, no\}$$

If two connectors are selected and the objective of the analysis is to consider only whether or not the parts conform to the manufacturing specifications, either part may or may not conform. We abbreviate yes and no as y and n. If the ordered pair yn indicates that the first connector conforms and the second does not, the sample space can be represented by the four outcomes:

$$S = \{yy, yn, ny, nn\}$$

In random experiments in which items are selected from a batch, we will indicate whether or not
a selected item is replaced before the next one is selected. For example, if the batch consists of

three items {a, b, c} and our experiment is to select two items without replacement, the sample space can be represented as

$$S_{without} = \{ab, ac, ba, bc, ca, cb\}$$

If items are replaced before the next one is selected, the sampling is referred to as with replacement. Then the possible ordered outcomes are

$$S_{with} = \{aa, ab, ac, ba, bb, bc, ca, cb, cc\}$$

An event is a subset of the sample space of a random experiment.

Consider the sample space S = {yy, yn, ny, nn} in the above Example. Suppose that the set of all outcomes for which at least one part conforms is denoted as E1. Then,

$$E_1 = \{yy, yn, ny\}$$
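Small sample spaces like these can be enumerated with the standard `itertools` module. In this sketch, `permutations` lists ordered selections without replacement and `product` lists ordered selections with replacement; the variable names are illustrative.

```python
from itertools import permutations, product

items = ["a", "b", "c"]
without_replacement = ["".join(p) for p in permutations(items, 2)]
with_replacement = ["".join(p) for p in product(items, repeat=2)]
print(without_replacement)  # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
print(with_replacement)     # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

# Two connectors, each conforming (y) or not (n)
S = ["".join(p) for p in product("yn", repeat=2)]
E1 = [outcome for outcome in S if "y" in outcome]   # at least one part conforms
print(S)    # ['yy', 'yn', 'ny', 'nn']
print(E1)   # ['yy', 'yn', 'ny']
```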

3.4 Finite sample spaces and equally likely outcomes


Finite sample spaces
Sample space can be
 Countable ( finite or infinite)
 Uncountable.
Example: What is the sample space for each of the following experiments?
a) Toss a die one time.
b) Toss a coin two times.
c) A light bulb is manufactured. It is tested for its life length by time.
Solution
a) S={1,2,3,4,5,6} (finite)
b) S={(HH),(HT),(TH),(TT)} (finite)
c) S={t /t≥0} (Uncountable)
Equally Likely Events: Events which have the same chance of occurring.

Example: In the experiment of tossing a die one time the events 1, 2, 3, 4, 5, 6 have the
same chance of occurring, so they are equally likely events.
3.5 Counting techniques
As sample spaces become larger, complete enumeration is difficult. Instead, counts of the number of outcomes in the sample space and in various events are often used to analyze the random experiment. These methods are referred to as counting techniques.
In order to determine the number of outcomes, one can use several rules of counting: the multiplication rule, the permutation rule, and the combination rule.

The Multiplication Rule: If a choice consists of k steps, of which the first can be made in n1 ways, the second can be made in n2 ways, …, and the kth can be made in nk ways, then the whole choice can be made in $n_1 \times n_2 \times \dots \times n_k$ ways.


Example: The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many
different cards are possible if
a) repetitions are permitted? b) repetitions are not permitted?
Solutions
a) There are four steps:
1. Selecting the 1st digit, which can be done in 5 ways.
2. Selecting the 2nd digit, which can be done in 5 ways.
3. Selecting the 3rd digit, which can be done in 5 ways.
4. Selecting the 4th digit, which can be done in 5 ways.
Position   1st digit   2nd digit   3rd digit   4th digit
Ways       5           5           5           5
Therefore 5 × 5 × 5 × 5 = 625 different cards are possible.

b) There are four steps:
1. Selecting the 1st digit, which can be done in 5 ways.
2. Selecting the 2nd digit, which can be done in 4 ways.
3. Selecting the 3rd digit, which can be done in 3 ways.
4. Selecting the 4th digit, which can be done in 2 ways.
Position   1st digit   2nd digit   3rd digit   4th digit
Ways       5           4           3           2
Therefore 5 × 4 × 3 × 2 = 120 different cards are possible.
Permutation: An arrangement of n objects in a specified order is called a permutation of the
objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where n! = n × (n − 1) × (n − 2) × ... × 3 × 2 × 1.
2. The arrangement of n objects in a specified order using r objects at a time is called the
permutation of n objects taken r at a time. It is written as nPr, and the formula is
nPr = n! / (n − r)!
3. The number of permutations of n objects in which k1 are alike, k2 are alike, ..., km are alike is
n! / (k1! × k2! × ... × km!)
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word
“CORRECTION”?
Solutions:
1.
a) Here n = 4; there are four distinct objects.
Therefore there are 4! = 24 permutations.
b) Here n = 4 and r = 2.
Therefore there are 4P2 = 4! / (4 − 2)! = 24 / 2 = 12 permutations.
2. Here n = 10, of which 2 are C, 2 are O, 2 are R, 1 is E, 1 is T, 1 is I, and 1 is N.
Therefore k1 = 2, k2 = 2, k3 = 2, and k4 = k5 = k6 = k7 = 1.
Using the 3rd rule of permutation, there are
10! / (2! × 2! × 2! × 1! × 1! × 1! × 1!) = 453600 permutations.
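The three permutation rules can be checked numerically. This is a small sketch using Python's standard library; the helper name multiset_permutations is an assumption introduced here for illustration.

```python
from collections import Counter
from math import factorial, perm


def multiset_permutations(word: str) -> int:
    """Number of distinct arrangements of the letters of `word` (rule 3)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out arrangements of identical letters
    return total


print(factorial(4))                         # rule 1: 24
print(perm(4, 2))                           # rule 2: 12
print(multiset_permutations("CORRECTION"))  # rule 3: 453600
```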
Combination
A selection of objects without regard to order is called a combination.
Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two
letters.
Solutions:
Permutation            Combination
AB  BA  CA  DA         AB  BC
AC  BC  CB  DB         AC  BD
AD  BD  CD  DC         AD  DC

Note that in permutation AB is different from BA. But in combination AB is the same as BA.
Combination Rule

The number of combinations of r objects selected from n objects is denoted by nCr or C(n, r) and
is given by the formula:

C(n, r) = n! / ((n − r)! × r!)
Examples:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
Solutions:
n = 9, r = 5
C(9, 5) = 9! / ((9 − 5)! × 5!) = 9! / (4! × 5!) = 126 ways
2. Among 15 clocks there are two defectives. In how many ways can an inspector choose three
of the clocks for inspection so that:
a) There is no restriction.
b) None of the defective clocks is included.
c) Only one of the defective clocks is included.
d) Two of the defective clocks are included.
Solutions:
n = 15, of which 2 are defective and 13 are non-defective, and r = 3.
Below, C(n, r) denotes the number of combinations of r objects selected from n objects.

a) If there is no restriction, select three clocks from 15 clocks. This can be done in
C(15, 3) = 15! / (12! × 3!) = 455 ways.
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defectives, which can be done in
C(2, 0) × C(13, 3) = 1 × 286 = 286 ways.
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defectives, which can be done in
C(2, 1) × C(13, 2) = 2 × 78 = 156 ways.
d) Two of the defective clocks are included.
This is equivalent to two defectives and one non-defective, which can be done in
C(2, 2) × C(13, 1) = 1 × 13 = 13 ways.
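The combination counts above can be reproduced with math.comb. The sketch below is illustrative only; the constant and function names are assumptions, not part of the notes. Note that the three restricted cases add up to the unrestricted count.

```python
from math import comb

TOTAL_CLOCKS = 15
DEFECTIVE = 2
GOOD = TOTAL_CLOCKS - DEFECTIVE  # 13 non-defective clocks
SAMPLE = 3                       # clocks chosen for inspection


def ways_with_defectives(k: int) -> int:
    """Ways to pick SAMPLE clocks containing exactly k defective ones."""
    return comb(DEFECTIVE, k) * comb(GOOD, SAMPLE - k)


print(comb(9, 5))                  # committee example: 126
print(comb(TOTAL_CLOCKS, SAMPLE))  # no restriction: 455
print([ways_with_defectives(k) for k in range(DEFECTIVE + 1)])  # [286, 156, 13]
```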
3.6 Definitions of probability
Probability is used to quantify the likelihood, or chance, that an outcome of a random experiment
will occur. A probability of 0 indicates that an outcome will not occur. A probability of 1
indicates that an outcome will occur with certainty.

Subjective probability defines the probability of an outcome from a degree of belief that the
outcome will occur. Different individuals will no doubt assign different probabilities to the same
outcomes.

The classical approach


This approach is used when:
- All outcomes are equally likely.
- The total number of outcomes is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted and out of these
NA outcomes are favourable to the event A, then the probability that event A occurs, denoted
P(A), is defined as:
P(A) = NA / N = (number of outcomes favourable to A) / (total number of outcomes) = n(A) / n(S)
Examples:
1. A fair die is tossed once. What is the probability of getting
a) number 4? b) an odd number? c) an even number? d) number 8?
Solutions: First identify the sample space, say S.
S = {1, 2, 3, 4, 5, 6}
Therefore N = n(S) = 6.
a) Let A be the event of number 4.
A = {4}, so NA = n(A) = 1 and
P(A) = n(A) / n(S) = 1/6
b) Let A be the event of odd numbers.
A = {1, 3, 5}, so NA = n(A) = 3 and
P(A) = n(A) / n(S) = 3/6 = 0.5
c) Let A be the event of even numbers.
A = {2, 4, 6}, so NA = n(A) = 3 and
P(A) = n(A) / n(S) = 3/6 = 0.5
d) Let A be the event of number 8.
A = Ø, so NA = n(A) = 0 and
P(A) = n(A) / n(S) = 0/6 = 0
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these
candles are selected at random, what is the probability that
a) all will be defective? b) 6 will be non-defective? c) all will be non-defective?
Solutions:
Below, C(n, r) denotes the number of combinations of r objects selected from n objects.
Total number of selections: N = n(S) = C(80, 10)
a) Let A be the event that all will be defective.
Number of ways in which A occurs: NA = n(A) = C(30, 10) × C(50, 0)
Therefore P(A) = n(A) / n(S) = C(30, 10) × C(50, 0) / C(80, 10) ≈ 0.00001825
b) Let A be the event that 6 will be non-defective.
Number of ways in which A occurs: NA = n(A) = C(30, 4) × C(50, 6)
Therefore P(A) = n(A) / n(S) = C(30, 4) × C(50, 6) / C(80, 10) ≈ 0.2645
c) Let A be the event that all will be non-defective.
Number of ways in which A occurs: NA = n(A) = C(30, 0) × C(50, 10)
Therefore P(A) = n(A) / n(S) = C(30, 0) × C(50, 10) / C(80, 10) ≈ 0.00624
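The candle probabilities follow the classical definition n(A) / n(S) and can be evaluated directly. This is a minimal sketch; the names prob_good, DEFECTIVE, GOOD and SAMPLE are assumptions made for the illustration.

```python
from math import comb

DEFECTIVE, GOOD, SAMPLE = 30, 50, 10
total_selections = comb(DEFECTIVE + GOOD, SAMPLE)  # n(S) = C(80, 10)


def prob_good(k: int) -> float:
    """P(exactly k of the SAMPLE selected candles are non-defective)."""
    favourable = comb(GOOD, k) * comb(DEFECTIVE, SAMPLE - k)  # n(A)
    return favourable / total_selections


p_all_defective = prob_good(0)
p_six_good = prob_good(6)
p_all_good = prob_good(10)

print(f"{p_all_defective:.8f}")
print(f"{p_six_good:.4f}")
print(f"{p_all_good:.5f}")
```

Summing prob_good(k) over k = 0, ..., 10 gives 1, which is a quick sanity check on the counting.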
Exercises:
1. What is the probability that a waitress will refuse to serve alcoholic beverages to only
three minors if she randomly checks the I.D’s of five students from among ten students of
which four are not of legal age?
Shortcomings of the classical approach:
This approach is not applicable when:
- The total number of outcomes is infinite.
- Outcomes are not equally likely.
The Frequentist Approach
This is based on the relative frequencies of outcomes belonging to an event.
Definition: The probability of an event A is the proportion of outcomes favourable to A in the
long run when the experiment is repeated under the same conditions:
P(A) = lim (N → ∞) NA / N
Example: Records show that 60 out of 100,000 bulbs produced are defective. What is
the probability that a newly produced bulb is defective?
Solution: Let A be the event that the newly produced bulb is defective. The limit cannot be
observed directly, so it is estimated by the relative frequency in the records:
P(A) ≈ NA / N = 60 / 100,000 = 0.0006
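The long-run idea behind the frequentist definition can be illustrated by simulation. The sketch below is an assumption-laden toy: it simulates tosses of a fair die with Python's pseudo-random generator and tracks the relative frequency of the face 4, which should settle near 1/6 as the number of trials grows.

```python
import random


def relative_frequency(trials: int, seed: int = 0) -> float:
    """Relative frequency NA / N of the face 4 in `trials` simulated die tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
    return hits / trials


for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The printed frequencies fluctuate for small N and stabilise for large N, which is the behaviour the limit in the definition describes.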

Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, which satisfies the following
properties, called the axioms of probability or postulates of probability.

1. P(A) ≥ 0
2. P(S) = 1, where S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e.
P(A ∪ B) = P(A) + P(B)
4. P(A') = 1 − P(A)
5. 0 ≤ P(A) ≤ 1
6. P(ø) = 0, where ø is the impossible event.

In general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Chapter Four
4 Conditional Probability and Independence
4.1 Conditional Probability
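The conditional probability of an event B given an event A, denoted as P(B | A), is

P(B | A) = P(A ∩ B) / P(A), for P(A) > 0.

This definition can be understood in a special case in which all outcomes of a random experiment
are equally likely. If there are n total outcomes, then
P(A) = (number of outcomes in A) / n and P(A ∩ B) = (number of outcomes in A ∩ B) / n,
so P(A ∩ B) / P(A) = (number of outcomes in A ∩ B) / (number of outcomes in A).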

Example1: A day’s production of 850 manufactured parts contains 50 parts that do not meet
customer requirements. Two parts are selected randomly without replacement from the batch.
What is the probability that the second part is defective given that the first part is defective?
Solution: Let A denote the event that the first part selected is defective, and let B denote the
event that the second part selected is defective. The probability needed can be expressed
as P(B | A). If the first part is defective, prior to selecting the second part, the batch contains
849 parts, of which 49 are defective, therefore
P(B | A) = 49/849
Example2: Continuing the previous example, if three parts are selected at random, what is the
probability that the first two are defective and the third is not defective? This event can be
described in shorthand notation as simply P(ddn). We have

P(ddn) = (50/850) × (49/849) × (800/848) ≈ 0.0032

The third term is obtained as follows. After the first two parts are selected, there are 848
remaining. Of the remaining parts, 800 are not defective. In this example, it is easy to obtain the
solution with a conditional probability for each selection.
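The two conditional-probability examples can be verified with exact fractions. This is a short sketch; the variable names are illustrative assumptions.

```python
from fractions import Fraction

TOTAL, DEFECTIVE = 850, 50

# P(second defective | first defective): one defective part has already been removed.
p_second_given_first = Fraction(DEFECTIVE - 1, TOTAL - 1)

# P(ddn): multiply the conditional probability of each successive selection.
p_ddn = (
    Fraction(DEFECTIVE, TOTAL)                # first part defective
    * Fraction(DEFECTIVE - 1, TOTAL - 1)      # second defective, given the first
    * Fraction(TOTAL - DEFECTIVE, TOTAL - 2)  # third not defective, given the first two
)

print(p_second_given_first)      # 49/849
print(round(float(p_ddn), 4))    # 0.0032
```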

4.2 Multiplication theorem, Total probability theorem and Bayes’ Theorem
Multiplication Rule
P(A ∩ B) = P(B | A) P(A) = P(A | B) P(B)
The multiplication rule is useful for determining the probability of an event that depends on other
events.
Example: The probability that an automobile battery subject to high engine compartment
temperature suffers low charging current is 0.7. The probability that a battery is subject to high
engine compartment temperature is 0.05. Let C denote the event that a battery suffers low
charging current, and let T denote the event that a battery is subject to high engine compartment
temperature. The probability that a battery is subject to low charging current and high engine
compartment temperature is

P(C ∩ T) = P(C | T) P(T) = 0.7 × 0.05 = 0.035

Total probability theorem


For any events A and B,

P(B) = P(B ∩ A) + P(B ∩ A') = P(B | A) P(A) + P(B | A') P(A')

Example: Suppose that in semiconductor manufacturing the probability is 0.10 that a chip that is
subjected to high levels of contamination during manufacturing causes a product failure. The
probability is 0.005 that a chip that is not subjected to high contamination levels during
manufacturing causes a product failure. In a particular production run, 20% of the chips are
subject to high levels of contamination. What is the probability that a product using one of these
chips fails?
Let F denote the event that the product fails, and let H denote the event that the chip is exposed
to high levels of contamination. The requested probability is P(F), and the information provided
can be represented as

P(F | H) = 0.10, P(F | H') = 0.005, P(H) = 0.20, P(H') = 0.80

Therefore P(F) = P(F | H) P(H) + P(F | H') P(H') = 0.10(0.20) + 0.005(0.80) = 0.024,
which can be interpreted as just the weighted average of the two probabilities of failure.
The reasoning used to develop the above equation can be applied more generally. In the
development of the above equation we only used the two mutually exclusive events A and A'.
However, the fact that A ∪ A' = S, the entire sample space, was important.

In general, a collection of sets E1, E2, ..., Ek such that E1 ∪ E2 ∪ ... ∪ Ek = S is said to be
exhaustive.
Assume that E1, E2, ..., Ek are k mutually exclusive and exhaustive sets. Then
P(B) = P(B ∩ E1) + P(B ∩ E2) + ... + P(B ∩ Ek)
     = P(B | E1) P(E1) + P(B | E2) P(E2) + ... + P(B | Ek) P(Ek)
Example: In a particular production run, 20% of the chips are subjected to high levels of
contamination, 30% to medium levels of contamination, and 50% to low levels of contamination.
Take the probabilities of product failure at these levels to be 0.10, 0.01, and 0.001,
respectively (the values used in the calculation below).
What is the probability that a product using one of these chips fails?

Let H denote the event that a chip is exposed to high levels of contamination,
M denote the event that a chip is exposed to medium levels of contamination, and
L denote the event that a chip is exposed to low levels of contamination. Then,
P(F) = P(F | H) P(H) + P(F | M) P(M) + P(F | L) P(L)
     = 0.10(0.20) + 0.01(0.30) + 0.001(0.50) = 0.0235
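The total probability calculation is a weighted sum over the mutually exclusive and exhaustive contamination levels. A minimal sketch follows; the dictionary layout and names are assumptions chosen for the illustration, while the numbers come from the example.

```python
# Each level maps to (P(level), P(failure | level)), as used in the example.
contamination = {
    "high": (0.20, 0.10),
    "medium": (0.30, 0.01),
    "low": (0.50, 0.001),
}

# The levels must be exhaustive: their probabilities add up to 1.
total_level_probability = sum(p_level for p_level, _ in contamination.values())

# Total probability theorem: P(F) = sum of P(F | level) * P(level).
p_fail = sum(p_level * p_fail_given for p_level, p_fail_given in contamination.values())

print(round(total_level_probability, 10))  # 1.0
print(round(p_fail, 4))                    # 0.0235
```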
Bayes’ Theorem
In some examples, we do not have a complete table of information. We might know one
conditional probability but would like to calculate a different one. Continuing with the
semiconductor manufacturing example, assume the following probabilities
for product failure subject to levels of contamination in manufacturing:

Probability of failure   Level of contamination   Probability of level
0.10                     High                     0.20
0.005                    Not high                 0.80

In this problem, we might ask the following: If the semiconductor chip in the product fails, what
is the probability that the chip was exposed to high levels of contamination? From the definition
of conditional probability,
P(A ∩ B) = P(A | B) P(B) = P(B ∩ A) = P(B | A) P(A)
Now considering the second and last terms in the expression above, we can write

P(A | B) = P(B | A) P(A) / P(B), for P(B) > 0

This is a useful result that enables us to solve for P(A | B) in terms of P(B | A).
In general, if P(B) in the denominator is written using the total probability theorem, we obtain
the following general result, which is known as Bayes’ Theorem.
If E1, E2, ..., Ek are k mutually exclusive and exhaustive events and B is any event, then

P(E1 | B) = P(B | E1) P(E1) / [P(B | E1) P(E1) + P(B | E2) P(E2) + ... + P(B | Ek) P(Ek)],
for P(B) > 0

Example: We can answer the question posed at the start of this section as follows: The
probability requested can be expressed as P(H | F). With
P(F) = P(F | H) P(H) + P(F | H') P(H') = 0.10(0.20) + 0.005(0.80) = 0.024, we have

P(H | F) = P(F | H) P(H) / P(F) = 0.10(0.20) / 0.024 ≈ 0.83
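Bayes' theorem can be sketched as a small function. The numbers below are an assumption: they are the two-level contamination values stated in the total probability example (P(F | H) = 0.10, P(F | H') = 0.005, P(H) = 0.20), taken here to be the table this section refers to. The function name bayes_posterior is hypothetical.

```python
def bayes_posterior(priors: dict, likelihoods: dict, target: str) -> float:
    """P(target | B) for mutually exclusive, exhaustive events with given priors.

    priors[e] is P(e); likelihoods[e] is P(B | e).
    """
    # Denominator: total probability of B over all events.
    evidence = sum(priors[e] * likelihoods[e] for e in priors)
    return priors[target] * likelihoods[target] / evidence


priors = {"high": 0.20, "not_high": 0.80}        # P(H), P(H')
likelihoods = {"high": 0.10, "not_high": 0.005}  # P(F | H), P(F | H')

p_high_given_fail = bayes_posterior(priors, likelihoods, "high")
print(round(p_high_given_fail, 2))  # 0.83
```

Although only 20% of chips see high contamination, they account for most failures, which is why the posterior is so much larger than the prior.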

4.3 Independent Events


In some cases, the conditional probability P(B | A) might equal P(B). In this special case,
knowledge that the outcome of the experiment is in event A does not affect the probability that
the outcome is in event B.

Two events are independent if any one of the following equivalent statements is true:

1. P(A | B) = P(A)
2. P(B | A) = P(B)
3. P(A ∩ B) = P(A) P(B)

Example: A day’s production of 850 manufactured parts contains 50 parts that do not meet
customer requirements. Two parts are selected at random, without replacement, from the batch.
a) Find the probability that the second part selected is defective.
b) Find the probability that the second part selected is defective given that the first part is
defective.
Let A denote the event that the first part is defective, and let B denote the event that the second
part is defective. Indeed P(B | A) = 49/849 and
P(B) = P(B | A) P(A) + P(B | A') P(A') = (49/849)(50/850) + (50/849)(800/850) = 50/850.
Because P(B | A) ≠ P(B), the two events are not independent.

When considering three or more events, we can extend the definition of independence with the
following general result.

The events E1, E2, ..., En are independent if and only if, for any subset of these events
Ei1, Ei2, ..., Eik,
P(Ei1 ∩ Ei2 ∩ ... ∩ Eik) = P(Ei1) × P(Ei2) × ... × P(Eik)
Example: The following circuit operates only if there is a path of functional devices from left to
right. The probability that each device functions is shown on the graph. Assume that devices fail
independently. What is the probability that the circuit operates?

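Let T and B denote the events that the top and bottom devices operate, respectively. There is a
path if at least one device operates. The probability that the circuit operates is
P(T ∪ B) = 1 − P[(T ∪ B)'] = 1 − P(T' ∩ B')

A simple formula for the solution can be derived from the complements T' and B'. From the
independence assumption,
P(T' ∩ B') = P(T') P(B') = [1 − P(T)] [1 − P(B)]

so P(T ∪ B) = 1 − [1 − P(T)] [1 − P(B)], evaluated with the device probabilities shown in the figure.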

Chapter Five
5 One-dimensional Random Variables
5.1 Random variable: definition and distribution function
Definition
A random variable is a function that assigns a real number to each outcome in the sample space
of a random experiment. A random variable is denoted by an uppercase letter such as X. After an
experiment is conducted, the measured value of the random variable is denoted by a lowercase
letter such as x = 70 milliamperes.

5.2 Discrete random variables


A discrete random variable is a random variable with a finite (or countably infinite) range.
In other experiments, we might record a count such as the number of transmitted bits that are
received in error. Then the measurement is limited to integers. Or we might record that a
proportion such as 0.0042 of the 10,000 transmitted bits were received in error. Then the
measurement is fractional, but it is still limited to discrete points on the real line. Whenever the
measurement is limited to discrete points on the real line, the random variable is said to be a
discrete random variable.
Examples of discrete random variables: number of scratches on a surface, proportion of defective
parts among 1000 tested, number of transmitted bits received in error.

5.3 Continuous random variables


A continuous random variable is a random variable with an interval (either finite or infinite) of
real numbers for its range. The range of the random variable includes all values in an interval of
real numbers; that is, the range can be thought of as a continuum.

Sometimes a measurement (such as current in a copper wire or length of a machined part) can
assume any value in an interval of real numbers (at least theoretically). Then arbitrary precision
in the measurement is possible. Of course, in practice, we might round off to the nearest tenth or
hundredth of a unit. The random variable that represents this measurement is said to be a
continuous random variable.

Examples of continuous random variables: electrical current, length, pressure, temperature, time,
voltage, weight

5.4 Cumulative distribution function and its properties


The cumulative distribution function of X is denoted as F(x) and given by:

F(x) = P(X ≤ x) = Σ (over xi ≤ x) f(xi), for a discrete random variable X

F(x) = P(X ≤ x) = ∫ (from −∞ to x) f(u) du, for a continuous random variable X

For a random variable X, F(x) satisfies the following properties.

1. F(x) = P(X ≤ x) = Σ (over xi ≤ x) f(xi), for a discrete random variable X

2. F(x) = P(X ≤ x) = ∫ (from −∞ to x) f(u) du, for a continuous random variable X

3. 0 ≤ F(x) ≤ 1
4. If x ≤ y, then F(x) ≤ F(y)
Chapter Six
6 Functions of Random Variables
6.1 Equivalent events
Let be the set of all values of x such that if x is in , then ( ) is in A, and let be the set
of all values of y such that if y is in , then ( ) is in B we called and the equivalent
events of A and B.

If X is a random variable, then a function Y = g(X) of X is also a random variable. The notion of
equivalent events allows us to derive expressions for the CDF and pdf of Y in terms of the CDF
and pdf of X.

6.2 Functions of discrete random variables and their distributions


For a discrete random variable X with possible values x1, x2, ..., xn, a probability mass
function f(xi) = P(X = xi) is a function such that
1. f(xi) ≥ 0
2. Σ (i = 1 to n) f(xi) = 1
Example1: The following are number of messages sent per hour over a computer network on 28
hours.
10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15
Let X be the random variable giving the number of messages sent per hour over a computer network
during the 28 hours. Find the probability mass function and the distribution function of X.
Solution: The possible values of X are {10, 11, 12, 13, 14, 15}.
X = x          10      11      12      13      14      15
Frequency f    2       4       8       6       6       2
The pmf is the relative frequency of each value: f(10) = P(X = 10) = 2/28 = 0.0714,
f(11) = P(X = 11) = 4/28 = 0.1429, and so on. The CDF accumulates the pmf:
F(10) = 0.0714, F(11) = 0.0714 + 0.1429 = 0.2143, and so on.

X = x          10      11      12      13      14      15
pmf f(x)       0.0714  0.1429  0.2857  0.2143  0.2143  0.0714
CDF F(x)       0.0714  0.2143  0.5000  0.7143  0.9286  1.0000
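The pmf and CDF of the message counts can be computed from the raw data; the CDF values are running totals of the pmf, so the last one must equal 1. The following sketch uses exact fractions, and its variable names are assumptions for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Messages sent per hour over the 28 observed hours (data from the example).
messages = [10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
            12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15]

counts = Counter(messages)
values = sorted(counts)

# pmf: relative frequency of each value; cdf: cumulative sum of the pmf.
pmf = {x: Fraction(counts[x], len(messages)) for x in values}
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

for x in values:
    print(x, f"{float(pmf[x]):.4f}", f"{float(cdf[x]):.4f}")
```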

Theorem1: Let X be a discrete random variable whose probability function is p(x). Suppose that
a discrete random variable U is defined in terms of X by U = φ(X), where to each value of X
there corresponds one and only one value of U and conversely, so that X = ψ(U). Then the
probability function for U is given by
g(u) = p[ψ(u)]
Example2: let ( ) ; then find probability mass function and

distribution function of y, for .


Solution: let probability mass function of y is g(y);
( ) , ( )-; Where ( )
( ⁄ )
( ) ; for y=0,3,6
( ⁄ )
The distribution function of y is: ( ) ∑ ( ) ( ) ∑

6.3 Functions of continuous random variables and their distributions


For a continuous random variable X, a probability density function f(x) is a function such that,
for any interval (c, d) with c ≤ d,
1. f(x) ≥ 0 (f is nonnegative)
2. ∫ (from −∞ to ∞) f(x) dx = 1
3. P(c ≤ X ≤ d) = ∫ (from c to d) f(x) dx
Example3: Suppose X has pdf f(x) = 3 on [0, 1/3] (this means f(x) = 0 outside of [0, 1/3]).
Compute a) P (.1 ≤ X ≤ .2), b) P (.1 ≤ X ≤ 1) and c) the cumulative distribution function of x.

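a) P(0.1 ≤ X ≤ 0.2) = ∫ (from 0.1 to 0.2) 3 dx = 3(0.2 − 0.1) = 0.3

b) P(0.1 ≤ X ≤ 1) = ∫ (from 0.1 to 1/3) 3 dx + ∫ (from 1/3 to 1) 0 dx = 3(1/3 − 0.1) = 0.7

c) For x in [0, 1/3] we have F(x) = ∫ (from 0 to x) f(t) dt = ∫ (from 0 to x) 3 dt = 3x.
Since f(x) is 0 outside of [0, 1/3] we know F(x) = P(X ≤ x) = 0 for x < 0 and F(x)
= 1 for x > 1/3. Putting this all together we have

F(x) = 0 for x < 0; F(x) = 3x for 0 ≤ x ≤ 1/3; F(x) = 1 for x > 1/3.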
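For the pdf f(x) = 3 on [0, 1/3], interval probabilities can be obtained either from the piecewise CDF or by integrating the pdf numerically. The sketch below does both as a cross-check; the midpoint-rule integrator and all function names are assumptions added for illustration.

```python
def pdf(x: float) -> float:
    """f(x) = 3 on [0, 1/3] and 0 elsewhere."""
    return 3.0 if 0.0 <= x <= 1.0 / 3.0 else 0.0


def cdf(x: float) -> float:
    """Piecewise CDF: 0 below the range, 3x inside it, 1 above it."""
    if x < 0.0:
        return 0.0
    if x <= 1.0 / 3.0:
        return 3.0 * x
    return 1.0


def prob_between(a: float, b: float) -> float:
    """P(a <= X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)


def integrate_pdf(a: float, b: float, steps: int = 30_000) -> float:
    """Midpoint-rule approximation of the integral of the pdf over [a, b]."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


print(round(prob_between(0.1, 0.2), 4))  # 0.3
print(round(prob_between(0.1, 1.0), 4))  # 0.7
```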

Example4: Let X be a random variable with range [0, 1] and pdf ( ) . Find
a). ( ) ) ( ) ) ( ).
Solution:
a) To find the probability of ( ); we have to find the value of C?
If f(x) is pdf the probability at the given interval should be equal to one;

( ) ∫ ∫ ( ) , ( )

( ) ∫ , ∫ ( ⁄ )

b) ( ) ∫ ( ) ∫ ( ) ∫

c) . / ∫ ( ) ∫ ( ⁄ ) ( ⁄ )

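Example5. Let X be the duration of a telephone call in minutes and suppose X has pdf:

f(x) = c e^(−x/10) for x ≥ 0; f(x) = 0 otherwise.

a) Which value(s) of c make(s) f(x) a valid pdf? Answer. c = 1/10.

b) Find the probability that the call lasts less than 5 minutes. Answer. P(X < 5) = 1 − e^(−1/2) ≈ 0.393.

a) The pdf must integrate to 1: ∫ (from 0 to ∞) c e^(−x/10) dx = 10c = 1, so c = 1/10.

b) P(X < 5) = ∫ (from 0 to 5) (1/10) e^(−x/10) dx = 1 − e^(−1/2) ≈ 0.393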
Exercise1: The resistance of an electrical component follows a p. d. f. given by

( ) {

What is the probability that the resistance is less than 2?

Exercise2: The weekly demand for petrol at a local garage (in thousands of liters) is given by the
p. d. f.

( )( )
( ) {

The petrol tanks are filled to capacity of 940 liters every Monday. What is the probability the
garage runs out of petrol in a particular week?

Theorem 2: Let X be a continuous random variable with probability density f(x). Let us
define U = φ(X), where X = ψ(U) as in Theorem 1. Then the probability density of U is given
by g(u), where
g(u) |du| = f(x) |dx|

or g(u) = f(x) |dx/du| = f[ψ(u)] |ψ'(u)|

Example 6: The probability function of a random variable X is given by


( ) {

Find the probability density function and distribution function for the random variable

( ).

Solution: ( ) ( ) ( ) , ( )-| ( )| ( )

( ) ( )
The pdf of u is; {

The distribution function of u is:

( ) ∫ ( ) ∫ ( ) , -∫

= [, - , ( ) ( ) -]∫ ( ) , -

Exercise3. Let X be a random variable with pdf:

( ) {

What is the pdf and cdf of ?

Exercise4. Let X be a random variable with pdf:

( ) { ( )

What is the pdf and cdf of ( ) ?

Chapter Seven
7 Two dimensional Random Variables
7.1 Two dimensional random variables
In science and real life, we are often interested in two or more random variables at the same time.
For example, the height and weight of building materials, the type of a bulb and its life, or the
amount of water and the amount used for power generation. Such pairs of variables are known as
two dimensional random variables, and they may also have a common (joint) distribution function.

7.2 Joint distributions for discrete and continuous random variables


The ideas of the preceding chapters are easily generalized to two or more random variables. We
consider the typical case of two random variables that are either both discrete or both
continuous. In cases where one variable is discrete and the other continuous, appropriate
modifications are easily made. Generalizations to more than two variables can also be made.
Discrete case: If X and Y are two discrete random variables, we define the joint probability
function of X and Y by f(x, y) = P(X = x, Y = y),
where 1. f(x, y) ≥ 0
2. Σx Σy f(x, y) = 1, i.e., the sum over all values of x and y is 1.

Suppose that X can assume any one of m values x1, x2, ..., xm and Y can assume any one of
n values y1, y2, ..., yn. Then the probability of the event that X = xj and Y = yk is given by
P(X = xj, Y = yk) = f(xj, yk)
A joint probability function for X and Y can be represented by a joint probability table as in
the table below.
X \ Y     y1           y2           ...   yn           Totals
x1        f(x1, y1)    f(x1, y2)    ...   f(x1, yn)    f1(x1)
x2        f(x2, y1)    f(x2, y2)    ...   f(x2, yn)    f1(x2)
...       ...          ...          ...   ...          ...
xm        f(xm, y1)    f(xm, y2)    ...   f(xm, yn)    f1(xm)
Totals    f2(y1)       f2(y2)       ...   f2(yn)       1 (grand total)
The entries of the table sum to 1, which can be written:

Σ (j = 1 to m) Σ (k = 1 to n) f(xj, yk) = 1

This is simply the statement that the total probability of all entries is 1. The grand total of 1 is
indicated in the lower right-hand corner of the table.
The joint distribution function or cdf of X and Y is defined by

F(x, y) = P(X ≤ x, Y ≤ y) = Σ (over u ≤ x) Σ (over v ≤ y) f(u, v)

In the above table, F(x, y) is the sum of all entries for which xj ≤ x and yk ≤ y.
Example1: The joint probability function of two discrete random variables X and Y is given by
( ) ( ), where x and y can assume all integers such that ,
and ( ) otherwise.
(a) Find the value of the constant (b) Find ( ) (c) Find ( ).

Solution: a). The grand total must be equal to 1,

b). ( ) . /

c). ( ) ( ) ( ) ( ) (
) ( ) ( )

( )

Continuous case
The case where both variables are continuous is obtained easily by analogy with the discrete case
on replacing sums by integrals. Thus the joint probability function for the random variables X
and Y (or, as it is more commonly called, the joint density function of X and Y) is defined by
1. f(x, y) ≥ 0
2. ∫ (from −∞ to ∞) ∫ (from −∞ to ∞) f(x, y) dx dy = 1

The probability that X lies between a and b while Y lies between c and d is given mathematically
by

P(a < X < b, c < Y < d) = ∫ (x from a to b) ∫ (y from c to d) f(x, y) dy dx

The joint distribution function of X and Y in this case is defined by

F(x, y) = P(X ≤ x, Y ≤ y) = ∫ (u from −∞ to x) ∫ (v from −∞ to y) f(u, v) dv du

and it follows that

∂²F / ∂x ∂y = f(x, y)

i.e., the density function is obtained by differentiating the distribution function with respect to x
and y.
Example2: The joint density function of two continuous random variables X and Y is

d. Find the joint distribution function of x and y.(exercise)

7.3 Marginal and conditional distributions


- For discrete random variables X and Y:
The probability that X = xj is obtained by adding all entries in the row corresponding to xj and is
given by

P(X = xj) = f1(xj) = Σ (k = 1 to n) f(xj, yk)

For j = 1, 2, . . . , m, these are indicated by the entry totals in the extreme right-hand column or
margin of the above table. Similarly, the probability that Y = yk is obtained by adding all entries
in the column corresponding to yk and is given by

P(Y = yk) = f2(yk) = Σ (j = 1 to m) f(xj, yk)

For k = 1, 2, . . . , n, these are indicated by the entry totals in the bottom row or margin of
the above table. Because the probabilities P(X = xj) and P(Y = yk) are obtained from the margins
of the table, we often refer to f1(xj) and f2(yk) as the marginal probability functions of X and Y,
respectively. It should also be noted that:
Σ (j = 1 to m) f1(xj) = 1, Σ (k = 1 to n) f2(yk) = 1
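Marginal probability functions are just the row and column totals of the joint probability table. The sketch below uses a small hypothetical joint pmf (the numbers are invented for illustration and do not come from the notes) and the assumed helper name marginals.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y); the values are illustrative and sum to 1.
joint = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(2, 12), (0, 2): Fraction(1, 12),
    (1, 0): Fraction(3, 12), (1, 1): Fraction(2, 12), (1, 2): Fraction(3, 12),
}


def marginals(joint_pmf: dict) -> tuple:
    """Return (f1, f2): totals over y for each x, and totals over x for each y."""
    f1, f2 = {}, {}
    for (x, y), p in joint_pmf.items():
        f1[x] = f1.get(x, 0) + p  # add the entry to the total of row x
        f2[y] = f2.get(y, 0) + p  # add the entry to the total of column y
    return f1, f2


f1, f2 = marginals(joint)
print(f1)                                  # marginal pmf of X
print(f2)                                  # marginal pmf of Y
print(sum(f1.values()), sum(f2.values()))  # 1 1
```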

Example3: ( ) ( ), where x and y can assume all integers such that

, and ( ) otherwise. Find the marginal distribution functions (a) of X and


(b) of Y.

( ) ∑ ( ) ∑ ( )

( ) ∑ ( ) ∑ ( )

( ) ∑ ( ) ∑

( ) ∑ ( ) ∑

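- For continuous random variables X and Y with the following joint distribution function,

F(x, y) = P(X ≤ x, Y ≤ y) = ∫ (u from −∞ to x) ∫ (v from −∞ to y) f(u, v) dv du

the marginal distributions are given as:

F1(x) = P(X ≤ x) = ∫ (u from −∞ to x) ∫ (v from −∞ to ∞) f(u, v) dv du

F2(y) = P(Y ≤ y) = ∫ (u from −∞ to ∞) ∫ (v from −∞ to y) f(u, v) dv du

We call F1(x) and F2(y) the marginal distribution functions, or simply the distribution functions,
of X and Y, respectively. The derivatives of F1(x) and F2(y) with respect to x and y are then
called the marginal density functions, or simply the density functions, of X and Y and are given
by

f1(x) = ∫ (v from −∞ to ∞) f(x, v) dv,   f2(y) = ∫ (u from −∞ to ∞) f(u, y) du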

Example4: Find the marginal distribution functions (a) of X and (b) of Y for the following p.d.f.

( ) {

(a) The marginal distribution function for X if is

Conditional distribution
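Let X and Y be discrete random variables on the same probability space, with joint probability
function f(x, y) and marginal probability functions f1(x) and f2(y).
The conditional pmf of X given Y = y is defined as:
f_X|Y(x | y) = P(X = x | Y = y) = f(x, y) / f2(y), for f2(y) > 0
The conditional pmf of Y given X = x is defined as:
f_Y|X(y | x) = P(Y = y | X = x) = f(x, y) / f1(x), for f1(x) > 0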
Example5: Let ( ) ( ), where x and y can assume all integers such that

, and ( ) otherwise. Find the conditional distribution functions (a) of X


given that Y and (b) of Y given that X.

( ) ∑ ( ) ∑ ( )

( ) ∑ ( ) ∑ ( )

( ) ( ) ( )
⁄ ( ⁄ )
( )

( ) ∑ ∑ ( )
⁄ ( ⁄ )
( ) ∑

( ) ( ) ( )
⁄ ( ⁄ )
( )

( ) ∑ ∑ ( )
⁄ ( ⁄ )
( ) ∑

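X and Y are jointly continuous random variables if their joint cdf is continuous in both x and y.
With joint density f(x, y) and marginal densities f1(x) and f2(y), we then define the conditional
pdf of X given Y = y in the usual way as
f_X|Y(x | y) = f(x, y) / f2(y), for f2(y) > 0
The conditional pdf of Y given X = x is defined as:
f_Y|X(y | x) = f(x, y) / f1(x), for f1(x) > 0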
Example6: Find the conditional distribution functions (a) of X and (b) of Y for the above
example. i.e | ( | ) | ( | )

( ) ( )
| ( | ) | ( | )
( ) ( )

( )
( ) ∫∫ ( ) | ( | )
( )

( )
| ( | ) ( )

Example7: If X and Y have the joint density function

( ) {

Find (a) | ( | )

For , ( ) ∫ ( )

( )
And | ( | ) {
( )

For other values of | ( | ) is not defined.

7.4 Independent random variables


Suppose that X and Y are discrete random variables. If the events X = x and Y = y are
independent events for all x and y, then we say that X and Y are independent random variables.
In such a case,
P(X = x, Y = y) = P(X = x) P(Y = y)
or equivalently f(x, y) = f1(x) f2(y).
Conversely, if for all x and y the joint probability function f(x, y) can be expressed as the product
of a function of x alone and a function of y alone (which are then the marginal probability
functions of X and Y), X and Y are independent. If, however, f(x, y) cannot be so expressed, that is,
P(X = x, Y = y) ≠ P(X = x) P(Y = y)
or equivalently f(x, y) ≠ f1(x) f2(y) for some x and y,
then X and Y are dependent.
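The factorization test for independence of two discrete random variables can be sketched as follows. Both joint tables below are hypothetical examples invented for the illustration, and the helper names marginals and is_independent are assumptions.

```python
from fractions import Fraction
from itertools import product


def marginals(joint_pmf: dict) -> tuple:
    """Marginal pmfs (f1, f2) obtained by summing the joint pmf over y and over x."""
    f1, f2 = {}, {}
    for (x, y), p in joint_pmf.items():
        f1[x] = f1.get(x, 0) + p
        f2[y] = f2.get(y, 0) + p
    return f1, f2


def is_independent(joint_pmf: dict) -> bool:
    """True when f(x, y) = f1(x) * f2(y) for every pair (x, y)."""
    f1, f2 = marginals(joint_pmf)
    return all(joint_pmf.get((x, y), 0) == f1[x] * f2[y] for x, y in product(f1, f2))


# Hypothetical independent pair: the joint pmf is built as a product of marginals.
px = {0: Fraction(1, 4), 1: Fraction(3, 4)}
py = {0: Fraction(1, 2), 1: Fraction(1, 2)}
independent_joint = {(x, y): px[x] * py[y] for x in px for y in py}

# Hypothetical dependent pair: Y always equals X.
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.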

Example8: Let ( ) ( ), where x and y can assume all integers such that

, and ( ) otherwise. Determine whether the random variables x and y are


independent or not. If X and Y are independent, the joint distribution function of X and Y is equal
to the product of the distribution functions of X and Y:
F(x, y) = F1(x) F2(y)
Let’s see the example.

( ) ∑∑ ( ) ( ) ( ) ∑ ∑ ∑∑ ( ) ( )

∑∑ ( ) ∑∑ ( ) ( )

The random variables X and Y are not independent.


- If X and Y are continuous random variables, we say that they are independent random
variables if the events X ≤ x and Y ≤ y are independent events for all x and y. In such a case
we can write F(x, y) = F1(x) F2(y), where F1(x) and F2(y) are the (marginal) distribution
functions of X and Y, respectively. Conversely, X and Y are independent random variables if
for all x and y, their joint distribution function F(x, y) can be expressed as a product of a
function of x alone and a function of y alone (which are the marginal distributions of X and
Y, respectively). If, however, F(x, y) cannot be so expressed, then X and Y are dependent.
- For continuous independent random variables, it is also true that the joint density function
f(x, y) is the product of a function of x alone, f1(x), and a function of y alone, f2(y), and
these are the (marginal) density functions of X and Y, respectively:
f(x, y) = f1(x) f2(y)
Example9: Show that x and y in the above example are independent.

( ) {

( ) ∫ ( ) ∫ ( ) ( ) ( )

Exercise1: Determine whether the random variables x and y are independent or not in the
following given.
( ) ( ) ( ) ( )

7.5 Distributions of functions of two dimensional random variables
Theorem 3 Let X and Y be discrete random variables having joint probability function f(x, y).
Suppose that two discrete random variables U and V are defined in terms of X and Y by
U = φ1(X, Y), V = φ2(X, Y), where to each pair of values of X and Y there corresponds one and
only one pair of values of U and V and conversely, so that X = ψ1(U, V), Y = ψ2(U, V). Then
the joint probability function of U and V is given by
g(u, v) = f[ψ1(u, v), ψ2(u, v)]
Example11: Let ( ) ( ), where x and y can assume all integers such that

, and ( ) otherwise. , find the joint density function of

, .

Solution: ( ) ( ), ( ) ( )

( ) [ ( ) ( )] ( )

where u and v can assume all integers such that * +,


* +.
Theorem4 Let X and Y be continuous random variables having joint density function f(x, y). Let
us define U = φ1(X, Y), V = φ2(X, Y), where X = ψ1(U, V), Y = ψ2(U, V) as in Theorem3. Then
the joint density function of U and V is given by g(u, v), where
g(u, v) |du dv| = f(x, y) |dx dy|

or g(u, v) = f(x, y) |∂(x, y) / ∂(u, v)| = f[ψ1(u, v), ψ2(u, v)] |J|

In the above equation the Jacobian determinant, or briefly Jacobian, is given by

J = ∂(x, y) / ∂(u, v) = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u)

Example12: If the random variables X and


Y have joint density function

, find the joint density function of , .Consider , . Dividing


these equations, we obtain ⁄ ⁄ so that ⁄ . This leads to the simultaneous solution
, . The image of , in the uv-plane is given by

The Jacobian is given by

Thus the joint density function of U and V is, by Theorem 4

Chapter Eight
8 Expectation
8.1 Expectation of a random variable
- For a discrete random variable X with p.m.f. f(x):

E(X) = Σ x f(x), where the sum is taken over all possible values x of X

Example1: The following are the numbers of messages sent per hour over a computer network
during 28 hours.
10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15
Let X be the random variable denoting the number of messages sent per hour over the computer
network. Find the expected value of X.
Solution: The possible values of X are {10, 11, 12, 13, 14, 15}. Counting how often each value
occurs gives the frequencies, and dividing each frequency by 28 gives the probabilities.
X = x        10      11      12      13      14      15
Frequency     2       4       8       6       6       2
P(X = x)   0.0714  0.1429  0.2857  0.2143  0.2143  0.0714

$$E(X) = \sum_x x\, P(X = x)$$

$$E(X) = 10(0.0714) + 11(0.1429) + 12(0.2857) + 13(0.2143) + 14(0.2143) + 15(0.0714)$$
$$E(X) = \frac{352}{28} \approx 12.57$$
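The tally and the weighted sum in Example 1 can be checked mechanically. The following is a minimal sketch that uses exact fractions so that no rounding enters the result; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Messages sent per hour over the 28 observed hours (data from Example 1).
messages = [10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
            12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15]

counts = Counter(messages)                     # frequency of each value
n = len(messages)
pmf = {x: Fraction(c, n) for x, c in counts.items()}  # relative frequencies

# E(X) is the sum of x * P(X = x) over all possible values x.
expected = sum(x * p for x, p in pmf.items())

print(sorted(counts.items()))
print(expected, round(float(expected), 2))
```

Working with `Fraction` avoids the small discrepancies that appear when the probabilities are rounded to four decimals before multiplying.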
Example2: A lot containing 7 components is sampled by a quality inspector; the lot contains 4
good components and 3 defective components. A sample of 3 is taken by the inspector. Find the
expected value of the number of good components in this sample.
Solution: Let X represent the number of good components in the sample. The probability
distribution of X is
$$f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}, \quad x = 0, 1, 2, 3$$

Simple calculations yield $f(0) = 1/35$, $f(1) = 12/35$, $f(2) = 18/35$, and
$f(3) = 4/35$. Therefore,

$$E(X) = (0)\left(\tfrac{1}{35}\right) + (1)\left(\tfrac{12}{35}\right) + (2)\left(\tfrac{18}{35}\right) + (3)\left(\tfrac{4}{35}\right) = \tfrac{12}{7} \approx 1.7$$

Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good
components and 3 defective components, it will contain, on average, 1.7 good components.
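The probabilities in Example 2 come from counting samples, so they can be reproduced with binomial coefficients. This is a small sketch under the setup stated in the example (4 good, 3 defective, sample of 3); the constant and function names are illustrative.

```python
from fractions import Fraction
from math import comb

GOOD, DEFECTIVE, SAMPLE = 4, 3, 3  # lot composition and sample size from Example 2

def f(x):
    """P(X = x): probability that the sample contains x good components."""
    ways = comb(GOOD, x) * comb(DEFECTIVE, SAMPLE - x)
    return Fraction(ways, comb(GOOD + DEFECTIVE, SAMPLE))

pmf = {x: f(x) for x in range(SAMPLE + 1)}
mean = sum(x * p for x, p in pmf.items())

for x, p in pmf.items():
    print(x, p)
print(mean, round(float(mean), 1))
```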
• For a continuous random variable X with p.d.f. f(x),

$$E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx$$

Example3: Suppose X has p.d.f.

Find the expected value of X.


Solution: ( ) ∫ ( ) , ( ) ∫ ( )( )

( ) ∫ ( ) ∫ ( )( ) , ( ) ( )( )∫ ( )( )∫

( ) . / . /, ( )

Exercise1: Let X be the random variable that denotes the life in hours of a certain electronic
device. The probability density function is

( ) {

Find the expected life of this type of device and interpret it.
Example4. The density function of X is

Solution
( ) ∫ ( ) , ( ) ∫ ( )

( ) ∫, ( )

8.2 Expectation of a function of a random variable
Now let us consider a new random variable g(X), which depends on X; that is, each value of g(X)
is determined by the value of X.

Let X be a discrete random variable with probability mass function f(x). The expected
value of the random variable g(X) is

$$E[g(X)] = \sum_x g(x)\, f(x)$$

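Example5: Suppose that the number of cars X that pass through a car wash between 4:00 P.M.
and 5:00 P.M. on any sunny Friday has the following probability distribution:
x      4     5     6    7    8    9
f(x)  1/12  1/12  1/4  1/4  1/6  1/6

Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager.
Find the attendant's expected earnings for this particular time period.
Solution: The possible values of 2X − 1 are 7, 9, 11, 13, 15, 17.

$$E[g(X)] = E(2X - 1) = \sum_{x=4}^{9} (2x - 1)\, f(x)$$

$$= (7)\left(\tfrac{1}{12}\right) + (9)\left(\tfrac{1}{12}\right) + (11)\left(\tfrac{1}{4}\right) + (13)\left(\tfrac{1}{4}\right) + (15)\left(\tfrac{1}{6}\right) + (17)\left(\tfrac{1}{6}\right) = \tfrac{38}{3} \approx 12.67 \text{ dollars}$$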
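The expectation of a function of X uses the same probabilities as X itself, only with g(x) in place of x. A minimal sketch for Example 5 follows; the helper name `expect` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction as F

# Probability distribution of the number of cars X from Example 5.
pmf = {4: F(1, 12), 5: F(1, 12), 6: F(1, 4), 7: F(1, 4), 8: F(1, 6), 9: F(1, 6)}

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * f(x) for a discrete distribution."""
    return sum(g(x) * p for x, p in pmf.items())

earnings = expect(lambda x: 2 * x - 1, pmf)   # E(2X - 1)
mean_x = expect(lambda x: x, pmf)             # E(X), for comparison

print(earnings, round(float(earnings), 2))
print(2 * mean_x - 1)  # same value, since g is linear in X
```

The last line relies on the linearity of expectation, E(aX + b) = aE(X) + b, which gives a quick cross-check for linear g.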

For a continuous random variable X with p.d.f. f(x) and for any real-valued function g(x),

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$$

Example6: Let X be a random variable with density function

( ) {

Find the expected value of ( ) .

( )
( ) ∫ ∫( )

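We shall now extend our concept of mathematical expectation to the case of two random
variables X and Y with joint probability function f(x, y). The expected value of g(X, Y) is
$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, f(x, y)$$
if X and Y are discrete, and
$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\, dx\, dy$$
if X and Y are continuous.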

8.3 Variance of a random variable


The mean, or expected value, of a random variable X is of special importance in statistics
because it describes where the probability distribution is centered. By itself, however, the mean
does not give an adequate description of the shape of the distribution. We also need to
characterize the variability in the distribution.

Let X be a discrete random variable with probability mass function f(x) and mean $\bar{X} = E(X)$. The
variance of X is denoted by $\sigma^2$ and it is defined by:

$$\sigma^2 = \sum_x (x - \bar{X})^2 f(x)$$

If X is a continuous random variable with probability density function f(x) and mean $\bar{X}$, the
variance of X is defined by:

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x)\, dx$$

The quantity $x - \bar{X}$ is called the deviation of an observation from its mean. Since the deviations are
squared and then averaged, $\sigma^2$ will be much smaller for a set of x values that are close to $\bar{X}$ than
it will be for a set of values that vary considerably from $\bar{X}$.

Example7: Let the random variable X represent the number of automobiles that are used for
official business purposes on any given workday. The probability distribution for company A is
x     1    2    3
f(x)  0.3  0.4  0.3
and that for company B is
x     0    1    2    3    4
f(x)  0.2  0.1  0.3  0.3  0.1
Show that the variance of the probability distribution for company B is greater than that for
company A.
Solution: For company A, we find that $\bar{X}_A = E(X) = (1)(0.3) + (2)(0.4) + (3)(0.3) = 2.0$,
and then
$$\sigma_A^2 = \sum_{x=1}^{3} (x - 2)^2 f(x) = (1 - 2)^2(0.3) + (2 - 2)^2(0.4) + (3 - 2)^2(0.3) = 0.6$$

For company B, we find that
$\bar{X}_B = E(X) = (0)(0.2) + (1)(0.1) + (2)(0.3) + (3)(0.3) + (4)(0.1) = 2.0$, and then
$$\sigma_B^2 = \sum_{x=0}^{4} (x - 2)^2 f(x) = (0 - 2)^2(0.2) + (1 - 2)^2(0.1) + (2 - 2)^2(0.3) + (3 - 2)^2(0.3) + (4 - 2)^2(0.1) = 1.6$$

Clearly, the variance of the number of automobiles that are used for official business purposes is
greater for company B than for company A.
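The comparison in Example 7 can be reproduced directly from the definition of variance. The sketch below stores the two distributions as exact fractions; the function names are illustrative.

```python
from fractions import Fraction as F

# Probability distributions from Example 7.
company_a = {1: F(3, 10), 2: F(4, 10), 3: F(3, 10)}
company_b = {0: F(2, 10), 1: F(1, 10), 2: F(3, 10), 3: F(3, 10), 4: F(1, 10)}

def mean(pmf):
    """E(X) = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance from the definition: sum of (x - mean)^2 * f(x)."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

var_a, var_b = variance(company_a), variance(company_b)
print(mean(company_a), float(var_a))
print(mean(company_b), float(var_b))
```

Both companies have the same mean, so the difference between them shows up only in the variance.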
An alternative and preferred formula for finding $\sigma^2$, which often simplifies the calculations, is
stated as follows:
The variance of a random variable X is: $\sigma^2 = E(X^2) - \bar{X}^2$
Example8: Let the random variable X represent the number of defective parts for a machine
when 3 parts are sampled from a production line and tested. The following is the probability
distribution of X.
x     0     1     2     3
f(x)  0.51  0.38  0.10  0.01
Calculate the variance of X.
Solution: We first find that $\bar{X} = E(X) = (0)(0.51) + (1)(0.38) + (2)(0.10) + (3)(0.01) = 0.61$
and $E(X^2) = (0)(0.51) + (1)(0.38) + (4)(0.10) + (9)(0.01) = 0.87$.
Therefore, $\sigma^2 = E(X^2) - \bar{X}^2 = 0.87 - (0.61)^2 = 0.4979$.
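As a check on the shortcut formula, the sketch below computes the variance of the distribution in Example 8 in two ways: from the definition and from E(X²) minus the squared mean. Exact fractions are used so that the two results can be compared for equality; the variable names are illustrative.

```python
from fractions import Fraction as F

# Probability distribution of the number of defective parts from Example 8.
pmf = {0: F(51, 100), 1: F(38, 100), 2: F(10, 100), 3: F(1, 100)}

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E(X^2)

var_shortcut = second_moment - mean ** 2               # E(X^2) - mean^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(float(mean), float(second_moment), float(var_shortcut))
```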

Example9: The weekly demand for a drinking-water product, in thousands of liters, from a local
chain of efficiency stores is a continuous random variable X having the probability density
( )
( ) {

Find the mean, variance and standard deviation of X.


Solution: Calculating ( ) and ( ), we have

̅ ( ) ∫ ( ) ( ) ∫ ( )

Therefore, . / . Standard deviation √ √

At this point, the variance or standard deviation has meaning only when we compare two or
more distributions that have the same units of measurement. Therefore, we could compare the
variances of the distributions of contents, measured in liters, of bottles of orange juice from two
companies, and the larger value would indicate the company whose product was more variable
or less uniform. It would not be meaningful to compare the variance of a distribution of heights
to the variance of a distribution of speed.

8.4 Covariance and Correlation Coefficient


Let X and Y be discrete random variables with joint probability mass function f(x, y). The
covariance of X and Y is denoted by $\sigma_{XY}$ or Cov(X, Y) and given by:

$$\sigma_{XY} = E[(X - \bar{X})(Y - \bar{Y})] = \sum_x \sum_y (x - \bar{X})(y - \bar{Y})\, f(x, y)$$

For continuous random variables X and Y with joint probability density function f(x, y), the
covariance of X and Y is given by:

$$\sigma_{XY} = E[(X - \bar{X})(Y - \bar{Y})] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y})\, f(x, y)\, dx\, dy$$

The covariance between two random variables is a measure of the nature of the association
between the two. If large values of X often result in large values of Y or small values of X result
in small values of Y, positive $X - \bar{X}$ will often result in positive $Y - \bar{Y}$ and negative $X - \bar{X}$ will
often result in negative $Y - \bar{Y}$. Thus, the product $(X - \bar{X})(Y - \bar{Y})$ will tend to be positive. On the
other hand, if large X values often result in small Y values, the product $(X - \bar{X})(Y - \bar{Y})$ will tend
to be negative. The sign of the covariance indicates whether the relationship between two
dependent random variables is positive or negative. When X and Y are statistically independent,
it can be shown that the covariance is zero. The converse, however, is not generally true. Two
variables may have zero covariance and still not be statistically independent. Note that the
covariance only describes the linear relationship between two random variables. Therefore, if the
covariance between X and Y is zero, X and Y may have a nonlinear relationship, which means
that they are not necessarily independent.
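The remark that zero covariance does not imply independence can be made concrete with a small example. The distribution below is hypothetical and chosen only for illustration: X takes the values −1, 0, 1 with equal probability and Y = X², so Y is completely determined by X.

```python
from fractions import Fraction as F

# Hypothetical joint distribution (not from the text): X uniform on {-1, 0, 1}, Y = X^2.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1)."""
    m = {}
    for point, p in joint.items():
        m[point[axis]] = m.get(point[axis], 0) + p
    return m

def covariance(joint):
    """Covariance from the definition E[(X - E(X)) (Y - E(Y))]."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    return sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

px, py = marginal(joint, 0), marginal(joint, 1)
cov = covariance(joint)

# Independence would require f(x, y) = g(x) h(y) for every pair (x, y).
independent = all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

print(cov, independent)
```

Here the covariance is zero because the relationship between X and Y is purely nonlinear, yet the factorization test fails, so the variables are dependent.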
In short, the covariance of two random variables X and Y with means $\bar{X}$ and $\bar{Y}$, respectively, is
given by:
$$\sigma_{XY} = E(XY) - \bar{X}\,\bar{Y}$$
Example10: Let X be the number of blue refills and Y be the number of red refills. Two refills
for a ballpoint pen are selected at random from a certain box, and the following is the joint
probability distribution:

( ) 0 1 2 ( )
0

1 0

2 0 0

( ) 1

Find the covariance of X and Y.

( ) ( ) ( ) ( ) ∑ ∑( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( ) ( )

( ) ( )( ) ( )( ) ( )( ) ( )( ) ( )( ) ( )( ) ( )( )

( )( ) ( )( )

( )

( ) ∑( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( )( ) ( )( ) ( )( )

( ) ∑( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( )( ) ( )( ) ( )( )

( ) ( ) ( ) ( )( )

Although the covariance between two random variables does provide information regarding the
nature of the relationship, the magnitude of $\sigma_{XY}$ does not indicate anything regarding the strength
of the relationship, since $\sigma_{XY}$ is not scale-free. Its magnitude will depend on the units used to
measure both X and Y. There is a scale-free version of the covariance called the correlation
coefficient that is used widely in statistics.

Definition: Let X and Y be random variables with covariance $\sigma_{XY}$ and standard deviations $\sigma_X$ and
$\sigma_Y$, respectively. The correlation coefficient of X and Y is denoted by $\rho_{XY}$ and defined by:

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

It should be clear to the reader that $\rho_{XY}$ is free of the units of X and Y. The correlation coefficient
always satisfies $-1 \le \rho_{XY} \le 1$. It assumes a value of zero when $\sigma_{XY} = 0$. Where there is an
exact linear dependency, say Y = a + bX, $\rho_{XY} = 1$ if b > 0 and $\rho_{XY} = -1$ if b < 0.

Example11: Find the correlation coefficient between X and Y in the above Example.
Solution: Since

( ) ( )( ) ( )( ) ( )( )

And

( ) ( )( ) ( )( ) ( )( )

We obtain . / and . /

Therefore, the correlation coefficient between X and Y is




√. ⁄ /( ⁄ )

Example 12: Find the correlation coefficient of X and Y in Example 4.14. Solution: Because

( ) ( ) ( ) ( )

( ) ∫ ( ) ( ) ∫ ( )

( ) ∫∫ ( )( )

( ) ∫ ( ) ( ) ∫ ( )

We conclude that . / and . /

Therefore, the correlation coefficient between X and Y is




√. ⁄ /. ⁄ /

If the covariance between X and Y is positive, negative, or zero, the correlation between X and Y
is positive, negative, or zero, respectively.

The correlation just scales the covariance by the standard deviation of each variable.
Consequently, the correlation is a dimensionless quantity that can be used to compare the linear
relationships between pairs of variables in different units.

If the points in the joint probability distribution of X and Y that receive positive probability tend
to fall along a line of positive (or negative) slope, $\rho_{XY}$ is near +1 (or -1). If $\rho_{XY}$ equals +1 or -1, it
can be shown that the points in the joint probability distribution that receive positive probability
fall exactly along a straight line. Two random variables with nonzero correlation are said to be
correlated. Similar to covariance, the correlation is a measure of the linear relationship between
random variables.

Exercise2: Suppose that the random variable X has the following distribution: P(X = 1) = 0.2,
P(X= 2) = 0.6, P(X = 3) = 0.2. Let Y = 2X+5. That is, P(Y = 7) = 0.2, P(Y = 9) = 0.6, P(Y = 11)
= 0.2. Determine the correlation between X and Y.
Because X and Y are linearly related with positive slope, $\rho_{XY} = 1$. This can be verified by direct calculation: try it.
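One possible way to carry out the direct calculation for Exercise 2 is sketched below. It assumes only the distribution of X and the relation Y = 2X + 5 given in the exercise; the helper name `y_of` is an illustrative choice.

```python
from fractions import Fraction as F
from math import sqrt

# Distribution of X from Exercise 2.
px = {1: F(2, 10), 2: F(6, 10), 3: F(2, 10)}

def y_of(x):
    """Y is an exact linear function of X."""
    return 2 * x + 5

mean_x = sum(x * p for x, p in px.items())
mean_y = sum(y_of(x) * p for x, p in px.items())

var_x = sum((x - mean_x) ** 2 * p for x, p in px.items())
var_y = sum((y_of(x) - mean_y) ** 2 * p for x, p in px.items())
cov_xy = sum((x - mean_x) * (y_of(x) - mean_y) * p for x, p in px.items())

# Correlation = covariance divided by the product of the standard deviations.
rho = float(cov_xy) / sqrt(float(var_x * var_y))
print(mean_x, mean_y, var_x, var_y, cov_xy, rho)
```

Changing the slope in `y_of` to a negative number would flip the sign of the covariance and give a correlation of −1 instead.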

Chapter Nine
9 Common Probability distributions
9.1 Common Discrete Distributions and their Properties
9.1.1 Binomial distribution
A binomial experiment is a probability experiment that satisfies the following four
requirements called assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of two possible mutually exclusive outcomes, a success or a
failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent, thus we must sample with replacement.

Examples of binomial experiments

• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non-defective.
• Asking 100 people if they favor the ruling party.
• Rolling a die to see if a 5 appears.

Definition: The outcomes of the binomial experiment and the corresponding
probabilities of these outcomes are called the Binomial Distribution.
Let p = the probability of success
q = 1 − p = the probability of failure on any given trial
Then the probability of getting x successes in n trials becomes:
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
And this is sometimes written as:
X ~ Bin(n, p)
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n),
• The probability of a success on any one trial (p), and
• The number of successes desired (X).
Examples:
1. What is the probability of getting three heads by tossing a fair coin four times?

Solution:
Let X be the number of heads in tossing a fair coin four times.
X ~ Bin(n = 4, p = 0.50)
$$P(X = x) = \binom{4}{x} (0.5)^x (0.5)^{4-x} = \binom{4}{x} (0.5)^4, \quad x = 0, 1, 2, 3, 4$$
$$P(X = 3) = \binom{4}{3} (0.5)^4 = 0.25$$
2. Suppose that an examination consists of six true and false questions, and assume that a
student has no knowledge of the subject matter. The probability that the student will
guess the correct answer to the first question is 30%. Likewise, the probability of
guessing each of the remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
Solution
Let X = the number of correct answers that the student gets.
X ~ Bin(n = 6, p = 0.30)
$$P(X = x) = \binom{6}{x} (0.3)^x (0.7)^{6-x}, \quad x = 0, 1, 2, \ldots, 6$$
a) $P(X > 3) = ?$
$P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6)$
$= 0.060 + 0.010 + 0.001$
$= 0.071$

Thus, we may conclude that if the student guesses each answer with probability 0.30 of being
correct, the probability is 0.071 (or 7.1%) that more than three of the questions are answered
correctly.
b) $P(X \ge 2) = ?$

$P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)$
$= 0.324 + 0.185 + 0.060 + 0.010 + 0.001$
$= 0.58$

c) $P(X \le 3) = ?$
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= 0.118 + 0.303 + 0.324 + 0.185$
$= 0.93$
d) $P(X < 5) = ?$
$P(X < 5) = 1 - P(X \ge 5)$
$= 1 - \{P(X = 5) + P(X = 6)\}$
$= 1 - (0.010 + 0.001)$
$= 0.989$
Exercises:
1. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If eight of these
TVs are randomly selected from across the country and tested, what is the probability that
exactly three of them are defective? Assume that each TV is made independently of the
others.
2. An allergist claims that 45% of the patients she tests are allergic to some type of weed. What
is the probability that
a) Exactly 3 of her next 4 patients are allergic to weeds?
b) None of her next 4 patients are allergic to weeds?
3. Explain why the following experiments are not binomial:
• Rolling a die until a 6 appears.
• Asking 20 people how old they are.
• Drawing 5 cards from a deck for a poker hand.

Remark: If X is a binomial random variable with parameters n and p, then

$$E(X) = np, \qquad \operatorname{Var}(X) = npq$$
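The remark can be checked numerically by summing over the binomial probabilities. The sketch below uses n = 6 and p = 3/10 purely as an illustrative choice and works with exact fractions so that the comparison with np and npq is exact.

```python
from fractions import Fraction as F
from math import comb

n, p = 6, F(3, 10)   # illustrative parameters
q = 1 - p

# Binomial probabilities P(X = x) for x = 0, ..., n.
pmf = {x: comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)}

mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print(mean, n * p)           # both equal np
print(variance, n * p * q)   # both equal npq
```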
9.1.2 Poisson distribution
Experiments yielding numerical values of a random variable X, the number of outcomes
occurring during a given time interval or in a specified region, are called Poisson experiments.
The given time interval may be of any length, such as a minute, a day, a week, a month, or even
a year. For example, a Poisson experiment can generate observations for the random variable X
representing the number of telephone calls received per hour by an office, the number of days
school is closed due to snow during the winter, or the number of games postponed due to rain
during a baseball season. The specified region could be a line segment, an area, a volume, or
perhaps a piece of material. In such instances, X might represent the number of field mice per
acre, the number of bacteria in a given culture, or the number of typing errors per page. A
Poisson experiment is derived from the Poisson process and possesses the following properties.

Properties of the Poisson Process


1. The number of outcomes occurring in one time interval or specified region of space is
independent of the number that occurs in any other disjoint time interval or region. In this sense
we say that the Poisson process has no memory.

2. The probability that a single outcome will occur during a very short time interval or in a small
region is proportional to the length of the time interval or the size of the region and does not
depend on the number of outcomes occurring outside this time interval or region.

3. The probability that more than one outcome will occur in such a short time interval or fall in
such a small region is negligible.

The number X of outcomes occurring during a Poisson
experiment is called a Poisson random variable, and its probability distribution is called the
Poisson distribution. The mean number of outcomes is computed from μ = λt, where t is the
specific “time,” “distance,” “area,” or “volume” of interest. Since the probabilities depend on λ,
the rate of occurrence of outcomes, we shall denote them by p(x;λt). The derivation of the
formula for p(x;λt), based on the three properties of a Poisson process listed above, is beyond the
scope of this book. The following formula is used for computing Poisson probabilities.

Poisson distribution
The probability distribution of the Poisson random variable X, representing the number of
outcomes occurring in a given time interval or specified region denoted by t, is

$$p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots,$$

where λ is the average number of outcomes per unit time, distance, area, or volume.

Example 5.18: Ten is the average number of oil tankers arriving each day at a certain port. The
facilities at the port can handle at most 15 tankers per day. What is the probability that on a given
day tankers have to be turned away?
Solution: Let X be the number of tankers arriving each day.
Then, using Table A.2, we have
$$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} p(x; 10) = 1 - 0.9513 = 0.0487$$
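In place of a printed table, the tanker probability can be computed by summing Poisson probabilities directly. The sketch below assumes the standard Poisson probability function with mean μ = λt = 10; the function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu = lambda * t."""
    return exp(-mu) * mu ** x / factorial(x)

mu = 10         # average number of tankers arriving per day
capacity = 15   # the port can handle at most 15 tankers per day

# Tankers are turned away only when more than `capacity` arrive.
p_within_capacity = sum(poisson_pmf(x, mu) for x in range(capacity + 1))
p_turned_away = 1 - p_within_capacity

print(round(p_within_capacity, 4), round(p_turned_away, 4))
```

The complement is used because the event "more than 15 arrivals" contains infinitely many outcomes, while its complement is a finite sum.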

Like the binomial distribution, the Poisson distribution is used for quality control, quality
assurance, and acceptance sampling. In addition, certain important continuous distributions used
in reliability theory and queuing theory depend on the Poisson process. Some of these
distributions are discussed and developed in Chapter 6. The following theorem concerning the
Poisson random variable is given in Appendix A.25.
Theorem: Both the mean and the variance of the Poisson distribution p(x;λt) are λt.

9.1.3 Geometric distribution


9.2 Common Continuous Distributions and their Properties
9.2.1 Uniform distribution
One of the simplest continuous distributions in all of statistics is the continuous uniform
distribution. This distribution is characterized by a density function that is “flat,” and thus the
probability is uniform in a closed interval, say [A, B]. Although applications of the continuous
uniform distribution are not as abundant as those for other distributions discussed in this chapter,
it is appropriate for the novice to begin this introduction to continuous distributions with the
uniform distribution.

The density function of the continuous uniform random variable X on the interval [A, B] is

$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B, \\ 0, & \text{elsewhere.} \end{cases}$$

The density function forms a rectangle with base B − A and constant height 1/(B − A).
As a result, the uniform distribution is often called the rectangular distribution. Note, however,
that the interval need not be closed: it can be (A, B) as well as [A, B]. The density function
for a uniform random variable on the interval [1, 3] is shown in Figure 6.1. Probabilities are
simple to calculate for the uniform distribution because of the simple nature of the density
function. However, note that the application of this distribution is based on the assumption that
the probability of falling in an interval of fixed length within [A, B] is constant.
Example 6.1: Suppose that a large conference room at a certain company can be reserved for no
more than 4 hours. Both long and short conferences occur quite often. In fact, it can be assumed
that the length X of a conference has a uniform distribution on the interval [0, 4].
(a) What is the probability density function?
(b) What is the probability that any given conference lasts at least 3 hours?
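Solution: (a) The appropriate density function for the uniformly distributed random variable X in
this situation is
$$f(x) = \begin{cases} \dfrac{1}{4}, & 0 \le x \le 4, \\ 0, & \text{elsewhere.} \end{cases}$$
(b) $P(X \ge 3) = \int_3^4 \frac{1}{4}\, dx = \frac{1}{4}$.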
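For a uniform distribution, a probability is just the length of the interval of interest divided by the length of [A, B]. The sketch below applies this to the conference-room example; the function names are illustrative, and the density used is the standard uniform density 1/(B − A) on [A, B].

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]: overlap length / (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

A, B = 0, 4  # a conference lasts between 0 and 4 hours

density = uniform_pdf(2, A, B)             # constant height of the rectangle
p_at_least_3 = uniform_prob(3, B, A, B)    # P(X >= 3)

print(density, p_at_least_3)
```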

9.2.2 Normal distribution


The most important continuous probability distribution in the entire field of statistics is the
normal distribution. Its graph, called the normal curve, is the bell-shaped curve of Figure 6.2,
which approximately describes many phenomena that occur in nature, industry, and research. For
example, physical measurements in areas such as meteorological experiments, rainfall studies,
and measurements of manufactured parts are often more than adequately explained with a
normal distribution. In addition, errors in scientific measurements are extremely well
approximated by a normal distribution. In 1733, Abraham DeMoivre developed the
mathematical equation of the normal curve. It provided a basis on which much of the theory of
inductive statistics is founded. The normal distribution is often referred to as the Gaussian
distribution, in honor of Karl Friedrich Gauss (1777-1855), who also derived its equation from a
study of errors in repeated measurements of the same quantity.

A continuous random variable X having the bell-shaped
distribution of Figure 6.2 is called a normal random variable. The mathematical equation for the
probability distribution of the normal variable depends on the two parameters μ and σ, its mean
and standard deviation, respectively. Hence, we denote the values of the density of X by
n(x;μ,σ):
$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$

We list the following properties of the normal curve:


1. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x
= μ.
2. The curve is symmetric about a vertical axis through the mean μ.
3. The curve has its points of inflection at x = μ±σ; it is concave downward if μ−σ<X<μ+ σ and
is concave upward otherwise.
4. The normal curve approaches the horizontal axis asymptotically as we proceed in either
direction away from the mean.
5. The total area under the curve and above the horizontal axis is equal to 1.
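The properties listed above can be spot-checked numerically. The sketch below assumes the standard normal density formula and uses hypothetical parameters μ = 50 and σ = 5 chosen only for illustration; the integration is a simple trapezoid rule, not a method taken from the text.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density n(x; mu, sigma)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

def trapezoid(f, a, b, steps=20000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

mu, sigma = 50.0, 5.0   # hypothetical parameters for illustration

def pdf(x):
    return normal_pdf(x, mu, sigma)

def second_diff(x, h=0.01):
    """Finite-difference proxy for the sign of the second derivative."""
    return pdf(x + h) - 2 * pdf(x) + pdf(x - h)

mode_at_mu = pdf(mu) > pdf(mu - 0.01) and pdf(mu) > pdf(mu + 0.01)   # property 1
symmetric = abs(pdf(mu - 3.0) - pdf(mu + 3.0)) < 1e-15               # property 2
concave_down_inside = second_diff(mu + 0.5 * sigma) < 0              # property 3
concave_up_outside = second_diff(mu + 1.5 * sigma) > 0               # property 3
area = trapezoid(pdf, mu - 10 * sigma, mu + 10 * sigma)              # property 5

print(mode_at_mu, symmetric, concave_down_inside, concave_up_outside, round(area, 6))
```

The integration range of ten standard deviations on either side of the mean is wide enough that the neglected tails are far below the accuracy of the trapezoid rule.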

9.2.3 Exponential distribution
