UDSM Statistics and Probability For Non-Majors

CHAPTER ONE
PRESENTATION OF STATISTICAL DATA
1.1 Introduction
Statistics is the science of collecting, organizing, summarizing, presenting and analyzing
data, as well as drawing valid conclusions and making reasonable decisions on the basis of
that analysis. Statistical data are facts or information that are useful in statistical analyses.
Statistical investigation and analysis of data fall into two broad categories; these are
descriptive statistics and inductive/inferential statistics.
Descriptive statistics deals with processing data without attempting to draw any
inferences from them. It refers to the presentation of data in the form of tables and
charts/graphs, and to summary characteristics of the data such as averages and dispersion.
Inductive statistics is a scientific discipline concerned with developing and using
mathematical tools to make forecasts and inferences. The term inference means the act or
process of deriving a conclusion based solely on what is already known.
This chapter introduces the techniques that are commonly used in statistics to present
data. Inductive statistics will be discussed later, after the review of probability and
probability distributions.
1.2 Frequency distribution

A frequency distribution is the arrangement of data in tabular form according to
frequencies. Data in frequency distributions may be ungrouped or grouped. The
following subsections discuss the presentation of each type of data.
1.2.1. Ungrouped data

In this case each distinct value is assigned its own frequency when formulating
the frequency distribution. This technique is suitable when there are few distinct values,
each of which is repeated several times.
Example 1.1 The following data were obtained when a die was rolled 30 times.
1 2 4 2 2 6 3 5 6 3 3 1 3 1 3
4 5 3 5 3 5 1 6 3 1 2 4 2 4 4
Use the above data to construct a frequency table.
Solution
The frequency table is constructed by tallying repeated observations/numbers, in order to
find the number of times each observation appears in the data set. This exercise is
shown in the following table.
Table 1.1 Frequency table of ungrouped data
Number   Tally        Freq.   Relative Frequency   Percentage Frequency
1        |||||        5       5/30 = 0.1667        16.67
2        |||||        5       5/30 = 0.1667        16.67
3        ||||| |||    8       8/30 = 0.2667        26.67
4        |||||        5       5/30 = 0.1667        16.67
5        ||||         4       4/30 = 0.1333        13.33
6        |||          3       3/30 = 0.1000        10.00
TOTAL                 30      1.0                  100.00
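The tallying in Table 1.1 can also be done by a short program. The following is a minimal Python sketch, not part of the original notes; the function name frequency_table is an assumption chosen for illustration.

```python
from collections import Counter

# Die outcomes from Example 1.1 (30 rolls).
rolls = [1, 2, 4, 2, 2, 6, 3, 5, 6, 3, 3, 1, 3, 1, 3,
         4, 5, 3, 5, 3, 5, 1, 6, 3, 1, 2, 4, 2, 4, 4]

def frequency_table(data):
    """Return rows of (value, frequency, relative frequency, percentage)."""
    counts = Counter(data)          # tallies each distinct value
    n = len(data)
    return [(v, counts[v], counts[v] / n, 100 * counts[v] / n)
            for v in sorted(counts)]

for value, freq, rel, pct in frequency_table(rolls):
    print(f"{value}  {freq:2d}  {rel:.4f}  {pct:.2f}")
```

The relative frequencies should add up to 1 and the percentages to 100, which is a quick check on the tally.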
1.2.2 Grouped data with classes of equal width
When there is a huge mass of data with many of the values being distinct, it is convenient
to form a grouped frequency distribution rather than ungrouped. In this case various
values are grouped in a class and they are tallied to obtain a class frequency. The grouped
frequency distributions of equal class size are reasonable only when the data do not
contain extreme values (values that are very far from others in the same data set).
Consider the following two sets of data

Set 1: 6, 8, 5, 8, 9, 4, 7 and 9. There are no extreme values in this set.
Set 2: 12, 15, 9, 16, 90 and 600. In this case 90 and 600 are the extreme values.
There are no fixed rules for formulating this kind of frequency distribution; the choice
depends on the number of classes you want. It is recommended that a grouped frequency
table have between 6 and 12 classes inclusive, depending on the size of the
data set.
The following steps may however be helpful in formulating such a distribution.
1. Identify the smallest and largest values of the data set and hence compute the range.
2. Decide on the number of classes you want in the distribution and hence compute
the class size h using the relation
\[ h = \frac{\text{range}}{\text{number of classes}} \]
3. Write the first class of size h with its lower limit (first value) 2 or 3 units below the
minimum value. List the other classes, each of size h, making sure that all data are included
in the distribution. To avoid confusion during tallying, disconnected (non-overlapping)
classes are preferable.
4. Tally the frequency of each class and hence obtain a grouped frequency distribution.
Example 1.2 The following data give the amount (in dollars) spent on groceries by a
family during the past forty weeks
32 22 19 18 43 42 40 43 18 21
31 26 22 25 47 40 26 32 22 34
28 35 47 26 35 38 35 28 19 38
35 38 36 25 22 45 48 26 34 41
Construct a frequency distribution using seven classes
Solution
The minimum value is 18, the maximum value is 48.
Then, the range = 48 – 18 = 30.
Number of classes is 7, so the class size h = 30/7 = 4.29. So we try h = 5.
The classes are 15 – 19, 20 – 24, 25 – 29, 30 – 34, 35 – 39, 40 – 44, 45 – 49 which surely
include all values from 18 to 48.
The frequency distribution for the grocery expenditure is formulated below:
Table 1.2 Frequency table of grouped data
Class      Tally        Frequency
15 – 19    ||||         4
20 – 24    |||||        5
25 – 29    ||||| |||    8
30 – 34    |||||        5
35 – 39    ||||| |||    8
40 – 44    ||||| |      6
45 – 49    ||||         4
TOTAL                   40
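The four steps for building a grouped frequency distribution can be sketched in Python as follows. This is an illustrative sketch only: the function name grouped_frequencies and its parameters are assumptions, and it presumes whole-number data and classes of equal width.

```python
# Weekly grocery spending (in dollars) from Example 1.2.
amounts = [32, 22, 19, 18, 43, 42, 40, 43, 18, 21,
           31, 26, 22, 25, 47, 40, 26, 32, 22, 34,
           28, 35, 47, 26, 35, 38, 35, 28, 19, 38,
           35, 38, 36, 25, 22, 45, 48, 26, 34, 41]

def grouped_frequencies(data, first_lower, width, num_classes):
    """Tally data into classes [lower, lower + width - 1] of equal width."""
    classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
               for i in range(num_classes)]
    counts = [sum(lo <= x <= hi for x in data) for lo, hi in classes]
    return classes, counts

# Step 1: range; step 2: a rough class size before rounding up to 5.
data_range = max(amounts) - min(amounts)
rough_width = data_range / 7

# Steps 3 and 4: first class starts 3 units below the minimum (18 - 3 = 15).
classes, counts = grouped_frequencies(amounts, first_lower=15, width=5, num_classes=7)
for (lo, hi), f in zip(classes, counts):
    print(f"{lo} - {hi}: {f}")
```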
1.2.3 Classes of unequal width/size
If the data contain some extreme values, the previous technique is generally not
applicable. In this case only the values that are close to each other are considered first, and
the extreme values may be grouped together into one class.
Example 1.3 Prices of thirty stocks (in thousands of shillings) on a given day were recorded
as follows
11.2 8.9 20.0 9.5 35.0 41.0 14.6 100.0 9.0 10.5
79.0 32.5 46.7 22.9 13.5 17.3 41.8 30.4 93.0 33.7
14.4 20.9 34.5 10.8 45.7 104.0 42.6 10.1 41.0 53.8
Formulate a grouped frequency distribution of five classes only.

Solution
Although the data range from 8.9 to 104, most of the values are concentrated between
8 and 55, and only a few observations fall between 55 and 104.
Since we need only five classes, we form four classes from the closely spaced values and one
class for the extreme values. For the first four classes we proceed as follows:
Minimum value = 8.9, maximum value = 53.8, so the range = 44.9.
The number of classes is 4, hence h = 44.9/4 = 11.225. So we try h = 12, and the first four
classes are 8 – 19, 20 – 31, 32 – 43 and 44 – 55; the fifth class is 56 – 104.
The frequency distribution table is given below.
Table 1.3 Frequency table of grouped data with unequal width
Class       Tally            Frequency
8 – 19      ||||| ||||| |    11
20 – 31     ||||             4
32 – 43     ||||| |||        8
44 – 55     |||              3
56 – 104    ||||             4
TOTAL                        30
1.3 Class Limits, Class Boundaries, Class Marks and Class Intervals
These concepts are defined below.

• Class limits are the lower and upper values of a class. Thus each class has a lower and
an upper limit.
• A lower class boundary of a class is the middle value between the upper limit of the
preceding class and the lower limit of the current class. Similarly, an upper
boundary is the middle value between the upper limit of the current class and the
lower limit of the next class.
• A class mark is the middle value between the lower and upper class boundaries (or limits).
• A class interval/size/width/length is the difference between the upper boundary and the
lower boundary of a class.
Example 1.4 Find the class limits, boundaries, class marks and class width of the following
classes 15 – 19 and 20 – 29.
Solution
The required statistics are summarized in the following table.
           Limits           Boundaries       Class   Class
Class      Lower   Upper    Lower   Upper    mark    size
15 – 19    15      19       14.5    19.5     17      5
20 – 29    20      29       19.5    29.5     24.5    10
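The definitions of Section 1.3 translate directly into a small calculation. The sketch below is illustrative only; the function name class_statistics is an assumption, and it presumes that the gap between the upper limit of one class and the lower limit of the next is the same throughout.

```python
def class_statistics(classes):
    """For consecutive classes given as (lower limit, upper limit) pairs,
    return (lower boundary, upper boundary, class mark, class size) per class.
    Assumes a constant gap between neighbouring class limits."""
    gap = classes[1][0] - classes[0][1]      # e.g. 20 - 19 = 1
    rows = []
    for lower, upper in classes:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        mark = (lower_boundary + upper_boundary) / 2
        size = upper_boundary - lower_boundary
        rows.append((lower_boundary, upper_boundary, mark, size))
    return rows

# The two classes of Example 1.4.
for row in class_statistics([(15, 19), (20, 29)]):
    print(row)
```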
1.4 Histograms
A histogram is a graphic representation of a frequency distribution, with vertical rectangles
erected on the horizontal axis. Adjacent rectangles are joined at the class boundaries, and
the class frequencies give their heights on the vertical axis.
Example 1.5 Draw the histogram for grocery problem given in Example 1.2.
Solution
In this case we first create a frequency table with class boundaries, as shown below.
Table 1.5 Class boundaries and frequency
Class      Boundaries     Frequency
15 – 19    14.5 – 19.5    4
20 – 24    19.5 – 24.5    5
25 – 29    24.5 – 29.5    8
30 – 34    29.5 – 34.5    5
35 – 39    34.5 – 39.5    8
40 – 44    39.5 – 44.5    6
45 – 49    44.5 – 49.5    4
The histogram is given below;
[Figure: The Histogram for Amount Spent in Grocery. Vertical axis: Frequency (0 to 9). Horizontal axis: class boundaries 14.5 – 19.5, 19.5 – 24.5, 24.5 – 29.5, 29.5 – 34.5, 34.5 – 39.5, 39.5 – 44.5, 44.5 – 49.5.]
1.5 Frequency Polygon
A frequency polygon is a polygon whose vertices are the frequencies plotted at the class marks
of the classes. To create a frequency polygon, one has to extend the distribution by
introducing one class before the lowest class and one class after the highest class, both
of them having zero frequency.
Example 1.6 Draw a frequency polygon for the data in Example 1.2.
Solution
The frequency distribution with class marks is shown below
Class      Class mark   Frequency
10 – 14    12           0
15 – 19    17           4
20 – 24    22           5
25 – 29    27           8
30 – 34    32           5
35 – 39    37           8
40 – 44    42           6
45 – 49    47           4
50 – 54    52           0
The frequency polygon is given below where the class marks are amounts in dollars
[Figure: The Frequency Polygon. Vertical axis: Frequency (0 to 9). Horizontal axis: Class marks $12, $17, $22, $27, $32, $37, $42, $47, $52.]
1.6 Cumulative Frequency Polygon (OGIVE)
This is a line graph obtained by representing the upper class boundaries along the
horizontal axis and the corresponding cumulative frequencies along the vertical axis.
Example 1.7 Draw a cumulative frequency polygon for the data given in Table 1.5.
Solution
The table will be extended by adding a column of upper class boundaries and a column of
cumulative frequency as shown below
Class      Boundaries     Frequency   Upper boundary    Cum. Frequency
                                      Less than 14.5    0
15 – 19    14.5 – 19.5    4           Less than 19.5    4
20 – 24    19.5 – 24.5    5           Less than 24.5    9
25 – 29    24.5 – 29.5    8           Less than 29.5    17
30 – 34    29.5 – 34.5    5           Less than 34.5    22
35 – 39    34.5 – 39.5    8           Less than 39.5    30
40 – 44    39.5 – 44.5    6           Less than 44.5    36
45 – 49    44.5 – 49.5    4           Less than 49.5    40
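The cumulative frequency column is a running total of the class frequencies. A minimal Python sketch (the variable names are assumptions made for illustration) that produces the points to be plotted on the ogive:

```python
from itertools import accumulate

# Upper class boundaries and frequencies taken from Table 1.5.
upper_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [4, 5, 8, 5, 8, 6, 4]

# Running totals give the "less than" cumulative frequencies.
cumulative = list(accumulate(frequencies))

# The ogive starts at zero at the lower boundary of the first class.
ogive_points = [(14.5, 0)] + list(zip(upper_boundaries, cumulative))
for boundary, cum in ogive_points:
    print(f"Less than {boundary}: {cum}")
```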
The cumulative frequency polygon (OGIVE) is shown below.
[Figure: Cumulative Frequency Polygon. Vertical axis: Cumulative Frequency (0 to 45). Horizontal axis: Boundaries, "Less than 14.5" through "Less than 49.5".]
EXERCISES 1
1. The scores of 50 students in one of their mathematics tests were recorded as follows:
55 34 49 73 81 38 66 71 63 56
47 52 39 75 61 56 50 48 65 73
54 62 58 46 48 35 76 85 68 55
66 72 69 58 54 46 49 58 76 67
77 44 59 68 36 36 48 47 58 73
Construct the frequency distribution using six classes with the lower limit of the first
class interval being 30. The distribution should include the relative frequencies.
2. Forty values were collected from a certain engineering firm; the results,
in thousands of dollars, are given below.
1.8 2.3 10.0 30.0 1.2 4.9 2.0 5.1 8.1 40.1
1.3 2.5 3.6 1.8 2.4 7.6 3.1 8.5 6.5 6.2
1.2 2.8 4.5 6.1 7.8 3.2 2.8 6.1 3.8 4.9
19.6 1.7 8.6 6.4 5.2 4.1 3.1 8.1 6.8 5.8
Construct a frequency distribution with five classes only, starting with 1.0 as the
lower limit of the first class.
3. By considering the frequency table obtained in (1) above, do the following

(a) Draw a histogram.
(b) Draw a frequency polygon.
(c) Draw the cumulative frequency polygon (OGIVE).
4. Repeat question (3) using the grouped frequency distribution obtained in (2) above.
CHAPTER TWO
MEASURES OF AVERAGE AND DISPERSION
2.1 Introduction
In this chapter, we shall discuss measures of average (central tendency) and measures of
dispersion (spread). These measures are important for statistical reporting and analysis.
They do not themselves involve any inference, but the information they give is important for
decision making.
2.2 Measures of Central Tendency
The mean, median and mode are the three major measures of central tendency,
sometimes called measures of average. These measures can be determined for
both ungrouped and grouped data.
2.2.1 Ungrouped data

In this subsection we shall discuss how to compute each of these measures of
average for ungrouped data.
2.2.1.1 Mean
There are several kinds of mean in statistics. These include the arithmetic mean, the
geometric mean and the harmonic mean. We define each of these measures below.
Definition 2.1 (Geometric mean) Consider n observations $x_1, x_2, \ldots, x_n$; then the
geometric mean is given by
\[ G.M = \sqrt[n]{x_1 x_2 \cdots x_n}. \]
It is not a commonly used measure of average, but it is still applicable in the physical sciences.
Definition 2.2 (Harmonic mean) Consider a set of n observations $x_1, x_2, \ldots, x_n$;
then the harmonic mean is defined by
\[ H = \frac{1}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\dfrac{1}{x_i}} \]
This is also not a commonly used measure of average, although it is sometimes desirable.
Definition 2.3 (Arithmetic mean) The most commonly used measure of average is the
arithmetic mean. It is denoted by $\bar{x}$. The arithmetic mean is regarded as a suitable
measure of central tendency when the values in the data set are symmetric, in other
words, when the data set contains no extreme values. Given a set of n observations as
shown above, then
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
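The three means defined above can be compared on a small data set. The sketch below uses Python's standard statistics module; the three values 2, 4 and 8 are illustrative and do not come from the text, and the helper name means is an assumption.

```python
import statistics

def means(observations):
    """Return (geometric, harmonic, arithmetic) mean of positive observations."""
    return (statistics.geometric_mean(observations),
            statistics.harmonic_mean(observations),
            statistics.mean(observations))

# Illustrative data (not from the text).
gm, hm, am = means([2, 4, 8])
print(f"G.M = {gm:.4f}, H = {hm:.4f}, arithmetic mean = {am:.4f}")
```

For positive data that are not all equal, the harmonic mean comes out smallest and the arithmetic mean largest, which the printed values illustrate.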
2.2.1.2 Median
Another measure of central tendency is the median. This is defined as the middle value
of the data set when the data are arranged in order (ascending or descending). The
median is a suitable measure of average for data with extreme values. It is also used to
give a general overview of a huge mass of data for which the computation of the
arithmetic mean might be tedious. It can be denoted by $\tilde{x}$.
For an odd number of observations, the median takes the position $\left(\frac{n+1}{2}\right)$th. However, if
there is an even number of observations, it is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th
observations.
2.2.1.3 Mode
This is the value (or values) with the highest frequency in the data set. It can be used to
determine the most favourable outcome of a certain experiment and to help decide what
measures may be taken from that outcome. It is commonly denoted by $\hat{x}$. For instance, a
shopkeeper may observe that the preferred neck size of shirts in his/her shop is 40; this will
enable him/her to increase the stock of the most preferred neck size in future orders.
Example 2.1 Compute arithmetic mean, median and mode for the following data.
3, 4, 6, 8, 3, 5, 9, 11, 7, 10
Solution
We can first arrange the data in ascending order as follows;
3, 3, 4, 5, 6, 7, 8, 9, 10, 11
Arithmetic mean: The arithmetic mean is given by
\[ \bar{x} = \frac{\sum x}{n} = \frac{3 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11}{10} = \frac{66}{10} = 6.6. \]
Mode: The mode is the value with the highest frequency. In this case the highest frequency is 2,
and it belongs to the value 3. Therefore, the mode is 3.
Median: There is an even number of observations, n = 10. In this case, the median is the
average of the fifth and sixth observations, which are 6 and 7. Hence,
\[ \text{Median} = \frac{6 + 7}{2} = 6.5. \]
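The position rules for the median and the definition of the mode can be checked with a few lines of Python. This is a sketch under the definitions of this section; the function names median and modes are assumptions.

```python
from collections import Counter

def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]              # the ((n + 1)/2)th value
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # (n/2)th and (n/2 + 1)th

def modes(values):
    """All values that share the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

data = [3, 4, 6, 8, 3, 5, 9, 11, 7, 10]   # data of Example 2.1
mean = sum(data) / len(data)
print(mean, median(data), modes(data))
```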
2.2.2 Grouped Data

In this case we are going to discuss the arithmetic mean, median and mode for grouped
data. The data should be first presented in a grouped frequency distribution.
2.2.2.1 The Arithmetic Mean

Given a grouped frequency distribution, we need to create a column for the class midpoints
or class marks $(x_i)$. Then the arithmetic mean is obtained from the general
formula
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum f_i}, \]
where k is the number of classes and $\sum f_i = n$.
2.2.2.2 The median
To compute the median we first need to identify the class which contains the median;
we call this the median class. Then the median can be computed using the formula
\[ \tilde{x} = L + \left( \frac{\frac{N}{2} - C_b}{f_m} \right) h \]
where L = lower boundary of the median class
N = total number of observations
$C_b$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
h = class width/size of the median class
2.2.2.3 Mode
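The mode $(\hat{x})$ of grouped data can be computed using the formula
\[ \hat{x} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) h \]
where L is the lower boundary of the modal class and h is its class width. Suppose that
$f_m$ = frequency of the modal class
$f_b$ = frequency of the class before the modal class
$f_a$ = frequency of the class after the modal class
Then, $\Delta_1 = f_m - f_b$ and $\Delta_2 = f_m - f_a$.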
Example 2.2 The heights (in inches) of 100 male students at ABC College were
recorded as follows
Height       60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency    5         18        42        27        8
Compute (a) Arithmetic mean (b) Median height (c) Mode.
Solution
Given such a grouped frequency, we first compute various sums and summarize in a
table as shown below
Class      Class mark $(x_i)$   Freq. $(f_i)$   Cum. freq.   $f_i x_i$
60 – 62    61                   5               5            305
63 – 65    64                   18              23           1152
66 – 68    67                   42              65           2814
69 – 71    70                   27              92           1890
72 – 74    73                   8               100          584
TOTAL                           100                          6745
(a) Mean = $\dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{6745}{100} = 67.45$
(b) The median is the value contained in class 66 – 68, since N/2 = 50 and the cumulative
frequency first reaches 50 in this class. From this class we have
$L = 65.5$, $f_m = 42$, $C_b = 23$, $h = 3$.
Then,
\[ \text{Median} = L + \left( \frac{\frac{N}{2} - C_b}{f_m} \right) h = 65.5 + \left( \frac{50 - 23}{42} \right)(3) = 67.43 \]
(c) The class with the highest frequency is 66 – 68 and thus it is the modal class.
From this class, we find that
$L = 65.5$, $h = 3$, $f_m = 42$, $f_b = 18$, $f_a = 27$.
This implies that $\Delta_1 = f_m - f_b = 42 - 18 = 24$ and $\Delta_2 = f_m - f_a = 42 - 27 = 15$.
Therefore,
\[ \text{Mode} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) h = 65.5 + \left( \frac{24}{39} \right)(3) = 65.5 + 1.846 = 67.35 \]
2.3 Measures of Dispersion
Measures of dispersion show how data deviate from a given measure of average
(arithmetic mean or median). These measures include the range, mean absolute deviation,
standard deviation and quartile deviation.
The most commonly used measure of variation is the sample standard deviation, since
the population standard deviation is not easily obtained in practice. However, this
measure is not suitable for data with extreme values. If the data contain some
extreme values, the appropriate measure would be the quartile deviation. The mean
absolute deviation is rarely used to compare the variation between two data sets.
The range is a very crude measure of dispersion. It is used just to give a general
overview of the spread of the data. As before, we shall discuss ungrouped and grouped
data separately.
2.3.1 Ungrouped data
Consider a set of values $X_1, \ldots, X_N$ from a certain population, and $x_1, x_2, \ldots, x_n$ from a
sample of that population. Then, we compute the following measures.
Mean Absolute Deviation

The Mean Absolute Deviation for a population is given by
\[ MAD = \frac{\sum |X_i - \bar{X}|}{N} \]
where $\bar{X}$ is the population mean and N = population size.
Similarly, for a sample we have
\[ MAD = \frac{\sum |x_i - \bar{x}|}{n}, \]
where n = sample size.
Variance and Standard Deviation

The population variance, $\sigma^2$, is given by the formula
\[ \sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N} \]
and the sample variance, denoted by $s^2$, is given by the formula
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}. \]
The standard deviation is defined as the square root of the variance. It is denoted by $\sigma$
for a population, and s for a sample. It is also denoted in general by $SD(X)$.
From the above formulas we have
\[ \sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]
Quartile Deviation
The quartile deviation (Q.D) is given by
\[ Q.D = \frac{1}{2}\left( Q_3 - Q_1 \right), \]
where $Q_1$ = the first quartile, and
$Q_3$ = the third quartile.
Note that quartiles are positional measures and their positions depend on the number of
observations. When the position works out to a whole number, the quartile is simply the
observation in that position; otherwise it is obtained by interpolating between two
neighbouring observations.
Suppose we have $n = 10$ observations. Then, the first quartile takes the
position
\[ \frac{1}{4}(n + 1)\text{th} = \left( \frac{11}{4} \right)\text{th} = 2.75\text{th} = 2\text{nd} + 0.75\,(3\text{rd} - 2\text{nd}) \]
Similarly, the third quartile takes the position
\[ \frac{3}{4}(n + 1)\text{th} = \left( \frac{33}{4} \right)\text{th} = 8.25\text{th} = 8\text{th} + 0.25\,(9\text{th} - 8\text{th}) \]
Example 2.3 Compute the mean absolute deviation, sample standard deviation and quartile
deviation of the following data: 10, 12, 8, 16, 8, 20, 21, 17
Solution We first compute the arithmetic mean as
\[ \bar{x} = \frac{10 + 12 + 8 + 16 + 8 + 20 + 21 + 17}{8} = \frac{112}{8} = 14 \]
Then, the data are arranged in order and various calculations are summarized in the
table below
$x_i$    $x_i - \bar{x}$    $|x_i - \bar{x}|$    $(x_i - \bar{x})^2$
8        -6                 6                    36
8        -6                 6                    36
10       -4                 4                    16
12       -2                 2                    4
16       2                  2                    4
17       3                  3                    9
20       6                  6                    36
21       7                  7                    49
TOTAL                       36                   190
Then, the mean absolute deviation is given by
\[ MAD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{36}{8} = 4.5 \]
The sample standard deviation is given by
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{190}{7}} = 5.21 \]
We can also obtain the sample variance without prior computation of the arithmetic
mean, $\bar{x}$. This alternative formula is given by
\[ s^2 = \frac{1}{n - 1}\left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right] \]
For the quartile deviation, we proceed as follows.

The data are arranged in ascending order as follows: 8, 8, 10, 12, 16, 17, 20, 21
We have an even number of observations, n = 8.
Then,
$Q_1 = \left( \frac{n + 1}{4} \right)$th observation $= \left( \frac{9}{4} \right)$th observation
= 2.25th observation
= 2nd + 0.25 (3rd - 2nd)
= 8 + 0.25 (10 - 8) = 8 + 0.5 = 8.5
$Q_3 = \frac{3}{4}(n + 1)$th observation $= \left( \frac{27}{4} \right)$th observation
= 6.75th observation
= 6th + 0.75 (7th - 6th)
= 17 + 0.75 (20 - 17) = 17 + 2.25 = 19.25
Then,
\[ \text{Quartile deviation} = \frac{1}{2}\left( Q_3 - Q_1 \right) = \frac{1}{2}(19.25 - 8.50) = \frac{10.75}{2} = 5.375 \]
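The calculations of Example 2.3 can be reproduced with the sketch below, which uses the eight values that appear in the worked solution (8, 8, 10, 12, 16, 17, 20, 21). The function names quartile and dispersion_summary are assumptions made for illustration; the quartile positions follow the (n + 1)/4 rule with linear interpolation described above.

```python
import math

def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4, interpolated."""
    position = k * (len(ordered) + 1) / 4      # 1-based, may be fractional
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def dispersion_summary(data):
    """Return (mean, MAD, sample standard deviation, quartile deviation)."""
    ordered = sorted(data)
    n = len(ordered)
    mean = sum(ordered) / n
    mad = sum(abs(x - mean) for x in ordered) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in ordered) / (n - 1))
    qd = (quartile(ordered, 3) - quartile(ordered, 1)) / 2
    return mean, mad, s, qd

mean, mad, s, qd = dispersion_summary([10, 12, 8, 16, 8, 20, 21, 17])
print(mean, mad, round(s, 2), qd)
```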
2.3.2 Grouped data
In this subsection we consider the mean absolute deviation together with the two measures
most commonly used in statistical analysis and decision making, namely the sample
standard deviation and the quartile deviation.
2.3.2.1 Mean Absolute Deviation

From grouped data we first compute the class marks $(x_i)$ and the arithmetic mean, $\bar{x}$,
such that
\[ \bar{x} = \frac{\sum f_i x_i}{n}, \quad \text{where } n = \sum f_i \]
Then, the mean absolute deviation for the sample is given by
\[ MAD = \frac{\sum f_i |x_i - \bar{x}|}{n} \]
2.3.2.2 Sample Standard Deviation
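The sample standard deviation s is the square root of the sample variance, which is given by
\[ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} \]
Alternatively we use
\[ s^2 = \frac{1}{n - 1}\left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right] \]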
2.3.2.3 Quartile Deviation
The quartile deviation is again computed as $Q.D = \frac{1}{2}\left( Q_3 - Q_1 \right)$, but in this case we need
to compute $Q_1$ and $Q_3$ using a formula similar to that for the median. $Q_1$ is called the lower quartile
and $Q_3$ is called the upper quartile of the grouped data. We have therefore
\[ Q_1 = L + \left( \frac{\frac{N}{4} - C_b}{f} \right) h \]
where L = lower boundary of the class that contains the lower quartile
$C_b$ = cumulative frequency before the class which contains the lower quartile
h = the class size
f = the class frequency
Similarly, we define the upper quartile by
\[ Q_3 = L + \left( \frac{\frac{3}{4}N - C_b}{f} \right) h \]
where L = lower boundary of the class that contains the upper quartile
$C_b$ = cumulative frequency before the class which contains the upper quartile
h = the class size
f = the class frequency
Example 2.4 Compute the sample standard deviation and the quartile deviation for
the data in the following table
Class    15 – 19   20 – 24   25 – 29   30 – 34   35 – 39   40 – 44   45 – 49
Freq.    4         5         8         5         8         6         4
Solution
Calculations are summarized in the table below
Class      Freq. $(f_i)$   c. mark $(x_i)$   $f_i x_i$   $f_i x_i^2$
15 – 19    4               17                68          1156
20 – 24    5               22                110         2420
25 – 29    8               27                216         5832
30 – 34    5               32                160         5120
35 – 39    8               37                296         10952
40 – 44    6               42                252         10584
45 – 49    4               47                188         8836
TOTAL      40                                1290        44900
(a) Sample Standard deviation

Using the summations from the above table, we compute the variance as
\[ s^2 = \frac{1}{39}\left[ 44900 - \frac{1290^2}{40} \right] = 84.55 \]
\[ \therefore s = \sqrt{84.55} = 9.2 \]
(b) Quartile deviation
We first compute $Q_1$.
Note that $\frac{N + 1}{4} = \frac{41}{4} = 10.25$. Thus the class which contains $Q_1$ is 25 – 29.
From this class, we find that L = 24.5, f = 8, $C_b$ = 9, h = 5.
Thus,
\[ Q_1 = 24.5 + \frac{5}{8}\left( \frac{40}{4} - 9 \right) = 25.125 \]
Then, we compute $Q_3$ as follows:
$\frac{3}{4}(N + 1) = \frac{3}{4}(41) = 30.75$. Hence the class which contains the upper quartile is 40 – 44.
From this class we have L = 39.5, f = 6, $C_b$ = 30, h = 5.
Therefore,
\[ Q_3 = 39.5 + \frac{5}{6}\left( \frac{3}{4}(40) - 30 \right) = 39.5 + \frac{1}{6}(30 - 30)(5) = 39.5 + 0 = 39.5 \]
Thus,
\[ \text{Quartile Deviation} = \frac{1}{2}(39.5 - 25.125) = 7.1875 \]
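The grouped-data calculations of Example 2.4 can be reproduced with the following sketch. It is illustrative only: the function names are assumptions, classes are presumed to have equal width, and, following the worked solution, the quartile class is located with position k(N + 1)/4 while the interpolation uses kN/4.

```python
import math

def grouped_sample_sd(marks, freqs):
    """Sample standard deviation from the alternative (sum of squares) formula."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

def grouped_quartile(k, lower_boundaries, freqs, width):
    """Q_k = L + ((kN/4 - C_b) / f) * h for the class containing the quartile."""
    n = sum(freqs)
    position = k * (n + 1) / 4       # locates the quartile class
    target = k * n / 4               # used inside the interpolation formula
    cum_before = 0
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= position:
            return lower + (target - cum_before) / f * width
        cum_before += f

marks = [17, 22, 27, 32, 37, 42, 47]
lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [4, 5, 8, 5, 8, 6, 4]

s = grouped_sample_sd(marks, freqs)
q1 = grouped_quartile(1, lower_boundaries, freqs, 5)
q3 = grouped_quartile(3, lower_boundaries, freqs, 5)
print(round(s, 1), q1, q3, (q3 - q1) / 2)
```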
EXERCISES 2
1. Given the following data set

12 23 17 11 9 12 32 24 17 25 30 19 15 25 22 18
Compute the following statistics;
(a) Geometric mean and harmonic mean.

(b) The arithmetic mean, median and mode.
(c) Mean absolute deviation.
(d) Sample standard deviation.
(e) Quartile deviation.
2. The diameters of a sample of 100 rods (mm) were recorded as follows;
Class 40 – 44 45 – 49 50 – 54 55 – 59 60 – 64
frequency 15 30 35 15 5
Compute the following statistics;

(a) The mean, median and mode of diameters.
(b) The mean absolute deviation.
(c) Sample standard deviation.
(d) Quartile deviation.
3. The absolute errors (in grams) made by two different measuring balances over the past
eight days were recorded as follows;
Balance A: 1.3 2.1 0.9 4.2 1.1 2.3 0.7 3.2

Balance B: 0.8 1.8 2.7 4.2 1.9 0.9 4.7 3.7
Based on sample variances only, suggest which of the two measuring balances may
be considered fairly consistent.
CHAPTER THREE
INTRODUCTION TO PROBABILITY
3.1 Introduction
The term probability can be defined as the chance that a certain event will occur. It
ranges between zero and one. It can be represented as a decimal, percentage or fraction.
For instance, the Tanzania Meteorological Authority (TMA) may report that the probability
of rain tomorrow is 0.65.
3.2 Probability models

These are models that involve the chances of certain outcomes occurring, or the chances of
propagation of a series of events. Models that require the use of probability concepts
are called probabilistic/stochastic models. Examples include the modeling of
environmental issues, business forecasts, weather and so on.
3.3 Probability Concepts

Probability can be defined through two major concepts: the empirical concept and the
theoretical concept.
3.3.1 Empirical Concept

Here we define a probability as a relative frequency, that is, the ratio of the number
of occurrences of a situation to the total number of times an experiment was repeated.
The number of repetitions, however, needs to be large to justify the concept. That is, if $f_x$
is the number of times a situation x has occurred out of n repeated trials, then we define
the probability that the event x occurs as
\[ P(X = x) = \lim_{n \to \infty} \left( \frac{f_x}{n} \right) \]
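The idea that the relative frequency settles down as the number of trials grows can be illustrated by simulation. The sketch below is an illustration only, not part of the original notes; it simulates rolls of a fair die with Python's random module, and the function name relative_frequency is an assumption.

```python
import random

def relative_frequency(event, trials, rng):
    """Estimate P(event) as f_x / n over `trials` simulated rolls of a fair die."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

rng = random.Random(0)   # fixed seed so that a run can be repeated
for n in (100, 10_000, 100_000):
    estimate = relative_frequency(lambda face: face == 3, n, rng)
    print(n, estimate)
```

As n increases, the printed estimates should move closer to the theoretical value 1/6 for a fair die.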
Sometimes it is difficult to obtain probabilities from practical experiments. In that case an
expert can assign probabilities to given events by using his/her own long-run
experience. This kind of probability is known as subjective probability and it is widely
used for planning activities.
3.3.2 Theoretical Concept

Before giving the theoretical concept of probability, let us define some important
terms.
A sample space: This is a set consisting of all sample points. It is denoted by S. Consider
the case of tossing a fair coin twice, with H denoting a head showing up and T denoting
a tail. In this case, S is a set of four points given by $S = \{HH, HT, TH, TT\}$.
An Event: An event is a subset of a sample space. It is denoted by E. From the above
experiment, if E is the event that only one head shows up, then E is a set of two
points given by $E = \{HT, TH\}$.
A sample Point: It is an individual element of a sample space. It is denoted by e. From
the above experiment, $e_1 = HH$.
The theoretical definition of probability depends on which of the following two cases applies.
Case 1: Sample Points are Equally Likely

Let E be any event that is a subset of a sample space S whose points are equally likely.
In this case, the probability of the event E is given by
\[ P(E) = \frac{n(E)}{n(S)} \]
where $n(E)$ and $n(S)$ are the numbers of elements in E and S respectively.
From the above experiment, we can find the probability that only one head shows up as
\[ P(E) = \frac{n(E)}{n(S)} = \frac{2}{4} = \frac{1}{2} \]
Case 2: Sample Points are NOT Equally Likely

In this particular case the probability of a given event E can be obtained by adding the
probabilities of the individual points in E. To illustrate this case, consider the following
example.
Example 3.1
A football team has to play two matches to qualify for the second round. There is a 0.7
chance that it will win the first match and a 0.8 chance that it will win the second match. By
assuming that winning or losing the first match does not affect the second match, find the
probability that the team will win
(a) Only one match.
(b) At least one match.
Solution
The first step is to define the outcomes.
Let $W_1$ = the team wins the first match
$L_1$ = the team loses the first match
$W_2$ = the team wins the second match
$L_2$ = the team loses the second match
From the given information we find that
$P(W_1) = 0.7$, $P(L_1) = 0.3$, $P(W_2) = 0.8$, $P(L_2) = 0.2$
The sample space can be obtained from the tree diagram as shown below:
First match    Second match    Sample point
W1             W2              W1W2
W1             L2              W1L2
L1             W2              L1W2
L1             L2              L1L2
The sample space is therefore $S = \{W_1W_2, W_1L_2, L_1W_2, L_1L_2\}$

(a) If A is the event that the team wins only one match, then $A = \{W_1L_2, L_1W_2\}$.
Hence $P(A) = P(W_1)P(L_2) + P(L_1)P(W_2) = 0.7 \times 0.2 + 0.3 \times 0.8 = 0.38$
(b) If B is the event that the team wins at least one match, then
$B = \{W_1W_2, W_1L_2, L_1W_2\}$
And,
$P(B) = P(W_1)P(W_2) + P(W_1)P(L_2) + P(L_1)P(W_2)$
$= 0.7 \times 0.8 + 0.38 = 0.56 + 0.38 = 0.94$
3.3.3 Axioms of Probabilities

Let E be an event consisting of one or more sample points $e_i$; then
(i) $P(e_i) \ge 0$, for each i
(ii) $0 \le P(E) \le 1$
(iii) $\sum_{\text{all } i} P(e_i) = 1$
3.4 Rules of Probabilities

There are four basic rules of probability, namely the addition rule, the rule for mutually
exclusive events, the complement rule and the multiplication rule.
(a) Addition Rule
This rule states that if A and B are any two events, then
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
(b) Mutually Exclusive Events

Two events A and B are said to be mutually exclusive if they cannot occur at the same
time. That is, $P(A \cap B) = 0$ and hence
\[ P(A \cup B) = P(A) + P(B) \]
(c) Complement Rule

Let $A'$ denote the event that A does not occur. Then,
\[ P(A') = 1 - P(A) \]
Example 3.2
Given the following information: $P(A) = 0.6$, $P(B) = 0.5$, $P(A \cap B) = 0.4$
(a) Find the following: (i) $P(A \cup B)$ (ii) $P(A')$ (iii) $P(B')$ (iv) $P((A \cap B)')$
(b) Show that $P((A \cup B)') = P(A' \cap B')$
Solution
(a) (i) $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.6 + 0.5 - 0.4 = 0.7$
(ii) $P(A') = 1 - P(A) = 1 - 0.6 = 0.4$
(iii) $P(B') = 1 - P(B) = 1 - 0.5 = 0.5$
(iv) $P((A \cap B)') = 1 - P(A \cap B) = 1 - 0.4 = 0.6$
(b) $P((A \cup B)') = 1 - P(A \cup B) = 1 - 0.7 = 0.3$

Now, from $P(A' \cup B') = P(A') + P(B') - P(A' \cap B')$
$\Rightarrow P(A' \cap B') = P(A') + P(B') - P(A' \cup B')$
but $P(A' \cup B') = P((A \cap B)') = 0.6$
$\therefore P(A' \cap B') = P(A') + P(B') - P((A \cap B)') = 0.4 + 0.5 - 0.6 = 0.3 = P((A \cup B)')$
The multiplicative rule follows after discussing the concept of conditional probabilities.
3.5 Conditional Probability and Independent Events

3.5.1 Conditional Probability
This is the probability that a certain event occurs given that another event has already
occurred. Such events depend on one another. Examples of
conditional events include: (1) enrollment of first year students at UDSM and
performance on ACSEE, (2) Sunday open market and rain, (3) expenditure and
individual income, and so on.
Definition 3.1 The conditional probability that an event A occurs given that event B has
already occurred, denoted by $P(A/B)$, is given by
\[ P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \ne 0 \]
Similarly,
\[ P(B/A) = \frac{P(B \cap A)}{P(A)}, \quad P(A) \ne 0 \]
Example 3.3
An individual is picked at random from a group of 52 athletes. Suppose that 26 of the
athletes are female, of whom 6 are swimmers. Also, there are 10 swimmers among the male
athletes.
(a) Given that the individual picked is a female, find the probability that she is a
swimmer.
(b) Given that the individual picked is a swimmer, find the probability that he is a male.
Solution
Let M = male, F = female, S = swimmer
$P(M) = \frac{26}{52}$, $P(F) = \frac{26}{52}$, $P(F \cap S) = \frac{6}{52}$, $P(M \cap S) = \frac{10}{52}$
(a) Required $P(S/F)$

Now,
\[ P(S/F) = \frac{P(S \cap F)}{P(F)} = \frac{6/52}{26/52} = \frac{6}{26} = \frac{3}{13} \]
(b) Required $P(M/S)$
\[ P(M/S) = \frac{P(M \cap S)}{P(S)} = \frac{P(M \cap S)}{P(M \cap S) + P(F \cap S)} = \frac{10/52}{10/52 + 6/52} = \frac{10/52}{16/52} = \frac{10}{16} = \frac{5}{8} \]
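The two conditional probabilities of Example 3.3 can be checked with exact fractions. This is a minimal sketch; the function name conditional is an assumption made for illustration.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(A / B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_given == 0:
        raise ValueError("conditioning event has zero probability")
    return p_joint / p_given

total = 52
p_f = Fraction(26, total)            # P(F)
p_f_and_s = Fraction(6, total)       # P(F and S)
p_m_and_s = Fraction(10, total)      # P(M and S)
p_s = p_m_and_s + p_f_and_s          # a swimmer is either male or female

p_s_given_f = conditional(p_f_and_s, p_f)
p_m_given_s = conditional(p_m_and_s, p_s)
print(p_s_given_f, p_m_given_s)
```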
The formula for conditional probability gives the general multiplication rule of
probability.
(d) Multiplication rule of probabilities

If A and B are any two events, then $P(A \cap B) = P(B) \cdot P(A/B)$
Example 3.4
The probability that the stock market goes up on Monday is 0.6. Given that it goes up on
Monday, the probability that it goes up on Tuesday is 0.3. Find the probability that the
market will go up on both days.
Solution
Let M = market goes up on Monday, T = market goes up on Tuesday
Given $P(M) = 0.6$, $P(T/M) = 0.3$. Required to find $P(M \cap T)$.
From the rule, $P(M \cap T) = P(M) \cdot P(T/M) = 0.6 \times 0.3 = 0.18$
3.5.2 Independent Events

In some situations, the occurrence of one event does not influence the occurrence of
another event. Such events are known as independent events. For these events,
$P(A/B) = P(A)$ and $P(B/A) = P(B)$.
Definition 3.2 If two events A and B are independent then $P(A \cap B) = P(A) \cdot P(B)$
Example 3.5
A bag contains 2 white and 3 red balls. If two balls are picked one at a time with
replacement, find the probability that,
(a) Both balls are red

(b) Balls are of different colors.
Solution
Let R = red ball, W = white ball
The sample space is $S = \{RR, RW, WR, WW\}$. Given that $P(R) = 3/5$, $P(W) = 2/5$
(a) Required: $P(RR)$
Since the events are independent, then
$$P(RR) = P(R)\,P(R) = \frac{3}{5} \times \frac{3}{5} = \frac{9}{25}$$
(b) Required: $P(RW \text{ or } WR)$
$$P(RW \text{ or } WR) = P(R)\,P(W) + P(W)\,P(R) = \frac{3}{5} \times \frac{2}{5} + \frac{2}{5} \times \frac{3}{5} = \frac{12}{25}$$
Example 3.6
A certain machine is operated using three components C1 , C 2 and C 3 . The probabilities
of these three components to perform well are respectively 0.80, 0.96 and 0.91. Suppose
that these components work independently, find the probability that
(a) All three components work properly.
(b) Only two components work properly.
Solution
(a) PAll three components work   PC1  C 2  C3 
 PC1 PC 2 PC3 
 0.80  0.96  0.91  0.69888
(b) POnly twocomponents work   PC1  C 2  C3   PC1  C 2  C3   PC1  C 2  C3 

 PC1 PC 2 PC3   PC1 PC 2 PC3   PC1 PC 2 PC3 
 0.80  0.96  0.09  0.80  0.04  0.91  0.20  0.96  0.91
 0.27296
3.6 Counting Techniques

3.6.1 The Basic Counting Principle
This principle states that if a certain experiment can be performed in r ways and, corresponding to each of these ways, another experiment can be performed in k ways, then the combined experiment can be performed in $r \times k$ ways.
We illustrate this by an experiment of rolling a fair die and tossing a fair coin. A die has six faces ($r = 6$) and a coin has two faces ($k = 2$).
Then the combined experiment has a sample space of $6 \times 2 = 12$ points. These sample points can be shown in the following tree diagram.
1 - H : 1H
1 - T : 1T
2 - H : 2H
2 - T : 2T
3 - H : 3H
3 - T : 3T
4 - H : 4H
4 - T : 4T
5 - H : 5H
5 - T : 5T
6 - H : 6H
6 - T : 6T
3.6.2 Permutation
A permutation is an ordered arrangement of objects (letters or numbers). The n objects being arranged may be distinct or not distinct.
The number of permutations of n distinct objects taken r at a time, denoted by ${}_nP_r$, is given by
$${}_nP_r = \frac{n!}{(n-r)!}$$
where $n! = n(n-1)(n-2)\cdots 2 \cdot 1$. Note that $0! = 1$ and ${}_nP_n = n!$
Example 3.7
How many numbers with three distinct digits are possible using the digits 3, 4, 5, 6, 7, 8?
Solution
We need to find ${}_nP_r$ where n = 6 and r = 3. Then,
$${}_6P_3 = \frac{6!}{(6-3)!} = \frac{6!}{3!} = 120$$
Five of these numbers are 345, 356, 347, 378, 567.
Example 3.8
In how many ways can 5 people be arranged in a line?
Solution:
There are $5! = 120$ ways of arranging five people in a line.
3.6.3 Permutations when some objects resemble

The number of arrangements of n objects in which s of them are alike, t of them are alike, and so on, is given by
$$\frac{n!}{s!\,t!\cdots}$$
Example 3.9
In how many ways can the letters of the word ESSENTIAL be arranged?
Solution:
The word ESSENTIAL consists of nine letters, of which S repeats 2 times and E repeats 2 times.
Thus, the possible number of ways $= \dfrac{9!}{2!\,2!} = 90720$.
3.6.4 Combination
Unlike permutation, in the case of combination the order is not important.
We can define a combination as a selection of r objects from a group of n objects. It is denoted by $\binom{n}{r}$ or ${}_nC_r$, where
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$
Example 3.10
How many ways are there of choosing a set of three books from a set of eight books?
Solution
Since the order is not important, the answer is $\binom{n}{r} = \binom{8}{3} = 56$ ways.
3.6.5 Applications of Counting in computing probabilities

In some applications, probabilities are computed from counting. This involves both
permutations and combinations problems. The following examples illustrate.
Example 3.11
The letters of the word VOLUME are arranged in all possible ways. Find the probability
that
(a) The word ends with a vowel.
(b) The word starts with a consonant and ends with a vowel.
Solution
The word VOLUME has six distinct letters and hence there are 6! ways of arranging these letters, where 6! = 720.
(a) There are three choices for the last letter so that the word ends with a vowel. The first through the fifth letters can be arranged in 5! ways. Thus the number of arrangements that end with a vowel is $3 \times 5 \times 4 \times 3 \times 2 \times 1 = 360$
The required probability $= \dfrac{360}{720} = 0.5$
(b) For a word to start with a consonant and end with a vowel, there are three choices for the first letter and also three choices for the last letter. The middle four letters can be arranged in 4! ways. The required number of ways in this case is $3 \times 3 \times 4 \times 3 \times 2 \times 1 = 216$
The required probability $= \dfrac{216}{720} = 0.3$
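Because the word has only 720 arrangements, Example 3.11 can also be verified by brute force. The sketch below lists every arrangement and counts the favourable ones; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

VOWELS = set("OUE")
words = list(permutations("VOLUME"))  # all 6! orderings of the six distinct letters

ends_vowel = sum(1 for w in words if w[-1] in VOWELS)
cons_then_vowel = sum(1 for w in words if w[0] not in VOWELS and w[-1] in VOWELS)

p_a = Fraction(ends_vowel, len(words))
p_b = Fraction(cons_then_vowel, len(words))
print(p_a, p_b)  # 1/2 3/10
```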
Example 3.12
An engineering consultant is faced with the problem of surveying five sites at Kinondoni, seven sites at Ilala and eight sites at Temeke. Due to time constraints, he/she decides to choose only six sites to survey. Of these six sites, find the probability that
(a) 2 sites are from Kinondoni, one site is from Ilala and three sites are from Temeke.
(b) 2 sites are from Kinondoni.
Solution
(a) We first need to find the total number of ways of selecting six sites out of twenty.
This selection can be done in $\binom{20}{6} = 38{,}760$ ways.
There are $\binom{5}{2}$ ways of selecting two sites out of five from Kinondoni.
Similarly, there are $\binom{7}{1}$ ways of selecting one site from Ilala, and $\binom{8}{3}$ ways of selecting three sites from Temeke.
The combined number of ways is given by $\binom{5}{2}\binom{7}{1}\binom{8}{3} = 3920$
The required probability $= \dfrac{3920}{38{,}760} = 0.101$
(b) Since the only restriction is that two sites are from Kinondoni, the remaining four sites can be chosen from either Ilala or Temeke.
Hence there are $\binom{5}{2}$ ways of selecting two sites from Kinondoni and $\binom{15}{4}$ ways of selecting the remaining four sites.
The combined number of ways is therefore $\binom{5}{2}\binom{15}{4} = 13{,}650$.
The required probability $= \dfrac{13{,}650}{38{,}760} = 0.352$
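The counts in Example 3.12 follow directly from `math.comb`. The sketch below is only a check of the arithmetic; the dictionary `sites` is an illustrative way of holding the three site totals.

```python
from math import comb

sites = {"Kinondoni": 5, "Ilala": 7, "Temeke": 8}
total_ways = comb(sum(sites.values()), 6)  # choose 6 of the 20 sites

# (a) 2 from Kinondoni, 1 from Ilala, 3 from Temeke
ways_a = comb(sites["Kinondoni"], 2) * comb(sites["Ilala"], 1) * comb(sites["Temeke"], 3)
# (b) 2 from Kinondoni, the other 4 from the 15 sites at Ilala or Temeke
ways_b = comb(sites["Kinondoni"], 2) * comb(sites["Ilala"] + sites["Temeke"], 4)

p_a = ways_a / total_ways
p_b = ways_b / total_ways
print(total_ways, ways_a, ways_b)
print(round(p_a, 3), round(p_b, 3))
```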
Exercises 3
1. A certain consultant has written two proposals ready to request for funding. The
probability that either of the proposals will be successful is 0.7, and that both
proposals will be successful is 0.2. There is 0.6 chance that the first proposal will not
be successful. Find the probability that the second proposal will be successful.
2. The probability that a female student studies is 0.7. Given that she studies, the
probability is 0.8 that she will pass a course. Given that she does not study, the
probability is 0.3 that she will pass the course. Find the probability that
(a) she will study and pass
(b) she will not study and pass the course
(c) she will pass the course
3. The probability that an executive is promoted to a higher position is 0.625. If he is
promoted, he will go on vacation with a probability of 0.83; however, if he is not
promoted, there is a probability of 0.33 that he will take a vacation.
(a) find the probability that he will go on a vacation
(b) Given that he has gone on a vacation, find the probability that he had been
promoted.
4. A box contains five identical items of which three are defective. Suppose that three
items are selected at random.
(a) Obtain the sample space for the experiment.
(b) Find the probability that exactly two items are not defective.
5. A certain system consists of three independent components. The reliabilities of these

components are respectively given by 0.85, 0.56 and 0.93. This system works only if
at least two of its components work. Find the probability that the system will work
properly.
6. An engineering firm wants to buy six new machines to increase production of a

certain commodity. The supplier has ten machines in stock of which four are foreign
made.
(a) In how many ways can the firm buy those six machines?
(b) If the firm decides to buy four domestic and two foreign-made machines, in how many ways can this selection be done?
7. A four-digit number is written using the digits 2, 3, 4, 5, 7 and 8. Find the probability
that a number formed is an odd number.
CHAPTER FOUR
PROBABILITY DISTRIBUTIONS
4.1 Introduction
In this chapter we will discuss some probability distributions of random variables. There are two types of probability distributions, depending on the type of random variable: discrete and continuous probability distributions.
4.2 Random Variables

Definition 4.1 A random variable is a function defined over the elements of the sample space, S. Random variables are denoted by capital letters X, Y, and so on, and their corresponding values are denoted by small letters x, y, etc.
Random variables are classified into two groups, discrete random variables and
continuous random variables.
4.2.1 Discrete Random Variables

A discrete random variable is one that can assume only a finite or countable set of values (typically integers) over the elements of the sample space.
Examples of discrete random variables include
(a) Number of cars passing a certain zebra crossing.
(b) Number of possible outcomes in tossing a fair coin or die twice, etc.
Definition 4.2 If X is a discrete random variable, the function given by $f(x) = P(X = x)$ for each x within the range of X is called the probability distribution of X.
Probability distributions of discrete random variables are commonly represented in tabular form, where values of X with their corresponding probabilities are shown. Sometimes they are given in the form of formulae.
Theorem 4.1 The function $f(x)$ can serve as a probability distribution of a discrete random variable X if and only if
1. $f(x) \ge 0$ for each x in the range of X
2. $\sum_{\text{all } x} f(x) = 1$
Example 4.1
Find the formula for the probability distribution of a total number of heads obtained by
tossing a fair coin three times.
Solution
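The following tree diagram is used to obtain the sample space S (first toss - second toss - third toss : outcome)
H - H - H : HHH
H - H - T : HHT
H - T - H : HTH
H - T - T : HTT
T - H - H : THH
T - H - T : THT
T - T - H : TTH
T - T - T : TTT
Therefore $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$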

By letting X = number of heads shown up, we find that X can take values 0, 1, 2 or 3.
Then the probability distribution of X in this case is given as follows:
X       0      1      2      3
P(X)    1/8    3/8    3/8    1/8
Hence the formula for the probability distribution is given by
$$f(x) = \frac{1}{8}\binom{3}{x}, \quad x = 0, 1, 2, 3$$
Definition 4.3 If X is a discrete random variable, the function given by
$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$
where $f(t)$ is the value of the probability distribution of X at t, is called the distribution function, or the cumulative distribution, of X.
Theorem 4.2 The values $F(x)$ of the distribution function of a discrete random variable X satisfy the following conditions
1. $F(-\infty) = 0$ and $F(\infty) = 1$
2. If $a < b$, then $F(a) \le F(b)$ for any real numbers a and b.
Example 4.2
Find the distribution function from the probability distribution obtained in Example 4.1.
Solution
From the above example we have $f(0) = 1/8$, $f(1) = 3/8$, $f(2) = 3/8$ and $f(3) = 1/8$, then by definition of the distribution function, we get
$F(0) = f(0) = 1/8$
$F(1) = f(0) + f(1) = 1/8 + 3/8 = 4/8$
$F(2) = f(0) + f(1) + f(2) = 1/8 + 3/8 + 3/8 = 7/8$
$F(3) = f(0) + f(1) + f(2) + f(3) = 1/8 + 3/8 + 3/8 + 1/8 = 1$
Hence, the distribution function is given by
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{8} & \text{for } 0 \le x < 1 \\ \frac{4}{8} & \text{for } 1 \le x < 2 \\ \frac{7}{8} & \text{for } 2 \le x < 3 \\ 1 & \text{for } x \ge 3 \end{cases}$$
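The probability distribution and distribution function for the number of heads in three tosses can be built by enumerating the sample space. The sketch below is illustrative; `pmf` and `cdf` are names chosen here, and exact fractions are used to match the hand calculation.

```python
from fractions import Fraction
from itertools import product
from math import comb

outcomes = list(product("HT", repeat=3))  # the 8 equally likely outcomes

# f(x) = P(X = x), where X is the number of heads
pmf = {
    x: Fraction(sum(1 for o in outcomes if o.count("H") == x), len(outcomes))
    for x in range(4)
}

def cdf(x):
    """F(x) = P(X <= x): sum of f(t) over all t <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

for x in range(4):
    print(x, pmf[x], cdf(x))
```

Evaluating `cdf` at a non-integer point such as 2.5 returns the same value as at 2, which is the step behaviour shown in the piecewise formula.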
4.2.2 Continuous Random Variables

A continuous random variable is one that can assume any real value within a specified interval I. The formula which gives the probabilities for a continuous random variable is called a probability density function (p.d.f. for short), or simply a density.
Definition 4.4 A function with values $f(x)$ defined over the set of all real numbers is called a p.d.f. of the continuous random variable X if and only if
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
for any real constants a and b with $a \le b$.
Theorem 4.3 A function $f(x)$ can serve as a probability density of a continuous random variable X if the following conditions are satisfied
1. $f(x) \ge 0$ for $-\infty < x < \infty$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Example 4.3
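If X has the probability density given by
$$f(x) = \begin{cases} k\,e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find (i) the value of k (ii) $P(0.5 \le X \le 1)$
Solution:
(i) Since $f(x)$ is a probability density, then
$$\int_0^{\infty} k\,e^{-3x}\,dx = 1 \;\Rightarrow\; \left[-\frac{k}{3}e^{-3x}\right]_0^{\infty} = 1 \;\Rightarrow\; 0 + \frac{k}{3} = 1 \;\Rightarrow\; k = 3$$
(ii)
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = \left[-e^{-3x}\right]_{0.5}^{1} = -e^{-3} - \left(-e^{-3(0.5)}\right) = e^{-1.5} - e^{-3} = 0.173$$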
Definition 4.4 If X is a continuous random variable and the value of its probability density at t is $f(t)$, then the distribution function of X is given by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$
Theorem 4.4 If $f(x)$ and $F(x)$ are the respective values of the probability density and the distribution function of X at x, then
$$P(a \le X \le b) = F(b) - F(a)$$
Example 4.4
Find the distribution function of X whose p.d.f. is given by
$$f(x) = \begin{cases} 3e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Solution
Using the definition, for $x > 0$ we have
$$F(x) = P(X \le x) = \int_0^x f(t)\,dt = \int_0^x 3e^{-3t}\,dt = \left[-e^{-3t}\right]_0^x = -e^{-3x} + 1 = 1 - e^{-3x}$$
Therefore the distribution function of X is
$$F(x) = \begin{cases} 1 - e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
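The density $3e^{-3x}$ and its distribution function can be cross-checked numerically. The sketch below uses a simple midpoint rule (an assumption made here for illustration, not a method from the text) and replaces the infinite upper limit by 20, beyond which the density is negligible.

```python
import math

def pdf(x, k=3.0):
    """Density k*exp(-3x) for x > 0, zero elsewhere."""
    return k * math.exp(-3.0 * x) if x > 0 else 0.0

def cdf(x):
    """Distribution function F(x) = 1 - exp(-3x) for x > 0, zero elsewhere."""
    return 1.0 - math.exp(-3.0 * x) if x > 0 else 0.0

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

area = integrate(pdf, 0.0, 20.0)      # should be close to 1 when k = 3
p_numeric = integrate(pdf, 0.5, 1.0)  # P(0.5 <= X <= 1) by integration
p_from_cdf = cdf(1.0) - cdf(0.5)      # the same probability as F(1) - F(0.5)
print(round(area, 4), round(p_numeric, 3), round(p_from_cdf, 3))
```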
4.3 Expected Values and the Standard deviations of Random Variables

Expected values and standard deviations shall be discussed separately for discrete and
continuous random variables.
4.3.1 Discrete Random Variables

Definition 4.5 If X is a discrete random variable, then its expected value is given by
$$E(X) = \sum_{\text{all } x} x\,p(x)$$
Definition 4.6 If X is a discrete random variable, then its variance is given by
$$Var(X) = E\left[(X - E(X))^2\right] = E(X^2) - [E(X)]^2$$
where
$$E(X^2) = \sum_{\text{all } x} x^2\,p(x)$$
The standard deviation of X is therefore given by
$$SD(X) = \sqrt{Var(X)}$$
Example 4.5
The daily incomes of a certain firm over five days, in thousands of dollars, with their associated probabilities are given as follows:
Income:  1.2    3.3    1.8    0.9    2.8
Prob.    0.30   0.15   0.20   0.15   0.20
Find the expected income and the standard deviation of the income of the firm.
Solution
Let X be the daily income of the firm. We summarize the required sums in the following table
x      p(x)    x p(x)   x² p(x)
1.2    0.30    0.36     0.432
3.3    0.15    0.495    1.6335
1.8    0.20    0.36     0.648
0.9    0.15    0.135    0.1215
2.8    0.20    0.56     1.568
SUM            1.91     4.403
Then,
$$E(X) = \sum x\,p(x) = 1.91$$
$$Var(X) = E(X^2) - [E(X)]^2 = 4.403 - 1.91^2 = 0.7549$$
$$SD(X) = \sqrt{Var(X)} = \sqrt{0.7549} = 0.8688$$
Therefore the expected daily income is $1.91 ('000) and the standard deviation is $0.8688 ('000).
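The table in Example 4.5 amounts to two weighted sums. The following sketch reproduces them; `expectation` is an illustrative helper that computes $\sum x^r p(x)$ for a given power r.

```python
import math

incomes = [1.2, 3.3, 1.8, 0.9, 2.8]      # in thousands of dollars
probs = [0.30, 0.15, 0.20, 0.15, 0.20]

def expectation(values, probabilities, power=1):
    """E(X^power) = sum of x^power * p(x) over all x."""
    return sum((x ** power) * p for x, p in zip(values, probabilities))

mean = expectation(incomes, probs)
variance = expectation(incomes, probs, power=2) - mean ** 2
sd = math.sqrt(variance)
print(round(mean, 2), round(variance, 4), round(sd, 4))
```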
4.3.2 Continuous Random Variables
Definition 4.7 If X is a continuous random variable, then its expected value is given by
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
Definition 4.8 If X is a continuous random variable, then its variance is given by
$$Var(X) = E\left[(X - E(X))^2\right] = E(X^2) - [E(X)]^2$$
where
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$
The standard deviation of X is therefore given by
$$SD(X) = \sqrt{Var(X)}$$
Example 4.6
A continuous random variable X has the probability density given by
$$f(x) = \begin{cases} 3e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected value and the standard deviation of X.
Solution
Given $f(x) = 3e^{-3x}$ for $0 < x < \infty$
Then by definition, integrating by parts,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{\infty} 3x e^{-3x}\,dx = \left[-x e^{-3x}\right]_0^{\infty} + \int_0^{\infty} e^{-3x}\,dx = \frac{1}{3}$$
Also,
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^{\infty} 3x^2 e^{-3x}\,dx = \frac{2}{9}$$
Implying that
$$Var(X) = E(X^2) - [E(X)]^2 = \frac{2}{9} - \frac{1}{9} = \frac{1}{9}$$
Therefore, the standard deviation of X is $SD(X) = \dfrac{1}{3}$
4.4 Joint Probability Distributions

Joint distributions are distributions that describe two or more random variables together. They are used to determine how the random variables are related across their different categories or values. There are discrete joint probability distributions and continuous joint probability distributions.
4.4.1 Discrete Joint Probability Distributions

Definition 4.9 If X and Y are discrete random variables, the function given by $f(x, y) = P(X = x, Y = y)$ for each pair of values $(x, y)$ within the range of X and Y is called the joint probability distribution of X and Y.
Theorem 4.5 A function $f(x, y)$ can serve as the joint probability distribution of two discrete random variables X and Y if the following conditions are satisfied
1. $f(x, y) \ge 0$
2. $\sum_x \sum_y f(x, y) = 1$
Example 4.7
Find the value of k if the following is a joint probability distribution
$$f(x, y) = \begin{cases} kxy & \text{for } x = 1, 2, 3;\ y = 2, 3 \\ 0 & \text{otherwise} \end{cases}$$
Solution
Given that the function is a joint probability distribution, we have $\sum_x \sum_y f(x, y) = 1$.
It implies that,
$f(1,2) + f(1,3) + f(2,2) + f(2,3) + f(3,2) + f(3,3) = 1$
$\Rightarrow 2k + 3k + 4k + 6k + 6k + 9k = 1$
So that $30k = 1$ or $k = \dfrac{1}{30}$
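The normalising constant in Example 4.7 can be found by summing xy over the support and inverting. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

# Support of the joint distribution: x = 1, 2, 3 and y = 2, 3
support = [(x, y) for x in (1, 2, 3) for y in (2, 3)]

# The probabilities k*x*y must sum to 1, so k = 1 / (sum of x*y)
k = Fraction(1, sum(x * y for x, y in support))
joint = {(x, y): k * x * y for x, y in support}
print(k, sum(joint.values()))  # 1/30 1
```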
Example 4.8
The 40 defective items produced by a certain engineering firm in 2010 were recorded according to the department they belong to. They were further categorized as high, average and low. Their joint probabilities are given in the table below
                       Departments
Defective items        Civil (C)   Electrical (E)   Plumbing (PL)
                       (12)        (14)             (14)
High (H), (14)         0.10        0.10             0.15
Average (A), (16)      0.10        0.20             0.10
Low (F), (10)          0.10        0.05             0.10
Find the probability of getting
(a) Average defective from electrical
(b) Low defective only
(c) Defective from plumbing
Solution
Using probabilities from the table, we get the following
(a) $P(\text{Average from Electrical}) = P(A, E) = 0.20$
(b) $P(\text{Low defective only}) = P(F) = 0.10 + 0.05 + 0.10 = 0.25$
(c) $P(\text{Defective from plumbing}) = P(PL) = 0.15 + 0.10 + 0.10 = 0.35$
4.4.2: Marginal Discrete Probability Distributions

These are distributions obtained by considering only some of the categories (variables) from the joint distribution. For the case of two variables, a marginal distribution is a function of one variable obtained by summing the joint probabilities over all values of the other variable.
Definition 4.10 If X and Y are discrete random variables and $f(x, y)$ is the value of their joint probability distribution at $(x, y)$, then the function given by
$$g(x) = \sum_y f(x, y)$$
for each x within the range of X is called the marginal distribution of X. Similarly, the function
$$h(y) = \sum_x f(x, y)$$
for each y within the range of Y is called the marginal distribution of Y.
Example 4.9
Use the joint distribution of Example 4.8 to obtain the marginal distribution of department.
Solution
By summing all probabilities over the defective items categories we get
$P(\text{Civil}) = 0.10 + 0.10 + 0.10 = 0.30$
$P(\text{Electrical}) = 0.10 + 0.20 + 0.05 = 0.35$
$P(\text{Plumbing}) = 0.15 + 0.10 + 0.10 = 0.35$
And the resulting marginal distribution is given by
Department         Civil    Electrical   Plumbing
Defective items    12       14           14
Probability        0.30     0.35         0.35
4.4.3 Conditional Discrete Probability Distributions

Given a joint discrete distribution, one can obtain the conditional distribution of some variable(s) given the other(s). We define the case for two variables as follows:
Definition 4.11 If $f(x, y)$ is the value of the joint probability distribution of the discrete random variables X and Y at $(x, y)$, and $h(y)$ is the value of the marginal distribution of Y at y, the function
$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad h(y) \neq 0$$
for each x within the range of X, is called the conditional distribution of X given $Y = y$. Correspondingly, the function
$$f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad g(x) \neq 0$$
for each y within the range of Y, is called the conditional distribution of Y given $X = x$.
Example 4.10
Use the table in Example 4.8 to obtain the conditional distribution of defective items given that they come from the electrical department.
Solution
We know from Example 4.9 that $P(E) = 0.35$. Now we need to find $P(H/E)$, $P(A/E)$ and $P(F/E)$ in order to obtain a complete distribution.
But,
$$P(H/E) = \frac{P(H, E)}{P(E)} = \frac{0.10}{0.35} = 0.286$$
$$P(A/E) = \frac{P(A, E)}{P(E)} = \frac{0.20}{0.35} = 0.571$$
$$P(F/E) = \frac{P(F, E)}{P(E)} = \frac{0.05}{0.35} = 0.143$$
Hence, the conditional distribution is given by
Defective/Electrical   High     Average   Low
Defective items        14       16        10
Probability            0.286    0.571     0.143
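The marginal and conditional distributions of Examples 4.9 and 4.10 can be computed from the joint table with a few lines of Python. In this sketch the table is stored as a dictionary keyed by (defect level, department); the function names are illustrative.

```python
# Joint probabilities from the table of defective items by department.
joint = {
    ("H", "C"): 0.10, ("H", "E"): 0.10, ("H", "PL"): 0.15,
    ("A", "C"): 0.10, ("A", "E"): 0.20, ("A", "PL"): 0.10,
    ("F", "C"): 0.10, ("F", "E"): 0.05, ("F", "PL"): 0.10,
}

def marginal(table, axis):
    """Sum the joint probabilities over the other variable (axis 0 = level, 1 = department)."""
    out = {}
    for key, p in table.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def conditional_on_department(table, dept):
    """P(level / dept) = P(level, dept) / P(dept)."""
    p_dept = marginal(table, 1)[dept]
    return {lvl: p / p_dept for (lvl, d), p in table.items() if d == dept}

departments = marginal(joint, 1)
given_electrical = conditional_on_department(joint, "E")
print({d: round(p, 2) for d, p in departments.items()})
print({lvl: round(p, 3) for lvl, p in given_electrical.items()})
```

A conditional distribution obtained this way always sums to 1, which is a quick check on the arithmetic.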
Once the distributions are obtained, mathematical expectations and standard deviations
are computed in the same way as in univariate cases.
4.4.4 Continuous Joint Probability Distributions

Definition 4.12 A bivariate function with values $f(x, y)$, defined over the xy-plane, is called a joint probability density function of the continuous random variables X and Y if and only if
$$P[(X, Y) \in D] = \iint_D f(x, y)\,dx\,dy$$
for any region D in the plane.
Theorem 4.6 A bivariate function can serve as a joint probability density function of a pair of continuous random variables X and Y if its values, $f(x, y)$, satisfy the conditions
1. $f(x, y) \ge 0$ for $-\infty < x < \infty$, $-\infty < y < \infty$;
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
Since the probabilities are values of functions of several variables, answering various questions requires techniques of multiple integration.
Example 4.11
Given a function of two random variables X and Y by
$$f(x, y) = \begin{cases} \frac{3}{5}\,x(y + x) & \text{for } 0 < x < 1,\ 0 < y < 2 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Show that the given function is a joint probability density
(b) Find $P\left(0 < X < \tfrac{1}{2},\ 1 < Y < 2\right)$
Solution
(a) We see that $f(x, y) \ge 0$ for each point $(x, y)$. We now need to show that
$$\int_0^1\int_0^2 \frac{3}{5}\,x(y + x)\,dy\,dx = 1$$
But,
$$\int_0^1\int_0^2 \frac{3}{5}\,x(y + x)\,dy\,dx = \int_0^1 \left[\frac{3xy^2}{10} + \frac{3x^2 y}{5}\right]_{y=0}^{2} dx = \int_0^1 \left(\frac{6x}{5} + \frac{6x^2}{5}\right) dx = \left[\frac{3x^2}{5} + \frac{2x^3}{5}\right]_0^1 = \frac{3}{5} + \frac{2}{5} = 1$$
(b)
$$P\left(0 < X < \tfrac{1}{2},\ 1 < Y < 2\right) = \int_0^{0.5}\int_1^2 \frac{3}{5}\,x(y + x)\,dy\,dx = \int_0^{0.5} \left[\frac{3xy^2}{10} + \frac{3x^2 y}{5}\right]_{y=1}^{2} dx$$
$$= \int_0^{0.5} \left(\frac{12}{10}x + \frac{6}{5}x^2 - \frac{3}{10}x - \frac{3}{5}x^2\right) dx = \int_0^{0.5} \left(\frac{9}{10}x + \frac{3}{5}x^2\right) dx$$
$$= \left[\frac{9}{20}x^2 + \frac{1}{5}x^3\right]_0^{0.5} = \frac{9}{20}\left(\frac{1}{4}\right) + \frac{1}{5}\left(\frac{1}{8}\right) = \frac{11}{80}$$
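Both parts of Example 4.11 can be checked numerically. The sketch below approximates the double integral with a midpoint rule on an n by n grid; this numerical method and the function name `double_integral` are assumptions made here for illustration.

```python
def f(x, y):
    """Joint density (3/5) x (y + x) on 0 < x < 1, 0 < y < 2, zero elsewhere."""
    return 0.6 * x * (y + x) if 0 < x < 1 and 0 < y < 2 else 0.0

def double_integral(func, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the integral of func over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += func(x, y)
    return total * hx * hy

total_mass = double_integral(f, 0, 1, 0, 2)  # part (a): should be close to 1
p_b = double_integral(f, 0, 0.5, 1, 2)       # part (b): should be close to 11/80
print(round(total_mass, 4), round(p_b, 4))
```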
Definition 4.13 If X and Y are continuous random variables, the function
$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(s, t)\,ds\,dt \quad \text{for } -\infty < x < \infty,\ -\infty < y < \infty$$
where $f(s, t)$ is the value of the joint probability density of X and Y at $(s, t)$, is called the joint distribution function of X and Y.
In order to get the joint density from the joint distribution function, we apply the mixed partial derivative, such that
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$$
Example 4.12
The joint density of X and Y is given by
$$f(x, y) = \begin{cases} x + y & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the joint distribution function of these random variables.
Solution
In order to obtain the joint distribution function, we need to consider different cases. The lines $x = 1$ and $y = 1$ divide the first quadrant of the xy-plane into four regions: Region I ($0 < x < 1$, $0 < y < 1$), Region II ($x \ge 1$, $0 < y < 1$), Region III ($0 < x < 1$, $y \ge 1$) and Region IV ($x \ge 1$, $y \ge 1$).
1. When $x \le 0$ or $y \le 0$,
In this case, it follows immediately that $F(x, y) = 0$
2. When $0 < x < 1$, $0 < y < 1$ (Region I). In this case we get
$$F(x, y) = \int_0^y\int_0^x (s + t)\,ds\,dt = \frac{1}{2}xy(x + y)$$
3. When $x \ge 1$, $0 < y < 1$ (Region II). In this case we get
$$F(x, y) = \int_0^y\int_0^1 (s + t)\,ds\,dt = \frac{1}{2}y(y + 1)$$
4. When $0 < x < 1$, $y \ge 1$ (Region III), we get
$$F(x, y) = \int_0^1\int_0^x (s + t)\,ds\,dt = \frac{1}{2}x(x + 1)$$
5. Finally, we consider the case when $x \ge 1$, $y \ge 1$ (Region IV). Here we have
$$F(x, y) = \int_0^1\int_0^1 (s + t)\,ds\,dt = 1$$
The resulting joint distribution function is given by
$$F(x, y) = \begin{cases} 0 & \text{for } x \le 0 \text{ or } y \le 0 \\ \frac{1}{2}xy(x + y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ \frac{1}{2}y(y + 1) & \text{for } x \ge 1,\ 0 < y < 1 \\ \frac{1}{2}x(x + 1) & \text{for } 0 < x < 1,\ y \ge 1 \\ 1 & \text{for } x \ge 1,\ y \ge 1 \end{cases}$$
Example 4.13
Given the joint distribution function by
$$F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}) & \text{for } x > 0,\ y > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the joint probability density of the two random variables X and Y and hence find $P(1 < X < 3,\ 1 < Y < 2)$
Solution
Using partial differentiation we get
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y} = \frac{\partial}{\partial x}\,\frac{\partial}{\partial y}\left(1 - e^{-y} - e^{-x} + e^{-x}e^{-y}\right) = \frac{\partial}{\partial x}\left(e^{-y} - e^{-x}e^{-y}\right) = e^{-x}e^{-y} = e^{-(x + y)}$$
Therefore, the joint density is given by
$$f(x, y) = \begin{cases} e^{-(x + y)} & \text{for } x > 0,\ y > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Hence,
$$P(1 < X < 3,\ 1 < Y < 2) = \int_1^2\int_1^3 e^{-(x + y)}\,dx\,dy = \int_1^2 \left[-e^{-(x + y)}\right]_{x=1}^{3} dy = \int_1^2 \left(-e^{-(y + 3)} + e^{-(y + 1)}\right) dy$$
$$= \left[e^{-(y + 3)} - e^{-(y + 1)}\right]_1^2 = e^{-5} - e^{-3} - e^{-4} + e^{-2} = 0.074$$
4.4.5 Marginal Density Functions

Definition 4.14 If X and Y are continuous random variables and $f(x, y)$ is the value of their joint density at $(x, y)$, the function given by
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{for } -\infty < x < \infty$$
is called the marginal density of X. Correspondingly, the function given by
$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \quad \text{for } -\infty < y < \infty$$
is called the marginal density of Y.
Example 4.14
The joint density of X and Y is given by
$$f(x, y) = \begin{cases} x + y & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the marginal densities of X and Y.
Solution
The marginal density of X is given by
$$g(x) = \int_0^1 (x + y)\,dy = \left[xy + \frac{1}{2}y^2\right]_0^1 = \frac{1}{2}(2x + 1)$$
Written as
$$g(x) = \begin{cases} \frac{1}{2}(2x + 1) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Similarly, the marginal density of Y is given by
$$h(y) = \int_0^1 (x + y)\,dx = \left[\frac{1}{2}x^2 + xy\right]_0^1 = \frac{1}{2}(2y + 1)$$
And is written as
$$h(y) = \begin{cases} \frac{1}{2}(2y + 1) & \text{for } 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
4.4.6 Conditional Density Functions

Definition 4.15 If $f(x, y)$ is the value of the joint density of continuous random variables X and Y at $(x, y)$, and $h(y)$ is the value of the marginal density of Y at y such that $h(y) \neq 0$, the function given by
$$f(x / y) = \frac{f(x, y)}{h(y)} \quad \text{for } -\infty < x < \infty$$
is called the conditional density of X given $Y = y$. Correspondingly, if $g(x) \neq 0$ is the value of the marginal density of X, the function given by
$$f(y / x) = \frac{f(x, y)}{g(x)} \quad \text{for } -\infty < y < \infty$$
is called the conditional density of Y given $X = x$.
Example 4.15
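With reference to Example 4.14, find the conditional density of X given $Y = y$ and use it to evaluate
$$P\left(X \le \tfrac{1}{2} \,\Big/\, Y = \tfrac{1}{4}\right)$$
Solution
From the definition, the conditional density of X is given by
$$f(x / y) = \frac{f(x, y)}{h(y)} = \frac{x + y}{\frac{1}{2}(2y + 1)}$$
Written as
$$f(x / y) = \begin{cases} \dfrac{x + y}{\frac{1}{2}(2y + 1)} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Before we evaluate $P\left(X \le \tfrac{1}{2} / Y = \tfrac{1}{4}\right)$, we first find $f\left(x / Y = \tfrac{1}{4}\right)$
But,
$$f\left(x \,/\, \tfrac{1}{4}\right) = \frac{x + \frac{1}{4}}{\frac{1}{2}\left(2 \cdot \frac{1}{4} + 1\right)} = \frac{1}{3}(4x + 1)$$
It follows that
$$P\left(X \le \tfrac{1}{2} \,\Big/\, Y = \tfrac{1}{4}\right) = \int_0^{1/2} \frac{1}{3}(4x + 1)\,dx = \frac{1}{3}\left[2x^2 + x\right]_0^{1/2} = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{2}\right) = \frac{1}{3}$$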
Expected values and standard deviations are computed in similar fashion as for single
variable cases.
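The marginal and conditional densities for the joint density $f(x, y) = x + y$ on the unit square can be checked numerically. This sketch uses a midpoint rule (an illustrative choice, not a method from the text); `conditional_prob` is a hypothetical helper that computes $P(x_{lo} < X \le x_{hi} / Y = y)$.

```python
def joint(x, y):
    """Joint density x + y on the unit square, zero elsewhere."""
    return x + y if 0 < x < 1 and 0 < y < 1 else 0.0

def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def marginal_y(y):
    """h(y): integrate the joint density over x."""
    return integrate(lambda x: joint(x, y), 0.0, 1.0)

def conditional_prob(x_lo, x_hi, y):
    """Integral of f(x / y) = f(x, y) / h(y) over x_lo < x <= x_hi."""
    h_y = marginal_y(y)
    if h_y == 0:
        raise ValueError("marginal density is zero at this y")
    return integrate(lambda x: joint(x, y), x_lo, x_hi) / h_y

h_quarter = marginal_y(0.25)            # compare with (1/2)(2y + 1) at y = 1/4
p = conditional_prob(0.0, 0.5, 0.25)    # compare with the worked answer 1/3
print(round(h_quarter, 4), round(p, 4))
```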
4.5 Covariance, Independence and Linear Combinations of Random Variables

4.5.1 Covariance
Covariance is a measure of how two random variables change together. If one random variable tends to increase as the other increases, we get a positive covariance. However, if one tends to decrease as the other increases, they give a negative covariance. We can define the covariance precisely as follows:
Definition 4.16 If X and Y are two random variables, their covariance, denoted by $COV(X, Y)$ or $\sigma(X, Y)$, is given by
$$COV(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = E(XY) - E(X)\,E(Y)$$
where $E(XY)$ is the joint expectation, and $E(X)$ and $E(Y)$ are the marginal expectations of X and Y respectively.
For discrete random variables we define $E(XY)$ as
$$E(XY) = \sum_x \sum_y xy\,f(x, y)$$
And for the continuous case we have
$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x, y)\,dx\,dy$$
Example 4.16
Two machines A and B are expected to produce identical items, which are categorized as high and standard quality. Their joint probability distribution is given in the following table
                         Items by machine (X)
                         A (8)     B (7)
Items by      H (10)     6/15      4/15
quality (Y)   S (5)      2/15      3/15
Find the covariance between the number of items produced by the machines and their qualities.
Solution
The joint distribution enables us to directly compute the joint expectation such that
$$E(XY) = \sum_x \sum_y xy\,f(x, y) = 8 \times 10 \times \frac{6}{15} + 7 \times 10 \times \frac{4}{15} + 8 \times 5 \times \frac{2}{15} + 7 \times 5 \times \frac{3}{15} = \frac{189}{3} = 63$$
In order to find the marginal expectations we first have to obtain the marginal distribution for each variable.
The marginal distribution of the number of items produced by the machines (X) is given by
Number of items by machines (X)    8       7
Probability                        8/15    7/15
The marginal expectation of X is given by
$$E(X) = \sum_x x f(x) = 8 \times \frac{8}{15} + 7 \times \frac{7}{15} = \frac{113}{15}$$
The marginal distribution of the number of items by quality (Y) is given by
Number of items by quality (Y)     10       5
Probability                        10/15    5/15
The marginal expectation of Y is given by
$$E(Y) = \sum_y y f(y) = 10 \times \frac{10}{15} + 5 \times \frac{5}{15} = \frac{25}{3}$$
Therefore, the covariance is given by
$$\sigma(X, Y) = \frac{189}{3} - \frac{113}{15} \times \frac{25}{3} = \frac{567}{9} - \frac{565}{9} = \frac{2}{9}$$
Comment: The positive covariance indicates a direct relationship between the machines and the quality of products.
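The covariance in Example 4.16 can be reproduced from the joint table using exact fractions. In this sketch the table is keyed by the pair (x, y) of values taken by X and Y; the variable names are illustrative.

```python
from fractions import Fraction as Fr

# Joint distribution keyed by (x, y): x = items by machine, y = items by quality
joint = {(8, 10): Fr(6, 15), (7, 10): Fr(4, 15), (8, 5): Fr(2, 15), (7, 5): Fr(3, 15)}

e_xy = sum(x * y * p for (x, y), p in joint.items())  # joint expectation E(XY)
e_x = sum(x * p for (x, _), p in joint.items())       # marginal expectation E(X)
e_y = sum(y * p for (_, y), p in joint.items())       # marginal expectation E(Y)
cov = e_xy - e_x * e_y
print(e_xy, e_x, e_y, cov)  # 63 113/15 25/3 2/9
```

Summing `x * p` over the whole joint table gives the same result as first forming the marginal distribution of X and then taking its expectation.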
4.5.2 Independence
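Definition 4.17 Two jointly distributed random variables X and Y are said to be independent if their joint distribution is the product of their marginal distributions, that is, $f(x, y) = g(x)\,h(y)$ for all $(x, y)$. In that case
$$E(XY) = E(X)\,E(Y)$$
It follows immediately that if two jointly distributed random variables are independent, their covariance is zero. The converse does not hold in general: a zero covariance alone does not guarantee independence.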
4.5.3 Linear Combinations of Random Variables

It happens frequently in statistics that a given random variable is expressed as a linear combination of two or more random variables, and that its mean and variance (or standard deviation) are required.
Definition 4.18 If $X_1, X_2, \ldots, X_n$ are random variables and $a_1, a_2, \ldots, a_n$ are constants, the sum
$$Y = \sum_{i=1}^{n} a_i X_i$$
is called a linear combination of the random variables.
Theorem 4.7 If $X_1, X_2, \ldots, X_n$ are random variables with a linear combination
$$Y = \sum_{i=1}^{n} a_i X_i$$
where $a_1, a_2, \ldots, a_n$ are constants, then
$$E(Y) = \sum_{i=1}^{n} a_i E(X_i) \quad \text{and} \quad Var(Y) = \sum_{i=1}^{n} a_i^2\,Var(X_i) + 2\sum_{i<j}\sum a_i a_j\,Cov(X_i, X_j)$$
Corollary 4.1 If $X_1, X_2, \ldots, X_n$ are independent random variables such that
$$Y = \sum_{i=1}^{n} a_i X_i$$
where $a_1, a_2, \ldots, a_n$ are constants, then
$$Var(Y) = \sum_{i=1}^{n} a_i^2\,Var(X_i)$$
By considering only two random variables X and Y, we have
$$W = aX + bY$$
with $E(W) = aE(X) + bE(Y)$ and $Var(W) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)$
For three random variables, a linear combination becomes
$$Y = a_1 X_1 + a_2 X_2 + a_3 X_3$$
The expected value of Y is given by
$$E(Y) = a_1 E(X_1) + a_2 E(X_2) + a_3 E(X_3)$$
The variance of Y is given by
$$Var(Y) = a_1^2 Var(X_1) + a_2^2 Var(X_2) + a_3^2 Var(X_3) + 2a_1 a_2 Cov(X_1, X_2) + 2a_1 a_3 Cov(X_1, X_3) + 2a_2 a_3 Cov(X_2, X_3)$$
If we have three or more random variables, it is more convenient to use a matrix approach to compute the mean and variance of a linear combination.
If we denote the constants and expected values by column matrices as
$$a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad \text{and} \quad E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix}$$
Then the expected value of the linear combination $Y = \sum_{i=1}^{n} a_i X_i$ is given by
$$E(Y) = a' E(X) = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix}$$
Similarly, we define a variance-covariance matrix by
$$M = \begin{pmatrix} Var(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Var(X_2) & \cdots & Cov(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & Var(X_n) \end{pmatrix}$$
The matrix M is $n \times n$ and symmetric, such that $Cov(X_i, X_j) = Cov(X_j, X_i)$, $i \neq j$
The diagonal elements are variances and the off-diagonal elements are covariances. Under this presentation, the variance of a linear combination is given by
$$Var(Y) = a' M a = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} \begin{pmatrix} Var(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Var(X_2) & \cdots & Cov(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & Var(X_n) \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$
Example 4.17
Given the random variables X, Y, and Z with means $\mu_x = 2$, $\mu_y = 5$, $\mu_z = -2$, variances $\sigma_x^2 = 1$, $\sigma_y^2 = 4$, $\sigma_z^2 = 2$ and covariances $\sigma_{xy} = 3$, $\sigma_{xz} = 4$, $\sigma_{yz} = -2$. Find the expected value and variance of $W = 2X + 3Y - Z$.
Solution
From the linear combination, we have constants $a = 2$, $b = 3$, $c = -1$
Putting this information in matrices, we have
$$a = \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}, \quad E(X) = \begin{pmatrix} 2 \\ 5 \\ -2 \end{pmatrix} \quad \text{and} \quad M = \begin{pmatrix} 1 & 3 & 4 \\ 3 & 4 & -2 \\ 4 & -2 & 2 \end{pmatrix}$$
The expected value of W is given by
$$E(W) = a' E(X) = \begin{pmatrix} 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 5 \\ -2 \end{pmatrix} = 4 + 15 + 2 = 21.$$
The variance of W is given by
$$Var(W) = a' M a = \begin{pmatrix} 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 3 & 4 \\ 3 & 4 & -2 \\ 4 & -2 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}$$
$$= \begin{pmatrix} 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 2 + 9 - 4 \\ 6 + 12 + 2 \\ 8 - 6 - 2 \end{pmatrix} = \begin{pmatrix} 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 7 \\ 20 \\ 0 \end{pmatrix} = 14 + 60 + 0 = 74$$
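The matrix approach of Example 4.17 can be sketched in plain Python. The sketch assumes $\mu_z = -2$ and $\sigma_{yz} = -2$, which are the values that reproduce the worked results $E(W) = 21$ and $Var(W) = 74$; the helpers `dot` and `quadratic_form` are illustrative names.

```python
a = [2, 3, -1]      # coefficients of W = 2X + 3Y - Z
mean = [2, 5, -2]   # assumed means of X, Y, Z (mu_z = -2)
M = [               # variance-covariance matrix (sigma_yz = -2 assumed)
    [1, 3, 4],
    [3, 4, -2],
    [4, -2, 2],
]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def quadratic_form(coeffs, matrix):
    """Compute a' M a by first forming the vector M a."""
    m_a = [dot(row, coeffs) for row in matrix]
    return dot(coeffs, m_a)

e_w = dot(a, mean)
var_w = quadratic_form(a, M)
print(e_w, var_w)  # 21 74
```

The same two helpers work unchanged for any number of variables, which is the convenience of the matrix form.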
Exercises 4
1. A discrete random variable Y has a probability distribution given by

Y -1 0 1 2 4
f(y) 0.10 0.15 0.24 0.27 0.24
(a) Obtain the distribution function of Y.

(b) Find PY  2
(c) Find P0  Y  2
(d) Compute the expected value and the standard deviation of Y
2. The distribution function of a discrete random variable X is given by

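$$F(x) = \begin{cases} 0 & \text{for } x < 1 \\ \frac{1}{3} & \text{for } 1 \le x < 4 \\ \frac{1}{2} & \text{for } 4 \le x < 6 \\ \frac{5}{6} & \text{for } 6 \le x < 10 \\ 1 & \text{for } x \ge 10 \end{cases}$$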
Find i  P2  X  6 ii  P X  4 iii  P X  5 iv  probability distribution of X

3. A continuous random variable has a p.d.f. given by
$$f(x) = \begin{cases} k\,e^{-4x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find (i) the value of k (ii) $P(1 \le X \le 3)$
(b) Obtain the distribution function of X
(c) Find the mean and standard deviation of X
4. The lifetime, in years, of a five-year-old building in Kariakoo is a random variable whose distribution function is given by
$$
F(x) = \begin{cases}
0 & \text{for } x \le 5 \\
1 - \dfrac{25}{x^2} & \text{for } x > 5
\end{cases}
$$
Find the probability that such a building will last
(a) Beyond eight years (b) between six and eight years.
CHAPTER FIVE
SPECIAL PROBABILITY DISTRIBUTIONS
5.1 Introduction
We have already met probability distributions in the general case. In this chapter we
concentrate on special cases of those distributions that are commonly used in everyday
applications. We start with discrete probability distributions before moving on to
continuous ones.
5.2 Special Discrete Probability Distributions

5.2.1 Binomial Distribution
In some situations each trial of an experiment has only two possible outcomes, success or
failure; such experiments are called binomial experiments. The probability of success is
denoted by p, and 1 – p is the probability of failure. If the experiment is repeated
independently n times and x is the number of successes among the n outcomes, we define
the binomial distribution as follows:
Definition 5.1 A discrete random variable X is said to have a binomial distribution with
parameters n and p if its probability distribution is given by
$$
P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \dots, n
$$
The expected value of this random variable is given by $E(X) = np$, and the standard
deviation is given by $SD(X) = \sqrt{np(1 - p)}$.
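The binomial formula of Definition 5.1 can be evaluated directly. The sketch below is illustrative only: the function name `binomial_pmf` is an assumption, and the values n = 6 and p = 0.003 are those of Example 5.1.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), following Definition 5.1."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.003                       # values taken from Example 5.1
p_one = binomial_pmf(1, n, p)         # exactly one success
p_none = binomial_pmf(0, n, p)        # no success
p_at_least_one = 1 - p_none           # complement of "no success"

mean = n * p                          # E(X) = np
sd = sqrt(n * p * (1 - p))            # SD(X) = sqrt(np(1 - p))
print(round(p_one, 5), round(p_at_least_one, 5), round(p_none, 4))
```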
Example 5.1
Produced items are inspected to check whether they are good or defective. The
probability that a chosen item is defective is 0.003. Find the probability that out of 6
items inspected
(a) Only one is defective
(b) At least one is defective
(c) No defective item is found
Solution
This is a binomial situation with $p = 0.003$ and $n = 6$. Note that a success here is
finding a defective item, and X is the number of defective items.
(a) Required $P(X = 1)$, then
$$
P(X = 1) = \binom{6}{1} (0.003)^1 (0.997)^5 = 0.01773
$$
(b) Required $P(X \ge 1)$, then
$$
P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{6}{0} (0.003)^0 (0.997)^6 = 1 - 0.997^6 = 0.01787
$$
(c) Required $P(X = 0)$, then
$$
P(X = 0) = \binom{6}{0} (0.003)^0 (0.997)^6 = 0.997^6 = 0.9821.
$$
5.2.2 Poisson Distribution

We may be interested in determining the probability that a certain event occurs a given
number of times in a given period of time, for instance the number of calls per hour or the
number of road accidents per year. This situation is best described by a Poisson distribution.
Definition 5.2 A discrete random variable X has a Poisson distribution with
parameter $\lambda$ if its probability function is given by
$$
P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots
$$
where $\lambda$ is a known constant, the mean number of occurrences per unit interval of time.
If X has a Poisson distribution, then $E(X) = \lambda$ and $SD(X) = \sqrt{\lambda}$.
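The Poisson probability function can be evaluated in the same way. This is a minimal sketch; the name `poisson_pmf` is illustrative, and the mean of 8 per hour is the value used in Example 5.2.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 8                                # mean per hour, as in Example 5.2
p_six = poisson_pmf(6, lam)            # exactly 6 occurrences
p_zero = poisson_pmf(0, lam)           # no occurrence
# "At least 2" is the complement of "0 or 1".
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_six, 4), round(p_zero, 7), round(p_at_least_two, 4))
```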
Example 5.2
The number of customers attended at NBC MLIMANI follows a Poisson distribution
with mean 8 per hour. Find the probability that in any given hour
(a) Exactly 6 customers will be attended
(b) No customer will be attended
(c) At least 2 customers will be attended
Solution
This is a Poisson situation with $\lambda = 8$
(a) Required $P(X = 6)$, then
$$
P(X = 6) = \frac{e^{-8}\, 8^6}{6!} = 0.1221
$$
(b) Required $P(X = 0)$, then
$$
P(X = 0) = \frac{e^{-8}\, 8^0}{0!} = e^{-8} = 0.0003355
$$
(c) Required $P(X \ge 2)$, then
$$
\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) = 1 - P(X = 0) - P(X = 1) \\
&= 1 - \frac{e^{-8}\, 8^0}{0!} - \frac{e^{-8}\, 8^1}{1!} \\
&= 1 - 0.0003355 - 0.002684 \\
&= 0.9970
\end{aligned}
$$
5.3 Poisson Approximation to Binomial Distribution

If the number of repetitions n is large while the probability of success p is very
small (e.g. 0.005, 0.001 etc.), then the Poisson distribution can be used to approximate the
binomial distribution. In this case, the approximating distribution becomes
$$
P(X = x) \approx \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, \dots, n
$$
where $\lambda = E(X) = np$; also $\mathrm{Var}(X) = \lambda = np$, so that $SD(X) = \sqrt{np}$.
Example 5.3
The probability of getting a defective tire is estimated at 5%. Find the probability that out
of 50 tires produced,
(a) Exactly one will be defective
(b) At least one will be defective
Solution
Let X be the number of defective tires produced.
This is a binomial situation with $n = 50$ and $p = 5\% = 0.05$
Using the Poisson approximation to the binomial, we have $\lambda = np = 50 \times 0.05 = 2.5$
(a) Required $P(X = 1)$, then
$$
P(X = 1) = \frac{e^{-2.5}\, 2.5^1}{1!} = 0.205212
$$
(b) Required $P(X \ge 1)$, then
$$
P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{e^{-2.5}\, 2.5^0}{0!} = 0.917915
$$
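To see how close the approximation is, the exact binomial probabilities can be placed next to the Poisson values. The sketch below uses the numbers of Example 5.3; the function names are illustrative and the comparison is an addition for checking, not part of the original solution.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

n, p = 50, 0.05
lam = n * p                            # lambda = np = 2.5

# Exact and approximate values of P(X = x) for the first few x.
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))

approx_at_least_one = 1 - poisson_pmf(0, lam)
exact_at_least_one = 1 - binomial_pmf(0, n, p)
```

With p = 0.05 the two columns agree to about two decimal places; the agreement improves as p gets smaller for a fixed value of np.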
5.4 Special Continuous Probability Distributions

There are several continuous probability distributions, but we will discuss only the
uniform and normal distributions because of their importance in statistical decisions.
5.4.1 Uniform Probability Distribution

This is the simplest continuous probability distribution; its applications include
simulation and parameter estimation.
Definition 5.3 A continuous random variable X is called a uniform random variable if
and only if its probability density is given by
$$
U(x; \alpha, \beta) = \begin{cases}
\dfrac{1}{\beta - \alpha} & \text{for } \alpha < x < \beta \\
0 & \text{elsewhere}
\end{cases}
$$
It is important to remember that for a continuous distribution, probabilities are found for
a random variable falling in a specified interval. This implies that the probability
of a random variable taking any single value is always zero. That is, for any value x,
$P(X = x) = 0$.
The probability between any two values a and b such that $\alpha \le a < b \le \beta$ is given by
$$
P(a < X < b) = \frac{b - a}{\beta - \alpha}
$$
Theorem 5.1 The mean and variance of a uniformly distributed random variable are
given by
$$
E(X) = \frac{\alpha + \beta}{2} \quad \text{and} \quad \mathrm{Var}(X) = \frac{1}{12} (\beta - \alpha)^2
$$
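Example 5.4
A random variable X has a uniform distribution given by
$$
f(x) = \begin{cases}
\dfrac{1}{3} & \text{for } 2 < x < 5 \\
0 & \text{elsewhere}
\end{cases}
$$
(a) Find the following (i) $P(X = 4)$ (ii) $P(X > 3)$ (iii) $P(1 < X < 10)$
(b) Find the expected value and the standard deviation of X.
Solution
Given $\alpha = 2$, $\beta = 5$, then,
(a) (i) $P(X = 4) = 0$, because this is a single point.
(ii) $P(X > 3) = P(3 < X < 5) = \dfrac{5 - 3}{5 - 2} = \dfrac{2}{3}$.
(iii) $P(1 < X < 10) = P(1 < X < 2) + P(2 < X < 5) + P(5 < X < 10) = 0 + \dfrac{5 - 2}{5 - 2} + 0 = 1$.
(b) By Theorem 5.1, $E(X) = \dfrac{2 + 5}{2} = 3.5$ and $\mathrm{Var}(X) = \dfrac{(5 - 2)^2}{12} = 0.75$, so $SD(X) = \sqrt{0.75} \approx 0.866$.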
5.4.2 Normal Distribution

We shall discuss this distribution in more detail.
Definition 5.4 A continuous random variable X has a normal distribution with parameters
$\mu$ and $\sigma$ if its probability density is given by
$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \quad -\infty < x < \infty
$$
If X has a normal distribution, then $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
The common notation for a normal random variable is $X \sim N(\mu, \sigma^2)$, meaning that the
random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$.
The normal distribution curve has the following properties.

(a) it is bell-shaped, extending from negative to positive infinity
(b) it is symmetric about the arithmetic mean $\mu$
(c) the area under the curve is unity
[Figure: the normal curve, f(x) plotted against X, centred at $\mu$]
The normal density is complicated to work with directly, but the shape of the normal
curve is simple and informative.
In order to answer probability questions concerning a normal random variable X,
we use the standardized distribution of the standard normal variable Z. Probabilities for
the standard normal variable are obtained from standard normal tables, which come in
different forms.
The standardization is done using the transformation
$$
Z = \frac{X - \mu}{\sigma}
$$
where the variable Z is such that $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$.
For instance, the question $P(X < x)$ becomes $P\left( \dfrac{X - \mu}{\sigma} < \dfrac{x - \mu}{\sigma} \right)$, or $P\left( Z < \dfrac{x - \mu}{\sigma} \right)$.
5.4.3 Uses of standard normal tables

Consider a standard normal table which gives probabilities of the form $P(0 < Z < a)$. In
this case most probability questions must be re-phrased in terms of this form.
The following are examples of probability questions re-phrased in this way, for positive
$a$, $a_1$ and $a_2$ (with $a_1 < a_2$ in (d)):
(a) $P(Z > a) = 0.5 - P(0 < Z < a)$
(b) $P(Z < -a) = P(Z > a) = 0.5 - P(0 < Z < a)$, because of symmetry
(c) $P(-a_1 < Z < a_2) = P(0 < Z < a_1) + P(0 < Z < a_2)$
(d) $P(a_1 < Z < a_2) = P(0 < Z < a_2) - P(0 < Z < a_1)$
Example 5.5
Use the standard normal table to evaluate the following probabilities
(a) $P(0 < Z < 1.25)$ (b) $P(Z < 2.21)$ (c) $P(-3.01 < Z < 0.5)$ (d) $P(1.0 < Z < 1.4)$
Solution
(a) $P(0 < Z < 1.25) = 0.3944$
(b) $P(Z < 2.21) = 0.5000 + P(0 < Z < 2.21) = 0.5000 + 0.4864 = 0.9864$
(c) $P(-3.01 < Z < 0.5) = P(0 < Z < 3.01) + P(0 < Z < 0.5) = 0.4987 + 0.1915 = 0.6902$
(d) $P(1 < Z < 1.4) = P(0 < Z < 1.4) - P(0 < Z < 1.0) = 0.4192 - 0.3413 = 0.0779$
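When a printed table is not at hand, the tabulated area $P(0 < Z < a)$ can be computed from the error function. The sketch below assumes the standard identity $P(0 < Z < a) = \tfrac{1}{2}\,\mathrm{erf}(a/\sqrt{2})$; the function name `table_area` is illustrative. It reproduces the four parts of Example 5.5, up to the rounding of a four-decimal table.

```python
from math import erf, sqrt

def table_area(a):
    """P(0 < Z < a) for a >= 0, the quantity listed in the normal table."""
    return 0.5 * erf(a / sqrt(2))

part_a = table_area(1.25)                     # P(0 < Z < 1.25)
part_b = 0.5 + table_area(2.21)               # P(Z < 2.21)
part_c = table_area(3.01) + table_area(0.5)   # P(-3.01 < Z < 0.5)
part_d = table_area(1.4) - table_area(1.0)    # P(1.0 < Z < 1.4)
print(round(part_a, 4), round(part_b, 4), round(part_c, 4), round(part_d, 4))
```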
Example 5.6
A random variable X has a normal distribution with mean 12 and variance 16. Find the
following probabilities
(a) $P(X > 10)$ (b) $P(X < 10)$ (c) $P(8 < X < 14)$ (d) $P(X > 15)$
Solution
Given $\mu = 12$, $\sigma^2 = 16 \Rightarrow \sigma = 4$
(a)
$$
P(X > 10) = P\left( \frac{X - \mu}{\sigma} > \frac{10 - 12}{4} \right) = P(Z > -0.5) = 0.5 + P(0 < Z < 0.5) = 0.5 + 0.1915 = 0.6915
$$
(b)
$$
P(X < 10) = P\left( \frac{X - \mu}{\sigma} < \frac{10 - 12}{4} \right) = P(Z < -0.5) = P(Z > 0.5) = 0.5 - P(0 < Z < 0.5) = 0.5 - 0.1915 = 0.3085
$$
(c)
$$
P(8 < X < 14) = P\left( \frac{8 - 12}{4} < \frac{X - 12}{4} < \frac{14 - 12}{4} \right) = P(-1 < Z < 0.5) = P(0 < Z < 1.0) + P(0 < Z < 0.5) = 0.3413 + 0.1915 = 0.5328
$$
(d)
$$
P(X > 15) = P\left( \frac{X - 12}{4} > \frac{15 - 12}{4} \right) = P(Z > 0.75) = 0.5 - P(0 < Z < 0.75) = 0.5 - 0.2734 = 0.2266
$$
Example 5.7
The incomes (in thousands of dollars) at a given company are normally distributed with
mean 20 and standard deviation 5. Find the probability that a selected income
will be
(a) More than twenty four thousand dollars.
(b) Anywhere between eighteen and twenty five thousand dollars.
Solution
Let X be a random variable representing an income at the given company.
In this case we have $\mu = 20$ and $\sigma = 5$
(a) Required $P(X > 24)$, then
$$
P(X > 24) = P\left( Z > \frac{24 - 20}{5} \right) = P(Z > 0.8)
$$
$$
\therefore P(X > 24) = 0.5 - P(0 < Z < 0.8) = 0.5 - 0.2881 = 0.2119
$$
(b) Required $P(18 < X < 25)$, then
$$
P(18 < X < 25) = P\left( \frac{18 - 20}{5} < Z < \frac{25 - 20}{5} \right) = P(-0.4 < Z < 1.0)
$$
$$
\therefore P(18 < X < 25) = P(0 < Z < 0.4) + P(0 < Z < 1.0) = 0.1554 + 0.3413 = 0.4967
$$
5.5 Normal approximation to Binomial and Poisson Distributions

In this section we look at situations in which the discrete binomial and Poisson
distributions are approximated by the normal distribution. Since these are discrete
distributions, a continuity correction must be applied.
5.5.1 Approximating Binomial Distribution

We have seen that the binomial distribution applies when each trial has only two
outcomes, success and failure. Direct computation is, however, practical only for a small
number of trials (sample size) n. For large sample sizes such that both np and npq are at
least 5 (where $q = 1 - p$), the normal distribution becomes a good approximation to the
binomial. In this case, $\mu = np$ and $\sigma^2 = npq$, implying that $\sigma = \sqrt{npq}$. Hence, the
standard normal variable, Z, becomes
$$
Z = \frac{X - \mu}{\sigma} = \frac{X - np}{\sqrt{npq}}
$$
Since the binomial distribution is discrete while the normal is continuous, a continuity
correction is needed. This is done by adding or subtracting 0.5 to the number
of successes, depending on the nature of the inequality. See the following table for
illustrations:

| Discrete case | Continuity corrected case |
|---|---|
| $P(X \le 6)$ | $P(X < 6.5)$ |
| $P(X \ge 6)$ | $P(X > 5.5)$ |
| $P(X = 6)$ | $P(5.5 < X < 6.5)$ |
| $P(4 \le X \le 8)$ | $P(3.5 < X < 8.5)$ |
Example 5.8
The probability of a defective tire is 8%. Find the probability that out of 400 tires
inspected, at most 20 will be defective.
Solution
Given $p = 8\% = 0.08$ and $n = 400$, implying that
$\mu = 0.08 \times 400 = 32$, $\sigma = \sqrt{400 \times 0.08 \times 0.92} = 5.43$
Then,
$$
P(X \le 20) \approx P(X < 20.5) = P\left( Z < \frac{20.5 - \mu}{\sigma} \right) = P\left( Z < \frac{20.5 - 32}{5.43} \right) = P(Z < -2.12)
$$
Therefore, $P(Z < -2.12) = 0.5 - P(0 < Z < 2.12) = 0.5 - 0.4830 = 0.0170$
5.5.2 Approximating Poisson distribution

We have seen that the Poisson distribution applies when we know the average number of
occurrences, $\lambda$, in a specified interval of time. Direct computation is, however, practical
only for small values of $\lambda$. When $\lambda$ is at least 5, the normal distribution becomes a good
approximation to the Poisson. As with the binomial, large values make the computation of
probabilities using the Poisson formula very tedious. In this case, $\mu = \lambda$ and $\sigma^2 = \lambda$,
implying that $\sigma = \sqrt{\lambda}$. Hence, the standard normal variable, Z, becomes
$$
Z = \frac{X - \mu}{\sigma} = \frac{X - \lambda}{\sqrt{\lambda}}
$$
Example 5.9
In a certain automobile plant, the number of work stoppages per day due to equipment
problems in a production process averages 12. What is the probability of having fewer
than 15 stoppages on any working day?
Solution
Let X be the number of stoppages during a production process.
This is a Poisson situation with $\lambda = 12$, and we require $P(X < 15)$.
Using the normal approximation to the Poisson, we have $\mu = 12$, $\sigma = \sqrt{12} = 3.46$
Then,
$$
P(X < 15) \approx P(X < 14.5) = P\left( Z < \frac{14.5 - 12}{3.46} \right) = P(Z < 0.72)
$$
Therefore, $P(Z < 0.72) = 0.5 + P(0 < Z < 0.72) = 0.5 + 0.2642 = 0.7642$
5.6 Sampling Distributions

5.6.1 Introduction
We start our discussion by defining several concepts used in the study of sampling.
Definition 5.5 Sampling is the technique used to select the individual members of a
population to make a sample.
Definition 5.6 A population is a set of all individuals under consideration.

Definition 5.7 A sample is the subset of a population.
Definition 5.8 A random sampling is the sampling procedure in which each member of a
population has a known chance of being selected.
The most commonly used sampling procedure is a simple random sampling which is
defined as follows.
Definition 5.9 Simple random sampling is a sampling technique whereby each
individual member of a population has an equal chance of being selected.
We are going to discuss the most applicable sampling distributions. These include the
distribution of the arithmetic mean, Chi-square distribution, student’s t – distribution and
F – distribution.
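5.6.2 Distribution of the arithmetic mean $\bar{X}$

Let $X \sim N(\mu, \sigma^2)$. In this case we need to find the expected value and variance of $\bar{X}$, i.e.
$E(\bar{X})$ and $\mathrm{Var}(\bar{X})$, given that
$$
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{where } n \text{ is the sample size.}
$$
Now,
$$
E(\bar{X}) = E\left( \frac{1}{n} \sum X_i \right) = \frac{1}{n} \sum E(X_i), \quad \text{by the properties of the } E \text{ and } \Sigma \text{ operators}
$$
Let $E(X_i) = \mu$ for any i, then
$$
\frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \frac{1}{n} (n\mu) = \mu, \quad \text{since } \mu \text{ is constant.}
$$
Therefore, $E(\bar{X}) = \mu$
$$
\mathrm{Var}(\bar{X}) = \mathrm{Var}\left( \frac{1}{n} \sum X_i \right) = \frac{1}{n^2} \sum \mathrm{Var}(X_i), \quad \text{by the properties of the Var and } \Sigma \text{ operators}
$$
Let $\mathrm{Var}(X_i) = \sigma^2$ for each i and suppose that the $X_i$'s are independent, then
$$
\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2} (n \sigma^2) = \frac{\sigma^2}{n}
$$
It is concluded that if $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left( \mu, \dfrac{\sigma^2}{n} \right)$
The square root of the variance of $\bar{X}$ is called the standard error of $\bar{X}$, given by
$$
SE(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
$$
Hence the standardized normal variable Z in this case is given by
$$
Z = \frac{\bar{X} - \mu_{\bar{x}}}{SE(\bar{X})} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
$$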
Example 5.10
A random variable X is normally distributed with mean 8 and variance 25. A sample of
36 observations yields $\bar{x} = 7.5$. Find
(a) The standard error of this sampling distribution.
(b) The probability that the sample mean is greater than 9.
Solution
The following information is given
$\mu = 8$, $\sigma^2 = 25$, $n = 36$, $\bar{x} = 7.5$
(a) We need to find $SE(\bar{X})$, but
$$
SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{36}} = \frac{5}{6} = 0.833
$$
(b) We need to find $P(\bar{X} > 9)$, then
$$
P(\bar{X} > 9) = P\left( Z > \frac{\bar{x} - \mu}{SE(\bar{X})} \right) = P\left( Z > \frac{9 - 8}{0.833} \right) = P(Z > 1.20)
$$
But,
$P(Z > 1.20) = 0.5 - P(0 < Z < 1.20) = 0.5 - 0.3849 = 0.1151$
5.6.3 The Chi – Squared Distribution for sampling

Definition 5.9 If $s^2$ is the variance of a sample of size n taken from a normal
population with mean $\mu$ and variance $\sigma^2$, then the random variable
$$
C = \frac{(n - 1) s^2}{\sigma^2}
$$
has a chi-square distribution with n − 1 degrees of freedom.
Degrees of freedom can be defined as the number of free scores minus the number of
parameters used in the particular distribution.
The chi-square distribution is used in estimation as well as in testing hypotheses concerning
the population variance. These concepts are discussed in the next chapters.
5.6.4 The t-Distribution

Definition 5.10 If C and Z are independent random variables, C has a chi-square
distribution with $d = n - 1$ degrees of freedom, and Z has a standard normal distribution,
then the ratio
$$
T = \frac{Z}{\sqrt{C / d}}
$$
has a t-distribution with d degrees of freedom.
Given that $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ and $C = \dfrac{(n - 1) s^2}{\sigma^2}$, the formula for T is given by
$$
T = \frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n - 1) s^2}{(n - 1) \sigma^2}}}
= \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \div \frac{s}{\sigma}
= \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \cdot \frac{\sigma}{s}
= \frac{\bar{X} - \mu}{s / \sqrt{n}}
$$
Therefore,
$$
T = \frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t(n - 1)
$$
This distribution is widely used in estimation and hypothesis testing for population means
in cases where the population variances are not known and the sample sizes are small.
In most statistical applications, a sample is considered to be small if its size $n < 30$.
5.6.5 The F Distribution

Another distribution that plays an important role in connection with sampling from
normal populations is the F distribution, named after Sir Ronald Fisher, one of the most
prominent statisticians in the world. Originally, it was studied as the sampling
distribution of the ratio of two independent random variables with chi-square
distributions, each divided by its degrees of freedom; this is the structure presented
and applied in this manual. We therefore omit the probability density of the F distribution.
Definition 5.11 If U and V are independent random variables having chi-square
distributions with $d_1 = n_1 - 1$ and $d_2 = n_2 - 1$ degrees of freedom, then the ratio
$$
F = \frac{U / d_1}{V / d_2}
$$
is a random variable having an F distribution with $d_1$ and $d_2$ degrees of freedom.
The F distribution is used in statistical inference concerning the variances of two
independent normal populations.
We know that
$$
U = \frac{(n_1 - 1) s_1^2}{\sigma_1^2} \quad \text{and} \quad V = \frac{(n_2 - 1) s_2^2}{\sigma_2^2}
$$
Then, under the assumption that the variances of the two populations are equal, that is
$\sigma_1^2 = \sigma_2^2$, we have the following result
$$
F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)
$$
Applications of the above distributions are discussed in the next chapters.
Exercises 5
1. Show that if X has a binomial distribution with parameters n and p then
(a) $E(X) = np$ (b) $\mathrm{Var}(X) = np(1 - p)$
2. The probability is 0.23 that a car stolen in Dar es Salaam will be recovered. Find the
probability that out of 8 cars stolen
(a) More than three will be recovered.
(b) At least two will be recovered.
3. If X is a discrete random variable having a Poisson distribution, show that its mean and
variance are both equal to $\lambda$, where $\lambda$ is a constant.
4. The probability is 0.002 that a manufactured item from a certain engineering firm is
defective. Find the probability that out of 400 items manufactured by the firm
(a) Exactly 10 will be defective;
(b) At most 2 will be defective.
5. Given that $Z \sim N(0, 1)$, evaluate the following probabilities
a  P 2.14  Z  2  b P 1.7  Z  0.8556 c  P 3.4  Z  0.65
d  PZ  2 e PZ  1.67
6. Given that $X \sim N(8, 9)$, evaluate the following probabilities
a P X  7 b P X  10 c P6  X  12
7. A random variable has a normal distribution with $\sigma = 10$. If the probability that the
random variable will take on a value less than 82.5 is 0.8212, what is the probability
that it will take on a value greater than 58.3?
8. Suppose that the actual amount of instant coffee that a filling machine puts into a 6 g
can is a random variable having a normal distribution with standard deviation of
0.05 g. If only three percent of these cans are to contain less than 6 g of coffee, what is
the mean fill of these cans?
9. Suppose that 23 percent of all patients with high blood pressure have bad side effects
from a certain kind of medicine. Find the probability that among 120 patients treated
with this medicine,
(a) More than 32 will have bad side effects.
(b) At most 50 will have bad side effects.
10. The fuel consumption of a certain type of machine is approximately normal with
mean 2.4 litres per hour and standard deviation 0.4. Using a random sample of 32
machines, find the probability that the sample mean fuel consumption is at
least 2.2 litres per hour.
CHAPTER SIX
ESTIMATION
6.1 Introduction.
In practice it is not always possible to work with the whole population and determine the
desired statistical measures, like the mean and standard deviation. This is because the
population might be infinite, or very expensive to work with.
Estimation involves sampling techniques whereby findings from samples are used
to represent the whole population. Estimators are the formulas used to estimate the
population parameters.
A good estimator must have the following properties
(a) Unbiasedness
(b) Efficiency
(c) Consistency
Unbiasedness

An estimator $\hat{\theta}$ for a population parameter $\theta$ is said to be unbiased if $E(\hat{\theta}) = \theta$. The
quantity $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.
Efficiency
This property is used to compare the efficiency of one estimator with that of others
estimating the same population parameter $\theta$. The estimator with this property is also
known as the MVUE (Minimum Variance Unbiased Estimator). This property is described
as follows.
Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators for $\theta$; then $\hat{\theta}_1$ is said to be more efficient
than $\hat{\theta}_2$ if
$$
\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)
$$
Consistency
An estimator $\hat{\theta}$ for $\theta$ is said to be consistent if both its bias and variance tend to zero
as the sample size approaches infinity.
There are two types of estimation.

(a) Point estimation
(b) Interval estimation
6.2 Point estimation

This is the technique of estimating population parameters using single valued
statistics/estimators. Commonly used approaches in point estimations are Maximum
Likelihood and the Method of moments. We shall not discuss these approaches.
6.3 Interval Estimation

This is a technique that applies probability theory to determine an interval within which
the true value of a population parameter lies. It involves the construction of confidence
intervals for population parameters in different situations and at different levels of
confidence.
6.4 Confidence Interval Estimate for a Population Mean when the Variance is known.
In this situation, the distribution used is Z, and hence the formula for a $(1 - \alpha)100\%$
confidence interval estimate for $\mu$ is given by
$$
\bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
$$
Example 6.1
A population is known to have a variance of 81. A random sample of size 16 showed
that $\bar{x} = 10.5$. Estimate the population mean by means of a 95% confidence interval.
Solution
Given $\sigma^2 = 81 \Rightarrow \sigma = 9$, $n = 16$, $\bar{x} = 10.5$, $\alpha = 5\% = 0.05$
Then, the 95% confidence interval is given by
$$
\bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 10.5 \pm Z_{0.025} \frac{9}{\sqrt{16}} = 10.5 \pm (1.96)(2.25) = 10.5 \pm 4.41
$$
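The interval of Example 6.1 can be reproduced with the standard library. This is a sketch: the function name `z_interval` is illustrative, and the critical value is obtained from `statistics.NormalDist` rather than read from a table, so it is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = z_interval(10.5, 9, 16)        # Example 6.1
print(round(lower, 2), round(upper, 2))
```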
6.5 Confidence Interval Estimate for a Population Mean when Variance is unknown
There are two situations describing unknown variance.
1. Large sample size, $n \ge 30$
2. Small sample size, $n < 30$
6.5.1 Large Sample Size

If the sample size is large, the unknown population variance is replaced by the sample
variance. The distribution used is still Z. The formula for a $(1 - \alpha)100\%$ confidence
interval estimate for $\mu$ is given by
$$
\bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}
$$
Example 6.2
Let X be a normal random variable representing the value of individual invoices (in
dollars) issued by a certain firm. Suppose that $\mu$ and $\sigma$ are unknown. A random sample
of 49 invoices showed that $\bar{x} = 520$ and $s = 91$. Compute a 95% confidence
interval estimate for $\mu$.
Solution
Given $n = 49$, $\bar{x} = 520$, $s = 91$, $\alpha = 5\%$
Since the sample size is large, the distribution used is Z, and hence the formula for the 95%
confidence interval for $\mu$ is
$$
\bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}
$$
But $Z_{\alpha/2} = Z_{0.025} = 1.96$, so we have
$$
\bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}} = 520 \pm 1.96 \frac{91}{\sqrt{49}} = 520 \pm 1.96(13) = 520 \pm 25.48
$$
6.5.2 Small Sample Size

If the sample size is small and the population variance is unknown, the standard normal
variable Z is no longer used. In this case a suitable distribution is the t-distribution with
$n - 1$ degrees of freedom. The degrees of freedom are a parameter of the t-distribution
whose derivation is not discussed here. The formula for a $(1 - \alpha)100\%$ confidence interval
estimate for $\mu$ is given by
$$
\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}
$$
where $n - 1$ is the degrees of freedom of the t-distribution.
Example 6.3
Repeat Example 6.2 with sample size 25.
Solution
Given $n = 25$, $\bar{x} = 520$, $s = 91$, $\alpha = 5\%$

Since the sample size is small, the distribution used is t, and hence the formula for the 95%
confidence interval for $\mu$ is
$$
\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}
$$
But $t_{\alpha/2,\, n-1} = t_{0.025,\, 24} = 2.064$, then we have
$$
\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} = 520 \pm 2.064 \frac{91}{\sqrt{25}} = 520 \pm 2.064(18.2) = 520 \pm 37.56
$$
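The large-sample and small-sample intervals differ only in the critical value, so one helper covers both. In this sketch the critical value is passed in as read from a Z or t table (the Python standard library has no t quantile function); the name `mean_interval` is illustrative.

```python
from math import sqrt

def mean_interval(xbar, s, n, critical):
    """Return (lower, upper, margin) for xbar +/- critical * s / sqrt(n).

    `critical` is the table value: Z_{alpha/2} for a large sample,
    t_{alpha/2, n-1} for a small one.
    """
    margin = critical * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

large = mean_interval(520, 91, 49, 1.96)     # n = 49, Z table value
small = mean_interval(520, 91, 25, 2.064)    # n = 25, t table value, 24 d.f.
print(large)
print(small)
```

The smaller sample gives the wider interval, both because the standard error is larger and because the t value exceeds the Z value.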
6.6 Estimation of Difference between Means when Population Variances are known
Let X and Y be two normally distributed random variables representing two populations,
and let $n_x$ and $n_y$ be the respective sample sizes. Then, the formula for a
$(1 - \alpha)100\%$ confidence interval estimate for the difference between means $\mu_x - \mu_y$ is
given by
$$
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}
$$
Example 6.4
Random variables X and Y are normally distributed with standard deviations
$\sigma_x = 1.2$ and $\sigma_y = 0.9$; random samples of observations on both variables, each of size
32, provide the following information: $\bar{x} = 4.1$ and $\bar{y} = 3.5$. Estimate the difference
between population means by means of a 95% confidence interval.
Solution
Since the population variances are known, the distribution used is Z, and hence the
formula for the 95% confidence interval estimate for $\mu_x - \mu_y$ is given by
$$
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}
$$
where $Z_{\alpha/2} = Z_{0.025} = 1.96$; then the confidence interval is
$$
\begin{aligned}
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}
&= (4.1 - 3.5) \pm 1.96 \sqrt{\frac{1.2^2}{32} + \frac{0.9^2}{32}} \\
&= 0.60 \pm 1.96(0.2652) \\
&= 0.60 \pm 0.52
\end{aligned}
$$
6.7 Estimation of the Difference between Means when Population Variances are
unknown
As with a single population mean, two cases are considered in this situation.
1. Sample sizes are both large, $n_x \ge 30$ and $n_y \ge 30$.
2. At least one of the samples is small.
6.7.1 Large Sample Sizes
Similarly, in this case the population variances are replaced by sample variances and the
distribution used is Z. Hence the formula for a $(1 - \alpha)100\%$ confidence interval estimate for
the difference between means $\mu_x - \mu_y$ is given by
$$
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}
$$
Example 6.5
A utility company used to send out monthly statements to its customers without
addressed return envelopes. From a random sample of 120 customers it was determined
that, on average, it took 9 days for a payment to be made, with a sample standard
deviation of 2 days.
Wishing to speed up receipt of payment, the company subsequently included
pre-addressed return envelopes with the invoices. An independent sample of 130 customers
indicated that the average payment time fell to 8 days, with a sample standard deviation of
2.2 days.
Compute a 95% confidence interval estimate for the difference between population
means.
Solution
Let X represent the invoices sent without addressed return envelopes.
Let Y represent the invoices sent with pre-addressed return envelopes.
$n_x = 120$, $\bar{x} = 9$, $s_x = 2$, $n_y = 130$, $\bar{y} = 8$, $s_y = 2.2$
Since the sample sizes are large, the distribution used is Z; the population variances are
unknown but are replaced by the corresponding sample variances, and hence the formula
for the 95% confidence interval is given by
$$
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}
$$
where $Z_{\alpha/2} = Z_{0.025} = 1.96$, so we have
$$
\begin{aligned}
(\bar{x} - \bar{y}) \pm Z_{\alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}
&= (9 - 8) \pm 1.96 \sqrt{\frac{2^2}{120} + \frac{2.2^2}{130}} \\
&= 1.0 \pm 1.96(0.2656) \\
&= 1.0 \pm 0.52
\end{aligned}
$$
6.7.2 Small Sample Sizes

In this case estimation is done under the assumptions that the samples are drawn from two
independent populations (say X and Y) and that these populations have a common
variance, i.e. $\sigma_x^2 = \sigma_y^2 = \sigma^2$. This common variance is then estimated by a common
sample variance called the pooled sample variance $s_p^2$, such that
$$
s_p^2 = \frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2}
$$
The distribution used for estimation is the t-distribution with $n_x + n_y - 2$ degrees of
freedom. Hence the formula for a $(1 - \alpha)100\%$ confidence interval estimate for the
difference between means $\mu_x - \mu_y$ is given by
$$
(\bar{x} - \bar{y}) \pm t_{\alpha/2,\, n_x + n_y - 2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}
$$
Example 6.6
Repeat Example 6.5 with sample sizes $n_x = 19$ and $n_y = 25$
Solution
$n_x = 19$, $\bar{x} = 9$, $s_x = 2$, $n_y = 25$, $\bar{y} = 8$, $s_y = 2.2$

Since the population variances are unknown and the sample sizes are small, the
distribution used is the t-distribution and the common population variance is replaced by
the pooled sample variance, and hence the formula for the 95% confidence interval is
$$
(\bar{x} - \bar{y}) \pm t_{\alpha/2,\, n_x + n_y - 2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}
$$
We first compute $s_p^2$ using the formula
$$
s_p^2 = \frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2} = \frac{(19 - 1) 2^2 + (25 - 1) 2.2^2}{19 + 25 - 2} = \frac{188.16}{42} = 4.48
$$
The value of $t_{\alpha/2,\, n_x + n_y - 2} = t_{0.025,\, 42} = 2.021$

Therefore the required confidence interval is
$$
\begin{aligned}
(\bar{x} - \bar{y}) \pm t_{\alpha/2,\, n_x + n_y - 2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}
&= (9 - 8) \pm 2.021 \sqrt{\frac{4.48}{19} + \frac{4.48}{25}} \\
&= 1.0 \pm 2.021(0.6442) \\
&= 1.0 \pm 1.30
\end{aligned}
$$
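Other formulas for confidence interval estimation are summarized in the following table

| Type of estimation | Confidence interval formula |
|---|---|
| Population proportion, $\pi$ | $\pi = p \pm Z_{\alpha/2} \sqrt{\dfrac{p(1 - p)}{n}}$ |
| Difference between two population proportions, $\pi_1 - \pi_2$ | $\pi_1 - \pi_2 = (p_1 - p_2) \pm Z_{\alpha/2} \sqrt{\dfrac{p_1 (1 - p_1)}{n_1} + \dfrac{p_2 (1 - p_2)}{n_2}}$ |
| Population variance, $\sigma^2$ | $\dfrac{(n - 1) s^2}{\chi^2_{\alpha/2,\, n-1}} < \sigma^2 < \dfrac{(n - 1) s^2}{\chi^2_{1 - \alpha/2,\, n-1}}$ |
| Ratio of two population variances, $\dfrac{\sigma_1^2}{\sigma_2^2}$ | $\dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F_{\alpha/2}(n_1 - 1,\, n_2 - 1)} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{s_1^2}{s_2^2}\, F_{\alpha/2}(n_2 - 1,\, n_1 - 1)$ |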
Example 6.7
In a random sample of 500 families owning television sets in a certain city, it is found
that 340 have not yet subscribed to a newly introduced digital transmission system. Find
a 95% confidence interval for the actual proportion of all families in the city who have
not yet subscribed to the system.
Solution
The point estimate of p is $\hat{p} = 340 / 500 = 0.68$. For 95% confidence, we have $\alpha = 0.05$,
then $Z_{\alpha/2} = Z_{0.025} = 1.96$. Therefore, the 95% confidence interval for p is
$$
0.68 - 1.96 \sqrt{\frac{(0.68)(0.32)}{500}} < p < 0.68 + 1.96 \sqrt{\frac{(0.68)(0.32)}{500}}
$$
$$
\Rightarrow 0.64 < p < 0.72
$$
Example 6.8
The following are weights, in decagrams, of 10 packages of grass seed distributed by a
certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0. Find a 95%
confidence interval estimate for the variance of all such packages of grass seed
distributed by this company, assuming a normal population.
Solution
We first compute the sample variance of these data as follows;
$$
s^2 = \frac{1}{n - 1} \left[ \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2 \right] = \frac{1}{9} \left[ 21{,}273.12 - \frac{1}{10} (461.2)^2 \right] = 0.286
$$
For a 95% confidence interval, we have $\alpha = 0.05$, then
$\chi^2_{\alpha/2,\, n-1} = \chi^2_{0.025,\, 9} = 19.023$ and $\chi^2_{1 - \alpha/2,\, n-1} = \chi^2_{0.975,\, 9} = 2.700$
Therefore, the 95% confidence interval estimate for $\sigma^2$ is given by
$$
\frac{9(0.286)}{19.023} < \sigma^2 < \frac{9(0.286)}{2.700}
$$
$$
\Rightarrow 0.135 < \sigma^2 < 0.953
$$
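The sample variance and the interval of Example 6.8 can be checked as follows. This is a sketch: the chi-square values are the table values quoted in the example (the standard library has no chi-square quantile function), and because the code does not round $s^2$ to 0.286 first, the upper limit can differ from the worked answer in the third decimal place.

```python
from statistics import variance

weights = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]
n = len(weights)
s2 = variance(weights)               # sample variance, divisor n - 1

chi2_upper = 19.023                  # chi-square table value, 0.025 tail, 9 d.f.
chi2_lower = 2.700                   # chi-square table value, 0.975 tail, 9 d.f.

lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(round(s2, 3), round(lower, 3), round(upper, 3))
```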
Exercises 6
1. A study of the annual growth of a certain kind of fish showed that 64 of them, selected
at random in a lake, grew on average 52.80 mm with a standard deviation of 4.5
mm. Estimate the true average annual growth of all fish in the lake by means of a 99%
confidence interval.
2. Independent random samples of size $n_1 = 16$ and $n_2 = 25$ from normal populations
with $\sigma_1 = 4.8$ and $\sigma_2 = 3.5$ have means $\bar{x}_1 = 18.2$ and $\bar{x}_2 = 23.4$. Construct a 90%
confidence interval for $\mu_1 - \mu_2$
3. Repeat Q2 when the population variances are unknown but the samples give
$s_1 = 5.0$ and $s_2 = 4.0$
4. 32 out of 500 items produced by a certain engineering firm are found defective.
Estimate the true proportion of all defective items by means of 95% confidence
interval.
5. 14 out of 120 items from machine A are defective and only 8 out of 70 produced by
machine B are defective. Construct the 99% confidence interval for the true
difference between proportions.
6. The following are the heat-producing capacities of coal from two mines (in millions of
calories per ton)
Mine A : 8500 8330 8480 7960 8030
Mine B : 7710 7890 7920 8270 7860
Assuming that the data constitute independent random samples from normal
populations, construct a 90% confidence interval for the ratio of their variances.
CHAPTER SEVEN
HYPOTHESES TESTING
7.1 Introduction
A statistical hypothesis is an assertion made about the distribution or value of a given
random variable. Hypothesis testing is a technique involving a set of rules to be
followed in order to choose between two conflicting
hypotheses/claims. These conflicting hypotheses are referred to as the null and alternative
hypotheses. A null hypothesis is a statement that is taken to be true unless a test
provides evidence against it. It is denoted by $H_0$. An alternative hypothesis is the
statement that is accepted only if the null hypothesis is rejected. It is denoted by $H_1$.
7.2 Type I and Type II Errors

In making a statistical decision, one can make a correct decision or commit an error. There are
two types of errors: Type I and Type II. A Type I error is committed when a true null
hypothesis is rejected. A Type II error is the acceptance of a null hypothesis which is false.
The sizes (probabilities) of Type I and Type II errors can be computed from the decision rule.
Example 7.1
A certain machine produces ball bearings whose diameters have variance 0.0025. A random
sample of 49 bearings gives a sample mean of 5.01 mm. One wishes to test the null
hypothesis that the mean diameter is 5.00 mm against the alternative that the mean
diameter is 5.035 mm. The decision rule is: accept the null hypothesis if the sample mean
diameter is less than 5.02 mm and reject it otherwise. Find
(a) The size of the Type I error
(b) The size of the Type II error
Solution
From the given information we have $\sigma^2 = 0.0025$, $n = 49$, $\bar{x} = 5.01$.
We need to test the following hypotheses: $H_0: \mu = 5.00$ against $H_1: \mu = 5.035$.
The decision rule is: accept $H_0$ if $\bar{x} < 5.02$; reject $H_0$ if $\bar{x} \ge 5.02$.
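Since the population variance is known, the statistic used is Z such that
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
(a) From the definition,
$$\begin{aligned}
\text{Size of Type I error} &= P(\text{Reject } H_0 \mid H_0 \text{ is true}) \\
&= P(\bar{X} \ge 5.02 \mid \mu = 5.00) \\
&= P\left(Z \ge \frac{5.02 - 5.00}{0.05/\sqrt{49}}\right) \\
&= P(Z \ge 2.8) \\
&= 0.0026
\end{aligned}$$
(b) Similarly,
$$\begin{aligned}
\text{Size of Type II error} &= P(\text{Accept } H_0 \mid H_0 \text{ is false}) \\
&= P(\bar{X} < 5.02 \mid \mu = 5.035) \\
&= P\left(Z < \frac{5.02 - 5.035}{0.05/\sqrt{49}}\right) \\
&= P(Z < -2.1) \\
&= 0.0179
\end{aligned}$$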
The testing procedure involves choosing a suitable test statistic (a random variable whose
distribution is known whenever $H_0$ is true) and dividing its values into two regions known
as the rejection and non-rejection regions. This partitioning is done at the critical value(s).
The size of the rejection region is just the probability of committing a Type I error. This
probability is also known as the level of significance, denoted by $\alpha$. A critical value is
the value of the test statistic beyond which the tail area equals the level of significance.
Define $1 - \beta$ as the power of the test, where $\beta$ is the probability of committing a Type II
error.
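As a rough check of Example 7.1, the two error sizes and the power of the test can be computed with Python's standard library. This is a minimal sketch: the variable names are illustrative, and the probabilities come from `statistics.NormalDist` rather than from printed tables, so the last digit may differ from a table lookup. Rounded to four decimal places, the normal distribution gives 0.0026 for part (a) and 0.0179 for part (b).

```python
from math import sqrt
from statistics import NormalDist

sigma = sqrt(0.0025)      # population standard deviation
n = 49                    # sample size
se = sigma / sqrt(n)      # standard error of the sample mean
cutoff = 5.02             # decision rule: reject H0 when the sample mean >= cutoff

# Type I error: reject H0 although mu = 5.00
alpha = 1 - NormalDist(5.00, se).cdf(cutoff)

# Type II error: accept H0 although mu = 5.035
beta = NormalDist(5.035, se).cdf(cutoff)

power = 1 - beta          # power of the test against mu = 5.035

print(round(alpha, 4), round(beta, 4), round(power, 4))
```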
There are two types of testing procedure: one-sided (one-tailed) and two-sided (two-tailed)
tests.
It is customary to state the null hypothesis with an equality sign and the alternative
with an inequality. Suppose that $H_0: \mu = \mu_0$. The test is one-sided if the
alternative hypothesis takes the form $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$. The test is
two-sided if the alternative hypothesis takes the form $H_1: \mu \ne \mu_0$.
The test procedures are summarized below:
1. Identify $H_0$, $H_1$ and $\alpha$
2. Choose the appropriate test statistic and compute its value
3. Using the given level of significance, find the critical value(s) and specify the
rejection and non-rejection regions of the distribution. Also locate the value of
the test statistic in the distribution.
4. Make a statistical decision based on where the value of the test statistic falls
5. Give a managerial decision or conclusion or comment
7.3 Testing for Population Mean $\mu$ when $\sigma$ is known

We are testing $H_0: \mu = \mu_0$ against one of the alternative hypotheses.
The test statistic here is Z such that
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
where n is the sample size.
Example 7.2
A random sample of size 16 was taken from a normal population with variance 4. The sample
gave $\bar{x} = 9.8$. Test $H_0: \mu = 9.0$ against the alternative $H_1: \mu > 9.0$ at the 5%
level of significance.
Solution
Given: $n = 16$, $\bar{x} = 9.8$, $\sigma^2 = 4$
1. Hypotheses:
$H_0: \mu = 9.0$; $H_1: \mu > 9.0$, $\alpha = 0.05$
2. Test statistic:
Since $\sigma$ is known, the test statistic is Z such that
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{9.8 - 9.0}{2/\sqrt{16}} = \frac{0.8}{0.5} = 1.60$$
3. Critical values and regions:
The critical value for the one-sided test with $\alpha = 0.05$
is $Z_{0.05} = 1.645$. The distribution is then divided as follows:
[Figure: standard normal curve with the non-rejection region to the left of 1.645 and the rejection region to the right; the test statistic 1.60 lies in the non-rejection region.]
4. Decision
Since the value of the test statistic falls within the non-rejection region, we do not reject $H_0$ at
the 5% level of significance.
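The steps of Example 7.2 can be sketched in Python as follows. The helper `z_test_upper` is a hypothetical name introduced here for illustration, and it assumes an upper-tailed alternative; the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar, mu0, sigma, n, alpha):
    """Upper-tailed Z test for a mean when sigma is known.

    Returns the test statistic, the critical value and the decision
    (True means H0 is rejected).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper alpha point of N(0, 1)
    return z, z_crit, z > z_crit


# Example 7.2: n = 16, sample mean 9.8, variance 4 (so sigma = 2), alpha = 0.05
z, z_crit, reject = z_test_upper(xbar=9.8, mu0=9.0, sigma=2.0, n=16, alpha=0.05)
print(round(z, 2), round(z_crit, 3), reject)
```

For a lower-tailed or two-sided alternative only the comparison and the tail area would change, as summarized in the test procedure of Section 7.2.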
7.4 Testing for $\mu$ when $\sigma$ is unknown

There are two cases in this situation.
Case I: When the sample size is large ($n \ge 30$)
In this case the test statistic is again Z, but $\sigma$ is replaced by the sample standard deviation
s, so that
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Case II: When the sample size is small ($n < 30$)
In this case the Z statistic is no longer used. The appropriate test statistic is T, which
has a t-distribution with $n - 1$ degrees of freedom and is given by
$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Example 7.3
Let X be a normally distributed random variable with unknown mean and variance.
Information obtained from a sample of size 20 showed that $\bar{x} = 9.5$ and $s = 3$. Test the
null hypothesis that the true mean is 11 against the alternative that it is different from 11
at the 5% level of significance.
Solution
Given: $n = 20$, $\bar{x} = 9.5$ and $s = 3$
1. Hypotheses: $H_0: \mu = 11$ against $H_1: \mu \ne 11$, $\alpha = 0.05$
2. Test statistic: Since $\sigma$ is unknown and the sample is small, the test statistic is T.
Thus
$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{9.5 - 11}{3/\sqrt{20}} = -2.24$$
3. The critical values for the two-sided test are $\pm t_{0.025}(19) = \pm 2.093$
[Figure: t-distribution with rejection regions below -2.093 and above 2.093 and the non-rejection region between them; the test statistic -2.24 lies in the lower rejection region.]
4. Decision: Since the value of the test statistic falls within the rejection region, we
reject $H_0$ in favour of $H_1$ at the 5% level of significance.
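The calculation in Example 7.3 can be reproduced with the sketch below. The function name `t_statistic` is illustrative. Because the Python standard library has no t-distribution, the critical value 2.093 is simply the table value quoted in the example.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample T statistic with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))


T_CRIT = 2.093                      # t_{0.025}(19), table value from the example

t = t_statistic(xbar=9.5, mu0=11.0, s=3.0, n=20)
reject = abs(t) > T_CRIT            # two-sided test: compare |T| with the critical value
print(round(t, 2), reject)
```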
7.5 Testing for the Difference Between two Means when Variances are known
We test $H_0: \mu_1 - \mu_2 = c$ against one of the alternative hypotheses $H_1: \mu_1 - \mu_2 > c$
or $H_1: \mu_1 - \mu_2 < c$ or $H_1: \mu_1 - \mu_2 \ne c$.
The test statistic is Z such that
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
7.6 Testing for the Difference Between two Means when the Variances are unknown
There are two cases to consider in this situation.

Case 1: When both of the samples are large ($n_1 \ge 30$ and $n_2 \ge 30$)
In this case the test procedure is similar to the previous situation except that the population
variances $\sigma_i^2$ are replaced by the sample variances $s_i^2$ for $i = 1, 2$.
The test statistic is thus
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Case 2: When either one or both of the samples are small.
As in estimation, in this case we assume that the variances are common and can be
estimated by the pooled sample variance $s_p^2$, where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The test statistic is no longer Z. Instead, T is again used, which has a t-distribution
with $n_1 + n_2 - 2$ degrees of freedom, where
$$T = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$
Example 7.4
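A random sample of size 16 showed an average of 480 g with a standard deviation of 21 g.
On the other hand, a sample of size 25 resulted in an average of 490 g with a standard
deviation of 24 g. Test the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ against the alternative
$H_1: \mu_1 - \mu_2 < 0$ at the 5% level of significance.
Solution
Given: $n_1 = 16$, $\bar{x}_1 = 480$, $s_1 = 21$, $n_2 = 25$, $\bar{x}_2 = 490$, $s_2 = 24$
Since the samples are small and the population variances are unknown, we use T as the
test statistic, such that
$$T = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} \quad\text{where}\quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Now,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16 - 1)21^2 + (25 - 1)24^2}{16 + 25 - 2} = 524.08$$
Then,
$$T = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(480 - 490) - 0}{\sqrt{\dfrac{524.08}{16} + \dfrac{524.08}{25}}} = -1.36$$
The critical value (for the one-sided test) is $-t_{\alpha}(n_1 + n_2 - 2) = -t_{0.05}(39) \approx -1.645$ (the large-sample normal value; the exact t value with 39 degrees of freedom is about -1.685, which leads to the same decision).
The distribution is partitioned as shown below:
[Figure: t-distribution with the rejection region to the left of -1.645 and the non-rejection region to the right; the test statistic -1.36 lies in the non-rejection region.]
Since T falls within the non-rejection region, we do not reject $H_0$ at the 5% level.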
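The pooled-variance computation of Example 7.4 can be sketched in Python as below. The function `pooled_t` and its argument names are illustrative only; the critical value -1.645 is the one used in the example and is entered as a constant.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, c=0.0):
    """Pooled variance and T statistic for H0: mu1 - mu2 = c (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = ((x1 - x2) - c) / sqrt(sp2 / n1 + sp2 / n2)
    return sp2, t


T_CRIT = -1.645                     # lower-tail critical value used in the example

sp2, t = pooled_t(x1=480, s1=21, n1=16, x2=490, s2=24, n2=25)
reject = t < T_CRIT                 # lower-tailed test
print(round(sp2, 2), round(t, 2), reject)
```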
7.7 Tests Concerning Variances

It is important to test the variances of populations. It is advisable to test the equality of
variances of two populations even before testing for the equality of the populations'
means. In most practical situations, however, statistical analysts assume that samples are
drawn from two independent populations of equal variances. A manufacturer may need to
test the variability of the products produced by a given machine to maintain the
standards. In the next sections, we discuss tests of variability in both single and two
populations.
7.7.1 Testing for Variance in a Single Population

In this case we test the hypothesis $H_0: \sigma^2 = \sigma_0^2$ against one of the following
alternative hypotheses
$H_1: \sigma^2 > \sigma_0^2$ or $H_1: \sigma^2 < \sigma_0^2$ or $H_1: \sigma^2 \ne \sigma_0^2$
The test statistic is known as chi-squared, denoted by $\chi^2$, such that
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
where n is the sample size and $s^2$ is the sample variance.

The general critical value is given by
$$\chi^2 = \chi^2_{\alpha}(n - 1)$$
where $\alpha$ is the level of significance and $n - 1$ is called the degrees of freedom. The
degrees of freedom are the number of independent observations minus the
number of parameters estimated from them.
The critical value(s) depend on the nature of the alternative hypothesis, as the following
table summarizes.
Alternative hypothesis            Critical value(s)

$H_1: \sigma^2 > \sigma_0^2$      $\chi^2_{\alpha}(n - 1)$
$H_1: \sigma^2 < \sigma_0^2$      $\chi^2_{1-\alpha}(n - 1)$
$H_1: \sigma^2 \ne \sigma_0^2$    $\chi^2_{1-\alpha/2}(n - 1)$ and $\chi^2_{\alpha/2}(n - 1)$
Example 7.5
Suppose that the thickness of a part used in a semiconductor is its critical dimension and
that measurements of the thickness of a random sample of 18 such parts have a
variance of 0.68, measured in thousandths of an inch. The process is considered to be under
control if the variation of the thickness is not greater than 0.36. Assuming that the
measurements constitute a random sample from a normal population, test at the 5% level that
the process is under control.
Solution
Given $n = 18$, $s^2 = 0.68$, $\sigma_0^2 = 0.36$, $\alpha = 0.05$
Hypotheses: $H_0: \sigma^2 = 0.36$ against $H_1: \sigma^2 > 0.36$
Test statistic: The test statistic is computed as follows
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{17 \times 0.68}{0.36} = 32.11$$
Critical value: The critical value for this one-sided test is
$$\chi^2_{\alpha}(n - 1) = \chi^2_{0.05}(17) = 27.587$$
Decision: Since the value of the test statistic is greater than the critical value, it falls
within the rejection region. Therefore the null hypothesis is rejected at the 5% level, and
we conclude that the process is out of control and should be adjusted immediately.
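Example 7.5 can be checked with a few lines of Python. The function name `chi2_variance_statistic` is a hypothetical helper, and the critical value 27.587 is taken from the chi-square table as in the example rather than computed.

```python
def chi2_variance_statistic(s2, n, sigma0_sq):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a single variance."""
    return (n - 1) * s2 / sigma0_sq


CHI2_CRIT = 27.587                  # chi-square_{0.05}(17), table value from the example

chi2 = chi2_variance_statistic(s2=0.68, n=18, sigma0_sq=0.36)
reject = chi2 > CHI2_CRIT           # upper-tailed test
print(round(chi2, 2), reject)
```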
7.7.2 Testing for Equality of two Population Variances

Suppose we have two independent populations 1 and 2. We need to test the null
hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against one of the following alternatives
$H_1: \sigma_1^2 > \sigma_2^2$ or $H_1: \sigma_1^2 < \sigma_2^2$ or $H_1: \sigma_1^2 \ne \sigma_2^2$
The appropriate test statistic in this case is F such that
$$F = \frac{s_1^2}{s_2^2} \text{ if } s_1^2 > s_2^2 \quad\text{or}\quad F = \frac{s_2^2}{s_1^2} \text{ if } s_2^2 > s_1^2$$
The test statistic and the critical value depend on the nature of the alternative
hypothesis, as the following table summarizes.
Alternative hypothesis          Condition          Test statistic          Critical region

$\sigma_1^2 > \sigma_2^2$       $s_1^2 > s_2^2$    $F = s_1^2 / s_2^2$     $F > F_{\alpha}(n_1 - 1, n_2 - 1)$
$\sigma_1^2 < \sigma_2^2$       $s_1^2 < s_2^2$    $F = s_2^2 / s_1^2$     $F > F_{\alpha}(n_2 - 1, n_1 - 1)$
$\sigma_1^2 \ne \sigma_2^2$     $s_1^2 > s_2^2$    $F = s_1^2 / s_2^2$     $F > F_{\alpha/2}(n_1 - 1, n_2 - 1)$
                                $s_2^2 > s_1^2$    $F = s_2^2 / s_1^2$     $F > F_{\alpha/2}(n_2 - 1, n_1 - 1)$
Example 7.6
In comparing the variability of the tensile strength of two kinds of structural steel, an
experiment yielded the following results: $n_1 = 13$, $s_1^2 = 19.2$, $n_2 = 16$, $s_2^2 = 3.5$. Assuming
that the measurements constitute independent random samples from two normal
populations, test at the 2% level the null hypothesis that $\sigma_1^2 = \sigma_2^2$ against the
alternative $\sigma_1^2 \ne \sigma_2^2$.
Solution
Given $n_1 = 13$, $s_1^2 = 19.2$, $n_2 = 16$, $s_2^2 = 3.5$ and $\alpha = 0.02$
Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$
Test statistic: Since $s_1^2 > s_2^2$, the test statistic is
$$F = \frac{s_1^2}{s_2^2} = \frac{19.2}{3.5} = 5.49$$
Critical value: The critical value for this two-sided test is given by
$$F_{\alpha/2}(n_1 - 1, n_2 - 1) = F_{0.01}(12, 15) = 3.67$$
Decision: Since the value of the test statistic exceeds the critical value, we reject the null
hypothesis at the 2% level of significance and conclude that the variability of the tensile
strength of the two kinds of steel is not the same.
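A short Python sketch of the variance-ratio rule used in Example 7.6 follows. The helper `f_ratio` is an illustrative name; it places the larger sample variance in the numerator and reports the matching degrees of freedom. The critical value 3.67 is the table value from the example.

```python
def f_ratio(s1_sq, n1, s2_sq, n2):
    """F statistic with the larger sample variance on top.

    Returns (F, (numerator d.f., denominator d.f.)).
    """
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (n1 - 1, n2 - 1)
    return s2_sq / s1_sq, (n2 - 1, n1 - 1)


F_CRIT = 3.67                       # F_{0.01}(12, 15), table value from the example

f, df = f_ratio(s1_sq=19.2, n1=13, s2_sq=3.5, n2=16)
reject = f > F_CRIT
print(round(f, 2), df, reject)
```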
7.8 Tests Concerning Population Proportions

In some practical situations, one would like to test whether a predetermined proportion of
a certain outcome (success or failure) is attained. Examples include the proportion of patients
who suffer side effects after the introduction of a new medication, the proportion of good items
produced by a machine, and so on. Such a situation is well analyzed using a binomial
distribution. We consider only the situation in which the sample size is large,
so that one can use a normal approximation to the binomial random variable. This enables
us to obtain a standardized normal variable by
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{x - n p_0}{\sqrt{n p_0 q_0}}, \quad q_0 = 1 - p_0$$
7.8.1 Testing for Proportion from a Single Population

In this case we test $H_0: p = p_0$ against one of the alternatives
$H_1: p > p_0$ or $H_1: p < p_0$ or $H_1: p \ne p_0$
The test statistic is Z such that
$$Z = \frac{X - n p_0}{\sqrt{n p_0 q_0}}$$
The critical value is $Z_{\alpha}$ for a one-sided test and $Z_{\alpha/2}$ for a two-sided test, as previously discussed
for the cases of the Z-statistic.
Example 7.7
A commonly prescribed drug for relieving nervous tension is believed to be only 60%
effective. Experimental results with a new drug administered to a random sample of 100
adults who were suffering from nervous tension show that 70 received relief. Is this
sufficient evidence to conclude that the new drug is superior to the one commonly
prescribed? Test at the 5% level of significance.
Solution
Given $p_0 = 0.60$, $n = 100$, $x = 70$, $\alpha = 0.05$
Hypotheses: $H_0: p = 0.60$ against $H_1: p > 0.60$
Test statistic: The test statistic used is Z such that
$$Z = \frac{X - n p_0}{\sqrt{n p_0 q_0}} = \frac{70 - 100(0.60)}{\sqrt{100(0.60)(0.40)}} = 2.04$$
Critical value: The critical value for the one-sided test is $Z_{0.05} = 1.645$
Decision: Since the value of the statistic exceeds the critical value, the null hypothesis is
rejected and we conclude that the new drug is superior.
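The test in Example 7.7 can be sketched with Python's standard library. The function name `proportion_z` is illustrative, and the critical value is obtained from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x, n, p0):
    """Z statistic for H0: p = p0 using the normal approximation to the binomial."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


z = proportion_z(x=70, n=100, p0=0.60)
z_crit = NormalDist().inv_cdf(0.95)     # upper 5% point for the one-sided test
reject = z > z_crit
print(round(z, 2), round(z_crit, 3), reject)
```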
7.8.2 Testing for the Equality of Proportions from two Populations

Testing the equality of proportions from two binomial populations frequently arises in
statistics. One would like to test if the proportions of engineers in two regions are equal,
or if the proportion of patients who suffer lung cancer is higher for smokers than for
non-smokers, and so on.
In this case we test the null hypothesis $H_0: p_1 = p_2$ against one of the alternatives
$H_1: p_1 > p_2$ or $H_1: p_1 < p_2$ or $H_1: p_1 \ne p_2$
The test statistic here is Z such that
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{P}_1 = \dfrac{x_1}{n_1}$ and $\hat{P}_2 = \dfrac{x_2}{n_2}$ are the sample proportions, and $\hat{p}$ is called the pooled
sample proportion from the two samples, given by
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Here $x_1$ and $x_2$ are the numbers of successes in samples one and two respectively,
$n_1$ and $n_2$ are the sample sizes, and $\hat{q} = 1 - \hat{p}$.
Example 7.8
In a study to estimate the proportion of residents in a certain city and its suburbs who
favour the construction of a nuclear plant, it is found that 63 of 100 urban residents favour
the construction while only 59 of 125 suburban residents are in favour. Is there a significant
difference between the proportions of urban and suburban residents who favour the
construction of the nuclear plant? Test this at the 5% level of significance.
Solution
Given the following information: $x_1 = 63$, $n_1 = 100$, $x_2 = 59$, $n_2 = 125$, $\alpha = 0.05$
Hypotheses: We test $H_0: p_1 = p_2$ against $H_1: p_1 \ne p_2$
Test statistic: The test statistic is Z such that
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
We first compute the sample proportions and the pooled sample proportion:
$$\hat{p}_1 = \frac{x_1}{n_1} = 0.63, \quad \hat{p}_2 = \frac{x_2}{n_2} = 0.472, \quad \hat{p} = \frac{63 + 59}{100 + 125} = 0.542 \;\Rightarrow\; \hat{q} = 0.458$$
Then,
$$Z = \frac{0.63 - 0.472}{\sqrt{(0.542)(0.458)\left(\dfrac{1}{100} + \dfrac{1}{125}\right)}} = 2.36$$
Critical value: The critical values for the two-sided test are $\pm Z_{\alpha/2} = \pm Z_{0.025} = \pm 1.96$
Decision: Since the calculated value of the test statistic is greater than the critical value,
we reject the null hypothesis and conclude that there is a significant difference between
the proportion of urban residents and suburban residents who favour the construction of
the nuclear plant in their city.
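The pooled two-proportion statistic of Example 7.8 can be sketched as follows. The function `two_proportion_z` is a hypothetical helper written for this illustration; the two-sided critical value comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled proportion and Z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    z = (p1 - p2) / sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return p_pooled, z


p_pooled, z = two_proportion_z(x1=63, n1=100, x2=59, n2=125)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-sided test at the 5% level
reject = abs(z) > z_crit
print(round(p_pooled, 3), round(z, 2), round(z_crit, 2), reject)
```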
7.9 Test Concerning Independence (Categorical Data)

In this case we would like to test the independence of two factors or categories. Examples
include the performance of students in school and their IQs, a family's wealth and the
family size (number of children), and so on.
Such information is commonly given in the form of tables known as contingency
tables, and the analysis of the data is referred to as the analysis of $r \times c$ tables.
Here we test the null hypothesis that two factors are independent against the alternative
hypothesis that they are not independent.
The test statistic is a $\chi^2$ given by
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed value in a cell and $E_{ij}$ is the expected value in that cell.
The expected value for each cell is computed as follows:
$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
The critical value is given by $\chi^2_{\alpha}\big((r - 1)(c - 1)\big)$, where $\alpha$ is the level of significance and
$(r - 1)(c - 1)$ is the degrees of freedom for this test.
Example 7.9
Use the data in the following table to test at the 0.01 level of significance whether a person's
ability in mathematics is independent of his or her interest in statistics.
                                      Ability in mathematics
                                      Low      Average      High
Interest in statistics    Low         63       42           15
                          Average     58       61           31
                          High        14       47           29
Solution
We need to test the following hypotheses
$H_0$: Ability in mathematics and interest in statistics are independent
$H_1$: Ability in mathematics and interest in statistics are not independent
Before we compute the value of the test statistic, we need to compute the row and column
totals as well as the expected values E, using the above relation. The following table
summarizes this information, with the expected value for each cell given in
brackets.
             Low           Average       High          Total

Low          63 (45)       42 (50)       15 (25)       120
Average      58 (56.25)    61 (62.5)     31 (31.25)    150
High         14 (33.75)    47 (37.5)     29 (18.75)    90
Total        135           150           75            360
The test statistic is therefore given by
$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(63 - 45)^2}{45} + \frac{(42 - 50)^2}{50} + \cdots + \frac{(29 - 18.75)^2}{18.75} = 32.14$$
The critical value is given by $\chi^2_{0.01}(4) = 13.277$.
Decision: Since the value of the test statistic exceeds the critical value, we reject the null
hypothesis at the 0.01 level and conclude that a person's ability in mathematics and interest in
statistics are not independent. This implies that there is a relationship between one's ability
in mathematics and his or her interest in statistics.
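The expected counts and the chi-square statistic of Example 7.9 can be reproduced with the sketch below. The function `chi2_independence` is an illustrative helper, not a library routine, and the critical value 13.277 is the table value used in the example.

```python
def chi2_independence(table):
    """Expected counts, chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


# Rows: interest in statistics (low, average, high); columns: ability in mathematics
observed = [[63, 42, 15], [58, 61, 31], [14, 47, 29]]
CHI2_CRIT = 13.277                  # chi-square_{0.01}(4), table value from the example

expected, chi2, df = chi2_independence(observed)
reject = chi2 > CHI2_CRIT
print(expected[0], round(chi2, 2), df, reject)
```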
Exercises 7
1. A certain firm produces items that are normally distributed with variance 9. There
was a policy of paying double tax for those firms whose products exceed 400 items
per week. To avoid paying the tax, the manager claimed to produce only 400 items
per week. A random sample of 20 weeks showed that $\bar{x} = 430$ and $s = 3.6$. Is the
manager subject to this penalty? Test this claim at the 5% level of significance.
2. A random sample of 25 students showed a mean score of 45 marks and a
standard deviation of 6 marks. Test the null hypothesis that the average score of the
class was 50 against the alternative that it was different from 50 at the 1% level of
significance. What will be your conclusion at the 5% level?
3. Suppose you want to test the equality of revenues from two different sectors in your
region. Two samples, each of size 14, taken from normal populations showed that
$\bar{x} = 4.5$, $s_x = 1.2$, $\bar{y} = 5.1$, $s_y = 1.5$. Perform a suitable hypothesis testing procedure
for this situation.
4. The owner of a certain private school claimed that girls and boys are equally
represented at her school. To test this claim, a sample of 80 students was randomly
selected and it was found that 36 of them were girls. Test the owner's claim at the 5%
level of significance.
5. The fidelity and the selectivity of 190 radios produced the results shown in the
following table. Test for independence at the 5% level of significance.
                           Fidelity
                           Low      Average      High
Selectivity    Low         7        12           31
               Average     35       59           18
               High        15       13           0
CHAPTER EIGHT
REGRESSION AND CORRELATION ANALYSIS
8.1 Introduction
Regression analysis is the study of the relationship between two or more
variables, one being the dependent variable and the rest being called independent variables.
The relationship could be linear or non-linear, simple or multiple. We are going to discuss
simple linear regression analysis, which involves a dependent variable labeled Y
and only one independent variable labeled X, related linearly.
Generally, this relation can be expressed by the following equation
$$y_i = \beta_1 + \beta_2 x_i + u_i, \quad i = 1, \ldots, n \qquad (1)$$
where $\beta_1$ and $\beta_2$ are constant parameters and u is called the disturbance term. The parameters $\beta_1$
and $\beta_2$ as well as x are treated as non-random, while u is treated as a random term, and
hence so is y.
There are a number of assumptions concerning the probability distribution of u. The
first one is that $E(u_i) = 0$ for each $i = 1, \ldots, n$. This enables us to derive the population
regression line given by
$$E(y_i \mid x_i) = \beta_1 + \beta_2 x_i, \quad i = 1, \ldots, n \qquad (2)$$
We still cannot compute the parameters $\beta_1$ and $\beta_2$ from equation (2). However, we can
take a paired sample of size n from the same population and estimate their
values. The estimated parameters give what we call the line of best fit, sometimes
referred to as the sample regression line, given in equation (3) below
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, \quad i = 1, \ldots, n \qquad (3)$$
where $\hat{\beta}_1$ and $\hat{\beta}_2$ are estimates of $\beta_1$ and $\beta_2$ respectively. Suitable estimators for these
parameters are obtained from the method of least squares.
8.2 The Method of Least Squares

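Define the difference between $y_i$ and $\hat{y}_i$ as the residual, denoted by $e_i$. Thus
$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i, \quad i = 1, \ldots, n \qquad (4)$$
The method of least squares chooses the values of $\hat{\beta}_1$ and $\hat{\beta}_2$ that minimize the residual
sum of squares, $\sum e_i^2$, where
$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$$
If we let $q = \sum e_i^2$, then the method involves solving the following system
$$\frac{\partial q}{\partial \hat{\beta}_1} = 0, \quad \frac{\partial q}{\partial \hat{\beta}_2} = 0$$
This results in the estimators
$$\hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
and
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} = \frac{1}{n}\left(\sum y_i - \hat{\beta}_2 \sum x_i\right)$$
If we define
$$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2, \quad S_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2, \quad S_{xy} = \sum xy - \frac{1}{n}\sum x \sum y,$$
we can simply write
$$\hat{\beta}_2 = \frac{S_{xy}}{S_{xx}}$$
The line of best fit can roughly be estimated from a scatter diagram plotted using the paired
sample data $(x, y)$. The sample regression line can be used to predict the value of y given
the value of x.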
Example 8.1
The following pairs of values (x, y) are given by
X 5 10 15 20 25
Y 22 32 38 59 67
(a) Plot a scatter diagram and estimate the line of best fit.
(b) Obtain the regression line by using least squares method.
(c) Predict the value of y when x = 110.
Solution
(a) A scatter diagram and the estimated line of best fit are shown in the figure below.
[Figure: scatter diagram of y against x with the estimated line of best fit; x-axis from 0 to 25, y-axis from 0 to 80.]
(b) The summary data can be obtained from the following table
         x      y      xy      x^2     y^2
         5      22     110     25      484
         10     32     320     100     1024
         15     38     570     225     1444
         20     59     1180    400     3481
         25     67     1675    625     4489
Total    75     218    3855    1375    10922
From the table we find that
$$\sum x = 75, \quad \sum y = 218, \quad \sum xy = 3855, \quad \sum x^2 = 1375, \quad \sum y^2 = 10922$$
Now,
$$S_{xy} = \sum xy - \frac{1}{n}\sum x \sum y = 3855 - \frac{1}{5}(75)(218) = 585$$
$$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 1375 - \frac{1}{5}(75)^2 = 250$$
$$S_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2 = 10922 - \frac{1}{5}(218)^2 = 1417.2$$
Then,
$$\hat{\beta}_2 = \frac{S_{xy}}{S_{xx}} = \frac{585}{250} = 2.34, \quad \hat{\beta}_1 = \frac{1}{n}\left(\sum y - \hat{\beta}_2 \sum x\right) = \frac{1}{5}(218 - 2.34 \times 75) = 8.5$$
Thus the required line of best fit is $\hat{y} = 8.5 + 2.34x$

(c) If $x = 110$, then the predicted value of y is $\hat{y} = 8.5 + 2.34 \times 110 = 265.9$
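The least squares estimates of Example 8.1 can be verified with the following Python sketch. The function name `least_squares` is illustrative; it uses the $S_{xx}$ and $S_{xy}$ shortcut formulas of this section.

```python
def least_squares(xs, ys):
    """Intercept and slope of the least squares line via S_xy / S_xx."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    slope = sxy / sxx
    intercept = (sum(ys) - slope * sum(xs)) / n
    return intercept, slope


xs = [5, 10, 15, 20, 25]
ys = [22, 32, 38, 59, 67]

b1, b2 = least_squares(xs, ys)      # b1 estimates beta_1, b2 estimates beta_2
y_hat = b1 + b2 * 110               # predicted y at x = 110
print(round(b1, 2), round(b2, 2), round(y_hat, 1))
```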
8.3 Sampling Distributions for Estimators

8.3.1 Distributions for u and y
The disturbance term u is assumed to be normally distributed with mean zero and
variance $\sigma^2$. Thus
$u \sim N(0, \sigma^2)$. The dependent variable y also has a normal distribution with variance $\sigma^2$,
but its mean is $\beta_1 + \beta_2 x$ since x is a non-random (fixed) variable.
8.3.2 Distributions for $\hat{\beta}_1$ and $\hat{\beta}_2$

If the underlying assumptions concerning the distribution of u hold, then it can be
shown that $\hat{\beta}_1$ has a normal distribution with mean $\beta_1$ and variance $\dfrac{\sigma^2 \sum x^2}{n S_{xx}}$.
Similarly, the distribution of $\hat{\beta}_2$ is normal with mean $\beta_2$ and variance $\dfrac{\sigma^2}{S_{xx}}$, where $\sigma^2$
is the variance of u.
Thus,
$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2 \sum x^2}{n S_{xx}}\right) \quad\text{and}\quad \hat{\beta}_2 \sim N\left(\beta_2, \frac{\sigma^2}{S_{xx}}\right).$$
We cannot directly compute the value of $\sigma^2$, and thus in order to proceed with our
analysis, we have to estimate its value. An unbiased estimator of $\sigma^2$ is $\hat{\sigma}^2$ such that
$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2} = \frac{RSS}{n - 2}, \quad\text{where}\quad \sum e_i^2 = S_{yy} - \hat{\beta}_2 S_{xy}.$$
This implies that
$$\hat{\sigma}^2 = \frac{1}{n - 2}\left(S_{yy} - \hat{\beta}_2 S_{xy}\right).$$
With $\sigma^2$ estimated, the standardized estimators follow the t-distribution:
$$\frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t(n - 2) \quad\text{and}\quad \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)} \sim t(n - 2)$$
where
$$SE(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2 \sum x^2}{n S_{xx}}} \quad\text{and}\quad SE(\hat{\beta}_2) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$
8.4 Estimation of $\beta_1$ and $\beta_2$

From the sampling distributions, we have in general the $(1 - \alpha)100\%$ confidence interval
estimate of $\beta_1$ as $\hat{\beta}_1 \pm t_{\alpha/2}(n - 2)\,SE(\hat{\beta}_1)$ and that of $\beta_2$ as $\hat{\beta}_2 \pm t_{\alpha/2}(n - 2)\,SE(\hat{\beta}_2)$.
Example 8.2
Construct 95% confidence interval estimates of $\beta_1$ and $\beta_2$ using the data given in
Example 8.1.
Solution
Recall that $S_{xx} = 250$, $S_{yy} = 1417.2$, $S_{xy} = 585$, $\hat{\beta}_1 = 8.5$ and $\hat{\beta}_2 = 2.34$
We first compute $\hat{\sigma}^2$ as follows
$$\hat{\sigma}^2 = \frac{1}{n - 2}\left(S_{yy} - \hat{\beta}_2 S_{xy}\right) = \frac{1}{3}(1417.2 - 2.34 \times 585) = 16.1$$
It implies that
$$SE(\hat{\beta}_1) = \sqrt{\frac{16.1 \times 1375}{5 \times 250}} = 4.21 \quad\text{and}\quad SE(\hat{\beta}_2) = \sqrt{\frac{16.1}{250}} = 0.254$$
We also have $t_{\alpha/2}(n - 2) = t_{0.025}(3) = 3.182$.
Therefore, the 95% confidence interval estimate of $\beta_1$ is
$$8.5 \pm 3.182 \times 4.21 = 8.5 \pm 13.4$$
and that of $\beta_2$ is
$$2.34 \pm 3.182 \times 0.254 = 2.34 \pm 0.81$$
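The standard errors and interval half-widths in Example 8.2 can be checked with the sketch below. The variable names are illustrative, the summary values are those computed from the data of Example 8.1, and the critical value 3.182 is the t-table value quoted in the example.

```python
from math import sqrt

# Summary values from the worked example (n = 5 pairs)
n, sum_x2 = 5, 1375
sxx, syy, sxy = 250.0, 1417.2, 585.0
b1, b2 = 8.5, 2.34                  # least squares estimates
T_CRIT = 3.182                      # t_{0.025}(3), table value

sigma2_hat = (syy - b2 * sxy) / (n - 2)          # unbiased estimate of sigma^2
se_b1 = sqrt(sigma2_hat * sum_x2 / (n * sxx))    # standard error of the intercept
se_b2 = sqrt(sigma2_hat / sxx)                   # standard error of the slope

ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_b2 = (b2 - T_CRIT * se_b2, b2 + T_CRIT * se_b2)
print(round(sigma2_hat, 1), round(se_b1, 2), round(se_b2, 3))
print([round(v, 2) for v in ci_b1], [round(v, 2) for v in ci_b2])
```

The interval for the intercept includes zero, while the interval for the slope does not.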
8.5 Correlation Analysis

8.5.1 Correlation Coefficient
In this analysis we simply check how strongly the variables (dependent and independent)
are related. The analysis involves computing the correlation coefficient (r), whose value
indicates the strength of the linear relationship. This coefficient is computed using the
formula
$$r = \frac{\mathrm{cov}(x, y)}{SD(x)\,SD(y)} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad -1 \le r \le 1$$
Positive values of r indicate a positive linear relationship and negative values
indicate a negative relationship. The relationship is called perfect if $r = \pm 1$, and when
$r = 0$ we say there is no linear relationship. The relationship is described as very strong
when $0.8 \le |r| < 1$ and as strong when $0.6 \le |r| < 0.8$. The relationship is weak when $|r| \le 0.3$,
and so on.
8.5.2 Coefficient of determination ($r^2$)
This is the square of the correlation coefficient. It is normally expressed as a percentage and
can be interpreted as the percentage of the variation in the dependent variable (y) that is
explained by its linear relationship with the independent/explanatory variable (x).
Example 8.3
Using the information given in Example 8.1, compute both r and $r^2$ and interpret the results.
Solution
(a) Using the different sums of squares, we have
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{585}{\sqrt{250 \times 1417.2}} = 0.9828.$$
This value suggests that there is a very strong positive linear relationship between the
variables x and y.
(b) The coefficient of determination is $r^2 = (0.9828)^2 = 0.966 = 96.6\%$.
This measure indicates that the explanatory variable (x) explains 96.6% of the variation
in the dependent variable (y), and the remaining 3.4% can be explained by other
variables not considered in this case.
8.5.3 Hypotheses testing for the regression coefficients

Generally we are testing the null hypothesis $H_0: \beta_j = c$ for $j = 1, 2$ against one of the
alternative hypotheses.
The test statistic is generally given by
$$T = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)}$$
where $\beta_j$ takes its hypothesized value c. This statistic has a t-distribution with
$n - 2$ degrees of freedom.
8.5.4 Testing for the Correlation

In this case we are testing for the significance of the relationship in general.
The null hypothesis here is $H_0: \rho = 0$ against the alternative $H_1: \rho \ne 0$.
The test statistic is again T such that
$$T = \sqrt{\frac{(n - 2)\,r^2}{1 - r^2}}$$
This also has a t-distribution with $n - 2$ degrees of freedom. Rejection of $H_0$ implies the
existence of a linear relationship between x and y. Otherwise, there is insufficient
evidence of a linear relationship.
Example 8.4
Use the data given in Example 8.1 to test $H_0: \rho = 0$ against $H_1: \rho \ne 0$ at the 5% level of
significance.
Solution
The test statistic is T such that
$$T = \sqrt{\frac{(n - 2)\,r^2}{1 - r^2}} = \sqrt{\frac{3 \times 0.966}{1 - 0.966}} = 9.23$$
The critical values at the 5% level are $\pm t_{0.025}(3) = \pm 3.182$. Clearly, $H_0$ is rejected at the 5%
level.
This implies that there is a significant linear relationship between the variables.
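The correlation coefficient, the coefficient of determination and the test statistic for the data of Example 8.1 can be computed as in the sketch below. The function names are illustrative. The sketch keeps full precision for $r^2$, so it gives a statistic of about 9.22 instead of the 9.23 obtained from the rounded value 0.966; the decision is unaffected.

```python
from math import sqrt


def correlation(sxy, sxx, syy):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """T statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


T_CRIT = 3.182                      # t_{0.025}(3), table value

r = correlation(sxy=585.0, sxx=250.0, syy=1417.2)
r_squared = r * r
t = correlation_t(r, n=5)
reject = abs(t) > T_CRIT
print(round(r, 4), round(r_squared, 3), round(t, 2), reject)
```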
Exercises 8
1. Various doses of a poisonous substance were given to groups of 25 mice and the
following results were recorded:
Dose (mg), x      Number of deaths, y

4                 1
6                 3
8                 6
10                8
12                14
14                16
16                20
(a) Find the equation of the least squares line to fit these data.
(b) Estimate the number of deaths in a group of 25 mice that received a 7-milligram
dose of this poison.
(c) Compute the correlation coefficient as well as the coefficient of determination
and comment on your results.
(d) Compute 90% confidence interval estimates for $\beta_1$ and $\beta_2$.
(e) Use the data to test $H_0: \rho = 0$ against $H_1: \rho \ne 0$ at the 5% level of significance.
CHAPTER NINE
ANALYSIS OF VARIANCE
9.1 Introduction
In hypothesis testing, we tested values assigned to a single population, as
well as the difference between the means of two independent populations. In this
chapter we give a general test procedure for the equality of two or more
population means. The general test discussed here is called the analysis of variance, or
simply ANOVA. Analysis of variance is classified into two procedures: One-Way
analysis of variance and Two-Way analysis of variance. Each of these procedures
is discussed below.
9.2 One-Way Analysis of Variance

In this case we consider random samples, each of size $n$, taken from k independent
populations. The null hypothesis to be tested is $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ against the alternative
that the means are not all equal. Writing $\alpha_i$ for the effect of the $i$th population (the deviation
of $\mu_i$ from the overall mean), we can state these hypotheses as
$H_0: \alpha_i = 0$ for all $i = 1, \ldots, k$ against $H_1: \alpha_i \ne 0$ for at least one $i$
We are going to derive a suitable F-statistic for this test.
Suppose that samples, each of size n, are taken from k populations as follows
Population 1 : $x_{11}, x_{12}, \ldots, x_{1n}$
Population 2 : $x_{21}, x_{22}, \ldots, x_{2n}$
...
Population k : $x_{k1}, x_{k2}, \ldots, x_{kn}$
Let
$\bar{x}_{i.}$ = mean of the sample from the $i$th population
$\bar{x}_{..}$ = mean of all observations
Define the following sums of squares by
$$\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{..})^2 = n\sum_{i=1}^{k}(\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i.})^2$$
or
SST = SSTr + SSE
where
SST = Total sum of squares
SSTr = Treatment sum of squares
SSE = Error sum of squares
Recall that $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$ is an unbiased estimator for $\sigma^2$ and that
$$\frac{(n - 1)s^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n - 1)$$
Then, for k populations with n observations each, we have
$$\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i.})^2 \sim \chi^2\big(k(n - 1)\big)$$
and hence the mean square error is defined by
$$MSE = \frac{SSE}{k(n - 1)}$$
Similarly, when $H_0$ is true, it follows that
$$\frac{n}{\sigma^2}\sum_{i=1}^{k}(\bar{x}_{i.} - \bar{x}_{..})^2 \sim \chi^2(k - 1)$$
and hence the treatment mean square is given by
$$MSTr = \frac{SSTr}{k - 1}$$
Finally, we obtain F as the ratio of MSTr to MSE.
Thus,
$$F = \frac{MSTr}{MSE} \sim F\big(k - 1, k(n - 1)\big)$$
The sums of squares can easily be computed as follows
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^2 - \frac{1}{kn}T_{..}^2$$
$$SSTr = \frac{1}{n}\sum_{i=1}^{k} T_{i.}^2 - \frac{1}{kn}T_{..}^2$$
where
$T_{..}$ = total value of all observations (grand total)
$T_{i.}$ = total for the $i$th population

The following table summarizes the calculations
Populations      Observations                 Subtotal

Population 1     $x_{11}, \ldots, x_{1n}$     $T_{1.}$
Population 2     $x_{21}, \ldots, x_{2n}$     $T_{2.}$
...              ...                          ...
Population k     $x_{k1}, \ldots, x_{kn}$     $T_{k.}$
Grand total                                   $T_{..}$
The above calculations are usually presented in the form of a table known as the Analysis of
Variance (ANOVA) table, whose general layout is shown below
Source of Variation    Degrees of freedom    Sum of Squares    Mean Square    F-value

Treatment              $k - 1$               SSTr              MSTr           MSTr / MSE
Error                  $k(n - 1)$            SSE               MSE
Total                  $kn - 1$              SST
Example 9.1
Scores of five best candidates in mathematics from three different schools at the same
level are given below
School A: 77, 81, 71, 76, 80
School B: 72, 58, 74, 66, 70
School C: 76, 85, 82, 80, 77
Perform a suitable test to check if the mean performances of these three schools are equal
or significantly different from one another.
Solution
For the above populations, we compute the sum of squared observations as
$$\sum_{i=1}^{3}\sum_{j=1}^{5}x_{ij}^2 = 77^2 + 81^2 + \dots + 80^2 + 77^2 = 85041$$
Other summations are summarized in the following table

Populations    Observations          Sub-total
School A       77, 81, 71, 76, 80    385
School B       72, 58, 74, 66, 70    340
School C       76, 85, 82, 80, 77    400
Grand total                          1125

Then,
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n}x_{ij}^2 - \frac{1}{kn}T_{..}^2 = 85041 - \frac{1}{15}(1125)^2 = 666$$
$$SSTr = \frac{1}{n}\sum_{i=1}^{k}T_{i.}^2 - \frac{1}{kn}T_{..}^2 = \frac{1}{5}\left(385^2 + 340^2 + 400^2\right) - \frac{1}{15}(1125)^2 = 390$$
It follows that
$$SSE = SST - SSTr = 666 - 390 = 276$$
The rest of the calculations are summarized in the following table

Source of Variation    Degrees of freedom    Sum of Squares    Mean Square    F-value
Treatment              2                     390               195            8.48
Error                  12                    276               23
Total                  14                    666
The calculated value of the F-statistic is 8.48.

The critical value at the 5% level is $F_{0.05}(2, 12) = 3.89$.
Since $8.48 > 3.89$, the value of the F-statistic falls within the rejection region. Hence
the null hypothesis that the mean performances are equal is rejected, and we conclude that
the mean performances of the schools are different.
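The arithmetic of Example 9.1 can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `balanced_one_way_anova` and the dictionary keys are illustrative choices, and it assumes every sample has the same size n so that the totals formulas above apply.

```python
def balanced_one_way_anova(samples):
    """One-way ANOVA for k samples of equal size n, using the totals formulas."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")

    grand_total = sum(sum(s) for s in samples)      # T..
    correction = grand_total ** 2 / (k * n)         # T..^2 / (kn)

    sst = sum(x * x for s in samples for x in s) - correction
    sstr = sum(sum(s) ** 2 for s in samples) / n - correction
    sse = sst - sstr

    mstr = sstr / (k - 1)          # treatment mean square
    mse = sse / (k * (n - 1))      # error mean square
    return {"SST": sst, "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}


# Data of Example 9.1
schools = [
    [77, 81, 71, 76, 80],  # School A
    [72, 58, 74, 66, 70],  # School B
    [76, 85, 82, 80, 77],  # School C
]
result = balanced_one_way_anova(schools)
print(result)
```

The F value returned is then compared by hand with the tabulated critical value, exactly as in the solution above.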
9.3 One-Way Classification for Populations with different sample sizes

If $k$ populations are involved with varying sample sizes $n_1, \dots, n_k$, then the formulas for
the sums of squares also change. Let $N = n_1 + n_2 + \dots + n_k$.
Then,
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}^2 - \frac{1}{N}T_{..}^2$$
$$SSTr = \sum_{i=1}^{k}\frac{T_{i.}^2}{n_i} - \frac{1}{N}T_{..}^2$$
The ANOVA table becomes

Source of Variation    Degrees of freedom    Sum of Squares    Mean Square    F-value
Treatment              $k-1$                 SSTr              MSTr           MSTr/MSE
Error                  $N-k$                 SSE               MSE
Total                  $N-1$                 SST
Example 9.2
A consumer wishes to test the accuracy of the thermostats of three different kinds of
electric irons set at $240\,^{\circ}\mathrm{C}$, and obtains the following temperature readings.
Iron X: 251, 246, 238, 245
Iron Y: 242, 250, 248
Iron Z: 240, 248, 247, 246, 247
Test at 5% level that the three means of thermostatic accuracies are equal against the
alternative hypothesis that the means are different.
Solution
We first compute the total sum of squared observations as follows
$$\sum_{i=1}^{3}\sum_{j=1}^{n_i}x_{ij}^2 = 251^2 + 246^2 + \dots + 246^2 + 247^2 = 724392$$
Other summations are summarized in the following table

Populations    Observations               Sample size    Sub-total
Iron X         251, 246, 238, 245         4              980
Iron Y         242, 250, 248              3              740
Iron Z         240, 248, 247, 246, 247    5              1228
Grand total                               12             2948

The total number of observations is $N = 12$.

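Then, we compute the sums of squares as
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}^2 - \frac{1}{N}T_{..}^2 = 724392 - \frac{1}{12}(2948)^2 = 166.67$$
$$SSTr = \sum_{i=1}^{k}\frac{T_{i.}^2}{n_i} - \frac{1}{N}T_{..}^2 = \left(\frac{980^2}{4} + \frac{740^2}{3} + \frac{1228^2}{5}\right) - \frac{1}{12}(2948)^2 = 4.8$$
It follows that
$$SSE = SST - SSTr = 166.67 - 4.8 = 161.87$$
The ANOVA table is therefore given as follows

Source of Variation    Degrees of freedom    Sum of Squares    Mean Square    F-value
Treatment              2                     4.8               2.4            0.13
Error                  9                     161.87            17.99
Total                  11                    166.67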
The tabulated critical value is $F_{0.05}(2, 9) = 4.26$.

Since the calculated value is less than the critical value, we cannot reject the null
hypothesis that the three means of the thermostatic accuracies are equal.
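As a cross-check on Example 9.2, the same sums of squares can be obtained directly from deviations about the sample means instead of the totals formulas. The sketch below is illustrative only; the function name `one_way_anova_from_means` is an assumption of this example, and it works for unequal sample sizes.

```python
def one_way_anova_from_means(samples):
    """Return (SSTr, SSE, F) computed from deviations about the means."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)                 # N
    grand_mean = sum(sum(s) for s in samples) / n_total

    sstr = 0.0
    sse = 0.0
    for s in samples:
        mean_i = sum(s) / len(s)
        sstr += len(s) * (mean_i - grand_mean) ** 2        # between-sample part
        sse += sum((x - mean_i) ** 2 for x in s)           # within-sample part

    f_value = (sstr / (k - 1)) / (sse / (n_total - k))
    return sstr, sse, f_value


# Data of Example 9.2
irons = [
    [251, 246, 238, 245],        # Iron X
    [242, 250, 248],             # Iron Y
    [240, 248, 247, 246, 247],   # Iron Z
]
sstr, sse, f_value = one_way_anova_from_means(irons)
print(round(sstr, 2), round(sse, 2), round(f_value, 2))
```

Both routes give the same SSTr and SSE up to rounding, which is a useful check when the totals formulas are evaluated by hand.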
9.4 Experimental Design

9.4.1 Introduction
In Example 9.1, the analysis of variance led to the conclusion that the performances
of the schools were significantly different. Remember, however, that in that test we
considered only one factor, which was the students’ ability. We did not consider, for
instance, the study environment, teachers’ competence and so on. Experimental design
means designing statistical experiments so that they account not only for the factors of
interest but also for the variation attributed to chance.
The experiments are planned in such a way that they can be analyzed using the two-way
analysis of variance technique. The two factors considered here are the treatments and the
blocks. In our students’ performance example, schools might be the treatments and students
the blocks. The necessary condition for two-way analysis of variance is that
samples of the same size are taken from the $k$ populations.
9.4.2 Two-Way Analysis of Variance without Interaction

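The important assumption here is that treatments and blocks are two independent
variables. The data are laid out as follows

                 Block 1     Block 2     $\cdots$    Block $n$
Treatment 1      $x_{11}$    $x_{12}$    $\cdots$    $x_{1n}$
Treatment 2      $x_{21}$    $x_{22}$    $\cdots$    $x_{2n}$
$\vdots$         $\vdots$    $\vdots$                $\vdots$
Treatment $k$    $x_{k1}$    $x_{k2}$    $\cdots$    $x_{kn}$

We test two null hypotheses here;

$H_0: \alpha_i = 0$, $i = 1, \dots, k$ (Treatments)
$H_0': \beta_j = 0$, $j = 1, \dots, n$ (Blocks)
against the alternative hypotheses $H_1: \alpha_i \neq 0$ for at least one $i$ and $H_1': \beta_j \neq 0$ for at least one $j$.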
We also have two F-statistics; one for treatments and the other for blocks. These statistics
are computed as follows
$$F_{Tr} = \frac{MSTr}{MSE} \sim F\big(k-1,\ (n-1)(k-1)\big)$$
And
$$F_{B} = \frac{MSB}{MSE} \sim F\big(n-1,\ (n-1)(k-1)\big)$$
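The identity for the sums of squares is
$$SST = SSTr + SSB + SSE$$

Where,
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n}x_{ij}^2 - \frac{1}{kn}T_{..}^2$$
$$SSTr = \frac{1}{n}\sum_{i=1}^{k}T_{i.}^2 - \frac{1}{kn}T_{..}^2$$
$$SSB = \frac{1}{k}\sum_{j=1}^{n}T_{.j}^2 - \frac{1}{kn}T_{..}^2$$
Here $T_{.j}$ is the total for the $j$th block.
And hence
$$SSE = SST - SSTr - SSB$$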
The ANOVA table takes the following shape

Source of Variation    Degrees of freedom    Sum of squares    Mean squares                      F-statistics
Treatments             $k-1$                 SSTr              $MSTr = \frac{SSTr}{k-1}$         $F_{Tr} = \frac{MSTr}{MSE}$
Blocks                 $n-1$                 SSB               $MSB = \frac{SSB}{n-1}$           $F_{B} = \frac{MSB}{MSE}$
Error                  $(k-1)(n-1)$          SSE               $MSE = \frac{SSE}{(k-1)(n-1)}$
Total                  $kn-1$                SST
Example 9.3
The driving times to work (in minutes) taken by a person from Monday to Friday using
four different routes were recorded as follows;
Mon Tue Wed Thu Fri
Route 1 22 26 25 25 31
Route 2 25 27 28 26 29
Route 3 26 29 33 30 33
Route 4 26 28 27 30 30
Test at the 5% level whether or not the mean driving times among the routes are significantly
different.
Also test whether the mean driving times among the days are significantly different.
Solution
In this case we have four routes (treatments) and five days (blocks). That means we have
k=4 and n=5.
We compute the total sum of squared observations as follows
$$\sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}^2 = 22^2 + 26^2 + \dots + 30^2 + 30^2 = 15610$$
The treatment totals, block totals and grand total are summarized in the following table

Treatments    Mon    Tue    Wed    Thu    Fri    Total
Route 1       22     26     25     25     31     129
Route 2       25     27     28     26     29     135
Route 3       26     29     33     30     33     151
Route 4       26     28     27     30     30     141
Total         99     110    113    111    123    556
The required sums of squares are computed as follows
$$SST = \sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}^2 - \frac{1}{kn}T_{..}^2 = 15610 - \frac{1}{20}(556)^2 = 153.2$$
$$SSTr = \frac{1}{5}\sum_{i=1}^{4}T_{i.}^2 - \frac{1}{kn}T_{..}^2 = \frac{1}{5}\left(129^2 + 135^2 + 151^2 + 141^2\right) - \frac{1}{20}(556)^2 = 52.8$$
$$SSB = \frac{1}{4}\sum_{j=1}^{5}T_{.j}^2 - \frac{1}{kn}T_{..}^2 = \frac{1}{4}\left(99^2 + 110^2 + 113^2 + 111^2 + 123^2\right) - \frac{1}{20}(556)^2 = 73.2$$
It follows that
$$SSE = 153.2 - 52.8 - 73.2 = 27.2$$
The ANOVA table is therefore given by

Source of Variation    Degrees of freedom    Sum of squares    Mean squares    F-statistics
Treatments             3                     52.8              17.6            7.76
Blocks                 4                     73.2              18.3            8.07
Error                  12                    27.2              2.27
Total                  19                    153.2

The critical value for testing the treatment means is $F_{0.05}(3, 12) = 3.49$, and the critical
value for testing the block means is $F_{0.05}(4, 12) = 3.26$.
Decisions
Since the calculated F-value for routes (treatments) is greater than the critical value, we
reject the null hypothesis that the mean driving times among the routes are equal, and
conclude that the mean driving times among different routes are significantly different.
Similarly, since the calculated F-value for days (blocks) is greater than the critical value, we
reject the null hypothesis that the mean driving times among the days are equal, and
conclude that the mean driving times among different days are also significantly
different.
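The two-way calculation of Example 9.3 follows the same pattern with an extra sum over the block totals. The sketch below is illustrative only; the function name `two_way_anova` and the returned keys are assumptions of this example, with rows taken as treatments and columns as blocks.

```python
def two_way_anova(table):
    """Two-way ANOVA without interaction; table[i][j] is treatment i, block j."""
    k = len(table)          # number of treatments (rows)
    n = len(table[0])       # number of blocks (columns)

    grand_total = sum(sum(row) for row in table)             # T..
    correction = grand_total ** 2 / (k * n)                  # T..^2 / (kn)

    sst = sum(x * x for row in table for x in row) - correction
    sstr = sum(sum(row) ** 2 for row in table) / n - correction
    ssb = sum(sum(col) ** 2 for col in zip(*table)) / k - correction
    sse = sst - sstr - ssb

    mse = sse / ((k - 1) * (n - 1))
    return {"SST": sst, "SSTr": sstr, "SSB": ssb, "SSE": sse,
            "F_Tr": (sstr / (k - 1)) / mse,
            "F_B": (ssb / (n - 1)) / mse}


# Data of Example 9.3: rows are routes, columns are Monday to Friday
driving_times = [
    [22, 26, 25, 25, 31],
    [25, 27, 28, 26, 29],
    [26, 29, 33, 30, 33],
    [26, 28, 27, 30, 30],
]
anova = two_way_anova(driving_times)
print({name: round(value, 2) for name, value in anova.items()})
```

Because the script divides by the unrounded mean square error, its F values can differ in the second decimal place from values obtained with MSE rounded to 2.27.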
Exercises 9
1. The following are the numbers of mistakes made in five successive weeks by four
technicians working for a medical laboratory:
Technician I: 13, 16, 12, 14, 15
Technician II: 15, 16, 11, 19, 15
Technician III: 13, 18, 16, 14, 18
Technician IV: 18, 10, 14, 15, 12
Test at 0.05 level of significance whether the differences among the four sample
means can be attributed to chance.
2. An experiment was performed to judge the effect of four different fuels and three
different types of launchers on the range of a certain rocket. Test, on the basis of the
following ranges in miles, whether there is a significant effect due to differences in fuels.
Test also whether there is a significant effect due to differences in launchers.
Fuel 1 Fuel 2 Fuel 3 Fuel 4

Launcher X 45.9 57.6 52.2 41.7
Launcher Y 46.0 51.0 50.1 38.8
Launcher Z 45.7 56.9 55.3 48.1
Use the level of significance $\alpha = 0.01$.
CHAPTER TEN
QUALITY CONTROL
10.1 Introduction
Quality control is designed to maintain quality in production processes. It is an important
ingredient in the development of Japan’s industry and economy. It is now receiving
increasing attention as a management tool in which important characteristics of products
are observed, assessed and compared with some type of standard. The various
procedures in quality control involve considerable use of sampling procedures and
statistical principles. It is clear that effective quality control programs enhance the
quality of the product being produced and increase profits.
10.2 The control chart

The purpose of a control chart is to determine if the performance of a process is
maintaining an acceptable level of quality. It is expected that any process will experience
natural variability due to essentially unimportant and uncontrollable sources of variation.
On the other hand, a process may experience more serious types of variability in key
performance measures.
These sources of variability may arise from one of several types of nonrandom
“assignable causes” such as operator errors or improperly adjusted dials on a machine. A
process operating in this state is called out of control. A process experiencing only
chance variation is said to be in statistical control. A successful production process may
operate in an in-control state for a long period of time. It is presumed that during this
period the process produces acceptable products; however, there may be either a gradual
or sudden “shift” that requires detection.
A control chart is a device intended to detect the nonrandom or out-of-control state of
a process.
Some type of quality characteristic must be under consideration, and the units of the
process are sampled over time. Examples of characteristics include the diameter of engine
cylinders, the circumference of engine bearings and so on.
A typical control chart takes the form shown below;
Figure 10.1: Typical control chart
The line shown is called the centerline, which represents the expected value of the
characteristic when the process is in control. The plotted points represent the sample
averages of a characteristic, with samples taken over time. The upper and lower control
limits are chosen in such a way that all sample points should fall within these
boundaries for the process to be in control. If any point falls outside these boundaries, it
suggests that the process is out of control. A nonrandom pattern of points within the
boundaries may also be considered suspicious and an indication that the process
requires appropriate corrective action.
10.3 Nature of the Control Limits

The fundamental ideas on which control charts are based are similar in structure to
hypothesis testing. Control limits are established to control the probability of making the
error of concluding that the process is out of control when in fact it is not. This
corresponds to the probability of making a type I error if we were testing the null
hypothesis that the process is in control. Therefore, the choice of control limits is similar
to the choice of a critical region. The idea of sample size and the notion of the power of the
test are also similar to those of hypothesis testing. The latitude given by the
control limits obviously must depend in some sense on the process variability.
10.4 Purpose of the Control Chart

One obvious purpose of the control chart is mere surveillance of the process, that is, to
determine if changes need to be made. Constant systematic gathering of data often allows
management to assess process capability. Quality characteristics monitored by control charts
fall generally into two categories; variables and attributes.
10.5 Control Chart for Variables

We illustrate this idea by considering the following example.
Example 10.1
Quality control charts are to be used on a process manufacturing a certain engine
part. Suppose the process mean is $\mu = 50$ mm and the standard deviation is
$\sigma = 0.01$ mm. Suppose also that groups of five parts are sampled every hour and the
values of the sample mean $\bar{X}$ are recorded and plotted. Based on the standard deviation
of the random variable $\bar{X}$, set the control limits of the $\bar{X}$-chart.
Solution
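Assume that the measurements on the engine parts are normally distributed. Then the sample
mean has the distribution
$$\bar{X} \sim N\left(50,\ \left(\frac{0.01}{\sqrt{5}}\right)^2\right)$$
that is, mean 50 and standard deviation $0.01/\sqrt{5} \approx 0.00447$.
Then, $(1-\alpha)100\%$ of the $\bar{X}$-values fall inside the limits when the process is in control.
The required limits are given by
$$LCL = \mu - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 50 - Z_{\alpha/2}\frac{0.01}{\sqrt{5}} = 50 - Z_{\alpha/2}(0.00447)$$
And,
$$UCL = \mu + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 50 + Z_{\alpha/2}\frac{0.01}{\sqrt{5}} = 50 + Z_{\alpha/2}(0.00447)$$
In most practical situations, control analysts use “three-sigma” limits, meaning that they
use $Z_{\alpha/2} = 3$. Therefore, we have
$$LCL = 50 - 3(0.00447) = 49.9866 \quad \text{and} \quad UCL = 50 + 3(0.00447) = 50.0134$$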
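The limits of Example 10.1 can be reproduced in a few lines. This is a minimal sketch; the function name `xbar_limits` and its parameter names are illustrative, and the process mean and standard deviation are assumed known, as in the example.

```python
import math


def xbar_limits(mu, sigma, n, z=3.0):
    """Control limits for the sample mean when mu and sigma are known."""
    half_width = z * sigma / math.sqrt(n)   # z standard deviations of the mean
    return mu - half_width, mu + half_width


# Example 10.1: mu = 50 mm, sigma = 0.01 mm, samples of five parts
lcl, ucl = xbar_limits(mu=50.0, sigma=0.01, n=5)
print(round(lcl, 4), round(ucl, 4))
```

Passing a different `z` gives limits for another choice of $Z_{\alpha/2}$.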
10.6 $\bar{X}$-Chart with Estimated Parameters

In the previous example we assumed that $\mu$ and $\sigma$ are known; however, in many practical
situations these parameters are unknown. In that case, estimates are obtained from data
taken when the process is in control. Typically, the estimates are determined during a
period in which background or start-up information is gathered.
A basis for rational subgroups is chosen and the data are gathered with samples of size $n$
in each subgroup. The sample sizes are usually small (4, 5 or 6) and $k$ samples are taken,
with $k \geq 20$. During the period in which the process is assumed to be in control, the user
establishes the estimates of $\mu$ and $\sigma$ on which the control chart is based. The important
information gathered during this period includes the sample means in the subgroups, the
overall mean, and the sample range in each subgroup.
For each sample we compute $\bar{X}_i$, $i = 1, \dots, k$, to form the sample points $\bar{X}_1, \dots, \bar{X}_k$, and the
overall sample mean is given by
$$\bar{\bar{X}} = \frac{1}{k}\sum_{i=1}^{k}\bar{X}_i$$
Where $\bar{\bar{X}}$ is the appropriate estimator for $\mu$.
In quality control applications it is often convenient to estimate $\sigma$ from the information
related to the ranges in the samples rather than the sample standard deviations.
For the $i$th sample, we define the range of the data by
$$R_i = X_{\max,i} - X_{\min,i}$$
The appropriate estimate of $\sigma$ is a function of the average range given by
$$\bar{R} = \frac{1}{k}\sum_{i=1}^{k}R_i$$
An estimate of $\sigma$, say $\hat{\sigma}$, is obtained from the formula
$$\hat{\sigma} = \frac{\bar{R}}{d_2}$$
Where $d_2$ is a constant depending on the sample size, $n$.
The use of the range to estimate $\sigma$ has its roots in quality control applications, since it can
be easily computed. Under the assumption of normality, we make use of a random
variable called the relative range, which is given by
$$W = \frac{R}{\sigma}$$
Its distribution depends only on the sample size $n$, and its expected value is $d_2$. That is
$$E(W) = \frac{E(R)}{\sigma} = d_2$$
This motivates the estimate $\hat{\sigma} = \bar{R}/d_2$.
It is known that the range produces an efficient estimator of $\sigma$ in relatively small
samples. Using the range method, the control chart parameters are given by
$$\text{Centerline} = \bar{\bar{X}}, \quad LCL = \bar{\bar{X}} - \frac{3\bar{R}}{d_2\sqrt{n}}, \quad UCL = \bar{\bar{X}} + \frac{3\bar{R}}{d_2\sqrt{n}}$$
By defining the quantity $A_2 = \frac{3}{d_2\sqrt{n}}$, we have
$$LCL = \bar{\bar{X}} - A_2\bar{R} \quad \text{and} \quad UCL = \bar{\bar{X}} + A_2\bar{R}$$

Where $A_2$ is a tabulated value which depends on the sample size.
10.7 R-Charts to Control Variation

It is also important for control to be applied to variability as well as to the centre of location. In
fact, many experts feel that control of the variability of the performance characteristic is
more important and should be established before the centre of location is considered. Process
variability can be controlled through the use of plots of the sample range. A plot over
time of the sample range is called an R-chart. The same general structure is used as in the
case of the $\bar{X}$-chart, with $\bar{R}$ being the centreline, and the control limits are obtained from an
estimate of the standard deviation of $R$.
The estimate of $\sigma_R$ is based on the distribution of the relative range, $W$, such that
$$W = \frac{R}{\sigma}$$
The standard deviation of $W$ is a known function of the sample size and is generally
denoted by $d_3$, such that $\sigma_R = \sigma d_3$.
Replacing $\sigma$ by $\hat{\sigma} = \bar{R}/d_2$, we have $\hat{\sigma}_R = \bar{R}\,\frac{d_3}{d_2}$.
Therefore, the quantities that define the R-chart are
$$LCL = \bar{R} - 3\bar{R}\frac{d_3}{d_2} = \bar{R}\left(1 - 3\frac{d_3}{d_2}\right) = \bar{R}D_3$$
$$UCL = \bar{R} + 3\bar{R}\frac{d_3}{d_2} = \bar{R}\left(1 + 3\frac{d_3}{d_2}\right) = \bar{R}D_4$$
The constants $D_3$ and $D_4$ are tabulated values which depend on the sample size; $D_3$ is
taken as zero whenever the expression defining it is negative.
Sample number    Observations    $\bar{X}_i$    $R_i$
1 1515 1518 1512 1498 1511 1510.8 20
2 1504 1511 1507 1499 1502 1504.6 12
3 1517 1513 1504 1521 1520 1515.0 17
4 1497 1503 1510 1508 1502 1504.0 13
5 1507 1502 1497 1509 1512 1505.4 15
6 1519 1522 1523 1517 1511 1518.4 12
7 1498 1497 1507 1511 1508 1504.2 14
8 1511 1518 1507 1503 1509 1509.6 15
9 1506 1503 1498 1508 1506 1504.2 10
10 1503 1506 1511 1501 1500 1504.2 11
11 1499 1503 1507 1503 1501 1502.6 8
12 1507 1503 1502 1500 1501 1502.6 7
13 1500 1506 1501 1498 1507 1502.4 9
14 1501 1509 1503 1508 1503 1504.8 8
15 1507 1508 1502 1509 1501 1505.4 8
16 1511 1509 1503 1510 1507 1508.0 8
17 1508 1511 1513 1509 1506 1509.4 7
18 1508 1509 1512 1515 1519 1512.6 11
19 1520 1517 1519 1522 1516 1518.8 6
20 1506 1511 1517 1516 1508 1511.6 11
21 1500 1498 1503 1504 1508 1502.6 10
22 1511 1514 1509 1508 1506 1509.6 8
23 1505 1508 1500 1509 1503 1505.0 9
24 1501 1498 1505 1502 1505 1502.2 7
25 1509 1511 1507 1500 1499 1505.2 12
Total 37683.2 268
Table 10.1: Data for Example 10.2
Example 10.2
A process manufacturing missile component parts is being controlled, with the
performance characteristic being the tensile strength in pounds per square inch. Samples
of size 5 each are taken every hour and 25 samples are reported.
The data are shown in the above table (sample means and sample ranges have already
been computed from the data and are shown in the last two columns). Construct (a) the R-chart
and (b) the $\bar{X}$-chart of the tensile strength.
Solution
(a) For the R-chart we proceed as follows,
The centreline is computed as
$$\bar{R} = \frac{1}{25}\sum_{i=1}^{25}R_i = \frac{268}{25} = 10.72$$
Using Table A.23 with $n = 5$ we find that $D_3 = 0$ and $D_4 = 2.114$. Using these
results, we construct the control limits as follows
$$LCL = \bar{R}D_3 = (10.72)(0) = 0, \qquad UCL = \bar{R}D_4 = (10.72)(2.114) = 22.6621$$
The R-chart with its limits is shown below;
Figure 10.2: R-chart for the tensile strength

(b) Similarly, for the $\bar{X}$-chart we have the following limits

The centreline is given by
$$\bar{\bar{X}} = \frac{1}{25}\sum_{i=1}^{25}\bar{X}_i = \frac{37683.2}{25} = 1507.328$$
For samples of size 5, Table A.23 gives $A_2 = 0.577$. The control limits are
$$LCL = \bar{\bar{X}} - A_2\bar{R} = 1507.328 - (0.577)(10.72) = 1501.1426$$
$$UCL = \bar{\bar{X}} + A_2\bar{R} = 1507.328 + (0.577)(10.72) = 1513.5134$$
The $\bar{X}$-chart is shown in the figure below
Figure 10.3: $\bar{X}$-chart for the tensile strength
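Once the limits are known, checking which plotted points fall outside them is mechanical. The sketch below uses the sample means and ranges of Table 10.1 together with the constants quoted in Example 10.2 for n = 5; the variable names are illustrative choices, not notation from the text.

```python
# Tabulated constants for n = 5, as quoted in Example 10.2
A2, D3, D4 = 0.577, 0.0, 2.114

# Sample means and ranges from the last two columns of Table 10.1
means = [1510.8, 1504.6, 1515.0, 1504.0, 1505.4, 1518.4, 1504.2, 1509.6,
         1504.2, 1504.2, 1502.6, 1502.6, 1502.4, 1504.8, 1505.4, 1508.0,
         1509.4, 1512.6, 1518.8, 1511.6, 1502.6, 1509.6, 1505.0, 1502.2,
         1505.2]
ranges = [20, 12, 17, 13, 15, 12, 14, 15, 10, 11, 8, 7, 9, 8, 8, 8, 7,
          11, 6, 11, 10, 8, 9, 7, 12]

grand_mean = sum(means) / len(means)     # centreline of the X-bar chart
mean_range = sum(ranges) / len(ranges)   # centreline of the R-chart

r_lcl, r_ucl = D3 * mean_range, D4 * mean_range
x_lcl, x_ucl = grand_mean - A2 * mean_range, grand_mean + A2 * mean_range

# Sample numbers (1-based) whose plotted point lies outside the limits
r_out = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]
x_out = [i + 1 for i, m in enumerate(means) if not x_lcl <= m <= x_ucl]

print("R-chart limits:", round(r_lcl, 4), round(r_ucl, 4), "outside:", r_out)
print("X-bar limits:  ", round(x_lcl, 4), round(x_ucl, 4), "outside:", x_out)
```

With these data every range lies inside the R-chart limits, while the means of samples 3, 6 and 19 exceed the upper limit of the $\bar{X}$-chart.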
10.8 $\bar{X}$- and S-Charts for Variables

The range is used as an efficient estimator of $\sigma$; however, the efficiency of $R$ decreases
as the sample size becomes large (say $n \geq 10$). In that case, the familiar statistic
$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}}$$
should be used in the control of both the mean and the variability.
We know that $s^2$ is an unbiased estimator of $\sigma^2$, but unfortunately $s$ is not unbiased for
$\sigma$. That means $E(s) \neq \sigma$.
However, if the observations are independent and normally distributed, then
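$$E(s) = c_4\sigma \quad \text{where} \quad c_4 = \left(\frac{2}{n-1}\right)^{1/2}\frac{\Gamma(n/2)}{\Gamma\big((n-1)/2\big)}$$
We also have
$$\operatorname{var}(s) = \sigma^2\left(1 - c_4^2\right) \quad \Rightarrow \quad \sigma_s = \sigma\sqrt{1 - c_4^2}$$
Therefore, if $\sigma$ is known, the control limits are computed by
$$LCL = c_4\sigma - 3\sigma\sqrt{1 - c_4^2} = \sigma\left(c_4 - 3\sqrt{1 - c_4^2}\right) = B_5\sigma$$
$$UCL = c_4\sigma + 3\sigma\sqrt{1 - c_4^2} = \sigma\left(c_4 + 3\sqrt{1 - c_4^2}\right) = B_6\sigma$$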
Where $B_5$ and $B_6$ are tabulated values depending on the sample size. However, in most
practical situations $\sigma$ is unknown and needs to be estimated. In this case, an unbiased
estimator is $\hat{\sigma}$, which is given by
$$\hat{\sigma} = \frac{\bar{s}}{c_4}$$
where $\bar{s} = \frac{1}{m}\sum_{i=1}^{m}s_i$, with $s_i$ being the standard deviation of the $i$th sample.
The control limits for the S-chart are given by
$$\text{centreline} = \bar{s}, \quad LCL = B_3\bar{s}, \quad UCL = B_4\bar{s}$$
Where $B_3 = 1 - \frac{3}{c_4}\sqrt{1 - c_4^2}$ and $B_4 = 1 + \frac{3}{c_4}\sqrt{1 - c_4^2}$ are tabulated values.
Similarly, for the $\bar{X}$-chart we have,
$$\text{centreline} = \bar{\bar{X}}, \quad LCL = \bar{\bar{X}} - A_3\bar{s}, \quad UCL = \bar{\bar{X}} + A_3\bar{s}$$
Where $A_3$ is a tabulated value depending on the sample size.
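The tabulated constants can also be computed directly when a table is not at hand. The sketch below evaluates $c_4$ from its gamma-function form and then $B_3$ and $B_4$ from the expressions above. Two points are assumptions of this example rather than statements from the text: $A_3$ is computed from the standard relation $A_3 = 3/(c_4\sqrt{n})$, and a negative $B_3$ is replaced by zero, as is usual for a lower limit on a standard deviation. The function name `s_chart_constants` is illustrative.

```python
import math


def s_chart_constants(n):
    """Return (c4, B3, B4, A3) for samples of size n."""
    c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
    spread = 3.0 * math.sqrt(1.0 - c4 ** 2) / c4
    b3 = max(0.0, 1.0 - spread)        # a negative lower factor is set to zero
    b4 = 1.0 + spread
    a3 = 3.0 / (c4 * math.sqrt(n))     # assumed standard relation for A3
    return c4, b3, b4, a3


c4, b3, b4, a3 = s_chart_constants(5)
print(round(c4, 4), round(b3, 3), round(b4, 3), round(a3, 3))
```

For n = 5 the computed values agree, to three decimal places, with the tabulated constants used in Example 10.3.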
Example 10.3
Containers are produced by a process where the volume of the containers is subject to
quality control. Twenty-five samples of size 5 each were used to establish the quality
control parameters. The results showed that $\sum_{i=1}^{25}\bar{X}_i = 1558.14$ and $\sum_{i=1}^{25}s_i = 0.9025$.
Construct the $\bar{X}$-chart and the s-chart for the volume of the containers.
Solution
(a) The $\bar{X}$-chart is constructed as follows;
The centreline is given by
$$\bar{\bar{X}} = \frac{1}{25}\sum_{i=1}^{25}\bar{X}_i = \frac{1558.14}{25} = 62.3256$$
We also compute the mean sample standard deviation, $\bar{s}$, as
$$\bar{s} = \frac{1}{25}\sum_{i=1}^{25}s_i = \frac{0.9025}{25} = 0.0361$$
From Table A.23 with $n = 5$, we have $A_3 = 1.427$, $B_3 = 0$, $B_4 = 2.089$.

The control limits are therefore given by
$$LCL = \bar{\bar{X}} - A_3\bar{s} = 62.3256 - (1.427)(0.0361) = 62.2741$$
$$UCL = \bar{\bar{X}} + A_3\bar{s} = 62.3256 + (1.427)(0.0361) = 62.3771$$
The $\bar{X}$-chart for the volume of the containers is given in the following figure
Figure 10.4: $\bar{X}$-chart for the volume of containers
(b) The control limits for the s-chart are given by

$$\text{centreline} = \bar{s} = 0.0361$$
$$LCL = B_3\bar{s} = (0)(0.0361) = 0, \qquad UCL = B_4\bar{s} = (2.089)(0.0361) = 0.0754$$
The s-chart is shown in the following figure

Figure 10.5: s-chart for the volume of containers
10.9 Control Charts for Attributes

Many industrial applications of quality control require that the quality characteristic
indicate no more than whether or not the item “conforms”. In such cases there is no
continuous measurement that is crucial to the performance of the process. This type of
sampling is called sampling for attributes. A good example of this sampling is the
performance of light bulbs, where each bulb is either defective or good. Other examples
include manufactured metal pieces, which may contain deformities, and liquid containers,
which may leak.
10.9.1 The p-Chart for Fraction Defective

In this case we consider only a single characteristic. Suppose that for all items produced,
the probability of a defective item is $p$, and that all items are produced independently. In a
random sample of $n$ items produced, letting $X$ be the number of defective items, we
have
$$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}, \quad x = 0, 1, \dots, n$$
This is a binomial situation with $E(X) = np$ and $\operatorname{var}(X) = np(1-p)$. An unbiased
estimator of $p$ is the fraction defective, or proportion defective, given by $\hat{p} = \frac{x}{n}$.

The distributional properties of $\hat{p}$ are important in the development of the p-chart. These
are
$$E(\hat{p}) = p \quad \text{and} \quad \operatorname{var}(\hat{p}) = \frac{p(1-p)}{n}$$
Thus, if $p$ is known, the control limits are given by
$$\text{centreline} = p, \quad LCL = p - 3\sqrt{\frac{p(1-p)}{n}}, \quad UCL = p + 3\sqrt{\frac{p(1-p)}{n}}$$
If $p$ is not known, then an unbiased estimator for $p$ is $\bar{p}$ such that
$$\bar{p} = \frac{1}{m}\sum_{i=1}^{m}\hat{p}_i$$
Where $\hat{p}_i$ is the fraction defective in the $i$th sample.
The resulting control limits are
$$\text{centreline} = \bar{p}, \quad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}, \quad UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Note that $m$ is the number of samples and $n$ is the common sample size.
Example 10.4
The number of defective electronic components is subject to quality control. To establish
preliminary control chart values, twenty samples each of size 50 are taken and the
number of defective items in each is recorded. The results show that $\sum_{i=1}^{20}\hat{p}_i = 1.76$. Determine
the control limits of the p-chart and comment on your findings.
Solution
We first compute the centreline as
$$\bar{p} = \frac{1}{m}\sum_{i=1}^{m}\hat{p}_i = \frac{1.76}{20} = 0.088$$
Then, the control limits are computed by
$$LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.088 - 3\sqrt{\frac{(0.088)(0.912)}{50}} = -0.0322$$
$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.088 + 3\sqrt{\frac{(0.088)(0.912)}{50}} = 0.2082$$
Comments:
1. The negative LCL is set to zero, since every fraction defective is positive or zero.
2. The control limits show that the process is in control during this preliminary period.
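The p-chart limits, including the rule that a negative lower limit is replaced by zero, can be written as a small function. This is a sketch under the same assumptions as Example 10.4; the function name `p_chart_limits` and its parameters are illustrative.

```python
import math


def p_chart_limits(p_bar, n, z=3.0):
    """Return (LCL, UCL) for a p-chart with common sample size n."""
    half_width = z * math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - half_width)   # a fraction defective cannot be negative
    ucl = p_bar + half_width
    return lcl, ucl


# Example 10.4: the 20 sample fractions sum to 1.76, samples of size 50
p_bar = 1.76 / 20
lcl, ucl = p_chart_limits(p_bar, n=50)
print(round(p_bar, 3), lcl, round(ucl, 4))
```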
10.9.2 Choice of Sample size from p-Chart

There is no best or general method for choosing the sample size for attribute charts; however,
many statistical analysts require a probability of 0.5 that a shift from $p$ to (say) $p_1$ is detected.
Assuming the normal approximation to the binomial distribution, we have
$$P(\hat{p} \geq UCL) = P\left(Z \geq \frac{UCL - p_1}{\sqrt{p_1(1-p_1)/n}}\right) = 0.5$$
Since $P(Z \geq 0) = 0.5$ we get
$$\frac{UCL - p_1}{\sqrt{p_1(1-p_1)/n}} = 0 \quad \Rightarrow \quad UCL - p_1 = 0$$
Substituting $UCL = p + 3\sqrt{\frac{p(1-p)}{n}}$ in the above equation we get
$$(p - p_1) + 3\sqrt{\frac{p(1-p)}{n}} = 0$$
We can now solve for $n$, the size of each sample, using $3\sigma$ limits, to get
$$n = \frac{9}{\Delta^2}\,p(1-p),$$
Where $\Delta = |p - p_1|$ is the “shift” in the value of $p$.
However, if the control chart is based on $k\sigma$ limits in general, we have
$$n = \frac{k^2}{\Delta^2}\,p(1-p)$$
Example 10.5
Suppose that an attribute quality control chart is being designed with a value of $p = 0.01$
for the in-control probability of a defective. What is the sample size per subgroup
producing a probability of 0.5 that a process shift to $p = p_1 = 0.05$ will be detected?
The resulting p-chart will involve $3\sigma$ limits.
Solution
Given $p = 0.01$, $p_1 = 0.05$, it follows that $\Delta = |p - p_1| = |0.01 - 0.05| = 0.04$
Then, using $3\sigma$ limits we have
$$n = \frac{9}{\Delta^2}\,p(1-p) = \frac{9}{(0.04)^2}(0.01)(0.99) = 55.69 \approx 56.$$
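The sample-size rule is a one-line formula, so it is easy to tabulate for different shifts. The sketch below rounds up to the next whole item; the function name `p_chart_sample_size` and the parameter `k_sigma` are illustrative choices.

```python
import math


def p_chart_sample_size(p, p1, k_sigma=3.0):
    """Smallest n giving probability 0.5 of detecting a shift from p to p1."""
    delta = abs(p1 - p)                              # size of the shift
    n_exact = k_sigma ** 2 * p * (1.0 - p) / delta ** 2
    return math.ceil(n_exact)


# Example 10.5: in-control p = 0.01, shifted value p1 = 0.05, 3-sigma limits
n_required = p_chart_sample_size(0.01, 0.05)
print(n_required)
```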
Exercises 10
1. Consider the following data taken on subgroups of size 5. The data contain 20
averages and ranges of the diameter (in millimeters) of an important component part of
an engine.
Sample    $\bar{X}$    $R$
1 2.3972 0.0052
2 2.4191 0.0117
3 2.4215 0.0062
4 2.3917 0.0089
5 2.4151 0.0095
6 2.4027 0.0101
7 2.3921 0.0091
8 2.4171 0.0059
9 2.3951 0.0068
10 2.4215 0.0048
11 2.3887 0.0082
12 2.4107 0.0032
13 2.4009 0.0077
14 2.3992 0.0107
15 2.3889 0.0025
16 2.4107 0.0138
17 2.4109 0.0037
18 2.3944 0.0052
19 2.3951 0.0038
20 2.4015 0.0017
Display $\bar{X}$- and R-charts. Does the process appear to be in control?
2. Samples of size 50 are taken every hour from a process producing a certain type
of item that is either considered defective or not defective. Twenty samples are
taken and the results are shown in the following table:
Sample Number of defective items

1 4
2 3
3 5
4 3
5 2
6 2
7 2
8 1
9 4
10 3
11 2
12 4
13 1
14 2
15 3
16 1
17 1
18 2
19 3
20 3
(a) Construct a control chart for the control of proportion defective.

(b) Does the process appear to be in control? Explain.
CHAPTER ELEVEN
ELEMENTARY CONCEPTS ON SYSTEM RELIABILITY
11.1 Introduction
The analysis of the reliability of a system must be based on precisely defined concepts.
Since it is readily accepted that a population of supposedly identical systems, operating
under similar conditions, fail at different points in time, a failure phenomenon can
only be described in terms of probabilities. Thus, the fundamental definitions of
reliability must depend on concepts of probability theory. In general, a system may be
required to perform various functions, each of which may have a different reliability. In
addition, at different times, the system may have a different probability of successfully
performing the required function under the stated conditions. The term failure means that the
system is not capable of performing a function when required.
11.2 Reliability Measures

Definition 11.1 (Reliability): Reliability is the probability of success or the probability
that the system will perform its intended function under specified design limits.
More precisely, reliability is the probability that a system or part will operate properly for
a specified period of time (design time) under the designed operating conditions (such as
temperature, voltage, and so on) without failure.
Reliability is one of the quality characteristics that consumers require from the
manufacturer of products.
Mathematically, reliability, denoted by $R(t)$, is the probability that a system will be
successful in the interval from time 0 to time $t$;
$$R(t) = P(T > t), \quad t \geq 0 \qquad (11.1)$$
Where $T$ is a random variable denoting the time-to-failure or failure time.
Definition 11.2 (Unreliability): Unreliability, denoted by $F(t)$, is a measure of failure,
and is defined as the probability that the system will fail by time $t$;
$$F(t) = P(T \leq t), \quad t \geq 0 \qquad (11.2)$$
In other words, $F(t)$ is the failure distribution function. If the time-to-failure random
variable $T$ has a density function $f(t)$, then the reliability can be written as
$$R(t) = \int_{t}^{\infty} f(s)\,ds \qquad (11.3)$$
Or, equivalently, as
$$f(t) = -\frac{d}{dt}R(t) \qquad (11.4)$$
The density function can be mathematically described in terms of the failure time $T$;
$$f(t) = \lim_{\Delta t \to 0}\frac{P(t < T \leq t + \Delta t)}{\Delta t} \qquad (11.5)$$
Equation (11.5) can be interpreted as follows: for small $\Delta t$, $f(t)\,\Delta t$ is approximately the
probability that the failure will occur between the operating time $t$ and the next interval of
operation, $t + \Delta t$.
It is assumed that a system is operating with probability one at time $t = 0$ and that this
probability decreases to zero as time increases to infinity without any repair. Clearly,
reliability is a function of mission time.
Example 11.1
A computer system has an exponential failure time density function
$$f(t) = \frac{1}{9000}e^{-t/9000}, \quad t \geq 0$$
What is the probability that the system will fail after the warranty (six months or 4380
hours) and before the end of year one (or 8760 hours)?
Solution
The required probability is given by
$$P(4380 < T \leq 8760) = \int_{4380}^{8760}\frac{1}{9000}e^{-t/9000}\,dt = 0.237$$
This indicates that the probability of failure during the interval from six months to one
year is 23.7%.
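The integral in Example 11.1 has a closed form, since the probability of failure in an interval is the difference of the reliability at its two end points. The sketch below evaluates it; the function name `exp_reliability` and the parameter name `theta` (the mean of the exponential density, 9000 hours here) are illustrative.

```python
import math


def exp_reliability(t, theta):
    """R(t) = exp(-t / theta) for an exponential failure time."""
    return math.exp(-t / theta)


theta = 9000.0                      # hours, from the density of Example 11.1
warranty_end, year_end = 4380.0, 8760.0

# P(4380 < T <= 8760) = R(4380) - R(8760)
p_fail = exp_reliability(warranty_end, theta) - exp_reliability(year_end, theta)
print(round(p_fail, 3))
```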
If the time to failure is described by an exponential failure time density function, then
$$f(t) = \frac{1}{\theta}e^{-t/\theta}, \quad t \geq 0,\ \theta > 0 \qquad (11.6)$$
And this will lead to the reliability function
$$R(t) = \int_{t}^{\infty}\frac{1}{\theta}e^{-s/\theta}\,ds = e^{-t/\theta}, \quad t \geq 0 \qquad (11.7)$$
11.3 System Mean Time to Failure

Suppose that the reliability function for a system is given by $R(t)$. The expected
time during which a component is expected to perform successfully is known as the
system mean time to failure (MTTF), and is given by
$$MTTF = \int_{0}^{\infty} t\,f(t)\,dt \qquad (11.8)$$
Substituting $f(t) = -\frac{d}{dt}R(t)$ into equation (11.8) and integrating by parts gives
$$MTTF = -\int_{0}^{\infty} t\,dR(t) = \Big[-tR(t)\Big]_{0}^{\infty} + \int_{0}^{\infty}R(t)\,dt \qquad (11.9)$$
The first term on the right-hand side of (11.9) is zero at both limits, since the system must
fail after a finite amount of operating time. That is, we must have $tR(t) \to 0$ as $t \to \infty$.
Therefore, we have
$$MTTF = \int_{0}^{\infty}R(t)\,dt \qquad (11.10)$$
Thus, MTTF is the definite integral of the reliability function over $[0, \infty)$.
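As a worked illustration of (11.10), consider the exponential reliability function of equation (11.7), written here as $R(t) = e^{-t/\theta}$ with $\theta$ denoting the parameter of the exponential density (the symbol is a choice made for this illustration):

```latex
\begin{align*}
\mathrm{MTTF} &= \int_{0}^{\infty} R(t)\,dt = \int_{0}^{\infty} e^{-t/\theta}\,dt \\
              &= \Big[-\theta\,e^{-t/\theta}\Big]_{0}^{\infty} = \theta .
\end{align*}
```

For the computer system of Example 11.1 the density has $\theta = 9000$, so its mean time to failure is 9000 hours.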
11.4 Failure Rate Function

The probability of a system failure in a given time interval $[t_1, t_2]$ can be expressed in
terms of the reliability function as
$$\int_{t_1}^{t_2} f(t)\,dt = \int_{t_1}^{\infty} f(t)\,dt - \int_{t_2}^{\infty} f(t)\,dt = R(t_1) - R(t_2) = F(t_2) - F(t_1)$$
The rate at which failures occur in a certain time interval $[t_1, t_2]$ is called the failure rate.
It is defined as the probability that a failure per unit time occurs in the interval, given that
a failure has not occurred prior to time $t_1$, the beginning of the interval. Thus, the failure
rate is
$$FR = \frac{R(t_1) - R(t_2)}{(t_2 - t_1)\,R(t_1)} \qquad (11.11)$$
Note that the failure rate is a function of time. If we redefine the interval as $[t, t + \Delta t]$, the
above expression becomes
$$FR = \frac{R(t) - R(t + \Delta t)}{\Delta t\,R(t)} \qquad (11.12)$$
Using equation (11.12) we define the hazard function as the limit of the failure rate as
the interval approaches zero. Thus, the hazard function $h(t)$ is the instantaneous failure
rate, and is defined by
$$h(t) = \lim_{\Delta t \to 0}\frac{R(t) - R(t + \Delta t)}{\Delta t\,R(t)} = \frac{1}{R(t)}\left[-\frac{d}{dt}R(t)\right] = \frac{f(t)}{R(t)} \qquad (11.13)$$
The importance of the hazard function is that it indicates the change in the failure rate over
the life of a population of components by plotting their hazard functions on a single axis.
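The limit in (11.13) can be seen numerically by evaluating the failure rate (11.12) for shrinking intervals. The sketch below does this for an exponential reliability function with mean 9000 hours, the value used in Example 11.1; the function names are illustrative. For this distribution the failure rate should approach the constant 1/9000.

```python
import math


def failure_rate(reliability, t, dt):
    """Failure rate over [t, t + dt], as in equation (11.12)."""
    r_t = reliability(t)
    return (r_t - reliability(t + dt)) / (dt * r_t)


def exp_reliability(t, theta=9000.0):
    """Exponential reliability function R(t) = exp(-t / theta)."""
    return math.exp(-t / theta)


# Shrinking the interval moves the failure rate towards the hazard h(t)
for dt in (1000.0, 100.0, 1.0):
    print(dt, failure_rate(exp_reliability, 4380.0, dt))
print("1/theta =", 1.0 / 9000.0)
```

Repeating the loop at a different starting time `t` gives the same values, reflecting the constant hazard of the exponential distribution.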
11.5 Maintainability
When a system fails to perform satisfactorily, repair is normally carried out to locate and
correct the fault. The system is restored to operational effectiveness by making an
adjustment or by replacing a component.
Maintainability is defined as the probability that a failed system will be restored to
specified conditions within a given period of time when maintenance is performed
according to prescribed procedures and resources. In other words maintainability is the
probability of isolating and repairing a fault in a system within a given time.
Maintainability engineers must work with system designers to ensure that the system
product can be maintained by the customer efficiently and cost-effectively.
This function requires the analysis of part removal, replacement, tear-down, and build-up
of the product in order to determine the required time to carry out the operation, the
necessary skill, the type of support equipment and the documentation.
Let T denote the random variable of the time to repair, or the total downtime. If the repair
time T has a repair time density function $g(t)$, then the maintainability, denoted by $V(t)$,
is defined as the probability that the failed system will be back in service by time t:
$$V(t) = P(T \le t) = \int_0^{t} g(s)\,ds \qquad (11.14)$$
For instance, if $g(t) = \mu e^{-\mu t}$, where $\mu > 0$ is a constant repair rate, then
$$V(t) = 1 - e^{-\mu t}$$
This represents the exponential form of the maintainability function.
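To make equation (11.14) concrete, the following sketch evaluates the exponential maintainability function and compares it with a direct numerical integration of the repair time density. The repair rate of 0.5 repairs per hour and the function names are hypothetical choices made for illustration.

```python
import math

def repair_density(s, mu):
    """Exponential repair time density g(s) = mu * exp(-mu * s)."""
    return mu * math.exp(-mu * s)

def maintainability(t, mu):
    """Closed form V(t) = 1 - exp(-mu * t) for exponential repair times."""
    return 1.0 - math.exp(-mu * t)

def maintainability_numeric(t, mu, steps=10_000):
    """Midpoint-rule approximation of V(t) = integral of g(s) ds from 0 to t."""
    h = t / steps
    return sum(repair_density((i + 0.5) * h, mu) for i in range(steps)) * h

mu = 0.5  # hypothetical repair rate, repairs per hour
for t in (1.0, 2.0, 4.0):
    print(t, maintainability(t, mu), maintainability_numeric(t, mu))
```

The two columns should agree to several decimal places, and both increase towards 1 as the allowed repair time t grows.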
An important measure often used in maintenance studies is the mean time to repair
(MTTR), or the mean downtime. MTTR is the expected value of the repair time (not the
failure time), and is given by
$$\mathrm{MTTR} = \int_0^{\infty} t\,g(t)\,dt \qquad (11.15)$$
When the repair time has the exponential density $g(t) = \mu e^{-\mu t}$, the MTTR is $1/\mu$.
11.6 Availability
Reliability is a measure that requires system success for an entire mission time. No
failures or repairs are allowed. Space missions and aircraft flights are good examples of
systems that do not allow failure or repair. Availability is a measure that allows for a
system to be repaired when a failure occurs.
The availability of a system is defined as the probability that the system is successful at
time t. Mathematically, we have
$$\text{Availability} = \frac{\text{System uptime}}{\text{System uptime} + \text{System downtime}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \qquad (11.16)$$
Availability is a measure of success used primarily for repairable systems. For
non-repairable systems, the availability $A(t)$ is equal to the reliability $R(t)$. In repairable systems,
$A(t) \ge R(t)$.
An important measure in repairable systems is the mean time between failures
(MTBF). It implies that the system has failed and has been repaired. It is
given by
$$\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR} \qquad (11.17)$$
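The following is a minimal sketch of equations (11.16) and (11.17). It assumes exponentially distributed failure and repair times, so that the MTTF and MTTR are the reciprocals of the failure and repair rates. The rates used (one failure per 1000 hours, one repair per 10 hours) and the function names are hypothetical.

```python
def availability(mttf, mttr):
    """Availability ratio of equation (11.16)."""
    return mttf / (mttf + mttr)

def mtbf(mttf, mttr):
    """Mean time between failures, equation (11.17)."""
    return mttf + mttr

failure_rate = 0.001  # hypothetical constant failure rate, per hour
repair_rate = 0.1     # hypothetical constant repair rate, per hour

mttf = 1.0 / failure_rate  # mean time to failure under the exponential assumption
mttr = 1.0 / repair_rate   # mean time to repair under the exponential assumption

print("Availability:", availability(mttf, mttr))
print("MTBF (hours):", mtbf(mttf, mttr))
```

With these assumed rates the system is up for about 99% of the time, and shortening the repair time raises the availability without changing the MTTF.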
11.7 Common Probability Distribution

Reliability engineering uses the common probability distributions. These include
binomial distribution and Poisson distribution for discrete cases, and exponential and
normal distributions for continuous cases.
11.7.1 Exponential distribution

The exponential distribution plays an essential role in reliability engineering because it
has a constant failure rate. This distribution has been used to model the lifetime of
electronic and electrical components and systems. It is appropriate when
a used component that has not failed is as good as a new component. Recall that for the
exponential distribution, we have
$$f(t) = \lambda e^{-\lambda t} = \frac{1}{\theta} e^{-t/\theta} \quad \text{and} \quad R(t) = e^{-\lambda t} = e^{-t/\theta}, \qquad t \ge 0$$
where $\theta = 1/\lambda > 0$ is the MTTF and $\lambda > 0$ is the constant failure rate.

The hazard function or failure rate for the exponential density function is constant, that is,
$$h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda \qquad (11.18)$$
The failure rate for this distribution is the constant $\lambda$, which is the main reason the
distribution is so widely used. Because of its constant failure rate property, the exponential is
an excellent model for the long flat “intrinsic failure” portion of the bathtub curve.
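The constant hazard and the "as good as new" property can be checked with a few lines of Python. This is a minimal sketch, and the failure rate of 0.0005 per hour and the helper names are assumptions made for illustration.

```python
import math

def pdf(t, lam):
    """Exponential density f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)

def reliability(t, lam):
    """Exponential reliability R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def hazard(t, lam):
    """Hazard h(t) = f(t) / R(t), as in equation (11.18)."""
    return pdf(t, lam) / reliability(t, lam)

def conditional_reliability(age, extra, lam):
    """P(T > age + extra | T > age): survival of a unit already `age` hours old."""
    return reliability(age + extra, lam) / reliability(age, lam)

lam = 0.0005  # hypothetical constant failure rate, per hour
for t in (0.0, 100.0, 5000.0):
    print(t, hazard(t, lam))  # stays at lam (up to rounding) for every t

# A used unit (1000 h old) has the same 200 h survival chance as a new one.
print(conditional_reliability(1000.0, 200.0, lam), reliability(200.0, lam))
```

The last line prints two equal values (up to rounding): under the exponential model, the age of a surviving unit does not affect its remaining life.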
Example 11.2
A manufacturer performs an operational life test on ceramic capacitors and finds that they
exhibit a constant failure rate with a value of $3 \times 10^{-8}$ failures per hour. What is the
reliability of a capacitor at $10^4$ hours?
Solution
The reliability of a capacitor at $10^4$ hours is
$$R(10^4) = e^{-\lambda t} = e^{-(3 \times 10^{-8})(10^4)} = e^{-3 \times 10^{-4}} = 0.9997$$
11.7.2 Normal Distribution

The normal distribution plays an important role in classical statistics owing to the central
limit theorem. In reliability engineering, the normal distribution primarily applies to
measurements of product susceptibility and external stress. This two-parameter
distribution is used to describe systems, including many mechanical systems, in which
failure results from some wear-out effect.
If T is a random variable denoting the time to failure, with mean $\mu$ and standard
deviation $\sigma$, then using the standardized normal variable, we have
$$Z = \frac{T - \mu}{\sigma}$$
The reliability R(t) is therefore given by
$$R(t) = P(T > t) = P\left(Z > \frac{t - \mu}{\sigma}\right) \qquad (11.19)$$
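Equation (11.19) can be evaluated without tables by using the complementary error function, since P(Z > z) = 0.5 * erfc(z / sqrt(2)). The sketch below is illustrative only. The mean of 1000 hours, the standard deviation of 200 hours and the function name are hypothetical.

```python
import math

def normal_reliability(t, mu, sigma):
    """R(t) = P(T > t) = P(Z > (t - mu) / sigma) for normally distributed T."""
    z = (t - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 1000.0, 200.0  # hypothetical mean and standard deviation, hours
for t in (600.0, 800.0, 1000.0, 1200.0):
    print(t, round(normal_reliability(t, mu, sigma), 4))
```

At t equal to the mean the reliability is 0.5, and it falls towards zero as t moves beyond the mean, which reflects the wear-out behaviour this distribution is used to model.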
Given that
$$f(t) = -\frac{d}{dt}R(t) = \frac{1}{\sigma}\,\phi\!\left(\frac{t - \mu}{\sigma}\right)$$
where $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ is the standard normal density, the hazard function becomes
$$h(t) = \frac{f(t)}{R(t)} = \frac{\phi\!\left(\frac{t - \mu}{\sigma}\right)}{\sigma\,R(t)} \qquad (11.20)$$
Example 11.3
A component has a normal distribution of failure times with a mean of 2000 hours and a
standard deviation of 100 hours. Find the reliability of the component and the value of the
hazard function at 1900 hours.
Solution
Given $\mu = 2000$ and $\sigma = 100$, the reliability at time $t = 1900$ hours is
$$R(1900) = P\left(Z > \frac{1900 - 2000}{100}\right) = P(Z > -1.0) = 0.8413$$
The value of the hazard function is given by
$$h(1900) = \frac{f(1900)}{R(1900)} = \frac{\phi(-1.0)}{\sigma\,R(1900)} = \frac{0.2420}{100(0.8413)} = 0.0029 \text{ failures per hour}$$
The normal distribution is flexible enough to make it a very useful empirical model. It
can be theoretically derived under assumptions that match many failure mechanisms
resulting from chemical reactions and processes.
Exercises 11
1. An operating unit is supported by $n - 1$ identical units on cold standby. When it fails, a
unit from standby takes its place. The system fails if all n units fail. Assume that units on
standby cannot fail and the lifetime of each unit follows the exponential distribution with
failure rate $\lambda$.
(a) What is the distribution of the system lifetime?
(b) Determine the reliability of the standby system for a mission of 100 hours
when $\lambda = 0.0001$ per hour and $n = 5$.
2. A fax machine with constant failure rate $\lambda$ will survive for a period of 720 hours
without failure, with probability 0.80.
(a) Determine the failure rate $\lambda$.
(b) Determine the probability that the machine, which is functioning after 600
hours, will still function after 800 hours.
(c) Find the probability that the machine will fail within 900 hours, given that the
machine was functioning at 720 hours.
3. A diesel is known to have an operating life (in hours) that fits the following pdf
$$f(t) = \frac{2a}{(t + b)^2}, \qquad t \ge 0$$
The average operating life of the diesel has been estimated to be 8760 hours.
(a) Determine a and b.
(b) Determine the probability that the diesel will not fail during the first 6000
operating hours.
(c) If the manufacturer wants no more than 10% of the diesels returned for
warranty service, how long should the warranty be?
4. The failure rate for a hydraulic component is given by
$$h(t) = \frac{t}{t + 1}, \qquad t \ge 0$$
where t is time in years.
(a) Determine the reliability function $R(t)$.
(b) Determine the MTTF of the component.