
The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of wheeziness. However, in statistical terms we use correlation to denote association between two quantitative variables. We also assume that the association is linear, that is, that one variable increases or decreases by a fixed amount for a unit increase or decrease in the other. The other technique that is often used in these circumstances is regression, which involves estimating the best straight line to summarise the association.

35
If two sets of variables vary in such a way that changes in one set are accompanied by changes in the other, then these sets are said to be correlated. For example, there is a relation between income and expenditure, height and weight, rainfall and production, demand and price, etc.
(OR)
Correlation measures the degree of relationship between two variables.

36
The following are the types of correlation.

• Positive and Negative Correlation.
• Simple, Partial and Multiple Correlation.
• Linear and Non-linear Correlation.

37
A positive correlation is a relationship between two variables in which both variables move in the same direction: one variable increases as the other increases, or one variable decreases as the other decreases. Examples of positive correlation are height and weight, and household income and expenditure.

38
A negative correlation is a relationship between two
variables in which an increase in one variable is
associated with a decrease in the other. An example of
negative correlation would be price and demand of
goods, unemployment and purchasing power.

39
 Simple Correlation
Simple correlation is the relationship between any two variables. E.g. income and expenditure.
 Multiple Correlation

The correlation among three or more variables is called multiple correlation. E.g. production of rice, amount of rainfall and average daily temperature.
 Partial Correlation

The correlation between two variables, when three or more variables are involved and the effect of the remaining variables is removed (held constant), is called partial correlation. E.g. the correlation between production of rice and amount of rainfall after removing the effect of a third variable, average daily temperature.
 Linear Correlation

If the relation between x and y is expressed as $y = a + bx$, or if the plotted values of x and y lie close to a straight line on a graph, it is known as linear correlation.
 Non Linear Correlation

When the amount of change in one variable is not in a constant ratio to the
change in the other variable, we say that the correlation is non linear.

40
The correlation coefficient lies between −1 and +1 and is denoted by 'r'.

Where
• r < 0, Negative correlation
• r > 0, Positive correlation
• r = 0, No linear relationship between variables
• r = 1, Perfect positive correlation
• r = −1, Perfect negative correlation

41
42
43
A scatter diagram is the simplest diagrammatic representation of bivariate data. One variable is represented along the X-axis and the other variable along the Y-axis. Each pair of values is plotted as a point on the two-dimensional graph. The diagram of points so obtained is known as a scatter diagram. The direction in which the points trend shows the type of correlation that exists between the two variables.

44
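When there exists some relationship between two measurable variables, we compute the degree of relationship using the correlation coefficient (Karl Pearson's coefficient of correlation):

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$

Or

$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$

Where $\bar{x}$ and $\bar{y}$ are the means of X and Y, and $n$ is the number of pairs of observations.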
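As a rough illustration of computing the correlation coefficient from paired data, the sketch below uses the standard raw-sums form of Pearson's r. The function name `pearson_r` and the sample data are assumptions made for this example and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient using the raw-sums formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))

# r is symmetric in X and Y and unchanged by a shift of origin and a
# positive change of scale, e.g. u = (x - 3) / 2
us = [(x - 3) / 2 for x in xs]
print(round(pearson_r(us, ys), 4))
```

Both printed values should agree, which is the "unaffected by change of scale and origin" property in numerical form.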

45
 The correlation coefficient between X and Y is same as the
correlation coefficient between Y and X (i.e, rxy = ryx ).
 The correlation coefficient is free from the units of
measurements of X and Y
 The correlation coefficient is unaffected by change of scale
and origin.
 Thus, if ui = [xi – A] /c and vi = [yi – B] /d with c ≠ 0 and d ≠ 0
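i = 1, 2, ..., n, where A and B are arbitrary values, then $r_{xy} = r_{uv}$ (provided c and d have the same sign; opposite signs reverse the sign of r).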


Remark 1: If the widths between the values of the variables are
not equal then take c = 1 and d = 1.

46
Example

47
48
Example

49
50
The coefficient of determination is defined as the square of the coefficient of correlation. Multiplied by 100, it gives the percentage of the variance in the dependent variable that is predictable from the independent variable.
E.g. $r = 0.80$
$r^2 = 0.64$, i.e. $0.64 \times 100 = 64\%$
In other words, 64% of the variation in the dependent variable is explained by the regression equation; the remaining 36% is unexplained.

51
To check the reliability (or significance) of the coefficient of correlation r, the probable error is used. The formula for calculating the probable error is

$P.E. = 0.6745 \, \dfrac{1 - r^2}{\sqrt{n}}$

Where 'r' is the coefficient of correlation and 'n' is the number of pairs of observations.

52
(a) If r is less than P.E, then there is no evidence of
correlation (i.e. the correlation is not significant).
(b) If r is greater than 6 × 𝑃. 𝐸, then there is certain
correlation (i.e. coefficient of correlation is
significant).
(c) If the P.E is comparatively smaller than the
coefficient of correlation then the following rules
hold good:
(i) If r is less than 0.3, correlation is
insignificant i.e. there is not much evidence
of correlation.
(ii) If r is more than 0.3, then there is good
evidence of correlation.
53
Example

Using the result of the previous example, we have $r = 0.60$ and $n = 8$.
Now we calculate the P.E.:
$P.E. = 0.6745 \, \dfrac{1 - r^2}{\sqrt{n}} = 0.6745 \times \dfrac{1 - 0.36}{\sqrt{8}} = 0.15$
Now $6 \times P.E. = 0.90$
Since r is not greater than $6 \times P.E.$ but is more than 0.3, there is good evidence of correlation.
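The probable-error check can be sketched in a few lines. The helper names `probable_error` and `interpret` are hypothetical, and the way rule (c) is applied as a fallback is an assumption about how the three rules combine.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply rules (a), (b), (c) in order; (c) is used as the fallback (assumption)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "correlation is significant"
    if abs(r) > 0.3:
        return "good evidence of correlation"
    return "not much evidence of correlation"

pe = probable_error(0.60, 8)
verdict = interpret(0.60, 8)
print(round(pe, 2), round(6 * pe, 2), verdict)
```

Note that 6 × P.E. computed from the unrounded P.E. is about 0.92; the value 0.90 in the worked example comes from rounding P.E. to 0.15 first.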

54
In 1904, Charles Edward Spearman, a British psychologist, devised a method of obtaining the coefficient of correlation from ranks. This measure is useful in dealing with qualitative characteristics such as intelligence, beauty, morality, character, etc., which cannot be measured quantitatively as required by Pearson's coefficient of correlation.
Rank correlation is applicable only to individual observations. The result we get from this method is only an approximate one, because under the ranking method the original values are not taken into account.

55
The formula for Spearman's rank correlation, which is denoted by ρ (pronounced "rho"), is

$\rho = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

where
d = the difference of two ranks = $R_X - R_Y$ and
N = number of paired observations.
The rank coefficient of correlation lies between –1 and +1.
Symbolically, –1 ≤ ρ ≤ +1
When we come across Spearman's rank correlation, we may find three types of problem:
(i) When ranks are given
(ii) When ranks are not given
(iii) When some values in a series are the same (tied ranks).

56
CASE#01

Example

57
58
Example

59
Solution:

60
61
Presenter: Ms. Sidra Raees

LECTURE 04
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
CASE#02

Example

2
Solution:

3
4
CASE#03

Example
Calculate Spearman's rank correlation of the following data.

X    50   55   65   50   55   60   50   65   70   75
Y   110  110  115  125  140  115  130  120  115  160

5
6
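Formula

$\rho = 1 - \dfrac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_n^3 - m_n)\right]}{N(N^2 - 1)}$

$= 1 - \dfrac{6(134 + 5.5)}{10 \times 99} = 0.155$ (Negligible and no relation)

Where
$N = 10$
$m_i = 2, 2, 2, 3, 3$ (the number of times each tied value is repeated)
$\sum d^2 = 134$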
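The tied-rank calculation for the X and Y data of this example can be checked with a short sketch. The function names are hypothetical, ranks are assigned with 1 for the largest value, and tied values share the average of the positions they occupy; the correction term uses (m³ − m)/12 for each group of m tied values.

```python
from collections import Counter

def average_ranks(values):
    """Rank with 1 = largest; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # sizes of every group of repeated values, in either series
    tie_sizes = [m for series in (xs, ys)
                 for m in Counter(series).values() if m > 1]
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes)
    rho = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
    return rho, sum_d2, sorted(tie_sizes)

X = [50, 55, 65, 50, 55, 60, 50, 65, 70, 75]
Y = [110, 110, 115, 125, 140, 115, 130, 120, 115, 160]
rho, sum_d2, ties = spearman_with_ties(X, Y)
print(sum_d2, ties, round(rho, 3))
```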

7
CASE#03

Example
Calculate Spearman's rank correlation of the following data.

8
9
Formula

$\rho = 1 - \dfrac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_n^3 - m_n)\right]}{N(N^2 - 1)}$

Where
$N = 10$
$m_1 = 2$

10
The rank correlation coefficient measures the degree of agreement between two rankings, but sometimes the individuals or objects are ranked by more than two persons or judges. In that case we have to find a measure of agreement among the judges (the coefficient of concordance). This can be calculated by the following formula:

$C = \dfrac{12S}{m^2(n^3 - n)}$

Where $S = \sum (x - \bar{x})^2$, $\bar{x} = \dfrac{m(n+1)}{2}$
$x$ = total of the ranks given to an individual by all judges
$m$ = no. of judges
$n$ = no. of individuals
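A minimal sketch of the coefficient of concordance follows. The rankings used here are hypothetical (the table for the slide example is not reproduced in this text), and `concordance` is an illustrative name.

```python
def concordance(rankings):
    """rankings: one list of ranks per judge, all ranking the same n individuals."""
    m = len(rankings)                      # number of judges
    n = len(rankings[0])                   # number of individuals
    totals = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2           # x-bar = m(n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of six persons by three judges
P = [1, 2, 3, 4, 5, 6]
Q = [2, 1, 3, 4, 6, 5]
R = [1, 3, 2, 5, 4, 6]
c = concordance([P, Q, R])
print(round(c, 3))
```

With identical rankings from every judge the function returns 1, the maximum possible agreement.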

11
Example
The following data give ranking of six persons for their
ability by three judges P, Q and R. Calculate coefficient
of concordance.

12
Here
$m = 3$, $n = 6$, $\bar{x} = 10.5$
$C = \dfrac{12S}{m^2(n^3 - n)}$
$C = 0.34$

13
Regression analysis, in general sense, means the
estimation or prediction of the unknown value of one
variable from the known value of the other variable. It is
one of the most important statistical tools which is
extensively used in almost all sciences, Natural, Social
and Physical. It is specially used in business and
economics to study the relationship between two or more
variables that are related causally and for the estimate of
demand and supply graphs, cost functions, production
and consumption functions and so on.

14
A line of regression is the line which gives the best estimate of the value of one variable for any given value of the other variable.
There are two lines of regression:
(i) Line of Y on X
(ii) Line of X on Y

15
Y on X
The regression line of Y on X is used for estimating Y. If there is a linear relationship between the two variables X and Y, the equation $y = a + b_{yx}\,x$ is called the regression equation of Y on X, where $a$ and $b_{yx}$ are constants which determine the line:

$b_{yx} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

$a = \bar{y} - b_{yx}\,\bar{x}$

$b_{yx}$ is called the regression coefficient of Y on X.

16
X on Y
The regression line of X on Y is used for estimating X. If there is a linear relationship between the two variables X and Y, the equation $x = a + b_{xy}\,y$ is called the regression equation of X on Y, where $a$ and $b_{xy}$ are constants which determine the line:

$b_{xy} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$

$a = \bar{x} - b_{xy}\,\bar{y}$

$b_{xy}$ is called the regression coefficient of X on Y.

17
The regression lines of Y on X and X
on Y are also called least squares lines
of regression.
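The two least squares lines can be sketched as follows. The data are hypothetical and the function names are illustrative; the line of X on Y is obtained simply by swapping the roles of the two series.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

def regression_x_on_y(xs, ys):
    """Return (a, b) for the least squares line x = a + b*y."""
    return regression_y_on_x(ys, xs)

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_yx, b_yx = regression_y_on_x(xs, ys)
a_xy, b_xy = regression_x_on_y(xs, ys)
print(a_yx, b_yx)   # line of Y on X
print(a_xy, b_xy)   # line of X on Y
print(b_yx * b_xy)  # equals r squared
```

The product of the two regression coefficients equals r², which gives a quick consistency check against the correlation coefficient.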

18
Example

19
20
21
The following table gives the age of cars of a certain make and the actual maintenance cost.

(i) Calculate the correlation between the age of cars and their maintenance cost.
(ii) Calculate the coefficient of determination $r^2$.
(iii) Estimate the probable error.
(iv) Obtain the regression equation for cost related to age.
(v) Estimate the maintenance cost of a 10-year-old car.

22
12
As an application of probability, there are two more concepts, namely random variables and probability distributions. Before defining a probability distribution, the random variable needs to be explained. It is commonly assumed that if an experiment is repeated under identical conditions, the values of the variable so obtained will be similar. However, there are situations where the observations vary even though the experiment is repeated under identical conditions. As a result, the outcomes of the variable are unpredictable and the experiment is random.

We have already learnt about random experiments and the formation of sample spaces. In a random experiment, we are often more interested in a number associated with the outcomes in the sample space than in the individual outcomes. This number varies with the outcome of the experiment; hence it is a variable whose value is determined by the outcome of the random experiment. To deal with such situations we need a special type of variable called a random variable.
13
A variable whose values are determined by the outcomes of a random experiment is called a Random Variable.
To understand this fully, let us consider the following example:
Suppose an experiment consists of tossing a coin two times. The sample space of the possible outcomes is S = {HH, HT, TH, TT}. The outcomes are not numerical quantities, and suppose we are interested in the number of heads. To express the outcomes in numbers, we assign a numerical value to each non-numerical outcome of the sample space as follows:

14
Sample Space   HH   TH   HT   TT
X              2    1    1    0

That is,
X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0
Therefore the numbers 2, 1, 0 in the above example are random quantities determined by the outcomes of the random experiment. Such a numerical quantity, whose value is determined by the outcome of a random experiment, is called a random variable. Thus, the numbers of heads obtained in the experiment of tossing a coin two times are the values of a random variable.

15
A random variable is also called a chance
variable, a Stochastic variable or simply a
variate. We shall denote a random variable by
capital letters X, Y, Z, etc. and the values of the
random variable are denoted by the
corresponding small letters x, y, z, etc.

16
A random variable may be discrete or continuous. If the random variable takes only isolated values, typically whole numbers such as 0, 1, 2, 3, ..., then it is called a discrete random variable. For example, the number of defective items in a sample, the number of printing mistakes on each page of a book, the number of telephone calls received by an office of a firm, etc. A discrete random variable may be defined as a random variable whose values form a finite (or countably infinite) set of numbers.
If the random variable can take any value (whole or fractional) within a given interval, then it is called a continuous random variable. For example, height of a person, weight of a baby, temperature at a place, etc.

17
For a discrete random variable X, a table, a graph or a formula showing all possible values of the random variable X, i.e. $x_1, x_2, x_3, \ldots, x_n$, with their corresponding probabilities $P(X = x_1), P(X = x_2), P(X = x_3), \ldots, P(X = x_n)$, is called a discrete probability distribution of the random variable X. In any probability distribution, the sum of all probabilities should be equal to unity.

NOTE:
The probability distribution of a discrete random variable X is also called the probability mass function, or simply the probability function, of the random variable X.

18
A discrete probability distribution must possess the following properties:

• $f(x) \ge 0$,
• $\sum_x f(x) = 1$,
• $P(X = x) = f(x)$.

19
Example
Suppose an unbiased coin is tossed 3 times. Find the probability distribution of the random variable "No. of Heads" in the following forms:
(a) Tabular form (b) Graphic form

Solution:

(a) Tabular Form

Since S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
Let X = No. of Heads = 0, 1, 2 and 3, then

20
$f(0) = P(X = 0) = P(\text{all tails}) = \dfrac{1}{8}$
$f(1) = P(X = 1) = P(\text{1 head and 2 tails}) = \dfrac{3}{8}$
$f(2) = P(X = 2) = P(\text{2 heads and 1 tail}) = \dfrac{3}{8}$
$f(3) = P(X = 3) = P(\text{3 heads}) = \dfrac{1}{8}$
Therefore, the probability distribution (i.e. probability mass function) in tabular form is

x                 0     1     2     3
f(x) = P(X = x)   1/8   3/8   3/8   1/8
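The same distribution can be obtained by enumerating the sample space directly. This is a small sketch using exact fractions; the variable names are chosen only for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```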

21
(b) Graphic Form

22
Example
A shipment of 20 similar laptop computers to a retail
outlet contains 3 that are defective. If a school makes a
random purchase of 2 of these computers, find the
probability distribution for the number of defectives.

Solution:
Let X be a random variable whose values x are the
possible numbers of defective computers purchased by
the school. Then x can only take the numbers 0, 1, and 2.
Now

23
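$f(0) = P(X = 0) = \dfrac{{}^{3}C_{0} \times {}^{17}C_{2}}{{}^{20}C_{2}} = \dfrac{68}{95}$

$f(1) = P(X = 1) = \dfrac{{}^{3}C_{1} \times {}^{17}C_{1}}{{}^{20}C_{2}} = \dfrac{51}{190}$

$f(2) = P(X = 2) = \dfrac{{}^{3}C_{2} \times {}^{17}C_{0}}{{}^{20}C_{2}} = \dfrac{3}{190}$

Thus, the probability distribution of X is

x      0       1        2
f(x)   68/95   51/190   3/190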
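The counting argument used here (choose the defectives, choose the non-defectives, divide by all possible samples) can be sketched with `math.comb`. The function name `defectives_pmf` and its parameter names are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def defectives_pmf(total, defective, drawn):
    """P(X = x) for x defectives when `drawn` items are taken without replacement."""
    good = total - defective
    return {
        x: Fraction(comb(defective, x) * comb(good, drawn - x), comb(total, drawn))
        for x in range(min(defective, drawn) + 1)
    }

laptops = defectives_pmf(total=20, defective=3, drawn=2)
for x, p in laptops.items():
    print(x, p)
```

The same function applies to any "draw without replacement" count, for example white balls drawn from a bag, by treating the white balls as the `defective` category.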

24
Example
A bag contains two white and three black balls. Two balls
are selected at random. Find the probability distribution
for the number of white balls.

Solution:
Let X = No. of white balls, then the possible values of
x = 0, 1, and 2. Now find the probabilities for x = 0, 1, 2

25
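$f(0) = P(X = 0) = \dfrac{{}^{2}C_{0} \times {}^{3}C_{2}}{{}^{5}C_{2}} = \dfrac{3}{10}$

$f(1) = P(X = 1) = \dfrac{{}^{2}C_{1} \times {}^{3}C_{1}}{{}^{5}C_{2}} = \dfrac{6}{10}$

$f(2) = P(X = 2) = \dfrac{{}^{2}C_{2} \times {}^{3}C_{0}}{{}^{5}C_{2}} = \dfrac{1}{10}$

Therefore, the probability distribution for the number of white balls is

x                 0      1      2
f(x) = P(X = x)   3/10   6/10   1/10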

26
The cumulative distribution function $F(x)$ of a discrete random variable X with probability distribution $f(x)$ is

$F(x) = P(X \le x) = \sum_{t \le x} f(t)$, for $-\infty < x < \infty$

27
Example
Find the cumulative distribution of the random variable
X for the following probability distribution:

Solution:
The cumulative distribution function for the number of heads in three tosses of a coin is
$F(0) = f(0) = \dfrac{1}{8}$
$F(1) = f(0) + f(1) = \dfrac{1}{8} + \dfrac{3}{8} = \dfrac{4}{8}$

28
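$F(2) = f(0) + f(1) + f(2) = \dfrac{1}{8} + \dfrac{3}{8} + \dfrac{3}{8} = \dfrac{7}{8}$
$F(3) = f(0) + f(1) + f(2) + f(3) = \dfrac{1}{8} + \dfrac{3}{8} + \dfrac{3}{8} + \dfrac{1}{8} = 1$

Hence,
$F(x) = \begin{cases} 0, & \text{for } x < 0 \\ \dfrac{1}{8}, & \text{for } 0 \le x < 1 \\ \dfrac{4}{8}, & \text{for } 1 \le x < 2 \\ \dfrac{7}{8}, & \text{for } 2 \le x < 3 \\ 1, & \text{for } x \ge 3 \end{cases}$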
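A discrete CDF is a running total of the probability function, and it stays constant between the support points. The sketch below builds it for the three-coin distribution; the names `support`, `cumulative` and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

support = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
cumulative = list(accumulate(pmf))   # running totals: 1/8, 4/8, 7/8, 1

def cdf(x):
    """F(x) = P(X <= x): step function, defined for every real x."""
    value = Fraction(0)
    for xi, total in zip(support, cumulative):
        if xi <= x:
            value = total
    return value

print([str(cdf(x)) for x in (-1, 0, 1.5, 2, 10)])
```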

29
30
Example
A random variable x has the following probability distribution.

x             0   1   2    3    4    5    6     7
P(x) = f(x)   0   k   2k   2k   3k   k²   2k²   7k² + k

Find
(a) k
(b) $P(x < 6)$
(c) $P(x \ge 6)$
(d) $P(0 < x < 5)$
(e) Distribution function (CDF)

31
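Solution:
(a) $\sum f(x) = 1$
$10k^2 + 9k = 1$
$10k^2 + 9k - 1 = 0$
$10k^2 + 10k - k - 1 = 0$
$(k + 1)(10k - 1) = 0$
$k = -1$ or $k = \dfrac{1}{10}$
Since a probability cannot be negative, $k = -1$ is rejected and $k = \dfrac{1}{10}$.

x      0   1     2     3     4     5      6      7
P(x)   0   0.1   0.2   0.2   0.3   0.01   0.02   0.17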

32
(b) $P(x < 6) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5)$
$= 0 + 0.1 + 0.2 + 0.2 + 0.3 + 0.01$
$= 0.81$
(OR)
$P(x < 6) = 1 - P(x \ge 6)$
$= 1 - \{P(6) + P(7)\}$
$= 1 - 0.19$
$= 0.81$

(c) $P(x \ge 6) = P(6) + P(7)$
$= 0.02 + 0.17$
$= 0.19$

33
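(d) $P(0 < x < 5) = P(1) + P(2) + P(3) + P(4)$
$= 0.1 + 0.2 + 0.2 + 0.3$
$= 0.8$

(e) Cumulative Distribution Function (CDF), $F(x) = P(X \le x)$

x      0   1     2     3     4     5      6      7
F(x)   0   0.1   0.3   0.5   0.8   0.81   0.83   1

$F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x \ge 7$; between consecutive values of x, $F(x)$ stays constant.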
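The steps of this example (solve for k, discard the inadmissible root, then sum probabilities) can be checked numerically. This is only a sketch; the variable names are chosen for illustration.

```python
import math
from itertools import accumulate

# Sum of probabilities: 10k^2 + 9k - 1 = 0
a, b, c = 10, 9, -1
disc = b * b - 4 * a * c
roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]

# Probabilities cannot be negative, so keep the positive root only
k = next(r for r in roots if r > 0)

probs = [0, k, 2 * k, 2 * k, 3 * k, k ** 2, 2 * k ** 2, 7 * k ** 2 + k]
cdf_values = list(accumulate(probs))

p_less_6 = sum(probs[:6])       # P(x < 6)
p_at_least_6 = sum(probs[6:])   # P(x >= 6)
p_between = sum(probs[1:5])     # P(0 < x < 5)
print(k, round(p_less_6, 2), round(p_at_least_6, 2), round(p_between, 2))
print([round(v, 2) for v in cdf_values])
```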

34
Presenter: Ms. Sidra Raees

LECTURE 08
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
The function $f(x)$ is a probability density function (pdf) for the continuous random variable X, defined over the set of real numbers, if

• $f(x) \ge 0$, $-\infty < x < \infty$
• $\int_{-\infty}^{\infty} f(x)\,dx = 1$
• $P(a < X < b) = \int_{a}^{b} f(x)\,dx$
• $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$

NOTE:
The probability distribution of a continuous random variable X is also called the probability density function (pdf), or simply the density function, of the random variable X.

2
Example
Suppose that the error in the reaction temperature, in ℃, for a controlled laboratory experiment is a continuous random variable X having the probability density function

$f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$

(a) Verify that $f(x)$ is a density function.
(b) Find $P(0 < X \le 1)$

3
Solution:
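(a) We have
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \dfrac{x^2}{3}\,dx = \left[\dfrac{x^3}{9}\right]_{-1}^{2} = \dfrac{8}{9} + \dfrac{1}{9} = 1$

(b) $P(0 < X \le 1) = \int_{0}^{1} \dfrac{x^2}{3}\,dx = \left[\dfrac{x^3}{9}\right]_{0}^{1} = \dfrac{1}{9}$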
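Both integrals can be checked numerically. The sketch below uses a simple midpoint rule written out by hand, so no external library is assumed; `integrate` is a hypothetical helper, not a standard function.

```python
def f(x):
    """Density of the temperature error: x^2 / 3 on (-1, 2), zero elsewhere."""
    return x * x / 3 if -1 < x < 2 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(f, -1, 2)   # should be close to 1
p_0_to_1 = integrate(f, 0, 1)      # should be close to 1/9
print(round(total_area, 6), round(p_0_to_1, 6))
```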

4
Example

The probability density function (pdf) of a random variable X is given by

$f(x) = \begin{cases} \dfrac{C}{\sqrt{x}}, & \text{for } 0 < x < 4 \\ 0, & \text{elsewhere} \end{cases}$

Find
(a) The value of C.
(b) $P\left(X < \dfrac{1}{4}\right)$
(c) $P(X > 1)$

5
Solution:

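(a) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_{0}^{4} \dfrac{C}{\sqrt{x}}\,dx = 1$
$\left[2C x^{1/2}\right]_{0}^{4} = 1$
$4C = 1$
$C = \dfrac{1}{4}$

(b) $P\left(X < \dfrac{1}{4}\right) = \int_{0}^{1/4} \dfrac{1}{4\sqrt{x}}\,dx = \left[\dfrac{1}{2} x^{1/2}\right]_{0}^{1/4} = \dfrac{1}{4}$

(c) $P(X > 1) = \int_{1}^{4} \dfrac{1}{4\sqrt{x}}\,dx = \left[\dfrac{1}{2} x^{1/2}\right]_{1}^{4} = \dfrac{1}{2}$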

6
The cumulative distribution function $F(x)$ of a continuous random variable X with density function $f(x)$ is

• $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, for $-\infty < x < \infty$
• $P(a < X < b) = F(b) - F(a)$
• $f(x) = \dfrac{d}{dx} F(x)$

Note:
Integrate pdf to find cdf and differentiate cdf to find pdf.

7
Example
For the density function of example on slide no. 04 , find
𝐹 𝑥 , and use it to evaluate 𝑃 0 < 𝑋 ≤ 1 .

Solution:
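For $-1 < x < 2$,
$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-1}^{x} \dfrac{t^2}{3}\,dt = \left[\dfrac{t^3}{9}\right]_{-1}^{x} = \dfrac{x^3 + 1}{9}$
Now
$P(0 < X \le 1) = F(1) - F(0) = \dfrac{2}{9} - \dfrac{1}{9} = \dfrac{1}{9}$
which agrees with the result obtained by using the density function.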

8
Example

The Department of Energy (DOE) puts projects out on bid and generally estimates what a reasonable bid should be. Call the estimate b. The DOE has determined that the density function of the winning (low) bid is

$f(y) = \begin{cases} \dfrac{5}{8b}, & \dfrac{2}{5}b \le y \le 2b \\ 0, & \text{elsewhere} \end{cases}$

Find $F(y)$ and use it to determine the probability that the winning bid is less than the DOE's preliminary estimate b.

9
Solution:

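For $\dfrac{2}{5}b \le y \le 2b$,

$F(y) = \int_{2b/5}^{y} \dfrac{5}{8b}\,dt = \left[\dfrac{5t}{8b}\right]_{2b/5}^{y} = \dfrac{5y}{8b} - \dfrac{1}{4}$

To determine the probability that the winning bid is less than the preliminary bid estimate b, we have

$P(Y \le b) = F(b) = \dfrac{5b}{8b} - \dfrac{1}{4} = \dfrac{3}{8}$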
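The resulting CDF is piecewise: 0 below 2b/5, linear in between, and 1 above 2b. A small sketch follows, with the hypothetical function name `bid_cdf` and an arbitrary estimate b = 100 chosen only to exercise it.

```python
def bid_cdf(y, b):
    """CDF of the winning bid for a uniform density 5/(8b) on [2b/5, 2b]."""
    low, high = 2 * b / 5, 2 * b
    if y < low:
        return 0.0
    if y >= high:
        return 1.0
    return 5 * y / (8 * b) - 0.25

b = 100  # arbitrary illustrative estimate; the answer does not depend on it
print(bid_cdf(b, b))        # P(Y <= b)
print(bid_cdf(2 * b, b))    # upper end of the range
```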

10
11
An extremely useful concept in problems involving random
variables or distributions is that of expectation. Random
variables can be characterized and dealt with effectively for
practical purposes by consideration of quantities called their
expectation. The concept of mathematical expectation arose
in connection with games of chance. For example, a gambler
might be interested in his average winnings at a game, a
businessman in his average profits on a product, and so on.
The average value of a random phenomenon is also termed as
its Mathematical expectation or expected value. In the
following sections, we will define and study the concept of
mathematical expectation for both discrete and continuous
random variables, which will be used in the following
subsection.

12
A probability distribution gives us an idea about the likely values of a random variable and the probabilities of the various events related to the random variable. Even so, it is necessary to describe a probability distribution using measures of central tendency, dispersion, symmetry and kurtosis. These are called descriptive measures or summary measures. As with a frequency distribution, we have to look at the properties of a probability distribution. This section focuses on how to calculate these summary measures. These measures can be calculated using
i. Mathematical expectation and variance.
ii. Moments.
13
14
 𝐸 𝑐 = 𝑐, where c is a constant
 𝐸 𝑐𝑋 = 𝑐𝐸 𝑋 , where c is a constant
 𝐸 𝑎𝑋 + 𝑏 = 𝑎𝐸 𝑋 + 𝑏

Variance
The variance of a random variable X is a measure of the spread or dispersion of the density of X, or simply the variability in the values of a random variable.

• $Var(X) = E(X^2) - [E(X)]^2$

15
Example

Solution:

16
Example

17
Solution:

18
Example

Solution:

19
Example

Solution:

20
21
Example

A salesperson for a medical device company has two appointments on a given day. At the first appointment, he believes that he has a 70% chance to make the deal, from which he can earn $1000 commission if successful. On the other hand, he thinks he only has a 40% chance to make the deal at the second appointment, from which, if successful, he can make $1500. What is the expected commission based on his own probability belief? Assume that the appointment results are independent of each other.

22
Solution:
First, we know that the salesperson, for the two appointments, can have 4 possible commission totals: $0, $1000, $1500 and $2500. We then need to calculate their associated probabilities. By independence, we obtain
f($0) = (1 − 0.7)(1 − 0.4) = 0.18
f($2500) = (0.7)(0.4) = 0.28
f($1000) = (0.7)(1 − 0.4) = 0.42
f($1500) = (1 − 0.7)(0.4) = 0.12
Therefore, the expected commission for the salesperson is
E(X) = ($0)(0.18) + ($1000)(0.42) + ($1500)(0.12) + ($2500)(0.28)
= $1300
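The expected commission can be reproduced by enumerating the four win/lose combinations and multiplying probabilities, which is valid because the appointments are assumed independent. The variable names below are illustrative.

```python
from itertools import product

# (probability of closing the deal, commission if closed)
deals = [(0.7, 1000), (0.4, 1500)]

distribution = {}
for results in product([True, False], repeat=len(deals)):
    prob, total = 1.0, 0
    for won, (p, amount) in zip(results, deals):
        prob *= p if won else 1 - p      # independence: multiply probabilities
        total += amount if won else 0
    distribution[total] = distribution.get(total, 0.0) + prob

expected = sum(total * prob for total, prob in distribution.items())
print(sorted(distribution.items()))
print(round(expected, 2))
```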

23
24
Example

Solution:

25
Example

Solution:

26
27
Example

Solution:

28
29
Example

Solution:

30
Another approach helpful for finding the summary measures of a probability distribution is based on 'moments'. We will discuss two types of moments.

i. Moments about the origin (the origin may be zero or any other constant, say A). These are also called raw moments.
ii. Moments about the mean, which are called central moments.

31
32
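$\mu_r = E\left[(X - \bar{X})^r\right] = \begin{cases} \sum_x (x - \bar{x})^r f(x), & \text{for a discrete random variable} \\ \int_{-\infty}^{\infty} (x - \bar{x})^r f(x)\,dx, & \text{for a continuous random variable} \end{cases}$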

Note:
In the calculation of moments about mean we generally use
the relationship between moments about mean and about
origin.

33
1. $\mu_1 = 0$
2. $\mu_2 = \mu_2' - (\mu_1')^2$
3. $\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$
4. $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
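These relations can be verified on a small distribution by computing the central moments both ways. The sketch below uses the number of heads in three coin tosses as test data; the function names are hypothetical.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def raw_moment(pmf, r):
    """r-th moment about the origin (zero)."""
    return sum(x ** r * p for x, p in pmf.items())

def central_moment(pmf, r):
    """r-th moment about the mean, computed directly from the definition."""
    mean = raw_moment(pmf, 1)
    return sum((x - mean) ** r * p for x, p in pmf.items())

m1, m2, m3, m4 = (raw_moment(pmf, r) for r in (1, 2, 3, 4))

# Central moments from raw moments, using the relations listed above
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
print(mu2, mu3, mu4)
```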

34
Skewness

Kurtosis

35
Example

Solution:

36
37
38
Presenter: Ms. Sidra Raees

LECTURE 09
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
The probabilities of the various values of a random variable are obtained in accordance with the events and the nature of the experiment.

In this chapter we are going to see some distributions called theoretical distributions. In these distributions, the probabilities of events are obtained using formulas derived under certain conditions or assumptions. Of the many distributions available, the more common are the Bernoulli, Binomial, Poisson, Hypergeometric and Normal distributions.

In practical situations one has to thoroughly understand the random environment and describe it. The next step is to choose one of the above probability functions suited to the situation and use it to obtain the required probabilities. The characteristics of the probability distributions, such as central tendency, dispersion and skewness, are also to be studied.

2
Note:

Frequency distributions are of two types, namely the observed frequency distribution and the theoretical frequency distribution. Distributions which are based on actual data or experimentation are called observed frequency distributions. On the other hand, distributions based on expectations on the basis of past experience are known as theoretical frequency distributions or probability distributions.

3
The following are the two types of Theoretical
distributions:
1. Discrete distribution
2. Continuous distribution

4
Discrete Distribution
In discrete probability distribution we will discuss:
 Binomial distribution
 Poisson distribution
 Hypergeometric distribution

Continuous Distribution
In continuous probability distribution we will discuss
Normal Distribution. Normal distribution is the most
important and powerful of all the distribution in statistics.

5
6
A Bernoulli trial can result in a success with probability $p$ and a failure with probability $q = 1 - p$. Then the probability distribution of the binomial random variable X, the number of successes in $n$ independent trials, is

$b(x; n, p) = {}^{n}C_{x}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$

7
 The result of each trial can be classified into only two
categories called success or failure.
 The probability of success remains constant from one
trial to the next.
 The successive trials are independent.
 The experiment is repeated a fixed number of times.

Remarks:
 Binomial Distribution has two parameters i.e. 𝑛 and 𝑝.
 The mean and variance of the binomial distribution are
$\mu = np$ and $\sigma^2 = npq$

8
Example
The probability that a certain kind of component will
survive a shock test is 3/4. Find the probability that
exactly 2 of the next 4 components tested survive.

Solution:
Assuming that the tests are independent and $p = 3/4$ for each of the 4 tests, we obtain

$b(x; n, p) = b\left(2; 4, \dfrac{3}{4}\right) = {}^{4}C_{2} \left(\dfrac{3}{4}\right)^{2} \left(\dfrac{1}{4}\right)^{4-2} = \dfrac{27}{128}$

9
Example
The probability that a patient recovers from a rare blood
disease is 0.4. If 15 people are known to have contracted
this disease, what is the probability that
(a) at least 10 survive, (b) from 3 to 8 survive, and
(c) exactly 5 survive?

Solution:
Let X be the number of people who survive.
(a) $P(X \ge 10) = \sum_{x=10}^{15} b(x; 15, 0.4) = 0.0338$
(b) $P(3 \le X \le 8) = \sum_{x=3}^{8} b(x; 15, 0.4) = 0.8779$
(c) $P(X = 5) = b(5; 15, 0.4) = 0.1859$
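Binomial probabilities such as these are usually read from tables, but they can also be computed directly from the formula. The helper names below are hypothetical. Small differences in the fourth decimal place against table-based answers are possible, because the tabulated cumulative values are themselves rounded.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_range(lo, hi, n, p):
    """P(lo <= X <= hi) for a binomial random variable."""
    return sum(binom_pmf(x, n, p) for x in range(lo, hi + 1))

shock_test = binom_pmf(2, 4, 0.75)          # exactly 2 of 4 survive
at_least_10 = binom_range(10, 15, 15, 0.4)
from_3_to_8 = binom_range(3, 8, 15, 0.4)
exactly_5 = binom_pmf(5, 15, 0.4)
print(shock_test, at_least_10, from_3_to_8, exactly_5)
```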

10
Example
It is conjectured that an impurity exists in 30% of all
drinking wells in a certain rural community. In order to
gain some insight into the true extent of the problem, it is
determined that some testing is necessary. It is too
expensive to test all of the wells in the area, so 10 are
randomly selected for testing.
(a) Using the binomial distribution, what is the
probability that exactly 3 wells have the impurity,
assuming that the conjecture is correct.
(b) What is the probability that more than 3 wells are
impure?

11
Solution:

(a) We require
$b(3; 10, 0.3) = {}^{10}C_{3}\,(0.3)^{3}(0.7)^{10-3} = 0.2668$

(b) $P(X > 3) = 1 - P(X \le 3)$
$= 1 - \sum_{x=0}^{3} b(x; 10, 0.3)$
$= 1 - 0.6496$
$= 0.3504$

12
Example
A T.V. channel conducted a poll regarding the construction of dams in Pakistan. 75% of people supported construction, 15% were against and 10% were undecided. A sample of 10 persons is taken. What is the probability that at least 3 will support the construction?

Solution:
We have
$p = 0.75$, $q = 0.25$, $n = 10$
$P(X \ge 3) = 1 - P(X < 3)$
$= 1 - \sum_{x=0}^{2} b(x; 10, 0.75)$
$= 1 - \left[{}^{10}C_{0}(0.75)^{0}(0.25)^{10} + {}^{10}C_{1}(0.75)^{1}(0.25)^{9} + {}^{10}C_{2}(0.75)^{2}(0.25)^{8}\right]$
$= 1 - 0.0004$
$= 0.9996$

13
Example
A and B play a game in which A’s probability of winning
is 2/3. In a series of 8 games, what is the probability that
A will win
(a) Exactly 4 games (b) at least 4 games
(c) 6 or more games (d) from 3 to 6 games.

Solution:
We have
$p = \dfrac{2}{3}$, $q = \dfrac{1}{3}$, $n = 8$

14
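(a) $P(X = 4) = b\left(4; 8, \dfrac{2}{3}\right) = {}^{8}C_{4} \left(\dfrac{2}{3}\right)^{4} \left(\dfrac{1}{3}\right)^{8-4} = 0.1707$

(b) $P(X \ge 4) = 1 - P(X < 4) = 1 - \sum_{x=0}^{3} b\left(x; 8, \dfrac{2}{3}\right) = 0.9121$

(c) $P(X \ge 6) = \sum_{x=6}^{8} b\left(x; 8, \dfrac{2}{3}\right) = 0.4682$

(d) $P(3 \le X \le 6) = \sum_{x=3}^{6} b\left(x; 8, \dfrac{2}{3}\right) = 0.7852$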

15
Example
If on the average rain falls on 9 days in every thirty days, find
the probability that rain will fall on at least two days of a
given week.

Solution:
The probability of rain on a particular day is
$p = \dfrac{9}{30} = \dfrac{3}{10}$ and $q = 1 - p = \dfrac{7}{10}$.
There are 7 days in a week, so the probability of rain on at least 2 days is given by
$P(X \ge 2) = 1 - P(X < 2)$
$= 1 - \sum_{x=0}^{1} b\left(x; 7, \dfrac{3}{10}\right)$
$= 1 - 0.3294$
$= 0.6706$
16
The binomial distribution finds applications in many
scientific fields. An industrial engineer is keenly interested in
the “proportion defective” in an industrial process. Often,
quality control measures and sampling schemes for processes
are based on the binomial distribution. This distribution
applies to any industrial situation where an outcome of a
process is dichotomous and the results of the process are
independent, with the probability of success being constant
from trial to trial. The binomial distribution is also used
extensively for medical and military applications. In both
fields, a success or failure result is important. For example,
“cure” or “no cure” is important in pharmaceutical work, and
“hit” or “miss” is often the interpretation of the result of firing
a guided missile.

17
18
Experiments yielding numerical values of a random variable
X, the number of outcomes occurring during a given time
interval or in a specified region, are called Poisson
experiments. The given time interval may be of any length,
such as a minute, a day, a week, a month, or even a year. For
example, a Poisson experiment can generate observations for
the random variable X representing the number of telephone
calls received per hour by an office, the number of days
school is closed due to snow during the winter, or the number
of games postponed due to rain during a baseball season. The
specified region could be a line segment, an area, a volume,
or perhaps a piece of material. In such instances, X might
represent the number of field mice per acre, the number of
bacteria in a given culture, or the number of typing errors per
page. A Poisson experiment is derived from the Poisson
process.

19
The probability distribution of the Poisson random variable X, representing the number of outcomes occurring in a given time interval or specified region denoted by t, is

$p(x; \lambda t) = \dfrac{e^{-\lambda t} (\lambda t)^{x}}{x!}, \quad x = 0, 1, 2, \ldots$

Where $\lambda$ is the average number of outcomes per unit time, distance, area, or volume and $e = 2.71828\ldots$
Remarks:
Mean = Variance = $\lambda t$

20
Example
During a laboratory experiment, the average number of
radioactive particles passing through a counter in 1
millisecond is 4. What is the probability that 6 particles
enter the counter in a given millisecond?

Solution:
Using the Poisson distribution with $x = 6$ and $\lambda t = 4$, we have

$p(x; \lambda t) = p(6; 4) = \dfrac{e^{-4}\, 4^{6}}{6!} = 0.1042$

21
Example
Ten is the average number of oil tankers arriving each
day at a certain port. The facilities at the port can handle
at most 15 tankers per day. What is the probability that on
a given day tankers have to be turned away?

Solution:
Let X be the number of tankers arriving each day. Then, we have
$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} p(x; 10)$
$= 1 - 0.9513$
$= 0.0487$
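Poisson probabilities and cumulative sums like the ones in these examples can be computed directly from the formula. The function names below are illustrative; `mean` stands for the product λt.

```python
import math

def poisson_pmf(x, mean):
    """p(x; mean) = e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson random variable."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))

six_particles = poisson_pmf(6, 4)            # counter example
turned_away = 1 - poisson_cdf(15, 10)        # more than 15 tankers in a day
print(round(six_particles, 4), round(turned_away, 4))
```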

22
Example
Flaws in a certain type of drapery material appear on the
average of one in 150 square feet. If we assume the
Poisson distribution, find the probability of at most one
flaw in 225 square feet.

Solution:
Taking 150 square feet as the unit area, we have $\lambda = 1$ flaw per 150 square feet.
As 225 square feet are $\dfrac{225}{150} = 1.5$ units of area, $t = 1.5$, and therefore the average number of flaws per 225 square feet is $\lambda t = 1 \times 1.5 = 1.5$.

23
Assuming the flaws are a Poisson process, we have

$P(X \le 1) = \sum_{x=0}^{1} p(x; \lambda t) = \sum_{x=0}^{1} p(x; 1.5)$
$= 0.2231 + 0.3347$
$= 0.5578$

24
Poisson and binomial distributions give approximately the same results under the following conditions:
Let X be a binomial random variable with probability distribution $b(x; n, p)$. When $n \to \infty$, $p \to 0$, and $np \to \mu$ remains constant,

$b(x; n, p) \to p(x; \mu)$

25
Example
In a certain industrial facility, accidents occur infrequently. It
is known that the probability of an accident on any given day
is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400
days there will be an accident on one day?
(b) What is the probability that there are at most three days
with an accident?

Solution:
Let X be a binomial random variable with $n = 400$ and $p = 0.005$. Thus, $np = 2$. Using the Poisson approximation,
(a) $P(X = 1) = p(x; \mu) = p(1; 2) = \dfrac{e^{-2}\, 2^{1}}{1!} = 0.271$
(b) $P(X \le 3) = \sum_{x=0}^{3} p(x; 2) = \sum_{x=0}^{3} \dfrac{e^{-2}\, 2^{x}}{x!} = 0.857$

26
Example
In a manufacturing process where glass products are made, defects
or bubbles occur, occasionally rendering the piece undesirable for
marketing. It is known that, on average, 1 in every 1000 of these
items produced has one or more bubbles. What is the probability
that a random sample of 8000 will yield fewer than 7 items
possessing bubbles?

Solution:
This is essentially a binomial experiment with $n = 8000$ and
$p = 0.001$. Since $p$ is very close to 0 and $n$ is quite large, we shall
approximate with the Poisson distribution using
$$\mu = (8000)(0.001) = 8$$
Hence, if X represents the number of items possessing bubbles, we have
$$P(X < 7) = \sum_{x=0}^{6} b(x; 8000, 0.001) \approx \sum_{x=0}^{6} p(x; 8) = 0.3134$$

27
The simplest way to view the distinction between the binomial
distribution and the hypergeometric distribution is to note the way the
sampling is done. The types of applications for the hypergeometric are
very similar to those for the binomial distribution. We are interested in
computing probabilities for the number of observations that fall into a
particular category. But in the case of the binomial distribution,
independence among trials is required. As a result, if that distribution is
applied to, say, sampling from a lot of items (deck of cards, batch of
production items), the sampling must be done with replacement of each
item after it is observed. On the other hand, the hypergeometric
distribution does not require independence and is based on sampling done
without replacement.
Applications for the hypergeometric distribution are found in many areas,
with heavy use in acceptance sampling, electronic testing, and quality
assurance. Obviously, in many of these fields, testing is done at the
expense of the item being tested. That is, the item is destroyed and hence
cannot be replaced in the sample. Thus, sampling without replacement is
necessary.

29
The probability distribution of the hypergeometric
random variable X, the number of successes in a random
sample of size $n$ selected from $N$ items of which $k$ are
labeled success and $N - k$ labeled failure, is

$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, n$$

30
 The result of each trial can be classified into one of two
categories, say success and failure.
 The probability of success changes on each trial.
 Successive trials are dependent.
 The experiment is repeated a fixed number of times.

Remarks:
 The mean and variance of the hypergeometric
distribution are $\mu = \frac{nk}{N}$ and $\sigma^2 = \frac{N-n}{N-1} \cdot \frac{nk(N-k)}{N^2}$.
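
The hypergeometric formula and its mean and variance can be checked with exact arithmetic. This is a small sketch, assuming only the Python standard library; the function names are hypothetical and chosen for illustration.

```python
import math
from fractions import Fraction


def hypergeom_pmf(x: int, N: int, n: int, k: int) -> Fraction:
    """h(x; N, n, k) = C(k, x) * C(N - k, n - x) / C(N, n), for 0 <= x <= n."""
    return Fraction(math.comb(k, x) * math.comb(N - k, n - x), math.comb(N, n))


def hypergeom_mean(N: int, n: int, k: int) -> Fraction:
    """mu = n * k / N"""
    return Fraction(n * k, N)


def hypergeom_variance(N: int, n: int, k: int) -> Fraction:
    """sigma^2 = (N - n) / (N - 1) * n * k * (N - k) / N^2"""
    return Fraction(N - n, N - 1) * Fraction(n * k * (N - k), N * N)


# Illustrative values: N = 8 items, k = 5 successes, sample of n = 3
print(hypergeom_pmf(2, 8, 3, 5), hypergeom_mean(8, 3, 5), hypergeom_variance(8, 3, 5))
```

Using `Fraction` keeps the results exact, so they can be compared directly with hand-computed ratios.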

31
Like the binomial distribution, the hypergeometric
distribution finds applications in acceptance sampling,
where lots of materials or parts are sampled in order to
determine whether or not the entire lot is accepted.

32
Example
A box of 8 screws contains 5 defective screws. If a random
sample of 3 screws is selected without replacement, what is
the probability that the number of defective screws in the
sample is 2?

Solution:
Let $x = 2$ = no. of defective screws in the sample,
$n = 3$ = size of sample, $k = 5$ = no. of successes,
$N - k = 8 - 5 = 3$ = no. of failures.
$$P(X = 2) = h(2; 8, 3, 5) = \frac{\binom{5}{2}\binom{8-5}{3-2}}{\binom{8}{3}} = \frac{15}{28}$$

33
Example
Lots of 40 components each are deemed unacceptable if they
contain 3 or more defectives. The procedure for sampling a
lot is to select 5 components at random and to reject the lot if
a defective is found. What is the probability that exactly 1
defective is found in the sample if there are 3 defectives in the
entire lot?

Solution:
Using the hypergeometric distribution with $N = 40$, $n = 5$,
$k = 3$, $N - k = 37$ and $x = 1$, we find the probability of
obtaining 1 defective to be
$$h(1; 40, 5, 3) = \frac{\binom{3}{1}\binom{40-3}{5-1}}{\binom{40}{5}} = 0.3011$$
Once again, this plan is not desirable since it detects a bad lot
(3 defectives) only about 30% of the time.

34
Presenter: Ms. Sidra Raees

LECTURE 10
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
The most important continuous probability distribution in the entire
field of statistics is the normal distribution. Its graph, called the
normal curve, is the bell-shaped curve, which approximately
describes many phenomena that occur in nature, industry, and
research. For example, physical measurements in areas such as
meteorological experiments, rainfall studies, and measurements of
manufactured parts are often more than adequately explained with
a normal distribution. In addition, errors in scientific measurements
are extremely well approximated by a normal distribution. In 1733,
Abraham DeMoivre developed the mathematical equation of the
normal curve. It provided a basis on which much of the theory of
inductive statistics is founded. The normal distribution is often
referred to as Gaussian distribution, in honor of Karl Friedrich
Gauss (1777-1855), who also derived its equation from a study of
errors in repeated measurements of the same quantity.

2
A continuous random variable X having the bell-shaped
distribution is called a normal random variable. The
mathematical equation for the probability distribution of
the normal variable depends on the two parameters $\mu$ and
$\sigma$, its mean and standard deviation, respectively. Hence,
we denote the values of the density of X by $n(x; \mu, \sigma)$.

3
The density of the normal random variable X, with mean
$\mu$ and variance $\sigma^2$, is

$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty$$

where $\pi = 3.14159\ldots$ and $e = 2.71828\ldots$

4
The curve of any continuous probability distribution or
density function is constructed so that the area under the
curve bounded by the two ordinates $x = x_1$ and $x = x_2$
equals the probability that the random variable X
assumes a value between $x = x_1$ and $x = x_2$. Thus, for
the normal curve,
$$P(x_1 < X < x_2) = \int_{x_1}^{x_2} n(x; \mu, \sigma)\, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_1}^{x_2} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx$$
is represented by the area of the shaded region.

5
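To determine the area or probability of an interval of a normal
distribution with mean $\mu$ and standard deviation $\sigma$, we first
convert X values into Z values using
$$Z = \frac{X - \mu}{\sigma}$$
where Z is called the standard normal variable, with mean zero
and standard deviation one. Writing $z_1 = (x_1 - \mu)/\sigma$ and
$z_2 = (x_2 - \mu)/\sigma$, we have
$$P(x_1 < X < x_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_1}^{x_2} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{1}{2} z^2}\, dz = \int_{z_1}^{z_2} n(z; 0, 1)\, dz$$
$$= P(z_1 < Z < z_2)$$
$$= P(Z < z_2) - P(Z < z_1)$$
where Z is seen to be a normal random variable with mean 0
and variance 1.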
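
The standardization step can also be carried out numerically instead of with a printed table. The sketch below assumes only the Python standard library and evaluates $P(Z < z)$ through the error function; the function names are illustrative.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z < z) for the standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob_between(x1: float, x2: float, mu: float, sigma: float) -> float:
    """P(x1 < X < x2) = P(Z < z2) - P(Z < z1) after converting X to Z."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return standard_normal_cdf(z2) - standard_normal_cdf(z1)


# Area to the left of z = 1.84, as read from a standard normal table
print(round(standard_normal_cdf(1.84), 4))
```

Because the code does not round $z$ to two decimals, its answers can differ from table-based results in the third or fourth decimal place.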

6
Example
Given a standard normal distribution, find the area under
the curve that lies
(a) To the right of $z = 1.84$ and
(b) Between $z = -1.97$ and $z = 0.86$

Solution:
(a) $P(Z > 1.84) = 1 - P(Z < 1.84) = 1 - 0.9671 = 0.0329$

(b) $P(-1.97 < Z < 0.86) = P(Z < 0.86) - P(Z < -1.97) = 0.8051 - 0.0244 = 0.7807$

11
Example
Given a standard normal distribution, find the value of k
such that $P(Z > k) = 0.3015$.

Solution:
$P(Z > k) = 0.3015$
$1 - P(Z \le k) = 0.3015$
$P(Z < k) = 1 - 0.3015 = 0.6985$
From the standard normal table, $P(Z < 0.52) = 0.6985$.
So, $k = 0.52$

12
Example
Given a normal distribution with 𝜇 = 40 and 𝜎 = 6, find the
probability (or area) that X assumes a value
(a) Below 42 (b) Above 27
(c) Between 42 and 51

Solution:
Here $\mu = 40$ and $\sigma = 6$, then
(a) $P(\text{below } 42) = P(X < 42) = P\left(Z < \frac{42 - 40}{6}\right) = P(Z < 0.33) = 0.6293$

(b) $P(\text{above } 27) = P(X > 27) = 1 - P(X < 27) = 1 - P\left(Z < \frac{27 - 40}{6}\right) = 1 - P(Z < -2.17) = 1 - 0.0150 = 0.9850$

(c) $P(42 < X < 51) = P(X < 51) - P(X < 42) = P\left(Z < \frac{51 - 40}{6}\right) - P\left(Z < \frac{42 - 40}{6}\right) = P(Z < 1.83) - P(Z < 0.33) = 0.9664 - 0.6293 = 0.3371$

14
Example
The burning time of an experimental rocket is a random
variable having the normal distribution with $\mu = 4.76$
seconds and 𝜎 = 0.04 seconds. What is the probability
that this kind of rocket will burn
(a) Less than 4.66 seconds
(b) More than 4.80 seconds
(c) Anything from 4.70 to 4.82 seconds

Solution:
Here, $\mu = 4.76$ and $\sigma = 0.04$.
(a) $P(X < 4.66) = P\left(Z < \frac{4.66 - 4.76}{0.04}\right) = P(Z < -2.50) = 0.0062$

(b) $P(X > 4.80) = 1 - P(X < 4.80) = 1 - P(Z < 1.00) = 1 - 0.8413 = 0.1587$

(c) $P(4.70 < X < 4.82) = P(X < 4.82) - P(X < 4.70) = P(Z < 1.50) - P(Z < -1.50) = 0.9332 - 0.0668 = 0.8664$

17
Example
A certain type of storage battery lasts, on average, 3.0
years with a standard deviation of 0.5 years. Assuming
that battery life is normally distributed, find the
probability that a given battery will last less than 2.3
years.

Solution:
First construct a diagram, showing the given distribution
of battery lives and the desired area. To find 𝑃(𝑋 < 2.3),
we need to evaluate the area under the normal curve to
the left of 2.3. This is accomplished by finding the area to
the left of the corresponding 𝑧 value.

18
Here, $\mu = 3.0$ and $\sigma = 0.5$,
and we find that
$$z = \frac{2.3 - 3.0}{0.5} = -1.4$$
$$P(X < 2.3) = P(Z < -1.4) = 0.0808$$

19
Example
A certain machine makes electrical resistors having a mean
resistance of 40 ohms and a standard deviation of 2 ohms.
Assuming that the resistance follows a normal distribution
and can be measured to any degree of accuracy, what
percentage of resistors will have a resistance exceeding 43
ohms?

Solution:
A percentage is found by multiplying the relative frequency
by 100%. Since the relative frequency for an interval is equal
to the probability of a value falling in the interval, we must
find the area to the right of $x = 43$ in the figure. This can be done
by transforming $x = 43$ to the corresponding $z$ value,
obtaining the area to the left of $z$ from the table, and then
subtracting this area from 1.
We find
$$z = \frac{43 - 40}{2} = 1.5$$
$$P(X > 43) = P(Z > 1.5) = 1 - P(Z < 1.5) = 1 - 0.9332 = 0.0668$$
Hence 6.68% of the resistors will have a resistance
exceeding 43 ohms.

21
In any statistical investigation, the interest lies in the assessment of one or
more characteristics relating to the individuals belonging to a group.
When all the individuals present in the study are investigated, it is called
complete enumeration, but in practice it is very difficult to investigate all
the individuals present in the study. So the technique of sampling is used,
in which a part of the individuals is selected for the study and
the assessment is made from the selected group of individuals. For
example:
 A housewife tastes a spoonful of whatever she cooks to check whether it
tastes good or not.
 A few drops of our blood are tested to check for the presence or
absence of a disease.
 A grain merchant takes out a handful of grains to get an idea about the
quality of the whole consignment.
These are typical examples where decision making is done on the basis of
sample information. So sampling is the process of choosing a
representative sample from a given population.

23
Sampling is the procedure or process of selecting a
sample from a population. Sampling is quite often used
in our day-to-day practical life.

24
Population
The group of individuals considered under study is called the
population. The word population here refers not only to people but
to all items that have been chosen for the study. Thus in statistics,
a population can be the bikes manufactured in a day, week
or month, the cars manufactured in a day, week or month,
or a collection of fans, TVs, chalk pieces, people, students, girls, boys,
any manufactured products, etc.

Finite and infinite population


 When the number of observations/individuals/products is
countable in a group, then it is a finite population. Example:
weights of students of class XII in a school.
 When the number of observations/individuals/products is
uncountable in a group, then it is an infinite population.
Example: number of grains in a sack, number of germs in the
body of a sick patient.
25
Sample and Sample size
A group of individuals selected from a population in such a
way that it represents the population is called a sample, and the
number of individuals included in a sample is called the sample
size.

Parameter
The statistical constants of the population, such as the mean and variance,
are referred to as population parameters.

Statistic
Any statistical measure computed from a sample is known as a
statistic.

Note:
In practice, the parameter values are not known and their estimates
based on the sample values are generally used.

26
Sampling is said to be with replacement when we draw a unit
from a finite population and return it to the population before
the next unit is drawn. In this case each unit can be drawn
more than once and the probability of drawing of each unit
remains constant throughout the sampling procedure.
Sampling is said to be without replacement, if we do not
return the selected unit to the population and draw the next
unit. In this case each unit can’t be drawn more than once and
the probability of drawing of each unit changes throughout
the sampling procedure.

27
Census or complete enumeration means to get the
information about each and every unit in the population.

A sample survey is a technique of getting information
about the characteristics of the population by studying
only a part (i.e. by studying a sample) of the population.

28
The important advantages of sampling are listed below:
 Sampling method is cheaper to collect information as
compared to census (i.e. complete enumeration).
 The data may be collected, classified and analyzed
much more quickly with a sample than with a census
enquiry.
 A sample is often used as a check to verify the accuracy
of complete count.
 It provides greater accuracy because the volume of
work is reduced in the sample survey.

29
This is the simplest and the easiest method of drawing a
sample from a population. According to this method each and
every unit in the population has an equal chance of being
included in the sample, and each possible sample of the
same size has an equal probability of being chosen.
Suppose there is a population of N units and we want to draw
a sample of n units. Then the possible number of samples in
case of sampling without replacement will be
$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
and in case of sampling with replacement the possible
number of samples will be $N^n$.

30
Consider all possible samples of size n which can be
drawn from a given population (either with or without
replacement). For each sample we can compute a
statistic, such as the mean or variance, which will vary
from sample to sample. In this way we obtain a
distribution of the statistic, which is called its sampling
distribution. Therefore, there are sampling distributions
of the mean, the variance, etc.

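31
For sampling without replacement from a finite population:
 The mean of all sample means is equal to the population
mean: $E(\bar{X}) = \mu$.
 $V(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$, where $\sigma^2$ is the population variance.

For sampling with replacement:
 The mean of all sample means is equal to the population
mean: $E(\bar{X}) = \mu$.
 $V(\bar{X}) = \frac{\sigma^2}{n}$, where $\sigma^2$ is the population variance.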

33
Example
A population consists of five numbers: 0, 2, 4, 6, 8
(a) List all possible samples of size 2 that can be drawn from
this population without replacement.
(b) Find mean of each sample
(c) Construct sampling distribution of 𝑋.
(d) Verify that mean of all sample means is equal to
population mean.

Solution:
Since N = 5 and n = 2 and the sampling is done without
replacement, the number of all possible samples is
$$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = 10$$

34
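The 10 possible samples and their means $\bar{x}$ are:
(0, 2): 1   (0, 4): 2   (0, 6): 3   (0, 8): 4   (2, 4): 3
(2, 6): 4   (2, 8): 5   (4, 6): 5   (4, 8): 6   (6, 8): 7

Sampling Distribution of $\bar{X}$ is

x̄       1   2   3   4   5   6   7   Total
f        1   1   2   2   2   1   1   10
f·x̄     1   2   6   8   10  6   7   40

Now the mean of all sample means, i.e. the mean of the sampling
distribution of $\bar{X}$, is computed as
$$E(\bar{X}) = \frac{\sum f \bar{x}}{\sum f} = \frac{40}{10} = 4$$
and since the population mean is
$$\mu = \frac{\sum X}{N} = \frac{0 + 2 + 4 + 6 + 8}{5} = 4$$
Therefore, it is verified that the mean of all sample means is
equal to the population mean.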
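
The enumeration in this example can be automated. The following is a minimal sketch, assuming only the Python standard library; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [0, 2, 4, 6, 8]
n = 2  # sample size

# All samples of size n drawn without replacement
samples = list(combinations(population, n))

# Mean of each sample, kept exact with Fraction
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean: value -> frequency
distribution = Counter(sample_means)

mean_of_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(len(samples), sorted(distribution.items()))
print(mean_of_means, population_mean)
```

`combinations` treats a sample as an unordered selection, which matches the count $\binom{N}{n}$ used for sampling without replacement.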

37
Example
A population consists of five numbers 0, 3, 6, 9, 12
(a) List all possible samples of size 3 that can be drawn
from this population without replacement.
(b) Verify that the mean of $\bar{X}$ is $E(\bar{X}) = \mu$ and
$$V(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$

Solution:
Since N = 5 and n = 3 and sampling is done without
replacement, the number of all possible samples is
$$\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$$

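38
The 10 possible samples and their means $\bar{x}$ are:
(0, 3, 6): 3    (0, 3, 9): 4    (0, 3, 12): 5   (0, 6, 9): 5    (0, 6, 12): 6
(0, 9, 12): 7   (3, 6, 9): 6    (3, 6, 12): 7   (3, 9, 12): 8   (6, 9, 12): 9

In what follows, the sums over $\bar{x}$ run over the 10 possible samples.
Mean
$$E(\bar{X}) = \mu$$
$$\frac{\sum \bar{x}}{10} = \frac{\sum X}{N}$$
$$\frac{60}{10} = \frac{30}{5}$$
$$6 = 6$$
Variance
$$V(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
$$\frac{\sum \bar{x}^2}{10} - \left(\frac{\sum \bar{x}}{10}\right)^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
$$\frac{390}{10} - \left(\frac{60}{10}\right)^2 = \frac{18}{3} \cdot \frac{5-3}{5-1}$$
$$3 = 3$$
Therefore, it is verified.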

40
Example
Draw all possible samples each of size 2 from the
population 2, 4, 6 and 8 using sampling with
replacement. Find the mean of each sample and verify that
(a) Mean of $\bar{X} = E(\bar{X}) = \mu$
(b) $V(\bar{X}) = \frac{\sigma^2}{2}$

Solution:
Since N = 4 and n = 2, the number of possible samples is
$$N^n = 4^2 = 16$$

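41
The sampling distribution of $\bar{X}$ over the 16 possible samples is

x̄    2   3   4   5   6   7   8   Total
f     1   2   3   4   3   2   1   16

In what follows, the sums over $\bar{x}$ run over the 16 possible samples.
Mean
$$E(\bar{X}) = \mu$$
$$\frac{\sum \bar{x}}{16} = \frac{\sum X}{N}$$
$$\frac{80}{16} = \frac{20}{4}$$
$$5 = 5$$
Variance
$$V(\bar{X}) = \frac{\sigma^2}{n}$$
$$\frac{\sum \bar{x}^2}{16} - \left(\frac{\sum \bar{x}}{16}\right)^2 = \frac{\sigma^2}{n}$$
$$\frac{440}{16} - \left(\frac{80}{16}\right)^2 = \frac{5}{2}$$
$$2.5 = 2.5$$
Therefore, it is verified.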
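
Both variance identities can be checked by brute-force enumeration. The sketch below assumes only the Python standard library; `pop_variance` and `variance_of_sample_mean` are hypothetical helper names. Samples drawn with replacement are treated as ordered, so that their count is $N^n$.

```python
from fractions import Fraction
from itertools import combinations, product


def pop_variance(values):
    """Population variance: average squared deviation from the mean."""
    mu = Fraction(sum(values)) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def variance_of_sample_mean(population, n, replace):
    """Variance of the sample mean over all possible samples of size n."""
    if replace:
        draws = product(population, repeat=n)   # ordered, N^n samples
    else:
        draws = combinations(population, n)     # unordered, C(N, n) samples
    means = [Fraction(sum(s), n) for s in draws]
    return pop_variance(means)


# Without replacement: N = 5, n = 3
print(variance_of_sample_mean([0, 3, 6, 9, 12], 3, replace=False))
# With replacement: N = 4, n = 2
print(variance_of_sample_mean([2, 4, 6, 8], 2, replace=True))
```

The first result should agree with $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$ and the second with $\frac{\sigma^2}{n}$.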

43
If $\bar{X}$ is the mean of a random sample of size n taken from
a population with mean $\mu$ and finite variance $\sigma^2$, then the
limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$.
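
As a rough illustration of how this result is used, the sketch below standardizes a sample mean and evaluates the normal probability. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name and the numeric inputs are illustrative.

```python
import math
from statistics import NormalDist


def prob_sample_mean_below(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Approximate P(sample mean < x_bar) using Z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu) / standard_error
    return NormalDist().cdf(z)


# Illustrative values: mu = 800, sigma = 40, n = 16, x_bar = 775
print(round(prob_sample_mean_below(775, 800, 40, 16), 4))
```

A probability for an interval follows by subtracting two such values; small differences from table-based answers come from rounding $z$ to two decimals.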

44
Example
An electrical firm manufactures light bulbs that have a
length of life that is approximately normally distributed,
with mean equal to 800 hours and a standard deviation of
40 hours. Find the probability that a random sample of 16
bulbs will have an average life less than 775 hours.

Solution:
The sampling distribution of $\bar{X}$ will be approximately
normal, with $\mu_{\bar{X}} = 800$ and $\sigma_{\bar{X}} = 40/\sqrt{16} = 10$. The
desired probability is given by the area of the shaded
region in the figure.
Corresponding to $\bar{x} = 775$, we find that
$$z = \frac{775 - 800}{10} = -2.5$$
$$P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$$

46
Example
Hourly wages of workers in an industry have a mean
wage rate of PRs. 50 per hour and a standard deviation of
PRs. 6. What is the probability that the mean wage of a
random sample of 50 workers will be between PRs. 51
and PRs. 52?

Solution:
The sampling distribution of $\bar{X}$ will be approximately
normal, with $\mu_{\bar{X}} = 50$ and $\sigma_{\bar{X}} = 6/\sqrt{50} = 0.85$. The
desired probability is given by the area of the shaded
region in the figure.
With $z_1 = \frac{51 - 50}{0.85} = 1.18$ and $z_2 = \frac{52 - 50}{0.85} = 2.35$, we find that
$$P(51 < \bar{X} < 52) = P(1.18 < Z < 2.35) = P(Z < 2.35) - P(Z < 1.18) = 0.9906 - 0.8810 = 0.1096$$

48
Presenter: Ms. Sidra Raees

LECTURE 11
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
One of the main objectives of any statistical investigation
is to draw inferences about a population from the analysis
of samples drawn from that population. Statistical
inference tells us how to estimate a value from the
sample and how to test that value for the population. It is
divided into two important branches:
(i) Estimation
(ii) Testing of Hypotheses

2
It is possible to draw valid conclusion about the population
parameters from sampling distribution. Estimation helps in
estimating an unknown population parameter such as
population mean, standard deviation, etc., on the basis of
suitable statistic computed from the samples drawn from
population.
For example, a candidate for public office may wish to
estimate the true proportion of voters favoring him by
obtaining the opinions of a random sample of 100 eligible
voters. The fraction of voters in the sample favoring the
candidate could be used as an estimate of the true proportion
in the population of voters.

3
An estimator stands for the rule or a formula that is used
to estimate a parameter whereas an estimate stands for
the numerical value obtained by substituting the sample
observations in the rule or the formula.
For example, if 2, 4, 6, 8, 10 are sample observations, then

$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = 6$$
6 is an estimate, whereas the statistic $\bar{x}$ used as a formula is
called an estimator.

4
To estimate an unknown parameter of the population,
concept of theory of estimation is used. There are two
types of estimation namely,

1. Point estimation
2. Interval estimation

5
When a single value is used as an estimate, the estimate is
called a point estimate of the population parameter. In other
words, estimating a population parameter by a
single number is called point estimation.
For example
 If 55 is the mean marks obtained by a sample of 5 students
randomly drawn from a class of 100 students, and it is taken
to be the mean marks of the entire class, then this single value
55 is a point estimate.
 If 50 kg is the average weight of a sample of 10 students
randomly drawn from a class of 100 students, and it is taken
to be the average weight of the entire class, then this single
value 50 is a point estimate.

6
Generally, there are situations where point estimation is not
desirable and we are interested in finding limits within which
the parameter would be expected to lie. This is called interval
estimation.
For Example
If the average height of all college students is estimated to lie between
61” and 65”, then the range of values from 61” to 65” is an
interval estimate.

Thus on the basis of a sample, if we estimate the average
income of the people living in a city as Rs. 3000, it will be a
point estimate. On the other hand, if we say that the average
income could lie between Rs. 2000 and Rs. 4000, it will be an
interval estimate.

7
The limits which contain a population parameter with a
given degree of confidence are called the confidence
limits. The interval between these limits is called the
confidence interval.

8
Confidence level (1 − α)100%    1 − α    α       α/2      z_{α/2}
90                              0.90     0.10    0.050    1.645
95                              0.95     0.05    0.025    1.960
98                              0.98     0.02    0.010    2.326
99                              0.99     0.01    0.005    2.575

9
Confidence Interval on $\mu$ (When $\sigma$ is known)
If $\bar{x}$ is the mean of a random sample of size n from a
population with known variance $\sigma^2$, a $100(1 - \alpha)\%$
confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.

Note:
For small samples selected from nonnormal populations, we
cannot expect our degree of confidence to be accurate.
However, for samples of size 𝑛 ≥ 30, with the shape of the
distribution not too skewed, sampling theory guarantees good
results.
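
The interval for known $\sigma$ can be computed as in the sketch below. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `z_interval` and the sample figures are illustrative only.

```python
import math
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    alpha = 1.0 - confidence
    # z-value leaving an area of alpha/2 to the right
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Illustrative values: x_bar = 2.6, sigma = 0.3, n = 36, 99% confidence
low, high = z_interval(2.6, 0.3, 36, 0.99)
print(round(low, 2), round(high, 2))
```

The computed critical value is slightly more precise than a table entry (for 99%, about 2.5758 rather than 2.575), so limits may differ from hand calculations in later decimals.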

12
Example
The average zinc concentration recovered from a sample of
measurements taken in 36 different locations in a river is found to be 2.6
grams per milliliter. Find the 99% confidence interval for the mean zinc
concentration in the river. Assume that the population standard deviation
is 0.3 gram per milliliter.

Solution:
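We have $n = 36$, $\bar{x} = 2.6$, $\sigma = 0.3$
$1 - \alpha = 0.99$
$\alpha = 0.01$
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$
Hence, the 99% confidence interval is
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$2.6 - (2.575) \frac{0.3}{\sqrt{36}} < \mu < 2.6 + (2.575) \frac{0.3}{\sqrt{36}}$$
$$2.47 < \mu < 2.73$$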

13
Example
The quality control manager of a tyre company has a sample of
one hundred tyres and has found the mean lifetime to be 30214 km.
The population standard deviation is 860 km. Construct a 95%
confidence interval for the mean life of tyres.

Solution:
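We have $n = 100$, $\bar{x} = 30214$, $\sigma = 860$
$1 - \alpha = 0.95$
$\alpha = 0.05$
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
Hence, the 95% confidence interval is
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$30214 - (1.96) \frac{860}{\sqrt{100}} < \mu < 30214 + (1.96) \frac{860}{\sqrt{100}}$$
$$30045.44 < \mu < 30382.56$$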
14
Confidence Interval on $\mu$ (When $\sigma$ is unknown, $n \ge 30$)

If $\bar{x}$ and $s$ are the mean and standard deviation of a
random sample from a population with unknown
variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is
given by
$$\bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}},$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the
right.

15
Example
The systolic blood pressure of 90 men has a mean of 128.9 mmHg and
a standard deviation of 17 mmHg. Assuming that these are a random
sample of blood pressures, calculate a 99% confidence interval for the
mean blood pressure in the population.

Solution:
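We have $n = 90$, $\bar{x} = 128.9$, $s = 17$
$1 - \alpha = 0.99$
$\alpha = 0.01$
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$
Hence, the 99% confidence interval is
$$\bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}}$$
$$128.9 - (2.575) \frac{17}{\sqrt{90}} < \mu < 128.9 + (2.575) \frac{17}{\sqrt{90}}$$
$$124.29 < \mu < 133.51$$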
16
Example
Scholastic Aptitude Test (SAT) mathematics scores of a random
sample of 500 high school seniors in the state of Texas are
collected, and the sample mean and standard deviation are found to
be 501 and 112, respectively. Find a 99% confidence interval on the
mean SAT mathematics score for seniors in the state of Texas.

Solution:
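We have $n = 500$, $\bar{x} = 501$, $s = 112$
$1 - \alpha = 0.99$
$\alpha = 0.01$
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$
Hence, the 99% confidence interval is
$$\bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}}$$
$$501 - (2.575) \frac{112}{\sqrt{500}} < \mu < 501 + (2.575) \frac{112}{\sqrt{500}}$$
$$488.1 < \mu < 513.9$$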
17
Confidence Interval on $\mu$ (When $\sigma$ is unknown, $n < 30$)

If $\bar{x}$ and $s$ are the mean and standard deviation of a
random sample from a population with unknown
variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is
given by
$$\bar{x} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}},$$
where $t_{\alpha/2,\, n-1}$ is the t-value with $v = n - 1$ degrees of
freedom, leaving an area of $\alpha/2$ to the right.
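
A small-sample interval can be sketched as follows. The Python standard library has no t-distribution quantile function, so this sketch assumes the critical value $t_{\alpha/2,\, n-1}$ is read from a t-table and passed in; the function name `t_interval` is illustrative.

```python
import math


def t_interval(x_bar: float, s: float, n: int, t_critical: float):
    """Confidence interval for mu with sigma unknown and a small sample.

    t_critical is the table value t_{alpha/2} with n - 1 degrees of freedom.
    """
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Illustrative values: x_bar = 10.0, s = 0.283, n = 7, t_{0.025, 6} = 2.447
low, high = t_interval(10.0, 0.283, 7, 2.447)
print(round(low, 2), round(high, 2))
```

Passing the critical value explicitly keeps the sketch dependency-free and mirrors the table lookup done by hand.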

18
Example
An electrical firm manufactures light bulbs that have a length of life
with mean $\mu$ and a standard deviation of 40 hours. If a sample of
29 bulbs has an average life of 780 hours, find a 95% confidence
interval for the population mean of all bulbs produced by this firm.

Solution:
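We have $n = 29$, $\bar{x} = 780$, $s = 40$
$1 - \alpha = 0.95$
$\alpha = 0.05$
$t_{\alpha/2,\, n-1} = t_{0.05/2,\, 29-1} = t_{0.025,\, 28} = 2.048$
Hence, the 95% confidence interval is
$$\bar{x} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$
$$780 - (2.048) \frac{40}{\sqrt{29}} < \mu < 780 + (2.048) \frac{40}{\sqrt{29}}$$
$$764.79 < \mu < 795.21$$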
20
Example
The contents of seven similar containers of sulfuric acid are 9.8, 10.2,
10.4, 9.9, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for the
mean contents of all such containers, assuming an approximately normal
distribution.

Solution:
The sample mean and standard deviation for the given data are
$\bar{x} = 10.0$ and $s = 0.283$
$1 - \alpha = 0.95$
$\alpha = 0.05$
$t_{\alpha/2,\, n-1} = t_{0.05/2,\, 7-1} = t_{0.025,\, 6} = 2.447$
Hence, the 95% confidence interval is
$$\bar{x} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$
$$10.0 - (2.447) \frac{0.283}{\sqrt{7}} < \mu < 10.0 + (2.447) \frac{0.283}{\sqrt{7}}$$
$$9.74 < \mu < 10.26$$
21
If we have two populations with means $\mu_1$ and $\mu_2$ and
variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a point estimator of
the difference between $\mu_1$ and $\mu_2$ is given by the statistic
$\bar{X}_1 - \bar{X}_2$. Therefore, to obtain a point estimate of $\mu_1 - \mu_2$,
we will select two independent random samples, one
from each population, of sizes $n_1$ and $n_2$, and compute
$\bar{x}_1 - \bar{x}_2$, the difference of the sample means. Clearly, we
must consider the sampling distribution of $\bar{X}_1 - \bar{X}_2$.

23
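Confidence Interval for $\mu_1 - \mu_2$ (When $\sigma_1$ and $\sigma_2$ are known)

If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of
sizes $n_1$ and $n_2$ from populations with known variances $\sigma_1^2$
and $\sigma_2^2$, respectively, a $100(1 - \alpha)\%$ confidence interval for
$\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$

where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.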
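
The two-sample interval with known variances can be sketched as below, assuming only the Python standard library. The critical value is passed in as read from a normal table, and the function name and inputs are illustrative.

```python
import math


def two_sample_z_interval(diff: float, sigma1: float, n1: int,
                          sigma2: float, n2: int, z_critical: float):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known.

    diff is the observed difference of sample means (x1_bar - x2_bar).
    """
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    margin = z_critical * standard_error
    return diff - margin, diff + margin


# Illustrative values: difference 6, sigmas 8 and 6, n = 75 and 50, z = 2.05
low, high = two_sample_z_interval(6, 8, 75, 6, 50, 2.05)
print(round(low, 2), round(high, 2))
```

Because the standard error is symmetric in the two samples, the order of the populations only affects the sign of the difference.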

24
Example

A study was conducted in which two types of engines, A


and B, were compared. Gas mileage, in miles per gallon,
was measured. 50 experiments were conducted using
engine type A and 75 experiments were done with engine
type B. The gasoline used and other conditions were held
constant. The average gas mileage was 36 miles per
gallon for engine A and 42 miles per gallon for engine B.
Find a 96% confidence interval on 𝜇𝐵 − 𝜇𝐴 , where 𝜇𝐴
and 𝜇𝐵 are population mean gas mileages for engines A
and B, respectively. Assume that the population standard
deviations are 6 and 8 for engines A and B, respectively.

25
Solution:
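The point estimate of $\mu_B - \mu_A$ is $\bar{x}_B - \bar{x}_A = 42 - 36 = 6$.
Using
$1 - \alpha = 0.96$
$\alpha = 0.04$
$z_{\alpha/2} = z_{0.04/2} = z_{0.02} = 2.05$
Hence, with substitution in the formula above, the 96%
confidence interval is
$$(\bar{x}_B - \bar{x}_A) - z_{\alpha/2} \sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A}} < \mu_B - \mu_A < (\bar{x}_B - \bar{x}_A) + z_{\alpha/2} \sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A}}$$
$$6 - 2.05 \sqrt{\frac{64}{75} + \frac{36}{50}} < \mu_B - \mu_A < 6 + 2.05 \sqrt{\frac{64}{75} + \frac{36}{50}}$$
$$3.43 < \mu_B - \mu_A < 8.57$$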

26
Confidence Interval for $\mu_1 - \mu_2$ (When $\sigma_1 = \sigma_2$ but unknown)

If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of
sizes $n_1$ and $n_2$ from approximately normal populations with
unknown but equal variances, a $100(1 - \alpha)\%$ confidence
interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $s_p$ is the pooled estimate of the population standard
deviation and $t_{\alpha/2}$ is the t-value with $v = n_1 + n_2 - 2$ degrees of
freedom, leaving an area of $\alpha/2$ to the right.

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
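
The pooled estimate and the resulting interval can be sketched as follows, assuming only the Python standard library. The t critical value is taken from a table and passed in; the function names and inputs are illustrative.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled standard deviation s_p from two sample standard deviations."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def pooled_t_interval(x1: float, s1: float, n1: int,
                      x2: float, s2: float, n2: int, t_critical: float):
    """Interval for mu1 - mu2 assuming equal but unknown variances.

    t_critical is the table value with n1 + n2 - 2 degrees of freedom.
    """
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_critical * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Illustrative values: means 75 and 71, s = 4 and 5, n = 12 and 10, t = 1.725
low, high = pooled_t_interval(75, 4, 12, 71, 5, 10, 1.725)
print(round(pooled_sd(4, 12, 5, 10), 2), round(low, 2), round(high, 2))
```

Weighting each sample variance by its degrees of freedom gives the larger sample more influence on $s_p$.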
27
Example

A course in statistics is taught to 12 students by the
conventional classroom procedure. A second group of 10
students was given the same course by means of
programmed materials. The 12 students meeting in the
classroom made an average grade of 75 with a standard
deviation of 4, while the 10 students using programmed
materials made an average of 71 with a standard
deviation of 5. Find a 90% confidence interval for the
difference between the population means, assuming the
populations are approximately normally distributed with
equal variances.

28
Solution:

For the first group, we have $\bar{x}_1 = 75$, $s_1 = 4$, and $n_1 = 12$. For the
second group, $\bar{x}_2 = 71$, $s_2 = 5$, and $n_2 = 10$. We wish to find a
90% confidence interval for $\mu_1 - \mu_2$. Since the population
variances are assumed to be equal, the 90% confidence
interval is based on the pooled estimate $s_p$, where
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
$$s_p = \sqrt{\frac{(12 - 1)(16) + (10 - 1)(25)}{12 + 10 - 2}} = 4.48$$

Our point estimate of $\mu_1 - \mu_2$ is
$$\bar{x}_1 - \bar{x}_2 = 75 - 71 = 4$$

29
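$1 - \alpha = 0.90$
$\alpha = 0.10$
$t_{\alpha/2,\, n_1 + n_2 - 2} = t_{0.10/2,\, 20} = t_{0.05,\, 20} = 1.725$

Therefore, the 90% confidence interval is

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$4 - (1.725)(4.48) \sqrt{\frac{1}{12} + \frac{1}{10}} < \mu_1 - \mu_2 < 4 + (1.725)(4.48) \sqrt{\frac{1}{12} + \frac{1}{10}}$$
$$0.69 < \mu_1 - \mu_2 < 7.31$$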

30
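Confidence Interval for $\mu_1 - \mu_2$ (When $\sigma_1 \ne \sigma_2$ but unknown)

If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of sizes $n_1$ and
$n_2$ from approximately normal populations with unknown and unequal
variances, a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $t_{\alpha/2}$ is the t-value with

$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

degrees of freedom, leaving an area of $\alpha/2$ to the right.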
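
The degrees-of-freedom formula is tedious by hand, so a short sketch may help. It assumes only the Python standard library; the t critical value is read from a table after rounding $v$ down to an integer, and the function names and inputs are illustrative.

```python
import math


def unequal_variance_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Approximate degrees of freedom v for the unequal-variance interval."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def unequal_variance_interval(x1: float, s1: float, n1: int,
                              x2: float, s2: float, n2: int, t_critical: float):
    """Interval for mu1 - mu2 with unknown, unequal variances.

    t_critical is the table value for the (rounded-down) degrees of freedom.
    """
    margin = t_critical * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Illustrative values: s = 3.07 and 0.80, n = 15 and 12
v = unequal_variance_df(3.07, 15, 0.80, 12)
low, high = unequal_variance_interval(3.84, 3.07, 15, 1.49, 0.80, 12, 2.120)
print(round(v, 1), round(low, 2), round(high, 2))
```

Rounding $v$ down rather than up gives a slightly wider, more conservative interval.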

31
Example

A study was conducted by the Department of Zoology at
Virginia Tech to estimate the difference in the amounts of the
chemical orthophosphorus measured at two different stations
on the James River. Orthophosphorus was measured in
milligrams per liter. Fifteen samples were collected from
station 1, and 12 samples were obtained from station 2. The
15 samples from station 1 had an average orthophosphorus
content of 3.84 milligrams per liter and a standard deviation
of 3.07 milligrams per liter, while the 12 samples from
station 2 had an average content of 1.49 milligrams per liter
and a standard deviation of 0.80 milligrams per liter. Find a
95% confidence interval for the difference in the true average
orthophosphorus contents at these two stations, assuming that
the observations came from normal populations with different
variances.

32
Solution:

For station 1, we have $\bar{x}_1 = 3.84$, $s_1 = 3.07$, and $n_1 = 15$. For
station 2, $\bar{x}_2 = 1.49$, $s_2 = 0.80$, and $n_2 = 12$. We wish to find a
95% confidence interval for $\mu_1 - \mu_2$. Since the population
variances are assumed to be unequal, we can only find an
approximate 95% confidence interval based on the t-distribution
with $v$ degrees of freedom, where

$$v = \frac{\left(3.07^2/15 + 0.80^2/12\right)^2}{\frac{\left(3.07^2/15\right)^2}{14} + \frac{\left(0.80^2/12\right)^2}{11}} = 16.3 \approx 16$$

Our point estimate of $\mu_1 - \mu_2$ is
$$\bar{x}_1 - \bar{x}_2 = 3.84 - 1.49 = 2.35$$

33
Using
$1 - \alpha = 0.95$
$\alpha = 0.05$
$t_{\alpha/2,\, v} = t_{0.05/2,\, 16} = t_{0.025,\, 16} = 2.120$
Therefore, the 95% confidence interval is
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$2.35 - 2.120 \sqrt{\frac{3.07^2}{15} + \frac{0.80^2}{12}} < \mu_1 - \mu_2 < 2.35 + 2.120 \sqrt{\frac{3.07^2}{15} + \frac{0.80^2}{12}}$$
$$0.60 < \mu_1 - \mu_2 < 4.10$$
Hence, we are 95% confident that the interval from 0.60 to 4.10
milligrams per liter contains the difference of the true average
orthophosphorus contents for these two locations.

34
Presenter: Ms. Sidra Raees

LECTURE 12
Department of Mathematics, NED University of
Engineering & Technology, Karachi

1
One of the important areas of statistical analysis is testing of
hypotheses. Often, in real-life situations, we need to make
decisions about the population on the basis of sample information.
Hypothesis testing is also referred to as “Statistical Decision
Making”. It employs statistical techniques to arrive at decisions in
situations where there is an element of uncertainty, on the
basis of a sample whose size is fixed in advance. The procedure by which
statistics helps us arrive at the criterion for such decisions is known as
testing of hypotheses.
For Example: We may like to decide on the basis of sample data
whether a new vaccine is effective in curing cold, whether a new
training methodology is better than the existing one, whether the
new fertilizer is more productive than the earlier one and so on.

2
The structure of hypothesis testing will be formulated
with the use of the term null hypothesis, which refers to
any hypothesis we wish to test and is denoted by
𝐻0 . While the hypothesis opposite the null hypothesis is
called the alternative hypothesis, denoted by 𝐻1 .
For Example:
A car battery manufacturing company claims that the
batteries they produce possess an average length of life of
2 years. We can accept or reject their claims on the basis
of a sample by testing the relevant hypothesis.

In this example the null hypothesis is that the average
length of life is 2 years,

i.e. $H_0: \mu = 2$.
The alternative hypothesis may be stated as
$H_1: \mu < 2$, $H_1: \mu > 2$, or $H_1: \mu \neq 2$.

Thus the null hypothesis is a statement of the “claim” to
be verified. The alternative hypothesis is the complement
of the null hypothesis, so the rejection of the null
hypothesis leads to the acceptance of the alternative
hypothesis.

The null and alternative hypotheses must be established
in such a way that when one is true, the other is false, i.e.
$H_0$ and $H_1$ are mutually exclusive.
The alternative hypothesis always takes the form of a strict
inequality, which may be expressed in one of only
three ways:
greater than ( > ), less than ( < ), or not equal to ( ≠ ).
The null hypothesis, by contrast, always contains some
form of equality: less than or equal to (≤),
greater than or equal to (≥), or exactly equal to (=).

Hence, if 50 is the specified value of the population mean
$\mu$ (i.e. the parameter), then the possible null and alternative
hypotheses are:

$H_0: \mu = 50$, $H_1: \mu \neq 50$
$H_0: \mu \geq 50$, $H_1: \mu < 50$
$H_0: \mu \leq 50$, $H_1: \mu > 50$

or sometimes they can be expressed as

$H_0: \mu = 50$, $H_1: \mu \neq 50$
$H_0: \mu = 50$, $H_1: \mu < 50$
$H_0: \mu = 50$, $H_1: \mu > 50$

The decision to accept or reject the null hypothesis $H_0$ is
made on the basis of the information supplied by the
sample data. Therefore, there is always a chance of making
a wrong decision. Two types of wrong decision
can be made. One is the rejection of a true null
hypothesis and the other is the acceptance of a false null
hypothesis. The wrong decision of rejecting a null
hypothesis when it is really true is called a type-I error,
whereas the wrong decision of accepting a null
hypothesis when it is really false is called a type-II error.

These two types of error may be displayed in the
following table.

| | Accept $H_0$ | Reject $H_0$ |
|---|---|---|
| $H_0$ is true | Correct decision (no error) | Wrong decision (type-I error) |
| $H_0$ is false | Wrong decision (type-II error) | Correct decision (no error) |

The probabilities of committing type-I and type-II errors
are denoted by $\alpha$ and $\beta$, respectively.

The probability of making a type-I error is also called the
level of significance of the test and is denoted by $\alpha$.
Whenever we test a given null hypothesis, we fix the
value of $\alpha$ at the very beginning of the problem.
Generally we take $\alpha = 0.01$, $0.05$, or $0.10$ (i.e. 1%, 5%
or 10%). The level of significance guards against
rejecting the null hypothesis when it is true.

The decision to reject or not to reject the null hypothesis
is based on a statistic, called a test statistic, computed
from sample data. A test statistic is a random variable and
has an appropriate probability distribution. Some
common probability distributions used in
testing are the z, t and $\chi^2$ distributions.

The main job of a decision maker is to establish a cut-off
point that separates the entire sample space
(i.e. all possible values of the test statistic) into two
regions. One region is the acceptance region
and the other is the rejection region or critical region;
the cut-off point is called a critical value. In other
words, the critical region is the region where
we reject $H_0$.

One-Tailed and Two-Tailed Tests

Testing procedure is as under:
1. Null Hypothesis:
$H_0: \mu = \mu_0$

2. Alternative Hypothesis:
$H_1: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$

3. Level of Significance:
Choose a level of significance equal to $\alpha$ (generally we
take $\alpha = 0.01$, $0.05$, or $0.10$).

4. Test Statistic:
The test statistic (z or t) is decided according to the following
rules:
CASE-I: when $\sigma$ is known,
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
CASE-II: when $\sigma$ is unknown and $n \geq 30$, $\sigma$ is replaced by $s$ (i.e. the S.D. of the
sample),
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
CASE-III: when $\sigma$ is unknown and $n < 30$,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
with $n - 1$ degrees of freedom.

5. Critical Region:

6. Rejection Rule & Conclusion:

If the calculated value of the test statistic (z or t)
falls in the critical region, we reject $H_0$; if it falls
in the acceptance region, we do not reject $H_0$.

| Level of Significance ($\alpha$) | 0.10 | 0.05 | 0.01 |
|---|---|---|---|
| One Tail ($z_\alpha$) | ±1.28 | ±1.645 | ±2.330 |
| Two Tail ($z_{\alpha/2}$) | ±1.645 | ±1.960 | ±2.575 |
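
The tabulated critical values can be reproduced from the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper name `z_critical` is an assumption made for illustration. At full precision the last column comes out as about 2.326 and 2.576, which the table rounds to 2.330 and 2.575.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=False):
    """Upper critical value of the standard normal for a given alpha."""
    # A two-tailed test puts alpha/2 in each tail
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha:.2f}  one-tailed={one:.3f}  two-tailed={two:.3f}")
```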

Example
A random sample of 100 recorded deaths in the United
States during the past year showed an average life span
of 71.8 years. Assuming a population standard deviation
of 8.9 years, does this seem to indicate that the mean life
span today is greater than 70 years? Use a 0.05 level of
significance.

Solution:
1. $H_0: \mu = 70$ years.
2. $H_1: \mu > 70$ years.
3. $\alpha = 0.05$.
4. Critical Region: $z > z_\alpha = z_{0.05} = 1.645$

5. Computations:
$\bar{x} = 71.8$ years, $\sigma = 8.9$ years, $n = 100$, and hence
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$$

6. Decision:
Since the calculated value falls in the critical region
($z_{cal} > z_{tab}$), we reject $H_0$ and conclude that
the mean life span today is greater than 70 years.
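
The six steps of this example can be traced in a few lines of code. This is only a sketch: `one_sample_z` is a hypothetical helper name, and the critical value is computed from the standard normal distribution rather than read from the table.

```python
import math
from statistics import NormalDist

def one_sample_z(mean, mu0, sigma, n):
    """z statistic for a single mean when sigma is known (CASE-I)."""
    return (mean - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(mean=71.8, mu0=70, sigma=8.9, n=100)
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed test, about 1.645
reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```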

Example
A manufacturer of sports equipment has developed a new
synthetic fishing line that the company claims has a mean
breaking strength of 8 kilograms with a standard deviation of
0.5 kilogram. Test the hypothesis that $\mu = 8$ kilograms
against the alternative that $\mu \neq 8$ kilograms if a random
sample of 50 lines is tested and found to have a mean
breaking strength of 7.8 kilograms. Use a 0.01 level of
significance.

Solution:
1. $H_0: \mu = 8$ kilograms.
2. $H_1: \mu \neq 8$ kilograms.
3. $\alpha = 0.01$.
4. Critical Region: $z < -2.575$ or $z > 2.575$, since $z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$

5. Computations:
$\bar{x} = 7.8$ kilograms, $s = 0.5$ kilogram, $n = 50$, and hence
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$

6. Decision:
Since the calculated value falls in the critical region,
we reject $H_0$ and conclude that the average
breaking strength is not equal to 8 but is, in fact, less
than 8 kilograms.

Example
The Edison Electric Institute has published figures on the number
of kilowatt hours used annually by various home appliances. It is
claimed that a vacuum cleaner uses an average of 46 kilowatt hours
per year. If a random sample of 12 homes included in a planned
study indicates that vacuum cleaners use an average of 42 kilowatt
hours per year with a standard deviation of 11.9 kilowatt hours,
does this suggest at the 0.05 level of significance that vacuum
cleaners use, on average, less than 46 kilowatt hours annually?
Assume the population of kilowatt hours to be normal.

Solution:
1. $H_0: \mu = 46$ kilowatt hours.
2. $H_1: \mu < 46$ kilowatt hours.
3. $\alpha = 0.05$.
4. Critical Region: $t < -t_{\alpha,\,n-1} = -t_{0.05,\,11} = -1.796$

5. Computations:
$\bar{x} = 42$ kilowatt hours, $s = 11.9$ kilowatt hours, and
$n = 12$. Hence
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16$$

6. Decision:
Since the calculated value falls in the acceptance region,
we do not reject $H_0$ and conclude that the
average number of kilowatt hours used annually by
home vacuum cleaners is not significantly less than
46.
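
The choice between the z and t statistics (CASE-I to CASE-III) can be written as a small function. The sketch below is illustrative only: the name `mean_test_statistic` and its return format are assumptions, and the critical value is the tabulated one from this example.

```python
import math

def mean_test_statistic(mean, mu0, sd, n, sigma_known):
    """Return (name, value, df) following the three cases in the notes.

    sd is sigma when sigma_known is True, otherwise the sample S.D. s.
    """
    value = (mean - mu0) / (sd / math.sqrt(n))
    if sigma_known or n >= 30:
        return "z", value, None   # CASE-I and CASE-II
    return "t", value, n - 1      # CASE-III: small sample, sigma unknown

name, t, df = mean_test_statistic(42, 46, 11.9, 12, sigma_known=False)
t_crit = -1.796                   # -t(0.05, 11), read from a t table
reject = t < t_crit               # left-tailed test
print(name, round(t, 2), df, reject)
```

For the vacuum cleaner data the function selects the t statistic with 11 degrees of freedom, and the value of about -1.16 does not fall below -1.796.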

The procedure for testing the difference between two
population means may be written as:
1. Null Hypothesis:
$H_0: \mu_1 - \mu_2 = d_0$

2. Alternative Hypothesis:
$H_1: \mu_1 - \mu_2 < d_0$, $\mu_1 - \mu_2 > d_0$, or $\mu_1 - \mu_2 \neq d_0$

3. Level of Significance:
Decide on the significance level $\alpha = 0.01$, $0.05$, or $0.10$, etc.

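4. Test Statistic:
The test statistic (z or t) is decided according to the following
summarized rules:
CASE-I: when $\sigma_1$ and $\sigma_2$ are known,
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
CASE-II: when $\sigma_1 = \sigma_2$ but unknown,
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
with degrees of freedom $v = n_1 + n_2 - 2$, where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
CASE-III: when $\sigma_1 \neq \sigma_2$ and unknown,
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
with degrees of freedom
$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left[\left(s_1^2/n_1\right)^2/(n_1 - 1)\right] + \left[\left(s_2^2/n_2\right)^2/(n_2 - 1)\right]}$$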

5. Critical Region:
Critical Regions for Test Statistic z and t

| Alternative Hypothesis $H_1$ | C.R. for Test Statistic z | C.R. for Test Statistic t |
|---|---|---|
| $\mu_1 - \mu_2 > d_0$ | $z > z_\alpha$ | $t > t_\alpha$ |
| $\mu_1 - \mu_2 < d_0$ | $z < -z_\alpha$ | $t < -t_\alpha$ |
| $\mu_1 - \mu_2 \neq d_0$ | $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ | $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$ |

6. Conclusion:
If the calculated value of the test statistic (z or t) falls in
the critical region, we reject $H_0$; otherwise accept $H_0$.

Example
A farmer claims that the average yield of wheat of variety A
exceeds the average yield of variety B by at least 12 bushels per
acre. To test this claim, 50 acres of each variety are planted and
grown under similar conditions. Variety A yielded on the average,
86.7 bushels per acre with a population standard deviation of 6.28
bushels per acre, while variety B yielded, on the average 77.8
bushels per acre with a population standard deviation of 5.61
bushels per acre. Test the farmer’s claim at 𝛼 = 0.01.

Solution:
Let 𝜇1 and 𝜇2 represent the population means for the variety A and
variety B, respectively.
1. $H_0: \mu_1 - \mu_2 \geq 12$.
2. $H_1: \mu_1 - \mu_2 < 12$.
3. $\alpha = 0.01$.
4. Critical Region: $z < -z_\alpha = -z_{0.01} = -2.33$
5. Computations:
$\bar{x}_1 = 86.7$, $\sigma_1 = 6.28$, $n_1 = 50$,
$\bar{x}_2 = 77.8$, $\sigma_2 = 5.61$, $n_2 = 50$,

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(86.7 - 77.8) - 12}{\sqrt{\dfrac{6.28^2}{50} + \dfrac{5.61^2}{50}}} = \frac{-3.1}{1.191} = -2.60$$

6. Decision:
Since the calculated value falls in the critical region,
we reject $H_0$. In other words, the farmer’s
claim cannot be accepted.
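
The computation for this example can be reproduced as follows. This is a sketch under the assumptions of CASE-I (both population standard deviations known); the function name `two_sample_z` is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, d0=0.0):
    """z statistic for mu1 - mu2 = d0 with known population S.D.s."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - d0) / standard_error

z = two_sample_z(86.7, 77.8, 6.28, 5.61, 50, 50, d0=12)
z_crit = NormalDist().inv_cdf(0.01)  # left-tailed critical value, about -2.33
reject = z < z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

The statistic comes out at roughly -2.60, which lies below the critical value, so the decision to reject is unchanged.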

Example
An experiment was performed to compare the abrasive
wear of two different laminated materials. Twelve pieces
of material 1 were tested by exposing each piece to a
machine measuring wear. Ten pieces of material 2 were
similarly tested. In each case, the depth of wear was
observed. The samples of material 1 gave an average
(coded) wear of 85 units with a sample standard
deviation of 4, while the samples of material 2 gave an
average of 81 with a sample standard deviation of 5. Can
we conclude at the 0.05 level of significance that the
abrasive wear of material 1 exceeds that of material 2 by
more than 2 units? Assume the populations to be
approximately normal with equal variances.

Solution:
Let $\mu_1$ and $\mu_2$ represent the population means of the
abrasive wear for material 1 and material 2, respectively.

1. $H_0: \mu_1 - \mu_2 = 2$.
2. $H_1: \mu_1 - \mu_2 > 2$.
3. $\alpha = 0.05$.
4. Critical Region:
$t > t_{\alpha,\,n_1+n_2-2} = t_{0.05,\,20} = 1.725$

5. Computations:
$\bar{x}_1 = 85$, $s_1 = 4$, $n_1 = 12$,
$\bar{x}_2 = 81$, $s_2 = 5$, $n_2 = 10$,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
$$s_p = \sqrt{\frac{(12 - 1)(16) + (10 - 1)(25)}{12 + 10 - 2}} = 4.478$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(85 - 81) - 2}{4.478\sqrt{\dfrac{1}{12} + \dfrac{1}{10}}} = 1.04$$

6. Decision:
Since the calculated value falls in the acceptance region,
we do not reject $H_0$. We are unable to conclude
that the abrasive wear of material 1 exceeds that of
material 2 by more than 2 units.
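
The pooled-variance calculation (CASE-II) is easy to get wrong by hand, so a short check may help. The sketch below assumes equal population variances, as the example does; `pooled_t` is a hypothetical helper name and the critical value is taken from the t table.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Pooled two-sample t statistic; returns (t, s_p, df)."""
    df = n1 + n2 - 2
    # Pooled standard deviation from the two sample variances
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = ((mean1 - mean2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp, df

t, sp, df = pooled_t(85, 4, 12, 81, 5, 10, d0=2)
t_crit = 1.725          # t(0.05, 20), read from a t table
reject = t > t_crit     # right-tailed test
print(f"s_p = {sp:.3f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```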
Example
A manufacturing company is interested in determining
whether there is a significant difference between the average
number of units produced per day by two different machine
operators. A random sample of ten daily outputs was selected
for each operator from the outputs over the past years. The
data on number of items produced per day are summarized in
the table.
| | Operator 1 | Operator 2 |
|---|---|---|
| Sample size | $n_1 = 10$ | $n_2 = 10$ |
| Sample mean | $\bar{x}_1 = 35$ | $\bar{x}_2 = 31$ |
| Sample variance | $s_1^2 = 17.2$ | $s_2^2 = 19.1$ |

Do the samples provide sufficient evidence at $\alpha = 0.01$ to
conclude that a difference exists between the mean daily
outputs of the machine operators? Assume the populations to
be approximately normal with equal variances.

Solution:
Let $\mu_1$ and $\mu_2$ represent the population means of
operator 1 and operator 2, respectively.

1. $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$.
2. $H_1: \mu_1 \neq \mu_2$ or $\mu_1 - \mu_2 \neq 0$.
3. $\alpha = 0.01$.
4. Critical Region:
$t < -2.878$ or $t > 2.878$, since $t_{\alpha/2,\,n_1+n_2-2} = t_{0.005,\,18} = 2.878$

5. Computations:
$\bar{x}_1 = 35$, $s_1^2 = 17.2$, $n_1 = 10$,
$\bar{x}_2 = 31$, $s_2^2 = 19.1$, $n_2 = 10$,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
$$s_p = \sqrt{\frac{(10 - 1)(17.2) + (10 - 1)(19.1)}{10 + 10 - 2}} = 4.26$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(35 - 31) - 0}{4.26\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 2.10$$

6. Decision:
Since the calculated value falls in the acceptance region,
we do not reject $H_0$ and conclude that the samples do
not provide sufficient evidence at $\alpha = 0.01$ that a difference
exists between the mean daily outputs of the machine
operators.
Presenter: Ms. Sidra Raees

LECTURE 13
Department of Mathematics, NED University of Engineering & Technology, Karachi

Throughout this chapter, we have been concerned with
the testing of statistical hypotheses about single
population parameters such as $\mu$ and $\sigma^2$. Now we shall
consider a test to determine whether a population has a
specified theoretical distribution. The test is based on
how good a fit we have between the frequencies of
occurrence of observations in an observed sample and the
expected frequencies obtained from the hypothesized
distribution.

A goodness of fit test is used to determine whether or not a
given set of data follows a specified probability
distribution. For this purpose we use the following test
statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$

where $o_i$ and $e_i$ represent the observed and expected
frequencies, respectively, of a set of k categories (classes
or cells).

If the observed frequencies are close to the
corresponding expected frequencies, the $\chi^2$ value will be
small, indicating a good fit. If the observed frequencies
differ considerably from the expected frequencies, the $\chi^2$
value will be large and the fit is poor. A good fit leads to
the acceptance of $H_0$, whereas a poor fit leads to its
rejection. The critical region will, therefore, fall in the
right tail of the chi-square distribution.
For a level of significance $\alpha$ with $v$ degrees of freedom,
we find the critical value $\chi^2_\alpha$ from the table.

The number of degrees of freedom in the chi-square
goodness of fit test is equal to the number of categories
minus the number of quantities obtained from the
observed data that are used in the calculation of the
expected frequencies.
As the degrees of freedom increase, the shape of the
$\chi^2$ distribution becomes more symmetrical.

The testing procedure involves the following steps.
1. $H_0$: The fit is good (the sample data come from the specified
distribution).
2. $H_1$: The fit is not good (the sample data do not come from the
specified distribution).
3. Choose a level of significance equal to $\alpha$.
4. Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$
with $d.f. = v = k - 1$, where k is the number of cells
(categories or classes).

5. Critical Region:
The test is always right-tailed. At level of significance $\alpha$
with $v$ degrees of freedom, the critical region is
$$\chi^2 > \chi^2_\alpha$$

6. Conclusion:
Reject $H_0$ if the calculated value of $\chi^2$ falls in the
critical region; otherwise accept $H_0$.
(OR)
Reject $H_0$ if the calculated value of $\chi^2$ is greater than
the tabulated value of $\chi^2$ (i.e. if $\chi^2_{cal} > \chi^2_{tab}$); otherwise
accept $H_0$.

Example
A die is tossed 180 times with the following results
Dots on die (𝒙) 1 2 3 4 5 6
Frequency (o) 28 36 36 30 27 23

Is this a fair die? Use a 0.01 level of significance.

Solution:
It is important to note that hypothesizing that the
die is honest (balanced or fair) is equivalent to
testing the hypothesis that the distribution of outcomes is
uniform.

The testing procedure is given below:

1. $H_0$: The die is fair (or the distribution is uniform)
(OR)
$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$
2. $H_1$: The die is not fair (or the distribution is not uniform)
3. $\alpha = 0.01$
4. Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Computations are as under:

| $x$ | Prob. | $o$ | $e = 180 \times$ prob. | $(o - e)^2/e$ |
|---|---|---|---|---|
| 1 | 1/6 | 28 | 30 | 0.13 |
| 2 | 1/6 | 36 | 30 | 1.20 |
| 3 | 1/6 | 36 | 30 | 1.20 |
| 4 | 1/6 | 30 | 30 | 0.00 |
| 5 | 1/6 | 27 | 30 | 0.30 |
| 6 | 1/6 | 23 | 30 | 1.63 |
| Total | 1 | 180 | 180 | 4.46 |

Then $\chi^2 = 4.46$

5. Critical Region:
The critical region at $\alpha = 0.01$ with
$d.f. = v = 6 - 1 = 5$ is $\chi^2 > \chi^2_{0.01}(5) = 15.09$

6. Conclusion:
Since the calculated value of $\chi^2$ falls in the acceptance region,
we accept $H_0$ and conclude that the die is
fair.
(OR)
Since $\chi^2_{cal} < \chi^2_{tab}$, we accept $H_0$.
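
The goodness of fit computation for the die can be sketched in code as follows. The function name `chi_square_gof` is an assumption for illustration, and the critical value 15.09 is the tabulated one used above. Without intermediate rounding the statistic is about 4.47; the 4.46 in the table comes from summing terms already rounded to two decimals.

```python
def chi_square_gof(observed, probabilities):
    """Chi-square goodness of fit statistic; returns (stat, expected, df)."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # No parameters are estimated from the data here, so df = k - 1
    return stat, expected, len(observed) - 1

observed = [28, 36, 36, 30, 27, 23]          # frequencies for faces 1..6
stat, expected, df = chi_square_gof(observed, [1 / 6] * 6)
chi2_crit = 15.09                            # chi-square(0.01, 5) from the table
reject = stat > chi2_crit
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```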

Example
An admission committee has submitted a report to the
principal of the college, claiming that among the
freshmen, 20% of the students have shown a preference for
Pre-Medical, 40% for Pre-Engineering, 25% for
Commerce and the rest of the freshmen for Arts. Of the 620
freshmen for this year, 98 students declared Pre-Medical,
300 students went for Pre-Engineering, 182 students
preferred Commerce and the rest of the students declared
for Arts. Test at $\alpha = 0.05$ whether these data confirm the claims
of the admission committee.

The testing procedure is given below:

1. $H_0$: P(Pre-Medical) = 0.20, P(Pre-Engineering) = 0.40,
P(Commerce) = 0.25, P(Arts) = 0.15
2. $H_1$: The probabilities are different
3. $\alpha = 0.05$
4. Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Computations are as under:

| Category | $o$ | Prob. | $e = 620 \times$ prob. | $(o - e)^2/e$ |
|---|---|---|---|---|
| Pre-Medical | 98 | 0.20 | 124 | 5.45 |
| Pre-Engineering | 300 | 0.40 | 248 | 10.90 |
| Commerce | 182 | 0.25 | 155 | 4.70 |
| Arts | 40 | 0.15 | 93 | 30.20 |
| Total | N = 620 | 1 | 620 | 51.25 |

Then $\chi^2 = 51.25$

5. Critical Region:
The critical region at $\alpha = 0.05$ for a right-tailed test
with $d.f. = v = 4 - 1 = 3$ is $\chi^2 > \chi^2_{0.05}(3) = 7.815$

6. Conclusion:
Since the calculated value of $\chi^2$ falls in the critical region,
we reject $H_0$.
(OR)
Since $\chi^2_{cal} > \chi^2_{tab}$, we reject $H_0$.

Example
In a survey of 400 infants chosen at random, it is found that
185 are girls. According to this survey, are boy and girl births
equally likely? Use $\alpha = 0.05$.

Solution:
The testing procedure is given below:
1. $H_0$: The proportions of girls and boys are the same
[i.e. P(B) = P(G) = 1/2].
2. $H_1$: The proportions of girls and boys are not the same.
3. $\alpha = 0.05$
4. Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Computations are as under:

| Category | $o$ | Prob. | $e = 400 \times$ prob. | $(o - e)^2/e$ |
|---|---|---|---|---|
| Girls | 185 | 1/2 | 200 | 1.125 |
| Boys | 215 | 1/2 | 200 | 1.125 |
| Total | N = 400 | 1 | 400 | 2.25 |

Then $\chi^2 = 2.25$

5. Critical Region:
The critical region at $\alpha = 0.05$ for a right-tailed test
with $d.f. = v = 2 - 1 = 1$ is $\chi^2 > \chi^2_{0.05}(1) = 3.84$

6. Conclusion:
Since the calculated value of $\chi^2$ falls in the acceptance region,
we accept $H_0$.
(OR)
Since $\chi^2_{cal} < \chi^2_{tab}$, we accept $H_0$.

A contingency table is defined as a two way table in
which frequencies of various categories of two attributes
(or factors) are classified in rows and columns.
For example, a sample of employed persons may be
classified according to educational attainment and type of
occupation; College students may be classified according
to class status and smoking habits, etc.

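A table with r rows and c columns is called an $r \times c$
contingency table. The general $r \times c$ contingency table for the two attributes
A and B is shown below:

| A \ B | $B_1$ | $B_2$ | ... | $B_j$ | ... | $B_c$ | Total |
|---|---|---|---|---|---|---|---|
| $A_1$ | $O_{11}$ | $O_{12}$ | ... | $O_{1j}$ | ... | $O_{1c}$ | $R_1$ |
| $A_2$ | $O_{21}$ | $O_{22}$ | ... | $O_{2j}$ | ... | $O_{2c}$ | $R_2$ |
| ... | ... | ... | | ... | | ... | ... |
| $A_i$ | $O_{i1}$ | $O_{i2}$ | ... | $O_{ij}$ | ... | $O_{ic}$ | $R_i$ |
| ... | ... | ... | | ... | | ... | ... |
| $A_r$ | $O_{r1}$ | $O_{r2}$ | ... | $O_{rj}$ | ... | $O_{rc}$ | $R_r$ |
| Total | $C_1$ | $C_2$ | ... | $C_j$ | ... | $C_c$ | $G$ |

where $O_{ij}$ = observed frequency of the ith row and jth column,
$R_i$ = total of the ith row, $C_j$ = total of the jth column,
$G$ = grand total of all observed frequencies.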
A Contingency table is usually constructed for the
purpose of studying the relationship between two
attributes. It indicates whether two characteristics are
independent or dependent on one another.

Note:
It is important to note that the “natural” application of the
contingency table analysis is for cases in which each
observation is measured by two qualitative variables.
However quantitative variables may also be used to
classify the observations into rows and columns or both.

The procedure for testing independence in a
contingency table is as follows:

1. $H_0$: The two attributes (or factors) are independent
OR
There is no association between the two factors
2. $H_1$: The two attributes (or factors) are dependent
OR
There is an association between the two factors
3. Choose a level of significance equal to $\alpha$.

4. Test Statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$
where $o_{ij}$ represents the observed frequency of the ith row and jth
column and $e_{ij}$ represents the expected frequency of the ith row
and jth column.
The number of degrees of freedom is
$$d.f. = (r - 1)(c - 1),$$
where r and c are the numbers of rows and columns in the
contingency table, respectively.
The formula for computing the expected frequency for each
cell (i.e. for the ith row and jth column) is
$$e_{ij} = \frac{R_i C_j}{G}$$
5. Critical Region:
The test is always right-tailed. At level of significance $\alpha$
with degrees of freedom $v = (r - 1)(c - 1)$, the critical region is
$$\chi^2 > \chi^2_\alpha$$

6. Conclusion:
Reject $H_0$ if the calculated value of $\chi^2$ falls in the
critical region; otherwise accept $H_0$.
(OR)
Reject $H_0$ if the calculated value of $\chi^2$ is greater than
the tabulated value of $\chi^2$ (i.e. if $\chi^2_{cal} > \chi^2_{tab}$); otherwise
accept $H_0$.

Example
1600 families were selected at random in a city to test the
belief that high income families usually send their
children to private schools and low income families often
send their children to Government schools. The
following results were obtained:

| Income \ School | Private | Govt. | Total |
|---|---|---|---|
| Low | 494 | 506 | 1000 |
| High | 162 | 438 | 600 |
| Total | 656 | 944 | 1600 |

Test whether income and type of school are independent
at $\alpha = 0.05$.

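Solution:
1. $H_0$: Income and type of school are independent
2. $H_1$: Income and type of school are not independent
3. $\alpha = 0.05$.
4. Test Statistic:
$$\chi^2 = \sum \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \quad \text{where } e_{ij} = \frac{R_i C_j}{G}$$
Since $O_{11} = 494$, $O_{12} = 506$, $O_{21} = 162$, $O_{22} = 438$,
then
$$e_{11} = \frac{R_1 C_1}{G} = \frac{(1000)(656)}{1600} = 410, \quad e_{12} = \frac{R_1 C_2}{G} = \frac{(1000)(944)}{1600} = 590$$
$$e_{21} = \frac{R_2 C_1}{G} = \frac{(600)(656)}{1600} = 246, \quad e_{22} = \frac{R_2 C_2}{G} = \frac{(600)(944)}{1600} = 354$$

Computations are as under:

| $o$ | $e$ | $(o - e)^2/e$ |
|---|---|---|
| $O_{11} = 494$ | $e_{11} = 410$ | 17.21 |
| $O_{12} = 506$ | $e_{12} = 590$ | 11.96 |
| $O_{21} = 162$ | $e_{21} = 246$ | 28.68 |
| $O_{22} = 438$ | $e_{22} = 354$ | 19.93 |
| Total 1600 | 1600 | 77.78 |

Then $\chi^2 = 77.78$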

5. Critical Region:
The critical region at $\alpha = 0.05$ for a right-tailed test
with $d.f. = v = (2 - 1)(2 - 1) = 1$ is
$\chi^2 > \chi^2_{0.05}(1) = 3.84$

6. Conclusion:
Since the calculated value of $\chi^2$ falls in the critical region,
we reject $H_0$ and conclude that income and
type of school are dependent.
(OR)
Since $\chi^2_{cal} > \chi^2_{tab}$, we reject $H_0$.

Example
The following table shows the relation between the
number of accidents in 1 year and the age of the driver in
a random sample of 500 drivers between 18 and 50. Test
at $\alpha = 0.01$ the hypothesis that the number of accidents
is independent of the driver’s age.

| No. of Accidents \ Age of Driver | 18 - 25 | 26 - 40 | Over 40 | Total |
|---|---|---|---|---|
| 0 | 75 | 115 | 110 | 300 |
| 1 | 50 | 65 | 35 | 150 |
| 2 | 25 | 20 | 5 | 50 |
| Total | 150 | 200 | 150 | 500 |

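Solution:
1. $H_0$: There is no association between age of driver and
number of accidents
2. $H_1$: There is an association
3. $\alpha = 0.01$.
4. Test Statistic:
$$\chi^2 = \sum \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \quad \text{where } e_{ij} = \frac{R_i C_j}{G}$$
then
$$e_{11} = \frac{R_1 C_1}{G} = \frac{(300)(150)}{500} = 90, \quad e_{12} = \frac{R_1 C_2}{G} = \frac{(300)(200)}{500} = 120, \quad e_{13} = \frac{R_1 C_3}{G} = \frac{(300)(150)}{500} = 90$$
$$e_{21} = \frac{R_2 C_1}{G} = \frac{(150)(150)}{500} = 45, \quad e_{22} = \frac{R_2 C_2}{G} = \frac{(150)(200)}{500} = 60, \quad e_{23} = \frac{R_2 C_3}{G} = \frac{(150)(150)}{500} = 45$$
$$e_{31} = \frac{R_3 C_1}{G} = \frac{(50)(150)}{500} = 15, \quad e_{32} = \frac{R_3 C_2}{G} = \frac{(50)(200)}{500} = 20, \quad e_{33} = \frac{R_3 C_3}{G} = \frac{(50)(150)}{500} = 15$$
Computations are as under:

| $o$ | $e$ | $(o - e)^2/e$ |
|---|---|---|
| $O_{11} = 75$ | $e_{11} = 90$ | 2.5 |
| $O_{12} = 115$ | $e_{12} = 120$ | 0.2 |
| $O_{13} = 110$ | $e_{13} = 90$ | 4.4 |
| $O_{21} = 50$ | $e_{21} = 45$ | 0.6 |
| $O_{22} = 65$ | $e_{22} = 60$ | 0.4 |
| $O_{23} = 35$ | $e_{23} = 45$ | 2.2 |
| $O_{31} = 25$ | $e_{31} = 15$ | 6.7 |
| $O_{32} = 20$ | $e_{32} = 20$ | 0 |
| $O_{33} = 5$ | $e_{33} = 15$ | 6.7 |
| Total 500 | 500 | 23.7 |

Then $\chi^2 = 23.7$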
5. Critical Region:
The critical region at $\alpha = 0.01$ for a right-tailed test
with $d.f. = v = (3 - 1)(3 - 1) = 4$ is
$\chi^2 > \chi^2_{0.01}(4) = 13.28$

6. Conclusion:
Since the calculated value of $\chi^2$ falls in the critical region,
we reject $H_0$ and conclude that there is a
relationship between the number of accidents and the age of the
drivers.
(OR)
Since $\chi^2_{cal} > \chi^2_{tab}$, we reject $H_0$.
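
Both contingency table examples follow the same recipe: compute each expected frequency as (row total)(column total)/(grand total) and sum the contributions. A minimal sketch, with the illustrative name `chi_square_independence`, is given below. For the accident data it gives about 23.68 without intermediate rounding, against 23.7 from the rounded terms.

```python
def chi_square_independence(table):
    """Return (statistic, expected, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # e_ij = R_i * C_j / G
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, expected, df

schools = [[494, 506], [162, 438]]                       # income by school type
accidents = [[75, 115, 110], [50, 65, 35], [25, 20, 5]]  # accidents by age group

stat_s, exp_s, df_s = chi_square_independence(schools)
stat_a, exp_a, df_a = chi_square_independence(accidents)
print(f"schools:   chi-square = {stat_s:.2f}, df = {df_s}")
print(f"accidents: chi-square = {stat_a:.2f}, df = {df_a}")
```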
• Curve fitting is the process of constructing a curve, or
mathematical function, that lies as close as possible to a
series of data points. By curve fitting we can mathematically
construct the functional relationship between the observed values
and the parameter values. It is highly effective in the
mathematical modelling of natural processes.
• It is a statistical technique used to derive coefficient values for
equations that express the value of one (dependent) variable as
a function of another (independent) variable.

The main purpose of curve fitting is to theoretically
describe experimental data with a model (function or
equation) and to find the parameters associated with this
model.

The method used here for curve fitting is the
Method of Least Squares.

1. Equation of a straight line
$$y = a + bx$$

2. Equation of a parabola (second order polynomial or
quadratic curve)
$$y = a + bx + cx^2$$

3. Exponential curve
$$y = ab^x \quad \text{or} \quad y = ae^{bx}$$

The line to be fitted to the data is
$y = a + bx$, where a and b need
to be calculated.

The normal equations for determining
a and b are:

$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$

The equation of the required straight line
will be
$$\hat{y}_i = a + bx_i$$

Then the error between the observed value $y_i$ and
the fitted value $\hat{y}_i$ is given by

$$e_i = y_i - \hat{y}_i$$

The best fit is the line for which $\sum e_i^2$ is a minimum.

Example
Fit a straight line by the method of least squares to the
following data:

| $x$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $y$ | 3 | 4 | 6 | 9 | 10 |

Solution:
Let the equation of the straight line to be fitted to the
data be $y = a + bx$, where a and b are to be evaluated.
The normal equations for determining a and b are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$

| $x$ | $y$ | $xy$ | $x^2$ |
|---|---|---|---|
| 1 | 3 | 3 | 1 |
| 2 | 4 | 8 | 4 |
| 3 | 6 | 18 | 9 |
| 4 | 9 | 36 | 16 |
| 5 | 10 | 50 | 25 |
| $\sum x = 15$ | $\sum y = 32$ | $\sum xy = 115$ | $\sum x^2 = 55$ |

Now the normal equations become
$$5a + 15b = 32$$
$$15a + 55b = 115$$
Solving these equations,
$$a = 0.7, \quad b = 1.9$$
Hence, the equation of the required straight line is
$$y = 0.7 + 1.9x$$
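
The two normal equations can be solved in closed form, which makes the fit easy to verify. The sketch below is illustrative (the name `fit_line` is an assumption); it fits y = a + bx, with a the intercept and b the slope.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x via the normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Eliminating a from the two normal equations gives b, then a follows
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [3, 4, 6, 9, 10])
print(f"y = {a:.1f} + {b:.1f}x")
```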
The simplest type of non-linear approximating curve is the
second degree parabola, which has the equation
$$y = a + bx + cx^2,$$

where the values of a, b and c are determined by the normal equations

$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

Error:
$$e_i = y_i - \hat{y}_i$$

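Example
Fit a second degree parabola to the following data:

| $x$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $y$ | 1 | 1.8 | 1.3 | 2.5 | 6.3 |

Solution:
Let the equation of the second degree parabola be
$$y = a + bx + cx^2$$
The normal equations are
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

| $x$ | $y$ | $xy$ | $x^2$ | $x^2 y$ | $x^3$ | $x^4$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1.8 | 1.8 | 1 | 1.8 | 1 | 1 |
| 2 | 1.3 | 2.6 | 4 | 5.2 | 8 | 16 |
| 3 | 2.5 | 7.5 | 9 | 22.5 | 27 | 81 |
| 4 | 6.3 | 25.2 | 16 | 100.8 | 64 | 256 |
| $\sum x = 10$ | $\sum y = 12.9$ | $\sum xy = 37.1$ | $\sum x^2 = 30$ | $\sum x^2 y = 130.3$ | $\sum x^3 = 100$ | $\sum x^4 = 354$ |

Now the normal equations become
$$5a + 10b + 30c = 12.9$$
$$10a + 30b + 100c = 37.1$$
$$30a + 100b + 354c = 130.3$$
Solving simultaneously, we get
$$a = 1.42, \quad b = -1.07, \quad c = 0.55$$
Hence, the required second degree parabola is
$$y = 1.42 - 1.07x + 0.55x^2$$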
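
The three normal equations form a small linear system, so the fit can be checked with a few lines of Gaussian elimination. This is a sketch only: `solve_linear` and `fit_parabola` are hypothetical helper names, written with the standard library so that no external package is assumed.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to keep the division stable
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_parabola(xs, ys):
    """Least squares fit of y = a + b*x + c*x^2 via the normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]            # s[k] = sum of x^k
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve_linear(matrix, t)

a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
print(f"y = {a:.2f} + ({b:.2f})x + {c:.2f}x^2")
```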
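Example
Find the equation of the curve $y = ab^x$ that best fits the
following data, and find $y(10)$.

| $x$ | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
| $y$ | 11 | 12 | 14 | 18 | 19 | 21 | 23 |

Solution:
The given relation is
$$y = ab^x$$
Taking ln on both sides,
$$\ln y = \ln a + x\ln b$$

Taking
$$Y = \ln y, \quad A = \ln a, \quad B = \ln b,$$
this becomes
$$Y = A + Bx,$$

which is now in linear form, with normal equations

$$\sum Y = nA + B\sum x$$
$$\sum xY = A\sum x + B\sum x^2$$

| $x$ | $y$ | $Y = \ln y$ | $xY$ | $x^2$ |
|---|---|---|---|---|
| 3 | 11 | 2.3978 | 7.1934 | 9 |
| 4 | 12 | 2.4849 | 9.9396 | 16 |
| 5 | 14 | 2.6390 | 13.195 | 25 |
| 6 | 18 | 2.8903 | 17.3418 | 36 |
| 7 | 19 | 2.9444 | 20.6108 | 49 |
| 8 | 21 | 3.0445 | 24.356 | 64 |
| 9 | 23 | 3.1354 | 28.2186 | 81 |
| $\sum x = 42$ | $\sum y = 118$ | $\sum Y = 19.5359$ | $\sum xY = 120.8552$ | $\sum x^2 = 280$ |

Substituting the totals, the normal equations become
$$7A + 42B = 19.5359$$
$$42A + 280B = 120.8552$$
Solving these equations gives $A = 2.01088$ and $B = 0.12999$.

Now, since $A = \ln a$ and $B = \ln b$,
$$a = e^A = e^{2.01088} = 7.4698, \quad b = e^B = e^{0.12999} = 1.1388$$

Hence the required equation of the curve is

$$y = 7.4698(1.1388)^x$$

$$y(10) = 7.4698(1.1388)^{10} \approx 27.40$$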
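
The log-linearisation used here can be sketched in code: fit a straight line to (x, ln y), then transform the intercept and slope back. The function name `fit_exponential` is an assumption for illustration. Because the code keeps full precision in the logarithms, the coefficients differ slightly in the third decimal place from the hand calculation with four-decimal table values.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on Y = ln y = A + B*x."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    sum_x, sum_Y = sum(xs), sum(logs)
    sum_xY = sum(x * Y for x, Y in zip(xs, logs))
    sum_xx = sum(x * x for x in xs)
    B = (n * sum_xY - sum_x * sum_Y) / (n * sum_xx - sum_x**2)
    A = (sum_Y - B * sum_x) / n
    return math.exp(A), math.exp(B)   # a = e^A, b = e^B

a, b = fit_exponential([3, 4, 5, 6, 7, 8, 9], [11, 12, 14, 18, 19, 21, 23])
y10 = a * b**10
print(f"a = {a:.3f}, b = {b:.4f}, y(10) = {y10:.2f}")
```

Note that this minimises the squared errors in ln y rather than in y itself, which is the approach the worked solution takes.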

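Example
Determine the constants a and b by the method of least
squares such that $y = ae^{bx}$ fits the following data:

| $x$ | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| $y$ | 4.077 | 11.084 | 30.128 | 81.897 | 222.62 |

Solution:
The given relation is
$$y = ae^{bx}$$
Taking ln on both sides,
$$\ln y = \ln a + bx$$

Taking
$$Y = \ln y, \quad A = \ln a,$$
this becomes
$$Y = A + bx,$$

which is now in linear form, with normal equations

$$\sum Y = nA + b\sum x$$
$$\sum xY = A\sum x + b\sum x^2$$

| $x$ | $y$ | $Y = \ln y$ | $xY$ | $x^2$ |
|---|---|---|---|---|
| 2 | 4.077 | 1.4054 | 2.8108 | 4 |
| 4 | 11.084 | 2.4055 | 9.6220 | 16 |
| 6 | 30.128 | 3.4054 | 20.4324 | 36 |
| 8 | 81.897 | 4.4054 | 35.2432 | 64 |
| 10 | 222.62 | 5.4054 | 54.0540 | 100 |
| $\sum x = 30$ | $\sum y = 349.806$ | $\sum Y = 17.0271$ | $\sum xY = 122.1624$ | $\sum x^2 = 220$ |

Substituting the totals, the normal equations become
$$5A + 30b = 17.0271$$
$$30A + 220b = 122.1624$$
Solving these equations gives $A = 0.4054$ and $b = 0.50$.

Now, since $A = \ln a$,
$$a = e^A = e^{0.4054} \approx 1.50$$

Hence the required equation of the curve is

$$y = 1.50e^{0.50x}$$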
