ABE 463: ENGINEERING STATISTICS
Compiled by Computer Engineering Students, Class of 2019
Subjected to necessary corrections if and when correctly registered
All credits are to the original owner(s) of the document content
REFERENCE MATERIALS
1. Spiegel, M.R. and Stephens, L.J. (2008). Statistics, Schaum's Outlines, 4th Edition. McGraw-Hill, New York.
2. Anderson, D.R., Sweeney, D.J. and Williams, T.A. (1986). Concepts and Applications. West Publishing Company.
3. Hines, W.W., Montgomery, D.C., Goldsman, D.M. and Borror, C.M. (2003). Probability and Statistics in Engineering, 4th Edition, pp. 1–672. John Wiley, New York.
4. Montgomery, D.C. and Runger, G.C. (2010). Applied Statistics and Probability for Engineers, Student Solutions, 5th Edition (paperback), pp. 1–400.
COURSE OUTLINE
1. Moments, skewness and kurtosis.
2. Chi-square test.
3. Curve fitting and the method of least squares.
4. Small sampling theory, tests of hypotheses and significance.
5. Correlation theory.
6. Analysis of variance.
7. Probability; binomial, Poisson, hypergeometric and beta distributions.
8. Probability density function, cumulative distribution function, etc.
9. Introduction to spectral analysis: mean value, mean square value, autocorrelation function and spectral density of random signals.
10. Introduction to statistical software packages useful in engineering.
11. Analysis of time series.
1.0 INTRODUCTION
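1.1 MOMENTS
If $X_1, X_2, \ldots, X_N$ are the N values assumed by the variable X, the quantity
$\overline{X^r} = \dfrac{X_1^r + X_2^r + \cdots + X_N^r}{N} = \dfrac{\sum_{j=1}^{N} X_j^r}{N} = \dfrac{\sum X^r}{N}$ ---------------- (1)
is called the rth moment. Similarly, the rth moment about the mean $\bar{X}$ is $m_r = \dfrac{\sum_{j=1}^{N} (X_j - \bar{X})^r}{N}$.
To avoid particular units, we can define dimensionless moments about the mean as:
$a_r = \dfrac{m_r}{s^r} = \dfrac{m_r}{(\sqrt{m_2})^r} = \dfrac{m_r}{\sqrt{m_2^r}}$ ------------ (7)
where $s = \sqrt{m_2}$ is the standard deviation.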
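The definitions of $m_r$ and eqn (7) translate directly into a few lines of code. The following is a minimal Python sketch. Python itself and the helper names `central_moment` and `dimensionless_moment` are illustrative assumptions, not part of the notes. It takes $s = \sqrt{m_2}$ as in eqn (7) and is applied to the set 2, 3, 7, 8, 10 used in the worked examples.

```python
def central_moment(xs, r):
    """rth moment about the mean: m_r = sum((X - mean)^r) / N."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


def dimensionless_moment(xs, r):
    """a_r = m_r / s^r with s = sqrt(m_2), as in eqn (7)."""
    s = central_moment(xs, 2) ** 0.5
    return central_moment(xs, r) / s ** r


data = [2, 3, 7, 8, 10]  # the set used in the worked examples
for r in (2, 3, 4):
    print(r, central_moment(data, r), dimensionless_moment(data, r))
```

For this set the sketch gives $m_2 = 9.2$, $m_3 = -3.6$ and $m_4 = 122$, which match example (1.1).2. It also gives $a_3 \approx -0.129$ and $a_4 \approx 1.441$.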
1.2 SKEWNESS
Skewness is the degree of asymmetry of a distribution.
If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or to have positive skewness. If the reverse is true, it is said to be skewed to the left, or to have negative skewness.
[Figure: negatively skewed and positively skewed frequency curves]
For skewed distributions, the mean tends to lie on the same side of the mode as the longer tail.
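DEFINITION
$\text{Skewness} = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}} = \dfrac{\bar{X} - \text{mode}}{s}$ ---------- (8)
To avoid using the mode, we can use the empirical relation mean − mode ≈ 3(mean − median), which holds approximately for moderately skewed distributions:
$\text{Skewness} = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \dfrac{3(\bar{X} - \text{median})}{s}$ ---------- (9)
Eqns (8) and (9) are called Pearson's first and second coefficients of skewness, respectively.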
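Pearson's two coefficients can be computed with Python's standard `statistics` module (Python 3.8 or later is assumed). The following is a minimal sketch. The data set is hypothetical and was chosen to have a single mode. Following eqn (7), s is taken as the population standard deviation `pstdev`, which uses divisor N.

```python
import statistics

def pearson_skewness(xs):
    """Return Pearson's first (8) and second (9) coefficients of skewness.

    s is the population standard deviation (divisor N), i.e. sqrt(m_2).
    statistics.mode returns the most common value (the first one met if tied).
    """
    mean = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    first = (mean - statistics.mode(xs)) / s
    second = 3 * (mean - statistics.median(xs)) / s
    return first, second

sample = [2, 3, 3, 5, 9]  # hypothetical data: mode 3, median 3, mean 4.4
first, second = pearson_skewness(sample)
print(round(first, 4), round(second, 4))
```

Both coefficients come out positive, which agrees with the long right tail created by the value 9. In this data set the mode and the median coincide, so eqn (9) gives exactly three times eqn (8).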
An important measure of skewness uses the third moment about the mean expressed in dimensionless form as
$\text{Moment coefficient of skewness} = a_3 = \dfrac{m_3}{s^3} = \dfrac{m_3}{(\sqrt{m_2})^3} = \dfrac{m_3}{\sqrt{m_2^3}}$ -------- (12)
1.3 KURTOSIS
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution. A distribution having a relatively high peak is called LEPTOKURTIC, one which is flat-topped is called PLATYKURTIC, and the normal distribution, which is neither leptokurtic nor platykurtic, is called MESOKURTIC.
One measure of kurtosis uses the fourth moment about the mean expressed in dimensionless form and is given by:
$\text{Moment coefficient of kurtosis} = a_4 = \dfrac{m_4}{s^4} = \dfrac{m_4}{m_2^2}$ --------- (13)
which is often denoted by $b_2$. For the normal distribution, $b_2 = a_4 = 3$. For this reason, kurtosis is sometimes defined by $(b_2 - 3)$, which is positive for a leptokurtic distribution, negative for a platykurtic distribution and zero for the normal distribution.
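The following minimal sketch computes $b_2$ from eqn (13) and classifies a distribution by the sign of $(b_2 - 3)$. The function names are hypothetical. The small tolerance `tol`, below which the result is treated as mesokurtic, is an assumption added only to absorb floating-point noise.

```python
def moment_kurtosis(xs):
    """b_2 = a_4 = m_4 / m_2**2 (eqn 13), moments taken about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def classify(b2, tol=1e-9):
    """Classify using the excess kurtosis (b_2 - 3); |excess| <= tol counts as zero."""
    excess = b2 - 3
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"

b2 = moment_kurtosis([2, 3, 7, 8, 10])
print(round(b2, 4), classify(b2))
```

For the set 2, 3, 7, 8, 10, $b_2 = 122/9.2^2 \approx 1.44$, which is well below 3. The set is therefore platykurtic, as expected for a small, fairly flat set of values.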
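Another measure of kurtosis is based on both quartiles and percentiles and is given by:
$\kappa = \dfrac{Q}{P_{90} - P_{10}}$ ------- (14)
where κ (the lowercase Greek letter kappa) is the percentile coefficient of kurtosis, $Q = \tfrac{1}{2}(Q_3 - Q_1)$ is the semi-interquartile range, and $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles.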
WORKED EXAMPLE
(1.1).1 Find the (a) first, (b) second, (c) third and (d) fourth moments of the set 2, 3, 7, 8 and 10.
Solution
(a) The first moment, or arithmetic mean, is $\bar{X} = \dfrac{\sum X}{N} = \dfrac{2+3+7+8+10}{5} = \dfrac{30}{5} = 6$
(1.1).2 Find the (a) first, (b) second, (c) third and (d) fourth moments about the mean for the set 2, 3, 7, 8 and 10.
Solution
With $\bar{X} = \dfrac{30}{5} = 6$:
(a) $m_1 = \overline{(X - \bar{X})} = \dfrac{\sum (X - \bar{X})}{N} = \dfrac{(2-6)+(3-6)+(7-6)+(8-6)+(10-6)}{5} = \dfrac{0}{5} = 0$
(b) $m_2 = \overline{(X - \bar{X})^2} = \dfrac{\sum (X - \bar{X})^2}{N} = \dfrac{(2-6)^2+(3-6)^2+(7-6)^2+(8-6)^2+(10-6)^2}{5} = \dfrac{46}{5} = 9.2$
(c) $m_3 = \overline{(X - \bar{X})^3} = \dfrac{\sum (X - \bar{X})^3}{N} = \dfrac{(2-6)^3+(3-6)^3+(7-6)^3+(8-6)^3+(10-6)^3}{5} = \dfrac{-18}{5} = -3.6$
(d) $m_4 = \overline{(X - \bar{X})^4} = \dfrac{\sum (X - \bar{X})^4}{N} = \dfrac{(2-6)^4+(3-6)^4+(7-6)^4+(8-6)^4+(10-6)^4}{5} = \dfrac{610}{5} = 122$
(1.1).3 Find the (a) first, (b) second, (c) third and (d) fourth moments about the origin 4 for the set 2, 3, 7, 8 and 10.
Solution
(a) $m_1' = \overline{(X - 4)} = \dfrac{\sum (X - 4)}{N} = \dfrac{(2-4)+(3-4)+(7-4)+(8-4)+(10-4)}{5} = \dfrac{10}{5} = 2$
(b) $m_2' = \overline{(X - 4)^2} = \dfrac{\sum (X - 4)^2}{N} = \dfrac{(2-4)^2+(3-4)^2+(7-4)^2+(8-4)^2+(10-4)^2}{5} = \dfrac{66}{5} = 13.2$
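Parts (b)–(d) of example (1.1).1 and parts (c)–(d) of example (1.1).3 are not worked out above. The following minimal sketch computes them with exact `Fraction` arithmetic from eqn (1), which is generalized to an arbitrary origin. The function name `moment_about` is hypothetical.

```python
from fractions import Fraction

def moment_about(xs, r, origin=0):
    """rth moment about `origin`: sum((X - origin)^r) / N, kept exact with Fraction."""
    return Fraction(sum((x - origin) ** r for x in xs), len(xs))

data = [2, 3, 7, 8, 10]
raw = [moment_about(data, r) for r in range(1, 5)]                # example (1.1).1
about_4 = [moment_about(data, r, origin=4) for r in range(1, 5)]  # example (1.1).3
print([float(m) for m in raw])
print([float(m) for m in about_4])
```

The first four moments about the origin are 6, 45.2, 378 and 3318.8. About the point 4 they are 2, 13.2, 59.6 and 330. Each value can be checked by hand from the sums of powers.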
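(1.1).4 Find the first four moments about the mean for the height distribution of 100 male students of the Faculty of Engineering and Technology, University of Ilorin, Ilorin.
Use the coding method, $u = (X - A)/c$, with assumed mean A = 67 and class-interval width c = 3:
Height      Class mark X   u = (X − A)/c   f     fu     fu²    fu³    fu⁴
60 – 62     61             −2              5     −10    20     −40    80
63 – 65     64             −1              18    −18    18     −18    18
66 – 68     67 (= A)       0               42    0      0      0      0
69 – 71     70             1               27    27     27     27     27
72 – 74     73             2               8     16     32     64     128
Totals                                     Σf = N = 100, Σfu = 15, Σfu² = 97, Σfu³ = 33, Σfu⁴ = 253
The moments about A are:
$m_1' = c\,\dfrac{\sum fu}{N} = 3\left(\dfrac{15}{100}\right) = 0.45$
$m_2' = c^2\,\dfrac{\sum fu^2}{N} = 3^2\left(\dfrac{97}{100}\right) = 8.73$
$m_3' = c^3\,\dfrac{\sum fu^3}{N} = 3^3\left(\dfrac{33}{100}\right) = 8.91$
$m_4' = c^4\,\dfrac{\sum fu^4}{N} = 3^4\left(\dfrac{253}{100}\right) = 204.93$
Thus $m_1 = 0$ and
$m_2 = m_2' - (m_1')^2 = 8.73 - (0.45)^2 = 8.5275$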
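The example stops at $m_2$. The following minimal sketch completes it. It uses the standard relations that convert moments about an arbitrary origin into moments about the mean: $m_3 = m_3' - 3m_1'm_2' + 2m_1'^3$ and $m_4 = m_4' - 4m_1'm_3' + 6m_1'^2m_2' - 3m_1'^4$. The function name is hypothetical, and the coded table is copied from example (1.1).4.

```python
def central_from_origin(m1p, m2p, m3p, m4p):
    """Convert moments about an origin A (m_r') into moments about the mean."""
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m1p * m2p + 2 * m1p ** 3
    m4 = m4p - 4 * m1p * m3p + 6 * m1p ** 2 * m2p - 3 * m1p ** 4
    return 0.0, m2, m3, m4

# Coded table of example (1.1).4: u = (X - 67) / 3
u = [-2, -1, 0, 1, 2]
f = [5, 18, 42, 27, 8]
c, N = 3, sum(f)
# m_r' = c^r * sum(f u^r) / N
m_prime = [c ** r * sum(fi * ui ** r for fi, ui in zip(f, u)) / N for r in range(1, 5)]
m1, m2, m3, m4 = central_from_origin(*m_prime)
print(m_prime)
print(m2, m3, m4)
```

The sketch reproduces $m_1', \ldots, m_4'$ = 0.45, 8.73, 8.91 and 204.93. It then gives $m_2 = 8.5275$, $m_3 \approx -2.693$ and $m_4 \approx 199.376$.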
2.1 INTRODUCTION
Results obtained in samples do not always agree exactly with the theoretical results expected according to the rules of probability. For example, if a coin is tossed 100 times we expect 50 heads and 50 tails, but it is rare to obtain exactly these results.
Suppose that in a particular sample a set of possible events E1, E2, E3, …, Ek are observed to occur with frequencies O1, O2, O3, …, Ok, called observed frequencies, and that according to probability rules they are expected to occur with frequencies e1, e2, e3, …, ek, called expected or theoretical frequencies. We can then compare the observed frequencies with the expected frequencies to see whether they differ significantly.
TABLE 2.1
Events E1 E2 E3 … Ek
Observed frequency O1 O2 O3 … Ok
Expected frequency e1 e2 e3 … ek
2.2 DEFINITION OF χ²
χ² (read "chi-square") is a measure of the discrepancy existing between observed and expected frequencies. It is expressed by:
$\chi^2 = \dfrac{(O_1 - e_1)^2}{e_1} + \dfrac{(O_2 - e_2)^2}{e_2} + \cdots + \dfrac{(O_k - e_k)^2}{e_k} = \sum_{j=1}^{k} \dfrac{(O_j - e_j)^2}{e_j}$ … (eqn 2.1)
where, if the total frequency is N,
$\sum O_j = \sum e_j = N$ … (eqn 2.2)
An expression equivalent to eqn 2.1 is
$\chi^2 = \sum \dfrac{O_j^2}{e_j} - N$ … (eqn 2.3)
This follows from expanding each term: $\sum \dfrac{(O_j - e_j)^2}{e_j} = \sum \dfrac{O_j^2}{e_j} - 2\sum O_j + \sum e_j = \sum \dfrac{O_j^2}{e_j} - N$, using eqn 2.2.
If χ² = 0, the observed and theoretical frequencies agree exactly, while if χ² > 0 they do not agree exactly. The larger the value of χ², the greater is the discrepancy between the observed and expected frequencies.
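The following minimal sketch confirms numerically that eqns 2.1 and 2.3 give the same value. The die counts are hypothetical and serve only as an illustration. The function names are also illustrative.

```python
def chi_square(observed, expected):
    """Eqn 2.1: sum of (O_j - e_j)^2 / e_j over all events."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_shortcut(observed, expected):
    """Eqn 2.3: sum of O_j^2 / e_j minus N (valid when sum O_j = sum e_j = N)."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n

# Hypothetical example: a die rolled 60 times, 10 of each face expected.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(chi_square(observed, expected), chi_square_shortcut(observed, expected))
```

Both forms give χ² = 1.0 here. The shortcut form is convenient for hand calculation because it avoids computing the individual deviations.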
For a contingency table, agreement between the observed and expected frequencies is computed using
$\chi^2 = \sum_{j} \dfrac{(O_j - e_j)^2}{e_j}$ … (2.4)
where the sum is taken over all cells in the contingency table, and the symbols $O_j$ and $e_j$ represent, respectively, the observed and expected frequencies in the jth cell. This sum, which is analogous to eqn (2.1), contains hk terms. The sum of all observed frequencies is denoted by N and is equal to the sum of all expected frequencies [compare with eqn (2.2)].
Eqn (2.4) has a sampling distribution approximated very closely by the chi-square distribution, provided the expected frequencies are not too small. The number of degrees of freedom, ν, of this chi-square distribution is given for h > 1 and k > 1 by ν = (h − 1)(k − 1).
Example: Table 2.2 gives the observed frequencies of the sum of the numbers showing in N = 500 throws of a pair of dice.
TABLE 2.2
Sum        2    3    4    5    6    7    8    9    10   11   12
Observed   15   35   49   58   65   76   72   60   35   29   6
If the dice are fair, the expected numbers are determined from the probability distribution of the sum X, shown in Table 2.3.
TABLE 2.3
X       2     3     4     5     6     7     8     9     10    11    12
P(X)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Under the null hypothesis that there is no significant difference between the observed (experimental) values and the theoretical or hypothetical values, that is, that there is good compatibility between theory and experiment, Karl Pearson (1900) proved that the statistic
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
follows the chi-square distribution. The test is carried out as follows:
(i) Compute the expected frequencies E1, E2, …, En corresponding to the observed frequencies O1, O2, …, On.
(ii) Compute the deviation (O − E) for each frequency and square it to obtain (O − E)².
(iii) Divide each squared deviation (O − E)² by the corresponding expected frequency E.
(iv) Add these values to obtain $\chi^2 = \sum \dfrac{(O - E)^2}{E}$.
(v) Under the null hypothesis that the theory fits the data well, this statistic follows the χ² distribution with ν = n − 1 degrees of freedom.
(vi) From tables, look up the critical value of χ² for ν = n − 1 at a chosen level of significance, usually 5% or 1%.
(vii) If the calculated value of χ² obtained in step (iv) is less than the tabulated value, it is said to be non-significant.
(viii) We therefore accept the null hypothesis and conclude that there is good agreement between theory and experiment. The discrepancies between the observed and expected frequencies may be attributed to chance.
(ix) If the calculated value of χ² is greater than the tabulated value, it is said to be significant. The discrepancies between the observed and expected frequencies cannot be attributed to chance, and we reject the null hypothesis.
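Steps (ii)–(ix) amount to one comparison. The following is a minimal sketch. The critical value is passed in because the notes read it from χ² tables. The function name and the coin data are hypothetical. The value 3.84 used below is the tabulated 5% critical value for 1 degree of freedom.

```python
def chi_square_test(observed, expected, critical_value):
    """Compare the computed chi-square with a tabulated critical value.

    `critical_value` must come from chi-square tables for nu = n - 1
    degrees of freedom at the chosen significance level.
    """
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # steps (ii)-(iv)
    decision = "reject H0" if chi2 > critical_value else "accept H0"  # steps (vii)-(ix)
    return chi2, decision

# Hypothetical: 45 heads and 55 tails in 100 tosses of a coin assumed fair (nu = 1).
chi2, decision = chi_square_test([45, 55], [50, 50], critical_value=3.84)
print(chi2, decision)
```

Here χ² = 1.0 < 3.84, so the discrepancy is attributed to chance and H0 is accepted.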
Uses
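Note: In rare circumstances χ² may be too close to zero, which shows that the observed frequencies agree too well with the expected frequencies. To examine such situations, we determine whether the computed value of χ² is less than $\chi^2_{0.05}$ or $\chi^2_{0.01}$. If it is, we decide that the agreement is too good at the 0.05 or 0.01 significance level, respectively.
For the dice of Table 2.2, the observed and expected frequencies (expected = 500 × P(X)) are:
Sum        2     3     4     5     6     7     8     9     10    11    12
Observed   15    35    49    58    65    76    72    60    35    29    6
Expected   13.9  27.8  41.7  55.6  69.4  83.3  69.4  55.6  41.7  27.8  13.9
Using the unrounded expected frequencies, $\chi^2 = \sum_j \dfrac{(O_j - e_j)^2}{e_j} \approx 10.35$, with ν = 11 − 1 = 10 degrees of freedom.
The corresponding P value is about 0.41. Because this P value is large, there is no reason to reject the hypothesis that the dice are fair.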
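The following sketch reproduces the dice calculation. To avoid depending on a statistics library, it computes the P value from the closed-form chi-square tail for an even number of degrees of freedom, $P(\chi^2 > x) = e^{-x/2}\sum_{i=0}^{\nu/2-1} (x/2)^i / i!$. That formula holds only for even ν, so the function name `chi_square_sf_even_df` and the restriction to even ν are assumptions of this sketch.

```python
import math

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(chi^2 > x); closed form valid only for even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

observed = [15, 35, 49, 58, 65, 76, 72, 60, 35, 29, 6]
weights = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]   # 36 * P(X) for sums 2..12
n = sum(observed)                             # 500 throws
expected = [n * w / 36 for w in weights]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi_square_sf_even_df(chi2, df=len(observed) - 1)
print(round(chi2, 2), round(p_value, 3))
```

With unrounded expected frequencies the sketch gives χ² ≈ 10.35 and P ≈ 0.41 for ν = 10. The small difference from 10.34 comes from rounding the expected frequencies.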
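2.5 CONTINGENCY TABLES
Table 2.1, in which the observed frequencies occupy a single row, is called a one-way classification table. Since the number of columns is k, it is also called a 1 × k (read "1 by k") table. By extending these ideas, we can have two-way classification tables, or h × k tables, in which the observed frequencies occupy h rows and k columns. Such tables are called contingency tables.
The observed frequencies occupying the cells of a contingency table are called cell frequencies. The total frequency in each row or column is called the marginal frequency.
A measure of the degree of relationship (association) between the classifications in a contingency table is the coefficient of contingency:
$C = \sqrt{\dfrac{\chi^2}{\chi^2 + N}}$
For a k × k table, the correlation of attributes is defined as:
$r = \sqrt{\dfrac{\chi^2}{N(k - 1)}}$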
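Example 2.7
The effect of a drug in curing malaria was determined for 2 groups of people, A and B, each consisting of 100 patients. The drug was given to group A but not to group B, which is called the control; otherwise the two groups were treated identically. It was found that in groups A and B, 75 and 65 people, respectively, recovered from the …. At significance levels of (a) 0.01, (b) 0.05 and (c) 0.10, test the hypothesis that the drug helps in curing the disease. Also find the coefficient of contingency and the correlation of attributes.
Solution:
Set up the null hypothesis H0 that the drug has no effect. We would then expect
$\dfrac{75 + 65}{2} = 70$
patients in each group to recover and 30 not to recover, as shown in Table 2.6b. Hence
$\chi^2 = \dfrac{(75 - 70)^2}{70} + \dfrac{(65 - 70)^2}{70} + \dfrac{(25 - 30)^2}{30} + \dfrac{(35 - 30)^2}{30} = 2.38$
with ν = (2 − 1)(2 − 1) = 1 degree of freedom. The critical values of χ² for 1 degree of freedom are 6.63, 3.84 and 2.71 at the 0.01, 0.05 and 0.10 levels, respectively, and 2.38 is smaller than all of them. We conclude that the results are not significant at the 0.05 level, nor at the 0.01 or 0.10 levels. We therefore accept H0 at this level and conclude either that the drug is not effective or that we should withhold our decision pending further tests.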
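Note that a one-tailed test using χ² is equivalent to a two-tailed test using χ, since $\chi^2 > \chi^2_{0.95}$ corresponds to $\chi > \chi_{0.95}$ or $\chi < -\chi_{0.95}$. Since for 2 × 2 tables χ² is the square of the z score, it follows that χ is the same as z in this case.
The coefficient of contingency is
$C = \sqrt{\dfrac{\chi^2}{\chi^2 + N}} = \sqrt{\dfrac{2.38}{2.38 + 200}} = \sqrt{0.01176} = 0.1084$
Suppose instead that there were complete dependence: all 100 patients in group A recovered and none in group B did. The observed cells would then be 100 and 0 for group A and 0 and 100 for group B. The expected cell frequencies (assuming complete independence) are all equal to 50, i.e. (100 + 0)/2, so
$\chi^2 = \dfrac{(100 - 50)^2}{50} + \dfrac{(0 - 50)^2}{50} + \dfrac{(0 - 50)^2}{50} + \dfrac{(100 - 50)^2}{50} = 200$
Thus the maximum value of C is $\sqrt{\dfrac{\chi^2}{\chi^2 + N}} = \sqrt{\dfrac{200}{200 + 200}} = 0.707$.
The correlation of attributes is
$r = \sqrt{\dfrac{\chi^2}{N(k - 1)}} = \sqrt{\dfrac{2.38}{200}} = 0.1091$
This indicates very little correlation between recovery and the use of the drug.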
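The following minimal sketch performs the Example 2.7 calculations directly from the 2 × 2 table. Expected cell frequencies are taken as (row total × column total)/N, which reproduces the 70/30 split above. The function name is hypothetical. The limit $C_{max} = \sqrt{(k-1)/k}$ for a k × k table is the standard result that gives 0.707 when k = 2.

```python
import math

def contingency_chi_square(table):
    """Chi-square for an h x k table; expected cell = row total * column total / N."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, r_tot in enumerate(rows):
        for j, c_tot in enumerate(cols):
            e = r_tot * c_tot / n
            chi2 += (table[i][j] - e) ** 2 / e
    return chi2, n

# Example 2.7: rows = group A (drug), group B (control); columns = recovered, not recovered
drug = [[75, 25], [65, 35]]
chi2, n = contingency_chi_square(drug)
k = 2                                  # 2 x 2 table
C = math.sqrt(chi2 / (chi2 + n))       # coefficient of contingency
r = math.sqrt(chi2 / (n * (k - 1)))    # correlation of attributes
C_max = math.sqrt((k - 1) / k)         # largest possible C for a k x k table
print(round(chi2, 2), round(C, 4), round(r, 4), round(C_max, 3))
```

Without intermediate rounding the sketch gives χ² = 50/21 ≈ 2.38, C ≈ 0.1085, r ≈ 0.1091 and $C_{max}$ ≈ 0.707. The last digit of C differs slightly from the hand calculation, which rounds χ² to 2.38 first.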
CURVE FITTING
When the points (X1, Y1), (X2, Y2), …, (XN, YN) are plotted in a rectangular coordinate system, the resulting set of points is called a SCATTER DIAGRAM. The points may lie close to a straight line, which is a linear relationship, or close to a curve that is not a straight line, which is a non-linear relationship.
[Figure: scatter diagrams of weight against height, showing a linear relationship and a non-linear relationship]
Example: the weight of adult males depends to some degree on their height.
A line on or near which the points lie is called a regression line, and a parabolic curve fitted in the same way is called a regression curve. The line adopted by scientists and engineers for this purpose is called the regression line or line of best fit.
In the equations
$Y = a_0 + a_1X$
$Y = a_0 + a_1X + a_2X^2$
$Y = a_0 + a_1X + a_2X^2 + a_3X^3$
$Y = a_0 + a_1X + a_2X^2 + a_3X^3 + a_4X^4$
the right-hand sides are called polynomials of the first, second, third and fourth degree, respectively. The functions described by these four equations are sometimes called the linear, quadratic, cubic and quartic functions, respectively.
The disadvantage of fitting such curves freehand (by eye) is that different observers will obtain different curves and equations.
The simplest type of approximating curve is a straight line, whose equation can be written as
$Y = a_0 + a_1X$ ………. (3.1)
Given any 2 points (X1, Y1) and (X2, Y2) on the line, the constants $a_0$ and $a_1$ can be determined. The resulting equation of the line can be written as
$Y - Y_1 = \left(\dfrac{Y_2 - Y_1}{X_2 - X_1}\right)(X - X_1)$ ………. (3.2)
or
$Y - Y_1 = m(X - X_1)$, where $m = \dfrac{Y_2 - Y_1}{X_2 - X_1}$ …………………… (3.7)
m is called the slope of the line and represents the change in Y divided by the corresponding change in X.
From $Y = a_0 + a_1X$: when X = 0, $Y = a_0$, which is the intercept on the Y axis, and the constant $a_1$ is the slope.
Fig. 3.1
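The following minimal sketch recovers $a_0$ and $a_1$ of eqn (3.1) from two points, using the slope of eqn (3.7). The function name `line_through` and the two points are hypothetical.

```python
def line_through(p1, p2):
    """Return (a0, a1) for Y = a0 + a1*X through two points (eqns 3.2 and 3.7)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope undefined")
    m = (y2 - y1) / (x2 - x1)   # slope a1
    a0 = y1 - m * x1            # intercept: the value of Y when X = 0
    return a0, m

print(line_through((1, 3), (4, 9)))  # slope 2, intercept 1
```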
Consider the data points (X1, Y1), (X2, Y2), …, (XN, YN). For a given value of X, say X1, there will be a difference between the value Y1 and the corresponding value determined from the fitted curve. We denote this difference by D1, which is called a deviation, error or residual. D1 may be positive, negative or zero. Similarly, corresponding to the values X2, …, XN, we obtain the deviations D2, …, DN.
The quantity $D_1^2 + D_2^2 + \cdots + D_N^2$ is a measure of the "goodness of fit". If it is small the fit is good, and if it is large the fit is bad.
Definition
Of all curves approximating a given set of data points, a best-fitting curve is the one having the property that $D_1^2 + D_2^2 + \cdots + D_N^2$ is a minimum.
A curve having this property is said to fit the data in the least-squares sense and is therefore called a least-squares curve. Thus a line having this property is called a least-squares line, a parabola with this property is called a least-squares parabola, and so on.
It is customary to employ the above definition when X is the independent variable and Y is the dependent variable. If X is the dependent variable, the definition is modified by considering horizontal instead of vertical deviations, which amounts to interchanging the X and Y axes.
The least-squares line approximating the set of points (X1, Y1), (X2, Y2), …, (XN, YN) has the equation
$Y = a_0 + a_1X$
where the constants $a_0$ and $a_1$ are determined by solving simultaneously the equations
$\sum Y = a_0N + a_1\sum X$
$\sum XY = a_0\sum X + a_1\sum X^2$
These are called the normal equations for the least-squares line. The constants $a_0$ and $a_1$ can also be found from the formulas
$a_0 = \dfrac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2}$
$a_1 = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$ ………. (3.10)
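The following minimal sketch evaluates the closed-form solution of the normal equations. It also checks the least-squares property: shifting the line increases $D_1^2 + \cdots + D_N^2$. The data and the function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a0, a1) for Y = a0 + a1*X from the closed-form normal-equation solution."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2          # zero only if all X values are equal
    a0 = (sy * sxx - sx * sxy) / denom
    a1 = (n * sxy - sx * sy) / denom   # eqn (3.10)
    return a0, a1

def sum_sq_dev(xs, ys, a0, a1):
    """D_1^2 + ... + D_N^2 for the line Y = a0 + a1*X."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3, 4], [2, 3, 5, 6]        # hypothetical data
a0, a1 = least_squares_line(xs, ys)
best = sum_sq_dev(xs, ys, a0, a1)
other = sum_sq_dev(xs, ys, a0 + 0.1, a1)   # a shifted line gives a larger sum
print(a0, a1, best, other)
```

For these points the least-squares line is Y = 0.5 + 1.4X, with a sum of squared deviations of 0.2. The shifted line gives 0.24.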
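REGRESSION
In a relation such as
v = u + at ………. (3.11)
v is the dependent variable and t is the independent variable. More generally, the general linear model relating a dependent variable y to independent variables $x_1, x_2, \ldots, x_N$ is
$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_Nx_N + \varepsilon$ ………. (3.12)
where ε is a random error term.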
Individual terms in the general linear model are classified by their exponents. The degree of a term is given by the sum of the exponents of the independent variables appearing in the term. For instance, in
$y = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_1x_2$ ………. (3.13)
the terms $\beta_1x_1$ and $\beta_3x_2$ are of first degree, while $\beta_2x_1^2$ and $\beta_4x_1x_2$ are of second degree.
A first-order model is a general linear model that contains all possible first-degree terms in the independent variables.
A second-order model contains all possible first- and second-degree terms.
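The method of least squares can be used to determine the least-squares line, which minimizes the sum of squared errors of prediction, $\sum (y - \hat{y})^2$.
For the linear model $y = \beta_0 + \beta_1x + \varepsilon$, the sum of squared errors is
$SSE = \sum (y - \hat{y})^2 = \sum (y - \hat{\beta}_0 - \hat{\beta}_1x)^2$
and the least-squares estimates are
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
where
$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{n}$ ………. (3.16)
$S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$ ………. (3.17)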
Example 3.1
For a random sample of n = 9 animals, the live weights (x) and dead weights (y) were recorded as follows:
x      x²       y      y²      xy
4.2    17.64    2.8    7.84    11.76
3.8    14.44    2.5    6.25    9.50
4.8    23.04    3.1    9.61    14.88
3.4    11.56    2.1    4.41    7.14
4.5    20.25    2.9    8.41    13.05
4.6    21.16    2.8    7.84    12.88
4.3    18.49    2.6    6.76    11.18
3.7    13.69    2.4    5.76    8.88
3.9    15.21    2.5    6.25    9.75
Σ: 37.2  155.48  23.7  63.13  99.02
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 155.48 - \dfrac{(37.2)^2}{9} = 1.72$
$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 99.02 - \dfrac{(37.2)(23.7)}{9} = 1.06$
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{1.06}{1.72} = 0.616$
$\bar{y} = \dfrac{\sum y}{n} = \dfrac{23.7}{9} = 2.633, \qquad \bar{x} = \dfrac{\sum x}{n} = \dfrac{37.2}{9} = 4.133$
Hence $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \approx 2.633 - (0.616)(4.133) \approx 0.09$, and the fitted line is $\hat{y} \approx 0.09 + 0.616x$.
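The following minimal sketch repeats Example 3.1 from the raw data, using eqns (3.16) and (3.17). It is an illustrative check, not the notes' own method. Working from the raw values avoids the transcription slips that hand-built tables invite.

```python
x = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]   # live weight
y = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]   # dead weight
n = len(x)

s_xx = sum(v * v for v in x) - sum(x) ** 2 / n                 # eqn (3.16)
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # eqn (3.17)
beta1 = s_xy / s_xx
beta0 = sum(y) / n - beta1 * sum(x) / n                        # y-bar minus beta1 * x-bar
print(round(s_xx, 2), round(s_xy, 2), round(beta1, 3), round(beta0, 3))
```

The sketch gives $S_{xx} = 1.72$, $S_{xy} = 1.06$, $\hat{\beta}_1 \approx 0.616$ and, without intermediate rounding, $\hat{\beta}_0 \approx 0.086$.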
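For the general linear model (3.12), we similarly try to find estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_N$ that minimize
$\sum (y - \hat{y})^2 = \sum (y - \hat{\beta}_0 - \hat{\beta}_1x_1 - \cdots - \hat{\beta}_Nx_N)^2$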
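For several independent variables, the minimization leads to the normal equations $(X^TX)\hat{\beta} = X^Ty$, where X has a leading column of 1s. The following is a plain-Python sketch that solves them by Gaussian elimination with partial pivoting. It has no other numerical safeguards. The function name and the noise-free data generated from y = 1 + 2x₁ + 3x₂ are hypothetical. In practice a numerical library would normally be used instead.

```python
def fit_linear_model(rows, ys):
    """Least-squares estimates for y = b0 + b1*x1 + ... + bN*xN via the normal equations."""
    X = [[1.0, *r] for r in rows]                    # leading 1 for the intercept b0
    p, m = len(X[0]), len(X)
    A = [[sum(X[k][i] * X[k][j] for k in range(m)) for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * ys[k] for k in range(m)) for i in range(p)]
    for col in range(p):                             # forward elimination
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    beta = [0.0] * p                                 # back substitution
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [1 + 2 * a + 3 * c for a, c in rows]
beta = fit_linear_model(rows, ys)
print([round(v, 6) for v in beta])
```

Because the data contain no error term, the estimates recover the coefficients 1, 2 and 3 exactly, up to rounding.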
CORRELATION
The ratio of the explained variation to the total variation is called the coefficient of determination. Its square root is the coefficient of correlation:
$r = \pm\sqrt{\dfrac{\text{explained variation}}{\text{total variation}}}$
In terms of the quantities used in regression,
$r = \dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$ ………. (3.18)
where $S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}$.
Recall that $\hat{\beta}_1$, the slope of the least-squares line, is $\dfrac{S_{xy}}{S_{xx}}$. Therefore
$r = \pm\sqrt{\hat{\beta}_1\dfrac{S_{xy}}{S_{yy}}}$ ………. (3.19)
where r takes the same sign as $\hat{\beta}_1$.
Properties of r
$r = \pm\sqrt{\dfrac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}}$ ………. (3.18)
r always lies between −1 and +1. $r^2$ gives the proportion of the total variation in y that can be accounted for by the independent variable x.
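The following minimal sketch applies eqns (3.18) and (3.19) to the Example 3.1 data and confirms that the two forms agree. `math.copysign` gives r the sign of $\hat{\beta}_1$.

```python
import math

x = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
y = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]
n = len(x)
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = s_xy / math.sqrt(s_xx * s_yy)                              # eqn (3.18)
beta1 = s_xy / s_xx
r_alt = math.copysign(math.sqrt(beta1 * s_xy / s_yy), beta1)   # eqn (3.19)
print(round(r, 4), round(r_alt, 4), round(r * r, 4))
```

The result is r ≈ 0.95, so about 91% of the variation in dead weight is accounted for by live weight in this sample.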
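Let $Y_{est}$ represent the value of Y for given values of X as estimated from the equation $Y = a_0 + a_1X$. A measure of the scatter about the regression line of Y on X is then supplied by the quantity
$s_{Y.X} = \sqrt{\dfrac{\sum (Y - Y_{est})^2}{N}}$ ………. (3.20)
which is called the standard error of estimate of Y on X. Similarly, the standard error of estimate of X on Y is
$s_{X.Y} = \sqrt{\dfrac{\sum (X - X_{est})^2}{N}}$ ………. (3.21)
For small samples, the modified standard deviation is given by
$\hat{s} = \sqrt{\dfrac{N}{N - 1}}\,s$ ………. (3.22)
and, similarly, a modified standard error of estimate is given by
$\hat{s}_{Y.X} = \sqrt{\dfrac{N}{N - 2}}\,s_{Y.X}$ ………. (3.23)
Since
$r = \pm\sqrt{\dfrac{\text{explained variation}}{\text{total variation}}} = \pm\sqrt{\dfrac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}}$
it follows that $s_{Y.X} = s_Y\sqrt{1 - r^2}$, where $s_Y$ is the standard deviation of Y.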
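The following minimal sketch computes the standard error of estimate and its small-sample modification for the Example 3.1 data. It also recovers $r^2$ from $s_{Y.X} = s_Y\sqrt{1 - r^2}$. The fitted line is recomputed here so that the block is self-contained.

```python
import math

x = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
y = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]
N = len(x)
x_bar, y_bar = sum(x) / N, sum(y) / N
a1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
a0 = y_bar - a1 * x_bar
y_est = [a0 + a1 * a for a in x]

s_yx = math.sqrt(sum((b - e) ** 2 for b, e in zip(y, y_est)) / N)  # eqn (3.20)
s_yx_mod = math.sqrt(N / (N - 2)) * s_yx                            # modified standard error
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / N)
r2 = 1 - (s_yx / s_y) ** 2                                          # from s_yx = s_y*sqrt(1 - r^2)
print(round(s_yx, 4), round(s_yx_mod, 4), round(r2, 4))
```

The sketch gives $s_{Y.X} \approx 0.086$ and a modified value of about 0.098. It also gives $r^2 \approx 0.907$, which matches the correlation computed from eqn (3.18).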
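Statistical inference is concerned with 2 things:
a. Estimation
b. Tests of hypotheses
ESTIMATION
BOUND ON ERROR
When the sample mean $\bar{y}$ is used to estimate a population mean µ, the bound on the error of estimation is
$\text{Bound on error} = 2\sigma_{\bar{y}} = \dfrac{2\sigma}{\sqrt{n}}$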
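The bound is two standard errors of the mean. For an approximately normal sampling distribution, the error of estimation is smaller than this bound about 95% of the time. The following is a minimal sketch with a hypothetical helper name and hypothetical values.

```python
import math

def bound_on_error(sigma, n):
    """Two-standard-error bound for estimating a mean: 2 * sigma / sqrt(n)."""
    return 2 * sigma / math.sqrt(n)

# Hypothetical: sigma = 4 and n = 16 give a bound of 2.0
print(bound_on_error(4, 16))
```

Quadrupling the sample size halves the bound, because the bound is proportional to $1/\sqrt{n}$.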
STATISTICAL HYPOTHESIS
In attempting to reach decisions, it is useful to make assumptions (or guesses) about the populations involved. Such assumptions, which may or may not be true, are called statistical hypotheses. They are generally statements about the probability distributions of the populations.
LEVEL OF SIGNIFICANCE
In testing a given hypothesis, the maximum probability with which we would be willing to risk a Type I error (rejecting the hypothesis when it should be accepted) is called the level of significance, or significance level, of the test. It is denoted by α. This probability is often specified before any samples are drawn, so that the results obtained will not influence our choice.
In practice, a significance level of 0.05 or 0.01 is customary. If, for example, the 0.05 (5%) significance level is chosen, there are about 5 chances in 100 that we would reject the hypothesis when it should be accepted. In other words, we are about 95% confident that we have made the right decision, and there is a 0.05 probability that the decision to reject is wrong.
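[Figure: normal curve with the critical regions shaded in both tails]
The normal distribution is defined by the equation
$Y = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}(X - \mu)^2/\sigma^2}$ ………. (N1)
where:
µ = mean
σ = standard deviation
π = 3.14159…
e = 2.71828…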
The total area bounded by the curve of eqn N1 and the X axis is 1. Hence the area under the curve between two ordinates X = a and X = b, where a < b, represents the probability that X lies between a and b. This probability is denoted by Pr{a < X < b}.
When the variable X is expressed in standard units, $Z = (X - \mu)/\sigma$, eqn N1 is replaced by the standardized form
$Y = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}Z^2}$ ………. (N2)
In cases like N2, we say that Z is normally distributed with mean 0 and variance 1. Fig. N1 is a graph of this standardized normal curve.
The areas included between Z = −1 and +1, Z = −2 and +2, and Z = −3 and +3 are equal, respectively, to 68.27%, 95.45% and 99.73% of the total area, which is 1. Table 2 shows the areas under this curve bounded by the ordinates at Z = 0 and any positive value of Z. From this table, the area between any two ordinates can be found by using the symmetry of the curve about Z = 0.
[Fig. N1: the standardized normal curve, Z from −3 to 3]
Properties of the normal distribution:
Mean: µ
Variance: σ²
Standard deviation: σ
Moment coefficient of skewness: $\alpha_3 = 0$
Moment coefficient of kurtosis: $\alpha_4 = 3$
Mean deviation: $\sigma\sqrt{2/\pi} \approx 0.7979\sigma$
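Areas under the standardized normal curve can be computed with the error function instead of a printed table, because $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The following minimal sketch uses `math.erf` and reproduces the 68.27%, 95.45% and 99.73% figures quoted above. The function name is hypothetical.

```python
import math

def std_normal_area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2, via the error function."""
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return phi(z2) - phi(z1)

for k in (1, 2, 3):
    print(k, round(100 * std_normal_area_between(-k, k), 2))
```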
Hypergeometric Distribution
$P(X = k) = \dfrac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}}$ . . . . (1)
where
N = the population size
K = the number of success states in the population
n = the number of draws
k = the number of observed successes
and $\binom{a}{b}$ is a binomial coefficient.
The p.m.f. is positive when $\max(0, n + K - N) \le k \le \min(K, n)$.
Consider an urn containing two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial distribution).
If the variable N describes the number of all marbles in the urn (see contingency table 1) and K describes the number of white marbles, then N − K corresponds to the number of black marbles.
Now assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are white?
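The following minimal sketch evaluates eqn (1) for the urn question with `math.comb` (Python 3.8 or later is assumed). The function name `hypergeom_pmf` is hypothetical. Here N = 50, K = 5, n = 10 and k = 4.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Eqn (1): C(K, k) * C(N - K, n - k) / C(N, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p = hypergeom_pmf(4, N=50, K=5, n=10)
print(round(p, 6))
```

The probability is $5 \times 8\,145\,060 / 10\,272\,278\,170 \approx 0.00396$, so drawing exactly 4 of the 5 white marbles is quite unlikely.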
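Binomial distribution: let p be the probability that an event will happen in any single trial, and let q = 1 − p be the probability that it will fail to happen. The probability that the event will happen exactly X times in N trials is
$p(X) = \binom{N}{X}p^Xq^{N-X} = \dfrac{N!}{X!(N - X)!}p^Xq^{N-X}$ …B1
Example 1
The probability of getting exactly 2 heads in 6 tosses of a fair coin can easily be obtained from eqn B1 by putting N = 6, X = 2 and p = q = ½:
$\binom{6}{2}\left(\dfrac{1}{2}\right)^2\left(\dfrac{1}{2}\right)^{6-2} = \dfrac{6!}{2!\,4!}\left(\dfrac{1}{2}\right)^6 = \dfrac{15}{64}$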
Example 2
What is the probability of getting at least four heads in 6 tosses of a fair coin?
Solution
As in Example 1, applying eqn B1 for X = 4, 5 and 6 gives
$\binom{6}{4}\left(\tfrac{1}{2}\right)^4\left(\tfrac{1}{2}\right)^{6-4} + \binom{6}{5}\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^{6-5} + \binom{6}{6}\left(\tfrac{1}{2}\right)^6\left(\tfrac{1}{2}\right)^{6-6}$
$= \dfrac{15}{64} + \dfrac{6!}{5!\,(6-5)!}\left(\dfrac{1}{2}\right)^6 + \dfrac{6!}{6!\,(6-6)!}\left(\dfrac{1}{2}\right)^6 = \dfrac{15}{64} + \dfrac{6}{64} + \dfrac{1}{64} = \dfrac{22}{64} = \dfrac{11}{32}$
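Both coin examples can be checked with exact rational arithmetic. The following minimal sketch implements eqn B1 with `math.comb` and `Fraction`. The function name `binom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Eqn B1: C(n, x) * p^x * q^(n - x) with q = 1 - p, kept exact with Fraction."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

half = Fraction(1, 2)
exactly_two = binom_pmf(2, 6, half)                               # Example 1
at_least_four = sum(binom_pmf(x, 6, half) for x in range(4, 7))  # Example 2
print(exactly_two, at_least_four)
```

The results are exactly 15/64 and 11/32, in agreement with the hand calculations.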
The discrete distribution, eqn B1, is often called the binomial distribution since, for X = 0, 1, 2, …, N, it corresponds to successive terms of the binomial formula, or binomial expansion, given by eqn B2:
$(q + p)^N = q^N + \binom{N}{1}q^{N-1}p + \binom{N}{2}q^{N-2}p^2 + \cdots + p^N$ …B2
where 1, $\binom{N}{1}$, $\binom{N}{2}$, … are called binomial coefficients.
An example applying eqn B2:
$(q + p)^4 = q^4 + \binom{4}{1}q^3p + \binom{4}{2}q^2p^2 + \binom{4}{3}qp^3 + p^4 = q^4 + 4q^3p + 6q^2p^2 + 4qp^3 + p^4$ …. B4
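Properties of the binomial distribution:
Mean: µ = Np
Variance: σ² = Npq
Standard deviation: $\sigma = \sqrt{Npq}$
Moment coefficient of skewness: $\alpha_3 = \dfrac{q - p}{\sqrt{Npq}}$
Moment coefficient of kurtosis: $\alpha_4 = 3 + \dfrac{1 - 6pq}{Npq}$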
Example
In 100 tosses of a fair coin, find the mean number of heads and the standard deviation.
Solution
$\mu = Np = 100\left(\tfrac{1}{2}\right) = 50$
$\sigma = \sqrt{Npq} = \sqrt{100\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = 5$
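The following minimal sketch evaluates all four binomial properties listed above for the coin example. The function name is hypothetical. For a fair coin, p = q, so $\alpha_3 = 0$, and $\alpha_4 = 3 - 0.5/25 = 2.98$, which is already close to the normal value 3.

```python
import math

def binomial_summary(N, p):
    """Mean, standard deviation, alpha_3 and alpha_4 of a binomial distribution."""
    q = 1 - p
    npq = N * p * q
    return {
        "mean": N * p,
        "sd": math.sqrt(npq),
        "alpha3": (q - p) / math.sqrt(npq),
        "alpha4": 3 + (1 - 6 * p * q) / npq,
    }

summary = binomial_summary(100, 0.5)
print(summary)
```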
ANALYSIS OF VARIANCE
Analysis of variance, often known by the acronym ANOVA, is used for testing the significance of differences between three or more sample means or, equivalently, for testing the null hypothesis that the corresponding population means are all equal.
For example, suppose that in an agricultural experiment four different chemical treatments of soil produced mean wheat yields of 28, 22, 18 and 24 barrels per acre, respectively. Is there a significant difference between these means, or is the observed spread simply due to chance? Fisher developed the technique of analysis of variance to solve such problems. It makes use of the F distribution.
Table AN1
Treatment 1:   X11   X12   …   X1b
Treatment 2:   X21   X22   …   X2b
⋮
Treatment a:   Xa1   Xa2   …   Xab
In Table AN1, $X_{jk}$ denotes the measurement in the jth row and kth column, where j = 1, 2, …, a and k = 1, 2, …, b. For example, $X_{28}$ refers to the 8th measurement for the second treatment. The mean of the measurements in the jth row is denoted by $\bar{X}_{j.}$ and is given by equation AN1:
$\bar{X}_{j.} = \dfrac{1}{b}\sum_{k=1}^{b} X_{jk}, \quad j = 1, 2, \ldots, a$ − − − − AN1
The dot in $\bar{X}_{j.}$ shows that the index k has been summed out. The $\bar{X}_{j.}$ values are called group means, treatment means or row means.
The grand mean, or overall mean, is the mean of all the measurements in all the groups. It is denoted by $\bar{X}$, as given in equation AN2:
$\bar{X} = \dfrac{1}{ab}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{jk}$ … … … AN2
In summary, the 3 variations are given by equations AN3, AN4 and AN5:
$V = \sum_{j,k} X_{jk}^2 - \dfrac{T^2}{ab}$ … … . . AN3
$V_B = \dfrac{1}{b}\sum_{j} T_{j.}^2 - \dfrac{T^2}{ab}$ … … . AN4
$V_W = V - V_B$ … … . AN5
where:
V = total variation
$V_B$ = variation between treatments
$V_W$ = variation within treatments
$T_{j.}$ = total of the values for the jth treatment
T = grand total of all the values
In practice, with manual calculations, it is convenient to subtract some fixed value from all the data in the table in order to simplify the calculations. This does not affect the final results.
Example.
Table ANE shows the yields, in m³ per hectare, of a certain variety of wheat grown in a particular type of soil treated with fertilizer A, B or C.
Find (a) the mean yield for the different treatments, (b) the grand mean for all treatments, (c) the total variation, (d) the variation between treatments and (e) the variation within treatments.
Table ANE
A   48   49   50   49
B   47   49   48   48
C   49   51   50   50
Solution: For simplicity, subtract 45 from all the data to obtain Table ANA.
Table ANA
A   3   4   5   4
B   2   4   3   3
C   4   6   5   5
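(a) The treatment (row) means are
$\bar{X}_{1.} = \tfrac{1}{4}(3 + 4 + 5 + 4) = 4$
$\bar{X}_{2.} = \tfrac{1}{4}(2 + 4 + 3 + 3) = 3$
$\bar{X}_{3.} = \tfrac{1}{4}(4 + 6 + 5 + 5) = 5$
Thus, after adding back 45, the mean yields are 49, 48 and 50 m³/ha for A, B and C, respectively.
(b) The grand mean is $\bar{X} = \tfrac{1}{12}(16 + 12 + 20) = 4$, i.e. 49 m³/ha after adding back 45.
(c) $V = \sum_{j,k}(X_{jk} - \bar{X})^2$ … … C1
$= (3-4)^2 + (4-4)^2 + (5-4)^2 + (4-4)^2 + (2-4)^2 + (4-4)^2 + (3-4)^2 + (3-4)^2 + (4-4)^2 + (6-4)^2 + (5-4)^2 + (5-4)^2 = 14$
(d) $V_B = b\sum_{j}(\bar{X}_{j.} - \bar{X})^2$ … … . d1
$= 4[(4-4)^2 + (3-4)^2 + (5-4)^2] = 8$
(e) $V_W = V - V_B = 14 - 8 = 6$
Table ANS summarizes these results in the ANOVA table typical of a one-factor experiment.
TABLE ANS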
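The following minimal sketch computes V, $V_B$ and $V_W$ for Table ANE. It also computes the F ratio, using the usual mean squares with a − 1 and a(b − 1) degrees of freedom. The function name is hypothetical, and the sketch assumes every treatment has the same number b of measurements, as in Table AN1. Running it on the coded data (Table ANA) confirms that subtracting 45 does not change the results.

```python
def one_way_anova(groups):
    """Return V, V_B, V_W and the F ratio for equal-sized treatment groups."""
    a, b = len(groups), len(groups[0])
    grand = sum(map(sum, groups)) / (a * b)
    means = [sum(g) / b for g in groups]
    V = sum((x - grand) ** 2 for g in groups for x in g)   # total variation
    V_B = b * sum((m - grand) ** 2 for m in means)         # between treatments
    V_W = V - V_B                                          # within treatments
    F = (V_B / (a - 1)) / (V_W / (a * (b - 1)))            # between / within mean squares
    return V, V_B, V_W, F

ane = [[48, 49, 50, 49], [47, 49, 48, 48], [49, 51, 50, 50]]   # Table ANE
results = one_way_anova(ane)
coded = one_way_anova([[x - 45 for x in row] for row in ane])  # Table ANA
print(results)
print(coded)
```

Both runs give V = 14, $V_B$ = 8 and $V_W$ = 6. They also give F = (8/2)/(6/9) = 6.0, which would be compared with tabulated F values for 2 and 9 degrees of freedom.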
In a two-factor experiment, ANOVA can be understood along the same lines as in a one-factor experiment. For example, suppose that an agricultural experiment consists of examining the yields per acre of 4 different varieties of wheat, where each variety is grown on 5 different plots of land. A total of 4 × 5 = 20 plots is therefore needed. In such a case it is convenient to combine the plots into blocks, with a different variety of wheat grown on each plot within a block, so 5 blocks would be required.
The two classifications, or factors, are (i) the wheat variety grown and (ii) the particular block used, which may involve different soil fertility, fertilizer type, dosage, etc.
Alternatively, the first factor can be regarded as the treatment and the second factor as the block.
Suppose that we have a treatments and b blocks. Table 2W1 can then be constructed to illustrate the notation.
Table 2W1
                 Block 1   Block 2   …   Block b
Treatment 1      X11       X12       …   X1b
Treatment 2      X21       X22       …   X2b
⋮
Treatment a      Xa1       Xa2       …   Xab
That is, for treatment j and block k, the value is denoted by $X_{jk}$. The mean of the entries in the jth row is denoted by $\bar{X}_{j.}$, where j = 1, 2, …, a, while the mean of the entries in the kth column is denoted by $\bar{X}_{.k}$, where k = 1, 2, …, b:
$\bar{X}_{.k} = \dfrac{1}{a}\sum_{j=1}^{a} X_{jk}$
The grand mean is
$\bar{X} = \dfrac{1}{ab}\sum_{j,k} X_{jk}$
In a two-factor experiment without replications, the total variation is partitioned as in equation *:
$V = V_E + V_R + V_C$ … . .*
where
$V_R = b\sum_{j}(\bar{X}_{j.} - \bar{X})^2$ is the variation due to rows (treatments),
$V_C = a\sum_{k}(\bar{X}_{.k} - \bar{X})^2$ is the variation due to columns (blocks), and
$V_E = \sum_{j,k}(X_{jk} - \bar{X}_{j.} - \bar{X}_{.k} + \bar{X})^2$ is the residual (random) variation.
The analysis of variance tables for the two-factor experiment, both without and with replications, follow the same reasoning.
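The following minimal sketch computes the three components for a table without replication and checks the partition $V = V_E + V_R + V_C$ numerically. The yields table and the function name are hypothetical.

```python
def two_factor_variations(table):
    """V, V_R, V_C, V_E for an a x b table without replication (rows = treatments)."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_means = [sum(r) / b for r in table]
    col_means = [sum(c) / a for c in zip(*table)]
    V = sum((x - grand) ** 2 for r in table for x in r)
    V_R = b * sum((m - grand) ** 2 for m in row_means)
    V_C = a * sum((m - grand) ** 2 for m in col_means)
    V_E = sum((table[j][k] - row_means[j] - col_means[k] + grand) ** 2
              for j in range(a) for k in range(b))
    return V, V_R, V_C, V_E

# Hypothetical yields: 3 treatments (rows) x 4 blocks (columns)
yields = [[5, 7, 6, 8], [9, 8, 10, 9], [4, 6, 5, 5]]
V, V_R, V_C, V_E = two_factor_variations(yields)
print(V, V_R + V_C + V_E)  # the two totals agree
```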
Sampling Theory: sampling theory is a study of the relationships existing between a population and samples drawn from the population.
Uses:
1. In estimating unknown population quantities, such as the population mean and variance, often called population parameters or briefly parameters. These are estimated from knowledge of the corresponding sample quantities, such as the sample mean and variance, often called sample statistics.
2. To determine whether the observed differences between 2 samples are due to chance variation or whether they are really significant, e.g. in determining whether a drug used in treating malaria is effective or not (tests of hypotheses and significance).
A study of inferences made concerning a population by using samples drawn from it, together with indications of the accuracy of such inferences by using probability theory, is called statistical inference.
To draw valid conclusions using sampling theory and statistical inference, samples must be chosen so as to be representative of a population. A study of sampling methods and of the related problems that arise is called the design of the experiment.
Random sampling is one way of obtaining a representative sample. Here each member of the population has an equal chance of being included in the sample. One way of obtaining a random sample is to assign a number to each member of the population, write these numbers on pieces of paper and pick them randomly one by one.
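The numbered-slips procedure can be imitated in code. `random.sample` draws without replacement, with every member equally likely to be chosen. The following is a minimal sketch. The population of 100 numbered members and the seed are hypothetical, and the seed is fixed only to make the example repeatable.

```python
import random

population = list(range(1, 101))        # hypothetical population: members numbered 1..100
rng = random.Random(463)                # fixed seed so the example is repeatable
sample = rng.sample(population, k=10)   # 10 distinct members, drawn without replacement
print(sorted(sample))
```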
The quantities µ, σ, p, $\mu_r$ and $\bar{X}$, s, P, $m_r$ denote, respectively, the population and sample means, standard deviations, proportions and moments about the mean.
When the sample size N is large (N ≥ 30), the sampling distributions of many statistics are normal or nearly normal; such methods are known as large-sampling methods (use the z test). For small samples (N < 30), we use the theory of small samples, or exact sampling theory (use the t test).
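Unbiased Estimates: a statistic is called an unbiased estimator of a population parameter if the mean of its sampling distribution equals the corresponding population parameter. Otherwise it is called a biased estimator.
Example
The mean of the sampling distribution of means, $\mu_{\bar{X}}$, is µ, the population mean. Hence the sample mean $\bar{X}$ is an unbiased estimate of the population mean µ. However, the mean of the sampling distribution of variances is
$\mu_{s^2} = \dfrac{N - 1}{N}\sigma^2$
where N is the sample size, so the sample variance s² is a biased estimate of the population variance σ².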
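Both statements can be verified by brute force. The following sketch lists every possible sample of size N = 2 drawn with replacement from a small hypothetical population. Sampling with replacement is what makes $\mu_{s^2} = \tfrac{N-1}{N}\sigma^2$ hold exactly. The sketch then averages the sample means and the sample variances, where s² uses divisor N as in the notes.

```python
from itertools import product

population = [2, 3, 6, 8, 11]                  # hypothetical small population
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

N = 2                                          # sample size
samples = list(product(population, repeat=N))  # every sample, drawn with replacement

def biased_var(s):
    """Sample variance with divisor N (the s^2 used in the notes)."""
    m = sum(s) / len(s)
    return sum((x - m) ** 2 for x in s) / len(s)

mean_of_means = sum(sum(s) / N for s in samples) / len(samples)
mean_of_s2 = sum(biased_var(s) for s in samples) / len(samples)
print(mu, mean_of_means)                   # the sample mean is unbiased
print(sigma2 * (N - 1) / N, mean_of_s2)    # mean of s^2 = (N - 1)/N * sigma^2
```

Here σ² = 10.8, but the average of s² over all samples is 5.4, which is exactly (2 − 1)/2 × 10.8.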
Efficient Estimates: if the sampling distributions of two statistics have the same mean, the statistic with the smaller variance is called an efficient estimator of the mean, while the other statistic is called an inefficient estimator.
In practice, inefficient estimates are often used because of the relative ease with which some of them can be obtained.
Interval Estimates: an estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate. Interval estimates indicate the precision or accuracy of an estimate and are therefore preferable to point estimates.
Let $\mu_S$ and $\sigma_S$ be the mean and the standard deviation (standard error) of the sampling distribution of a statistic S. When the sampling distribution of S is approximately normal, which is true for many statistics when N ≥ 30, we can expect to find an actual sample statistic S lying in the intervals $\mu_S - \sigma_S$ to $\mu_S + \sigma_S$, $\mu_S - 2\sigma_S$ to $\mu_S + 2\sigma_S$, or $\mu_S - 3\sigma_S$ to $\mu_S + 3\sigma_S$ about 68.27%, 95.45% and 99.73% of the time, respectively.
Equivalently, we can expect to find $\mu_S$ in the intervals $S - \sigma_S$ to $S + \sigma_S$, $S - 2\sigma_S$ to $S + 2\sigma_S$, or $S - 3\sigma_S$ to $S + 3\sigma_S$ about 68.27%, 95.45% and 99.73% of the time, respectively. These intervals are therefore called the 68.27%, 95.45% and 99.73% confidence intervals for estimating $\mu_S$. The end numbers of these intervals ($S \pm \sigma_S$, $S \pm 2\sigma_S$ and $S \pm 3\sigma_S$) are then called the 68.27%, 95.45% and 99.73% confidence limits.
Similarly, $S \pm 1.96\sigma_S$ and $S \pm 2.58\sigma_S$ are the 95% and 99% (or 0.95 and 0.99) confidence limits for $\mu_S$. The percentage confidence is often called the confidence level. The numbers 1.96, 2.58, etc., are called confidence coefficients or critical values and are denoted by $z_c$.
Confidence level   99.73%   99%    98%    96%    95.45%   95%    90%     80%    68.27%   50%
$z_c$              3.00     2.58   2.33   2.05   2.00     1.96   1.645   1.28   1.00     0.6745
For example, the area under the standard normal curve between Z = 0 and Z = 2 is 0.4772.
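The $z_c$ row of the table can be regenerated from the inverse normal CDF. `statistics.NormalDist` (Python 3.8 or later) provides `inv_cdf`. The following is a minimal sketch. The helper names and the example statistic S = 50 with standard error 2 are hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided z_c such that P(-z_c < Z < z_c) equals `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def confidence_limits(S, sigma_S, confidence):
    """Confidence limits S +/- z_c * sigma_S."""
    z_c = critical_value(confidence)
    return S - z_c * sigma_S, S + z_c * sigma_S

for level in (0.99, 0.98, 0.96, 0.95, 0.90, 0.80, 0.50):
    print(f"{level:.0%}: z_c = {critical_value(level):.4f}")

print(confidence_limits(50, 2, 0.95))   # hypothetical statistic S = 50, sigma_S = 2
```

To four decimals the computed values are, for example, 2.5758 for 99% and 1.9600 for 95%. The tabulated 2.58 and 1.96 are these values rounded.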
Introduction
Very often in practice we are called upon to make decisions about populations on the basis of sample information. Such decisions are called statistical decisions, e.g. deciding whether a new drug is really effective in curing malaria.
Statistical Hypothesis
Statistical hypotheses are assumptions (or guesses) about the populations involved, and such assumptions may be true or false. They are generally statements about the probability distributions of the populations.
Null Hypothesis ($H_0$): in many instances we formulate a statistical hypothesis for the sole purpose of rejecting or nullifying it. Such a hypothesis is called a null hypothesis. For example, to decide whether a coin is fair, we formulate the hypothesis that the coin is fair (i.e. p = 0.5, where p is the probability of heads). Similarly, to decide whether one procedure is better than another, we formulate the hypothesis that there is no difference between the procedures.
Alternative Hypothesis ($H_a$)
Any hypothesis that differs from a given hypothesis is called an alternative hypothesis. For example, if $H_0$: p = 0.5, an alternative hypothesis may be $H_a$: p ≠ 0.5 or $H_a$: p > 0.5.
A statistical test is based on the concept of proof by contradiction and is comprised of four parts, as follows:
1. Null hypothesis, $H_0$
2. Alternative hypothesis, $H_a$
3. Test statistic
4. Rejection region
Z-Score
The z score is used to determine the probability that a measurement will fall in the interval from µ to some value y to the right of µ. We calculate the number of standard deviations that y lies from the mean by using the formula
$z = \dfrac{y - \mu}{\sigma}$
Example
Measurements are normally distributed with mean µ = 20 and standard deviation σ = 2 (the values used in the solution below).
(a) Determine the probability that a measurement will be in the interval 20 to 23.
(b) Find the probability that the measurement will be in the interval … to 20.
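Solution
(a) $z = \dfrac{y - \mu}{\sigma} = \dfrac{23 - 20}{2} = 1.5$
From the table of areas under the standard normal curve, the area between z = 0 and z = 1.5 is 0.4332. The probability that a measurement lies between 20 and 23 is therefore 0.4332.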
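The table look-up can be replaced by a CDF evaluation. The following minimal sketch uses `statistics.NormalDist` with µ = 20 and σ = 2, the values used in the solution. It computes the z score and P(20 < y < 23).

```python
from statistics import NormalDist

measure = NormalDist(mu=20, sigma=2)     # parameters as used in the worked solution
z = (23 - 20) / 2                        # z score of y = 23
p = measure.cdf(23) - measure.cdf(20)    # P(20 < y < 23)
print(z, round(p, 4))
```

The result, z = 1.5 and P ≈ 0.4332, matches the value read from the normal table.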