
Compiled by Computer Engineering Students, Class of 2019

Subjected to necessary corrections if and when correctly registered


All credits are to the original owner(s) of the document content

ABE 463: ENGINEERING STATISTICS

REFERENCE MATERIALS
1. Spiegel, M.R. and Stephens, L.J. (2008). Statistics. Schaum's Outlines, 4th Edition. McGraw-Hill, New York.
2. Anderson, D.R., Sweeney, D.J. and Williams, T.A. (1986). Concepts and Applications. West Publishing Company.
3. Hines, W.W., Montgomery, D.C., Goldsman, D.M. and Borror, C.M. (2003). Probability and Statistics in Engineering. 4th Edition, 672 pp. John Wiley, New York.
4. Montgomery, D.C. and Runger, G.C. (2010). Applied Statistics and Probability for Engineers, Student Solutions Manual. 5th Edition, Paperback. 400 pp.

COURSE OUTLINE
1. Moments, Skewness and Kurtosis.
2. Chi-Square Test.
3. Curve Fitting and the Method of Least Squares.
4. Small Sampling Theory, Tests of Hypothesis and Significance.
5. Correlation Theory.
6. Analysis of Variance.
7. Probability: Binomial, Poisson, Hypergeometric and Beta Distributions.
8. Probability Density Function, Cumulative Distribution Function, etc.
9. Introduction to Spectral Analysis: mean value, mean square, auto-correlation function and spectral density of random signals.
10. Introduction to statistical software packages useful in engineering.
11. Analysis of Time Series.

INTRODUCTION

Engineering is the application of mathematics, science, economics, social and practical knowledge to invent, innovate, design, build, maintain, research and improve structures, machines, tools, systems, components, materials, processes, solutions, and organizations. The discipline of engineering is very broad and encompasses a range of more specialized fields of engineering, each with a more specific emphasis on particular areas of applied mathematics, applied science and types of application. The term engineering is derived from the Latin ingenium, meaning "cleverness", and ingeniare, meaning "to contrive, devise" (IAENG 2016).

Statistics is the use of scientific methods for the collection, organization, summarizing, presentation and analysis of data, and for drawing valid conclusions and making reasonable decisions on the basis of such analyses.

Engineering Statistics is therefore the use of statistical principles for solving engineering problems.

1.0 INTRODUCTION

Engineering statistics is the use of scientific methods for the collection, organization, summarizing, presentation and analysis of data, and for drawing valid conclusions and making reasonable decisions on the basis of such analyses.
1.1 MOMENTS, SKEWNESS AND KURTOSIS

1.1.1 MOMENTS

If X₁, X₂, …, X_N are the N values assumed by the variable X, the quantity

X̄^r = (X₁^r + X₂^r + … + X_N^r)/N = (Σ_{j=1}^{N} X_j^r)/N ---------------- (1)

is called the rth moment.

The rth moment about the mean (X̄) is defined as:

m_r = (Σ_{j=1}^{N} (X_j − X̄)^r)/N = (Σ(X − X̄)^r)/N -----(2)

If r = 1, then m₁ = 0. If r = 2, then m₂ = s², popularly called the variance.

The rth moment about any origin A is defined as:

m_r' = (Σ_{j=1}^{N} (X_j − A)^r)/N = (Σ(X − A)^r)/N = (Σ d^r)/N -------(3)

where d = X − A are the deviations of X from A. If A = 0, eqn (3) reduces to eqn (1); therefore eqn (1) is often called the rth moment about zero.

1.1.2 MOMENTS FOR GROUPED DATA

If x₁, x₂, …, x_k occur with frequencies f₁, f₂, …, f_k respectively, the above moments are given by

X̄^r = (f₁x₁^r + f₂x₂^r + … + f_k x_k^r)/N = (Σ_{j=1}^{k} f_j x_j^r)/N = (Σ f x^r)/N --------- (4)

m_r = (Σ_{j=1}^{k} f_j (x_j − X̄)^r)/N = (Σ f (X − X̄)^r)/N ----------- (5)

m_r' = (Σ_{j=1}^{k} f_j (x_j − A)^r)/N = (Σ f (X − A)^r)/N ---------- (6)

where N = Σ f_j is the total frequency.

1.1.3 MOMENTS IN DIMENSIONLESS FORM

To avoid particular units, we can define the dimensionless moments about the mean as:

a_r = m_r/s^r = m_r/(√m₂)^r = m_r/√(m₂^r) ------------ (7)

where s = √m₂ is the standard deviation. Since m₁ = 0 and m₂ = s², we have a₁ = 0 and a₂ = 1.

1.2 SKEWNESS
Skewness is the degree of asymmetry of a distribution.

[Figure: a normal distribution, i.e. a symmetrical or bell-shaped frequency curve]

If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or to have +ve skewness.

[Figures: frequency curves with −ve and +ve skewness]

If it is skewed to the left, it is said to have −ve skewness.

For skewed distributions, the mean tends to lie on the same side of the mode as the longer tail.

DEFINITION

Skewness = (mean − mode)/(standard deviation) = (X̄ − mode)/s ---------- (8)

To avoid using the mode, we can use:

Skewness = 3(mean − median)/(standard deviation) = 3(X̄ − median)/s ---------- (9)

Eqns (8) and (9) are called Pearson's first and second coefficients of skewness respectively.

Other measures of skewness are defined in terms of quartiles and percentiles as:

Quartile coefficient of skewness = [(Q₃ − Q₂) − (Q₂ − Q₁)]/(Q₃ − Q₁) = (Q₃ − 2Q₂ + Q₁)/(Q₃ − Q₁) ------- (10)

10–90 percentile coefficient of skewness = [(P₉₀ − P₅₀) − (P₅₀ − P₁₀)]/(P₉₀ − P₁₀) = (P₉₀ − 2P₅₀ + P₁₀)/(P₉₀ − P₁₀) ----- (11)

An important measure of skewness uses the third moment about the mean expressed in dimensionless form:

Moment coefficient of skewness = a₃ = m₃/s³ = m₃/(√m₂)³ = m₃/√(m₂³) -------- (12)

1.3 KURTOSIS
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution. A distribution having a relatively high peak is called LEPTOKURTIC, one which is flat-topped is called PLATYKURTIC, while one that is neither leptokurtic nor platykurtic, like the normal distribution, is called MESOKURTIC.
One measure of kurtosis uses the fourth moment about the mean expressed in dimensionless form and is given by:

Moment coefficient of kurtosis = a₄ = m₄/s⁴ = m₄/m₂² --------- (13)

This coefficient is often denoted by b₂. For the normal distribution, b₂ = a₄ = 3. For this reason, kurtosis is sometimes defined by (b₂ − 3), which is positive for a leptokurtic distribution, negative for a platykurtic distribution and zero for the normal distribution.

Another measure of kurtosis is based on quartiles and percentiles and is given by:

κ = Q/(P₉₀ − P₁₀) ------- (14)

where κ (the lowercase Greek letter kappa) is the percentile coefficient of kurtosis and Q = ½(Q₃ − Q₁) is the semi-interquartile range. For the normal distribution, κ = 0.263.
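The moment coefficients of skewness (eqn 12) and kurtosis (eqn 13) are easy to compute directly from the central moments. A minimal Python sketch; the sample 2, 3, 7, 8, 10 is the same illustrative set used in the worked examples below:

```python
def central_moment(xs, r):
    """r-th moment about the mean, m_r = sum((x - mean)^r) / N, as in eqn (2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n

def moment_skewness(xs):
    """a3 = m3 / m2^(3/2), eqn (12)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def moment_kurtosis(xs):
    """a4 = m4 / m2^2, eqn (13); a4 - 3 gives the 'excess' form b2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

data = [2, 3, 7, 8, 10]
a3 = moment_skewness(data)   # negative: slightly skewed to the left
a4 = moment_kurtosis(data)   # well below 3: flatter-topped than a normal curve
```

Note that a₄ < 3 here, so by the (b₂ − 3) convention this small sample is platykurtic.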

WORKED EXAMPLE

(1.1).1 Find the (a) first (b) second (c) third and (d) fourth moments of the set
2, 3, 7, 8, and 10.
Solution
(a) The first moment, or arithmetic mean, is X̄ = ΣX/N = (2 + 3 + 7 + 8 + 10)/5 = 30/5 = 6

(b) The second moment is X̄^2 = ΣX²/N = (2² + 3² + 7² + 8² + 10²)/5 = 226/5 = 45.2

(c) The third moment is X̄^3 = ΣX³/N = (2³ + 3³ + 7³ + 8³ + 10³)/5 = 1890/5 = 378

(d) The fourth moment is X̄^4 = ΣX⁴/N = (2⁴ + 3⁴ + 7⁴ + 8⁴ + 10⁴)/5 = 16594/5 = 3318.8

(1.1).2 Find the (a) first (b) second (c) third and (d) fourth moments about the mean for the set 2, 3, 7, 8 and 10.
Solution

(a) m₁ = Σ(X − X̄)/N, with X̄ = 30/5 = 6
    = [(2−6) + (3−6) + (7−6) + (8−6) + (10−6)]/5 = 0/5 = 0

(b) m₂ = Σ(X − X̄)²/N
    = [(2−6)² + (3−6)² + (7−6)² + (8−6)² + (10−6)²]/5 = 46/5 = 9.2

(c) m₃ = Σ(X − X̄)³/N
    = [(2−6)³ + (3−6)³ + (7−6)³ + (8−6)³ + (10−6)³]/5 = −18/5 = −3.6

(d) m₄ = Σ(X − X̄)⁴/N
    = [(2−6)⁴ + (3−6)⁴ + (7−6)⁴ + (8−6)⁴ + (10−6)⁴]/5 = 610/5 = 122

(1.1).3 Find the (a) first (b) second (c) third and (d) fourth moments about the origin 4 for the set 2, 3, 7, 8 and 10.
Solution

(a) m₁' = Σ(X − 4)/N = [(2−4) + (3−4) + (7−4) + (8−4) + (10−4)]/5 = 10/5 = 2

(b) m₂' = Σ(X − 4)²/N = [(2−4)² + (3−4)² + (7−4)² + (8−4)² + (10−4)²]/5 = 66/5 = 13.2

(c) m₃' = Σ(X − 4)³/N = [(2−4)³ + (3−4)³ + (7−4)³ + (8−4)³ + (10−4)³]/5 = 298/5 = 59.6

(d) m₄' = Σ(X − 4)⁴/N = [(2−4)⁴ + (3−4)⁴ + (7−4)⁴ + (8−4)⁴ + (10−4)⁴]/5 = 1650/5 = 330
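Worked Examples (1.1).1–(1.1).3 can be verified in a few lines of Python. Since eqns (1)–(3) are all moments about some origin, one function covers all three cases; the sketch below is illustrative only:

```python
def moment(xs, r, origin=0.0):
    """r-th moment of xs about the given origin: sum((x - origin)^r) / N."""
    return sum((x - origin) ** r for x in xs) / len(xs)

data = [2, 3, 7, 8, 10]
mean = moment(data, 1)  # first moment about zero = arithmetic mean = 6

raw = [moment(data, r) for r in (1, 2, 3, 4)]            # 6, 45.2, 378, 3318.8
central = [moment(data, r, mean) for r in (1, 2, 3, 4)]  # 0, 9.2, -3.6, 122
about_4 = [moment(data, r, 4) for r in (1, 2, 3, 4)]     # 2, 13.2, 59.6, 330
```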

Worked Examples on Computation of moments from Grouped Data

(1.1).4 Find the first four moments about the mean for the height distribution of
100 male students of the Faculty of Engineering and Technology, University of Ilorin,
Ilorin.

Height (in)   Class Mark (X)   Frequency (f)
60 – 62            61                5
63 – 65            64               18
66 – 68            67               42
69 – 71            70               27
72 – 74            73                8

Solution: Using the coded method, with µ = (X − A)/C, A = 67 and class width C = 3:

  X    µ = (X−A)/C     f      fµ     fµ²    fµ³    fµ⁴
 61        −2           5    −10     20    −40     80
 64        −1          18    −18     18    −18     18
 67 (=A)    0          42      0      0      0      0
 70         1          27     27     27     27     27
 73         2           8     16     32     64    128
                   Σf = 100  Σfµ = 15  Σfµ² = 97  Σfµ³ = 33  Σfµ⁴ = 253

m₁' = C(Σfµ/N) = 3(15/100) = 0.45

m₂' = C²(Σfµ²/N) = 3²(97/100) = 8.73

m₃' = C³(Σfµ³/N) = 3³(33/100) = 8.91

m₄' = C⁴(Σfµ⁴/N) = 3⁴(253/100) = 204.93

Thus m₁ = 0

m₂ = m₂' − m₁'² = 8.73 − (0.45)² = 8.5275

m₃ = m₃' − 3m₁'m₂' + 2m₁'³ = 8.91 − 3(0.45)(8.73) + 2(0.45)³ = −2.6932

m₄ = m₄' − 4m₁'m₃' + 6m₁'²m₂' − 3m₁'⁴ = 204.93 − 4(0.45)(8.91) + 6(0.45)²(8.73) − 3(0.45)⁴ = 199.3759

RELATIONSHIPS BETWEEN MOMENTS

The following relations exist between the moments about the mean (m_r) and the moments about an arbitrary origin (m_r'):

m₂ = m₂' − m₁'²
m₃ = m₃' − 3m₁'m₂' + 2m₁'³
m₄ = m₄' − 4m₁'m₃' + 6m₁'²m₂' − 3m₁'⁴
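The coded method of Example (1.1).4 and the conversion relations above can be sketched in Python (A = 67 and class width C = 3 as in the example):

```python
# Class marks, frequencies, assumed origin A and class width C from Example (1.1).4
X = [61, 64, 67, 70, 73]
f = [5, 18, 42, 27, 8]
A, C = 67, 3
N = sum(f)

mu = [(x - A) / C for x in X]  # coded deviations: -2, -1, 0, 1, 2

def m_prime(r):
    """r-th moment about A via the coded method: C^r * sum(f * mu^r) / N."""
    return C ** r * sum(fj * u ** r for fj, u in zip(f, mu)) / N

m1p, m2p, m3p, m4p = (m_prime(r) for r in (1, 2, 3, 4))  # 0.45, 8.73, 8.91, 204.93

# Convert to moments about the mean using the relations above
m2 = m2p - m1p ** 2
m3 = m3p - 3 * m1p * m2p + 2 * m1p ** 3
m4 = m4p - 4 * m1p * m3p + 6 * m1p ** 2 * m2p - 3 * m1p ** 4
# m2 ≈ 8.5275, m3 ≈ -2.693, m4 ≈ 199.376
```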

2. THE CHI-SQUARE TEST

2.1 INTRODUCTION
Many times, the results obtained in sampling do not agree exactly with the theoretical results expected according to the rules of probability. E.g. if a coin is tossed 100 times, we expect 50 heads and 50 tails, but it is rare to obtain these results exactly.
Suppose that in a particular sample a set of possible events E1, E2, E3, …, Ek are observed to occur with frequencies O1, O2, O3, …, Ok, called observed frequencies, and that according to probability rules they are expected to occur with frequencies e1, e2, e3, …, ek, called expected or theoretical frequencies. We can then compare the observed frequencies with the expected frequencies and see whether they differ significantly.

TABLE 2.1

Events E1 E2 E3 … Ek
Observed frequency O1 O2 O3 … Ok
Expected frequency e1 e2 e3 … ek

2.2 DEFINITION OF χ²
χ² (read "chi-square") is a measure of the discrepancy existing between observed and expected frequencies. It is expressed by:

χ² = (O₁ − e₁)²/e₁ + (O₂ − e₂)²/e₂ + … + (O_k − e_k)²/e_k = Σ_{j=1}^{k} (O_j − e_j)²/e_j ……… (eqn 2.1)
If the total frequency is N, then

Σ O_j = Σ e_j = N ……… (eqn 2.2)

An expression equivalent to eqn (2.1) is

χ² = Σ (O_j²/e_j) − N ……… (eqn 2.3)
If χ² = 0, the observed and theoretical frequencies agree exactly, while if χ² > 0 they do not. The larger the value of χ², the greater the discrepancy between the observed and expected frequencies.
Agreement between the observed and expected frequencies in a contingency table is computed using the equation:

χ² = Σ_j (O_j − e_j)²/e_j ……… (2.4)

where the sum is taken over all cells in the contingency table, and the symbols O_j and e_j represent, respectively, the observed and expected frequencies in the jth cell. This sum, which is similar to eqn (2.1), contains hk terms. The sum of all observed frequencies is denoted by N and is equal to the sum of all expected frequencies [compare with eqn (2.2)].

Eqn (2.4) has a sampling distribution approximated very closely by the chi-square distribution, provided the expected frequencies are not too small. The number of degrees of freedom, ν, of the chi-square distribution is given for h > 1 and k > 1 by:

1. ν = (h − 1)(k − 1) if the expected frequencies can be computed without having to estimate population parameters from sample statistics.

2. ν = (h − 1)(k − 1) − m if the expected frequencies can be computed only by estimating m population parameters from sample statistics.

Significance tests for h × k tables are similar to those for 1 × k tables. The expected frequencies are found subject to a particular hypothesis (H₀). A hypothesis commonly assumed is that the two classifications are independent of each other. Contingency tables can be extended to higher dimensions: thus, for example, we have h × k × l tables where three classifications are present.
Example 2.1: A pair of dice is rolled 500 times, with the sums shown on the dice recorded in Table 2.2.

TABLE 2.2
Sum       2   3   4   5   6   7   8   9   10  11  12
Observed  15  35  49  58  65  76  72  60  35  29  6

The expected numbers, if the dice are fair, are determined from the distribution of the sum X as shown in Table 2.3.

TABLE 2.3
X     2     3     4     5     6     7     8     9     10    11    12
P(X)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Applications of the χ² distribution

1. Chi-square test of goodness of fit
2. χ² test for independence of attributes
3. To test whether the population has a specified value of the variance (σ²)
4. To test the equality of several population proportions

χ² test of goodness of fit

Under the null hypothesis that there is no significant difference between the observed (experimental) and the theoretical or hypothetical values, i.e. that there is good compatibility between theory and experiment, Karl Pearson (1900) proved that:

χ² = Σ_{i=1}^{n} (O_i − E_i)²/E_i = (O₁ − E₁)²/E₁ + (O₂ − E₂)²/E₂ + … + (O_n − E_n)²/E_n

follows the χ² distribution with ν = (n − 1) degrees of freedom.

STEPS IN THE COMPUTATION OF χ² AND IN DRAWING CONCLUSIONS

(i) Compute the expected frequencies E₁, E₂, …, E_n corresponding to the observed frequencies O₁, O₂, …, O_n.

(ii) Compute the deviations (O − E) for each frequency, then square them to obtain (O − E)².

(iii) Divide each squared deviation (O − E)² by the corresponding expected frequency E.

(iv) Add the values obtained in step (iii) to compute

χ² = Σ (O − E)²/E

(v) Under the null hypothesis that the theory fits the data well, the above statistic follows the χ² distribution with ν = (n − 1) degrees of freedom.

(vi) From tables, check the critical value of χ² for (n − 1) degrees of freedom at a certain level of significance (5% or 1%).

(vii) If the calculated value of χ² obtained in step (iv) is less than the tabulated value, it is said to be non-significant. We therefore accept the null hypothesis and may conclude that the observed and expected frequencies agree well, i.e. that there is good compatibility between theory and experiment.

(viii) If the calculated value of χ² is greater than the tabulated value, it is said to be significant: the discrepancies between the observed and expected frequencies cannot be attributed to chance, and we reject the null hypothesis.

Uses

(i) It can be used as a test of variance
(ii) It can be used as a test of goodness of fit
(iii) It can also be used to set up confidence intervals for the population variance.
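The computational steps above translate directly into code. A minimal Python sketch; the coin-toss counts and the tabulated 5% critical value of 3.841 for 1 degree of freedom are illustrative assumptions, not from the text:

```python
def chi_square_stat(observed, expected):
    """Steps (i)-(iv): chi-square = sum((O - E)^2 / E) over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative data: 100 coin tosses giving 60 heads and 40 tails,
# against the fair-coin expectation of 50/50.
observed = [60, 40]
expected = [50, 50]

chi2 = chi_square_stat(observed, expected)  # (100/50) + (100/50) = 4.0

# Steps (v)-(viii): compare with the tabulated critical value for
# nu = n - 1 = 1 degree of freedom at the 5% level (3.841 from chi-square tables).
critical_5pct = 3.841
significant = chi2 > critical_5pct  # True: reject H0, the coin looks biased
```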

2.3 Significance Test

In practice, expected frequencies are computed on the basis of a null hypothesis H₀. If under this hypothesis the computed value of χ² given by eqn (2.1) or (2.3) is greater than some critical value (such as χ²₀.₉₅ or χ²₀.₉₉, the critical values at the 0.05 and 0.01 significance levels respectively), we conclude that the observed frequencies differ significantly from the expected frequencies and reject H₀ at the corresponding level of significance; otherwise, we accept it. This procedure is called the chi-square test of hypothesis or significance.

Note: For circumstances where χ² is too close to zero (very rare), the observed frequencies agree too well with the expected frequencies. To examine such situations, we determine whether the computed value of χ² is less than χ²₀.₀₅ or χ²₀.₀₁, in which case we would decide that the agreement is too good at the 0.05 or 0.01 significance level respectively.

2.4 Chi-Square Test for Goodness of Fit


The chi-square test can be used to determine how well theoretical distributions (such as the normal and binomial distributions) fit empirical distributions (i.e. those obtained from sample data).

The observed and expected frequencies for the dice data of Example 2.1 are as stated in Table 2.4:

Observed  15    35    49    58    65    76    72    60    35    29    6
Expected  13.9  27.8  41.7  55.6  69.5  83.4  69.5  55.6  41.7  27.8  13.9

χ² = Σ_j (O_j − e_j)²/e_j = 10.34

The P value corresponding to χ² = 10.34 (with 10 degrees of freedom) is 0.411. Because of this large value of P, we conclude that the throwing is fair.
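The dice calculation can be reproduced directly from Tables 2.2 and 2.3; a Python sketch (the exact statistic differs very slightly from 10.34 because the text rounds the expected frequencies first):

```python
# Observed sums from 500 rolls (Table 2.2) and fair-dice probabilities (Table 2.3)
observed = [15, 35, 49, 58, 65, 76, 72, 60, 35, 29, 6]  # sums 2..12
probs = [p / 36 for p in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]
N = sum(observed)  # 500

expected = [N * p for p in probs]  # 13.9, 27.8, 41.7, ..., 13.9

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi2 ≈ 10.35 with nu = 11 - 1 = 10 degrees of freedom; the tabulated
# critical value at the 0.05 level is 18.307, so the fit is acceptable.
```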
2.5 CONTINGENCY TABLES
Table 2.1, in which the observed frequencies occupy a single row, is called a one-way classification table; since the number of columns is k, it is also called a 1 × k (read "1 by k") table. By extending these ideas, we can have two-way classification tables, in which the observed frequencies occupy h rows and k columns. Such tables are called contingency tables.

Corresponding to each observed frequency in an h × k contingency table there is an expected (or theoretical) frequency, computed subject to some hypothesis according to the rules of probability. These frequencies, which occupy the cells of the contingency table, are called cell frequencies. The total frequency in each row or column is called the marginal frequency.

2.6 Coefficient of Contingency

The measure of the degree of relationship, association or dependence of the classifications in a contingency table is given by

C = √(χ²/(χ² + N))

where C is the coefficient of contingency. The larger the value of C, the greater the degree of association. The number of rows and columns in the contingency table determines the maximum value of C, which is never greater than 1. If the number of rows and the number of columns of a contingency table are both equal to k, the maximum value of C is given by √((k − 1)/k).

2.7 Correlation of Attributes

Because the classifications in a contingency table often describe characteristics of individuals or objects, they are often said to measure a correlation of attributes. For k × k tables we define

r = √(χ²/(N(k − 1)))

Example 2.7

The effect of a drug to cure malaria was determined for two groups of people, A and B, each consisting of 100 patients. The drug was given to group A but not to group B, which is called the control group; otherwise the two groups were treated identically. It was found that in groups A and B, 75 and 65 people, respectively, recovered from the disease. At significance levels of (a) 0.01, (b) 0.05 and (c) 0.10, test the hypothesis that the drug helps in curing the disease. Also find the coefficient of contingency and the correlation of attributes.

Solution:

Table 2.6(a): Observed frequencies

                              Recover   Do not recover   Total
Group A (using the drug)        75            25          100
Group B (not using the drug)    65            35          100
Total                          140            60          200

Under the null hypothesis (H₀) that the drug has no effect, we would expect (75 + 65)/2 = 70 patients in each group to recover, and 30 in each group not to recover, as shown in Table 2.6(b):

Table 2.6(b): Expected frequencies under H₀

                              Recover   Do not recover   Total
Group A (using the drug)        70            30          100
Group B (not using the drug)    70            30          100
Total                          140            60          200

χ² = (75 − 70)²/70 + (65 − 70)²/70 + (25 − 30)²/30 + (35 − 30)²/30 = 2.38

ν = (h − 1)(k − 1) = (2 − 1)(2 − 1) = 1

Since χ²₀.₉₅ = 3.84 for 1 degree of freedom and since χ² = 2.38 < 3.84, we conclude that the results are not significant at the 0.05 level. We therefore accept H₀ at this level: we conclude either that the drug is not effective, or we withhold our decision pending further investigation.

Note that a one-tailed test using χ² is equivalent to a two-tailed test using χ, since χ² > χ²₀.₉₅ corresponds to χ > χ₀.₉₅ or χ < −χ₀.₉₅. Since for 2 × 2 tables χ² is the square of a standardized score, χ is the same as Z for this case.

(ii) Coefficient of contingency:

C = √(χ²/(χ² + N)) = √(2.38/(2.38 + 200)) = √0.01176 = 0.1084

(iii) To find the maximum value of C, note that maximum association corresponds to the table:

                              Recover   Do not recover   Total
Group A (using the drug)        100            0          100
Group B (not using the drug)      0          100          100
Total                           100          100          200

Since the expected cell frequencies (assuming complete independence) are all equal to 50 (i.e. 100 × 100/200 = 50),

χ² = (100 − 50)²/50 + (0 − 50)²/50 + (0 − 50)²/50 + (100 − 50)²/50 = 200

Thus the maximum value of C is √(χ²/(χ² + N)) = √(200/(200 + 200)) = 0.707

(iv) Correlation of attributes:

Since χ² = 2.38, N = 200 and k = 2,

r = √(χ²/(N(k − 1))) = √(2.38/200) = 0.1091

indicating very little correlation between recovery and the use of the drug.
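Example 2.7 can be worked end-to-end in code. In this Python sketch the expected frequencies are computed in the general way, as (row total × column total)/N, which reproduces the 70/30 split used above:

```python
from math import sqrt

# Observed 2x2 table from Table 2.6(a): rows = groups A/B, cols = recover / not
obs = [[75, 25],
       [65, 35]]
N = sum(sum(row) for row in obs)  # 200

row_tot = [sum(row) for row in obs]                       # 100, 100
col_tot = [sum(row[j] for row in obs) for j in range(2)]  # 140, 60

# Expected cell frequency under independence: row total * column total / N
exp = [[row_tot[i] * col_tot[j] / N for j in range(2)] for i in range(2)]

chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))  # ≈ 2.38

k = 2
C = sqrt(chi2 / (chi2 + N))    # coefficient of contingency ≈ 0.1084
C_max = sqrt((k - 1) / k)      # ≈ 0.707 for a 2x2 table
r = sqrt(chi2 / (N * (k - 1))) # correlation of attributes ≈ 0.1091
```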

3.0 CURVE FITTING AND THE METHOD OF LEAST SQUARES

Curve fitting is an attempt to develop or find the equation that describes a curve fitting a given set of data. To determine the equation connecting the variables, there is a need to collect data showing corresponding values of the variables under consideration. E.g. when you plot the points (x₁, y₁), (x₂, y₂), … in a rectangular co-ordinate system, the set of points may not lie on a straight line. Such a plot is called a SCATTER DIAGRAM.

From the scatter diagram it is often possible to visualize a smooth curve that approximates the data; such a curve is called an approximating curve.

If the data can be approximated by a straight line, we say a linear relationship exists between the variables; otherwise the relationship is nonlinear.
[Figures: a scatter diagram of weight vs height showing a linear relationship, and a curved scatter diagram showing a non-linear relationship]

Example: the weight of an adult male depends to some degree on his height.

The line that the points lie on or near is called the regression line; similarly, a parabolic approximating curve is called a regression curve. The method adopted by scientists and engineers produces what is called the regression line, or line of best fit.

Equations of approximating curves

Straight line:     y = a₀ + a₁x
Parabola:          y = a₀ + a₁x + a₂x²
Cubic curve:       y = a₀ + a₁x + a₂x² + a₃x³
Quartic curve:     y = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴
nth degree curve:  y = a₀ + a₁x + a₂x² + … + a_n xⁿ

The right-hand sides of these equations are called polynomials of the first, second, third, fourth and nth degree respectively. The functions described by the first four equations are sometimes called the linear, quadratic, cubic and quartic functions respectively.

The following are some other functions frequently used in practice:

Hyperbola:          y = 1/(a₀ + a₁x)  or  1/y = a₀ + a₁x

Exponential curve:  y = ab^x  or  log y = log a + x log b

To decide which curve should be used, it is helpful to obtain a scatter diagram of transformed variables. E.g. if a scatter diagram of log y vs x gives a straight line, the equation has the form y = ab^x; for this type of curve we use semilog graph paper. If a scatter diagram of log y vs log x gives a straight line, the equation has the form y = ax^b, and we use log-log graph paper.
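The transformation idea can be checked numerically: fitting log y against x with least squares recovers the constants of an exponential curve. A Python sketch; the data are synthetic, generated from y = 2(1.5)^x purely for illustration:

```python
from math import log10

# Synthetic data lying exactly on y = a * b^x with a = 2, b = 1.5 (illustrative)
xs = [0, 1, 2, 3, 4]
ys = [2 * 1.5 ** x for x in xs]

# Transform: log y = log a + x log b, then fit a straight line to (x, log y)
ly = [log10(y) for y in ys]
n = len(xs)
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
sxy = sum(x * l for x, l in zip(xs, ly)) - sum(xs) * sum(ly) / n

slope = sxy / sxx                              # estimate of log10(b)
intercept = sum(ly) / n - slope * sum(xs) / n  # estimate of log10(a)

b_hat = 10 ** slope      # ≈ 1.5
a_hat = 10 ** intercept  # ≈ 2.0
```

Because the synthetic points lie exactly on the curve, the fit recovers a and b essentially exactly; with noisy data the same code gives least-squares estimates.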

Freehand method of curve fitting

Individual judgment can often be used to draw an approximating curve to fit a set of data. This is called the freehand method of curve fitting. If the type of equation of the curve is known, it is possible to obtain the constants in the equation by choosing as many points on the curve as there are constants in the equation. E.g.:

(1) If the curve is a straight line, a minimum of two (2) points is needed.

(2) If it is a parabola, three (3) points are necessary.

The disadvantage of this method is that different observers will obtain different curves and equations.

THE STRAIGHT LINE

The simplest type of approximating curve is a straight line, whose equation can be written as

Y = a₀ + a₁X ………. (3.1)

Given any two points (X₁, Y₁) and (X₂, Y₂) on the line, the constants a₀ and a₁ can be determined. The resulting equation of the line can be written as

Y − Y₁ = ((Y₂ − Y₁)/(X₂ − X₁))(X − X₁) ………. (3.2)

or Y − Y₁ = m(X − X₁), where m = (Y₂ − Y₁)/(X₂ − X₁)

m is called the slope of the line and represents the change in Y divided by the corresponding change in X.

From Y = a₀ + a₁X: when X = 0, Y = a₀, which is the intercept.

THE METHOD OF LEAST SQUARES

To avoid individual judgment in constructing lines, parabolas or other approximating curves to fit sets of data, it is necessary to agree on a definition of a best-fitting line or a best-fitting parabola.

Fig.3.1

Consider the data points (X₁, Y₁), (X₂, Y₂), …, (X_N, Y_N). For a given value of X, say X₁, there will be a difference between the value Y₁ and the corresponding value as determined from the curve. We denote this difference by D₁; it is commonly called the deviation, error, or residual, and may be +ve, −ve or zero. Similarly, corresponding to the values X₂, …, X_N we obtain the deviations D₂, …, D_N.

The quantity D₁² + D₂² + … + D_N² is a measure of the "goodness of fit": if this is small the fit is good; if it is large the fit is bad.

Definition

A best-fitting curve is the approximating curve having the property that D₁² + D₂² + … + D_N² is a minimum.

A curve having this property is said to fit the data in the least-squares sense and is therefore called a least-squares curve. Thus a line having this property is called a least-squares line, a parabola with this property is called a least-squares parabola, etc.

It is customary to employ the above definition when X is the independent variable and Y is the dependent variable. If X is the dependent variable, the definition is modified by considering horizontal instead of vertical deviations, which amounts to interchanging the X and Y axes.

THE LEAST SQUARES LINE

The least-squares line approximating the set of points (X₁, Y₁), (X₂, Y₂), …, (X_N, Y_N) has the equation

Y = a₀ + a₁X

where the constants a₀ and a₁ are determined by solving simultaneously the equations

ΣY = a₀N + a₁ΣX ……… (3.8)

ΣXY = a₀ΣX + a₁ΣX² ……… (3.9)

which are called the normal equations for the least-squares line. The constants a₀ and a₁ can be found from the formulae

a₀ = [(ΣY)(ΣX²) − (ΣX)(ΣXY)]/[NΣX² − (ΣX)²]

a₁ = [NΣXY − (ΣX)(ΣY)]/[NΣX² − (ΣX)²] ……… (3.10)

REGRESSION

v = u + at ……… (3.11)

Here v is the dependent variable and t the independent variable.

An equation of a straight line, y = mx + c, is a deterministic model.

y = β₀ + β₁x + ε, where ε is a random error term, is a probabilistic model.

A general polynomial probabilistic model relating a dependent variable y to a single independent variable x is given by

y = β₀ + β₁x + β₂x² + … + β_N x^N + ε

General Linear Model (GLM)

y = β₀ + β₁x₁ + β₂x₂ + … + β_N x_N + ε ……… (3.12)

For example, the model y = β₀ + β₁x + β₂x² + β₃x³ + ε is equivalent to the general linear model with N = 3, x₁ = x, x₂ = x² and x₃ = x³.

Classification of terms in the General Linear Model

Individual terms in the general linear model are classified by their exponents. The degree of a term is given by the sum of the exponents of the independent variables appearing in the term. For instance, in

y = β₀ + β₁x₁ + β₂x₁² + β₃x₂ + β₄x₁x₂ ……… (3.13)

the terms β₁x₁ and β₃x₂ are of the first degree, while β₂x₁² and β₄x₁x₂ are of the second degree.

Order of the model

A first-order model is a general linear model that contains all possible first-degree terms.

A second-order model contains all possible first- and second-degree terms.

LEAST SQUARES METHOD

This method can be used to determine the least-squares line that minimizes the sum of squared errors of prediction, Σ(y − ŷ)².

For the linear model y = β₀ + β₁x + ε, the sum of squared errors can be written as

Σ(y − ŷ)² = Σ(y − β̂₀ − β̂₁x)²

and the fitted line is

ŷ = β̂₀ + β̂₁x ……… (3.14)

with β̂₁ = Sxy/Sxx and β̂₀ = ȳ − β̂₁x̄ ……… (3.15)

where Sxx = Σ(x − x̄)² = Σx² − (Σx)²/n ……… (3.16)

and Sxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n ……… (3.17)

Example 3.1

A random sample of n = 9 animals' live weights (x) and dead weights (y) were recorded as follows:

x (live wt, kg): 4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9
y (dead wt, kg): 2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5

Table 3.1 Summary table

   X       X²      Y      Y²      XY
  4.2    17.64    2.8    7.84    11.76
  3.8    14.44    2.5    6.25     9.50
  4.8    23.04    3.1    9.61    14.88
  3.4    11.56    2.1    4.41     7.14
  4.5    20.25    2.9    8.41    13.05
  4.6    21.16    2.8    7.84    12.88
  4.3    18.49    2.6    6.76    11.18
  3.7    13.69    2.4    5.76     8.88
  3.9    15.21    2.5    6.25     9.75
 37.2   155.48   23.7   63.13    99.02


Sxx = Σx² − (Σx)²/n = 155.48 − (37.2)²/9 = 1.72

Sxy = Σxy − (Σx)(Σy)/n = 99.02 − (37.2)(23.7)/9 = 1.06

β̂₁ = Sxy/Sxx = 1.06/1.72 = 0.616

ȳ = Σy/n = 23.7/9 = 2.633,  x̄ = Σx/n = 37.2/9 = 4.133

β̂₀ = ȳ − β̂₁x̄ = 2.633 − 0.616 × 4.133 = 0.087

The least-squares estimate for these data is therefore: ŷ = 0.087 + 0.616x

For the general linear model y = β₀ + β₁x₁ + β₂x₂ + … + β_N x_N + ε, we try to find estimates β̂₀, β̂₁, β̂₂, …, β̂_N that minimize

Σ(y − ŷ)² = Σ(y − β̂₀ − β̂₁x₁ − … − β̂_N x_N)²
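Example 3.1 can be verified with eqns (3.15)-(3.17) in a few lines of Python. A minimal sketch; note the unrounded intercept comes out as 0.086, the 0.087 above resulting from rounding the slope to 0.616 first:

```python
# Live weights (x) and dead weights (y) from Example 3.1
x = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
y = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n                 # Sxx = 1.72
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # Sxy = 1.06

b1 = sxy / sxx                     # slope ≈ 0.616
b0 = sum(y) / n - b1 * sum(x) / n  # intercept ≈ 0.086

# Fitted least-squares line: y_hat = b0 + b1 * x
```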

CORRELATION

Coefficient of linear correlation

The ratio of the explained variation to the total variation is called the coefficient of determination, r².

Coefficient of correlation: r = ±√(explained variation/total variation)

r = Sxy/√(Sxx · Syy) ……… (3.18)

where Syy = Σy² − (Σy)²/n

Recall that β̂₁ (the slope of the least-squares line) is Sxy/Sxx.

∴ r = ±√(β̂₁ · Sxy/Syy) ……… (3.19)

Properties of r

r lies between −1 and +1.

r greater than zero indicates a positive linear relationship; r less than zero indicates a negative one.

r = 0 means there is no linear relationship.

r = ±√(Σ(Y_est − Ȳ)²/Σ(Y − Ȳ)²)

[Figures: scatter diagrams illustrating r > 0, r < 0, r = 0 and r ≈ 0]

r² gives the proportion of the total variation that can be accounted for by the independent variable x.
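Applying eqn (3.18) to the live-weight/dead-weight data of Example 3.1 gives the correlation coefficient in a few lines; a Python sketch:

```python
from math import sqrt

# Live weights (x) and dead weights (y) from Example 3.1
x = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
y = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = sxy / sqrt(sxx * syy)  # ≈ 0.95: strong positive linear relationship
r_squared = r ** 2         # ≈ 0.91 of the variation in y is explained by x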

Standard error of estimate

If we let Yest represent the value of Y for given values of X, as estimated from the equation Y = a0 + a1X, then a measure of the scatter about the regression line of Y on X is supplied by the quantity

Syx = √( ∑(Y − Yest)² / N )   3.20

which is called the standard error of estimate of Y on X.

The standard error of estimate of X on Y is defined similarly by

Sxy = √( ∑(X − Xest)² / N )   3.21

In general, Syx ≠ Sxy.

The standard error of estimate has properties analogous to those of the standard deviation. For example, if we construct lines parallel to the regression line of Y on X at respective vertical distances Syx, 2Syx and 3Syx from it, we should find, if N is large enough, that about 68%, 95% and 99.7% of the sample points would be included between these pairs of lines.

Recall that the modified standard deviation is given by ŝ = √( N/(N − 1) ) s   3.22

Similarly, for small samples a modified standard error of estimate is given by Ŝyx = √( N/(N − 2) ) Syx   3.23

Since r = ±√( explained variation / total variation ) = ±√( ∑(Yest − Ȳ)² / ∑(Y − Ȳ)² ),

the unexplained variation is ∑(Y − Yest)² = (1 − r²) ∑(Y − Ȳ)², and the equation for the standard error of estimate becomes Syx = Sy √(1 − r²).
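The standard identity that this derivation leads to, Syx = Sy·√(1 − r²), can be checked numerically; the five (x, y) pairs below are arbitrary illustrative values, not data from the text:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-squares fit: b1 = Sxy/Sxx, b0 = ybar - b1*xbar
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Standard error of estimate from the definition (eqn 3.20)
y_est = [b0 + b1 * v for v in x]
syx = sqrt(sum((yi - ye) ** 2 for yi, ye in zip(y, y_est)) / n)

# The same quantity via the correlation coefficient: Syx = Sy * sqrt(1 - r^2)
r = sxy / sqrt(sxx * syy)
sy = sqrt(syy / n)
syx_via_r = sy * sqrt(1 - r ** 2)
```

Both routes give the same value, confirming that the unexplained variation is (1 − r²) times the total variation.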



INFERENCES ABOUT A POPULATION MEAN FROM LARGE SAMPLE RESULTS

Two things are involved:
a. Estimation
b. Tests of hypothesis

ESTIMATION

In statistics, sampling theory can be employed to obtain information about samples drawn at random from a known population. Conversely, we try to infer information about a population from samples drawn from it; this area of statistics is called inferential statistics. The procedure for estimation can be divided into two:
1. Point estimation: a single number (mean, median, etc.) is calculated to estimate a parameter of interest.
2. Interval estimation: a parameter is estimated as likely to fall within a particular interval, e.g. the height of a student in the GET464 class is between 1.05 and 1.25. Interval estimates indicate the precision or accuracy of an estimate and are therefore preferable to point estimates.
ERROR OF ESTIMATION
The error of estimation for a given point estimate is the absolute value of the difference between what should be obtained (the population mean, µ) and what is actually obtained (the sample arithmetic mean, Ȳ):

Error of estimation = |Ȳ − µ|

BOUND ON ERROR

Bound on error = 2σȲ = 2σ/√n
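A quick numerical illustration of the bound (the values σ = 12 and n = 36 are assumed here for the example, not taken from the text):

```python
from math import sqrt

def bound_on_error(sigma, n):
    """Bound on the error of estimation: two standard errors of the mean, 2*sigma/sqrt(n)."""
    return 2 * sigma / sqrt(n)

b = bound_on_error(sigma=12, n=36)  # 2 * 12 / 6 = 4.0
```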

STATISTICAL DECISION THEORY


Statistical Decisions: many times in real life, we are called upon to make decisions about populations on the basis of sample information; such decisions are called statistical decisions. E.g. we may wish to decide whether a new drug is really effective in curing a disease, or whether one education system is better than another.

STATISTICAL HYPOTHESIS
In an attempt to reach a decision, it is useful to make assumptions (or guesses) about the population involved. Such assumptions, which may or may not be true, are called statistical hypotheses. They are generally statements about the probability distributions of the populations.

TYPE I & TYPE II ERRORS


If we reject a hypothesis when it should be accepted, we say a Type I error has occurred.

If, on the other hand, we accept a hypothesis when it should be rejected, we say a Type II error has occurred.

In order for decision rules to be good, tests should be designed so as to minimize these errors of decision.

LEVEL OF SIGNIFICANCE
In testing a given hypothesis, the maximum probability with which we are willing to risk a Type I error is called the level of significance (or significance level), denoted by α. This is often specified before any samples are drawn, so that the results obtained do not influence our choice.
In practice, a significance level of 0.05 or below is popularly used. A level of 0.05 means there are 5 chances in 100 that we would reject the hypothesis when it should be accepted: we are about 95% confident that we have made the right decision, and there is a 0.05 probability that the decision is wrong.

(Sketch: the critical regions, of area 0.080, in the tails of the sampling distribution.)

THE NORMAL DISTRIBUTION


This is an example of a continuous probability distribution, known as the normal curve or Gaussian distribution. It is defined by eqn N1:

y = [1/(σ√2π)] e^( −(x − µ)² / 2σ² )   … N1

Where;

µ = mean

σ = standard deviation

π = 3.14159…

e = 2.71828…

The total area bounded by the curve of eqn N1 and the X-axis is 1; hence the area under the curve between two ordinates X = a and X = b, where a < b, represents the probability that X lies between a and b. This probability is denoted by Pr{a < X < b}.

When the variable X is expressed in terms of standard units [Z = (X − µ)/σ], eqn N1 is replaced by eqn N2, called the standard form:

y = (1/√2π) e^( −Z²/2 )   … N2

In cases like N2, we say that Z is normally distributed with mean 0 and variance 1. Fig. N1 is a graph of this standardized normal curve.

The areas included between Z = −1 and +1, Z = −2 and +2, and Z = −3 and +3 are respectively equal to 68.27%, 95.45% and 99.73% of the total area, which is 1. The standard normal table shows the area under this curve bounded by the ordinates at Z = 0 and any positive value of Z. From the table, the area between any two ordinates can be found by using the symmetry of the curve about Z = 0.

Fig. N1: The standardized normal curve, plotted for Z from −3 to 3 (ordinate values from 0.0 to 0.4).
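The 68.27/95.45/99.73 percentages, and the table areas mentioned above, can be reproduced from the standard normal cumulative distribution function, here via Python's statistics.NormalDist:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(z1, z2):
    """Area under the standardized normal curve between ordinates z1 and z2."""
    return std_normal.cdf(z2) - std_normal.cdf(z1)

within_1 = area_between(-1, 1)  # about 0.6827
within_2 = area_between(-2, 2)  # about 0.9545
within_3 = area_between(-3, 3)  # about 0.9973
```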

Table N3: Properties of the normal distribution

Mean: µ
Variance: σ²
Standard deviation: σ
Moment coefficient of skewness: α3 = 0
Moment coefficient of kurtosis: α4 = 3
Mean deviation: σ√(2/π) ≈ 0.7979σ

Hypergeometric Distribution

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of k successes in n draws without replacement from a finite population of size N containing exactly K successes. This is in contrast to the binomial distribution, which describes the probability of k successes in n draws with replacement.

The hypergeometric distribution applies to sampling without replacement from a finite population whose elements can be classified into two mutually exclusive categories, like pass/fail, male/female, or employed/unemployed. As random selections are made from the population, each subsequent draw decreases the population, causing the probability of success to change with each new draw.

The following two conditions characterize the hypergeometric distribution:

1. The result of each draw can be classified into one of two categories.
2. The probability of a success changes on each draw.
A random variable X follows the hypergeometric distribution if its probability mass function (p.m.f.) is given by

P(X = k) = C(K, k) C(N − K, n − k) / C(N, n)   . . . . (1)

Where,
N = the population size
K = the number of success states in the population
n = the number of draws
k = the number of successes
and C(a, b) denotes a binomial coefficient.

The p.m.f. is positive when max(0, n + K − N) ≤ k ≤ min(K, n).

Example of hypergeometric distribution

Consider an urn containing two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial setting).

If the variable N describes the number of all marbles in the urn (see contingency Table 1) and K describes the number of white marbles, then N − K corresponds to the number of black marbles.

If X is the random variable whose outcome is k, the number of white marbles actually drawn, the experiment can be summarized in Table 1.

Table 1: Contingency table

                 Drawn       Not drawn              Total
White marbles    k           K − k                  K
Black marbles    n − k       N + k − n − K          N − K
Total            n           N − n                  N

Now assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are white?

Note:

Although we are looking at success/failure, the data are not accurately modeled by the binomial distribution, because the probability of success on each trial is not the same and the size of the remaining population changes as we remove each marble. See Table 2 for a summary.

Table 2: Summary solution contingency table

                 Drawn        Not drawn               Total
White marbles    k = 4        K − k = 1               K = 5
Black marbles    n − k = 6    N + k − n − K = 39      N − K = 45
Total            n = 10       N − n = 40              N = 50

The probability of drawing exactly k white marbles can be calculated by eqn (2):

P(X = k) = f(k; N, K, n) = C(K, k) C(N − K, n − k) / C(N, n)   …… (2)

Substituting the values from the example into eqn (2), we have

P(X = 4) = f(4; 50, 5, 10) = C(5, 4) C(45, 6) / C(50, 10) = 40,725,300 / 10,272,278,170 ≈ 0.00396
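The urn calculation, and eqn (2) generally, can be verified with math.comb:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for the hypergeometric distribution, eqn (2):
    C(K, k) * C(N - K, n - k) / C(N, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Urn example: N = 50 marbles, K = 5 white, n = 10 drawn, k = 4 white drawn
p = hypergeom_pmf(4, N=50, K=5, n=10)
# comb(5, 4) * comb(45, 6) = 40_725_300 and comb(50, 10) = 10_272_278_170,
# so p is roughly 0.00396
```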

THE BINOMIAL DISTRIBUTION


Definition and properties

If p is the probability that an event will happen in any single trial (called the probability of a success) and q = 1 − p is the probability that it will fail to happen in any single trial (called the probability of a failure), then the probability that the event will happen exactly X times in N trials (i.e. X successes and N − X failures will occur) is given by

P(X) = C(N, X) p^X q^(N−X) = [N!/(X!(N−X)!)] p^X q^(N−X)   …… B1

Where:
X = 0, 1, 2, 3, …, N
N! = N(N − 1)(N − 2)…1 and 0! = 1

Example 1
The probability of getting exactly 2 heads in 6 tosses of a fair coin can easily be obtained from eqn B1 by putting N = 6, X = 2 and p = q = 1/2:

P(2) = C(6, 2) (1/2)² (1/2)⁴ = [6!/(2! 4!)] (1/2)⁶ = 15/64

Example 2
What is the probability of getting at least 4 heads in 6 tosses of a fair coin?

Solution
As in Example 1, applying eqn B1 and summing over X = 4, 5 and 6:

P(X ≥ 4) = C(6, 4)(1/2)⁴(1/2)² + C(6, 5)(1/2)⁵(1/2)¹ + C(6, 6)(1/2)⁶
         = 15/64 + 6/64 + 1/64 = 22/64 = 11/32
The discrete distribution, eqn B1, is often called the binomial distribution, since for X = 0, 1, 2, …, N it corresponds to successive terms of the binomial formula, or binomial expansion, given by eqn B2:

(q + p)^N = q^N + C(N, 1) q^(N−1) p + C(N, 2) q^(N−2) p² + ⋯ + p^N   … B2

where 1, C(N, 1), C(N, 2), … are called binomial coefficients.

An example applying eqn B2:

(q + p)⁴ = q⁴ + C(4, 1) q³p + C(4, 2) q²p² + C(4, 3) qp³ + p⁴
         = q⁴ + 4q³p + 6q²p² + 4qp³ + p⁴   …. B4

Some properties of the binomial distribution are given in Table B.1.

Table B.1: Binomial distribution properties

Mean: µ = Np
Variance: σ² = Npq
Standard deviation: σ = √(Npq)
Moment coefficient of skewness: α3 = (q − p)/√(Npq)
Moment coefficient of kurtosis: α4 = 3 + (1 − 6pq)/(Npq)

Example

In 100 tosses of a fair coin, find the mean number of heads and the standard deviation.

From Table B1, we have

µ = Np = 100(1/2) = 50

σ = √(Npq) = √(100 × 1/2 × 1/2) = 5
2 2

ANALYSIS OF VARIANCE

Analysis of variance, often recognized by the acronym ANOVA, is needed for testing the significance of differences between three or more sample means, or equivalently, for testing the null hypothesis that the sample means are all equal.

E.g. suppose that in an agricultural experiment, four different chemical treatments of soil produced mean wheat yields of 28, 22, 18 and 24 barrels per acre, respectively. Is there a significant difference in these means, or is the observed spread simply due to chance? Fisher developed the technique of analysis of variance to solve such problems. It makes use of the F distribution.

TWO TYPES OF ANOVA, BASED ON THE TYPE OF EXPERIMENT

A. One-way classification, or one-factor experiments. In a one-factor experiment, measurements or observations are obtained for "a" independent groups of samples, where the number of measurements in each group is "b". We then speak of "a" treatments, each of which has "b" repetitions, or "b" replications. In the example above, a = 4.
The results of a one-factor experiment can be represented in a table having "a" rows and "b" columns, as in Table ANI or Table ANIB.

Treatment 1   X11, X12, …, X1b    x̄1
Treatment 2   X21, X22, …, X2b    x̄2
⋮
Treatment a   Xa1, Xa2, …, Xab    x̄a

The data can also be laid out in long form, as in Table ANIB, depending on the statistical package used.



Treatment 1:  X11, X12, …, X1b
Treatment 2:  X21, X22, …, X2b
⋮
Treatment a:  Xa1, Xa2, …, Xab
Using Table ANI, Xjk denotes the measurement in the jth row and kth column, where j = 1, 2, …, a and k = 1, 2, …, b. For example, X28 refers to the 8th measurement for the second treatment. Also, x̄j. can be taken to be the mean of the measurements in the jth row. This gives equation AN1:

x̄j. = (1/b) ∑k=1..b Xjk,   j = 1, 2, …, a   −−−− AN1

The dot in x̄j. is used to show that the index k has been summed out. The x̄j. values are called group means, treatment means or row means.

The grand mean, or overall mean, is the mean of all the measurements in all the groups and is denoted by x̄, as given in equation AN2:

x̄ = (1/ab) ∑j=1..a ∑k=1..b Xjk   ……… AN2

Three types of variation in one-factor experiments

In summary, the three variations are given by equations AN3, AN4 and AN5:

V = ∑j,k X²jk − T²/ab   …… AN3

VB = (1/b) ∑j Tj² − T²/ab   …… AN4

Vw = V − VB   …… AN5

Where,

V = total variation

VB = variation between treatments

Vw = variation within treatments

T = total of all values Xjk

Tj = total of all values in the jth treatment

i.e. T = ∑j,k Xjk,   Tj = ∑k Xjk   …… AN6

In practice, with manual calculations, it is convenient to subtract some fixed value from all the data in the table in order to simplify the calculations without affecting the final results.

Example.

Table ANE shows the yields in m³ per hectare of a certain variety of wheat. The wheat was grown in a particular type of soil treated with fertilizer A, B or C.

Find: (a) the mean yield for the different treatments; (b) the grand mean for all treatments; (c) the total variation; (d) the variation between treatments; (e) the variation within treatments.

Table ANE

A   48   49   50   49
B   47   49   48   48
C   49   51   50   50

Solution: For simplification, subtract 45 from all the data to get Table ANA.

Table ANA

A 3 4 5 4
B 2 4 3 3
C 4 6 5 5
(a) The treatment (row) means are

x̄1 = (1/4)(3 + 4 + 5 + 4) = 4

x̄2 = (1/4)(2 + 4 + 3 + 3) = 3

x̄3 = (1/4)(4 + 6 + 5 + 5) = 5

Thus, after adding back 45, the mean yields are 49, 48 and 50 m³/ha for A, B and C respectively.

(b) The grand mean for all treatments is

x̄ = (1/12)(3 + 4 + 5 + 4 + 2 + 4 + 3 + 3 + 4 + 6 + 5 + 5) = 4

Thus, the grand mean for the original data of Table ANE is 45 + 4 = 49.

(c) The total variation is given by equation C1:

V = ∑j,k (xjk − x̄)²   …… C1

= (3 − 4)² + (4 − 4)² + (5 − 4)² + (4 − 4)² + (2 − 4)² + (4 − 4)² + (3 − 4)² + (3 − 4)² + (4 − 4)² + (6 − 4)² + (5 − 4)² + (5 − 4)² = 14

(d) The variation between treatments is given by equation d1:

VB = b ∑j (x̄j − x̄)²   ……. d1

i.e. VB = 4[(4 − 4)² + (3 − 4)² + (5 − 4)²] = 8

(e) The variation within treatments is Vw = V − VB = 14 − 8 = 6

Table ANS summarizes the results in the ANOVA table typical of a one-factor experiment.

TABLE ANS

Between treatments (VB = 8):   DF = a − 1 = 2;   mean square SB² = 8/2 = 4;   F = SB²/Sw² = 4/(2/3) = 6, with 2 and 9 df
Within treatments (Vw = V − VB = 14 − 8 = 6):   DF = a(b − 1) = 3(4 − 1) = 9;   mean square Sw² = 6/9 = 2/3
Total (V = 14):   DF = ab − 1 = 3 × 4 − 1 = 11
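The whole worked example, including the F ratio of Table ANS, can be checked in a few lines of code (subtracting 45 is unnecessary here, since the variations are unchanged by a constant shift):

```python
# Yields from Table ANE (fertilizers A, B, C; 4 plots each)
groups = [
    [48, 49, 50, 49],  # A
    [47, 49, 48, 48],  # B
    [49, 51, 50, 50],  # C
]
a = len(groups)       # number of treatments
b = len(groups[0])    # replications per treatment

grand_mean = sum(sum(g) for g in groups) / (a * b)
row_means = [sum(g) / b for g in groups]

V = sum((x - grand_mean) ** 2 for g in groups for x in g)  # total variation
VB = b * sum((m - grand_mean) ** 2 for m in row_means)     # between treatments
VW = V - VB                                                # within treatments

F = (VB / (a - 1)) / (VW / (a * (b - 1)))  # 4 / (2/3) = 6, with 2 and 9 df
```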

TWO-WAY CLASSIFICATION OR TWO-FACTOR EXPERIMENTS

In a two-factor experiment, ANOVA can be understood in line with the one-factor case. Consider this example: suppose an agricultural experiment consists of examining the yields per acre of 4 different varieties of wheat, where each variety is grown on 5 different plots of land. Thus a total of 4 × 5 = 20 plots are needed. It is convenient in such a case to combine the plots into blocks, with a different variety of wheat grown on each plot within a block. Thus, five blocks would be required.

The two classifications, or factors, are (i) the wheat variety grown and (ii) the particular block used (which may involve different soil fertility, fertilizer type, dosage, etc.).

Alternatively, we can refer to the first factor as the treatment and the second factor as the block.

Notation for two-factor experiments

Suppose that we have "a" treatments and "b" blocks, from which Table 2W1 can be constructed to illustrate the notation.

Table 2W1: Notation for a two-factor experiment

                        Block
              1      2     -----------------    b
Treatment 1   X11    X12   -----------------   X1b
Treatment 2   X21    X22   -----------------   X2b
⋮
Treatment a   Xa1    Xa2   ------------------  Xab

That is, for treatment j and block k, the value is denoted by Xjk. The mean of the entries in the jth row is denoted by x̄j., where j = 1, 2, …, a, while the mean of the entries in the kth column is denoted by x̄.k, where k = 1, 2, …, b.

The overall or grand mean is denoted by x̄. In symbols,

x̄j. = (1/b) ∑k=1..b Xjk

x̄.k = (1/a) ∑j=1..a Xjk

x̄ = (1/ab) ∑j,k Xjk

In a two-factor experiment without replication, the variations are as given in equation *:

V = VE + VR + VC   ….. *

Where,

VE = variation due to error or chance
   = ∑j,k (xjk − x̄j. − x̄.k + x̄)²
   = residual variation = random variation

VR = variation between rows (treatments)
   = b ∑j=1..a (x̄j. − x̄)²

VC = variation between columns (blocks)
   = a ∑k=1..b (x̄.k − x̄)²

The analysis-of-variance table for a two-factor experiment, with or without replication, is constructed by reasoning similar to the one-factor case.
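The decomposition V = VE + VR + VC can be verified on a small table. The 2 × 3 data below are arbitrary illustrative values, not from the text:

```python
# Rows = treatments (j), columns = blocks (k); arbitrary illustrative data
X = [
    [1, 2, 3],
    [2, 4, 6],
]
a, b = len(X), len(X[0])

xbar = sum(sum(row) for row in X) / (a * b)                         # grand mean
row_means = [sum(row) / b for row in X]                             # xbar_j.
col_means = [sum(X[j][k] for j in range(a)) / a for k in range(b)]  # xbar_.k

V = sum((X[j][k] - xbar) ** 2 for j in range(a) for k in range(b))  # total
VR = b * sum((m - xbar) ** 2 for m in row_means)   # between rows (treatments)
VC = a * sum((m - xbar) ** 2 for m in col_means)   # between columns (blocks)
VE = sum((X[j][k] - row_means[j] - col_means[k] + xbar) ** 2
         for j in range(a) for k in range(b))      # residual (error) variation
```

For these numbers, V = 16 splits into VR = 6, VC = 9 and VE = 1, confirming the identity.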

ELEMENTARY SAMPLING THEORY

Sampling theory is a study of the relationships existing between a population and samples drawn from the population.

Uses:

1. In estimating unknown population quantities (e.g. the population mean and variance), often called population parameters or, briefly, parameters, from the sample mean and variance (often called sample statistics).
2. To determine whether the observed differences between two samples are due to chance variation or whether they are really significant, e.g. in determining whether a drug for use in treating malaria is effective or not (tests of hypothesis and significance).

The study of inferences made about a population using samples drawn from it, together with indications of the accuracy of such inferences using probability theory, is called statistical inference.

Random Samples and Random Numbers

To make reasonable conclusions using sampling theory and statistical inference, samples must be chosen so as to be representative of the population. A study of sampling methods and of the related problems that arise is called the design of the experiment.

Random sampling is one way of obtaining a representative sample. Here each member of the population has an equal chance of being included in the sample. One way of obtaining a random sample is to assign a number to each member of the population, write these numbers on pieces of paper, and pick them at random one by one.

Standard Errors: the standard error of a statistic is the standard deviation of its sampling distribution.

The quantities µ, σ, p, µr and X̄, S, P, m denote respectively the population and sample means, standard deviations, proportions and moments about the mean.

When the sample size N is large (N ≥ 30), the sampling distributions are normal or nearly normal; the corresponding methods are known as large sampling methods and use the Z test. For small samples (N < 30) we use the theory of small samples, or exact sampling theory, and the t-test.

Statistical Estimation Theory

Unbiased Estimates: if the mean of the sampling distribution of a statistic equals the corresponding population parameter, the statistic is called an unbiased estimate of that parameter; otherwise it is called a biased estimate.

Example

The mean of the sampling distribution of means (µX̄) is µ, the population mean; hence the sample mean X̄ is an unbiased estimate of the population mean µ. The mean of the sampling distribution of variances, however, is

µs² = ((N − 1)/N) σ²

Where 𝜎 2 = population variance

N= sample size

The modified variance

Ŝ² = (N/(N − 1)) S²

is an unbiased estimate of σ².

Efficient Estimates: if the sampling distributions of two statistics have the same mean, then the statistic with the smaller variance is called an efficient estimator of the mean, while the other statistic is called an inefficient estimator.

In practice, inefficient estimates are often used because of the relative ease with which some of them can be obtained.

Point Estimates and Interval Estimates

A point estimate is an estimate of a population parameter given by a single number.

An interval estimate is an estimate of a population parameter given by two numbers between which the parameter may be considered to lie. Interval estimates indicate the precision or accuracy of an estimate and are therefore preferable to point estimates.

Confidence Interval Estimates of Population Parameters.

Let µs = the mean and σs = the standard deviation (standard error) of the sampling distribution of a statistic S.

When N ≥ 30, the sampling distribution is approximately normal, and we can expect to find an actual sample statistic S lying in the intervals µs − σs to µs + σs, µs − 2σs to µs + 2σs, or µs − 3σs to µs + 3σs about 68.27%, 95.45% and 99.73% of the time, respectively.

Equivalently, we can expect to find µs in the intervals S − σs to S + σs, S − 2σs to S + 2σs, or S − 3σs to S + 3σs about 68.27%, 95.45% and 99.73% of the time. These are accordingly called the 68.27%, 95.45% and 99.73% confidence intervals for estimating µs. The end points of the intervals (S ± σs, S ± 2σs, S ± 3σs) are then called the 68.27%, 95.45% and 99.73% confidence limits.

Similarly, S ± 1.96σs and S ± 2.58σs are the 95% and 99% (or 0.95 and 0.99) confidence limits for µs. The percentage confidence is often called the confidence level. The numbers 1.96, 2.58, etc. are called confidence coefficients or critical values and are denoted by Zc.

Common confidence levels used in practice

Confidence Level   99.73%   99%    98%    96%    95.45%   95%    90%     80%    68.27%   50%
Zc                 3.00     2.58   2.33   2.05   2.00     1.96   1.645   1.28   1.00     0.6745
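The critical values Zc in this table come from the inverse standard normal CDF: for a two-sided confidence level CL, Zc = Φ⁻¹((1 + CL)/2). A sketch using Python's statistics.NormalDist:

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value Zc for a given confidence level."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

zc_95 = z_critical(0.95)   # about 1.960
zc_99 = z_critical(0.99)   # about 2.576
zc_90 = z_critical(0.90)   # about 1.645
```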

Example: for a normal distribution with µ = 20 and σ = 2, the value 16 corresponds to

Z = (16 − 20)/2 = −2

and from the table, the area between Z = 0 and Z = −2 is P(Z) = 0.4772.

STATISTICAL DECISION THEORY.

Introduction

Very often we are called upon to make decisions about populations on the basis of sample information; such decisions are called statistical decisions, e.g. deciding whether a new drug is really effective in curing malaria or not.

Statistical Hypothesis

Statistical hypotheses are assumptions (or guesses) about the populations involved; such assumptions may be true or false. They are generally statements about the probability distributions of the populations.

Null Hypothesis (H0): a null hypothesis is formulated for the sole purpose of being rejected or nullified, e.g. in determining whether the tossing of a coin is fair, we formulate the null hypothesis that it is fair (i.e. p = 0.5, where p is the probability of heads). Similarly, if we want to decide whether one procedure is better than another, we formulate the null hypothesis that there is no difference between the procedures.

Alternative Hypothesis (Ha)

Any hypothesis that differs from a given null hypothesis is called an alternative hypothesis, e.g. if H0: p = 0.5, then Ha: p ≠ 0.5 or Ha: p > 0.5.

Procedure for Tests of Significance

A statistical test is based on the concept of proof by contradiction and is composed of four parts, as follows:

1. The null hypothesis, denoted by H0
2. The research hypothesis, also called the alternative hypothesis, denoted by Ha
3. The test statistic, denoted by T.S.
4. The rejection region, denoted by R.R.

Z-Score

The Z-score (or Z test) is used to determine the probability that a measurement will fall in the interval from µ to some value Y to the right of µ. We calculate the number of standard deviations that Y lies from the mean by using the formula

Z = (Y − µ)/σ

Example

Consider a normal distribution with µ = 20 and σ = 2.

(a) Determine the probability that a measurement will be in the interval 20 to 23.
(b) Find the probability that a measurement will be in the interval 16 to 20.

Solution

(a) From the diagram,

Z = (Y − µ)/σ = (23 − 20)/2 = 1.5

and from the normal table, the area between Z = 0 and Z = 1.5 gives the required probability, 0.4332.
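Both parts of the example can be checked numerically; part (b)'s interval is taken here to be 16 to 20, matching the Z = (16 − 20)/2 = −2 computation earlier:

```python
from statistics import NormalDist

dist = NormalDist(mu=20, sigma=2)

p_a = dist.cdf(23) - dist.cdf(20)  # interval 20 to 23 (Z from 0 to 1.5): about 0.4332
p_b = dist.cdf(20) - dist.cdf(16)  # interval 16 to 20 (Z from -2 to 0): about 0.4772
```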
