MT233 October 2019-1
Course Outline
1. Introduction
3. Hypothesis testing
4. Regression analysis
References
1. Statistics and Probability with Applications for Engineers and Scientists, by Bhisham C. Gupta and Irwin Guttman
3. Design and Analysis of Experiments, 2nd, 3rd, 4th, 5th...8th ed., by Douglas C. Montgomery
1 Introduction
1.1 What is statistics?
1. It refers to the sets of data relating to a wide range of topics such as the size of
populations, production activity, retail prices, incomes, rainfall, etc.
2. Statistics refers to the theory and methods used for collection, description, analysis
and interpretation of numerical data.
• Based on the above definitions, one can say that statistics comprises two branches
• Environmental Studies (do strong electric or magnetic fields induce higher cancer rates?)
• Quality engineering
1.3 What does a statistician need to be able to do?
There are various reasons why statisticians use samples and some are as follows:
• Cost-effective: Considering a sample is more cost-effective, with respect to time, money, and labour, than considering the whole population.
• Utility: In some experimental methods it will be futile to consider the whole population
if the process involves destroying the objects/items/individuals.
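• Median - The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then,
\[
x^{*} =
\begin{cases}
\text{the single middle value} = \left(\dfrac{n+1}{2}\right)^{\text{th}} \text{ ordered value}, & \text{if } n \text{ is odd},\\[6pt]
\text{the average of the two middle values} = \text{average of the } \left(\dfrac{n}{2}\right)^{\text{th}} \text{ and } \left(\dfrac{n}{2}+1\right)^{\text{th}} \text{ ordered values}, & \text{if } n \text{ is even}.
\end{cases}
\]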
• Advantages of median:
• Disadvantage(s) of median:
• Arithmetic mean: the sum of all n observations or values divided by the sample size, n, that is
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{1}
\]
– Advantages: It considers all the values in the sample to find the outcome.
– Disadvantage: It is affected by extreme values in a data set.
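• Variance is the sum of the squared deviations from the mean of n values divided by the degrees of freedom (n − 1), that is
\[
\begin{aligned}
\mathrm{Var}[x_i] = s^2 &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}\\
&= \frac{\sum_{i=1}^{n}\left(x_i^2-2\bar{x}x_i+\bar{x}^2\right)}{n-1}\\
&= \frac{\sum_{i=1}^{n}x_i^2-2\,\dfrac{\sum_{i=1}^{n}x_i}{n}\left(\sum_{i=1}^{n}x_i\right)+n\,\dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n^2}}{n-1}\\
&= \frac{\sum_{i=1}^{n}x_i^2-\dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n-1}.
\end{aligned}
\tag{2}
\]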
Example 1 Determine the mode, median and mean from the following dataset: 4,5,1,4,12,10.
Solution
2. Median: To find the median, first arrange the observations by size, in either ascending or descending order, that is,
1, 4, 4, 5, 10, 12
Thus, the median is
\[
\frac{4+5}{2} = 4.5
\]
3. Mean:
\[
\bar{x} = \frac{4+5+1+4+12+10}{6} = \frac{36}{6} = 6.
\]
4. Variance:
\[
\mathrm{Var}[x_i] = \frac{\sum_{i=1}^{n}x_i^2-\dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n-1}
= \frac{302-\dfrac{(36)^2}{6}}{5} = 17.2
\]
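The hand calculation in Example 1 can be cross-checked with a short Python sketch. It uses the standard-library `statistics` module; the variable names are illustrative and not part of the notes. The last computation repeats the shortcut form of equation (2).

```python
from statistics import mean, median, mode, variance

data = [4, 5, 1, 4, 12, 10]
n = len(data)

# Library estimators; variance() divides by n - 1, as in equation (2).
sample_mode = mode(data)
sample_median = median(data)
sample_mean = mean(data)
sample_var = variance(data)

# Shortcut form of equation (2): (sum of squares - (sum)^2 / n) / (n - 1).
shortcut_var = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(sample_mode, sample_median, sample_mean, sample_var, shortcut_var)
```

Both variance computations should agree with the value obtained by hand, which is a quick way to catch arithmetic slips in the sum of squares.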
Prepare your own notes on (focus on how to construct these graphs, look at advantages and
disadvantages)
• Histogram
• Box-plot
• Frequency polygon
Home work
Explain clearly, using the following data, how the following are constructed:
23 29 40 28 15 22 46 39 22 17 26 33 35 49 20,
36 25 15 31 17 43 54 36 30 30 40 27 24 20 28 42
22 37 17 39 17 22 9 26 29
(b) Histogram,
2 Random variables
2.1 Continuous random variables in one dimension
• A discrete random variable is a random variable with a finite (or countably infinite) set
of real numbers for its range.
• A continuous random variable can also be defined as a random variable whose values can be any of the infinitely many points in an interval of the real line.
• For a continuous random variable, the interval on the real line may be open or closed, bounded or unbounded.
• For instance, the interval could be [0, 1], (0, ∞) or (−∞, ∞).
• This function must be non-negative and have the property that the area of the region bounded by the graph of f and the x-axis, −∞ < x < +∞, is 1, that is,
\[
P(-\infty < x < +\infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{3}
\]
• Therefore
\[
\frac{dF(x)}{dx} = f(x) \tag{7}
\]
Example 2 Suppose that the battery failure time, measured in hours, has a probability density function (p.d.f)
\[
f(x) = \frac{2}{(x+1)^3}, \qquad x \ge 0.
\]
(ii) Find the probability that a randomly selected battery from the warehouse will have a
lifetime less than 5 hours.
Solution
(i) To be a valid p.d.f over the given interval, f must have two characteristics.
2. Its definite integral over the interval must be exactly 1.
Since the given p.d.f is non-negative over the defined interval, we now investigate the second characteristic, that is,
\[
P(0 \le x < \infty) = \int_{0}^{\infty} \frac{2}{(x+1)^3}\,dx = \lim_{b\to\infty}\left[-\frac{1}{(x+1)^2}\right]_{0}^{b} = 1.
\]
(ii)
\[
P(0 \le x \le 5) = \int_{0}^{5} \frac{2}{(x+1)^3}\,dx = \left[-\frac{1}{(x+1)^2}\right]_{0}^{5} = \frac{35}{36}.
\]
(iii)
\[
F(x) = \int_{0}^{x} \frac{2}{(u+1)^3}\,du = \left[-\frac{1}{(u+1)^2}\right]_{0}^{x} = 1-\frac{1}{(x+1)^2}.
\]
\[
\begin{aligned}
&= \int_{a}^{b} x^2 f(x)\,dx - 2\mu^2 + \mu^2\\
&= \int_{a}^{b} x^2 f(x)\,dx - \mu^2.
\end{aligned}
\tag{9}
\]
Example 3 : Let the continuous random variable X denote the current measured in a thin
copper wire in milliamperes. Assume that the range of X is [0, 20mA], and assume that the
probability density function of X is
f (x) = 0.05, 0 ≤ x ≤ 20
(a) What is the probability that a current measurement is less than 10 milliamperes?
Solution
(a)
\[
P(X < 10) = \int_{0}^{10} 0.05\,dx = 0.5.
\]
(b)(i) Expected value:
\[
E(x) = \int_{0}^{20} x f(x)\,dx = 0.05\left[\frac{x^2}{2}\right]_{0}^{20} = 10.
\]
(ii) Variance:
\[
V(x) = \int_{0}^{20} (x-10)^2 f(x)\,dx = 0.05\left[\frac{(x-10)^3}{3}\right]_{0}^{20} = 33.33.
\]
• Median: Another useful measure of central tendency is the median. We define the median to be the number m such that precisely half of the x-values lie below m and the other half of the x-values lie above m. That is,
P (a ≤ x ≤ m) = 0.5
Simplifying gives
• Mode: The mode is the value that appears most often in a set of data. The mode of a
continuous probability distribution is the value x at which its probability density function
has its maximum value, so the mode is at the peak.
Tutorial #1
1. Show that the following functions are probability density functions for some k and determine the value of k. Then determine the mean and variance of X.
2. Suppose that
\[
f(x) =
\begin{cases}
e^{-(x-6)}, & 6 < x\\
0, & x \le 6
\end{cases}
\]
2.4 Continuous random variables in two dimensions
• The double integral of fXY (x, y) over a region R provides the probability that (X, Y )
assumes a value in R.
• This integral can be interpreted as volume under the surface fXY (x, y) over the region
R.
Definition: A joint probability density function for the continuous random variable X and Y
denoted as fXY (x, y), satisfies the following properties
1.
2.
\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1. \tag{12}
\]
Example 5 : A privately owned business operates both a drive-in facility and a walk-in facility. On a randomly selected day, let X and Y, respectively, be the proportions of the time that the drive-in and the walk-in facilities are in use, and suppose that the joint density function of these random variables is
\[
f(x,y) =
\begin{cases}
\dfrac{2}{5}(2x+3y), & 0 \le x \le 1,\ 0 \le y \le 1\\[4pt]
0, & \text{otherwise}
\end{cases}
\]
(b) Find P [(X, Y ) ∈ A], where A = {(x, y)|0 < x < 0.5, 0.25 < y < 0.5}
Solution
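Thus
\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy
&= \int_{0}^{1}\int_{0}^{1} \frac{2}{5}(2x+3y)\,dx\,dy\\
&= \int_{0}^{1}\left[\frac{2x^2}{5}+\frac{6xy}{5}\right]_{x=0}^{x=1} dy\\
&= \int_{0}^{1}\left(\frac{2}{5}+\frac{6y}{5}\right) dy\\
&= \left[\frac{2y}{5}+\frac{3y^2}{5}\right]_{0}^{1} = 1.
\end{aligned}
\]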
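A numerical check of Example 5 can be sketched as follows. The function names are hypothetical and the midpoint rule is an assumed choice of quadrature; it happens to be exact (up to rounding) for a density that is linear in x and y. The second computation evaluates the probability asked for in part (b) over the region A = {0 < x < 0.5, 0.25 < y < 0.5}; its value comes from this sketch rather than from the notes.

```python
def joint_pdf(x, y):
    """Joint density f(x, y) = (2/5)(2x + 3y) on the unit square, 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return 0.4 * (2.0 * x + 3.0 * y)
    return 0.0


def double_midpoint(func, x_lo, x_hi, y_lo, y_hi, steps=200):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            total += func(x, y)
    return total * dx * dy


# Part (a): the density should integrate to 1 over the unit square.
total_probability = double_midpoint(joint_pdf, 0.0, 1.0, 0.0, 1.0)

# Part (b): probability that (X, Y) falls in the region A.
prob_region_a = double_midpoint(joint_pdf, 0.0, 0.5, 0.25, 0.5)

print(total_probability, prob_region_a)
```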
Definition: If the joint density function of continuous random variables X and Y is $f_{XY}(x,y)$, the marginal probability density functions of X and Y are
\[
f_X(x) = \int_{R_x} f_{XY}(x,y)\,dy, \tag{14}
\]
\[
f_Y(y) = \int_{R_y} f_{XY}(x,y)\,dx, \tag{15}
\]
respectively, where Rx denotes the set of all points in the range of (X, Y ) for which X = x,
and Ry denotes the set of all points in the range of (X, Y ) for which Y = y.
\[
f_Y(y) = \int_{x_1=0}^{x_2=1} \frac{2}{5}\left(2x+3y\right) dx = \frac{2(1+3y)}{5}.
\]
Definition 1 : Given continuous random variables X and Y with joint probability density function $f_{XY}(x,y)$, the conditional probability density function of Y given X = x is
\[
f_{Y|x}(y) = f(y|x) = \frac{f_{XY}(x,y)}{f_X(x)} \quad \text{for } f_X(x) > 0 \tag{16}
\]
and the conditional probability density function of X given Y = y is
\[
f_{X|y}(x) = f(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)} \quad \text{for } f_Y(y) > 0 \tag{17}
\]
• The conditional variance of X given Y = y, denoted as V(X|y) or $\sigma^2_{X|y}$, is
\[
V(X|y) = \int_{R_y} (x-\mu_{X|y})^2 f_{X|y}(x)\,dx = \int_{R_y} x^2 f_{X|y}(x)\,dx - \mu^2_{X|y} \tag{21}
\]
Example 7 Consider the pdf fXY (x, y) = x + y, for 0 < x < 1 and 0 < y < 1.
Determine
1. f (Y |x)
Solution
1.
\[
f(x) = \int_{0}^{1} (x+y)\,dy = \left[xy+\frac{y^2}{2}\right]_{0}^{1} = x + 0.5
\]
Thus
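2.
\[
P(0.25 < Y < 0.5 \mid x = 0.5) = \int_{0.25}^{0.5} \frac{0.5+y}{0.5+0.5}\,dy = \left[\frac{(1+y)y}{2}\right]_{0.25}^{0.5} = \frac{7}{32} = 0.21875.
\]
3. Taking x = 0.5 as in part 2,
\[
E(Y|x) = \int_{0}^{1} y f(Y|x)\,dy = \int_{0}^{1} y\,\frac{0.5+y}{0.5+0.5}\,dy = \left[\frac{y^2}{4}+\frac{y^3}{3}\right]_{0}^{1} = \frac{7}{12}.
\]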
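The conditional calculations in Example 7 can be reproduced with a small sketch. The helper names below are hypothetical, and the midpoint rule is an assumed quadrature choice. The conditional density is built directly from definition (16), as the joint density divided by the marginal, and evaluated at x = 0.5.

```python
def joint_pdf(x, y):
    """Joint density f(x, y) = x + y on the open unit square, 0 elsewhere."""
    return x + y if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0


def marginal_x(x):
    """Marginal f_X(x) = x + 0.5, obtained by integrating y out over (0, 1)."""
    return x + 0.5


def conditional_y_given_x(y, x):
    """Conditional density f(y | x) = f(x, y) / f_X(x), definition (16)."""
    return joint_pdf(x, y) / marginal_x(x)


def midpoint_integral(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


x0 = 0.5
cond_prob = midpoint_integral(lambda y: conditional_y_given_x(y, x0), 0.25, 0.5)
cond_mean = midpoint_integral(lambda y: y * conditional_y_given_x(y, x0), 0.0, 1.0)

print(cond_prob, cond_mean)
```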
The definition of independence for continuous random variables is similar to that of discrete
random variables. For continuous random variables if fXY (x, y) = fX (x)fY (y) for all x and y
then the random variables X and Y are said to be independent.
Solution
\[
f_X(x) = \int_{0}^{\infty} e^{-(x+y)}\,dy = e^{-x}\int_{0}^{\infty} e^{-y}\,dy = e^{-x}.
\]
\[
f_Y(y) = \int_{0}^{\infty} e^{-(x+y)}\,dx = e^{-y}\int_{0}^{\infty} e^{-x}\,dx = e^{-y}.
\]
Clearly
• var[aX + c] = a² var[X]
2. If X and Y are random variables with a joint probability distribution, and a and b are constants, then
3. If X and Y are independent random variables (so that Cov(X, Y) = 0), and a and b are constants, then
4. If X and Y are random variables with a joint probability distribution, and a and b are constants, then
Example 9 If X and Y are random variables (r.v.) with a joint probability distribution such that var[X] = 2, var[Y] = 4 and cov(X, Y) = −2, find
Solution
(a)
(b)
• One of the most important examples of a continuous probability distribution is the Normal distribution.
• Equation (24) is often termed the standard normal density function.
Example 10 1. Find the area under the standard normal curve
2. Suppose the current measurements in a strip of wire are assumed to follow a normal distribution with a mean of 10 milliamperes and a variance of 4 (milliamperes)². What is the probability that a measurement will exceed 13 milliamperes?
Solution
1. (a)
\[
\begin{aligned}
P(0 \le Z \le 1.2) &= P(Z \le 1.2) - P(Z \le 0)\\
&= 0.8849 - 0.5\\
&= 0.3849
\end{aligned}
\]
2. Let X denote the current in milliamperes. The requested probability can be represented as P(X > 13). Let $Z = \dfrac{X-10}{2}$. Therefore
\[
\begin{aligned}
P(X > 13) &= P\left(\frac{X-10}{2} > \frac{13-10}{2}\right)\\
&= P(Z > 1.5)\\
&= 0.06681.
\end{aligned}
\]
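The table look-ups in Example 10 can be reproduced without a printed table. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] and uses `math.erf` from the Python standard library; the function names are illustrative.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_cdf(x, mean, variance):
    """P(X <= x) for X ~ N(mean, variance), by standardising to Z."""
    return standard_normal_cdf((x - mean) / sqrt(variance))


# Part 1(a): area under the standard normal curve between 0 and 1.2.
area_0_to_1_2 = standard_normal_cdf(1.2) - standard_normal_cdf(0.0)

# Part 2: P(X > 13) for current with mean 10 mA and variance 4 mA^2.
prob_exceeds_13 = 1.0 - normal_cdf(13.0, mean=10.0, variance=4.0)

print(round(area_0_to_1_2, 4), round(prob_exceeds_13, 5))
```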
3 Hypothesis testing
3.1 Introduction
• Hypothesis testing is concerned with deciding between the two hypotheses H0 (null hypothesis) and H1 (alternative hypothesis)
• H1 expresses the way in which the value of a particular parameter in a statistical model may deviate from that specified in H0
Example 11 A machine that produces metal cylinders is set to make cylinders with a diameter
of 50 mm. Is it practical that all cylinders that this machine will produce will have a diameter
of exactly 50 mm?
Solution
(i) H0 : µ = 50 (All cylinders produced by the machine have the set diameter, 50mm)
(ii) H1 : µ ≠ 50 (There is a possibility that the machine can produce cylinders whose diameter is not 50mm)
Definitions
• Such tests are based on the value of sample statistics, such as x̄, z or t scores and these
are called test statistics.
• The subset is chosen so that the total probability is low on H0 and is better explained
by H1 .
Steps in hypothesis testing
Figure 1: The distribution of Z0 when H0 : µ = µ0 is true, with the critical region for (a) the two-sided alternative H1 : µ ≠ µ0, (b) the one-sided alternative H1 : µ > µ0, and (c) the one-sided alternative H1 : µ < µ0.
3. Choose the appropriate test statistics and establish the critical region.
Remark: We reject H0 when the computed value lies in the critical region.
Example 12 : An electrical firm manufactures light bulbs that have a lifetime that is approximately normally distributed with a mean of 12 hours and variance 0.64 hours². A light bulb is selected at random and tested, and the lifetime is found to be 13.3 hours. Determine whether this bulb belongs to the manufacturer. Use the 5% level of significance.
Solution
2. Level of significance: 5%
3. Test statistic: Normal distribution.
4. Computed Z-value:
\[
Z = \frac{x-\mu}{\sigma} = \frac{13.3-12}{\sqrt{0.64}} = 1.625
\]
5. Conclusion: Since the computed Z value (1.625) does not lie in the critical region (Z < −1.96 or Z > 1.96), we fail to reject H0 at the 5% level of significance and conclude that the bulb belongs to the manufacturer.
• Usually we take a sample of size n, from which we compute the sample mean x̄ which
we then compare with a specified value.
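•
\[
\begin{aligned}
\mathrm{var}[\bar{x}] &= \mathrm{var}\left[\frac{\sum_{i=1}^{n} x_i}{n}\right]\\
&= \frac{1}{n^2}\,\mathrm{var}[x_1+x_2+\cdots+x_n]\\
&= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{aligned}
\]
• Therefore the standard error is
\[
\mathrm{S.E}[\bar{x}] = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}
\]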
Example 13 A soft-drink bottler purchases 10 bottles from a glass company. The bottler wants to know if the mean breaking strength exceeds 200 psi. If so, she wants to accept the bottles. Past experience indicates that, for 4 specimen bottles, the breaking strength has a variance of 100 psi² and a mean of 214 psi. Investigate at the 5% level of significance whether the bottler should accept or reject the bottles.
Solution
1. • H0 : µ = 200
• H1 : µ > 200
2. Level of significance: 5%
4. Computed Z-value
\[
Z = \frac{\bar{x}-\mu}{\mathrm{S.E}[\bar{x}]} = \frac{214-200}{\frac{10}{2}} = 2.8
\]
5. Conclusion: Reject H0. The bottler should accept the lot since the mean breaking strength is greater than 200 psi.
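The Z computations in Examples 12 and 13 follow the same pattern, which can be sketched as a small function. The name `z_statistic` is hypothetical. The critical values 1.96 (two-sided) and 1.645 (upper one-sided) are the standard normal points for a 5% significance level, taken here as assumptions from standard tables.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, variance, n):
    """Z = (sample_mean - mu0) / sqrt(variance / n) for a known variance."""
    standard_error = sqrt(variance / n)
    return (sample_mean - mu0) / standard_error


# Example 12: a single bulb (n = 1), two-sided test at the 5% level.
z_bulb = z_statistic(13.3, 12.0, 0.64, 1)
reject_bulb = abs(z_bulb) > 1.96

# Example 13: mean of 4 specimen bottles, upper one-sided test at the 5% level.
z_bottles = z_statistic(214.0, 200.0, 100.0, 4)
reject_bottles = z_bottles > 1.645

print(z_bulb, reject_bulb, z_bottles, reject_bottles)
```

Passing the variance rather than the standard deviation mirrors how both examples state their data.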
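3.3 The difference between two means when population variances σ₁² and σ₂² are known
• Since we have two populations under study, we are supposed to deduce the standard error for these populations
• We know that
\[
\mathrm{var}[\bar{x}] = \frac{\sigma^2}{n},
\]
• Then
\[
\mathrm{var}[\bar{x}_1-\bar{x}_2] = \mathrm{var}[\bar{x}_1] + \mathrm{var}[\bar{x}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},
\]
• Thus
\[
\mathrm{S.E.}[\bar{x}_1-\bar{x}_2] = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},
\]
• Hence
\[
Z = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\mathrm{S.E}[\bar{x}_1-\bar{x}_2]},
\]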
Example 14 A manufacturer claims that the average tensile strength of synthetic fibre A exceeds the average tensile strength of synthetic fibre B. To test his claim, 50 pieces of each type of synthetic fibre are tested under similar conditions. Type A had an average tensile strength of 43.7 psi and a variance of 11.8 psi², while type B had an average tensile strength of 41.5 psi and a variance of 46.3 psi². At 5% significance level, test the manufacturer’s claim.
Solution
1. • H0 : µA − µB = 0
• H1 : µA − µB > 0
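2. Test statistic
\[
\begin{aligned}
Z &= \frac{(\bar{x}_A-\bar{x}_B)-(\mu_A-\mu_B)}{\sqrt{\dfrac{\sigma_A^2}{n_A}+\dfrac{\sigma_B^2}{n_B}}}\\
&= \frac{(43.7-41.5)-0}{\sqrt{\dfrac{11.8}{50}+\dfrac{46.3}{50}}}\\
&= 2.04
\end{aligned}
\]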
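The two-sample Z statistic of Example 14 can be sketched in a few lines. The function name is hypothetical, and the upper 5% critical value of 1.645 is an assumption taken from standard normal tables.

```python
from math import sqrt


def two_sample_z(mean_a, mean_b, var_a, var_b, n_a, n_b, diff0=0.0):
    """Z for H0: mu_A - mu_B = diff0 when both population variances are known."""
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    return ((mean_a - mean_b) - diff0) / standard_error


z_fibres = two_sample_z(43.7, 41.5, 11.8, 46.3, 50, 50)
reject_h0 = z_fibres > 1.645  # one-sided alternative H1: mu_A - mu_B > 0

print(round(z_fibres, 2), reject_h0)
```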
3.4 Hypothesis testing when the population variance is not known and n < 30.
Example 15 A manufacturer of television picture tubes has a production line that produces
an average of 100 tubes per day. Because of new government regulations, a new safety device
has been installed, which the manufacturer believes will reduce average daily output. A random
sample of 15 days’ output after the installation of the safety device is shown below.
93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95
At 5% significance level, is there sufficient evidence to conclude that the average daily output
has decreased following the installation of the safety device?
Solution
• Also observe the sample (n = 15) < 30 hence we use the t distribution
3. Test statistic
\[
\begin{aligned}
t &= \frac{\bar{x}-\mu}{\dfrac{s}{\sqrt{n}}}\\
&= \frac{96.47-100}{\dfrac{4.85}{\sqrt{15}}}\\
&= -2.82.
\end{aligned}
\]
4. Conclusion: Reject H0 since the computed t value lies in the critical region (−2.82 < −1.761), and conclude that there is enough evidence to show that the average daily production has decreased.
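The summary figures used in Example 15 (sample mean, sample standard deviation and t statistic) can be recomputed from the raw data. This is a sketch using the standard-library `statistics` module; the variable names are illustrative, and the critical value −1.761 is the one quoted in the conclusion.

```python
from math import sqrt
from statistics import mean, stdev

daily_output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
mu0 = 100  # average daily output before the safety device

n = len(daily_output)
x_bar = mean(daily_output)
s = stdev(daily_output)  # sample standard deviation (n - 1 divisor)

t_stat = (x_bar - mu0) / (s / sqrt(n))

critical_value = -1.761  # lower 5% point of t with n - 1 = 14 degrees of freedom
reject_h0 = t_stat < critical_value

print(round(x_bar, 2), round(s, 2), round(t_stat, 2), reject_h0)
```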
3.5 Difference between two population means when the population variances σ₁² and σ₂² are not known and (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2 < 30.
• In order to use t distribution to make a valid test of hypothesis about µ1 −µ2 the following
conditions must be met.
1. The two population random variables (x1 and x2 ) are normally distributed.
2. The two samples must be independent.
3. The two population variances are equal, that is σ12 = σ22 .
• By condition 3, we have a common variance, known as the pooled variance, given by
\[
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
\]
• Since we are comparing two populations,
\[
t = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{s_p^2}{n_1}+\dfrac{s_p^2}{n_2}}}
= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}.
\]
Example 16 The manager of a large production facility believes that worker productivity is a function of, among other things, the design of the job, which refers to the sequence of movements. Two designs are being considered for the production of a new product. In an experiment, six workers using design A had a mean assembly time of 7.60 minutes, with a standard deviation of 2.36 minutes, for this product. (The six observations were 8.2, 5.3, 6.5, 5.1, 9.7, 10.8). Eight workers using design B had a mean assembly time of 9.20 minutes, with a standard deviation of 1.35 minutes. (The observations were 9.5, 8.3, 7.5, 10.9, 11.3, 9.3, 8.8, 8.0). Can we conclude at the 5% level of significance that the average assembly times differ for the two designs? Assume that the times are normally distributed.
Solution
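1. • H0 : µ1 − µ2 = 0
• H1 : µ1 − µ2 ≠ 0
2. Test statistic
\[
t = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}
= \frac{(7.60-9.20)-0}{\sqrt{3.38\left(\dfrac{1}{6}+\dfrac{1}{8}\right)}} = -1.61
\]
• Recall that
\[
\begin{aligned}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\\
&= \frac{(6-1)2.36^2 + (8-1)1.35^2}{6+8-2}\\
&= 3.38
\end{aligned}
\]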
3. Conclusion: We fail to reject H0, since the computed t value does not lie in the critical region. Therefore there is insufficient evidence to conclude that a difference in mean assembly times exists between designs A and B.
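The pooled-variance t statistic of Example 16 can be sketched from the summary statistics given in the example. The function names are hypothetical. The two-sided critical value 2.179 for 12 degrees of freedom is taken as an assumption from a Student's t table.

```python
from math import sqrt


def pooled_variance(s1, s2, n1, n2):
    """Pooled variance from two sample standard deviations."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, s1, s2, n1, n2, diff0=0.0):
    """t statistic for H0: mu1 - mu2 = diff0, assuming equal variances."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    return ((mean1 - mean2) - diff0) / sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


sp2 = pooled_variance(2.36, 1.35, 6, 8)
t_stat = pooled_t(7.60, 9.20, 2.36, 1.35, 6, 8)

critical_value = 2.179  # two-sided 5% point of t with 6 + 8 - 2 = 12 df
reject_h0 = abs(t_stat) > critical_value

print(round(sp2, 2), round(t_stat, 2), reject_h0)
```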
3.6 Paired Comparison
• Here we consider two samples, as in a two-sample t-test; the difference is that in this experimental design the samples are not independent.
• Observations occur in pairs such that the two observations in a pair are taken from the same experimental unit
• Or from two similar experimental units (similar with respect to a certain attribute)
Example 17 Gasohol has received much attention in recent years as a possible alternative to gasoline as a fuel for automobiles. To compare the mileages per gallon that can be achieved with the two fuels, the following test was performed. Eight cars were selected and their fuel tanks completely cleaned. Each car was driven twice over a predetermined course, once using gasohol and once using gasoline, and the miles per gallon were recorded for each trip.
At 10% significance level, does the data support the hypothesis that the mean mileage per gallon
of gasohol is less than that of gasoline?
Solution
It follows that
\[
\bar{x}_d = \frac{\sum D}{n} = \frac{-35}{8} = -4.375
\]
\[
s_d = \sqrt{\frac{n\sum D^2 - \left(\sum D\right)^2}{n(n-1)}} = \sqrt{\frac{8(173)-(-35)^2}{8(7)}} = 1.69
\]
1. • H0 : µd = 0
• H1 : µd < 0
2. Test statistic
\[
t = \frac{\bar{x}_d-\mu_d}{\dfrac{s_d}{\sqrt{n}}} = \frac{-4.375-0}{\dfrac{1.69}{\sqrt{8}}} = -7.32
\]
4. Conclusion: Reject H0 and conclude that the mean mileage for gasohol is less than that
of gasoline at 10% level of significance.
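The paired-comparison statistic of Example 17 can be sketched from the summary sums used in the solution (n = 8, ΣD = −35, ΣD² = 173). The variable names are illustrative, and the lower 10% critical value of −1.415 for 7 degrees of freedom is an assumption taken from a Student's t table. Carrying full precision for s_d gives a statistic of about −7.34, so the second decimal place depends on how intermediate values are rounded.

```python
from math import sqrt

n = 8            # number of cars (pairs)
sum_d = -35.0    # sum of the paired differences D
sum_d2 = 173.0   # sum of the squared differences D^2

d_bar = sum_d / n
s_d = sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))

mu_d0 = 0.0      # hypothesised mean difference under H0
t_stat = (d_bar - mu_d0) / (s_d / sqrt(n))

critical_value = -1.415  # lower 10% point of t with n - 1 = 7 df
reject_h0 = t_stat < critical_value

print(d_bar, round(s_d, 2), round(t_stat, 2), reject_h0)
```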
3.7 Confidence interval
• In general a (1 − α)100% confidence interval (CI) for the population parameter θ is given
by
CI for the true population mean µ, when the population variance σ 2 is known
• Conclusion: Since the stated value in the null hypothesis does not lie within the 95% CI
we reject H0 at 5% level of significance and conclude that the mean breaking strength is
greater than 200.
3.9 CI for true population mean µ when the population variance σ² is unknown and n < 30.
• Conclusion: Reject H0 (value stated in the null hypothesis not in the CI). Conclude that
the average production has changed.
3.10 CI for the difference between two populations
CI for two populations when µ1 − µ2, σ₁² and σ₂² are known and n1 + n2 − 2 > 30.
• H0 : µ1 − µ2 = 0, H1 : µ1 − µ2 > 0
• Conclusion. Reject H0. Conclude that the average tensile strengths of the two fibres are different.
CI for two populations when µ1 − µ2, σ₁² and σ₂² are unknown and n1 + n2 − 2 < 30.
\[
= (7.6-9.2) \pm 2.179\sqrt{3.38\left(\frac{1}{6}+\frac{1}{8}\right)}
\]
• H0 : µ1 − µ2 = 0, H1 : µ1 − µ2 ≠ 0
4 Linear Regression and Correlation
4.1 Introduction
• In many real world problems there are two or more variables that are related, and it is
important to explore this relationship.
• For example, in an industrial situation it is known that the tar content in the outlet stream in a chemical process is related to the inlet temperature.
• Regression analysis provides a procedure for estimating the tar content for various levels of the inlet temperature.
• Applications of regression are numerous and occur in almost every field, including:
– engineering
– physical sciences
– economics and resource management
– life and biological sciences
• The relationship between the response y and the regressor x is a straight line:
\[
y = \beta_0 + \beta_1 x + \varepsilon \tag{27}
\]
• Errors are assumed to be normally distributed with zero mean and an unknown but constant variance.
• Errors are also assumed to be uncorrelated, implying that the value of one error does not
depend on the value of any other error.
• If the range of data on x includes x = 0, then the y− intercept β0 is the mean of the
distribution of the response y when x = 0
• However, if the range of values for x does not include zero, then β0 has no practical
interpretation.
4.2 Least squares estimation of β0 and β1
• Parameters β0 and β1 are unknown and must be estimated using sample data
• These data may result either from a controlled experiment designed specifically to collect the data, or from existing records.
• We need to estimate β0 and β1 so that the sum of squares of the differences between the
observations yi and the straight line is at minimum.
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, 3, \cdots, n. \tag{28}
\]
• Equation (27) can be viewed as a population regression model, while equation (28) is a simple regression model written in terms of the n pairs of data $(y_i, x_i)$, $i = 1, 2, \cdots, n$.
\[
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,
\]
such that each pair of observations satisfies the relation
\[
y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i \tag{29}
\]
where $e_i = y_i - \hat{y}_i$ is called a residual and describes the error in the fit of the model at the ith data point
• We need to find $\hat{\beta}_0$ and $\hat{\beta}_1$ so as to minimize the Residual Sum of Squares (RSS).
\[
RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \tag{30}
\]
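• Taking partial derivatives of (30) with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ gives
\[
\frac{\partial RSS}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \tag{31}
\]
\[
\frac{\partial RSS}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)x_i, \tag{32}
\]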
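• Setting the partial derivatives to zero and rearranging the terms we obtain
\[
n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{33}
\]
\[
\hat{\beta}_0\sum_{i=1}^{n} x_i + \hat{\beta}_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \tag{34}
\]
which yields (make $\hat{\beta}_0$ the subject of (33) and substitute in (34))
\[
\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
= \frac{S_{xy}}{S_{xx}} \tag{35}
\]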
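• where
\[
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \tag{36}
\]
\[
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \tag{37}
\]
\[
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} \tag{38}
\]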
• Often the problem of analysing the quality of the fitted regression line is handled through
an ANOVA approach.
• The analysis of variance approach in a simple regression model test the hypothesis:
Source       df     SS          MS                  F
Regression   1      Regr SS     Regr SS             Regr SS / s²
Residual     n−2    SSE         s² = SSE/(n−2)
Total        n−1    SS(Total)
• Mathematically it is given by
\[
r^2 = \frac{\text{Regression SS}}{S_{yy}} = 1 - \frac{\text{residual SS}}{S_{yy}} \tag{39}
\]
• Syy is a measure of the variability in the response y without considering the effect of the regressor x, and the residual SS is a measure of the variability in y remaining after x has been considered.
• r² is the proportion of the variation in the response y accounted for by the regressor x.
• Values of r² close to 1 imply that the model explains most of the variation in y.
• where $t_{\alpha/2,\,n-2}$ is the value of the t-distribution with (n − 2) degrees of freedom, and $s^2$ is the residual mean square from the ANOVA table.
• The equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ may be used to predict the mean response y|x0 at x = x0.
• Where x0 is not necessarily one of the pre-chosen values.
Temperature (x) 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Converted sugar (y) 8.1 7.8 8.5 9.8 9.5 8.9 8.6 10.2 9.3 9.2 10.5
(a) Fit a simple linear regression model to the data.
(b) Carry out an analysis of variance (ANOVA) to test at the 5% level of significance whether the slope is significantly different from zero. From the ANOVA table, compute the coefficient of determination, r², and interpret it.
(c) Predict the amount of converted sugar when the coded temperature is 1.75. Find a 95%
prediction interval for this prediction.
Solution
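\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16.5}{11} = 1.5, \tag{42}
\]
\[
\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{100.4}{11} = 9.13, \tag{43}
\]
\[
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 1.1, \tag{44}
\]
\[
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 7.20, \tag{45}
\]
\[
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 1.99.
\]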
(b) For the ANOVA table we need to compute the following
• H0 : β1 = 0
• H1 : β1 ≠ 0
Source df SS MS F
Regression 1 3.60 3.60 9.0
Residual 9 3.60 0.40
Total 10 7.20
\[
r^2 = \frac{\text{Regression SS}}{S_{yy}} = \frac{3.60}{7.20} = 0.50
\]
Comment: This implies that 50% of the variation in converted sugar (y) is explained by
the temperature (x) and the remainder is unaccounted for by our regression model.
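The least squares fit and the ANOVA quantities for the converted-sugar data can be reproduced with the sketch below. It applies equations (35) to (39) directly; the variable names are illustrative, and the regression sum of squares is computed with the standard identity Regression SS = β̂₁·Sxy, which is an assumption in the sense that the notes do not spell it out.

```python
x = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]   # coded temperature
y = [8.1, 7.8, 8.5, 9.8, 9.5, 8.9, 8.6, 10.2, 9.3, 9.2, 10.5]  # converted sugar
n = len(x)

# Corrected sums of squares and cross-products, equations (36) to (38).
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Least squares estimates: slope from (35), intercept from (33).
beta1 = s_xy / s_xx
beta0 = sum(y) / n - beta1 * sum(x) / n

# ANOVA decomposition of the total sum of squares S_yy.
regression_ss = beta1 * s_xy
residual_ss = s_yy - regression_ss
residual_ms = residual_ss / (n - 2)   # s^2 in the ANOVA table
f_ratio = regression_ss / residual_ms
r_squared = regression_ss / s_yy      # equation (39)

print(round(beta0, 3), round(beta1, 3))
print(round(regression_ss, 2), round(residual_ms, 2), round(f_ratio, 1), round(r_squared, 2))
```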
9.58 − 2.26(0.6776) < y0 < 9.58 + 2.26(0.6776)
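The prediction interval at x0 = 1.75 can be sketched from the summary quantities of the solution. The standard error below uses the usual prediction-interval form s·sqrt(1 + 1/n + (x0 − x̄)²/Sxx), which is an assumption here since the formula itself is not reproduced in this part of the notes; the t value 2.26 and s² = 0.40 are taken from the worked solution.

```python
from math import sqrt

n = 11
x_bar = 16.5 / 11
y_bar = 100.4 / 11
s_xx = 1.1
s_xy = 1.99
s2 = 0.40        # residual mean square from the ANOVA table
t_value = 2.26   # t point with n - 2 = 9 df, as used in the solution
x0 = 1.75

beta1 = s_xy / s_xx
y_hat0 = y_bar + beta1 * (x0 - x_bar)  # same as beta0 + beta1 * x0

# Standard error of a new observation at x0 (assumed standard form).
se_pred = sqrt(s2 * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / s_xx))

lower = y_hat0 - t_value * se_pred
upper = y_hat0 + t_value * se_pred

print(round(y_hat0, 2), round(se_pred, 4), round(lower, 2), round(upper, 2))
```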
1. Verify that the function f is a probability density function(p.d.f) over the given interval
2. Find the constant k so that the function f is a probability density function over the given
interval
4. Let X denote the reaction time, in seconds, to a certain stimulus and Y denote the temperature (°F) at which a certain reaction starts to take place. Suppose that two random variables X and Y have the joint density
\[
f(x,y) =
\begin{cases}
k(2x+y), & 2 < x < 6,\ 0 < y < 5\\
0, & \text{otherwise}
\end{cases}
\]
(a) Find k,
(b) P (X > 3, Y > 2)
(c) P (X + Y < 4)
5. Let X and Y denote the lengths of life, in years, of two components in an electronic system. If the joint density function of these variables is
\[
f(x,y) =
\begin{cases}
e^{-(x+y)}, & x > 0,\ y > 0\\
0, & \text{otherwise}
\end{cases}
\]
7. If X and Y are random variables (r.v.) with a joint probability distribution such that var[X] = 1.5, var[Y] = 2 and cov(X, Y) = −1, find
(a) What is the probability that a sample’s strength is less than 6250 Kg/cm²?
(b) What is the probability that a sample’s strength is between 5800 and 5900 Kg/cm2 ?
(c) What strength is exceeded by 95% of the sample?
9. The diameter of holes for cable harness is known to have a standard deviation of 0.01
cm. A random sample of size 30 yields an average diameter of 1.5045 cm. Use α = 0.01,
to test the hypothesis that the true mean hole diameter is 1.50 cm.
10. A contractor makes a large purchase of cement. The bags of cement are supposed to
weigh 94 kg. The contractor decided to test a sample of bags to see if he is getting
stipulated weight. A random sample of size 9 yielded the following weights.
11. Iron ore is extracted from rocks obtained from two different sites A and B. The observa-
tions are of percentage of iron ore per sample:
Assume normal populations, $N(\mu_A, \sigma_A^2)$ and $N(\mu_B, \sigma_B^2)$ respectively.
12. The average fuel consumption for 10 small cars before and after a certain additive sub-
stance was introduced into their fuel was observed and the data obtained were recorded
as follows:
After : 47 38 44 48 52 55 44 52 60 44
Before : 40 39 32 33 40 27 36 56 50 40
Suppose that the differences in fuel consumption are normally distributed with mean $\mu_D$ and variance $\sigma_D^2$.
(a) At 5% significance level, is there sufficient evidence to conclude that the additive
substance increases fuel consumption in each vehicle?
(b) Construct a 90% confidence interval for µD .
13. A market survey was conducted for the purpose of forming a demographic profile of people who would like to own an electronic engineering company. The data collected are presented below:
Is there sufficient evidence to conclude that the desire to own an electronic engineering
company is related to gender? (Test using α = 0.05).
14. Voltage output (y) and engine speed (x), in metres per second, were observed for a turbine at a hydroelectric station and recorded as follows:
15. Explain clearly, using the following data, how the following are constructed:
23 29 40 28 15 22 46 39 22 17 26 33 35 49 20,
36 25 15 31 17 43 54 36 30 30 40 27 24 20 28 42
22 37 17 39 17 22 9 26 29
16. An experiment was conducted to compare the speeds of the word-processing packages
of two brands of minicomputers A and B. Forty people with similar backgrounds were
randomly selected and divided into two groups. One group was assigned minicomputer
A and another to minicomputer B. Each person was asked to perform the same word
processing job and the length of time it takes for each person to complete the job was
recorded. Past experience has shown that the populations associated with both minicomputers A and B are normally distributed. The times required by the group using minicomputer A had a mean of 14.8 minutes and a variance of 3.9 minutes². For the group using minicomputer B, the mean length of time to complete the task was 12.3 minutes and the variance was 4.3 minutes².
Is there sufficient evidence to conclude that the mean length of time required to complete
a word-processing task using minicomputer A is less than that of minicomputer B? (Use
α = 0.01).
Table I
Areas Under the Normal Curve
[Table body (cumulative areas by z, columns 0.00 to 0.09) is not legible in the scanned source.]
Upper-tail area α and critical value z_α, as recoverable from the table margin:
α          z_α
0.10       1.2816
0.05       1.6449
0.025      1.9600
0.010      2.3263
0.005      2.5758
0.001      3.0902
0.0005     3.2905
0.0001     3.7190
0.00005    3.8906
0.00001    4.2649
Table: Critical values of Student's t distribution
The entries in this table are the critical values for Student's t for an area of α in the right-hand tail. Critical values for the left-hand tail are found by symmetry.
[Table body (rows: degrees of freedom; columns: amount of α in one tail) is not legible in the scanned source.]