
Statistics and Probability

Solved Assignments
Semester Spring 2010

Assignment 1
Question 1: (Marks: 2+2+2+4=10)

(a) Answer the following:

• For a series, mean is 5 and mode is 2, find median of the series

Given that mean = 5 and mode = 2, we find the median by using the empirical relationship among the three measures, i.e.

$\text{mode} = 3\,\text{median} - 2\,\text{mean}$

so that

$\text{median} = \frac{1}{3}(\text{mode} + 2\,\text{mean}) = \frac{1}{3}(2 + 2 \times 5) = \frac{1}{3}(12) = 4$

• What is the aim of collecting numerical data for a statistical study?

The main purpose of a statistical study is to make inferences about a population on the basis of
sample data. To get descriptive information from a sample, we need data, and the collection of
numerical data provides the basis for the analysis that the further steps rely on.

• Write down the functions of statistics.

1. Statistics assists in summarizing the larger set of data.


2. Statistics assists in the efficient design of laboratory and field experiments as well as
surveys.
3. Statistics assists in a sound and effective planning in any field of inquiry.
(b)

A paint retailer has had numerous complaints from customers about under-filled paint cans. As a
result, the retailer started to inspect the incoming shipments. A recent shipment contained 2,440
gallon-size cans. The retailer sampled 50 cans and weighed each on a scale capable of
measuring weight up to four decimal places. Properly filled cans weigh 10 pounds.

Now for this problem

1. Describe a population
2. Describe a variable of interest
3. Describe the data type of variable
4. Describe a sample
Sol:

Reading the question statement, we know that

a) The population is the set of units of interests to the retailer, which is the shipment of
2,440 cans of paint.
b) The weight of paint cans is the variable, the retailer wishes to evaluate.
c) In this case retailer has to measure the weight, and the weight is continuous quantitative
variable.
d) The sample is the subset of population. In this case, it is the 50 cans of paint selected by
the retailer.

Question 2: Marks: 2+2+6=10

(a) How is the collection of data performed with the help of enumerators?

Under this method, the information is gathered by employing trained enumerators who assist the
informants in making the entries in the schedules or questionnaires correctly. This method gives the
most reliable information if the enumerator is well-trained, experienced and tactful.

(b) Average height of the students in a school is 5.2 inches. A sample of 12 students showed the
following heights in inches.

5.0, 5.3, 5.2, 4.9, 4.11, 5.0, 5.5, 5.4, 5.1, 5.0, 5.2, 4.10

Calculate the sampling error.

Sol:
As $\mu = 5.2$, and the sample mean of the data is

$\bar{x} = \frac{\sum x}{n} = \frac{59.81}{12} = 4.98$

Sampling error $= \bar{x} - \mu = 4.98 - 5.2 = -0.22$

(c) Find the missing frequencies and complete the following table.

x     f     C.f    Relative cumulative frequency
2                  2/15
4     1
6           7
8     3
10          15     1

For the first class, the relative cumulative frequency equals the relative frequency, and

relative frequency = class frequency/total = 2/15

So the first class has frequency 2. Since the first cumulative frequency equals the first class
frequency, the first cumulative frequency is also 2.

Adding 1 to 2 gives 3, which is the second cumulative frequency.

The difference between 7 and 3 is 4, so 4 is the 3rd class frequency.

Adding 3 to 7 gives 10, which is the 4th cumulative frequency.

The last cumulative frequency is the total of all the frequencies, so the difference between 15 and
10 gives 5, which is the last class frequency.

Dividing each cumulative frequency by the total (15) gives the relative cumulative frequencies.

x     f     c.f    Relative cumulative frequency
2     2     2      2/15
4     1     3      3/15
6     4     7      7/15
8     3     10     10/15
10    5     15     15/15

Question 3: Marks: 2+8=10

a) Can we find out the Median from the following data? If yes, write the reason (No need to
calculate the median).

Wages of workers in a factory

Monthly Income (Rs.) NO. of Workers

Less than 2000/- 100

2000-2999/- 300

3000-3999/- 250

4000-4999/- 50

5000 & above 1200

Sol:

Yes, we can find the median from these data, because the median is the most appropriate measure of
average when the data are given in open-ended class intervals.

(b) Compute Mean, Median and Mode from the following data.

No. of students 1 2 3 5 6

f 15 10 5 15 5
Sol:

No. of students(x) f fx c.f

1 15 15 15

2 10 20 25

3 5 15 30

5 15 75 45

6 5 30 50

Total 50 155

Mean: $\bar{X} = \frac{\sum fx}{\sum f} = \frac{155}{50} = 3.1$

Since n/2 = 50/2 = 25 is an integer, the median is the average of the (n/2)th value and the
{(n+2)/2}th value:

$\left(\frac{n}{2}\right)\text{th value} = \left(\frac{50}{2}\right)\text{th value} = 25\text{th value}$

and

$\left(\frac{n+2}{2}\right)\text{th value} = \left(\frac{50+2}{2}\right)\text{th value} = \left(\frac{52}{2}\right)\text{th value} = 26\text{th value}$

Checking the cumulative frequency column, we find that the 25th and the 26th values correspond
to 2 and 3 respectively. So

Median = (2+3)/2 = 2.5

Mode

As the data are discrete, the mode is the value that occurs the maximum number of times in the
data set. Here we have two modes, 1 and 5, as both occur an equal number of times in the data
set, i.e. 15 times.
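
The three results above can be cross-checked with a short script. This is only a sketch: the dictionary name `freq` and the approach of expanding the frequency table into raw observations are illustrative choices, not part of the assignment.

```python
from statistics import multimode

# Frequency table from the question: value -> frequency
freq = {1: 15, 2: 10, 3: 5, 5: 15, 6: 5}

# Expand the table into the sorted list of 50 raw observations
data = [x for x, f in sorted(freq.items()) for _ in range(f)]

n = len(data)
mean = sum(data) / n

# n is even, so average the (n/2)th and (n/2 + 1)th values (1-based positions)
median = (data[n // 2 - 1] + data[n // 2]) / 2

# multimode returns every value that attains the highest frequency
modes = multimode(data)

print(mean, median, modes)
```

The script reports a mean of 3.1, a median of 2.5 and the two modes 1 and 5, in agreement with the hand calculation.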
Assignment 2
Question 1: (Marks: 4x2=8)

Give the answer of short questions.

a) Why is the quartile deviation better than the range?

The range is only the difference between the minimum and maximum values. It gives no information
about the distribution between the two ends of the series, and it is affected by outliers (highly
extreme values). Hence it can give a misleading picture of the observations.

The quartile deviation is superior to range as it is not affected by extremely large or small
observations. It covers the central 50% of values. It is also used in situations where extreme
observations are thought to be unrepresentative.

b) How is the standard deviation better than the mean absolute deviation?

Both are used to measure the dispersion of the data set and involve each and every data value in
their computation. But in the mean deviation, by using absolute values we neglect the fact that
some deviations are negative and some are positive. This introduces a kind of artificiality into the
mean deviation, and because of that the further theoretical development or application of the
concept is impossible.

This problem is overcome by computing the standard deviation: we square the deviations rather than
taking their absolute values.

That is why the standard deviation is the much preferred and most widely used measure of dispersion.

c) What is the limitation of Chebyshev's Theorem?

A limitation of Chebyshev's theorem is that it gives no information at all about the
probability of observing a value within one standard deviation of the mean, that is, when the
value of the constant "k" is one (the bound 1 − 1/k² is then zero). Although a large share of the
data falls within µ ± σ, this cannot be explained by the theorem.
d) If coefficient of skewness = 0, then what would you say about the skewness of the
distribution?

If the coefficient of skewness = 0, then the distribution is symmetrical. That is, the mean, median
and mode of the distribution are equal.

Question 2: (Marks: 4+8=12)

a) Show that the range is greatly affected by the extreme values; interpret the result.

996 999 9 997 995 1000 1014 1002 1001

Solution:

Given that

996 999 9 997 995 1000 1014 1002 1001

Then

Range=Xm-X0

=1014-9

=1005

Interpretation:

Observing the values closely, we find that value ‘9’ is significantly smaller than the rest of
values in the data set. And since range depends on this value too, this single value has caused the
range of the data set to be wider and it is presenting a misleading picture about the whole data.

b) The mean and the standard deviation of a set of values are 50 and 10 respectively. Compute
$\bar{X} \pm 2S$ and $\bar{X} \pm 3S$. Interpret the results in the light of (i) the empirical rule
(ii) Chebyshev's inequality.
Solution:

From the given information

$\bar{X} \pm 2S = 50 \pm 2(10) = (30, 70)$

$\bar{X} \pm 3S = 50 \pm 3(10) = (20, 80)$

(i) Empirical Rule:


• According to the empirical rule, in a normal distribution the interval $\bar{X} \pm 2S$ contains
95.45% of the values. So here we can say that 95.45% of the data lie in the interval (30, 70).

• According to the empirical rule, in a normal distribution the interval $\bar{X} \pm 3S$ contains
99.73% of the values. So we can say that 99.73% of the values lie within the interval (20, 80).

(ii) Chebyshev's inequality:

• According to Chebyshev's inequality, the interval $\bar{X} \pm 2S$ contains at least
$\left(1 - \frac{1}{k^2}\right) = \left(1 - \frac{1}{2^2}\right) = \frac{3}{4} = 75\%$ of the observations.
So we can say that, by this rule, at least 75% of the values of the given data lie in the interval (30, 70).

• According to Chebyshev's inequality, the interval $\bar{X} \pm 3S$ contains at least
$\left(1 - \frac{1}{k^2}\right) = \left(1 - \frac{1}{3^2}\right) = \frac{8}{9} = 88.89\%$ of the observations.
So we can say that, by this rule, at least 88.89% of the values of the given data lie in the interval (20, 80).
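
The Chebyshev bounds used above can be reproduced with a few lines of code. This is a minimal sketch; the function name `chebyshev_lower_bound` is a hypothetical helper introduced only for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

mean, sd = 50, 10  # values given in the question

for k in (2, 3):
    interval = (mean - k * sd, mean + k * sd)
    percent = round(100 * chebyshev_lower_bound(k), 2)
    print(f"k={k}: interval {interval}, at least {percent}% of the values")
```

Unlike the empirical rule, this bound makes no assumption about the shape of the distribution, which is why its percentages are lower.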

Question 3: (Marks: 5+5=10)

a. Find the first two moments about mean from the following data.

X= 34, 70, 42, 54, 40, 68, 56, 38, 36, 72

Solution:

To find the moments about mean we have to find the mean of the data.
$X$       $X - \bar{X}$    $(X - \bar{X})^2$
34        -17              289
36        -15              225
38        -13              169
40        -11              121
42        -9               81
54        3                9
56        5                25
68        17               289
70        19               361
72        21               441
Total     0                2010

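Mean:

$\bar{X} = \frac{\sum X}{n} = \frac{510}{10} = 51$

The first moment about the mean is given by

$m_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$

The second moment about the mean is given by

$m_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{2010}{10} = 201$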
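
As a check on the table, the moments about the mean can be computed directly from the data. The helper `central_moment` below is a hypothetical name used for this sketch only.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)**r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

X = [34, 70, 42, 54, 40, 68, 56, 38, 36, 72]

m1 = central_moment(X, 1)  # always zero for the first moment about the mean
m2 = central_moment(X, 2)  # population variance of the data

print(m1, m2)
```

The first moment about the mean is zero for any data set, so only the second moment carries information here.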

b) Calculate Bowley’s coefficient of skewness from the following information.


$Q_1 = 34.087156$

$Q_3 = 44.962963$

Median $\tilde{X} = 39.606382$

Solution:

Bowley's coefficient of skewness:

$S_k = \frac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1}$

$S_k = \frac{34.087156 + 44.962963 - 2(39.606382)}{44.962963 - 34.087156}$

$S_k = \frac{-0.162645}{10.875807}$

$S_k = -0.014954752$
Assignment 3
Question 1: Marks: 3+3+4=10

a) For a particular data set with five pairs of values:

$\sum Y^2 = 26, \quad \sum Y = 10, \quad \sum XY = 37$

The fitted line is y = -1 + 0.5x

Find the standard error of estimate ($s_{yx}$)

Solution:

$s_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}}$

$= \sqrt{\frac{26 - (-1)(10) - (0.5)(37)}{5-2}}$

$= \sqrt{\frac{26 + 10 - 18.5}{3}}$

$= \sqrt{\frac{17.5}{3}} = \sqrt{5.833} = 2.415$
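
The same computation can be sketched in code. The function name and its parameter names are assumptions made for illustration; `a` and `b` are the intercept and slope of the fitted line y = a + bx.

```python
from math import sqrt

def standard_error_of_estimate(sum_y2, sum_y, sum_xy, a, b, n):
    """s_yx = sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2)) for the line y = a + b*x."""
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Sums and fitted coefficients taken from the question
s_yx = standard_error_of_estimate(sum_y2=26, sum_y=10, sum_xy=37, a=-1, b=0.5, n=5)

print(round(s_yx, 3))
```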

b) Two equations of the least square regression lines are given by

Y= 2.64 + 10.83 X
And
X= -1.91 + 6.18 Y
Are these lines possible for any data set? Explain your answer:

Solution:

These lines are possible only if the correlation coefficient "r", whose magnitude is the square
root of the product of the two regression slopes, lies between -1 and +1. The correlation
coefficient "r" in this case is given below.

$r = \sqrt{b_{yx} \times b_{xy}}$

$r = \sqrt{10.83 \times 6.18}$

$r = \sqrt{66.93} = 8.18 > 1$

So these lines are not possible for any data set.

c) Two dice are rolled. Construct the sample space and find the probability that

i. The sum of the outcomes is equal to 10.


ii. The sum of the outcomes is equal to 7.
iii. The sum of the outcomes is equal to 1.
Solution:
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
n(S) = 36

Let A be the event that the sum of the outcomes is equal to 10.

A = {(4, 6), (5, 5), (6, 4)}

$P(A) = \frac{n(A)}{n(S)} = \frac{3}{36} = 0.0833$

Let B be the event that the sum of the outcomes is equal to 7.

B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

$P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = 0.167$

Let C be the event that the sum of the outcomes is equal to 1.

C = φ (the empty set, since the smallest possible sum is 2)

$P(C) = \frac{n(C)}{n(S)} = \frac{0}{36} = 0$
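
The sample space and the three probabilities can also be obtained by enumeration. This is a small sketch in which `prob_sum` is a hypothetical helper, and the 36 outcomes are assumed to be equally likely (fair dice).

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

def prob_sum(target):
    """Probability that the two dice add up to `target`, assuming fair dice."""
    favourable = [pair for pair in sample_space if sum(pair) == target]
    return Fraction(len(favourable), len(sample_space))

for target in (10, 7, 1):
    print(target, prob_sum(target))
```

Using `Fraction` keeps the answers exact: 1/12 for a sum of 10, 1/6 for a sum of 7 and 0 for a sum of 1.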

Question 2: Marks: 4+6=10
a) If S= {1, 2, 3, 4, 5, 6}, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, then verify whether A and B
are independent?

Solution:

AS S= {1, 2, 3, 4, 5, 6}, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, then

For independent events

P ( A ∩ B ) = P ( A) × P ( B )

So we will check this condition

A ∩ B = {3, 4}
P (A ∩ B) = 2/6
P (A) = 4/6
P (B) = 4/6
Since,
P (A) x P (B) = 4/6 x 4/6
P (A) x P (B) = 4/9
P (A) x P (B) ≠ P (A ∩ B)
Hence A and B are not independent.
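
The independence condition can be checked mechanically with set operations. The sketch below assumes, as the solution does, that the six outcomes in S are equally likely; `prob` is a hypothetical helper.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

joint = prob(A & B)            # P(A ∩ B)
product_ab = prob(A) * prob(B)  # P(A) × P(B)
independent = joint == product_ab

print(joint, product_ab, independent)
```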
b) Indicate whether the following statement is true or false for three mutually exclusive
events A, B and C. Justify your answer.

$P(A) = \frac{1}{6}, \quad \frac{2}{3} \times P(B) = \frac{1}{6} \quad \text{and} \quad \frac{1}{4} \times P(C) = \frac{1}{6}$

Solution:

Given that

$P(A) = \frac{1}{6}$

And

$\frac{2}{3} P(B) = \frac{1}{6} \Rightarrow P(B) = \frac{1}{6} \times \frac{3}{2} = \frac{3}{12}$

Now

$\frac{1}{4} P(C) = \frac{1}{6} \Rightarrow P(C) = \frac{1}{6} \times \frac{4}{1} = \frac{4}{6}$

For three mutually exclusive events, the sum of their probabilities cannot exceed one. Here

$P(A) + P(B) + P(C) = \frac{1}{6} + \frac{3}{12} + \frac{4}{6} = \frac{13}{12} > 1$

Hence we can say that the given statement is not true.


Question 3: Marks: 2+8=10

a) Suppose we draw a card from an ordinary deck of 52 playing cards. Can king and diamond be
mutually exclusive events? Give a reason to support your answer.
Solution: The two events cannot be mutually exclusive, because a card drawn from an
ordinary deck of 52 playing cards can be both a king and a diamond (the king of diamonds). So
they are not mutually exclusive events.

b) A marble is drawn at random from a box containing 10 red, 30 white, 20 blue and 15
orange marbles.
Find the probability that the drawn marble is

i. orange or red
ii. not – ‘red or blue’
iii. not blue
iv. red, white or blue.

Solution:
Red marbles    White marbles    Blue marbles    Orange marbles    Total
10             30               20              15                75

Total number of possible ways to draw a marble = $\binom{75}{1} = 75$

i. P(marble is orange or red) = $\frac{15 + 10}{75} = \frac{25}{75} = \frac{1}{3} = 0.33$

ii. P(marble is not 'red or blue') = $\frac{30 + 15}{75} = \frac{45}{75} = \frac{3}{5} = 0.60$

iii. P(marble is not blue) = $\frac{10 + 30 + 15}{75} = \frac{55}{75} = \frac{11}{15} = 0.73$

iv. P(marble is red, white or blue) = $\frac{10 + 30 + 20}{75} = \frac{60}{75} = \frac{4}{5} = 0.80$
Assignment 4
Question 1: Marks: 3+7=10

a)

Find mean from the following probability distribution.

No. of Petals (X)    P(X)
x1 = 3               0.05
x2 = 4               0.10
x3 = 5               0.20
x4 = 6               0.30
x5 = 7               0.25
x6 = 8               0.075
x7 = 9               0.025
Total                1

Sol:

No. of Petals (X)    P(X)     XP(X)
x1 = 3               0.05     0.15
x2 = 4               0.10     0.4
x3 = 5               0.20     1
x4 = 6               0.30     1.8
x5 = 7               0.25     1.75
x6 = 8               0.075    0.6
x7 = 9               0.025    0.225
Total                1        5.925

The mean of this distribution is:

µ = E(X) = ∑XP(X) = 5.925 ≅ 5.9.
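
The expected value is simply the probability-weighted sum of the values, which a few lines of code make explicit. The variable names below are illustrative only.

```python
# Values of X (number of petals) and their probabilities from the table
petals = [3, 4, 5, 6, 7, 8, 9]
probs = [0.05, 0.10, 0.20, 0.30, 0.25, 0.075, 0.025]

# A valid probability distribution must sum to 1 (up to floating-point error)
total_probability = sum(probs)

# E(X) = sum of x * P(x)
mean = sum(x * p for x, p in zip(petals, probs))

print(round(total_probability, 6), round(mean, 3))
```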


b) A random variable X has the following probability distribution:

X P(X)
-2 0.1
-1 k
0 0.2
1 2k
2 0.3
3 3k

Find
(i) k (ii) P(X<2) (iii) P(X≥2).

Sol

X        P(X)         P(X) with k = 0.0667
-2       0.1          0.1000
-1       k            0.0667
0        0.2          0.2000
1        2k           0.1333
2        0.3          0.3000
3        3k           0.2000
Total    0.6 + 6k     1.0000

$\sum P(X) = 0.6 + 6k$

As $\sum P(X) = 1$,

$6k = 1 - 0.6 = 0.4$

$k = 0.4 / 6 = 0.0667$

(ii) P (X<2) = P(X=-2) + P (X=-1) + P (X=0) + P (X=1)


P(X<2) = 0.100 + 0.0667 + 0.2 + 0.1333
P(X<2 ) = 0.5000
(iii) P(X≥2) = 1- P (X<2) = 1 - 0.5 = 0.5
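
The value of k and the two probabilities can be verified exactly with rational arithmetic. In this sketch each probability is stored as a pair (constant, coefficient of k); that representation is an illustrative choice, not part of the original solution.

```python
from fractions import Fraction as F

# P(X = x) written as constant + coefficient * k
dist = {
    -2: (F(1, 10), 0),
    -1: (0, 1),
    0: (F(2, 10), 0),
    1: (0, 2),
    2: (F(3, 10), 0),
    3: (0, 3),
}

const_total = sum(c for c, _ in dist.values())  # 0.6
k_coeff = sum(m for _, m in dist.values())      # 6

# Solve const_total + k_coeff * k = 1 for k
k = (1 - const_total) / k_coeff

# Substitute k back to get the numeric distribution
p = {x: c + m * k for x, (c, m) in dist.items()}

p_less_than_2 = sum(prob for x, prob in p.items() if x < 2)
p_at_least_2 = 1 - p_less_than_2

print(k, p_less_than_2, p_at_least_2)
```

The exact value is k = 1/15, which rounds to the 0.0667 used in the table.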

Question 2: Marks: 2+2+6=10

a) If E(x) =4, and E(y) =1, then Find E (2x+5y).

Sol

E (2x+5y) = 2 E(x) + 5 E(y)

= 2 (4) + 5 (1)

=8+5

= 13

b) From the following table of x and y, find h(0).

Joint Probability Distribution

X \ Y               0       1       2       P(X = xi) = g(x)
0                   3/28    6/28    1/28
1                   9/28    6/28    0
2                   3/28    0       0
P(Y = yj) = h(y)
Sol
$h(0) = \sum_{x=0}^{2} f(x, 0)$

$h(0) = \frac{3}{28} + \frac{9}{28} + \frac{3}{28}$

$h(0) = \frac{15}{28} = 0.5357$

c) Let X and Y be two discrete r.v.'s with the following joint probability distribution:

y \ x    1       2
1        0.10    0.15
2        0.20    0.30
3        0.10    0.15

Find E(X), E(Y).

Sol

y \ x    1       2       h(y)
1        0.10    0.15    0.25
2        0.20    0.30    0.50
3        0.10    0.15    0.25
g(x)     0.40    0.60    1

$E(x) = \sum x\,g(x) = 1 \times 0.4 + 2 \times 0.60 = 1.6$

$E(y) = \sum y\,h(y) = 1 \times 0.25 + 2 \times 0.50 + 3 \times 0.25 = 2$

Question 3: Marks: 10

Let x and y have the joint probability distribution given by

$f(x, y) = \frac{xy}{66}, \quad x = 2, 4, 5; \; y = 1, 2, 3$

Find
(i) Joint probability distribution table
(ii) Marginal probability functions of X and Y
(iii) Are X and Y independent?

Solution:

(i)

Joint probability distribution table

x \ y    1       2        3
2        2/66    4/66     6/66
4        4/66    8/66     12/66
5        5/66    10/66    15/66

(ii)

Marginal probability function of x:

$g(x) = \sum_{y} f(x, y) = \sum_{y=1}^{3} \frac{xy}{66} = \frac{x}{66} + \frac{2x}{66} + \frac{3x}{66} = \frac{6x}{66} = \frac{x}{11}$ for x = 2, 4, 5

Marginal probability function of y:

$h(y) = \sum_{x} f(x, y) = \sum_{x \in \{2, 4, 5\}} \frac{xy}{66} = \frac{2y}{66} + \frac{4y}{66} + \frac{5y}{66} = \frac{11y}{66} = \frac{y}{6}$ for y = 1, 2, 3

(iii)

For independence, f(x, y) = g(x)·h(y)

Now $g(x) \cdot h(y) = \frac{x}{11} \times \frac{y}{6} = \frac{xy}{66} = f(x, y)$

So, x and y are independent.
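
The joint table, both marginals and the independence check can be reproduced with exact fractions. This is a sketch only; the dictionary names `f`, `g` and `h` mirror the notation of the solution.

```python
from fractions import Fraction

xs, ys = (2, 4, 5), (1, 2, 3)

# Joint probability function f(x, y) = xy / 66
f = {(x, y): Fraction(x * y, 66) for x in xs for y in ys}

# Marginals: sum the joint probabilities over the other variable
g = {x: sum(f[x, y] for y in ys) for x in xs}
h = {y: sum(f[x, y] for x in xs) for y in ys}

# X and Y are independent when f(x, y) = g(x) * h(y) in every cell
independent = all(f[x, y] == g[x] * h[y] for x in xs for y in ys)

print(g, h, independent)
```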


Assignment 5
Question 1: Marks: 2+3+5=10

a) When do we consider the Poisson distribution as the limiting form of the binomial distribution?

Solution:
The Poisson distribution is a limiting approximation to the binomial distribution when p, the
probability of success, is very small but n, the number of trials, is so large that the product
np = µ is of a moderate size.

b) The mean and standard deviation of the population are 30 and 5 respectively. The probability
distribution of the parent population is unknown. Find the mean and standard error of the
sampling distribution of $\bar{X}$ when n = 50.

Solution:
Given $\mu = 30$, $\sigma = 5$ and $n = 50$.

As we know that

$\mu_{\bar{X}} = \mu \Rightarrow \mu_{\bar{X}} = 30$

And the standard error is given by

$S.E.(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Putting values

$= \frac{5}{\sqrt{50}} = \frac{5}{7.07} = 0.707$

c) Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain
tomatoes and 5 contain corn. If 5 are selected at random, what is the probability that all contain
tomatoes? What is the probability that 3 or more contain tomatoes?

Solution:

The given data can be arranged as

Tomato cans    Total cans    Corn cans     Selected cans
K = 5          N = 10        N − K = 5     n = 5

Let X denote the number of tomato cans; then the hypergeometric distribution is given by

$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$

Probability that all contain tomatoes:

$P(X = 5) = \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{1}{252} = 0.00397$

Probability that 3 or more contain tomatoes:

$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) = \frac{\binom{5}{3}\binom{5}{2}}{\binom{10}{5}} + \frac{\binom{5}{4}\binom{5}{1}}{\binom{10}{5}} + \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}}$

$= \frac{100}{252} + \frac{25}{252} + \frac{1}{252} = \frac{126}{252} = \frac{1}{2}$
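
The hypergeometric probabilities above can be checked with the standard-library binomial coefficient. The function `hypergeom_pmf` is a hypothetical helper written for this sketch.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) when n items are drawn without replacement from N items, K of which are successes."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

N, K, n = 10, 5, 5  # total cans, tomato cans, cans selected

p_all_tomato = hypergeom_pmf(5, N, K, n)
p_three_or_more = sum(hypergeom_pmf(x, N, K, n) for x in (3, 4, 5))

print(p_all_tomato, float(p_all_tomato), p_three_or_more)
```

The value 1/2 for three or more tomato cans also follows from symmetry, since there are as many corn cans as tomato cans and 5 is an odd sample size.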

Question 2: Marks: 3+7=10

a) Define sampling with replacement and sampling without replacement.

Solution:
Sampling with replacement: Sampling is said to be with replacement when from a population a
sampling unit is drawn, observed and then returned to the population before another unit is
drawn.
In sampling with replacement, an element can be selected more than once.
Sampling without replacement: Sampling is said to be without replacement when from a
population a sampling unit is drawn and not returned to the population before another unit is
drawn.

In sampling without replacement an element can be selected only once.

b) A finite population consists of values 6, 6, 9, 15 and 18. Calculate the sample means for all
possible random samples of size n=3, that can be drawn from this population without
replacement. Make the sampling distribution of sample mean and find the mean and variance of
this distribution.

Solution:

Given data is

Population: 6, 6, 9, 15 and 18.

N=5, n=3

Number of possible samples (without replacement) = $\binom{N}{n} = \binom{5}{3} = 10$

Now the samples and their means are as below

No.    Sample       $\bar{x} = \sum x / n$
1      6,6,9        7
2      6,6,15       9
3      6,6,18       10
4      6,9,15       10
5      6,9,18       11
6      6,15,18      13
7      6,9,15       10
8      6,9,18       11
9      6,15,18      13
10     9,15,18      14

Now for the sampling distribution of $\bar{x}$

$\bar{x}$    f     $f(\bar{x})$    $\bar{x} f(\bar{x})$    $\bar{x}^2 f(\bar{x})$
7            1     1/10            7/10                    49/10
9            1     1/10            9/10                    81/10
10           3     3/10            30/10                   300/10
11           2     2/10            22/10                   242/10
13           2     2/10            26/10                   338/10
14           1     1/10            14/10                   196/10
Total        10    1               108/10                  1206/10

$\mu_{\bar{x}} = \sum \bar{x} f(\bar{x}) = 108/10 = 10.8$

$\sigma_{\bar{x}}^2 = \sum \bar{x}^2 f(\bar{x}) - \left(\sum \bar{x} f(\bar{x})\right)^2 = 1206/10 - (10.8)^2 = 3.96$
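
The whole sampling distribution can be generated by enumerating the samples. This is a minimal sketch; note that `itertools.combinations` treats the two 6s as distinct units, which matches the ten samples listed above.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [6, 6, 9, 15, 18]
n = 3

# All samples of size n drawn without replacement (order ignored)
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Frequency of each distinct sample mean
distribution = Counter(means)

mu = sum(means) / len(means)
variance = sum(m**2 for m in means) / len(means) - mu**2

print(len(samples), dict(distribution), float(mu), float(variance))
```

The mean of the sampling distribution, 10.8, equals the population mean 54/5, as expected for the sample mean.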

Question 3: Marks: 2+2+6=10

a) Find the value of maximum ordinate of the standard normal curve correct to four decimal
places.

Solution:

Since the standard normal probability density function is symmetric about zero, its maximum
ordinate is at Z = 0:

$f(0) = \frac{1}{\sqrt{2\pi}} e^{-(0)^2/2} = \frac{1}{2.507} = 0.3989$

b) If Z is a standard normal variable with mean 0 and variance 1, then find the Lower quartile.

Solution:

$P(Z < Q_1) = 0.25$

$\Phi(Q_1) = 0.25$

$Q_1 = \Phi^{-1}(0.25)$

Using the table

$Q_1 = -0.6745$

Alternatively, we can find it this way:

As we know that

$Q_1 = \mu - 0.6745\sigma$

Putting values

$Q_1 = 0 - 0.6745(1)$

$Q_1 = -0.6745$
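
Both the maximum ordinate from part (a) and the lower quartile can be checked without tables. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal variable

# The density is highest at the mean, z = 0
max_ordinate = Z.pdf(0)

# The lower quartile is the point with 25% of the area to its left
q1 = Z.inv_cdf(0.25)

print(round(max_ordinate, 4), round(q1, 4))
```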

c) Let $X_1, X_2, X_3$ be a random sample of size 3 from a population with mean $\mu$ and variance $\sigma^2$.

Consider the following two estimators of the mean

$T_1 = \frac{X_1 + X_2 + X_3}{3}$

$T_2 = \frac{X_1 + 2X_2 + X_3}{4}$

Which estimator should be preferred?


Solution:

First we examine which of T1 and T2 is unbiased. If only one of them is unbiased,
we prefer it as the better estimator. If both of them are unbiased, then we have to compare their
variances. The estimator with the smaller variance will be preferred.
So let us first check unbiasedness:

T1 is the sample mean $\bar{X}$, which we know is unbiased.

And for T2

$E(T_2) = E\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{4}(\mu + 2\mu + \mu) = \frac{4\mu}{4} = \mu$

So T2 is also unbiased.

Since both estimators are unbiased, we now have to compare their variances. The observations in a
random sample are independent, so the variances add:

$Var(T_1) = Var\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{9}\left[Var(X_1) + Var(X_2) + Var(X_3)\right] = \frac{1}{9}\left[\sigma^2 + \sigma^2 + \sigma^2\right] = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3}$

$Var(T_2) = Var\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{16}\left[Var(X_1) + 4Var(X_2) + Var(X_3)\right] = \frac{1}{16}\left[\sigma^2 + 4\sigma^2 + \sigma^2\right] = \frac{6\sigma^2}{16} = \frac{3\sigma^2}{8}$

Comparing both variances

$\frac{1}{3} < \frac{3}{8}$, so $Var(T_1) < Var(T_2)$

Hence we conclude that, since T1 is unbiased and has the smaller variance, the estimator T1 is
better than T2.
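
The comparison can be expressed in terms of the weights each estimator puts on the observations. For a linear estimator T = Σ wᵢXᵢ of independent observations with common mean µ and variance σ², E(T) = µ·Σwᵢ and Var(T) = σ²·Σwᵢ². The sketch below computes those two multipliers; `linear_estimator_summary` is a hypothetical helper.

```python
from fractions import Fraction as F

def linear_estimator_summary(weights):
    """Return (multiplier of mu in E(T), multiplier of sigma^2 in Var(T)) for T = sum(w_i * X_i)."""
    return sum(weights), sum(w**2 for w in weights)

t1 = linear_estimator_summary([F(1, 3), F(1, 3), F(1, 3)])
t2 = linear_estimator_summary([F(1, 4), F(2, 4), F(1, 4)])

# An estimator is unbiased for mu when its weights sum to 1
both_unbiased = t1[0] == 1 and t2[0] == 1

# Among unbiased estimators, prefer the one with the smaller variance multiplier
preferred = "T1" if t1[1] < t2[1] else "T2"

print(t1, t2, both_unbiased, preferred)
```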
Assignment 6
Question 1: Marks: 5x2=10

Give the answer of short questions.

1) How can we determine the three possible locations of rejection region?

Solution:
(i) If $H_1: \theta < \theta_0$, then the test is a left-tailed test, and the rejection region is
located in the left tail of the distribution.
(ii) If $H_1: \theta > \theta_0$, then the test is a right-tailed test, and the rejection region is
located in the right tail of the distribution.
(iii) If $H_1: \theta \ne \theta_0$, then the test is a two-tailed test, and the rejection region is
located equally in both tails of the distribution.
2) If α = 0.10, how many intervals would be expected to contain µ ?

Solution:

We would expect about 90% of all such confidence intervals to contain µ and 10% to miss µ , in
the repeated sampling.

3) What role does the sample mean play in a two-sided confidence interval for µ based
on a random sample from a normal distribution?

Solution:

The sample mean is the mid point of the confidence interval but has no effect on the length of the
confidence interval.

4) In which situation may we replace $\sigma^2$ by $S^2$?

Solution:

In the case of a large sample drawn from a population with unknown population
variance $\sigma^2$, we may replace $\sigma^2$ by $S^2$.

5) If an automobile is driven on the average no more than 16000 Km per year, then
formulate the null and alternative hypothesis.

Solution:

$H_0: \mu \le 16000$ km
$H_1: \mu > 16000$ km

Question 2: Marks: 2+2+6=10

a) The average yield of corn of variety A exceeds the average yield of variety B by at least
200 Kg per acre, formulate null and alternative hypothesis.

Solution:

$H_0: \mu_A - \mu_B \ge 200$ kg
$H_1: \mu_A - \mu_B < 200$ kg

b) When do we use a one-sided test and when a two-sided test?

Solution:

If the alternative hypothesis does not specify a direction (i.e. $H_1: \mu \ne \mu_0$), we use a
two-tailed test. If the alternative hypothesis specifies a direction ($H_1: \mu > \mu_0$ or
$H_1: \mu < \mu_0$), then we use a one-sided test.

c) In a poll of college students in a large university, 300 of 400 students living in students
residences (hostels) approved a certain course of action, whereas 200 of 300 students not
living in students’ residences approved it. Compute the 90% confidence interval for the
difference of proportions.

Solution:

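From the data, the sample proportions are:

$\hat{p}_1 = \frac{300}{400} = 0.75$

$\hat{p}_2 = \frac{200}{300} = 0.67$

90% C.I. for $p_1 - p_2$:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$

$(\hat{p}_1 - \hat{p}_2) \pm (1.645) \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$

or $0.08 \pm (1.645) \sqrt{\frac{(0.75)(0.25)}{400} + \frac{(0.67)(0.33)}{300}}$

or 0.08 ± (1.645)(0.0347)

or 0.08 ± 0.057

or 0.023 to 0.137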
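
The interval can be reproduced numerically. In this sketch `diff_proportion_ci` is a hypothetical helper, the critical value is taken from `statistics.NormalDist` rather than a table, and the second proportion is deliberately entered as the rounded 0.67 so that the result matches the hand calculation (using the unrounded 2/3 would shift the limits slightly).

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(p1, n1, p2, n2, confidence):
    """Large-sample confidence interval for p1 - p2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Rounded sample proportions as used in the worked solution
lower, upper = diff_proportion_ci(0.75, 400, 0.67, 300, confidence=0.90)

print(round(lower, 3), round(upper, 3))
```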

Question 3: Marks: 5+5=10

a) The Punjab Highway Department is studying the traffic pattern on the G.T. Road near
Lahore. As part of the study, the department needs to estimate the average number of
vehicles that pass the Ravi Bridge each day. A random sample of 65 days gives $\bar{x} = 5010$
and s = 650. Find the 90 percent confidence interval estimate for µ, the average number
of vehicles per day.

Solution:

$\bar{x} = 5010$, $s = 650$, $n = 65$ and $z_{0.05} = 1.645$.

$1 - \alpha = 0.90$

$\alpha = 0.10$

$\alpha/2 = 0.05$

$z_{\alpha/2} = z_{0.05} = 1.645$

The 90% confidence interval for µ is

$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$

$5010 \pm 1.645 \frac{650}{\sqrt{65}}$

or 5010 ± 132.62

or 4877.38 to 5142.62

or, rounding the above two figures to the nearest whole number, we have:

4877 to 5143
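
The arithmetic of this interval is easy to verify in code. The helper name `mean_ci` is an assumption made for this sketch, and the tabulated critical value 1.645 is passed in directly, as in the solution.

```python
from math import sqrt

def mean_ci(xbar, s, n, z):
    """Large-sample confidence interval for the mean: xbar ± z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = mean_ci(xbar=5010, s=650, n=65, z=1.645)

print(round(lower, 2), round(upper, 2))
```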

b) Mr. Ali wants to run election for City Government. After a strong election campaign, Mr.
Ali’s staff conducts their own poll over the weekend prior to the election. The results
show that for a random sample of 500 voters 290 will vote for Mr. Ali. Develop a 95
percent confidence interval for the population proportion who will vote for Mr. Ali
using α = 0.05 .

Solution:

From the data the sample proportion is 290/500 = 0.58.

The 95% confidence interval for p is:

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

$= 0.58 \pm 1.96 \sqrt{\frac{0.58(1 - 0.58)}{500}}$

= 0.58 ± 0.043

= (0.537, 0.623)

The end points of the confidence interval are 0.537 and 0.623. The lower end point of the
confidence interval is greater than 0.50. So we conclude, at the 95% confidence level, that the
proportion of voters in the population supporting Mr. Ali is greater than 50 percent. Based on the
polling results, he is expected to win the election.