Malaysia

PAHANG

FINAL EXAMINATION

COURSE : APPLIED STATISTICS

COURSE CODE :

LECTURER : NOR HAFIZAH BINTI MOSLIM
           AZLYNA BINTI SENAWI
           MOHD RASHID BIN AB HAMID
           NOR AZILA BINTI CHE MUSA
           NOOR FADHILAH BINTI AHMAD RADI

DATE : 5 JUNE 2012

DURATION : 3 HOURS

SESSION/SEMESTER :

PROGRAMME CODE :

INSTRUCTIONS TO CANDIDATES

1. This question paper consists of SEVEN (7) questions. Answer all questions.

2. All answers to a new question should start on a new page.

3. All calculations and assumptions must be clearly stated.

4. Candidates are not allowed to bring any material other than those allowed by the invigilator into the examination room.

EXAMINATION REQUIREMENTS:

1. Statistical Table

2. Scientific Calculator

DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO DO SO

This examination paper consists of FIFTEEN (15) printed pages including the front page.

CONFIDENTIAL

QUESTION 1

An article in the Journal of Strain Analysis compares the Karlsruhe and Lehigh methods for predicting the shear strengths of steel plate girders. Data for these two methods are shown in Table 1.

Girder   Karlsruhe Method   Lehigh Method
G1       1.186              1.067
G2       1.151              0.992
G3       1.322              1.063
G4       1.339              1.062
G5       1.200              1.062
G6       1.402              1.178
G7       1.365              1.037
G8       1.537              1.086
G9       1.559              1.052

(a) Find the mean and standard deviation for the difference of methods in Table 1.

(3 Marks)

(b) Find a 98% confidence interval for the mean difference in shear strengths between Karlsruhe and Lehigh methods.

(4 Marks)

(c) Is there any mean difference between the two methods? By assuming the data is normally distributed, test the hypothesis at 5% level of significance.

(7 Marks)
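The paired-sample calculations asked for in Question 1 can be sketched as follows. This is only an illustrative sketch: the sign convention d = Karlsruhe minus Lehigh is an assumption, and the critical values are hard-coded as they would be read from a t table with v = n - 1 = 8 degrees of freedom.

```python
import math
import statistics

# Predicted shear strengths from Table 1 (girders G1 to G9).
karlsruhe = [1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559]
lehigh = [1.067, 0.992, 1.063, 1.062, 1.062, 1.178, 1.037, 1.086, 1.052]

# Paired differences; the direction (Karlsruhe - Lehigh) is an assumption.
d = [k - l for k, l in zip(karlsruhe, lehigh)]
n = len(d)
d_bar = statistics.mean(d)     # mean difference
s_d = statistics.stdev(d)      # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)        # standard error of the mean difference

# Critical values taken from a t table with v = 8 (assumed table readings).
T_0_01_8 = 2.896    # two-sided 98% confidence interval
T_0_025_8 = 2.306   # two-sided test at the 5% level

ci_98 = (d_bar - T_0_01_8 * se, d_bar + T_0_01_8 * se)

# Test H0: mu_D = 0 against H1: mu_D != 0.
t_test = (d_bar - 0.0) / se
reject_h0 = abs(t_test) > T_0_025_8

print(f"mean difference = {d_bar:.4f}, s_D = {s_d:.4f}")
print(f"98% CI = ({ci_98[0]:.4f}, {ci_98[1]:.4f})")
print(f"t_test = {t_test:.3f}, reject H0: {reject_h0}")
```

Reversing the sign convention flips the sign of the mean difference, the interval endpoints and the test statistic, but not the decision of the two-sided test.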


QUESTION 2

The variability in the thickness of the oxide layers is a critical characteristic of semiconductor wafers. Low variability of the oxide thickness is desirable for subsequent processing steps. Two different mixtures of gases are being studied to determine whether one is superior in reducing the variability of the oxide thickness. Twenty-one wafers are etched in each gas. For gas A, the mean oxide thickness is 10.05 angstroms and the standard deviation is 1.96 angstroms, while for gas B the mean is 13.22 angstroms and the standard deviation is 2.13 angstroms.

(a) Find a 98% confidence interval for the population mean oxide thickness for gas mixture B and interpret the parameter estimate.

(5 Marks)

(b) Determine whether the two mixtures of gases differ in the variability of oxide layer thickness at the 0.1 level of significance.

(8 Marks)

(c) Can we conclude that the mean oxide thickness for gas A is not more than that for gas B at the 10% significance level? Base your assumption about the population variances on your answer in (b).

(8 Marks)
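The three parts of Question 2 can be sketched from the summary statistics alone. The sketch below is illustrative, not a model answer: the critical values are assumed table readings, and the pooled t statistic in part (c) is shown only because the F test in part (b) does not indicate unequal variances.

```python
import math

# Summary statistics given in Question 2 (21 wafers etched in each gas).
n_a, mean_a, s_a = 21, 10.05, 1.96
n_b, mean_b, s_b = 21, 13.22, 2.13

# (a) 98% confidence interval for the mean of gas B, v = n_b - 1 = 20.
T_0_01_20 = 2.528                      # assumed t-table reading
half_width = T_0_01_20 * s_b / math.sqrt(n_b)
ci_b = (mean_b - half_width, mean_b + half_width)

# (b) Two-sided F test for equal variances at alpha = 0.10.
# The larger sample variance is placed in the numerator, so only the
# upper critical value f_{0.05, 20, 20} is needed.
F_0_05_20_20 = 2.12                    # assumed F-table reading
f_test = s_b ** 2 / s_a ** 2
variances_differ = f_test > F_0_05_20_20

# (c) Pooled-variance t statistic (used when variances are not shown to differ).
s_p = math.sqrt(((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2))
t_test = (mean_a - mean_b) / (s_p * math.sqrt(1 / n_a + 1 / n_b))
v = n_a + n_b - 2

print(f"98% CI for gas B: ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"f_test = {f_test:.3f}, variances differ: {variances_differ}")
print(f"pooled s_p = {s_p:.4f}, t_test = {t_test:.3f}, v = {v}")
```

The direction of the one-sided hypothesis in part (c) has to be chosen by the candidate before comparing the statistic with the tabulated value, so the sketch stops at the test statistic.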

QUESTION 3

A researcher wishes to see whether there is any difference in the weight gains of athletes

following one of three special diets. Athletes are randomly assigned to three groups and

placed on three different diets for six weeks. The weight gains (kg) are shown below.

Type of diet

Diet

Diet

Diet

2

4

1

(a)

(1 Mark)

(b) Based on the data in Table 2, is there any difference in treatment effect among the types of diet at the 5% level of significance?

(16 Marks)
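The one-way ANOVA computation behind Question 3 can be sketched with the sum-of-squares shortcut formulas. The weight gains below are hypothetical values chosen only to keep the arithmetic easy to follow; they are not the data of Table 2, and the function name `one_way_anova` is likewise an illustration.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples."""
    k = len(groups)                                # number of treatments
    n_total = sum(len(g) for g in groups)          # total observations
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total        # x..^2 / N

    sst = sum(x * x for g in groups for x in g) - correction
    ss_tr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - ss_tr

    df_tr, df_e = k - 1, n_total - k
    ms_tr, mse = ss_tr / df_tr, sse / df_e
    return {
        "SST": sst, "SS(Tr)": ss_tr, "SSE": sse,
        "df_tr": df_tr, "df_e": df_e,
        "MS(Tr)": ms_tr, "MSE": mse, "F": ms_tr / mse,
    }


# Hypothetical weight gains (kg) for three diets, NOT the exam data.
hypothetical_gains = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
result = one_way_anova(hypothetical_gains)
print(result)
```

The resulting F statistic would be compared with the tabulated value for (k - 1, N - k) degrees of freedom at the 5% level.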

QUESTION 4

(a) What is the difference between simple linear regression and multiple linear regression?

(1 Mark)

(b)

n = 10, S_yy = 929.98 and S = 389.93.

(i)

value.

(4 Marks)

(ii) Complete the ANOVA table below and test the linearity of the regression line at the α = 0.05 significance level.

Source of variation   Sum of squares   Degrees of freedom   Mean of squares   F test
Regression
Residual
Total                 929.98

Table 3: ANOVA

(12 Marks)
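The way a regression ANOVA table is filled in from the summary quantities S_xx, S_yy and S_xy can be sketched as below. The numbers passed to the function are hypothetical round values, not the values of this question, and the function name `regression_anova` is an assumption made for the illustration.

```python
def regression_anova(n, s_xx, s_yy, s_xy):
    """ANOVA quantities for simple linear regression from S_xx, S_yy, S_xy."""
    beta1 = s_xy / s_xx            # estimated slope
    ssr = beta1 * s_xy             # regression sum of squares
    sst = s_yy                     # total sum of squares
    sse = sst - ssr                # residual sum of squares
    df_reg, df_res = 1, n - 2
    msr = ssr / df_reg
    ms_res = sse / df_res
    return {
        "beta1": beta1,
        "SSR": ssr, "SSE": sse, "SST": sst,
        "df": (df_reg, df_res, n - 1),
        "MSR": msr, "MSRes": ms_res,
        "F": msr / ms_res,
    }


# Hypothetical inputs chosen for easy arithmetic (NOT the exam values).
table = regression_anova(n=10, s_xx=10.0, s_yy=50.0, s_xy=20.0)
print(table)
```

The F statistic is then compared with the tabulated value for (1, n - 2) degrees of freedom at the chosen significance level.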

QUESTION 5

A study was performed to investigate the car performance for car models produced by the U.S., Japan, Germany and Sweden between 1978 and 1979. The fuel mileage in kilometers per liter is believed to be related to car weight (in thousand kilograms), drive ratio and horsepower. A multiple regression analysis is conducted to determine the multiple linear regression equation which gives the best fit to the data. The following are the Excel outputs of the multiple regression analysis for the study.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9031
R Square             0.8155
Adjusted R Square    0.8104
Standard Error       2.8508
Observations         38

ANOVA
             df    SS          MS         F    Significance F
Regression    1                                8.88939E-15
Residual     36    292.5752    8.12711
Total        37    1586.0908

             Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept    48.71          1.9537           24.9311    0.0000    44.7452     52.6697
Weight       -8.36          0.6630           -12.5159   0.0000    -9.7093     -7.0199

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.4172
R Square             0.1741
Adjusted R Square    0.1511
Standard Error       6.0323
Observations         38

ANOVA
             df    SS           MS         F        Significance F
Regression    1    276.1009     276.1009   7.5876   0.0092
Residual     36    1309.9899    36.3886
Total        37    1586.0908

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     8.44           6.0065           1.4046   0.1687    -3.7453     20.6181
Drive Ratio   5.28           1.9158           2.7546   0.0092    1.3917      9.1624

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8713
R Square             0.7591
Adjusted R Square    0.7524
Standard Error       3.2576
Observations         38

ANOVA
             df    SS          MS          F    Significance F
Regression    1                                 1.11961E-12
Residual     36    382.0377    10.61211
Total        37    1586.0908

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     46.71          2.1270           21.9587    0.0000    42.3928     51.0205
Horsepower    -0.22          0.0203           -10.6517   0.0000    -0.2568     -0.1746

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9458
R Square             0.8945
Adjusted R Square    0.8885
Standard Error       2.1864
Observations         38

ANOVA
             df    SS          MS        F    Significance F
Regression    2
Residual     35    167.3079    4.7802
Total        37    1586.0908

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     70.92          4.5904           15.4495    0.0000    61.6001     80.2380
Weight        -10.83         0.7006           -15.4610   0.0000    -12.2538    -9.4093
Drive Ratio   -4.90          0.9566           -5.1191    0.0000    -6.8392     -2.9551

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9095
R Square             0.8272
Adjusted R Square    0.8173
Standard Error       2.7986
Observations         38

ANOVA
             df    SS          MS        F         Significance F
Regression    2                          83.7553   4.55362E-14
Residual     35    274.1249    7.8321
Total        37    1586.0908

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     48.94          1.9240           25.4379   0.0000    45.0361     52.8479
Weight        -6.06          1.6338           -3.7119   0.0007    -9.3814     -2.7477
Horsepower    -0.07          0.0437           -1.5348   0.1338    -0.1557     0.0216

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8793
R Square             0.7732
Adjusted R Square    0.7602
Standard Error       3.2059
Observations         38

ANOVA
             df    SS           MS         F         Significance F
Regression    2    1226.3751    613.1875   59.6626   5.29113E-12
Residual     35    359.7157     10.2776
Total        37    1586.0908

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     54.63          5.7676           9.4714    0.0000    42.9182     66.3359
Drive Ratio   -1.86          1.2597           -1.4737   0.1495    -4.4140     0.7009
Horsepower    -0.24          0.0247           -9.6157   0.0000    -0.2872     -0.1871

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9482
R Square             0.8991
Adjusted R Square    0.8902
Standard Error       2.1695
Observations         38

ANOVA
             df    SS          MS        F    Significance F
Regression    3                               5.26413E-17
Residual     34    160.0333    4.7069
Total        37    1586.0908

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     70.28          4.5838           15.3326   0.0000    60.9661     79.5969
Weight        -9.28          1.4255           -6.5133   0.0000    -12.1813    -6.3875
Drive Ratio   -4.72          0.9595           -4.9234   0.0000    -6.6736     -2.7739
Horsepower    -0.04          0.0342           -1.2432             -0.1121     0.0270

(a)

(2 Marks)

(b)

(7 Marks)

(c) Hence, determine the best regression equation for predicting the mileage (kilometer per liter) value.

(2 Marks)
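One common way to compare the seven fitted models is through the adjusted R Square, which can be recomputed from the R Square values read from the outputs above. The sketch below does only that; the helper name `adjusted_r2` is an assumption, and a high adjusted R Square on its own does not settle the choice, since the significance of the individual coefficients also needs to be weighed.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N_OBS = 38  # observations reported in every output

# R Square values as read from the Excel outputs, keyed by the predictors used.
r_square = {
    ("Weight",): 0.8155,
    ("Drive Ratio",): 0.1741,
    ("Horsepower",): 0.7591,
    ("Weight", "Drive Ratio"): 0.8945,
    ("Weight", "Horsepower"): 0.8272,
    ("Drive Ratio", "Horsepower"): 0.7732,
    ("Weight", "Drive Ratio", "Horsepower"): 0.8991,
}

adjusted = {
    predictors: adjusted_r2(r2, N_OBS, len(predictors))
    for predictors, r2 in r_square.items()
}

# List the models from highest to lowest adjusted R^2.
for predictors, value in sorted(adjusted.items(), key=lambda kv: -kv[1]):
    print(f"{' + '.join(predictors):40s} adjusted R^2 = {value:.4f}")
```

The recomputed values agree with the Adjusted R Square rows of the outputs to within rounding, which is a useful check on the reading of the tables.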

QUESTION 6

The goals scored per match by MyKid football team gave the following results:

Number of matches

1

IL_L 18

0

3 1 4

2

29 1 18 1 10

7

1

Test whether the number of goals per match follows a Poisson distribution at 10%

significance level.

(12 Marks)
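The goodness-of-fit procedure for a Poisson model can be sketched as follows. The observed frequencies are hypothetical and are not the MyKid data; the last class is treated as "4 or more goals", and counting it as exactly 4 goals when estimating the mean is a simplifying assumption of this sketch.

```python
import math

# Hypothetical goals per match and number of matches (NOT the exam data).
goals = [0, 1, 2, 3, 4]            # final class stands for "4 or more"
observed = [12, 25, 28, 20, 15]

n_matches = sum(observed)

# Estimate the Poisson mean from the sample (one estimated parameter, p = 1).
lam = sum(g * o for g, o in zip(goals, observed)) / n_matches

# Poisson probabilities for 0..3 goals; the last class takes the remaining tail.
probs = [math.exp(-lam) * lam ** g / math.factorial(g) for g in goals[:-1]]
probs.append(1 - sum(probs))

expected = [n_matches * p for p in probs]

# Chi-square statistic and degrees of freedom v = k - p - 1.
chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
v = len(observed) - 1 - 1

print(f"lambda = {lam:.3f}")
print("expected:", [round(e, 2) for e in expected])
print(f"chi2_test = {chi2_test:.3f}, v = {v}")
```

Classes with small expected frequencies are usually pooled with a neighbouring class before the statistic is computed, which reduces k and therefore v.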

QUESTION 7

In a study of the television viewing habits of children, a developmental psychologist

selects a random sample of 300 primary students; 100 boys and 200 girls. Each student is

asked which of the following TV programs they preferred most; Word World, Dibo the

Gift Dragon or Mickey Mouse Clubhouse. Results are shown in Table 5 below.

                              Viewing Preferences
               Mickey Mouse    Dibo the Gift    Word World    Row total
               Clubhouse       Dragon
Boys           20              30               50            100
Girls          70              80               50            200
Column total   90              110              100           300

At α = 0.01, is there enough evidence to support the claim that the proportions of viewing preferences for boys are equal for each of the three TV programs?

(8 Marks)
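The test of equal proportions for the boys can be sketched as a chi-square goodness-of-fit calculation on the three observed counts in the Boys row. The critical value is an assumed reading from a chi-square table with v = k - 1 = 2 degrees of freedom.

```python
# Observed preferences of the 100 boys, as given in the Boys row of the table.
observed_boys = [20, 30, 50]

k = len(observed_boys)
n_boys = sum(observed_boys)

# Under H0 the three programmes are equally preferred.
expected_boys = [n_boys / k] * k

chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed_boys, expected_boys))
v = k - 1                      # no parameter is estimated from the data

CHI2_0_01_2 = 9.210            # assumed chi-square table reading, alpha = 0.01
reject_h0 = chi2_test > CHI2_0_01_2

print(f"chi2_test = {chi2_test:.3f}, v = {v}, reject H0: {reject_h0}")
```

The statistic does not depend on which count belongs to which programme, since all three expected frequencies are the same under the null hypothesis.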


Confidence Intervals, Sample Sizes and Hypothesis Testing

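Hypothesis testing for $\mu$

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right), \qquad z_{test} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

$$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right), \qquad z_{test} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

$$\left(\bar{x} - t_{\alpha/2,v}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,v}\frac{s}{\sqrt{n}}\right), \qquad t_{test} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad \text{where } v = n - 1$$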

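$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

For $\sigma_1^2 \neq \sigma_2^2$:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

For $\sigma_1^2 = \sigma_2^2$:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

For $\sigma_1^2 \neq \sigma_2^2$:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad t_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$\text{where } v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

For $\sigma_1^2 = \sigma_2^2$:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,v}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad t_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } v = n_1 + n_2 - 2$$

Pooled estimator,

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$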

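Hypothesis Testing for $\mu_D$

$$\left(\bar{d} - t_{\alpha/2,v}\frac{s_D}{\sqrt{n}},\ \bar{d} + t_{\alpha/2,v}\frac{s_D}{\sqrt{n}}\right), \qquad t_{test} = \frac{\bar{d} - \mu_D}{s_D/\sqrt{n}}, \quad \text{where } v = n - 1$$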

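$$\left(p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right), \qquad z_{test} = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}}$$

$$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}, \qquad z_{test} = \frac{(p_1 - p_2) - \pi_0}{\sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}}$$

If $\pi_0 = 0$,

$$z_{test} = \frac{p_1 - p_2}{\sqrt{p_p(1 - p_p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } p_p = \frac{x_1 + x_2}{n_1 + n_2}$$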

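$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,v}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,v}}\right), \qquad \chi^2_{test} = \frac{(n-1)s^2}{\sigma_0^2}, \quad \text{where } v = n - 1$$

$$\left(\frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{\alpha/2,v_1,v_2}},\ \frac{s_1^2}{s_2^2}\, f_{\alpha/2,v_2,v_1}\right), \qquad f_{test} = \frac{s_1^2}{s_2^2}, \quad \text{where } v_1 = n_1 - 1,\ v_2 = n_2 - 1$$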

Sample sizes

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2, \qquad n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$$

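One-way ANOVA

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{x_{..}^2}{N}, \qquad SS(Tr) = \sum_{i=1}^{k}\frac{x_{i.}^2}{n_i} - \frac{x_{..}^2}{N}, \qquad SSE = SST - SS(Tr)$$

Two-way ANOVA

$$SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} x_{ijk}^2 - \frac{x_{...}^2}{abr}$$

$$SSA = \frac{1}{br}\sum_{i=1}^{a} x_{i..}^2 - \frac{x_{...}^2}{abr}, \qquad SSB = \frac{1}{ar}\sum_{j=1}^{b} x_{.j.}^2 - \frac{x_{...}^2}{abr}$$

$$SS(AB) = \frac{1}{r}\sum_{i=1}^{a}\sum_{j=1}^{b} x_{ij.}^2 - \frac{x_{...}^2}{abr} - SSA - SSB, \qquad SSE = SST - SSA - SSB - SS(AB)$$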

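Test using Contingency Tables

$$E_{ij} = \frac{n_{i.}\, n_{.j}}{n}, \qquad \chi^2_{test} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Hypothesized distribution

$$\chi^2_{test} = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}, \qquad v = k - p - 1$$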

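Simple Linear Regression and Correlation

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \quad \text{where } \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \text{ and } \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$

$$t_{test} = \frac{\hat{\beta}_0 - \beta_0}{s.e(\hat{\beta}_0)}, \qquad s.e(\hat{\beta}_0) = \sqrt{MSRes\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$$

$$t_{test} = \frac{\hat{\beta}_1 - \beta_1}{s.e(\hat{\beta}_1)}, \qquad s.e(\hat{\beta}_1) = \sqrt{\frac{MSRes}{S_{xx}}}$$

$$F = \frac{MSR}{MSRes}, \qquad SSR = \hat{\beta}_1 S_{xy}, \qquad MSRes = \frac{SSE}{n - 2}$$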