
In this chapter, students will learn:

(a) concepts of scatter diagram, correlation coefficient and linear regression;

(b) calculation and interpretation of the product moment correlation coefficient and of the
equation of the least squares regression line;
(c) interpolation and extrapolation;
(d) use of a square, reciprocal or logarithmic transformation to achieve linearity.

Note: Students are not required to learn

(a) derivation of formulae;
(b) hypothesis tests.

1. Bivariate Data and Scatter Diagrams
2. Product Moment Correlation Coefficient, r
3. Regression Lines
4. Interpolation and Extrapolation
5. Linearization of Bivariate Data
6. Miscellaneous Examples

1. Bivariate Data and Scatter Diagrams

Data in which each observation has two measurements associated with it is called
bivariate data.

Eg 1
x                                  y
Age of a plant                     Quantity of fruit produced
Height of students                 Weight of students
Weight at the end of a spring      Length of the spring
Diameter of stem of a plant        Average length of leaf of the plant
No. of hrs spent studying          Marks achieved
Time                               Temperature of cooling object
Scatter Diagram
The most common and convenient method of displaying a set of bivariate data is by means of a
scatter diagram.

We treat the bivariate pairs as a set of (x, y) coordinates and plot them as a graph to obtain a set of
points. The scatter diagram will reveal the relationship between the two variables.

Eg 2 The marks of a class of 10 students in a Mathematics examination are given in the table.

Student               A   B   C   D   E   F   G   H   I   J
x (mark in Paper 1)   12  84  50  42  33  50  69  81  50  45
y (mark in Paper 2)   73  40  31  83  42  60  63  59  92

Use of GC to obtain Scatter Diagram

GC:
Step 1: Enter data
<STAT> <EDIT> <ENTER>

Enter values of x in L1 and values of y in L2.

To clear a column in a list: move the cursor to L1 and press <CLEAR> <ENTER>.

Before plotting:
• 'Y=' screen: unbold all '=' signs
• Window settings:
  (values in L1) Xmin = smallest x value, Xmax = maximum x value
  (values in L2) Ymin = smallest y value, Ymax = maximum y value

Step 2: Plot the data (<STAT PLOT>)
• Set Plot to 'ON': <1:Plot 1...> <ENTER> <ON> <ENTER>
• Choose type of graph 'scatter plot'
• Xlist: L1 (x-coordinates)
• Ylist: L2 (y-coordinates)
• Mark: Any
• <TRACE>
• <ZOOM> <9:ZoomStat> (to view full plot)

When the data is plotted, a scatter diagram is obtained as shown below:

[Scatter diagram for Eg 2: y plotted against x, both axes running from 0 to 100]
Do it yourself

Qn 1: The height and weight of a class of 10 students are given in the table below.
Sketch a scatter diagram for the set of bivariate data given.

Student            A     B     C     D     E     F     G     H     I     J
x (Height in m)    1.5   1.58  1.6   1.61  1.65  1.72  1.73  1.78  1.8   1.85
y (Weight in kg)   53    57    62    65    66    70    72    75    90    85

Soln:

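Analysis of Scatter Diagram

[Scatter diagram: points rising from lower left to upper right]
x and y related in this way are said to have a positive correlation (linear relationship),
i.e. as x gets larger, y gets larger.

[Scatter diagram: points falling from upper left to lower right]
x and y related in this way are said to have a negative correlation (linear relationship),
i.e. as x gets larger, y gets smaller.

[Scatter diagram: points scattered with no pattern]
x and y related in this way are said to have no correlation (no clear relationship).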
We will only be dealing with linear relationships. If points in the scatter diagram seem to lie near a
straight line, we say that there is linear correlation between x and y.

Note:
• Scatter diagrams are used only for quantitative variables (e.g. height, mass, counts, etc).

• Scatter diagrams can give us visual evidence of outliers.

• Interpretation of the strength solely based on the scatter diagram is subjective and it can be
deceiving when different scales for the axes are used.

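2. Linear Product Moment Correlation Coefficient, r

To measure the degree of linear relationship between two variables x and y (which is called
correlation), a quantity called the product moment correlation coefficient, r, will be needed.
The estimated product-moment correlation coefficient of a sample is given by:

$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
(Found in MF 15)

where $\bar{x}$ denotes the mean of all the x-values, while $\bar{y}$ denotes the mean of all the y-values.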

Important notes on the product-moment correlation coefficient, r
1. $-1 \le r \le 1$, $r \in \mathbb{R}$
2. Sign of r indicates the direction of linear correlation.

   r > 0 ⇒ positive correlation
   r < 0 ⇒ negative correlation

3. The magnitude of r indicates the strength of the linear correlation.

4. r only represents the degree of linear correlation, but a high correlation does not
necessarily imply one directly causes the other. It is a numerical measure of the
strength of the linear relationship between 2 variables.

5. r = 0 means no linear relationship but it does not necessarily mean that there is no
relationship between the variables.

E.g. The correlation between the two sets of variables is said to be curvilinear. There is
no linear correlation and such a scatter diagram will give a very low value of r (i.e. r
≈ 0). But there is a curvilinear kind of correlation or quadratic correlation between
the variables.

6. r is a measure of the degree of scatter and r is independent of the units in which the
data is measured. r is unaffected by changes in the scale of the axes and changes of
units of the variables.
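Notes 5 and 6 can be checked numerically. The sketch below is only an illustration: the helper name `pmcc` and both data sets are made up for this purpose and do not come from the notes.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (m) and weights (kg), invented for illustration only.
heights_m = [1.50, 1.60, 1.65, 1.70, 1.80]
weights_kg = [50, 58, 61, 66, 75]
r_metres = pmcc(heights_m, weights_kg)

# Note 6: the same heights in centimetres should give the same r.
heights_cm = [100 * h for h in heights_m]
r_centimetres = pmcc(heights_cm, weights_kg)

# Note 5: points lying exactly on y = x^2, symmetric about x = 0.
# There is a clear (quadratic) relationship, yet r is 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_quadratic = pmcc(xs, ys)

print(r_metres, r_centimetres, r_quadratic)
```

Changing the unit rescales the numerator and the denominator of r by the same factor, which is why the first two values agree.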

Value of r    Indication

r = 1         Perfect positive linear correlation between variables.
              All points (x, y) lie on a line with positive gradient.
r = -1        Perfect negative linear correlation between variables.

[Scatter diagrams: Diagram (a), Diagram (b), Diagram (c)]

Eg 3a
The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students
were taken and the summarised data were given as follows:

$\sum xy = 38640$, $\sum x^2 = 34464$, $\sum y^2 = 46820$, $\sum x = 528$, $\sum y = 666$

Find the product moment correlation coefficient r and comment on the value of r obtained.

Soln:
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{38640 - \frac{(528)(666)}{10}}{\sqrt{\left[34464 - \frac{(528)^2}{10}\right]\left[46820 - \frac{(666)^2}{10}\right]}} = 0.863 \text{ (correct to 3 sig fig)}$$
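The MF 15 formula can be evaluated directly from summarised data. This is a minimal sketch: the function name `pmcc_from_sums` is illustrative, and the sums are the Eg 3a values as read here (n = 10, Σx = 528, Σy = 666, Σxy = 38640, Σx² = 34464, Σy² = 46820).

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """r computed from summary statistics only."""
    sxy = sum_xy - sum_x * sum_y / n      # numerator of the formula
    sxx = sum_x2 - sum_x ** 2 / n         # first bracket under the root
    syy = sum_y2 - sum_y ** 2 / n         # second bracket under the root
    return sxy / sqrt(sxx * syy)

r = pmcc_from_sums(n=10, sum_x=528, sum_y=666,
                   sum_xy=38640, sum_x2=34464, sum_y2=46820)
print(round(r, 3))  # 0.863
```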
Eg 3b
The data in the above example is given in the table below instead of the summarised statistics. Find
the product moment correlation coefficient r and comment on the value of r obtained.

x    18   20   30   40   46   54   60   80   88   92
y    42   54   60   54   62   68   80   66   80   100

Soln:
Use of GC to obtain r
Step 1: Key in the data using <STAT> <EDIT>

[GC screen: list editor with the x values in L1, L1(1) = 18]

Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST>
<NAMES> <L2> <ENTER>

[GC screen: LinReg y=a+bx output, ending with r=.8626339159]

r = 0.863 (correct to 3 sig fig)
Do it yourself

Qn 2: The height and weight of a class of 10 students are given in the table below.
Find the product moment correlation coefficient r and comment on the value of r obtained.

Student            A     B     C     D     E     F     G     H     I     J
x (Height in m)    1.5   1.58  1.6   1.63  1.65  1.72  1.73  1.78  1.8   1.85
y (Weight in kg)   53    57    62    65    66    70    72    75    90    85

Soln:

3. Regression Lines

r measures how well the data fits a linear model. If the fit is good, we can consider formulating an
equation of a straight line to model the relationship. This straight line is called a regression line.

(a) For any bivariate set of data, connecting variables x and y, there are always two uniquely
defined regression lines.

Least Squares method

The line of best fit, also known as the regression line, is found based on the least squares method.
(For a better understanding of this method, visit the website below

Lcast Sauales.html)

Equation of the least squares regression line of y on x

[Diagram: "Least squares" regression line of y on x, y = a + bx (minimizes y errors)]

y = a + bx (least squares regression line of y on x)

y = a + bx is obtained by finding values of a and b such that $\sum e^2$ is minimum. (e is the difference
between the observed and expected y, also known as residuals)

Observed value of y: The y-coordinate of the point.

Expected value of y: The corresponding y-coordinate on the line.
e: The difference between the above two.

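$$b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \quad \text{and} \quad y - \bar{y} = b(x - \bar{x}) \quad \text{(in MF15)}$$

and $a = \bar{y} - b\bar{x}$.

Thus, $y = \bar{y} + b(x - \bar{x})$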
Note:

• $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$

• Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.

• b is known as the estimated regression coefficient (slope of graph).

• a is the y-intercept.
• x is the independent variable (controlled) and y is the dependent variable.

• Regression line is used for estimating y given x. (x is the independent variable)
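The least squares idea and the notes above can be sketched in a few lines. The data are the ten Mathematics and Chemistry marks as read from the table in Eg 3b; the helper name `sum_squared_residuals` and the two "nudged" comparison lines are illustrative choices, not part of the notes.

```python
# Marks of ten students (Mathematics x, Chemistry y), as read from Eg 3b.
xs = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]
ys = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares estimates for y = a + bx.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

def sum_squared_residuals(intercept, slope):
    """Sum of e^2, where e = observed y - expected y on the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

sse_fit = sum_squared_residuals(a, b)
# Any other line, e.g. one with a nudged slope or intercept, gives a larger sum.
sse_steeper = sum_squared_residuals(a, b + 0.05)
sse_higher = sum_squared_residuals(a + 2, b)

print(round(a, 1), round(b, 3))                      # 38.7 0.528
print(sse_fit < sse_steeper, sse_fit < sse_higher)   # True True
```

Because a is defined as ȳ - b x̄, substituting x = x̄ into the fitted line returns ȳ, which is the "passes through the mean" property.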

Eg 4a The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2
students were taken and the data were given as follows:

x    18   20   30   40   46   54   60   80   88   92
y    42   54   60   54   62   68   80   66   80   100

Find the equation of the estimated regression line of y on x.

Soln:
Use of GC to obtain regression line
Step 1: Key in the data using <STAT> <EDIT>

[GC screen: list editor with x values in L1 and y values in L2, L1(1) = 18]

Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn> <ENTER>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,>
<LIST> <NAMES> <L2> <ENTER>

[GC screen: LinReg y=a+bx output, ending with r=.8626339159]

∴ the estimated regression line of y on x is y = 38.7 + 0.528x
Eg 4b Suppose that the table in Eg 4a is not given and the data is summarised as follows:

$\sum xy = 38640$, $\sum x^2 = 34464$, $\sum y^2 = 46820$, $\sum x = 528$, $\sum y = 666$, n = 10

(a) Find the equation of the estimated regression line of y on x.
(b) Interpret the slope and y-intercept in the context of the question.
(c) Predict the marks in Chemistry (y) of a JC2 student if he obtained 50 marks in
Mathematics (x) using the regression line obtained in (a). Comment on the
reliability of the score obtained.
Soln:

(a) $\bar{x} = \frac{528}{10} = 52.8$, $\bar{y} = \frac{666}{10} = 66.6$

$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{38640 - \frac{(528)(666)}{10}}{34464 - \frac{(528)^2}{10}} = 0.528$$

Estimated regression line of y on x: y - 66.6 = 0.528(x - 52.8), i.e. y = 38.7 + 0.528x

(b) Slope: For an increase of 1 in the Mathematics score, there is an increase of 0.528
in the Chemistry score.

y-intercept: A student is estimated to score 38.7 for Chemistry when he/she scores 0 for
Mathematics.

(c) When x = 50, y = 38.7 + 0.528(50) = 65.1

$$r = \frac{38640 - \frac{(528)(666)}{10}}{\sqrt{\left[34464 - \frac{(528)^2}{10}\right]\left[46820 - \frac{(666)^2}{10}\right]}} = 0.863$$

Since r ≈ 0.863, it indicates a high positive linear correlation between Mathematics and
Chemistry scores. Hence, the predicted score is reliable.
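When only summarised data are available, the regression line of y on x and a prediction from it can be computed as below. This is a sketch under the assumption that the Eg 4b sums are n = 10, Σx = 528, Σy = 666, Σxy = 38640 and Σx² = 34464; the function name is illustrative.

```python
def regression_y_on_x_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares line y = a + bx."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n   # the line passes through (mean x, mean y)
    return a, b

a, b = regression_y_on_x_from_sums(n=10, sum_x=528, sum_y=666,
                                   sum_xy=38640, sum_x2=34464)
predicted_chemistry = a + b * 50    # Mathematics mark of 50

print(round(a, 1), round(b, 3), round(predicted_chemistry, 1))  # 38.7 0.528 65.1
```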
Do it Yourself

Qn 3:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test
for 10 students are given in the table below:

Student                       A    B    C    D    E    F    G    H    I    J
x (No. of hours per week)     5    7    8    10   11   12   13   15   20   21
y (Mark)                      53   57   62   65   66   70   72   75   90   85

(a) Find the equation of the estimated regression line of y on x.
(b) Interpret the slope and y-intercept in the context of the question.
(c) Estimate the no. of hours a student needs to spend in order to achieve a mark of 80 in the
test. Comment on the reliability of the value obtained.

Soln:

Equation of the least squares regression line of x on y

[Diagram: "Least squares" regression line of x on y, x = c + dy (minimizes x errors), passing through $(\bar{x}, \bar{y})$]

x = c + dy (least squares regression line of x on y)
x = c + dy is obtained by finding values of c and d such that $\sum e^2$ is minimum.
(e is the difference between the observed and expected x, also known as residuals)

Observed value of x: The x-coordinate of the point.

Expected value of x: The corresponding x-coordinate on the line.
e: The difference between the above two.

Equation of regression line of x on y can be found using

$$d = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} \quad \text{and} \quad x - \bar{x} = d(y - \bar{y}) \quad \text{(in MF15)}$$

Thus, $x = \bar{x} + d(y - \bar{y})$

Note:

• $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
• Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.

• d is known as the estimated regression coefficient (slope of graph).

• c is the x-intercept.

• y is the independent variable (controlled) and x is the dependent variable.
• Regression line is used for estimating x given y. (y is the independent variable)

Relationship between r and regression lines

• A different line of regression will be obtained if we interchange the independent and dependent
variables.

[Diagram: regression line of y on x and regression line of x on y]

y = a + bx (least squares regression line of y on x)

x = c + dy (least squares regression line of x on y)
If r = ±1, the two lines coincide.

$r = \sqrt{bd}$ (if both b and d are positive)
$r = -\sqrt{bd}$ (if both b and d are negative)

The larger the numerical value of r, the nearer the lines approach coincidence.

If the two lines are identical, x and y have a perfect linear relationship.

No linear correlation: r = 0
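The link between r and the two regression coefficients (r² = bd, with r taking the sign of b and d) can be checked on a small data set. The sketch below uses seven (x, y) pairs as read from Eg 5a; variable names are illustrative.

```python
from math import sqrt, copysign

# Seven data pairs, as read from Eg 5a.
xs = [1, 2, 4, 6, 7, 8, 10]
ys = [10, 14, 12, 13, 15, 12, 13]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx               # slope of the regression line of y on x
d = sxy / syy               # coefficient of y in the regression line of x on y
r = sxy / sqrt(sxx * syy)   # product moment correlation coefficient

# r has the same sign as b and d, and r^2 = b * d.
r_from_slopes = copysign(sqrt(b * d), b)

print(round(b, 3), round(d, 3), round(r, 3))  # 0.186 0.769 0.378
```

Here r is well below 1, so the two regression lines are far from coinciding.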

Eg 5a Find the regression lines of y on x and x on y for the data below and also calculate the
product moment correlation coefficient.

x    1    2    4    6    7    8    10
y    10   14   12   13   15   12   13

Soln:
Step 1: Place x values in L1 and y values in L2.
<STAT> <EDIT>
Step 2: To get product moment correlation coefficient.
<CATALOG> <DiagnosticOn> <ENTER>

Step 3: To get regression line of y on x.

<STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES>
<L2> <,> <VARS> <Y-VARS> <1:FUNCTION> <Y1> <ENTER>

[GC screen: LinReg(a+bx) L1, L2, Y1 giving y=a+bx with
a=11.70403587, b=.1860986547, r²=.1430202624, r=.3781801983]

y = 11.7 + 0.186x
Step 4: To get regression line of x on y.
<STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L2> <,> <LIST> <NAMES>
<L1> <,> <VARS> <Y-VARS> <1:FUNCTION> <Y2> <ENTER>

[GC screen: LinReg(a+bx) L2, L1, Y2 giving y=a+bx with
a=-4.342592593, b=.7685185185, r²=.1430202624, r=.3781801983]

x = -4.34 + 0.769y
⇒ y = (x + 4.34) / 0.769 (store this as Y2)
Note: All above regression lines are stored in Y1, Y2 respectively so that the regression line can
be obtained graphically (not really a must-do).

Soln:
Regression line y on x is y = 11.7 + 0.186x
Regression line x on y is x = -4.34 + 0.769y

Product moment correlation coefficient r = 0.378
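Eg 5b Find the regression lines of y on x and x on y for the data below and also calculate the
product moment correlation coefficient.

$\sum x = 38$, $\sum x^2 = 270$, $\sum y = 89$, $\sum y^2 = 1147$, $\sum xy = 495$, n = 7

Soln:

$\bar{x} = \frac{38}{7} = 5.43$, $\bar{y} = \frac{89}{7} = 12.7$

$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{270 - \frac{(38)^2}{7}} = 0.186$$

Regression line of y on x: y - 12.7 = 0.186(x - 5.43), i.e. y = 11.7 + 0.186x

$$d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{1147 - \frac{(89)^2}{7}} = 0.769$$

Regression line of x on y: x - 5.43 = 0.769(y - 12.7), i.e. x = -4.34 + 0.769y

$\sum y^2 - \frac{(\sum y)^2}{n} = 1147 - \frac{(89)^2}{7}$ (used for finding r when $\sum y^2$ is given)

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{495 - \frac{(38)(89)}{7}}{\sqrt{\left[270 - \frac{(38)^2}{7}\right]\left[1147 - \frac{(89)^2}{7}\right]}} = 0.378$$

(Compare these answers with those you obtained using GC)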
Eg 6

Given $\sum (x-\bar{x})(y-\bar{y}) = 25$, $\sum (x-\bar{x})^2 = 50$ and $\sum (y-\bar{y})^2 = 35$, calculate

(i) the coefficient of regression for y on x, and
(ii) the linear product moment correlation coefficient.

Soln:
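One way to set out the working for Eg 6 is sketched below. It assumes the three given sums are read as 25, 50 and 35, and it uses only the formulas for b and r stated earlier in these notes.

```latex
% (i) coefficient of regression for y on x
b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{25}{50} = 0.5

% (ii) linear product moment correlation coefficient
r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}}
  = \frac{25}{\sqrt{(50)(35)}} = \frac{25}{\sqrt{1750}} \approx 0.598
```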

Outlier / suspect / anomaly affects computation of r
• Sketch scatter plot

[Diagram: scatter plot with the regression line of y on x and an outlying point (x1, y1)]

• Identify the outlier data pair (x1, y1)
• Remove data (x1, y1) from GC
• Recalculate the correlation coefficient for the revised data
• Recalculate the line of regression of y on x for the revised data.
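The outlier procedure above can be sketched as follows. The data set is hypothetical (made up so that the last pair is an obvious outlier), and the helper name `fit_and_r` is illustrative; on a GC the same steps are done by deleting the pair from the lists and rerunning LinReg.

```python
from math import sqrt

def fit_and_r(xs, ys):
    """Return (a, b, r) for the regression line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    return mean_y - b * mean_x, b, sxy / sqrt(sxx * syy)

# Hypothetical data: the last pair (10, 2.0) is a deliberate outlier.
xs = [1, 2, 3, 4, 5, 6, 10]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 2.0]

a_all, b_all, r_all = fit_and_r(xs, ys)

# Identify the outlier (here by inspection of the scatter plot) and remove it.
outlier_index = 6
xs_revised = xs[:outlier_index] + xs[outlier_index + 1:]
ys_revised = ys[:outlier_index] + ys[outlier_index + 1:]

# Recalculate r and the regression line of y on x for the revised data.
a_rev, b_rev, r_rev = fit_and_r(xs_revised, ys_revised)

print(round(r_all, 3), round(r_rev, 3))
```

A single outlying pair is enough to pull r from nearly 1 down towards 0 in this made-up example.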
4. Interpolation and Extrapolation

Once the regression lines are found, we can use them for interpolation.

Extrapolation of the sample should be used with caution as the relationship between x and y may
not be linear beyond a certain point.

Eg 7 (continued from Eg 5a) In the above example 5a, find the value of
(i) y when x = 5 (interpolation - within the range of x)
(ii) x when y = 5 (extrapolation - outside the range of y)

Soln:
(i) From GC, when x = 5, y = 12.6 using the y on x regression line (Y1 graph)

(ii) From GC, when y = 5, x = -0.5 using the x on y regression line (Y2 graph)
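The distinction between interpolation and extrapolation amounts to a range check on the known variable. The sketch below reuses the seven data pairs as read from Eg 5a; the helper `kind_of_estimate` is a hypothetical name introduced for illustration.

```python
# Seven data pairs, as read from Eg 5a.
xs = [1, 2, 4, 6, 7, 8, 10]
ys = [10, 14, 12, 13, 15, 12, 13]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx
a = mean_y - b * mean_x     # regression line of y on x: y = a + bx
d = sxy / syy
c = mean_x - d * mean_y     # regression line of x on y: x = c + dy

def kind_of_estimate(value, observed):
    """Interpolation if the value lies within the observed range, else extrapolation."""
    return "interpolation" if min(observed) <= value <= max(observed) else "extrapolation"

# (i) estimate y when x = 5, using the y on x line
y_hat = a + b * 5
kind_i = kind_of_estimate(5, xs)

# (ii) estimate x when y = 5, using the x on y line
x_hat = c + d * 5
kind_ii = kind_of_estimate(5, ys)

print(round(y_hat, 1), kind_i)    # 12.6 interpolation
print(round(x_hat, 1), kind_ii)   # -0.5 extrapolation
```

The negative estimate in (ii) lies outside the observed x values altogether, which illustrates why extrapolated values should be treated with caution.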

Eg 8 The ages, x years and heights, y cm, of 10 boys were given as follows:

$\sum x^2 = 899.80$, $\sum y^2 = 166091$, $\sum xy = 12023.3$, $\sum x = 91.6$, $\sum y = 1281$

(i) Calculate the linear correlation coefficient between x and y.
(ii) Calculate the equation of the regression line of y on x, and
(iii) use it to estimate the height if a boy is 9.0 years old. State the value of y given by the
regression line when x = 30 and comment upon your answer.

Soln:
(i) Linear correlation coeff. bet. x & y:

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{12023.3 - \frac{(91.6)(1281)}{10}}{\sqrt{\left[899.8 - \frac{(91.6)^2}{10}\right]\left[166091 - \frac{(1281)^2}{10}\right]}}$$

(ii) Eqn. of regression line of y on x:

(iii)
Eg 9
The average densities of blackbirds (in pairs per thousand hectares) over very large areas of
farmland and of woodland are shown, for the years 1976 to 1982, in the table below.
Year 1976 1977 1978 1979 1980 1981 1982

$\sum y = 667$, $\sum y^2 = 641?9$, $\sum x = 2585$, $\sum x^2 = 964609$, $\sum xy = 248579$

Counting blackbirds in woodland is easier than counting them in farmland. It is desired in future to
determine only woodland density and hence use it to estimate farmland density.
(i) Treating the years as providing independent pairs of observations, use the given data to
estimate the linear regression equation of y on x relating the farmland and woodland densities.
(ii) Given that the 1983 woodland density is 500, estimate the average farmland density for that
year.

Soln: (i)

Determine only woodland density ⇒ independent variable (x)
Estimate farmland density ⇒ dependent variable (y)

GC method: key in (x) in L1 and (y) in L2, then <DiagnosticOn> and <8:LinReg(a+bx)>.

Manual method:

$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{248579 - \frac{(2585)(667)}{7}}{964609 - \frac{(2585)^2}{7}} = \frac{2265.43}{10005.43}$$

$$y - \frac{667}{7} = \frac{2265.43}{10005.43}\left(x - \frac{2585}{7}\right)$$

∴ y = 11.7 + 0.226x

(ii) When x = 500, y = 11.7 + 0.226(500) = 124.7

∴ Estimated average farmland density for 1983 is 124.7.

As extrapolation is being carried out in this case, the linear correlation may not be valid
outside of the range of values. Hence, the estimate is unreliable.
Do it yourself
Qn 4:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test
for 10 students are given in the table below:

Student                       A    B    C    D    E    F    G    H    I    J
x (No. of hours per week)     5    7    8    10   11   12   13   15   20   21
y (Mark)                      55   60   62   63   66   73   74   75   84   89

(a) Find the equation of the estimated regression line of y on x.

(b) Estimate the no. of hours a student needs to spend in order to achieve full marks in the
test. Comment on the reliability of the value obtained.

Soln:

Obtain the least squares estimates for α and β using an equation of the form
(i) $y = \alpha + \beta \log_{10} x$ and
(ii) $y = \alpha + \beta x^2$
as a fit for the set of data shown above.

Determine which equation is a better fit, giving reasons to support your answer.

Soln: (i) $y = \alpha + \beta \log_{10} x$

Key in the data for x, y and z into L1, L2 and L3 respectively using <STAT>
<EDIT>, where z = log10 x.

[GC screen: list editor with L3 = log(L1)]

From GC, y =                 Therefore α =        , β =

(ii) $y = \alpha + \beta x^2$
Key in the data for x, y and z into L1, L2 and L3 respectively using <STAT>
<EDIT>, where z = x².

[GC screen: list editor with L3 = L1²]

From GC, y =                 Therefore α =        , β =

Using GC,

Since the correlation coefficient for part (i) is larger than that in part (ii), there is a
much better positive linear correlation. Therefore, $y = \alpha + \beta \log_{10} x$ is a better fit.
6. Miscellaneous Examples
Eg 11

A random sample of eight pairs of values of x and y is used to obtain the following equations of the
regression lines of y on x and of x on y, respectively.

$$y = -\frac{7}{10}x + \frac{151}{10}$$
$$x = -\frac{7}{6}y + 20$$

Seven pairs of data are given in the table.

x    10   11   12   11   17   14   19
y    9    8    7    6    5    4    1

Find the 8th pair of values of (x, y). Determine the value of the product moment correlation
coefficient and comment on what its value implies about the 2 regression lines given above.

Let $\hat{y}$ be the value obtained by substituting a sample value of x into the equation of the regression
line of y on x. Evaluate $\hat{y}$ for each of the eight values of x and verify that $\sum (y - \hat{y})^2 = 8.8$.
For each of the sample values of x, y' is given by y' = a + bx, where $a \ne \frac{151}{10}$, $b \ne -\frac{7}{10}$. What can
you say about the value of $\sum (y - y')^2$?

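Soln:
$$y = -\frac{7}{10}x + \frac{151}{10} \quad \ldots\ldots (1)$$
$$x = -\frac{7}{6}y + 20 \quad \ldots\ldots (2)$$

Since we need to find the 8th value, take n = 8.

Both regression lines pass through $(\bar{x}, \bar{y})$. Solving (1) and (2) gives $\bar{x} = 13$ and $\bar{y} = 6$,
therefore $\sum x = 104$ and $\sum y = 48$.
From the given data, $\sum x = 94$ and $\sum y = 40$.

Therefore the 8th values are x = 10 and y = 8.

Using GC, r = -0.904

Since r = -0.904, which is very close to -1, it indicates a high negative linear correlation between
x and y. Hence the regression lines are very close.

x                                 10    11    12    11    17    14    19    10
y                                 9     8     7     6     5     4     1     8
$\hat{y} = -\frac{7}{10}x + \frac{151}{10}$     8.1   7.4   6.7   7.4   3.2   5.3   1.8   8.1

$\sum (y - \hat{y})^2 = 8.8$ (shown)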
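The steps of Eg 11 can be checked numerically. This sketch assumes the two regression lines are y = -0.7x + 15.1 and x = -(7/6)y + 20 and that the seven given pairs are as listed in the code; all variable names are illustrative.

```python
from math import sqrt

b, a = -7 / 10, 151 / 10    # regression line of y on x: y = a + bx
d, c = -7 / 6, 20           # regression line of x on y: x = c + dy

# Both lines pass through the mean point, so solve y = a + bx and x = c + dy together.
mean_x = (c + d * a) / (1 - b * d)
mean_y = a + b * mean_x

# Seven given pairs (as read from the table); the 8th follows from the totals.
xs7 = [10, 11, 12, 11, 17, 14, 19]
ys7 = [9, 8, 7, 6, 5, 4, 1]
x8 = 8 * mean_x - sum(xs7)
y8 = 8 * mean_y - sum(ys7)

# r^2 = bd, and r is negative because both b and d are negative.
r = -sqrt(b * d)

# Sum of squared residuals about the regression line of y on x.
xs = xs7 + [x8]
ys = ys7 + [y8]
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(round(x8), round(y8), round(r, 3), round(sse, 1))  # 10 8 -0.904 8.8
```

Since the regression line of y on x minimises the sum of squared y residuals, any other line y' = a + bx would be expected to give a sum larger than 8.8.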

Eg 12
The daily rate charged by a car-hire firm varies with the length of the hire period. The firm's
brochure gives the following data:

Hire Period, x Days    1     2     3     4     5     10    30    50
Daily Rate, $y         149   119   115   112   109   105   103   101

Calculate the value of the product moment correlation coefficient.

Give a sketch of the scatter diagram for the data, as shown on your calculator, and hence
(i) comment on the suitability of finding the linear regression line of y on x,
(ii) state, with a reason, which of the following models is appropriate.

A: $y = a + bx^2$    B: $y = ab^x$    C: $y = a + \frac{b}{x}$    D: $y = a + b \ln x$

For the appropriate model, calculate the least squares estimates of a and b. Find also the product
moment correlation coefficient and comment on the suitability of the model.
Soln:
• Enter the data into the GC as two lists (say x and y)

• Press [2nd][CATALOG][D] and select DiagnosticOn.

• With the command DiagnosticOn on the Home Screen, find any regression equation.
(Follow previous example to find the regression equation)
Your screen shot should look like this:

[GC screen: LinReg y=ax+b output with a ≈ -0.491, b ≈ 120.57, r² ≈ 0.317, r ≈ -0.563]

Sketch the data as a scatter diagram.

(i) The scatter plot of y and x shows that the relationship between y and x is non-linear.
Moreover, the r value indicates a low negative linear correlation. Hence, the
regression line of y on x is not suitable.
(ii) It can be easily identified as C since y tends to a limit for larger values of x.
Take $z = \frac{1}{x}$, so that $y = a + \frac{b}{x}$ becomes y = a + bz. Draw the regression line of y on z.

The screen should look like this:

[GC screen: LinReg y=a+bx output with a ≈ 99.8, b ≈ 47.07, r² ≈ 0.984, r ≈ 0.992]

Therefore a = 99.8 and b = 47.1

Now r = 0.992 which is close to 1. Therefore there is a very high positive linear
correlation which implies that the model is suitable.
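The reciprocal transformation used for model C can be reproduced as below. The sketch assumes the brochure data are x = 1, 2, 3, 4, 5, 10, 30, 50 days with daily rates 149, 119, 115, 112, 109, 105, 103, 101; the helper name `fit_and_r` is illustrative.

```python
from math import sqrt

def fit_and_r(us, vs):
    """Return (a, b, r) for the least squares line v = a + bu."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    b = suv / suu
    return mean_v - b * mean_u, b, suv / sqrt(suu * svv)

# Hire period (days) and daily rate ($), as read from the table in Eg 12.
days = [1, 2, 3, 4, 5, 10, 30, 50]
rate = [149, 119, 115, 112, 109, 105, 103, 101]

# Straight line fitted to the untransformed data: weak linear correlation.
_, _, r_linear = fit_and_r(days, rate)

# Model C: y = a + b/x. With z = 1/x this becomes the linear form y = a + bz.
z = [1 / x for x in days]
a, b, r_reciprocal = fit_and_r(z, rate)

print(round(r_linear, 3))                                   # -0.563
print(round(a, 1), round(b, 1), round(r_reciprocal, 3))     # 99.8 47.1 0.992
```

The same pattern applies to a square or logarithmic transformation: replace the list `z` by x² or ln x and compare the resulting values of r.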

Eg 13 Research is being carried out into how the concentration of a drug in the bloodstream varies
with time, measured from when the drug is given. Observations at successive times give the
data shown in the following table.

Time (t minutes) 90
Concentration
(x micrograms per litre)

It is given that the value of the product moment correlation coefficient for this data is
-0.912, correct to 3 decimal places. The scatter diagram for the data is shown below.

[Scatter diagram: x (micrograms per litre), axis from 0 to 100, against t (minutes), axis from 0 to 350]

Calculate the equation of the regression line of x on t. [2]

Calculate the corresponding estimated value of x when t = 300, and comment on the
suitability of the linear model. [2]
The variable y is defined by y = ln x. For the variables y and t,
(i) calculate the product moment correlation coefficient and comment on its value,
[2]
(ii) calculate the equation of the appropriate regression line. [3]
Use a regression line to give the best estimate that you can of the time when the drug
concentration is 15 micrograms per litre. [2]
[GCE 'A' level / Nov 2007 / P2 / Q11]
Soln:
Equation of the regression line of x on t:

When t = 300, x =
It is not a suitable model as the concentration cannot be a negative value.

(ii) As t is an independent variable, regression line of y on t is appropriate.

y = 4.62 - 0.0123t

As |r| is close to 1, the regression lines of y on t and t on y are almost identical, therefore we can
use y on t to estimate t.
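The final estimate can be sketched as follows, assuming the regression line of y on t is y = 4.62 - 0.0123t with y = ln x, as stated in the solution. Because these coefficients are rounded to 3 significant figures, the result is only approximate.

```python
from math import log

# Regression line of y on t from the solution, where y = ln x (rounded coefficients).
intercept = 4.62
slope = -0.0123

# Required concentration: x = 15 micrograms per litre, so y = ln 15.
y_target = log(15)

# Rearranging y = intercept + slope * t for t.
t_estimate = (y_target - intercept) / slope

print(round(t_estimate))  # 155 (minutes, approximately)
```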
