18 views

Uploaded by Nadya Farah Kamilia

geostatistika

- Macroeconomic Data Analysis
- Pearson r Spss Method
- Correlation
- Correlation SPSS
- Estimize Bull speed using Back propagation
- Correlation Analysis
- Topic03 Correlation Regression
- BBM 3rd Sem Syllabus 2014.pdf
- GAISE - Appendices - August 2005 - p89-p107
- Correlation and Regression
- Presentation Chapter 6 MTH1022
- Analyzed Univariate and Bivariate
- Document(2)
- Correlation Analysis
- Lecture_2.1
- bigreview-burgers-alecwilson
- Correlation Coefﬁcients of Probabilistic Hesitant Fuzzy Elements and Their Applications to Evaluation of the Alternatives
- Spss Last Asignment
- 1
- penyusunan

You are on page 1of 40

9780521740494c23.xml

CUAU033-EVANS

11:42

C H A P T E R

23

Investigating the

relationship between

two numerical variables

Objectives

PL

E

P1: FXS/ABE

SA

M

To fit a straight line to data by eye, and using the method of least squares

To interpret the slope of a regression line and its intercept, if appropriate

To predict the value of the dependent (response) variable from an independent

(explanatory) variable, using a linear equation

In Chapter 22 statistics of one variable were discussed. Sometimes values of a variable for

more than one group have been examined, such as age of mothers and age of fathers, but only

one variable was considered for each individual at a time.

When two variables are observed for each subject, bivariate data are obtained. For example,

it might be interesting to record the number of hours spent studying for an exam by each

student in a class and the mark they achieved in the exam. If each of these variables were

considered separately the methods discussed earlier would be used. It may be of more interest

to examine the relationship between the two variables, in which case new bivariate techniques

are required. When exploring bivariate data, questions arise such as, Is there a relationship

between two variables? or Does knowing the value of one of the variables tell us anything

about the value of the other variable?

554

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

555

Consider the relationship between the number of cigarettes smoked per day and blood

pressure. Since one opinion might be that varying the number of cigarettes smoked may affect

blood pressure, it is necessary to distinguish between blood pressure, which is called the

dependent or response variable, and the number of cigarettes, which is called the

independent or explanatory variable. In this chapter some techniques are introduced which

enable questions concerning the nature of the relationship between such variables to be

answered.

PL

E

23.1

As with data concerning one variable, the most important first step in analysing bivariate data

is the construction of a visual display. When both of the variables of interest are numerical then

a scatterplot (or bivariate plot) may be constructed. This is the single most important tool in

the analysis of such bivariate data, and should always be examined before further analysis is

undertaken. The pairs of data points are plotted on the cartesian plane, with each pair

contributing one point to the plot. Using the normal convention, the variable plotted

horizontally is denoted as x, and the variable plotted vertically as y. The following example

examines the features of the scatterplot in more detail.

Example 1

SA

M

The number of hours spent studying for an examination by each member of a class, and the

marks they were awarded, are given in the table.

Student

Hours

Mark

1

4

27

2

36

87

3

23

67

4

28

84

5

25

66

6

11

52

7

18

61

8

13

43

9

4

38

10

8

52

Student

Hours

Mark

11

4

41

12

19

54

13

6

57

14

19

62

15

1

23

16

29

65

17

33

75

18

36

83

19

28

65

20

15

55

Solution

The first decision to be made when preparing this scatterplot is whether to show Mark

or Hours on the horizontal (x) axis. Since a students mark is likely to depend on the

hours that they spend studying, in this case Hours is the independent variable and

Mark is the dependent variable. By convention, the independent variable is plotted on

the horizontal (x) axis, and the dependent variable on the vertical (y) axis, giving the

scatterplot shown.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

556

Mark y

80

60

40

20

PL

E

P1: FXS/ABE

10

30

20

40

Hours x

From this scatterplot, a general trend can be seen of increasing marks with increasing hours of

study. There is said to be a positive association between the variables.

Two variables are positively associated when larger values of y are associated with larger

values of x, as shown in the previous scatterplot.

Examples of variables which exhibit positive association are height and weight, foot size and

hand size, and number of people in the family and household expenditure on food.

Example 2

SA

M

The age, in years, of several cars and their advertised price in a newspaper are given in the

following table.

Age (years)

4

6

5

Price ($)

13 000 9 800 11 000

Age (years)

7

Price ($)

9 700

7

4

2

3

3

8 300 10 500 15 800 14 300 13 800

6

4

6

4

8

9 500 13 200 10 000 11 800 8 000

6

12 200

Solution

In this case the independent variable is the age of the car, which is plotted on the

horizontal axis. The dependent variable, price, is plotted on the vertical axis.

Price ($) y

16 000

14 000

12 000

10 000

8 000

1

7

8

Age (years) x

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

557

From the scatterplot a general trend of decreasing price with increasing age of car can be seen.

There is said to be a negative association between the variables.

Two variables are negatively associated when larger values of y are associated with smaller

values of x, as shown in the scatterplot above.

Examples of other variables which exhibit negative association are weight and number of

weeks spent on a diet program, hearing ability and age, and number of cold rainy days per

week and sales of ice creams.

The third alternative is that a scatterplot shows no particular pattern, indicating no

association between the variables.

PL

E

P1: FXS/ABE

y

8

6

4

SA

M

There is no association between two variables when the values of y are not related to the

values of x, as shown in the preceding scatterplot.

Examples of variables which show no association are height and IQ for adults, price of cars

and fuel consumption, and size of family and number of pets.

When one point, or a few points, do not seem to fit with the rest of the data they are called

outliers. Sometimes a point is an outlier, not because its x value or its y value is in itself

unusual, but rather because this particular combination of values is atypical. Consequently such

an outlier cannot always be detected from single variable displays, such as stem-and-leaf plots.

For example, consider this scatterplot. While the

y

variable plotted on the horizontal axis takes values

8

from 1 to 8 and the variable plotted on the vertical

6

axis takes values from 2 to 8, the combination (2, 8)

4

is clearly an outlier.

2

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

558

PL

E

The calculator can be used to construct a scatterplot of statistical data. The procedure is

illustrated using the age and price of car data from Example 2.

The data is easiest entered in a Lists &

3).

Spreadsheet application (

) to

Firstly, use the up/down arrows (

name the first column age and the second

column price.

Then enter the data as shown.

SA

M

5 ) to graph the data. At first the data

displays as shown.

Variable from the Plot Properties (b 2

4) and selecting age.

Specify the y variable by selecting Add Y

Variable from the Plot Properties (b 2

6) and selecting price.

The data now displays as shown.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

559

The table represents the results of 12 students in two tests.

Test 1 score

Test 1 score

10

12

18

20

13

11

6

9

8

6

5

6

12

12

15

13

15

17

SA

M

PL

E

Enter the data into list1 (x) and list2 (y) in the

module. Tap SetGraph,

Setting . . . and select the tab for Graph3.

(Note: Following on from the types of graphs in

univariate statistics, this allows the Scatterplot

settings to be remembered and called upon when

required.)

Ensure that all other graphs are de-selected and

to produce the graph shown in the full

tap

screen.

Select the graph window (bold border) and tap Analysis, Trace to scroll from point to

point and display the coordinates at the bottom of the graph.

Exercise 23A

Save your data for 14 in named lists as they will be needed for later exercises.

1 The amount of a particular pain relief drug given to each patient and the time taken for the

patient to experience pain relief are shown.

Note:

Patient

Drug dose (mg)

Response time (min)

1

0.5

65

2

3

4

5

6

7

8

9 10

1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

35 15 10 22 16 10 18 70 20

b From the scatterplot, describe any association between the two variables.

c Identify outliers, if any, and interpret.

2 The proprietor of a hairdressing salon recorded the amount spent advertising in the local

paper and the business income for each month of a year, with the following results.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

560

1

350

9 450

2

450

10 070

3

400

9 380

4

500

9 110

5

250

5 220

6

150

3 100

7

350

8 060

8

300

7 030

9

550

11 500

10

600

12 870

11

550

10 560

12

450

9 850

PL

E

b From the scatterplot, describe any association between the two variables.

c Identify outliers, if any, and interpret.

3 The number of passenger seats on the most commonly used commercial aircraft, and the

airspeeds of these aircraft, in km/h, are shown in the following table.

Number of seats

Airspeed (km/h)

405

830

296

797

288

774

258

736

240

757

230

765

193

760

188

718

Number of seats

Airspeed (km/h)

148

683

142

666

131

661

122

378

115

605

112

620

103

576

102

603

SA

M

b From the scatterplot, describe any association between the two variables.

c Identify outliers, if any, and interpret.

4 The price and age of several secondhand caravans are listed in the table.

Age (years)

7

7

8

9

4

8

1

Price ($)

4 800

3 900

4 275

3 900

6 900

6 500

11 400

Age (years)

10

9

9

11

3

4

7

Price ($)

8 700

1 950

3 300

1 650

9 600

8 400

6 600

b From the scatterplot, describe any association between the two variables.

c Identify outliers, if any, and interpret.

23.2

If the plot of a bivariate data set shows a basic trend, apart from some randomness, then it is

useful to provide a numerical measure of the strength of the relationship between the two

variables. Correlation is a measure of strength of a relationship which applies only to

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

561

numerical variables. Thus it is sensible, for example, to calculate the correlation between the

heights and weights for a group of students, but not between height and gender, as gender is

not a numerical variable. There are many different numerical measures of correlation, and each

has different properties. In this section the q-correlation coefficient will be introduced.

Consider the scatterplot of the number of hours spent by each member of a class when

studying for an examination, and the mark they were awarded, from Example 1. This shows a

positive association. To calculate the q-correlation coefficient, first find the median value for

each of the variables separately. This can be done from the data, but it is usually simpler to

calculate directly from the plot. There are 20 data points, and the median values are halfway

between the 10th and 11th points, both vertically and horizontally. A vertical line is then drawn

through the median x value, and a horizontal line through the median y value. The effect of this

is to divide the plot into four regions, as shown.

Marks y

80

60

40

SA

M

20

PL

E

P1: FXS/ABE

10

30

20

40

Hours x

Each of the four regions which have been created in this way is called a quadrant, and it can

be noticed immediately that most of the points in this plot are in the upper right and lower left

quadrants. In fact, wherever there is a positive association between variables this will be the

case.

Consider the scatterplot of the age of cars and the advertised price from Example 2, which

shows negative association. Again the median value for each of the variables is found

separately. There are 15 data points, giving the median values at the 8th points, both vertically

and horizontally. A vertical line is then drawn through the median x value, and a horizontal line

through the median y value. In this particular case they are coordinates of the same point, but

this need not be so.

Price ($) y

16 000

14 000

12 000

10 000

8 000

1

7

8

Age (years) x

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

562

It can be seen in this example that most points are in the upper left and the lower right

quadrants, and this is true whenever there is a negative association between variables.

These observations lead to a definition of the q-correlation coefficient.

The q-correlation coefficient can be determined from the scatterplot as follows.

Find the median of all the

x values in the data set, and

draw a vertical line through

this value.

PL

E

P1: FXS/ABE

y values in the data set, and

draw a horizontal line through

this value.

The plane is now divided into four quadrants. Label the quadrants A, B, C and D as shown

in the diagram.

Count the number of points in each of the quadrants A, B, C and D. Any points which lie

on the median lines are omitted.

Let a, b, c, d represent the number of points in each of the quadrants A, B, C and D

respectively. Then the q-correlation coefficient is given by

(a + c) (b + d)

a+b+c+d

SA

M

q=

Example 3

Use the scatterplot from Example 1 to determine the q-correlation coefficient for the number

of hours each member of a class spent studying for an examination and the mark they were

awarded.

Solution

There are nine points in quadrant A, one in quadrant B, nine in quadrant C and one in

quadrant D.

Thus

(a + c) (b + d)

a+b+c+d

(9 + 9) (1 + 1)

=

9+1+9+1

18 2

=

20

16

=

20

= 0.8

q=

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

563

Example 4

Use the scatterplot from Example 2 to determine the q-correlation coefficient for the age of

cars and their advertised price.

Solution

There is one point in quadrant A, six in quadrant B, one in quadrant C and six in

quadrant D.

PL

E

P1: FXS/ABE

(a + c) (b + d)

a+b+c+d

(1 + 1) (6 + 6)

=

1+6+1+6

2 12

=

14

10

=

14

= 0.71

q=

From Examples 3 and 4 it can be seen that q-correlation coefficients may take both positive

and negative values. Consider the situation when all the points are in the quadrants A and C.

(a + c) (b + d)

a+b+c+d

a+c

(since b and d are both equal to zero)

=

a+c

=1

SA

M

q=

Thus the maximum value the q-correlation coefficient may take is 1, and this indicates a

measure of strong positive association.

Suppose all the points are in the quadrants B and D.

(a + c) (b + d)

a+b+c+d

(b + d)

(since a and c are both equal to zero)

=

b+d

= 1

q=

Thus the minimum value the q-correlation coefficient may take is 1, and this indicates a

measure of strong negative association.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

564

When the same number of points are in each of the quadrants A, B, C and D then:

(a + c) (b + d)

a+b+c+d

0

(since a = b = c = d)

=

a+b+c+d

=0

q=

This value of the q-correlation coefficient clearly indicates that no association exists.

PL

E

1 q

0.75 < q

0.50 < q

0.25 < q

0.25 q

0.50 q

0.75 q

0.75

0.50

0.25

< 0.25

< 0.50

< 0.75

1

moderate negative relationship

weak negative relationship

no relationship

weak positive relationship

moderate positive relationship

strong positive relationship

Exercise 23B

SA

M

a q = 0.20

e q = 0.95

i q = 1

b q = 0.30

f q = 0.75

j q = 0.25

c q = 0.85

g q = 0.75

k q=1

d q = 0.33

h q = 0.24

l q = 0.50

2 Calculate the q-correlation coefficient for each pair of variables shown in the following

scatterplots.

a

36

24

12

15.0

20.0

25.0

30.0

35.0

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

565

280

210

140

120

y

8

6

4

2

0

y

160

200

SA

M

PL

E

P1: FXS/ABE

240

280

80

60

40

20

Example

10

20

30

40

3 The amount of a particular pain relief drug given to each patient and the time taken for the

patient to experience pain relief are shown.

Patient

Drug dose (mg)

Response time (min)

1

0.5

65

2

1.2

35

3

4.0

15

4

5.3

10

5

2.6

22

6

3.7

16

7

5.1

10

8

1.7

18

9

0.3

70

10

4.0

20

a Use your scatterplot from 1, Exercise 23A to find the q-correlation coefficient for

response time and drug dosage.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

566

b Classify the strength and direction of the relationship between response time and drug

dosage according to the table given.

Example

4 The proprietor of a hairdressing salon recorded the amount spent advertising in the local

paper and the business income for each month of a year, with the following results.

Month Advertising ($) Business ($)

7

350

8 060

8

300

7 030

9

550

11 500

10

600

12 870

11

550

10 560

12

450

9 850

PL

E

1

350

9 450

2

450

10 070

3

400

9 380

4

500

9 110

5

250

5 220

6

150

3 100

a Use your scatterplot from 2, Exercise 23A to find the q-correlation coefficient for

advertising expenditure and total business conducted.

b Classify the strength and direction of the relationship between advertising expenditure

and business income according to the table given.

5 The number of passenger seats on the most commonly used commercial aircraft, and the

airspeeds of these aircraft, in km/h, are shown in the following table.

405

830

296

797

288

774

258

736

240

757

230

765

193

760

188

718

Number of seats

Airspeed (km/h)

148

683

142

666

131

661

122

378

115

605

112

620

103

576

102

603

SA

M

Number of seats

Airspeed (km/h)

a Use your scatterplot from 3, Exercise 23A to find the q-correlation coefficient for the

number of seats on an airline and the air speed.

b Classify the strength and direction of the relationship between the number of seats on

an airline and the air speed according to the table given.

6 The price and age of several secondhand caravans are listed in the table.

Age (years)

7

7

8

9

4

8

1

Price ($)

4 800

3 900

4 275

3 900

6 900

6 500

11 400

Age (years)

10

9

9

11

3

4

7

Price ($)

8 700

1 950

3 300

1 650

9 600

8 400

6 600

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

567

a Use your scatterplot from 4, Exercise 23A to find the q-correlation coefficient for price

and age of secondhand caravans.

b Classify the strength and direction of the relationship between price and age of

secondhand caravans according to the table given.

23.3

r=

PL

E

When a relationship is linear the most commonly used measure of strength of the relationship

is Pearsons product-moment correlation coefficient, r. It gives a numerical measure of the

degree to which the points in the scatterplot tend to cluster around a straight line.

Pearsons product-moment correlation is defined to be

degree which the variables vary together

degree which the two variables vary separately

Formally, if we call the two variables x and y and we have n observations then Pearsons

product-moment correlation for this set of observations is

n

1

xi x

yi y

r=

n 1 i=1

sx

sy

SA

M

where x and sx are the mean and standard deviation of the x scores and y and s y are the mean

and standard deviation of the y scores.

There are two key assumptions made in calculating Pearsons correlation coefficient r. They

are

the data is numerical

the relationship being described is linear.

Pearsons correlation coefficient r has the following properties:

If there is no linear relationship, r = 0.

y

r=0

r = +1.

r = 1.

y

r = +1

r = 1

Cambridge University Press Uncorrected Sample Pages 978-0-521-61252-4

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

568

Otherwise, 1 r +1

Pearsons correlation coefficient r can be classified as follows:

0.75 strong negative linear relationship

0.50 moderate negative linear relationship

0.25 weak negative linear relationship

< 0.25

no linear relationship

< 0.50

weak positive linear relationship

< 0.75

moderate positive linear relationship

1

strong positive linear relationship

PL

E

1 r

0.75 r

0.50 r

0.25 < r

0.25 r

0.50 r

0.75 r

The following scatterplots show linear relationships of various strengths together with the

corresponding value of Pearsons product-moment correlation coefficient.

30

25

12

Age

CO level

14

10

20

15

Traffic volume

SA

M

and traffic volume: r = 0.985

Testosterone level

600

1400

hormone) level of a group of prisoners:

r = 0.814

110

160

120

100

Score

Mortality

140

100

80

90

60

80

100

Smoking ratio

120

Verbal

smoking ratio (100 average): r = 0.716

700

20

600

15

500

10 12 14 16 18 20

Age 1st word

age (in months) when first word spoken:

r = 0.445

Calf

P1: FXS/ABE

10

5

400

500

700

600

Mathematics

800

and mathematical ability: r = 0.275

30

40

Age

50

r = 0.005

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

569

PL

E

the calculator. This will be illustrated using the age and price of car data from

Example 2.

With the data entered as the two lists age and

price respectively, we now open a Calculator

1) to calculate Pearsons

application (

product-moment correlation coefficient.

Use Linear Regression (mx + b) from the

Stat Calculations submenu of the Statistics

menu (b 6 1 3) and complete the

dialog box as shown.

SA

M

information including the value for r as

shown.

Consider the following set of data

x

y

1

2

3

5

5

7

4

2

7

9

Tap Calc, Linear Reg and select the settings shown. Tap

OK to produce the results shown in the second screen.

The value of r is shown in the answer box.

Tap OK to produce a scatterplot.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

570

After the mean and standard deviation, Pearsons product-moment correlation is one of the

most frequently computed descriptive statistics. It is a powerful tool but it is also easily

misused. The presence of a linear relationship should always be confirmed with a scatterplot

before Pearsons product-moment correlation is calculated. And, like the mean and the

standard deviation, Pearsons correlation coefficient r is very sensitive to the presence of

outliers in the sample.

Exercise 23C

PL

E

P1: FXS/ABE

a r = 0.20

e r = 0.95

i r = 0.50

b r = 0.30

f r = 0.75

j r = 0.25

c r = 0.85

g r = 0.75

k r =1

d r = 0.33

h r = 0.24

l r = 1

2 By comparing the plots given to those on page 538 estimate the value of Pearsons

correlation coefficient r.

a

280

36

210

24

140

SA

M

12

15.0

20.0

25.0

30.0

35.0

y

8

14 000

12 000

10 000

8 000

2

3

20

f y

280

240

40

200

60

160

80

120

8 x

10

20

30

40

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

571

3 The amount of a particular pain relief drug given to each patient and the time taken for the

patient to experience relief are shown.

Patient

Drug dose (mg)

Response time (min)

1

0.5

65

2

1.2

35

3

4.0

15

4

5.3

10

5

2.6

22

6

3.7

16

7

5.1

10

8

1.7

18

9

0.3

70

10

4.0

20

PL

E

b Classify the relationship between drug dose and response time according to the table

given.

4 The proprietor of a hairdressing salon recorded the amount spent on advertising in the

local paper and the business income for each month for a year, with the following results.

Month Advertising ($) Business ($)

1

350

9 450

2

450

10 070

3

400

9 380

4

500

9 110

5

250

5 220

6

150

3 100

7

350

8 060

8

300

7 030

9

550

11 500

10

600

12 870

11

550

10 560

12

450

9 850

SA

M

b Classify the relationship between the amount spent on advertising and business income

according to the table given.

5 The number of passenger seats on the most commonly used commercial aircraft, and the

airspeeds of these aircraft, in km/h, are shown in the following table.

Number of seats

Airspeed (km/h)

405

830

296

797

288

774

258

736

240

757

230

765

193

760

188

718

Number of seats

Airspeed (km/h)

148

683

142

666

131

661

122

378

115

605

112

620

103

576

102

603

b Classify the relationship between the number of passenger seats and airspeed

according to the table given.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

572

6 The price and age of several secondhand caravans are listed in the table.

Age (years)

7

7

8

9

4

8

1

Price ($)

4 800

3 900

4 275

3 900

6 900

6 500

11 400

Age (years)

10

9

9

11

3

4

7

Price ($)

8 700

1 950

3 300

1 650

9 600

8 400

6 600

PL

E

P1: FXS/ABE

b Classify the relationship between price and age according to the table given.

7 The following are the scores for a group of ten students who each had two attempts at a

test (out of 70).

Attempt 1

Attempt 2

53

63

56

66

57

67

49

58

44

54

69

70

66

70

40

55

53

63

43

53

68

70

64

70

a Construct a scatterplot of these data, and describe the relationship between scores on

attempt 1 and attempt 2.

b Is it appropriate to calculate the value of Pearsons correlation coefficient for these

data? Give reasons for your answer.

SA

M

Student

Test 1

Test 2

different tests for a group of students.

1

214

216

a Construct a scatterplot of these data, and

2

281

270

describe the relationship between scores on

3

212

281

Test 1 and Test 2.

4

324

326

b Is it appropriate to calculate the value of

5

240

243

Pearsons correlation coefficient for these

6

208

213

data? Give reasons for your answer.

7

303

311

c Determine the values of the q-correlation

8

278

290

coefficient and Pearsons correlation

9

311

320

coefficient r.

d Classify the relationship between Test 1 and Test 2 using both the q-correlation

coefficient and Pearsons correlation coefficient r, and compare.

e It turns out that when the data was entered into the student records, the result for

Test 2, Student 9 was entered as 32 instead of 320.

i Recalculate the values of the q-correlation coefficient and Pearsons correlation

coefficient r with this new data value.

ii Compare these values with the ones calculated in c.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

23.4

573

Lines on scatterplots

PL

E

If a linear relationship exists between two variables it is possible to predict the value of the

dependent variable from the value of the independent variable. The stronger the relationship

between the two variables the better the prediction that is made. To make the prediction it is

necessary to determine an equation which relates the variables and this is achieved by fitting a

line to the data. Fitting a line to data is often referred to as regression, which comes from a

Latin word regressum which means moved back.

The simplest equation relating two variables x and y is a linear equation of the form

y = a + bx

where a and b are constants. This is similar to the general equation of a straight line, where a

represents the coordinate of the point where the line crosses the y axis (the y axis intercept),

and b represents the slope of the line. In order to summarise any particular (x, y) data set,

numerical values for a and b are needed that will make the line pass close to the data. There are

several ways in which the values of a and b can be found, of which the simplest is to find the

straight line by placing a ruler on the scatter diagram, and drawing a line by eye, which

appears to follow the general trend of the data.

Example 6

SA

M

The following table gives the gold medal winning distance, in metres, for the mens long jump

for the Olympic games for the years 1896 to 1996. (Some years were missing owing to the two

world wars.)

Find a straight line which fits the general trend of the data, and use it to predict the winning

distance in the year 2008.

Year

Distance (m)

Year

Distance (m)

1896

6.35

1960

8.13

1900

7.19

1964

8.08

1904

7.34

1968

8.92

1908

7.49

1972

8.26

1912

7.59

1976

8.36

1920

7.16

1980

8.53

1924

7.44

1984

8.53

1928

7.75

1988

8.72

1932

7.65

1992

8.67

1936

8.05

1996

8.50

1948

7.82

2000

8.55

1952 1956

7.57 7.82

2004

8.59

Solution

9.0

Distance

8.5

8.0

7.5

7.0

6.5

6.0

Year

Cambridge University Press Uncorrected Sample Pages 978-0-521-61252-4

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

574

PL

E

Note that this scatterplot does not start at the origin. Since the values of the

coordinates that are of interest on both axes are a long way from zero, it is sensible to

plot the graph for that range of values only. In fact, any values less than 1896 on the

horizontal axis are meaningless in this context.

The line shown on the scatterplot is only one of many which could be drawn. To

enable the line to be used for prediction it is necessary to find its equation. To do this,

first determine the coordinates of any two points through which it passes on the

scatterplot. Appropriate points are (1932, 7.65) and (1976, 8.36). The equation of the

straight line is then found by substituting in the formula which gives the equation for a

straight line between two points.

y2 y1

y y1

=

x x1

x2 x1

8.36 7.65

y 7.65

=

x 1932

1976 1932

0.71

= 0.016

=

44

y 7.65 = 0.016(x 1932)

y = 0.016x 23.26

SA

M

The intercept for this equation is 23.26 m. In theory, this is the winning distance

for the year 0! In practice, there is no meaningful interpretation for the y axis intercept

in this situation. But the same cannot be said about the slope. A slope of 0.016 means

that on average the gold medal winning distance increases by about 1.6 centimetres at

each successive games.

Using this equation the gold medal winning distance for the long jump in 2008

would be predicted as

y = 23.26 + 0.016 2008 = 8.87 m

Obviously, attempting to project too far into the future may give us answers which are not

sensible. When using an equation for prediction, derived from data, it is sensible to use values

of the explanatory variable which are within a reasonable range of the data. The relationship

between the variables may not be linear if we move too far from the known values.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

575

Example 7

The following table gives the alcohol consumption per head (in litres) and the hospital

admission rate to each of the regions of Victoria in 199495.

Region

LoddonMallee

Grampians

Barwon

Gippsland

Hume

Western Metropolitan

Northern Metropolitan

Eastern Metropolitan

Southern Metropolitan

Per capita

consumption

(litres of alcohol)

9.0

8.4

8.7

9.1

10.0

9.0

6.7

6.2

8.1

Hospital

admissions per

1000 residents

42.0

44.7

38.6

44.7

41.0

40.4

36.2

32.3

43.0

PL

E

P1: FXS/ABE

Find a straight line which fits the general trend of the data, and interpret the intercept and slope.

Solution

Admissions

SA

M

45.0

40.0

35.0

30.0

6.0

7.0

9.0

8.0

Alcohol consumption

10.0

One possible line passes through the points (7, 36) and (9, 42).

y2 y1

y y1

=

x x1

x2 x1

y 36

42 36

=

x 7

97

6

=

2

=3

y 36 = 3(x 7)

y = 3x + 15

or admission rate = 15 + 3 alcohol consumption

Thus

The intercept for this equation is 15, implying that we predict a hospital admission

rate would be 15 per 1000 residents for a region with 0 alcohol consumption. While

this is interpretable, it would be a brave prediction as it is well out of the range of the

Cambridge University Press Uncorrected Sample Pages 978-0-521-61252-4

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

576

data. A slope of 3 means that on average the admission rate rises by 3 per 1000

residents for each additional litre of alcohol consumed per capita.

Exercise 23D

Example

x

y

0

1

1

3

2

6

3

7

4

7

5

11

6

13

7

18

8

17

PL

E

P1: FXS/ABE

Draw a straight line which fits the data by eye, and find an equation for this line.

2 Plot the following set of points on graph paper.

x

y

3

5

2

2

1

0

0

6

1

7

2

11

3

13

4

20

Draw a straight line which fits the data by eye, and find an equation for this line.

7

3 The numbers of burglaries during two successive years for various districts in one state are

given in the following table.

a Make a scatterplot of the data.

District Year 1 (x) Year 2 (y)

b Find the equation of a straight line which

A

3233

2709

relates the two variables.

B

2363

2208

c Describe the trend in burglaries in this state.

C

4591

3685

SA

M

Example

D

E

F

G

H

I

J

K

4317

2474

3679

5016

6234

6350

4072

2137

4038

2792

3292

4402

5147

5555

4004

1980

4 The following data give a girls height (in cm) between the ages of 36 months and

60 months.

Age (x)

Height (y)

36

84

40

87

44

90

52

92

56

94

60

96

Find the equation of a straight line which relates the two variables.

Interpret the intercept and slope, if appropriate.

Use your equation to estimate the girls height at age

i 42 months

ii 18 years

e How reliable are your answers to d?

a

b

c

d

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

577

5 The following table gives the adult heights (in cm) of ten pairs of mothers and daughters.

Mother (x)

Daughter (y)

170

178

163

175

157

165

165

173

175

168

160

152

164

163

168

168

152

160

173

178

b Find the equation of a straight line which relates the two variables.

c Estimate the adult height of a girl whose mother is 170 cm tall.

6 The manager of a company which manufactures MP3 players keeps a weekly record of the

cost of running the business and the number of units produced. The figures for a period of

eight weeks are shown in the table.

PL

E

P1: FXS/ABE

Cost in 000s $ (y)

a

b

c

d

2.5 3.3 2.4 2.6 4.1 3.1 3.5 3.8

Find the equation of a straight line which relates the two variables.

What is the manufacturers fixed cost for operating the business each week?

What is the cost of production of each unit, over and above this fixed operating cost?

7 The amount of a particular pain relief drug given to each patient and the time taken for the

patient to experience pain relief are shown.

1

0.5

65

2

1.2

35

SA

M

Patient

Drug dose (mg)

Response time (min)

3

4.0

15

4

5.3

10

5

2.6

22

6

3.7

16

7

5.1

10

8

1.7

18

9

0.3

70

10

4.0

20

a Find the equation of a straight line which relates the two variables.

b Interpret the intercept and slope if appropriate.

c Use your equation to predict the time taken for the patient to experience pain relief if 6

mg of the drug is given. Is this answer realistic?

the amount spent on advertising in the local

paper and the business income for each

month for a year, with the results shown.

a Find the equation of a straight line

which relates the two variables.

b Interpret the intercept and slope if

appropriate.

c Use your equation to predict the business

income which would be attracted if the

proprietor of the salon spent the following

amounts on advertising:

i $1000

ii $0

1

350

9 450

2

450

10 070

3

400

9 380

4

500

9 110

5

250

5 220

6

150

3 100

7

350

8 060

8

300

7 030

9

550

11 500

10

600

12 870

11

550

10 560

12

450

9 850

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

578

23.5

Fitting a line to a scatterplot by eye is not generally the best way of modelling a relationship.

What is required is a method which uses a more objective criterion. A simple method is two

divide the data set into two halves on the basis of the median x value, and to fit a line which

passes through the mean x and y values of the lower half, and the mean x and y values of the

upper half. This is called the two-mean line, and while easy to determine, it is not very widely

used. The most common procedure is the method of least squares. The least squares

regression line is the line for which the sum of squares of the vertical deviations from the data

to the line (as indicated in the diagram) is a minimum. These deviations are called the

residuals.

PL

E

P1: FXS/ABE

y

35

30

y = a + bx

25

20

(xi, yi)

15

SA

M

10

5

10

We would like to find a and b such that the sum of the residuals is zero. That is,

n

(yi a bxi ) = 0

i=1

n

i=1

n

(yi a bxi )2

i=1

n

From 1 ,

n

i=1

(yi a bxi ) = 0

i=1

yi na b

n

xi = 0

i=1

y a b x = 0

a = y b x

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

579

S=

n

bxi ]2

[yi ( y b x)

i=1

n

2

=

[(yi y ) b(xi x)]

i=1

n

i y ) + b2 (xi x)

2]

=

[(yi y )2 2b(xi x)(y

i=1

PL

E

In order to find the value of b which minimises S, we will differentiate with respect to b and

set the derivative equal to zero.

n

n

dS

i y ) + 2b

2

= 2

(xi x)(y

(xi x)

db

i=1

i=1

=0

n

i y )

(xi x)(y

Simplifying gives

b=

i=1

n

2

(xi x)

i=1

SA

M

Equations 3 and 4 can then be used to calculate the least squares estimates of the y axis

intercept and the slope.

The calculator can be used to construct the least squares regression line. The procedure

is illustrated using the age and price of car data from Example 2.

It has previously been illustrated how to

enter the data as two lists, age and price

respectively, and from these construct the

scatterplot in a Data & Statistics

application resulting in the data displayed

as shown.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

580

Regression submenu of the Analyze menu

(b 4 6 1) to place the regression

line on the scatterplot as shown.

PL

E

P1: FXS/ABE

The following data gives the heights (in cm) and weights (in kg) of 11 people.

Height (x)

Weight (y)

177

74

182

75

167

62

178

63

173

64

184

74

162

57

169

55

164

56

170

68

180

72

SA

M

shown. Tap OK to produce the results

shown in the second screen. The format of

the formula, y = ax + b is shown at the

top and the values of a, b are shown in the

answer box.

Tap OK to produce a scatterplot

showing the regression line.

The formula can be automatically copied into a selected

entry line if

desired by selecting a graph number, e.g. y1 in the Copy Formula line.

Note:

After the equation of the least squares line has been determined, we can interpret the

intercept and slope in terms of the problem at hand, and use the equation to make predictions.

The method of least squares is also sensitive to any outliers in the data.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

581

Example 8

Consider again the gold medal winning distance, in metres, for the mens long jump for

the Olympic games for the years 1896 to 2004.

Find the equation of the least squares regression line for these data, and use it to predict

the winning distance for the year 2008.

Year

Distance (m)

Year

Distance (m)

Solution

1896

6.35

1960

8.13

1900

7.19

1964

8.08

1904

7.34

1968

8.92

1908

7.49

1972

8.26

1912

7.59

1976

8.36

1920

7.16

1980

8.53

1924

7.44

1984

8.53

1928

7.75

1988

8.72

1932

7.65

1992

8.67

1936

8.05

1996

8.50

1948

7.82

2000

8.55

1952 1956

7.57 7.82

2004

8.59

PL

E

P1: FXS/ABE

distance = 23.87 + 0.0163 year

The predicted distance for the year 2008 is

distance = 23.87 + 0.0163 2008 = 8.86 m

Example 9

SA

M

Consider again the data from Example 7 which related alcohol consumption per head (in litres)

and the hospital admission rate to each of the regions of Victoria in 199495.

Region

LoddonMallee

Grampians

Barwon

Gippsland

Hume

Western Metropolitan

Northern Metropolitan

Eastern Metropolitan

Southern Metropolitan

(litres of alcohol)

9.0

8.4

8.7

9.1

10.0

9.0

6.7

6.2

8.1

Hospital admissions

per 1000 residents

42.0

44.7

38.6

44.7

41.0

40.4

36.2

32.3

43.0

Find the equation of the least squares regression line which fits these data.

Solution

admissions = 19.9 + 2.45 alcohol

which is slightly different from the line fitted by eye.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

582

The existence of even a strong linear relationship between two variables is not, in itself,

sufficient to imply that altering one variable causes a change in the other. It only implies that

this might be the explanation. It may be that both the measured variables are affected by a third

and different variable. For example, if data about crime rates and unemployment in a range of

cities were gathered, a high correlation would be found. But could it be inferred that high

unemployment causes a high crime rate? The explanation could be that both of these variables

are dependent on other factors, such as home circumstances, peer group pressure, level of

education or economic conditions, all of which may be related to both unemployment and

crime rates. These two variables may vary together, without one being the direct cause of the

other.

PL

E

P1: FXS/ABE

Exercise 23E

Example

1 The following data give a girls height (in cm) between the ages of 36 months and

60 months.

Age (x)

Height (y)

36

84

40

87

44

90

52

92

56

94

60

96

SA

M

a Using the method of least squares find the equation of a straight line which relates the

two variables.

b Interpret the intercept and slope, if appropriate.

c Use your equation to estimate the girls height at age

i 42 months

ii 18 years

d How reliable are your answers to part c?

successive years for various districts in

one state are given in the following table.

Using the method of least squares find

the equation of a straight line which

relates the two variables.

District

A

B

C

D

E

F

G

H

I

J

K

Year 1 (x)

3 233

2 363

4 591

4 317

2 474

3 679

5 016

6 234

6 350

4 072

2 137

Year 2 (y)

2 709

2 208

3 685

4 038

2 792

3 292

4 402

5 147

5 555

4 004

1 980

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

Example

583

3 The following table gives the adult heights (in cm) of ten pairs of mothers and daughters.

Mother (x)

Daughter (y)

170 163 157 165 175 160 164 168 152 173

178 175 165 173 168 152 163 168 160 178

PL

E

a Using the method of least squares find the equation of a straight line which relates the

two variables.

b Interpret the slope in this context.

c Estimate the adult height of a girl whose mother is 170 cm tall.

4 The manager of a company which manufactures MP3 players keeps a weekly record of the

cost of running the business and the number of units produced. The figures for a period of

eight weeks are:

Number of MP3

players produced (x)

Cost in 000s $ (y)

2.5 3.3 2.4 2.6 4.1 3.1 3.5 3.8

a Using the method of least squares find the equation of a straight line which relates the

two variables.

b What is the manufactures fixed cost for operating the business each week?

c What is the cost of production of each unit, over and above this fixed operating cost?

SA

M

5 The amount of a particular pain relief drug given to each patient and the time taken for the

patient to experience pain relief are shown.

Patient

Drug dose (mg)

Response time (min)

1

2

3

4

5

6

7

8

9 10

0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

65 35 15 10 22 16 10 18 70 20

a Using the method of least squares find the equation of a straight line which relates the

two variables.

b Interpret the intercept and slope if appropriate.

c Use your equation to predict the time taken for the patient to experience pain relief

if 6 mg of the drug is given. Is this answer realistic?

recorded the amount spent on advertising

in the local paper and the business income

for each month for a year, with the

results shown.

a Using the method of least squares find

the equation of a straight line which

relates the two variables.

b Interpret the intercept and slope if

appropriate.

Month

1

2

3

4

5

6

7

8

9

10

11

12

Advertising ($)

350

450

400

500

250

150

350

300

550

600

550

450

Business ($)

9 450

10 070

9 380

9 110

5 220

3 100

8 060

7 030

11 500

12 870

10 560

9 850

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

584

c Use your equation to predict the volume which would be attracted if the proprietor of

the salon spent the following amounts on advertising.

i $1000

ii $0

How to construct a scatterplot

Construct a scatterplot for the following set of test scores.

Treat Test 1 as the independent (x-) variable.

Test 1 score

Test 2 score

PL

E

P1: FXS/ABE

10

12

18

20

13

11

6

9

8

6

5

6

12

12

15

13

15

17

Editor.

Type the data into list1 and list2.

Setup the calculator to plot a statistical graph.

Press ENTER to select 1:Plot Setup.

Press F1 to Define Plot1.

Complete the dialogue box as follows:

r For Plot Type: select 1:Scatter

SA

M

a

b

c

d

you to the Plot Setup menu.

plots the scatter plot in a properly scaled window.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

585

Use a calculator to calculate the correlation coefficient r for the following data.

x

y

1

2

3

5

5

7

4

2

7

9

PL

E

Enter the data into your calculator using the

Stats/List Editor.

Type the data into list1 and list2.

Calculate the correlation coefficient.

a Press F4 to access the Calculate menu.

b Move the cursor down ( ) to Regressions and

across ( ) to the Regressions menu.

c Press ENTER to select 1: LinReg(a+bx). This will

take you to the LinReg(a+bx) dialogue box.

SA

M

r For Xlist: type in list1.

r For Ylist: type in list2.

e Press

ENTER

The following data gives the heights (in cm) and weights (in kg) of 11 people.

Height (x) 177 182 167 178 173 184 162 169 164 170 180

Weight (y) 74 75 62 63 64 74 57 55 56 68 72

Use a graphics calculator to determine the equation of the least squares regression line

that will enable weight to be predicted from height.

Enter the data into your calculator using the Stats/List Editor.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

586

a Press F4 to access the Calculate menu.

b Move the cursor down (D) to Regressions and

across (B) to the Regressions menu.

c Press ENTER to select 1: LinReg(a+bx). This will take

you to the LinReg(a+bx) dialogue box.

d Complete the dialogue box.

r For Xlist: type in list1.

r For Ylist: type in list2.

ENTER

SA

M

e Press

PL

E

P1: FXS/ABE

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

587

SA

M

PL

E

Bivariate data arises when measurements on two variables are collected for each subject.

A scatterplot is an appropriate visual display of bivariate data if both of the variables are

numerical.

A scatterplot of the data should always be constructed to assist in the identification of

outliers and illustrate the association (positive, negative or none).

Two variables are positively associated when larger values of y are associated with larger

values of x. Two variables are negatively associated when larger values of y are associated

with smaller values of x. There is no association between two variables when the values of

y are not related to the values of x.

When constructing the scatterplot, the independent or explanatory variable is plotted on the

horizontal (x) axis, and the dependent or response variable is plotted on the vertical (y) axis.

If a linear relationship is indicated by the scatterplot a measure of its strength can be found

by calculating the q-correlation coefficient, or Pearsons product-moment correlation

coefficient, r.

If the values on a scatterplot are divided by lines representing the median of x and the

median of y into four quadrants A, B, C and D, with a, b, c, d representing the number of

points in each quadrant respectively, then the q-correlation coefficient is given by

(a + c) (b + d)

q=

a+b+c+d

Pearsons product-moment correlation, r, is a measure of strength of linear relationship

between two variables, x and y. If we have n observations then for this set of observations

n

xi x

yi y

1

r=

n 1 i=1

sx

sy

where x and sx are the mean and standard deviation of the x scores and y and s y are the

mean and standard deviation of the y scores.

For these correlation coefficients

1 q 1

1 r 1

with values close to 1 indicating strong correlation, and those close to 0 indicating little

correlation.

If a linear relationship is indicated from the scatterplot a straight line may be fitted to the

data, either by eye or using the least squares regression method.

The least squares regression line is the line for which the sum of squares of the vertical

deviations from the data to the line is a minimum.

The value of the slope (b) gives the extent of the change in the dependent variable

associated with a unit change in the independent variable.

Once found, the equation to the straight line may be used to predict values of the response

variable (y) from the explanatory variable (x). The accuracy of the prediction depends on

how closely the straight line fits the data, and an indication of this can be obtained from the

correlation coefficient.

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

Review

Chapter summary

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Essential Advanced General Mathematics

Multiple-choice questions

PL

E

1 For which one of the following pairs of variables would it be appropriate to construct a

scatterplot?

A eye colour (blue, green, brown, other) and hair colour (black, brown, blonde, red, other)

B score out of 100 on a test for a group of Year 9 students and a group of Year 11 students

C political party preference (Labor, Liberal, Other) and age in years

D age in years and blood pressure in mm Hg

E height in cm and gender (male, female)

2 For which one of the following plots would it be appropriate to calculate the value of the

q-correlation coefficient?

A

C strong positive

B moderate positive

A weak positive

E moderately strong

D close to zero

SA

M

Review

588

4 The scatterplot shows the relationship between age and the number of alcoholic drinks

consumed on the weekend by a group of people.

The value of the q-correlation coefficient is closest to

7

5

7

E 1

D

C

A 1

B

9

6

9

20

No. of drinks

15

10

0

10

20

30

40

Age

50

60

70

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

589

220

200

Weight

180

160

140

120

100

80

150

160

170 180

Height (cm)

190

200

SA

M

140

The weekly income and weekly expenditure on food for a group of 10 university students is

given in the following table.

Weekly

income ($)

150 250 300 600 300 380 950 450 850 1000

Weekly food

expenditure ($) 40 60 70 120 130 150 200 260 460 600

6 The value of the Pearson product-moment correlation coefficient r for these data is closest

to

E 0.8

D 0.7

C 0.6

B 0.4

A 0.2

7 The least squares regression line which would enable expenditure on food to be predicted

from weekly income is closest to

B 0.482 42.864 weekly income

A 0.482 + 42.864 weekly income

D 239.868 + 1.355 weekly income

C 42.864 + 0.482 weekly income

E 1.355 + 239.868 weekly income

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

Review

5 The following scatterplot shows the relationship between height and weight for a group of

people.

The value of the Pearsons product-moment correlation coefficient r is closest to

E 0

D 0.3

C 0.5

B 0.8

A 1

PL

E

P1: FXS/ABE

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Essential Advanced General Mathematics

Suppose that the least squares regression line which would enable expenditure on

entertainment (in dollars) to be predicted from weekly income is given by

Weekly expenditure on entertainment = 40 + 0.10 Weekly income

PL

E

8 Using this rule the expenditure on entertainment by an individual with an income of $600

per week is predicted to be

E $240

D $46

C $100

B $24 060

A $40

9 From this rule which of the following statements is correct?

A On average for each extra dollar of income an extra 10 cents is spent on entertainment

B On average for each extra 10 cents in income an extra $1 is spent on entertainment

C On average for each extra dollar of income an extra 40 cents is spent on entertainment

D On average people spend $40 per week on entertainment

E On average people spend $50 per week on entertainment

10 For the scatterplot shown the line of best fit would

have a slope closest to:

C 10

B 0.1

A 0.1

E 200

D 10

250

200

150

100

SA

M

Review

590

50

10

12

14

16

18

20

22

Short-answer questions

1 The following table gives the number of times

Inside 50

the ball was inside the 50 metre line in an AFL

64

football game, and the teams score in that game

57

a Plot the score against the number of Inside 50s.

34

b From the scatterplot, describe any association

61

between the two variables.

51

q-correlation between the score and the number of

Inside 50s.

52

53

51

64

55

58

71

Score (points)

90

134

76

92

93

45

120

66

105

108

88

133

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

591

Distance (kms)

Time (mins)

12

15

50

75

40

50

25

50

45

80

20

50

10

10

3

5

10

10

30

35

4 The following scatterplot shows the relationship between height and weight for a group of

people. Draw a straight line which fits the data by eye, and find an equation for this line.

220

200

weight

180

160

140

120

SA

M

100

80

140

150

160

170

180

height (cm)

190

errors on the task, were recorded for a sample of 10

primary school children. Determine the equation of

the least squares regression line which fits these data.

6 For the data in 5:

a Interpret the intercept and slope of the least

squares regression line.

b Use the least squares regression line to predict

the number of errors which would be observed

for a child who took 10 seconds to complete the

task.

200

Time (seconds)

22.6

21.7

21.7

21.3

19.3

17.6

17.0

14.6

14.0

8.8

Errors

2

3

3

4

5

5

7

7

9

9

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

Review

3 The distance traveled to work and the time taken for a group of company employees are

given in the following table. Determine the value of the Pearson product-moment

correlation r for these data.

PL

E

P1: FXS/ABE

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Essential Advanced General Mathematics

Extended-response questions

1 A marketing company wishes to predict the likely number of new clients each of its

graduates will attract to the business in their first year of employment, by using their scores

on a marketing exam in the final year of their course.

PL

E

Number of new

and which is the dependent variable?

Exam score

clients

b Construct a scatterplot of these data.

65

7

c Describe the association between the

72

9

Number of new clients and Exam

68

8

score.

85

10

d Determine the value of the q-correlation

74

10

coefficient for these data, and classify

61

8

the strength of the relationship.

60

6

e Determine the value of the Pearson

78

10

product-moment correlation coefficient

70

5

for these data and classify the strength

82

11

of the relationship.

f Determine the equation for the least squares regression line and write it down in terms of

the variables Number of new clients and Exam score.

g Interpret the intercept and slope of the least squares regression line in terms of the

variables in the study.

h Use your regression equation to predict to the nearest whole number the Number of new

clients for a person who scored 100 on the exam.

i How reliable is the prediction made in h?

SA

M

Review

592

2 To investigate the relationship between marks on an assignment and the final examination

mark a sample of 10 students was taken. The table indicates the marks for the assignment

and the final exam mark for each individual student.

a Which is the independent variable

Assignment mark Final exam mark

and which is the dependent variable?

(max = 80)

(max = 90)

b Construct a scatterplot of these data.

80

83

c Describe the association between the

77

83

assignment mark and exam mark.

71

79

d Determine the value of the q-correlation

78

75

coefficient for these data, and classify the

65

68

strength of the relationship.

80

84

e Determine the value of the Pearson

68

71

product-moment correlation coefficient

64

69

for these data and classify the strength of

50

66

the relationship.

66

58

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

P1: FXS/ABE

P2: FXS

9780521740494c23.xml

CUAU033-EVANS

11:42

Chapter 23 Investigating the relationship between two numerical variables

593

PL

E

SA

M

3 A marketing firm wanted to investigate the relationship between airplay and CD sales (in the

following week) of newly released songs. Data was collected on a random sample of 10 songs.

a Which is the independent variable and which

No. of times the Weekly sales

is the dependent variable?

song was played

of the CD

b Construct a scatterplot of these data.

47

3950

c Describe the association between the number

34

2500

of times the song was played and weekly sales.

40

3700

d Determine the value of the q-correlation

34

2800

coefficient for these data, and classify the strength

33

2900

of the relationship.

50

3750

e Determine the value of the Pearson

28

2300

product-moment correlation coefficient for these

53

4400

data and classify the strength of the relationship.

25

2200

f Determine the equation for the least squares

46

3400

regression line and write it down in terms of the

variables Number of times the song was played and Weekly sales.

g Interpret the intercept and slope of the least squares regression line in terms of the

variables in the study.

h Use your regression equation to predict the weekly sales for a song which was played 60

times.

i How reliable is the prediction made in h?

2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

Review

f Use your answer to d to comment on the statement: Good final exam marks are the

result of good assignment marks.

g Determine the equation for the least squares regression line and write it down in terms

of the variables Final exam mark and Assignment mark.

h Interpret the intercept and slope of the least squares regression line in terms of the

variables in the study.

i Use your regression equation to predict the Final exam mark for a student who scored 50

on the assignment.

j How reliable is the prediction made in i?

- Macroeconomic Data AnalysisUploaded byScott W. Hegerty
- Pearson r Spss MethodUploaded byFayne Conadera
- CorrelationUploaded byPrince Aman
- Correlation SPSSUploaded bySami119
- Estimize Bull speed using Back propagationUploaded byIJMER
- Correlation AnalysisUploaded byAnkit Patel
- Topic03 Correlation RegressionUploaded byVijay Phani Kumar
- BBM 3rd Sem Syllabus 2014.pdfUploaded byBhuwan
- GAISE - Appendices - August 2005 - p89-p107Uploaded byDennis Ashendorf
- Correlation and RegressionUploaded byNjuh Polki
- Presentation Chapter 6 MTH1022Uploaded byNorlianah Mohd Shah
- Analyzed Univariate and BivariateUploaded byAlbertus Budi
- Document(2)Uploaded byDavid Alvarez Miranda
- Correlation AnalysisUploaded byChristian Surio Ramos
- Lecture_2.1Uploaded bysharad1204
- bigreview-burgers-alecwilsonUploaded byapi-340088425
- Correlation Coefﬁcients of Probabilistic Hesitant Fuzzy Elements and Their Applications to Evaluation of the AlternativesUploaded byMia Amalia
- Spss Last AsignmentUploaded byaminasaeed
- 1Uploaded byKhairul Aznim
- penyusunanUploaded byRiki Shafar
- APR 2011 ~~Uploaded bysarah
- 2261.PDFUploaded byTessfaye Wolde Gebretsadik
- SEC_case_studyUploaded bySaeesh Prabhu Desai
- Effect of Utilizing Geometer’s Sketchpad Software on Students’ Academic Achievement in Mathematics’ Training at High SchoolsUploaded bytheijes
- Hasil Uji Spss Pk IwanUploaded byrizkyallah gsp
- Ap12 Statistics q1(1).UnlockedUploaded byMelissa Chapman
- Session 1 Part2A Handout (1)Uploaded byAnisha Jain
- Correlation and RegressionUploaded byArunkumar
- PACFUploaded byMasoud Barati
- RM Unit 4.2Uploaded byKarthickKrishna

- Statistics LectureUploaded byPamelyn Faner Yao
- D1203-94Uploaded byHarleyBenavidesHenriquez
- Generalized Weibull DistributionsUploaded byChaparrón Bonaparte
- Statistics Unit 5 NotesUploaded bygopscharan
- Control Charts for VariablesUploaded byDr. Mahmoud Abbas Mahmoud Al-Naimi
- Response Reviewer CommentsUploaded byzameershah
- Correlation after midterm 3.pptUploaded byNataliAmiranashvili
- thesis soil erosionUploaded byChoo Fu-Kiat
- PHY12L E301Uploaded byArvn Christian Santicruz Flores
- Solution Hayashi Cap 1Uploaded bysilvio de paula
- Predicting the Seam Strength of Notched Webbings for Parachute Assemblies Using TheUploaded bynguyenthuyngoc71
- Manual SSHUploaded byAnonymous h5uTxGvc
- An Introduction to the Extended Kalman FilterUploaded byVu Duc Truong
- Business Analysis Using Regression - A CasebookUploaded byMichael Wen
- Measures of DispersionUploaded byJitin Chaurasia
- Seven-Basic-Quality-Tools.pdfUploaded byjoselyn
- Corelation CoeficientUploaded bynitin30
- Assignment 1Uploaded byAshish
- 720059Uploaded byAnggun Aiyla Nova
- Probability TestUploaded byJoseph
- VonBertalanffyUploaded byCaducas Rocha Duarte
- Book2 NeuralUploaded byJos2
- 04MA246L4A.pdfUploaded byKien Thai
- 15. App Nfluence of Variables on Yield of Soybean in Raisen District of Madhya PradeshUploaded byImpact Journals
- Melissa Dell - Path Dependence in Development Evidence From the Mexican RevolutionUploaded byLuis Gonzalo Trigo Soto
- Mendelian Inheritance in CornUploaded byVincent Lee
- Determine of Dew Points and Bubble PointsUploaded byArif Iskandar
- Rpg SpeciesUploaded byChris Shanahan
- 05.04.16 Mariners Minor League ReportUploaded byAnonymous rHnsf1r
- IRM Chapter 6e 05Uploaded byenufeffizy