Attribution Non-Commercial (BY-NC)

C 5606 / 5/ 1

UNIT 5

CORRELATION AND REGRESSION

OBJECTIVES

General Objective: To understand and apply the concept of correlation and regression.

Specific Objectives: At the end of the unit, you should be able to:
- Draw a scatterplot for a set of ordered pairs
- Compute the correlation coefficient
- Compute the equation of the regression line

INPUT

5.0 CORRELATION

So far we have considered the statistics of one variable. Of course, we sometimes get data involving two variables. For example, look at the marks obtained on two Mathematics papers by a group of students below.

Student   Paper 1   Paper 2
A         42        31
B         84        83
C         50        42
D         42        60
E         33        28
F         50        63
G         69        59
H         81        92
I         50        73
J         35        40

So what can we find out from the data? Students B and H have done very well on both papers, student E has done very badly on both papers, and student I has done much better on Paper 2 than on Paper 1. A graph might help us to make more sense of the data, as would the average (mean) mark for Papers 1 and 2. The most useful type of graph is a scatter diagram.

If we plot the data as points, with marks for Paper 1 on the x-axis and for Paper 2 on the y-axis, we obtain a graph like the one shown here. Note that we do not need to start the scales at zero.

We see that the points go roughly from bottom left to top right (this is made clearer by enclosing the points as shown below).

From the data, the mean mark for Paper 1 is $\bar{x} = 53.6$ and the mean mark for Paper 2 is $\bar{y} = 57.1$.

We now plot the lines $x = 53.6$ and $y = 57.1$ on the scatter diagram:

The lines divide the graph into four quadrants:

Top right: all points have both x values and y values greater than their respective means, i.e. $(x - \bar{x}) > 0$ and $(y - \bar{y}) > 0$. The product $(x - \bar{x})(y - \bar{y})$ is positive.
Bottom left: all points have both x values and y values less than their respective means, i.e. $(x - \bar{x}) < 0$ and $(y - \bar{y}) < 0$. The product is again positive.
Top left: x values less than $\bar{x}$, y values greater than $\bar{y}$. The product is negative.
Bottom right: x values greater than $\bar{x}$, y values less than $\bar{y}$. The product is negative.

Look at the scattergrams (scatter diagrams) below. The patterns seem to be very different.
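The quadrant argument can be tried out on the marks for the two Mathematics papers. The following is a minimal Python sketch, not part of the original unit; the function names `mean` and `quadrant_summary` are illustrative choices. It counts how many students fall in each quadrant around $(\bar{x}, \bar{y})$ and adds up the products $(x - \bar{x})(y - \bar{y})$.

```python
# Marks for students A to J, taken from the table in section 5.0.
paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def quadrant_summary(xs, ys):
    """Count the points in each quadrant around (x_bar, y_bar) and
    add up the products (x - x_bar)(y - y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"top right": 0, "bottom left": 0,
              "top left": 0, "bottom right": 0}
    total = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        total += dx * dy
        if dx > 0 and dy > 0:
            counts["top right"] += 1
        elif dx < 0 and dy < 0:
            counts["bottom left"] += 1
        elif dx < 0 and dy > 0:
            counts["top left"] += 1
        elif dx > 0 and dy < 0:
            counts["bottom right"] += 1
        # A point lying exactly on a mean line has a zero product
        # and is not counted in any quadrant.
    return counts, total


counts, total = quadrant_summary(paper1, paper2)
print(counts)
print(round(total, 1))
```

Most of the students lie in the top right and bottom left quadrants, so the positive products outweigh the negative ones and the total is positive. This is the numerical counterpart of points running from bottom left to top right.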

Roughly speaking:
Positive correlation: the higher the value of x, the higher the value of y.
Negative correlation: the higher the value of x, the lower the value of y.
Zero correlation: no fixed relationship between x and y.

Again, this is made clearer by drawing the lines $y = \bar{y}$ and $x = \bar{x}$.

You have met scatter diagrams in your earlier work, where you may have drawn a line of best fit on the graph in order to estimate a value of y given a value of x. The line was drawn by eye, but you would know that it passes through the point of mean values $(\bar{x}, \bar{y})$, as shown below.

The lines on the first two diagrams are relatively easy to draw, but where do we draw a line on the third and, having drawn it, would it be of any practical use? Notice that we have been looking for a special type of relationship between the x and y values: a straight line, or linear, relationship. The fact that we cannot find such a relationship does not mean that there is no relationship at all.

The product-moment formula for determining the linear correlation coefficient

The convention for dealing with data:
Horizontal (x) axis: the independent variable
Vertical (y) axis: the dependent variable

Let us look at some data on the height of students and the distance they can throw a cricket ball.

Height (x) cm   Distance (y) m
122             41
124             38
133             52
138             56
144             29
156             54
158             59
161             61
164             63
168             67

Just looking at the data, a general response might be: the taller a person, the further they can throw a cricket ball (apart from the odd person!).

One of the measures of the degree of linear correlation between two variables is called the coefficient of correlation, denoted by the symbol r. The coefficient of correlation for two variables, say X and Y, is given by:

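$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$$

or simply

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.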

The value of the correlation coefficient ranges from +1 for a perfect positive correlation to -1 for a perfect negative correlation.
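To see the two extremes of r in practice, the formula can be written as a short Python function. This is a sketch only: the function name `correlation_coefficient` and the two small data sets are made up for illustration and are not data from this unit. Points lying exactly on a rising straight line should give r = +1, and points lying exactly on a falling straight line should give r = -1.

```python
import math


def correlation_coefficient(X, Y):
    """Product-moment r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x = X - mean(X) and y = Y - mean(Y)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    x = [Xi - x_bar for Xi in X]
    y = [Yi - y_bar for Yi in Y]
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Illustrative values only (not data from this unit).
X = [1, 2, 3, 4, 5]
rising = [2 * Xi + 1 for Xi in X]     # points on the line Y = 2X + 1
falling = [10 - 3 * Xi for Xi in X]   # points on the line Y = 10 - 3X

r_rising = correlation_coefficient(X, rising)
r_falling = correlation_coefficient(X, falling)
print(r_rising, r_falling)
```

The gradient of the line does not affect the result: any exact straight-line relationship gives +1 or -1, and only the direction of the slope decides the sign.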

Example 5.1

a) Determine the coefficient of correlation between X and Y based on the data below.

X   4    5    6   9
Y   12   10   8   6

b) The data given below gives the experimental values obtained for the torque output from an electric motor, X, against the current taken from the supply, Y. Determine the value, degree and nature of the coefficient of linear correlation between the variables X and Y (if there is one).

X   0   1   2   3   4   5    6    7    8    9
Y   4   6   6   6   8   10   10   10   14   12

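a) $\sum X = 24$, so $\bar{X} = \frac{24}{4} = 6$, and $\sum Y = 36$, so $\bar{Y} = \frac{36}{4} = 9$. With $x = X - \bar{X}$ and $y = Y - \bar{Y}$:

X      Y      x      y      xy     x^2    y^2
4      12     -2     3      -6     4      9
5      10     -1     1      -1     1      1
6      8      0      -1     0      0      1
9      6      3      -3     -9     9      9

$\sum xy = -16$, $\sum x^2 = 14$, $\sum y^2 = 20$

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{-16}{\sqrt{(14)(20)}} = \frac{-16}{\sqrt{280}} = -0.9562$$

Every product xy is zero or negative because Y falls as X rises, so the correlation is strong and negative (inverse).

b) $\sum X = 45$, so $\bar{X} = \frac{45}{10} = 4.5$, and $\sum Y = 86$, so $\bar{Y} = \frac{86}{10} = 8.6$. With $x = X - \bar{X}$ and $y = Y - \bar{Y}$:

X      Y      x      y      xy     x^2     y^2
0      4      -4.5   -4.6   20.7   20.25   21.16
1      6      -3.5   -2.6   9.1    12.25   6.76
2      6      -2.5   -2.6   6.5    6.25    6.76
3      6      -1.5   -2.6   3.9    2.25    6.76
4      8      -0.5   -0.6   0.3    0.25    0.36
5      10     0.5    1.4    0.7    0.25    1.96
6      10     1.5    1.4    2.1    2.25    1.96
7      10     2.5    1.4    3.5    6.25    1.96
8      14     3.5    5.4    18.9   12.25   29.16
9      12     4.5    3.4    15.3   20.25   11.56

$\sum xy = 81.0$, $\sum x^2 = 82.5$, $\sum y^2 = 88.4$

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{81}{\sqrt{(82.5)(88.4)}} = 0.95$$

Hence there is a good, direct correlation between X and Y.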
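The column totals in the working for Example 5.1 can be cross-checked with a few lines of Python. This is a sketch under the assumption that the tabular method is followed exactly (deviations from the means, then products and squares); the helper name `working_table` is illustrative. For part (a) every product xy is zero or negative, so both the total and r come out negative.

```python
import math


def working_table(X, Y):
    """Build the working columns x, y, xy, x^2, y^2 (with x = X - mean(X),
    y = Y - mean(Y)) and return the rows, the column totals and r."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    rows = []
    for Xi, Yi in zip(X, Y):
        x, y = Xi - x_bar, Yi - y_bar
        rows.append((Xi, Yi, x, y, x * y, x * x, y * y))
    sum_xy = sum(row[4] for row in rows)
    sum_x2 = sum(row[5] for row in rows)
    sum_y2 = sum(row[6] for row in rows)
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return {"rows": rows, "sum_xy": sum_xy,
            "sum_x2": sum_x2, "sum_y2": sum_y2, "r": r}


# Example 5.1 (a) and (b), using the data given in the example.
part_a = working_table([4, 5, 6, 9], [12, 10, 8, 6])
part_b = working_table(list(range(10)), [4, 6, 6, 6, 8, 10, 10, 10, 14, 12])

for name, result in (("a", part_a), ("b", part_b)):
    print(name, round(result["sum_xy"], 2), round(result["sum_x2"], 2),
          round(result["sum_y2"], 2), round(result["r"], 4))
```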

ACTIVITY 5A

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

1. Determine the coefficient of correlation, to 4 decimal places, between X and Y based on the data below.

X   122   124   133   138   144   156   158   161   164   168
Y   41    38    52    56    29    54    59    61    63    67

2. The co-ordinates given below refer to an experiment to verify Newton's law of cooling over a limited range of values. Determine the value, degree and nature of the coefficient of correlation.

Time (min)         4    8    10   12   16   22
Temperature (°C)   46   34   30   26   24   20

3. The following results were obtained experimentally when verifying Hooke's law:

Load (N)         2   5    8    11    15
Extension (mm)   2   23   62   119   223

4. The thickness of case-hardening achieved varies with temperature, and some co-ordinates obtained by experiment are as shown.

Temperature (°C)   400   420   350   320   400   480   440   370
Thickness (m)      3.7   3.4   3.7   3.8   3.6   3.3   3.4   3.7

FEEDBACK TO ACTIVITY 5A

1. 2. 3. 4.

INPUT

5.2 LEAST SQUARES REGRESSION LINE

Scatter Diagrams: Line of Best Fit

We have already referred to the drawing of a line of best fit by eye.

The only calculation involved was determining $\bar{x}$ and $\bar{y}$, since the line of best fit passes through the point $(\bar{x}, \bar{y})$. From the line you might be expected to estimate a y value given an x value. Of course, fitting a line by eye is a subjective matter of trying to minimise the distances between the points and the line. A mathematical computation method is available to produce two lines, known as y on x (to estimate values of y) and x on y (to estimate values of x). These are known as (Linear) Regression Lines or Least-Squares Regression Lines.

Scatter Diagrams: The y on x Regression Line

Since the line must pass through $(\bar{x}, \bar{y})$, the parameters that can vary are the gradient of the line and the point where the line cuts the y-axis. The equation of the y on x line will be of the form y = a + bx (some syllabuses use the Greek letters $\alpha$ and $\beta$ instead of a and b).

The y on x line minimises the sum of the squares of the vertical distances from the points to the regression line (the square of the distance is used to ensure a positive result). As with correlation, there is a formula derived from a proof and a corresponding computational method. (The proof is not required at A/AS Level.)

For $y = a + bx$:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$

$$a = \bar{y} - b\bar{x}$$
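The formulas for b and a translate directly into code. The sketch below is illustrative rather than part of the original unit: the function names are my own, and the data are the Paper 1 and Paper 2 marks from section 5.0, used here simply because they are already to hand. It also checks the least-squares claim by nudging a and b slightly and confirming that every nudged line has a larger sum of squared vertical distances.

```python
# Paper 1 (x) and Paper 2 (y) marks from the table in section 5.0.
paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]


def regression_y_on_x(xs, ys):
    """Return (a, b) for the y on x line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n   # a = y_bar - b * x_bar
    return a, b


def squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = regression_y_on_x(paper1, paper2)
best = squared_vertical_distances(paper1, paper2, a, b)

# Lines with a slightly different intercept and/or gradient.
nudged = [squared_vertical_distances(paper1, paper2, a + da, b + db)
          for da in (-1, 0, 1) for db in (-0.05, 0, 0.05)
          if (da, db) != (0, 0)]

print(round(a, 4), round(b, 4))
print(all(best < other for other in nudged))
```

Because a is computed as $\bar{y} - b\bar{x}$, the fitted line automatically passes through $(\bar{x}, \bar{y})$.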

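a) Calculate the regression line y on x for the data below.

x   2.5   4   8     5   7   9.5   8.5   12.5   12.5   14.5
y   3.5   3   6.5   7   8   11    9     10.5   13     13

$\bar{x} = 8.4$, $\bar{y} = 8.45$

b) Based on the data already calculated, find the regression line y on x and estimate the value of y when x = 160.

$\sum x = 1468$, $\sum y = 520$, $\sum xy = 77689$, $\sum x^2 = 218070$, $n = 10$

a) For the regression line y on x, with $\sum x = 84$, $\sum y = 84.5$, $\sum xy = 827$ and $\sum x^2 = 845.5$:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{827 - \dfrac{(84)(84.5)}{10}}{845.5 - \dfrac{(84)^2}{10}} = 0.8377$$

$$a = \bar{y} - b\bar{x} = 8.45 - (0.8377 \times 8.4) = 1.4133$$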

So the regression line is y = 1.4133 + 0.8377x.

We can now use this equation to calculate (estimate) a value of y for a given value of x. For example, find a value for y given x = 10. Substituting, y = 1.4133 + (0.8377 × 10) = 9.7903.

Finding a value from within the range of x is called interpolation.

Warning: estimating a value from outside the data range (say x = 20) is called extrapolation and should be avoided (at all costs), since you do not know that the relationship between x and y will hold for larger and smaller values than those recorded.

b) For the regression line y on x,

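$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{77689 - \dfrac{(1468)(520)}{10}}{218070 - \dfrac{(1468)^2}{10}} = 0.5270$$

$$a = \bar{y} - b\bar{x} = 52 - (0.5270 \times 146.8) = -25.3636$$

So the regression line is y = -25.3636 + 0.5270x. When x = 160, y = -25.3636 + (0.5270 × 160) = 58.96.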
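The calculation for part (b) can be reproduced from the summary totals alone, and the same code can flag whether an estimate is an interpolation or an extrapolation. This is a minimal sketch: `line_from_sums` and `estimate_y` are hypothetical helper names, and the range 122 to 168 is taken from the heights in the cricket ball table.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for y = a + b*x from the summary totals."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


def estimate_y(a, b, x, x_min, x_max):
    """Estimate y at x and report whether x lies inside the data range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind


# Totals quoted in part (b); the heights recorded run from 122 cm to 168 cm.
a, b = line_from_sums(10, 1468, 520, 77689, 218070)
y_160, kind_160 = estimate_y(a, b, 160, 122, 168)
y_200, kind_200 = estimate_y(a, b, 200, 122, 168)

print(round(b, 4), round(a, 4))
print(round(y_160, 2), kind_160)
print(round(y_200, 2), kind_200)
```

The intercept printed here differs from -25.3636 in the third decimal place because the worked solution rounds b to 0.5270 before calculating a, whereas the code keeps full precision. The estimate at x = 160 agrees to two decimal places.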

ACTIVITY 5B

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

a. The table shows the results for a number of athletes. X represents long jump (metres).

x       y       x^2     y^2      xy
1.8     6.7     3.24    44.89    12.06
2.1     7.6     4.41    57.76    15.96
1.9     6.3     3.61    39.69    11.97
2.0     6.8     4.00    46.24    13.6
1.8     5.9     3.24    34.81    10.62
1.8     7.9     3.24    62.41    14.22
1.6     5.5     2.56    30.25    8.8
1.8     5.6     3.24    31.36    10.08
1.9     6.5     3.61    42.25    12.35
2.3     7.2     5.29    51.84    16.56
Total:
19      66      36.44   441.5    126.22

$\sum x = 19$, $\sum y = 66$, $\sum xy = 126.22$, $\sum x^2 = 36.44$, $n = 10$

Calculate the value of b for the regression line y = a + bx.

b. The length y metres of a cable subjected to a load of x kilograms is given by $y = \alpha + \beta x$. In an experiment to estimate $\alpha$ and $\beta$ for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the 15 pairs of values.

$\sum x = 225$, $\sum y = 238.5$, $\sum xy = 3581$, $\sum x^2 = 3625$

c. $\sum x = 21$, $\sum y = 43$, $\sum xy = 171$, $\sum x^2 = 91$, $\sum y^2 = 335$, $n = 6$

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.

FEEDBACK TO ACTIVITY 5B

a.
b.
c. i) a = 3.0667 and b = 1.1714, so the regression line is y = 3.07 + 1.17x (3 significant figures).
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.
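To make the difference between the two regression lines concrete, the sketch below computes both for the height and distance data from section 5.0. It is illustrative only. The y on x gradient is written as $\sum xy / \sum x^2$ in deviation form, which is algebraically the same as the formula for b given earlier; the x on y gradient $\sum xy / \sum y^2$ is obtained by swapping the roles of x and y, a standard result that this unit does not state explicitly. The function names are my own.

```python
# Height (x, cm) and distance thrown (y, m) from the table in section 5.0.
height = [122, 124, 133, 138, 144, 156, 158, 161, 164, 168]
distance = [41, 38, 52, 56, 29, 54, 59, 61, 63, 67]


def deviation_sums(xs, ys):
    """Return x_bar, y_bar and the sums of xy, x^2, y^2 in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, s_xy, s_xx, s_yy


def both_regression_lines(xs, ys):
    """Return ((a, b) for y = a + b*x, (c, d) for x = c + d*y)."""
    x_bar, y_bar, s_xy, s_xx, s_yy = deviation_sums(xs, ys)
    b_yx = s_xy / s_xx                 # minimises vertical distances
    a_yx = y_bar - b_yx * x_bar
    b_xy = s_xy / s_yy                 # minimises horizontal distances
    a_xy = x_bar - b_xy * y_bar
    return (a_yx, b_yx), (a_xy, b_xy)


(a_yx, b_yx), (a_xy, b_xy) = both_regression_lines(height, distance)
print("y on x: y = %.4f + %.4f x" % (a_yx, b_yx))
print("x on y: x = %.4f + %.4f y" % (a_xy, b_xy))
```

Both lines pass through $(\bar{x}, \bar{y})$, but they are different lines unless the correlation is perfect. The x on y line is the one to use when a value of x is to be estimated from a given y.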

SELF ASSESSMENT 5

You are approaching success. Try all the questions in this self-assessment section and check your answers against those given at the end. If you encounter any problems, consult your instructor. Good luck.

1. The data given below refers to the relationship between man-hours worked and production achieved in a factory. Determine the coefficient of correlation.

Index of production, man-hour basis   100   97   100   101   93   103   91   89   110   86
Index of production, actual basis     94    91   100   105   84   112   83   80   123   78

2. The number of man-days lost per week due to sickness in two similar departments of a factory are shown for a 12-week period.

Department A   20   18   19   21   17   18   12   16   14   17   13   15
Department B   18   21   18   20   17   19   16   15   15   18   16   18

Determine the coefficient of correlation and comment on its degree and nature.

3. The masses and heights of ten people were measured and the results are as shown.

Mass (kg)     38    38    38    44    44    51    32    51    77    32
Height (cm)   135   140   137   141   147   145   132   149   164   130

Calculate the coefficient of correlation for this data.

4. The relationship between the pressure and volume of a gas was measured and the following results were obtained:

Pressure (kPa)   58     62     67     73     81     81     86     92     104
Volume (m^3)     0.36   0.97   0.43   0.52   0.48   0.29   0.31   0.75   0.27

Determine the coefficient of correlation and comment on the result obtained.

5. The caloric intake of rats varies with body mass as shown below. Is there a linear correlation between these results?

Body mass (g) / Caloric intake (cal h^-1): 2.0 3.1 2.1 1.5 3.6 3.2 4.6 3.6 5.0 3.6 6.0 3.9 7.0 4.1 8.0 4.2 8.5 4.5 9.0 4.6 10.0 5.9

6. Determine the coefficient of correlation for the data given below and test the null hypothesis that $\rho = 0$ at a level of significance of 0.1. The data given relates the number of hours of sunshine per week to the hours lost due to sickness.

Hours of sunshine/week       10   13   15   17   18   20   22   23   24
Hours lost due to sickness   90   75   75   65   55   45   55   45   35

7. The length y metres of a cable subjected to a load of x kilograms is given by $y = \alpha + \beta x$. In an experiment to estimate $\alpha$ and $\beta$ for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the pairs of values.

$\sum x = 225$, $\sum y = 238.5$, $\sum xy = 3581$, $\sum x^2 = 3625$

8. $\sum x = 21$, $\sum y = 43$, $\sum xy = 171$, $\sum x^2 = 91$, $\sum y^2 = 335$, $n = 6$

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.

9. The data given below is the relationship between the heights and masses of ten people.

Height, X (cm)   175   180   193   165   187   171   198   168   184   177
Mass, Y (kg)     82    78    86    72    91    80    95    72    89    74

Determine the equation of the regression line of mass on height, expressing the regression coefficients correct to two decimal places.

10. The power needed to drive a lathe increases as the cutting angle of the tool increases when cutting at a constant speed and depth of cut. The relationship for mild steel is:

Cutting angle, X (degrees)   50    55    60    65    70    75    80    85     90
Power, Y (kW)                6.2   6.8   7.6   8.2   8.1   8.8   9.7   10.0   10.4

Determine a) the equation of the regression line of power on cutting angle and b) the equation of the regression line of cutting angle on power, expressing the regression coefficients correct to three significant figures in each case.

Have you tried all the questions? If YES, check your answers now.

1. 0.97
2. 0.70, a fair, direct correlation
3. 0.97
4. -0.31. It is probable that the measurements were made at different temperatures.
5. r = 0.94, hence there is a good, direct correlation.
6. r = -0.95, $t_{0.90} = 1.42$, $|t| = 8.05$; the hypothesis is rejected.
7. $\alpha = 15.69$, $\beta = 0.014$; y = 15.69 + 0.014x
8. i) y = 3.07 + 1.17x
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.
9. y = -36.83 + 0.66x
10. a) Y = 1.14 + 0.104X
    b) X = -9.27 + 9.41Y
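The answer to question 6 quotes |t| = 8.05 without showing where it comes from. A commonly used test statistic for the null hypothesis of zero correlation is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with n - 2 degrees of freedom. Assuming that this is the statistic intended (the unit itself does not state it), the quoted value can be reproduced from r = -0.95 and the n = 9 pairs of readings. The function name below is an illustrative choice.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), for n pairs of readings.
    Assumed form of the test statistic; not stated in the unit."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Question 6: r = -0.95 (rounded answer) from n = 9 pairs of readings.
t = t_statistic(-0.95, 9)
print(round(abs(t), 2))
```

Since |t| is far larger than the tabulated critical value of 1.42 quoted in the answer, the hypothesis of zero correlation is rejected.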
