Uploaded by Zara Nabilah

Attribution Non-Commercial (BY-NC)


C 5606 / 5/ 1

UNIT 5

CORRELATION AND REGRESSION

OBJECTIVES

General Objective: To understand and apply the concepts of correlation and regression.

Specific Objectives: At the end of the unit, you should be able to:
- Draw a scatterplot for a set of ordered pairs
- Compute the correlation coefficient
- Compute the equation of the regression line

INPUT

5.0 CORRELATION

So far we have considered the statistics of one variable. Of course, we sometimes get data involving two variables. For example, look at the marks obtained on two Mathematics papers by a group of students below.

Student   Paper 1   Paper 2
A         42        31
B         84        83
C         50        42
D         42        60
E         33        28
F         50        63
G         69        59
H         81        92
I         50        73
J         35        40

So what can we find out from the data? Students B and H have done very well on both papers, E has done very badly on both papers, and student I has done much better on Paper 2 than on Paper 1. A graph might help us to make more sense of the data, as would the average (mean) mark for Papers 1 and 2. The most useful type of graph is a scatter diagram.

If we plot the data as points, with marks for Paper 1 on the x-axis and for Paper 2 on the y-axis, we obtain a graph like the one shown here. Note that we do not need to start the scales at zero.

We see that the points go roughly from bottom left to top right (this is made clearer by enclosing the points as shown below).

From the data, the mean mark for Paper 1 is $\bar{x} = 53.6$ and the mean mark for Paper 2 is $\bar{y} = 57.1$.

We now plot the lines $x = 53.6$ and $y = 57.1$ on the scatter diagram:

The lines divide the graph into four quadrants:

Top right: all points have both x values and y values greater than their respective means, i.e. $(x - \bar{x}) > 0$ and $(y - \bar{y}) > 0$. The product $(x - \bar{x})(y - \bar{y})$ is positive.
Bottom left: all points have both x values and y values less than their respective means, i.e. $(x - \bar{x}) < 0$ and $(y - \bar{y}) < 0$. The product is again positive.
Top left: x values less than $\bar{x}$, y values greater than $\bar{y}$. The product is negative.
Bottom right: x values greater than $\bar{x}$, y values less than $\bar{y}$. The product is negative.

Look at the scattergrams (scatter diagrams) below. The patterns seem to be very different.

Roughly speaking:

Positive correlation: the higher the value of x, the higher the value of y.
Negative correlation: the higher the value of x, the lower the value of y.
Zero correlation: no fixed relationship between x and y.

Again, this is made clearer by drawing the lines $y = \bar{y}$ and $x = \bar{x}$.
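The quadrant idea can be tried out numerically on the marks table from section 5.0. The short Python sketch below is an illustration only: the function names `mean` and `quadrant_products` are made up for this example and are not part of the unit. It counts how many points give a positive or a negative product of deviations from the means.

```python
# Marks from the table in section 5.0 (Paper 1 = x, Paper 2 = y).
paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def quadrant_products(xs, ys):
    """Return (x - x_bar) * (y - y_bar) for every data point."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]


products = quadrant_products(paper1, paper2)
positive = sum(1 for p in products if p > 0)  # top-right or bottom-left
negative = sum(1 for p in products if p < 0)  # top-left or bottom-right

print("means:", mean(paper1), mean(paper2))
print("positive products:", positive, "negative products:", negative)
print("sum of products:", round(sum(products), 2))
```

Most of the points fall in the top-right and bottom-left quadrants, so the sum of the products is positive, which is what a positive correlation looks like in these terms.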

You have met scatter diagrams in earlier work, where you may have drawn a line of best fit on the graph in order to estimate a value of y given a value of x. The line was drawn by eye, but you would know that it passes through the point of mean values $(\bar{x}, \bar{y})$, as shown below.

The lines on the first two diagrams are relatively easy to draw, but where do we draw a line on the third and, having drawn it, would it be of any practical use? Notice that we have been looking for a special type of relationship between the x and y values: a straight-line or linear relationship. The fact that we can't find such a relationship does not mean that there is no relationship at all.

The product-moment formula for determining the linear correlation coefficient

The convention for dealing with data:
Horizontal (x) axis: the independent variable
Vertical (y) axis: the dependent variable

Let us look at some data on the height of students and the distance they can throw a cricket ball.

Height (x) cm   Distance (y) m
122             41
124             38
133             52
138             56
144             29
156             54
158             59
161             61
164             63
168             67

Just looking at the data, a general response might be "the taller a person, the further they can throw a cricket ball" (apart from the odd person!).

One of the measures of the degree of linear correlation between two variables is called the coefficient of correlation, denoted by the symbol r. The coefficient of correlation for two variables, say X and Y, is given by:

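$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$$

or simply

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.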

The value of the correlation coefficient ranges from +1 for a perfect positive correlation to -1 for a perfect negative correlation.
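When only raw totals are available, the deviations can be expanded. The form below is a standard algebraic rearrangement of the same coefficient rather than something stated in this unit, and $n$ (the number of data pairs) is a symbol introduced here for that purpose.

```latex
r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}
         {\sqrt{\left[\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}\right]
                \left[\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}\right]}}
```

It gives the same value of $r$ as the deviation form, because $\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \dfrac{\sum X \sum Y}{n}$, and similarly for the two sums of squares.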

Example 5.1

a) Determine the coefficient of correlation between X and Y based on the data below.

X   4    5    6    9
Y   12   10   8    6

b) The data given below are the experimental values obtained for the torque output from an electric motor, X, against the current taken from the supply, Y. Determine the value, degree and nature of the coefficient of linear correlation between the variables X and Y (if there is one).

X   0   1   2   3   4   5    6    7    8    9
Y   4   6   6   6   8   10   10   10   14   12

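Solution

a) Let $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

       X     Y     x     y     xy    x^2   y^2
       4     12    -2    3     -6    4     9
       5     10    -1    1     -1    1     1
       6     8     0     -1    0     0     1
       9     6     3     -3    -9    9     9
Sum    24    36                -16   14    20

$\bar{X} = \frac{24}{4} = 6$, $\bar{Y} = \frac{36}{4} = 9$

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{-16}{\sqrt{(14)(20)}} = \frac{-16}{\sqrt{280}} = -0.9562$$

b)

       X     Y     x      y      xy     x^2     y^2
       0     4     -4.5   -4.6   20.7   20.25   21.16
       1     6     -3.5   -2.6   9.1    12.25   6.76
       2     6     -2.5   -2.6   6.5    6.25    6.76
       3     6     -1.5   -2.6   3.9    2.25    6.76
       4     8     -0.5   -0.6   0.3    0.25    0.36
       5     10    0.5    1.4    0.7    0.25    1.96
       6     10    1.5    1.4    2.1    2.25    1.96
       7     10    2.5    1.4    3.5    6.25    1.96
       8     14    3.5    5.4    18.9   12.25   29.16
       9     12    4.5    3.4    15.3   20.25   11.56
Sum    45    86                  81.0   82.5    88.4

$\bar{X} = \frac{45}{10} = 4.5$, $\bar{Y} = \frac{86}{10} = 8.6$

$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{81.0}{\sqrt{(82.5)(88.4)}} = 0.95$$

There is therefore a good, direct (positive) correlation between X and Y.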
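The tabular method used in Example 5.1 can be checked with a few lines of Python. This is a minimal sketch under the assumption that the deviation form of the formula is applied directly; `pearson_r` is a name chosen for this illustration, not one used in the unit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Deviations from the means (the x and y columns of the worked table).
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Data from Example 5.1 a) and b).
r_a = pearson_r([4, 5, 6, 9], [12, 10, 8, 6])
r_b = pearson_r(list(range(10)), [4, 6, 6, 6, 8, 10, 10, 10, 14, 12])
print(round(r_a, 4), round(r_b, 2))
```

The sign of the result carries the nature of the correlation (direct or inverse), so it is worth keeping track of it when summing the xy column by hand.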

ACTIVITY 5A

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

1. Determine the coefficient of correlation, to 4 decimal places, between X and Y based on the data below.

X   122   124   133   138   144   156   158   161   164   168
Y   41    38    52    56    29    54    59    61    63    67

2. The co-ordinates given below refer to an experiment to verify Newton's law of cooling over a limited range of values. Determine the value, degree and nature of the coefficient of correlation.

Time (min)          4    8    10   12   16   22
Temperature (°C)    46   34   30   26   24   20

3. The following results were obtained experimentally when verifying Hooke's law:

Load (N)          2   5    8    11    15
Extension (mm)    2   23   62   119   223

4. The thickness of case-hardening achieved varies with temperature, and some co-ordinates obtained by experiment are as shown.

Temperature (°C)   400   420   350   320   400   480   440   370
Thickness (m)      3.7   3.4   3.7   3.8   3.6   3.3   3.4   3.7

FEEDBACK TO ACTIVITY 5A

1. 2. 3. 4.

INPUT

5.2 LEAST SQUARES REGRESSION LINE

Scatter Diagrams: Line of Best Fit

We have already referred to the drawing of a line of best fit by eye.

The only calculation involved was determining $\bar{x}$ and $\bar{y}$, since the line of best fit passes through the point $(\bar{x}, \bar{y})$. From the line you might be expected to estimate a y value given an x value. Of course, fitting a line by eye is a subjective matter of trying to minimise the distances between the points and the line. A mathematical method is available that produces two lines, known as y on x (to estimate values of y) and x on y (to estimate values of x). These are known as (Linear) Regression Lines or Least-Squares Regression Lines.

Scatter Diagrams: The y on x Regression Line

Since the line must pass through $(\bar{x}, \bar{y})$, the parameters that can vary are the gradient of the line and the point where the line cuts the y-axis. The equation of the y on x line will be of the form y = a + bx (some syllabuses use the Greek letters α and β instead of a and b).

The y on x line minimises the sum of the squares of the vertical distances from the points to the regression line (the square of the distance is used to ensure a positive result). As with correlation, there is a formula derived from a proof and a corresponding computational method. (The proof is not required at A/AS Level.)

For y = a + bx:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$

$$a = \bar{y} - b\bar{x}$$
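The two formulas translate directly into code. The following Python sketch is one possible way to apply them, using the height and throwing-distance table given earlier as input; the function name `regression_y_on_x` is an assumption made for this illustration.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are equal; the gradient is undefined")
    b = (sum_xy - sum_x * sum_y / n) / denominator
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return a, b


# Height (cm) and throwing distance (m) from the table given earlier.
heights = [122, 124, 133, 138, 144, 156, 158, 161, 164, 168]
distances = [41, 38, 52, 56, 29, 54, 59, 61, 63, 67]

a, b = regression_y_on_x(heights, distances)
print(f"y = {a:.4f} + {b:.4f}x")
```

Because the code keeps b at full precision when computing a, the intercept can differ in the later decimal places from a hand calculation that first rounds b to four decimal places.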

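Example

a)

x      y
2.5    3.5
4      3
8      6.5
5      7
7      8
9.5    11
8.5    9
12.5   10.5
12.5   13
14.5   13

$\bar{x} = 8.4$, $\bar{y} = 8.45$

Calculate the regression line y on x.

b) Based on the height and throwing-distance data already calculated, find the regression line y on x and estimate the value of y when x = 160.

$\sum x = 1468$, $\sum y = 520$, $\sum xy = 77689$, $\sum x^2 = 218070$, $n = 10$

Solution

a) For the regression line y on x, with $n = 10$, $\sum x = 84$, $\sum y = 84.5$, $\sum xy = 827$ and $\sum x^2 = 845.5$:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{827 - \dfrac{(84)(84.5)}{10}}{845.5 - \dfrac{(84)^2}{10}} = \frac{117.2}{139.9} = 0.8377$$

$$a = \bar{y} - b\bar{x} = 8.45 - (0.8377 \times 8.4) = 1.4133$$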

So the regression line is y = 1.4133 + 0.8377x.

We can now use this equation to calculate (estimate) a value of y for a given value of x. For example, find a value for y given x = 10. Substituting, y = 1.4133 + (0.8377 × 10) = 9.7903.

Finding a value from within the range of x is called interpolation.

Warning: estimating a value from outside the data range (say x = 20) is called extrapolation and should be avoided (at all costs), since you do not know that the relationship between x and y will hold for larger and smaller values than those recorded.

b) For the regression line y on x,

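$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{77689 - \dfrac{(1468)(520)}{10}}{218070 - \dfrac{(1468)^2}{10}} = \frac{1353}{2567.6} = 0.5270$$

$$a = \bar{y} - b\bar{x} = 52 - (0.5270 \times 146.8) = -25.3636$$

So the regression line is y = -25.3636 + 0.5270x.

When x = 160, y = -25.3636 + (0.5270 × 160) = 58.96.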
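The warning about extrapolation can be built into the way a fitted line is used. The sketch below is one hedged way of doing so in Python: `predict_within_range` is a hypothetical helper, and the limits 122 and 168 are simply the smallest and largest heights in the data table.

```python
def predict_within_range(a, b, x, x_min, x_max):
    """Return a + b*x, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} lies outside the observed range {x_min} to {x_max}; "
            "estimating there would be extrapolation"
        )
    return a + b * x


# Coefficients from worked example b); range taken from the height data.
a, b = -25.3636, 0.5270
y160 = predict_within_range(a, b, 160, 122, 168)
print(round(y160, 2))

try:
    predict_within_range(a, b, 200, 122, 168)
except ValueError as error:
    print("refused:", error)
```

Raising an error is a deliberately strict choice. A gentler design would return the estimate together with a flag saying that it was extrapolated.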

ACTIVITY 5B

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

a. The table shows the results for a number of athletes. X represents long jump (metres).

       x      y      x^2     y^2     xy
       1.8    6.7    3.24    44.89   12.06
       2.1    7.6    4.41    57.76   15.96
       1.9    6.3    3.61    39.69   11.97
       2.0    6.8    4.00    46.24   13.6
       1.8    5.9    3.24    34.81   10.62
       1.8    7.9    3.24    62.41   14.22
       1.6    5.5    2.56    30.25   8.8
       1.8    5.6    3.24    31.36   10.08
       1.9    6.5    3.61    42.25   12.35
       2.3    7.2    5.29    51.84   16.56
Sum    19     66     36.44   441.5   126.22

$\sum x = 19$, $\sum y = 66$, $\sum xy = 126.22$, $\sum x^2 = 36.44$, $n = 10$

Calculate the value of b for the regression line y = a + bx.

b. The length y metres of a cable subjected to a load of x kilograms is given by y = α + βx. In an experiment to estimate α and β for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the 15 pairs of values.

$\sum x = 225$, $\sum y = 238.5$, $\sum xy = 3581$, $\sum x^2 = 3625$

c. $\sum x = 21$, $\sum y = 43$, $\sum xy = 171$, $\sum x^2 = 91$, $\sum y^2 = 335$, $n = 6$

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.

FEEDBACK TO ACTIVITY 5B

a.
b.
c. i) a = 3.0667, so the regression line is y = 3.07 + 1.17x (3 significant figures)
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.

SELF ASSESSMENT 5

You are approaching success. Try all the questions in this self-assessment section and check your answers against those given on the next page. If you encounter any problems, consult your instructor. Good luck.

1. The data given below refer to the relationship between man-hours worked and production achieved in a factory. Determine the coefficient of correlation.

Index of production, man-hour basis   100   97   100   101   93   103   91   89   110   86
Index of production, actual basis     94    91   100   105   84   112   83   80   123   78

2. The numbers of man-days lost per week due to sickness in two similar departments of a factory are shown for a 12-week period.

Department A   20   18   19   21   17   18   12   16   14   17   13   15
Department B   18   21   18   20   17   19   16   15   15   18   16   18

Determine the coefficient of correlation and comment on its degree and nature.

3. The masses and heights of ten people were measured and the results are as shown.

Mass (kg)     38    38    38    44    44    51    32    51    77    32
Height (cm)   135   140   137   141   147   145   132   149   164   130

Calculate the coefficient of correlation for these data.

4. The relationship between the pressure and volume of a gas was measured and the following results were obtained:

Pressure (kPa)   58     62     67     73     81     81     86     92     104
Volume (m^3)     0.36   0.97   0.43   0.52   0.48   0.29   0.31   0.75   0.27

Determine the coefficient of correlation and comment on the result obtained.

5. The caloric intake of rats varies with body mass as shown below.

Body mass (g)               2.0   3.1   3.6   4.6   5.0   6.0   7.0   8.0   8.5   9.0   10.0
Caloric intake (cal h^-1)   1.5   2.1   3.2   3.6   3.6   3.9   4.1   4.2   4.5   4.6   5.9

Is there a linear correlation between these results?

6. Determine the coefficient of correlation for the data given below and test the null hypothesis that ρ = 0 at a level of significance of 0.1. The data given relate the number of hours of sunshine per week to the hours lost due to sickness.

Hours of sunshine/week       10   13   15   17   18   20   22   23   24
Hours lost due to sickness   90   75   75   65   55   45   55   45   35

7. The length y metres of a cable subjected to a load of x kilograms is given by y = α + βx. In an experiment to estimate α and β for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the pairs of values.

$\sum x = 225$, $\sum y = 238.5$, $\sum xy = 3581$, $\sum x^2 = 3625$

8. $\sum x = 21$, $\sum y = 43$, $\sum xy = 171$, $\sum x^2 = 91$, $\sum y^2 = 335$, $n = 6$

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.

9. The data given below show the relationship between the heights and masses of ten people.

Height, X (cm)   175   180   193   165   187   171   198   168   184   177
Mass, Y (kg)     82    78    86    72    91    80    95    72    89    74

Determine the equation of the regression line of mass on height, expressing the regression coefficients correct to two decimal places.

10. The power needed to drive a lathe increases as the cutting angle of the tool increases when cutting at a constant speed and depth of cut. The relationship for mild steel is:

Cutting angle, X (degrees)   50    55    60    65    70    75    80    85     90
Power, Y (kW)                6.2   6.8   7.6   8.2   8.1   8.8   9.7   10.0   10.4

Determine a) the equation of the regression line of power on cutting angle and b) the equation of the regression line of cutting angle on power, expressing the regression coefficients correct to three significant figures in each case.

Have you tried all the questions? If YES, check your answers now.

1. 0.97
2. 0.70, fair direct
3. 0.97
4. -0.31. It is probable that the measurements were made at different temperatures.
5. r = 0.94, hence there is a good, direct correlation.
6. r = -0.95, t.99 = 1.42, |t| = 8.05, hypothesis is rejected
7. α = 15.69, β = 0.014, y = 15.69 + 0.014x
8. i) y = 3.07 + 1.17x
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.
9. y = -36.83 + 0.66x
10. a) Y = 1.14 + 0.104X
    b) X = -9.27 + 9.41Y
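The unit does not say how the value |t| = 8.05 in answer 6 is obtained. A common test statistic for the hypothesis ρ = 0, assumed here rather than taken from the unit, is t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The Python sketch below applies it with r = -0.95 and n = 9 (the number of data pairs in question 6); the function name is illustrative.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, given sample correlation r and n pairs.

    Assumes the usual form t = r * sqrt(n - 2) / sqrt(1 - r^2),
    which has n - 2 degrees of freedom.
    """
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


t = correlation_t_statistic(-0.95, 9)
print(round(abs(t), 2))
```

The null hypothesis is rejected when |t| exceeds the critical value read from t tables for the chosen significance level.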
