
Dependent variable: The variable that is being predicted or estimated.

Independent variable: The variable that provides the basis for estimation.

Scatter diagram: A diagrammatic way of representing bivariate data.

Correlation coefficient: Measures the strength of the linear relationship between two variables. The sample correlation coefficient for the n pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1.$$

r = 1 if and only if all $(x_i, y_i)$ pairs lie on a straight line with positive slope, and r = -1 if and only if all $(x_i, y_i)$ pairs lie on a straight line with negative slope.

Regression: Measures the probable movement of one variable in terms of the other.

General form of a linear equation: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ (y is the dependent variable and the x's are the independent variables).

The regression equation for a set of n data points (with one independent variable) is $\hat{y} = a + bx$, where

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}.$$

Coefficient of determination: $r^2$ is the proportion of the total variation in the observed y-values that is explained by the regression, usually quoted as a percentage. $0 \le r^2 \le 1$. Values of $r^2$ near zero indicate that the regression equation is not very useful for making predictions, and values of $r^2$ near 1 indicate that the regression equation is extremely useful for making predictions. When $r^2$ is low, more relevant independent variables are added to the linear regression model.

Multiple linear regression: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

Example 1:

x    2    4    6    8    10   12   14
y    5    6    8    10   12   13   16
With $x_d = x - \bar{x}$ and $y_d = y - \bar{y}$, where $\bar{x} = 56/7 = 8$ and $\bar{y} = 70/7 = 10$:

x     y     xd    yd    xd^2   yd^2   (xd)(yd)
2     5     -6    -5    36     25     30
4     6     -4    -4    16     16     16
6     8     -2    -2    4      4      4
8     10    0     0     0      0      0
10    12    2     2     4      4      4
12    13    4     3     16     9      12
14    16    6     6     36     36     36
Sum   56    70    0     0      112    94     102

$$r = \frac{102}{\sqrt{(112)(94)}} = 0.994093$$

$$b = \frac{102}{112} = 0.910714$$

$$a = \bar{y} - b\bar{x} = 10 - 0.910714 \times 8 = 2.714286$$

Estimated regression equation:

$$\hat{y} = 2.714286 + 0.910714\,x$$

Interpretations:
a = 2.714286: when x is zero, the estimated value of y is on average 2.714286 units.
b = 0.910714: if x increases by 1 unit, then y increases on average by 0.910714 units.
$r^2$ = 0.988222: 98.82% of the total variation in y is explained by the independent variable x (only 1.18% is left unexplained, which is due to other variables). The model fits very well.

Prediction: $\hat{y} = 2.714286 + 0.910714 \times 3.3 = 5.719643$ units, when x = 3.3 units.
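The deviation-table method of Example 1 can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names and the `predict` helper are illustrative and not part of the original handout. The intercept is computed directly from a = ȳ − b x̄ with x̄ = 8 and ȳ = 10.

```python
import math

# Example 1 data, as read from the table.
x = [2, 4, 6, 8, 10, 12, 14]
y = [5, 6, 8, 10, 12, 13, 16]

n = len(x)
x_bar = sum(x) / n  # mean of x
y_bar = sum(y) / n  # mean of y

# Deviations from the means (the xd and yd columns of the table).
xd = [xi - x_bar for xi in x]
yd = [yi - y_bar for yi in y]

# Column totals used by the formulas for r, b and a.
sum_xd2 = sum(d * d for d in xd)
sum_yd2 = sum(d * d for d in yd)
sum_xdyd = sum(p * q for p, q in zip(xd, yd))

r = sum_xdyd / math.sqrt(sum_xd2 * sum_yd2)  # correlation coefficient
b = sum_xdyd / sum_xd2                       # slope
a = y_bar - b * x_bar                        # intercept


def predict(x_new):
    """Estimated y for a given x on the fitted line (illustrative helper)."""
    return a + b * x_new


print(f"r = {r:.6f}, r^2 = {r * r:.6f}")
print(f"b = {b:.6f}, a = {a:.6f}")
print(f"predicted y at x = 3.3: {predict(3.3):.6f}")
```

Keeping the three column totals as separate variables mirrors the hand calculation, so each printed value can be compared with the corresponding step of the worked example.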

Scatter diagram between x and y (figure: Y-variable plotted against X-variable).

Calculation by Calculator (for exam):

Example scatter diagrams (figures, each plotting Y against X, the independent variable): r = +1, r = .52, r = 0 (zero correlation), and a panel with negative r.

Scale of the correlation coefficient (figure), running from -1.00 to 1.00 with marks at -0.50, 0 and 0.50: perfect negative correlation (-1.00); strong, moderate and weak negative correlation; no correlation (0); weak, moderate and strong positive correlation; perfect positive correlation (1.00).

Example 2: The owner of Maumee Motors wants to study the relationship between the age of a car and its selling price. Listed below is a random sample of 12 used cars sold at Maumee Motors during the last year.

Car   Age (yrs) X   Selling price ($000) Y
1     9             8.1
2     7             6.0
3     11            3.6
4     12            4.0
5     8             5.0
6     7             10.0
7     8             7.6
8     11            8.0
9     10            8.0
10    12            6.0
11    6             8.6
12    6             8.0

a. If we want to estimate selling price based on the age of the car, which variable is the dependent variable and which is the independent variable?
b. Draw a scatter diagram.
c. Determine the coefficient of correlation.
d. Determine the coefficient of determination.
e. Interpret these statistical measures. Does it surprise you that the relationship is inverse?

a. Determine the regression equation.
b. Estimate the selling price of a 10-year old car.
c. Interpret the regression equation.

Here $\bar{x} = 8.9167$ and $\bar{y} = 6.9083$.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{-26.29}{\sqrt{54.92 \times 42.59}} = -0.544, \qquad r^2 = 0.2959$$

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-26.29}{54.92} = -0.479$$

$$a = \bar{y} - b\bar{x} = 6.908 + 0.479 \times 8.917 = 11.18$$

Reg Eqn.: $\hat{y} = 11.18 - 0.479x$

Fitted values and residuals, $e = y - \hat{y}$:

Car   x     y      y-hat   e       e^2
1     9     8.1    6.87    1.23    1.52
2     7     6.0    7.83    -1.83   3.33
3     11    3.6    5.91    -2.31   5.34
4     12    4.0    5.43    -1.43   2.05
5     8     5.0    7.35    -2.35   5.51
6     7     10.0   7.83    2.17    4.73
7     8     7.6    7.35    0.25    0.06
8     11    8.0    5.91    2.09    4.36
9     10    8.0    6.39    1.61    2.59
10    12    6.0    5.43    0.57    0.32
11    6     8.6    8.30    0.30    0.09
12    6     8.0    8.30    -0.30   0.09
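The Maumee Motors figures can be reproduced with the short sketch below, which also evaluates the fitted line at an age of 10 years. The function name `least_squares_line` and the variable names are illustrative choices, not something taken from the handout, and the script works from the unrounded coefficients, so printed values may differ from the hand calculation in the last decimal place.

```python
import math

# Age (years) and selling price ($000) of the 12 cars in Example 2.
age = [9, 7, 11, 12, 8, 7, 8, 11, 10, 12, 6, 6]
price = [8.1, 6.0, 3.6, 4.0, 5.0, 10.0, 7.6, 8.0, 8.0, 6.0, 8.6, 8.0]


def least_squares_line(x, y):
    """Return (a, b, r) for the line y = a + b*x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = least_squares_line(age, price)

# Fitted values and residuals e = y - y_hat for every car.
fitted = [a + b * xi for xi in age]
residuals = [yi - fi for yi, fi in zip(price, fitted)]
sse = sum(e * e for e in residuals)  # sum of squared residuals

price_at_10 = a + b * 10  # estimated price ($000) of a 10-year-old car

print(f"a = {a:.2f}, b = {b:.3f}, r = {r:.3f}, r^2 = {r * r:.4f}")
print(f"estimated price at age 10: {price_at_10:.2f} ($000)")
for car, (fi, ei) in enumerate(zip(fitted, residuals), start=1):
    print(f"car {car:2d}: fitted = {fi:.2f}, residual = {ei:+.2f}")
```

The negative slope says that each additional year of age lowers the estimated selling price by roughly 0.48 thousand dollars, which is the inverse relationship the exercise asks about.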

Example 3. Associated with a job are two random variables: CPU time required (Y) and the number of disk I/O operations (X). Given the following data, compute the sample correlation coefficient.

Number (X)   398   390   410   502   590   305   210   252   398   392
Time (Y)     40    38    42    50    60    30    20    25    40    39

a. Draw a scatter diagram from these data. Does a linear fit seem reasonable? Assuming we wish to predict the CPU time requirement given an I/O request count, perform a linear regression: y = a + bx. Compute point estimates of a and b as well as 90 percent confidence intervals.
b. Next suppose we want to predict a value of I/O request count, given a CPU time requirement. Thus perform a linear regression of X on Y. Calculate 90 percent confidence intervals for c and d with the regression line: x = c + dy. In both cases compute the coefficients of determination.

Scatter diagram between Y and X (figure: Time (Y) plotted against Number (X)).

Scatter diagram of Time and Number (figure: Number (Y) plotted against Time (X)).

With $\bar{x} = 384.7$ and $\bar{y} = 38.4$:

$$\sum (x_i - \bar{x})(y_i - \bar{y}) = 11593.2, \qquad \sum (x_i - \bar{x})^2 = 111464.1, \qquad \sum (y_i - \bar{y})^2 = 1208.4$$

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{11593.2}{\sqrt{111464.1 \times 1208.4}} = 0.998919, \qquad r^2 = 0.99784$$

$$b_{yx} = \frac{11593.2}{111464.1} = 0.104008, \qquad a = \bar{y} - b\bar{x} = 38.4 - 0.104008 \times 384.7 = -1.612$$

Thus, the estimated regression line is

$$\hat{y} = -1.612 + 0.104008\,x$$
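The example asks for 90 percent confidence intervals, which the worked solution does not show. The sketch below fits both regressions (Y on X and X on Y) and attaches intervals of the usual form estimate ± t × standard error. It rests on two assumptions that are not in the handout: the standard normal-error regression model, and a critical value of 1.860 read from a t-table for 8 degrees of freedom. The function name `regress` and the dictionary keys are illustrative.

```python
import math

# Example data: X = number of disk I/O operations, Y = CPU time.
number = [398, 390, 410, 502, 590, 305, 210, 252, 398, 392]
cpu_time = [40, 38, 42, 50, 60, 30, 20, 25, 40, 39]

# Assumption: two-sided 90% critical value of Student's t with
# n - 2 = 8 degrees of freedom, taken from a standard t-table.
T_CRIT_90_DF8 = 1.860


def regress(x, y, t_crit):
    """Least-squares fit of y = intercept + slope*x with t-based intervals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)

    # Residual standard error on n - 2 degrees of freedom.
    sse = syy - slope * sxy
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r2": r * r,
        "slope_ci": (slope - t_crit * se_slope, slope + t_crit * se_slope),
        "intercept_ci": (intercept - t_crit * se_intercept,
                         intercept + t_crit * se_intercept),
    }


fit_yx = regress(number, cpu_time, T_CRIT_90_DF8)  # y = a + b x
fit_xy = regress(cpu_time, number, T_CRIT_90_DF8)  # x = c + d y

for label, fit in (("Y on X", fit_yx), ("X on Y", fit_xy)):
    print(label)
    print(f"  intercept = {fit['intercept']:.5f}, 90% CI = {fit['intercept_ci']}")
    print(f"  slope     = {fit['slope']:.6f}, 90% CI = {fit['slope_ci']}")
    print(f"  r^2       = {fit['r2']:.5f}")
```

Both fits share the same r², since the coefficient of determination is symmetric in X and Y; only the slope and intercept change when the roles of the variables are swapped.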

01 1.29643 + 9.02 2.97 2.92 1.94 1.95 1.doc Page 5 of 7 b.04 Failure rate Temp (F) SUMMARY OUTPUT Regression Statistics Multiple R 0. 2.02 2.95 1.97 2 2.doc .  x = c + dy  x == 16. Draw a scatter diagram.02 2 1.88 50 55 60 65 70 75 80 85 90 95 100 105 110 Temp (F) Failure rate 55 65 75 85 95 105 55 65 75 85 95 105 1.01 2.93 1.97 2.96 1.93037806 R Square 0. Temp (F) Line Fit Plot 2.94 1.02 2.01663902 Error Obstions 12 ANOVA df Regressio Residual SS MS 1 0.90 1.017831 10 0.01 2.85216366 Square Standard 0.00 2.97 2.01 1.84456945.94 1.145E-05 Page 5 of 7 /opt/scribd/conversion/tmp/scratch6285/84456945.002769 0.04 2. 4. 3.000277 F 64.06 2.4066 Significance F 1.9 1.93 1. Fit a least-squares linear line through the data in the following table.02 2.59384 y -------------------------------------------------------------------------------------------Example 4: The failure rate of certain electronic device is suspected to increase linearly with its temperature. Estimate a least squares line. Comment on the line.017831 0.86560333 Adjusted R 0.98 1.9 1. Table: The Failure Rate versus Temperature 55 65 75 85 95 105 55 65 75 85 95 105 1.04 1. Determine correlation coefficient and coefficient of determination and comment on them.

Use the regression equation to predict the annual food expenditure of a family with a disposable income of Tk25000.000281 8.001630477 Upper 95. Describe the apparent relationship between disposable income and annual food expenditure. and y denotes food expenditure. d. mother.0% 1.84456945.748165654 0. Determine and interpret the value of r. The following data gives the test scores and sales made by the salesmen during a certain period. ii) Does it indicate that the termination of services of low test scores is justified? iii) If the firm wants a minimum sales of Taka 55000. f.00288381 Example 5. h.85E-15 1. where x denotes disposable income. in thousands of dollars.00225714 0.023007 78.21204 2. For a preliminary study.025373 1.0206 Lower 95. b.85069149 0. what is the minimum test scores that will ensure continuation of service? Scatter diagram 54 52 50 48 46 44 42 40 38 36 34 32 30 12 14 16 18 20 Test Scores 22 24 26 28 /opt/scribd/conversion/tmp/scratch6285/84456945.doc Page 6 of 7 Total 11 Coefficients 0. x y 30 36 27 20 16 24 19 25 55 60 42 40 37 26 39 43 a. g. the economist takes a random sample of eight middle-income families of the same size father. Graph the regression equation and the data points. What does the slope of the regression line represent in terms of disposable income and annual food expenditure? e.85069149 0.15E-05 0.0% 1. Determine the regression equation for the data.0016305 0. two children).7481657 1. in hundreds of taka. Test Scores Sales (Thousand Tk) 15 32 20 37 25 49 22 38 27 51 23 46 16 33 21 41 20 39 i) Compute the –correlation coefficient between the test scores and the sales.doc Sales (Thousand Taka) Page 6 of 7 .79942857 0. c. Example: A department store gives in-service training to its salesmen which are followed by a test. 
Discuss the graphical implication implications of the value of r.00288381 Intercept Temp (F) Standard t Stat P-value Lower 95% Upper 95% Error 1.: An economist is interested in the relationship between the disposable income of a family and the amount of money spent annually on food. The results are as follows. It is considering whether it should terminate the services of any salesman who does not do well in the test. Identify the predictor and response variables.
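One possible way to approach parts i) and iii) of the department store example is sketched below. It assumes that sales are regressed on test scores and that the fitted line is then solved for the score giving sales of 55 (Taka 55000 on the table's thousand-Taka scale); the handout does not state which method it intends, and all variable names are illustrative.

```python
import math

# Test scores and sales (thousand Tk) of the nine salesmen.
scores = [15, 20, 25, 22, 27, 23, 16, 21, 20]
sales = [32, 37, 49, 38, 51, 46, 33, 41, 39]

n = len(scores)
x_bar = sum(scores) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in scores)
syy = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, sales))

# i) Correlation coefficient between test scores and sales.
r = sxy / math.sqrt(sxx * syy)

# Regression of sales on test score: sales = a + b * score.
b = sxy / sxx
a = y_bar - b * x_bar

# iii) Assumption: invert the fitted line to find the score at which
# the estimated sales reach the target of 55 thousand Tk.
target_sales = 55
min_score = (target_sales - a) / b

print(f"r = {r:.4f}")
print(f"sales = {a:.4f} + {b:.4f} * score")
print(f"score needed for estimated sales of {target_sales}: {min_score:.2f}")
```

A correlation this close to 1 indicates that test scores track sales closely, which bears on part ii); whether that justifies termination remains a judgment the numbers alone do not settle.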

Analysis by SPSS:

Open SPSS

Creating Variable Names
In the "SPSS Data Editor" window, click on the "Variable View" tab at the bottom of the screen. In this view variables and their definitions are listed as rows.
Click in the first cell under the "Name" column.
Type X in the cell.
Press the Down Arrow to move to the next cell under "Name".
Repeat the procedure for Y.

Entering the data
In the "SPSS Data Editor" window, click on the "Data View" tab at the bottom of the screen. In this view the variables are the columns and the cases (subjects) are the rows.
Click in the cell for the first variable (X) and the first case.
Type 2.
Press the Tab key to transfer the number into the cell and move to the next cell.
Continue to enter data for all the cases.

Saving the data file
Click File.
Click Save.
On the "Save Data As" window, type a:yourname
Click SAVE. SPSS will add a .sav extension.

Analysis
Click Analyze.
Click Correlate.
Click Bivariate.
Select the variables X and Y by highlighting them and clicking the Right Arrow.
Click OK.

Plot:
