© John Wiley and Sons, Inc.
PowerPoint Slides Prepared By: Alan Olinsky Bryant University
In some circumstances, data can be valuable in helping to determine the parameters in a relationship or its structural form. The process of using data to formulate relationships is known as regression analysis. In this approach, we identify one variable as the response variable, which means that it can be predicted from the values of other variables. Those other variables are called explanatory variables.
Types of Regression Models
Regression models that involve one explanatory variable are called simple regressions. When two or more explanatory variables are involved, the relationships are called multiple regressions. Regression models are also divided into linear and nonlinear models, depending on whether the relationship between the response and explanatory variables is linear or nonlinear.
Estimating Relationships
Scatter plot: visualize association
Correlation:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
n: number of pairs of observations for x, y
$\bar{x}$, $\bar{y}$: means of x, y
$s_x$, $s_y$: standard deviations of x, y
r: measures strength of linear relationship between x and y
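The correlation formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the sample data are hypothetical, and the sample standard deviations use the n - 1 divisor, matching the 1/(n - 1) factor in the formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r = 1/(n-1) * sum of products of standardized x and y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: y is an exact increasing linear function of x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(pearson_r(xs, ys))
```

For the same kind of calculation in a spreadsheet, Excel's CORREL function takes the two ranges as its arguments.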
r-statistic
Independent of units of measurement
Lies in range [-1, 1]
r > 0: positive association
r < 0: negative association
r close to 1 (or -1) implies a strong association
r close to 0 implies a weak association
Excel function: CORREL(xrange, yrange)
Simple Linear Regression
y = a + bx + e
y: dependent variable
x: independent variable
e: an "error" term
Constants a and b represent the intercept and slope, respectively, of the regression line.
Error Term in Regression
Unexplained "noise" in the relationship
May represent limitations of knowledge
Or may represent random deviations of the dependent variable from its mean
Regression Goal
Want to find the line that most closely matches the observed relationship between x and y
Define "most closely" as minimizing the sum of squared differences between observed and model values
Minimizing the sum of (unsquared) differences would set y equal to its mean
Squaring penalizes large differences more than small differences
Performing Regression
Residuals: $e_i = y_i - \hat{y}_i = y_i - (a + b x_i)$
Sum of squared differences between observations and model:
$$SS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
The regression problem: choose a and b to minimize SS
Regression Analysis
Assumes residuals are normally distributed with mean 0
Regression parameters can be calculated directly from the data:
$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$a = \bar{y} - b\bar{x}$$
Simpler to use Excel's regression tool (under the Data Analysis menu)
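As a rough sketch of calculating the parameters directly from the data, the Python function below computes the slope b from the sums of x, y, xy and x squared, and then the intercept a as the mean of y minus b times the mean of x. The name `fit_line` and the data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the model y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)
```

Because the sample points lie exactly on y = 1 + 2x, the call recovers a = 1 and b = 2; with noisy data the same function returns the line that minimizes the sum of squared residuals.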
Goodness of Fit
Coefficient of determination: $R^2$
Lies in range [0, 1]
Closer to one: better fit
Measures how much of the variation in y-values is explained by the model
1: perfect match to model
0: equation explains none of the observed variation
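The slides do not state a formula for the coefficient of determination. A minimal sketch, assuming the usual definition (one minus the ratio of the residual sum of squares to the total sum of squares around the mean of y), might look like this in Python. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot for the fitted line y = a + b*x."""
    y_bar = sum(ys) / len(ys)
    # Variation left unexplained by the line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are identical")
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(r_squared(xs, ys, a=1, b=2))  # line passes through every point
print(r_squared(xs, ys, a=7, b=0))  # flat line at the mean of y
```

The two calls illustrate the endpoints of the range: a line through every point explains all of the variation, while a flat line at the mean of y explains none of it.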
Regression Window
Regression Output
Callouts on the output: R Squared; Degree of significance (under 0.1 is significant); Estimate for a; Estimate for b
P values of under 0.1 are statistically significant
Regression Statistics
Four measures are used to judge the statistical qualities of a regression:
$R^2$: Measures the percent of variation in the response variable accounted for by the regression model.
F-statistic (Significance F): Measures the probability of observing the given $R^2$ (or higher) when all the true regression coefficients are zero.
p-value: Measures the probability of observing the given estimate of the regression coefficient (or a larger value, positive or negative) when the true coefficient is zero.
Confidence interval: Gives a range within which the true regression coefficient lies with given probability.
Simple Nonlinear Regression
A straight line may not be the most plausible description of dependency, e.g., $y = ax^b$
Can follow previous ideas to minimize sum of squared differences
No Excel functions or simple formulas
Or can transform the nonlinear relationship into a linear one, e.g., $\log y = \log a + b \log x$
Give up some intuition for convenience
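The log transformation can be sketched as follows: take logs of both variables, fit a straight line to the transformed data, and convert the intercept back with the exponential. The name `fit_power` and the data are hypothetical, and the sketch assumes all x and y values are strictly positive. It minimizes squared differences in log space, which is not the same as minimizing them in the original units.

```python
from math import exp, log

def fit_power(xs, ys):
    """Fit y = a * x**b by linear least squares on (log x, log y)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("log transform requires strictly positive data")
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    sum_x = sum(lx)
    sum_y = sum(ly)
    sum_xy = sum(u * v for u, v in zip(lx, ly))
    sum_xx = sum(u * u for u in lx)
    # Slope of the log-log line is the exponent b.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept of the log-log line is log a, so undo the log.
    a = exp(sum_y / n - b * sum_x / n)
    return a, b

# Hypothetical data generated exactly from y = 3 * x**0.5.
xs = [1, 2, 4, 8]
ys = [3 * x ** 0.5 for x in xs]
print(fit_power(xs, ys))
```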
Multiple Linear Regression
Multiple independent variables: $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$
Work with n observations; each has:
One observation of the dependent variable
One observation each of the m independent variables
Seek to minimize the sum of squared differences
Put all independent variables into the x-range in Excel's regression tool
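One way to minimize the sum of squared differences with several independent variables is to solve the normal equations. The slides do not describe this method (they rely on Excel's regression tool), so the Python sketch below is an illustration only; the function names and data are hypothetical, and it assumes the independent variables are not perfectly collinear.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear variables?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_multiple(rows, ys):
    """Return [a0, a1, ..., am] minimizing the sum of squared residuals."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept a0
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print(fit_multiple(rows, ys))
```

Each tuple in `rows` holds one observation of the m independent variables, mirroring the way the x-range in a spreadsheet holds one column per variable.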
Regression Output
Callouts on the output: Square root of R square; Coefficient of multiple determination; Accounts for presence of multiple variables
P values of under 0.1: coefficients of regression equation are statistically significant
Values to Include in Regression
Ideally pick values that can be justified based on practical or theoretical grounds
Could choose the set that generates the largest value of adjusted $R^2$
Also could choose based on those with significant p-values for coefficients
Remember that good models require good forecasts for the independent variables.
Regression Assumptions
Errors in the regression model:
Follow a Normal distribution
Are mutually independent
Have the same variance
Linearity is assumed to hold
*Using the Excel Tools Trendline and LINEST
Excel provides several alternative methods for performing regression analysis. Trendline is a charting option that allows the user to fit one of six families of curves to a set of data and to add the resulting regression line to the plot. LINEST is an array function that can be used to compute regression statistics and use them directly as parameters in a model.
*Trendline
The Trendline option appears in Excel only when a chart has been selected. Trendline offers the option to fit any one of the following six families of curves:
Linear: $y = a + bx$
Logarithmic: $y = a + b \ln(x)$
Polynomial: $y = a + bx + cx^2 + dx^3 + \dots$ (the user selects the Order of the polynomial, which is the largest exponent of x)
Power: $y = ax^b$
Exponential: $y = ae^{bx}$
Moving average: y = average of previous n y-values (the user selects the Period for the moving average, which is n, the number of previous values used to calculate the result)
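Unlike the other five families, the moving average is not a fitted equation, so a short sketch may help. The Python function below assumes that "previous n y-values" means the n most recent observations ending at the current point; that reading, the name `moving_average`, and the data are assumptions made for illustration only.

```python
def moving_average(ys, period):
    """Trailing moving average: each result averages the latest `period` values."""
    if period < 1 or period > len(ys):
        raise ValueError("period must be between 1 and the number of points")
    # The first result is available once `period` observations exist.
    return [sum(ys[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(ys))]

# Hypothetical series smoothed with a period of 2.
print(moving_average([1, 2, 3, 4, 5], 2))
```

A longer period smooths the series more heavily but produces fewer points, since the first period - 1 observations have no complete window.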
Trendline Window
Linear Trendline
[Chart: linear trendline for BPI data; Demand versus Year (1 to 10); fitted line $y = 61873x + 443600$, $R^2 = 0.976$]
Power Trendline
[Chart: power trendline for BPI data; Demand versus Year (1 to 10); fitted curve $y = 450609x^{0.3475}$, $R^2 = 0.9656$]
*LINEST
LINEST is an Excel function that calculates regression parameters and measures of goodness-of-fit for simple or multiple regressions. It is one of a set of functions including SLOPE, INTERCEPT, and TREND, which can be used as alternatives to the Data Analysis add-in. LINEST is an array function, which means that it physically occupies more than one cell in the spreadsheet. Like all Excel functions, it is linked to the underlying data, so if the data change the regression parameters calculated by LINEST change automatically. This is not true of regression results calculated using the Analysis Toolpak: if the underlying data change, the user must be careful to re-run the Regression procedure.
Forecasting Model Using LINEST
Function Wizard for LINEST
Summary
Modeling is the central task for the analyst, and data collection and statistical analysis support the modeling task where appropriate. When sensitivity testing indicates that certain parameters or relationships must be determined precisely, we often collect data and perform statistical analysis to refine the parameters and relations in our models. Regression analysis is a means for using data to help formulate relationships among variables. All regression methods are based on the idea of fitting a family of curves to data by choosing parameters that minimize the sum of squared residuals.
Summary
The simplest regression model is a linear relationship with one explanatory variable, although regression can also be applied in cases where there are multiple explanatory variables and nonlinear relationships. The most complete method in Excel is the Regression option within the Analysis Toolpak add-in. Other useful methods include Trendline, which can be used to fit any one of six families of curves to plotted data, and LINEST, which can be used to calculate regression estimates dynamically.
Copyright 2008 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information herein.