
BUSINESS ANALYTICS MODULE 4

Single Variable Linear Regression

→ We use regression analysis for two primary purposes:


• Studying the magnitude and structure of a relationship between two variables.
• Forecasting a variable based on its relationship with another variable.

→ The structure of the single variable linear regression line is ŷ = a + bx.

• ŷ is the expected value of y, the dependent variable, for a given value of x.
• x is the independent variable, the variable we are using to help us predict or better understand the dependent
variable.
• a is the y-intercept, the point at which the regression line intersects the vertical axis. This is the value of ŷ
when the independent variable, x, is set equal to 0.
• b is the slope, the average change in the dependent variable y as the independent variable x increases by
one.
• The true relationship between two variables is described by the equation y = α + βx + ε, where ε is the error term,
y – ŷ. The idealized equation that describes the true regression line is y = α + βx.
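• For example, a and b can be computed directly with Excel's SLOPE and INTERCEPT functions; the cell references below are hypothetical, assuming the y-values are in B2:B51 and the x-values are in A2:A51.
→ b (slope): =SLOPE(B2:B51, A2:A51)
→ a (y-intercept): =INTERCEPT(B2:B51, A2:A51)
→ ŷ for an x-value stored in D2: =INTERCEPT(B2:B51, A2:A51) + SLOPE(B2:B51, A2:A51)*D2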

→ We determine a point forecast by entering the desired value of x into the regression equation.
• We must be extremely cautious about using regression to forecast for values outside of the historically
observed range of the independent variable (x-values).
• Instead of predicting a single point, we can construct a prediction interval, an interval around the point
forecast that is likely to contain, for example, the actual selling price of a house of a given size.
→ The width of a prediction interval varies based on the standard deviation of the regression (the standard
error of the regression), the desired level of confidence, and the location of the x-value of interest in
relation to the historical values of the independent variable.
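• As an illustrative sketch with hypothetical ranges (x-values such as house sizes in A2:A51, y-values such as selling prices in B2:B51, and the x-value of interest in D2), a point forecast and an approximate 95% prediction interval could be computed as:
→ Point forecast: =FORECAST.LINEAR(D2, B2:B51, A2:A51) (use =FORECAST in older versions of Excel)
→ Half-width of the interval: =T.INV.2T(0.05, COUNT(A2:A51)-2) * STEYX(B2:B51, A2:A51) * SQRT(1 + 1/COUNT(A2:A51) + (D2-AVERAGE(A2:A51))^2/DEVSQ(A2:A51))
→ The prediction interval is the point forecast plus or minus this half-width; note that the width grows as D2 moves farther from the mean of the historical x-values.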

→ It is important to evaluate several metrics in order to determine whether a single variable linear regression model is
a good fit for a data set, rather than looking at any single metric in isolation.

→ R² measures the percent of total variation in the dependent variable, y, that is explained by the regression line.
• R² = (variation explained by the regression line) / (total variation)
• 0 ≤ R² ≤ 1
• For a single variable linear regression, R² is equal to the square of the correlation coefficient.
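• For example, with hypothetical y-values in B2:B51 and x-values in A2:A51, =RSQ(B2:B51, A2:A51) returns R², which for a single variable regression equals =CORREL(B2:B51, A2:A51)^2.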

→ In addition to analyzing R², we must test whether the relationship between the dependent and independent variables
is significant and whether the linear model is a good fit for the data. We do this by analyzing the p-value (or
confidence interval) associated with the independent variable and the regression’s residual plot.
• The p-value of the independent variable is the result of a hypothesis test of whether there is a
significant linear relationship; that is, it tests whether the slope of the regression line is zero, H0: β=0 and Ha:
β≠0.


→ If the coefficient’s p-value is less than 0.05, we reject the null hypothesis and conclude that we have
sufficient evidence to be 95% confident that there is a significant linear relationship between the
dependent and independent variables.
→ Note that the p-value and R² provide different information. A linear relationship can be significant (have a
low p-value) but not explain a large percentage of the variation (not have a high R²).
• A confidence interval associated with an independent variable’s coefficient indicates the likely range for that
coefficient.
→ If the 95% confidence interval does not contain zero, we can be 95% confident that there is a significant
linear relationship between the variables.
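• As an illustrative sketch (hypothetical ranges: y-values in B2:B51, x-values in A2:A51), the slope, its standard error, and the corresponding p-value can also be obtained from the LINEST array function, whose first column holds the slope (row 1) and its standard error (row 2).
→ t-statistic (placed in a cell, say E2): =INDEX(LINEST(B2:B51, A2:A51, TRUE, TRUE), 1, 1) / INDEX(LINEST(B2:B51, A2:A51, TRUE, TRUE), 2, 1)
→ p-value: =T.DIST.2T(ABS(E2), COUNT(A2:A51)-2)
→ If this p-value is less than 0.05, we reject H0: β=0; it matches the p-value shown in the Data Analysis regression output.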

→ Residual plots can provide important insights into whether a linear model is a good fit.
• Each observation in a data set has a residual equal to the historically observed value minus the regression’s
predicted value, that is, ε = y – ŷ.
• Linear regression models assume that the regression’s residuals follow a normal distribution with a mean of
zero and fixed variance.
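• For example, with a hypothetical observation in row 2 (x-value in A2, y-value in B2) and the full data in A2:A51 and B2:B51, that observation's residual is =B2 - FORECAST.LINEAR(A2, $B$2:$B$51, $A$2:$A$51); plotting these residuals against the x-values (or the predicted values) produces the residual plot used to check these assumptions.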

→ We can also perform regression analyses using qualitative, or categorical, variables. To do so, we must convert
data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis.
• A dummy variable is equal to 1 when the variable of interest fits a certain criterion. For example, a dummy
variable for “Female” would equal 1 for all female observations and 0 for male observations.
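→ For example, if a hypothetical column C holds the labels “Female” and “Male”, entering =IF(C2="Female", 1, 0) in the first data row and copying it down creates the corresponding dummy variable column.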


EXCEL SUMMARY

Recall the Excel functions and analyses covered in this course and make sure to familiarize yourself with all of the
necessary steps, syntax, and arguments. We have provided some additional information for the more complex
functions listed below. As usual, the arguments shown in square brackets are optional.

→ Adding the best fit line to a scatter plot using the Insert menu

→ Forecasting with regression models in Excel


• =SUMPRODUCT(array1, [array2], [array3],…) is a convenient function for calculating point forecasts.
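→ For example, if the regression output's intercept and slope are in hypothetical cells B17:B18, and E2:E3 contains 1 (to multiply the intercept) and the x-value of interest (to multiply the slope), then =SUMPRODUCT(B17:B18, E2:E3) returns the point forecast a + bx.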

→ Creating a regression output table using the Data Analysis tool

→ Creating regression models with dummy variables


• =IF(logical_test,[value_if_true],[value_if_false])
→ Returns value_if_true if the specified condition is met, and returns value_if_false if the condition is not
met.
• To perform a regression analysis with an independent dummy variable, follow the same steps as when using
quantitative variables.
