You are on page 1of 15

Regression analysis in Excel

The regression analysis is a part of statistical modeling that is used to estimate the relationship
between the two or more variables. In MS Excel, you can perform several statistical analyses,
including regression analysis. This one is a good option because almost every computer user can
access Excel.

What is regression analysis?

Regression analysis is an analysis that shows a relationship between dependent and


independent variables and produces an equation. This equation consists of a coefficient that
represents the relationship of dependent and independent variables.

Simple Linear Regression

In simple linear regression, the value of one variable is used to describe the value of another
variable. The variable that is being described is called dependent variable, while the variable,
which is used to describe or predict the value of dependent variable is called independent
variable.

Note: Independent and dependent variable are the two most essential terms of regression analysis.
Types of variables in regression

Regression analysis have two variables:

1. Dependent variable (predictor variable)


2. Independent variable (explanatory variable)

Dependent variable is the factor that we try to understand and predict. The value of dependent
varies according the independent variable. In contrast, independent variables are those factors
that affect the dependent variable and helps to predict the value for dependent variables.

Real Scenario

Let's take a scenario to understand the variables of regression analysis -

For example, we have sells data of 12 months stored in an Excel worksheet. This data is for the
sell of umbrellas from January to December. Each month sell is different according to rainfall.
Umbrellas are most sold in month July and less sold in January.

This Excel worksheet will contain three columns: Months (Jan to Dec), Rainfall percentage,
and sold umbrella quantity (total number of umbrellas sold in each month).
Here,

Dependent variable: Umbrella

Independent variable: Rainfall percent

So, the umbrella is a dependent variable whose sell depends on the rainfall percentage of each
month, which is an independent variable. Sell of umbrellas increase and decreases when rainfall
is high or low. Hope you have understood the dependent and independent variable in regression
analysis.

Apply regression analysis

Now, you will see how regression analysis is performed on the Excel data step by step. We have
this data here.
Step 1: Inside the Data tab, click on the Data Analysis option added it Excel in earlier steps.

Step 2: Scroll down and Select the Regression from the list and click OK in this panel.
Step 3: Now, configure the following settings in the regression dialogue box.

 In Input Y range, provide the cell reference of dependent variables. In our dataset, umbrella is
the dependent variable that resides in column C. So, the cell reference will be C2:C13.
 In Input X range, provide the cell reference of independent variables. For example, in our
dataset, rainfall is the independent variable resides in column B. Hence, the cell reference will
be B2:B13.
 Mark the Labels checkbox if you have included header cell reference in X, Y ranges.
 Carefully choose an output option from here. We have chosen New workbook Ply.
 In the end, mark the residuals checkbox that will provide you the difference between actual and
predicted values.

Step 4: Enter all these required details carefully and click OK.
It will generate a summary in Sheet2 for the analysis after setting up the following things.

Step 5: See the output created by Excel regression analysis and observe it.
This summary output will contain REGRESSION STATISTICS, ANOVA, and RESIDUAL
OUTPUT, most importantly. All these details in the same page.

Interpret the regression analysis result

We have performed regression analysis and you have noticed that the execution of regression is
very easy. You have to do nothing difficult with it because all the calculations take place
automatically. The complete output is auto-generated along with the statement as well.

Calculation is easy, but the interpretation is not so easy to understand. So, this time is to interpret
its result. You have seen that the output is containing four major parts: regression statistics,
ANOVA, and residual output. Let's analyze them:

Regression Statistics

Regression statistics tell you how the linear regression equations fit in our data.

Let's understand the terms used in the regression statistics table.

 Multiple R is the correlation coefficient that helps to measure the strength of a linear
relationship between two variables. The higher value of Multiple R means the stronger
relationship between variables.
1: Strong positive relationship
-1: Strong negative relationship
0: No relationship at all.
 R Square is the coefficient of determination, which currently has its value 0.9047. It represents
the goodness of fit. Round off the first to digits which will be 90% that is fair enough to fit in our
regression model. It means 90% of dependent variables are explained by independent variables.
Generally, high values are better for R Square.
 Adjusted R Square is advanced of R square, which is adjusted for the number of independent
variables. It is used for multiple analyses.
 Standard Error is also a goodness-of-fit measure. The regression equation will be more certain
for the smaller number.
 Observation is the total number of observations in your model.
ANOVA

The next part of regression analysis is ANOVA, an analysis of variance. Then coefficient section
with ANOVA.

 Df refers to the degree of freedom. It associates with the source of variance.


 SS refers to the sum of squares.
 MS is mean square.
 F is f statistics that test the overall significance of the model.
 Last one is Significance F that is the P-value of F.

The most essential component after the ANOVA table is coefficients. This allows the users to
create the linear regression equation in Excel, i.e.,

1. y = bx +a

For our dataset months, rainfall, and sold umbrella, the formula will be:
1. y = Rainfall coefficient * x + Intercept

Put the values from the table in this formula:

1. y = 0.327*x-15.417

Put the value of x = Rainfall (mm) for any month. Like we have put for January rainfall, i.e., 76.
So,

y= 0.327*76-15.417

y = 9.435

It is the predicted number for the number of umbrellas sold in January. Similarly, you can predict
how many umbrellas going to be sold for any month by putting their rainfall percentage.

RESIDUAL OUTPUT

The last part is residual, which shows the difference between actual and estimated values. If you
compare the results of both values for the total number of umbrellas sold each month, you will
see that there will be a slight difference between both numbers.
If you will compare the actual umbrellas sold in January month and predicted value for it. You
will get a slight difference in them.

Actual sold umbrella in January: 12

Predict value of sold umbrella in January: 9.486

The difference between the actual value and predicted value can be seen in residuals in their
respective column.

12 - 9.486 = 2.514

You can match it under the RESIDUAL OUTPUT table.

Make a linear regression graph

You can also make a graph and plot the values on it to see the relationship between two
variables. Thus, draw a linear regression chart.

Step 1: Open your Sheet1 in the same Excel workbook and select the columns of independent
variable and dependent variable along with the header.
Step 2: Navigate the Insert tab, where you see the chart group. Click on it, then choose Scatter
(first one in the list). For easy to go, follow Insert > Chart group > Scatter.
Step 3: A scatter plot chart will be inserted in your currently active workbook, which will look
something like this -
Step 4: Now, draw a least square regression line in this plotted chart. To do this, right-click on
any of the points in this chart and choose Add Trendline? from the context menu.

Step 5: Select the Linear trendline shape from the right side of the panel Format Trendline.
Step 6: Scroll down the format trendline panel and optionally mark the Display equation on
chart to get the formula for regression. However, this one is optional.
You can now see that the regression equation has been created.

Step 7: Now, move to the Fill & Line option to customize the line you like. You can change the
color and type of line from here. E.g., use a solid line instead of the dashed line.

 Firstly, Select the Solid line radio button, then scroll down.
 Change the line color to red or anything you want.
 Choose the solid line from the Dash type list.

See the customize linear regression graph.


You can make some more improvements to the graph, like provide the axis title (horizontal and
vertical) to the graph.

You might also like