You are on page 1of 2

Correlation and Linear Regression Using Excel

Correlations
Correlation is a measure of the strength of linear association between two variables. It ranges from
-1 to +1 with values close to -1 or +1 indicating a strong linear relationship and values close to 0
indicating a weak relationship.
1. Create the spreadsheet shown to display the percentages
obtained by 5 students in Physics and Chemistry.

2.
2. Select Tools/Data
Analysis/Correlation and
click OK. (The Data
Analysis Toolpak add-in
must be installed.)
3. In the Input Range box
enter B1:C6 and select the
Labels in first row option.
4. Choose the Output Range
option and in the Output
Range box enter A8 and
click OK.

The output will appear as shown. This gives the correlation


between Physics and Chemistry as being 0.826902 and the
correlation between Physics and itself as being 1 and
Chemistry and itself as being 1. (The values will be 1 as any
variable is in perfect linear relationship with itself.)
This sort of table output is useful as we can deal with correlations between more than two variables.
5. Enter a third column of results with percentages for Biology and create a new table of
correlations.
The table will give all the correlations between variables taken two at a time. These are called
pairwise correlations and the table is called the correlation matrix.

Linear Regression
We now want to find the linear regression of Chemistry on Physics. Linear regression is the process
of fitting the best possible straight line through a series of points.
1. Select Tools/Data
Analysis/Regression and
click OK.
2. Select the Chemistry column
to be the Input Y Range and
the Physics column to be the
Input X Range.
3. Make sure that the Labels
box in the top part of the
dialog box and the Line Fit
Plots box in the lower part of
the dialog box are both
ticked, but that none of the
other boxes are ticked.
4. Choose the Output Range
option and enter F1 in the
Output Range box. Click
OK.
A large amount of summary output will be produced including a chart. Multiple R is the square
root of R square, R square is the square of the correlation coefficient and Standard Error is the
estimate of the standard deviation of the errors e in the model. The number of observations is the
original number of data points.
5. Scroll to the right to see the chart (which was produced because you selected Line Fit Plots).
Select one of the Predicted Chemistry points by clicking on it.
6. To change the individual points to a line: Right click and select Format Data Series. Make
sure that the Patterns tab is selected and change the Line option to Automatic and the
Marker option to None. Click OK.