
Introduction to modeling: linear regression

Context and Objectives


I am looking to buy an apartment. I found an apartment that suits me perfectly and I’d like to
negotiate a fair price. The surface of the apartment is 72 m². What is the average price for an
apartment of this area? To what extent does the surface explain the price differences between the
apartments?
To answer these questions, we will model the price by the surface using a sample of apartments in
the neighbourhood.

Open the workbook “Apartments”. Graph the association between price and surface.

Is the Pearson correlation coefficient relevant for measuring the strength of the relationship?
Why?

What is the value of the r coefficient?

r=

Setting the coefficients of the line


Since the relationship is linear, we can fit a straight line to the scatter plot.
Let us call a the slope of the line and b the y-intercept. The equation of the line is
y = ax + b
where y is the price and x is the surface.

We will look for the pair of coefficients (a and b) that minimizes the differences between the
observed values of y and those calculated by the model. By doing so, we will determine the “linear
regression model”.

Several criteria are possible. We could minimize the maximum difference (in absolute value)
between the observed y and the calculated y, or the mean of the differences (in absolute value).
The conventional criterion is the mean of the squared differences (the least squares criterion).
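
Outside the spreadsheet, the least squares criterion can be sketched in a few lines of Python. This is only an illustration. The function names and the sample values are hypothetical and do not come from the “Apartments” workbook. The formulas used are the standard closed-form solution for a line y = ax + b.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the mean squared difference between y and a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # y-intercept
    return a, b


def mean_squared_difference(xs, ys, a, b):
    """Criterion to minimize: mean of the squared differences."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample (NOT the workbook data): surfaces in m², prices in arbitrary units
surfaces = [30, 45, 60, 75, 90]
prices = [210, 290, 400, 480, 590]

a, b = fit_line(surfaces, prices)
best = mean_squared_difference(surfaces, prices, a, b)
# Any other line, for example one with a slightly different slope, does worse
worse = mean_squared_difference(surfaces, prices, a + 0.5, b)
print(a, b, best, worse)
```

On the workbook data, the coefficients obtained this way should match those displayed by the Trendline feature.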

The solution is given by the Trendline feature:


Click the chart. In Add Chart Element, select Trendline, then select More Trendline Options. Select
Linear, scroll down and select Display Equation on Chart (at the bottom).
What are the coefficients a and b of the regression line?

a= b=

According to our model, what is the price for a 72 m² apartment?

Coefficient of determination
Definition
To what extent does the surface explain price differences between the apartments?
Denoted R² (R-squared), the coefficient of determination is the share of the differences between the
values of y, measured by the variance of y, that is accounted for by the model. R² = 100%
when the points of the scatter plot are perfectly aligned.
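
As a minimal sketch of this definition, R² can be computed as 1 minus the ratio of the unexplained squared differences to the total squared differences around the mean of y. For a least squares line with an intercept, this equals the explained share of the variance. The function name and the values below are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of the variance of the observed values accounted for by the predictions."""
    mean_y = sum(observed) / len(observed)
    # Total squared differences around the mean (proportional to the variance of y)
    total = sum((y - mean_y) ** 2 for y in observed)
    # Squared differences left unexplained by the model
    residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - residual / total


# Hypothetical values: points perfectly aligned with the model give R² = 100%
print(r_squared([3, 5, 7], [3, 5, 7]))  # 1.0
# A model that always predicts the mean explains nothing
print(r_squared([3, 5, 7], [5, 5, 5]))  # 0.0
```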

Right-click the line in the graph, select Format Trendline, then select Display R-squared value
on chart.
R² = 0.71. Thus, 71% of the price differences are explained by the surface via the regression model
y = 6.32x + 12.3. The remaining 29% comes from other factors such as charges, condition of the
apartment, location, etc.
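
Applying the regression model is a direct substitution of x into the equation. The short sketch below uses the coefficients quoted above (a = 6.32, b = 12.3). The function name is hypothetical, and the result is expressed in whatever price unit the workbook uses.

```python
def predict_price(surface, a=6.32, b=12.3):
    """Price given by the regression line y = a*x + b for a surface x in m²."""
    return a * surface + b


# Model price for the 72 m² apartment
print(round(predict_price(72), 2))  # 467.34
```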

The coefficient of determination is not the explained part of the variable y. It is the
explained part of the variance of y.

Link between coefficient of determination and Pearson correlation coefficient


Property
In the case of the model y = ax + b, the coefficient of determination is the square of the Pearson
correlation coefficient.
R² = r²
The value of the Pearson correlation coefficient for the association between Price and Surface is
equal to 0.843 (PEARSON function).
r² = 0.843² = 0.71
We obtain the value of the coefficient of determination: R² = 71%
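
The property can also be checked numerically. The sketch below computes r and the R² of the least squares line independently, then compares r² with R². The function name and the sample are hypothetical and are not the workbook data.

```python
from math import sqrt


def pearson_and_r_squared(xs, ys):
    """Return (r, R²): Pearson coefficient and R² of the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    a = sxy / sxx               # least squares slope
    b = mean_y - a * mean_x     # least squares intercept
    residual = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return r, 1 - residual / syy


# Hypothetical sample: surfaces in m², prices in arbitrary units
r, r2 = pearson_and_r_squared([30, 45, 60, 75, 90], [250, 270, 420, 450, 610])
print(r ** 2, r2)  # the two values agree up to rounding error
```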

Unlike the R² coefficient, the r coefficient cannot be expressed as a percentage because it is
not a proportion (part of the variance of y). Besides, its value can be negative.

Application: a new way to interpret a Pearson correlation coefficient


In the wine perception study, a strong association was shown between purchase intention and
quality assessment, with a Pearson correlation coefficient equal to 0.69. Here, modeling one variable
according to the other is irrelevant. However, the coefficient of determination gives a concrete
meaning to the Pearson correlation coefficient.
r = 0.69 gives R² = 0.69² = 0.48 = 48%
Nearly half of the variance of the purchase intention is explained by the quality assessment.
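
The same conversion can be written as a small helper, shown here as a sketch with a hypothetical function name. It also illustrates that the sign of r does not affect the share of variance explained.

```python
def variance_explained(r):
    """Share of variance explained, R² = r², for a Pearson coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient lies between -1 and 1")
    return r ** 2


share = variance_explained(0.69)
print(f"{share:.0%}")  # 48%
# A negative correlation of the same strength explains the same share of variance
print(variance_explained(-0.69) == share)  # True
```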

Complete the guidelines for interpreting a linear correlation coefficient in social sciences (Cohen,
1992):

Absolute value of the r coefficient | Strength of the linear relationship | R²: part of the variance of a variable that is explained by the other variable
? | Strong | ?
? | Medium | ?
? | Weak | ?
