Introduction To Modeling
regression
Open the workbook “Apartments”. Graph the association between price and surface.
Is the Pearson correlation coefficient relevant for measuring the strength of the relationship?
Why?
r=
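Outside the spreadsheet, r can also be computed directly from its definition. The following is a minimal sketch in Python; the surface and price values are made up for illustration and are not taken from the “Apartments” workbook, and the function name pearson_r is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, NOT the contents of the workbook.
surface = [20, 35, 50, 65, 80]
price = [150, 220, 340, 410, 530]

r = pearson_r(surface, price)
print(round(r, 3))
```

Note that r only measures the strength of a linear association, so the scatter plot should be inspected before relying on it.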
We will look for the set of coefficients (a and b) that minimizes the differences between the
observed values of y and those calculated by the model. By doing so, we will determine the “linear
regression model”.
We could minimize the maximum absolute difference between the observed y and the calculated y,
or the mean of the absolute differences. The conventional criterion, however, is the mean of the
squared differences (the least squares criterion).
a= b=
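The least squares criterion has a closed-form solution, which can be sketched in Python as follows. This sketch assumes that a is the slope and b the intercept of the line y = ax + b, and it uses made-up data rather than the workbook values; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the mean of squared differences y - (a*x + b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope (assumed to be "a" in the worksheet)
    b = mean_y - a * mean_x    # intercept (assumed to be "b")
    return a, b

def mean_squared_difference(xs, ys, a, b):
    """Mean of the squared differences between observed and calculated y."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, NOT the contents of the workbook.
surface = [20, 35, 50, 65, 80]
price = [150, 220, 340, 410, 530]

a, b = fit_line(surface, price)
best = mean_squared_difference(surface, price, a, b)
print(round(a, 2), round(b, 2), round(best, 2))
```

Moving a or b away from the fitted values can only increase the mean squared difference, which is what the criterion requires.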
Coefficient of determination
Definition
To what extent does the surface explain price differences between the apartments?
Denoted R² (R-squared), the coefficient of determination is the share of the differences between the
values of y, measured by the variance of y, that is accounted for by the model. R² = 100%
when the points of the scatter plot are perfectly aligned.
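The definition can be checked numerically: R² is one minus the ratio of the squared differences left over by the model to the total squared deviations of y from its mean. A minimal sketch in Python, assuming a least squares line y = ax + b and made-up data (not the workbook values); the function name r_squared is hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Squared differences not accounted for by the model.
    residual = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total squared deviations of y from its mean (n times the variance of y).
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - residual / total

# Hypothetical data, NOT the contents of the workbook.
surface = [20, 35, 50, 65, 80]
price = [150, 220, 340, 410, 530]
print(round(r_squared(surface, price), 3))

# Perfectly aligned points: the model accounts for all of the variance.
print(r_squared([1, 2, 3], [3, 5, 7]))
```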
Right-click the line in the graph, select Format Trendline, then select Display R-squared value
on chart.
R² = 0.71. Thus, 71% of the price differences are explained by the surface via the regression model
y = 6.32x + 12.3. The remaining 29% come from other factors such as charges, condition of the
apartment, location, etc.
The coefficient of determination is not the explained part of the variable y. It is the
explained part of the variance of y.
Unlike the R² coefficient, the r coefficient cannot be expressed as a percentage because it is
not a proportion (part of the variance of y). Besides, its value can be negative.
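For a regression with a single explanatory variable, R² is equal to the square of r, which is why R² is always between 0 and 1 while r carries a sign. A minimal sketch in Python illustrating this on made-up, decreasing data (the values and variable names are assumptions chosen for illustration):

```python
import math

# Hypothetical data with a decreasing trend.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson correlation coefficient: negative here because y decreases with x.
r = sxy / math.sqrt(sxx * syy)

# Least squares line and its coefficient of determination.
a = sxy / sxx
b = mean_y - a * mean_x
residual = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - residual / syy

print(round(r, 3), round(r ** 2, 3), round(r2, 3))
```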
Complete the guidelines for interpreting a linear correlation coefficient in social sciences (Cohen,
1992):
? Strong ?
? Medium ?
? Weak ?