Professional Documents
Culture Documents
2022 CSN373 Lec 27
2022 CSN373 Lec 27
2
Correlation:
• Correlation Coefficient:
• As stated in the Introduction, statisticians use a measure called the correlation coefficient
to determine the strength of the linear relationship between two variables. There are
several types of correlation coefficients. The one explained in this section is called the
Pearson product moment correlation coefficient (PPMC), named after statistician Karl
Pearson, who pioneered the research in this area.
• The range of the correlation coefficient is from -1 to 1. If there is a strong positive linear
relationship between the variables, the value of r will be close to 1. If there is a strong
negative linear relationship between the variables, the value of r will be close to -1.
When there is no linear relationship between the variables or only a weak relationship,
the value of r will be close to 0.
3
Correlation:
4
Correlation:
5
Example:
6
Example:
7
Example:
8
Example:
9
Example:
10
Example:
11
The Significance of the Correlation Coefficient:
• As stated before, the range of the correlation coefficient is between -1 and +1. When the
value of r is near +1 or -1, there is a strong linear relationship. When the value of r is near
0, the linear relationship is weak or nonexistent. Since the value of r is computed from
data obtained from samples, there are two possibilities when r is not equal to zero: either
the value of r is high enough to conclude that there is a significant linear relationship
between the variables, or the value of r is due to chance.
• The population correlation coefficient is computed from taking all possible (x, y) pairs; it
is designated by the Greek letter ρ (rho). The sample correlation coefficient can then be
used as an estimator of ρ if the following assumptions are valid.
1. The variables x and y are linearly related.
2. The variables are random variables.
3. The two variables have a bivariate normal distribution.
• A biviarate normal distribution means that for the pairs of (x, y) data values, the
corresponding y values have a bell-shaped distribution for any given x value, and the x
values for any given y value have a bell-shaped distribution.
12
The Significance of the Correlation Coefficient:
13
Example:
14
Regression:
• In studying relationships between two variables, collect the data and then construct a
scatter plot.
• The purpose of the scatter plot, as indicated previously, is to determine the nature of the
relationship. The possibilities include a positive linear relationship, a negative linear
relationship, a curvilinear relationship, or no discernible relationship.
• After the scatter plot is drawn, the next steps are to compute the value of the correlation
coefficient and to test the significance of the relationship.
• If the value of the correlation coefficient is significant, the next step is to determine the
equation of the regression line, which is the data’s line of best fit.
15
Regression:
16
Regression:
17
Regression:
18
Example:
19
Example:
20
Example:
21
Example:
22
Next Class…
23