(See also Scatterplots, Regression Lines, and Time Series Charts With Excel 2003 for instructions on
making a scatterplot of the data and an alternate method of finding the correlation coefficient and the
equation of the regression line.)
1. Correlation Coefficient. The table below displays the heights (in inches) of a sample of 11
brother‐sister pairs. We will first find just the correlation coefficient for this sample.
Go to Tools/Data Analysis, and choose Correlation:
Click and drag both columns for the Input Range, make sure they are Grouped by Columns, and check
the Labels box if appropriate:
The output below shows that the correlation coefficient is 0.55805; there is a positive linear
relationship between the heights of the brothers and sisters, but this linear relationship is not very
strong.
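For readers who want to see what the Correlation tool computes, here is a minimal Python sketch of the sample correlation coefficient. The function name pearson_r and the demo numbers are assumptions for illustration; the brother-sister table is not reproduced in this text, so the demo values are placeholders and not the sample data.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder numbers for illustration only; NOT the brother-sister sample.
demo_x = [1, 2, 3, 4, 5]
demo_y = [2, 1, 4, 3, 5]
print(round(pearson_r(demo_x, demo_y), 3))  # 0.8
```

Passing the two height columns as lists should give the same value that Excel reports in the Correlation output.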
2. Regression Line. We will treat the brother’s height as the independent variable, X, and the sister’s
height as the dependent variable, Y. To find the equation of the regression line, go to Tools/Data
Analysis, and choose Regression:
Click and drag the data into the appropriate Input Range. Note that the Y (dependent variable) Range
is put in first. Also, check the Labels box if appropriate.
The output is shown below with relevant information highlighted:
First, in the chart under Regression Statistics, we have the coefficient of determination, 0.3114.
This means that 31.14% of the variation in the sisters’ heights is explained by their linear relationship
with the heights of their brothers. Also, in the chart at the bottom of the output display, under
Coefficients, we find ‘Intercept’, the y‐intercept of the regression line, and, next to the name of the
independent variable, ‘Brother’, we find the slope of the regression line. Here, the y‐intercept is
approximately 27.6, and the slope is approximately 0.527. The equation of the regression line, which
we could use to predict the sister’s height from that of her brother, is ŷ = 0.527x + 27.6. For each
increase of one inch in the height of the brother, his sister’s height is expected to increase by 0.527
inches.
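The arithmetic in this paragraph can be checked with a short Python sketch. It uses only the rounded values quoted in the text; the function name predict_sister_height and the 70-inch input are assumptions chosen for illustration.

```python
R = 0.55805        # correlation coefficient reported in part 1
SLOPE = 0.527      # rounded slope from the Coefficients table
INTERCEPT = 27.6   # rounded y-intercept from the Coefficients table

def predict_sister_height(brother_height_in):
    """Predicted sister's height (inches) from the fitted line y = 0.527x + 27.6."""
    return SLOPE * brother_height_in + INTERCEPT

# For simple linear regression, the coefficient of determination is r squared.
r_squared = R ** 2
print(round(r_squared, 4))                  # 0.3114

# Hypothetical brother who is 70 inches tall.
print(round(predict_sister_height(70), 2))  # 64.49
```

Because the slope and intercept are rounded, predictions from this sketch may differ slightly from ones computed with Excel's full-precision coefficients.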
The information highlighted in blue gives the results of the Linear Regression T‐Test:
We are testing whether there is a significant linear relationship between the brothers’ heights and the
heights of their sisters. Specifically, for ρ, the population correlation coefficient (or β, the slope of the
regression line for the population), we test the null hypothesis that ρ = 0 (β = 0) versus the
alternative that ρ ≠ 0 (β ≠ 0).
The test statistic is t = 2.0175, and the p‐value for the test is 0.0744. At the level α = 0.05, we would
fail to reject the null hypothesis and conclude that we don’t have statistically significant evidence of a
linear relationship between the heights of brothers and their sisters.
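As a cross-check on the highlighted values, the test statistic can be recomputed from r and the sample size with the standard formula t = r·sqrt(n − 2)/sqrt(1 − r²), using n − 2 degrees of freedom. The Python sketch below is one way to do that with the standard library only. The helper names t_pdf and two_sided_p are assumptions, and the p-value is approximated by numerically integrating the t density rather than by whatever routine Excel uses.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via composite Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - central_area)       # both tails beyond |t|

r, n = 0.55805, 11                        # values reported in the text
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r * r)
p_value = two_sided_p(t_stat, df)
print(round(t_stat, 4), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The printed statistic and p-value should land close to the figures in the Excel output, and the decision at α = 0.05 follows from comparing the p-value with 0.05.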
(If you need to know the standard error of estimate, it can be found under Regression Statistics also;
here, the standard error is 2.247.)
3. Multiple Regression. Suppose that several variables may be related to a person’s salary. The table
below lists salaries, years of employment, years of previous experience, and years of education for a
sample of employees at a certain company. (This data is from Example 1 in section 9.4.)
We go to Tools/Data Analysis, and choose Correlation, as before. This time, we click and drag all four
columns of data:
The results show correlation coefficients between all pairs of variables.
For example, the correlation coefficient between employment and salary is 0.824, the correlation
coefficient between experience and salary is 0.189, and the correlation coefficient between
education and salary is 0.375. We see that the linear relationship between years of employment
and salary is the strongest, and the linear relationship between years of previous experience and salary
is the weakest.
We can also find a regression equation that could be used to predict salary (y) from information on
years of employment (x1), years of previous experience (x2), and years of education (x3):
Go to Tools/Data Analysis, and choose Regression as before. This time, only the Salary data is put into
the Y (dependent variable) Range, and the remaining three columns of data are put into the X
(independent variable) Range:
The output is shown below:
The coefficients of the regression line can be found under Coefficients. The regression equation is
ŷ = 49764 + 364x1 + 228x2 + 267x3.
We can interpret the coefficients in the following way: The coefficient 364 for Employment means that,
for each increase of one year of employment, the predicted salary will increase by about $364, with the
other two variables held fixed. Similarly, for each additional year of previous experience, the predicted
salary will increase by about $228, and for each additional year of education, the predicted salary will
increase by about $267.
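To make the interpretation concrete, the following Python sketch evaluates the regression equation for one employee. It assumes all three coefficients enter with a positive sign, as the interpretation above implies. The function name predict_salary and the employee's values (10 years of employment, 5 years of previous experience, 16 years of education) are hypothetical and not taken from the sample.

```python
INTERCEPT = 49764
COEFFS = {"employment": 364, "experience": 228, "education": 267}  # dollars per year

def predict_salary(employment, experience, education):
    """Predicted salary (dollars) from the fitted multiple regression equation."""
    return (INTERCEPT
            + COEFFS["employment"] * employment
            + COEFFS["experience"] * experience
            + COEFFS["education"] * education)

# Hypothetical employee: 10 years employed, 5 years prior experience, 16 years education.
base = predict_salary(10, 5, 16)
print(base)                                # 58816

# One more year of employment, other variables unchanged, adds the coefficient 364.
print(predict_salary(11, 5, 16) - base)    # 364
```

The second print shows the "each additional year" reading of a coefficient: only one input changes, so the prediction moves by exactly that coefficient.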
We also see that the coefficient of determination is 0.944. This means that about 94.4% of the
variation in salary is explained by its linear relationship with years of employment, experience, and
education. The remaining 5.6% is explained by other factors or chance.