and Correlation
Learning Objectives:
1. Calculate the Pearson product-moment correlation coefficient to determine if there is
a correlation between two variables.
2. Explain what regression analysis is and the concepts of independent and dependent
variable.
3. Calculate the slope and y-intercept of the least squares equation of a regression line
and from those, determine the equation of the regression line.
4. Calculate the residuals of a regression line and from those determine the fit of the
model, locate outliers, and test the assumptions of the regression model.
5. Calculate the standard error of the estimate using the sum of squares of error, and use
the standard error of the estimate to determine the fit of the model.
6. Calculate the coefficient of determination to measure the fit for regression models,
and relate it to the coefficient of correlation.
7. Use the t and F tests to test hypotheses for both the slope of the regression model
and the overall regression model.
8. Calculate confidence intervals to estimate the conditional mean of the dependent
variable and prediction intervals to estimate a single value of the dependent variable.
9. Determine the equation of the trend line to forecast outcomes for time periods in the
future, using alternate coding for time periods if necessary.
10. Use a computer to develop a regression analysis, and interpret the output that is
associated with it.
Correlation
It is a measure of the degree of relatedness
of variables.
Coefficient of correlation, r
This measure is applicable only if both
variables being analyzed have at least an
interval level of data.
r (Pearson product-moment
correlation coefficient)
The term r is a measure of the linear correlation of
two variables.
It is a number that ranges from -1 to +1. Its sign
gives the direction of the relationship between the
variables, and its magnitude gives the strength.
An r value of +1 denotes a perfect positive
relationship between two sets of numbers.
An r value of -1 denotes a perfect negative
correlation, which indicates an inverse relationship
between two variables: as one variable gets larger,
the other gets smaller.
An r value of 0 means no linear relationship is
present between the two variables.
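As an illustration of the properties above, here is a minimal Python sketch that computes r from deviations about the means. The function name `pearson_r` and the data are hypothetical, chosen only to show the +1 and -1 cases. They are not taken from the sample problem.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical illustration data (not from the text)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises exactly in step with x
y_down = [10, 8, 6, 4, 2]   # falls exactly as x rises

print(pearson_r(x, y_up))    # 1.0
print(pearson_r(x, y_down))  # -1.0
```

Real data rarely fall exactly on a line, so r usually lies strictly between -1 and +1.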
Pearson Product-Moment
Correlation Coefficient
Five correlations
Sample Problem
Introduction to Simple Regression
Analysis
Regression analysis is the process of constructing a
mathematical model or function that can
be used to predict or determine one
variable by another variable or other
variables.
Simple regression involves two variables, in which one
variable is predicted by another variable.
Dependent variable (y)
the variable to be predicted
Independent variable (x)
the variable used to predict y
Probabilistic Model
It includes an error term that allows for
the y values to vary for any given value
of x.
Random error will occur in the
prediction of the y values for
values of x because it is likely that
the variable x does not explain all
the variability of the variable y.
Least squares analysis
It is a process whereby a regression
model is developed by producing the
minimum sum of the squared error
values.
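A minimal Python sketch of this idea follows. It assumes the usual closed-form result for one predictor: the slope is the sum of cross-products of deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (b0, b1), the intercept and slope minimizing the sum of squared errors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b1 = ss_xy / ss_xx          # slope
    b0 = mean_y - b1 * mean_x   # y-intercept
    return b0, b1

# Hypothetical illustration data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = least_squares_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

Any other choice of slope and intercept would give a larger sum of squared errors for these data.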
Heteroscedasticity
The error variances are not constant.
Constant Variance
Non-constant variance
Sample Problem
Compute the residuals for the data in the table, for which a
regression model was developed to predict
the number of full-time equivalent workers
(FTEs) by the number of beds in a hospital.
Table
Solution
The data and computed residuals are
shown in the following table.
Continuation of the solution
Note that the regression model fits these particular data
well for hospitals 2 and 5, as indicated by residuals of -.62
and 1.37 FTEs, respectively. For hospitals 1, 8, 9, 11, and 12,
the residuals are relatively large, indicating that the
regression model does not fit the data for these hospitals
well. The Residuals Versus the Fitted Values graph indicates
that the residuals seem to increase as x increases,
indicating a potential problem with heteroscedasticity. The
normal plot of residuals indicates that the residuals are
nearly normally distributed. The histogram of residuals
shows that the residuals pile up in the middle, but are
somewhat skewed toward the larger positive values.
Residual plot for FTEs
Sum of squares of error (SSE)
The total of the squared residuals
column.
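The sketch below shows one way to compute the residuals, the SSE, and the standard error of the estimate. It assumes the standard error is the square root of SSE divided by n - 2, the usual form for a model with one predictor. The function names, the data, and the coefficients are hypothetical. They are not the hospital FTE values, which are not reproduced here.

```python
import math

def residuals(x, y, b0, b1):
    """Observed y minus predicted y for each observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sum_of_squares_error(res):
    """SSE: the total of the squared residuals."""
    return sum(e ** 2 for e in res)

def standard_error_of_estimate(sse, n):
    """Square root of SSE / (n - 2); two coefficients are estimated from the data."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    return math.sqrt(sse / (n - 2))

# Hypothetical illustration data (not from the text); b0 and b1 are the
# least squares coefficients for these four points.
x = [1, 2, 3, 4]
y = [3, 5, 6, 10]
b0, b1 = 0.5, 2.2

res = residuals(x, y, b0, b1)
sse = sum_of_squares_error(res)
se = standard_error_of_estimate(sse, len(x))
print([round(e, 2) for e in res])
print(round(sse, 4), round(se, 4))
```

For a least squares line, the residuals sum to zero up to rounding. That is why they are squared before being totaled.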
Trend
It is a long-term general direction of data.
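A trend line can be fitted with the same least squares calculation, using the time period as the predictor. The minimal Python sketch below assumes the periods are recoded as 1, 2, 3, ... in place of calendar years, one common form of alternate coding. The function names and the yearly values are hypothetical.

```python
def fit_trend_line(values):
    """Fit y-hat = b0 + b1 * t by least squares, with periods coded t = 1..n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 time periods")
    periods = list(range(1, n + 1))
    mean_t = sum(periods) / n
    mean_y = sum(values) / n
    ss_ty = sum((t - mean_t) * (v - mean_y) for t, v in zip(periods, values))
    ss_tt = sum((t - mean_t) ** 2 for t in periods)
    b1 = ss_ty / ss_tt
    b0 = mean_y - b1 * mean_t
    return b0, b1

def forecast(b0, b1, period):
    """Trend value for a coded period (period n + 1 is the first future period)."""
    return b0 + b1 * period

# Hypothetical yearly values for five consecutive years, coded 1 through 5
values = [10, 12, 15, 17, 21]

b0, b1 = fit_trend_line(values)
next_period = len(values) + 1
print(round(forecast(b0, b1, next_period), 2))
```

Recoding keeps the numbers small and leaves the slope unchanged. Only the intercept depends on where the coding starts.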