
Regression Analysis

JALLY S. MANOLONG
ROSEMARIE T. REPONTE
CHECHE R. POTICAR
EMELOU D. MOSURA
MARY JANE P. AURORA
AILYN V. MIBATO
Topics

01 MULTIPLE LINEAR REGRESSION
02 BINARY LOGISTIC REGRESSION
1. When to Use the Test
2. Sample Statement of the Problem
3. Sample Data Set
4. Sample Conceptual Framework
5. Assumptions of the Test
6. SPSS Simulation
7. Interpret SPSS Output
8. Report the Findings
MULTIPLE LINEAR REGRESSION
Usage

Multiple regression is a statistical technique used to explore the predictive ability of a set of independent variables on one dependent variable. As a statistical tool, multiple regression is often used to accomplish three objectives.
Applications/Examples
- You want to understand whether exam performance can be predicted based on revision time, test anxiety, lecture attendance, and gender.
- You want to understand whether daily cigarette consumption can be predicted based on smoking duration, age when started smoking, smoker type, income, and gender.
ASSUMPTION | TEST | CUT-OFF
Linearity | Scatterplot / Pearson r | p < .05 alpha; r > .70
Homoscedasticity of Residuals | Scatterplot of the Residuals | The points look like you shot them out of a shotgun; there is no obvious pattern.
Independence of Error Terms (Residuals), i.e., no linear autocorrelation | Durbin-Watson d statistic | 1.5 < d < 2.5
Normality of Dependent Variable and Residuals | One-Sample Kolmogorov-Smirnov Test (DV); Normal Probability Plot (Residuals) | p > .05 alpha
No Multicollinearity | Correlation Coefficients; Variance Inflation Factor (VIF) | r < .70; VIF < 10
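The statistics behind several cut-offs in the table can be computed by hand. Below is a minimal pure-Python sketch, assuming the residuals and predictor columns are already available as plain lists; the helper names (`pearson_r`, `durbin_watson`, `invert`, `vif`) are hypothetical and are not part of SPSS. VIF is taken here from the diagonal of the inverse of the predictors' correlation matrix, which equals 1 / (1 - R²) from regressing each predictor on the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists (linearity / multicollinearity checks)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def durbin_watson(residuals):
    """d = sum of squared differences of successive residuals / sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(predictors):
    """VIF of each predictor = diagonal of the inverse correlation matrix of the predictors."""
    k = len(predictors)
    corr = [[pearson_r(predictors[i], predictors[j]) for j in range(k)] for i in range(k)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(k)]

print(durbin_watson([1, -1, 1, -1]))  # 3.0
print(vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]))
```

A d close to 2 indicates little autocorrelation. For the two hypothetical predictors above, r = .80, so each VIF is 1 / (1 - .64) ≈ 2.78, well under the VIF < 10 cut-off.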
Sample Research Question/Problem

How much of the variance in GPA can be explained by the following set of variables: students' intelligence quotient, study hours, and sleep hours? Which of these variables is the best predictor of GPA?
Sample Conceptual Framework
SAMPLE STATEMENT OF THE PROBLEM
1. What is the profile of the students in terms of the following:
1.1. Grade Point Average;
1.2. intelligence quotient score;
1.3. study hours;
1.4. sleeping hours?
2. Do intelligence quotient score, study hours, and sleep hours significantly
predict students’ Grade Point Average?
3. What is the extent of influence of students' intelligence quotient score, study hours, and sleep hours on their Grade Point Average?
MULTIPLE LINEAR REGRESSION SAMPLE DATA SET
Null Hypothesis
Ho: Students’ intelligence quotient score, study hours,
and sleep hours do not significantly predict students’
Grade Point Average.
Or
Ho: Students’ intelligence quotient, study hours, and
sleep hours have no significant influence on students’
Grade Point Average.
Check Assumptions
- Parsimony
- Theoretically Consistent
- Dependent Variable must be continuous (interval/ratio)
- Linearity of IVs on DV
- Homoscedasticity (Equality of Variance) of Residuals
- Normality of Dependent Variable and Residuals
- No Multicollinearity
- Independence of Error Terms (Residuals)
Linearity of IVs on DV
Homoscedasticity of Residuals
Independence of Error Terms
Normality of Dependent Variable
No Multicollinearity
Interpreting the Results
Evaluating the multiple regression model
The value given under the heading R Square tells you how much of the variance in the dependent variable is explained by the model (the independent variables or predictors). In this example, 81.7% of the variation in GPA is explained by IQ, study hours, and sleep hours.
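As a minimal sketch of what R Square measures, it can be computed from observed values and the model's predicted values as 1 minus the ratio of the residual sum of squares to the total sum of squares. The GPA values and predictions below are hypothetical, not the sample data set, and `r_squared` is an illustrative helper name.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total: share of the variance in Y explained by the model."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical GPAs and model predictions (not the sample data set)
observed = [85, 88, 90, 92]
predicted = [86, 87, 91, 91]
print(round(r_squared(observed, predicted), 3))  # 0.85
```

Read the same way as the SPSS output, an R² of .85 would mean 85% of the variation in these hypothetical GPAs is explained by the predictors.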
Significance of the model
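- The statistical significance of the model as a whole is given by the Sig. value in the ANOVA table; a value below .05 means the predictors, taken together, significantly predict the dependent variable.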
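The ANOVA F statistic and R Square are linked: F = (R² / k) / ((1 - R²) / (n - k - 1)), where k is the number of predictors and n the sample size. The sketch below assumes n = 40, which is inferred from the reported df of 36 = n - 3 - 1 rather than stated in the slides; `overall_f` is a hypothetical helper name.

```python
def overall_f(r2, k, n):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with df = (k, n - k - 1)."""
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2

# k = 3 predictors; n = 40 is inferred from the reported df2 = 36 = n - 3 - 1
f, df1, df2 = overall_f(0.817, k=3, n=40)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(3, 36) = 53.57
```

The result is close to the reported F(3, 36) = 53.412; the small gap most likely comes from R² being rounded to three decimals.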
Evaluating each of the independent variables
The Beta values indicate which variable makes the strongest unique contribution
to explaining the dependent variable, when the variance explained by all other
variables in the model is controlled for. For each variable, check the Sig value to
assess whether the variable is making a statistically significant unique
contribution to the model.
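Beta values can be compared across predictors because they express each coefficient in standard-deviation units, which removes the effect of measuring IQ in points and study hours in hours. A minimal sketch of the conversion Beta = B × SD(X) / SD(Y); the B value and data below are hypothetical, and `standardized_beta` is an illustrative name.

```python
import statistics

def standardized_beta(b, x, y):
    """Beta = B * SD(X) / SD(Y): the coefficient expressed in standard-deviation units."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical values: B = 0.35 GPA points per IQ point
iq = [98, 105, 110, 112, 120]
gpa = [82, 85, 87, 88, 91]
print(round(standardized_beta(0.35, iq, gpa), 3))
```

The predictor with the largest absolute Beta makes the strongest unique contribution, provided its Sig. value is also below .05.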

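*Interpreting B: holding IQ and sleep hours constant, each one-hour (one-unit) increase in study hours changes predicted GPA by B points, where B is the unstandardized coefficient for study hours.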


Reporting the Findings

A multiple regression was run to predict GPA from IQ, study hours, and sleep hours. These variables significantly predicted GPA, F(3, 36) = 53.412, p < .001, R² = .817. All three variables added significantly to the prediction, p < .05.
The Model Equation
Model Manipulation and Prediction
Predicted GPA = (31.770) + (1.505)(4) + (.351)(112) + (1.381)(8)
Predicted GPA = 31.770 + 6.020 + 39.312 + 11.048 = 88.15 ≈ 88
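The same prediction can be written as a small reusable function that adds the constant to the sum of each coefficient times its input value. This is a sketch; `predict` is a hypothetical helper, and the coefficient/value pairs are copied from the worked equation in this slide.

```python
def predict(intercept, terms):
    """Linear prediction: intercept + sum of (B * value) over (B, value) pairs."""
    return intercept + sum(b * value for b, value in terms)

# Constant, coefficients, and input values copied from the worked equation
gpa = predict(31.770, [(1.505, 4), (0.351, 112), (1.381, 8)])
print(round(gpa, 2))  # 88.15
```

Changing one input value while keeping the others fixed shows how each coefficient moves the predicted GPA.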
BINARY LOGISTIC REGRESSION
Usage
Logistic regression allows one to predict a discrete outcome,
such as group membership, from a set of variables that may be
continuous, discrete, dichotomous, or a mix of any of these.
Generally, the dependent variable is dichotomous, such as
male/female, smoker/nonsmoker or success/failure.
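As a minimal sketch of how a binary logistic model produces a prediction: a weighted sum of the predictors (the logit) is passed through the logistic function to give a probability between 0 and 1, and the case is assigned to a group by comparing that probability with a cut-off, assumed here to be 0.5. The coefficients below are hypothetical, not estimates from the sample data, and the function names are illustrative.

```python
import math

def pass_probability(intercept, coefficients, values):
    """Logistic model: P(pass) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))

def classify(probability, cutoff=0.5):
    """Assign the group; a probability exactly at the cut-off goes to 'pass' (an assumption)."""
    return "pass" if probability >= cutoff else "fail"

# Hypothetical coefficients for IQ score, study hours, and sleep hours
p = pass_probability(-10.0, [0.08, 1.2, -0.5], [110, 3, 6])
print(round(p, 3), classify(p))  # 0.354 fail
```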
Assumptions
Logistic regression is more flexible in that it makes no assumptions about the distributions of the predictor variables: the predictors do not have to be normally distributed, linearly related, or of equal variance/covariance across the groups.
Sample Research Problem
A teacher wants to know whether a student will pass or fail Science based on the student's recorded intelligence quotient score, study hours, and sleep hours.
Sample Conceptual Framework
SAMPLE STATEMENT OF THE PROBLEM
1. What is the profile of the students in terms of the following:
1.1. academic status;
1.2. intelligence quotient score;
1.3. study hours;
1.4. sleeping hours?
2. What factors significantly determine students' academic status?
BINARY LOGISTIC REGRESSION SAMPLE DATA SET
Null Hypothesis
Ho: Intelligence quotient score, study hours, and sleep hours do not significantly determine students' academic status.
Interpreting the Results
Null Hypothesis: Adding the variables (IQ score, study hours, and sleep hours) to the model has not significantly increased our ability to predict students' academic status.
*The full model has a significant prediction performance (χ² = 28.338, df = 3, p < .001).
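The model chi-square is the drop in -2 log-likelihood when the three predictors are added to a constant-only model, and its p-value comes from a chi-square distribution with df = 3. A minimal sketch, assuming a closed form for the upper tail with 3 df built from `math.erfc`; the helper names are hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def omnibus_chi2(neg2ll_null, neg2ll_model):
    """Model chi-square = drop in -2 log-likelihood when the predictors are added."""
    return neg2ll_null - neg2ll_model

print(chi2_sf_df3(28.338) < .001)   # True
print(round(chi2_sf_df3(7.815), 3))  # 0.05 (the usual .05 critical value for df = 3)
```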
The Hosmer-Lemeshow (H-L) test evaluates the null hypothesis that the predictions made by the model fit the observed academic status. A non-significant result indicates that the predicted academic status corresponds closely to the actual academic status, that is, a good model fit.

The model also has a good fit: the Hosmer-Lemeshow test did not reject the hypothesis of model adequacy (χ² = .851, p = .999).
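The idea behind the H-L test can be sketched as follows: cases are sorted by predicted probability of passing, cut into groups, and the observed number of passes in each group is compared with the expected number (the sum of predicted probabilities). This is a simplified sketch; SPSS's own grouping into deciles of risk may handle ties differently, and the function names are hypothetical. It assumes there are at least as many cases as groups and an even df.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Simplified H-L test: sort cases by predicted P(pass), cut into equal-size groups,
    and compare observed with expected passes in each group (needs len(probs) >= groups)."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # actual passes in the group
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even(stat, df)
```

A small statistic with a large p-value, as in the reported result, means observed and expected counts agree closely across groups.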
Given the set of independent variables, the full model explains about 50.8% (Cox & Snell R²) to 75.2% (Nagelkerke R²) of the variation in the dependent variable.
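Two pseudo-R² values are reported because the Cox & Snell R² cannot reach 1, so Nagelkerke rescales it by its maximum possible value. A minimal sketch computing both from -2 log-likelihood values; the inputs are hypothetical, not taken from the sample output, and `pseudo_r_squared` is an illustrative name.

```python
import math

def pseudo_r_squared(neg2ll_null, neg2ll_model, n):
    """Cox & Snell R^2 and Nagelkerke R^2 from -2 log-likelihood values and sample size n."""
    cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)  # the largest value Cox & Snell can reach
    return cox_snell, cox_snell / max_cox_snell

# Hypothetical -2LL values, not taken from the sample output
cs, nk = pseudo_r_squared(neg2ll_null=50.0, neg2ll_model=20.0, n=50)
print(round(cs, 3), round(nk, 3))  # 0.451 0.714
```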
Performance of the full model: about 93.3% of the students who actually passed were correctly predicted by the model to pass.
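The classification table compares each student's predicted group with the actual group. A minimal sketch computing the percentage of actual passes predicted correctly and the overall percentage correct, assuming a 0.5 cut-off; the probabilities and outcomes are hypothetical, and `classification_summary` is an illustrative name.

```python
def classification_summary(probs, outcomes, cutoff=0.5):
    """Return (% of actual passes predicted as pass, overall % correct); outcome 1 = pass."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    actual_pass = [pr for pr, y in zip(predicted, outcomes) if y == 1]
    pass_correct = 100 * sum(actual_pass) / len(actual_pass)
    overall = 100 * sum(pr == y for pr, y in zip(predicted, outcomes)) / len(outcomes)
    return pass_correct, overall

# Hypothetical predicted probabilities and actual outcomes
print(classification_summary([0.9, 0.8, 0.4, 0.3, 0.6], [1, 1, 1, 0, 0]))
```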
Results of the binary logistic regression model identifying the most critical factors that determine students' academic status.
Based on the results, IQ score, study hours, and sleep hours each significantly predict the likelihood that students will pass (Wald = 4.307, p < .05; Wald = 4.824, p < .05; Wald = 4.855, p < .05, respectively).
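Each Wald statistic is the squared ratio of a coefficient to its standard error, B / SE, referred to a chi-square distribution with 1 df; for example, a Wald of 4.307 corresponds to p ≈ .038. A minimal sketch; the B and SE values are hypothetical because the slides report only the Wald values, and `wald_test` is an illustrative name.

```python
import math

def wald_test(b, se):
    """Wald = (B / SE)^2, referred to a chi-square with 1 df; also returns Exp(B)."""
    wald = (b / se) ** 2
    p = math.erfc(math.sqrt(wald / 2))  # upper tail of chi-square(1) = two-sided normal p
    odds_ratio = math.exp(b)
    return wald, p, odds_ratio

# Hypothetical B and SE (the slides report only the Wald values)
wald, p, odds_ratio = wald_test(b=2.0, se=1.0)
print(wald, round(p, 4), round(odds_ratio, 2))  # 4.0 0.0455 7.39
```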

*With each one-hour increase in study hours, the odds of passing are multiplied by about 12.9 (Exp(B) = 12.9).
*Students who study 2 hours or more per day are more likely to pass than those who study less.
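An odds ratio of 12.9 multiplies the odds of passing, not the probability. A minimal sketch converting a starting probability to odds, applying the odds ratio, and converting back; the starting probabilities are hypothetical, and `probability_after_odds_ratio` is an illustrative name.

```python
def probability_after_odds_ratio(p_before, odds_ratio):
    """Apply an odds ratio (Exp(B)) to a starting probability and return the new probability."""
    odds_before = p_before / (1 - p_before)
    odds_after = odds_before * odds_ratio
    return odds_after / (1 + odds_after)

for p0 in (0.1, 0.5):
    print(p0, round(probability_after_odds_ratio(p0, 12.9), 3))
# 0.1 0.589
# 0.5 0.928
```

The probability can never exceed 1, so the same odds ratio produces a smaller probability gain for students who were already likely to pass.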
Reporting the Result
A logistic regression was performed to ascertain the effects of IQ score, study hours, and sleep hours on the likelihood that students will pass Science. The logistic regression model was statistically significant, χ² = 28.338, df = 3, p < .001. The model explained 75.2% (Nagelkerke R²) of the variance in passing the subject and correctly classified 85.7% of cases. Intelligence quotient score, study hours, and sleep hours significantly determine students' academic status. Students who study 2 hours or more per day and have higher IQ scores are more likely to pass than those who study less. However, students who have less sleep time are more likely to pass Science.
Thank you very much!
