

CERTIFICATE

Group No 7
FARHAN ASLAM P19101036
KAMRAN JAWED P19101028
RAYYAN JAMIL P19101054
SHAHID IQBAL P19101064
AYESHA ABRAR P19101012
ASMA HASSAN P19101011
It is certified that the work done in this report, entitled “Multiple Regression Analysis on”, by the
above students of MCS Previous (Morning), has been accepted in partial fulfillment of the
requirements of ‘Model and Inference’.

_______________________ _______________________
(Project Advisor) (Project Coordinator)

_______________________ _______________________
Examiner 1 Examiner 2

_____________________________
(Chairman)

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF KARACHI

Table of Contents

Abstract ........................................................................................................................................... iv
List of figures .................................................................................................................................... v
List of Tables .................................................................................................................................... v
List of Symbols ................................................................................................................................. v
CHAPTER 1 – INTRODUCTION ....................................................................................................1
1.1 Background ................................................................................................................................... 1
1.2 Problems ....................................................................................................................................... 1
1.3 Aims and objectives ...................................................................................................................... 2
1.4 Scope of Work .............................................................................................................................. 2
1.5 Thesis Chapters ............................................................................................................................. 3
CHAPTER 2 – LITERATURE REVIEW .........................................................................................4
2.1 Introduction ................................................................................................................................... 4
2.2 Response Variable ........................................................................................................................ 5
2.3 Explanatory Variable .................................................................................................................... 5
2.4 Correlation Analysis ..................................................................................................................... 5
2.5 Regression ..................................................................................................................................... 6
CHAPTER 3 – ANALYSIS BY SPSS ...............................................................................................7
3.1 Assumptions.................................................................................................................................. 7
3.2 Setup in SPSS ............................................................................................................................. 11
3.3 Test Procedures in SPSS ............................................................................................................. 11
CHAPTER 4 – ANALYSIS BY EXCEL .......................................................................................... 13
4.1 Data ............................................................................................................................................ 13
4.2 Summary output of Regression from Excel ................................................................................ 13
CHAPTER 5 – CONCLUSION AND RECOMMENDATION ....................................................... 14
5.1 Conclusions ................................................................................................................................ 14

5.2 Recommendations ...................................................................................................................... 14

Abstract

A health researcher wants to be able to predict "VO2max", an indicator of fitness and health,
from the attributes gender, age, weight and heart rate.

List of figures

Figure 1 shows an example of a correlation graph
Figure 2 shows an example of a regression graph

List of Tables

Table 1 shows the regression run notes from SPSS Software
Table 2 shows the Model Summary and Std. Error of the Estimate
Table 3 shows the values of ANOVA from SPSS Software
Table 4 shows the regression Coefficients from SPSS Software
Table 5 shows the Data of Oxygen (VO2max) and Heart rate of Patients
Table 6 shows the Summary output of regression from Excel

List of Symbols

Y: Response variable / dependent variable

Bi: regression coefficient of the i-th explanatory variable

C: constant (intercept)

R: correlation between the predicted values and the observed values of Y

R2: coefficient of determination

X: Explanatory variable / predictor variable


CHAPTER 1 – INTRODUCTION

1.1 Background

Multiple regression (an extension of simple linear regression) is used to predict the value of a
dependent variable (also known as an outcome variable) based on the value of two or more
independent variables (also known as predictor variables). For example, you could use multiple
regression to determine if exam anxiety can be predicted based on coursework mark, revision
time, lecture attendance and IQ score (i.e., the dependent variable would be "exam anxiety", and
the four independent variables would be "coursework mark", "revision time", "lecture
attendance" and "IQ score"). Alternately, you could use multiple regression to determine if
income can be predicted based on age, gender and educational level (i.e., the dependent variable
would be "income", and the three independent variables would be "age", "gender" and
"educational level"). If you have a dichotomous dependent variable you can use a binomial
logistic regression.

Multiple regression also allows you to determine the overall fit (variance explained) of the model
and the relative contribution of each of the independent variables to the total variance explained.
For example, you might want to know how much of the variation in exam anxiety can be
explained by coursework mark, revision time, lecture attendance and IQ score "as a whole", but
also the "relative contribution" of each independent variable in explaining the variance.


1.2 Problems

1.3 Aims and objectives

Multiple linear regression is the most common form of linear regression analysis. As a predictive
analysis, the multiple linear regression is used to explain the relationship between one continuous
dependent variable and two or more independent variables. The independent variables can be
continuous or categorical (dummy coded as appropriate).
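"Dummy coded as appropriate" means each categorical predictor is converted into numeric 0/1 indicator variables before fitting. A minimal sketch, assuming a two-level gender column like the one used later in this report:

```python
# Dummy-code a two-level categorical predictor (male/female -> 0/1).
# The 0/1 assignment here is an arbitrary illustrative choice.
gender = ["male", "female", "female", "male"]
gender_dummy = [1 if g.lower() == "female" else 0 for g in gender]
print(gender_dummy)  # -> [0, 1, 1, 0]
```

A categorical variable with k levels needs k - 1 dummy variables; a two-level variable like this needs just one.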

1.4 Scope of Work

Multiple regression is a statistical technique that can be used to analyze the relationship
between a single dependent variable and several independent variables.
The objective of multiple regression analysis is to use the independent variables whose values
are known to predict the value of the single dependent value


1.5 Thesis Chapters

1.5.1 Chapter 1

This chapter serves as a general introduction to the basics of multiple regression, along with the
reasons why there is a need for this kind of analysis. The aims and objectives of this research
work are also stated.

1.5.2 Chapter 2

In this chapter, a brief overview of multiple regression is given, along with the details of its
components.

1.5.3 Chapter 3

Multiple regression analysis using the SPSS software is outlined in this chapter.

1.5.4 Chapter 4

Multiple regression analysis using Excel is elaborated in this chapter.

1.5.5 Chapter 5

This is the final chapter of this thesis and contains a discussion of the results obtained.


CHAPTER 2 – LITERATURE REVIEW

2.1 Introduction

Multiple regression is an extension of simple linear regression. It is used when we want to


predict the value of a variable based on the value of two or more other variables. The variable
we want to predict is called the dependent variable (or sometimes, the outcome, target or
criterion variable). The variables we are using to predict the value of the dependent variable
are called the independent variables (or sometimes, the predictor, explanatory or regressor
variables).

For example, you could use multiple regression to understand whether exam performance can
be predicted based on revision time, test anxiety, lecture attendance and gender. Alternately,
you could use multiple regression to understand whether daily cigarette consumption can be
predicted based on smoking duration, age when started smoking, smoker type, income and
gender.

Multiple regression also allows you to determine the overall fit (variance explained) of the
model and the relative contribution of each of the predictors to the total variance explained.
For example, you might want to know how much of the variation in exam performance can
be explained by revision time, test anxiety, lecture attendance and gender "as a whole", but
also the "relative contribution" of each independent variable in explaining the variance.

This "quick start" guide shows you how to carry out multiple regression using SPSS
Statistics, as well as interpret and report the results from this test. However, before we
introduce you to this procedure, you need to understand the different assumptions that your
data must meet in order for multiple regression to give you a valid result. We discuss these
assumptions next.

y = b1x1 + b2x2 + … + bnxn + c.
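Once the coefficients b1..bn and the constant c are known, the equation above is just a dot product plus a constant. A small sketch with made-up coefficients (purely illustrative, not fitted values):

```python
import numpy as np

# Made-up coefficients for illustration only.
b = np.array([2.0, -0.5, 1.5])   # b1, b2, b3
c = 10.0                         # constant term
x = np.array([3.0, 4.0, 2.0])    # one observation's predictor values

# y = b1*x1 + b2*x2 + ... + bn*xn + c
y = float(b @ x) + c
print(y)  # 2*3 - 0.5*4 + 1.5*2 + 10 = 17.0
```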


2.2 Response Variable:

Response variables are also known as dependent variables, y-variables, and outcome variables.
Typically, you want to determine whether changes in the predictors are associated with changes
in the response. For example, in a plant growth study, the response variable is the amount of
growth that occurs during the study. Multiple regression (an extension of simple linear
regression) is used to predict the value of a dependent variable (also known as an outcome
variable) based on the value of two or more independent variables (also known as predictor
variables).

2.3 Explanatory Variable:

An explanatory variable is a type of independent variable. When a variable is truly independent,
it is not affected at all by any other variables. When a variable is not known for certain to be
independent, it is called an explanatory variable.

2.4 Correlation Analysis:

The main purpose of correlation, through the lens of correlation analysis, is to allow
experimenters to know the association or the absence of a relationship between two variables.
When these variables are correlated, you’ll be able to measure the strength of their
association.
Overall, the objective of correlation analysis is to find the numerical value that shows the
relationship between the two variables and how they move together.
One key benefit of correlation is that it is a more concise and clear summary of the
relationship between the two variables than you’ll find with regression.
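That numerical value is the correlation coefficient, which runs from -1 to +1. A quick sketch with made-up data (assumed values, not from the report's dataset):

```python
import numpy as np

# Pearson correlation between two made-up variables that move together
# roughly as y = 2x, so the correlation should be strongly positive.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear association; values near 0 indicate little or no linear association.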


Figure 1 shows an example of Correlation graph

2.5 Regression:
On the other hand, regression describes how one variable affects another, or how changes in
one variable trigger changes in another; it is often read as cause and effect, although regression
alone does not prove causation. It implies that the outcome is dependent on one or more variables.

For instance, while correlation can be defined as the relationship between two variables,
regression is how they affect each other. An example of this would be how an increase in
rainfall would then cause various crops to grow, just like a drought would cause crops to
wither or not grow at all.

2.5.1 Regression Analysis:

Regression analysis helps to determine the functional relationship between two variables
(x and y) so that you are able to estimate the unknown variable and make future projections on
events and goals.

The main objective of regression analysis is to estimate the value of the response variable
based on the values of your known (or fixed) explanatory variables. Linear regression finds
the best-fitting straight line through the data points.

Figure 2 shows an example of Regression graph


CHAPTER 3 – ANALYSIS BY SPSS

3.1 Assumptions:
When you choose to analyze your data using multiple regression, part of the process involves
checking to make sure that the data you want to analyze can actually be analyzed using
multiple regression. You need to do this because it is only appropriate to use multiple
regression if your data "passes" eight assumptions that are required for multiple regression to
give you a valid result. In practice, checking for these eight assumptions just adds a little bit
more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when
performing your analysis, as well as think a little bit more about your data, but it is not a
difficult task.

Before we introduce you to these eight assumptions, do not be surprised if, when analyzing
your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not
met). This is not uncommon when working with real-world data rather than textbook
examples, which often only show you how to carry out multiple regression when everything
goes well! However, don’t worry. Even when your data fails certain assumptions, there is
often a solution to overcome this. First, let's take a look at these eight assumptions:

3.1.1 Assumption # 1:

Your dependent variable should be measured on a continuous scale (i.e., it is either


an interval or ratio variable). Examples of variables that meet this criterion include revision
time (measured in hours), intelligence (measured using IQ score), exam performance
(measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about
interval and ratio variables in our article: Types of Variable. If your dependent variable was
measured on an ordinal scale, you will need to carry out ordinal regression rather than multiple
regression. Examples of ordinal variables include Likert items (e.g., a 7-point scale from
"strongly agree" through to "strongly disagree"), amongst other ways of ranking categories
(e.g., a 3-point scale explaining how much a customer liked a product, ranging from "Not very
much" to "Yes, a lot").


3.1.2 Assumption # 2:

You have two or more independent variables, which can be either continuous (i.e.,
an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable). For examples
of continuous and ordinal variables, see the bullet above. Examples of nominal
variables include gender (e.g., 2 groups: male and female), ethnicity (e.g., 3 groups:
Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary,
low, moderate and high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist),
and so forth. Again, you can learn more about variables in our article: Types of Variable. If
one of your independent variables is dichotomous and considered a moderating variable, you
might need to run a Dichotomous moderator analysis.


3.1.3 Assumption # 3:

You should have independence of observations (i.e., independence of residuals), which you
can easily check using the Durbin-Watson statistic, which is a simple test to run using SPSS
Statistics. We explain how to interpret the result of the Durbin-Watson statistic, as well as
showing you the SPSS Statistics procedure required, in our enhanced multiple regression
guide.
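Outside SPSS, the Durbin-Watson statistic is simple to compute directly from the residuals. A sketch of the standard formula, applied here to made-up independent residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of
    the residuals divided by their sum of squares. Values near 2 suggest
    no first-order autocorrelation; near 0 positive, near 4 negative."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Independent (uncorrelated) residuals should give a value near 2.
rng = np.random.default_rng(1)
dw = durbin_watson(rng.normal(size=1000))
print(round(dw, 2))
```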

3.1.4 Assumption # 4:

There needs to be a linear relationship between (a) the dependent variable and each of your
independent variables, and (b) the dependent variable and the independent
variables collectively. Whilst there are a number of ways to check for these linear
relationships, we suggest creating scatterplots and partial regression plots using SPSS
Statistics, and then visually inspecting these scatterplots and partial regression plots to check


for linearity. If the relationships displayed in your scatterplots and partial regression plots are
not linear, you will have to either run a non-linear regression analysis or "transform" your
data, which you can do using SPSS Statistics. In our enhanced multiple regression guide, we
show you how to: (a) create scatterplots and partial regression plots to check for linearity
when carrying out multiple regression using SPSS Statistics; (b) interpret different scatterplot
and partial regression plot results; and (c) transform your data using SPSS Statistics if you do
not have linear relationships between your variables.

3.1.5 Assumption # 5:

Your data needs to show homoscedasticity, which is where the variances along the line of
best fit remain similar as you move along the line. We explain more about what this means
and how to assess the homoscedasticity of your data in our enhanced multiple regression
guide. When you analyze your own data, you will need to plot the studentized residuals
against the unstandardized predicted values. In our enhanced multiple regression guide, we
explain: (a) how to test for homoscedasticity using SPSS Statistics; (b) some of the things
you will need to consider when interpreting your data; and (c) possible ways to continue with
your analysis if your data fails to meet this assumption.

3.1.6 Assumption # 6:

Your data must not show multicollinearity, which occurs when you
have two or more independent variables that are highly correlated with each other. This leads
to problems with understanding which independent variable contributes to the variance
explained in the dependent variable, as well as technical issues in calculating a multiple
regression model. Therefore, in our enhanced multiple regression guide, we show you: (a)
how to use SPSS Statistics to detect for multicollinearity through an inspection of correlation
coefficients and Tolerance/VIF values; and (b) how to interpret these correlation coefficients
and Tolerance/VIF values so that you can determine whether your data meets or violates this
assumption.
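The Tolerance/VIF values mentioned above can also be computed by hand: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R²) of that auxiliary regression (Tolerance is its reciprocal). A numpy sketch on made-up data with one deliberately collinear column:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: regress X[:, j] on the
    other columns (plus a constant) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                   # independent of x1 -> low VIF
x3 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1 -> high VIF
X = np.column_stack([x1, x2, x3])
print(round(vif(X, 0), 1), round(vif(X, 1), 1))
```

A common rule of thumb treats VIF above roughly 10 as a sign of problematic multicollinearity.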

3.1.7 Assumption # 7:

There should be no significant outliers, high leverage points or highly
influential points. Outliers, leverage and influential points are different terms used to
represent observations in your data set that are in some way unusual when you wish to
perform a multiple regression analysis. These different classifications of unusual
points reflect the different impact they have on the regression line. An observation can be
classified as more than one type of unusual point. However, all these points can have a very
negative effect on the regression equation that is used to predict the value of the dependent


variable based on the independent variables. This can change the output that SPSS Statistics
produces and reduce the predictive accuracy of your results as well as the statistical
significance. Fortunately, when using SPSS Statistics to run multiple regression on your data,
you can detect possible outliers, high leverage points and highly influential points. In our
enhanced multiple regression guide, we: (a) show you how to detect outliers using "casewise
diagnostics" and "studentized deleted residuals", which you can do using SPSS Statistics, and
discuss some of the options you have in order to deal with outliers; (b) check for leverage
points using SPSS Statistics and discuss what you should do if you have any; and (c) check
for influential points in SPSS Statistics using a measure of influence known as Cook's
Distance, before presenting some practical approaches in SPSS Statistics to deal with any
influential points you might have.
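Cook's Distance, the influence measure named above, can be sketched directly from the OLS hat matrix. The data below is made up, with one deliberately planted influential point:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit:
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), where h_ii are the
    leverages (diagonal of the hat matrix) and p the number of
    fitted parameters."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)
    e = y - H @ y                          # residuals
    s2 = e @ e / (n - p)                   # residual variance estimate
    return e**2 * h / (p * s2 * (1 - h) ** 2)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 30)
y = 2 * x + rng.normal(size=30)
y[0] += 25                                 # plant one highly influential point
X = np.column_stack([x, np.ones(30)])
D = cooks_distance(X, y)
print(int(np.argmax(D)))                   # index of the most influential case
```

The planted observation dominates the Cook's distances, which is exactly the kind of case SPSS's casewise diagnostics would flag.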

3.1.8 Assumption # 8:

Finally, you need to check that the residuals (errors) are approximately
normally distributed (we explain these terms in our enhanced multiple regression guide). Two
common methods to check this assumption include using: (a) a histogram (with a
superimposed normal curve) and a Normal P-P Plot; or (b) a Normal Q-Q Plot of the
studentized residuals. Again, in our enhanced multiple regression guide, we: (a) show you
how to check this assumption using SPSS Statistics, whether you use a histogram (with
superimposed normal curve) and Normal P-P Plot, or Normal Q-Q Plot; (b) explain how to
interpret these diagrams; and (c) provide a possible solution if your data fails to meet this
assumption.
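Alongside the graphical checks above, a crude numerical sanity check is possible: standardize the residuals and verify that about 95% fall within ±1.96 standard deviations, as they should if the residuals are roughly normal. A sketch on made-up residuals:

```python
import numpy as np

# Quick numerical check on residual normality (a rough complement to the
# histogram / P-P / Q-Q plots, not a substitute for them).
rng = np.random.default_rng(4)
residuals = rng.normal(size=2000)          # stand-in for residuals from a fit

z = (residuals - residuals.mean()) / residuals.std()
coverage = np.mean(np.abs(z) < 1.96)
print(round(coverage, 2))
```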

You can check assumptions #3, #4, #5, #6, #7 and #8 using SPSS Statistics. Assumptions #1
and #2 should be checked first, before moving onto assumptions #3, #4, #5, #6, #7 and #8.
Just remember that if you do not run the statistical tests on these assumptions correctly, the
results you get when running multiple regression might not be valid. This is why we dedicate
a number of sections of our enhanced multiple regression guide to help you get this right.

In the section, Procedure, we illustrate the SPSS Statistics procedure to perform a multiple
regression assuming that no assumptions have been violated. First, we introduce the example
that is used in this guide.


3.2 Setup in SPSS:

In SPSS Statistics, we created five variables: (1) VO2max, which is the participant's maximal aerobic
capacity; (2) age, which is the participant's age; (3) weight, which is the participant's weight
(technically, it is their 'mass'); (4) heart rate, which is the participant's heart rate; and (5) gender,
which is the participant's gender. In our enhanced multiple regression guide, we show
you how to correctly enter data in SPSS Statistics to run a multiple regression when you are also
checking for assumptions. You can learn about our enhanced data setup content on
our Features: Data Setup page. Alternately, see our generic, "quick start" guide: Entering Data in
SPSS Statistics.

3.3 Test Procedures in SPSS:

The seven steps below show you how to analyze your data using multiple regression in SPSS
Statistics when none of the eight assumptions in the previous section, Assumptions, have
been violated. At the end of these seven steps, we show you how to interpret the results from
your multiple regression. Assumptions #3, #4, #5, #6, #7 and #8, which are required when
using multiple regression, can be tested using SPSS Statistics as described in the previous
section.

3.3.1 Regression from SPSS Software:

Output Created: 22-FEB-2021 13:34:41
Comments: (none)
Input:
  Active Dataset: DataSet0
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 48
Missing Value Handling:
  Definition of Missing: User-defined missing values are treated as missing.
  Cases Used: Statistics are based on cases with no missing values for any variable used.
Syntax:
  REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF OUTS R ANOVA
    /CRITERIA=PIN(.05) POUT(.10)
    /NOORIGIN
    /DEPENDENT VO2max
    /METHOD=ENTER Gender weight Age HeartRate.
Resources:
  Processor Time: 00:00:00.02
  Elapsed Time: 00:00:00.25
  Memory Required: 4080 bytes
  Additional Memory Required for Residual Plots: 0 bytes

Table 1 shows Regression Model from SPSS Software

3.3.2 Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .475a   .225       .150                9.07179

a. Predictors: (Constant), Heart Rate, Age, Gender, weight

Table 2 shows Model Summary and Std. Error of the Estimate


3.3.3 ANOVA

Model            Sum of Squares   df   Mean Square   F       Sig.
1  Regression    981.921           4   245.480       2.983   .030b
   Residual      3374.190         41   82.297
   Total         4356.111         45

Table 3 shows the values of ANOVA from SPSS Software

Coefficients

                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    40.210   8.141                                            4.939    .000
   Gender        4.706    2.721                .242                        1.730    .091
   weight        .027     .073                 .059                        .365     .717
   Age           .155     .069                 .360                        2.250    .030
   Heart Rate    -.064    .046                 -.192                       -1.378   .176

Table 4 shows the regression Coefficients from SPSS Software
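Reading off the B column of Table 4 gives the fitted equation, which can be used directly for prediction. A sketch (the 0/1 coding assumed for gender below is an assumption; the exact dummy coding SPSS used is not shown in the report):

```python
# Prediction from the fitted coefficients in Table 4.
# NOTE: the 0/1 gender coding is an assumption for illustration.
def predict_vo2max(gender, weight, age, heart_rate):
    return (40.210 + 4.706 * gender + 0.027 * weight
            + 0.155 * age - 0.064 * heart_rate)

yhat = predict_vo2max(gender=1, weight=60.0, age=30, heart_rate=140)
print(round(yhat, 2))  # -> 42.23
```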


CHAPTER 4 – ANALYSIS BY EXCEL

4.1 Data

gender (x3)   weight (x2)   age (x1)   heart-rate (x4)   VO2max (Y)
male 50.1 63 144 35
male 101.1 64 117 40.8
female 47 28 187 35.01
male 88.4 22 109 32
female 90.45 56 184 42
female 62.8 78 111 48.23
female 72.76 89 122 33.58
male 64 29 189 31.99
male 111.1 67 171 49.87
female 50.45 40 111 47.17
male 90.8 45 169 32
male 78 67 172 36.9
female 102.67 87 145 44.27
male 90.92 23 108 37.34
male 55 66 122 38.19
female 62.1 33 177 38.12
female 49.89 21 160 40.9
female 55 30 145 37.09
Male 49.98 24 100 41.82
female 42.02 20 189 45.42
female 59.29 32 107 61.76
Male 89.2 53 158 36.49
Male 60.9 32 133 52.3
Male 111.8 92 178 38.06
Male 58.94 53 141 60
Male 81.01 56 139 33.73
female 73.14 31 186 32
female 69.97 26 144 31.99
Male 62.01 71 109 49.19
Male 71.03 44 181 50
female 90.92 38 117 60.89
female 55 26 161 42
female 44.1 22 114 30.2
female 80 40 177 40.12
Male 73.38 80 116 63.1
Male 103.9 69 187 52.9


female 90.79 39 100 54


female 64 87 132 60
Male 75.2 29 170 30.8
female 90.4 50 149 47.07
female 123.4 66 119 50.65
Male 56.6 37 106 29.1
Male 115.2 48 151 39.06
Male 101 84 108 44.36
female 112.1 98 169 63.1
Male 56.34 29 130 35.2

Table 5 shows the Data of Oxygen and Heart rate of Patients

4.2 Summary output of Regression from Excel

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.4109734
R Square             0.168899136
Adjusted R Square    0.109534788
Standard Error       9.284353555
Observations         46

Table 6 Shows the Summary output of regression
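The reported Adjusted R Square can be checked against the standard formula Adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1). With n = 46 observations, the reported value matches k = 3 predictors rather than k = 4, which suggests the Excel run used three predictors; this is worth bearing in mind when comparing with the four-predictor SPSS output.

```python
# Verify the reported Adjusted R Square against the standard formula
# Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), with n = 46.
r2 = 0.168899136
n = 46

adj_k3 = 1 - (1 - r2) * (n - 1) / (n - 3 - 1)   # assuming 3 predictors
adj_k4 = 1 - (1 - r2) * (n - 1) / (n - 4 - 1)   # assuming 4 predictors
print(round(adj_k3, 9), round(adj_k4, 9))        # adj_k3 matches the report
```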


CHAPTER 5 – CONCLUSION AND RECOMMENDATION

5.1 Conclusions
A multiple regression was run to predict VO2max from gender, age, weight and heart
rate. The model statistically significantly predicted VO2max, F(4, 41) = 2.983, p = .030,
R2 = .225 (Tables 2 and 3). Of the four predictors, only age contributed statistically
significantly to the prediction (p = .030); gender, weight and heart rate did not (p > .05,
Table 4).
