See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/341774602

AN INTRODUCTION TO REGRESSION ANALYSIS is now available on Kindle

Book · May 2020


1 author:

Anusha Illukkumbura
University of Moratuwa

All content following this page was uploaded by Anusha Illukkumbura on 31 May 2020.

AN INTRODUCTION TO REGRESSION ANALYSIS

Acknowledgement

"Portions of information contained in this publication/book are printed with permission of Minitab, LLC. All such material remains the exclusive property and copyright of Minitab, LLC. All rights reserved."

Reproduction and distribution of the book without written permission of the writer is prohibited.

First Edition May 2020

Introduction to Regression Analysis
(Using Manual Calculations, MINITAB and R)

Anusha Illukkumbura
MSc. Business Statistics (University of Moratuwa, Sri Lanka)
B.A. Social Statistics (University of Kelaniya, Sri Lanka)

Preface

Regression analysis is widely used in statistics as a method for modeling data and making estimates. It is applicable to variables with linear or nonlinear relationships.

This book covers the basic and major topics related to

• Simple Linear Regression
• Non Linear Regression
• Multi Linear Regression

in simple language with simple examples, so that even a beginner can follow without much effort. Most importantly, complex calculations are presented step by step in an uncomplicated manner. The examples are solved using manual calculations and statistical software such as Minitab and R (RStudio Version 4.0.0). The necessary commands are presented explicitly.

Knowledge of calculating multiple types of regression models is essential for researchers. I expect Introduction to Regression Analysis will be a useful resource for students, instructors and researchers in the applied and social sciences.

This book can be used both for self-study and as a textbook.

Any suggestions to further improve the contents of this edition would be warmly appreciated.

Anusha Illukkumbura.
May 2020

Table of Contents

CHAPTER ONE: CORRELATION
1.1 Introduction
1.2 Scatter Diagram
1.3 Karl Pearson’s Correlation Coefficient
1.4 Spearman’s Rank Correlation
1.5 Significance of Correlation Coefficient
1.6 Correlation Matrix

CHAPTER TWO: SIMPLE LINEAR REGRESSION
2.1 Introduction
2.2 Simple Linear Regression
2.3 Significance of Parameter
2.4 Significance of the Model
2.5 Confidence Intervals for Parameters
2.6 Coefficient of Determination (R sq/ R2)
2.8 Coefficient of Variation

CHAPTER THREE: RESIDUALS
3.1 Introduction
3.2 Residuals have Zero Mean
3.3 Residuals have constant Variance
3.4 Residuals are uncorrelated
3.5 Residuals are normally distributed

CHAPTER FOUR: NON LINEAR REGRESSION
4.1 Introduction
4.2 Exponential model
4.2 Quadratic Regression Model

CHAPTER FIVE: MULTI LINEAR REGRESSION
5.1 Multi linear regression model
5.2 Matrix calculation
5.3 Analysis of Variance Table (ANOVA)
5.4 Partial F test
5.5 Multi collinearity
5.6 Variable Selection Procedures
5.7 Best Subset Regression

EXAMPLES
Example 1.1
Example 1.2
Example 1.3
Example 1.4
Example 1.5
Example 2.1
Example 2.2
Example 2.3
Example 2.4
Example 2.5
Example 2.6
Example 3.1
Example 3.2
Example 3.3
Example 3.4
Example 4.1
Example 4.2
Example 4.3
Example 5.1
Example 5.2
Example 5.4
Example 5.5

CHAPTER ONE : CORRELATION

1.1 Introduction
Correlation measures the mutual relationship between two or more variables. The correlation coefficient is the numerical measure of the relationship between two variables. Correlation shows whether a relationship exists between variables, how strong it is, and its direction. It can be illustrated with graphs, which simplify interpretation. Correlation demonstrates the relationship between measurable variables, but it does not identify the cause of that relationship; there may be underlying variables that affect it.

Scatter diagrams, Karl Pearson’s correlation coefficient and Spearman’s rank correlation are a few of the methods used to measure correlation. The correlation coefficient of a sample is represented by “r”, and that of a population by “ρ”. The correlation coefficient ranges from -1 to +1. If it is +1, there is a perfect positive correlation between the two variables. If it is -1, there is a perfect negative correlation. When there is no correlation at all, the coefficient is 0. When the coefficient lies between -0.5 and +0.5, the relationship is considered not strong. When it is above +0.75 or below -0.75, the relationship is considered strong. Values from ±0.5 to ±0.75 indicate a moderate correlation between the variables.
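The rule of thumb above can be written as a small helper. This is a minimal sketch in Python (the book itself works with manual calculations, Minitab and R). The function name `describe_correlation` is hypothetical, the label "weak" is shorthand for "not strong", and treating values exactly at 0.5 and 0.75 as moderate is an assumption made for illustration.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the cut-offs given in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size > 0.75:
        strength = "strong"
    elif size >= 0.5:      # assumption: the boundaries count as moderate
        strength = "moderate"
    else:
        strength = "weak"  # "not strong" in the text
    return f"{strength} {direction} correlation"


print(describe_correlation(0.9))    # strong positive correlation
print(describe_correlation(-0.6))   # moderate negative correlation
print(describe_correlation(0.2))    # weak positive correlation
```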

1.2 Scatter Diagram

A scatter diagram is a graph of (x, y) coordinates. From the spread of the points, one can easily judge the direction and the strength of the relationship. X is used for the independent variable and Y for the dependent variable. The dependent variable is the variable affected by one or more independent variables. The independent variables are treated as the causes of changes in the dependent variable. When there are multiple independent variables, they are represented by x1, x2, x3, …, xn.
The following are a few examples of scatter diagrams.

Figure 1.1: Perfect Positive Correlation [scatterplot of y vs x]

Figure 1.2: Perfect Negative Correlation [scatterplot of y vs x]

Figure 1.3: Strong Positive Correlation [scatterplot of y vs x]

Figure 1.4: Strong Negative Correlation [scatterplot of ln y vs x]

Figure 1.5: Moderate Positive Correlation [scatterplot of Y vs X]

Figure 1.6: Moderate Negative Correlation [scatterplot of Y vs X]

Figure 1.7: Non Linear Correlation [scatterplot of Y vs X]

Figure 1.8: No Correlation [scatterplot of y vs x]

When more than one independent variable is used in a regression, the relationships among the independent variables, and between each independent variable and the dependent variable, should be analyzed using a correlation matrix.
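As a rough illustration of what such a matrix contains, the sketch below computes the pairwise Pearson coefficients for two independent variables and one dependent variable. It is written in plain Python rather than the book's Minitab or R. The data values and the names `pearson_r`, `data` and `matrix` are hypothetical and serve only to show the layout.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


# Hypothetical illustration data: two independent variables and one dependent variable.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "y": [3, 4, 8, 8, 13, 12],
}

names = list(data)
# One coefficient for every pair of variables, including each variable with itself.
matrix = {row: {col: pearson_r(data[row], data[col]) for col in names} for row in names}

# Print the matrix: the diagonal is 1 and the matrix is symmetric.
print("      " + "".join(f"{name:>8}" for name in names))
for row in names:
    print(f"{row:>6}" + "".join(f"{matrix[row][col]:8.3f}" for col in names))
```

The x1 and x2 entry shows how strongly the independent variables are related to each other, while the entries in the y row show how each of them relates to the dependent variable.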

1.3 Karl Pearson’s Correlation Coefficient

Karl Pearson’s correlation coefficient measures both the strength and the direction of a relationship, and gives the strength a measurable value. It is calculated on the assumption that no factor other than the one independent variable influences the dependent variable.

The covariance of (x, y), written Cov(x, y), is equal to $E[(x - \bar{x})(y - \bar{y})]$. Assuming the variances are positive, the correlation of (x, y) is

$$\mathrm{corr}(x, y) = \frac{\mathrm{cov}(x, y)}{sd(x)\, sd(y)}$$

where sd is the standard deviation.
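The definition can be followed step by step in code. The sketch below is plain Python and uses the population form (dividing by n) to match the expectation in the definition. The function names and the small data set are illustrative assumptions, not taken from the book.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def std_dev(x):
    """Population standard deviation, i.e. the square root of cov(x, x)."""
    return sqrt(covariance(x, x))


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


# Illustrative values (not from the book): y moves exactly in step with x.
x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]
y_down = [8, 6, 4, 2]

print(covariance(x, y_up))                # 2.5
print(round(correlation(x, y_up), 6))     # 1.0
print(round(correlation(x, y_down), 6))   # -1.0
```

Dividing the covariance by both standard deviations removes the units of x and y, which is why the result always lies between -1 and +1.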

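The equations below, which can be used to calculate the correlation, are based on the covariance of the relationship.

$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$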

Example 1.1:
The data set below gives the marks scored in the year-end mathematics examination and the hours spent on mathematics homework per week. Find out whether there is a relationship between these variables and describe its nature.

x     1    2    3    4    5
y    35   60   75   85   95

x - hours spent on mathematics homework per week
y - marks scored in the year-end mathematics examination

Answer
Calculate ∑x, ∑y, ∑xy, ∑x² and ∑y² as shown in Table 1.1.

Table 1.1: Descriptive Statistics

          x      y     xy     x²      y²
          1     35     35      1    1225
          2     60    120      4    3600
          3     75    225      9    5625
          4     85    340     16    7225
          5     95    475     25    9025
Total    15    350   1195     55   26700
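The totals in Table 1.1 can be reproduced and substituted into the formula $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$. The following is a minimal Python sketch of that calculation; the variable names are illustrative, and the value of r is computed here rather than quoted from the book.

```python
from math import sqrt

# Data from Example 1.1
x = [1, 2, 3, 4, 5]
y = [35, 60, 75, 85, 95]
n = len(x)

# Column totals, as in Table 1.1
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 15 350 1195 55 26700

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # 0.978
```

Under the rule of thumb in section 1.1, a coefficient above +0.75 would be read as a strong positive correlation between homework hours and examination marks.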
