
Regression Analysis

In many fields, researchers are interested in identifying and assessing relationships among
variables. In these studies, it is often possible to distinguish two main types of variables, namely
response variables and predictor variables.
Predictor variables are variables that can either be set to a desired value or else take values
that can be observed but not controlled. As a result of changes that are made to, or take place in,
these predictor variables, an effect is transmitted to the response variables.
The objective of most studies is to assess the relationship between the response variable
and one or more predictor (independent) variables.
Regression analysis is a statistical tool for evaluating the relationship of one or more
predictor variables to a single continuous response variable.
Measuring the relationship between variables
When assessing the relationship between two variables, the simplest method is to look
at their scatter plot. One can also quantify the association by using the sample correlation
coefficient,
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
However, the correlation coefficient alone does not indicate how two variables are related or
the nature of the relationship, and it does not provide a way to predict values of one variable
given the other. This is where regression analysis and models become useful.
Example: A clothing manufacturer collected the following data on the age, 𝑥 (months) and the
maintenance cost, 𝑦 (£) of his sewing machines.
Machine                     A    B    C    D    E    F    G    H    I    J    K
Age 𝑥 (months)             13   75   64   52   90   15   35   82   25   46   50
Maintenance cost 𝑦 (£)     24  144  110   63  240   20   40  180   42   50   92

a) Plot a scatter diagram of the data.


b) Calculate the correlation coefficient between 𝑥 and 𝑦 and comment on your result.
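
As a rough check on a hand calculation, the sample correlation coefficient defined above can be computed directly from its definition. The sketch below is one possible way to do this in plain Python; the function name `sample_correlation` and the variable names are illustrative choices, not part of the original notes, and the code assumes neither variable is constant (otherwise the denominator is zero).

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired observations (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Sewing machine data from the example: age in months, maintenance cost in pounds
age = [13, 75, 64, 52, 90, 15, 35, 82, 25, 46, 50]
cost = [24, 144, 110, 63, 240, 20, 40, 180, 42, 50, 92]

r = sample_correlation(age, cost)
print(f"r = {r:.3f}")
```

For these data the coefficient comes out at roughly 0.94, which suggests a strong positive linear association: older machines tend to cost more to maintain.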

Simple Linear Regression Model


Simple linear regression modeling involves finding the straight line that best fits the data
and approximates the true relationship between the response variable and a single predictor
variable.
𝑦 = 𝛼 + 𝛽𝑥 + 𝜀
where
𝛼 − Intercept
𝛽 − Slope of the straight line (regression coefficient)
𝜀 − Random error
Statistical Assumptions for a Straight-line model
1. Existence
2. Independence
3. Linearity
4. Homoscedasticity
5. Normal Distribution
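
The notes list these assumptions by name only. One common way of stating them for the straight-line model is sketched below; the notation $Y \mid x$ for the response at a fixed predictor value and the symbol $\sigma^2$ for the common variance are introduced here for illustration and do not appear in the original.

```latex
% Y | x : the response at a fixed predictor value x (notation assumed here)
% \sigma^2 : common error variance (symbol assumed here)
\begin{align*}
&\text{Existence:}           && \text{for each fixed } x,\ Y \text{ is a random variable with finite mean and variance} \\
&\text{Independence:}        && Y_1, \dots, Y_n \text{ are statistically independent} \\
&\text{Linearity:}           && E(Y \mid x) = \alpha + \beta x \\
&\text{Homoscedasticity:}    && \operatorname{Var}(Y \mid x) = \sigma^2 \text{ for every } x \\
&\text{Normal distribution:} && Y \mid x \sim N(\alpha + \beta x,\ \sigma^2)
\end{align*}
```

Taken together, the last four statements amount to saying that the errors $\varepsilon_i$ are independent $N(0, \sigma^2)$ random variables.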
The fitted line and its components can be illustrated as below.
[Figure: the fitted line and its components]

Determining the best fitting Straight line


Determining the straight line essentially means estimating 𝛼 and 𝛽 by using the least squares
method. Thus, the fitted line is also called the least squares regression line.
Consider the model for the 𝑖-th individual,
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜀𝑖
Then the sum of squared errors can be obtained as,
$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
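Therefore, we can obtain the parameter estimates by minimizing this quantity with respect to
𝛼 and 𝛽, which gives
$$\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$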
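
The minimization step is stated without working, so a short derivation may help. The sketch below sets the partial derivatives of the sum of squared errors to zero; the label $S(\alpha, \beta)$ for that sum is introduced here and is not used in the original notes.

```latex
% S(\alpha, \beta) is a label introduced here for the sum of squared errors
\begin{align*}
S(\alpha, \beta) &= \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 \\
\frac{\partial S}{\partial \alpha} &= -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0
  \quad\Longrightarrow\quad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \\
\frac{\partial S}{\partial \beta} &= -2 \sum_{i=1}^{n} x_i \,(y_i - \alpha - \beta x_i) = 0 \\
\text{substituting } \hat{\alpha}: \quad
0 &= \sum_{i=1}^{n} x_i \bigl[(y_i - \bar{y}) - \hat{\beta}\,(x_i - \bar{x})\bigr] \\
\hat{\beta} &= \frac{\sum_{i=1}^{n} x_i \,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i \,(x_i - \bar{x})}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align*}
```

The last equality holds because $\sum (y_i - \bar{y}) = 0$ and $\sum (x_i - \bar{x}) = 0$, so replacing $x_i$ by $x_i - \bar{x}$ changes neither the numerator nor the denominator.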
Example: An automobile manufacturing company wanted to investigate how the price of one of
its car models depreciates with age. The research department at the company took a sample of
eight cars of this model and collected the following information on the ages (in years) and prices
(in hundreds of dollars) of these cars.
Age (years)                    8    3    6    9    2    5    6    2
Price (hundreds of dollars)   38  220   95   33  267  134  112  245

a) Construct a scatter diagram for these data.


b) Find the regression line with price as a response variable and age as a predictor variable.
c) Give a brief interpretation of your findings.
d) Predict the price of a 7-year-old car of this model.
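
Parts b) and d) can be checked numerically by applying the least squares formulas for the slope and intercept to the car data. The following is a minimal sketch in plain Python, not a prescribed solution; the function name `fit_line` and the variable names are hypothetical, and the code assumes the ages are not all identical (otherwise the slope is undefined).

```python
def fit_line(xs, ys):
    """Least squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = s_xy / s_xx                 # slope
    alpha_hat = y_bar - beta_hat * x_bar   # intercept
    return alpha_hat, beta_hat


# Car data from the example: age in years, price in hundreds of dollars
age = [8, 3, 6, 9, 2, 5, 6, 2]
price = [38, 220, 95, 33, 267, 134, 112, 245]

alpha_hat, beta_hat = fit_line(age, price)
predicted_7 = alpha_hat + beta_hat * 7  # part d): a 7-year-old car

print(f"fitted line: price = {alpha_hat:.4f} + ({beta_hat:.4f}) * age")
print(f"predicted price at age 7: {predicted_7:.2f} (hundreds of dollars)")
```

The slope is negative, so the fitted line describes a car losing value with each additional year of age. The prediction in part d) is reasonable to make because 7 years lies inside the observed age range of 2 to 9 years.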
Exercise: The following table contains information on the amount of time that each of 12 students
spends each day (on average) on social networks and the internet for social or entertainment
purposes and his or her grade point average (GPA).
Time (hours per day)   4.4   6.2   4.2   1.6   4.7   5.4   1.3   2.1   6.1   3.3   4.4   3.5
GPA                    3.22  2.21  3.13  3.69  2.7   2.2   3.69  3.25  2.66  2.89  2.71  3.36

a) Construct a scatter diagram for these data.


b) Find the predictive regression line of GPA on time.
c) Give a brief interpretation of your findings.
d) Calculate the predicted GPA for a college student who spends 3.9 hours per day on social
networks and the internet for social or entertainment purposes.
e) Calculate the predicted GPA for a college student who spends 16 hours per day on social
networks and the internet for social or entertainment purposes. Comment on this finding.
