
GMATH DATA MANAGEMENT

REGRESSION ANALYSIS

Correlation and regression analysis are closely related, since both involve the relationship between two
variables and both use paired observations obtained from the same (or matched) subjects. While
correlation is used to determine the degree and the direction of the relationship between variables,
regression analysis uses that relationship to forecast or predict the value of a dependent
variable.
The primary goal of regression analysis is to develop a statistical (regression) model that
characterizes the association between the variables and to determine the statistical relationship, if any,
between them. If the regression model is found to be adequate, it can then be used to estimate or forecast
values of the dependent variable. Before proceeding with regression analysis, it helps to draw a scatter
diagram of Y versus X, which may give an idea of the form of the relationship between them.

SIMPLE LINEAR REGRESSION


Simple linear regression attempts to model the relationship between two variables by fitting a linear
equation to observed data. One variable is considered to be a regressor/predictor or independent variable,
and the other is considered to be a response or dependent variable (the variable being predicted). The
simple linear regression model postulates that
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑒
where:
𝑌 = observed value of the dependent/response variable
𝑋 = observed value of the independent/explanatory variable
𝛽0 and 𝛽1 = regression coefficients
𝛽0 = true regression intercept, or the value of the response variable when 𝑋 is zero
𝛽1 = true regression slope, or the change (increase if positive, decrease if negative) in the
response variable brought about by an increase of one unit in the independent variable
𝑒 = residual/random error component, which captures all other factors that affect the response
variable but were not included in the model

Method of Least Squares


In general, the goal of simple linear regression is to find the line that best predicts 𝑌 from 𝑋, that is, to
find the line 𝒀 = 𝒂 + 𝒃𝑿 (the fitted regression line) that best estimates the regression model 𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑒 by
determining the values of 𝑎 and 𝑏 that best estimate 𝛽0 and 𝛽1. The values of the slope 𝑏 and the 𝑦-intercept 𝑎
can be obtained using the method of least squares, as follows:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}, \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \bar{y} - b\bar{x}$$

Simple linear regression adjusts the values of the slope and intercept to find the line that best fits the
data. More precisely, the goal of regression is to minimize the sum of the squares of the vertical distances of
the points from the line.
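
The least squares formulas for the slope and intercept can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `least_squares_line` and `sum_squared_residuals` and the five data points are made up for the illustration and do not come from this handout.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # intercept: y-bar minus b times x-bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances of the points from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data (not taken from the handout).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(f"fitted line: y = {a:.2f} + {b:.2f}x")
print(f"sum of squared residuals: {best:.2f}")

# Any other line should give a larger sum of squares than the fitted one.
print(best < sum_squared_residuals(xs, ys, a + 0.1, b))
```

For these five points the sketch gives a slope of 0.6 and an intercept of 2.2. Shifting the intercept or the slope away from those values increases the sum of squared residuals, which is the sense in which the fitted line is the "best" one.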

The Coefficient of Determination


The coefficient of determination, 𝑟², is the proportion of the variance (fluctuation)
of one variable that is predictable from the other variable. It indicates how much confidence one can place
in predictions made from a given model/graph. It takes values from 0 to 1 and measures how well the
regression line represents the data: the closer 𝑟² is to 1, the more closely the points cluster around the
line of best fit. That is, 𝑟² is the proportion of the total variation in the dependent variable 𝑦 that is
explained, or accounted for, by the variation in the independent variable 𝑥.
For example, if 𝑟 = 0.922, then 𝑟² = 0.850. This means that 85% of the total variation in 𝑦 can be explained
by the linear relationship between 𝑥 and 𝑦. The other 15% of the total variation in 𝑦 remains unexplained. If the
regression line passed exactly through every point on the scatter plot, it would explain all of the
variation. The farther the points lie from the line, the less of the variation it is able to explain.
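
The interpretation of 𝑟² as "explained variation" can be checked numerically. The sketch below assumes the usual Pearson summation formula for 𝑟, which is not written out in this handout. It compares 𝑟² with the explained share of variation, computed as one minus the ratio of unexplained to total variation. The data points and variable names are hypothetical.

```python
import math

# Hypothetical illustration data (not taken from the handout).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Pearson correlation coefficient from the summation formula (assumed standard form).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r2 = r ** 2

# Fitted least squares line y = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Total variation in y, and the part left unexplained by the line.
y_bar = sum_y / n
total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_share = 1 - unexplained / total_variation

print(f"r = {r:.4f}, r^2 = {r2:.4f}")
print(f"explained share of variation = {explained_share:.4f}")
```

For this small data set both quantities come out to 0.6, so about 60% of the variation in 𝑦 is accounted for by the linear relationship with 𝑥.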

EXAMPLES:
1. A study was made by a retail merchant to determine the relation between weekly advertising
expenditures and sales. The following data were recorded:

Advertising Costs ($)    Sales ($)    Advertising Costs ($)    Sales ($)
         40                 385                40                 490
         20                 400                20                 420
         25                 395                50                 560
         20                 365                40                 525
         30                 475                25                 480
         50                 440                50                 510

a. Plot a scatter diagram.
b. Find the equation of the regression line to predict weekly sales from advertising expenditures.
c. Compute the coefficient of correlation. Interpret.
d. Compute the coefficient of determination. Interpret.
e. Estimate the weekly sales when advertising costs are $35.

2. In the 1990s, research efforts focused on the problem of predicting a manufacturer’s market
share using information on the quality of its product. Suppose that the following data are available on
market share, in percent (𝑌), and product quality, on a scale of 0 to 100, determined by an objective
evaluation procedure (𝑋).

X 27 39 73 66 33 43 47 55 60 68 70 75
Y 2 3 10 9 4 6 5 8 7 9 10 13

a. Draw the scatter diagram.
b. Estimate the simple linear regression relationship between market share and product quality rating.
Graph the line.
c. Compute the coefficient of correlation. Interpret.
d. Compute the coefficient of determination. Interpret.
e. Estimate the market share when the product quality is 95.

3. The paired data below consist of the costs of advertising (in thousands of dollars) and the number of
products sold (in thousands).

Cost 9 2 3 4 2 5 9 10
Number 85 52 55 68 67 86 83 73

a. Plot a scatter diagram.
b. Find the equation of the regression line to predict the number of products sold from advertising costs.
c. Compute the coefficient of correlation. Interpret.
d. Compute the coefficient of determination. Interpret.
e. Estimate the number of products sold when advertising costs are $4,500 (that is, a cost of 4.5, since
costs are recorded in thousands of dollars).

4. An article in Business Week listed the “Best Small Companies” along with their sales and earnings. A random
sample of 12 companies was selected, and their sales and earnings, in millions of dollars, are reported
below.
Small Company    Sales (in million $)    Earnings (in million $)
      1                 89.2                      4.9
      2                 18.6                      4.4
      3                 18.2                      1.3
      4                 71.7                      8.0
      5                 58.6                      6.6
      6                 46.8                      4.1
      7                 17.5                      2.6
      8                 11.9                      1.7
      9                 19.6                      3.5
     10                 51.2                      8.2
     11                 28.6                      6.0
     12                 69.2                     12.8

a. Plot a scatter diagram.
b. Find the equation of the regression line to predict earnings from sales.
c. Compute the coefficient of correlation. Interpret.
d. Compute the coefficient of determination. Interpret.
e. For a small company with $50 million in sales, estimate the earnings.
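
As a way to check hand computations, parts (b) and (e) of Example 1 can be sketched in Python using the least squares formulas for 𝑏 and 𝑎. The data are the twelve advertising/sales pairs tabulated in Example 1. The variable names are chosen for illustration only, and this is one possible check rather than a prescribed solution method.

```python
# Example 1 data: weekly advertising costs ($) and sales ($), read pair by pair from the table.
advertising = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]
sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)

# Least squares slope and intercept from the summation formulas.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * sum_x / n

# Part (e): estimated weekly sales when advertising costs are $35.
predicted_sales_at_35 = intercept + slope * 35

print(f"fitted line: sales = {intercept:.2f} + {slope:.4f} * advertising")
print(f"estimated sales at $35 advertising: {predicted_sales_at_35:.2f}")
```

Run as written, this should report a fitted line of roughly sales = 343.71 + 3.2208 × advertising and an estimate of about $456.43 in weekly sales at $35 of advertising. The same steps apply to Examples 2 to 4 after swapping in their data.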
