Chapter 6 (Part I)

Trendlines and Regression Analysis
Prepared by: Nur Liyana Mohamed Yousop

CHAPTER OUTLINES
Introduction
Simple Linear Regression
Multiple Linear Regression
Regression with Categorical Independent Variable

INTRODUCTION
SCOPE OF BUSINESS ANALYTICS
CHAPTER 4
CHAPTER 6 AND 7
CHAPTER 8
PREDICTIVE ANALYTICAL MODELS
Predictive analytics are executed by processing historical data to forecast future happenings.
A predictive analytical model is developed by using mathematical functions such as the following:
• Linear function
• Logarithmic function
• Polynomial function
• Power function
• Exponential function
PREDICTIVE ANALYTICAL MODELS
Linear function: y = a + bx
• Linear functions show steady increases or decreases over the range of x.
• This is the simplest type of function used in predictive models.
• E.g. Demand function (price and quantity)

Logarithmic function: y = ln(x)
• Logarithmic functions are used when the rate of change in a variable increases or decreases quickly.
• E.g. Richter scale used to measure earthquakes

Polynomial function: y = ax^2 + bx + c
• Polynomial functions include quadratic, cubic, quartic and higher-order functions. They are built using only addition, subtraction and multiplication, with non-negative integer powers of x.
• E.g. Business people use polynomials to see how a rise in the price of a good will affect its sales

Power function: y = ax^b
• Power functions are defined by a single monomial (a number and variables that are multiplied together, e.g. 3xy), where a ≠ 0 and b > 0.

Exponential function: y = ab^x
• Exponential functions have the property that y increases or decreases at a constantly increasing rate.
• E.g. Continuously compounding interest (PV and FV)
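To compare how the five function forms behave, they can be evaluated side by side. The sketch below is only illustrative: the function names and the parameter values passed in the loop are assumptions chosen for the example, not values from the slides.

```python
import math

def linear(x, a, b):
    """y = a + bx"""
    return a + b * x

def logarithmic(x):
    """y = ln(x), defined for x > 0"""
    return math.log(x)

def polynomial(x, a, b, c):
    """y = ax^2 + bx + c (second-order polynomial)"""
    return a * x**2 + b * x + c

def power(x, a, b):
    """y = ax^b"""
    return a * x**b

def exponential(x, a, b):
    """y = ab^x"""
    return a * b**x

# Hypothetical parameters, chosen only to show the different growth patterns.
for x in (1, 2, 4, 8):
    print(
        x,
        linear(x, 1, 2),
        round(logarithmic(x), 3),
        polynomial(x, 1, 0, 0),
        round(power(x, 2, 0.5), 3),
        exponential(x, 1, 2),
    )
```

Reading down each column shows the linear values rising by a fixed step, the logarithmic values flattening out, and the exponential values doubling each time x increases by 1.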
TYPES OF DATA
Type of data         Location  Time  Example
Time series          1         n     Malaysia GDP from 2009-2019
Cross-sectional      n         1     India GDP in 2019
                                     China GDP in 2019
                                     Japan GDP in 2019
Pooled / Panel data  n         n     Malaysia GDP from 2009-2019
                                     Japan GDP from 2009-2019
                                     Korea GDP from 2009-2019
TYPES OF DATA
TIME SERIES
CROSS-SECTIONAL DATA
POOLED / PANEL DATA
Our syllabus
MODELING RELATIONSHIPS AND TRENDS IN DATA
Create charts to better understand data sets.
• For time series data, use a line chart.
• For cross-sectional data, use a scatter chart.
MODELING A PRICE-DEMAND FUNCTION
Linear demand function:

Demand = 20,512 - 9.5116(Price)
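The linear demand function can be evaluated directly. This is a minimal sketch: the coefficients come from the slide, while the function name and the three prices are hypothetical values used only for illustration.

```python
def demand(price):
    """Linear demand function from the slide: Demand = 20,512 - 9.5116(Price)."""
    return 20512 - 9.5116 * price

# Hypothetical prices, not taken from the original data set.
for price in (500, 1000, 1500):
    print(f"Price {price}: demand {demand(price):,.1f}")
```

The negative slope means that each one-unit increase in price lowers the estimated demand by about 9.5 units.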
EXCEL TRENDLINE TOOL
ORDINARY LEAST SQUARE REGRESSION
TYPES OF VARIABLES
Dependent Variable
• The variable that depends on other variable/s that is/are measured.
Independent Variable
• The variable that is stable and unaffected by the other variables you are trying to measure.
LEAST-SQUARES REGRESSION
• Ordinary Least Squares regression (OLS) is more commonly named linear regression (simple or multiple depending on the number of explanatory variables).
• Regression is a powerful analysis that can analyze multiple variables simultaneously to answer complex research questions.
• However, if OLS assumptions are not satisfied, then results cannot be trusted.
LEAST-SQUARES REGRESSION
To obtain the best-fitted line, the Ordinary Least-Squares (OLS) method minimizes the sum of squared distances between the actual values and the predicted values.
Formula OLS_SLR:
y = β0 + β1x
Where;
y : Dependent variable
β0 : Intercept (often labeled the constant), the expected mean value of y when x = 0
β1 : Slope, the rate of change in y as x changes
x : Independent variable
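For a single independent variable, the least-squares intercept and slope have a standard closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies that textbook formula; the function name and the small data set are hypothetical and are not the home market value data.

```python
def fit_ols(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_ols(xs, ys)
print(b0, b1)
```

Because the sample points fall exactly on a line, the fitted intercept and slope recover that line; with real data the fitted line passes through the scatter rather than through every point.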
REGRESSION ANALYSIS
Regression analysis
• A tool for building mathematical and statistical models that characterize relationships between a dependent (ratio) variable and one or more independent, or explanatory, variables (ratio or categorical), all of which are numerical.
Simple linear regression
• Involves a single independent variable.
Multiple regression
• Involves two or more independent variables.
POPULATION & SAMPLE REGRESSION
MODELS
Population
• Unknown relationship: Yi = β0 + β1Xi + εi
Random Sample
• Fitted line: Ŷi = β̂0 + β̂1Xi
• The fitted line is based on estimation. There is a difference between the actual value (y) and the predicted value (ŷ):
  y − ŷ = εi (observed error / residual)
TEXTBOOK PAGE: 70
SIMPLE LINEAR REGRESSION
• Simple linear regression (SLR) is a statistical method that allows us to summarize and study
relationships between two continuous (quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Independent Variable (X) → Dependent Variable (Y)
Finds a linear relationship between:
• one independent variable X and;
• one dependent variable Y
First prepare a scatter plot to verify the data has a linear trend.
Use alternative approaches if the data is not linear.
OLS_SLR: STEP BY STEP
DATA: HOME MARKET VALUE
STEP 1: Determine Y (DV) and X (IV)
Size of a house is typically related to its market value.
• Y = market value ($) → DV
• X = square feet → IV
STEP 2: Plot the Scatter Plot
The scatter plot of the full data set (42 homes) indicates a linear trend.
How to adjust scale?
o Select axis
o Right click → Format axis
o Axis option → Change scale
[Scatter plot: Market Values ($60,000 to $130,000) against Square Feet (1,400 to 2,600)]
FINDING THE BEST-FITTING
REGRESSION LINE
SLR Formula:
Market value = β0 + β1(Square feet)
STEP 3: Find the Best-Fit Regression Line
Click Chart → Add Chart Elements → Trendline → Linear
[Chart: Relationship between home market values and size of the house (square feet); Market Values against Square Feet with linear trendline y = 35.036x + 32673, R² = 0.5347]
FINDING THE BEST-FITTING
REGRESSION LINE
SLR Formula:
Market value = β0 + β1(Square feet)
STEP 4: Determine the Best Regression Line
• The regression model explains variation in market value due to size of the home.
• It provides better estimates of market value than simply using the average.
• Market value = $32,673 + $35.036(Square feet)
• The estimated market value of a home with 2,200 square feet would be:
  Market value = $32,673 + $35.036(2,200) = $109,752
[Chart: Relationship between home market values and size of the house (square feet); linear trendline y = 35.036x + 32673, R² = 0.5347]
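The prediction for a 2,200 square foot home can be reproduced in a few lines. This is a small sketch using the rounded coefficients shown on the trendline; the constant and function names are illustrative assumptions.

```python
# Rounded coefficients from the fitted trendline y = 35.036x + 32673.
INTERCEPT = 32673
SLOPE = 35.036

def predict_market_value(square_feet):
    """Estimated market value ($) for a home of the given size."""
    return INTERCEPT + SLOPE * square_feet

estimate = predict_market_value(2200)
print(f"${estimate:,.0f}")
```

The same function can be applied to any size within the range of the data (roughly 1,400 to 2,600 square feet); predictions far outside that range rely on the linear trend continuing, which the data cannot confirm.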
SIMPLE LINEAR REGRESSION WITH EXCEL
Data → Data Analysis → Regression

HOME MARKET VALUE REGRESSION
RESULTS
RESULTS
REGRESSION STATISTICS
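Multiple R
• Details: Correlation between the DV and the IV, value range -1 to 1
  o Value > 0 : +ve correlation
  o Value < 0 : -ve correlation
  o Value = 0 : no correlation
  o Excel reports Multiple R as a non-negative value; in simple linear regression the sign of the correlation follows the sign of the slope.
• Interpretation: 0.7313 > 0, +ve correlation
R-Squared (R2) / Coefficient of determination
• Details: Variation in the DV explained by IV
  o Value between 0 and 1
  o Closer to 1, better fit
• Interpretation: R2 = 0.5347, so 53.47% of variation in market values is explained by the size of the house (square feet)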
RESULTS
REGRESSION STATISTICS
Adjusted R-Squared
• Details: Modified R2
• Interpretation: Adjusted R2 = 0.5231
  o Will be beneficial when the present model is compared with other models that incorporate more explanatory variables.
Standard Error
• Details: The standard error of the estimate measures the typical difference between the observed (ACTUAL) and ESTIMATED values. The SE will be small if the data is close to the regression line. The SE will be big if the data is dispersed widely from the regression line.
• Interpretation: None
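The adjusted R2 reported for the home market value model can be checked against the usual adjustment formula, 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. The formula is the standard textbook one rather than something stated on the slide, and the function name is an assumption; R2 = 0.5347 and n = 42 homes come from the example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Home market value example: R2 = 0.5347, 42 homes, 1 independent variable.
print(round(adjusted_r2(0.5347, 42, 1), 4))
```

Adding an independent variable raises k, so the adjusted value only improves when the new variable increases R2 by enough to offset the penalty.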
RESULTS
ANALYSIS OF VARIANCE

ANOVA is used to test for significance of regression:
H0: population slope coefficient (β1) = 0 (IV has no effect on the DV)
H1: population slope coefficient (β1) ≠ 0 (IV explains variation in DV)
The Significance F value given in the ANOVA table is the p-value for the F-test.
If Significance F < the level of significance (normally 5%), H0 is rejected.
Interpretation:
• Significance F = 3.8E-08 < 0.05
• H0 rejected
• The slope is not equal to zero. Using a linear relationship, home size (square feet) is a significant variable in explaining variation in market value.
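The decision rule for the F-test is a simple comparison of Significance F with the chosen level of significance. The sketch below also computes the F statistic from R2 using the standard relationship F = (R2/k) / ((1 − R2)/(n − k − 1)); that relationship and both function names are assumptions added for illustration, and the result is computed from the rounded R2 rather than copied from the Excel output.

```python
def f_statistic(r2, n, k):
    """F statistic implied by R-squared, n observations, k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def reject_h0(significance_f, alpha=0.05):
    """Reject H0 (slope = 0) when the p-value is below the significance level."""
    return significance_f < alpha

f_value = f_statistic(0.5347, 42, 1)   # R2 and n from the home market value example
print(round(f_value, 2))
print(reject_h0(3.8e-08))              # Significance F from the ANOVA table
```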
RESULTS
ANALYSIS OF VARIANCE
Positive slope
• y-value increases as x-values increase
Negative slope
• y-value decreases as x-values increase
Zero slope
• y-value stays constant as x-values increase
RESULTS
TESTING HYPOTHESIS FOR REGRESSION COEFFICIENT

An alternate method for testing whether a slope or intercept is zero is to use a t-test:
H0: population slope coefficient (β1) = 0
H1: population slope coefficient (β1) ≠ 0
The test can be computed by using:
t = (β̂1 − β1) / (Standard error of slope)
Excel provides the p-values for tests on the slope and intercept.
Interpretation:
• p-value = 0.0000 < α = 5%
• H0 rejected
• We can conclude that the coefficient is statistically not equal to zero, meaning that home size (square feet) has a significant relationship with market values.
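The t statistic can be sketched as follows. The slides do not list the standard error of the slope, so this example backs it out from the 95% confidence interval for the slope, [24.59, 45.48], reported for the same data, using the two-tailed critical value of about 2.021 for 40 degrees of freedom. Both that derivation and the names used are assumptions for illustration; in practice the standard error is read directly from the Excel output.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

# Assumption: two-tailed 5% critical t value for n - 2 = 40 degrees of freedom.
T_CRIT_40DF = 2.021

# Assumption: standard error recovered from the half-width of the 95% interval.
se_slope = (45.48 - 24.59) / (2 * T_CRIT_40DF)

t_value = t_statistic(35.036, 0.0, se_slope)
print(round(se_slope, 3), round(t_value, 2))
print(abs(t_value) > T_CRIT_40DF)   # True means H0 (slope = 0) is rejected
```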
HOW TO INTERPRET COEFFICIENT?
In our example X is House Size (Square Feet) and Y is Home Value, thus,
ŷ = 32,673.2199 + 35.0364X
For coefficient:
If the house size increases by 1 square foot, the estimated home value increases by $35.0364.
For intercept:
If the house size is zero (X = 0), the estimated home value is $32,673.2199.
RESULTS
CONFIDENCE INTERVAL FOR REGRESSION COEFFICIENT

Confidence intervals (Lower 95% and Upper 95% values in the output) provide information about the unknown values of the true regression coefficients, accounting for sampling error.
Narrower confidence intervals provide more accuracy in our predictions.
For the Home Market Value data, it can be concluded that the true intercept and slope lie between [14,823, 50,523] and [24.59, 45.48] respectively at the α = 5% level of significance.
Although we estimated that a house with 1,750 square feet has a market value of 32,673 + 35.036(1,750) = $93,986, if the true population parameters are at the extremes of the confidence intervals, the estimate might be as low as 14,823 + 24.59(1,750) = $57,855 or as high as 50,523 + 45.48(1,750) = $130,113.
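The three estimates for a 1,750 square foot house can be reproduced by plugging the point estimates and the confidence interval limits into the same linear formula. This is a minimal sketch; the function and variable names are illustrative.

```python
def estimate_value(intercept, slope, square_feet):
    """Market value implied by a given intercept and slope."""
    return intercept + slope * square_feet

SIZE = 1750
point = estimate_value(32673, 35.036, SIZE)   # fitted coefficients
low = estimate_value(14823, 24.59, SIZE)      # lower 95% limits
high = estimate_value(50523, 45.48, SIZE)     # upper 95% limits

print(f"point {point:,.1f}, low {low:,.1f}, high {high:,.1f}")
```

The wide gap between the low and high figures illustrates why narrower confidence intervals give more useful predictions.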
RESIDUALS
• Residuals are the observed errors associated with estimating the value of the
dependent variable using the regression line:
εi = yi − ŷi
RESIDUAL ANALYSIS AND REGRESSION
ASSUMPTIONS
• Residual (ε) = Actual (Observed) Y value − Predicted Y value

• Standard residual = Residual / Standard deviation
• Rule of thumb: Standard residuals outside of ±2 or ±3 are
potential outliers.
• Excel provides a table and a plot of residuals.
[Residual plot callout: This point has a standard residual of 4.5336]
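The residual and standard residual calculations can be sketched as below. The example assumes the standard deviation in the formula is the sample standard deviation of the residuals, and the actual and predicted values are hypothetical numbers chosen so that one point stands out; they are not the home market value data.

```python
import statistics

def standard_residuals(actual, predicted):
    """Residual = actual - predicted; standard residual = residual / std dev."""
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    # Assumption: sample standard deviation of the residuals is the divisor.
    sd = statistics.stdev(residuals)
    return [e / sd for e in residuals]

def flag_outliers(std_residuals, threshold=2.0):
    """Indices of points whose standard residual lies outside +/- threshold."""
    return [i for i, z in enumerate(std_residuals) if abs(z) > threshold]

# Hypothetical observations: the last one is far from its predicted value.
actual = [10, 12, 11, 13, 12, 30]
predicted = [11, 11, 12, 12, 13, 13]

z = standard_residuals(actual, predicted)
print([round(v, 2) for v in z])
print(flag_outliers(z))
```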
CHECKING ASSUMPTIONS
Linearity
• Assumption: Linear relationship between IV and DV
• Verification:
  o Examine scatter diagram (should appear linear)
  o Examine residual plot (should appear random)
• If assumption is met:
  o Residuals randomly scattered about zero
  o Do not exhibit a specific pattern
Normality of Errors
• Assumption: Errors of all IVs are normally distributed, mean = 0
• Verification:
  o View a histogram of standard residuals
  o Formal Goodness of Fit Test (e.g. Pearson, Chi-square, Jarque-Bera and others)
• If assumption is met:
  o Bell-shaped distribution
CHECKING ASSUMPTIONS
Homoscedasticity (constant variance)
• Assumption: Variance around the regression line is similar for all the IVs
• Verification: Examine the residual plot
• If assumption is met: There will not be dramatic differences in the spread of the data for different values of the IVs
Independence of Errors (Autocorrelation)
• Assumption: The error terms for all IVs should not be correlated with one another. If they are, then the problem of autocorrelation exists.
• Verification: Durbin-Watson statistic (d)
• If assumption is met: No autocorrelation, if 1.5 ≤ d ≤ 2.5
• d takes on values between 0 and 4. A value of d = 2 means there is no autocorrelation. A value substantially below 2 (and especially a value less than 1) means that the data is positively autocorrelated. A value of d substantially above 2 means that the data is negatively autocorrelated.
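The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below applies that standard definition together with the 1.5 to 2.5 rule of thumb from the table; the function names and the two residual series are hypothetical and assume the residuals are in time order.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def no_autocorrelation(d):
    """Rule of thumb from the table: no autocorrelation if 1.5 <= d <= 2.5."""
    return 1.5 <= d <= 2.5

# Hypothetical residual series, ordered in time.
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every period
persistent = [1, 1, 1, -1, -1, -1]    # long runs with the same sign

for series in (alternating, persistent):
    d = durbin_watson(series)
    print(round(d, 3), no_autocorrelation(d))
```

The alternating series gives a value well above 2 (negative autocorrelation) and the persistent series gives a value well below 2 (positive autocorrelation), matching the interpretation in the table.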
CHECKING REGRESSION ASSUMPTIONS FOR
THE HOME MARKET VALUE DATA
• Linearity
• linear trend in scatterplot
• no pattern in residual plot
CONTINUED…
• Normality of Errors
  • Residual histogram appears slightly skewed but is not a serious departure
  • Data → Data Analysis → Histogram
  [Figure: Histogram of standard residuals, Frequency by bin from -3 to 3]
• Homoscedasticity
  • Residual plot shows no serious difference in the spread of the data for different X values.
  [Figure: Square Feet Residual Plot, Residuals against Square Feet]
CONTINUED…
• Homoscedasticity
CONTINUED…
• Autocorrelation
