
Regression

Linear Regression
The superintendent of an elementary school district must decide whether to hire
additional teachers, and they want your advice. Hiring the teachers will reduce the
number of students per teacher (the student–teacher ratio) by two but will increase
the district’s expenses. So they ask you: If they cut class sizes by two, what will
the effect be on student performance, as measured by scores on standardized tests?

The Model
• This relationship can be written as the linear function
E(TestScore | ClassSize) = β0 + βClassSize * ClassSize    (1)

which has the familiar straight-line form Y = C + M*X (an intercept C plus a slope M times X).
Linear Regression
Suppose you have a sample of n districts. Let Yi be the average test score in the ith
district, and let Xi be the average class size in the ith district, so that Equation (1)
becomes
E(Yi | Xi) = β0 + β1*Xi.
Let µi denote the error made by predicting Yi using its conditional mean. Then
Equation (1) can be written more generally as
Yi = β0 + β1*Xi + µi
Linear Regression
Estimating the Coefficients of the Linear Regression Model
In a practical situation such as the application to class size and test scores, the
intercept β0 and the slope β1 of the population regression line are unknown.
Therefore, we must use data to estimate these unknown coefficients.
We do not know the population value of βClassSize, the slope of the unknown
population regression line relating X (class size) and Y (test scores). But it is
possible to learn about the population slope βClassSize using a sample of data.
Linear Regression
The Ordinary Least Squares Estimator
The estimators of the intercept and slope that minimize the sum of squared mistakes
are called the ordinary least squares (OLS) estimators of β0 and β1.

The OLS regression line, also called the sample regression line or sample regression
function, is the straight line constructed using the OLS estimators
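To make this concrete, here is a minimal sketch of OLS estimation in Python with simulated data; the sample size, coefficients, and noise level are assumptions chosen for illustration, not the actual California test-score data.

# Minimal OLS sketch with simulated (made-up) class-size data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 420
str_ratio = rng.uniform(14, 26, size=n)                          # hypothetical student–teacher ratios
test_score = 700 - 2.3 * str_ratio + rng.normal(0, 15, size=n)   # assumed data-generating process

X = sm.add_constant(str_ratio)             # adds the intercept column for beta_0
ols_results = sm.OLS(test_score, X).fit()  # minimizes the sum of squared mistakes
print(ols_results.params)                  # [beta_0_hat, beta_1_hat]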
Linear Regression
Confidence Intervals
Because any statistical estimate necessarily has sampling uncertainty, we cannot
determine the true value of a coefficient exactly from a sample of data.
It is possible, however, to use the OLS estimator and its standard error to construct a
confidence interval for the coefficients.
Linear Regression
Confidence Intervals
A 95% confidence interval for a regression coefficient has two equivalent interpretations.
First, it is the set of values that cannot be rejected using a two-sided hypothesis test
with a 5% significance level.
Second, it is an interval that has a 95% probability of containing the true value of the
coefficient; that is, in 95% of possible samples that might be drawn, the confidence
interval will contain the true value of the coefficient.
Because this interval contains the true value in 95% of all samples, it is said to have
a confidence level of 95%.
Linear Regression
Confidence Intervals
Calculations, using the estimated slope of -2.28 and its standard error of 0.52:

95% C.I. = -2.28 ± 1.96×0.52 = -2.28 ± 1.02 = (-3.30, -1.26)

99% C.I. = -2.28 ± 2.58×0.52 = -2.28 ± 1.34 = (-3.62, -0.94)

90% C.I. = -2.28 ± 1.645×0.52 = -2.28 ± 0.86 = (-3.14, -1.42)
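As a sketch of how such an interval is computed in practice, the snippet below builds a 95% confidence interval for an OLS slope on simulated data; statsmodels' conf_int gives the same interval as the manual estimate ± 1.96 × standard error calculation. All numbers are illustrative assumptions.

# Confidence interval sketch on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(14, 26, size=420)
y = 700 - 2.3 * x + rng.normal(0, 15, size=420)
res = sm.OLS(y, sm.add_constant(x)).fit()

b1, se1 = res.params[1], res.bse[1]
print("manual 95% CI:", (b1 - 1.96 * se1, b1 + 1.96 * se1))
print("conf_int     :", res.conf_int(alpha=0.05)[1])   # row for the slope coefficient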
Multiple Regression
Omitted Variable Bias
By focusing only on the student–teacher ratio, the empirical analysis ignored some
potentially important determinants of test scores by collecting their influences in the
regression error term.
These omitted factors include school characteristics, such as teacher quality and
computer usage, and student characteristics, such as family background.
We begin by considering an omitted student characteristic that is particularly
relevant in California, the prevalence in the school district of students who are still
learning English.
Multiple Regression
Omitted Variable Bias
If the regressor (the student–teacher ratio) is correlated with a variable that has been
omitted from the analysis (the percentage of English learners) and that determines,
in part, the dependent variable (test scores), then the OLS estimator will have
omitted variable bias.
Omitted variable bias occurs when two conditions are true: (1) the omitted variable
is correlated with the included regressor and (2) the omitted variable is a
determinant of the dependent variable.
Multiple Regression
The Multiple Regression Model
The multiple regression model extends the single-variable regression model to
include additional variables as regressors.
When used for causal inference, this model permits estimating the effect on Yi of
changing one variable X1i while holding the other regressors (X2i, X3i, and so
forth) constant.
In the class size problem, the multiple regression model provides a way to isolate
the effect on test scores Yi of the student–teacher ratio X1i while holding constant
the percentage of students in the district who are English learners X2i.
When used for prediction, the multiple regression model can improve predictions by
using multiple variables as predictors.
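A minimal sketch of such a multiple regression is below, with simulated data standing in for test scores (Y), the student–teacher ratio (X1), and the English-learner share (X2); all numbers are assumptions for illustration.

# Multiple regression sketch: estimate the effect of x1 holding x2 constant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 420
x1 = rng.uniform(14, 26, size=n)                              # student–teacher ratio
x2 = np.clip(2 * (x1 - 14) + rng.normal(0, 8, n), 0, 100)     # English-learner share, correlated with x1
y = 700 - 1.0 * x1 - 0.65 * x2 + rng.normal(0, 10, n)         # test scores (assumed model)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()
print(res.params)   # the x1 coefficient is estimated holding x2 constant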
Multiple Regression
The Population Regression Line
Y = β0 + β1*X1 + β2*X2 + u

Interpretation of β2: hold X1 fixed and change X2 from X21 to X22:
Y1 = β0 + β1*X1 + β2*X21    (i)
Y2 = β0 + β1*X1 + β2*X22    (ii)

Subtracting (i) from (ii):
Y2 - Y1 = β2*(X22 - X21)
ΔY = β2*ΔX2
β2 = ΔY / ΔX2, the change in Y per unit change in X2, holding X1 constant.

Example: Monthly Spending = b0 + b1*Salary + b2*Inflation + b3*Age
Monthly Spending = 1000 + 0.5*S + 1000*I + 25*A
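A small worked example of this "hold the other regressors constant" interpretation, using the illustrative spending equation above; the input values for salary, inflation, and age are made up.

# Partial-effect sketch for Monthly Spending = 1000 + 0.5*S + 1000*I + 25*A.
b0, b1, b2, b3 = 1000, 0.5, 1000, 25

def spending(salary, inflation, age):
    return b0 + b1 * salary + b2 * inflation + b3 * age

base = spending(40_000, 0.06, 30)              # 21810
higher_inflation = spending(40_000, 0.08, 30)  # only inflation changes
print(higher_inflation - base)                 # = b2 * 0.02 = 20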
Multiple Regression
No Perfect Multicollinearity
The fourth assumption is new to the multiple regression model. It rules out an
inconvenient situation called perfect multicollinearity, in which it is impossible to
compute the OLS estimator. The regressors are said to exhibit perfect
multicollinearity (or to be perfectly multicollinear) if one of the regressors is a
perfect linear function of the other regressors. The fourth least squares assumption
is that the regressors are not perfectly multicollinear.
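A short sketch of why perfect multicollinearity breaks the OLS computation: if one regressor is an exact linear function of another, X'X is singular and cannot be inverted. The data below are made up.

# Perfect multicollinearity sketch.
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = 3 * x1 + 1                      # exact linear function of x1
X = np.column_stack([np.ones(50), x1, x2])

print(np.linalg.matrix_rank(X))      # 2, not 3: the columns are linearly dependent
print(np.linalg.det(X.T @ X))        # (numerically) zero, so (X'X)^(-1) does not exist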
Multiple Regression
Heteroscedasticity
A vector of random variables is heteroscedastic (or heteroskedastic) if the
variability of the random disturbance is different across elements of the vector.
Here, variability could be quantified by the variance or any other measure of
statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. A
typical example is the set of observations of income in different cities.

The existence of heteroscedasticity is a major concern in regression analysis and the
analysis of variance, as it invalidates statistical tests of significance that assume that
the modelling errors all have the same variance.
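One common response in practice is to keep the OLS coefficients but use heteroskedasticity-robust standard errors; the sketch below simulates errors whose spread grows with x and compares the usual and HC1 robust standard errors. All numbers are illustrative assumptions.

# Heteroskedasticity sketch: compare plain and robust (HC1) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=500)
y = 2 + 0.5 * x + rng.normal(0, 0.5 * x)      # error spread increases with x

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()                    # assumes homoscedastic errors
robust = sm.OLS(y, X).fit(cov_type="HC1")     # heteroskedasticity-robust standard errors
print(plain.bse)
print(robust.bse)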
Multiple Regression
Categorical Variables
Consider the model Y = β0 + β1*(Income Class), where Income Class has three
categories: Lower (L), Middle (M), and Upper (U). Coding the categories as L = 0
(base), M = 1, U = 2 treats a categorical variable as if it were numeric, so instead we
represent it with dummy (indicator) variables.
With n categories we include only n - 1 dummies; one category (here L) serves as the
base category. Including all n dummies together with the intercept (no. of dummies =
no. of categories) is the dummy variable trap: the dummies become perfectly
multicollinear with the constant.
With L as the base category the model is

Y = β0 + β1*M + β2*U,

where β1 and β2 measure the difference in expected Y for the Middle and Upper groups
relative to the Lower group. Illustrative equations from the notes (numbers for
illustration only): S = β0 - 0.2*L + 0.5*U and I = 45000 + 0.3*M.
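A sketch of the dummy-variable encoding with pandas: get_dummies with drop_first=True keeps n - 1 indicators and makes the first category (here L) the omitted base, avoiding the dummy variable trap. The rows are made up.

# Dummy-variable encoding sketch for a three-category income class.
import pandas as pd

df = pd.DataFrame({"income_class": ["L", "M", "U", "M", "L", "U"]})
dummies = pd.get_dummies(df["income_class"], prefix="IC", drop_first=True)
print(dummies)   # only IC_M and IC_U remain; "L" is the omitted base category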
Non-linear Functions
The regression framework also accommodates regressors that enter non-linearly. Two
examples are a quadratic specification, Y = β0 + β1*(Income)^2, and the Cobb–Douglas
production function, P = c*L^a*K^b, which becomes linear in the parameters after taking
logarithms: ln P = ln c + a*ln L + b*ln K.
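A minimal sketch of estimating the Cobb–Douglas exponents by OLS after the log transformation; the true values a = 0.6, b = 0.3 and the simulated inputs are assumptions for illustration.

# Log-log (Cobb–Douglas) regression sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
L = rng.uniform(10, 100, size=200)
K = rng.uniform(10, 100, size=200)
P = 5.0 * L**0.6 * K**0.3 * np.exp(rng.normal(0, 0.05, 200))   # assumed production data

X = sm.add_constant(np.column_stack([np.log(L), np.log(K)]))
res = sm.OLS(np.log(P), X).fit()
print(res.params)   # [ln(c), a, b]; estimates should be close to [ln 5, 0.6, 0.3]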
Time Series
• Data collected for a single entity at multiple points in time can be used to answer
quantitative questions for which cross-sectional data are inadequate.
• One such question is, what is the causal effect on a variable of interest, Y, of a
change in another variable, X, over time?
Time series data are data collected on the same entity at different points in time. This is
in contrast to cross-sectional data, which observe individuals, companies, etc. at a single
point in time.
• Because data points in time series are collected at adjacent time periods there
is potential for correlation between observations.
• This is one of the features that distinguishes time series data from cross-
sectional data.

Time series data can be found in economics, the social sciences, finance, etc.
Time Series
Lags, First Differences
The jth lag of a time series Yt is its value j periods earlier, Yt-j; the first lag is Yt-1.
The first difference of a series is its change from one period to the next, ΔYt = Yt - Yt-1.
Time Series
Autocorrelation
In time series data, the value of Y in one period typically is correlated with its value
in the next period. The correlation of a series with its own lagged values is called
autocorrelation or serial correlation.
The first autocorrelation (or autocorrelation coefficient) is the correlation between
Yt and Yt-1, that is, the correlation between values of Y at two adjacent dates.
The second autocorrelation is the correlation between Yt and Yt-2, and the jth
autocorrelation is the correlation between Yt and Yt-j. Similarly, the jth
autocovariance is the covariance between Yt and Yt-j.
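The sketch below computes the first autocorrelation directly as the correlation between Yt and Yt-1, and uses statsmodels' acf for the first few autocorrelations; the simulated persistent series is an assumption for illustration.

# Autocorrelation sketch on a simulated persistent series.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()      # persistent series, so positive autocorrelation

print(np.corrcoef(y[1:], y[:-1])[0, 1])       # first autocorrelation, computed directly
print(acf(y, nlags=3))                        # autocorrelations at lags 0 (always 1), 1, 2, 3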
Time Series
Autoregressions
If you want to predict the future, a good place to start is the immediate past. For example, if you
want to forecast the rate of GDP growth in the next quarter, you might use data on how fast
GDP grew in the current quarter or perhaps over the past several quarters as well. To do so, a
forecaster would fit an autoregression.
The First-Order Autoregressive Model
An autoregression expresses the conditional mean of a time series variable Yt as a
linear function of its own lagged values. A first-order autoregression uses only one
lag of Y in this conditional expectation.
The first-order autoregression [AR(1)] model can be written in the familiar form of a regression
model as

Yt = b0 + b1*Yt-1 + ut

The pth-order autoregressive [AR(p)] model represents Yt as a linear function of
p of its lagged values:

Yt = b0 + b1*Yt-1 + b2*Yt-2 + … + bp*Yt-p + ut
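A minimal sketch of fitting AR(1) and AR(4) models with statsmodels' AutoReg; the simulated series and the choice of lag orders are illustrative assumptions.

# Autoregression sketch with statsmodels.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(6)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 + 0.4 * y[t - 1] + rng.normal()

ar1 = AutoReg(y, lags=1).fit()        # Yt = b0 + b1*Yt-1 + ut
ar4 = AutoReg(y, lags=4).fit()        # uses four lagged values
print(ar1.params)
print(ar4.predict(start=len(y), end=len(y)))   # one-step-ahead forecast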
Time Series
Trend, Seasonality and Cyclic Behavior
Trend
A Trend exists when there is a long-term increase or decrease in the data. It does not
have to be linear. Sometimes we will refer to a trend as “changing direction,” when
it might go from an increasing trend to a decreasing trend.
Seasonality
A seasonal pattern occurs when a time series is affected by seasonal factors such as
the time of the year or the day of the week. Seasonality is always of a fixed and
known frequency.
Cyclic Behavior
A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency.
These fluctuations are usually due to economic conditions, and are often related to
the “business cycle.”
Time Series
Trend, Seasonality and Cyclic Behavior
Many people confuse cyclic behavior with seasonal behavior, but they are really
quite different.
If the fluctuations are not of a fixed frequency then they are cyclic; if the frequency
is unchanging and associated with some aspect of the calendar, then the pattern is
seasonal.
In general, the average length of cycles is longer than the length of a seasonal
pattern, and the magnitudes of cycles tend to be more variable than the magnitudes
of seasonal patterns.
Many time series include trend, cycles and seasonality. When choosing a forecasting
method, we will first need to identify the time series patterns in the data, and then
choose a method that is able to capture the patterns properly.
Time Series
Moving Average (MA)
Moving average is a simple, technical analysis tool. Moving averages are usually
calculated to identify the trend direction of a stock or to determine its support and
resistance levels. It is a trend-following—or lagging—indicator because it is based
on past prices.
The longer the time period for the moving average, the greater the lag. So, a 200-
day moving average will have a much greater degree of lag than a 20-day MA
because it contains prices for the past 200 days. The 50-day and 200-day moving
average figures are widely followed and are considered to be important trading
signals.
Moving averages are a totally customizable indicator, which means that one can
freely choose whatever time frame they want when calculating an average.
The most common time periods used in moving averages are 15, 20, 30, 50, 100,
and 200 days. The shorter the time span used to create the average, the more
sensitive it will be to price changes. The longer the time span, the less sensitive the
average will be.
Time Series

Annual electricity sales to residential customers, 1989–2008, with a centred 5-year moving average (5-MA):

Year   Sales (GWh)   5-MA
1989   2354.34
1990   2379.71
1991   2318.52       2381.53
1992   2468.99       2424.56
1993   2386.09       2463.76
1994   2569.47       2552.60
1995   2575.72       2627.70
1996   2762.72       2750.62
1997   2844.50       2858.35
1998   3000.70       3014.70
1999   3108.10       3077.30
2000   3357.50       3144.52
2001   3075.70       3188.70
2002   3180.60       3202.32
2003   3221.60       3216.94
2004   3176.20       3307.30
2005   3430.60       3398.75
2006   3527.48       3485.43
2007   3637.89
2008   3655.00
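The 5-MA column can be reproduced with a centred rolling mean in pandas; only the first few years of the table are re-typed below, so this is just a sketch of the calculation.

# Centred 5-year moving average sketch (first years of the table above).
import pandas as pd

sales = pd.Series(
    [2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72],
    index=range(1989, 1996),
)
ma5 = sales.rolling(window=5, center=True).mean()
print(ma5.round(2))   # e.g. 1991 -> 2381.53, 1992 -> 2424.56, matching the table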
Time Series
Exponential smoothing of time series data assigns exponentially decreasing weights to observations,
from newest to oldest. In other words, the older the data, the less priority (“weight”) the data is given;
newer data are seen as more relevant and are assigned more weight.
• Smoothing parameters (smoothing constants), usually denoted by α, determine the weights for
observations. Exponential smoothing is usually used to make short-term forecasts, as longer-term
forecasts using this technique can be quite unreliable.
• Simple (single) exponential smoothing uses a weighted moving average with exponentially
decreasing weights.
• Holt’s trend-corrected (double) exponential smoothing is usually more reliable than the single
procedure for handling data that show a trend.
• Triple exponential smoothing (also called multiplicative Holt-Winters) is usually more
reliable for parabolic trends or for data that show both trend and seasonality.
Time Series
Exponential smoothing 
Simple Exponential Smoothing
The basic formula is:

St = αyt-1 + (1 – α) St-1

where
α = the smoothing constant, a value from 0 to 1,
t = time period.

When α is close to zero, smoothing happens more slowly. Following this, the best value for α is the
one that results in the smallest mean squared error (MSE).
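The recursion above is easy to implement directly; in the sketch below the smoothed series is initialised at the first observation (a common convention, and an assumption here) and the example data are made up.

# Simple exponential smoothing: St = alpha*y(t-1) + (1 - alpha)*S(t-1).
def simple_exponential_smoothing(y, alpha):
    s = [y[0]]                                   # initial smoothed value (assumed convention)
    for t in range(1, len(y)):
        s.append(alpha * y[t - 1] + (1 - alpha) * s[t - 1])
    return s

series = [3.0, 5.0, 9.0, 20.0, 12.0, 17.0]       # made-up observations
print(simple_exponential_smoothing(series, alpha=0.3))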
Time Series
Double Exponential Smoothing
This method is deemed more reliable for analyzing data that shows a trend. In addition, this is a
more complicated method which adds a second equation to the procedure:
The basic formula is:

bt = γ(St – St-1) + (1 – γ)bt-1

γ is a constant that is chosen with reference to α.
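Below is a sketch of one standard formulation of Holt's double exponential smoothing: the trend equation above is paired with a level equation, St = α*yt + (1 - α)*(St-1 + bt-1). The level equation and the initial values are assumptions, not taken from the text.

# Double exponential smoothing (Holt's method) sketch.
def double_exponential_smoothing(y, alpha, gamma):
    level, trend = y[0], y[1] - y[0]             # simple initialisation (assumption)
    smoothed = [level]
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)   # level update
        trend = gamma * (level - prev_level) + (1 - gamma) * trend  # trend update, as in the text
        smoothed.append(level)
    return smoothed

print(double_exponential_smoothing([10, 12, 15, 19, 24, 30], alpha=0.5, gamma=0.3))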


Time Series
Triple Exponential Smoothing (Holt-Winters’ method)
If your data shows a trend and seasonality, use triple exponential smoothing. In addition to the
equations for single and double smoothing, a third equation is used to handle the seasonality aspect:

It = Β yt/St + (1-Β)It-L+m


Where:
y = observation,
S = smoothed observation,
b = trend factor,
I = seasonal index,
F = forecast m periods ahead,
t = time period.
Like α and γ, the optimal Β minimizes the MSE.
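In practice the three smoothing equations are rarely coded by hand; the sketch below fits a multiplicative Holt-Winters model with statsmodels on a simulated monthly series with trend and seasonality (the series itself is an assumption for illustration).

# Triple exponential smoothing (multiplicative Holt-Winters) sketch.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(7)
months = np.arange(120)
seasonal = 1 + 0.2 * np.sin(2 * np.pi * months / 12)           # 12-month seasonal pattern
y = (50 + 0.5 * months) * seasonal + rng.normal(0, 1, 120)     # trend * seasonality + noise

model = ExponentialSmoothing(y, trend="add", seasonal="mul", seasonal_periods=12)
fit = model.fit()              # smoothing constants chosen by minimising the squared errors
print(fit.forecast(12))        # forecasts for the next 12 months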
Time Series
ARIMA model
If we combine differencing with autoregression and a moving average model, we obtain a non-
seasonal ARIMA model. ARIMA is an acronym for Autoregressive Integrated Moving Average (in
this context, “integration” is the reverse of differencing).
The full model can be written as

y′t = c + ϕ1*y′t-1 + … + ϕp*y′t-p + θ1*εt-1 + … + θq*εt-q + εt

where y′t is the differenced series (it may have been differenced more than once).
The “predictors” on the right-hand side include both lagged values of y′t and lagged
errors. This is called an ARIMA(p, d, q) model, where p is the order of the
autoregressive part, d is the degree of first differencing involved, and q is the order
of the moving average part.
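A minimal sketch of fitting a non-seasonal ARIMA model with statsmodels; the order (1, 1, 1) and the simulated random-walk-with-drift series are illustrative choices, not recommendations.

# ARIMA(p, d, q) sketch.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
y = np.cumsum(0.3 + rng.normal(0, 1, 300))     # non-stationary series, so differencing is needed

res = ARIMA(y, order=(1, 1, 1)).fit()          # p = 1 AR lag, d = 1 difference, q = 1 MA lag
print(res.params)
print(res.forecast(steps=4))                   # forecasts 4 periods ahead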
