
Domingo | Lopez | Lucido | Ragas

ROBUST REGRESSION

ORDINARY LEAST SQUARES METHOD (OLS)
Recall: [Classical Linear Regression Model]

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i$

where
$Y_i$ is the value of the response variable in the ith trial
$X_{ij}$ are known constants: the value of the jth independent variable on the ith trial
$\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are parameters
$\varepsilon_i$ is a random error term
(i = 1, 2, ..., n and j = 0, 1, 2, ..., k)

ORDINARY LEAST SQUARES METHOD (OLS)
Optimizes the fit by making the sum of squared residuals as small as possible
Easy to understand

DISADVANTAGE:
Highly influenced by outliers. OLS is not robust to outliers.
Note: Regression outliers (either in x or in y) pose a serious threat
to standard least squares analysis.
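
The effect of a single outlier on OLS can be illustrated with a minimal sketch. The data below are hypothetical (points placed exactly on the line y = 1 + 2x), and the helper name `ols_line` is only an illustration; it uses the textbook formulas for simple linear regression.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
clean = ols_line(x, y)

# Corrupt a single response value (an outlier in y)
y_bad = y[:-1] + [100.0]
bad = ols_line(x, y_bad)

print("clean fit (intercept, slope):", clean)
print("fit with one outlier:        ", bad)
```

In this sketch one changed observation out of ten moves the fitted slope from 2 to more than 6, which is the sensitivity the note above refers to.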

ORDINARY LEAST SQUARES METHOD (OLS)
Two ways to deal with regression outliers:
1. Regression Diagnostics
   - work only when there is a single outlier
2. Robust Regression

ROBUST REGRESSION
An alternative to the Ordinary Least Squares method
A regression method that is less sensitive to outliers and does not
rely on the usual assumption that the errors in regression models
are normally distributed


ROBUST REGRESSION
Robustness is the insensitivity to small deviations from the
assumptions the model imposes on the data (Huber, 1981)

A model is robust if it has the following features:


Reasonably efficient and unbiased
Small deviations from the model assumptions will not
substantially impair the performance of the model
Somewhat larger deviations will not invalidate the model
completely


ROBUST REGRESSION
The Breakdown Point of an estimate is the smallest
fraction of the data that, when changed by an arbitrarily
large amount, can cause an arbitrarily large change in
the estimate.
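
A small sketch can make the definition concrete by comparing the sample mean (which breaks down as soon as one value is corrupted) with the sample median (which resists until about half of the values are corrupted). The data and the helper `corrupt` are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical sample of five well-behaved observations
data = [4.8, 4.9, 5.0, 5.1, 5.2]


def corrupt(values, how_many, bad_value=1e9):
    """Replace the last `how_many` values with an arbitrarily large value."""
    keep = len(values) - how_many
    return values[:keep] + [bad_value] * how_many


mean_one = statistics.mean(corrupt(data, 1))        # one bad value ruins the mean
median_one = statistics.median(corrupt(data, 1))    # median is unaffected
median_two = statistics.median(corrupt(data, 2))    # still unaffected (2 of 5)
median_three = statistics.median(corrupt(data, 3))  # majority corrupted: breaks down

print(mean_one, median_one, median_two, median_three)
```

With five observations the median survives two corrupted values but not three, while the mean is carried away by a single one.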


Example on the Difference between Ordinary Least Squares and Least Median Squares (LMS):


Common Approaches under Robust Regression
1. Least Median Squares
2. Least Trimmed Squares
LMS and LTS both have a breakdown point of 50%,
whereas OLS has a breakdown point of 0%.


LEAST MEDIAN SQUARES (LMS)


Consider the model,

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i$

The Ordinary Least Squares Estimator minimizes

$\sum_{i=1}^{n} r_i^2$

where $r_i$ are the residuals and $r_i^2$ are the squared residuals


LEAST MEDIAN SQUARES (LMS)


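Objective Function of the LMS estimator:

Minimize $\operatorname{med}_i \, r_i^2$

Procedure (one-dimensional case):

1. Order the n observations: $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$
2. Compute h = n/2 + 1 (using the integer part of n/2)
3. Compute the differences $y_{(h)} - y_{(1)},\ y_{(h+1)} - y_{(2)},\ \dots,\ y_{(n)} - y_{(n-h+1)}$
4. Get the midpoint of the two observations that
   yield the smallest difference.
5. The resulting midpoint is the LMS estimate.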
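
The one-dimensional procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `lms_location` and the sample values are hypothetical, and h is taken as the integer part of n/2 plus 1.

```python
def lms_location(values):
    """One-dimensional LMS estimate: midpoint of the shortest 'half' of the data."""
    if not values:
        raise ValueError("need at least one observation")
    y = sorted(values)          # step 1: order the observations
    n = len(y)
    h = n // 2 + 1              # step 2: size of each half (integer part of n/2, plus 1)
    best_width = None
    best_mid = None
    # step 3: differences y(h)-y(1), y(h+1)-y(2), ..., y(n)-y(n-h+1)
    for i in range(n - h + 1):
        low, high = y[i], y[i + h - 1]
        width = high - low
        if best_width is None or width < best_width:
            best_width = width
            best_mid = (low + high) / 2   # step 4: midpoint of the closest pair
    return best_mid             # step 5: the LMS estimate


# Hypothetical sample with one low outlier (40)
sample = [40, 75, 80, 83, 86, 88, 90, 92, 93, 95]
print(lms_location(sample))
```

For this sample h = 6 and the differences are 48, 15, 12, 10 and 9, so the shortest half runs from 86 to 95 and the estimate is their midpoint, 90.5. The sample mean, by comparison, is 82.2.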

Exercises:
Cuteness ratings were collected from randomly chosen people from
engineering. Here are the observations:

Randomly chosen celebrities were asked how interesting The Naked Truth was. Here are their responses:


Outliers can also be detected through the LMS estimator.
Procedure:
1. Compute the residuals using the LMS estimator.
   $r_i = y_i - \text{LMS estimate}$
2. Square each of the residuals.
3. Get the mean and standard deviation of the
   squared residuals.
4. Standardize each of the squared residuals using their
   mean and standard deviation.
5. Standardized squared residuals that are below -2.5
   or above 2.5 are considered outliers. (Rousseeuw)
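
The five steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `flag_outliers` and the sample are hypothetical, the sample standard deviation (n - 1 divisor) is assumed in step 3, and the value 90.5 is assumed to be the LMS estimate for this sample (the midpoint of 86 and 95, the closest pair that are h = 6 positions apart).

```python
import statistics


def flag_outliers(values, center, cutoff=2.5):
    """Flag observations whose standardized squared residual exceeds the cutoff."""
    squared = [(v - center) ** 2 for v in values]       # steps 1 and 2
    mean_sq = statistics.mean(squared)                  # step 3
    sd_sq = statistics.stdev(squared)                   # step 3 (sample SD, an assumption)
    standardized = [(s - mean_sq) / sd_sq for s in squared]  # step 4
    # step 5: keep the observations outside [-cutoff, cutoff]
    return [v for v, z in zip(values, standardized) if abs(z) > cutoff]


# Hypothetical sample with one low outlier (40)
sample = [40, 75, 80, 83, 86, 88, 90, 92, 93, 95]
lms_estimate = 90.5  # assumed LMS estimate for this sample

outliers = flag_outliers(sample, lms_estimate)
print(outliers)
```

Only the observation 40 is flagged here; its standardized squared residual is roughly 2.8, while all the others are small and negative.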

Remarks:
1. Unlike LS, LMS does not have a closed-form formula.
2. Since the median is an order or rank statistic, it is not
amenable to calculation via derivatives or other
calculations that rely on continuous functions.
3. The LMS estimator may not be the estimator with the
smallest variance, but it generalizes to multiple
regression.
4. The position of the LMS estimate lies where the points are
concentrated, not in the center of good observations.
5. The LMS is similar to a mode estimator.
6. For n=3, the LMS estimator is not satisfactory, since two of
the points tend to be close to each other by
chance, making the 3rd one look like an outlier.
7. The LMS has a 50% breakdown point.

LEAST TRIMMED SQUARES (LTS)


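The LTS method attempts to minimize the sum of
squared residuals over a subset of k observations
(here k is the size of the subset, not the number of independent variables):

$\sum_{i=1}^{k} (r^2)_{(i)}$

where $(r^2)_{(1)} \le (r^2)_{(2)} \le \dots \le (r^2)_{(n)}$ are the ordered squared residuals.

The difference between the LTS and LS is that the sum of squares is
trimmed: the largest squared residuals are left out.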

LEAST TRIMMED SQUARES (LTS)


How to solve for LTS estimator (one-dimensional case):
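
One common way to handle the one-dimensional case, following the approach described by Rousseeuw and Leroy (1987), is to slide a window over the ordered observations, compute the mean and the sum of squared deviations of each window, and keep the mean of the window with the smallest sum of squares. The sketch below is an illustration under assumptions and not necessarily the exact steps intended here: the function name `lts_location` and the sample are hypothetical, and the subset size is assumed to be the integer part of n/2 plus 1.

```python
def lts_location(values):
    """One-dimensional LTS estimate: mean of the contiguous ordered subset
    with the smallest sum of squared deviations from its own mean."""
    if not values:
        raise ValueError("need at least one observation")
    y = sorted(values)
    n = len(y)
    subset_size = n // 2 + 1          # assumed subset size
    best_ss = None
    best_mean = None
    for start in range(n - subset_size + 1):
        window = y[start:start + subset_size]
        mean = sum(window) / subset_size
        ss = sum((v - mean) ** 2 for v in window)   # trimmed sum of squares
        if best_ss is None or ss < best_ss:
            best_ss = ss
            best_mean = mean
    return best_mean


# Hypothetical sample with one low outlier (40)
sample = [40, 75, 80, 83, 86, 88, 90, 92, 93, 95]
print(lts_location(sample))
```

For this sample the best window is 86, 88, 90, 92, 93, 95, so the estimate is its mean, about 90.67. The sketch recomputes each window mean from scratch for clarity; the work can be reduced by updating the previous window's mean instead.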


Exercises
Nursing Board Exam 2015, Philippine Daily Inquirer

During her break, Theresa stood at the entrance of each
college and counted the number of handsome boys who
entered the building within 30 minutes. The results are given
below:



Remarks:
1. The number of computations can be drastically
reduced by updating the mean of the preceding half
instead of recomputing each mean from scratch.
2. The residuals are squared first and then ordered.
3. According to Rousseeuw (1998), the LTS
procedure is more efficient than the LMS.
4. The objective function of LTS is similar to the
objective function of LS. The only difference is that
the largest squared residuals are not used in the
summation, thus, allowing the fit to stay away from
the outliers.
5. The LTS also has a 50% breakdown point.

References:
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust Regression and Outlier Detection. Canada.
Jacoby, Bill. Regression III: Advanced Methods. Michigan State University.
Garner, Will. Robust Regression.
Simons, Kenneth. (2013). Useful Stata Commands (for Stata version 12).