Testing the model for misspecification and robustness and robust statistical
options when assumptions are violated
1. Nonlinearity
Prior to linear regression modeling, use a scatterplot matrix to confirm that the relationships
among the variables are approximately linear.
graph matrix y x1 x2 /* In Stata 7 and earlier: graph y x1 x2, matrix */
When the equation is not intrinsically nonlinear, the dependent variable or an independent variable
may be transformed to linearize the relationship. Semi-log, translog, Box-Cox, or power
transformations may be used for this purpose. Otherwise, resort to nonlinear regression.
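As a rough illustration of the transformations mentioned above, the sketch below uses the auto dataset that ships with Stata. The dataset and the choice of price, mpg, and weight as variables are assumptions made only for the example; they do not come from the text.

```stata
* Illustration only: auto.dta and the variables used here are assumptions
sysuse auto, clear
graph matrix price mpg weight              /* inspect the pairwise relationships */
generate ln_price = ln(price)              /* semi-log: log of the dependent variable */
regress ln_price mpg weight
boxcox price mpg weight, model(lhsonly)    /* Box-Cox transform of the dependent variable only */
```

The boxcox output includes likelihood-ratio tests of whether the transformation parameter equals -1, 0, or 1, which can help in choosing among the reciprocal, log, and untransformed forms.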
After estimation, "test" performs Wald tests of linear hypotheses about the coefficients. Because it
bases its results on the estimated variance-covariance matrix of the estimators, it can be used
after any estimation command. For maximum likelihood estimation, you will have to decide
whether to perform tests on the basis of the information matrix (a Wald test) or instead to
constrain the equation, reestimate it, and calculate the likelihood-ratio test.
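The two testing routes described above might look as follows. This is a minimal sketch, assuming the auto dataset that ships with Stata; the models and the stored-estimate names full and restricted are hypothetical and chosen only for illustration.

```stata
* Illustration only: dataset, models, and stored-estimate names are assumptions
sysuse auto, clear
regress price mpg weight foreign
test mpg weight                 /* Wald test: both coefficients jointly zero */

* The same hypothesis tested two ways after maximum likelihood estimation
logit foreign mpg weight
test weight                     /* Wald test based on the estimated VCE */
estimates store full
logit foreign mpg               /* constrained model: weight omitted */
estimates store restricted
lrtest full restricted          /* likelihood-ratio test of the same restriction */
```

The Wald and likelihood-ratio tests are asymptotically equivalent, but they can disagree in small samples, which is why the choice between them is left to the analyst.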
2. Testing the Residuals for Normality
We use the skewness and kurtosis test for normality (sktest). The commands for the test are:
predict resid, residuals /* Generation of the regression residuals */
predict rstd, rstandard /* Generation of standardized residuals */
sktest resid
This tests whether the skewness and kurtosis of the residuals match those of a normal
distribution, combining the two into a chi-square statistic. (The Kolmogorov-Smirnov test, which
compares the cumulative distribution of the residuals with that of the theoretical normal
distribution, is a different test and is run with ksmirnov.) The null hypothesis is that the
residuals are normally distributed. When the probability is less than .05, we reject the null
hypothesis and infer that the residuals are non-normally distributed.
To deal with nonnormality of the residuals, we could bootstrap the regression coefficients or run:
qreg y x1 x2 /* Quantile (median) regression */
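The normality checks and the two remedies can be sketched together as below. The auto dataset, the variable choices, the scalar name sd_resid, the seed, and the number of bootstrap replications are all assumptions made for the example.

```stata
* Illustration only: dataset, variables, seed, and reps are assumptions
sysuse auto, clear
regress price mpg weight
predict resid, residuals
sktest resid                                  /* skewness and kurtosis test */
swilk resid                                   /* Shapiro-Wilk test, an alternative */
summarize resid
scalar sd_resid = r(sd)
ksmirnov resid = normal(resid / sd_resid)     /* one-sample Kolmogorov-Smirnov test */

* Remedies: bootstrap the coefficients, or fit a median regression
bootstrap _b, reps(200) seed(12345): regress price mpg weight
qreg price mpg weight
```

Because the standard deviation passed to ksmirnov is estimated from the same residuals, its p-value tends to be conservative and is best read as a rough guide.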
3. Testing the Residuals for heteroskedasticity
We may graph the standardized or studentized residuals against the predicted scores to obtain a
graphical indication of heteroskedasticity.
rvfplot, yline(0)
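The Breusch-Pagan/Cook-Weisberg test is used to test the residuals for heteroskedasticity. Run it
directly after the regression; by default it tests whether the residual variance depends on the
fitted values.
estat hettest /* hettest in older versions of Stata */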
An insignificant result gives no evidence of heteroskedasticity. That is, such a result is
consistent with equal variance of the residuals along the predicted line. This condition is
otherwise known as homoskedasticity.
To deal with heteroskedasticity of the residuals, we run the regression with Huber/White/sandwich
variance-covariance estimators:
regress y x1 x2, robust /* Regression with robust standard errors for heteroskedasticity */
Prof. Halbert White showed that heteroskedasticity could be handled in a regression with a
heteroskedasticity-consistent covariance matrix estimator. White's estimator is justified
asymptotically, so it is most reliable in large samples.
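A possible end-to-end check for heteroskedasticity, followed by the robust refit, is sketched below. The auto dataset and the variables are assumptions chosen only for illustration.

```stata
* Illustration only: dataset and variables are assumptions
sysuse auto, clear
regress price mpg weight
rvfplot, yline(0)                 /* residual-versus-fitted plot */
estat hettest                     /* Breusch-Pagan/Cook-Weisberg, using fitted values */
estat hettest mpg weight          /* same test, variance modeled on named regressors */
estat imtest, white               /* White's general test */

* Refit with Huber/White/sandwich standard errors
regress price mpg weight, vce(robust)
```

The coefficients from the robust refit are identical to those from ordinary least squares; only the standard errors, and therefore the t statistics and confidence intervals, change.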
4. Testing the residuals for Autocorrelation
When there is first-order autocorrelation of the residuals, $e_t = \rho_1 e_{t-1} + v_t$. Sources of
autocorrelation: lagged endogenous variables; misspecification of the model; simultaneity,
feedback, or reciprocal relationships; seasonality or trend in the model.
After declaring the time variable with tsset and running the regression, one can use the command
estat dwatson (dwstat in older versions of Stata) to obtain the Durbin-Watson d statistic to
test for first-order autocorrelation.
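For reference, the standard definition of the Durbin-Watson statistic is given below. The text does not spell it out; the symbols $T$ (number of observations) and $\hat{\rho}_1$ (sample first-order autocorrelation of the residuals) are notation assumed here.

```latex
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2\,(1 - \hat{\rho}_1)
```

Values of d near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.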
The Ljung-Box Q statistic tests whether the autocorrelations up to a given lag are jointly zero.
The Stata command corrgram reports it alongside the autocorrelations and partial autocorrelations
of the residuals:
tsset timevar
corrgram resid
In the event of autocorrelation, one can correct the standard errors or fit a regression with
autoregressive errors:
newey depvar [indepvars] [if] [in] [weight], lag(#) [options] /* Regression with Newey-West standard errors */
prais depvar [indepvars] [if] [in] [, options] /* Prais-Winsten and Cochrane-Orcutt regression */
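The autocorrelation diagnostics and remedies above can be tried on simulated data. The sketch below generates a series with first-order autocorrelated errors; every name and number in it (the variables t, x, v, err, y, the coefficient 0.6, the seed, and the lag choices) is a hypothetical value chosen for the example, not something taken from the text.

```stata
* Illustration only: simulated data; all names and values are assumptions
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate x = rnormal()
generate v = rnormal()                         /* white-noise innovation */
generate err = v in 1
replace err = 0.6 * err[_n-1] + v in 2/l       /* AR(1) errors with rho = 0.6 */
generate y = 1 + 2 * x + err

regress y x
estat dwatson                                  /* Durbin-Watson d statistic */
estat bgodfrey, lags(1)                        /* Breusch-Godfrey LM test */
predict resid, residuals
corrgram resid, lags(10)                       /* AC, PAC, and Ljung-Box Q */
wntestq resid, lags(10)                        /* portmanteau Q test alone */

newey y x, lag(2)                              /* Newey-West standard errors */
prais y x                                      /* Prais-Winsten */
prais y x, corc                                /* Cochrane-Orcutt */
```

newey leaves the least-squares coefficients unchanged and adjusts only the standard errors, whereas prais reestimates the coefficients after transforming the data by the estimated rho.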
5. Outlier detection
Outlier detection involves determining whether the residual (error = actual - predicted) is an
extreme negative or positive value. After running the regression, we may plot the residuals
against the fitted values to determine which errors are large.
rvfplot, yline(0) /* residual-versus-fitted plot */
To deal with influential outliers, we run robust regression with robust weight functions:
rreg y x1 x2
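Beyond the residual plot, numerical diagnostics can help flag influential observations before robust regression is applied. The following is a minimal sketch, assuming the auto dataset that ships with Stata; the cutoffs (studentized residuals beyond 2 in absolute value, Cook's distance above 4/n, and a robust weight below 0.5) are conventional rules of thumb assumed for the example, and the variable names rstu, cooksd, lev, and w are hypothetical.

```stata
* Illustration only: dataset, variable names, and cutoffs are assumptions
sysuse auto, clear
regress price mpg weight
predict rstu, rstudent                     /* studentized residuals */
predict cooksd, cooksd                     /* Cook's distance */
predict lev, leverage                      /* leverage (hat values) */
list make price rstu if abs(rstu) > 2 & !missing(rstu)
list make cooksd if cooksd > 4 / e(N) & !missing(cooksd)
lvr2plot                                   /* leverage versus squared residual */

* Robust regression, saving the final weights for inspection
rreg price mpg weight, genwt(w)
list make w if w < 0.5                     /* heavily downweighted observations */
```

Observations that receive small weights from rreg are the ones the ordinary least-squares fit was most sensitive to, so comparing the two sets of coefficients shows how much the outliers matter.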