Linear regression - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Linear_regression
Retrieved 9/10/2005 2:15 PM
Linear regression
Linear regression is a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x. The variable of interest, y, is conventionally called the "dependent variable"; the terms "endogenous variable" and "output variable" are also used. The other variables x are called the "independent variables"; the terms "exogenous variables" and "input variables" are also used. The dependent and independent variables may be scalars or vectors. If the independent variable is a vector, one speaks of multiple linear regression.

The term independent variable suggests that its value can be chosen at will, and the dependent variable is an effect, i.e., causally dependent on the independent variable, as in a stimulus-response model. Although many linear regression models are formulated as models of cause and effect, the direction of causation may just as well go the other way, or indeed there need not be any causal relation at all.

Regression, in general, is the problem of estimating a conditional expected value. Linear regression is called "linear" because the relation of the dependent variable to the independent variables is assumed to be a linear function of some parameters. Regression models which are not a linear function of the parameters are called nonlinear regression models; a neural network is an example of a nonlinear regression model.

Still more generally, regression may be viewed as a special case of density estimation. The joint distribution of the dependent and independent variables can be constructed from the conditional distribution of the dependent variable and the marginal distribution of the independent variables. In some problems, it is convenient to work in the other direction: from the joint distribution, the conditional distribution of the dependent variable can be derived.
Contents
1 Historical remarks
2 Justification for regression
3 Statement of the linear regression model
4 Parameter estimation
4.1 Robust regression
4.2 Summarizing the data
4.3 Estimating beta
4.4 Estimating alpha
4.5 Displaying the residuals
4.6 Ancillary statistics
5 Multiple linear regression
6 Scientific applications of regression
7 See also
8 References
8.1 Historical
8.2 Modern theory
8.3 Modern practice
9 External links
Historical remarks
The earliest form of linear regression was the method of least squares, which was published by Legendre in 1805, and by Gauss in 1809. The term "least squares" is from Legendre's term, moindres carrés. However, Gauss claimed that he had

known the method since 1795.

Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the sun. Euler had worked on the same problem (1748) without success. Gauss published a further development of the theory of least squares in 1821, including a version of the Gauss-Markov theorem.

The term "reversion" was used in the nineteenth century to describe a biological phenomenon, namely that the progeny of exceptional individuals tend on average to be less exceptional than their parents, and more like their more distant ancestors. Francis Galton studied this phenomenon, and applied the slightly misleading term "regression towards mediocrity" to it (parents of exceptional individuals also tend on average to be less exceptional than their children). For Galton, regression had only this biological meaning, but his work (1877, 1885) was extended by Karl Pearson and Udny Yule to a more general statistical context (1897, 1903). In the work of Pearson and Yule, the joint distribution of the dependent and independent variables is assumed to be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925. Fisher assumed that the conditional distribution of the dependent variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821.
Justification for regression
The theoretical problem is: given two random variables X and Y, what is the best estimator of Y, in other words the estimator that minimizes the mean square error (MSE)?

1. If we estimate Y by a constant c, it can be shown that c = E(Y) (the population mean) is the best such estimator, with MSE = var(Y) = E[(Y − E(Y))²].

2. If we estimate Y with a linear (technically affine) predictor of the form Ŷ = aX + b, it can be shown that the MSE E[(Y − aX − b)²] is minimized when a = cov(X, Y)/var(X) and b = E(Y) − a E(X).

3. Finally, what is the best general function f(X) that estimates Y? It is Ŷ = f(X) = E[Y | X].

Note: E[Y | X] is a function of X. This can be proven with the inequality E[(Y − f(X))²] ≥ E[(Y − E[Y | X])²], which holds for every function f.

Thus, the regression estimates the conditional mean of Y given X because it minimizes the MSE. Other ways of testing for a relationship, such as correlation, do not use the conditional mean.
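The three estimators above can be compared numerically. The following is a minimal sketch (not from the article) using NumPy on simulated data where, by construction, E[Y | X] = X², so the conditional mean genuinely beats any affine predictor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.uniform(0.0, 2.0, size=n)
Y = X**2 + rng.normal(scale=0.5, size=n)     # E[Y | X] = X^2, noise variance 0.25

# 1. Constant predictor: c = E(Y) minimizes E[(Y - c)^2]; its MSE is var(Y).
mse_const = np.mean((Y - Y.mean())**2)

# 2. Best affine predictor: a = cov(X, Y)/var(X), b = E(Y) - a*E(X).
a = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
b = Y.mean() - a * X.mean()
mse_affine = np.mean((Y - (a * X + b))**2)

# 3. Conditional mean E[Y | X] = X^2 (known here by construction).
mse_cond = np.mean((Y - X**2)**2)            # approaches the noise variance 0.25

print(mse_const, mse_affine, mse_cond)
```

Each step down the list can only reduce the MSE: constants are a special case of affine predictors, which are a special case of general functions f(X).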
Statement of the linear regression model
A linear regression model is typically stated in the form

y = α + βx + ε.

The right hand side may take other forms, but generally comprises a linear combination of the parameters, here denoted α and β. The term ε represents the unpredicted or unexplained variation in the dependent variable; it is conventionally called the "error" whether it is really a measurement error or not. The error term is conventionally assumed to have expected value equal to zero, as a nonzero expected value could be absorbed into α. See also errors and residuals in statistics; the difference between an error and a residual is also dealt with below. It is also assumed that ε is independent of x.

An equivalent formulation which explicitly shows the linear regression as a model of conditional expectation is

E(y | x) = α + βx,

with the conditional distribution of y given x essentially the same as the distribution of the error term.

A linear regression model need not be affine, let alone linear, in the independent variables x. For example,

y = α + βx + γx² + ε

is a linear regression model, for the right-hand side is a linear combination of the parameters α, β, and γ. In this case it is useful to think of x² as a new independent variable, formed by modifying the original variable x. Indeed, any linear combination of functions f(x), g(x), h(x), ..., is a linear regression model, so long as these functions do not have any free parameters (otherwise the model is generally a nonlinear regression model). The least-squares estimates of α, β, and γ are linear in the response variable y, and nonlinear in x (they are nonlinear in x even if the γ and α terms are absent; if only β were present, then doubling all observed x values would multiply the least-squares estimate of β by 1/2).
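The quadratic example above can be made concrete. This is a sketch (the data and coefficient values are invented for illustration) showing that the model, though nonlinear in x, is fitted exactly like any linear regression by treating x² as an additional column of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=500)
# True model: y = 1 + 2x - 3x^2 + noise (coefficients chosen for the demo).
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=500)

# Linear in the parameters: design matrix with columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [alpha, beta, gamma]
print(coef)
```

The same pattern handles any fixed basis functions f(x), g(x), h(x), ...; the fit stays a linear least-squares problem as long as the basis functions themselves have no free parameters.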
Parameter estimation
Often in linear regression problems statisticians rely on the Gauss-Markov assumptions:

The random errors ε_i have expected value 0.
The random errors ε_i are uncorrelated (this is weaker than an assumption of probabilistic independence).
The random errors ε_i are "homoscedastic", i.e., they all have the same variance.

(See also the Gauss-Markov theorem. That result says that under the assumptions above, least-squares estimators are in a certain sense optimal.)

Sometimes stronger assumptions are relied on:

The random errors ε_i have expected value 0.
They are independent.
They are normally distributed.
They all have the same variance.

If x_i is a vector we can take the product βx_i to be a scalar product (see "dot product").

A statistician will usually estimate the unobservable values of the parameters α and β by the method of least squares, which consists of finding the values of a and b that minimize the sum of squares of the residuals

e_i = y_i − (a + b x_i),

i.e., that minimize Σ (y_i − a − b x_i)², the sum running over i = 1, ..., n. Those values of a and b are the "least-squares estimates." The residuals may be regarded as estimates of the errors; see also errors and residuals in statistics.

Notice that, whereas the errors are independent, the residuals cannot be independent, because the use of least-squares estimates implies that the sum of the residuals must be 0, and the scalar product of the vector of residuals with the vector of x-values must be 0; i.e., we must have

Σ e_i = 0 and Σ e_i x_i = 0.

These two linear constraints imply that the vector of residuals must lie within a certain (n − 2)-dimensional subspace of R^n; hence we say that there are "n − 2 degrees of freedom for error". If one assumes the errors are normally distributed
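The least-squares estimates and the two residual constraints described above can be checked numerically. A minimal sketch (simulated data, not from the article), using the closed-form estimates for simple regression:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
# True model for the demo: y = 0.5 + 1.5x + error.
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=n)

# Least-squares estimates:
#   b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  a = ybar - b*xbar
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()

resid = y - (a + b * x)
print(np.sum(resid))       # constraint 1: residuals sum to 0
print(np.dot(resid, x))    # constraint 2: residuals orthogonal to the x-values
```

Both printed values are zero up to floating-point rounding, which is exactly why the n residuals carry only n − 2 degrees of freedom.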