
Log-Linear Regression

The world is not linear. This is a simple statement that everybody is aware of, yet it has meaningful consequences for our modeling approaches: the vast majority of models used in academia and industry are linear models.
The assumption that the phenomena under consideration are linear is largely a matter of convenience. It is often necessary in research based on a small number of observations because it simplifies parameter estimation. With a larger sample of observations, we can consider non-linear dependencies between the dependent and independent variables. One way to capture them is to estimate a genuinely non-linear model, which requires more advanced estimation techniques and more computing power. There is, however, an important alternative: we can approximate non-linear relations by means of linear models fitted to transformed variables.
Log-linear model
A logarithmic transformation is a convenient way of turning a highly skewed variable into one that is closer to normally distributed. When modeling variables with non-linear relationships, the prediction errors may likewise be skewed. In theory, we want the prediction error to be as small as possible while also making sure we do not overfit the model. Overfitting occurs when there are so many explanatory variables in play that the model fails to generalize from the dataset and cannot make valid predictions. Taking the logarithm of one or more variables often improves the fit of the model by pulling the distribution of the features toward a more normally shaped bell curve.
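For illustration, a minimal sketch of this effect on a synthetic lognormal sample, using numpy and scipy (the data and numbers here are only illustrative):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)

# Hypothetical highly skewed feature, e.g. incomes drawn from a lognormal distribution.
x = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)

print(f"skewness of x:      {skew(x):.2f}")          # strongly right-skewed
print(f"skewness of log(x): {skew(np.log(x)):.2f}")  # close to 0, i.e. roughly symmetric
```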
A widely used model that can be reduced to a linear one is the log-linear model, described by the following functional form:

$$y = \beta_0 \, x_1^{\beta_1} x_2^{\beta_2} \cdots x_k^{\beta_k} \, \varepsilon$$
The difference between the log-linear and the linear model lies in the fact that in the log-linear model the dependent variable is a product, instead of a sum, of the independent variables. This model can easily be transformed into a linear one by taking the logarithm of each side of the above equation:

$$\ln y = \ln \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \dots + \beta_k \ln x_k + \ln \varepsilon$$

By simply substituting

$$y^{*} = \ln y, \qquad \beta_0^{*} = \ln \beta_0, \qquad x_n^{*} = \ln x_n, \qquad \varepsilon^{*} = \ln \varepsilon,$$

where n = 1, 2, …, k, we obtain a purely linear model:

$$y^{*} = \beta_0^{*} + \beta_1 x_1^{*} + \beta_2 x_2^{*} + \dots + \beta_k x_k^{*} + \varepsilon^{*}$$
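To make the estimation step concrete, here is a minimal sketch of fitting this linearized model with ordinary least squares; the data are synthetic and the coefficient values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic positive regressors and a multiplicative (lognormal) error term.
x1 = rng.lognormal(0.0, 0.5, n)
x2 = rng.lognormal(0.0, 0.5, n)
eps = rng.lognormal(0.0, 0.1, n)

# Assumed "true" log-linear model: y = b0 * x1^b1 * x2^b2 * eps
b0, b1, b2 = 2.0, 0.7, -0.3
y = b0 * x1**b1 * x2**b2 * eps

# Linearize by taking logs and fit with ordinary least squares.
X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

print("estimated ln(b0), b1, b2:", np.round(coef, 3))
print("estimated b0:", round(np.exp(coef[0]), 3))
```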

When to use it?


If we use a log-linear model, we must remember that we are taking the logarithms of the dependent and independent variables. Hence, the variables should take only positive values, because the logarithm of a non-positive value is not defined.
The question that arises is what kind of distribution we should observe in our variables to consider using a log-linear model. Long story short: the lognormal distribution.
[Figure: Lognormal distributions for different means and standard deviations. Source: https://en.wikipedia.org/wiki/Log-normal_distribution]
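In practice, a quick check along these lines can help decide whether the transformation is appropriate; the sketch below is one possible heuristic, assuming a numpy/scipy workflow:

```python
import numpy as np
from scipy.stats import shapiro

def suitable_for_log_model(values) -> bool:
    """Rough heuristic: values must be strictly positive and roughly lognormal,
    i.e. their logarithms should not clearly fail a normality test."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        return False
    # Shapiro-Wilk on the logged data; p > 0.05 means we cannot reject normality.
    _, p_value = shapiro(np.log(values))
    return p_value > 0.05

# Example with synthetic samples.
rng = np.random.default_rng(1)
print(suitable_for_log_model(rng.lognormal(0.0, 1.0, 500)))  # typically True
print(suitable_for_log_model(rng.normal(0.0, 1.0, 500)))     # False: contains non-positive values
```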
We obtain the lognormal distribution when our error term is the exponential of a normally distributed error:

$$\varepsilon = e^{u}, \qquad u \sim N(0, \sigma^2)$$

We can observe that if the error term has a lognormal distribution, then its logarithm has a normal distribution. We also know that a linear combination of normally distributed variables is itself normally distributed. Hence, if all variables in the log-linear model have lognormal distributions, then

$$\ln y, \; \ln x_1, \; \ln x_2, \; \dots, \; \ln x_k, \; \ln \varepsilon$$

are all normally distributed. Thus, in practice, we should use a log-linear model when the dependent and independent variables have lognormal distributions. On the other hand, when those variables are normal or close to normal, we should stay with a simple linear model.
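A brief numerical check of this rule of thumb, using synthetic data generated from a multiplicative process (all names and numbers below are illustrative): when the data-generating process is log-linear, fitting on the raw variables tends to leave skewed residuals, while fitting on the logged variables does not.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)
n = 2_000

# Data generated by an assumed multiplicative (log-linear) process.
x = rng.lognormal(0.0, 0.6, n)
y = 1.5 * x**0.8 * rng.lognormal(0.0, 0.2, n)

def residual_skew(features, target):
    """Fit OLS with an intercept and return the skewness of the residuals."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return skew(target - X @ coef)

# The linear fit typically shows noticeably skewed residuals;
# the log-log fit's residual skewness stays close to zero.
print("linear fit residual skew:    ", round(residual_skew(x, y), 2))
print("log-linear fit residual skew:", round(residual_skew(np.log(x), np.log(y)), 2))
```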
