
Variable Transformation

Variable Transformation
Variable transformation refers to transforming the independent or response variables to improve or address the following issues:
• Meeting model assumptions
• Linearity of the model
• Heteroscedasticity of the model
• Normality of the error terms

It also helps to:
• Make coefficients more interpretable
• Improve generalization and predictive power
• Put predictors on a common scale
Logarithmic Transformation
Use when the relationship between the dependent and independent variables is nonlinear.

Suppose a linear model:

$Y = \beta_0 + \beta_1 X + \varepsilon$

Linear-log transformation:

$Y = \beta_0 + \beta_1 \log(X) + \varepsilon$
A log transformation is used to reduce the effect of outliers and to stabilize the variance when the data are highly skewed.

A parametric model may assume that values are approximately normally distributed after a log transformation (Baldi and Long, 2001; Ibrahim et al., 2005).
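
As an illustration of the linear-log idea, the sketch below fits a straight line to a raw predictor and to its natural log, then compares the fit. The data are synthetic and the helper names (`fit_line`, `r_squared`) are hypothetical, chosen only for this example; the natural log is assumed.

```python
import numpy as np

# Synthetic data (an assumption for illustration): Y depends on log(X), not on X itself.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=500)
y = 2.0 + 3.0 * np.log(x) + rng.normal(0.0, 0.5, size=500)

def fit_line(predictor, response):
    """Ordinary least squares for response = b0 + b1 * predictor."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef  # array [b0, b1]

def r_squared(predictor, response):
    """Share of the response variance explained by the fitted line."""
    b0, b1 = fit_line(predictor, response)
    residuals = response - (b0 + b1 * predictor)
    return 1.0 - residuals.var() / response.var()

b0, b1 = fit_line(np.log(x), y)
print("linear-log coefficients:", b0, b1)
print("R^2 with raw X:", r_squared(x, y))
print("R^2 with log(X):", r_squared(np.log(x), y))
```

On data like these, the log-transformed predictor should explain noticeably more of the variance than the raw predictor, because the straight-line assumption then matches how the data were generated.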
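How to interpret a linear regression model with logarithmic transformations:

Transformation | Model | Interpretation
No transformation | $Y = \beta_0 + \beta_1 X + \varepsilon$ | A one-unit increase in $X$ is associated with an average change of $\beta_1$ units in $Y$.
Log-transformed predictor | $Y = \beta_0 + \beta_1 \log(X) + \varepsilon$ | A 1% increase in $X$ is associated with an average change of approximately $\beta_1 / 100$ units in $Y$.
Log-transformed outcome | $\log(Y) = \beta_0 + \beta_1 X + \varepsilon$ | A one-unit increase in $X$ is associated with an average change of $(e^{\beta_1} - 1) \times 100\%$ in $Y$.
Log-log model | $\log(Y) = \beta_0 + \beta_1 \log(X) + \varepsilon$ | A 1% increase in $X$ is associated with an average change of approximately $\beta_1\%$ in $Y$.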
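
The interpretation rules for log-transformed models can be checked numerically. The sketch below assumes natural logs and a slope called `beta1`; the function names and the value `beta1 = 0.5` are hypothetical. It computes the exact effect for each model, which can be compared against the usual rules of thumb (slope divided by 100 units for a log-transformed predictor, slope percent for a log-log model).

```python
import math

def linear_log_effect(beta1, pct_increase=1.0):
    """Change in Y (in units) for a pct_increase% rise in X when Y = b0 + b1*log(X)."""
    return beta1 * math.log(1.0 + pct_increase / 100.0)

def log_linear_effect(beta1, unit_increase=1.0):
    """Percent change in Y for a unit_increase rise in X when log(Y) = b0 + b1*X."""
    return (math.exp(beta1 * unit_increase) - 1.0) * 100.0

def log_log_effect(beta1, pct_increase=1.0):
    """Percent change in Y for a pct_increase% rise in X when log(Y) = b0 + b1*log(X)."""
    return ((1.0 + pct_increase / 100.0) ** beta1 - 1.0) * 100.0

beta1 = 0.5  # hypothetical slope, used only for illustration
print(linear_log_effect(beta1))  # close to beta1 / 100 units
print(log_linear_effect(beta1))  # (e^beta1 - 1) * 100 percent
print(log_log_effect(beta1))     # close to beta1 percent
```

The rules of thumb are approximations that hold for small percentage changes; for larger changes the exact expressions in the functions are the safer choice.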
Square root transformation
• Addresses heteroscedasticity, where the variance of the residuals increases as the predictor variable increases.
• Normalizes a skewed distribution.
• Stabilizes variances (Baguley, 2012).
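
A small simulation can make the variance-stabilizing effect concrete. The sketch below assumes count data drawn from Poisson distributions, a common case in which the variance grows with the mean; the group means are hypothetical and this is an illustration under those assumptions, not a general proof.

```python
import numpy as np

rng = np.random.default_rng(1)
means = [4, 16, 64, 256]  # hypothetical group means

raw_var = {}
sqrt_var = {}
for m in means:
    counts = rng.poisson(m, size=20000)
    raw_var[m] = counts.var()            # grows roughly in step with the mean
    sqrt_var[m] = np.sqrt(counts).var()  # stays roughly constant across groups
    print(m, raw_var[m], sqrt_var[m])
```

On the raw scale the variance differs widely between groups, while after the square root it stays within a narrow band, which is the behavior a constant-variance assumption needs.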

The estimates (coefficients) for a square-root-transformed independent variable have no clear interpretation, but they can be interpreted at the data level.

Recall the geometric mean, which accounts for growth and fluctuation in the data.
Centering

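$x_i^{c} = x_i - \bar{x}$

where $\bar{x}$ is the mean of the independent variable $x$.

Centering is a technique in which the mean of each independent variable is subtracted from all of its values. As a result, every centered independent variable has a mean of zero.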
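
A minimal sketch of centering on synthetic data follows. The data and the helper `fit_line` are hypothetical. It shows that centering a predictor leaves the slope unchanged and moves the intercept to the mean of the response.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(20.0, 60.0, size=200)                # hypothetical predictor
y = 10.0 + 0.8 * x + rng.normal(0.0, 2.0, size=200)  # hypothetical response

x_centered = x - x.mean()  # subtract the mean from every value

def fit_line(predictor, response):
    """Ordinary least squares for response = b0 + b1 * predictor."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef  # array [b0, b1]

raw_b0, raw_b1 = fit_line(x, y)
cen_b0, cen_b1 = fit_line(x_centered, y)

print("mean of centered x:", x_centered.mean())
print("slopes (raw, centered):", raw_b1, cen_b1)
print("centered intercept vs mean of y:", cen_b0, y.mean())
```

After centering, the intercept is the predicted response at the average value of the predictor rather than at zero, which is often the more meaningful reference point.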
Standardization or Normalization
• Standardization or normalization is the process of putting different variables on the same scale.
• It is assumed to reduce multicollinearity in the model.
• Higher-order terms tend to be more highly correlated with the other variables.
• The linear model coefficients can be interpreted as the change in the response (i.e., the dependent variable) for a 1 standard deviation increase in the predictor (i.e., the independent variable).

$z_i = \dfrac{x_i - \bar{x}}{s}$

where $\bar{x}$ is the mean of $x$ and $s$ is its standard deviation.
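
The sketch below illustrates these points on synthetic data. The variable names (`income`, `age`) and the helpers `standardize` and `fit` are hypothetical. It assumes z-score standardization with the sample standard deviation, and checks two things: each standardized coefficient equals the raw coefficient multiplied by that predictor's standard deviation, and the correlation between a predictor and its square drops after standardizing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
income = rng.normal(50000.0, 15000.0, n)  # hypothetical predictor on a large scale
age = rng.normal(40.0, 10.0, n)           # hypothetical predictor on a small scale
y = 0.0002 * income + 0.3 * age + rng.normal(0.0, 1.0, n)

def standardize(v):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

def fit(predictors, response):
    """Ordinary least squares with an intercept; returns [b0, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

raw = fit([income, age], y)
std = fit([standardize(income), standardize(age)], y)
print("raw coefficients:", raw[1], raw[2])           # per unit of each predictor
print("standardized coefficients:", std[1], std[2])  # per 1 standard deviation

# Correlation between a predictor and its square, before and after standardizing.
corr_raw = np.corrcoef(age, age ** 2)[0, 1]
age_z = standardize(age)
corr_std = np.corrcoef(age_z, age_z ** 2)[0, 1]
print("corr(x, x^2) raw vs standardized:", corr_raw, corr_std)
```

The raw coefficients are hard to compare because they depend on each predictor's units, while the standardized ones are on a common scale. The drop in correlation with the squared term comes from the centering step and relies on the predictor being roughly symmetric.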
Comparison
Transformation / Improves coefficient interpretability / Meets model assumptions / Improves model generalization / Produces comparable predictors
Logarithmic X X
Square root X X
Standardization or Normalization X X
Centering X
