Dummy variables are independent variables which take the value of either 0 or 1. Just as a "dummy" is a stand-in for a real person, in quantitative analysis, a dummy variable is a numeric stand-in for a qualitative fact or a logical proposition. For example, a model to estimate demand for electricity in a geographical area might include the average temperature, the average number of daylight hours, the total number of structure square feet, numbers of businesses, numbers of residences, and so forth. It might be more useful, however, if the model could produce appropriate results for each month or each season. Using the number of the month, such as 12 for December, would be silly, because that implies that the demand for electricity is going to be very different between December and January, which is month 1. It also implies that Winter occurs during the same months everywhere, which would preclude the use of the model for the opposite polar hemisphere. Thus, another way to represent qualitative concepts such as season, male or female, smoker or non-smoker, etc., is required for many models to make sense. In a regression model, a dummy variable with a value of 0 will cause its coefficient to disappear from the equation. Conversely, the value of 1 causes the coefficient to function as a supplemental intercept, because of the identity property of multiplication by 1. This type of specification in a linear regression model is useful to define subsets of observations that have different intercepts and/or slopes without the creation of separate models. In logistic regression models, encoding all of the independent variables as dummy variables allows easy interpretation and calculation of the odds ratios, and increases the stability and significance of the coefficients. Examples of these results are in Section 3. 
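The seasonal encoding described above can be sketched in a few lines of Python. This is only an illustration: the helper names `season_of` and `season_dummies`, the three-month season boundaries, and the choice of winter as the omitted base category are assumptions, not part of the original model.

```python
# Illustrative sketch: helper names and season boundaries are assumptions.
SEASONS = ("winter", "spring", "summer", "autumn")

def season_of(month, southern_hemisphere=False):
    """Map a month number (1-12) to a season name."""
    # Dec-Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    index = (month % 12) // 3
    if southern_hemisphere:
        index = (index + 2) % 4  # seasons are shifted by half a year
    return SEASONS[index]

def season_dummies(month, southern_hemisphere=False, base="winter"):
    """Return 0/1 dummies for every season except the base category."""
    season = season_of(month, southern_hemisphere)
    return {name: int(name == season) for name in SEASONS if name != base}

# December and January receive identical dummies, unlike month numbers 12 and 1.
december = season_dummies(12)
january = season_dummies(1)
july = season_dummies(7)
```

Because the season is derived before the dummies are built, the same model can be reused in the other hemisphere by switching one flag.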
In addition to the direct benefits to statistical analysis, representing information in the form of dummy variables makes it easier to turn the model into a decision tool. Consider a risk manager who needs to assign credit limits to businesses. The age of the business is almost always significant in assessing risk. If the risk manager has to assign a different credit limit for each year in business, the scheme becomes extremely complicated and difficult to use, because some businesses are several hundred years old. Bivariate analysis of the relationship between age of business and default usually yields a small number of groups that are far more statistically significant than each year evaluated separately.
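As a rough sketch of the grouping idea, the age of a business can be mapped to a small set of band dummies. The band labels and cut points below are invented for illustration; in practice they would come from the bivariate analysis described above.

```python
# Hypothetical age bands: the labels and cut points are assumptions for illustration.
AGE_BANDS = [
    ("under_2_years", 0, 2),
    ("2_to_9_years", 2, 10),
    ("10_years_plus", 10, float("inf")),
]

def age_band_dummies(years_in_business):
    """Return one 0/1 dummy per age band; exactly one of them equals 1."""
    return {
        label: int(low <= years_in_business < high)
        for label, low, high in AGE_BANDS
    }

startup = age_band_dummies(1)
old_firm = age_band_dummies(250)
```

A business that is 250 years old and one that is 12 years old fall into the same band, so the decision tool needs one credit rule per band rather than one per year.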
In regression analysis, a dummy variable (also known as an indicator variable or just a dummy) is one that takes the values 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. For example, in econometric time series analysis, dummy variables may be used to indicate the occurrence of wars or major strikes. It could thus be thought of as a truth value represented as a numerical value 0 or 1 (as is sometimes done in computer programming).

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons: $D_1 = 1$ if the observation is for summer, and equals zero otherwise; $D_2 = 1$ if and only if autumn, otherwise equals zero; $D_3 = 1$ if and only if winter, otherwise equals zero; and $D_4 = 1$ if and only if spring, otherwise equals zero. In the panel data fixed effects estimator, dummies are created for each of the units in cross-sectional data (e.g. firms or countries) or periods in a pooled time-series. However, in such regressions either the constant term has to be removed or one of the dummies has to be removed, making this the base category against which the others are assessed, for the following reason. If dummy variables for all categories were included, their sum would equal 1 for all observations, which is identical to and hence perfectly correlated with the vector-of-ones variable whose coefficient is the constant term. If the vector-of-ones variable were also present, this would result in perfect multicollinearity, so that the matrix inversion in the estimation algorithm would be impossible. This is referred to as the dummy variable trap.

The addition of dummy variables never lowers the model fit (coefficient of determination), but it comes at a cost of fewer degrees of freedom and loss of generality of the model. Too many dummy variables result in a model that does not provide any general conclusions.

Describing qualitative data
Far from all of the data of interest to econometricians is quantitative. For instance, the gender of individuals, whether they are married, the industry of firms, and countries or regions are all considered to be qualitative. To include them in a regression, we use dummy variables. In many cases, the information can be described as being true or false, or the character present or absent. In those cases, it is easy to set up a binary variable or dummy variable taking values 0 and 1. For instance, male is usually set to 1 when the individual is male and 0 when female, while if we define female instead we would likely do the opposite. Those are clearer than a gender variable. The choice does not matter to the result, but it does to the interpretation!

Dummy variables are 'discrete' and 'qualitative' (e.g. male or female, in the labour force or not, renting or owning your home, working under a collective or individual employment contract, etc.). Normally 1 is assigned to the presence of some characteristic or attribute, and 0 to the absence of that characteristic or attribute. Units of measurement are 'meaningless'.

Describing categories or ranges
Dummy variables are also useful to describe categories. Indeed, even if the variable is not binary, if it takes a finite number of values then it can be described by a complete set of dummy variables. For instance, if eye colour can be brown, blue, green or red, we can have four dummy variables, one for each of these colours, taking 1 whenever an individual has eyes of this colour. (More complex for Bowie.) Notice that summing all variables in a complete set should give you 1 for all observations.

This technique can also be useful for quantitative data which you do not believe should be considered as one continuous variable. A dummy variable for several ranges allows you to distinguish the effects of what you might see as "thresholds".

Example 1: in the Mincer equation, we often use dummy variables for high school dropouts, high school graduates, etc.

Example 2: A regression model of labour market discrimination by gender.

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \varepsilon_i$$

where $Y_i$ = annual earnings, $S_i$ = years of education, and $G_i = 1$ if the $i$th person is a male, 0 if the $i$th person is a female.
Only the nature of the independent variables has changed. No special estimation issues arise as long as the regression meets all the classical assumptions.

Since $E(\varepsilon_i \mid S_i, G_i) = 0$, the expected salary of a female is:

$$E(Y_i \mid S_i, G_i = 0) = \beta_0 + \beta_1 S_i$$

The expected salary of a male is:

$$E(Y_i \mid S_i, G_i = 1) = \beta_0 + \beta_1 S_i + \beta_2 = (\beta_0 + \beta_2) + \beta_1 S_i$$

Testing for discrimination (i.e. $H_0: \beta_2 = 0$) is a test for a difference in the intercept terms.

Intercept shift
[Figure: two parallel wage lines against $S_i$. Men: wage $= (\beta_0 + \beta_2) + \beta_1 S_i$. Women: wage $= \beta_0 + \beta_1 S_i$. Both lines have slope $\beta_1$.]

Dummy Variable Trap: Suppose we estimate the following:

$$Y_i = \beta_1 + \beta_2 S_i + \beta_3 F_i + \beta_4 M_i + \varepsilon_i$$

where $F_i = 1$ if the $i$th person is female, 0 if the $i$th person is male, and $M_i = 1$ if the $i$th person is male, 0 if the $i$th person is female. This is known as the 'Dummy Variable Trap'. Suppose the sample looks like this:

Constant   Fi   Mi
   1        1    0
   1        0    1
   1        1    0
   1        0    1
   1        1    0
   1        1    0
   1        0    1

The problem is that the two dummies are a linear function of the constant (i.e. $F_i + M_i = 1$). This is perfect multicollinearity and violates Assumption (6). We are including redundant information in the regression, so the estimated coefficients and their standard errors cannot be computed. The solution is simple: drop a dummy variable or the constant term.

Rule of Thumb: If you have 'm' categories, then use 'm-1' dummies.

Slope dummy variables: We could allow for differences in the returns to education between men and women by adding an 'interacted' variable:

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \beta_3 G_i \cdot S_i + \varepsilon_i$$

This is a more 'flexible' specification. The expected salary of a female is:

$$E(Y_i \mid S_i, G_i = 0) = \beta_0 + \beta_1 S_i$$

The expected salary of a male is:

$$E(Y_i \mid S_i, G_i = 1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S_i$$

We now have both a 'composite' intercept term and a 'composite' slope coefficient for males.
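The dummy variable trap can be checked numerically on the seven-observation sample shown above. The sketch below is an illustration only: `matrix_rank` is a hypothetical helper that computes rank by exact Gaussian elimination, written out so that no external library is needed.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Columns of the sample in the text.
constant = [1, 1, 1, 1, 1, 1, 1]
female = [1, 0, 1, 0, 1, 1, 0]
male = [0, 1, 0, 1, 0, 0, 1]

full_design = [list(row) for row in zip(constant, female, male)]
reduced_design = [list(row) for row in zip(constant, female)]

trap_rank = matrix_rank(full_design)        # three columns, but one is redundant
reduced_rank = matrix_rank(reduced_design)  # two columns, both informative
```

With all three columns the design matrix has rank 2 rather than 3, so the matrix to be inverted in OLS is singular. Dropping one dummy leaves two columns of rank 2, which is the 'm-1' rule in action.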
In the wage model above, if $\beta_2 > 0$, then the male regression line has a higher intercept.

Using a set of dummy variables
What happens if we use a complete set of dummy variables? The four dummies sum to one, hence we have perfect collinearity. It is as if we had a single variable always equal to one (like for the intercept). The regression will not be able to identify the coefficients properly. One possible way out is then to drop the intercept. Each dummy coefficient will then be interpreted as the intercept for this specific group. Another (more common) possibility is to drop one variable in the set. This will be the baseline, and the other dummy coefficients will read directly as the difference from this baseline.

Dummy variables in R
There are many methods to create dummy variables from qualitative data. By default, R will automatically remove the last dummy variable if you provide a complete set. However, you are well-advised to do it yourself, as this will help with the interpretation, and also because other software may not be as kind.

Fixed effects
Dummy variables are also frequently used as fixed effects. Example from Alesina, Algan, Cahuc and Giuliano (2009). Typically, we might add time-fixed effects to our regression to capture structural changes underlying our regression. For instance, this could be a dummy variable for each year or each period (minus one). In many cases, it is also useful to define a set of individual-fixed effects to capture all unobserved individual characteristics. This might lead to a potentially large number of dummy variables, which is usually not a problem with modern computers. However, you must have several observations for each individual or you will not have degrees of freedom!

Dummy Dependent Variables Models
In this chapter we introduce models that are designed to deal with situations in which our dependent variable is a dummy variable. That is, it assumes either the value 0 or the value 1. Such models are very useful in that they allow us to address questions for which there is a "yes or no" answer.

1. Linear Probability Model
In the case of a dummy dependent variable model we have:

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$

where $y_i = 0$ or $1$ and $E(\varepsilon_i) = 0$. What would happen if we simply estimated the slope coefficients of this model using OLS? What would the coefficients mean? Would they be unbiased? Are they efficient? A regression model in the situation where the dependent variable takes on the two values 0 or 1 is called a linear probability model. To see its properties note the following.

a) Since the mean error is zero, we know that $E(y_i) = \beta_1 + \beta_2 x_i$.

b) Now, if we define $p_i = \mathrm{prob}(y_i = 1)$ and $1 - p_i = \mathrm{prob}(y_i = 0)$, then $E(y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$. Therefore, our model is $p_i = \beta_1 + \beta_2 x_i$, and the estimated slope coefficients would tell us the impact of a unit change in that explanatory variable on the probability that $y_i = 1$.

c) The predicted values from the regression model $\hat{p}_i = b_1 + b_2 x_i$ would provide predictions, based on some chosen values for the explanatory variables, for the probability that $y_i = 1$. There is, however, nothing in the estimation strategy that would constrain the resulting predictions from being negative or larger than 1, clearly an unfortunate characteristic of the approach.

d) Since $E(\varepsilon_i) = 0$ and the errors are uncorrelated with the explanatory variables (by assumption), it is easy to show that the OLS estimators are unbiased. The errors, however, are heteroscedastic. A simple way to see this is to consider an example. Suppose that the dependent variable takes the value 1 if the individual buys a Rolex watch and 0 otherwise. Also, suppose the explanatory variable is income. For low levels of income it is likely that all of the observations are zeros. In this case, there would be no scatter around the line. For higher levels of income there would be some zeros and some ones. That is, there would be some scatter around the line. Thus, the errors would be heteroscedastic. This suggests two empirical strategies. First, we know that the OLS estimators are unbiased but would yield incorrect standard errors. We might simply use OLS and then use the White correction to produce correct standard errors.

2. Logit and Probit Models
One potential criticism of the linear probability model (beyond those mentioned above) is that the model assumes that the probability that $y_i = 1$ is linearly related to the explanatory variable(s). We might, however, expect the relation to be nonlinear. For example, increasing the income of the very poor or the very rich will probably have little effect on whether they buy an automobile. It could, however, have a nonzero effect on other income groups. Two models that are nonlinear, yet provide predicted probabilities between 0 and 1, are the logit and probit models. The difference between the linear probability model and the nonlinear logit and probit models can be explained using an example.

To motivate these models, suppose that our underlying dummy dependent variable depends on an unobserved ("latent") utility index $y^*$. For example, if the variable $y$ is discrete, taking the value 1 if someone buys a car and 0 otherwise, then we can imagine a continuous variable $y^*$ that reflects a person's desire to buy the car. It seems reasonable that $y^*$ would vary continuously with some explanatory variable like income. More formally, suppose

$$y_i^* = \beta_1 + \beta_2 x_i + \varepsilon_i$$

and

$y_i = 1$ if $y_i^* \ge 0$ (i.e. the utility index is "high enough")
$y_i = 0$ if $y_i^* < 0$ (i.e. the utility index is not "high enough")

Then:

$$p_i = \mathrm{prob}(y_i = 1) = \mathrm{prob}(y_i^* \ge 0) = \mathrm{prob}(\beta_1 + \beta_2 x_i + \varepsilon_i \ge 0) = \mathrm{prob}(\varepsilon_i \ge -\beta_1 - \beta_2 x_i) = 1 - F(-\beta_1 - \beta_2 x_i)$$

where $F$ is the c.d.f. for $\varepsilon$, and

$$p_i = F(\beta_1 + \beta_2 x_i) \quad \text{if } F \text{ is symmetric.}$$

Given this, our basic problem is selecting $F$, the cumulative distribution function for the error term. It is here where the logit and probit models differ. As a practical matter, we are likely interested in estimating the $\beta$'s in the model. This is typically done using a Maximum Likelihood Estimator (MLE). To outline the MLE in this context, recognize that each outcome $y_i$ has the density function $f(y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$. That is, each $y_i$ takes on either the value of 0 or 1 with probability $f(0) = (1 - p_i)$ and $f(1) = p_i$. Then the likelihood function is:

$$L = f(y_1, y_2, \ldots, y_n) = f(y_1) f(y_2) \cdots f(y_n) = \left[p_1^{y_1} (1 - p_1)^{1 - y_1}\right] \left[p_2^{y_2} (1 - p_2)^{1 - y_2}\right] \cdots \left[p_n^{y_n} (1 - p_n)^{1 - y_n}\right] = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$

and

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$

which, given $p_i = F(\beta_1 + \beta_2 x_i)$, becomes

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln F(\beta_1 + \beta_2 x_i) + (1 - y_i) \ln\left(1 - F(\beta_1 + \beta_2 x_i)\right) \right]$$
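The log-likelihood above translates almost directly into code. The following is a minimal sketch, not an estimation routine: the function names and the four toy observations are assumptions made for illustration, and the choice of `cdf` stands in for the choice of F (logistic for logit, standard normal for probit).

```python
import math

def logistic_cdf(z):
    """F for the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """F for the probit model (standard normal c.d.f.)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta1, beta2, xs, ys, cdf):
    """ln L = sum of y*ln F(b1 + b2*x) + (1 - y)*ln(1 - F(b1 + b2*x))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(beta1 + beta2 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]

null_ll = log_likelihood(0.0, 0.0, xs, ys, logistic_cdf)     # every p_i equals 0.5
sloped_ll = log_likelihood(-2.5, 1.0, xs, ys, logistic_cdf)  # p_i rises with x
```

With both coefficients at zero every predicted probability is 0.5, so ln L equals n·ln(0.5). A maximum likelihood routine would search over the two coefficients for the values that make this function as large as possible.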
Analytically, the next step would be to take the partial derivatives of the likelihood function with respect to the $\beta$'s, set them equal to zero, and solve for the MLEs. This could be a very messy calculation depending on the functional form of $F$. In practice, the computer will solve this problem for us.

2.1. Logit Model
For the logit model we specify

$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_i)}}$$

It can be seen that $p(y_i = 1) \to 0$ as $\beta_1 + \beta_2 x_i \to -\infty$. Similarly, $p(y_i = 1) \to 1$ as $\beta_1 + \beta_2 x_i \to \infty$. Thus, unlike the linear probability model, probabilities from the logit will be between 0 and 1.

A complication arises in interpreting the estimated $\beta$'s. In the case of a linear probability model, a coefficient $b$ measures the ceteris paribus effect of a change in the explanatory variable on the probability that $y$ equals 1. In the logit model we can see that

$$\frac{\partial\, \mathrm{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(\beta_1 + \beta_2 x_i)}{\partial x_i} = \beta_2 \frac{e^{-(\beta_1 + \beta_2 x_i)}}{\left[1 + e^{-(\beta_1 + \beta_2 x_i)}\right]^2}$$

Notice that the derivative is nonlinear and depends on the value of $x$. It is common to evaluate the derivative at the mean of $x$ so that a single derivative can be presented.

Odds Ratio

$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_i)}}$$

For ease of exposition, we write the above equation as

$$p_i = \frac{1}{1 + e^{-z_i}} = \frac{e^{z_i}}{1 + e^{z_i}}, \quad \text{where } z_i = \beta_1 + \beta_2 x_i.$$

To avoid the possibility that the predicted values might be outside the probability interval of 0 to 1, we model the ratio $p_i / (1 - p_i)$. This ratio is the likelihood, or odds, of obtaining a successful outcome (the ratio of the probability that a family will own a car to the probability that it will not own a car):

$$\frac{p_i}{1 - p_i} = \frac{1 + e^{z_i}}{1 + e^{-z_i}} = e^{z_i}$$

If we take the natural log of the above equation, we obtain

$$L = \ln\left(\frac{p_i}{1 - p_i}\right) = z_i = \beta_1 + \beta_2 x_i$$

That is, the log of the odds ratio, $L$, is not only linear in $x$, but also linear in the parameters. $L$ is called the logit, and hence the name logit model.

The logit model cannot be estimated using OLS. Instead, we use MLE, discussed in the previous section, an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients. MLE is inherently different from least squares in that it chooses coefficient estimates that maximize the likelihood of the sample data set being observed. (OLS and MLE are not necessarily different. Interestingly, for a linear equation that meets the classical assumptions, including the normality assumption, the MLE estimates are identical to the OLS estimates.)

Once the logit has been estimated, hypothesis testing and econometric analysis can be undertaken in much the same way as for linear equations. When interpreting coefficients, however, be careful to recall that they represent the impact of a one unit increase in the independent variable in question, holding the other explanatory variables constant, on the log of the odds of a given choice, not on the probability itself. But we can always compute the probability at a certain level of the variable in question.

2.2. Probit Model
In the case of the probit model, we assume that $\varepsilon_i \sim N(0, \sigma^2)$. That is, we assume the error in the utility index model is normally distributed. In this case,

$$p(y_i = 1) = F\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)$$

where $F$ is the standard normal cumulative distribution function. That is,

$$p(y_i = 1) = F\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) = \int_{-\infty}^{\frac{\beta_1 + \beta_2 x_i}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt$$

In practice, the c.d.f.s of the logit and the probit look quite similar to one another. Once again, calculating the derivative is moderately complicated. In this case,

$$\frac{\partial\, \mathrm{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)}{\partial x_i} = f\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \frac{\beta_2}{\sigma}$$

where $f$ is the density function of the normal distribution. As in the logit case, the derivative is nonlinear and is often evaluated at the mean of the explanatory variables. In the case of dummy explanatory variables, it is common to estimate the derivative as the probability that $y_i = 1$ when the dummy variable is 1 (other variables set to their mean) minus the probability that $y_i = 1$ when the dummy variable is 0 (other variables set to their mean). That is, you simply calculate how the predicted probability changes when the dummy variable of interest switches from 0 to 1.

Which Is Better? Logit or Probit
Fortunately, from an empirical standpoint, logits and probits typically yield very similar estimates of the relevant derivatives. This is because the cumulative distribution functions for the logit and probit are similar, differing slightly only in the tails of their respective distributions. Thus, the derivatives are different only if there are enough observations in the tail of the distribution. While the derivatives are usually similar, it is important to remember that the parameter estimates associated with logit and probit models are not. A simple approximation suggests that multiplying the logit estimates by 0.625 makes the logit estimates comparable to the probit estimates.

Example: We estimate the relationship between the openness of a country $Y$ and a country's per capita income in dollars $X$ in 1992. We hypothesize that higher per capita income should be associated with free trade, and test this at the 5% significance level. The variable $Y$ takes the value of 1 for free trade, 0 otherwise. Since the dependent variable is a binary variable, we set up the index function
$$Y_i^* = \beta_1 + \beta_2 X_i$$

If $Y^* \ge 0$, $Y = 1$ (open); if $Y^* < 0$, $Y = 0$ (not open).

Probit estimation gives the following results:

Dependent Variable: Y
Method: ML - Binary Probit (Quadratic hill climbing)
Date: 05/27/04   Time: 13:54
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives

Variable   Coefficient   Std. Error   z-Statistic   Prob.
C          -1.9942       0.8247       -2.4180       0.0156
X           0.00100      0.00047       2.1295       0.0332

Mean dependent var       0.5000     S.D. dependent var       0.5130
S.E. of regression       0.3373     Akaike info criterion    0.8865
Sum squared resid        2.0476     Schwarz criterion        0.9860
Log likelihood          -6.8647     Hannan-Quinn criter.     0.9059
Restr. log likelihood  -13.8629     Avg. log likelihood     -0.3432
LR statistic (1 df)     13.9964     McFadden R-squared       0.5048
Probability(LR stat)     0.00018

The slope is significant at the 5% level. The interpretation of $b_2$ changes in a probit model: $b_2$ is the effect of $X$ on $Y^*$. The marginal effect of $X$ on $p(Y_i = 1)$ is easier to interpret and is given by $f(b_1 + b_2 X) \cdot b_2$. Evaluated at $X = 3469.5$:

$$f(-1.9942 + 0.001(3469.5)) \cdot (0.001) = 0.0001$$

To test the fit of the model (analogous to R-squared), the maximized log-likelihood value ($\ln L$) can be compared to the maximized log likelihood in a model with only a constant ($\ln L_0$) in the likelihood ratio index:

$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.8647}{-13.8629} = 0.50$$

Logit estimation gives the following results:

Dependent Variable: Y
Method: ML - Binary Logit (Quadratic hill climbing)
Date: 05/27/04   Time: 14:12
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives

Variable   Coefficient   Std. Error   z-Statistic   Prob.
C          -3.6050       1.6811       -2.1445       0.0320
X           0.0018       0.0009        1.9954       0.0460

Mean dependent var       0.5000     S.D. dependent var       0.5130
S.E. of regression       0.3337     Akaike info criterion    0.8766
Sum squared resid        2.0049     Schwarz criterion        0.9762
Log likelihood          -6.7664     Hannan-Quinn criter.     0.8961
Restr. log likelihood  -13.8629     Avg. log likelihood     -0.3383
LR statistic (1 df)     14.1929     McFadden R-squared       0.5119
Probability(LR stat)     0.00016

As you can see from the output, the slope coefficient is significant at the 5% level. The coefficients are proportionally higher in absolute value than in the probit model, but the marginal effects and significance should be similar.

$$\frac{\partial\, \mathrm{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(b_1 + b_2 x_i)}{\partial x_i} = b_2 \frac{e^{-(b_1 + b_2 x_i)}}{\left[1 + e^{-(b_1 + b_2 x_i)}\right]^2} = \lambda(b_1 + b_2 X) \cdot b_2$$

$$\lambda(b_1 + b_2 X) \cdot b_2 = \frac{e^{-(-3.605 + 0.0018(3469.5))}}{\left(1 + e^{-(-3.605 + 0.0018(3469.5))}\right)^2} (0.0018) = 0.0001$$

This can be interpreted as the marginal effect of GDP per capita on the expected value of $Y$.

$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.7664}{-13.8629} = 0.51$$
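The marginal effects and likelihood ratio indices quoted in the example can be reproduced from the reported estimates. The sketch below only redoes that arithmetic. It takes the coefficients, the log-likelihood values and the evaluation point X = 3469.5 from the text as given; the function and variable names are assumptions.

```python
import math

def normal_pdf(z):
    """Standard normal density f, used for the probit marginal effect."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_pdf(z):
    """Logistic density e^(-z) / (1 + e^(-z))^2, used for the logit marginal effect."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

X_VALUE = 3469.5  # evaluation point used in the text

# Marginal effect = density at the index value times the slope coefficient.
probit_effect = normal_pdf(-1.9942 + 0.001 * X_VALUE) * 0.001
logit_effect = logistic_pdf(-3.605 + 0.0018 * X_VALUE) * 0.0018

# Likelihood ratio index: 1 - lnL / lnL0.
probit_lri = 1.0 - (-6.8647) / (-13.8629)
logit_lri = 1.0 - (-6.7664) / (-13.8629)
```

Both marginal effects round to 0.0001 even though the logit slope (0.0018) is almost twice the probit slope (0.001), which is the point made above about coefficients differing while derivatives stay similar.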