
Solution

1.

The mean total expenditure on food is 18.78 with a standard deviation of 11.37, and the median
is 16.822. Because the mean is greater than the median, the distribution of total expenditure on
food appears to be skewed to the right, which suggests that the assumption of normality is
violated.

The mean total expenditure on transportation is 13.71 with a standard deviation of 13.38, and
the median is 10.54. Because the mean is greater than the median, the distribution of total
expenditure on transportation also appears to be skewed to the right, which suggests that the
assumption of normality is violated.
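The mean-versus-median comparison above can be backed up with a skewness statistic and a formal test. The following is a minimal sketch, assuming the data are already in memory; `food` is the variable name used in the regress command later in this solution, while `trans` is a hypothetical name for the transportation expenditure variable.

```stata
* Assumes the survey data are already loaded.
* trans is a hypothetical variable name; replace it with the actual one.
summarize food trans, detail
* The detail option reports the mean, SD, median (50th percentile) and skewness.
* A positive skewness value is consistent with a right-skewed distribution.

* Skewness and kurtosis tests for normality
sktest food trans
```

A small p-value from sktest would support the conclusion that the variable is not normally distributed.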

2.

Model – 1 (Dependent variable – Food)


The regression equation is

Food = 38.515 – 0.211 * Sex – 0.0465 * Age – 1.277 * Education – 0.0259 * Family Size

The coefficient of determination is 0.0474, indicating that 4.74% of the variation in the
dependent variable is explained by the regression model and the remaining 95.26% is left
unexplained.

The 95% confidence interval for the coefficient on Age is (– 0.066, – 0.027).

Model – 2 (Dependent Variable – Transportation)

The regression equation is

Transportation = 19.103 – 0.525 * Sex – 0.0476 * Age – 0.237 * Education – 0.378 * Family Size

The coefficient of determination is 0.0082, indicating that 0.82% of the variation in the
dependent variable is explained by the regression model and the remaining 99.18% is left
unexplained.

The 95% confidence interval for the coefficient on Age is (– 0.071, – 0.0241).

3.
[Figure: residuals versus fitted values for the Food model]

The first test for heteroscedasticity reported by imtest is White's test. Its null hypothesis is
that the variance of the residuals is homogeneous. Since the p-value is very small, we reject the
null hypothesis and conclude that the variance of the residuals in the Food model is not
homogeneous.

[Figure: residuals versus fitted values for the Transportation model]

The first test for heteroscedasticity reported by imtest is White's test. Its null hypothesis is
that the variance of the residuals is homogeneous. Since the p-value is very small, we reject the
null hypothesis and conclude that the variance of the residuals in the Transportation model is
not homogeneous.
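The residual plots and White's test described in this part can be reproduced along the following lines. This is a sketch, not the original do-file: the regressor names are taken from the regress command shown later in this solution, and `trans` is a hypothetical name for the transportation expenditure variable.

```stata
* Food model: fit, plot residuals against fitted values, run White's test
regress food sex_ref age_ref educ_ref fam_size
rvfplot, yline(0)
estat imtest, white

* Transportation model (trans is a hypothetical variable name)
regress trans sex_ref age_ref educ_ref fam_size
rvfplot, yline(0)
estat imtest, white
```

Both rvfplot and estat imtest use the most recently fitted regression, so each must be run directly after the corresponding regress command.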

4. Model – 1 (Dependent variable – Food)


The regression equation is

Food = 22.14 – 0.000444 * Total Expense – 0.0357 * Income hours

The coefficient of determination is 0.1072, indicating that 10.72% of the variation in the
dependent variable is explained by the regression model and the remaining 89.28% is left
unexplained.

Model – 2 (Dependent Variable – Transportation)

The regression equation is

Transportation = 11.86 – 0.00063 * Total Expense – 0.0435 * Income hours

The coefficient of determination is 0.1081, indicating that 10.81% of the variation in the
dependent variable is explained by the regression model and the remaining 89.19% is left
unexplained.

5.

[Figure: residuals versus fitted values for the Food model]

The first test for heteroscedasticity reported by imtest is White's test. Its null hypothesis is
that the variance of the residuals is homogeneous. Since the p-value is very small, we reject the
null hypothesis and conclude that the variance of the residuals in the Food model is not
homogeneous.

[Figure: residuals versus fitted values for the Transportation model]

The first test for heteroscedasticity reported by imtest is White's test. Its null hypothesis is
that the variance of the residuals is homogeneous. Since the p-value is very small, we reject the
null hypothesis and conclude that the variance of the residuals in the Transportation model is
not homogeneous.
6.

Heteroscedasticity is a violation of the assumption that “Variances among the groups are equal”.
It occurs when the variance of the error terms differs across observations. Heteroscedasticity
has serious consequences for the OLS estimator. Although the OLS estimator remains unbiased, the
estimated standard errors are wrong, so confidence intervals and hypothesis tests cannot be
relied on. In addition, the OLS estimator is no longer BLUE. If the form of the
heteroscedasticity is known, it can be corrected (via an appropriate transformation of the data)
and the resulting estimator, generalized least squares (GLS), can be shown to be BLUE.
The effects of heteroscedasticity are:

• OLS is still unbiased and consistent

• The standard errors of the estimates are biased
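When the form of the heteroscedasticity is not known, a common alternative to GLS is to keep the OLS coefficients and replace the standard errors with heteroscedasticity-robust (Huber-White) ones. This is a different remedy from the GLS correction described above, shown here only as a sketch using the variable names from the regress command later in this solution.

```stata
* Conventional OLS standard errors
regress food sex_ref age_ref educ_ref fam_size

* Same coefficients, heteroscedasticity-robust standard errors
regress food sex_ref age_ref educ_ref fam_size, vce(robust)
```

The coefficient estimates are identical in the two runs; only the standard errors, t statistics and confidence intervals change.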

7.

Multicollinearity is an issue when there is a strong linear relationship among the independent
variables included in the model. In cases of perfect multicollinearity, the OLS estimators are
not even defined. Strictly, collinearity refers to an exact linear relationship between two or
more explanatory variables, and multicollinearity to more than one such exact relationship. In
perfect collinearity the linear relationship between the variables is exact, whereas in
imperfect collinearity the relationship is only approximate.
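One way to check for imperfect multicollinearity is to look at variance inflation factors and pairwise correlations among the regressors. The sketch below assumes the variable names from the regress command later in this solution; it is an illustration, not part of the original analysis.

```stata
regress food sex_ref age_ref educ_ref fam_size

* Variance inflation factors for the regressors of the model just fitted
estat vif

* Pairwise correlations among the regressors
correlate sex_ref age_ref educ_ref fam_size
```

Large variance inflation factors indicate that a regressor is close to a linear combination of the others.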

Here, we see that the total expense variable is the main cause of the heteroscedasticity.

The assumption of homogeneity of variances should also be validated before performing the
regression analysis. It is defined as “Variances among the groups are equal”, and it is violated
when the variance of the error terms differs greatly across observations. Heteroscedasticity has
a serious influence on the least squares estimator.

Question 2

1.

This assumption represents the equality of variances assumption. If observations across
households were potentially correlated, then this assumption is violated, and we can also say
that multicollinearity is present.

2.

Model – 1 (Dependent variable – Food)

The coefficient of determination is 0.0532, indicating that 5.32% of the variation in the
dependent variable is explained by the regression model and the remaining 94.68% is left
unexplained.

Here, the dummy variable for the South region seems to be a significant predictor of food
expenditure.

Food expenditure seems to be highest in the South region and lowest in the Northeast region.

3.
The coefficient of determination is 0.053, indicating that 5.3% of the variation in the
dependent variable is explained by the regression model and the remaining 94.7% is left
unexplained.

Here, the dummy variable for the South region seems to be a significant predictor of food
expenditure.

Food expenditure seems to be highest in the South region and lowest in the Midwest region.

4.
Here, we see that as income increases, the amount spent on food decreases. A good is said to be
a necessity if its share decreases with income, so food is a necessity good.

5.

The regression output is given below

The coefficient of determination is 0.0499, indicating that 4.99% of the variation in the
dependent variable is explained by the regression model and the remaining 95.01% is left
unexplained.

Apart from Sex and Family Size, all other independent variables seem to be significant
predictors of Food expenditure.

The STATA code is given below

regress food sex_ref age_ref educ_ref fam_size age2, vce(bootstrap, reps(200) seed(1234)) beta
The regression output is given below

The coefficient of determination is 0.0499, indicating that 4.99% of the variation in the
dependent variable is explained by the regression model and the remaining 95.01% is left
unexplained.

Apart from Sex and Family Size, all other independent variables seem to be significant
predictors of Food expenditure.

On comparing the bootstrap results with those obtained via the delta method, we see that there
seems to be no difference in the regression findings.
