
SUBMITTED BY: AVANI MISHRA [2016AAPS0282U]

Question: In excel file column 1 provides the name of US states for which data is collected; column 2
provides data on Gross Domestic Product (GDP) of respective states; column 3 provides the data on
Total Population, and Column 4 provides data on total Land Area (Area) of these respective states.

(Hint: You may use log transformation on these variables to reduce their scale as they are given in
large numbers)

Estimate the impact of Population and Land Area on GDP.

1. Are the coefficients associated with Population and land Area significant? (Use t-test)

2. Interpret these coefficients.

3. Which of the two variables is more important in determining GDP.

4. Determine the value of r-square when all the three variables are included in model.

5. What will be r-square if you drop land Area variable from the equation?

6. Estimate the standard error of these coefficients in this model.

6. Examine the various properties of Error term obtained from this model when estimated with all
the variables.

7. Can you check whether there is a problem of multicollinearity?

8. Plot the predicted and actual values of GDP over time. What do you infer?

9. Based on evidence from the data, what will be your advice to the Central Government?

Null Hypotheses:

• H0: β1 = 0
• H0: β2 = 0
• H0: β3 = 0

Question 1. Are the coefficients associated with Population and land Area significant?
(Use t-test)
Answer: The p-values obtained from the regression are 1.927 × 10^-34 for population and
0.9598694 for land area. At the 5% significance level, a coefficient is considered statistically
significant if p < 0.05. Here, the p-value for population is far below 0.05, whereas the p-value
for land area is well above 0.05.
The t-statistics obtained from the regression are 33.47558 and 0.05059 respectively. As a rule
of thumb, a coefficient is significant at the 5% level if |t| > 2 (with 50 observations and 3
estimated parameters there are 47 degrees of freedom, for which the two-tailed critical value is
about 2.01). Here, the t-statistic for population is far above 2, whereas the t-statistic for land
area is far below 2.

Hence, the coefficient associated with population is statistically significant and we reject the
null hypothesis (H0: β2 = 0), whereas the coefficient associated with land area is not
statistically significant and we fail to reject the null hypothesis (H0: β3 = 0).
                       Coefficients   Standard Error   t Stat         P-value     Lower 95%   Upper 95%
Intercept (β1)         -1.260732632   0.209994771      -6.003638223   2.66E-07    -1.68319    -0.83828
log(Population) (β2)    0.98564387    0.029443672      33.47557603    1.927E-34    0.926411    1.044877
log(Area) (β3)          0.001314975   0.025994599       0.050586484   0.9598694   -0.05098     0.053609
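
The t-statistics and 95% bounds in the table can be reproduced from the coefficients and standard errors alone. The following is a minimal sketch in Python, not the spreadsheet procedure used for the report. The critical value 2.0117 is an assumption: it is taken as the two-tailed 5% value of the t distribution with 50 - 3 = 47 degrees of freedom, and the function name `t_test` is illustrative.

```python
# Coefficients and standard errors copied from the regression table.
estimates = {
    "log(Population)": (0.98564387, 0.029443672),
    "log(Area)": (0.001314975, 0.025994599),
}

# Assumed two-tailed 5% critical value of t with 47 degrees of freedom.
T_CRIT = 2.0117


def t_test(coef, se, t_crit=T_CRIT):
    """Return the t-statistic, the 95% interval, and whether |t| exceeds t_crit."""
    t_stat = coef / se
    lower = coef - t_crit * se
    upper = coef + t_crit * se
    return t_stat, (lower, upper), abs(t_stat) > t_crit


results = {name: t_test(coef, se) for name, (coef, se) in estimates.items()}

for name, (t_stat, (lower, upper), significant) in results.items():
    print(f"{name}: t = {t_stat:.5f}, 95% CI = [{lower:.6f}, {upper:.6f}], "
          f"significant = {significant}")
```

An interval that contains zero, as the one for log(Area) does, leads to the same conclusion as a t-statistic below the critical value.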

[Figure: log(Population) Line Fit Plot. Actual log(GDP) and predicted log(GDP) plotted against log(Population).]

[Figure: log(Area) Line Fit Plot. Actual log(GDP) and predicted log(GDP) plotted against log(Area).]

Question 2. Interpret these coefficients.
Answer: The model is estimated in logarithms, so each slope coefficient is an elasticity: the
percentage change in GDP associated with a 1% change in the regressor, holding the other
regressor constant.
The coefficient associated with population is 0.985644 and is statistically significant, so the
null hypothesis (H0: β2 = 0) is rejected. Population has a positive and significant impact on
GDP: a 1% increase in population is associated with an increase of about 0.99% in GDP, so
GDP rises almost in proportion to population.
The coefficient associated with land area is 0.001315 and is not statistically significant, so the
null hypothesis (H0: β3 = 0) is not rejected. The estimate is very close to zero, implying that
land area has little to no impact on GDP.
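
Because the regression is in logs, the effect of a population change on GDP can be worked out directly from the slope. The sketch below is an illustration only and the function name `gdp_multiplier` is hypothetical. It computes the factor by which predicted GDP changes when population is scaled by a given ratio with land area held fixed. The result does not depend on the base of the logarithm used in the regression.

```python
# Slope on log(Population), copied from the regression table.
BETA_POPULATION = 0.98564387


def gdp_multiplier(population_ratio, beta=BETA_POPULATION):
    """Factor by which predicted GDP changes when population is scaled by
    population_ratio, holding land area fixed (log-log model)."""
    return population_ratio ** beta


one_percent = gdp_multiplier(1.01)
ten_percent = gdp_multiplier(1.10)
print(f"+1% population  -> GDP x {one_percent:.5f}")
print(f"+10% population -> GDP x {ten_percent:.5f}")
```

A 10% larger population therefore corresponds to a predicted GDP that is a little under 10% larger, which is what an elasticity slightly below one means.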

Question 3. Which of the two variables is more important in determining GDP?

Answer: Both regressors are in logarithms, so their coefficients can be compared directly as
elasticities. The magnitude of the coefficient associated with population (β2 = 0.985644) is
much greater than the magnitude of the coefficient associated with land area (β3 = 0.001315),
and only the population coefficient is statistically significant (t = 33.48 versus t = 0.05).
Population therefore has a much stronger impact on GDP than land area and is the more
important variable in determining GDP.

Question 4. Determine the value of r-square when all the three variables are included in
model.
Answer: The value of r-square is 0.960831. This implies that about 96% of the total variation
in log(GDP) is explained by log(population) and log(land area).
Regression Statistics
Multiple R 0.980219778
R Square 0.960830813
Adjusted R Square 0.959164039
Standard Error 0.090032307
Observations 50

Question 5. What will be r-square if you drop land Area variable from the equation?
Answer: When we drop land area from the equation and only population is included as an
independent variable, the value of r-square falls by a negligible amount, from 0.960831 to
0.960829. The adjusted r-square actually rises, from 0.959164 to 0.960013, because the model
has one fewer regressor.
This shows that land area adds almost no explanatory power beyond that of population. This
is in line with what is inferred from the answer to question 3.
Regression Statistics
Multiple R 0.980219
R Square 0.960829
Adjusted R Square 0.960013
Standard Error 0.089092
Observations 50
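
The adjusted r-square values in the two regression outputs follow from the r-square, the number of observations, and the number of regressors. A minimal sketch of that calculation, assuming the standard adjustment formula and using an illustrative function name:

```python
def adjusted_r_square(r_square, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 50  # observations reported in both regression outputs

# Full model: log(Population) and log(Area). Reduced model: log(Population) only.
full = adjusted_r_square(0.960830813, N_OBS, n_regressors=2)
reduced = adjusted_r_square(0.960829, N_OBS, n_regressors=1)
print(f"full model:    {full:.6f}")
print(f"reduced model: {reduced:.6f}")
```

The reduced model has the higher adjusted value because the penalty for the extra regressor outweighs the tiny gain in r-square from including land area.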
Question 6. Estimate the standard error of these coefficients in this model.
Answer: The standard error of the coefficient associated with population is 0.029444 and the
standard error of the coefficient associated with land area is 0.025995. The standard error of
the intercept is 0.209995.

Question 6. Examine the various properties of Error term obtained from this model
when estimated with all the variables.
Answer: Properties of the error term (residuals):
• Mean: The average value of the error term is -3.535 × 10^-15, which is zero up to
floating-point rounding. In a least-squares regression that includes an intercept, the
residuals sum to zero by construction, so the property that the mean value of the
error term is zero is satisfied.

• Skewness: The skewness of the error term is 0.120840815. Skewness measures the
asymmetry of the distribution around the mean value. The small positive value
implies that the right tail is slightly longer than the left tail, but the distribution is
close to symmetric.

• Variance: The variance of the error term is 0.007774967. The variance is very small,
which shows that the error term stays close to its mean value of zero. The standard
deviation of the error term is 0.088175771.

• Histogram Plot: The plot of the values of the error term:

[Figure: Histogram Plot of Error Terms. Residuals plotted against observation number; vertical axis from -0.25 to 0.25.]

The plot does not exactly reflect a normal distribution; however, the error term
fluctuates around zero. This is consistent with a mean error of almost zero, and we
can conclude that the regression fit is reasonably good.
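
The three summary measures listed above can be computed from any list of residuals with a few lines of code. The sketch below is illustrative: it assumes the sample variance uses an n - 1 denominator and that skewness is the bias-adjusted sample skewness that spreadsheet functions typically report. The function name `describe` and the toy input are hypothetical and are not the report's data.

```python
import math


def describe(residuals):
    """Return mean, sample variance (n - 1 denominator), and
    bias-adjusted sample skewness of a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((e - mean) ** 2 for e in residuals) / (n - 1)
    std_dev = math.sqrt(variance)
    skewness = (n / ((n - 1) * (n - 2))) * sum(
        ((e - mean) / std_dev) ** 3 for e in residuals
    )
    return mean, variance, skewness


# Toy input for illustration only (not the report's residuals).
toy = [0.0, 0.0, 0.0, 4.0]
mean, variance, skewness = describe(toy)
print(f"mean = {mean:.4f}, variance = {variance:.4f}, skewness = {skewness:.4f}")
```

Passing the 50 values of the Residuals column to `describe` would give the corresponding figures for this model, provided the report used the same formulas.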

The data for residuals along with the properties is as follows:


Observation   Predicted log(GDP)   Residuals   log(GDP)
1 5.298100725 -0.121055067 5.177045657
2 4.460850448 0.189476506 4.650326954
3 5.359755397 -0.003542088 5.35621331
4 5.080276972 -0.118874864 4.961402108
5 6.167842991 0.071546591 6.239389583
6 5.28422903 0.085558087 5.369787117
7 5.182544666 0.12148461 5.304029276
8 4.553102231 0.204999333 4.758101564
9 5.845761631 -0.025500386 5.820261245
10 5.55939019 0.003864196 5.563254386
11 4.740280355 0.022975385 4.763255739
12 4.769914659 -0.05841956 4.7114951
13 5.737768279 0.027224844 5.764993123
14 5.431810513 -0.049914225 5.381896288
15 5.119005112 -0.009774275 5.109230837
16 5.08292485 -0.028438196 5.054486654
17 5.257051265 -0.107013616 5.150037648
18 5.300166045 0.013441786 5.313607831
19 4.763045646 -0.111564104 4.651481542
20 5.372016691 0.050200409 5.422217099
21 5.449469237 0.09282949 5.542298728
22 5.642397263 -0.114217433 5.528179831
23 5.341592186 0.047417879 5.389010065
24 5.106794862 -0.181111561 4.925683301
25 5.396579923 -0.061927166 5.334652757
26 4.615868148 -0.11094895 4.504919198
27 4.889525118 0.012961867 4.902486985
28 4.956097845 0.095066259 5.051164103
29 4.748972612 0.003628921 4.752601533
30 5.569992083 0.060196712 5.630188795
31 4.915930792 -0.067760156 4.848170636
32 5.919169662 0.087873629 6.007043292
33 5.55209404 0.033470457 5.585564497
34 4.469938583 0.064874127 4.534812711
35 5.69916798 -0.077077318 5.622090662
36 5.189680343 -0.062102616 5.127577727
37 5.186222584 0.083822395 5.270044979
38 5.732852091 -0.033497472 5.699354619
39 4.67738551 -0.037271938 4.640113572
40 5.253737907 -0.097558396 5.156179511
41 4.539167256 -0.002066285 4.537100972
42 5.403431188 -0.034220898 5.369210289
43 5.960473204 0.100189892 6.060663095
44 5.003524514 0.03122022 5.034744734
45 4.445955449 -0.08483227 4.361123179
46 5.496934007 0.077961515 5.574895522
47 5.418845639 0.073783464 5.492629103
48 4.912482238 -0.166120531 4.746361707
49 5.378366103 -0.032520102 5.345846002
50 4.357628328 0.141260899 4.498889227

Average value of the error term   -3.53495E-15
Skewness                           0.120840815
Variance                           0.007774967
Standard Deviation                 0.088175771
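
The residual variance above is consistent with the "Standard Error" of 0.090032307 reported in the regression statistics. The two differ only in their denominators. The sketch below assumes the reported variance uses n - 1 = 49 and that the regression standard error uses the residual degrees of freedom, 50 - 3 = 47.

```python
import math

N_OBS = 50                       # observations
N_PARAMS = 3                     # intercept plus two slopes
RESIDUAL_VARIANCE = 0.007774967  # sample variance of the residuals (n - 1 denominator)

# Recover the residual sum of squares, then divide by the residual degrees of freedom.
sse = RESIDUAL_VARIANCE * (N_OBS - 1)
regression_se = math.sqrt(sse / (N_OBS - N_PARAMS))
print(f"SSE = {sse:.6f}, regression standard error = {regression_se:.6f}")
```

The result agrees with the regression output to about six decimal places, which supports the assumption about the denominators.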
Question 7. Can you check whether there is a problem of multicollinearity?
Answer: The correlation coefficient between population and land area is 0.165932339. This
indicates a low correlation between the two regressors, population and land area. Hence, there
is no evidence of a multicollinearity problem in this dataset.

Correlation Coefficient (r)   0.165932339
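
With exactly two regressors, the correlation coefficient can be turned into a variance inflation factor (VIF), a common way to quantify multicollinearity. This is an additional check and not part of the original spreadsheet output. The sketch uses the two-regressor formula VIF = 1 / (1 - r^2). A VIF near 1 means the standard errors are barely inflated, and thresholds such as 5 or 10 are rules of thumb only.

```python
R_POP_AREA = 0.165932339  # correlation coefficient between the two regressors


def vif_two_regressors(r):
    """Variance inflation factor when the model has exactly two regressors."""
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_regressors(R_POP_AREA)
print(f"r^2 = {R_POP_AREA ** 2:.6f}, VIF = {vif:.4f}")
```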

Question 8. Plot the predicted and actual values of GDP over time. What do you infer?

[Figure: Plot of Actual GDP, Predicted GDP, and Residual Term. Series: Predicted log(GDP), Residuals, and log(GDP), plotted against observation number; vertical axis from -1 to 7.]


Answer: The plot suggests that the predicted values from the regression model very closely
follow the actual values of log(GDP). The residual series shows that the deviation between
the actual and predicted values is very small. This implies that the regression model is
predicting the GDP fairly accurately. Therefore, this regression model can be used for
predicting the future values of the GDP.
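
Predicted values come from plugging the regressors into the estimated equation, and each residual is the actual value minus the predicted value. A minimal sketch using the coefficients from the regression table follows. The input values passed to `predict_log_gdp` are hypothetical and are not taken from the dataset. The residual check uses observation 1 from the residual table.

```python
# Coefficients copied from the regression table.
INTERCEPT = -1.260732632
BETA_POPULATION = 0.98564387
BETA_AREA = 0.001314975


def predict_log_gdp(log_population, log_area):
    """Fitted value of log(GDP) from the estimated equation."""
    return INTERCEPT + BETA_POPULATION * log_population + BETA_AREA * log_area


# Hypothetical inputs for illustration only (not taken from the dataset).
example = predict_log_gdp(log_population=6.5, log_area=4.5)

# Residual = actual - predicted, using observation 1 from the residual table.
residual_1 = 5.177045657 - 5.298100725
print(f"predicted log(GDP) = {example:.6f}")
print(f"residual for observation 1 = {residual_1:.9f}")
```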

Question 9. Based on evidence from the data, what will be your advice to the Central
Government?
Answer: Based on the evidence from the data, population is an important determinant of
GDP, whereas land area is not.
In the light of the analysis performed, I would advise the Central Government to invest more
in the states that have a higher population to support and increase the GDP.
