
Econometrics Project - Total household expenses function of disposable total household income between 2001 and 2011 -

Constantin Doina, Group 331

Hypothesis

We are interested in exploring the relationship between households' total expenses and households' total income (both measured in RON) between 2001 and 2011. In a survey for the NIS, all households were asked about their monthly expenses. In addition, data were collected about the monthly income of each household. To simplify data collection, we gathered the data by region (North East, South East, South Walachia, South West Oltenia, West, North West, Centre and Bucharest Ilfov) and then summarized it. The following data were obtained:

The data

Total household expenses function of disposable total household income between 2001 and 2011

Year    Total Income (independent, x)    Total Expense (dependent, y)
2001    521,79                           516,52
2002    658,51                           651,66
2003    795,09                           781,45
2004    1085,79                          1049,94
2005    1212,18                          1149,33
2006    1386,32                          1304,66
2007    1570,72                          1479,15
2008    1749,98                          1642,07
2009    1929,25                          1804,99
2010    2108,52                          1967,91
2011    2287,79                          2130,83

Legend

A period of 11 years between 2001 and 2011 is selected.
Income = total household income; this is the independent variable, also called the exogenous variable.
Expense = total household expense; this is the dependent variable, also called the endogenous variable.

Source

We've compiled the data into one table using two statistics from www.insse.ro:
Total household expenses: http://www.insse.ro/cms/files/statistici/Statistica%20teritoriala%202008/rom/24.htm
Total household income: http://www.insse.ro/cms/files/statistici/Statistica%20teritoriala%202008/rom/20.htm

The regression model

$\text{expenses} = \beta_0 + \beta_1 \cdot \text{income} + \varepsilon$

expenses = 51.68 + 0.90 * income
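The estimated coefficients can be checked with a short least-squares computation. The following is a minimal sketch, not the procedure used in the report (which relies on the spreadsheet output shown later): it assumes the eleven yearly values from the data table, with decimal commas converted to points, and the helper name `fit_line` is hypothetical.

```python
# Yearly values from the data table (RON), decimal commas written as points.
income = [521.79, 658.51, 795.09, 1085.79, 1212.18, 1386.32,
          1570.72, 1749.98, 1929.25, 2108.52, 2287.79]
expenses = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66,
            1479.15, 1642.07, 1804.99, 1967.91, 2130.83]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(income, expenses)
print(f"expenses = {b0:.2f} + {b1:.4f} * income")
```

Because the table values are rounded to two decimals, the result may differ from the spreadsheet output in the last digits.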

[Figure: Total household expenses function of disposable total household income between 2001 and 2011]

Interpretation

The Y intercept $\beta_0$ = 51.68 has no economic meaning in this situation (unless a household has expenses without having any income). The slope $\beta_1$ is 0.90 (0.9088 in the regression output). This means that for every additional RON of income, we would expect expenses to increase by about 0.90 RON on average.

The output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0,999942345
R Square             0,999884693
Adjusted R Square    0,999871881
Standard Error       6,122637185
Observations         11

ANOVA
              df    SS             MS          F           Significance F
Regression    1     2925586,136    2925586     78043,34    4,91E-19
Residual      9     337,3801748    37,48669
Total         10    2925923,516

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95,0%    Upper 95,0%
Intercept     51,68497469     4,888490393       10,57279    2,25E-06    40,62644     62,74351     40,62644       62,74351
Income        0,908795135     0,003253105       279,3624    4,91E-19    0,901436     0,916154     0,901436       0,916154

Interpretation

Significance of the model (Did the model explain the deviations in the dependent variable?)

Significance F is the p-value of the F test for the overall significance of the model. In this case, Significance F (4.91E-19) is lower than .05, meaning that the model is statistically significant: income explains a significant share of the variation in expenses.

Sum of squares

The SST (Total Sum of Squares) is the total variation of the dependent variable around its mean. The aim of the regression is to explain as much of this variation as possible (by finding the betas that minimize the sum of the squared residuals). In our example, SST is equal to 2925923.516.

The SSR (Regression Sum of Squares) is the part of the SST that is explained by the model, that is, the variation attributable to the linear relationship between x and y (in this case 2925586.136).

Determination

The coefficient of determination $R^2$ ($R^2 = SSR / SST$) measures the proportion of the variation in the dependent variable (expenses) that is explained by variations in the independent variable (income). It reports that 99.988% of the variation in total household expenses is explained by the variation in total household income. Because $R^2$ is close to 1, the regression line fits the data well.

The adjusted $R^2$ measures the same proportion, corrected for the number of independent variables and the sample size. In this example, the adjusted $R^2$ shows that 99.987% of the variance was explained. For the problem we are analyzing, $R^2$ = .99988 and the adjusted $R^2$ = .99987. These values are very close, anticipating minimal shrinkage based on this indicator.

The standard error measures the dispersion of the observed values of the dependent variable around the regression line. The magnitude of $s_e$ is judged relative to the size of the y values. $s_e$ = 6.12 RON is small relative to the expenses, which range from about 517 to 2,131 RON.

The coefficients

For the confidence intervals, we look at the columns Lower 95% and Upper 95%. Neither interval contains the value 0, so both coefficients differ significantly from zero at the 5% level; for the slope, this confirms a linear relationship between expenses and income.
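The quantities discussed above (SST, SSR, $R^2$, adjusted $R^2$, the standard error and the F statistic) all follow from the fitted line. The sketch below recomputes them under the assumption that the rounded values from the data table are used, so small deviations from the spreadsheet output are to be expected; the variable names are illustrative only.

```python
import math

# Yearly values from the data table (RON), decimal commas written as points.
income = [521.79, 658.51, 795.09, 1085.79, 1212.18, 1386.32,
          1570.72, 1749.98, 1929.25, 2108.52, 2287.79]
expenses = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66,
            1479.15, 1642.07, 1804.99, 1967.91, 2130.83]

n = len(income)   # number of observations
k = 1             # number of independent variables

# Least-squares fit.
x_mean = sum(income) / n
y_mean = sum(expenses) / n
sxx = sum((x - x_mean) ** 2 for x in income)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(income, expenses))
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean
fitted = [b0 + b1 * x for x in income]

# Decomposition of the total variation.
sst = sum((y - y_mean) ** 2 for y in expenses)              # total
sse = sum((y - f) ** 2 for y, f in zip(expenses, fitted))   # residual
ssr = sst - sse                                             # explained

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
std_error = math.sqrt(sse / (n - k - 1))
f_stat = (ssr / k) / (sse / (n - k - 1))

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R2 = {r2:.6f}, adjusted R2 = {adj_r2:.6f}")
print(f"standard error = {std_error:.3f}, F = {f_stat:.1f}")
```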

Testing

expenses = 51.68 + 0.90 * income
H0: $\beta_0 = 0$
H1: $\beta_0 \neq 0$
$t_{stat} = (b_0 - \beta_0) / s_{b_0}$ = (51.68 - 0) / 4.89 = 10.57
$t_{tab} = t_{0.025;\,9}$ = 2.262 (two-tailed test, $\alpha$ = .05, n - 2 = 9 degrees of freedom)

10.57 > 2.262

Decision: we reject H0. Therefore, $\beta_0$ is statistically significant, although it doesn't have an economic meaning.

expenses = 51.68 + 0.90 * income
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
$t_{stat} = (b_1 - \beta_1) / s_{b_1}$ = (0.9088 - 0) / 0.003253 = 279.4
$t_{tab} = t_{0.025;\,9}$ = 2.262

279.4 > 2.262

Decision: we reject H0. Therefore, $\beta_1$ is statistically significant. It is different from 0, so there is a linear relationship between expenses and income.
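The two t tests can be reproduced from the data. This is a sketch under stated assumptions: it uses the rounded table values, the standard formulas for the standard errors of the intercept and slope in simple regression, and a critical value of 2.262 (Student's t, 9 degrees of freedom, two-tailed 5%) hard-coded from a t table rather than computed.

```python
import math

# Yearly values from the data table (RON), decimal commas written as points.
income = [521.79, 658.51, 795.09, 1085.79, 1212.18, 1386.32,
          1570.72, 1749.98, 1929.25, 2108.52, 2287.79]
expenses = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66,
            1479.15, 1642.07, 1804.99, 1967.91, 2130.83]

T_CRIT = 2.262  # assumed table value: t(0.025; 9 degrees of freedom)

n = len(income)
x_mean = sum(income) / n
y_mean = sum(expenses) / n
sxx = sum((x - x_mean) ** 2 for x in income)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(income, expenses))
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean

# Residual standard error with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(income, expenses))
s_e = math.sqrt(sse / (n - 2))

# Standard errors of the coefficients.
s_b1 = s_e / math.sqrt(sxx)
s_b0 = s_e * math.sqrt(1.0 / n + x_mean ** 2 / sxx)

# Test statistics for H0: beta = 0.
t_b0 = (b0 - 0) / s_b0
t_b1 = (b1 - 0) / s_b1

# 95% confidence interval for the slope.
slope_ci = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)

for name, t in (("intercept", t_b0), ("slope", t_b1)):
    decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name}: t = {t:.2f}, {decision}")
print(f"95% CI for the slope: {slope_ci[0]:.4f} to {slope_ci[1]:.4f}")
```

The confidence interval computed this way should agree with the Lower 95% and Upper 95% columns of the output up to rounding.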

Prediction

We want to predict the value of the expenses of a household with an income of 800 RON.
expenses = 51.68 + 0.90 * income
expenses = 51.68 + 0.90 * 800
expenses = 51.68 + 720
expenses = 771.68
The predicted amount of expenses for a household with an income of 800 RON is 771.68 RON.

We further want to predict the value of the expenses of a household with an income of 2,000 RON.
expenses = 51.68 + 0.90 * income
expenses = 51.68 + 0.90 * 2,000
expenses = 51.68 + 1,800
expenses = 1,851.68
The predicted amount of expenses for a household with an income of 2,000 RON is 1,851.68 RON.

(The slope is taken as 0.90 here; the regression output reports 0.9088, which would give slightly higher predictions.)

Therefore, the higher the income, the higher the expenses of a household.
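The predictions above can be written as a small function. In this sketch the default coefficients are copied from the regression output, and the function name `predict_expenses` is hypothetical. It also shows how much the hand calculation with the slope written as 0.90 differs from the unrounded coefficients.

```python
# Coefficients as reported in the regression output.
B0 = 51.68497469   # intercept
B1 = 0.908795135   # slope


def predict_expenses(income, b0=B0, b1=B1):
    """Predicted monthly expenses (RON) for a given monthly income (RON)."""
    return b0 + b1 * income


for value in (800, 2000):
    full = predict_expenses(value)
    by_hand = predict_expenses(value, b0=51.68, b1=0.90)
    print(f"income = {value}: {full:.2f} RON (unrounded), "
          f"{by_hand:.2f} RON (slope written as 0.90)")
```

Both incomes lie inside the observed range of the data (521.79 to 2287.79 RON), so the predictions are interpolations rather than extrapolations.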

Residual Analysis

The difference between the observed value of the dependent variable ($y$) and the predicted value ($\hat{y}$) is called the residual ($e$). Each data point has one residual.

Residual = Observed value - Predicted value
$e = y - \hat{y}$


RESIDUAL OUTPUT

Observation    Predicted Y    Residuals       Standard Residuals
1              525,8851881    -9,365188051    -1,61234
2              650,1356589    1,524341124     0,262435
3              774,2588984    7,191101619     1,238043
4              1038,445644    11,49435594     1,978905
5              1153,308261    -3,978261137    -0,68491
6              1311,565846    -6,905845904    -1,18893
7              1479,144034    0,002633094     0,000453
8              1642,0614      0,005266187     0,000907
9              1804,978767    0,007899281     0,00136
10             1967,896134    0,010532374     0,001813
11             2130,813501    0,013165468     0,002267
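The residual table can be recomputed from the data and the reported coefficients. This sketch is illustrative: the scaling used for the standard residuals (division by the square root of SSE / (n - 1)) is not stated in the source. It is an assumption inferred from the numbers, and it appears to reproduce the Standard Residuals column. Since the table values of income and expenses are rounded, the very small residuals of the last five observations will not match exactly.

```python
import math

# Coefficients as reported in the regression output.
B0 = 51.68497469
B1 = 0.908795135

# Yearly values from the data table (RON), decimal commas written as points.
income = [521.79, 658.51, 795.09, 1085.79, 1212.18, 1386.32,
          1570.72, 1749.98, 1929.25, 2108.52, 2287.79]
expenses = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66,
            1479.15, 1642.07, 1804.99, 1967.91, 2130.83]

# Residual = observed value - predicted value.
predicted = [B0 + B1 * x for x in income]
residuals = [y - p for y, p in zip(expenses, predicted)]

# Assumed scaling for the standard residuals: sqrt(SSE / (n - 1)).
sse = sum(e * e for e in residuals)
scale = math.sqrt(sse / (len(residuals) - 1))
standard_residuals = [e / scale for e in residuals]

for i, (p, e, s) in enumerate(zip(predicted, residuals, standard_residuals), 1):
    print(f"{i:2d}  {p:10.3f}  {e:8.3f}  {s:7.3f}")
```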

[Figure: Income Line Fit Plot. Vertical axis: Expenses; horizontal axis: Income.]

The chart below displays the residual and independent variable (income) as a residual plot. A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. Residual plots can be used to assess the quality of a regression.

[Figure: Household Expenses Model Residual Plot. Vertical axis: Residuals; horizontal axis: Income.]

The plot does not appear to violate any regression assumptions.

The residual plot shows a fairly random pattern - the first residual is negative, the next three are positive, the next two are negative, and the following residuals are positive. This random pattern indicates that a linear model provides a decent fit to the data.

Checking the error variance and linearity

The residual plot indicates a horizontal-band pattern. This suggests that the variance of the residuals is constant. The regression analysis also has an assumption of linearity.

[Figure: Household Expenses Model Residual Plot. Vertical axis: Residuals; horizontal axis: Income.]

Checking for homoscedasticity

The assumption of homoscedasticity is that the variance of the residuals is approximately equal for all predicted values. This residual plot shows data that are fairly homoscedastic.

Checking the process drift

As can be observed from the previous residual plot, the residuals are randomly distributed around zero, meaning that there is no drift in the process.

Checking independence of the error term

The random pattern indicates that the error term is independent.

The Durbin-Watson Statistic

We consider the Durbin-Watson statistic to test for autocorrelation. We have n = 11 yearly values of the expenses (y) and of the income (x). There is only one independent variable, so k = 1.
H0: successive residuals are not correlated ($\rho = 0$)
H1: autocorrelation is present ($\rho \neq 0$)
The test statistic is:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

where

$e_t = y_t - \hat{y}_t$, $y_t$ = observed value, $\hat{y}_t$ = predicted value of the response variable for observation t


Upper and lower critical values (dU and dL) have been tabulated for k = 1 and n = 11. If we choose $\alpha$ = .05, then:
dU = 1.32
dL = 0.92

Year    t     Total Income xt    Total Expense yt    Residuals et    et^2        (et - et-1)^2
2001    1     521,79             516,52              -9,37           87,7969
2002    2     658,51             651,66              1,52            2,3104      118,6
2003    3     795,09             781,45              7,19            51,6961     32,15
2004    4     1085,79            1049,94             11,49           132,0201    18,5
2005    5     1212,18            1149,33             -3,98           15,8404     239,32
2006    6     1386,32            1304,66             -6,91           47,7481     8,58
2007    7     1570,72            1479,15             0,0026          0,000007    47,75
2008    8     1749,98            1642,07             0,0053          0,00003     0,000007
2009    9     1929,25            1804,99             0,008           0,000064    0,000007
2010    10    2108,52            1967,91             0,01            0,0001      0,000004
2011    11    2287,79            2130,83             0,013           0,00017     0,000009

$\sum_{t=1}^{n} e_t^2$ = 337.4124
$\sum_{t=2}^{n} (e_t - e_{t-1})^2$ = 464.9

d = 464.9 / 337.4124 = 1.377

Testing for positive autocorrelation

Decision regions: reject H0 if 0 < d < dL; inconclusive if dL <= d <= dU; do not reject H0 if d > dU.
d = 1.377 > dU = 1.32, so we do not reject H0: positive autocorrelation does not exist.

Testing for negative autocorrelation

dL = .92
dU = 1.32
4-dU = 2.68
4-dL = 3.08

Decision regions: do not reject H0 if d < 4-dU; inconclusive if 4-dU <= d <= 4-dL; reject H0 if d > 4-dL.
d = 1.377 < 4-dU = 2.68 (and therefore also d < 4-dL = 3.08), so we do not reject H0: negative autocorrelation does not exist.

Given the Durbin-Watson statistic, we conclude that successive residuals are not correlated.
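The Durbin-Watson computation and the decision rule can be sketched as follows. The residuals are copied from the residual output, dL = 0.92 and dU = 1.32 are the critical values quoted above, and the function names `durbin_watson` and `classify` are hypothetical. Using the unrounded residuals gives a value of d that differs from the hand calculation only in the third decimal.

```python
# Residuals as reported in the residual output.
residuals = [-9.365188051, 1.524341124, 7.191101619, 11.49435594,
             -3.978261137, -6.905845904, 0.002633094, 0.005266187,
             0.007899281, 0.010532374, 0.013165468]


def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator


def classify(d, d_lower, d_upper):
    """Map d to a decision region of the Durbin-Watson test."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


d = durbin_watson(residuals)
decision = classify(d, d_lower=0.92, d_upper=1.32)
print(f"d = {d:.3f}: {decision}")
```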