
Friday May 8 20:40:42 2020 Page 1

Statistics/Data Analysis

name: <unnamed>
log: C:\Users\ssk0012\Downloads\Andrew Gutierrez\project.smcl
log type: smcl
opened on: 8 May 2020, 20:40:41

1 . use NTAXI_hhtax_data, clear

2 .
3 . ** unique identifier is the newid
4 .
5 . * Question 3 :
6 .
7 . ** generate valid-observation indicators; a flag value of "D" marks a valid entry (this shows how many
> valid obs there are)
8 . gen soivalid = 1 if soi_st_ == "D"
(722 missing values generated)

9 . gen wagehdvalid = 1 if wage_hd_=="D"

10. gen wagespvalid = 1 if wage_sp_=="D"

11. gen taxpensvalid = 1 if taxpens_=="D"

12. gen sosvalid =1 if sossecb_=="D"

13. gen rentvalid = 1 if rntpaid_=="D"
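
* Sketch (not part of the logged run): assuming each flag variable is named
* <variable>_ and "D" marks a valid entry, the six indicators could be built
* in one loop as 0/1 bytes. The "_ok" suffix is a hypothetical naming choice.
foreach v in soi_st wage_hd wage_sp taxpens sossecb rntpaid {
    gen byte `v'_ok = (`v'_ == "D")
    label variable `v'_ok "1 if `v'_ equals D (valid)"
}
* With 0/1 coding, -su- reports the valid share as the mean, instead of a
* constant 1 with missing values on the rows that are not valid.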

14.
15. * tabulate each flag variable to show the distribution of flag values
16. tab soi_st_

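    soi_st_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      6,737       90.32       90.32
          T |        722        9.68      100.00
------------+-----------------------------------
      Total |      7,459      100.00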

17. tab wage_hd_

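   wage_hd_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      7,459      100.00      100.00
------------+-----------------------------------
      Total |      7,459      100.00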

18. tab wage_sp_

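   wage_sp_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      7,459      100.00      100.00
------------+-----------------------------------
      Total |      7,459      100.00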

19. tab taxpens_

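   taxpens_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      7,459      100.00      100.00
------------+-----------------------------------
      Total |      7,459      100.00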



20. tab sossecb_

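   sossecb_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      7,459      100.00      100.00
------------+-----------------------------------
      Total |      7,459      100.00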

21. tab rntpaid_

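   rntpaid_ |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      7,459      100.00      100.00
------------+-----------------------------------
      Total |      7,459      100.00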

22.
23. * Summarize the tax file
24. su

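    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       newid |          0
       taxid |      7,459     1.22188     .569139          1         10
    taxyr_cy |          0
    taxyr_py |          0
      depcnt |      7,459     .805604    1.189823          0         11
-------------+---------------------------------------------------------
    ficar_cy |      7,459    .1063618    .0681654          0       .153
    ficar_py |      7,459    .1061491    .0682067          0       .153
    filestat |          0
    frate_cy |      7,459    .0880746    .2104227       -.99      .5746
    frate_py |      7,459    .0890707    .2107414       -.99      .5746
-------------+---------------------------------------------------------
      soi_st |          0
     soi_st_ |          0
    srate_cy |      7,459    .0254364    .0367559     -.1845       .317
    srate_py |      7,459    .0260557    .0365314     -.1845       .317
    t65ct_cy |      7,459    .1769674     .440386          0          2
-------------+---------------------------------------------------------
    t65c__cy |          0
    t65ct_py |      7,459    .1606113    .4174376          0          2
    t65c__py |          0
     wage_hd |      7,459    34432.78    46362.51          0     387692
    wage_hd_ |          0
-------------+---------------------------------------------------------
     wage_sp |      7,459    15454.72     36983.8          0     387692
    wage_sp_ |          0
    othtxinc |      7,459    1661.404    22740.78    -142740     893982
    otht_inc |          0
     taxpens |      7,459    2099.326    9507.639          0      87368
-------------+---------------------------------------------------------
    taxpens_ |          0
     sossecb |      7,459    3363.833    7719.645          0      48696
    sossecb_ |          0
    nontxinc |      7,459    14.27846    253.4536          0      11076
    nont_inc |          0
-------------+---------------------------------------------------------
     rntpaid |      7,459    3293.083    6899.272          0      49108
    rntpaid_ |          0
    proptxpd |      7,459    1853.255    3427.271          0      62874
    prop_xpd |          0
    amtdedct |      7,459    360.6988    1559.266          0      70518
-------------+---------------------------------------------------------
    amtd_dct |          0
    chldcare |      7,459    37.95469    571.3468          0      34800
    chld_are |          0
    depund17 |      7,459    .4638692    .9402228          0          9
    depu_d17 |          0
-------------+---------------------------------------------------------
    othdedct |      7,459     4333.06    57264.05          0    4799996
    othd_dct |          0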

     ftaxowe |      7,459    6240.723    15372.11      -9152     324429
     staxowe |      7,459    1583.107    3511.498      -3665      72286
    ftaxo_py |      7,459    6247.088    15378.85   -9137.16   324548.8
-------------+---------------------------------------------------------
    ftaxo_cy |      7,459    6171.521    15299.13   -9318.29   323832.5
    staxo_py |      7,459    1586.315    3515.055      -3665    72285.6
    staxo_cy |      7,459    1549.601    3480.035      -3708   72256.26
     fica_py |      7,459    6981.057    7912.932          0   56061.49
     fica_cy |      7,459    6995.168    7947.503          0   56433.49
-------------+---------------------------------------------------------
    fdagi_py |      7,459    54509.34    69520.77    -140490    1113449
    fdagi_cy |      7,459    54509.34    69520.77    -140490    1113449
    stagi_py |      7,459    39180.35    64583.81    -140490    1113449
    stagi_cy |      7,459    39150.62    64582.38    -140490    1113449
    chdtx_py |      7,459    199.3893    580.3135          0     6487.2
-------------+---------------------------------------------------------
    chdtx_cy |      7,459    197.9686     577.924          0     6487.2
    addtx_py |      7,459     116.171    502.5734          0       9000
    addtx_cy |      7,459    117.6265    506.3417          0       9000
    dpcar_py |      7,459    4.751048    53.78123          0       1200
    dpcar_cy |      7,459    4.729894    53.58895          0       1200
-------------+---------------------------------------------------------
    eitcr_py |      7,459    365.1424    1130.648          0       6143
    eitcr_cy |      7,459    378.0444    1160.273          0       6242
    amtin_py |      7,459    51758.49    64840.02          0     869950
    amtin_cy |      7,459    51778.42    64848.87          0     869950
    amtow_py |      7,459    96.26501    723.3471          0   12010.56
-------------+---------------------------------------------------------
    amtow_cy |      7,459    88.43388    694.7973          0   11716.61
    frgtx_py |      7,459    6811.751    14458.91          0   299090.8
    frgtx_cy |      7,459     6756.93    14393.16          0   298374.5
    ftxbc_py |      7,459    6811.751    14458.91          0   299090.8
    ftxbc_cy |      7,459     6756.93    14393.16          0   298374.5
-------------+---------------------------------------------------------
    sstdd_py |      7,459    3013.393    4298.433          0      18158
    sstdd_cy |      7,459    3050.108    4346.936          0   18464.09
    sitdd_py |      7,459    2839.276    28892.79          0    2404441
    sitdd_cy |      7,459    2834.144    28893.13          0    2404502
    sprcr_py |      7,459    23.51891    148.9368          0    2571.45
-------------+---------------------------------------------------------
    sprcr_cy |      7,459    23.58012     149.622          0     2588.5
    sdcar_py |      7,459    .5819788    14.12507          0     852.48
    sdcar_cy |      7,459    .5791326     13.7637          0     832.48
    seitc_py |      7,459    21.11935    148.1038          0       4563
    seitc_cy |      7,459    23.07075    157.5391          0    4593.42
-------------+---------------------------------------------------------
      divinc |      7,459           0           0          0          0
    soivalid |      6,737           1           0          1          1
 wagehdvalid |      7,459           1           0          1          1
 wagespvalid |      7,459           1           0          1          1
taxpensvalid |      7,459           1           0          1          1
-------------+---------------------------------------------------------
    sosvalid |      7,459           1           0          1          1
   rentvalid |      7,459           1           0          1          1

25.
26.
27. * Question 4
28.
29. * Calculate the median and mean

30. * The 50th percentile is the median value


31.
32. su staxo_py staxo_cy wage_hd proptxpd rntpaid , detail

                          staxo_py
-------------------------------------------------------------
      Percentiles      Smallest
 1%        -1160          -3665
 5%         -100       -3404.09
10%            0       -2661.37       Obs               7,459
25%            0       -2649.95       Sum of Wgt.       7,459

50%            0                      Mean           1586.315
                        Largest       Std. Dev.      3515.055
75%      1824.08       41476.93
90%      4663.72       41531.11       Variance       1.24e+07
95%       7925.8       42012.52       Skewness         4.8412
99%      16784.8        72285.6       Kurtosis       46.50511

                          staxo_cy
-------------------------------------------------------------
      Percentiles      Smallest
 1%        -1200          -3708
 5%         -100       -3417.84
10%            0       -2758.09       Obs               7,459
25%            0       -2695.03       Sum of Wgt.       7,459

50%            0                      Mean           1549.601
                        Largest       Std. Dev.      3480.035
75%      1771.27        41440.8
90%      4560.35       41453.21       Variance       1.21e+07
95%      7641.39       41608.55       Skewness       4.926397
99%     16460.41       72256.26       Kurtosis       47.99574

                           wage_hd
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               7,459
25%            0              0       Sum of Wgt.       7,459

50%        21000                      Mean           34432.78
                        Largest       Std. Dev.      46362.51
75%        49600         387692
90%        81400         387692       Variance       2.15e+09
95%       119636         387692       Skewness       2.583185
99%       219467         387692       Kurtosis       12.14514

                          proptxpd
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               7,459
25%            0              0       Sum of Wgt.       7,459

50%          250                      Mean           1853.255
                        Largest       Std. Dev.      3427.271
75%         2450          33222
90%         5200          36405       Variance       1.17e+07
95%         8000          37518       Skewness       4.092603
99%        18759          62874       Kurtosis       32.90939

                           rntpaid
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               7,459
25%            0              0       Sum of Wgt.       7,459

50%            0                      Mean           3293.083
                        Largest       Std. Dev.      6899.272
75%         4200          45388
90%        12000          45388       Variance       4.76e+07
95%        16800          45388       Skewness       3.073225
99%        30000          49108       Kurtosis       15.41023

33.
34.
35.
36. *Question 5
37.
38. * Generate log values
39. gen logwage = log(wage_hd)
(2,269 missing values generated)

40. gen logtaxliabilty = log(amtow_cy)
(7,245 missing values generated)

41.
42. * summary of original values
43. su wage_hd amtow_cy

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     wage_hd |      7,459    34432.78    46362.51          0     387692
    amtow_cy |      7,459    88.43388    694.7973          0   11716.61

44.
45. * summary of log values
46.
47. su logtaxliabilty logwage

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
logtaxliab~y |        214    7.488061     1.26604  -.0512933   9.368763
     logwage |      5,190    10.36398    1.117074          0   12.86797

48.
49.
50. * Note
51. * Taking logs lets a regression slope be read as a percentage response (an elasticity).
52. * The log values are on a much smaller scale than the original values (compare the two summaries above).
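
* Sketch (not part of the logged run): log() returns missing for zero or
* negative arguments, which is why logwage and logtaxliabilty lose observations.
* The first two counts should reproduce the "missing values generated"
* messages, assuming neither source variable is itself missing.
count if wage_hd <= 0
count if amtow_cy <= 0
* Observations for which both logs are defined, i.e. the sample available
* to a regression of one log on the other:
count if wage_hd > 0 & amtow_cy > 0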
53.
54.
55. **Question 6
56. * regression
57.
58. reg logtaxliabilty logwage

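      Source |       SS           df       MS      Number of obs   =       193
-------------+----------------------------------   F(1, 191)       =      3.23
       Model |  5.36811593         1  5.36811593   Prob > F        =    0.0740
    Residual |  317.744054       191  1.66358144   R-squared       =    0.0166
-------------+----------------------------------   Adj R-squared   =    0.0115
       Total |   323.11217       192  1.68287589   Root MSE        =    1.2898

------------------------------------------------------------------------------
logtaxliab~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     logwage |     .24856   .1383701     1.80   0.074    -.0243698    .5214898
       _cons |   4.514616   1.655524     2.73   0.007     1.249157    7.780075
------------------------------------------------------------------------------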
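
* Sketch (not part of the logged run): in a log-log model the slope is an
* elasticity. Run directly after the regression above, the stored coefficient
* gives the implied percent change in amtow_cy for a 10% higher wage_hd:
display "pct change for +10% wage: " 100*(1.10^_b[logwage] - 1)
* With the logged estimate of .24856 this is roughly 2.4 percent. The 95%
* interval includes zero, so the estimate is imprecise.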

59.
60.
61. * Question 7
62. gen logwage1 = log(wage_hd + 1)

63. gen AMT = log(amtow_cy + 1)

64.
65. su logwage1 AMT

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    logwage1 |      7,459     7.21201    4.857703          0   12.86797
         AMT |      7,459    .2149637    1.268433          0   9.368848

66. * regression
67. reg AMT logwage1

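      Source |       SS           df       MS      Number of obs   =     7,459
-------------+----------------------------------   F(1, 7457)      =    117.48
       Model |  186.114855         1  186.114855   Prob > F        =    0.0000
    Residual |  11813.2256     7,457  1.58417938   R-squared       =    0.0155
-------------+----------------------------------   Adj R-squared   =    0.0154
       Total |  11999.3405     7,458  1.60892203   Root MSE        =    1.2586

------------------------------------------------------------------------------
         AMT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logwage1 |   .0325198   .0030003    10.84   0.000     .0266385    .0384012
       _cons |  -.0195697   .0260881    -0.75   0.453    -.0707096    .0315702
------------------------------------------------------------------------------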
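
* Sketch (not part of the logged run): log(x + 1) keeps the zero observations,
* but the result depends on the units of x. One commonly used alternative is
* the inverse hyperbolic sine, which is defined at zero. The ihs_* variable
* names are hypothetical.
gen ihs_wage = asinh(wage_hd)
gen ihs_amt  = asinh(amtow_cy)
reg ihs_amt ihs_wage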

68.
69.
70. ** use expenditure data
71. use FMLI_hhexpenditures_data,clear

72.
73. *Question 9
74. su educacq stdntyrx cartkncq cartkucq fam_size

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     educacq |      6,208    145.9742     1453.78          0      50100
    stdntyrx |        182    36312.85    58910.11          0     490000
    cartkncq |      6,208    143.7452    2172.728          0      54000
    cartkucq |      6,208    121.6269    1603.993          0      42500
    fam_size |      6,208    2.430251    1.451531          1         14

75.
76. * Check the flag variables for missing values
77. tab stdt_rbx

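   stdt_rbx |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |      6,186       99.65       99.65
          B |          3        0.05       99.69
          D |         19        0.31      100.00
------------+-----------------------------------
      Total |      6,208      100.00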



78. tab fam__ize

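   fam__ize |      Freq.     Percent        Cum.
------------+-----------------------------------
          D |      6,208      100.00      100.00
------------+-----------------------------------
      Total |      6,208      100.00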

79.
80.
81. * Question 12
82.
83. ** check for repeated observations (newid appearing more than once)
84. bys newid: egen newidcount = count(newid)

85. su newidcount

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  newidcount |      6,208           1           0          1          1

86. tab newidcount

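 newidcount |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      6,208      100.00      100.00
------------+-----------------------------------
      Total |      6,208      100.00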

87.
88. use NTAXI_hhtax_data,clear

89.
90. bys newid: egen newidcount = count(newid)

91. su newidcount

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  newidcount |      7,459    1.418287    .7718347          1          9

92. tab newidcount

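 newidcount |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      5,186       69.53       69.53
          2 |      1,692       22.68       92.21
          3 |        414        5.55       97.76
          4 |        116        1.56       99.32
          5 |         30        0.40       99.72
          6 |         12        0.16       99.88
          9 |          9        0.12      100.00
------------+-----------------------------------
      Total |      7,459      100.00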
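
* Sketch (not part of the logged run): built-in commands give the same
* uniqueness check without creating a helper variable.
duplicates report newid
* -isid- exits with an error if newid does not uniquely identify the rows,
* so -capture- is used here to record the outcome without stopping.
capture isid newid
display "isid return code (0 means newid is unique): " _rc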

93. drop if taxid>1 // keep only the records with taxid == 1, leaving one row per newid
(1,251 observations deleted)

94. save NTAXI_hhtax_data1,replace
file NTAXI_hhtax_data1.dta saved

95.
96. use FMLI_hhexpenditures_data

97.
98.
99. merge 1:1 newid using NTAXI_hhtax_data1 //merging file

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                             6,208  (_merge==3)
    -----------------------------------------
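
* Sketch (not part of the logged run): the full match can be enforced rather
* than read off the table. -assert- stops a do-file if any row is unmatched:
assert _merge == 3
* Alternatively, the requirement can be stated in the merge command itself:
*   merge 1:1 newid using NTAXI_hhtax_data1, assert(match)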

100.
101. //all matched
102.
103. save Merge, replace
file Merge.dta saved

104.
105.
