Statistics/Data Analysis
name: <unnamed>
log: C:\Users\ssk0012\Downloads\Andrew Gutierrez\project.smcl
log type: smcl
opened on: 8 May 2020, 20:40:41
2 .
3 . ** the unique identifier is newid
4 .
5 . * Question 3 :
6 .
7 . ** generate an indicator for valid observations, flagged by "D" in soi_st_ (this shows how many
> valid obs there are)
8 . gen soivalid = 1 if soi_st_ == "D"
(722 missing values generated)
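The `gen ... if` form above leaves soivalid missing (not 0) on the 722 rows that fail the condition, so the variable can be counted but its mean is not a share. A minimal sketch of a 0/1 version, assuming the same soi_st_ flag; the name soivalid01 is hypothetical and not part of the original do-file:

```stata
* Hypothetical alternative: a 0/1 indicator instead of 1/missing
gen byte soivalid01 = (soi_st_ == "D")

* Count the valid rows directly, without creating a variable
count if soi_st_ == "D"

* The mean of a 0/1 indicator is the share of valid observations
su soivalid01
```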
14.
15. * tabulate the flag variable
16. tab soi_st_
22.
23. * Summarize the tax file
24. su
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
       newid |          0
       taxid |      7,459     1.22188      .569139          1         10
    taxyr_cy |          0
    taxyr_py |          0
      depcnt |      7,459     .805604     1.189823          0         11
      soi_st |          0
     soi_st_ |          0
    srate_cy |      7,459    .0254364     .0367559     -.1845       .317
    srate_py |      7,459    .0260557     .0365314     -.1845       .317
    t65ct_cy |      7,459    .1769674      .440386          0          2
    t65c__cy |          0
    t65ct_py |      7,459    .1606113     .4174376          0          2
    t65c__py |          0
     wage_hd |      7,459    34432.78     46362.51          0     387692
    wage_hd_ |          0
    taxpens_ |          0
     sossecb |      7,459    3363.833     7719.645          0      48696
    sossecb_ |          0
    nontxinc |      7,459    14.27846     253.4536          0      11076
    nont_inc |          0
    amtd_dct |          0
    chldcare |      7,459    37.95469     571.3468          0      34800
    chld_are |          0
    depund17 |      7,459    .4638692     .9402228          0          9
    depu_d17 |          0
      divinc |      7,459           0            0          0          0
    soivalid |      6,737           1            0          1          1
 wagehdvalid |      7,459           1            0          1          1
 wagespvalid |      7,459           1            0          1          1
taxpensvalid |      7,459           1            0          1          1
    sosvalid |      7,459           1            0          1          1
   rentvalid |      7,459           1            0          1          1
25.
26.
27. * Question 4
28.
29. * Calculate the median and mean
Friday May 8 20:40:42 2020 Page 4
                          staxo_py
      Percentiles      Smallest
 1%        -1160          -3665
 5%         -100       -3404.09
10%            0       -2661.37       Obs                 7,459
25%            0       -2649.95       Sum of Wgt.         7,459

                          staxo_cy
      Percentiles      Smallest
 1%        -1200          -3708
 5%         -100       -3417.84
10%            0       -2758.09       Obs                 7,459
25%            0       -2695.03       Sum of Wgt.         7,459

                          wage_hd
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 7,459
25%            0              0       Sum of Wgt.         7,459

                          proptxpd
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 7,459
25%            0              0       Sum of Wgt.         7,459

                          rntpaid
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 7,459
25%            0              0       Sum of Wgt.         7,459
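The detailed summaries, as reproduced in this log, stop at the 25th percentile, so the median and mean rows are not visible. As a sketch, assuming these five variables are the ones Question 4 asks about, `tabstat` prints only the requested statistics in one compact table:

```stata
* Mean, median (p50), and N, one row per variable
* (the /// line continuation works in a do-file, not at the interactive prompt)
tabstat staxo_py staxo_cy wage_hd proptxpd rntpaid, ///
    statistics(mean p50 n) columns(statistics)
```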
33.
34.
35.
36. *Question 5
37.
38. * Generate log values
39. gen logwage = log(wage_hd)
(2,269 missing values generated)
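Stata's log() returns missing for non-positive arguments. The summary of the tax file shows wage_hd with 7,459 non-missing values and a minimum of 0, so the 2,269 missing values should all come from zero-wage observations. A quick check, as a sketch that assumes the tax file is still the data in memory:

```stata
* Rows where the log is undefined (expected to match the count reported above)
count if wage_hd <= 0

* Confirm that every positive wage has a non-missing log
assert !missing(logwage) if wage_hd > 0 & !missing(wage_hd)
```

Any regression on logwage silently drops these observations, which is the motivation for the log(wage_hd + 1) variant in Question 7.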
41.
42. * summary of original values
43. su wage_hd amtow_cy
44.
45. * summary of log values
46.
47. su logtaxliabilty logwage
48.
49.
50. * Note
51. * Taking logs makes interpretation easier: coefficients read as approximate percentage changes.
52. * Logs also compress the scale: log(x) < x for every x > 0, so large values are pulled in.
53.
54.
55. **Question 6
56. * regression
57.
58. reg logtaxliabilty logwage
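For reference, a sketch of how the slope in a log-log regression is read. Here $y$ stands for the tax liability variable behind logtaxliabilty (its definition is not shown in this log, so that mapping is an assumption) and $x$ for wage_hd:

```latex
\log y_i = \beta_0 + \beta_1 \log x_i + u_i,
\qquad
\beta_1 = \frac{\partial \log y}{\partial \log x}
        \approx \frac{\%\Delta y}{\%\Delta x}
```

A 1% increase in wages is therefore associated with roughly a $\beta_1$% change in tax liability, estimated only on observations where both logs are defined.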
59.
60.
61. * Question 7
62. gen logwage1 = log(wage_hd + 1)
64.
65. su logwage1 AMT
66. * regression
67. reg AMT logwage1
68.
69.
70. ** use expenditure data
71. use FMLI_hhexpenditures_data,clear
72.
73. *Question 9
74. su educacq stdntyrx cartkncq cartkucq fam_size
75.
76. * Check the flag variables for missing values
77. tab stdt_rbx
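By default `tabulate` leaves missing values out of the table. A small sketch, assuming the goal is to see how many observations carry no flag at all:

```stata
* Include missing values as their own row in the table
tab stdt_rbx, missing

* Or count them directly; missing() handles numeric and string variables
count if missing(stdt_rbx)
```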
79.
80.
81. * Question 12
82.
83. ** check for repeated observations
84. bys newid: egen newidcount = count(newid)
85. su newidcount
newidcount 6,208 1 0 1 1
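The count-per-id approach works, and Stata also has built-in commands for the same check. A sketch, assuming newid is meant to identify each row uniquely:

```stata
* Exits with an error if newid does not uniquely identify the observations
isid newid

* Reports how many observations appear once, twice, and so on
duplicates report newid
```

Unlike the egen approach, neither command adds a helper variable to the dataset.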
87.
88. use NTAXI_hhtax_data,clear
89.
90. bys newid: egen newidcount = count(newid)
91. su newidcount
95.
96. use FMLI_hhexpenditures_data
97.
98.
99. merge 1:1 newid using NTAXI_hhtax_data1 //merging file
    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                             6,208  (_merge==3)
    -----------------------------------------
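Rather than reading the merge table by eye, the all-matched expectation can be enforced in code. A sketch using the same file names as this log:

```stata
use FMLI_hhexpenditures_data, clear

* assert(match) stops with an error if any observation is unmatched
merge 1:1 newid using NTAXI_hhtax_data1, assert(match)

* Drop the bookkeeping variable once the check has passed
drop _merge
```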
100.
101. //all matched
102.
103. save Merge, replace
file Merge.dta saved
104.
105.