
The do-files for microsimulation

1 Data preparation

1.1 rawurb, rawrur


These batch-files generate raw datasets by merging the necessary data files.

1.2 indurb, indrur


These files generate the individual-level variables and define the labour market segments.
The variables are either left-hand-side variables, such as wage income and occupational
choice, or right-hand-side variables, such as education, experience, or the number of
children or elderly people in the household (see also the Word document "variables").
Top-coded incomes are identified.
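A minimal sketch of the two steps named above. It assumes that the segments are the four sex x skill cells used for the wage equations in section 2.1. The skill cut-off (`high_skill_min`), the label format, and the top-code `ceiling` are hypothetical illustrations, not values taken from indurb/indrur.

```python
def segment(sex, years_schooling, high_skill_min=12):
    """Assign one of four labour market segments (sex x skill).

    high_skill_min is a hypothetical cut-off, not taken from the do-files."""
    skill = "high" if years_schooling >= high_skill_min else "low"
    return f"{sex}-{skill}"


def is_top_coded(income, ceiling):
    """Flag an income reported at the (hypothetical) top-code ceiling."""
    return income is not None and income >= ceiling


people = [("male", 14, 1200.0), ("female", 6, 99999.0)]
for sex, school, income in people:
    print(segment(sex, school), is_top_coded(income, ceiling=99999.0))
```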

1.3 hhurb, hhrur


These files generate the profit function variables, for example profits and the number
of self-employed. Furthermore, the datasets are cleaned and the household weights
rescaled. Households in which the household head does not report income, occupational
choice, or education, or reports a top-coded income, are dropped. The household weights
are then rescaled based on regional and income strata: the number of dropped households
per stratum is calculated, and the new household weights are generated by multiplying the
old weights by 1 plus the ratio of dropped households to the total stratum population.
Note that both the dropped households and the stratum population are calculated with the
old weights.
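The weight rescaling described above can be sketched as follows, assuming each household record carries its stratum, its old weight, and a flag for whether it is dropped (the field names are hypothetical). Both the dropped households and the stratum population are summed with the old weights.

```python
from collections import defaultdict


def rescale_weights(households):
    """Return {index: new_weight} for the retained households.

    Each household is a dict with hypothetical keys 'stratum', 'weight'
    (old weight) and 'dropped' (True if removed from the sample)."""
    population = defaultdict(float)  # weighted stratum population, old weights
    dropped = defaultdict(float)     # weighted dropped households, old weights
    for hh in households:
        population[hh["stratum"]] += hh["weight"]
        if hh["dropped"]:
            dropped[hh["stratum"]] += hh["weight"]

    new_weights = {}
    for i, hh in enumerate(households):
        if not hh["dropped"]:
            s = hh["stratum"]
            new_weights[i] = hh["weight"] * (1.0 + dropped[s] / population[s])
    return new_weights


sample = [{"stratum": "urban-low", "weight": 10.0, "dropped": d}
          for d in (False, False, False, True)]
print(rescale_weights(sample))
```

For one stratum with four households of weight 10, one of which is dropped, the factor is 1 + 10/40 = 1.25 and each retained weight becomes 12.5. The retained weights then sum to 37.5 rather than 40, since (N - D)(1 + D/N) = N - D^2/N, so this factor restores the stratum total only approximately.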

2 Estimation

2.1 multiurb, multirur


In this do-file, the occupational choice model and the wage equations are estimated. The
coefficient vectors are stored and a datafile is generated for each labour market segment.
The occupational choice model is estimated for household heads, spouses, and other family
members. The coefficients of these multinomial logits are stored in headv*, spv*, and
othv*. The marginal effects on the probabilities are also calculated and listed in a log
file called delta*. Note that only the coefficients are used in the microsimulation.
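Since only the stored coefficients enter the microsimulation, the following sketch shows how a coefficient set such as headv* could translate into choice probabilities under the standard multinomial logit form. It assumes the first choice is the base category with coefficients normalised to zero; the covariate layout and the example values are hypothetical.

```python
import math


def mnl_probabilities(x, coefs):
    """Multinomial logit choice probabilities.

    x: covariate values (include 1.0 for the constant).
    coefs: one coefficient list per choice; coefs[0] is the base
    category, normalised to zeros."""
    utilities = [sum(b * xi for b, xi in zip(beta, x)) for beta in coefs]
    top = max(utilities)                      # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


# Hypothetical example: constant + years of schooling, three choices
# (e.g. inactive = base, wage employed, self-employed).
coefs = [[0.0, 0.0], [-1.0, 0.15], [-0.5, 0.05]]
print(mnl_probabilities([1.0, 10.0], coefs))
```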
The wage equations are estimated separately for each segment (male/female and high/low
skill, i.e. 4 segments in each of the urban and rural areas, 8 segments in total).
Residuals are also generated for those who are not wage employed, for whom no wage
residual is observed. The coefficients are stored in lx*.
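A minimal sketch of one way to complete the residuals. The document does not state the distribution used in multiurb/multirur, so the normal draw with the standard deviation of the observed residuals is an assumption; `None` marks individuals who are not wage employed.

```python
import random
import statistics


def complete_residuals(log_wage, fitted, rng):
    """Observed residual where a wage is observed; a random draw otherwise.

    log_wage: observed log wage, or None if not wage employed.
    fitted: predicted log wage x'b for everybody.
    Assumption: unobserved residuals ~ N(0, s^2), s = sample s.d. of observed ones."""
    observed = [w - f for w, f in zip(log_wage, fitted) if w is not None]
    s = statistics.stdev(observed)  # needs at least two observed residuals
    return [w - f if w is not None else rng.gauss(0.0, s)
            for w, f in zip(log_wage, fitted)]


rng = random.Random(2024)
res = complete_residuals([2.0, None, 1.5, None], [1.8, 1.7, 1.6, 1.9], rng)
print(res)
```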
The difference between rural and urban areas lies in the number of occupational choices.
In rural areas, people can be both self-employed and wage employed at the same time. This
third occupational option does not exist in urban areas.
Datafiles that contain the residuals in addition to the original data are saved separately
for each segment under i*.
2.2 regprofiturb, regprofitrur
As the name suggests, these files estimate the profit functions. For urban areas, only one
profit function is estimated. In rural areas, three profit functions are estimated: one for
agricultural activities, one for mixed activities, and one for non-agricultural activities.
The number of self-employed is instrumented. Note that the instrumentation has to be taken
into account when the residuals are calculated. Again, the unobserved residuals are
generated. The coefficient vector is stored in ulx* for urban and in lx* for rural areas.
The probit for pure agriculture is still estimated, but its results are not used in the
microsimulation.
A household dataset with the profit function residuals is saved under s*.
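The text does not spell out how the instrumentation enters the residuals. One standard point, illustrated here, is that after IV/2SLS estimation the structural residuals are computed with the observed endogenous regressor, not with its first-stage prediction; plugging in the prediction gives the second-stage residuals, which differ by b times the first-stage residual. This sketch uses a just-identified model with one regressor, one instrument, and made-up data; the actual profit functions have more regressors.

```python
def iv_fit(y, x, z):
    """Just-identified IV estimate of y = a + b*x + u, instrumenting x with z."""
    n = len(y)
    my, mx, mz = sum(y) / n, sum(x) / n, sum(z) / n
    czy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    czx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    b = czy / czx
    a = my - b * mx
    return a, b


def first_stage_fit(x, z):
    """OLS fitted values of the endogenous regressor x on the instrument z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    d = (sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
         / sum((zi - mz) ** 2 for zi in z))
    return [mx + d * (zi - mz) for zi in z]


def structural_residuals(y, x, a, b):
    """Residuals of the profit equation evaluated at the regressor values passed in."""
    return [yi - a - b * xi for yi, xi in zip(y, x)]


# Hypothetical data: profit y, number of self-employed x, instrument z.
z = [0.0, 1.0, 2.0, 3.0]
x = [0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]
a, b = iv_fit(y, x, z)
u_structural = structural_residuals(y, x, a, b)  # uses observed x
u_second_stage = structural_residuals(y, first_stage_fit(x, z), a, b)  # uses fitted x
print(a, b, u_structural, u_second_stage)
```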

2.3 drawurb, drawrur


These files generate the unobserved residuals of the occupational choice model. Then, the
segment datasets are merged into i*.
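The document does not say how drawurb/drawrur draw these residuals. One standard approach for a multinomial logit, sketched here as an assumption, is to draw iid extreme value type I (Gumbel) errors and keep a draw only if the observed choice then has the highest total utility (rejection sampling). The utility values and choice labels are hypothetical.

```python
import math
import random


def draw_gumbel(rng):
    """Standard extreme value type I (Gumbel) draw by inversion."""
    u = rng.random()
    while u == 0.0:  # rng.random() is in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def draw_consistent_residuals(utilities, observed, rng, max_tries=100_000):
    """Draw iid Gumbel residuals until the observed choice has the highest
    total utility (deterministic part + residual)."""
    for _ in range(max_tries):
        eps = [draw_gumbel(rng) for _ in utilities]
        total = [v + e for v, e in zip(utilities, eps)]
        if max(range(len(total)), key=total.__getitem__) == observed:
            return eps
    raise RuntimeError("no draw consistent with the observed choice")


rng = random.Random(7)
v = [0.0, 0.5, -1.0]  # hypothetical x'b for inactive, wage employed, self-employed
eps = draw_consistent_residuals(v, observed=2, rng=rng)
print(eps)
```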

3 The Microsimulation

3.1 Simcge, simhist


The microsimulation is run either from simcge.do or from simhist.do. The two files have a
similar structure. simcge.do runs 5 simulations: sim1, sim2, sim3, simall, and simhist,
the historical simulation. It uses the file simcge.dta, which contains the vectors of
simulation results passed on from the CGE model. In simhist.do, the vectors have to be
entered into the code by hand; it is a more flexible file that can be used for experiments.
Furthermore, simhist.do decomposes the historical shock into participation and price
effects. This is done with the act matrix, which picks up an element of the simulation
only if the corresponding element of act equals 1. This decomposition could easily be
added to simcge.do.
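A minimal sketch of the selection performed by the act matrix, assuming that elements not selected stay at their base-year value of 1 (the CGE results are index numbers with base year = 1). The element order and the split into participation and price elements are hypothetical.

```python
def apply_act(shock, act, base=1.0):
    """Keep shock[i] where act[i] == 1; otherwise use the base-year index value."""
    if len(shock) != len(act):
        raise ValueError("shock and act must have the same length")
    return [s if a == 1 else base for s, a in zip(shock, act)]


# Hypothetical 4-element shock: wage-employment share, self-employment share,
# mean wage, profit (index numbers, base year = 1).
shock = [1.05, 0.98, 1.10, 1.03]
participation = apply_act(shock, [1, 1, 0, 0])  # [1.05, 0.98, 1.0, 1.0]
price = apply_act(shock, [0, 0, 1, 1])          # [1.0, 1.0, 1.10, 1.03]
```

Running the microsimulation once with each masked vector and once with the full shock would then give the separate and the combined effects.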
Note that the CGE model passes the labour force composition, labour incomes and profits,
and food and non-food prices to the microsimulation (as index numbers, base year = 1).
These are: 4 urban wages; urban wage employment and self-employment for each segment
(8 variables); 1 urban profit; 4 rural wages; rural wage employment, self-employment, and
combined wage and self-employment for each segment (12 variables); and 3 rural profits.
These two files simulate and calculate the changes in income distribution and in the
poverty indicators. They do so by running two batch files that contain the programs for
the simulation and for the production of results: anewturb.do/anewtrur.do for the
simulation and simres.do for the calculation of the indicators. The programs defined in
these batch files are then called to carry out the computations. Note that simcge.do and
simhist.do pass a number of macros to these programs. In most cases, the comments in the
STATA code trace these macros back to their origin.
anewturb.do is the batch file that contains all the programs for the simulation. It also
has a small matrix-generation part at the beginning, so running anewturb.do already
performs some computations. This is not true for simres.do, which contains only programs:
if simres.do is run quietly, STATA only reads the programs into memory, and nothing
happens until the programs are actually called.
3.2 anewturb, anewtrur
The purpose of the code is to find a set of constants with which the simulated data
reproduce a set of target variables (13 in the urban, 20 in the rural case), namely:
- proportion of wage workers among workers of each segment (4)
- proportion of self employed among workers of each segment (4)
- proportion of wage+self employed among workers of each segment (4), only rural
- mean wage for each segment (4)
- mean per capita income for agricultural activities, only rural
- mean per capita income for non agricultural activities
- mean per capita income for mixed activities, only rural
These target variables are given by the CGE model; their values are set in the "newton"
program. A Newton-Raphson algorithm is used to find the set of constants. The code is
written as many nested programs.
The core program is "compar", which computes the variables described above for a given
set of constants. The "compjinv" program computes the Jacobian matrix, whose inverse is
used to increment the constants ("instruments") so that they converge to the target. The
"newton" program is where the algorithm is put together. All the other programs are
building blocks of these three main programs.
The detailed program structure:
newt* calls:
- compar, which calls utscore, parseg, and parself
- compjinv, which calls compjacu, which in turn calls compar
- submatu, which calls smatself

For details of the specific programs, see the commented STATA code, in particular
anewturb (1988).
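The interplay of the compar, compjinv, and newton programs described above can be sketched as a Newton-Raphson loop with a finite-difference Jacobian. This is a minimal illustration, not the STATA code: the two-variable `compar` below is a hypothetical stand-in for the real targets, the forward-difference Jacobian is an assumption about how compjacu proceeds, and the sketch solves J * step = -(f - target) by Gaussian elimination, which is mathematically the same update as multiplying by the inverse Jacobian.

```python
import math


def compar(c):
    """Hypothetical stand-in for "compar": maps constants to simulated variables."""
    share = 1.0 / (1.0 + math.exp(-c[0]))             # e.g. a participation share
    mean_income = 2.0 * math.exp(c[1]) + 0.1 * c[0]   # e.g. a mean income
    return [share, mean_income]


def jacobian(f, c, h=1e-6):
    """Forward-difference Jacobian J[i][j] = d f_i / d c_j."""
    f0 = f(c)
    cols = []
    for j in range(len(c)):
        cj = list(c)
        cj[j] += h
        cols.append([(a - b) / h for a, b in zip(f(cj), f0)])
    return [[cols[j][i] for j in range(len(c))] for i in range(len(f0))]


def solve(a, b):
    """Gaussian elimination with partial pivoting; solves a x = b."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for col in range(k, n + 1):
                m[r][col] -= factor * m[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x


def newton(f, target, c0, tol=1e-10, max_iter=50):
    """Increment the constants until f(c) matches the target."""
    c = list(c0)
    for _ in range(max_iter):
        gap = [fi - ti for fi, ti in zip(f(c), target)]
        if max(abs(g) for g in gap) < tol:
            return c
        step = solve(jacobian(f, c), [-g for g in gap])
        c = [ci + si for ci, si in zip(c, step)]
    raise RuntimeError("Newton-Raphson did not converge")


constants = newton(compar, target=[0.5, 2.0], c0=[1.0, 1.0])
print(constants)
```

With these toy targets the exact solution is c = (0, 0), which the loop reaches in a handful of iterations from the starting point (1, 1).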
3.3 Simres, incadj
simres.do calculates inequality and poverty indicators. The main program is simres,
which is called by simcge.do or simhist.do.
Simres.do defines the five programs simres, desineq, poverty, perchg, and hhcent. The
structure of these nested programs is the following:

simres calls:
- hhcent
- desineq, which calls perchg and poverty

Hhcent calculates household centiles based on per capita income. Desineq calculates
inequality indices, such as the Gini and Theil measures, and calls poverty, which calculates
poverty measures. Perchg calculates percentage changes. Note that the changes given in the
result matrices are percentage changes and NOT percentage-point changes.
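The document names the indices but not their formulas. The sketch below uses the standard weighted Gini (mean-difference form, O(n^2) for clarity) and Theil T definitions, and assumes household weights are used as population weights; it also shows the percentage change that perchg is described as reporting.

```python
import math


def weighted_mean(y, w):
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def gini(y, w):
    """Weighted Gini: sum_i sum_j w_i w_j |y_i - y_j| / (2 W^2 mean)."""
    total_w = sum(w)
    mu = weighted_mean(y, w)
    s = sum(wi * wj * abs(yi - yj)
            for yi, wi in zip(y, w) for yj, wj in zip(y, w))
    return s / (2.0 * total_w ** 2 * mu)


def theil_t(y, w):
    """Theil T index; assumes strictly positive incomes."""
    mu = weighted_mean(y, w)
    return sum(wi * (yi / mu) * math.log(yi / mu) for yi, wi in zip(y, w)) / sum(w)


def percent_change(old, new):
    """Percentage change, not a percentage-point change."""
    return (new - old) / old * 100.0


incomes, weights = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]
print(gini(incomes, weights), theil_t(incomes, weights))
print(percent_change(0.20, 0.25))
```

For example, a poverty headcount rising from 20% to 25% is a 25% change, although it is only a 5 percentage-point change.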
The incadj.do file generates incadj.dta, which is used for possible adjustments, in
particular from the expenditure side. Note that incadj.dta is based on the 1994/95 income
and expenditure survey. It can therefore be matched with the income survey data only
through a household class variable that has to be defined in both surveys. This structure
was chosen to make future changes easier to include.
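A minimal sketch of matching through a household class variable: class-level values are computed in one survey and attached many-to-one to the households of the other. The class variable `hh_class`, the adjustment variable `adj`, and the use of a weighted class mean are hypothetical; the document does not specify what incadj.dta contains beyond the adjustments.

```python
from collections import defaultdict


def class_means(records, key, value, weight):
    """Weighted mean of `value` for each household class in the 1994/95 survey."""
    num, den = defaultdict(float), defaultdict(float)
    for r in records:
        num[r[key]] += r[value] * r[weight]
        den[r[key]] += r[weight]
    return {k: num[k] / den[k] for k in num}


def match_by_class(households, table, key, name):
    """Many-to-one match on the class variable: attach table[class] to each
    income-survey household (None if the class is missing from the table)."""
    return [{**hh, name: table.get(hh[key])} for hh in households]


expenditure_survey = [
    {"hh_class": "A", "adj": 1.2, "w": 1.0},
    {"hh_class": "A", "adj": 0.8, "w": 3.0},
    {"hh_class": "B", "adj": 1.1, "w": 2.0},
]
income_survey = [{"hh_class": "A", "income": 100.0},
                 {"hh_class": "C", "income": 50.0}]
factors = class_means(expenditure_survey, "hh_class", "adj", "w")
matched = match_by_class(income_survey, factors, "hh_class", "adj")
print(matched)
```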
Figure: Data flow through the do-files and datasets. Panels: 1u/1r. Data manipulation
(rawurb, indurb, hhurb; rawrur, indrur, hhrur), 2u/2r. Estimation (multiurb, regprofiturb;
multirur, regprofitrur), 3u/3r. Residual generation (drawurb, drawrur), 4. Income
adjustments (incadj), 5. Microsimulation (simcge, simhist, anewturb/anewtrur, simres).
Output datasets: baseu, sim1u, sim2u, sim3u, simallu, h95u, h95Ou, h95Wu (urban); baser,
sim1r, sim2r, sim3r, simallr, h95r, h95Or, h95Wr (rural).
