Estimating the Economic Model of Crime with Panel Data

Presentation · June 2019


DOI: 10.13140/RG.2.2.34988.72320


2 authors, including:

Alessandro Cremaschini
Sapienza University of Rome

All content following this page was uploaded by Alessandro Cremaschini on 26 June 2019.



Doctoral School of Economics Microeconometrics Alessandro Cremaschini

Doctoral School of Economics

PhD in Mathematics for Economic and Financial Applications

Microeconometrics

Professor: Marianna Belloc

Student: Alessandro Cremaschini

1. INTRODUCTION
2. DATA
3. METHODOLOGY
4. RESULTS
5. CONCLUSION
REFERENCES

1. INTRODUCTION
In this report we present results from CORNWELL - TRUMBULL (1994), Estimating the Economic Model of Crime with Panel Data, also presented in a simpler version in WOOLDRIDGE, Introductory Econometrics (5th ed.), using the cornwell.dta dataset.
We chose this dataset because crime is one of the most serious issues in Italy, and similar analyses are needed to give criminal justice valid tools for implementing effective public policies, in particular at the local level. Furthermore, we believe that understanding the incentives to engage in criminal activities is also crucial for implementing social development policies.
In this area of research the main criticism has been the use of aggregate data instead of individual data: researchers usually used country-level data, losing information about individual incentives to engage in illegal activities. However, the obvious difficulty of collecting individual data has led scholars to rely on aggregate data in their analyses. CORNWELL – TRUMBULL (1994) used data at a lower level of aggregation, from the counties of North Carolina in 1981-87.

The authors' aim was to apply panel data methods to economic models of crime for the first time, since until then many researchers had simply run cross-sectional regressions. CORNWELL – TRUMBULL (1994) consider the simple linear regression model estimated by OLS unable to capture the variability of the dependent variable, because an unobserved component plays an active role in it; they therefore regard a panel data approach as much more useful, since it better represents the effects on the dependent variable. We will use models that take individual heterogeneity into account, such as the Fixed Effects model and the Random Effects model, exploring different estimators.
Significant differences will be attributed to this latent variable: with panel data, the deterrent effect of criminal justice on participation in illegal activities turns out to be smaller than in previous works, where the unobserved variable was not taken into account.

2. DATA
We use a balanced, fixed¹ panel of 90 units (counties of North Carolina) observed over seven years (1981-87), for a total of 630 observations drawn from the FBI’s Uniform Crime Reports, the North Carolina Department of Correction and the North Carolina Employment Security Commission. This dataset is one of the improvements over past works, thanks to its lower level of aggregation (county level instead of federal or state level). The variables are described as follows:
Crime Rate ($R$) is the ratio of FBI index crimes to county population; $P_a$, $P_c$, $P_p$ is a set of probabilities of arrest (proxied by the ratio of arrests to offenses, conditional on offense), conviction (proxied by the ratio of convictions to arrests, conditional on arrest) and imprisonment (proxied by the proportion of total convictions resulting in imprisonment, conditional on conviction); $S$ is the severity of punishment. Then comes a set of variables controlling for the return to legal activities: all variables prefixed with $W$ stand for the weekly wage in different labour sectors, namely construction (WCON), transportation (WTUC), retail (WTRD), finance and insurance (WFIR), services (WSER), manufacturing (WMFG) and government (WFED, WSTA, WLOC). Furthermore, variables controlling for peculiar geographic characteristics (incentives to participate in legal activities, such as rural versus urban areas or general cultural factors) enter as dummy variables (WEST, CENTRAL, URBAN), together with DENSITY, defined in the usual way. Variables controlling for demographic characteristics are also included: PERCENT YOUNG MALE, the proportion of males aged 15-24 in the county population, and PERCENT MINORITY, the proportion of minority or non-white residents. POLICE controls for the county’s capability to deter crime, measured by police per capita. Finally, a set of time dummy variables captures crime rate variations common to all counties. All variables are expressed in logarithmic form so that the coefficients can be read as elasticities.

1 The same entities are observed in every period, without changes.
Below we report summary statistics for wages after declaring the data as a panel dataset. This kind of descriptive statistics is much more useful than simple summary statistics because it lets us check in detail the different sources of variation of our variables²:

Summary statistics

Variable           Mean       Std. Dev.   Min         Max
wcon   overall     245.6661   121.9837    65.62158    2324.598
       between                55.56979    170.9994    515.6662
       within                 108.7267    -83.03433   2054.598
wtuc   overall     406.1028   266.5138    28.8577     3041.958
       between                90.27398    176.8397    754.5614
       within                 250.9142    -82.7703    2693.499
wtrd   overall     192.8231   88.40727    16.87376    2242.747
       between                43.47784    142.5774    505.4261
       within                 77.09445    -127.9656   1930.144
wfir   overall     272.0593   55.76809    3.51568     509.4655
       between                41.38465    156.9322    416.0939
       within                 37.59956    34.09314    376.1693
wser   overall     224.6705   104.8667    1.843794    2177.068
       between                45.58889    113.2337    467.8198
       within                 94.54365    -77.06545   1933.919
wmfg   overall     285.1701   82.36807    101.83      646.85
       between                74.36465    144.0386    538.48
       within                 36.15451    169.2501    408.6501
wfed   overall     403.8959   63.06669    255.4       597.95
       between                55.37068    298.6857    538.6143
       within                 30.6715     281.7887    482.933
wsta   overall     296.9075   53.43161    173.02      548
       between                34.3623     214.0714    407.8571
       within                 41.05403    208.9346    526.6946
wloc   overall     257.9762   41.35802    163.59      388.09
       between                21.60825    206.1       317.3171
       within                 35.32738    174.7076    387.6119

2 This is a balanced panel, so the number of observations is the same for every variable.

We can easily check the different sources of variability. For example, wages in the construction sector (WCON) show higher within variability (in terms of standard deviation) than between variability; one possible explanation is that, over the years, wage increases come from individual ability rather than from other sources, since in this sector union wage agreements may level out wages.
Recall that $\sigma_O^2 \approx \sigma_B^2 + \sigma_W^2$, where the subscripts $O$, $B$ and $W$ denote the overall, between and within components.
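The decomposition can be illustrated with a minimal sketch in plain Python. It assumes the table follows the common convention of reporting within values as $x_{it} - \bar{x}_{i\cdot} + \bar{x}$ (deviation from the county mean, re-centred on the grand mean); the function name `xtsum` and the toy wages are hypothetical and are not taken from cornwell.dta.

```python
from statistics import mean, stdev

def xtsum(panel):
    """Overall / between / within summary for a balanced panel.

    `panel` maps each unit (e.g. a county) to its list of T yearly values.
    """
    values = [v for series in panel.values() for v in series]
    grand_mean = mean(values)
    unit_means = {unit: mean(series) for unit, series in panel.items()}

    # Within values: deviation from the unit's own mean, re-centred on the
    # grand mean (assumed convention). The re-centring is why a "within"
    # minimum can be negative even for a strictly positive variable.
    within = [v - unit_means[unit] + grand_mean
              for unit, series in panel.items() for v in series]

    return {
        "mean": grand_mean,
        "overall_sd": stdev(values),
        "between_sd": stdev(list(unit_means.values())),
        "within_sd": stdev(within),
        "within_min": min(within),
        "within_max": max(within),
    }

# Made-up weekly wages for three hypothetical counties over three years.
toy_wages = {
    "A": [100.0, 110.0, 400.0],
    "B": [200.0, 210.0, 190.0],
    "C": [500.0, 20.0, 530.0],
}
stats = xtsum(toy_wages)
print(stats)
```

In sums of squares the decomposition is exact for a balanced panel: $\sum_{i,t}(x_{it}-\bar{x})^2 = T\sum_i(\bar{x}_{i\cdot}-\bar{x})^2 + \sum_{i,t}(x_{it}-\bar{x}_{i\cdot})^2$. It is only approximate for the reported variances because each standard deviation uses its own divisor.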

3. METHODOLOGY
CORNWELL – TRUMBULL (1994) stated that cross-section data could not capture the real effect of several regressors on the dependent variable, the crime rate. Models estimated by simple linear regression were not reliable because of a loss of information, so the estimated coefficients were not consistent.
CORNWELL – TRUMBULL (1994) show that cross-section analysis misses the effect of an unobservable variable that is crucial for explaining the variability of the crime rate. The authors identify this latent variable with the specific characteristics of the area the data come from (county heterogeneity), which are correlated with the criminal justice variables.
They used a panel of 630 observations, addressing both county heterogeneity and simultaneity: the former with within and between estimation, the latter with two-stage least squares estimation, which we will not treat here.
Panel data is the right approach because we need to capture county heterogeneity as the unobserved variable correlated with $(X'_{it}, P'_{it})$; doing so helps us deal with the endogeneity problem, which means that the orthogonality assumption³ between regressors and error term no longer holds, so that:
$$\mathbb{E}(\varepsilon \mid X) \neq 0 \tag{1}$$
$$\mathbb{E}(x_k \varepsilon) = \operatorname{cov}(x_k, \varepsilon) \neq 0 \tag{2}$$
Assuming the presence of individual heterogeneity correlated with the set of regressors, the authors compare different estimators, such as the within and the between estimator. A Fixed Effects model seems the most appropriate for exploiting the variability of the dependent variable, since it is reasonable to assume the strict exogeneity assumption (SEA):
$$\mathbb{E}(x_{it}\varepsilon_{is}) = 0, \qquad \forall\, t, s = 1, 2, \dots, T \tag{3}$$
together with correlation between the heterogeneity and the regressors⁴:
$$\operatorname{corr}(c_i, x_{it}) \neq 0 \qquad \forall\, t \tag{4}$$
Other estimators, e.g. the pooled OLS estimator, could instead give inconsistent estimates, since we cannot assume that⁵:
$$\mathbb{E}(x'_{it}\nu_{it}) = 0, \qquad \nu_{it} = c_i + \varepsilon_{it} \tag{5}$$
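To see why a failure of (5) matters, here is a short sketch of the standard probability-limit argument. It assumes the generic model $y_{it} = x'_{it}\beta + \nu_{it}$, where $x_{it}$ stacks all regressors, together with the usual regularity conditions and $\mathbb{E}(x_{it}\varepsilon_{it}) = 0$ as implied by (3); the notation $\hat{\beta}_{POLS}$ is introduced here for illustration only.

```latex
\begin{aligned}
\hat{\beta}_{POLS}
  &= \Big(\sum_{i,t} x_{it}x'_{it}\Big)^{-1}\sum_{i,t} x_{it}y_{it}
   = \beta + \Big(\tfrac{1}{NT}\sum_{i,t} x_{it}x'_{it}\Big)^{-1}
             \tfrac{1}{NT}\sum_{i,t} x_{it}\nu_{it}, \\
\operatorname{plim}\hat{\beta}_{POLS}
  &= \beta + \big[\mathbb{E}(x_{it}x'_{it})\big]^{-1}
             \big[\mathbb{E}(x_{it}c_i) + \mathbb{E}(x_{it}\varepsilon_{it})\big]
   = \beta + \big[\mathbb{E}(x_{it}x'_{it})\big]^{-1}\mathbb{E}(x_{it}c_i).
\end{aligned}
```

Under (4) the last term does not vanish, so pooled OLS converges to something other than $\beta$ however large the sample is.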
Unlike the authors, we also run estimations by the pooled estimator, the first-difference estimator and the random effects estimator, to show how the estimated coefficients differ. After running the regressions and comparing the estimates, we test whether the fixed effects model is the most appropriate for our problem.
So, we can define our fixed effects model as:
$$R_{it} = X'_{it}\beta + P'_{it}\gamma + \alpha_i + \varepsilon_{it}, \qquad i = 1, \dots, N; \; t = 1, \dots, T \tag{6}$$

3 $e^{\top}X = y^{\top}(I - H)X = 0$

4 These assumptions allow us to choose FE model instead of Pooled model or RE model.

5 As the usual assumption for pooled estimation.



$X_{it}$ is a matrix of control variables for the return to legal activities, $P_{it}$ is the set of probabilities defined above, and $\alpha_i$ is the fixed effect representing the unobserved component of county-specific characteristics (a $T \times 1$ vector once the $T$ observations of county $i$ are stacked): geometrically, it represents a different intercept for every individual. Finally, $\varepsilon_{it}$ is the usual error term, indexed by $i$ and $t$.
First of all, the authors compare estimates from the between transformation of (6)⁶
$$\bar{R}_{i\cdot} = \bar{X}'_{i\cdot}\beta + \bar{P}'_{i\cdot}\gamma + \alpha_i + \bar{\varepsilon}_{i\cdot} \tag{7}$$
such that⁷
$$\bar{R}_{i\cdot} = T^{-1}\sum_{t=1}^{T} R_{it} \tag{8}$$
where the data are expressed as county means.


The between estimator relies on variation between individuals (say, $i$ and $j$), so we estimate $\beta$ using the cross-sectional information in the data (the time-series variation within individual $i$ is gone). Estimating (6) by OLS would give consistent estimates if the unobserved component did not exist ($\alpha_i = 0$), that is, by neglecting the effect of $\alpha_i$, or by assuming that

$$\mathbb{E}(x_{it}\alpha_i) = 0 \tag{9}$$
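A minimal sketch of the between estimator with a single regressor, in plain Python, may help fix ideas. The data are synthetic and noise-free, built so that the unit effect is positively correlated with the regressor (so (9) fails by construction); `TRUE_BETA`, the county labels and the function names are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def between_slope(x_panel, y_panel):
    """Between estimator: OLS on the unit (county) time averages."""
    units = list(x_panel)
    x_means = [mean(x_panel[u]) for u in units]
    y_means = [mean(y_panel[u]) for u in units]
    return ols_slope(x_means, y_means)

# Synthetic panel: y_it = TRUE_BETA * x_it + alpha_i, with alpha_i rising
# together with the average level of x_it.
TRUE_BETA = -0.5
alpha = {"A": 0.0, "B": 5.0, "C": 10.0}
x = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [3.0, 4.0, 5.0]}
y = {u: [TRUE_BETA * v + alpha[u] for v in x[u]] for u in x}

beta_between = between_slope(x, y)
print(beta_between)
```

In this toy case the between slope is positive although the true coefficient is negative, because the county means of the regressor and the county effect move together.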

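The within transformation of (6)

$$\tilde{R}_{it} = \tilde{X}'_{it}\beta + \tilde{P}'_{it}\gamma + \tilde{\varepsilon}_{it} \tag{10}$$
expresses deviations from the county means⁸. The within estimator lets us net out time-invariant unobserved characteristics, ignoring between-group variation, through the (general) transformation:
$$y_{it} - \bar{y}_{i\cdot} = (x_{it} - \bar{x}_{i\cdot})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\cdot}), \qquad t = 1, 2, \dots, T \tag{11}$$
$$\ddot{y}_{it} = \ddot{x}'_{it}\beta + \ddot{\varepsilon}_{it} \tag{12}$$
The unobserved term $\alpha_i$ is recovered as follows:
$$\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{x}'_{i\cdot}\hat{\beta} \tag{13}$$
which represents the leftover variability of the dependent variable that is not explained by the regressors.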
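The within transformation (11)-(12) and the recovery of the fixed effects in (13) can be sketched for a single regressor in plain Python. The panel is synthetic and noise-free, with a unit effect correlated with the regressor; all names and numbers are hypothetical and serve only to contrast the within slope with a pooled OLS slope.

```python
from statistics import mean

def slope_through_origin(x, y):
    """OLS slope without intercept (the inputs are already centred)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def within_estimator(x_panel, y_panel):
    """Demean each unit, run OLS on the deviations, then recover alpha_i."""
    x_dm, y_dm = [], []
    for u in x_panel:
        x_bar, y_bar = mean(x_panel[u]), mean(y_panel[u])
        x_dm += [v - x_bar for v in x_panel[u]]
        y_dm += [v - y_bar for v in y_panel[u]]
    beta_hat = slope_through_origin(x_dm, y_dm)
    # Fixed effects: unit mean of y minus the fitted part of the unit mean of x.
    alpha_hat = {u: mean(y_panel[u]) - beta_hat * mean(x_panel[u])
                 for u in x_panel}
    return beta_hat, alpha_hat

def pooled_slope(x_panel, y_panel):
    """Pooled OLS slope: stack all observations and ignore the unit effect."""
    x_all = [v for u in x_panel for v in x_panel[u]]
    y_all = [v for u in x_panel for v in y_panel[u]]
    x_bar, y_bar = mean(x_all), mean(y_all)
    return slope_through_origin([v - x_bar for v in x_all],
                                [v - y_bar for v in y_all])

# Synthetic panel: y_it = TRUE_BETA * x_it + alpha_i (no noise).
TRUE_BETA = -0.5
alpha = {"A": 0.0, "B": 5.0, "C": 10.0}
x = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [3.0, 4.0, 5.0]}
y = {u: [TRUE_BETA * v + alpha[u] for v in x[u]] for u in x}

beta_within, alpha_hat = within_estimator(x, y)
beta_pooled = pooled_slope(x, y)
print(beta_within, beta_pooled, alpha_hat)
```

Here the within slope reproduces the coefficient used to generate the data and the recovered effects match `alpha`, while the pooled slope has the opposite sign because it mixes the effect of the regressor with the unit effect.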

6 We omit in (7) the set of dummy variables for years and geographic/demographic control.

7 The same for all variables

8 In this case the equation does not depend on county heterogeneity.



Once the time-invariant unobserved effects are cancelled out, (12) is the model that the software estimates by OLS.
All regressions include dummy variables controlling for years and for geographic and demographic characteristics (d82 up to d87, west, central, urban, density).

4. RESULTS
We start with the pooled OLS estimation: here every regressor in the set $X$ (the opportunities for legal activities, represented by weekly wages) has a p-value greater than the .05 significance level. In general we do not rely on the pooled estimates, because we are not confident that the usual POLS assumptions hold. The Breusch-Pagan test performed later will confirm that pooled estimation is not the right approach.
Then we replicate the results of CORNWELL - TRUMBULL (1994) with the between and within estimators. For the between estimator we obtain statistically significant results only for $P_a$ and $P_c$ in the vector $P$ of probabilities defined above. The between estimator is not widely used in general, but it is useful for comparison with the within estimator. For the within estimation the set of probabilities is statistically significant, and we find an important difference with respect to the between estimator: the estimated coefficients are now much smaller in magnitude than in the previous estimates. The authors state that this is proof that the effectiveness of justice (as a deterrent) has a lower impact on the decision to engage in illegal activities than previous econometric analyses of crime found, because those analyses ignored unobserved county heterogeneity.
Furthermore, we can look at $\rho$ to assess the share of variability due to the county-specific effect. In our case $\rho = 89\%$ (Prob > F = 0.0000), which is quite high; we can therefore attribute most of the variability to a real effect of the unobserved component rather than to the idiosyncratic error.
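As a reading aid, and assuming $\rho$ is defined as in standard fixed-effects output (the fraction of the composite error variance due to the individual effect), the reported value translates into a ratio of standard deviations as follows; the symbols $\sigma_\alpha$ and $\sigma_\varepsilon$ are introduced here for illustration.

```latex
\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}
\quad\Longrightarrow\quad
\frac{\sigma_\alpha^2}{\sigma_\varepsilon^2} = \frac{\rho}{1-\rho}
  = \frac{0.89}{0.11} \approx 8.1,
\qquad
\frac{\sigma_\alpha}{\sigma_\varepsilon} \approx 2.8 .
```

Under that assumed definition, the standard deviation of the county effect is almost three times that of the idiosyncratic error.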
A similar estimation is the first-difference (FD) one, which takes the first difference of every observation:
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}) \tag{14}$$

$$\Delta y_{it} = \Delta x'_{it}\beta + \Delta\varepsilon_{it} \tag{15}$$
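A minimal sketch of the first-difference estimator with one regressor, in plain Python, shows how differencing removes the unit effect. The data are synthetic and noise-free; the function name, the labels and the numbers are hypothetical.

```python
def first_difference_slope(x_panel, y_panel):
    """FD estimator: OLS without intercept on the first differences."""
    dx, dy = [], []
    for u in x_panel:
        xs, ys = x_panel[u], y_panel[u]
        # Differences are taken within each unit, never across units.
        dx += [later - earlier for earlier, later in zip(xs, xs[1:])]
        dy += [later - earlier for earlier, later in zip(ys, ys[1:])]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Synthetic panel: y_it = TRUE_BETA * x_it + alpha_i (no noise).
TRUE_BETA = -0.5
alpha = {"A": 0.0, "B": 5.0, "C": 10.0}
x = {"A": [1.0, 3.0, 4.0], "B": [2.0, 2.0, 5.0], "C": [3.0, 6.0, 7.0]}
y = {u: [TRUE_BETA * v + alpha[u] for v in x[u]] for u in x}

beta_fd = first_difference_slope(x, y)
print(beta_fd)
```

Because `alpha[u]` is constant over time, it drops out of every difference, so the slope equals `TRUE_BETA` whatever values the unit effects take.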


This model cancels out time-invariant variables, so the remaining variability comes only from changes over time within each individual. The FD estimator relies on the strict exogeneity assumption and the full rank condition⁹. The estimated coefficients are almost the same as those of the within estimator.
Finally, we run a Random Effects (RE) model. This kind of model treats individual heterogeneity as part of the error term and assumes orthogonality between $x_{it}$ and $\varepsilon_{it}$. In particular, $\alpha_i$ is treated as a random draw from a distribution and, assuming
$$\mathbb{E}(x_{it}\alpha_i) = 0, \qquad t = 1, 2, \dots, T \tag{16}$$
the model exploits the serial correlation in the composite error term through GLS. It therefore lets us include time-invariant variables, which the FE model cannot.
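The GLS step can be written as OLS on quasi-demeaned data. The following is the textbook form for a balanced panel, with the intercept omitted as in the other equations of this report; $\theta$, $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are notation introduced here for illustration, and the composite error is $\nu_{it} = \alpha_i + \varepsilon_{it}$.

```latex
y_{it} - \theta\,\bar{y}_{i\cdot}
  = (x_{it} - \theta\,\bar{x}_{i\cdot})'\beta
  + (\nu_{it} - \theta\,\bar{\nu}_{i\cdot}),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}} .
```

With $\theta = 0$ this collapses to pooled OLS and as $\theta \to 1$ it approaches the within transformation, which is one way to see the RE estimator as a compromise between the two.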
Below is a summary table of the estimated coefficients, with standard errors in round brackets¹⁰.

9 That’s why we need to cancel out time-invariant variables.

10 We don’t present coefficients for all considered variables.



Beyond the statistics, we can consider the economic interpretation of the FE, FD and RE estimates. The set of probabilities has the expected negative sign (unlike the between estimation, where $P_p$ has a positive sign): an increase in criminal justice capacity/ability decreases the crime rate. The smaller estimated coefficients indicate that criminal justice is a weaker deterrent against the “criminal opportunity cost” than previous results suggested. The regressors in the set $X$ are statistically significant in some cases and not in others, and it seems difficult to give an economic explanation of why one wage is significant and another is not.
At this point, what matters is to check for correlation between $\alpha_i$ and $x_{it}$ in order to choose the right estimator. If the correlation is nonzero, the FE estimates are consistent but the RE estimates are not. With zero correlation, the RE estimates are consistent and more efficient than the FE estimates, because the RE estimates are a weighted average of the within and between estimates.
So we can use the Hausman test to decide whether we should use an FE or an RE model. Basically, we want to test for a significant difference between $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$. A significant difference is evidence against $\hat{\beta}_{RE}$. We define our hypotheses as:
$$H_0: \mathbb{E}(c_i \mid x_{i1}, x_{i2}, \dots, x_{iT}) = \mathbb{E}(c_i) = 0 \;\Rightarrow\; \beta_{FE} = \beta_{RE}$$
(no correlation between $c_i$ and $x_{it}$)
vs
$$H_1: \mathbb{E}(c_i \mid x_{i1}, x_{i2}, \dots, x_{iT}) \neq \mathbb{E}(c_i) \;\Rightarrow\; \beta_{FE} \neq \beta_{RE}$$
(correlation between $c_i$ and $x_{it}$)
However, the limitation of the Hausman test is that it is not performed on $c_i$ but on the composite error term $\nu_{it} = c_i + \varepsilon_{it}$. This implies that rejecting $H_0: \mathbb{E}(\nu_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}) = 0$ may be due to
$$\mathbb{E}(c_i \mid x_{i1}, x_{i2}, \dots, x_{iT}) \neq 0$$
or
$$\mathbb{E}(\varepsilon_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}) \neq 0$$
After running the Hausman test, we reject $H_0$ at the 1% significance level; rejecting independence between individual heterogeneity and regressors, we choose the FE model, so assumptions (3) and (4) are appropriate.
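For intuition, the Hausman statistic has the general form $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'[\widehat{V}_{FE} - \widehat{V}_{RE}]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})$, asymptotically $\chi^2_k$ under $H_0$. The sketch below, in plain Python, covers only the one-coefficient case ($k = 1$); the function name and the input numbers are hypothetical and are not the estimates of this report.

```python
import math

def hausman_one_coefficient(b_fe, se_fe, b_re, se_re):
    """Hausman statistic and p-value for a single coefficient (k = 1).

    Under H0 the RE estimator is efficient, so Var(b_fe - b_re) is
    estimated by Var(b_fe) - Var(b_re).
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then undefined.
        raise ValueError("Var(FE) - Var(RE) is not positive")
    stat = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-square with 1 degree of freedom:
    # P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical FE and RE estimates of one coefficient with standard errors.
stat, p_value = hausman_one_coefficient(b_fe=-0.30, se_fe=0.05,
                                        b_re=-0.45, se_re=0.04)
print(stat, p_value)
```

A large statistic (small p-value) points towards the FE model, in line with the decision rule described above; with several coefficients the scalar division is replaced by the matrix inverse in the general formula.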


After this last analysis we can state that we are confident about the effect of county heterogeneity on the crime rate, and that excluding this unobserved component would give inconsistent estimates.
Moreover, we estimated the heterogeneity coefficients; a summary is given below:

Variable Obs Min Max

alphafehat 630 -1.47028 .7273102

Further developments in this research area can move towards more micro-level analysis, using more disaggregated data; in this way public policy can be implemented at the regional level within the wider perspective of the federal government.

5. CONCLUSION
We have replicated the results of CORNWELL - TRUMBULL (1994) and gone further by using different estimators, in particular to compare the FE and RE models. Previous estimations by simple linear regression could not detect the real effect on the crime rate, because the cross-section setting captured neither time variation nor the unobserved component. Panel data is the right approach for this kind of analysis, especially when applying the FE model rather than the others, as the Hausman test has shown.
As stated before, public policy should be implemented on the basis of such analyses as more disaggregated data become available.

REFERENCES

- Andrew Bell; Kelvyn Jones - Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data; Political Science Research and Methods (May 2014), pp. 1-21

- Christopher Cornwell; William N. Trumbull - Estimating the Economic Model of Crime with Panel Data; The Review of Economics and Statistics, Vol. 76, No. 2 (May 1994), pp. 360-366

- Jeffrey M. Wooldridge - Introductory Econometrics: A Modern Approach, 5th ed. (2010)
