Caribbean Journal of Criminology and Public Safety

January & July, 2010. 15(1 & 2). 195-214. ISSN 20735405
© 2010 The University of Trinidad and Tobago, O’Meara Campus, Trinidad and Tobago



CRIME DATA MINING:
AN ANALYSIS OF REAL TIME DATA
IN PAKISTAN

Abida Ellahi
International Islamic University, Pakistan

Irfan Manarvi
University of Engineering and Technology, Pakistan

The law and order crisis in Pakistan has continued to
deepen over time, and in recent years the police have been
increasingly unable to cope with their additional
responsibilities, particularly with regard to combating
serious crime.
For law enforcement to be effective, it needs to extract
previously unknown knowledge from large amounts of
different types of data. Data mining is the most compelling
tool for this task.
This report takes six years of real time police crime
statistics from the Federal Bureau of Statistics of
Pakistan. It investigates their trends and proposes new
solutions not only for current police systems at the local
level but also at the national level. This paper also explores
the weak areas of police systems in dealing with crimes and
suggests recommendations.

196 CRIME DATA MINING IN PAKISTAN

No part of the world is free from crime. Concern about
security and crime has taken a new turn in the 21st
century. Mega events like the terrorist attacks of 11
September 2001 have created security threats not only for
the affected country but also for other countries.
Technology has not only made work easier for law
enforcement and intelligence gathering agencies, but has
also opened new avenues for criminals. In today's world, the
major challenge for these agencies is analyzing the
increasing volume of data accurately and efficiently. Good
information enables law enforcement to prevent crimes
and reduce the risk of potential dangers (Cope, 2004; Tilley,
2005).
The current police system was introduced in Pakistan by
the British. This system currently comprises outdated, time
consuming techniques. Suddle (2008) indicates many
problems of the Pakistani police, such as an outdated legal
and institutional framework, inadequate accountability, poor
incentive systems, widespread corruption, and severe
under resourcing of law and order.
Today's insecure environment in Pakistan imposes a
strong need for more advanced computerized systems for
solving crimes. This study intends to explore trends of
crime and proposes new solutions not only for current police
systems at the local level but also at the national level, by
taking a real time data perspective. The study also explores
the weak areas of the police system and suggests
recommendations both for the police and for national policy
makers.

ELLAHI AND MANARVI 197

Data Mining Techniques and Crimes

Definition of Data Mining
Data mining is an interdisciplinary field mainly consisting
of research in applied mathematics and computer science
(Han & Kamber, 2006). We define data mining in the
context of crime analysis as “…an intoxicating tool which
aids the criminal investigators in quick, well organized
and valuable extraction of relationships from a large
number of crime statistics.” It overrides the need for the
traditional statistical framework of testing relationships
and analysis, which is complex and time consuming. Data
mining provides in depth analysis while enabling
investigators to identify patterns and trends by scanning
the crime data sets. Some of the techniques are hot spots,
Artificial Neural Networks, Repeat Victimization,
Univariate Methods and Geographic profiling.

Application of Data Mining to Law Enforcement

Some of the applications of data mining are as follows:
Data mining is used to disclose links between crimes and
offenders (Goldberg & Wong, 1998). Data mining can be
used to get useful information (e.g., marital status,
conviction history, and drug and alcohol addiction) to
find links such as physical characteristics of victims and
offenders, type of weapon used, and location (Blokland et
al., 2005; De Bruin et al., 2006). A similar application of data
mining can be seen in the detection of credit card fraud using
financial transactions (Kingston et al., 2004). Last but
not least, data mining has been applied to the
analysis of armed robberies (Dahbur & Muscarello, 2003).
According to Thongtae and Srisuk (2008), the applications
of data analysis in law enforcement fall into two
categories. One is crime control and the other is criminal
suppression. In crime control, the information from
analyzed data is used to control and prevent the incidence
of crime. Criminal suppression, by contrast, aims to
apprehend a criminal by using his/her recorded history.
These and other applications of data mining in law
enforcement are helpful in exploring the hidden and
obvious links, patterns and trends. Enquiring into all these
links, patterns and trends gives a clear picture from the
crime album.
Matching Crimes
Matching crimes refers to identifying a link among a series
of crimes. Matching or linking crimes is important to the
police. When crime analysts match a set of incidents,
they can identify the patterns which criminals use in
their operations. According to Pease (2001),
location is not a sufficient basis for detection and
prevention of crimes, but non spatial variables can become
a potential source for generating patterns of concentration.
A number of studies have been conducted to explore the
problem of linking criminal incidents (Brown & Hagen,
2003; Lin & Brown, 2006). Lin and Brown (2004)
explained that in agencies where incidents are not in
complete order, manual comparison of incident records is
used for making links. In contrast, agencies with well
managed and well defined record systems use
automated systems.
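
As a minimal sketch of how incidents might be linked on non spatial variables, the Python example below scores pairs of incident records by the Jaccard similarity of their attribute sets. The incident records, the attribute names and the 0.5 threshold are hypothetical and chosen only for illustration; they do not come from the study's data or from the cited methods.

```python
from itertools import combinations

# Hypothetical incident records: each id maps to a set of non-spatial
# attributes (time of day, modus operandi, weapon, target type).
incidents = {
    "A": {"night", "forced_entry", "knife", "residential"},
    "B": {"night", "forced_entry", "knife", "commercial"},
    "C": {"day", "pickpocket", "no_weapon", "market"},
}

def jaccard(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_incidents(records, threshold=0.5):
    """Return (id1, id2, score) for every pair whose similarity meets the threshold."""
    links = []
    for id1, id2 in combinations(sorted(records), 2):
        score = jaccard(records[id1], records[id2])
        if score >= threshold:
            links.append((id1, id2, round(score, 2)))
    return links

print(link_incidents(incidents))
```

A set based score is used here only because it is simple to inspect; an agency with structured records might weight attributes differently.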
Predicting Crimes
The literature contains insufficient information about
predictive models for police decision support (Oatley,
MacIntyre, Ewart & Mugambi, 2002). Prediction of the
location of crimes can be a valuable source of information
not only for police but for all law enforcement agencies. It
can help in making decisions about the allocation of police
resources. By properly allocating resources, police can
respond to any criminal activity in less time than before.
Similarly, from the policy making perspective,
prediction can inform policy makers not only about
future trends but also about the
social causes of crimes. It gives a chance to revise the
strategies which seem less effective.
Crime Data Mining Techniques

Data mining techniques use both structured and
unstructured data. Some of the well known techniques used
around the world in advanced crime detection systems are:
(1) Classification groups data into predefined categories
with common observed properties. Examples
include Bayesian classifiers, evolutionary computing,
fuzzy logic, neural networks and rule induction. These are
often used for the prediction of crime trends. Classification
may require reasonably complete training and testing data
in order to avoid the limited prediction caused
by missing data. It is also known as pattern recognition,
discrimination, supervised learning or prediction.
(2) String comparator techniques compare
textual fields in database records and
compute the resemblance between the records.
String comparators are also used to identify unreliable
information such as names, addresses and security numbers.
(3) Entity extraction is the data mining technique which
picks out patterns from the available data sets in the
form of text, images, or audio materials. According to
Chau, Xu and Chen (2002), it has been used for the
identification of persons, addresses, vehicles, and
personal characteristics from police narrative reports.
(4) Clustering techniques aim to analyze data
so that it can be grouped into sets of similar items for
interpretation, in order to learn something about the cases.
(5) Association rule mining aims to discover the patterns
which associate one data element with other elements of
data, e.g., the association between parents' separation and
children's crimes. Lee, Stolfo, and Mok (1999) found this
technique to be applicable to network intruders' profiles
to help detect potential future network attacks.
(6) Deviation detection is the technique used to
identify the part of the data that looks different from the
rest. It is also called outlier detection. It can be used for
fraud detection, network intrusion detection and other crime
analyses.
(7) Series Analysis: A database which consists of sequences
of values or events that change with time is referred to as
a time series database (Han & Kamber, 2000). The aim of
series analysis is to discover sequences within the data.
Han & Kamber (2000) identified four types of patterns
extracted by time series analysis. These are: trend
analysis, sequential patterns, periodical patterns and
similarity search.
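
To illustrate the string comparator idea from the list above, the following Python sketch compares the name and address fields of two records using `difflib.SequenceMatcher` from the standard library. The records, the field names and the 0.85 threshold are assumptions made for this example, and this comparator is only one of many that could be used.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a ratio in [0, 1] for two text fields, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def likely_same(rec1, rec2, fields=("name", "address"), threshold=0.85):
    """Flag two records as a probable match if every compared field is similar enough."""
    return all(similarity(rec1[f], rec2[f]) >= threshold for f in fields)

# Hypothetical records: r2 has a misspelled name and an abbreviated address.
r1 = {"name": "Muhammad Aslam", "address": "12 Mall Road, Lahore"}
r2 = {"name": "Mohammad Aslam", "address": "12 Mall Rd, Lahore"}
r3 = {"name": "Tariq Mehmood", "address": "4 Canal View, Multan"}

print(likely_same(r1, r2))
print(likely_same(r1, r3))
```

Requiring every field to pass the threshold keeps the sketch conservative; a real system would likely tune the threshold per field.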
Data Challenges Associated with Crime Data

Currently, law enforcement agencies are gathering large
amounts of data from various sources. Processing and
analyzing such data, however, has become increasingly
difficult. McCue (2006) noted that incident data,
narrative reports, financial transactions, telephone records,
and internet activity represent only a few of the available
information resources. The challenges associated with such
data that limit access and functional integration of data
resources are:
(1) Characteristics of crime related data. The challenges
associated with crime data are the overload of data
obtained from diverse data sources, stored in multiple
data formats, and in large volumes. Both authoritative
information (e.g., crime incident reports, telephone
records, financial statements, and immigration and
customs records) and open source information (e.g.,
news stories, journal articles, books etc.) need to
be collected. All such data are in multiple formats, which
causes the unstructured nature of the records. Similarly, the
increasing volume of data is also a problem.
(2) Characteristics of crime analysis techniques. Several
information technology techniques are being used for
crime analysis purposes. However, they lack a steady
framework which can address major challenges. How to
employ such techniques effectively in crime analysis
remains an unanswered question.
(3) Characteristics of individual criminals and gangs. The
criminals may be from different countries, nations and
cities. As a result, an investigation must cover multiple
offenders who commit criminal activities in different
places at different times. This can be difficult given
limited investigation resources.
Chen et al. (2004b) propose a solution for various data and
technical challenges, in the form of the development of
“Intelligence and Security Informatics” (ISI). The main
objective of ISI is the “development of advanced
information technologies, systems, algorithms, and
databases for national security related applications,
through an integrated technological, organizational, and
policy based approach” (Chen et al., 2003a).
Methodology

The study analyzes six years of real time police crime
data. Data was taken from the Federal Bureau of
Statistics of Pakistan. The data comprised crime
statistics and numbers of police stations. The variables
taken for study were: time period of six years, crime
against property, crime against person, geographical
location and number of police stations. Crime analysis can
be done with respect to population, but unfortunately the
available population data for Pakistan is 10 year old census
data. Secondly, the crime data of the most recent years could
not be retrieved.
The purpose is to explore the trend of crimes over the last
six years in each province of Pakistan; the correlation of
crimes against the person with crimes against property; the
most probable area for crime; the required actions of police;
and the reduction of crimes at the national level. Data was
analyzed using StatPro for Excel software. Trends and
probable areas of crime were analyzed using time series
plots.
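
The study's trend analysis was carried out in StatPro; as a rough illustration of the same idea, the Python sketch below fits a straight trend line to a six year series by ordinary least squares and extrapolates one year ahead. The yearly counts are placeholder values, not the Federal Bureau of Statistics figures, and the function name `linear_trend` is introduced here only for the example.

```python
def linear_trend(years, counts):
    """Fit counts = intercept + slope * year by ordinary least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder six-year series (year index, reported crimes); illustrative only.
years = [1, 2, 3, 4, 5, 6]
counts = [100, 120, 130, 150, 170, 180]

slope, intercept = linear_trend(years, counts)
forecast_year_7 = intercept + slope * 7  # simple one-step extrapolation
print(round(slope, 2), round(forecast_year_7, 1))
```

A positive slope corresponds to an increasing trend; with only six points, any extrapolation of this kind should be read with caution.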
Data Analysis

Illustrated below are trends of crime against persons and
property in all four provinces over the six year period.
In Figure 1 and Figure 2 the offence years are shown
on the x axis while the crime statistics are shown on the y
axis.
The figures give information about the most probable area
for crimes among the four provinces as well as the increasing
or decreasing trends of crimes over time. This point is
more challenging for police. In terms of probable area,
the trend shows that the highest crime rate was in Sindh
and Punjab (both crime against person and property).
NWFP is third and Balochistan stands last. This poses an
important question for the Sindh and Punjab police
departments. The increase in crimes against the person can be
for several reasons. The movement of NWFP and
Balochistan crime rates over the years seems to be smooth.

Figure 1
Time Series Plot of Crime Against Property in all Provinces

Figure 2
Time Series Plot of Crime against Person in all Provinces

Required Actions of Police in this Scenario

These results show that police have several challenges in
all provinces. These are the detection of crime offenders,
solving all the reported cases and the prevention of crimes.
In this scenario the police have to:
1) Quit all the traditional and inefficient practices (from
incidents of neglect, incompetence, inefficiency, arbitrariness,
and inadequate or no response to citizens’ requests for help, to
institutionalized abuse of power and widespread resort to
high handedness and corruption).
2) Look into more advanced techniques to help
in handling crime threats.
3) Keep a strict check and balance in both provinces
regarding migrants.
4) Weaken the most wanted and powerful criminal
groups.
5) Focus on weakening the network of these active groups,
especially in Sindh.
6) Find out the possible linkages among the criminal
groups of both provinces.
Correlation between Crime Against Person and Crime
Against Property in High Potential Crime Provinces

As Sindh and Punjab are the most probable areas for
crimes, an effort has been made to find correlations
between crimes against property and crimes against the
person. The correlation is 0.8 for both, which is highly
significant (see Figures 3 and 4, respectively). It can be
assumed that most stolen cars are used as a
means to commit crimes against persons. Similarly,
murder and attempted murder can be related to dacoity;
that is, dacoity increases the chance of murder or
attempted murder.

Figure 3
Correlation between Crime against Person
and Crime Against Property in Sindh
(axis label: Crime Against Property)

Figure 4
Correlation between Crime against Person
and Crime against Property in Punjab
(axis label: Crime Against Property)
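
For readers who want to reproduce this kind of figure, the Python sketch below computes the Pearson correlation coefficient between two yearly series. The two series are placeholder values rather than the Sindh or Punjab statistics, and `pearson_r` is a name introduced here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder six-year series; illustrative only, not the study's data.
crime_against_person = [210, 225, 240, 238, 260, 275]
crime_against_property = [400, 430, 425, 470, 480, 520]

r = pearson_r(crime_against_person, crime_against_property)
print(round(r, 3))
```

A value near 1 indicates that the two categories rise and fall together, although correlation alone does not show that one causes the other.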

Required Actions of Police in this Scenario

For the police, this indicates that both categories of
crimes can be interlinked. For this purpose strong
networking is necessary. Police should maintain an
efficient database, especially a geographical information
system which is able to forecast the most probable area for
the next crime attempt. It can help police to prevent crime
by tightening security. The correlation between the total
crimes of the two provinces is positive (0.843; refer to Figure
5). The trend line shows an increasing trend in both
provinces over time.
This situation requires strong collaboration and
coordination between the police departments of both
provinces. They have to strictly check incoming and
outgoing migrants as well as vehicles. The supply of
weapons and explosives has to be carefully checked,
especially in remote areas. It also requires keeping a
check on communication media, especially cell phone
SIMs, which requires collaboration with telecom agents.
Assessing the nature, extent, and distribution of crime in one
province is required in order to efficiently and effectively
allocate resources and deploy personnel in both provinces.

Figure 5
Correlation between Total Crime in both Provinces

The statistics show the number of police check posts
present across the provinces. This demands an
adequate police force or personnel. See Table 1 and Figure
6.
Table 1
Statistics of Police Centers

Province      Division   Police     Police     Police    Police
                         Stations   Stations   Posts     Posts
                         in 2006    in 2007    in 2006   in 2007
Punjab        8          497        629        206       196
Sindh         5          465        436        324       345
NWFP          7          213        214        231       545
Balochistan   6          141        139        215       332
Islamabad                13         15
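
The year on year changes implied by Table 1 can be worked out directly. The short Python sketch below computes the percentage change in police stations and police posts between 2006 and 2007 for the four provinces, using the values as they appear in the table; the helper name `pct_change` is introduced here for illustration, and Islamabad is left out because its row is incomplete.

```python
# (2006, 2007) counts of police stations and police posts, copied from Table 1.
table1 = {
    "Punjab":      {"stations": (497, 629), "posts": (206, 196)},
    "Sindh":       {"stations": (465, 436), "posts": (324, 345)},
    "NWFP":        {"stations": (213, 214), "posts": (231, 545)},
    "Balochistan": {"stations": (141, 139), "posts": (215, 332)},
}

def pct_change(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0

for province, row in table1.items():
    stations = pct_change(*row["stations"])
    posts = pct_change(*row["posts"])
    print(f"{province:<12} stations {stations:+6.1f}%  posts {posts:+6.1f}%")
```

On these figures, the largest single change is in NWFP police posts, which more than doubled between the two years.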



Figure 6
Forecasting Trend for Pakistan


















Required Actions of Police Administration in this Scenario

1. To set up a separate division of the force at entry and exit
points of the city (terrorism prevention division).
2. To provide advanced gadgets to detect explosives.
3. To keep a balance in the technological race between
criminals and police.
4. To provide adequate metal detectors and walkthrough
gates, and enough bulletproof jackets, shields, pads and
helmets.
5. To provide adequate and live training (for example,
training modules, workshops).
6. To reform current training systems. Training should cover
minor to major issues such as (i) How to best identify a
suspicious person (ii) How to identify a vehicle being
used in a suspected criminal activity (iii) What to do
about suspicious people loitering on your street (iv) How
to identify stolen merchandise (v) How to recognize an
auto theft in progress (vi) How to recognize a burglary in
progress.

Pakistan has to look not only from a technological
advancement perspective but also from a systems and policy
view. The following are the main areas which need strong
attention:
Law Enforcement Modernization

1. To change the way the police are operating.
2. To develop a subculture of professional policing, trained
and equipped to uphold the rule of law, by shifting from
more than century old oppressive policing practices to
community policing.
3. To reinvent the police partnership with citizens and
communities.
4. To enter into a customer service contract with the people of
Pakistan, with a new guarantee of more responsive and
accountable policing.
5. To implement ideas that work and get rid of those that do
not.
6. To make the police hierarchy responsible not
merely for the organization and the internal administration
of the force, but also for other matters connected with
maintenance of law and order.
7. To take necessary steps for rendering the police
professionally competent, operationally neutral,
functionally cohesive and responsible for all its actions.
8. To implement the new Police Ordinance (2002). The new
Police Ordinance will lead to efficient police operations,
better quality decision making, improved discipline of the
force, and revamping of internal accountability
mechanisms.
9. To establish public safety commissions at national,
provincial and district levels, crucial to bringing police under
a system of external accountability that enjoys public
confidence.
10. To improve the quality of both investigation and
prosecution, in addition to introducing a system of check
and balance.
11. To create institutional structures that ensure political
neutrality and democratic control of the police.
12. To establish a recruitment and selection system for
personnel based on merit.
Social Reformation
A national action plan against violence is necessary. This
plan should be capable of collecting data on crimes,
defining priorities for, and supporting research on, the causes,
consequences and prevention of crimes, promoting
primary prevention responses and integrating crime
prevention into social and educational policies. However,
it cannot be ignored that the social perspective of crime
gives evidence of injustice in law and public disorder
as major factors in increasing crime. What is needed
includes effective prison rehabilitation, community
policing, and identifying the conditions that facilitate
crime and incivility so that policymakers may make
informed decisions about prevention approaches.


Conclusion

In this study, an overview of crime data mining from a
developing country perspective is presented. The use of
data mining for identifying crime patterns (trends and
forecasting) in Pakistani crime data is analyzed.
Conventional investigation approaches make inevitable
mistakes in solving crimes. Data mining can therefore be a
better solution for reducing crime and producing more
accurate results. Crime pattern analysis, however, can
only help the detectives, not replace them. Data
mining is also sensitive to the quality of input data, which
may be inaccurate, have missing information, or be prone to
data entry errors. The reduction of crimes also needs the
revision of social policies.

References

Blokland, A., Nagin, D., & Nieuwbeerta, P. (2005). Life Span Offending
Trajectories of a Dutch Conviction Cohort. Criminology, 43(4), 919-954.
Cope, N. (2004). Intelligence Led Policing or Policing Led Intelligence?
Integrating Volume Crime Analysis into Policing. The British Journal of
Criminology, 44(2), 188-203.
Dahbur, K., & Muscarello, T. (2003). Classification System for Serial Criminal
Patterns. Artificial Intelligence and Law, 11(4), 251-269.
Brown, D.E., & Hagen, S.C. (2003). Data association methods with applications to
law enforcement. Decision Support Systems, 34(4), 369-378.
De Bruin, J., Cocx, T., Kosters, W., Laros, J., & Kok, J. (2006). Data Mining
Approaches to Criminal Career Analysis. In C. Clifton & N. Zhong
(Eds.), 6th IEEE International Conference on Data Mining (pp. 171-177).
Hong Kong: IEEE Computer Society.
Icove, D.J. (1986). Automated crime profiling. Law Enforcement Bulletin, 55,
27-30.
Ewart, B.W., Oatley, G.C., & Burn, K. (2004). Matching Crimes Using Burglars
Modus Operandi: A Test of Three Models. Forthcoming.
Goldberg, H., & Wong, R. (1998). Restructuring Transactional Data for Link
Analysis in the FinCEN AI System. In D. Jensen & H. Goldberg (Eds.),
AAAI Fall Symposium (pp. 38-46). Orlando, FL: AAAI Press.
Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan
Kaufmann.
Kingston, J., Burkhard, S., & Vandenberghe, W. (2004). Towards a Financial
Fraud Ontology: a Legal Modelling Approach. Artificial Intelligence and
Law, 12(Issue 28), 419-446.
Chau, M., Xu, J.J., & Chen, H. (2002). Extracting Meaningful Entities from Police
Narrative Reports. Proc. Nat’l Conf. Digital Government Research, Digital
Government Research Center, pp. 271-275.
Oatley, G.C., MacIntyre, J., Ewart, B.W., & Mugambi, E. (2002). SMART e
for Decision Makers KDD Experience. Knowledge Based Systems, 323-333.
Pease, K. (2001). What to do about it? Lets turn off our minds and GIS. In
A. Hirschfield & K. Bowers (Eds.), Mapping and Analysing Crime Data -
Lessons from Research and Practice (pp. 225-237). London and New York:
Taylor and Francis.
Sahito, Irfan (2009). Implementation of Digital and e. Investigation Techniques in
Law Enforcement Agencies in Pakistan. Accessed from:
66_index.html
Lin, S., & Brown, D.E. (2006). An outlier-based data association method for
linking criminal incidents. Decision Support Systems, 41(3), 604-615.
Suddle, Shoaib Muhammad (2008). Reforming Pakistan Police: An Overview.
Accessed from: www.unafei.or.j f
Tilley, N. (2005). Community Policing, Problem-Oriented Policing and
Intelligence-Led Policing. In T. Newburn (Ed.), Handbook of Policing
(pp. 311-339). Pleymouth, Devon: Willan Publishing.
Thongtae, P., & Srisuk, S. (2008). An Analysis of Data Mining Applications in
Crime Domain. Computer and Information Technology Workshops,
2008. CIT Workshops 2008. IEEE 8th International Conference,
pp. 122-126.