
Group Members :

Amit kumar
Gokulahasan
Nishanthi
Rajkumar
What is a crime?
 The breach of one or more rules or laws for which a
governing authority, via police power, may ultimately
prescribe a conviction.
 It is injurious to the general population or the state.
 So crime prevention and the identification of criminals
are necessities in today's society.
Major challenges
 All law-enforcement and intelligence-
gathering organizations are currently facing
problems of accurately and efficiently
analyzing the growing volumes of crime data
 Crimes with different modes and patterns, cross-border
operations, and technologically advanced crimes
are difficult to track and solve.
 Investigating a crime takes longer
because of the complexity of the issues involved.
What is data mining?
“Data mining is a collection of techniques for efficient
automated discovery of previously unknown, valid,
novel, useful and understandable patterns in large
databases. The patterns must be actionable so that
they may be used in an enterprise’s decision making
process.”
Advantages of data mining
 Many permutations and combinations can be
incorporated in the software
 Less time consuming and better accuracy
 Installing and running the software costs much
less than hiring personnel
 Different data mining techniques, or a combination
of several, can be incorporated in one assignment
 Advances in the data mining field are yielding
steadily better results
Which model suits the process of criminal
identification?
 Different law-enforcement agencies handle different
kinds of investigation, depending on the severity and
jurisdiction of the crime.
 Researchers have developed various automated data
mining techniques for both local law enforcement and
national security applications.
Objective of Crime Data Mining:
 Using Data mining techniques to aid
analysis of data related to crimes
 Extracting named entities from
narrative reports
 Detecting deceptive criminal identities
 Identifying criminal groups and key
members
Entity extraction
 used to automatically identify persons, addresses,
vehicles, and personal characteristics from police
narrative reports
 subsequently helps in grouping similar activities
by criminals and tracing their behavior
 Its performance depends greatly on the availability
of extensive amounts of clean input data.
Clustering techniques
 group data items into classes with similar characteristics to
maximize intraclass similarity and minimize interclass similarity
 use the statistics-based concept space algorithm to
automatically associate different objects such as persons,
organizations, and vehicles in crime records
 link analysis techniques to identify similar transactions
 It can automate a major part of crime analysis but is limited
by the high computational intensity typically required
Association rule mining
 discovers frequently occurring item sets in a database and
presents the patterns as rules
 application in network intrusion detection to derive
association rules from users’ interaction history, detection
of intruders’ profiles to help detect potential future
network attacks.
 Similarly, sequential pattern mining can be applied to
find frequently occurring sequences of events over time.
 Performance of these techniques relies on the accuracy and
richness of available data.
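The idea of frequent item sets presented as rules can be sketched as follows. This is a minimal illustration restricted to item pairs, not a full Apriori implementation; the function name `pair_rules`, the thresholds, and the incident attributes are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical incident records: each is the set of attributes seen together.
incidents = [
    {"night", "burglary", "forced_entry"},
    {"night", "burglary"},
    {"day", "fraud"},
    {"night", "burglary", "forced_entry"},
]
rules = pair_rules(incidents)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

Support filters out rare combinations, while confidence keeps only rules whose antecedent reliably implies the consequent. Both thresholds here are arbitrary choices for the toy data.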
Deviation detection
 Studies data that differs markedly from the rest of the data,
such as entries produced by criminals; it is also called outlier detection.
 Applicable in fraud detection, network intrusion detection,
and other crime analyses
 But identifying the incorrect data is itself a tedious job.
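One simple way to flag values that deviate from the rest is a modified z-score based on the median absolute deviation (MAD). The sketch below is only one possible measure, chosen here for illustration; the function name `mad_outliers`, the conventional 3.5 threshold, and the sample amounts are assumptions, not taken from the case study.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: nothing can be flagged with this measure
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily transaction amounts for one account.
amounts = [120, 135, 110, 128, 140, 125, 2400, 130]
flagged = mad_outliers(amounts)
print(flagged)  # indices of the entries that look unusual
```

The median is used instead of the mean so that a single extreme entry does not distort the baseline it is compared against.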
Classification
 finds common properties among different crime entities
and organizes them into predefined classes
 Applicable in identifying the source of e-mail spamming based
on the sender’s linguistic patterns and structural features
 Often used to predict crime trends; classification can reduce the
time required to identify crime entities
 Performance is dependent on richness of data
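Assigning items to predefined classes from structural features can be sketched with a nearest-centroid classifier. This is an illustrative stand-in, not the method used in the source; the three features, the two class labels, and the sample messages are hypothetical.

```python
import math

def features(message):
    """Map an e-mail body to simple structural features (hypothetical choice)."""
    words = message.split()
    letters = [c for c in message if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    exclaim_rate = message.count("!") / max(len(words), 1)
    link_rate = sum(w.lower().startswith("http") for w in words) / max(len(words), 1)
    return (upper_ratio, exclaim_rate, link_rate)

def train_centroids(labeled_messages):
    """Average the feature vectors of each predefined class."""
    sums, counts = {}, {}
    for text, label in labeled_messages:
        f = features(text)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc)
            for label, acc in sums.items()}

def classify(message, centroids):
    """Pick the class whose centroid is closest in feature space."""
    f = features(message)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

# Hypothetical training set with a predefined two-class scheme.
training = [
    ("WIN CASH NOW!!! Visit http://example.test", "spam"),
    ("FREE OFFER!!! CLICK http://example.test NOW", "spam"),
    ("Please find the meeting notes attached.", "ham"),
    ("Can we move the review to Thursday?", "ham"),
]
centroids = train_centroids(training)
print(classify("CLAIM YOUR PRIZE!!! http://example.test", centroids))
```

With so few training messages the centroids are crude, which mirrors the point above: performance depends on the richness of the data.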
String comparator
 compares the textual fields in pairs of database records and
computes the similarity between the records
 applicable in detecting deceptive information such as name,
address, and Social Security number in criminal records
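A string comparator of this kind can be sketched with Levenshtein edit distance, normalized to a 0 to 1 similarity and averaged over the compared fields. The averaging rule, the field names, and the two sample records are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """1.0 for identical strings (ignoring case), lower as they diverge."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def record_similarity(r1, r2, fields):
    """Average field-level similarity across the chosen fields."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)

# Hypothetical records that may describe the same person.
rec_a = {"name": "John Smith", "address": "12 Oak St", "ssn": "123-45-6789"}
rec_b = {"name": "Jon Smith", "address": "12 Oak Street", "ssn": "123-45-6798"}
score = record_similarity(rec_a, rec_b, ["name", "address", "ssn"])
print(round(score, 3))
```

Pairs scoring above a chosen cut-off would be reviewed as possible duplicates. Comparing every pair of records is quadratic in the number of records, which is where the computational cost comes from.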
Social network analysis
 Explains the roles of and interactions among nodes in a
conceptual network
 Used to construct a network that illustrates criminals’
roles, the flow of tangible and intangible goods and
information, and associations among these entities
 In-depth analysis can reveal critical roles and subgroups
and vulnerabilities inside the network
Caution
 Entity Extraction & Sequential Pattern Mining –
requires rich data for accuracy
 Clustering Techniques & String comparator – High
computational intensity
 Deviation Detection – Deviant activities can appear to be normal
 Classification – Predefined classification scheme
 Social Networking – Low profile
Crime data mining framework
Identifies relationships between techniques applied in
criminal and intelligence analysis at various levels
Case 1: Named Entity Extraction
 36 narcotics-related cases, processed with an AI entity extractor
 3 steps
-identifies noun phrases
-calculates a set of feature scores for phrases
-predicts the most likely entity type
 Entities- names, addresses, vehicles, narcotics names,
physical characteristics
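The three steps can be sketched as a toy pipeline. The regular expression standing in for noun-phrase detection, the small lexicons, and the hand-written scoring rules are hypothetical simplifications; the extractor in the case study used its own feature scoring and prediction model.

```python
import re

# Hypothetical lexicons; a real system would learn these signals from data.
VEHICLE_WORDS = {"sedan", "truck", "van", "pickup"}
NARCOTIC_WORDS = {"cocaine", "heroin", "marijuana"}
STREET_WORDS = {"st", "street", "ave", "avenue", "rd", "road"}

CANDIDATE = re.compile(
    r"\b\d+(?: [A-Z][a-z]+)+\b"          # number followed by capitalized words
    r"|\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"  # run of two or more capitalized words
    r"|\b[a-z]+\b"                        # single lowercase word
)

def candidate_phrases(text):
    """Step 1: crude stand-in for noun-phrase identification."""
    return CANDIDATE.findall(text)

def feature_scores(phrase):
    """Step 2: score the phrase against each entity type."""
    raw = phrase.split()
    tokens = [t.lower() for t in raw]
    return {
        "address": float(tokens[0].isdigit() and tokens[-1] in STREET_WORDS),
        "person": float(len(raw) >= 2 and all(t[0].isupper() for t in raw)),
        "vehicle": float(any(t in VEHICLE_WORDS for t in tokens)),
        "narcotic": float(any(t in NARCOTIC_WORDS for t in tokens)),
    }

def extract_entities(text):
    """Step 3: keep the most likely entity type for each phrase."""
    entities = []
    for phrase in candidate_phrases(text):
        scores = feature_scores(phrase)
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            entities.append((phrase, best))
    return entities

# Hypothetical sentence in the style of a police narrative report.
report = ("Officers stopped John Smith near 450 Maple Avenue "
          "in a white sedan and found cocaine.")
print(extract_entities(report))
```

Phrases that score zero for every type are discarded, so ordinary words in the narrative do not become entities.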
Case 2: Deceptive entity detection
 Criminals provide false information about themselves
 This creates redundancy in the database
 Makes it difficult to probe further into details
about them
An Alternative Analysis
 Other techniques that
can be utilized

 Entity extraction
 Association rule mining
combined with outlier
detection
Criminal- Network Analysis
 Problems: Drug, Cybercrime, Terrorism etc.
 Clue: Criminals often develop networks in which they
form groups or teams to carry out various illegal
activities.
Objective
 Primary: To identify subgroups and key members in
the criminal networks and to study interaction
patterns
 Secondary: To develop effective strategies for
disrupting the networks
Data Collection
 272 Tucson Police Department incident summaries
 Involving 164 crimes
 Committed from 1985 through May 2002
Methods and Techniques
 Concept Space (Clustering)
To extract criminal relations and create a likely
network of suspects.
 Co-occurrence Weight – To find the strength of each relation
 Hierarchical Clustering – To partition the network into
subgroups
 Block Modeling – To identify interaction patterns
between these subgroups
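Co-occurrence weighting and hierarchical clustering can be sketched together as follows. The weight used here (the share of incidents in which two people appear together) is a simplification of the concept space weighting, and the single-link merge rule, the stopping threshold, and the incident data are assumptions. Block modeling is not covered by this sketch.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(incidents):
    """Weight each pair by the share of incidents in which both appear."""
    pair_counts = Counter()
    for people in incidents:
        pair_counts.update(combinations(sorted(set(people)), 2))
    return {pair: count / len(incidents) for pair, count in pair_counts.items()}

def single_link_clusters(weights, min_weight):
    """Agglomerative clustering: merge the most strongly linked clusters first."""
    people = sorted({p for pair in weights for p in pair})
    clusters = [{p} for p in people]

    def link(c1, c2):
        # Strongest relation between any member of c1 and any member of c2.
        return max(weights.get(tuple(sorted((a, b))), 0.0)
                   for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        if link(clusters[i], clusters[j]) < min_weight:
            break  # remaining clusters are only weakly related
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Hypothetical incident summaries, each listing the people named in it.
incidents = [
    ["A", "B"], ["A", "B", "C"], ["B", "C"],
    ["D", "E"], ["D", "E"], ["C", "D"],
]
weights = cooccurrence_weights(incidents)
subgroups = single_link_clusters(weights, min_weight=0.3)
print(subgroups)
```

Lowering `min_weight` merges the subgroups into larger ones; raising it splits the network into smaller, tighter groups.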
For Key Member….
 Centrality measures
 Degree
 Betweenness
 Closeness
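The three centrality measures can be sketched for a small unweighted, undirected network. The adjacency-dictionary representation, the function names, and the five-node example are assumptions; betweenness follows Brandes' algorithm.

```python
from collections import deque

def degree(graph):
    """Number of direct links per node."""
    return {u: len(nbrs) for u, nbrs in graph.items()}

def closeness(graph):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Brandes' algorithm: shortest paths between other pairs passing through a node."""
    score = {u: 0.0 for u in graph}
    for s in graph:
        order = []
        preds = {u: [] for u in graph}
        sigma = {u: 0 for u in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in graph}
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                score[v] += delta[v]
    return {u: b / 2 for u, b in score.items()}  # each pair was counted twice

# Hypothetical network: B links a small group to D, who links to E.
network = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"},
    "D": {"B", "E"}, "E": {"D"},
}
print(degree(network), betweenness(network), closeness(network))
```

In this example B has the highest value on all three measures, which is the pattern that would mark a candidate key member.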
164 Criminals
Sub Groups
Validation
 2-hour field study with 3 Tucson Police Department
domain experts
 The experts evaluated the validity of the analysis
 They judged the analysis to be valid.
Advantages
 Increase crime analysts’ work productivity
 Visualize Criminal Networks
 Risk is reduced
 Time is saved-Police can use it for other valuable tasks
 Reduce error
 Effective strategies can be formulated to disrupt
criminal networks

 PS: Only a static network is visualized


Emerging Field
Techniques – Application
 Entity Extraction – To analyze the behavioral patterns of
serial offenders
 Crime Association & Clustering – To reveal the identities of
cyber-criminals who use the internet to spread illegal
messages or malicious code
 Machine Learning Algorithms (ID3, Neural Networks, Support
Vector Machine, Genetic Algorithms) – To predict crimes by
analyzing factors such as time, location, vehicle, address,
physical characteristics, and property
Applying the technique
 Using Entity Extraction
 Recognizing a pattern of deception
 Using Association Rule
 Arriving at a general rule for a deceptive entry
 Using Outlier Detection
 Spotting the odd profile
Approach adopted in the case study:
 The authors adopted an experimental analysis with a
small amount of simulation, and drew their
interpretations from the resulting conclusions
 They explored the system of analysis by trying to
solve the problems using newer data mining methods and
approaches
Conclusion:
 Crime data has grown to very large volumes,
running into zettabytes (10^21 bytes), which requires
advanced techniques such as data mining
 Data mining has immense potential for crime data
analysis
 As with any other new technology, data mining (DM)
currently has its own limitations
 But as the technology advances, it is going to be
one of the most powerful tools of data analysis