
Tracking Organized Crime Using

Machine Learning and Text Analysis

Javier Osorio
University of Arizona
School of Government and Public Policy

Multistakeholder Event on Strategies
to Prevent and Combat Organized Crime
Latin America and the Caribbean

United Nations Office on Drugs and Crime, UNODC.

Panama City, Panama, March 30-31, 2022


Tracking Organized Criminal Groups
Policy Evaluation
• Requires fine-grained data
• Evaluate the impact of policy interventions

Strategy effectiveness?
• Align activities, capabilities, and tactics to an end
• We need high-quality, valid, and verifiable data

Osorio - UA Tracking Organized Crime Using ML & NLP UNODC 2022


Challenges
Nature of organized crime
• Clandestine activities are not systematically observable
• Highly dynamic and adaptive

Violent presence of organized criminal groups


• Violence is a limited measure of criminal presence
• However, it is measurable
Natural Language Processing and Machine Learning
• Powerful tools for conflict research
• Initial cost of development
• Process massive amounts of information
• Rapid deployment and update
Information sources
Closed Sources
• Government sources
• Crime and intelligence reports, and 911 calls
• Relatively structured text
• Often lack the capacity to process and analyze
• Lack of transparency prevents validity checks

Open Sources
• News and NGO reports
• Massive amounts of information
• Unstructured text
• May be subject to coverage bias
• Is readily available and can be validated
NLP and ML
Natural Language Processing
• Analyze large volumes of text
• Document, paragraph, sentence level
• Tasks:
• Named Entity Recognition (NER)
• Actors
• Locations
• Syntactic relationships:
• Who?
• Did what?
• To whom?

NLP Limitations
• Unstructured text can be incredibly challenging
• Rule-based approaches may be costly to develop and to update
• Works better for explicit tasks rather than for implicit tasks
• Multi-lingual challenges

Machine Learning
• Humans provide annotated examples to the ML algorithm to learn from
• Variety of ML algorithms for
• Classification
• Prediction

ML Limitations
• Training data is costly
• Training data bias or error may affect ML performance
• ML algorithms can be opaque
• Multi-lingual challenges
NLP: Eventus ID
Eventus ID
• Rule-based approach
• Process text in Spanish
• Actors, actions, locations
• Who?
• Did what?
• To whom?
• When?
• Where?
“Hitmen of the Sinaloa Cartel ambushed a military
convoy in Culiacan, Sinaloa”

• Who: Hitmen of the Sinaloa Cartel


• Did what: ambushed
• To whom: military convoy
• Where: Culiacan, Sinaloa
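
The who / did what / to whom / where coding can be illustrated with a minimal dictionary-based sketch. This is not the Eventus ID implementation: the dictionaries (`ACTORS`, `ACTIONS`, `LOCATIONS`), the function names, and the word-order rule are all assumptions made for illustration, and the example runs on the English translation of the sentence rather than on Spanish text.

```python
import re

# Hypothetical mini-dictionaries (assumption); a real system relies on
# thousands of curated entries.
ACTORS = ["hitmen of the sinaloa cartel", "military convoy"]
ACTIONS = ["ambushed", "attacked", "kidnapped"]
LOCATIONS = ["culiacan, sinaloa"]


def find_all(text, phrases):
    """Return sorted (position, matched text) pairs for dictionary phrases."""
    lowered = text.lower()
    hits = []
    for phrase in phrases:
        for match in re.finditer(re.escape(phrase), lowered):
            hits.append((match.start(), text[match.start():match.end()]))
    return sorted(hits)


def code_event(sentence):
    """Code one sentence as who / did what / to whom / where."""
    actions = find_all(sentence, ACTIONS)
    if not actions:
        return None  # no known action verb, so no event is coded
    verb_pos, verb = actions[0]
    actors = find_all(sentence, ACTORS)
    # Naive word-order rule (assumption): the actor before the verb is the
    # source and the actor after the verb is the target.
    sources = [name for pos, name in actors if pos < verb_pos]
    targets = [name for pos, name in actors if pos > verb_pos]
    places = [name for _, name in find_all(sentence, LOCATIONS)]
    return {
        "who": sources[-1] if sources else None,
        "did_what": verb,
        "to_whom": targets[0] if targets else None,
        "where": places[0] if places else None,
    }


event = code_event(
    "Hitmen of the Sinaloa Cartel ambushed a military convoy in Culiacan, Sinaloa"
)
print(event)
```

The word-order rule fails on passive constructions ("a convoy was ambushed by..."), which is one reason rule-based coders are costly to develop and update.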
ML: ConfliBERT
Training data ML performance
ML classification
• Variety of ML algorithms
• Put them to compete

BERT
• Google’s pre-trained language model
• Learn from billions of documents
• Generic documents: limited performance

ConfliBERT
• Domain specific: conflict + violence + crime
• English sources
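
The idea of putting several ML algorithms to compete can be sketched as fitting each candidate on the same annotated examples and ranking them on held-out data. The sketch below is only an illustration: it uses two toy classifiers written in plain Python (a majority-class baseline and a Naive Bayes model) rather than BERT or ConfliBERT, and the training sentences, labels, and class names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        def score(label):
            total = sum(self.word_counts[label].values())
            logp = math.log(self.class_counts[label])
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1
                logp += math.log(count / (total + len(self.vocab)))
            return logp

        return max(self.class_counts, key=score)


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented toy data (assumption): 1 = relevant to violence, 0 = not relevant.
train = [
    ("gunmen attacked police patrol", 1),
    ("cartel members ambushed soldiers", 1),
    ("armed group kidnapped a mayor", 1),
    ("team wins the football final", 0),
    ("new museum opens downtown", 0),
    ("stock market closes higher", 0),
    ("festival draws large crowds", 0),
]
test = [
    ("gunmen ambushed a police convoy", 1),
    ("cartel kidnapped soldiers", 1),
    ("football festival opens", 0),
    ("museum draws crowds downtown", 0),
]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

candidates = {"majority_baseline": MajorityBaseline(), "naive_bayes": NaiveBayes()}
results = {
    name: accuracy(model.fit(train_x, train_y), test_x, test_y)
    for name, model in candidates.items()
}
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.2f}")
```

The same loop extends to any model that exposes `fit` and `predict`; only the held-out comparison matters for picking a winner.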
NLP and ML applications
Colombia
• NLP tools to track the violent presence of armed actors

Mexico
• ML and NLP to track violent presence of criminal groups

Latin America
• Organized crime activity



Armed Actors in Colombia
• ViPAA: www.colombiaarmedactors.org
• Source: CINEP (1988-2020)
• Violent presence of armed actors
• Government forces
• Insurgents
• Paramilitaries
• Criminal groups
• Actor geo-location
• Eventus ID
• Municipality-day
• Dictionary: 8,500 actors
• Dynamic GIS interface
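
A municipality-day unit of analysis means that coded events are collapsed so each municipality and date pair records which actors were violently present. A minimal sketch of that aggregation follows; the event records, field names, and placeholder municipality names are hypothetical and are not taken from the ViPAA data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical coded events (assumption): one record per reported event.
events = [
    {"actor": "Insurgents", "municipality": "Municipality A", "date": date(2000, 1, 5)},
    {"actor": "Paramilitaries", "municipality": "Municipality A", "date": date(2000, 1, 5)},
    # A second report of the same actor on the same day should not be double counted.
    {"actor": "Insurgents", "municipality": "Municipality A", "date": date(2000, 1, 5)},
    {"actor": "Government forces", "municipality": "Municipality B", "date": date(2000, 1, 6)},
]


def presence_panel(events):
    """Collapse events into {(municipality, day): sorted list of actors present}."""
    present = defaultdict(set)
    for event in events:
        present[(event["municipality"], event["date"])].add(event["actor"])
    return {key: sorted(present[key]) for key in sorted(present)}


panel = presence_panel(events)
for (municipality, day), actors in panel.items():
    print(municipality, day.isoformat(), actors)
```

Using a set per cell records presence rather than event counts, so repeated coverage of one incident does not inflate the measure.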
Organized Criminal groups in Mexico
 94.9%
• OCVED: www.ocved.mx
• Sources (2000-2019)
• 105 newspapers
• Government agencies
• ML Task: Relevance

• Criminal Organizations
• 10 main groups
• 200 subgroups and gangs
• Actor geo-location
• Eventus ID
• Municipality-day
• Dictionary: 7,900 actors
• Dynamic GIS interface
Organized Crime activities in LA
• Source:
• Insight Crime
• July 2004 to March 2020
• 13,000 news articles in English
• ML Task:
• Multi-label classification
• Training data
• 2,533 news articles
• 16 distinct crime categories
• 41 criminal groups
• 33 countries
• ConfliBERT
• Compare BERT vs ConfliBERT
• ↑categories = ↑ difficulty
• ConfliBERT is more efficient

Type     BERT   ConfliBERT (continual)  ConfliBERT (from scratch)
Uncased  68.57  69.09                   68.68
Cased    67.83  69.15                   69.47
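
In multi-label classification each article can carry several crime categories at once, so evaluation has to count label-level hits and misses rather than whole-article matches. The slides do not state which metric the BERT vs ConfliBERT scores use; the sketch below assumes micro-averaged F1 purely for illustration, and the category names and predictions are invented.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-article label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly assigned
        fp += len(pred - truth)   # labels assigned but not annotated
        fn += len(truth - pred)   # annotated labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and model output for three articles (assumption).
truth = [
    {"drug trafficking", "homicide"},
    {"extortion"},
    {"kidnapping", "extortion"},
]
predicted = [
    {"drug trafficking"},
    {"extortion", "homicide"},
    {"kidnapping", "extortion"},
]

score = micro_f1(truth, predicted)
print(f"micro-F1: {score:.2f}")
```

With more categories, each article offers more ways to miss or over-assign a label, which is consistent with the point that more categories mean more difficulty.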
Strategic Principles
ML and NLP tools and techniques can help to:

PREVENT organized crime from (re)infiltrating communities, the economy and political institutions
Track criminal organizations to:
• Identify hot and cold spots
• Determine characteristics of cold spots
• Design interventions to prevent contagion
• Evaluate preventive interventions

PURSUE organized criminal groups and their illicit gains
Track criminal organizations to:
• Evaluate the causal impact of punitive policy interventions
• Evaluate unintended consequences of those interventions

PROTECT vulnerable persons and victims from (further) harm
Track criminal organizations to:
• Monitor organized criminal presence and dynamic behaviors
• Forecast risks to protect vulnerable populations

PROMOTE partnerships and cooperation at all levels
Establish strategic partnerships to:
• Develop effective and timely systems to track criminal organizations
• Multi-stakeholder inputs to define and evaluate strategies
Thank you!
Javier Osorio
University of Arizona
School of Government and Public Policy
@javier_osorio
