



Title: “Crime analysis and forecasting using machine learning techniques”
Submitted in the partial fulfillment of requirements for the

Bhumika S V 4BD20CS022
Deeksha B P 4BD20CS028
Meghana S M 4BD20CS056
Nagachaitra V V 4BD20CS060


Dr. Gururaj T, Ph.D., M.Tech.
Associate Professor
Main Guide

Prof. Vaishnavi A I, M.Tech.
Assistant Professor
Co-Guide

Bapuji Institute of Engineering and Technology
Department of Computer Science and Engineering
Bapuji Institute of Engineering and Technology
Davanagere – 577004

Department of Computer Science and Engineering


This is to certify that Bhumika S V, Deeksha B P, Meghana S M, Nagachaitra V V bearing
USN 4BD20CS022, 4BD20CS028, 4BD20CS056, 4BD20CS060 respectively of Computer
Science and Engineering department have satisfactorily submitted the Project Phase-I report entitled
“Crime analysis and forecasting using machine learning techniques” for 7th SEM PROJECT
PHASE-I (18CSP77). The project report has been approved as it satisfies the academic
requirements for the year 2023-24.

__________________________
Dr. Gururaj T, Ph.D., M.Tech.
Associate Professor
Guide

______________________
Prof. Vaishnavi A I, M.Tech.
Assistant Professor
Co-Guide

Dr. Nirmala C R Ph.D
Head of Department

Date:
Place: Davanagere

Signature of Examiners:
1.__________________________
2.__________________________

Salutations to our beloved and highly esteemed institute, “BAPUJI INSTITUTE OF
ENGINEERING AND TECHNOLOGY” for having well-qualified staff and labs furnished with
the necessary equipment.

We express our sincere thanks to our resourceful guides Dr. Gururaj T, Associate Professor,
Department of Computer Science and Engineering, B.I.E.T., Davanagere, and Prof. Vaishnavi A I,
Assistant Professor, Department of Computer Science and Engineering, B.I.E.T., Davanagere, who helped
us in every aspect of our project. We are indebted to them for their discussions about the technical aspects and
suggestions pertaining to our project.

We are grateful to Dr. Nirmala C R, Professor and H.O.D, Department of Computer Science
and Engineering, B.I.E.T., Davanagere, for endeavoring encouragement, facilities, and extended

We also express our wholehearted gratitude to our respected Principal, Dr. H B Aravind for
his moral support and encouragement.

We would like to extend our gratitude to all staff of the Department of Computer Science
and Engineering for the help and support rendered to us. We have benefited a lot from the feedback,
and suggestions given by them.

We would like to extend our gratitude to all our family members and friends especially for
their advice and moral support.

Bhumika S V (4BD20CS022)
Deeksha B P (4BD20CS028)
Meghana S M (4BD20CS056)
Nagachaitra V V (4BD20CS060)
Bapuji Educational Association (Regd.)
Bapuji Institute of Engineering and Technology, Davangere-577004

Vision and Mission of the Institute

“To be a centre of excellence recognized nationally and internationally, in distinctive areas of
engineering education and research, based on a culture of innovation and invention.”

“BIET contributes to the growth and development of its students by imparting a broad-based
engineering education and empowering them to be successful in their chosen field by inculcating in
them a positive approach, leadership qualities and ethical values.”

Vision and Mission of the Department of Computer Science and Engineering

“To be a centre-of-excellence by imbibing state-of-the-art technology in the field of Computer
Science and Engineering, thereby enabling students to excel professionally and be ethical.”

1. Adopting the best teaching and learning techniques that cultivate a questioning and
reasoning culture among the students.

2. Creating a collaborative learning environment that ignites critical thinking in students
and leads to innovation.

3. Establishing industry-institute relationships to bridge the skill gap and make students
industry-ready and relevant.

4. Mentoring students to be socially responsible by inculcating ethical and moral values.

Program Educational Objectives (PEOs):
PEO1 To apply skills acquired in the discipline of computer science and engineering for
solving societal and industrial problems with apt technology intervention.

PEO2 To continue their career in industry/academia or pursue higher studies and research.

PEO3 To become successful entrepreneurs and innovators who design and develop software
products and services that meet societal, technical and business challenges.

PEO4 To work in a diversified environment by acquiring leadership qualities with
effective communication skills accompanied by professional and ethical values.

Program Specific Outcomes (PSOs):

PSO1 Analyse and develop solutions for problems that are complex in nature by applying the
knowledge acquired from the core subjects of this program.

PSO2 To develop secure, scalable, resilient and distributed applications for industry and
societal requirements.

PSO3 To learn and apply the concepts and constructs of emerging technologies like artificial
intelligence, machine learning, deep learning, big-data analytics, IoT, cloud computing,
etc. for any real-time problems.

Course Outcomes:
CO1: Demonstrate technical knowledge of their selected project topic by analysing different
software development process paradigms, software engineering principles and develop an ability
to apply them to software design of real-life problems.
CO2: Explore the problem identification, formulation, and solution by carrying out a literature survey.
CO3: Able to prepare synopsis by providing relevant scope of the project selected with proper
justification as per the standard format shared.
CO4: Demonstrate the ability to communicate and coordinate effectively as a team member, so
that projects will be completed in a timely manner that caters to enhance their lifelong learning.
Crime is a treacherous and common social problem faced worldwide. It affects the
quality of life, economic growth and reputation of a nation. To secure society against
crime, advanced systems and new approaches are needed to improve crime analytics
and protect communities. If crime can be predicted in detail before it occurs, or if a
system can be built to assist police officers, the burden on the police would be eased:
they could anticipate when and where crimes might occur and take proactive measures
to prevent them. Thus, we put forward a system that analyses, recognizes and forecasts
different crime probabilities in a given area.



1.1 Introduction

2.1 Literature Survey Review
2.1 Literature review summary
2.2 Existing system
2.3 Problem statement
2.4 Proposed system
2.5 Objectives

3.1 Software Requirements
3.2 Hardware Requirements




Sl. No    Figure No    Description
01        4.1          Block Diagram

Crime Analysis and Forecasting using Machine Learning Techniques

In the realm of crime analysis, the project titled "Crime Analysis and Forecasting using
Machine Learning Techniques" seeks to harness the power of advanced
data analytics and machine learning methodologies. Focused on an extensive dataset derived
from Kaggle, specifically tailored to the Indian context, this initiative endeavors to dissect
historical crime data with a multifaceted approach. By applying clustering algorithms, the
project aims to unearth spatial patterns and discern crime hotspots, providing law enforcement
agencies with critical insights into historical crime dynamics. Beyond mere analysis, the project
incorporates the formidable Random Forest Classifier to classify and categorize crimes based
on a diverse array of factors. This classification approach adds granularity to the understanding
of crime types, empowering authorities to tailor their responses to specific criminal activities.
A transformative dimension of the project involves the application of time series
algorithms. This step is pivotal in predicting future crime trends by deciphering temporal
patterns. The aim is to equip law enforcement agencies with proactive insights, enabling
strategic planning to prevent and combat specific crimes. Rooted in the acquisition of data from
Kaggle, the project ensures a robust foundation for analysis, covering a comprehensive
timeframe to capture the evolution of crime trends over the years. Through this project, we
aspire not only to analyze historical crime data but also to forecast and anticipate future
criminal activities, ultimately contributing to more informed and effective law enforcement
strategies. As we navigate through the intricacies of crime analysis using cutting-edge machine
learning algorithms, the project envisions a future where data-driven insights play a pivotal
role in creating safer and more secure communities.

1. Ganesh Koka, et al. “Prediction of Crime Data using Machine Learning Techniques”,
2023. International conference on Sustainable Computing and Data Communication
System(ICSCDS). IEEE, March 2023.
This paper addresses the profound societal impact of crime, spanning from violent acts
like murder to less severe offenses such as burglary. Focusing on the challenges associated with
maintaining accurate crime records, it highlights issues like inconsistency in tracking methods
across states and resource limitations leading to incomplete information. Proposing a model,
this study aims to enhance criminal investigations with a focus on crime analysis and
prediction. Utilizing various pre-processing techniques, the paper processes datasets through
machine learning algorithms like Support Vector Machine, Random Forest Classifier, Decision
Tree, and K-Means. In summary, the paper emphasizes the potential of crime data analysis and
prediction in offering valuable insights for effective crime prevention strategies.
Advantages: Utilizing machine learning algorithms offers a data-driven and potentially more
accurate method for understanding and predicting criminal activities.
Limitations: While machine learning algorithms provide valuable insights, their effectiveness
relies on the quality and representativeness of the data, and they may not account for all factors
influencing criminal activities.
2. Akshara Dilli, et al. “Machine Learning based advanced Crime Prediction and
Analysis”, 2023. International conference on Sustainable Computing and Data
Communication System(ICSCDS). IEEE, March 2023.
This paper addresses the formidable challenge of crime prevention in society,
emphasizing its importance as a visible facet of civilization. Focused on machine learning for
crime prediction in India, the study employs various algorithms like Naive Bayes, Support
Vector Machine, and Random Forest Regression. Notably, the proposed technique achieves an
impressive 99.9% classification accuracy on test data, surpassing earlier models and
demonstrating greater predictive power, especially when compared to baseline studies focused
on violence-based crime datasets. The findings underscore the alignment of criminological
theories with empirical evidence, showcasing the paper's contribution in providing an effective
method for potential crime predictions.

Advantage: High 99.9% accuracy in crime prediction.

Limitation: Challenges adapting to dynamic, real-world crime scenarios.
3. P. Kirubanantham, et al. “Crime Analysis and Prediction using Machine Learning
Algorithms” 2022 1st International Conference on Computational Science and
Technology(ICCST). IEEE, November 2022.
This paper explores the widespread use of machine learning algorithms in data analysis
and highlights their significance in various fields. It introduces a novel approach for predicting
and classifying crimes using data mining algorithms, specifically K-Nearest Neighbour,
Logistic Regression, and Support Vector Machine. Through the application of these techniques,
along with thorough pre-processing of datasets, the research achieves highly accurate
predictions, surpassing current models in effectiveness.
Advantages: The application of these techniques, coupled with pre-processing, yields highly
accurate predictions, surpassing current models in effectiveness.
Limitations: However, potential limitations may stem from the specificity of the chosen
algorithms, affecting their adaptability to diverse crime scenarios or variations in data patterns
over time.
4. Rafia Mumtaz, et al. “Crime classification using Machine Learning and Data
Analytic”, 2022. 19th International Conference on Smart Communities: Improving
Quality of Life Using ICT, IoT and AI (HONET). IEEE, 2022.
This paper explores the short and long-term impacts of crimes and underscores how
modern law enforcement, through data analytics and machine learning, aims to prevent crimes.
It addresses the deficiency in crime reporting systems in developing countries like Pakistan,
emphasizing the importance of improving these systems for effective crime data analysis. The
study employs three machine learning algorithms—Naïve Bayesian Classifier, Decision Tree
Classifier, and Random Forest Classifier—to predict primary crime types, revealing potential
data issues specific to Pakistan. The Random Forest Classifier outperforms Naïve Bayes and
Decision Tree with 55.03% accuracy. The research suggests that feature limitations contribute
to prediction inaccuracies, proposing the extraction of more features for enhanced accuracy.
Ultimately, the study highlights the absence of structured crime data in Pakistan and offers
recommendations for its application in the country.
Advantage: The study showcases machine learning's potential, achieving 55.03% accuracy in
predicting primary crime types, offering insights for developing crime prediction systems.

Limitation: Feature constraints contribute to low prediction accuracy; addressing this
limitation by extracting more features is recommended for improved results.
5. Avani Vaishnav et al. “Crime Analysis in India with Interactive Visualization” 2021.
International Journal of Computer Applications. September, 2021.
The paper presents a computational model utilizing machine learning to analyse the
interplay between education, poverty, unemployment, and crime rates in each state of India.
The study employs data from reliable government sources, focusing on socioeconomic
indicators. The proposed model uses machine learning algorithms, including simple linear
regression and multiple linear regression, for analysis and visualization. The literature review
highlights previous studies on crime prediction using various algorithms and datasets. The
methodology involves data collection, pre-processing, regression analysis, and visualization.
Visual analysis methods are used to understand patterns of crime in the different states of India.
The predictive model aims to enhance understanding of the complex relationships between
socioeconomic factors and crime rates, offering valuable insights for crime prevention
strategies in India.
Advantage: The paper leverages verified government data, ensuring the authenticity of its
study on the correlation between socio-economic factors and crime rates in India.
Limitation: However, potential limitations may arise from the complexity of socio-economic
dynamics, impacting the model's ability to capture all nuances in the relationship between
education, poverty, unemployment, and crime.


The collective research investigates the multifaceted realm of crime and its societal
implications, addressing challenges in accurate crime record-keeping and emphasizing the
significance of improved data analysis for effective crime prevention. Utilizing diverse
machine learning algorithms such as Support Vector Machine, Random Forest, Naive Bayes,
and Decision Tree, these studies aim to predict and classify crimes, demonstrating significant
advancements in accuracy, surpassing existing models, and showcasing the potential for
enhanced crime prevention strategies. From tackling data inconsistencies and resource
limitations to exploring the interconnectedness between socioeconomic factors and crime rates,
these papers collectively underscore the pivotal role of machine learning in analysing crime
data. Furthermore, they offer insights into developing more efficient predictive models,
highlighting the need for refined data collection methodologies and feature extraction to
address prediction inaccuracies and enhance crime management and prevention strategies in
various geographical and socioeconomic contexts.


In the existing system, crime analysis predominantly relies on traditional methods and
historical data to understand criminal patterns. Law enforcement and organisations use manual
processes to make sense of crime data, which may lead to inefficiencies and limited predictive
capabilities. Data analysis tools and machine learning techniques are not extensively utilised.
The identification of crime patterns is often reactive rather than proactive, making it
challenging to prevent criminal activities effectively. Additionally, the public lacks convenient
access to valuable insights on crime trends, which could help them make informed decisions
about safety in different areas. In this scenario, technology and advanced analytics are
noticeably underused for crime analysis and prevention.


There are crime issues in many cities, but it's not always obvious why some
neighbourhoods are more impacted than others. Past studies suggest demographics play a role,
but there's a lack of detailed analysis. We must ascertain how particular factors such as age,
income, and educational attainment, relate to crime. The challenge is that comprehensive
statistical analysis is missing from current research. To build a model for predicting which kind
of crime might occur in a given period based on demographics, we must collect relevant data
and discover its spatio-temporal relationship with the crime data. We also need to determine
the distribution of crimes across various urban locations.


The proposed system introduces an innovative approach to crime analysis and
prediction by integrating data analytics and machine learning techniques. This system aims to
predict and analyse criminal behaviour patterns proactively, enhancing crime control in cities
and regions. K-means clustering is used to identify crime hotspots based on historical data. The
future cluster trends for the various places are predicted using the random forest and decision
tree techniques. To forecast the number of crimes in a given location during a specified time
period, linear regression is utilized. The whole process involves three main phases: data
preprocessing, model training, and prediction/analysis. By using open-source crime records
such as those available on Kaggle, the system aims to understand crime patterns, determine
severity, and identify when and where crimes occur. Visualization techniques are used to reveal
crime trends, and a wide range of input features, including time, location, and crime type, are
employed in the machine learning algorithms. The proposed system provides an effective and
data-driven approach to crime analysis, which can aid law enforcement and the public in
enhancing safety and crime prevention efforts.
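
As a rough illustration of the count-forecasting step described above, the sketch below fits scikit-learn's `LinearRegression` to yearly crime counts for a single location and extrapolates one year ahead. The yearly figures are made-up placeholder values, not taken from the project dataset, and using the year as the only input feature is an assumption made for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical yearly crime counts for one district (placeholder values).
years = np.array([2015, 2016, 2017, 2018, 2019, 2020]).reshape(-1, 1)
counts = np.array([120, 132, 141, 155, 163, 178])

# Fit a straight line: count ~ intercept + slope * year.
model = LinearRegression()
model.fit(years, counts)

# Extrapolate the fitted line to the following year.
next_year = np.array([[2021]])
predicted_count = float(model.predict(next_year)[0])

print(f"Estimated yearly change: {model.coef_[0]:.1f} crimes per year")
print(f"Forecast for 2021: {predicted_count:.0f} crimes")
```

A straight-line fit only captures a steady trend; seasonal or abrupt changes would need the time series models discussed in the methodology.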

• To provide model-based suggestions for efficient resource allocation and policing
strategies while prioritizing ethical considerations and responsible decision-making.
• To provide intuitive visual representations enabling quick understanding of crime
patterns, aiding informed decision-making.
• To analyse the crime dataset to pinpoint crime hotspots and patterns, empowering law
enforcement agencies with data-driven insights for informed decision-making.

3.1 Software requirements:
• OS: Windows

• Language: Python

• IDE: PyCharm

• Machine Learning Frameworks and Libraries: Scikit-learn, Matplotlib, Pandas.

• Data Visualization: Power BI

3.2 Hardware requirements:

Memory(RAM): 4 GB
Processor: Quad-core


Fig 4.1 Block Diagram

1. Data Acquisition: The data acquisition process involves obtaining information from
Kaggle, specifically targeting a defined timeframe, such as 2001 to 2021. The data
encompasses various types, notably:
1. Crime Details: Information regarding the nature of crimes committed, including
the type of crime, date, and time of occurrence, alongside the geographic location,
which includes district and state details. Additionally, demographic data concerning
both victims and perpetrators is gathered.
2. Total Number of Crimes: This involves aggregating the overall count of recorded
crimes within the specified timeframe. This data can be categorized based on districts,
types of crime (e.g., Murder, Attempt to Murder, Rape, Abduction, Robbery, among
others), and other relevant classifications.
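
The aggregation of total crime counts described above could be sketched with a pandas `groupby`. The column names (`state`, `district`, `crime_type`, `year`) and the records themselves are hypothetical; the actual Kaggle dataset may use a different schema.

```python
import pandas as pd

# Hypothetical incident-level records; column names are assumptions.
records = pd.DataFrame({
    "state": ["Karnataka"] * 5,
    "district": ["Davanagere", "Davanagere", "Mysuru", "Mysuru", "Mysuru"],
    "crime_type": ["Robbery", "Murder", "Robbery", "Robbery", "Abduction"],
    "year": [2019, 2019, 2019, 2020, 2020],
})

# Total recorded crimes per district and crime type.
counts_by_district_type = (
    records.groupby(["district", "crime_type"])
    .size()
    .reset_index(name="total_crimes")
)

# Total recorded crimes per year across all districts.
counts_by_year = records.groupby("year").size()

print(counts_by_district_type)
print(counts_by_year)
```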
2. Data Preprocessing and Feature Engineering: Data preprocessing involves several steps
to ensure the quality and usefulness of the data. They are:
1. Cleaning: Missing values are handled through imputation techniques, rectifying
inconsistencies, and standardizing data formats for consistency.
2. Feature engineering: This is then applied to enrich the dataset, creating new
indicators that signify crime severity and spatial or temporal patterns. These derived
features help in identifying crime hotspots, determining peak timeframes for criminal
activities, and gauging the severity of different types of crimes. Overall, this process
optimizes the dataset for subsequent analysis, enabling better insights into crime trends
and aiding in the development of effective crime prevention strategies.
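
The cleaning and feature engineering steps above might look roughly like the following pandas sketch. The column names, the median imputation strategy, the night-time window (20:00 to 06:00) and the severity weights are all illustrative assumptions, not choices stated in this report.

```python
import pandas as pd

# Hypothetical raw records with a missing value and inconsistent text formats.
raw = pd.DataFrame({
    "district": ["davanagere ", "Mysuru", "MYSURU", "Davanagere"],
    "crime_type": ["Robbery", "Murder", "Robbery", "Abduction"],
    "occurred_at": ["2020-01-05 23:10", "2020-03-17 02:45",
                    "2020-03-18 14:00", "2020-07-09 21:30"],
    "victim_age": [34.0, None, 27.0, 41.0],
})

clean = raw.copy()

# Cleaning: standardise text, parse timestamps, impute missing ages with the median.
clean["district"] = clean["district"].str.strip().str.title()
clean["occurred_at"] = pd.to_datetime(clean["occurred_at"])
clean["victim_age"] = clean["victim_age"].fillna(clean["victim_age"].median())

# Feature engineering: temporal indicators and an assumed severity score.
clean["hour"] = clean["occurred_at"].dt.hour
clean["month"] = clean["occurred_at"].dt.month
clean["is_night"] = (clean["hour"] >= 20) | (clean["hour"] < 6)
severity_map = {"Murder": 3, "Abduction": 2, "Robbery": 1}  # illustrative weights
clean["severity"] = clean["crime_type"].map(severity_map)

print(clean)
```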
3. Clustering for Crime Category Creation and Region Segmentation: To categorize crimes
based on similarities in incident details and patterns, clustering techniques like K-means or
Hierarchical clustering are employed. These methods group similar crime incidents together,
forming distinct crime categories. The evaluation of these clusters involves metrics like the
silhouette coefficient or Calinski-Harabasz index, which assess the compactness and
separability of clusters, determining their effectiveness.
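
A minimal sketch of the clustering and evaluation step, assuming incidents are represented by two numeric coordinates. The points are synthetic, generated around three arbitrary centres, and `k = 3` is chosen only because the synthetic data was built that way; on real data the number of clusters would have to be selected, for example by comparing these scores across several values of `k`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Synthetic incident coordinates around three arbitrary centres.
rng = np.random.default_rng(seed=0)
centres = np.array([[14.46, 75.92], [12.97, 77.59], [15.36, 75.12]])
points = np.vstack([c + rng.normal(scale=0.05, size=(50, 2)) for c in centres])

# Group incidents into k clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Silhouette lies in [-1, 1]; Calinski-Harabasz is unbounded above.
# Higher values of either indicate more compact, better separated clusters.
sil = silhouette_score(points, labels)
ch = calinski_harabasz_score(points, labels)

print(f"Silhouette coefficient: {sil:.3f}")
print(f"Calinski-Harabasz index: {ch:.1f}")
```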
4. Classification for Crime Type Prediction: The classification takes place in several steps.
They are:
1. Algorithm Selection: Multiple supervised learning algorithms like Decision Trees
(for interpretability and feature importance insights) and Random Forests (for robust
performance and large dataset handling) are trained and compared.
2. Cross-validation: K-fold cross-validation is employed to ensure model
generalizability and prevent overfitting, dividing the dataset into K subsets and training
the model K times, each time using K-1 subsets for training and the remaining subset
for validation.
3. Evaluation Metrics: Model performance is evaluated using precision, recall, F1-
score, and Area Under ROC Curve (AUC) for multi-class classification. These metrics
offer insights into the model's accuracy, completeness, and ability to distinguish
between different crime types, ensuring a comprehensive assessment of its predictive performance.
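
The classification steps above can be sketched with scikit-learn as follows. The data is a synthetic stand-in produced by `make_classification`, not the crime dataset, and the hyperparameters are arbitrary. The sketch uses `StratifiedKFold`, a variant of K-fold that keeps class proportions similar in every fold; that is a choice made here, not something the report specifies. Metrics are macro-averaged, and AUC is computed one-vs-rest for the multi-class case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an encoded crime dataset with three crime types.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    n_classes=3, random_state=0,
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation: each fold serves once as the validation set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["precision_macro", "recall_macro", "f1_macro", "roc_auc_ovr"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the five validation folds.
    results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    print(name, results[name])
```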
5. Time Series Analysis for Future Crime Forecasting: The various steps involved are:
1. Model Selection: Depending on data characteristics, appropriate time series models
are chosen: ARIMA for non-seasonal data, where differencing handles trends and other
non-stationarity; SARIMA for data that also contains seasonal components; and LSTM,
a recurrent neural network designed for sequence data that can capture non-linear
temporal patterns.
2. Model Training and Testing: Historical crime data is utilized to train the selected
models, and their forecasting accuracy is evaluated using unseen data points. This
process ensures the models' capability to predict future crime occurrences based on
learned patterns from historical trends.
3. Confidence Intervals: To quantify the uncertainty surrounding future predictions,
confidence intervals are computed around the forecasts. These intervals offer insights
into the range within which future crime occurrences are expected to fall, aiding in
understanding the level of uncertainty associated with the forecasting models' predictions.
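
To make the idea of forecasts with uncertainty bands concrete, the sketch below uses a deliberately simplified stand-in for a library ARIMA fit: an AR(1) model fitted by ordinary least squares to the first differences of the series, which has the same form as ARIMA(1,1,0) with drift. The monthly counts are placeholder values. The 95% bands assume normally distributed errors and ignore parameter uncertainty, so they are approximate; strictly speaking they are prediction intervals.

```python
import numpy as np

# Hypothetical monthly crime counts (placeholder values, not project data).
counts = np.array([210, 215, 223, 220, 231, 238, 236, 245, 251, 249,
                   258, 266, 263, 272, 279, 277, 286, 294], dtype=float)

# Difference once to remove the trend, then fit
# d[t] = c + phi * d[t-1] + e[t] by ordinary least squares.
diffs = np.diff(counts)
design = np.column_stack([np.ones(len(diffs) - 1), diffs[:-1]])
(c, phi), *_ = np.linalg.lstsq(design, diffs[1:], rcond=None)
residuals = diffs[1:] - design @ np.array([c, phi])
sigma2 = residuals.var(ddof=2)  # residual variance, two fitted parameters

horizon = 3
forecast, lower, upper = [], [], []
level, last_diff = counts[-1], diffs[-1]
psi, variance = 0.0, 0.0
for step in range(horizon):
    last_diff = c + phi * last_diff    # forecast of the next difference
    level = level + last_diff          # integrate back to a count
    psi = 1.0 + phi * psi              # cumulative response: 1, 1 + phi, ...
    variance += sigma2 * psi ** 2      # forecast-error variance accumulates
    half_width = 1.96 * np.sqrt(variance)
    forecast.append(level)
    lower.append(level - half_width)
    upper.append(level + half_width)

for h, (f, lo, hi) in enumerate(zip(forecast, lower, upper), start=1):
    print(f"t+{h}: forecast {f:.1f}, approx. 95% interval [{lo:.1f}, {hi:.1f}]")
```

The intervals widen with the horizon because each additional step adds its own forecast-error variance.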
6. Implementation and Visualization:
1. Platform Development: Utilize Power BI or Tableau to create a user-friendly
platform for exploring crime data, predictions, and forecasts.
2. Interactive Visualizations: Implement interactive charts and maps to communicate
insights effectively and aid in strategy development.
3. Communication and Strategy: Use visualizations to inform stakeholders and
shape informed crime prevention strategies.

Predicting crimes before they happen is simple to understand, but it takes a lot more
than understanding the concept to make it a reality. The proposed system, underpinned by data
analytics and machine learning, heralds a new era in crime analysis and prediction. It empowers
law enforcement to transition from reactive to proactive crime control, equipping them with
tools for identifying patterns, assessing severity, and predicting criminal activities.
Simultaneously, it provides the public with valuable insights into crime trends, fostering
informed decision-making for safety. In conclusion, this proposed system represents the future
of crime analysis and prediction, harnessing technology, data, and informed decision-making
to forge safer communities worldwide. It symbolizes hope in the face of rising crime,
illuminating a path toward a secure and prosperous future.

Ganesh Koka, et al. “Prediction of Crime Data using Machine Learning Techniques”.
International Conference on Sustainable Computing and Data Communication
System (ICSCDS). IEEE, March 2023.
Akshara Dilli, et al. “Machine Learning based advanced Crime Prediction and
Analysis”. International Conference on Sustainable Computing and Data
Communication System (ICSCDS). IEEE, March 2023.
P. Kirubanantham, et al. “Crime Analysis and Prediction using Machine Learning
Algorithms”. 1st International Conference on Computational Science and
Technology (ICCST). IEEE, November 2022.
Rafia Mumtaz, et al. “Crime classification using Machine Learning and Data
Analytic”. 19th International Conference on Smart Communities: Improving
Quality of Life Using ICT, IoT and AI (HONET). IEEE, 2022.
Avani Vaishnav, et al. “Crime Analysis in India with Interactive Visualization”.
International Journal of Computer Applications. September 2021.
