A report submitted in partial fulfilment of the requirements for the Award of Degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE DATA SCIENCE
on
“DATA ANALYSIS”
from
“CODSOFT”
on
“CREDIT CARD FRAUD DETECTION”
by
HARSH KUMAR
Roll. No.: 2201331549007
Under Supervision of
Mr. Raviraj Singh Kurmi
(Assistant Professor, CSDS)
CERTIFICATE
This is to certify that the “Internship report” submitted by HARSH KUMAR (Regd.
No.: 2201331549007) is work done by him and submitted during the academic year 2023-2024, in partial
fulfilment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY in
COMPUTER SCIENCE AND ENGINEERING (DATA SCIENCE), at CODSOFT
TECHNOLOGIES Pvt. Ltd., New Delhi.
First, I would like to thank CODSOFT, New Delhi, for giving me the opportunity to do an internship
within the organization.
It is with great pleasure and an immense sense of gratitude that I acknowledge the help of
these individuals.
I am highly indebted to the Director, Dr. Vinod M. Kapse, for the facilities provided to accomplish this
internship.
I would like to thank my Head of the Department (HOD), Dr. Priyanka Chandani, for her support
throughout my internship.
I would like to thank Mr. Raviraj Singh Kurmi, internship coordinator, Department of CSDS (Data
Science), for his support and advice in obtaining and completing the internship in the above-mentioned organization.
I am extremely grateful to my department staff members and friends who helped me in the successful
completion of this internship.
HARSH KUMAR
(0221DCSDS164)
ABSTRACT
CodSoft is an IT services and consultancy company that specializes in creating innovative solutions for
businesses. We are passionate about technology and believe in the power of software to transform
the world. Our internship program is just one of the ways in which we are investing in the future of
the industry.
At CodSoft, we believe practical knowledge is the key to success in the tech industry. Our aim is to
help students lacking basic skills by offering hands-on learning through live projects and real-world
examples.
At CodSoft, we believe that collaboration is the key to success. Our internship program is designed
to help you build lifelong relationships with fellow interns, mentors, and industry experts. We're
proud to have a diverse community of passionate individuals who are committed to pushing the
boundaries of technology. Whether you're interested in front-end development, back-end
development, or UI/UX design, we have something for everyone. Join us today and become a part
of our vibrant community.
Organisation Information:
At its core, NANO MINDZ operates in three specific domains, namely Software
Development, Website Design & Development, and Geographic Information Services. We
also offer services in building e-commerce solutions, Search Engine Optimization
(SEO), and database administration. Under each division we further provide
specific industry solutions in focused domains with cutting-edge technologies. We
emphasize building relationships with our clients by delivering projects on time and
within budget.
We follow a structured methodology for our projects, which runs from designing the
solution to the implementation phase. A well-planned project reduces the delivery time
and any additional ad hoc costs to our clients; hence we dedicate the majority of our
time to understanding our clients' business and gathering requirements. This ground-up approach
helps us not only deliver the solution to our clients but also add value to their investments.
1. Introduction
2. Analysis
4.4 DecisionTree_Classifier
5. Coding
6. Screenshots
7. Conclusion
8. Bibliography
The objectives of my role as a Data Science intern at CodSoft on the "Credit Card
Fraud Detection" project, which uses the Kaggle dataset, included:
• Data Analysis Proficiency: Enhance skills in exploratory data analysis (EDA) to derive
insights and patterns from datasets.
• Performance Evaluation: Understand and apply various model evaluation metrics for
fraud detection, such as precision, recall, F1-score, and AUC-ROC.
• Communication and Presentation: Develop the ability to articulate and present complex
technical concepts to both technical and non-technical audiences.
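The metrics named in the Performance Evaluation objective can be computed with scikit-learn. The following is a minimal sketch only: the labels and scores are made-up illustrative values, not results from this project, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Hypothetical ground truth: 1 = fraud, 0 = legitimate (illustrative values only).
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
# Hypothetical fraud probabilities produced by some classifier.
y_score = [0.05, 0.10, 0.60, 0.20, 0.90, 0.40, 0.15, 0.80, 0.30, 0.02]
# Convert probabilities to hard predictions with an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

precision = precision_score(y_true, y_pred)  # share of flagged transactions that are fraud
recall = recall_score(y_true, y_pred)        # share of fraud cases that were flagged
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)         # ranking quality, independent of the threshold

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Precision, recall and F1 depend on the chosen threshold, whereas AUC-ROC is computed from the raw scores, which is why it receives y_score rather than y_pred.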
• Data Pre-processing:
We'll start by discussing the importance of data preprocessing in credit card
fraud detection. Python libraries like Pandas and NumPy will be employed
to clean and transform raw transaction data, making it suitable for further
analysis.
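A minimal sketch of such a cleaning step is given below. The column names Time, Amount and Class are assumptions about the layout of the Kaggle file, the tiny table is made up for illustration, and the specific choices (median imputation, log transform) are examples rather than the steps actually used in this project.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample; real data would be loaded from the dataset's CSV file.
raw = pd.DataFrame({
    "Time": [0.0, 1.0, 1.0, 2.0, 5.0],
    "Amount": [149.62, 2.69, 2.69, np.nan, 69.99],
    "Class": [0, 0, 0, 1, 0],
})

# Remove exact duplicate rows.
clean = raw.drop_duplicates()
# Fill missing amounts with the median of the known amounts.
clean = clean.assign(Amount=clean["Amount"].fillna(clean["Amount"].median()))
# Log-transform the skewed amount column; log1p is defined for zero amounts.
clean["LogAmount"] = np.log1p(clean["Amount"])
# Renumber the rows after dropping duplicates.
clean = clean.reset_index(drop=True)

print(clean)
```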
• Feature Engineering:
Feature engineering is a crucial step in identifying patterns and anomalies
in the transaction data. We'll explore how Python's Scikit-Learn and
TensorFlow can help in creating meaningful features that enable effective
fraud detection.
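As an illustration, two simple engineered features are sketched below using Pandas and Scikit-Learn only (TensorFlow is not used here). The column names, the assumption that Time counts seconds from a midnight start, and the sample values are all hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical transactions: Time in seconds, Amount in currency units.
df = pd.DataFrame({
    "Time": [0, 3600, 7200, 90000],
    "Amount": [10.0, 20.0, 30.0, 40.0],
})

# Hour of day (0-23), assuming Time counts seconds from a midnight start.
df["Hour"] = (df["Time"] // 3600) % 24
# Standardize Amount to zero mean and unit variance so it is on a comparable scale.
df["AmountScaled"] = StandardScaler().fit_transform(df[["Amount"]]).ravel()

print(df)
```

In practice the scaler should be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.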
• Anomaly Detection:
Credit card fraud often involves rare and unusual transactions. We will
investigate how Scikit-Learn's Isolation Forest algorithm
can be used for anomaly detection to identify fraudulent activities.
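A minimal sketch of anomaly detection with Scikit-Learn's IsolationForest follows. The data is synthetic, and the contamination value (the assumed share of anomalies) is an illustrative guess, not a value tuned for this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 hypothetical "normal" transactions clustered near the origin...
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# ...plus 4 hypothetical outliers placed far from that cluster.
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0], [-8.5, -9.0]])
X = np.vstack([normal, outliers])

# contamination is the assumed fraction of anomalies in the data.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks a normal point

# Row indices of the points flagged as anomalous.
flagged = np.where(labels == -1)[0]
print("Flagged row indices:", flagged)
```

Isolation Forest is unsupervised, so it does not need the fraud labels; on a labelled dataset its flags can still be compared against the known labels to judge how useful it is.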
Existing System:
Proposed System:
Software Requirements:
• Operating system: Windows 11 / macOS Ventura and above
• Coding Language: Python
• Front-End: Visual Studio 2012 Professional / Jupyter Notebook / Google
Colab
• Dataset: Kaggle
Hardware Requirement:
• System: Any good-performance system.
• Internal SSD: 256 GB / 512 GB / 1 TB.
• RAM: 8 GB / 16 GB
• Matplotlib:
Matplotlib is a popular Python library for creating 2D and 3D plots and
visualizations. It provides a wide range of tools and functions for creating
high-quality graphs, charts, and figures for data analysis, scientific research,
and data visualization purposes. Matplotlib is open-source and is widely
used in the data science and scientific computing communities.
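As a small illustration, the sketch below draws a bar chart of class counts. The counts are hypothetical values chosen only to show class imbalance, and the non-interactive Agg backend is an assumption made so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical class counts, chosen only to illustrate class imbalance.
counts = {"Legitimate": 9950, "Fraud": 50}

fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(list(counts.keys()), list(counts.values()), color=["tab:blue", "tab:red"])
ax.set_yscale("log")  # a log scale keeps the small fraud bar visible
ax.set_ylabel("Number of transactions")
ax.set_title("Class distribution (illustrative data)")
fig.tight_layout()

# Render to an in-memory PNG; passing a file name to savefig also works.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```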
• Pandas:
Pandas is an open-source Python library designed for data manipulation and
analysis. It provides easy-to-use data structures and functions to work with
structured data, primarily in the form of tabular data like spreadsheets or
SQL tables. Pandas is a fundamental tool in data science and data
analysis.
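A few typical Pandas operations on tabular data are sketched below. The table is a made-up example and the column names Amount and Class are assumptions.

```python
import pandas as pd

# Hypothetical transactions: Class 1 = fraud, Class 0 = legitimate.
df = pd.DataFrame({
    "Amount": [12.5, 250.0, 8.0, 999.0, 40.0, 15.5],
    "Class": [0, 1, 0, 1, 0, 0],
})

# Number of rows per class.
class_counts = df["Class"].value_counts()
# Mean transaction amount per class.
mean_amount = df.groupby("Class")["Amount"].mean()
# Boolean filtering: keep transactions above 100.
large = df[df["Amount"] > 100]

print(class_counts, mean_amount, large, sep="\n\n")
```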
• NumPy:
NumPy, short for "Numerical Python," is a fundamental Python library for
numerical and scientific computing. It provides support for large, multi-
dimensional arrays and matrices, as well as a wide range of mathematical
functions to operate on these arrays. NumPy is a cornerstone library in the
Python data science and scientific computing ecosystem.
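The array operations described above can be illustrated with a short sketch; the amounts are made-up values.

```python
import numpy as np

# Hypothetical transaction amounts.
amounts = np.array([12.5, 250.0, 8.0, 999.0, 40.0, 15.5])

# Vectorized z-score: the arithmetic applies to every element without an explicit loop.
z = (amounts - amounts.mean()) / amounts.std()
# Boolean masking: keep amounts more than one standard deviation above the mean.
high = amounts[z > 1.0]
# Reshape into a 2 x 3 matrix and sum along each row.
matrix = amounts.reshape(2, 3)
row_sums = matrix.sum(axis=1)

print(high, row_sums)
```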
• DecisionTree Classifier:
A Decision Tree Classifier is a supervised machine learning algorithm used
for classification tasks (decision trees in general can also be used for regression). Decision trees are a popular and
interpretable choice for classification problems, where the goal is to predict
a categorical target variable based on input features. They work by
recursively splitting the dataset into subsets based on the values of input
features, ultimately making decisions or classifications at the leaf nodes of
the tree.
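The recursive splitting described above can be inspected directly. The sketch below fits a shallow tree on synthetic, imbalanced data and prints its split rules; the data, the feature names f0 to f3, and the settings max_depth=3 and class_weight="balanced" are illustrative assumptions, not the configuration used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic imbalanced data standing in for transactions (about 10% positive class).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.9, 0.1], random_state=0,
)

# A shallow tree stays readable; class_weight="balanced" up-weights the rare class.
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
tree.fit(X, y)

# Text dump of the learned rules: each indented line is one split or leaf.
rules = export_text(tree, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```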
• model_selection:
The model_selection module in the scikit-learn library (imported as
sklearn.model_selection) provides tools for model selection,
hyperparameter tuning, and various techniques for splitting data into training
and testing sets. This module is an integral part of scikit-learn, a popular
Python library for machine learning. It is used to perform tasks related to
cross-validation, hyperparameter optimization, and dataset splitting.
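A minimal sketch of cross-validated hyperparameter search with this module is given below. The synthetic data, the decision tree estimator, the max_depth grid and the F1 scoring choice are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 10% positive class).
X, y = make_classification(n_samples=400, n_features=6, weights=[0.9, 0.1], random_state=0)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Try each max_depth value and score it by cross-validated F1.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6]},
    scoring="f1",
    cv=cv,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print("Best max_depth:", best_depth, "mean F1:", round(search.best_score_, 3))
```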
Train-Test Splitting:
train_test_split: This function allows you to split a dataset into a training
set and a testing (or validation) set. It's commonly used to assess a model's
performance on unseen data.
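A minimal sketch of train_test_split on made-up data is shown below. The 80/20 split and the use of stratify (which keeps the fraud ratio the same in both sets) are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 hypothetical samples with 2 features each; 10 are labelled as fraud (1).
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 10 + [0] * 90)

# Hold out 20% for testing; stratify=y preserves the 10% fraud ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape, int(y_train.sum()), int(y_test.sum()))
```

Stratification matters for fraud data because a purely random split of a highly imbalanced dataset can leave very few fraud cases in the test set.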
• RandomForest Classifier:
The RandomForest Classifier is a popular machine learning algorithm that
falls under the ensemble learning category. Random forests can be used for
both classification and regression tasks and are based on bagging (Bootstrap
Aggregating) over decision trees. They are known for their robustness and
high predictive accuracy.
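The sketch below trains a random forest on synthetic, imbalanced data and reports per-class metrics. The data and the settings (100 trees, class_weight="balanced") are illustrative assumptions and not the configuration or results of this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 5% positive class) standing in for transactions.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each of the 100 trees is trained on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
forest.fit(X_train, y_train)

# Probability of the positive (fraud) class, averaged over all trees.
fraud_proba = forest.predict_proba(X_test)[:, 1]
# Precision, recall and F1 for each class on the held-out data.
report = classification_report(y_test, forest.predict(X_test), zero_division=0)
print(report)
```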
Snaps:
Through the power of Python and its versatile libraries, we have explored the
development of a robust credit card fraud detection system. By meticulously
preprocessing and engineering data, we have transformed raw transaction
information into a format amenable to sophisticated analysis. Leveraging the
capabilities of Python's Scikit-Learn and TensorFlow, we have delved into the
realm of machine learning, crafting predictive models capable of identifying
fraudulent activities.
The following books were referred to during the analysis and execution phases of the
project.
WEBLINKS:
1. www.Pytutorial.com - covering all the most important C# concepts. This
tutorial is primarily for new users.