
INTERNSHIP REPORT

A report submitted in partial fulfilment of the requirements for the Award of Degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE DATA SCIENCE

on

“DATA ANALYSIS”
from
“CODSOFT”
on
“CREDIT CARD FRAUD DETECTION”

by
HARSH KUMAR
Roll. No.: 2201331549007

Under Supervision of
Mr. Raviraj Singh Kurmi
(Assistant Professor, CSDS)

(Duration: 15th July, 2023 to 15th August, 2023)

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING IN EMERGING TECHNOLOGIES


DEPARTMENT OF DATA SCIENCE
NOIDA INSTITUTE OF ENGINEERING AND TECHNOLOGY,
GREATER NOIDA
(An Autonomous Institution)

HARSH KUMAR 2201331549007 Page |1


SCHOOL OF COMPUTER SCIENCE AND ENGINEERING IN EMERGING TECHNOLOGIES
DEPARTMENT OF DATA SCIENCE
NOIDA INSTITUTE OF ENGINEERING AND TECHNOLOGY,
GREATER NOIDA
(An Autonomous Institution)

CERTIFICATE

This is to certify that the “Internship Report” submitted by HARSH KUMAR (Regd.
No.: 2201331549007) is work done by him and submitted during the academic year 2023-2024, in
partial fulfillment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY in
COMPUTER SCIENCE AND ENGINEERING (DATA SCIENCE), at CODSOFT
TECHNOLOGIES Pvt. Ltd., New Delhi.

Department Internship Coordinator


Mr. Raviraj Singh Kurmi
(Assistant Professor, CSDS)

Dr. Priyanka Chandani


Head of the Department
Department of CSBS/AI/DS

ACKNOWLEDGEMENT

First, I would like to thank CODSOFT, New Delhi, for giving me the opportunity to do an internship
within the organization.

I would also like to thank all the people who worked alongside me; with their patience and
openness, they created an enjoyable working environment.

It is with a great sense of pleasure and immense gratitude that I acknowledge the help of
these individuals.

I am highly indebted to the Director, Dr. Vinod M. Kapse, for the facilities provided to accomplish
this internship.

I would like to thank my Head of the Department (HOD), Dr. Priyanka Chandani, for her support
throughout my internship.

I would like to thank Mr. Raviraj Singh Kurmi, internship coordinator, Department of CSDS (Data
Science), for his support and advice in securing and completing the internship in the
above-mentioned organization.

I am extremely grateful to my department staff members and friends who helped me in the successful
completion of this internship.

HARSH KUMAR
(0221DCSDS164)

ABSTRACT

CodSoft is an IT services and IT consultancy company that specializes in creating innovative
solutions for businesses. We are passionate about technology and believe in the power of software
to transform the world. Our internship program is just one of the ways in which we are investing
in the future of the industry.
At CodSoft, we believe practical knowledge is the key to success in the tech industry. Our aim is
to help students build the basic skills they lack by offering hands-on learning through live
projects and real-world examples.
At CodSoft, we also believe that collaboration is key to success. Our internship program is
designed to help you build lifelong relationships with fellow interns, mentors, and industry
experts. We're proud to have a diverse community of passionate individuals who are committed to
pushing the boundaries of technology. Whether you're interested in front-end development, back-end
development, or UI/UX design, we have something for everyone. Join us today and become a part
of our vibrant community.

Organisation Information:

CODSOFT is a professionally managed company with years of industry experience in developing
and delivering enterprise-specific software and web development solutions using the latest
technologies. Quality is the buzzword in today's world, and no organization can survive
without it. Along with quality, we at CODSOFT "Think Beyond" to take one step ahead and focus
on the delivery of solutions. We design processes that focus not only on quality but also on
delivery, which increases the value to our global clients. Apart from training our employees on
the latest technologies, we also empower them to deliver exciting solutions to our clients. At its
core, CODSOFT operates in three specific domains, namely Software Development, Website Design &
Development, and Geographic Information Services. We also offer our services in building
E-Commerce solutions, Search Engine Optimization (SEO), and Database Administration services.
Under each division we further provide specific industry solutions on focused domains with
cutting-edge technologies. We emphasize building relationships with our clients by delivering
projects on time and within budget.

Programs and opportunities:

This ground-up approach helps us not only deliver the solution to our clients but also add
value to their investments. At its core, CODSOFT operates in three specific domains, namely
Software Development, Website Design & Development, and Geographic Information Services. We
also offer our services in building E-Commerce solutions, Search Engine Optimization
(SEO), and Database Administration services. Under each division we further provide
specific industry solutions on focused domains with cutting-edge technologies. We
emphasize building relationships with our clients by delivering projects on time and
within budget.



Methodologies:

We follow a structured methodology for our projects, which runs from designing the
solution to the implementation phase. A well-planned project reduces the delivery time
and any additional ad hoc costs to our clients; hence we dedicate the majority of our
time to understanding our clients' business and gathering requirements. This ground-up
approach helps us not only deliver the solution to our clients but also add value to their
investments.

Key parts of the report:


Under each division we further provide specific industry solutions on focused domains with
cutting edge technologies.

Benefits of the Company/Institution through our report:

Under each division we further provide specific industry solutions on focused domains with
cutting-edge technologies. We emphasize building relationships with our clients by
delivering projects on time and within budget.



INDEX

S.no  CONTENTS

1. Introduction
   1.1 Modules
2. Analysis
3. Software requirements specifications
4. Technology
   4.1 Matplotlib
   4.2 Pandas
   4.3 NumPy/Sklearn
   4.4 DecisionTree Classifier
   4.5 Model Selection
5. Coding
6. Screenshots
7. Conclusion
8. Bibliography



Learning Objectives/Internship Objectives

The internship objectives for my role as a Data Science intern at Codsoft on the "Credit Card
Fraud Detection" project, using the Kaggle dataset, encompass:

• Technical Skill Development: Gain hands-on experience in data preprocessing, feature
engineering, and model development specifically tailored for fraud detection.

• Understanding Machine Learning: Learn and apply machine learning techniques,
particularly logistic regression or other models suitable for fraud detection.

• Data Analysis Proficiency: Enhance skills in exploratory data analysis (EDA) to derive
insights and patterns from datasets.

• Real-world Application: Apply theoretical knowledge in a practical, real-world scenario
within the domain of financial security.

• Handling Imbalanced Data: Learn techniques to address the challenges posed by
imbalanced datasets, especially in fraud detection scenarios.

• Performance Evaluation: Understand and apply various model evaluation metrics for
fraud detection, such as precision, recall, F1-score, and AUC-ROC.

• Communication and Presentation: Develop the ability to articulate and present complex
technical concepts to both technical and non-technical audiences.

• Problem-solving in Data Science: Tackle the complexities of fraud detection problems,
such as dealing with sensitive data and maintaining model transparency.

• Project Management: Gain experience in handling a complete project lifecycle, including
data gathering, preprocessing, model development, evaluation, and presentation of findings.

• Collaboration and Teamwork: Work within a team environment, learning from
experienced professionals and contributing to a collective goal.

These objectives are designed to provide a comprehensive learning experience, combining
technical skills, problem-solving abilities, and the application of data science methodologies
in a practical, business-oriented context. Internships are a great way to build your
resume and develop skills that can be emphasized when applying for future jobs. When you
apply for a training internship, make sure to highlight any special skills or talents that
set you apart from the other applicants, so that you have a better chance of landing the
position.



WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

DATE        DAY          NAME OF THE TOPIC/MODULE COMPLETED

17/07/23    Monday       Introduction to Data Science
18/07/23    Tuesday      Python
19/07/23    Wednesday    Python Libraries
20/07/23    Thursday     NumPy
21/07/23    Friday       Pandas
22/07/23    Saturday     Matplotlib

DATE        DAY          NAME OF THE TOPIC/MODULE COMPLETED

24/07/23    Monday       Google Colab, Jupyter Notebook
25/07/23    Tuesday      Line plot, Scatter plot
26/07/23    Wednesday    Histograms
27/07/23    Thursday     Pie chart
28/07/23    Friday       Boxplot
29/07/23    Saturday     Heatmap

DATE        DAY          NAME OF THE TOPIC/MODULE COMPLETED

31/07/23    Monday       Datasets
01/08/23    Tuesday      Data points
02/08/23    Wednesday    Data Cleaning
03/08/23    Thursday     Data Transformation
04/08/23    Friday       Normalization
05/08/23    Saturday     Standardization

DATE        DAY          NAME OF THE TOPIC/MODULE COMPLETED

07/08/23    Monday       Feature Engineering
08/08/23    Tuesday      Categorical Encoding
09/08/23    Wednesday    Data Aggregation
10/08/23    Thursday     Data Sampling
11/08/23    Friday       Data Splitting
12/08/23    Saturday     Project session

DATE        DAY          NAME OF THE TOPIC/MODULE COMPLETED

14/08/23    Monday       Design & Analysis
15/08/23    Tuesday      Coding
16/08/23    Wednesday    Testing



INTRODUCTION

Credit Card Fraud Detection

In our increasingly digitized world, financial transactions are predominantly
conducted electronically, with credit cards being one of the most commonly used
payment methods. While this convenience has revolutionized the way we shop
and manage our finances, it has also given rise to a pressing concern: credit card
fraud. As the digital landscape continues to evolve, so do the methods employed
by fraudsters. Detecting and preventing credit card fraud has become a
paramount challenge for financial institutions, businesses, and consumers.
Python, as a versatile and powerful programming language, has emerged as a
valuable tool for developing sophisticated fraud detection systems. In this era of
data-driven decision-making, Python's libraries and frameworks for machine
learning and data analysis are at the forefront of credit card fraud detection
solutions. This report explores how Python can be leveraged to combat credit
card fraud effectively. We will delve into the fundamental principles, methods,
and technologies that underlie credit card fraud detection, and demonstrate how
Python code can be employed to build a robust system.

Our journey will encompass the following key aspects:

• Data Preprocessing:
We'll start by discussing the importance of data preprocessing in credit card
fraud detection. Python libraries like Pandas and NumPy will be employed
to clean and transform raw transaction data, making it suitable for further
analysis.

• Feature Engineering:
Feature engineering is a crucial step in identifying patterns and anomalies
in the transaction data. We'll explore how Python's Scikit-Learn and
TensorFlow can help in creating meaningful features that enable effective
fraud detection.

• Machine Learning Algorithms:
Python offers a rich ecosystem of machine learning libraries, such as
Scikit-Learn and XGBoost, that empower the development of predictive
models. We will delve into various algorithms, including logistic regression,
random forests, and deep learning, and explain how they can be
implemented for fraud detection.

• Anomaly Detection:
Credit card fraud often involves rare and unusual transactions. We will
investigate how anomaly detection algorithms available in Python, such as
Scikit-Learn's Isolation Forest, can be used to identify fraudulent activities.

• Model Evaluation and Deployment:
We will guide you through the process of evaluating the performance of
your credit card fraud detection model and discuss best practices for
deploying it into a production environment.

• Real-time Monitoring and Alerts:
The ability to detect fraud in real time is critical. Python can be
employed to build systems that monitor transactions as they occur and
trigger alerts when suspicious activity is detected.
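
As one illustration of the anomaly detection aspect listed above, the following is a minimal sketch using scikit-learn's IsolationForest. The data is synthetic (an assumption standing in for real transaction features), and the contamination value is an assumed share of anomalies rather than a figure from this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for transaction features (assumption, not the Kaggle data):
# 1,000 "normal" points near the origin plus 10 injected points far away.
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
injected = rng.uniform(low=6.0, high=10.0, size=(10, 2))
X = np.vstack([normal, injected])

# contamination is the assumed fraction of anomalies in the data.
detector = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = detector.fit_predict(X)  # +1 = inlier, -1 = flagged as anomaly

flagged = np.where(labels == -1)[0]
injected_flagged = int(np.sum(flagged >= 1000))  # rows 1000..1009 were injected
print(f"Flagged {len(flagged)} of {len(X)} rows")
print(f"Injected outliers among the flagged rows: {injected_flagged}")
```

Because Isolation Forest is unsupervised, it needs no fraud labels; in practice the flagged rows would still have to be reviewed against known outcomes.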

The goal of this report is to equip you with a comprehensive understanding
of credit card fraud detection using Python. By the end of this journey, you
will have the knowledge and tools necessary to implement a proactive and
efficient credit card fraud detection system that can safeguard financial
transactions in a digital world filled with potential threats.



2. SYSTEM ANALYSIS

2.1 Requirement Analysis

Existing System:

In the existing system, we address these challenges and present an approach to
the efficient, incremental consolidation of data-intensive flows. Following
common practice, our method iterates over information requirements to
create the final design. We show how to efficiently accommodate a new
information requirement in an existing design and how to update a design
when an information requirement evolves. The final design satisfying all
requirements comprises a multi-flow. Just as coal is formed through the
extreme compaction of layers of partially decomposed materials, CoAl
processes individual data flows and incrementally consolidates them into a
unified multi-flow.

Proposed System:

The proposed system follows the previously proposed set of flow transformations
in the context of ETL processes and thus relies on the following four flow
transformations, which are used for reordering operations:

• Swap: Applied to a pair of adjacent unary operations, it interchanges the
order of these operations.

• Distribute/Factorize: Applied to a unary operation over an adjacent n-ary
operation, it respectively distributes the unary operation over the adjacent
n-ary operation or factorizes several unary operations over the adjacent
n-ary operation.

• Merge/Split: Applied to a set of adjacent unary operations, it respectively
merges several operations into a single unary operation or splits a unary
operation into several unary operations.

• Re-associate: Applied to a pair of mutually associative n-ary operations, it
interchanges the order in which these operations are executed.



3. SOFTWARE REQUIREMENTS SPECIFICATIONS
3.1 System configurations

The software requirements specification is produced at the culmination of the
analysis task. The function and performance allocated to software as part of
system engineering are refined by establishing a complete information
description, a detailed functional description, a representation of system
behavior, an indication of performance requirements and design constraints,
appropriate validation criteria, and other information pertinent to requirements.

Software Requirements:
• Operating system: Windows 11 / macOS Ventura and above
• Coding language: Python
• Front end: Visual Studio 2012 Professional / Jupyter Notebook / Google Colab
• Dataset: Kaggle

Hardware Requirements:
• System: Any good-performance system
• Internal SSD: 256 GB / 512 GB / 1 TB
• RAM: 8 GB / 16 GB



4. TECHNOLOGY

Machine learning is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and models that enable computer systems to
learn and make predictions or decisions based on data. It is a technology that
has gained significant prominence in recent years and has a wide range of
applications across various industries.

Python / Machine Learning Modules Used:

• Matplotlib:
Matplotlib is a popular Python library for creating 2D and 3D plots and
visualizations. It provides a wide range of tools and functions for creating
high-quality graphs, charts, and figures for data analysis, scientific research,
and data visualization purposes. Matplotlib is open-source and is widely
used in the data science and scientific computing communities.
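
A minimal sketch of the kind of plots Matplotlib can produce for this project: a histogram of transaction amounts and a bar chart of the class balance. The amounts and class counts are invented for illustration and are not taken from the dataset.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical transaction amounts and class counts (illustrative only)
amounts = rng.lognormal(mean=3.0, sigma=1.0, size=5000)
class_counts = {"Genuine (0)": 4975, "Fraud (1)": 25}

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

ax_hist.hist(amounts, bins=50)
ax_hist.set_title("Transaction amount distribution")
ax_hist.set_xlabel("Amount")
ax_hist.set_ylabel("Number of transactions")

ax_bar.bar(list(class_counts.keys()), list(class_counts.values()))
ax_bar.set_yscale("log")  # log scale keeps the rare fraud class visible
ax_bar.set_title("Class balance")
ax_bar.set_ylabel("Count (log scale)")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # pass a file name instead to write to disk
```

In a Jupyter Notebook or Google Colab session, the backend line can be dropped and `plt.show()` used in place of `savefig`.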

• Pandas:
Pandas is an open-source Python library designed for data manipulation and
analysis. It provides easy-to-use data structures and functions to work with
structured data, primarily in the form of tabular data like spreadsheets or
SQL tables. Pandas is a fundamental tool in data science and data
analysis.
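
The sketch below shows typical Pandas cleaning steps on a tiny hand-made table. It assumes the column names Time, Amount, and Class from the Kaggle file; the values themselves are invented.

```python
import pandas as pd

# Tiny hand-made sample; with the real file this would be
# df = pd.read_csv("creditcard.csv")
df = pd.DataFrame(
    {
        "Time": [0, 1, 1, 2, 4, 4],
        "Amount": [149.62, 2.69, 2.69, None, 69.99, 1200.00],
        "Class": [0, 0, 0, 0, 0, 1],  # 1 = fraud, 0 = genuine
    }
)

print(df.isnull().sum())           # missing values per column
df = df.drop_duplicates()          # remove exact duplicate rows
df = df.dropna(subset=["Amount"])  # drop rows with no amount
print(df["Class"].value_counts())  # class balance after cleaning
print(df.groupby("Class")["Amount"].describe())
```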

• NumPy:
NumPy, short for "Numerical Python," is a fundamental Python library for
numerical and scientific computing. It provides support for large,
multi-dimensional arrays and matrices, as well as a wide range of mathematical
functions to operate on these arrays. NumPy is a cornerstone library in the
Python data science and scientific computing ecosystem.
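
As a small worked example of NumPy's array arithmetic, the sketch below applies min-max normalization and z-score standardization (two of the topics covered during the internship) to an invented array of amounts.

```python
import numpy as np

amounts = np.array([10.0, 20.0, 30.0, 40.0, 1000.0])  # invented values

# Min-max normalization: rescale values to the [0, 1] range
normalized = (amounts - amounts.min()) / (amounts.max() - amounts.min())

# Standardization (z-score): zero mean and unit standard deviation
standardized = (amounts - amounts.mean()) / amounts.std()

print(normalized)
print(standardized.mean(), standardized.std())
```

Both operations act on the whole array at once, without an explicit Python loop.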



• Sklearn:
Scikit-learn, often referred to as "sklearn," is a popular open-source Python
library for machine learning and data mining. It is built on top of other
fundamental libraries like NumPy, SciPy, and matplotlib and provides a
consistent and user-friendly interface for a wide range of machine learning
algorithms and tools. Scikit-learn is widely used in data science and
machine learning.
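
The consistent interface mentioned above can be sketched as follows: every estimator exposes `fit` and `predict`, and a pipeline chains preprocessing and a model into one object. The data comes from `make_classification` and is an assumption, not the project's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 5% positives); an assumption
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Scaling and classification are chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

predictions = model.predict(X[:5])                    # hard class labels
fraud_probability = model.predict_proba(X[:5])[:, 1]  # P(class == 1)
print(predictions, fraud_probability.round(3))
```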

• DecisionTree Classifier:
A decision tree is a supervised machine learning algorithm that can be used
for both classification and regression tasks; the Decision Tree Classifier is
the classification variant. Decision trees are a popular and interpretable
choice for classification problems, where the goal is to predict a categorical
target variable based on input features. They work by recursively splitting
the dataset into subsets based on the values of input features, ultimately
making decisions or classifications at the leaf nodes of the tree.
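
To make the recursive splitting concrete, the sketch below fits a shallow tree on a toy dataset and prints the learned rules. The two features (amount, hour_of_day) and all values are hypothetical and chosen only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [amount, hour_of_day]; label 1 = fraud, 0 = genuine
X = [[20, 14], [35, 10], [15, 16], [900, 3],
     [1200, 2], [40, 12], [1100, 4], [25, 9]]
y = [0, 0, 0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as readable if/else rules
print(export_text(tree, feature_names=["amount", "hour_of_day"]))

new_predictions = tree.predict([[1000, 3], [30, 13]])
print(new_predictions)
```

The printed rules show which feature and threshold the tree picked at each node, which is what makes decision trees easy to interpret.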

• Model_selection:
The model_selection module in the scikit-learn library (imported as
sklearn.model_selection) provides tools for model selection,
hyperparameter tuning, and splitting data into training and testing sets.
It is used to perform tasks related to cross-validation, hyperparameter
optimization, and dataset splitting.

Train-Test Splitting:
train_test_split: This function splits a dataset into a training set and a
testing (or validation) set. It is commonly used to assess a model's
performance on unseen data.
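
A minimal sketch of `train_test_split` on heavily imbalanced labels. The `stratify=y` argument is an addition not discussed above; it keeps the fraud share the same in both splits. The 990/10 label counts are invented.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 990 genuine (0) and 10 fraudulent (1) rows; an assumption
X = np.arange(1000).reshape(-1, 1)
y = np.array([0] * 990 + [1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("train rows:", len(X_train), "fraud share:", y_train.mean())
print("test rows: ", len(X_test), "fraud share:", y_test.mean())
```

Without stratification, a random 20% sample of such data could contain very few fraud rows, or none at all, which would make the test scores unreliable.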

• RandomForest Classifier:
The RandomForest Classifier is a popular machine learning algorithm that
falls under the ensemble learning category. Random forests can be used for
both classification and regression tasks (the classifier handles
classification) and are based on the concept of bagging (Bootstrap
Aggregating) applied to decision trees. Random forests are known for their
robustness and high predictive accuracy.
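
The following sketch trains a random forest on synthetic imbalanced data. The dataset, the number of trees, and the use of `class_weight="balanced"` are illustrative assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 3% positives; an assumption
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=4,
    weights=[0.97, 0.03], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Each tree is fit on a bootstrap sample of the training rows (bagging);
# class_weight="balanced" gives the rare class more weight.
forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=1
)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print("test accuracy:", round(accuracy, 3))
print("feature importances:", forest.feature_importances_.round(3))
```

Accuracy alone can look high on imbalanced data simply because most rows are genuine, so it is best read alongside precision and recall.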



DATASET USED: Creditcard.csv



Code: Credit Card Fraud Detection Using Python

Snaps:

Model Building:

1: Random Forest Classifier



2: Logistic Regression

3: Decision Tree Classifier
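
A minimal sketch of how the three models listed above could be trained and compared using the precision, recall, F1-score, and ROC AUC metrics. It runs on synthetic data from `make_classification` in place of the project dataset, and the hyperparameters are illustrative assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with roughly 3% positives; an assumption
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.97, 0.03], random_state=7,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=7),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=7),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels
    y_score = model.predict_proba(X_test)[:, 1]  # scores for ROC AUC
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Recall matters most when missing a fraud is costly, while precision matters when false alarms are costly; the F1-score balances the two.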



CONCLUSION

In conclusion, in today's rapidly evolving digital landscape, the reliance on
electronic transactions, particularly credit card payments, has surged. However,
as convenience and accessibility have grown, so too has the risk of credit card
fraud. This report has examined credit card fraud detection using Python modules,
showcasing how innovative technology can be harnessed to safeguard financial
transactions and protect consumers and businesses alike.

Through the power of Python and its versatile libraries, we have explored the
development of a robust credit card fraud detection system. By meticulously
preprocessing and engineering data, we have transformed raw transaction
information into a format amenable to sophisticated analysis. Leveraging the
capabilities of Python's Scikit-Learn and TensorFlow, we have delved into the
realm of machine learning, crafting predictive models capable of identifying
fraudulent activities.

In a world where technological advancements continue to shape our daily lives,
the tools and techniques outlined in this report stand as a testament to our
collective commitment to preserving the security and integrity of financial
transactions. Python, as the language of choice, exemplifies the innovation and
adaptability that will continue to drive progress in the realm of credit card fraud
detection. With ongoing vigilance and the utilization of cutting-edge
technology, we can fortify our defenses against the ever-persistent threat of
credit card fraud, ensuring a safer and more secure financial future for all.



