FINAL PROJECT
“CREDIT CARD FRAUD DETECTION”
By Sunrise Team
Data Science Bootcamp Batch 30
MEET THE TEAM

RENNY FAZRIN NISA YASMIN QURROTA SILVIA MAULINA TSANIA PUTRI


AINI KHUSNUR ROFIAH
TABLE OF CONTENTS
1) Background

2) Dataset Story

3) Key Features

4) Observations

5) Analysis Steps

6) Conclusion
BACKGROUND
Credit card fraud is one of the most common types of identity fraud. Its prevalence rose significantly during the coronavirus pandemic, with fraudulent credit card applications up 17 percent in the first month of the pandemic alone.

This has been sustained since, with the National Hunter Fraud Prevention Service revealing that UK credit card fraud reached a five-year high in the last three months of 2021.

With this in mind, how can financial organizations protect themselves and their customers from credit card fraud and minimize its impact on financial institutions worldwide?


DATASET STORY
• This dataset contains credit card transactions made by European cardholders in the year 2023.
• It comprises over 550,000 records, and the data has been anonymized to protect the cardholders' identities.
• The primary objective of this dataset is to facilitate the development of fraud detection algorithms and models to identify potentially fraudulent transactions.
Key Features
• id: Unique identifier for each transaction
• V1-V28: Anonymized features representing various transaction attributes (e.g., time, location, etc.)
• Amount: The transaction amount
• Class: Binary label indicating whether the transaction is fraudulent (1) or not (0)

Observations
• We have 568,630 rows of observations with 30 columns.
• 'Class' is our output feature, indicating whether the transaction is fraudulent (1) or not (0).
ANALYSIS STEPS

1. Data Preprocessing
   • Detecting Missing Values
   • Check Duplicates
2. Exploratory Data Analysis (EDA)
   1. Heatmap
   2. Skewness
   3. The Distribution of the 'Amount' Feature
   4. Data Preparation
3. Modelling
   1. Logistic Regression
   2. XGBoost
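The two preprocessing checks listed above can be sketched with pandas as follows. This is a minimal illustration, not the team's actual notebook: the helper name `preprocessing_report` and the toy frame are hypothetical stand-ins for the real transaction table.

```python
import numpy as np
import pandas as pd


def preprocessing_report(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    missing = df.isnull().sum()
    return {
        # Keep only the columns that actually contain missing values.
        "missing_per_column": missing[missing > 0].to_dict(),
        # A row counts as a duplicate if every column matches an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical toy frame; the real data would be loaded from the dataset file.
toy = pd.DataFrame(
    {
        "V1": [0.1, 0.1, np.nan, 0.4],
        "Amount": [10.0, 10.0, 25.0, 40.0],
        "Class": [0, 0, 1, 0],
    }
)

report = preprocessing_report(toy)

# One possible clean-up: drop incomplete rows, then drop exact duplicates.
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)
print(report, len(cleaned))
```

Whether to drop or impute incomplete rows depends on how many there are; dropping is used here only because it is the simplest option to show.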
EDA
1. Heatmap
* A few features are highly correlated with one another.
* V17 and V18 are highly correlated.
* V16 and V17 are highly correlated.
* V14 is negatively correlated with V4.
* V12 is also negatively correlated with V11.
* V11 is negatively correlated with V10 and positively correlated with V4.
* V3 is positively correlated with V10 and V12.
* V9 and V10 are also positively correlated.
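Observations like these can also be read off the correlation matrix programmatically rather than by eye. The sketch below assumes Pearson correlation and an arbitrary cut-off of 0.7; the helper `strong_pairs` and the columns A to D are hypothetical and only mimic the kind of relationships described above.

```python
import numpy as np
import pandas as pd


def strong_pairs(df: pd.DataFrame, threshold: float = 0.7) -> list:
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = df.corr()  # Pearson correlation by default
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only, so each pair appears once
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    # Strongest relationships first, regardless of sign.
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Synthetic data: B follows A, C mirrors A, D is unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
demo = pd.DataFrame(
    {
        "A": base,
        "B": base + rng.normal(scale=0.1, size=500),
        "C": -base + rng.normal(scale=0.1, size=500),
        "D": rng.normal(size=500),
    }
)

pairs = strong_pairs(demo)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r}")
```

The same matrix returned by `df.corr()` is what a heatmap visualizes, so the list and the plot should agree.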
EDA
2. Skewness

Column    Skewness
V1        -0.08
V2        -1.40
V3         0.01
V4        -0.04
V5         1.51
V6        -0.20
V7        19.03
V8         0.30
V9         0.17
V10        0.74
V11       -0.02
V12        0.07
V13        0.01
V14        0.21
V15        0.01
V16        0.27
V17        0.37
V18        0.13
V19       -0.01
V20       -1.56
V21       -0.11
V22        0.32
V23       -0.10
V24        0.07
V25        0.02
V26       -0.02
V27        2.76
V28        1.72
Amount     0.00
Class      0.00
dtype: float64

• Columns with strong positive skewness: V5, V7, V27, and V28
• Columns with strong negative skewness: V2 and V20 (V21 and V23 are only slightly negative, at about -0.1)
• Columns with skewness close to zero: Amount and Class
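A table like the one above is what `DataFrame.skew()` returns in pandas. The sketch below groups columns by the sign and size of their skewness; the cut-off of 1.0 is an assumed rule of thumb, and the helper `group_by_skewness` and the three synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd


def group_by_skewness(df: pd.DataFrame, threshold: float = 1.0) -> dict:
    """Split columns into strongly positive, strongly negative, and near-zero skew."""
    skew = df.skew()  # sample skewness per numeric column
    return {
        "positive": sorted(skew[skew > threshold].index),
        "negative": sorted(skew[skew < -threshold].index),
        "near_zero": sorted(skew[skew.abs() <= threshold].index),
    }


# Synthetic columns with known shapes, standing in for the anonymized features.
rng = np.random.default_rng(42)
demo = pd.DataFrame(
    {
        "right_tailed": rng.exponential(size=2000),  # long tail to the right
        "left_tailed": -rng.exponential(size=2000),  # long tail to the left
        "symmetric": rng.normal(size=2000),          # roughly symmetric
    }
)

groups = group_by_skewness(demo)
print(groups)
```

With a different threshold the groups shift, so the cut-off should be stated alongside any list of "skewed" columns.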
EDA
3. The Distribution of the 'Amount' Feature
MODELLING
1. Logistic Regression
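As a rough sketch of this modelling step, the example below fits a logistic regression with scikit-learn. Everything about the data is an assumption: two synthetic Gaussian clusters stand in for the anonymized features, and the 80/20 stratified split and the scaling step are illustrative choices, not necessarily the ones the team made.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, balanced two-class data (hypothetical stand-in for V1-V28).
rng = np.random.default_rng(0)
n = 1000
X = np.vstack(
    [
        rng.normal(0.0, 1.0, size=(n, 5)),  # class 0 cluster
        rng.normal(2.0, 1.0, size=(n, 5)),  # class 1 cluster
    ]
)
y = np.array([0] * n + [1] * n)

# Hold out 20% of the rows, keeping the class ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling inside the pipeline means the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Putting the scaler in the pipeline avoids leaking test-set statistics into training, which matters when results are compared across models.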
MODELLING
2. XGBoost
Conclusion
• The best model performance was achieved with XGBoost, with an accuracy of 0.97, meaning the model correctly classified 97% of transactions as true positives or true negatives.
• Developing an effective fraud detection model is important. The model must be able to recognize suspicious patterns in credit card transactions.
• Because fraud patterns can change over time, the model may need to be adjusted periodically to remain effective at detecting newly emerging fraud.
• Understanding the features that most influence fraud detection is important. Some features may be strongly associated with the likelihood of fraud.
• Early detection of credit card fraud matters: the sooner fraud is detected, the smaller the potential losses.
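To make the accuracy figure in the first point concrete, accuracy is the share of all predictions that are true positives or true negatives: (TP + TN) / (TP + TN + FP + FN). The confusion-matrix counts below are hypothetical, chosen only so that the result equals 0.97; they are not the project's actual counts.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that the model caught."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


# Hypothetical counts for 1,000 test transactions (500 fraud, 500 legitimate).
tp, tn, fp, fn = 480, 490, 10, 20

acc = accuracy(tp, tn, fp, fn)  # (480 + 490) / 1000
rec = recall(tp, fn)            # 480 / 500
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

Recall is shown next to accuracy because, for fraud detection, missed fraud cases (false negatives) are usually the costlier error.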
THANK YOU