
Exploratory Data Analysis and Visualization Project
FURKAN ÇELEN
HÜSEYİN KARA
Presentation Plan
Exploratory Data Analysis
• Univariate Analysis
• Bivariate Analysis
Anomaly Detection Algorithms
• Z-score
• K-Means
• Autoencoder
• Isolation Forest
• Gaussian Mixture
• PCA
Data: Fraud Data Set

1 Target
12 Features
10779 Rows
Summary Statistics
Univariate Analysis
Class 1: 4.5%
Class 0: 95.5%
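
A minimal sketch of how a class balance like the one above can be computed. The labels here are synthetic (45 positives in 1000 rows, chosen only to reproduce a 4.5% share); the variable names are illustrative and not taken from the project.

```python
import numpy as np

# Synthetic labels for illustration only: 45 positives out of 1000 rows.
y = np.zeros(1000, dtype=int)
y[:45] = 1

counts = np.bincount(y, minlength=2)     # number of rows per class
shares = 100.0 * counts / counts.sum()   # percentage of rows per class

for label, (n, pct) in enumerate(zip(counts, shares)):
    print(f"Class {label}: {n} rows ({pct:.1f}%)")
```

With a minority class this small, plain accuracy is easy to inflate, which is worth keeping in mind when reading the scores later in the deck.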
Features
Bivariate Analysis - Pair Plot
Bivariate Analysis - Correlation Plot
V8, V6
V6 vs. Target
V8 vs. Target
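
A rough sketch of the computation behind a correlation plot, ranking features by the absolute value of their correlation with the target. The data is synthetic: `feat_a` and `feat_b` are hypothetical stand-ins for informative features such as V6 and V8, and the direction and strength of their correlation are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
target = (rng.random(n) < 0.045).astype(int)   # synthetic, roughly 4.5% positives

# Hypothetical features: two shift with the target, one is pure noise.
feat_a = rng.normal(size=n) + 3.0 * target
feat_b = rng.normal(size=n) - 3.0 * target
noise = rng.normal(size=n)

names = ["feat_a", "feat_b", "noise", "target"]
data = np.column_stack([feat_a, feat_b, noise, target])

# Pearson correlation matrix; columns are variables, so rowvar=False.
corr = np.corrcoef(data, rowvar=False)

# Correlation of each feature with the target (last column), ranked by magnitude.
target_corr = dict(zip(names[:-1], corr[:-1, -1]))
ranked = sorted(target_corr, key=lambda k: abs(target_corr[k]), reverse=True)
print(ranked)
```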
Z-score
Z = (x - μ) / σ
Threshold = 3
Z > Threshold --> 1 (True)
Z ≤ Threshold --> 0 (False)
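
The rule above can be sketched in a few lines of NumPy. This follows the one-sided test as written on the slide (Z greater than the threshold); a two-sided variant on |Z| is also common. The function name and the sample values are illustrative assumptions.

```python
import numpy as np

def zscore_flags(x, threshold=3.0):
    """Return 1 where the z-score exceeds the threshold, else 0."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    if sigma == 0:                       # constant column: nothing can be flagged
        return np.zeros(x.shape, dtype=int)
    z = (x - x.mean()) / sigma           # Z = (x - mu) / sigma
    return (z > threshold).astype(int)

# Illustrative data: 50 evenly spaced values near 10 plus one extreme value.
values = np.concatenate([np.linspace(9.0, 11.0, 50), [30.0]])
flags = zscore_flags(values)
print(flags.sum(), "value(s) flagged")
```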
K-means
2 Clusters
(283 points in cluster 1 vs. 10496 in the other cluster)
V8 vs. V6
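
A hedged sketch of two-cluster K-means used as an anomaly detector with scikit-learn. Treating the smaller cluster as the anomalous one is an assumption made here for illustration; the slides only report the cluster sizes. The data is synthetic and two-dimensional, loosely mimicking a V8 vs. V6 scatter.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in: a large dense group and a small, well-separated group.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
rare = rng.normal(loc=8.0, scale=0.5, size=(20, 2))
X = np.vstack([normal, rare])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

sizes = np.bincount(km.labels_, minlength=2)    # points per cluster
minority = int(np.argmin(sizes))                # assumption: small cluster = anomalies
flags = (km.labels_ == minority).astype(int)

print("cluster sizes:", sizes.tolist())
```

K-means cluster labels are arbitrary, so the minority cluster is identified by size rather than by label value.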
Autoencoder
Encoder (64,32,16)
Decoder (16,32,64)
code_size=8
loss='msle'
metrics=['accuracy']
optimizer='adam'
epochs=30
batch_size=256
Accuracy score: 0.96
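
One way the configuration above could be wired up in Keras is sketched below. The layer widths, code size, loss, metric, and optimizer come from the slide; the activations (ReLU hidden layers, sigmoid output) and the assumption that inputs are scaled to [0, 1] are not stated in the source. `build_autoencoder` is a hypothetical helper name.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(n_features, code_size=8):
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in (64, 32, 16):                                # encoder
        x = layers.Dense(units, activation="relu")(x)
    x = layers.Dense(code_size, activation="relu")(x)         # bottleneck code
    for units in (16, 32, 64):                                # decoder
        x = layers.Dense(units, activation="relu")(x)
    # Assumption: sigmoid output, i.e. features were scaled to [0, 1].
    outputs = layers.Dense(n_features, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="mean_squared_logarithmic_error",                # 'msle' on the slide
        metrics=["accuracy"],
    )
    return model

model = build_autoencoder(n_features=12)
model.summary()
```

Training would then reconstruct the input from itself, for example `model.fit(X, X, epochs=30, batch_size=256)`, where `X` is the scaled feature matrix; rows with a large reconstruction error are the anomaly candidates.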
Isolation Forest
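
A minimal scikit-learn sketch of Isolation Forest on synthetic data. Setting `contamination=0.045` to match the fraud share reported earlier is an assumption; the slides do not list the parameters that were used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in: a dense core plus 25 points scattered on a distant ring.
inliers = rng.normal(size=(500, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=25)
radii = rng.uniform(8.0, 12.0, size=25)
outliers = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X = np.vstack([inliers, outliers])

forest = IsolationForest(n_estimators=100, contamination=0.045, random_state=0)
pred = forest.fit_predict(X)            # +1 = inlier, -1 = outlier
flags = (pred == -1).astype(int)        # map to the 1 = anomaly convention
scores = forest.score_samples(X)        # lower score = more anomalous

print("flagged:", int(flags.sum()), "of", len(X))
```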
Gaussian Mixture Model
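
A sketch of one common way to use a Gaussian mixture for anomaly detection: fit the mixture, score each point by its log-density, and flag the lowest-density points. The number of components and the 4.5% cut-off are assumptions for illustration, not values taken from the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in: a dense core plus 25 points scattered on a distant ring.
inliers = rng.normal(size=(500, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=25)
radii = rng.uniform(8.0, 12.0, size=25)
outliers = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X = np.vstack([inliers, outliers])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_density = gmm.score_samples(X)               # log-likelihood of each point

# Assumption: flag the 4.5% of points with the lowest density.
threshold = np.percentile(log_density, 4.5)
flags = (log_density < threshold).astype(int)

print("flagged:", int(flags.sum()), "of", len(X))
```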
PRINCIPAL COMPONENT ANALYSIS (PCA)
The data is projected onto 2 dimensions for visualization
The first 2 components explain 66% of the variance
But only 4.5% of the rows are fraud
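
A small NumPy sketch of a 2-D PCA projection and the share of variance it retains. The data is synthetic (12 correlated features), so the printed share will not match the 66% reported for the real data set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 12 features driven by 3 latent factors plus noise.
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 12))
X = latent @ mixing + 0.5 * rng.normal(size=(1000, 12))

Xc = X - X.mean(axis=0)                          # PCA requires centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_ratio = S**2 / np.sum(S**2)            # variance share per component
cumulative = np.cumsum(explained_ratio)

X_2d = Xc @ Vt[:2].T                             # coordinates for a 2-D scatter plot
print(f"variance kept by 2 components: {cumulative[1]:.1%}")
```

Whatever variance the two components do not capture is invisible in the scatter plot, so rare classes can overlap the majority there even when they are separable in the full feature space.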
PCA
Points whose reconstruction error exceeds the threshold are flagged as anomalies
With 8 components, 95% of the variance is explained
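
The reconstruction-error idea can be sketched as follows: keep enough components to reach a variance target, project and reconstruct each row, then flag rows whose error exceeds a threshold. How the threshold was chosen is not stated in the slides; the quantile rule and the function name below are assumptions, and the data is synthetic.

```python
import numpy as np

def pca_reconstruction_flags(X, variance_target=0.95, quantile=0.955):
    """Flag rows whose PCA reconstruction error exceeds a quantile threshold."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Smallest number of components reaching the variance target.
    cumulative = np.cumsum(S**2) / np.sum(S**2)
    k = min(int(np.searchsorted(cumulative, variance_target) + 1), len(S))

    components = Vt[:k]
    X_hat = (Xc @ components.T) @ components + mean   # project, then reconstruct
    errors = np.mean((X - X_hat) ** 2, axis=1)        # per-row mean squared error

    threshold = np.quantile(errors, quantile)         # assumption: quantile cut-off
    return (errors > threshold).astype(int), errors, k

rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 12))
normal = rng.normal(size=(955, 3)) @ mixing + 0.5 * rng.normal(size=(955, 12))
odd = rng.normal(size=(45, 3)) @ mixing + 3.0 * rng.normal(size=(45, 12))
X = np.vstack([normal, odd])                          # last 45 rows are off-pattern

flags, errors, k = pca_reconstruction_flags(X)
print("components kept:", k, "| rows flagged:", int(flags.sum()))
```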
Performance of PCA
Model Comparison
Thank you.
