
Data Analytics for Fraud Detection
By: Aris Budi Santoso
Definition of Fraud
“Wrongful or criminal deception intended to result in financial or
personal gain” (Oxford Dictionary)

“Fraud is an uncommon, well-considered, imperceptibly concealed,
time-evolving and often carefully organized crime which appears in many
types of forms” (Van Vlasselaer et al. 2015)
Forms and Types of Fraud

• Credit card fraud
• Insurance fraud
• Corruption
• Counterfeit
• Product warranty fraud
• Healthcare fraud
• Telecommunications fraud
• Money laundering
• Click fraud
• Identity theft
• Tax evasion
• Plagiarism
Factors That Lead to Fraud
Fraud Detection and Prevention

• The classic approach to fraud detection is an expert-based approach: it
builds on the experience, intuition, and business or domain knowledge of
the fraud analyst
• As organizations start to develop effective fraud-detection and prevention
systems, a shift is taking place toward data-driven or statistically based
fraud-detection methodologies
Data-Driven Fraud Detection
Advantages of Data-Driven Fraud Detection

• Precision
Statistically based fraud-detection methodologies offer an increased detection power
compared to classic approaches

• Operational efficiency
The number of cases to be analyzed keeps growing, which requires an automated process
such as the one offered by data-driven fraud-detection methodologies

• Cost efficiency
A more automated and, as such, more efficient approach to develop and maintain a
fraud-detection system, as offered by data-driven methodologies, is preferred
Data-Driven Fraud Detection
Fraud Detection Methods and Techniques
1. Unsupervised Learning
• Outlier detection:
  • has great value and allows detecting a significant fraction of fraudulent cases
  • allows detecting fraud that is different in nature from historical fraud

2. Supervised Learning
• Predictive analytics
Learns from historical information or observations in order to retrieve patterns that allow
differentiating between normal and fraudulent behavior

3. Social Network Analysis
• Extends the abilities of the fraud-detection system by learning and detecting
characteristics of fraudulent behavior in a network of linked entities
Research Trends in Fraud Detection

• The literature review was carried out by collecting articles on fraud detection
• The data were collected by crawling the Scopus API:
http://api.elsevier.com/content/search/scopus
• A query with the keyword ‘KEY(fraud AND detection)’ returned 4,173 articles,
published from 1982 through those scheduled to appear in 2024
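The collection step above could be sketched roughly as follows. This is only an illustration and not the author's crawler: it assumes the Scopus Search endpoint accepts `query`, `start` and `count` parameters with the key passed in an `X-ELS-APIKey` header, and that each returned entry carries a `prism:coverDate` field. Both assumptions should be checked against the Elsevier developer documentation. The helper names are hypothetical, and `https` is used in place of the `http` URL on the slide.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_search_url(query, start=0, count=25):
    """Build the request URL for one page of search results."""
    params = {"query": query, "start": start, "count": count}
    return SCOPUS_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(query, api_key, start=0, count=25):
    """Fetch one page of results as a dict (needs a valid API key)."""
    request = urllib.request.Request(
        build_search_url(query, start, count),
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_by_year(entries):
    """Tally entries by publication year read from 'prism:coverDate'."""
    years = Counter()
    for entry in entries:
        cover_date = entry.get("prism:coverDate", "")
        if len(cover_date) >= 4 and cover_date[:4].isdigit():
            years[int(cover_date[:4])] += 1
    return dict(sorted(years.items()))
```

Under the same assumptions, the entries of a page would be read from `page["search-results"]["entry"]`, and repeated calls with an increasing `start` would walk through all results before they are tallied per year.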
Research Trends in Fraud Detection

The number of studies on fraud detection has increased from year to year
Machine Learning

“Ability to learn without being explicitly programmed”
--- Arthur Samuel, 1959

“Learn from experience (E) with respect to some task (T) and
some performance measure (P)”
--- Tom Mitchell, 1997

Machine learning is a field of computer science that aims to teach
computers how to learn and act without being explicitly programmed
--- https://deepai.org/machine-learning-glossary-and-terms/machine-learning
Machine Learning

People write the rules in the form of application code

The model (computer) is trained using data

Mehra, Sidharth & Hasanuzzaman, Mohammed. (2020). Detection of Offensive Language
in Social Media Posts
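The contrast between a hand-written rule and a trained model can be illustrated with a minimal sketch. Everything here is hypothetical: the fixed limit, the toy amounts and their labels are made up, and the "model" is only a single learned threshold, far simpler than a real fraud model.

```python
def handwritten_rule(amount):
    """Rule coded by a person: flag any transaction above a fixed limit."""
    return amount > 10_000  # hypothetical limit chosen by an analyst


def train_threshold(amounts, labels):
    """Learn the cut-off from labeled data instead of hard-coding it.

    Tries each midpoint between consecutive sorted amounts and keeps the
    one that misclassifies the fewest training examples.
    """
    values = sorted(set(amounts))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(amounts) + 1
    for threshold in candidates:
        errors = sum((amount > threshold) != bool(label)
                     for amount, label in zip(amounts, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Toy, made-up training data: 1 = fraud, 0 = normal.
amounts = [120, 300, 450, 900, 5_200, 6_100, 7_500]
labels = [0, 0, 0, 0, 1, 1, 1]
learned = train_threshold(amounts, labels)
```

On this toy data the learned cut-off falls between 900 and 5,200, so a transaction of 5,200 is flagged by the trained threshold but missed by the fixed rule.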
Unsupervised Learning
Clustering

• A cluster is a collection of data objects that
  • are similar to the other objects in the same cluster;
  • are dissimilar to objects outside the cluster;

• Clustering is a method for partitioning/grouping data
based on the properties of the data;
• Clustering belongs to the category of unsupervised machine
learning
• The data have no labels/predefined classes
Anomaly Detection
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
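A minimal pure-Python sketch of DBSCAN may help show why it suits anomaly detection: points that belong to no dense region are labeled as noise, and those are the candidates for review. The parameter names `eps` and `min_pts`, the helper `region_query`, and the toy points are assumptions for illustration; a library implementation would normally be preferred in practice.

```python
import math


def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index]."""
    return [j for j, other in enumerate(points)
            if math.dist(points[index], other) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Made-up 2-D data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (9.0, 0.5)]
labels = dbscan(points, eps=0.5, min_pts=3)
anomalies = [p for p, label in zip(points, labels) if label == -1]
```

Here the isolated point `(9.0, 0.5)` ends up with label -1 and is the only entry in `anomalies`.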
Supervised Learning
Classification

Classification is a task that requires the use of
machine learning algorithms that learn how to
assign a class label to examples from the problem
domain [2]
Classification
Algorithm
1. K-Nearest Neighbors
2. Naïve Bayes
3. Support Vector Machine
4. Logistic Regression
5. Decision Tree
6. Bagging: Random Forest
7. Boosting: AdaBoost, XGBoost, LGBM
8. Stacking: Voting
9. Artificial Neural Network
Fraud Detection Using Classification Techniques
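As one possible illustration, the first algorithm in the list (k-nearest neighbors) can be sketched in a few lines. The two features, the toy transactions and their labels are made up, and the features are left unscaled for brevity; this is not the model used in the original slides.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training examples."""
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up features: (amount in thousands, transactions in the last hour).
train_x = [(0.1, 1), (0.3, 2), (0.2, 1), (0.5, 1),
           (8.0, 9), (7.5, 12), (9.1, 10)]
train_y = ["normal", "normal", "normal", "normal",
           "fraud", "fraud", "fraud"]

prediction_a = knn_predict(train_x, train_y, (8.2, 11))
prediction_b = knn_predict(train_x, train_y, (0.4, 2))
```

In real fraud data the fraudulent class is usually rare, so class imbalance and feature scaling would need attention before a classifier like this is trusted.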
Social Network Analysis

A graph is an object formed by a set of nodes/vertices
and links/edges.

G = (V, E)

V = {v1, v2, v3, ...}
E = {e1, e2, e3, ...}
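The definition G = (V, E) maps directly onto simple data structures. The sketch below uses hypothetical account and device names; the helper `adjacency` is illustrative and treats edges as undirected.

```python
# Hypothetical entities (V) and the links between them (E).
V = {"acct_A", "acct_B", "acct_C", "device_1"}
E = {("acct_A", "device_1"), ("acct_B", "device_1"), ("acct_B", "acct_C")}


def adjacency(vertices, edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


G = adjacency(V, E)
```

With this representation, `G["device_1"]` lists the accounts that share a device, which is the kind of link a fraud analyst may want to inspect.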
Social Network Analysis

• Centrality
• Link Prediction
• Community Detection
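Each of the three tasks has a very simple baseline that can be sketched on a toy graph. The choices below are assumptions made for illustration: degree centrality stands in for centrality, a common-neighbors count for link prediction, and connected components as a crude proxy for community detection. Real analyses usually rely on richer measures and algorithms.

```python
# Made-up undirected graph as an adjacency map.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"F"},
    "F": {"E"},
}


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def common_neighbors(graph, u, v):
    """Link-prediction score: number of neighbors that u and v share."""
    return len(graph[u] & graph[v])


def connected_components(graph):
    """Group nodes that can reach each other (a rough community proxy)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


centrality = degree_centrality(graph)
score_b_d = common_neighbors(graph, "B", "D")
groups = connected_components(graph)
```

Node `A` has the highest degree centrality, `B` and `D` share one neighbor (a weak hint that a link between them is plausible), and the graph splits into two groups.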


Graph Data Science for Fraud Detection
Fraud Detection in Complex Networks
Title: Big data analytics for Nabbing Fraudulent Transactions in Taxation System
Case study: Commercial Department of Taxes, Telangana, India
Authors: Mehta et al. (2019)
Background: Manual audit methods are inadequate for detecting fictitious transactions that use circular
trading, because:
1. The data are large and the sequences of transactions run by the network of shell companies are
complex, and
2. The transactions in the evasion scheme carry no identifying ID
Tax avoidance/evasion scheme:
1. A business collects VAT on its sales without issuing tax invoices, then issues fictitious tax
invoices to other parties (for amounts smaller than the actual transactions for which no tax
invoice was issued);
2. Those tax invoices are used as tax credits by the other parties, reducing the amount of tax
they must pay;
3. A network of shell companies is created to disguise the evasion scheme through a chain of
fictitious transactions
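One way to picture circular trading is as a directed cycle in a graph whose edges run from seller to buyer. The sketch below is a plain depth-first search for such a cycle; it is not the method of Mehta et al., and the function name and invoice pairs are made up for illustration.

```python
def find_cycle(invoices):
    """Return one directed cycle of traders as a list, or None.

    invoices: iterable of (seller, buyer) pairs.
    """
    graph = {}
    for seller, buyer in invoices:
        graph.setdefault(seller, []).append(buyer)
        graph.setdefault(buyer, [])

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:             # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Made-up invoices: Q sells to R, R to S, and S back to Q.
invoices = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "Q"), ("R", "T")]
cycle = find_cycle(invoices)
```

The recursive search is fine for small examples; at the data sizes the background above describes, an iterative or distributed approach would be needed.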
Fraud Detection in Complex Networks
Big data analytics for Nabbing Fraudulent Transactions in Taxation System
Social Media Data
Obtaining Twitter Data
Scraping Twitter Using Tweet Harvest
The free Twitter API is no longer available, so the alternative way
to obtain Twitter data is scraping
Discussion
