
Enterprise App and Data Warehouse

Channels: ATM, Mobile, Branch, Web, IVR
Services: EFT Service, Loan Service, Bill Service
Data path components: OLTP, Sqoop, OLAP, HADOOP
HADOOP: any application we want; example file Abc.txt

Storage

NameNode (HDFS Manager) on Linux, High Availability mode
Standby NameNode
Journal Nodes (3)

HDFS: 6 Data Nodes, each a JVM on Linux
Block 1 is marked on three Data Nodes
Processing

SQL -> Hive Engine -> Job (Java code, MapReduce) -> Compile
YARN (Resource Manager) on Linux
Code (MapReduce) is sent to the Worker Nodes

6 Worker Nodes, each a JVM on Linux
Block 1 is marked on three Worker Nodes

Virtualization

Guests: Linux with File System (ext4), Windows with File System (NTFS)
Hypervisor
Hardware: CPU, HDD (10 TB), RAM (1 TB)
Other figure labels: RAM 1, 8, 12, 6; 1 CPU; 1 TB; 128 MB
Docker Desktop

Containers: Linux (Apps), Linux (MongoDB)
Docker with Linux Kernel
Host: Windows, File System (NTFS)
Hardware: HDD (10 TB), RAM (1 TB)


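Non-Parallel (3 CPU)
Total = 0, then a For loop adds the 8 values one at a time

Parallel (3 CPU)
Values: 1 2 3 4 5 6 7 8
Step 1: 1+2 = 3, 3+4 = 7, 5+6 = 11, 7+8 = 15
Step 2: 3+7 = 10, 11+15 = 26
Step 3: 10+26 = 36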
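The two approaches on this slide can be sketched in plain Python. This is an illustration only: the names `sequential_sum`, `tree_levels`, and `tree_sum` are hypothetical, and a thread pool with 3 workers stands in for the 3 CPUs. In CPython, threads do not give real CPU parallelism for arithmetic like this, so the sketch shows the structure of the pairwise reduction rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(values):
    """Non-parallel version: start at 0 and add one value per loop step."""
    total = 0
    for v in values:
        total += v
    return total


def add_pair(pair):
    """Add one pair of values (a single leftover value is returned as-is)."""
    return sum(pair)


def tree_levels(values, workers=3):
    """Pairwise reduction; returns every intermediate level of partial sums."""
    levels = [list(values)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(levels[-1]) > 1:
            current = levels[-1]
            pairs = [current[i:i + 2] for i in range(0, len(current), 2)]
            # pool.map keeps the input order, so partial sums stay aligned
            levels.append(list(pool.map(add_pair, pairs)))
    return levels


def tree_sum(values, workers=3):
    """Parallel-style sum: the single value left on the last level."""
    levels = tree_levels(values, workers)
    return levels[-1][0] if levels[-1] else 0


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sequential_sum(data))
    for level in tree_levels(data):
        print(level)
```

The sequential loop needs 8 additions in a row, while the pairwise version needs only 3 rounds, because the additions inside each round are independent of one another.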
list -> SparkContext.parallelize -> RDD

list: 1 2 3 4 5 6 7 8
RDD:  1 2 3 4 5 6 7 8

rdd.collect() returns all elements of the RDD
rdd.take(5) returns the first 5 elements
Big Data

Storage + Processing + Machine Learning
HADOOP + Spark
Input      Function   Output
F(1,0)     ???        7
F(0,1)     ???        8
F(1,1)     ???        10

F(x,y) = ax + by + c

Solving the equations:
F(1,0) = a + c = 7
F(0,1) = b + c = 8
F(1,1) = a + b + c = 10
a = 2, b = 3, c = 5

F(x,y) = 2x + 3y + 5

Substituting into the equation:
F(5,5) = 2*5 + 3*5 + 5 = 30
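The equation-solving step can be checked with a short script. This is a minimal sketch, assuming the three observations determine a unique solution; `solve_linear` is a hypothetical helper written for this example (Gauss-Jordan elimination with exact fractions), not part of the slides.

```python
from fractions import Fraction


def solve_linear(rows, rhs):
    """Solve a square linear system by Gauss-Jordan elimination.

    Assumes a unique solution exists (hypothetical helper for this example).
    """
    n = len(rows)
    # Augmented matrix with exact fractions to avoid rounding errors
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Clear this column in every other row
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


# Each row is (x, y, 1) for one observation of F(x,y) = ax + by + c
a, b, c = solve_linear([[1, 0, 1], [0, 1, 1], [1, 1, 1]], [7, 8, 10])


def F(x, y):
    return a * x + b * y + c


if __name__ == "__main__":
    print(a, b, c)
    print(F(5, 5))
```

Each observation contributes one row `(x, y, 1)`, so three observations are enough to pin down the three unknowns a, b, and c.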
Features   Model   Label
F(1,0)     ???     7
F(0,1)     ???     8
F(1,1)     ???     10

F(x,y) = ax + by + c

Model Training:
F(1,0) = a + c = 7
F(0,1) = b + c = 8
F(1,1) = a + b + c = 10
a = 2, b = 3, c = 5

F(x,y) = 2x + 3y + 5

Prediction:
F(5,5) = 30
Features   Model   Label
F(3,1)     ???     10,000,000
F(2,1)     ???     8,000,000
F(2,1)     ???     18,000,000
F(1,1)     ???     6,000,000
F(1,1)     ???     6,500,000

F(x,y) = ax + by + c

Model Training and Prediction follow the same steps as in the previous example, where training gave F(x,y) = 2x + 3y + 5 and the prediction was F(5,5) = 30.
F(1) = 4
F(2) = 6
F(3) = 8
F(x) = 2x + 2

F(4) = 8
F(2) = 5
These two points do not fit F(x) = 2x + 2, which gives F(4) = 10 and F(2) = 6.
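When some points do not lie exactly on one line, the coefficients can no longer be found by solving equations exactly. One common approach, assumed here rather than stated on the slide, is an ordinary least-squares fit. The sketch below uses a hypothetical `fit_line` helper: it recovers F(x) = 2x + 2 from the first three points and returns a compromise line once the two extra points are included.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for this example; needs at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The three points that lie exactly on one line
exact = fit_line([(1, 4), (2, 6), (3, 8)])

# The same points plus F(4) = 8 and F(2) = 5, which do not fit that line
noisy = fit_line([(1, 4), (2, 6), (3, 8), (4, 8), (2, 5)])

if __name__ == "__main__":
    print(exact)
    print(noisy)
```

With the extra points the fitted slope drops below 2, which is the usual situation in model training: the model approximates the data instead of matching every row.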

ML

Is there a label?
  yes: Supervised Learning
    Is the label categorical?
      yes: Classification
      no: Regression
  no: Unsupervised Learning
    Clustering
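The decision tree above can be written as a small function. This is only a sketch of the slide's two questions; the function name `choose_task` and its return strings are hypothetical.

```python
def choose_task(has_label, label_is_categorical=False):
    """Map the two questions from the slide to a type of ML task."""
    if not has_label:
        # No label: unsupervised learning, e.g. clustering
        return "Unsupervised Learning: Clustering"
    if label_is_categorical:
        # Label is a category: classification
        return "Supervised Learning: Classification"
    # Label is a number: regression
    return "Supervised Learning: Regression"


if __name__ == "__main__":
    print(choose_task(has_label=True, label_is_categorical=True))
    print(choose_task(has_label=True, label_is_categorical=False))
    print(choose_task(has_label=False))
```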


Machine Learning Project

Dataset
Data Engineering (cleaning, feature selection, nulls, categorical values, etc.)
Train / Validation (80% / 20%)
Train -> Model (KMeans)

How successful is the model?


Train 80% / Test 20%

Train -> Model

The success rate can be measured
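An 80% / 20% split and a success-rate measurement can be sketched in plain Python. The helper names `train_test_split` and `accuracy`, the fixed seed, and the toy even/odd dataset are assumptions made for this illustration; they are not taken from the slides or from any particular library.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and cut them into a train part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, test_rows):
    """Share of test rows whose predicted label equals the true label."""
    correct = sum(1 for features, label in test_rows if model(features) == label)
    return correct / len(test_rows)


# Toy dataset (hypothetical): the feature is a number, the label says if it is even
dataset = [(n, n % 2 == 0) for n in range(100)]
train_rows, test_rows = train_test_split(dataset)


def toy_model(n):
    """Stand-in for a trained model; here it simply applies the even/odd rule."""
    return n % 2 == 0


if __name__ == "__main__":
    print(len(train_rows), len(test_rows))
    print(accuracy(toy_model, test_rows))
```

Because the test rows are never used for training, the accuracy measured on them is an estimate of how the model behaves on unseen data.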


Train 80% / Test 20%
The 80% is split again: Train 60% / Validation 20% / Test 20%

Hyper Parameter Tuning
maxBin = [1,2,3,4,5,6]
maxDepth = [10,20,30]

Train -> one Model per parameter combination

BestModel (99%): maxBin = 4, maxDepth = 30
(88%) The success rate can be measured
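The tuning loop can be sketched as a plain grid search over the two parameter lists. Everything below is illustrative: `grid_search` is a hypothetical helper, and `toy_validation_score` is a made-up stand-in for "train a model on the train split and score it on the validation split", chosen so that it peaks at the combination named on the slide.

```python
from itertools import product


def grid_search(param_grid, train_and_score):
    """Try every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_grid = {
    "maxBin": [1, 2, 3, 4, 5, 6],
    "maxDepth": [10, 20, 30],
}


def toy_validation_score(params):
    """Made-up score (assumption): highest at maxBin=4, maxDepth=30."""
    return 1.0 - 0.05 * abs(params["maxBin"] - 4) - 0.001 * (30 - params["maxDepth"])


best_params, best_score = grid_search(param_grid, toy_validation_score)

if __name__ == "__main__":
    print(best_params, best_score)
```

The grid has 6 x 3 = 18 combinations, so 18 models are trained. The best one is picked on the validation split, and the test split is kept aside for the final success-rate measurement.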
