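[Figure: bank systems (ATM, EFT Service, Mobile, Credit (Kredi) Service, Branch (Şube) OLTP, Bill (Fatura) Service, Web, IVR) connected to HADOOP via Sqoop, alongside OLAP and "any application we want"; example file Abc.txt]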
Storage
[Figure: HDFS on Linux. The NameNode (HDFS manager) runs in High Availability mode with a Standby NameNode and 3 Journal Nodes; six DataNodes, each in its own JVM, store the data, with block 1 shown on three of them.]
Processing
[Figure: SQL goes through the Hive engine, which produces code (MapReduce); the code runs on six Worker Nodes on Linux, each in its own JVM, with block 1 shown on three of them.]
[Figure: a single node: JVM on Linux on the hardware (CPU, RAM)]
Virtualization
[Figure: Linux and Windows virtual machines on a hypervisor sharing the physical CPU; labels: 1 CPU, 1 TB HDD (10 TB), RAM (1 TB), 128 MB]
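Assuming the 128 MB label above is the HDFS block size (the default since Hadoop 2) and that HDFS keeps 3 replicas of each block (the default dfs.replication), the sketch below estimates how many blocks a file occupies and how much raw disk it needs across the DataNodes. The helper names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # assumption: HDFS block size (dfs.blocksize default)
REPLICATION = 3       # assumption: replication factor (dfs.replication default)


def hdfs_blocks(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks a file occupies; the last block may be only partly full."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb: float, replication: int = REPLICATION) -> float:
    """Raw disk used on all DataNodes together once every block is replicated."""
    # A partial last block only uses its real size on disk, not a full 128 MB,
    # so raw usage is simply file size times the number of replicas.
    return file_size_mb * replication


one_tb_mb = 1024 * 1024  # 1 TB expressed in MB (binary units)
print(hdfs_blocks(one_tb_mb))     # 8192
print(raw_storage_mb(one_tb_mb))  # 3145728 (MB, i.e. 3 TB of raw disk)
```

For example, a 200 MB file needs two blocks (one full 128 MB block and one 72 MB block), each stored on three DataNodes.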
Docker Desktop
[Figure: Docker Desktop on Windows runs a Linux VM; inside it, Docker runs apps such as MongoDB in containers that share the Linux kernel.]
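Parallel
[Figure: the numbers 1 to 8 summed in parallel on 3 CPUs: the partial sums 1 + 2 + 3 + 4 = 10 and 5 + 6 + 7 + 8 = 26 (from 11 + 15) are combined into 36.]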
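The split-then-combine idea can be sketched in plain Python: cut the list into chunks, sum each chunk in its own worker, then add the partial sums. This uses concurrent.futures.ThreadPoolExecutor only to show the pattern; in CPython, threads do not run pure-Python arithmetic on several CPUs at once, so real CPU parallelism would need ProcessPoolExecutor (same interface) or a cluster engine such as Spark. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split data into n_chunks contiguous pieces of (almost) equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=3):
    """Sum each chunk in its own worker, then combine the partial sums."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return partial_sums, sum(partial_sums)


print(parallel_sum(list(range(1, 9)), workers=2))  # ([10, 26], 36)
print(parallel_sum(list(range(1, 9)), workers=3))  # ([6, 15, 15], 36)
```

With two workers the partial sums match the slide (10 and 26); with three the grouping changes but the total stays 36, because addition does not depend on how the numbers are grouped.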
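From a list to an RDD
SparkContext's parallelize method turns a Python list of the numbers 1 to 8 into an RDD whose elements are spread across partitions.
rdd.collect() returns all 8 elements to the driver.
rdd.take(5) returns only the first 5 elements (1, 2, 3, 4, 5).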
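To see what these calls do without a Spark installation, the pure-Python sketch below uses a hypothetical ToyRDD class (not part of Spark) that mimics their behaviour: parallelize cuts a list into contiguous partitions, collect() gathers every partition back, and take(n) stops scanning once it has n elements. The slicing rule is an assumption; Spark's exact partition boundaries may differ. In PySpark the real calls are sc.parallelize(data, numSlices), rdd.collect(), rdd.take(5), and rdd.glom().collect() to inspect the partitions.

```python
class ToyRDD:
    """Hypothetical stand-in for a Spark RDD: a list of in-memory partitions."""

    def __init__(self, partitions):
        self.partitions = partitions

    def glom(self):
        """Elements grouped by partition (like rdd.glom().collect() in Spark)."""
        return [list(p) for p in self.partitions]

    def collect(self):
        """Bring every element of every partition back to the driver."""
        return [x for p in self.partitions for x in p]

    def take(self, n):
        """First n elements, scanning partitions only until n have been found."""
        result = []
        for p in self.partitions:
            for x in p:
                if len(result) == n:
                    return result
                result.append(x)
        return result


def parallelize(data, num_slices=2):
    """Cut a list into num_slices contiguous slices of roughly equal size."""
    n = len(data)
    bounds = [i * n // num_slices for i in range(num_slices + 1)]
    return ToyRDD([data[bounds[i]:bounds[i + 1]] for i in range(num_slices)])


rdd = parallelize([1, 2, 3, 4, 5, 6, 7, 8], num_slices=3)
print(rdd.glom())     # [[1, 2], [3, 4, 5], [6, 7, 8]]
print(rdd.collect())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(rdd.take(5))    # [1, 2, 3, 4, 5]
```

On a real cluster collect() moves the whole dataset to the driver, so on large data take(n) is the safer way to peek at a few elements.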
Big Data: Storage, Processing and Machine Learning, covered by HADOOP and Spark
[Figure: reducing the numbers 1 to 8 with "+"; the total is 36]
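Input → Function → Output
Suppose an unknown function gives these outputs:
F(1,0) = 7
F(0,1) = 8
F(1,1) = 10
F(5,5) = ???
Assume the form F(x,y) = ax + by + c and write one equation per example:
F(1,0) = a + c = 7
F(0,1) = b + c = 8
F(1,1) = a + b + c = 10
Solving the equations: subtracting the second equation from the third gives a = 2, so c = 7 - 2 = 5 and b = 8 - 5 = 3. Hence F(x,y) = 2x + 3y + 5.
Substituting into the equation: F(5,5) = 2(5) + 3(5) + 5 = 30.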
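The "solving the equations" step can be done mechanically: each known example gives one linear equation in a, b and c. Below is a minimal Gauss-Jordan elimination sketch with exact fractions; it is a generic method, not necessarily how the slide solved the system, and solve is a hypothetical helper name.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Augmented matrix [A | b] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero pivot in this column (StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# One equation per known example: a*x + b*y + c*1 = F(x, y)
examples = [((1, 0), 7), ((0, 1), 8), ((1, 1), 10)]
A = [[x, y, 1] for (x, y), _ in examples]
F = [out for _, out in examples]

a, b, c = solve(A, F)
print(a, b, c)            # 2 3 5
print(a * 5 + b * 5 + c)  # 30
```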
In machine learning terms, the same example reads:
Features: the inputs (x, y)
Label: the known outputs 7, 8 and 10
Model: F(x,y) = ax + by + c
Model Training (solving the equations): a = 2, b = 3, c = 5, so F(x,y) = 2x + 3y + 5
Prediction (substituting into the equation): F(5,5) = 30
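With real data the labels are noisy, and the same features can come with different labels:
Features      Label
F(3,1)        10,000,000
F(2,1)        8,000,000
F(2,1)        18,000,000
F(1,1)        6,000,000
F(1,1)        6,500,000
(new input)   ???
Since F(2,1) cannot equal both 8,000,000 and 18,000,000, no choice of a, b and c satisfies every row exactly; the model can only approximate the labels.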
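When the labels are noisy, the usual approach is to choose the parameters that minimise the squared error instead of satisfying every equation. In the table above the second feature is always 1, so its effect cannot be separated from the constant term; only F(x) = a*x + d (with d = b + c) can be fitted. A minimal ordinary-least-squares sketch, with labels in millions; fit_line is a hypothetical helper name.

```python
from fractions import Fraction

# Rows from the table: (first feature x, label in millions); y = 1 in every row.
rows = [(3, 10), (2, 8), (2, 18), (1, 6), (1, Fraction(13, 2))]


def fit_line(points):
    """Ordinary least squares for label ~ a*x + d, computed with exact fractions."""
    n = len(points)
    mean_x = Fraction(sum(x for x, _ in points), n)
    mean_t = Fraction(sum(t for _, t in points), n)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in points)
    a = sxt / sxx
    d = mean_t - a * mean_x
    return a, d


a, d = fit_line(rows)
print(a, d)  # 18/7 71/14  (about 2.57 and 5.07)
for x, t in rows:
    # Compare each label with the fitted line's prediction for the same x.
    print(x, float(t), round(float(a * x + d), 2))
```

The fitted line predicts about 10.21 for x = 2, which lies between the two conflicting labels 8 and 18: the best the model can do is average them out.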
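A one-feature example: the points F(1) = 4, F(2) = 6 and F(3) = 8 fit F(x) = 2x + 2 exactly, but the further observations F(4) = 8 and F(2) = 5 do not (the line gives 10 and 6).
[Figure: initial model F(x) = 0; values shown: 584, 503, 444]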
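The slide starts from the model F(x) = 0. One standard way to improve such a model step by step is gradient descent on the mean squared error; the sketch below applies it to the five observations above. The learning rate (0.05) and the number of steps (5000) are assumptions, and this is a generic illustration rather than the exact procedure behind the slide's numbers.

```python
# The five observations: F(1)=4, F(2)=6, F(3)=8, F(4)=8, F(2)=5
xs = [1, 2, 3, 4, 2]
ys = [4, 6, 8, 8, 5]


def mse(w, b):
    """Mean squared error of the model F(x) = w*x + b on the observations."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(lr=0.05, steps=5000):
    """Start from F(x) = 0 (w = b = 0) and repeatedly step against the MSE gradient."""
    w, b = 0.0, 0.0
    losses = [mse(w, b)]
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
        losses.append(mse(w, b))
    return w, b, losses


w, b, losses = gradient_descent()
print(round(w, 3), round(b, 3))         # 1.462 2.692
print(losses[0], round(losses[-1], 3))  # 41.0 0.338
```

The result, roughly F(x) = 1.46x + 2.69, is the least-squares line for these points: it passes through none of them exactly, but no other straight line has a smaller average squared error.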
ML
Split: 80% Train / 20% Validation
Data Engineering (cleaning, feature selection, handling nulls and categorical values, etc.) → Model
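Two of the data-engineering steps and the 80% / 20% split can be sketched in plain Python: drop rows with nulls, one-hot encode a categorical column, then shuffle and cut. The records, column names and seed are hypothetical, and feature selection is left out.

```python
import random


def clean(rows, categorical_key):
    """Drop rows with missing values and one-hot encode one categorical column."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    categories = sorted({r[categorical_key] for r in complete})
    cleaned = []
    for r in complete:
        out = {k: v for k, v in r.items() if k != categorical_key}
        for cat in categories:
            out[f"{categorical_key}={cat}"] = 1 if r[categorical_key] == cat else 0
        cleaned.append(out)
    return cleaned


def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train / validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: one has a null, "color" is categorical.
raw = [
    {"size": 3, "color": "red", "label": 10.0},
    {"size": 2, "color": "blue", "label": 8.0},
    {"size": None, "color": "red", "label": 7.0},  # null: dropped by clean()
    {"size": 1, "color": "green", "label": 6.0},
    {"size": 2, "color": "green", "label": 18.0},
    {"size": 1, "color": "blue", "label": 6.5},
]
data = clean(raw, "color")
print(data[0])  # {'size': 3, 'label': 10.0, 'color=blue': 0, 'color=green': 0, 'color=red': 1}
train, validation = train_validation_split(data)
print(len(data), len(train), len(validation))  # 5 4 1
```

A fixed seed makes the split reproducible, so different models are compared on the same validation rows.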
KMeans
[Figure: the Dataset is split into Train and Test; a KMeans Model is trained on Train.]
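KMeans itself can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This pure-Python version takes hand-picked initial centroids (an assumption made to keep the example deterministic; Spark ML's KMeans uses k-means|| initialisation by default), and the data points are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
print(labels)     # [0, 0, 0, 1, 1, 1]
```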
Hyperparameter search: a model is trained for every combination of
maxBin = [1,2,3,4,5,6]
maxDepth = [10,20,30]
BestModel: maxBin = 4, maxDepth = 30 (99%)
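The search above is a grid search: 6 x 3 = 18 combinations, one trained model each, keeping the best score. A minimal sketch with itertools.product, where train_and_score is a hypothetical callable that trains a model with the given parameters and returns its score. In Spark ML the equivalent tools are ParamGridBuilder with TrainValidationSplit or CrossValidator; for a DecisionTreeClassifier the parameters are spelled maxBins and maxDepth, and Spark requires maxBins to be at least 2 and maxDepth to be at most 30, so maxBin = 1 from the slide would be rejected there.

```python
from itertools import product

# The grid from the slide (parameter names as written there).
param_grid = {
    "maxBin": [1, 2, 3, 4, 5, 6],
    "maxDepth": [10, 20, 30],
}


def grid_search(grid, train_and_score):
    """Try every combination in the grid and keep the one with the highest score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # e.g. accuracy of a model trained with params
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


print(len(list(product(*param_grid.values()))))  # 18
```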
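Train / Validation / Test: the data is first split 80% / 20%, and the 80% part is split again, giving 60% Train, 20% Validation and 20% Test.
The same search over maxBin = [1,2,3,4,5,6] and maxDepth = [10,20,30] is trained on Train; BestModel: maxBin = 4, maxDepth = 30 (99%).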
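A sketch of the 60% / 20% / 20% protocol, under the usual reading of such a split: every candidate is trained on Train, the best one is chosen by its Validation score, and only that model is scored once on Test. The fit and score callables are hypothetical placeholders for a real training and evaluation routine.

```python
import random


def three_way_split(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle, then cut into train / validation / test (the remainder goes to test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def select_and_test(candidates, fit, score, train, validation, test):
    """Fit every candidate on train, pick the best on validation, report it on test."""
    fitted = [(params, fit(params, train)) for params in candidates]
    best_params, best_model = max(fitted, key=lambda pm: score(pm[1], validation))
    return best_params, score(best_model, test)


train, validation, test = three_way_split(list(range(100)))
print(len(train), len(validation), len(test))  # 60 20 20
```

Keeping the test part out of model selection means its score is an honest estimate of how the chosen model behaves on unseen data.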