ScaleByte
APACHE SPARK AND SCALA
www.scalebyte.com
Introduction
Copyright ©
Batch Vs Real-time (Stream) Scenario
Analytics Type Based on Input Data
Spark Real-Time Streaming
Uses RDDs
Copyright© Scalebyte
In-Memory Computations
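The slides under this heading did not survive extraction. As a minimal sketch of what in-memory computation looks like in Spark's Scala API, the program below marks an RDD with cache() so that a second action can reuse partitions held in executor memory instead of recomputing them. It assumes spark-core is on the classpath and runs in local mode; the object name InMemoryDemo and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo; assumes spark-core is on the classpath.
object InMemoryDemo {
  // Returns (element count, sum of squares, whether the RDD is marked MEMORY_ONLY).
  def run(n: Int = 1000): (Long, Long, Boolean) = {
    // local[*]: the driver and a local executor share this JVM, one task thread per core
    val conf = new SparkConf().setAppName("InMemoryDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val squares = sc.parallelize(1 to n).map(i => i.toLong * i)

      // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY). It is lazy:
      // partitions are kept in memory only once the first action computes them.
      squares.cache()

      val total = squares.reduce(_ + _) // first action: computes and caches the partitions
      val count = squares.count()       // second action: can read the cached partitions
      (count, total, squares.getStorageLevel == StorageLevel.MEMORY_ONLY)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (count, total, inMemory) = run()
    println(s"count=$count sumOfSquares=$total markedMemoryOnly=$inMemory")
  }
}
```

Without cache(), the count() call would recompute the map() from the source data; with it, Spark can serve the second action from memory as long as the partitions fit.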
Spark unified stack
What Is an RDD?
RDD Basics
Transformations
Transformations are operations on RDDs that return a new RDD,
e.g. map() and filter().
A filter() transformation in Scala:
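// sc is an existing SparkContext; read log.txt as an RDD of lines
val inputRDD = sc.textFile("log.txt")
// Transformation: returns a new RDD holding only the lines that contain "error"
val errorsRDD = inputRDD.filter(line => line.contains("error"))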
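To try the filter() example without a log.txt file, the hypothetical program below swaps sc.textFile for sc.parallelize over a few sample lines (an assumption made only for illustration) and chains a map() after the filter(). Both calls are transformations, so Spark only records them; the work happens when the collect() action asks for the result. It assumes spark-core is on the classpath and a local master; TransformationsDemo and errorMessages are illustrative names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example; assumes spark-core is on the classpath.
object TransformationsDemo {
  // Only transformations: each call returns a new RDD and no job runs yet.
  def errorMessages(input: RDD[String]): RDD[String] =
    input
      .filter(line => line.contains("error"))   // keep lines mentioning "error"
      .map(line => line.stripPrefix("error: ")) // derive the bare message text

  def run(): List[String] = {
    val sc = new SparkContext(
      new SparkConf().setAppName("TransformationsDemo").setMaster("local[*]"))
    try {
      // Stand-in for sc.textFile("log.txt") so no input file is needed
      val inputRDD = sc.parallelize(Seq(
        "info: service started",
        "error: disk full",
        "warning: high latency",
        "error: connection reset"))
      val errorsRDD = errorMessages(inputRDD)
      // collect() is an action: it triggers the filter and map and
      // returns the results to the driver in partition order
      errorsRDD.collect().toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```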
RDD Operations (Actions)
Actions
Actions are operations on an RDD that either return a result to the driver program or write it to external storage, e.g. count() and first().
Suppose we want to print some information about badLinesRDD; see the examples on the next slide.
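badLinesRDD is not defined in the surviving text. The sketch below assumes, hypothetically, that it holds the lines containing "error" or "warning" (built with union()), and then applies the count(), first() and take() actions, each of which runs a job and returns a value to the driver. The sample data and the name ActionsDemo are illustrative; spark-core on the classpath and a local master are assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example; assumes spark-core is on the classpath.
object ActionsDemo {
  // Returns (count(), first(), take(10)) computed on badLinesRDD.
  def run(): (Long, String, List[String]) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ActionsDemo").setMaster("local[*]"))
    try {
      val inputRDD = sc.parallelize(Seq(
        "info: service started",
        "error: disk full",
        "warning: high latency",
        "error: connection reset"))

      // Assumed definition of badLinesRDD: error lines followed by warning lines
      val errorsRDD = inputRDD.filter(line => line.contains("error"))
      val warningsRDD = inputRDD.filter(line => line.contains("warning"))
      val badLinesRDD = errorsRDD.union(warningsRDD)

      // Actions: each runs a job and returns a value to the driver
      val total = badLinesRDD.count()
      val firstBad = badLinesRDD.first()
      val examples = badLinesRDD.take(10).toList

      println(s"Input had $total concerning lines")
      println("Here are up to 10 examples:")
      examples.foreach(println)
      (total, firstBad, examples)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = { run(); () }
}
```

Because union() places the partitions of its first RDD ahead of those of its second, first() here returns the first error line rather than the warning.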
RDD Operations (Actions)
Passing Functions to Spark
# Python: pass a named function to filter()
def containsError(s):
    return "error" in s
errorsRDD = rdd.filter(containsError)
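A Scala counterpart to the Python example, as a sketch: Spark's programming guide suggests passing either anonymous functions or methods defined on a top-level singleton object, because passing a method of a class instance forces Spark to serialize the whole instance with each task. The object names LogFunctions and PassingFunctionsDemo are hypothetical, and spark-core plus a local master are assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object. A method on a top-level singleton can be passed
// to Spark without serializing an enclosing class instance.
object LogFunctions {
  def containsError(s: String): Boolean = s.contains("error")
}

// Hypothetical example; assumes spark-core is on the classpath.
object PassingFunctionsDemo {
  // Returns (lines with "error", lines without "error").
  def run(): (Long, Long) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PassingFunctionsDemo").setMaster("local[*]"))
    try {
      val rdd = sc.parallelize(Seq("error: disk full", "ok", "error: timeout"))

      // 1. Pass a named method (the counterpart of containsError in Python)
      val errorsRDD = rdd.filter(LogFunctions.containsError)

      // 2. Pass an anonymous function inline
      val cleanRDD = rdd.filter(line => !line.contains("error"))

      (errorsRDD.count(), cleanRDD.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```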
RDD transformations
RDD ACTIONS
Quick architectural overview
SPARK Architecture
[Diagram: the driver, the worker nodes and the executors, with numbered communication steps]
Worker nodes create executors.
The executors then contact the driver directly, and all further communication happens between the driver and the executors.
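A minimal driver program makes these roles concrete. In the sketch below the main program is the driver; with master local[2] (chosen here for illustration) the driver and a single executor with two task threads share one JVM, whereas on a real cluster the master URL would make the cluster manager launch executors on the worker nodes. The object name ArchitectureDemo, the placeholder URL in the comment and the data are assumptions; spark-core on the classpath is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example; assumes spark-core is on the classpath.
object ArchitectureDemo {
  // Returns (number of partitions, result returned to the driver).
  def run(): (Int, Long) = {
    // This code runs in the driver. local[2] keeps the driver and one executor
    // with two task threads in a single JVM; on a cluster, a master URL such as
    // spark://host:7077 (host is a placeholder) would have the cluster manager
    // launch executors on the worker nodes instead.
    val conf = new SparkConf().setAppName("ArchitectureDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // 4 partitions, so the reduce() job below runs as 4 tasks on the executors
      val numbers = sc.parallelize(1 to 100, numSlices = 4)
      val doubled = numbers.map(_ * 2)

      // reduce() is an action: each task sends its partial sum back to the
      // driver, which merges the partial sums into the final value.
      val total = doubled.reduce(_ + _)
      (numbers.getNumPartitions, total.toLong)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```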
Major Industries leveraging Analytics
Before We Go Ahead
Most Popular Real-Time Analytics Tool
Ideal Tool for Real-Time Analytics
Apache Flink: Ideal Tool for Real-Time Analytics
Apache Flink
Features of Apache Flink
The Strength of Flink comes from its Architecture
Lambda Architecture
Other Users of Apache Flink
Conclusion: Now is the Time for Apache Flink
Big Data Processing Tool