Who am I?
Software Developer @ Oracle
8+ years in the software industry
Hands-on experience with big data
... mostly focused on large on-disk datasets: great for batch, but slow
Design Goal
One stack to rule them all: batch, interactive, streaming
Easy to combine batch, streaming, and interactive computations
Easy to develop sophisticated algorithms
Compatible with the existing open source ecosystem (Hadoop/HDFS)
Apache Spark
Resilient Distributed Datasets (RDDs)
Actions return values
Transformations return pointers to new RDDs
Spark exposes RDDs through a language-integrated API similar to DryadLINQ and FlumeJava: each dataset is represented as an object, and transformations are invoked using methods on these objects.
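The split between transformations and actions can be illustrated with a small toy model. The sketch below is plain single-process Python, not Spark's API: `ToyRDD`, `parallelize`, and the sample log lines are hypothetical names chosen for illustration. It only aims to show that a transformation returns a new dataset object without running anything, while an action runs the recorded pipeline and returns a value.

```python
from functools import reduce


class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, compute):
        # compute is a zero-argument function that yields the elements.
        self._compute = compute

    @classmethod
    def parallelize(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: build a new ToyRDD; nothing is evaluated yet.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded pipeline and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


lines = ToyRDD.parallelize(
    ["INFO start", "ERROR disk full", "ERROR timeout", "INFO done"]
)
errors = lines.filter(lambda line: line.startswith("ERROR"))  # transformation
messages = errors.map(lambda line: line.split(" ", 1)[1])     # transformation

error_count = errors.count()         # action: returns an int
error_messages = messages.collect()  # action: returns a list
print(error_count, error_messages)
```

Each dataset here is an object and each transformation is a method call on it, which mirrors the style described above. A real RDD additionally partitions data across a cluster and tracks lineage for recovery, neither of which this toy attempts.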
Cluster Manager
Spark Streaming
Shark
Apache Mesos is a cluster manager that makes building and running distributed systems, or frameworks, easy and efficient. Using Mesos, you can simultaneously run Apache Hadoop, Apache Spark, Apache Storm, and many other applications on a dynamically shared pool of resources (machines).
[Figure: cluster stack, top to bottom]
Workloads: batch, services
Apps: Scalding, Impala, Shark, MySQL, JBoss, Django, Rails
Frameworks: MPI, Hadoop, Spark, Storm, Kafka, Chronos, Marathon, etc.
Kernel
Cluster: Node × 9
Sentiment Analyzer
Akka Actors
Spray JSON
AngularJS / D3
Q&A