
Spark Playground

Hardik Pandya, Software Developer

Who am I?
Software Developer @ Oracle
8+ years in the software industry
Playing with big data hands-on

What I am going to talk about


Apache Spark
Spark Streaming
Shark
Apache Mesos
Use Case

Today's Open Analytics Stack

Application
Data Processing
Storage
Infrastructure

...mostly focused on large on-disk datasets: great for batch, but slow for interactive and streaming workloads

Design Goal
One stack to rule them all: Batch, Interactive, Streaming

Easy to combine batch, streaming, and interactive computations
Easy to develop sophisticated algorithms
Compatible with existing open source ecosystem (Hadoop/HDFS)

Apache Spark
Resilient Distributed Datasets (RDDs)
Actions return values
Transformations return pointers to new RDDs

Spark exposes RDDs through a language-integrated API similar to DryadLINQ and FlumeJava: each dataset is represented as an object, and transformations are invoked using methods on these objects.
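The split between transformations and actions described above can be illustrated with a toy, single-machine model. This is only a sketch: `ToyRDD` and its methods are hypothetical names invented for illustration, not Spark's API, and it ignores partitioning, fault tolerance, and caching. It only shows that transformations hand back a new dataset object without computing anything, while actions run the chain and return a plain value.

```python
from functools import reduce


class ToyRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical, not Spark's class)."""

    def __init__(self, compute):
        # `compute` is a zero-argument function that yields the elements.
        self._compute = compute

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD; nothing is evaluated yet.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded chain and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


lines = ToyRDD.from_list(["INFO ok", "ERROR disk full", "ERROR timeout"])
errors = lines.filter(lambda s: s.startswith("ERROR"))  # transformation
lengths = errors.map(len)                               # transformation
error_count = errors.count()                            # action
total_chars = lengths.reduce(lambda a, b: a + b)        # action
```

Here `errors` and `lengths` are dataset objects, while `error_count` and `total_chars` are ordinary integers. In real Spark the same chain would be distributed across executors, which this sketch does not attempt to model.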

Spark Cluster Components


Driver Program: SparkContext

Cluster Manager

Worker Node: Executor (Cache, Task, Task)

Worker Node: Executor (Cache, Task, Task)

Spark Streaming


Shark

Apache Mesos is a cluster manager that makes building and running distributed systems, or frameworks, easy and efficient. Using Mesos you can simultaneously run Apache Hadoop, Apache Spark, Apache Storm, and many other applications on a dynamically shared pool of resources (machines).

Workloads: batch, services
Apps: Scalding, Impala, Shark, MySQL, JBoss, Django, Rails
Framework: MPI, Hadoop, Spark, Storm, Kafka, Chronos, Marathon, etc.
Kernel
Cluster: nine nodes

Real Time Moods


Akka - toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM.
Spray - open-source toolkit for building REST/HTTP-based integration layers on top of Scala and Akka.

Yahoo Data, Twitter Data
Spark Streaming, DStreams
Scheduler, Feeder, Receiver
Sentiment Analyzer
Akka Actors
Spray JSON
AngularJS / D3
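As a rough illustration of the streaming and sentiment steps in the components above, the sketch below groups timestamped messages into fixed-interval micro-batches (the idea behind DStreams) and averages a sentiment score per batch. Everything here is an assumption made for illustration: the `LEXICON` word scores, the function names, and the sample messages are hypothetical, and the code uses plain Python rather than Spark Streaming, Akka, or Spray.

```python
from collections import defaultdict

# Hypothetical word scores; a real analyzer would use a proper model.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "bad": -1, "sad": -1}


def score(text):
    """Sum the lexicon scores of the words in one message."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


def micro_batches(events, interval):
    """Group (timestamp_seconds, text) events into fixed-width time windows."""
    batches = defaultdict(list)
    for ts, text in events:
        batches[int(ts // interval)].append(text)
    return [batches[key] for key in sorted(batches)]


def mood_per_batch(events, interval):
    """Average sentiment score of each micro-batch, in time order."""
    return [
        sum(map(score, batch)) / len(batch)
        for batch in micro_batches(events, interval)
    ]


# Made-up sample messages standing in for the Twitter/Yahoo feeds.
events = [
    (0.2, "I love this great phone"),
    (0.9, "bad service"),
    (1.4, "so happy today"),
    (2.7, "I hate waiting"),
    (2.8, "sad and bad news"),
]
moods = mood_per_batch(events, interval=1.0)
```

With a one-second interval, `moods` holds one average per window, which is the kind of per-batch value a front end could plot over time.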

Q&A
