Apache Spark

What is it?

How does it work?

Benefits

Tuning

Examples

www.xoomtrainings.com

sales@xoomtrainings.com

Spark What is it?

Open Source

Alternative to MapReduce for certain applications

A low latency cluster computing system

For very large data sets

Can be up to 100 times faster than MapReduce for

Iterative algorithms

Interactive data mining

Used with Hadoop / HDFS

Released under BSD License

Spark How does it work?

Uses in-memory cluster computing

Memory access is faster than disk access

Has APIs for

Scala

Java

Python

Can be accessed from Scala and Python shells

Currently an Apache incubator project
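
Why in-memory computing helps iterative work can be sketched as follows. This is a minimal illustration, not taken from the slides: it assumes a Spark release that provides `SparkConf` under the `org.apache.spark` package, and the object name, data set and step size are hypothetical. The data set is cached once and then scanned on every pass of the loop, so only the first pass has to build it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("Iterative Sketch")
    val sc = new SparkContext(conf)

    // Hypothetical data set: (x, y) pairs that lie on the line y = 3x.
    val data = (1 to 1000).map(i => (i / 1000.0, 3.0 * i / 1000.0))
    // cache() keeps the partitions in memory after the first action.
    val points = sc.parallelize(data).cache()
    val n = points.count()

    // Gradient descent on the slope w; every iteration rescans the cached data.
    var w = 0.0
    val stepSize = 0.5 // assumed value, chosen only for this toy data
    for (_ <- 1 to 100) {
      val gradient = points
        .map { case (x, y) => 2.0 * (w * x - y) * x }
        .reduce(_ + _) / n
      w -= stepSize * gradient
    }

    println(s"Fitted slope: $w") // should approach 3.0 for this data
    sc.stop()
  }
}
```

Without the call to `cache()`, each of the 100 iterations would recompute the data set from its source, which is the situation a disk-based MapReduce pipeline is in.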

Spark Benefits

Scales to very large clusters

Uses in-memory processing for increased speed

High-level APIs

Java, Scala, Python

Low latency shell access

Spark Tuning

Bottlenecks can occur in the cluster via

CPU, memory or network bandwidth

Tune data serialization method i.e.

Java ObjectOutputStream vs Kryo
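
To get a feel for what Java serialization costs, the sketch below measures the size of two arrays written through `ObjectOutputStream`. It uses only the standard library; the object and method names are hypothetical. It also shows why primitive types help: the boxed array carries per-element object overhead that the primitive array does not.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializedSize {
  // Serialize a value with Java's built-in mechanism and return the byte count.
  def javaSerializedSize(value: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.size()
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    // Primitive ints: stored and written as raw 4-byte values.
    val primitives: Array[Int] = Array.tabulate(n)(i => i)
    // Boxed integers: every element is a separate object on the wire.
    val boxed: Array[java.lang.Integer] =
      Array.tabulate[java.lang.Integer](n)(i => Integer.valueOf(i))

    println(s"Array[Int]:     ${javaSerializedSize(primitives)} bytes")
    println(s"Array[Integer]: ${javaSerializedSize(boxed)} bytes")
  }
}
```

Kryo is generally reported to be faster and more compact than this default mechanism; in Spark it is selected through the `spark.serializer` setting rather than in application code like the above.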

Memory Tuning

Use primitive types

Set JVM Flags

Store objects in serialized form i.e.

RDD Persistence

MEMORY_ONLY_SER
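
The two tuning ideas above, Kryo serialization and serialized RDD persistence, can be combined as in the sketch below. It assumes a Spark release with `SparkConf` and the `org.apache.spark` package names; the object name and the generated records are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SerializedCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("Serialized Cache Sketch")
      // Use Kryo instead of the default Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Hypothetical records standing in for a real data set.
    val records = sc.parallelize(1 to 100000).map(i => (i, s"record-$i"))

    // MEMORY_ONLY_SER keeps each partition in memory as serialized bytes,
    // trading extra CPU on access for a smaller memory footprint.
    val cached = records.persist(StorageLevel.MEMORY_ONLY_SER)

    // The first action fills the cache; the second reads from it.
    println(cached.count())
    println(cached.filter { case (i, _) => i % 2 == 0 }.count())

    sc.stop()
  }
}
```

Serialized storage costs CPU time on every read, so it tends to pay off mainly when the deserialized data would not fit in memory or garbage collection is a problem.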

Spark Examples
Example from spark-project.org: a Spark job in Scala.
It counts the lines in a system log that contain the letter "a" and the lines that contain the letter "b".

/*** SimpleJob.scala ***/
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file on your system
    // Run locally; the last two arguments are the Spark home and the job's jar
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Read the log in at least 2 splits and keep it in memory for both counts
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
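
The example above uses the early `spark` package name. On later releases the same classes live under `org.apache.spark`, and the context is usually built from a `SparkConf`. A sketch of the same job under that assumption follows; the object name is hypothetical, and packaging and submission details are left out.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleJobSketch {
  def main(args: Array[String]): Unit = {
    val logFile = "/var/log/syslog" // Should be some file on your system

    // Master and application name are set on the configuration object.
    val conf = new SparkConf().setMaster("local").setAppName("Simple Job")
    val sc = new SparkContext(conf)

    // Cache the file so the second count does not reread it from disk.
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()

    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
```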

Contact Us

Feel free to contact us at

www.xoomtrainings.com

sales@xoomtrainings.com
USA: +1-610-686-8077 or India: +91-404-018-3355

We offer IT project consultancy

We are happy to hear about your problems

You pay only for the hours you need to solve your problems
