
There are three major components for any Spark development environment.

1. Driver (in local mode, the shell itself acts as the driver)
2. SparkContext
3. RDD (Resilient Distributed Dataset)
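
In spark-shell these pieces come pre-wired: the shell process itself is the driver, and a ready-made SparkContext is bound to the variable sc. A quick check from the shell (the value in the comment is illustrative):

sc.master   // e.g. "local[*]" when running in local mode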

Local Mode in Spark: the driver and the tasks all run in a single JVM on your machine, which makes it the easiest way to experiment. Start the interactive shell with:

spark-shell
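
The shell can also be told explicitly which master to use; for local mode with all available cores:

spark-shell --master local[*]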

How to create an RDD?

1. You can create an RDD from an existing in-memory collection (by parallelizing it).

2. You can create an RDD from an external file. Both approaches are sketched below.
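
A minimal sketch of both, as run from spark-shell (the file path is a placeholder):

// 1. From an existing collection
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// 2. From an external file (one element per line)
val lines = sc.textFile("file:///path/to/input.txt")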

Wordcount program in Spark.


val textFile = sc.textFile("hdfs://...")                // load the input file as an RDD of lines
val counts = textFile.flatMap(line => line.split(" "))  // split each line into words
                     .map(word => (word, 1))            // pair each word with a count of 1
                     .reduceByKey(_ + _)                // sum the counts per word
counts.saveAsTextFile("hdfs://...")                     // write the results back to HDFS
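
To inspect the counts directly in the shell instead of writing them out, an action such as take can be used (the 10 is arbitrary):

counts.take(10).foreach(println)   // print the first 10 (word, count) pairs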

// A variant that splits on non-word characters and normalizes case
// ("input" is an RDD of lines, e.g. from sc.textFile):
val words = input.flatMap(x => x.split("\\W+"))

val lowercaseWords = words.map(x => x.toLowerCase())

val pairs = lowercaseWords.map(y => (y, 1))
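
To finish this variant, the pairs are reduced exactly as in the word count above; counts is just an illustrative name:

val counts = pairs.reduceByKey(_ + _)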

val textfile = sc.textFile("file:///home/hdfs/input.txt")


val words = textfile.flatMap(line => line.split(" "))

// Sort by value in descending order. For ascending order remove the 'false' argument from sortBy
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2, false)

// Sort by value in ascending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2)

//Sort by key in ascending order


words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey()
// Sort by key in descending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey(false)
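
Note that sortBy and sortByKey are lazy transformations that return new RDDs; nothing is computed until an action runs. For example, to print the descending-by-key result:

words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey(false).collect().foreach(println)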
