1. You can create an RDD from an existing collection.
2. You can create an RDD from an external file (both are sketched below).
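For example, a minimal sketch of both approaches, assuming a SparkContext named sc is already available (as in the Spark shell); the file path is illustrative only:

// 1. From an existing collection, using parallelize
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// 2. From an external file, using textFile (path is illustrative)
val lines = sc.textFile("file:///tmp/sample.txt")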
Word count program in Spark:
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
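As a quick sanity check, the same pipeline can be run on a small in-memory collection instead of an HDFS file; the sample sentences below are illustrative only:

// Illustrative only: a tiny RDD stands in for the HDFS file
val sample = sc.parallelize(Seq("to be or not to be", "to see or not to see"))
val sampleCounts = sample.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
sampleCounts.collect().foreach(println)
// Expected pairs (order not guaranteed): (to,4), (be,2), (or,2), (not,2), (see,2)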
// Split each line of an existing RDD[String] (here called input) on non-word characters
val words = input.flatMap(x => x.split("\\W+"))
// Normalize to lowercase so that "Spark" and "spark" count as the same word
val lowercaseWords = words.map(x => x.toLowerCase())
// Pair each word with an initial count of 1
val nn = lowercaseWords.map(y => (y, 1))
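To complete this variant, the pairs can be reduced by key the same way as above (a sketch; the name counts2 is introduced here, the other names follow the snippet):

val counts2 = nn.reduceByKey((a, b) => a + b)
counts2.collect().foreach(println)   // prints (word, count) pairs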
val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))

// Sort by value in descending order. For ascending order, remove the 'false' argument from sortBy.
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2, false)

// For ascending order by value
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2)
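If only the most frequent words are needed, the value-sorted RDD can be truncated with take (a sketch; the count of 5 is arbitrary):

val top5 = words.map(word => (word, 1))
                .reduceByKey((a, b) => a + b)
                .sortBy(_._2, false)
                .take(5)
top5.foreach(println)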
//Sort by key in ascending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey()

// Sort by key in descending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey(false)
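The key-sorted result can be written back out the same way as the first example (the output path below is illustrative only):

val sortedByKey = words.map(word => (word, 1))
                       .reduceByKey((a, b) => a + b)
                       .sortByKey()
sortedByKey.saveAsTextFile("file:///tmp/wordcount_by_key")   // path is an assumption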