- reduceByKey(): merges the values for each key using an associative and
commutative reduce function.
- collect(): returns all the elements of the dataset as an array at the driver
program.
- first(): returns the first element of the dataset.
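The merge behaviour of reduceByKey() can be sketched in plain Scala, with no Spark session required; this is only an illustration of the semantics on a local collection, not Spark's distributed implementation:

```scala
// Plain-Scala sketch of reduceByKey semantics: merge the values for each
// key with an associative, commutative function (here, +).
val pairs = Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4))
val reduced: Map[String, Int] =
  pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }
// reduced == Map("a" -> 4, "b" -> 6)
```

Because the function is associative and commutative, Spark can apply it partially on each partition before shuffling, which is why reduceByKey() is preferred over groupByKey() followed by a reduce.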
1. Word count:
```scala
def wordCount(text: String): Map[String, Int] = {
  // Split on whitespace, drop empty tokens, then count occurrences per word.
  val words = text.split("\\s+").filter(_.nonEmpty)
  words.groupBy(identity).map { case (word, occurrences) => word -> occurrences.length }
}
```
2. Text search:
```scala
def textSearch(text: String, word: String): Boolean = {
  // True if the exact word appears as a whitespace-delimited token.
  text.split("\\s+").contains(word)
}
```
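The two helpers can be exercised together as follows; their definitions are repeated here (in a compact form) so the snippet runs standalone, and the sample sentence is just an illustration:

```scala
// Repeated from above so this usage sketch is self-contained.
def wordCount(text: String): Map[String, Int] =
  text.split("\\s+").filter(_.nonEmpty)
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.length }

def textSearch(text: String, word: String): Boolean =
  text.split("\\s+").contains(word)

val sample = "to be or not to be"
val counts = wordCount(sample)   // "to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1
val found  = textSearch(sample, "not")  // true
```

Note that both helpers match on whitespace-delimited tokens only; punctuation and case folding would need extra normalisation.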
3. Linear SVC prediction with Spark MLlib:
```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.feature.VectorAssembler

// Input data (assumes an active SparkSession named `spark`).
// Note: libsvm input already yields "label" and "features" columns.
val data = spark.read.format("libsvm").load("data.txt")

// Feature vector: assemble every non-label column into a single vector column.
// (Redundant for libsvm input, but needed when features arrive as separate columns.)
val assembler = new VectorAssembler()
  .setInputCols(data.columns.filter(_ != "label"))
  .setOutputCol("assembledFeatures")
val transformedData = assembler.transform(data)

// Train model
val lsvc = new LinearSVC()
  .setFeaturesCol("assembledFeatures")
  .setLabelCol("label")
val model = lsvc.fit(transformedData)

// Make predictions
val prediction = model.transform(transformedData)
  .select("assembledFeatures", "prediction")
```
This shows basic Scala code for word counting, text search, and linear SVC
prediction using Spark MLlib. Let me know if any part needs more explanation!