
Of the options provided, the following are RDD transformations. Each one returns a new RDD and is evaluated lazily:

- flatMap(): Transforms each element of the RDD by applying a function that returns zero or more elements, then flattens the results.

- map(): Transforms each element of the RDD by applying a function.

- filter(): Keeps only the elements of the RDD that satisfy a predicate function.

- reduceByKey(): Merges the values for each key using an associative and
commutative reduce function.

- sortByKey(): Sorts a key-value RDD by key and returns a new RDD.

The rest are actions, which trigger computation and return a value to the driver program:

- count(): Returns the number of elements in the dataset.

- collect(): Returns all the elements of the dataset as an array at the driver
program.

- first(): Returns the first element of the dataset.

- max(): Returns the maximum element of the dataset.

- reduce(): Aggregates the elements of the dataset using an associative and commutative function and returns the single result.

So in summary, the RDD transformations provided are: flatMap(), map(), filter(), reduceByKey(), sortByKey(). The rest, including reduce(), are actions.
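
To make the distinction concrete, here is a minimal sketch in script style (for example, pasted into spark-shell). It assumes Spark is on the classpath and that a local master is acceptable; the app name and the sample data are made up for illustration. Operations that return an RDD are transformations, and operations that return a plain value are actions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode. getOrCreate reuses the existing context inside spark-shell.
val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val nums  = sc.parallelize(Seq(3, 1, 2))
val pairs = sc.parallelize(Seq(("b", 1), ("a", 2), ("b", 3)))

// Transformations: lazy, each returns a new RDD
val doubled  = nums.map(_ * 2)                // RDD[Int]
val evens    = nums.filter(_ % 2 == 0)        // RDD[Int]
val repeated = nums.flatMap(n => Seq(n, n))   // RDD[Int], two outputs per input
val summed   = pairs.reduceByKey(_ + _)       // RDD[(String, Int)]
val sorted   = summed.sortByKey()             // RDD[(String, Int)], ordered by key

// Actions: trigger a job and return a value to the driver
val total     = nums.count()                  // 3
val firstNum  = nums.first()                  // 3
val largest   = nums.max()                    // 3
val sum       = nums.reduce(_ + _)            // 6
val collected = sorted.collect()              // Array((a,2), (b,4))
```

In a standalone program, call `sc.stop()` once the work is done.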

Here are solutions to the problems in Scala:

1. Word count:

```scala
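def wordCount(text: String): Map[String, Int] = {
  // Split on runs of whitespace and drop the empty token a leading space would produce
  val words = text.split("\\s+").filter(_.nonEmpty)
  // Group identical words and count the size of each group
  words.groupBy(identity).map { case (word, occurrences) => word -> occurrences.length }
}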
```
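
The same word count can also be expressed with the RDD operations listed earlier. The following is a sketch under assumptions, not the only way to do it: `wordCountRdd` is a hypothetical helper name, the input lines are made up, and Spark is assumed to run in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Assumption: local mode. getOrCreate reuses the existing context inside spark-shell.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("word-count").setMaster("local[*]"))

// Hypothetical helper: counts words across an RDD of text lines
def wordCountRdd(lines: RDD[String]): RDD[(String, Int)] =
  lines
    .flatMap(_.split("\\s+"))   // one output element per word
    .filter(_.nonEmpty)         // drop empty tokens from leading whitespace
    .map(word => (word, 1))     // pair each word with a count of 1
    .reduceByKey(_ + _)         // sum the counts per word

// collect() is the action that actually runs the job
val counts: Map[String, Int] =
  wordCountRdd(sc.parallelize(Seq("to be or", "not to be"))).collect().toMap
// counts contains: to -> 2, be -> 2, or -> 1, not -> 1
```

Everything inside `wordCountRdd` is a transformation, so no work happens until `collect()` is called on the result.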

2. Text search for word:

```scala
def textSearch(text: String, word: String): Boolean = {
  // True if `word` appears as a whole whitespace-delimited token (case-sensitive)
  text.split("\\s+").contains(word)
}
```

3. Prediction with linear SVM:

```scala
import org.apache.spark.ml.classification.LinearSVC

// `spark` is the active SparkSession (predefined in spark-shell)

// Input data: the libsvm reader already yields a "label" column and a
// "features" vector column, so no VectorAssembler step is needed
val data = spark.read.format("libsvm").load("data.txt")

// Train model
val svm = new LinearSVC()
  .setFeaturesCol("features")
  .setLabelCol("label")

val model = svm.fit(data)

// Make predictions (on the training data, for illustration only)
val prediction = model.transform(data)
  .select("features", "prediction")
```

This shows basic Scala code for word counting, text search, and linear SVM
prediction. The first two use plain Scala collections; the SVM example uses Spark MLlib. Let me know if any part needs more explanation!
