- reduceByKey(): merges the values for each key using an associative and
commutative reduce function.
- collect(): returns all the elements of the dataset as an array at the driver
program.
- first(): returns the first element of the dataset.
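The merge behaviour of reduceByKey() can be sketched in plain Scala, with no Spark session required; this is only an illustration of the semantics on a local collection, not Spark's distributed implementation:

```scala
// Plain-Scala sketch of reduceByKey semantics: merge the values for each
// key with an associative, commutative function (here, +).
val pairs = Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4))
val reduced: Map[String, Int] =
  pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }
// reduced == Map("a" -> 4, "b" -> 6)
```

Because the function is associative and commutative, Spark can apply it partially on each partition before shuffling, which is why reduceByKey() is preferred over groupByKey() followed by a reduce.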
1. Word count:
```scala
def wordCount(text: String): Map[String, Int] = {
  // Split on whitespace, drop empty tokens, then count occurrences per word.
  val words = text.split("\\s+").filter(_.nonEmpty)
  words.groupBy(identity).map { case (word, occurrences) => word -> occurrences.length }
}
```
2. Text search:
```scala
def textSearch(text: String, word: String): Boolean = {
  // True if the exact word appears as a whitespace-delimited token.
  text.split("\\s+").contains(word)
}
```
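The two helpers can be exercised together as follows; their definitions are repeated here (in a compact form) so the snippet runs standalone, and the sample sentence is just an illustration:

```scala
// Repeated from above so this usage sketch is self-contained.
def wordCount(text: String): Map[String, Int] =
  text.split("\\s+").filter(_.nonEmpty)
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.length }

def textSearch(text: String, word: String): Boolean =
  text.split("\\s+").contains(word)

val sample = "to be or not to be"
val counts = wordCount(sample)   // "to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1
val found  = textSearch(sample, "not")  // true
```

Note that both helpers match on whitespace-delimited tokens only; punctuation and case folding would need extra normalisation.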
3. Linear SVC prediction with Spark MLlib:
```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.feature.VectorAssembler

// Input data (assumes an active SparkSession named `spark`).
// Note: libsvm input already yields "label" and "features" columns.
val data = spark.read.format("libsvm").load("data.txt")

// Feature vector: assemble every non-label column into a single vector column.
// (Redundant for libsvm input, but needed when features arrive as separate columns.)
val assembler = new VectorAssembler()
  .setInputCols(data.columns.filter(_ != "label"))
  .setOutputCol("assembledFeatures")
val transformedData = assembler.transform(data)

// Train model
val lsvc = new LinearSVC()
  .setFeaturesCol("assembledFeatures")
  .setLabelCol("label")
val model = lsvc.fit(transformedData)

// Make predictions
val prediction = model.transform(transformedData)
  .select("assembledFeatures", "prediction")
```
This shows basic Scala code for word counting, text search, and linear SVC
prediction using Spark MLlib. Let me know if any part needs more explanation!