Which of the following application types can Spark run in addition to batch-
processing jobs?
All the options--rgt
The transformation that produces one output value for each input value, and the
transformation that produces an arbitrary number of values for each input value:
map(), flatMap()--rgt
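The difference between the two can be sketched as below. This is a minimal illustration that assumes plain Scala collections as a stand-in for an RDD, so it runs without a Spark installation; RDD.map and RDD.flatMap have the same one-to-one and one-to-many shape. The object name and the sample lines are hypothetical.

```scala
// Plain Scala collections stand in for an RDD here (an assumption, so the
// snippet needs no Spark cluster). The sample input is hypothetical.
object MapVsFlatMap {
  val lines: List[String] = List("spark is fast", "lazy rdds")

  // map: exactly one output element for each input element.
  val lengths: List[Int] = lines.map(_.length)

  // flatMap: zero or more output elements for each input element,
  // flattened into a single collection.
  val words: List[String] = lines.flatMap(_.split(" ").toList)

  def main(args: Array[String]): Unit = {
    println(lengths) // two inputs, two outputs
    println(words)   // two inputs, five outputs
  }
}
```

Two input lines give two lengths under map, while flatMap splits the same two lines into five words.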
RDD is
All the options--rgt
Spark can integrate with which of the following data storage systems?
All the options--rgt
An instance of the Spark SQL execution engine that integrates with data stored in
Hive:
This study source was downloaded by 100000839058166 from CourseHero.com on 06-09-2022 20:52:22 GMT -05:00
https://www.coursehero.com/file/55020499/Spark-Preliminariestxt/
HiveContext--rgt
To launch a Spark application in any one of the four modes (local, standalone, Mesos,
or YARN), use
./bin/spark-submit--rgt
Which of the following Scala statements would be most appropriate to load the data
(sfpd.txt) into an RDD? Assume that SparkContext is available as the variable "sc"
and SQLContext as the variable "sqlContext."
val sfpd = sc.textFile("/path to file/sfpd.txt")--rgt
By default, which algorithm does Spark use to remove old and unused RDDs to release
more memory?
Least Recently Used (LRU)--rgt
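The LRU policy itself can be sketched in a few lines. This is only an illustration of the eviction rule, not Spark's internal memory store; it assumes an access-ordered java.util.LinkedHashMap, and the class name, the capacity of 2, and the keys "rddA", "rddB", "rddC" are hypothetical.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Illustration of the LRU policy only; not Spark's own implementation.
// The third constructor argument `true` selects access order, so the
// eldest entry is always the least recently used one.
class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) {
  // Consulted after each put(); returning true evicts the eldest entry.
  override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > capacity
}

object LruDemo {
  // Reports whether rddA, rddB and rddC are still cached at the end.
  def run(): (Boolean, Boolean, Boolean) = {
    val cache = new LruCache[String, Int](2) // hypothetical capacity
    cache.put("rddA", 1)
    cache.put("rddB", 2)
    cache.get("rddA")    // touch rddA, so rddB becomes least recently used
    cache.put("rddC", 3) // over capacity: rddB is evicted
    (cache.containsKey("rddA"), cache.containsKey("rddB"), cache.containsKey("rddC"))
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because rddA is read just before rddC is added, rddB is the entry that gets dropped, even though rddA was inserted first.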
RDDs can also be unpersisted to remove them from persistent storage such as memory
and/or disk.
true--rgt
The number of stages in a job is usually equal to the number of RDDs in the DAG.
However, the scheduler can truncate the lineage when
SparkContext
Both
Spark caches data automatically in memory as and when needed.
True--correct
Both
In-memory computing
YARN
The cache() operation is a synonym of persist() that uses the default storage level
MEMORY_ONLY.
True
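The equivalence of cache() and persist() at the default level can be checked with a short sketch. It assumes spark-core is on the classpath and runs Spark in local mode; the object name, the app name, the "local[2]" master, and the sample range are illustrative choices, not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheVsPersist {
  // Returns true when cache() and persist(MEMORY_ONLY) report the same level.
  def storageLevelsMatch(): Boolean = {
    // Assumption: local mode with two threads; the app name is hypothetical.
    val conf = new SparkConf().setAppName("cache-vs-persist").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val cached = sc.parallelize(1 to 100).cache()
      val persisted = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_ONLY)
      val same = cached.getStorageLevel == persisted.getStorageLevel
      // Mark both RDDs as no longer persistent before shutting down.
      cached.unpersist()
      persisted.unpersist()
      same
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(storageLevelsMatch())
}
```

The same sketch also shows unpersist(), which marks an RDD as non-persistent and removes its blocks from memory and disk.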
RDD--correct
Do you need to install Spark on all nodes of a YARN cluster when running Spark on
YARN?
SparkContext.textFile--correct
It is a shared variable--correct
True--correct
In-memory computing--correct
What is an Accumulator?
Both
C++