Spark Preliminaries

Apache Spark has which of the following capabilities?
All the options--rgt
Which of the following application types can Spark run in addition to batch-processing jobs?
Which of the following is NOT a characteristic shared by Hadoop and Spark?

Both have their own file system--rgt
Programming paradigm used in Spark

Generalized--rgt
Spark is 100x faster than MapReduce due to development in Scala

false--rgt
Spark has APIs in?

What kind of data can be handled by Spark ?

What year was Apache Spark made an open source technology?

2010--rgt
The transformation which produces one output value for each input value, and the
operation which produces an arbitrary number of values for each input value.
map(), flatMap()--rgt
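The difference can be sketched with plain Scala collections, whose map and flatMap follow the same per-element contract as the RDD methods of the same name. This is only an illustration with no Spark dependency; the object name and the sample lines are made up for the example.

```scala
// Plain Scala collections, used here only as a stand-in for an RDD.
object MapVsFlatMap {
  // Hypothetical sample input: two lines of text.
  val lines: List[String] = List("spark is fast", "rdd")

  // map: exactly one output value per input value (the word count of each line).
  val wordCounts: List[Int] = lines.map(line => line.split(" ").length)

  // flatMap: zero or more output values per input value, flattened into one list.
  val words: List[String] = lines.flatMap(line => line.split(" ").toList)
}
```

Here wordCounts has the same number of elements as lines, while words can be longer or shorter than its input.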
Choose correct statement

Execution starts with the call of Action--rgt
Choose correct statement about RDD

RDD is a distributed data structure--rgt
Which action returns all the elements of the dataset as an array.

collect()--rgt
RDD is
Identify correct transformation

We can edit the data of an RDD in place, for example by converting it to uppercase

false--rgt
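The answer is false because RDDs are immutable: an uppercase conversion is expressed as a transformation that yields a new dataset and leaves the original untouched. A rough sketch of that behaviour, using an immutable Scala Vector as a stand-in for an RDD (names and values are hypothetical, no Spark dependency):

```scala
// An immutable Vector stands in for an RDD in this sketch.
object ImmutableUppercase {
  // Hypothetical sample data.
  val names: Vector[String] = Vector("spark", "hadoop")

  // map does not modify `names`; it builds a new collection.
  val upper: Vector[String] = names.map(_.toUpperCase)
}
```

After this runs, names still holds the lowercase values and upper holds the converted copy.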
Spark can integrate with which of the following data storage systems?
Spark supports loading data from HBase.

True--rgt
Benefits of using appropriate file formats in Spark

An instance of the Spark SQL execution engine that integrates with data stored in
Hive:
HiveContext--rgt
Which of the following file formats are supported by Spark ?

Types of operations that can be performed on RDDs

Transformation and Action--rgt
Which of the following is true of running a Spark application on Hadoop YARN?

There are two deploy modes that can be used to launch Spark applications on YARN –
client mode and cluster mode--rgt
Which tells Spark how and where to access a cluster?

SparkContext--rgt
To launch a Spark application in any one of the four modes (local, standalone, Mesos
or YARN), use
./bin/spark-submit--rgt
Which of the following Scala statements would be most appropriate to load the data
(sfpd.txt) into an RDD? Assume that SparkContext is available as the variable "sc"
and SQLContext as the variable "sqlContext".
val sfpd = sc.textFile("/path to file/sfpd.txt")--rgt
Which is responsible for task scheduling and memory management ?

Spark Core--rgt
By default, Spark uses which algorithm to remove old and unused RDDs to release more
memory?
Least Recently Used (LRU)--rgt
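The LRU policy itself can be sketched with a small access-ordered map. This is a toy illustration of the eviction rule only, not Spark's actual memory manager; the class name, the capacity of 2, and the rddA/rddB/rddC keys are all assumptions made for the example.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// A toy LRU cache; `capacity` and the class name are assumptions for this sketch.
// The third constructor argument (true) orders entries by access instead of insertion.
class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) {
  // Consulted after each put(); returning true evicts the least recently used entry.
  override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > capacity
}

object LruDemo {
  val cache = new LruCache[String, Int](2)
  cache.put("rddA", 1)
  cache.put("rddB", 2)
  cache.get("rddA")    // touch rddA, so rddB becomes the least recently used entry
  cache.put("rddC", 3) // capacity exceeded, so rddB is evicted
}
```

Because rddA was read after rddB was stored, rddB is the entry dropped when the third item arrives.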
Which is not a Storage level in Spark ?

HEAPANDDISK--rgt
RDDs can also be unpersisted to remove them from persistent storage such as memory
and/or disk.
true--rgt
Which is the default Storage level in Spark ?

MEMORY_ONLY--rgt
Which of the following is true of caching the RDD ?

Spark can store its data in?
All the options--rgt
The number of stages in a job is usually equal to the number of RDDs in the DAG. However,
the scheduler can truncate the lineage when
There is no movement of data from the parent RDD
In spark-shell, which contexts are available by default?
Both
What is meant by RDD Lazy Evaluation
All the options
Spark caches data automatically in memory as and when needed
True--correct
Choose correct statement about Spark Context
Both
What happens if RDD partition is lost due to worker node failure
Lost partition is recomputed
Which of the following is true of the Spark interactive shell?
Allows you to write programs interactively--correct
In-memory computing
For resource management Spark can use
YARN
What is an action in Spark
Return a value to the driver after running a computation on the dataset
The cache() operation is a synonym of persist() that uses the default storage level
MEMORY_ONLY.
True
Spark Core Abstraction
RDD--correct
Do you need to install Spark on all nodes of a YARN cluster while running Spark on
YARN?
No, because Spark runs on top of YARN.
How can you create an RDD for a text file?
SparkContext.textFile--correct
Which are true of a broadcast variable
It is a shared variable--correct
Which all statements about Spark are true?
All the options
Which all are the ways to configure Spark Properties ?
All the options
Which all types of file system Spark supports?
All the options--correct
Transformations are computed lazily.
True--correct
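Lazy computation can be sketched with a Scala collection view, which defers work in a similar spirit: declaring the mapping runs nothing, and the work only happens when a result is forced (the role an action plays in Spark). This is an analogy on plain Scala collections, not Spark code; the object and value names are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyDemo {
  // Records each element the mapping function actually touches.
  val evaluated: ArrayBuffer[Int] = ArrayBuffer.empty[Int]

  // Building the view runs nothing yet, like declaring a transformation.
  val doubled = List(1, 2, 3).view.map { x => evaluated += x; x * 2 }
  val countBeforeAction: Int = evaluated.size

  // Forcing the view plays the role of an action and triggers the work.
  val result: List[Int] = doubled.toList
  val countAfterAction: Int = evaluated.size
}
```

The mapping function is not invoked until toList is called, which is why the first count is taken before any element has been processed.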
Which are the various data sources available in Spark SQL?
All the options--correct
Spark is 100x faster than MapReduce due to
In-memory computing--correct
What is an Accumulator
Which are the methods to create an RDD in Spark?
By parallelizing a collection in your Driver program.--correct
Which type of processing Apache Spark can handle
All the options
How would you control the number of partitions of a RDD?
Both
Which language is not supported for Spark Development ?
C++