Apache Spark has which of the following capabilities?

All the options--rgt

Which of the following application types can Spark run in addition to batch-
processing jobs?
All the options--rgt

Which of the following is NOT a characteristic shared by Hadoop and Spark?


Both have their own file system--rgt

Programming paradigm used in Spark


Generalized--rgt

Spark is 100x faster than MapReduce due to development in Scala


false--rgt

Spark has APIs in?


All the options--rgt

What kind of data can be handled by Spark?


All the options--rgt

What year was Apache Spark made an open source technology?


2010--rgt

The transformation which produces one output value for each input value, and the
operation which produces an arbitrary number of values for each input value:
map(), flatMap()--rgt
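
The difference between the two can be sketched with ordinary Scala collections, whose map and flatMap have the same one-to-one versus one-to-many shape as the RDD methods of the same name. This is only an illustration, not Spark code: the object name MapVsFlatMap and the sample strings are made up for the example, and no cluster or SparkContext is involved.

```scala
object MapVsFlatMap {
  // Hypothetical sample data standing in for the lines of a text file.
  val lines: List[String] = List("spark is fast", "rdd")

  // map: exactly one output element for each input element.
  def lengths(xs: List[String]): List[Int] =
    xs.map(_.length)

  // flatMap: each input element may yield any number of outputs,
  // and the per-element results are flattened into one collection.
  def words(xs: List[String]): List[String] =
    xs.flatMap(line => line.split(" ").toList)

  def main(args: Array[String]): Unit = {
    println(lengths(lines)) // two inputs, two outputs
    println(words(lines))   // two inputs, four outputs
  }
}
```

With an RDD of strings, the same two lambdas could be passed to rdd.map and rdd.flatMap; the result would then be another RDD rather than a List.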

Choose correct statement


Execution starts with the call of Action--rgt

Choose correct statement about RDD


RDD is a distributed data structure--rgt

Which action returns all the elements of the dataset as an array?


collect()--rgt

RDD is
All the options--rgt

Identify correct transformation


All the options--rgt

We can edit the data of RDD like conversion to uppercase


false--rgt

Spark can integrate with which of the following data storage systems?
All the options--rgt

Spark supports loading data from HBase.


True--rgt

Benefits of using appropriate file formats in Spark


All the options--rgt

An instance of the Spark SQL execution engine that integrates with data stored in
Hive:

This study source was downloaded by 100000839058166 from CourseHero.com on 06-09-2022 20:52:22 GMT -05:00

https://www.coursehero.com/file/55020499/Spark-Preliminariestxt/
HiveContext--rgt

Which of the following file formats are supported by Spark?


All the options--rgt

Types of operations that can be performed on RDDs


Transformation and Action--rgt

Which of the following is true of running a Spark application on Hadoop YARN?


There are two deploy modes that can be used to launch Spark applications on YARN –
client mode and cluster mode--rgt

To launch a Spark application in any one of the four modes (local, standalone, Mesos,
or YARN), use
./bin/spark-submit--rgt

Which of the following Scala statements would be most appropriate to load the data
(sfpd.txt) into an RDD? Assume that SparkContext is available as the variable "sc"
and SQLContext as the variable "sqlContext".
val sfpd = sc.textFile("/path to file/sfpd.txt")--rgt

Which tells Spark how and where to access a cluster?


Spark Context--rgt

Which is responsible for task scheduling and memory management?


Spark Core--rgt

By default, Spark uses which algorithm to remove old and unused RDDs to release more
memory?
Least Recently Used (LRU)--rgt
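
To make the eviction policy concrete, here is a minimal LRU cache sketch built on java.util.LinkedHashMap in access-order mode. It is an illustration of the policy only, not Spark's internal block manager: the class name LruCache is hypothetical, and capacity is counted in entries, whereas Spark's cache is bounded by memory.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical fixed-capacity cache that evicts the least recently used entry.
class LruCache[K, V](capacity: Int) {
  // accessOrder = true: iteration order runs from least to most recently accessed.
  private val store = new JLinkedHashMap[K, V](16, 0.75f, true) {
    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > capacity
  }

  def put(key: K, value: V): Unit = { store.put(key, value); () }

  // A successful get counts as an access and refreshes the entry.
  def get(key: K): Option[V] = Option(store.get(key))

  // containsKey does not count as an access.
  def contains(key: K): Boolean = store.containsKey(key)
}
```

For example, with capacity 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the entry that was touched least recently.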

Which is not a Storage level in Spark?


HEAPANDDISK--rgt

RDDs can also be unpersisted to remove them from storage such as memory and/or
disk.
true--rgt

Which is the default Storage level in Spark?


MEMORY_ONLY--rgt

Which of the following is true of caching the RDD?


All the options--rgt

Spark can store its data in?

All the options--rgt

The number of stages in a job is usually equal to the number of RDDs in the DAG.
However, the scheduler can truncate the lineage when

There is no movement of data from the parent RDD

In Spark-Shell, which contexts are available by default?

SparkContext
Both-------------

What is meant by RDD Lazy Evaluation?

All the options

Spark caches the data automatically in memory as and when needed

True--correct

Choose correct statement about Spark Context

Both

What happens if an RDD partition is lost due to worker node failure?

Lost partition is recomputed
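
The idea behind recomputation can be sketched with a toy model in plain Scala: a dataset keeps its lineage (the source partitions plus the function applied to them), so any materialized partition that disappears can be rebuilt on demand. This is a simplified illustration, not Spark's implementation; the class LineageDataset, its methods, and the recomputations counter are all hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical model: lineage = source partitions + the function applied to each element.
class LineageDataset(source: Vector[Vector[Int]], f: Int => Int) {
  // Partitions that have been computed and are currently held.
  private val materialized = mutable.Map.empty[Int, Vector[Int]]

  // Counts how many times a partition had to be (re)built from its lineage.
  var recomputations: Int = 0

  // Return partition i, rebuilding it from the source if it is not held.
  def partition(i: Int): Vector[Int] =
    materialized.getOrElseUpdate(i, {
      recomputations += 1
      source(i).map(f)
    })

  // Simulate losing partition i, for example when a worker node fails.
  def lose(i: Int): Unit = { materialized.remove(i); () }
}
```

Losing a partition and asking for it again yields the same data, at the cost of one extra computation; only the lost partition is rebuilt, not the whole dataset.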

Which of the following is true of the Spark interactive shell?

Allows you to write programs interactively--correct

In-memory computing

For resource management, Spark can use

Yarn

What is an action in Spark?

Return a value to the driver after running a computation on the dataset

The cache() operation is a synonym of persist() that uses the default storage level
MEMORY_ONLY.

True

Spark Core Abstraction

RDD--correct

Do you need to install Spark on all nodes of a YARN cluster while running Spark on
YARN?

No, because Spark runs on top of YARN.


Yes-------------------------------------

How can you create an RDD for a text file?

SparkContext.textFile--correct

Which are true of a broadcast variable?

It is a shared variable--correct

Which statements about Spark are true?

All the options

Which are the ways to configure Spark properties?

All the options

Which types of file system does Spark support?

All the options--correct

Transformations are computed lazily.

True---correct
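
Lazy evaluation can be demonstrated without a cluster by using a Scala collection view, which, like an RDD transformation, only records the map step until a terminal operation asks for a result. This is an analogy rather than Spark code: the object LazySketch and its run method are hypothetical, and the final foldLeft plays the role that an action would play in Spark.

```scala
object LazySketch {
  // Returns (function calls before the terminal step,
  //          function calls after the terminal step,
  //          the computed total).
  def run(data: List[Int]): (Int, Int, Int) = {
    var calls = 0

    // .view makes map lazy: nothing is computed here, the step is only recorded.
    val doubled = data.view.map { n => calls += 1; n * 2 }
    val before = calls

    // Terminal step: forces the recorded map to run over every element.
    val total = doubled.foldLeft(0)(_ + _)

    (before, calls, total)
  }
}
```

Calling LazySketch.run on a three-element list reports zero calls before the terminal step and three after it, which mirrors how Spark defers transformations until an action is invoked.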

Which are the various data sources available in Spark SQL?

All the options--correct

Spark is 100x faster than MapReduce due to

In-memory computing--correct

What is an Accumulator?

All the options--correct

Which are the methods to create an RDD in Spark?

By parallelizing a collection in your Driver program.--correct

Which types of processing can Apache Spark handle?

All the options

How would you control the number of partitions of an RDD?

Both

Which language is not supported for Spark development?

C++
