12/27/2015

Spark SQL and Hive | Coursera

Spark SQL and Hive


6 questions

1.
What makes DataFrames and database tables conceptually equivalent?
Both support any number of rows
They are collections of rows with typed columns
They are collections of columns with typed rows

2.
What does registerTempTable do?
Save a temporary table to Hive
Prepare a temporary database table interface for a DataFrame
Save a DataFrame to Hive
Save a temporary table to HDFS

3.
Which is the most efficient interface for analyzing data with DataFrames,
and why?
Either DataFrame calls or SQL are equally efficient because they
feed to the same optimizer.
DataFrame calls are native to Spark so are more efficient.
https://www.coursera.org/learn/bigdataanalytics/exam/ZQuyq/sparksqlandhive

Either DataFrame calls or SQL are equally efficient because
DataFrame calls are translated to SQL under the hood.
SQL is more efficient because it is lower level, so there
is less overhead.

4.
Why would you want to use the SQL interface instead of DataFrame calls?
Check all options that apply.
In a PySpark shell it is a lot easier to debug SQL than DataFrame
calls
My analysis is written more easily in SQL
I already have SQL code from a previous application
It is more efficient

5.
How do you set up Spark so it can connect to Hive?
Copy hive-site.xml to Spark's conf folder
Configure Hive properties on the SparkContext object sc
Open the pyspark shell with the --hive argument

6.
Which of these objects persist across different PySpark shell
instances (i.e., after you close the shell and start it again)?
DataFrames registered with registerTempTable
DataFrames saved to Hive with saveAsTable
DataFrames cached in memory with .cache()
DataFrames
