
1/6/23, 7:53 PM ABD22 1st Exam - 6 January: Attempt review


Started on    Friday, 6 January 2023, 6:30 PM
State         Finished
Completed on  Friday, 6 January 2023, 7:53 PM
Time taken    1 hour 23 mins

Question 1
Complete

Marked out of 0.50

What is the meaning of the data processing model: “Scale Out”?

a. Means adding bigger, more powerful machines


b. Means adding more, smaller machines
c. Means processing data with scalable machines
d. Means processing data outside the cluster

Question 2
Complete

Marked out of 0.50

In a Spark ML program, what is the purpose of the code below?

model.fit(mydata)

a. Train a machine learning model based on the data of ‘mydata’


b. Apply the model in ‘model’ to the data in ‘mydata’
c. Create a new model based on ‘mydata’
d. Adjust the model based on the data in ‘mydata’

Question 3
Complete

Marked out of 0.50

What is a Pair RDD?

a. Two RDDs in sequence in a transformation statement


b. An RDD with two rows
c. An RDD with a pair operation
d. An RDD with key-value pairs
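A Pair RDD holds key-value tuples, which is what enables by-key operations such as reduceByKey(). The following is a pure-Python sketch of those semantics (no Spark cluster needed); the data and function names are illustrative only.

```python
# Pure-Python sketch of Pair RDD semantics.
# In Spark this would be roughly:
#   sc.parallelize(pairs).reduceByKey(lambda a, b: a + b).collect()
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

def reduce_by_key(pairs, func):
    """Group values by key, then fold each key's values with func."""
    grouped = {}
    for key, value in pairs:
        grouped[key] = func(grouped[key], value) if key in grouped else value
    return sorted(grouped.items())

print(reduce_by_key(pairs, lambda a, b: a + b))  # [('a', 4), ('b', 6)]
```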


Question 4
Complete

Marked out of 0.50

What is a DataFrame?

a. A data processing method in Spark


b. A distributed dataset in-memory
c. A distributed dataset on-disk
d. A distributed dataset on-disk and in-memory

Question 5
Complete

Marked out of 0.50

What is a Tumbling Window in Spark Streaming?

a. A fixed-sized, non-overlapping and contiguous window of data

b. An overlapping and contiguous window of data

c. A non-contiguous window of data

d. A dynamic size window of data
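The defining property of a tumbling window — fixed size, non-overlapping, contiguous — can be illustrated without a streaming engine. This sketch buckets event timestamps (in seconds) into 10-second windows; in Structured Streaming the equivalent would be a groupBy over window(col, "10 seconds").

```python
from collections import defaultdict

def tumbling_windows(timestamps, size):
    """Assign each event time to its fixed-size, non-overlapping window."""
    buckets = defaultdict(list)
    for t in timestamps:
        start = (t // size) * size           # window the event falls into
        buckets[(start, start + size)].append(t)
    return dict(buckets)

events = [1, 4, 12, 15, 23]
print(tumbling_windows(events, 10))
# {(0, 10): [1, 4], (10, 20): [12, 15], (20, 30): [23]}
```

Note that every event lands in exactly one window; a sliding window, by contrast, would let events belong to several overlapping windows.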

Question 6
Complete

Marked out of 0.50

What is the difference between Spark Streaming and Structured Streaming?

a. Structured Streaming is for structured streaming data processing and Spark Streaming is for unstructured streaming data processing
b. Spark Streaming is the new ASF library for Streaming Data and Structured Streaming the old one
c. Structured Streaming is a stream processing engine and Spark Streaming is an extension to the core Spark API for streaming data processing
d. Structured Streaming relies on micro batch and RDDs while Spark Streaming relies on DataFrames and Datasets

Question 7
Complete

Marked out of 0.50

What is DSL used for in GraphFrames?

a. Formatting the output of a GraphFrame query 

b. Declare a GraphFrame object

c. Search for patterns in a graph

d. Define properties in a GraphFrame


Question 8
Complete

Marked out of 0.50

What is the result of the Spark statement below?

sc.parallelize(mydata, 3)

a. Creates an RDD with a minimum of 3 partitions 

b. Creates an RDD named mydata and the value 3

c. Generates an error of too many parameters

d. Creates 3 RDDs with mydata
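Conceptually, sc.parallelize(mydata, 3) distributes the collection across at least 3 partitions. The sketch below splits a list into 3 chunks as a rough stand-in; the round-robin scheme is only illustrative — Spark's actual partitioning is range-based.

```python
def split_into_partitions(data, n):
    """Round-robin the elements into n chunks, as a rough stand-in
    for Spark's partitioning of a parallelized collection."""
    partitions = [[] for _ in range(n)]
    for i, item in enumerate(data):
        partitions[i % n].append(item)
    return partitions

mydata = list(range(10))
print(split_into_partitions(mydata, 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```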

Question 9
Complete

Marked out of 0.50

In Databricks notebooks you can?

a. Program only in Python and Spark


b. Program only in the language defined in the Notebook creation
c. Select a language at the cell level
d. Select a language for a set of cells

Question 10
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. How many rows are in the file?

a. 98520

b. 53940

c. 12434

d. 10365


Question 11
Complete

Marked out of 0.50

Consider the friends graphframe that we viewed in the classes and that can be created with the following code:

from graphframes import *
from graphframes.examples import Graphs
g = Graphs(spark).friends()

What is the number of vertices of the graphframe?

a. 6

b. 4

c. 7

d. 5

Question 12
Complete

Marked out of 0.50

In Spark, lazy execution means that:

a. Execution will take some time because it needs to be sent to the worker nodes

b. Execution will take some time because the code is interpreted

c. Execution is done one line at the time

d. Execution is triggered only when an action is found
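Python generators give a rough analogy for Spark's lazy execution: transformations only describe a pipeline, and nothing runs until an action consumes it. The names below are illustrative, not Spark API.

```python
log = []

def trace(x):
    log.append(x)       # side effect so we can see when work happens
    return x * 2

nums = range(3)
pipeline = (trace(x) for x in nums)   # like chained transformations: no work yet
assert log == []                      # nothing has executed so far

result = list(pipeline)               # like an action (e.g. collect): triggers execution
assert log == [0, 1, 2]
print(result)                         # [0, 2, 4]
```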

Question 13
Complete

Marked out of 0.50

In a Databricks notebook, to access the cluster driver node console, what magic command is used?

a. %fs
b. %drive
c. dbutils.fs.mount()
d. %sh


Question 14
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the value of all diamonds (the sum of the column "price")?

a. 933455678

b. 123478543

c. 212135217

d. 523135234

Question 15
Complete

Marked out of 0.50

What is Avro in Hadoop?

a. A program to load data with high parallelization

b. A column-based data format

c. A row-based data format

d. A text-based data format for compatibility and portability

Question 16
Complete

Marked out of 0.50

With the instruction sc.textFile("file:/data") you are?

a. Reading a file from your hdfs file system


b. Reading a file called “file:/data”
c. Reading a file from your local non-Hadoop file system
d. Reading a file called “data” stored in a folder called "file"

Question 17
Complete

Marked out of 0.50

Select the false statement regarding Spark terminology:

a. A Job is a set of tasks executed as a result of an action


b. A Stage is a set of tasks in a job that can be executed in parallel
c. A task is an individual unit of work sent to an executor
d. An Application can only contain one job


Question 18
Complete

Marked out of 0.50

Select the right statement to create a Dataframe:

a. spark.range(10)

b. sc.textFile("mydata")

c. spark.dataFrame("mydata")

d. sc.parallelize("mydata")

Question 19
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.
2. How many words are in the file "poems.txt"?

Note: words with special characters or numbers are valid words

a. 245

b. 98

c. 232

d. 124
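The word count that these questions ask for can be sketched in plain Python; the text below is a hypothetical stand-in (the real counts come from the course's poems.txt). In Spark the pipeline would be roughly sc.textFile("poems.txt").flatMap(lambda l: l.split()).count().

```python
# Hypothetical file contents standing in for poems.txt.
text = """Tyger Tyger, burning bright,
In the forests of the night"""

# flatMap(split) + count, in plain Python:
words = [w for line in text.splitlines() for w in line.split()]
print(len(words))        # total words
print(len(set(words)))   # distinct words (as a later question asks)
```

Note that with a bare split(), punctuation sticks to words ("Tyger" and "Tyger," count as distinct), which matters for the distinct-word count.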

Question 20
Complete

Marked out of 0.50

How many files are in the dbfs folder '/databricks-datasets/adult/'?

a. 3

b. 9

c. 0

d. 2


Question 21
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/samples/population-vs-price/data_geo.csv' in the dbfs.


2. How many null values are in the column "2015 median sales price" of the DF?

a. 142

b. 84

c. 234

d. 185

Question 22
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.

2. What is the sum of the "price" where color = 'E' and cut = 'Premium'?

a. 9381456
b. 1287909
c. 2288145
d. 8270443

Question 23
Complete

Marked out of 0.50

Select the right statement regarding Spark transformations:

a. Wide transformations are very efficient because they don’t move data from the node

b. Narrow transformations are very efficient because they don’t move data from the node

c. Both wide and narrow transformations move data from the node

d. None of the narrow or wide transformations move data from the node


Question 24
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.
2. How many distinct words are in the file "poems.txt"?

Note: words with special characters or numbers are valid words

a. 152

b. 53

c. 131

d. 174

Question 25
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the kurtosis of the column "price" of the DF?

a. 3.21

b. 2.17

c. 5.46

d. 1.53

Question 26
Complete

Marked out of 0.50

Select the right statement to create a DataFrame:

a. DF = spark.createDataFrame(["Line 1", "Line 2"])


b. DF = spark.table("mytable")
c. DF = spark.load("mytable")
d. DF = sc.parallelize(["Line1", "Line2"])


Question 27
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the lowest value of the (expression) "price" / "carat" (columns) in the DF? 

Note: do not consider the decimal places for checking the right answer

a. 7627

b. 2121

c. 1051

d. 1134

Question 28
Complete

Marked out of 0.50

What kind of table will be created with the Spark statement below?
df.write.saveAsTable("table_name")

a. A Managed table

b. An Unmanaged table

c. A shared managed table

d. A semi-managed table

Question 29
Complete

Marked out of 0.50

What is the output object type that results from applying a map() function to an RDD that was created from a text file with the sc.textFile()
method?

a. String 

b. Tuple 

c. List

d. Dictionary


Question 30
Complete

Marked out of 1.00

1. Load the 'd2buy.csv' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create a DF with it.
2. Join the diamonds DF: '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' with the 'd2buy' DF, using the join columns /
condition: diamonds._c0 == d2buy.d_id.
3. Calculate the sum of the prices of the diamonds with the value 'Y' in the field 'd2buy' of the DF 'd2buy.csv' and with the field 'color' = 'E'.

a. 2401

b. 2089

c. 2101

d. 2826

Question 31
Complete

Marked out of 0.50

1. Load the 'd2buy.csv' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create a DF with it.
2. Join the diamonds DF: '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' with the 'd2buy' DF, using the join columns /
condition: diamonds._c0 == d2buy.d_id.

3. Calculate the sum of the prices of the diamonds with the value 'Y' in the field 'd2buy' of the DF 'd2buy.csv' and with the word 'Good' in the
column 'cut'.

a. 64304

b. 93489

c. 32604

d. 11534
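The join-filter-aggregate in questions 30-31 can be sketched with made-up toy rows (the real data lives in diamonds.csv and d2buy.csv). In Spark this would be roughly diamonds.join(d2buy, diamonds._c0 == d2buy.d_id), then a filter on the d2buy flag and on the cut column, then agg(sum("price")).

```python
# Hypothetical toy rows standing in for the two CSV files.
diamonds = [  # (_c0, cut, color, price)
    (1, "Good", "E", 100),
    (2, "Very Good", "D", 200),
    (3, "Premium", "E", 300),
]
d2buy = {1: "Y", 2: "Y", 3: "N"}  # d_id -> d2buy flag

# Join on _c0 == d_id, keep flag 'Y' and cut containing 'Good', sum the price.
total = sum(
    price
    for _c0, cut, color, price in diamonds
    if d2buy.get(_c0) == "Y" and "Good" in cut
)
print(total)  # 100 + 200 = 300
```

A detail worth noticing for question 31: a substring match on 'Good' also matches 'Very Good', so the filter choice (contains vs. equality) changes the result.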

Question 32
Complete

Marked out of 0.50

Consider the friends graphframe that we viewed in the classes and that can be created with the following code:

from graphframes import *
from graphframes.examples import Graphs
g = Graphs(spark).friends()

What is the node/vertex with the highest number of outgoing edges?

a. c

b. b

c. e

d. f


Question 33
Complete

Marked out of 0.50

In a Spark ML program, what is the purpose of the code below?

model.transform(mydata)

a. Create a machine learning model based on the data of ‘mydata’


b. Apply the model in ‘model’ to the data in ‘mydata’
c. Create a new model based on ‘mydata’
d. Adjust the model based on the data in ‘mydata’

Question 34
Complete

Marked out of 0.50

The vertex DataFrame in a GraphFrame is?

a. A free form DataFrame

b. A DataFrame that must contain a column named 'id'

c. A DataFrame that must contain a column named 'src' and 'dst'

d. A DataFrame that must contain a column named 'id', 'src' and 'dst'

Question 35

Complete

Marked out of 0.50

What is the value of the Spark configuration variable "spark.sql.shuffle.partitions"?

a. 50

b. 500

c. 100

d. 200

Question 36
Complete

Marked out of 0.50

What is a lambda function?

a. It’s a function defined without a name and with only one parameter
b. It’s a function defined without a name and with only one expression
c. It’s a function that can be reused with many parameters
d. It’s a function that can be reused with many expressions
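A lambda is an anonymous function whose body is restricted to a single expression; it can take any number of parameters. A minimal example:

```python
# Anonymous functions: one expression, any number of parameters.
double = lambda x: x * 2
add = lambda a, b: a + b      # more than one parameter is fine

print(double(5))   # 10
print(add(2, 3))   # 5
```

In Spark code, lambdas typically appear inline as arguments to transformations, e.g. rdd.map(lambda line: line.split()).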


Question 37
Complete

Marked out of 0.50

What is the result of the Spark ML instruction below?

lr = LogisticRegression(maxIter=10)

a. A logistic regression object is declared with a maximum of 10 iterations

b. A logistic regression is executed with a maximum of 10 iterations

c. A logistic regression is trained with a maximum of 10 iterations

d. A logistic regression is estimated with a maximum of 10 iterations

Question 38
Complete

Marked out of 0.50

What is the data type that results from the following Spark instruction: spark.range(10)?

a. A list

b. A tuple

c. An RDD

d. A DataFrame

Question 39
Complete

Marked out of 0.50

Select the right statement regarding reduceByKey():

a. reduceByKey() is a wide transformation

b. reduceByKey() is a narrow transformation

c. reduceByKey() is a lazy transformation

d. reduceByKey() is an action


Question 40
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.

2. How many lines are in the file "poems.txt" with the word "the"?

a. 3

b. 8

c. 15

d. 11
