
1/6/23, 7:53 PM ABD22 1st Exam - 6 January: Attempt review


Started on    Friday, 6 January 2023, 6:30 PM
State         Finished
Completed on  Friday, 6 January 2023, 7:53 PM
Time taken    1 hour 23 mins

Question 1
Complete

Marked out of 0.50

What is the meaning of the data processing model: “Scale Out”?

a. Means adding bigger, more powerful machines


b. Means adding more, smaller machines
c. Means processing data with scalable machines
d. Means processing data outside the cluster

Question 2
Complete

Marked out of 0.50

In a Spark ML program, what is the purpose of the code below?

model.fit(mydata)

a. Train a machine learning model based on the data of ‘mydata’


b. Apply the model in ‘model’ to the data in ‘mydata’
c. Create a new model based on ‘mydata’
d. Adjust the model based on the data in ‘mydata’

Question 3
Complete

Marked out of 0.50

What is a Pair RDD?

a. Two RDDs in sequence in a transformation statement


b. An RDD with two rows
c. An RDD with a pair operation
d. An RDD with key-value pairs
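A Pair RDD holds key-value tuples, which is what enables by-key operations such as reduceByKey(). The following is a pure-Python sketch of those semantics (no Spark cluster needed); the data and function names are illustrative only.

```python
# Pure-Python sketch of Pair RDD semantics.
# In Spark this would be roughly:
#   sc.parallelize(pairs).reduceByKey(lambda a, b: a + b).collect()
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

def reduce_by_key(pairs, func):
    """Group values by key, then fold each key's values with func."""
    grouped = {}
    for key, value in pairs:
        grouped[key] = func(grouped[key], value) if key in grouped else value
    return sorted(grouped.items())

print(reduce_by_key(pairs, lambda a, b: a + b))  # [('a', 4), ('b', 6)]
```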


Question 4
Complete

Marked out of 0.50

What is a DataFrame?

a. A data processing method in Spark


b. A distributed dataset in-memory
c. A distributed dataset on-disk
d. A distributed dataset on-disk and in-memory

Question 5
Complete

Marked out of 0.50

What is a Tumbling Window in Spark Streaming?

a. A fixed-sized, non-overlapping and contiguous window of data

b. An overlapping and contiguous window of data

c. A non-contiguous window of data

d. A dynamic size window of data
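The defining property of a tumbling window — fixed size, non-overlapping, contiguous — can be illustrated without a streaming engine. This sketch buckets event timestamps (in seconds) into 10-second windows; in Structured Streaming the equivalent would be a groupBy over window(col, "10 seconds").

```python
from collections import defaultdict

def tumbling_windows(timestamps, size):
    """Assign each event time to its fixed-size, non-overlapping window."""
    buckets = defaultdict(list)
    for t in timestamps:
        start = (t // size) * size           # window the event falls into
        buckets[(start, start + size)].append(t)
    return dict(buckets)

events = [1, 4, 12, 15, 23]
print(tumbling_windows(events, 10))
# {(0, 10): [1, 4], (10, 20): [12, 15], (20, 30): [23]}
```

Note that every event lands in exactly one window; a sliding window, by contrast, would let events belong to several overlapping windows.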

Question 6
Complete

Marked out of 0.50

What is the difference between Spark Streaming and Structured Streaming?

a. Structured Streaming is for structured streaming data processing and Spark Streaming is for unstructured streaming data processing
b. Spark Streaming is the new ASF library for Streaming Data and Structured Streaming the old one
c. Structured Streaming is a stream processing engine and Spark Streaming is an extension to the core Spark API for streaming data processing
d. Structured Streaming relies on micro batch and RDDs while Spark Streaming relies on DataFrames and Datasets

Question 7
Complete

Marked out of 0.50

What is DSL used for in GraphFrames?

a. Formatting the output of a GraphFrame query 

b. Declare a GraphFrame object

c. Search for patterns in a graph

d. Define properties in a GraphFrame


Question 8
Complete

Marked out of 0.50

What is the result of the Spark statement below?

sc.parallelize(mydata, 3)

a. Creates an RDD with a minimum of 3 partitions 

b. Creates an RDD named mydata and the value 3

c. Generates an error of too many parameters

d. Creates 3 RDDs with mydata
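Conceptually, sc.parallelize(mydata, 3) distributes the collection across at least 3 partitions. The sketch below splits a list into 3 chunks as a rough stand-in; the round-robin scheme is only illustrative — Spark's actual partitioning is range-based.

```python
def split_into_partitions(data, n):
    """Round-robin the elements into n chunks, as a rough stand-in
    for Spark's partitioning of a parallelized collection."""
    partitions = [[] for _ in range(n)]
    for i, item in enumerate(data):
        partitions[i % n].append(item)
    return partitions

mydata = list(range(10))
print(split_into_partitions(mydata, 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```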

Question 9
Complete

Marked out of 0.50

In Databricks notebooks you can?

a. Program only in Python and Spark


b. Program only in the language defined in the Notebook creation
c. Select a language at the cell level
d. Select a language for a set of cells

Question 10
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. How many rows are in the file?

a. 98520

b. 53940

c. 12434

d. 10365


Question 11
Complete

Marked out of 0.50

Consider the friends graphframe that we viewed in the classes and that can be created with the following code:

from graphframes import *
from graphframes.examples import Graphs
g = Graphs(spark).friends()

What is the number of vertices of the graphframe?

a. 6

b. 4

c. 7

d. 5

Question 12
Complete

Marked out of 0.50

In Spark, lazy execution means that:

a. Execution will take some time because it needs to be sent to the worker nodes

b. Execution will take some time because the code is interpreted

c. Execution is done one line at the time

d. Execution is triggered only when an action is found
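Python generators give a rough analogy for Spark's lazy execution: transformations only describe a pipeline, and nothing runs until an action consumes it. The names below are illustrative, not Spark API.

```python
log = []

def trace(x):
    log.append(x)       # side effect so we can see when work happens
    return x * 2

nums = range(3)
pipeline = (trace(x) for x in nums)   # like chained transformations: no work yet
assert log == []                      # nothing has executed so far

result = list(pipeline)               # like an action (e.g. collect): triggers execution
assert log == [0, 1, 2]
print(result)                         # [0, 2, 4]
```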

Question 13
Complete

Marked out of 0.50

In a Databricks notebook, to access the cluster driver node console, what magic command is used?

a. %fs
b. %drive
c. dbutils.fs.mount()
d. %sh


Question 14
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the value of all diamonds (the sum of the column "price")?

a. 933455678

b. 123478543

c. 212135217

d. 523135234

Question 15
Complete

Marked out of 0.50

What is Avro in Hadoop?

a. A program to load data with high parallelization

b. A column-based data format

c. A row-based data format

d. A text-based data format for compatibility and portability

Question 16
Complete

Marked out of 0.50

With the instruction sc.textFile("file:/data") you are?

a. Reading a file from your hdfs file system


b. Reading a file called “file:/data”
c. Reading a file from your local non-Hadoop file system
d. Reading a file called “data” stored in a folder called "file"

Question 17
Complete

Marked out of 0.50

Select the false statement regarding Spark terminology:

a. A Job is a set of tasks executed as a result of an action


b. A Stage is a set of tasks in a job that can be executed in parallel
c. A task is an individual unit of work sent to an executor
d. An Application can only contain one job


Question 18
Complete

Marked out of 0.50

Select the right statement to create a Dataframe:

a. spark.range(10)

b. sc.textFile("mydata")

c. spark.dataFrame("mydata")

d. sc.parallelize("mydata")

Question 19
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.
2. How many words are in the file "poems.txt"?

Note: words with special characters or numbers are valid words

a. 245

b. 98

c. 232

d. 124
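The word count that these questions ask for can be sketched in plain Python; the text below is a hypothetical stand-in (the real counts come from the course's poems.txt). In Spark the pipeline would be roughly sc.textFile("poems.txt").flatMap(lambda l: l.split()).count().

```python
# Hypothetical file contents standing in for poems.txt.
text = """Tyger Tyger, burning bright,
In the forests of the night"""

# flatMap(split) + count, in plain Python:
words = [w for line in text.splitlines() for w in line.split()]
print(len(words))        # total words
print(len(set(words)))   # distinct words (as a later question asks)
```

Note that with a bare split(), punctuation sticks to words ("Tyger" and "Tyger," count as distinct), which matters for the distinct-word count.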

Question 20
Complete

Marked out of 0.50

How many files are in the dbfs folder '/databricks-datasets/adult/'?

a. 3

b. 9

c. 0

d. 2


Question 21
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/samples/population-vs-price/data_geo.csv' in the dbfs.


2. How many null values are in the column "2015 median sales price" of the DF?

a. 142

b. 84

c. 234

d. 185

Question 22
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.

2. What is the sum of the "price" where color = 'E' and cut = 'Premium'?

a. 9381456
b. 1287909
c. 2288145
d. 8270443

Question 23
Complete

Marked out of 0.50

Select the right statement regarding Spark transformations:

a. Wide transformations are very efficient because they don’t move data from the node

b. Narrow transformations are very efficient because they don’t move data from the node

c. Both wide and narrow transformations move data from the node

d. None of the narrow or wide transformations move data from the node


Question 24
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.
2. How many distinct words are in the file "poems.txt"?

Note: words with special characters or numbers are valid words

a. 152

b. 53

c. 131

d. 174

Question 25
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the kurtosis of the column "price" of the DF?

a. 3.21

b. 2.17

c. 5.46

d. 1.53

Question 26
Complete

Marked out of 0.50

Select the right statement to create a DataFrame:

a. DF = spark.createDataFrame(["Line 1", "Line 2"])


b. DF = spark.table("mytable")
c. DF = spark.load("mytable")
d. DF = sc.parallelize(["Line1", "Line2"])


Question 27
Complete

Marked out of 0.50

1. Create a DF based on the file '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' in the dbfs.


2. What is the lowest value of the (expression) "price" / "carat" (columns) in the DF? 

Note: do not consider the decimal places for checking the right answer

a. 7627

b. 2121

c. 1051

d. 1134

Question 28
Complete

Marked out of 0.50

What kind of table will be created with the Spark statement below?
df.write.saveAsTable("table_name")

a. A Managed table

b. An Unmanaged table

c. A shared managed table

d. A semi-managed table

Question 29
Complete

Marked out of 0.50

What is the output object type that results from applying a map() function to an RDD that was created from a text file with the sc.textFile()
method?

a. String 

b. Tuple 

c. List

d. Dictionary


Question 30
Complete

Marked out of 1.00

1. Load the 'd2buy.csv' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create a DF with it.
2. Join the diamonds DF: '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' with the 'd2buy' DF, using the join columns /
condition: diamonds._c0 == d2buy.d_id.
3. Calculate the sum of the prices of the diamonds with the value 'Y' in the field 'd2buy' of the DF 'd2buy.csv' and with the field 'color' = 'E'.

a. 2401

b. 2089

c. 2101

d. 2826

Question 31
Complete

Marked out of 0.50

1. Load the 'd2buy.csv' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create a DF with it.
2. Join the diamonds DF: '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' with the 'd2buy' DF, using the join columns /
condition: diamonds._c0 == d2buy.d_id.

3. Calculate the sum of the prices of the diamonds with the value 'Y' in the field 'd2buy' of the DF 'd2buy.csv' and with the word 'Good' in the
column 'cut'.

a. 64304

b. 93489

c. 32604

d. 11534
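The join-filter-aggregate in questions 30-31 can be sketched with made-up toy rows (the real data lives in diamonds.csv and d2buy.csv). In Spark this would be roughly diamonds.join(d2buy, diamonds._c0 == d2buy.d_id), then a filter on the d2buy flag and on the cut column, then agg(sum("price")).

```python
# Hypothetical toy rows standing in for the two CSV files.
diamonds = [  # (_c0, cut, color, price)
    (1, "Good", "E", 100),
    (2, "Very Good", "D", 200),
    (3, "Premium", "E", 300),
]
d2buy = {1: "Y", 2: "Y", 3: "N"}  # d_id -> d2buy flag

# Join on _c0 == d_id, keep flag 'Y' and cut containing 'Good', sum the price.
total = sum(
    price
    for _c0, cut, color, price in diamonds
    if d2buy.get(_c0) == "Y" and "Good" in cut
)
print(total)  # 100 + 200 = 300
```

A detail worth noticing for question 31: a substring match on 'Good' also matches 'Very Good', so the filter choice (contains vs. equality) changes the result.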

Question 32
Complete

Marked out of 0.50

Consider the friends graphframe that we viewed in the classes and that can be created with the following code:

from graphframes import *
from graphframes.examples import Graphs
g = Graphs(spark).friends()

What is the node/vertex with the highest number of outgoing edges?

a. c

b. b

c. e

d. f


Question 33
Complete

Marked out of 0.50

In a Spark ML program, what is the purpose of the code below?

model.transform(mydata)

a. Create a machine learning model based on the data of ‘mydata’


b. Apply the model in ‘model’ to the data in ‘mydata’
c. Create a new model based on ‘mydata’
d. Adjust the model based on the data in ‘mydata’

Question 34
Complete

Marked out of 0.50

The vertex DataFrame in a GraphFrame is?

a. A free form DataFrame

b. A DataFrame that must contain a column named 'id'

c. A DataFrame that must contain a column named 'src' and 'dst'

d. A DataFrame that must contain a column named 'id', 'src' and 'dst'

Question 35

Complete

Marked out of 0.50

What is the value of the Spark configuration variable "spark.sql.shuffle.partitions"?

a. 50

b. 500

c. 100

d. 200

Question 36
Complete

Marked out of 0.50

What is a lambda function?

a. It’s a function defined without a name and with only one parameter
b. It’s a function defined without a name and with only one expression
c. It’s a function that can be reused with many parameters
d. It’s a function that can be reused with many expressions
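A lambda is an anonymous function whose body is restricted to a single expression; it can take any number of parameters. A minimal example:

```python
# Anonymous functions: one expression, any number of parameters.
double = lambda x: x * 2
add = lambda a, b: a + b      # more than one parameter is fine

print(double(5))   # 10
print(add(2, 3))   # 5
```

In Spark code, lambdas typically appear inline as arguments to transformations, e.g. rdd.map(lambda line: line.split()).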


Question 37
Complete

Marked out of 0.50

What is the result of the Spark ML instruction below?

lr = LogisticRegression(maxIter=10)

a. A logistic regression object is declared with a maximum of 10 iterations

b. A logistic regression is executed with a maximum of 10 iterations

c. A logistic regression is trained with a maximum of 10 iterations

d. A logistic regression is estimated with a maximum of 10 iterations

Question 38
Complete

Marked out of 0.50

What is the data type that results from the following Spark instruction: spark.range(10)?

a. A list

b. A tuple

c. An RDD

d. A DataFrame

Question 39
Complete

Marked out of 0.50

Select the right statement regarding reduceByKey():

a. reduceByKey() is a wide transformation

b. reduceByKey() is a narrow transformation

c. reduceByKey() is a lazy transformation

d. reduceByKey() is an action


Question 40
Complete

Marked out of 0.50

1. Load the 'poems.txt' file from the Moodle ABD class page and technical resources folder to the Databricks file system and create an RDD with it.

2. How many lines are in the file "poems.txt" with the word "the"?

a. 3

b. 8

c. 15

d. 11
