2022-2023 2nd cycle / Postgraduate programmes / Autumn / ABD-400083-202223-51 / Class B - 2/3 November
Intermediate Test 1
Started on: Wednesday, 2 November 2022, 6:50 PM
State: Finished
Completed on: Wednesday, 2 November 2022, 8:00 PM
Time taken: 1 hour 9 mins
Test instructions:
1. This is an open-book individual test with a duration of 60 minutes.
2. You are not allowed to use your mobile phone or to communicate with anyone but the professor.
3. Questions 1 to 6 are worth 1 point each.
4. Questions 7 and 9 are worth 2 points each (the code must be filled in questions 8 and 10).
5. Without the corresponding code in questions 8 and 10, questions 7 and 9 will not receive full marks.
In the following DBFS path: '/databricks-datasets/definitive-guide/data/flight-data/csv', you will find several .csv files.
What is the size of the file 2010-summary.csv?
a. 6999
b. [illegible]
c. [illegible]
d. 7007
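On Databricks, `dbutils.fs.ls('/databricks-datasets/definitive-guide/data/flight-data/csv')` lists the folder, and each returned entry carries a `size` field in bytes. The sketch below is a plain-Python stand-in for that listing, assuming a local copy of the folder; the helper name `csv_file_sizes` is hypothetical.

```python
import os


def csv_file_sizes(directory):
    """Return {file name: size in bytes} for every .csv file in a local directory.

    Hypothetical local stand-in for listing the DBFS folder; on Databricks the
    same number is the `size` field of each entry returned by dbutils.fs.ls.
    """
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Keep only regular files with a .csv extension
        if name.endswith(".csv") and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes
```

For example, `csv_file_sizes('flight-data/csv')['2010-summary.csv']` would return the size of that file, where `'flight-data/csv'` is a hypothetical local path.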
Load the following file to a DF (DataFrame): '/databricks-datasets/definitive-guide/data/flight-data/csv/2015-summary.csv'
How many rows do we have in the DF?
a. [illegible]
b. 246
c. 296
d. 256
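In PySpark the count is `spark.read.csv(path, header=True).count()`. The `header` option matters: without it, the line of column names is read as one more data row. A minimal plain-Python cross-check, assuming a local copy of `2015-summary.csv` (the helper `count_rows` is hypothetical):

```python
import csv


def count_rows(path, header=True):
    """Count the data rows of a local CSV file.

    With header=True the first row holds the column names and is not counted,
    mirroring spark.read.csv(path, header=True).count(). Blank lines are skipped.
    """
    with open(path, newline="") as f:
        rows = sum(1 for row in csv.reader(f) if row)
    return rows - 1 if header and rows > 0 else rows
```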
Load the following file to a DF: '/databricks-datasets/definitive-guide/data/flight-data/csv/2015-summary.csv'
How many different origin countries (origin_country_name) do we have in the DF?
a. 125
b. 12
c. 18
d. [illegible]
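In PySpark this is `df.select('origin_country_name').distinct().count()`. Spark resolves column names case-insensitively by default, while the CSV header itself spells the column `ORIGIN_COUNTRY_NAME`. A plain-Python sketch of the same count, assuming a local copy of the file (`distinct_values` is a hypothetical helper, and unlike Spark it matches the column name exactly):

```python
import csv


def distinct_values(path, column):
    """Number of distinct values in one column of a CSV file with a header row.

    Local stand-in for df.select(column).distinct().count(); the column name
    must match the header exactly (no case-insensitive lookup here).
    """
    with open(path, newline="") as f:
        return len({row[column] for row in csv.DictReader(f)})
```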
Load the following file to a DF: '/databricks-datasets/definitive-guide/data/flight-data/csv/2015-summary.csv'
How many flights have the origin (origin_country_name) 'Portugal'?
a. [illegible]
b. 14
c. 156
d. 123
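The question can be read two ways: each row of `2015-summary.csv` is one route, and its `count` column holds the number of flights on that route. The number of matching rows (`df.filter(df.ORIGIN_COUNTRY_NAME == 'Portugal').count()`) can therefore differ from the total number of flights (the same filter followed by `.agg(F.sum('count'))`, with `pyspark.sql.functions` imported as `F`). A plain-Python sketch that returns both, assuming a local copy of the file; `flights_from` is a hypothetical helper:

```python
import csv


def flights_from(path, origin):
    """Return (matching rows, total flights) for routes leaving `origin`.

    Each row is one route; its "count" column is the number of flights on it,
    so summing that column gives flights, while counting rows gives routes.
    """
    rows = 0
    flights = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["ORIGIN_COUNTRY_NAME"] == origin:
                rows += 1
                flights += int(row["count"])
    return rows, flights
```

Returning both numbers makes it easy to see which reading matches one of the listed answers.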
Load the following file to a DF: '/databricks-datasets/definitive-guide/data/flight-data/csv/2015-summary.csv'
[The rest of this question, which mentions 'USA', is illegible in the source.]
Read the following text file to a DF: '/databricks-datasets/samples/docs/README.md'
How many lines do we have in the DF with the word 'Spark' and ending with a [illegible]?
a. 3
b. 4
c. [illegible]
d. 2
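`spark.read.text(path)` loads one row per line into a single string column named `value`, which can then be filtered with `col('value').contains('Spark')` and `col('value').endswith(...)`. Because the required line ending is illegible in the source, the sketch takes it as a parameter; `count_matching_lines` is a hypothetical helper working on a local copy of `README.md`.

```python
def count_matching_lines(path, word, suffix):
    """Count lines of a text file that contain `word` and end with `suffix`.

    Local stand-in for spark.read.text(path) filtered with
    col("value").contains(word) & col("value").endswith(suffix).
    Both checks are case-sensitive.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            text = line.rstrip("\r\n")  # drop the line terminator before checking the ending
            if word in text and text.endswith(suffix):
                count += 1
    return count
```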
Load the following file to a DF: '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv'
What is the average price of the diamonds with color = 'E' and cut = 'Premium'? You may group the data.
a. 3538
b. 368
c. [illegible]
d. 2828
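Please write the code for question 7 - GroupBy | Average price of diamonds
# Question 7
from pyspark.sql.functions import avg, col

# inferSchema=True reads "price" as a number instead of a string
df = spark.read.csv('/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv', header=True, inferSchema=True)

# Average price of the diamonds with color 'E' and cut 'Premium'
df.filter((col('color') == 'E') & (col('cut') == 'Premium')).select(avg('price')).show()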
1. Upload the file 'd2buy' from the Moodle ABD class page (technical resources folder) to the Databricks file system.
2. Join the diamonds DF '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv' (same as in the previous exercise) with the 'd2buy' DF, using the join columns / condition: diamonds._c0 == d2buy.id
3. Calculate the sum of the prices of the diamonds with the flag 'Y' in the field 'd2buy' of the DF 'd2buy' and with the field color = 'E'.
a. 2401
b. 2009
c. 2626
d. 2107
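Please write the code for question 8 - Join | Sum of the prices of the diamonds
from pyspark.sql import functions as F

# Diamonds data; Spark names the unnamed first CSV column (the row index) "_c0"
diamonds = spark.read.csv('/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv', header=True, inferSchema=True)
# d2buy file uploaded to the Databricks file system (has the "id" and "d2buy" columns)
d2buy = spark.read.csv('/FileStore/tables/d2buy.csv', header=True, inferSchema=True)

# Join condition from the question: diamonds._c0 == d2buy.id
joined = diamonds.join(d2buy, diamonds['_c0'] == d2buy['id'])
display(joined)

# Sum of the prices of the diamonds flagged 'Y' in d2buy and with color 'E'
joined.filter((F.col('d2buy') == 'Y') & (F.col('color') == 'E')).agg(F.sum('price')).show()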