
BAN 5753, Big Data & Spark Exercise (10 Points)

You must complete this exercise on your own; it is not a group activity.

1. Install Spark on your local machine as a standalone cluster.

2. Download the lab files (notebooks and datasets) to a location where you have read/write
access, such as Documents or Downloads.

3. Examine the Spark version, configuration, and Web UI, using the screenshot below.

4. Run the command below, examine the different tabs of the Spark Web UI, and share your
findings.

// Modify the path based on the file location

scala> val rawDF = spark.read.option("inferSchema", "true").option("header", "true").csv("/Users/Downloads/spark_modules/Lab/data/diabetes.csv")
rawDF: org.apache.spark.sql.DataFrame = [Pregnancies: int, Glucose: int ... 7 more fields]
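
For steps 3 and 4, the following is a minimal sketch of commands that could be pasted into spark-shell to print the version, configuration, and Web UI address, and then to generate some activity worth inspecting in the UI. It is an illustration, not part of the assigned command: the names session and csvPath are hypothetical, the path is copied from the exercise and must be adjusted to your machine, and the choice of count() and describe() is only one way to trigger jobs.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() reuses it.
// `session` is a hypothetical name chosen to avoid shadowing the built-in `spark`.
val session = SparkSession.builder().getOrCreate()

// Step 3: Spark version, effective configuration, and Web UI address.
println(s"Spark version: ${session.version}")
session.sparkContext.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k = $v") }
println(s"Web UI: ${session.sparkContext.uiWebUrl.getOrElse("UI disabled")}")

// Step 4: load the CSV, then run a few actions so the UI has something to show.
// Assumption: adjust csvPath to wherever diabetes.csv lives on your machine.
val csvPath = "/Users/Downloads/spark_modules/Lab/data/diabetes.csv"
val rawDF = session.read
  .option("inferSchema", "true") // asks Spark to scan the file and infer column types
  .option("header", "true")      // treats the first row as column names
  .csv(csvPath)

rawDF.printSchema()              // column names and inferred types
rawDF.cache()                    // marks the DataFrame for caching (lazy until an action runs)
println(rawDF.count())           // action: runs a job and materializes the cache
rawDF.describe("Glucose").show() // summary statistics for one column
```

After running these, the Jobs and Stages tabs should list the work triggered by each action, the Storage tab should show the cached DataFrame, and the Environment tab should match the configuration printed above.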

Deliverables:
As you complete the exercise, create a short report in Microsoft Word (maximum 3 pages)
that answers the questions in the exercise description. Paste in supporting documents,
diagrams, or screenshots as needed to justify your answers. Make sure your name, section
number, and student ID number appear on the report, and turn in the report as
communicated by your instructor.
