Big Data Computing
Unit 4 - Week-3
Assignment-3
The due date for submitting this assignment has passed.
As per our records you have not submitted this assignment. Due on 2019-03-20, 23:59 IST.

1) In Spark, a ______________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. 1 point
2) Given the following definition of the join transformation in Apache Spark: 1 point

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

The join operation joins two datasets. When called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.
Output the result of joinrdd when the following code is run.

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect

Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))

Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
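The join semantics above can be checked in plain Scala with no Spark at all: `join` on pair RDDs is an inner join, so keys that appear in only one dataset produce no output pairs. The sketch below simulates this on ordinary Seqs; `JoinSketch` and its `join` helper are hypothetical names for illustration, not part of the Spark API.

```scala
// Minimal sketch of RDD.join's inner-join semantics on plain Seqs.
object JoinSketch {
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2 // keep only pairs whose keys match in both datasets
    } yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    val rdd1 = Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
    val rdd2 = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))
    // "e" occurs only in rdd1 and "h" only in rdd2, so neither key
    // appears in the inner-join output; "m" and "s" each yield the
    // cross product of their values (4 pairs each, 8 pairs total).
    println(join(rdd1, rdd2))
  }
}
```

Note that each matching key contributes all value combinations, which is why two "m" values on each side yield four (m, (_, _)) pairs.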
3) Consider the following statements:

Statement 1: Spark also gives you control over how you can partition your Resilient Distributed Datasets (RDDs).
Statement 2: Spark allows you to choose whether you want to persist a Resilient Distributed Dataset (RDD) onto disk or not.
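Both statements correspond to concrete RDD methods. The fragment below is in the style of the quiz's own REPL snippets: it assumes an active SparkContext `sc` and Spark on the classpath, so it is a sketch rather than a standalone program.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

val pairs = sc.parallelize(Seq(("m", 55), ("s", 59), ("e", 57)))

// Statement 1: you control the partitioning, e.g. hash-partition into 4 partitions.
val partitioned = pairs.partitionBy(new HashPartitioner(4))

// Statement 2: you choose whether and where to persist, e.g. disk only.
partitioned.persist(StorageLevel.DISK_ONLY)
```

`partitionBy` and `persist(StorageLevel.DISK_ONLY)` are standard pair-RDD operations; the sample data here is illustrative.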
4) ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics. 1 point
MLlib
Spark Streaming
GraphX
RDDs
5) Consider the following statements:

Statement 1: Scale out means grow your cluster capacity by replacing it with more powerful machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS (Components Off the Shelf) machines.
HBase
SQL Server
Cassandra
Key-value
Wide-column
Document
It is designed to handle large amounts of data across many commodity servers, providing
high availability with no single point of failure.
It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing