BDA Qbank (2016-2020)

Chapter 1: Introduction to Big Data and Hadoop


1. Differentiate between the traditional data management and analytics
approach and the Big Data approach. (5M)
2. Describe any five characteristics of Big Data. (5M)
3. What are the three Vs of Big Data? Give two examples.
4. Write a short note on Types of Big Data. (10M)
5. What is Big Data? What is Hadoop? How are Big Data and Hadoop
linked? (5M)
6. How are Big Data problems handled by the Hadoop system? (10M)
7. Explain the Hadoop Ecosystem with its core components. Explain its
physical architecture. State the limitations of Hadoop.
8. What do you mean by the Hadoop Ecosystem? Describe any three
components of a typical Hadoop Ecosystem. (10M)

Chapter 2: Hadoop HDFS and MapReduce


1. What is Hadoop? Describe HDFS architecture with diagram. (10M)
2. Define HDFS. Discuss its architecture and commands. (10M)
3. What is the role of JobTracker and TaskTracker in MapReduce?
Illustrate the MapReduce execution pipeline with a word count example.
(10M)
4. Define Hadoop. List the limitations of Hadoop 1.x that were
resolved in 2.x. (10M) (2020)
5. Discuss different phases of MapReduce with example. (10M) (2020)
6. Explain how Hadoop goals are covered in the Hadoop Distributed File
System. (10M)
7. Write pseudo code for Matrix vector Multiplication by MapReduce.
Illustrate with an example showing all the steps. (10M)
8. What is MapReduce? Explain how Map and Reduce work. What is
shuffling in MapReduce? (10M)
9. Write MapReduce pseudocode to multiply two matrices. Illustrate the
procedure on the following matrices. Clearly show all the steps. (10M)

By PRATHAM MEHTA
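
For the word count question in this chapter, the Map, Shuffle and Reduce phases can be sketched in plain Python as below. This is only a single-process illustration, not Hadoop code: the function names and the three sample input splits are assumptions made for the example, and a real job would run the map and reduce functions on separate nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run map over every split, then shuffle and reduce the result."""
    mapped = []
    for doc in documents:
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input splits, one string per split.
splits = ["deer bear river", "car car river", "deer car bear"]
counts = word_count(splits)
print(counts)
```

Each split is mapped independently, which is what lets the framework parallelise the Map phase; only the shuffle step needs to see pairs from every split.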

Chapter 3: NoSQL
1. What is NoSQL? What are the business drivers for NoSQL? Discuss
any two architectural patterns of NoSQL. (10M)
2. When it comes to Big Data, how does NoSQL score over RDBMS? (5M)
3. What are the different architectural patterns in NoSQL? Explain
Graph data store and Column Family Store patterns with relevant
examples. (10M)
4. Explain the different ways in which Big Data problems are handled
by NoSQL. (10M)
5. Explain NoSQL databases and the alternative to ACID properties.
(10M) (2020)
6. What do you understand by BASE properties in a NoSQL database?
Explain in detail any one NoSQL architecture pattern. Identify two
applications that can use this pattern. (10M)

Chapter 4: Mining Data Streams


1. Explain the architecture of a Data Stream Management System with a
block diagram. (10M)
2. What is a Data Stream Management System? Explain with a block
diagram. (10M)
3. With respect to data stream querying, give an example of each of
the following:
(a) One-time queries
(b) Continuous queries
(c) Pre-defined queries
(d) Ad-hoc queries (5M)

4. What do you mean by counting distinct elements in a stream?
Illustrate with an example the working of the Flajolet-Martin
algorithm used to count the number of distinct elements. (10M)
OR
How do you count distinct elements in a stream? Explain the
Flajolet-Martin algorithm. (10M)

5. The snapshot of 10 transactions is given below for online shopping
that generates big data. Threshold value = 4 and hash function = (i*j)
mod 10.

Find the frequent item sets purchased for such big data by using a
suitable algorithm. Analyse the memory requirements for it. (10M)

6. Write a short note on Bloom Filter. (5-10M) (2020)
7. Explain the DGIM algorithm for counting ones in a stream with an
example. (10M)
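
For the Flajolet-Martin question in this chapter, a minimal sketch of the estimate is given below. The stream and the hash function h(x) = (6x + 1) mod 5 are assumptions chosen for illustration, and treating a hash value of 0 as having zero trailing zeros is one common convention, not the only one. The estimate is 2^R, where R is the largest number of trailing zero bits seen in any hashed element.

```python
def trailing_zeros(n):
    """Count trailing zero bits of n; by the convention assumed here, 0 gives 0."""
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


def flajolet_martin(stream, hash_fn):
    """Estimate the number of distinct elements as 2**R using one hash function."""
    max_r = 0
    for x in stream:
        max_r = max(max_r, trailing_zeros(hash_fn(x)))
    return 2 ** max_r


# Hypothetical stream with four distinct elements: 1, 2, 3, 4.
stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
estimate = flajolet_martin(stream, lambda x: (6 * x + 1) % 5)
print(estimate)
```

With a single hash function the estimate is always a power of two, so in practice several hash functions are used and their estimates are combined.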

Chapter 5: Finding Similar Items and Clustering


1. Explain Edit distance measure with an example. (5M)
2. What are the challenges in clustering data streams? Explain a
stream clustering algorithm in detail. (10M)
3. Find Manhattan distance for the points X1= (1,2,2), X2= (2,5,3). (5M)
4. Explain how the CURE algorithm can be used to cluster big data sets.
(10M)
5. Explain the CURE algorithm for clustering large datasets. Please
illustrate the algorithm using appropriate figures. (10M)
6. What do you mean by Jaccard Similarity? Illustrate with an example.
Describe any two applications that can use Jaccard Similarity. (5M)
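
The three measures asked about in this chapter can be checked with the short sketch below. The Manhattan distance uses the points X1 and X2 from question 3. The sets passed to the Jaccard function and the strings passed to the edit distance function are hypothetical examples. The edit distance shown is the Levenshtein variant (insert, delete, substitute); some textbooks define edit distance with insertions and deletions only, so treat that choice as an assumption.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def jaccard_similarity(s, t):
    """Size of the intersection divided by size of the union."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumed convention: two empty sets are identical
    return len(s & t) / len(s | t)


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


x1, x2 = (1, 2, 2), (2, 5, 3)
print(manhattan_distance(x1, x2))                     # |1-2| + |2-5| + |2-3| = 5
print(jaccard_similarity({1, 2, 3, 4}, {3, 4, 5, 6}))  # 2 shared out of 6 total
print(edit_distance("kitten", "sitting"))             # 3
```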

Chapter 6: Real-Time Big Data Models


RECOMMENDATION SYSTEM
1. What is the use of a Recommender System? How is a classification
algorithm used in a recommendation system? (10M)
2. How would you get the features of the document in a content-based
system? Explain document similarity. (5M)

3. How is recommendation done based on the properties of a product?
Explain with a suitable example. (10M)
4. Explain the design of a recommender system used to recommend
movies to users. The recommender system should use Collaborative
filtering. (10M)
5. Explain the following terms with diagrams:
a) Hubs and Authorities. (*)
b) Structure of the Web. (10M)
6. How is finding plagiarism in documents a nearest neighbor problem?
(5M)
7. Explain a Collaborative Filtering based recommendation system. How
is it different from content-based recommendation systems? (10M)
PAGE RANK & SOCIAL NETWORK QUESTIONS
8. Given a 1-dimensional dataset {1, 5, 8, 10, 2}, use the
agglomerative clustering algorithm with Euclidean distance to establish
a hierarchical grouping relationship. Draw the dendrogram. (10M)
9. Consider the portion of a web graph shown below:

a. Compute the hub and authority scores for all the nodes.
b. Does this graph contain spider traps? Dead ends? If so, which
nodes?
c. Compute the PageRank of the nodes with teleportation factor
β = 0.8. (Show two iterations only.) (10M)

10. Give applications of Social Network Mining. (5M)
11. For the graph given below, show the PageRank of all the nodes
after running the PageRank algorithm for two iterations with
teleportation factor Beta (β) = 0.8. (10M)

12. Describe the Girvan-Newman algorithm. For the following graph, show
how the Girvan-Newman algorithm finds the different communities.
(10M)
13. Explain the Girvan-Newman algorithm to mine social graphs. (10M)
14. Compute the PageRank of each page after running the PageRank
algorithm for two iterations with teleportation factor Beta (β) = 0.8.
(10M)
15. What is PageRank? Calculate the PageRank of a web graph. (10M)
(2020)
16. Explain PageRank with an example. Can a website's PageRank ever
increase? What are its chances of decreasing?
17. What are social network graphs? Explain Girvan-Newman
Algorithm in detail. (10M) (2020)

18. Define the concept of a link farm using a diagram. How does it
lead to link spam? (5M)
19. Explain which characteristics of social networks make them Big
Data. (5M)
20. For the graph given below, use the betweenness factor to find all
communities.

21. What is a "community" in a social network graph? For the
following graph, show how the Girvan-Newman algorithm finds the
different communities.
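
For question 8 above (agglomerative clustering of the 1-dimensional dataset {1, 5, 8, 10, 2}), the merge sequence behind the dendrogram can be sketched as below. The question does not state a linkage criterion, so single linkage is assumed as the default, with complete linkage available by passing `max`. The function name and return format are also assumptions. In one dimension the Euclidean distance is simply the absolute difference.

```python
from itertools import combinations


def agglomerative(points, linkage=min):
    """Return the list of merges as (cluster_a, cluster_b, distance).

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Ties are broken by taking the first closest pair found.
    """
    clusters = [(p,) for p in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # 1-D Euclidean distance between every cross-cluster pair.
            d = linkage(abs(x - y) for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
        merges.append((a, b, d))
    return merges


data = [1, 5, 8, 10, 2]
single = agglomerative(data)             # single linkage (assumed default)
complete = agglomerative(data, max)      # complete linkage, for comparison
for a, b, d in single:
    print(a, b, d)
```

The merge distances are the heights at which branches join in the dendrogram. Under single linkage the point 5 is equally close to both neighbouring clusters, so the order of the last two merges depends on tie-breaking.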

Out of Syllabus:
1. Explain the Park-Chen-Yu (PCY) algorithm. How is memory mapping
done in PCY? (10M)
2. Explain the SON algorithm for frequent pattern mining. Illustrate how
MapReduce can be used for implementing this algorithm. (10M)
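
For the Park-Chen-Yu question above, the two passes can be sketched as below. The hash function (i*j) mod 10 and the threshold of 4 are taken from the frequent item set question in Chapter 4; the ten baskets are hypothetical, since the original transaction table is not reproduced here, and counting an item set as frequent when its support is at least the threshold is an assumption.

```python
from collections import Counter
from itertools import combinations


def pcy(baskets, threshold, num_buckets=10):
    """Return frequent pairs and their counts using the PCY algorithm."""
    # Pass 1: count single items and hash every pair into a bucket.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for i, j in combinations(items, 2):
            bucket_counts[(i * j) % num_buckets] += 1

    # Between passes: keep frequent items and compress buckets to a bitmap.
    frequent_items = {i for i, c in item_counts.items() if c >= threshold}
    bitmap = [c >= threshold for c in bucket_counts]

    # Pass 2: count only pairs of frequent items that hash to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for i, j in combinations(items, 2):
            if bitmap[(i * j) % num_buckets]:
                pair_counts[(i, j)] += 1
    return {p: c for p, c in pair_counts.items() if c >= threshold}


# Hypothetical baskets of integer item IDs.
baskets = [
    {1, 2, 3}, {1, 2, 4}, {1, 2}, {1, 2, 3}, {2, 3},
    {1, 3}, {2, 4}, {1, 2, 5}, {3, 4}, {1, 5},
]
frequent_pairs = pcy(baskets, threshold=4)
print(frequent_pairs)
```

The memory saving comes from the bitmap: pass 2 stores one bit per bucket instead of a full integer count, which leaves more main memory for counting candidate pairs.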

Choice Based

Winter 2019 Completed
Winter 2020 Completed
Old Syllabus
May 2019 Completed
May 2018 Completed
May 2017 Completed
May 2016 Completed
December 2018 Completed
December 2017 Completed
December 2016 Completed
