
Big Data Analytics

Question Bank

Unit: 1

SA – Big Data and Analytics, Seema Acharya, Subhashini Chellappan, Wiley


HD – Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012
HA – Chuck Lam, “Hadoop in Action”.

1) Explain Distributed file system.


http://www.unf.edu/~sahuja/cis6302/filesystems.html

2) Explain the difference between a parallel system and a distributed system.


SA: page no. 46
3) What is Big Data? Explain characteristics of Big Data.
OR
Explain the 4 V’s of Big Data.
http://www.ibmbigdatahub.com/infographic/four-vs-big-data

4) What are the benefits of Big Data? Discuss challenges under Big Data. How can Big Data
Analytics be useful in the development of smart cities? (Discuss one application.)
5) Write applications of Big Data.
http://www.datascienceassn.org/content/how-top-10-industries-use-big-data-applications

6) Explain types of Data.


SA: page no. 2
7) Explain key drivers of Big Data.
http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/

8) What is Big Data Analytics? Why is it important? Explain in detail.


SA:37,42
9) Explain challenges of Big Data.
SA:41

Unit: 2

1) Compare a traditional database system with Hadoop.


HA:7, SA:77
2) What are the advantages of Hadoop? Explain Hadoop Architecture.
HA: 22 to 25
3) Draw YARN Architecture. Explain each component of it.
SA:96
4) How does Hadoop run a MapReduce job using the classic framework?
HD: 188
5) How does Hadoop run a MapReduce job using YARN?
HD:194
6) How are status updates propagated through the MapReduce 1 system?
HD: 195 Fig. 6.3
7) Explain the relationship of the Streaming and Pipes executables to the tasktracker and its
child.
HD:192 Fig. 6.2
8) Explain Basic File commands of HDFS.
HA:38
9) Explain the difference between Hadoop 1.X and Hadoop 2.X.
https://acadgild.com/blog/10-big-differences-between-hadoop1-and-hadoop2/
http://www.journaldev.com/8806/differences-between-hadoop1-and-hadoop2

10) Explain the Hadoop ecosystem.


https://www.mssqltips.com/sqlservertip/3262/big-data-basics--part-6--related-apache-projects-in-hadoop-ecosystem/
http://hadooptutorial.info/

11) Explain features and key advantages of Hadoop.


SA:65
12) Explain anatomy of File Read in Hadoop.
HD:69
13) Explain anatomy of File Write in Hadoop.
HD: 72
14) Explain File Replica Placement Strategy.
HD:74
15) Explain the working of the following phases of MapReduce with one common example: (i)
Map Phase (ii) Combiner Phase (iii) Shuffle and Sort Phase (iv) Reducer Phase.
HA: 45 to 50 & HD 205 to 207
16) Explain Job Scheduling in MapReduce. How is it done in the case of (i) the Fair Scheduler
(ii) the Capacity Scheduler?
HD:204
17) Write MapReduce code for counting occurrences of specific words in the input text
file(s). Also write the commands to compile and run the code.
SA:92
18) Draw HDFS Architecture. Explain any two of the following HDFS commands, with syntax
and at least one example of each: (i) copyFromLocal (ii) setrep
(iii) checksum
Ans: In the HDFS architecture, explain the NameNode, DataNode and Secondary NameNode.
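Note on Question 15 (Unit 2): the four phases can be traced with a small word-count sketch. This is plain Python that only imitates the data flow; it is not Hadoop code, and the function names and the two sample input splits are illustrative assumptions. A real job would be written against the Hadoop MapReduce API (see HA: 45 to 50).

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def combine_phase(pairs):
    """Combiner: sum the counts locally, on one mapper's output only."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle_and_sort(mapper_outputs):
    """Shuffle and sort: group values by key across all mappers, keys sorted."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return sorted(grouped.items())


def reduce_phase(word, counts):
    """Reduce: add up all partial counts for one word."""
    return word, sum(counts)


def word_count(splits):
    """Run the four phases in order over a list of input splits."""
    mapper_outputs = [combine_phase(map_phase(split)) for split in splits]
    grouped = shuffle_and_sort(mapper_outputs)
    return dict(reduce_phase(word, counts) for word, counts in grouped)


if __name__ == "__main__":
    # Two hypothetical input splits, one per mapper.
    print(word_count(["big data big", "data hadoop"]))
```

In this sketch the combiner turns the first mapper's three pairs into two (big: 2, data: 1), so less data crosses the shuffle; the reducer then sees "data" with the partial counts from both mappers.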
Short Questions:

1) Why is an email placed in the “unstructured category”?


2) Hadoop has _____________ type of file structure.
3) Hadoop = ______________ + _________________.
4) If heartbeat is functioning properly that means _________________ is working properly.
5) Who maintains metadata in HDFS? Where?
6) What is the role of Secondary NameNode?
7) ____________________ is a single point of failure of Hadoop cluster.
8) Write Hadoop’s read-write policy.
9) The size of a file is 100 MB. You are going to store this file in HDFS. How many blocks
will it occupy? (For the calculation, consider the default block size of Hadoop
1.X.)
10) What is the default replication factor? What is the policy to set it?
11) Hadoop runs on _______________ hardware.
12) Which file is used for updating MapReduce settings?
13) List Hadoop’s three configuration files.
14) YARN is responsible for __________________________.
15) Which is an open-source distributed real-time computation system?
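
Note on Short Question 9: the block count is the file size divided by the block size, rounded up. The sketch below assumes the Hadoop 1.X default block size of 64 MB; the function name is hypothetical and is used only for illustration.

```python
import math


def hdfs_block_count(file_size_mb, block_size_mb=64):
    """Return (number of blocks, size of the last block in MB).

    block_size_mb defaults to 64, the assumed Hadoop 1.X default.
    """
    if file_size_mb <= 0:
        return 0, 0
    blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb
    return blocks, last_block_mb


if __name__ == "__main__":
    blocks, last_block_mb = hdfs_block_count(100)
    print(blocks, last_block_mb)
```

For a 100 MB file this gives one full 64 MB block plus a final block holding the remaining 36 MB; the final block takes up only the space it needs, not a full 64 MB.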
