Big Data (KCS-061)
Course Outcome (CO)
CO 1: At the end of the course, the student will be able to demonstrate knowledge of Big Data Analytics concepts and their applications in business.
DETAILED SYLLABUS
Unit I
Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction to the Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance and applications, Big Data features (security, compliance, auditing and protection), Big Data privacy and ethics, Big Data Analytics, challenges of conventional systems, intelligent data analysis, nature of data, analytic processes and tools, analysis vs. reporting, modern data analytic tools.

Unit II
Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System, components of Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop Streaming, Hadoop Pipes, the Hadoop ecosystem.
MapReduce: MapReduce framework and basics, how MapReduce works, developing a MapReduce application, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, failures, job scheduling, shuffle and sort, task execution, MapReduce types, input formats, output formats, MapReduce features, real-world MapReduce.

Unit III
HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and challenges, file sizes, block sizes and block abstraction in HDFS, data replication, how HDFS stores, reads, and writes files, Java interfaces to HDFS, command-line interface, Hadoop file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives, Hadoop I/O: compression, serialization, Avro and file-based data structures.
Hadoop Environment: Setting up a Hadoop cluster, cluster specification, cluster setup and installation, Hadoop configuration, security in Hadoop, administering Hadoop, HDFS monitoring and maintenance, Hadoop benchmarks, Hadoop in the cloud.

Unit IV
Hadoop Ecosystem and YARN: Hadoop ecosystem components, schedulers (fair and capacity), Hadoop 2.0 new features: NameNode high availability, HDFS federation, MRv2, YARN, running MRv1 in YARN.
NoSQL Databases: Introduction to NoSQL.
MongoDB: Introduction, data types, creating, updating and deleting documents, querying, introduction to indexing, capped collections.
Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed Datasets, anatomy of a Spark job run, Spark on YARN.
Scala: Introduction, classes and objects, basic types and operators, built-in control structures, functions and closures, inheritance.
University Academy
BIG DATA(KCS-061) 2020-21
Unit V
Hadoop Ecosystem Frameworks: Applications on Big Data using Pig, Hive and HBase.
Pig: Introduction to Pig, execution modes of Pig, comparison of Pig with databases, Grunt, Pig Latin, user-defined functions, data processing operators.
Hive: Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore, comparison with traditional databases, HiveQL, tables, querying data and user-defined functions, sorting and aggregating, MapReduce scripts, joins and subqueries.
HBase: HBase concepts, clients, example, HBase vs. RDBMS, advanced usage, schema design, advanced indexing; ZooKeeper: how it helps in monitoring a cluster, how to build applications with ZooKeeper; IBM Big Data strategy, introduction to InfoSphere, BigInsights and BigSheets, introduction to Big SQL.
Big Data(KCS-061)
Solved MCQ
1. Unit-I
2. Unit-II
3. Unit-III
4. Unit-IV
5. Unit-V
Unit-I
1. How many V's are present in Big Data?
a. 3
b. 4
c. 5
d. 6
Answer: c

2. Data in a relational database is:
a. Structured
b. Unstructured
c. Semi-structured
d. Metadata
Answer: a

3. In how many forms is data found in Big Data?
a. 2
b. 3
c. 4
d. 5
Answer: b

6. What are the main components present in Big Data Analytics?
a. MapReduce
b. HDFS
c. YARN
d. All of the above
Answer: d

6. What are the major benefits of Big Data processing?
a. Businesses can utilize outside intelligence while taking decisions
b. Improved customer service
c. Better operational efficiency
d. All of the above
Answer: d

7. Hadoop is written in which programming language?
a. C
b. C++
c. Java
d. Python
Answer: c

20. ________ refers to the ability to turn your data useful for business.
a. Velocity
b. Variety
c. Value
d. Volume
Answer: c

21. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE
b. FALSE
Answer: b

22. Files are divided into ________ sized chunks.
a. Static
b. Dynamic
c. Fixed
d. Variable
Answer: c

23. ________ is an open-source framework for storing data and running applications on clusters of commodity hardware.
a. HDFS
b. Hadoop
c. MapReduce
d. Cloud
Answer: b

24. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. True or False?
a. TRUE
b. FALSE
Answer: a

25. In a Relational Database Management System the property of scaling is applicable.
a. TRUE
b. FALSE
Answer: b

26. Which of the following is not an example of a NoSQL user?
a. Google
b. Netflix
c. Amazon
d. CERN
Answer: c

27. Scalability and better performance of NoSQL are achieved by sacrificing ACID compliance. Is it TRUE?
a. TRUE
b. FALSE
Answer: a

28. Scalability and better performance of NoSQL are attained by compromising ACID compliance. Is it TRUE?
a. TRUE
b. FALSE
Answer: a

29. ________ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
a. HDFS
b. MapReduce
c. Hadoop
d. Hive
Answer: b

30. Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
a. Decision Tree
b. Regression
c. Classification
d. Random Forest
Answer: d

31. A data set is:
a. Tweets stored in a flat file
b. A collection of image files in a directory
c. An extract of rows from a database table stored in a CSV-formatted file
d. All of the above
Answer: d

32. Data analysis is the process of:
a. Examining data to find facts
b. Relationships
c. Patterns, insights and/or trends
d. All of the above
Answer: d

33. What are the general categories of analytics that are distinguished by the results they produce?
a. Descriptive analytics
b. Diagnostic analytics
c. Predictive analytics
d. All of the above
Answer: d

34. BI enables an organization to gain insight into the performance of an enterprise:
a. By analyzing data generated by its business processes and information systems
b. By examining data to find facts
c. From relationships
d. All of the above
Answer: a

35. Data variety refers to:
a. Multiple schemas
b. Multiple formats and types of data
c. Multiple data models
d. None of the above
Answer: b

36. Unstructured data consists of:
a. Text files, audio files
b. Video files, text data
c. Tagged data
d. a) and b)
Answer: d

37. Multiple internal and external data in Big Data comes from multiple sources such as:
a. Sensors, social network sites
b. Email, XML, multimedia
c. a) and b)
d. None of the above
Answer: c

38. The ingestion layer should have the capability to:
a. Validate, cleanse, transform, reduce
b. Integrate
c. Preprocess the data
d. a) and b)
Answer: d

39. According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
a. Big data management and data mining
b. Data warehousing and business intelligence
c. Management of Hadoop clusters
d. Collecting and storing unstructured data
Answer: a

40. What are the main components of Big Data?
a. MapReduce
b. HDFS
c. YARN
d. All of these
Answer: d

41. What are the different features of Big Data Analytics?
a. Open-source
b. Scalability
c. Data recovery
d. All of the above
Answer: d

42. What are the four V's of Big Data?
a. Volume
b. Velocity
c. Variety
d. All of the above
Answer: d

43. All of the following accurately describe Hadoop, EXCEPT:
a. Open-source
b. Real-time
c. Java-based
d. Distributed computing approach
Answer: b

44. ________ is a general-purpose computing model and runtime system for distributed data analytics.
a. MapReduce
b. Drill
c. Oozie
d. None of the above
Answer: a

45. The examination of large amounts of data to see what patterns or other useful information can be found is known as:
a. Data examination
b. Information analysis
c. Big data analytics
d. Data analysis
Answer: c

46. Big data analysis does the following, except:
a. Collects data
b. Spreads data
c. Organizes data
d. Analyzes data
Answer: b

47. What makes Big Data analysis difficult to optimize?
a. Big Data is not difficult to optimize
b. Both data and cost-effective ways to mine data to make business sense out of it
c. The technology to mine data
d. All of the above
Answer: b

48. The new source of big data that will trigger a Big Data revolution in the years to come is:
a. Business transactions
b. Social media
c. Transactional data and sensor data
d. RDBMS
Answer: c

49. The unit of data that flows through a Flume agent is:
a. Log
b. Row
c. Event
d. Record
Answer: c

50. Listed below are the three steps that are followed to deploy a Big Data solution, except:
a. Data ingestion
b. Data processing
c. Data dissemination
d. Data storage
Answer: c

51. Which industries employ the use of so-called "Big Data" in their day-to-day operations?
a. Weather forecasting
b. Marketing
c. Healthcare
d. All of the above
Answer: d

52. There are almost as many bits of information in the digital universe as there are stars in the actual universe.
a. True
b. False
Answer: a

53. The term 'Big Data' was coined by:
a. Roger Mougalas
b. John Philips
c. Simon Woods
d. Martin Green
Answer: a

54. The term 'Big Data' was coined in the year:
a. 2000
b. 1970
c. 1998
d. 2005
Answer: c

55. Concerning the forms of Big Data, which one of these is the odd one out?
a. Structured
b. Unstructured
c. Processed
d. Semi-structured
Answer: c

56. Big Data applications benefit the media and entertainment industry by:
a. Predicting what the audience wants
b. Ad targeting
c. Scheduling optimization
d. All of the above
Answer: d

57. The feature of big data that refers to the quality of the stored data is ________.
a. Variety
b. Volume
c. Variability
d. Veracity
Answer: d

58. ________ is a framework for performing remote procedure calls and data serialization.
a. Drill
b. BigTop
c. Avro
d. Chukwa
Answer: c

59. Which of the following is a characteristic of Big Data?
a. Huge volume of data
b. Complexity of data types and structures
Answer: b
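Question 29 identifies MapReduce as the programming model; the flow it names (map to intermediate key/value pairs, shuffle and sort, then reduce) can be sketched in plain Python. This is a conceptual illustration only, not Hadoop's actual Java API:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all counts that arrived for one key.
    return (word, sum(counts))

lines = ["big data big hadoop", "hadoop big"]
# Shuffle and sort: gather all intermediate pairs, grouped by key.
pairs = sorted(p for line in lines for p in mapper(line))
result = dict(reducer(k, (c for _, c in g))
              for k, g in groupby(pairs, key=itemgetter(0)))
print(result)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

On a cluster, the map and reduce phases run on many nodes in parallel; the in-memory sort here stands in for Hadoop's distributed shuffle.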
Unit-II
1. Which one of the following is false about Hadoop?
a. It is a distributed framework
b. The main algorithm used in it is MapReduce
c. It runs with commodity hardware
d. All are true
Answer: (d)

2. What license is Apache Hadoop distributed under?
a. Apache License 2.0
b. Shareware
c. Mozilla Public License
d. Commercial
Answer: (a)

3. Which of the following platforms does Apache Hadoop run on?
a. Bare metal
b. Unix-like
c. Cross-platform
d. Debian
Answer: (c)

4. Apache Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts.
a. Standard RAID levels
b. RAID
c. ZFS
d. Operating system
Answer: (b)

5. Hadoop works in:
a. master-worker fashion
b. master-slave fashion
c. worker/slave fashion
d. All of the mentioned
Answer: (a)

6. Which type of data can Hadoop deal with?
a. Structured
b. Semi-structured
c. Unstructured
d. All of the above
Answer: (d)

7. Which statement is false about Hadoop?
a. It runs with commodity hardware
b. It is a part of the Apache project sponsored by the ASF
c. It is best for live streaming of data
d. None of the above
Answer: (c)

8. As compared to RDBMS, Apache Hadoop:
a. Has higher data integrity
b. Does ACID transactions
c. Is suitable for reads and writes many times
d. Works better on unstructured and semi-structured data
Answer: (d)

9. Hadoop can be used to create distributed clusters, based on commodity servers, that provide low-cost processing and storage for unstructured data.
a. True
b. False
Answer: (a)

10. ______ is a framework for performing remote procedure calls and data serialization.
a. Drill
b. BigTop
c. Avro
d. Chukwa
Answer: (c)
21. Which among the following is true about SequenceFileInputFormat?
a. Key: byte offset; Value: the contents of the line
b. Key: everything up to the tab character; Value: the remaining part of the line after the tab character
c. Key and value: both are user-defined
d. None of the above
Answer: (c)

22. Which are the key and value in TextInputFormat?
a. Key: byte offset; Value: the contents of the line
b. Key: everything up to the tab character; Value: the remaining part of the line after the tab character
c. Key and value: both are user-defined
d. None of the above
Answer: (a)

23. Which of the following are built-in counters in Hadoop?
a. FileSystem counters

25. Is it mandatory to set the input and output type/format in Hadoop MapReduce?
a. Yes
b. No
Answer: (b)

26. The parameters for Mappers are:
a. text (input)
b. LongWritable (input)
c. text (intermediate output)
d. All of the above
Answer: (d)

27. For a 514 MB file, how many InputSplits will be created?
a. 4
b. 5
c. 6
d. 10
Answer: (b)

28. Which among the following is used to provide multiple inputs to Hadoop?
a. MultipleInputs class
b. MultipleInputFormat
c. FileInputFormat
d. DBInputFormat
Answer: (a)
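The split count in question 27 is plain arithmetic, assuming the default 128 MB HDFS block size of Hadoop 2.x, where each block normally yields one InputSplit:

```python
import math

# A 514 MB file against 128 MB blocks: four full splits cover 512 MB,
# and the remaining 2 MB becomes a fifth, smaller split.
block_mb = 128
file_mb = 514
splits = math.ceil(file_mb / block_mb)
print(splits)  # 5
```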
38. Which of the following is true about Apache Pig?
a. Pig is also a data warehouse system used for analysing the Big Data stored in HDFS
b. It uses the Data Flow Language for analysing the data
c. a and b
d. Relational Database Management System
Answer: (c)

39. Which of the following platforms does Hadoop run on?
a. Bare metal
b. Debian
c. Cross-platform
d. Unix-like
Answer: (c)

40. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.
a. Machine learning
b. Pattern recognition
c. Statistical classification
d. Artificial intelligence
Answer: (a)

41. Which node serves as the master, of which there is only one per cluster?
a. Data Node
b. NameNode
c. Data block
d. Replication
Answer: (b)

42. HDFS consists of:
a. master-worker
b. master node and slave nodes
c. worker/slave
d. all of the mentioned
Answer: (b)

43. The node used when the primary NameNode goes down is the ________.
a. Rack
b. Data node
c. Secondary node
d. None of the mentioned
Answer: (c)

44. Which of the following scenarios may not be a good fit for HDFS?
a. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b. HDFS is suitable for storing data related to applications requiring low-latency data access
c. HDFS is suitable for storing data related to applications requiring low-latency data access
d. None of the mentioned
Answer: (a)

45. The need for data replication occurs when:
a. The replication factor is changed
b. A DataNode goes down
c. Data blocks get corrupted
d. All of the mentioned
Answer: (d)

46. HDFS uses only one language for implementation:
a. C++
b. Java
c. Scala
d. None of the above
Answer: (b)

47. In YARN, which node is responsible for managing the resources?
a. Data Node
b. NameNode
c. Resource Manager
d. Replication
Answer: (c)

48. As the Hadoop framework is implemented in Java, MapReduce applications are required to be written in Java.
a. True
b. False
Answer: (b)

49. ________ maps input key/value pairs to a set of intermediate key/value pairs.
a. Mapper
b. Reducer
c. Both Mapper and Reducer
d. None of the mentioned
Answer: (a)

50. The number of maps is usually driven by the total size of ________.
a. Inputs
b. Outputs
c. Tasks
d. None of the mentioned
Answer: (a)

51. Which file system is used by HBase?
a. Hive
b. Impala
c. Hadoop
d. Scala
Answer: (c)

52. The information mapping data blocks with their corresponding files is stored in the:
a. Namenode
b. Datanode
c. Job Tracker
Answer: (a)

53. In HDFS the files cannot be:
a. read
b. deleted
c. executed
d. archived
Answer: (c)

54. The datanode and namenode are, respectively, which of the following?
a. Slave and master nodes
b. Master and worker nodes
c. Both worker nodes
d. Both master nodes
Answer: (a)

55. Hadoop is a framework that works with a variety of related tools. Common cohorts include:
a. MapReduce, Hive and HBase
b. MapReduce, MySQL and Google Apps
c. MapReduce, Hummer and Iguana
d. MapReduce, Heron and Trumpet
Answer: (a)

56. Hadoop was named after:
a. Creator Doug Cutting's favorite circus act
b. The toy elephant of Cutting's son
c. Cutting's high school rock band
d. A sound Cutting's laptop made during Hadoop's development
Answer: (b)

57. All of the following accurately describe Hadoop, EXCEPT:
a. Open source
b. Java-based
c. Distributed computing approach
d. Real-time
Answer: (d)

58. Hive also supports custom extensions written in:
a. C
b. C#

67. What is the meaning of commodity hardware in Hadoop?
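Question 49's Mapper contract (input records in, intermediate key/value pairs out) is also what a Hadoop Streaming mapper implements over stdin/stdout, which is why mappers can be written in any language that reads an input stream. A minimal word-count mapper sketch; the jar invocation in the comment is illustrative:

```python
import io
import sys

def map_lines(lines):
    """Map phase: emit one tab-separated (word, 1) intermediate pair per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

# In a real run ("hadoop jar hadoop-streaming.jar -mapper mapper.py ...")
# the loop below would read sys.stdin; here a StringIO stands in for it.
fake_stdin = io.StringIO("big data big\nhadoop big\n")
for pair in map_lines(fake_stdin):
    sys.stdout.write(pair + "\n")
```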
Unit-III

1. A ________ serves as the master, and there is only one NameNode per cluster.
a. Data Node
b. NameNode
c. Data block
d. Replication
Answer: (b)

2. Point out the correct statement.
a. DataNode is the slave/worker node and holds the user data in the form of data blocks
b. Each incoming file is broken into 32 MB by default
c. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d. None of the mentioned
Answer: (a)

3. HDFS works in a __________ fashion.
a. master-worker
b. master-slave
c. worker/slave
d. all of the mentioned
Answer: (a)

4. The ________ NameNode is used when the primary NameNode goes down.
a. Rack
b. Data
c. Secondary
d. None of the mentioned
Answer: (c)

5. Point out the wrong statement.
a. The replication factor can be configured at a cluster level (the default is 3) and also at a file level
b. The block report from each DataNode contains a list of all the blocks that are stored on that DataNode
c. User data is stored on the local file system of DataNodes
d. The DataNode is aware of the files to which the blocks stored on it belong
Answer: (d)

6. Which of the following scenarios may not be a good fit for HDFS?
a. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b. HDFS is suitable for storing data related to applications requiring low-latency data access
c. HDFS is suitable for storing data related to applications requiring low-latency data access
d. None of the mentioned
Answer: (a)

7. The need for data replication can arise in various scenarios like ____________.
a. The replication factor is changed
b. A DataNode goes down
c. Data blocks get corrupted
d. All of the mentioned
Answer: (d)

8. ________ is the slave/worker node and holds the user data in the form of data blocks.
a. DataNode
b. NameNode
c. Data block
d. Replication
Answer: (a)

9. HDFS provides a command-line interface called __________ used to interact with HDFS.
a. "HDFS Shell"
b. "FS Shell"
c. "DFS Shell"
d. None of the mentioned
Answer: (b)

10. HDFS is implemented in the ___________ programming language.
a. C++
b. Java
c. Scala
d. None of the mentioned
Answer: (b)

11. For YARN, the ___________ Manager UI provides host and port information.
a. Data Node
b. NameNode
c. Resource
d. Replication
Answer: (c)

12. Point out the correct statement.
a. The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
b. Each incoming file is broken into 32 MB by default
c. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d. None of the mentioned
Answer: (a)

13. For ________ the HBase Master UI provides information about the HBase Master uptime.
a. HBase
b. Oozie
c. Kafka
d. All of the mentioned
Answer: (a)

14. During start-up, the ___________ loads the file system state from the fsimage and the edits log file.
a. DataNode
b. NameNode
c. ActionNode
d. None of the mentioned
Answer: (b)

15. What is the utility of HBase?
a. It is the tool for random and fast read/write operations in Hadoop
b. It acts as a faster read-only query engine in Hadoop
c. It is a MapReduce alternative in Hadoop
d. It is a fast MapReduce layer in Hadoop
Answer: (a)

16. What is Hive used as?
a. Hadoop query engine
b. MapReduce wrapper
c. Hadoop SQL interface
d. All of the above
Answer: (d)

17. What is the default size of an HDFS block?
a. 32 MB
b. 64 KB
c. 128 KB
d. 64 MB
Answer: (d)

18. In HDFS, what is the default replication factor of a data node?
a. 4
b. 1
c. 3
d. 2
Answer: (c)

19. What is the name of the protocol used to create replicas in HDFS?
a. Forward protocol
b. Sliding Window protocol
c. HDFS protocol
d. Store and Forward protocol
Answer: (c)

20. HDFS data blocks can be read in parallel.
a. True
b. False
Answer: (a)

22. In MapReduce, a join can be performed in:
a. The map phase
b. The reduce phase
c. Either phase, but not on both sides simultaneously
d. Either phase
Answer: (d)

23. Which of the following types of joins can be performed in a reduce-side join operation?
a. Equi join
b. Left outer join
c. Right outer join
d. Full outer join
e. All of the above
Answer: (e)

24. A MapReduce function can be written in:
a. Java
b. Ruby
c. Python
d. Any language which can read from an input stream
Answer: (d)
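Questions 17 and 18 combine into simple storage arithmetic. A sketch using those classic Hadoop 1.x defaults (64 MB blocks, replication factor 3) for a hypothetical 1 GB file:

```python
import math

block_mb = 64        # default HDFS block size (question 17)
replication = 3      # default replication factor (question 18)
file_mb = 1024       # a 1 GB file

blocks = math.ceil(file_mb / block_mb)   # block entries tracked by the NameNode
raw_mb = file_mb * replication           # raw disk consumed across DataNodes
print(blocks, raw_mb)  # 16 3072
```

The same arithmetic explains why HDFS prefers large blocks: fewer blocks per file means less metadata held in NameNode memory.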
27. Which method of the FileSystem object is used for reading a file in HDFS?
a. open()
b. access()
d. None
Answer: (a)

32. Which of the following compression formats is splittable?
a. Bzip2
b. LZO
c. Gzip
d. Both b and c
Answer: (a)

33. Which of the following provides search […]

68. Which HDFS command checks the file system and lists the blocks?
a. hfsck
b. fcsk
c. fblock
d. fsck
Answer: (d)

69. What is an administered group used to manage cache permissions and resource usage?
a. Cache pools
b. Block pool
c. Namenodes
d. HDFS cluster
Answer: (a)

70. Which object encapsulates a client's or server's configuration?
a. File object
b. Configuration object
c. Path object
d. Stream object
Answer: (b)

71. Which interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file?
a. Seekable
b. PositionedReadable
c. Progressable
d. DataStream
Answer: (a)

72. Which method is used to list the contents of a directory?
a. listFiles
b. listContents
c. listStatus
d. listPaths
Answer: (c)

73. What is the operation that uses wildcard characters to match multiple files with a single expression called?
a. Globbing
b. Pattern matching
c. Regex
d. Regexfilter
Answer: (a)

74. What does the globStatus() method return?
a. An array of FileStatus objects
b. An array of ListStatus objects
c. An array of PathStatus objects
d. An array of FilterStatus objects
Answer: (a)

75. What does the glob question mark (?) match?
a. Zero or more characters
b. One or more characters
c. A single character
d. A metacharacter
Answer: (c)

76. Which method on FileSystem is used to permanently remove files or directories?
a. remove()
b. rm()
c. del()
d. delete()
Answer: (d)

77. Which component streams the packets to the first datanode in the pipeline?
a. DataStreamer
b. FileStreamer
c. InputStreamer
d. PathStreamer
Answer: (a)

78. Which queue is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas?
a. ack queue
b. data queue
c. path queue
d. stream queue
Answer: (b)

79. Which command is used to copy files/directories?
a. distcp
b. hcp
c. copy
d. cp
Answer: (a)
Unit-IV

1. Which among the following is Hadoop's cluster resource management system?
a. GLOB
b. YARN
c. ARM
d. SPARK
Answer: (b)

2. Which of the following processing frameworks interacts with YARN directly?
a. Pig
b. Hive
c. Crunch
d. None of these
Answer: (d)

3. Which of the following processing frameworks run on MapReduce?
a. Pig
b. Hive
c. Crunch
d. All of the above
Answer: (d)

4. Which among the following are the core services of YARN?
a. Resource manager and node manager
b. Namenode and datanode
c. Data manager and resource manager
d. Data manager and application manager
Answer: (a)

5. Which constraints can be used to request a container on a specific node or rack, or anywhere on the cluster, in YARN?
a. Container constraints
b. Space constraints
c. Locality constraints
d. Resource constraints
Answer: (c)

6. Which among the following can be used to model YARN applications?
a. One application per user job
b. One application per workflow
c. A long-running application that is shared by different users
d. All of the above
Answer: (d)

7. Which follows the one-application-per-user-job model?
a. MapReduce
b. Spark
c. Apache Slider
d. Samza
Answer: (a)

8. Which application runs per user session?
a. MapReduce
b. Spark
c. Apache Slider
d. None of the above
Answer: (b)

9. Which among the following has a long-running application master for launching other applications on the cluster?
a. MapReduce
b. Spark
c. Apache Slider
d. None of the above
Answer: (c)
19. In YARN, the responsibility of the jobtracker is handled by:
a. Resource manager
b. Application master
c. Timeline server
d. All of the above
Answer: (d)

20. In YARN, the responsibility of the tasktracker is handled by:
a. Resource manager
b. Application master
c. Timeline server
d. Node manager
Answer: (d)

21. Which stores the application history in YARN?
a. Resource manager
b. Application master
c. Timeline server
d. Node manager
Answer: (c)

22. Which among the following are features of YARN?
a. Scalability
b. Multitenancy
c. Availability
d. All of the above
Answer: (d)

23. Which among the following schedulers is available in YARN?
a. FIFO
b. Shortest Job First
c. Round Robin
d. Shortest Remaining Time
Answer: (a)

24. Which are the schedulers available in YARN?
a. FIFO
b. Capacity
c. Fair Scheduler
d. All of the above
Answer: (d)

25. Which among the following schedulers attempts to allocate resources so that all running applications get the same share of resources in YARN?
a. FIFO
b. Capacity
c. Fair Scheduler
d. Round Robin
Answer: (c)

26. Which among the following schedulers provides queue elasticity in YARN?
a. FIFO
b. Capacity
c. Fair Scheduler
d. Round Robin
Answer: (b)

27. Which among the following schedulers in YARN is used by default?
a. FIFO
b. Capacity
c. Fair Scheduler
d. Round Robin
Answer: (b)

28. In which XML file is the default configuration of schedulers to be changed?
a. yarn-site.xml
b. config.xml
c. scheduler.xml
d. yarn-scheduler.xml
Answer: (a)

29. Which queue scheduling policies are supported by the Fair Scheduler in YARN?
a. FIFO
b. Dominant Resource Fairness
c. Preemption
d. All of the above
Answer: (d)

30. Which holds the list of rules for queue placement in fair scheduling?
a. queuePlacementPolicy
b. rulePlacementPolicy
c. scheduleQueuePolicy
d. schedulingPolicy
Answer: (a)

31. Which setting is used to enable preemption globally?
a. yarn.scheduler.fair.preemption = true
b. yarn.scheduler.preemption = true
c. yarn.scheduler.global.preemption = true
d. yarn.scheduler.enable.preemption = true
Answer: (a)

32. Which among the following supports delay scheduling?
a. FIFO
b. Capacity Scheduler
c. Fair Scheduler
d. Both Capacity and Fair Schedulers
Answer: (d)

33. What is the default period of the heartbeat request sent by the node manager?
a. One per millisecond
b. One per second
c. One per minute
d. One per nanosecond
Answer: (b)

34. Which error detection code is used in HDFS?
a. CRC-32
b. CRC-32C
c. SHA
d. SHA-1
Answer: (b)

35. CRC-32C has a storage overhead of:
a. Less than 1%
b. Less than 5%
c. Less than 10%
d. Less than 2.5%
Answer: (a)

36. The heartbeat signals are sent from:
a. Jobtracker to Tasktracker
b. Tasktracker to Jobtracker
c. Jobtracker to namenode
d. Tasktracker to namenode
Answer: (b)

37. Spark was initially started by ________ at UC Berkeley AMPLab in 2009.
a. Mahek Zaharia
b. Matei Zaharia
c. Doug Cutting
d. Stonebraker
Answer: (b)

38. ________ is a component on top of Spark Core.
a. Spark Streaming
b. Spark SQL
c. RDDs
d. All of the mentioned
Answer: (b)

39. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.
a. Spark Streaming
b. Spark SQL
c. RDDs
d. All of the mentioned
Answer: (c)

40. ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.
a. MLlib
b. Spark Streaming
c. GraphX
d. RDDs
Answer: (b)

41. ________ is a distributed machine learning framework on top of Spark.
a. MLlib
b. Spark Streaming
c. GraphX
d. RDDs
Answer: (a)

42. Users can easily run Spark on top of Amazon's __________.
a. Infosphere
b. EC2
c. EMR
d. None of the mentioned
Answer: (b)

43. Which of the following can be used to launch Spark jobs inside MapReduce?
a. SIM
b. SIMR
c. SIR
d. RIS
Answer: (b)

44. Which of the following languages is not supported by Spark?
a. Java
b. Pascal
c. Scala
d. Python
Answer: (b)

45. Spark is packaged with higher-level libraries, including support for _________ queries.
a. SQL
b. C
c. C++
d. None of the mentioned
Answer: (a)

46. Spark includes a collection of over ________ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
a. 50
b. 60
c. 70
d. 80
Answer: (d)

47. Spark is engineered from the bottom up for performance, running ___________ faster than Hadoop by exploiting in-memory computing and other optimizations.
a. 100x
b. 150x
c. 200x
Answer: (a)
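Question 34's checksum can be reproduced directly. The sketch below is a plain bitwise implementation of CRC-32C (the Castagnoli polynomial, which HDFS uses for its per-chunk checksums), checked against the standard test vector for the string "123456789":

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# The well-known CRC-32C check value for ASCII "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Production code uses table-driven or hardware (SSE4.2) variants; the bit-by-bit loop here just makes the polynomial arithmetic visible.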
68. __________ is an online NoSQL database developed by Cloudera.
a. HCatalog
b. HBase
c. Impala
d. Oozie
Answer: (b)

69. Which of the following is not a NoSQL database?
a. SQL Server
b. MongoDB
c. Cassandra
d. None of the mentioned
Answer: (a)

70. Which of the following is a NoSQL database type?
a. SQL
b. Document databases
c. JSON
d. All of the mentioned
Answer: (b)

71. Which of the following is a wide-column store?
a. Cassandra
b. Riak
c. MongoDB
d. Redis
Answer: (a)

72. "Sharding" a database across many server instances can be achieved with ________.
a. LAN
b. SAN
c. MAN
d. All of the mentioned
Answer: (b)

73. Most NoSQL databases support automatic __________, meaning that you get high availability and disaster recovery.
a. Processing
b. Scalability
c. Replication
d. All of the mentioned
Answer: (c)

74. Which of the following are the simplest NoSQL databases?
a. Key-value
b. Wide-column
c. Document
d. All of the mentioned
Answer: (a)

75. ________ stores are used to store information about networks, such as social connections.
a. Key-value
b. Wide-column
c. Document
d. Graph
Answer: (d)

76. NoSQL databases are used mainly for handling large volumes of _____ data.
a. Unstructured
b. Structured
c. Semi-structured
d. All of the mentioned
Answer: (a)

77. Which of the following languages is MongoDB written in?
a. Javascript
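The sharding idea in question 72 (partitioning rows across many server instances) is usually key-based in NoSQL systems. A toy hash-sharding sketch; the shard names are hypothetical:

```python
import hashlib

def shard_for(key: str, shards: list) -> str:
    """Route a record key to one shard by hashing it (toy illustration)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return shards[digest % len(shards)]

shards = ["server-0", "server-1", "server-2"]
# The same key always hashes to the same shard, so reads know where to look.
assignments = {k: shard_for(k, shards) for k in ["user:1", "user:2", "user:3"]}
print(assignments)
```

Real stores refine this with consistent hashing or range partitioning so that adding a shard does not remap every key.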
Unit-V
1. A ________ serves as the master, and there is only one NameNode per cluster.
a. Data Node
b. NameNode
c. Data block
d. Replication
Answer: (b)

2. Point out the correct statement.
a. DataNode is the slave/worker node and holds the user data in the form of data blocks
b. Each incoming file is broken into 32 MB by default
c. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d. None of the mentioned
Answer: (a)

3. HDFS works in a __________ fashion.
a. master-worker
b. master-slave
c. worker/slave
d. all of the mentioned
Answer: (a)

4. The ________ NameNode is used when the primary NameNode goes down.
a. Rack
b. Data
c. Secondary
d. None of the mentioned
Answer: (c)

5. Which of the following scenarios may not be a good fit for HDFS?
a. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b. HDFS is suitable for storing data related to applications requiring low-latency data access
c. HDFS is suitable for storing data related to applications requiring low-latency data access
d. None of the mentioned
Answer: (a)

6. ________ is the slave/worker node and holds the user data in the form of data blocks.
a. DataNode
b. NameNode
c. Data block
d. Replication
Answer: (a)

7. HDFS provides a command-line interface called __________ used to interact with HDFS.
a. "HDFS Shell"
b. "FS Shell"
c. "DFS Shell"
d. None of the mentioned
Answer: (b)

8. For YARN, the ___________ Manager UI provides host and port information.
a. Data Node
b. NameNode
c. Resource
d. Replication
Answer: (c)

9. During start-up, the ___________ loads the file system state from the fsimage and the edits log file.
a. DataNode
b. NameNode
c. ActionNode
d. None of the mentioned
Answer: (b)
21. HBase is ________; its schema defines only column families.
a. Row-oriented
b. Schema-less
c. Fixed schema
d. All of the mentioned
Answer: (b)

22. The _________ server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task.
a. Region
b. Master
c. Zookeeper
d. All of the mentioned
Answer: (b)

23. Which of the following commands provides information about the user?
a. status
b. version
c. whoami
d. user
Answer: (c)

24. The _________ command fetches the contents of a row or a cell.
a. select
b. get
c. put
d. None of the mentioned
Answer: (b)

25. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
a. HTableDescriptor
b. HDescriptor
c. HTable
d. HTabDescriptor
Answer: (a)

26. The minimum number of row versions to keep is configured per column family via _______.
a. HBaseDescriptor
b. HTabDescriptor
c. HColumnDescriptor
d. All of the mentioned
Answer: (c)

27. HBase supports a ____________ interface via Put and Result.
a. "bytes-in/bytes-out"
b. "bytes-in"
c. "bytes-out"
d. None of the mentioned
Answer: (a)

28. One supported data type that deserves special mention is ____________.
a. money
b. counters
c. smallint
d. tinyint
Answer: (b)

29. __________ does re-write data and pack rows into columns for certain time periods.
a. OpenTS
b. OpenTSDB
c. OpenTSD
d. OpenDB
Answer: (b)

30. The __________ command disables, drops and recreates a table.
a. drop
b. truncate
c. delete
d. None of the mentioned
Answer: (b)
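Questions 21 and 27 together describe HBase's data model: the schema fixes only the column families, qualifiers vary freely per row, and cell values travel as raw bytes through Put and Result. A toy Python model of that shape (an illustration, not the real HBase client API):

```python
# Toy column-family store: the "schema" names only the families;
# qualifiers are free-form per row, and every cell value is bytes.
table = {"families": {"info", "stats"}, "rows": {}}

def put(row, family, qualifier, value):
    assert isinstance(value, bytes), "cells travel as raw bytes"
    assert family in table["families"], "only column families are fixed by the schema"
    table["rows"].setdefault(row, {})[(family, qualifier)] = value

def get(row):
    """Fetch the contents of a row, as the HBase `get` command does."""
    return table["rows"].get(row, {})

put("user1", "info", "name", b"Asha")
put("user1", "stats", "visits", b"7")
print(get("user1")[("info", "name")])  # b'Asha'
```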