
1. What are the main components of Big Data?

1. MapReduce
2. HDFS
3. YARN
4. All of these
2. What are the five V’s of Big Data?
1. Volume
2. Velocity
3. Variety
4. All of the above
3. All of the following accurately describe Hadoop, EXCEPT:
1. Open source
2. Real-time
3. Java-based
4. Distributed computing approach
4. Hadoop is good for:
a) Processing transactions (random access)
b) Massive amounts of data through parallelism
c) Processing lots of small files
d) Intensive calculations with little data
e) Low latency data access.
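Hadoop's strength in the question above, processing massive amounts of data through parallelism, comes from the MapReduce model. As a rough, Hadoop-free sketch in plain Python (the function names here are illustrative, not Hadoop APIs), word count can be expressed as map, shuffle, and reduce phases:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word; in Hadoop this runs
    # in parallel across input splits on many nodes.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big insight", "big cluster"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts maps each word to its total occurrences, e.g. "big" -> 3
```

Because each map call and each reduce call is independent, the work scales out across a cluster; this is also why Hadoop suits batch jobs rather than low-latency random access.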
5. ………….. is data whose scale, distribution, diversity, and/or timeliness require the use of new
technical architectures and analytics to enable insights that unlock new sources of business value.
(a) Big data (b) MapReduce (c) Data mining (d) Hadoop
6. …………………. is the characteristic whereby Big Data is collected and created in various formats and
from various sources.
1. Volume
2. Velocity
3. Variety
4. All of the above
7. …………………. is the speed or frequency at which data is collected, in various forms and from
different sources, for processing.
1. Volume
2. Velocity
3. Variety
4. All of the above
8. ………………. refers to the humongous amounts of data generated each second from social
media, cell phones, cars, credit cards, M2M sensors, photographs, video, etc.
1. Volume
2. Velocity
3. Variety
4. Veracity
5. All of the above

9. ……………………. is defined as the different types of data we can now use.


1. Volume
2. Velocity
3. Variety
4. Veracity
5. All of the above.
10. ……………………. is the quality or trustworthiness of the data. Just how accurate is all this
data?
1. Volume
2. Velocity
3. Variety
4. Veracity
11. …………….. asks whether this data can produce a meaningful return on investment.
1. Volume
2. Value
3. Variety
4. Veracity
5. All of the above.

12. …………… Data that can be stored and processed in a fixed format, i.e., a schema.
1. Structured
2. Semi-structured
3. Unstructured.
4. other
13. ……………… Data that does not have the formal structure of a data model but nevertheless
has some organizational properties, like tags and other markers.
1. Structured
2. Semi-structured
3. Unstructured.
4. Other.
14. ………………….. Data which has an unknown form, cannot be stored in an RDBMS, and
cannot be analyzed unless it is transformed into a structured format.
1. Structured
2. Semi-structured
3. Unstructured.
4. Other.
15. …………………. Apache open source software framework for reliable, scalable, distributed
computing over massive amounts of data.
(a) Big data (b) MapReduce (c) Data mining (d) Hadoop

16. Hadoop is not good for:


1. Massive amounts of data through parallelism.
2. A variety of data (structured, unstructured, semi-structured)
3. Inexpensive commodity hardware
4. Low-latency data access.
17. ………………….. is a powerful platform for managing Big Data at rest.
(a) HDP (b) MapReduce (c) Data mining (d) Hadoop
18. ……………………. is not a component of HDP
(a) Governance, Integration
(b) Tools
(c) Security
(d) Operations
(e) Recovery
19. ……………. easily imports information from structured databases (Db2, MySQL, Netezza,
Oracle, etc.) and related Hadoop systems (such as Hive and HBase) into your Hadoop cluster.
(a) Sqoop
(b) Kafka
(c) MapReduce
(d) Hive
20. …………………… is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging
system.
(a) Sqoop
(b) Kafka
(c) MapReduce
(d) Hive.
21. ………………… is a data warehouse that facilitates easy data summarization, ad-hoc queries, and
the analysis of very large datasets stored in Hadoop.
(a) Sqoop
(b) Kafka
(c) MapReduce
(d) Hive.
22. ………………. consists of a high-level language called Pig Latin, which was designed to
simplify MapReduce programming.
(a) Sqoop
(b) Pig
(c) MapReduce
(d) Hive.
23. Use Apache …………………… when you need random, real-time read/write
access to your Big Data.
(a) Sqoop
(b) Pig
(c) HBase
(d) Hive.
24. ……………….. is a fast, open source enterprise search platform built on
the Apache Lucene Java search library.
(a) Sqoop
(b) Pig
(c) Solr
(d) Hive.
25. ………………….. is a fast and general engine for large-scale data processing, characterized by
generality and speed.
(a) Sqoop
(b) Pig
(c) Spark
(d) Hive.
26. ………………… is used for managing the data life cycle in Hadoop clusters
and provides a data governance engine.
(a) Falcon
(b) Pig
(c) Spark
(d) Hive.
27. …………………… is a scalable and extensible set of core foundational
governance services. It exchanges metadata with other tools and processes within and
outside of the Hadoop platform.
(a) Falcon
(b) Pig
(c) Atlas
(d) Hive.
28. ……………… is a centralized security framework used to control, enable, monitor, and
manage comprehensive data security across the entire Hadoop platform.
(a) Falcon
(b) Pig
(c) Atlas
(d) Ranger.
29. …………….. provides perimeter-level security for Hadoop: a REST API and
application gateway for the Apache Hadoop ecosystem.
(a) Falcon
(b) Pig
(c) Atlas
(d) Knox.
30. …………………. is used for provisioning, managing, and monitoring Apache Hadoop
clusters. It provides an intuitive, easy-to-use Hadoop management web UI backed by
its RESTful APIs.
(a) Falcon
(b) Pig
(c) Atlas
(d) Ambari.

31. …………………. is a centralized service for maintaining configuration
information, naming, providing distributed synchronization, and providing group
services.
(a) ZooKeeper
(b) Pig
(c) Atlas
(d) Ambari.
32. …………… is a web-based notebook that enables data-driven, interactive data
analytics and collaborative documents that combine code samples, source data,
descriptive markup, result sets, and rich visualizations in one place.
(a) Zeppelin
(b) Pig
(c) Atlas
(d) Ambari.
33. ……………………. provides a built-in set of views for Hive, Pig, Tez, Capacity
Scheduler, Files, and HDFS, which allows developers to monitor and manage the cluster.
(a) Ambari Views
(b) Pig
(c) Atlas
(d) Hive
34. ……………. is a system for collecting, aggregating, and serving Hadoop and
system metrics in Ambari-managed clusters.
(a) Apache Ambari
(b) Pig
(c) Atlas
(d) AMS

35. True or False? Ambari is backed by RESTful APIs for developers to easily integrate
with their own applications. True
36. Which Hadoop functionalities does Ambari provide?
Provision, manage, monitor and integrate
37. Which page from the Ambari UI allows you to check the versions of the software
installed on your cluster?
The Admin > Manage Ambari page
38. True or False? Creating users through the Ambari UI will also create the user on the
HDFS. False.
39. True or False? You can use curl commands to issue commands to Ambari.
True.

40. Which of the following is not part of the Hadoop architecture?


(a) MapReduce
(b) HDFS
(c) Hadoop Common
(d) AMS

41. What hardware is not used for Hadoop?


a) RAID
b) Linux Logical Volume Manager (LVM)
c) Solid-state disk (SSD)

42. T/F: Hadoop consists of 4 subprojects: MapReduce, Hadoop Distributed File System
(HDFS), YARN, and Hadoop Common.

43. …………… manages the file system namespace and metadata
a) NameNode b) DataNode c) YARN d) Sqoop
44. ……………. manages storage attached to the nodes
a) NameNode b) DataNode c) YARN d) Sqoop
45. ……………………. stores the mapping of blocks to files and file system properties.
a) FsImage b) EditLog c) LogFile d) MapFile
46. Each file is split into blocks; the Hadoop default is ……….. MB
a) 16 b) 64 c) 128 d) 32
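As a back-of-the-envelope check for the block-size question above, here is a sketch in plain Python (not HDFS code; `blocks_needed` is an illustrative helper). Note the default block size was 64 MB in Hadoop 1.x and 128 MB from Hadoop 2.x onward:

```python
import math

BLOCK_SIZE_MB = 128  # default in Hadoop 2.x and later; Hadoop 1.x used 64 MB

def blocks_needed(file_size_mb):
    # HDFS splits each file into fixed-size blocks; the last block may be partial.
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

# A 300 MB file occupies 3 blocks: 128 MB + 128 MB + 44 MB.
```

The large block size is a deliberate design choice: it keeps the NameNode's block-to-file metadata small and favors long sequential reads, which is also why Hadoop handles lots of small files poorly.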
1. True or False? Hadoop systems are designed for transaction processing.
F: Hadoop systems are not designed for transaction processing and would perform very
poorly at it. Hadoop systems are designed for batch processing.

2. List the Hadoop open source projects.


=> MapReduce, YARN, Ambari, Hive, HBase, etc.

3. What is the default number of replicas in a Hadoop system? 3
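The default of 3 replicas has a direct capacity cost, which the following plain-Python sketch illustrates (`dfs.replication` is the real HDFS setting; the helper function itself is illustrative, not an HDFS API):

```python
REPLICATION_FACTOR = 3  # HDFS default, set by the dfs.replication property

def raw_storage_mb(logical_mb, replication=REPLICATION_FACTOR):
    # Each block is stored `replication` times on different DataNodes,
    # so the raw capacity consumed is a multiple of the logical file size.
    return logical_mb * replication

# Storing 100 MB of data with the default factor consumes 300 MB of raw disk.
```

Replication is what lets HDFS tolerate DataNode failures on commodity hardware without RAID, at the price of a 3x storage overhead.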

4. True or False? One of the driving principles of Hadoop is that the data
is brought to the program.
=> False. The program is brought to the data, to eliminate the need to move large
amounts of data.

5. True or False? At least 2 NameNodes are required for a standalone
Hadoop cluster.
=> False. Only 1 NameNode is required per cluster; 2 are required for high availability.
