Big Data Mock Exam

Right or Wrong

1. RDD data generated in file systems compatible with Hadoop can be partially
upgraded by invoking RDD operators.

Answer:Wrong

2. If there are two queues of the same level and the same capacity on YARN, the
scheduler preferentially allocates resources to the one with more tasks. For example, the
capacities of Q1 and Q2 are 50, and Q1 has 10 tasks that occupy 40 and Q2 has 2 tasks
that occupy 30. In this case, the scheduler preferentially allocates resources to Q1.

Answer:Wrong
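
The reasoning behind this answer can be sketched as follows. The sketch assumes that the scheduler orders same-level queues by the ratio of used to configured capacity and serves the lowest ratio first, regardless of task count. The function name and dictionary layout are illustrative and are not part of YARN.

```python
def pick_queue(queues):
    """Return the name of the queue with the lowest used/capacity ratio."""
    return min(queues, key=lambda name: queues[name]["used"] / queues[name]["capacity"])


# Values taken from the question: both queues have capacity 50.
queues = {
    "Q1": {"capacity": 50, "used": 40, "tasks": 10},  # ratio 0.8
    "Q2": {"capacity": 50, "used": 30, "tasks": 2},   # ratio 0.6
}

chosen = pick_queue(queues)
print(chosen)  # Q2
```

Under this assumption Q2 is served first, because its usage ratio is lower even though it has fewer tasks.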

3. The NameNode of Hadoop stores metadata of file systems.

Answer:Right

4. If a Loader job fails to be executed, the data imported during the job execution will
not be automatically deleted and must be manually deleted.

Answer:Wrong

5. FusionInsight Manager supports the unified management of multiple tenants.

Answer:Right

Single Choice

1. What is the role of the Kafka server in a Kafka cluster?

A. Producer

B. Broker

C. ZooKeeper

D. Consumer

Answer:B

2. Which statement is correct about the basic operations of Hive table creation?

A. Once a table has been created, you cannot change its name.

B. Once a table has been created, you cannot change its column names.

C. Once a table has been created, you cannot add columns to it.

D. The EXTERNAL keyword must be specified when creating an external table.

Answer:D

3. Which of the following is the default block size of HDFS in the FusionInsight HD
system?

A. 256 MB

B. 128 MB

C. 32 MB

D. 64 MB

Answer:B

4. Which statement about file uploading from a client to HDFS in Hadoop is correct?

A. The file data of the client is transmitted to DataNodes via the NameNode.

B. The client writes files to each DataNode based on the DataNode address sequence
and then the DataNode divides files into multiple blocks.

C. The client divides files into multiple blocks and writes data to each DataNode based
on the DataNode address sequence.

D. The client only uploads data to a DataNode and the NameNode copies blocks.

Answer:C
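
The client-side division described in option C can be illustrated with a minimal sketch. It only computes block boundaries, assuming the 128 MB default block size from question 3. A real HDFS client also streams each block to a pipeline of DataNodes, which is not modeled here. The function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size in bytes (128 MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))  # 3
```

A 300 MB file yields two full 128 MB blocks and one 44 MB block.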

5. In FusionInsight HD, when Loader imports files from the SFTP server, which of the
following file types does not require code and data conversion and delivers the most
rapid speed?

A. sequence_file

B. graph_file

C. text_file

D. binary_file

Answer:D

6. In the FusionInsight HD system, the minimum processing unit of HBase is Region.
Where is the routing information between the User Region and RegionServer stored?

A. HDFS

B. ZooKeeper

C. Master

D. META table

Answer:D

7. Which statement is correct about the Loader jobs in FusionInsight HD?

A. Junk data is generated after a Loader job fails to be executed. You need to clear it
manually.

B. If an exception occurs in the Loader service after Loader submits a job to YARN, the
job will fail to be executed.

C. After Loader submits a job to YARN and before the job execution is complete, other
jobs cannot be submitted.

D. If a Mapper task fails to be executed after Loader submits a job to YARN, the task
will retry automatically.

Answer:D

8. Which of the following is an abstract notion of YARN in Hadoop?

A. CPU

B. Memory

C. Disk space

D. Container

Answer:D

9. Which of the following commands is used to check the integrity of data blocks in
HDFS?

A. hdfs fsck /

B. hdfs fsck / -delete

C. hdfs dfsadmin -report

D. hdfs balancer -threshold 1

Answer:A

10. In Flink, the ( ) interface is used for stream data processing, and the ( ) interface
is used for batch processing.

A. Stream API, Batch API

B. DataStream API, DataSet API

C. Batch API, Stream API

D. DataBatch API, DataStream API

Answer:B

11. Which of the following calculation tasks is MapReduce good at dealing with?

A. Offline computing

B. Iterative computing

C. Streaming computing

D. Real-time interactive computing

Answer:A

12. If you need to view the current user and permission group of HBase in
FusionInsight HD, what command can you run in the HBase shell?

A. whoami

B. get_user

C. user_permission

D. who

Answer:A

13. Which of the following is used to divide stages when Spark applications are
running?

A. taskSet

B. task

C. shuffle

D. action

Answer:C

14. Which parameter should you configure to set the maximum resource usage of
QueueA in YARN?

A. yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent

B. yarn.scheduler.capacity.root.QueueA.user-limit-factor

C. yarn.scheduler.capacity.root.QueueA.state

D. yarn.scheduler.capacity.root.QueueA.maximum-capacity

Answer:D

15. Which statement about the common table and external table in Hive is incorrect?

A. The common table is created by default.

B. When you delete a common table, the metadata and data in the table are deleted at
the same time.

C. The external table associates file paths on HDFS with a table.

D. When you delete an external table, only the data in the external table is deleted.

Answer:D

16. Which of the following is not a transformation operation in Spark?

A. reduceByKey

B. join

C. reduce

D. distinct

Answer:C
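
The difference between a transformation and an action can be shown with a toy model. `ToyRDD` below is a hypothetical pure-Python stand-in and is not the Spark API. It only mimics the idea that transformations return a new dataset lazily, while an action such as `reduce` triggers computation and returns a plain value.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: transformations are lazy, actions run them."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # pending transformations, not yet executed

    def map(self, fn):
        # Transformation: records the step and returns a new ToyRDD.
        return ToyRDD(self._data, self._ops + ((lambda items: [fn(x) for x in items]),))

    def distinct(self):
        # Transformation: also lazy; dict.fromkeys keeps first-seen order.
        return ToyRDD(self._data, self._ops + ((lambda items: list(dict.fromkeys(items))),))

    def _compute(self):
        items = self._data
        for op in self._ops:
            items = op(items)
        return items

    def reduce(self, fn):
        # Action: runs the pending steps and returns a plain value.
        return _reduce(fn, self._compute())


rdd = ToyRDD([1, 2, 2, 3]).map(lambda x: x * 10).distinct()  # still a ToyRDD
total = rdd.reduce(lambda a, b: a + b)                        # a plain int
print(type(rdd).__name__, total)  # ToyRDD 60
```

In this model `map` and `distinct` hand back another dataset, whereas `reduce` hands back a result, which is why `reduce` is classed as an action.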

17. Which of the following programming languages is used for Spark implementation?

A. Java

B. C++

C. C

D. Scala

Answer:D

18. Which of the following service processes manages the HBase Region of the Hadoop
platform?

A. HMaster

B. ZooKeeper

C. DataNode

D. RegionServer

Answer:D

19. Which of the following is the abstraction used to represent YARN resources in the
Hadoop system?

A. Disk space

B. Container

C. Memory

D. CPU

Answer:B

20. Which statement is incorrect about the Hive log collection on the FusionInsight HD
Manager page?

A. You can specify a time period for log collection. For example, only logs generated
from January 1, 2016 to January 10, 2016 are collected.

B. You can specify the IP address of a node for log collection. For example, only logs
generated in a specified IP address are downloaded.

C. You can specify a user for log collection. For example, only logs generated by userA
are downloaded.

D. You can specify an instance for log collection, for example, a specified instance for
collecting MetaStore logs.

Answer:C

21. Which method for loading data to Hive tables is incorrect?

A. You can load files on HDFS to Hive tables.

B. You can insert the result set of other tables into Hive tables.

C. You can directly insert a single record into a Hive table by CLI.

D. You can directly load files in the local path to Hive tables.

Answer:C

22. Which of the following is not a MapReduce feature?

A. High scalability

B. Real-time computing

C. Easy to program

D. High fault tolerance

Answer:B

23. Which Hadoop module stores HDFS data?

A. NameNode

B. DataNode

C. JobTracker

D. ZooKeeper

Answer:B

24. Which of the following statements is incorrect about the function for configuring
services on FusionInsight Manager?

A. Instance-level configuration takes effect only for this instance.

B. The saved configuration takes effect after the services are restarted.

C. Service-level configuration takes effect for all instances.

D. Instance-level configuration takes effect for other instances.

Answer:D

25. Before and after the execution of Loader jobs, which node needs to communicate
with external data sources?

A. Active node of the Loader service

B. Neither A nor C

C. Node that executes YARN jobs

D. Both A and C

Answer:D

Multiple Choice

1. Which of the following processes are contained in the HBase service of FusionInsight
HD?

A. DataNode

B. Slave

C. HMaster

D. RegionServer

Answer:C D

2. Which of the following scenarios does Hive apply to?

A. Real-time online data analysis

B. Data aggregation (daily/weekly click count and click count rankings)

C. Non-real-time data analysis (log analysis and statistics analysis)

D. Data mining (user behavior analysis, interest analysis, and partition demonstration)

Answer:B C D

3. Which statements about HDFS are correct?

A. Metadata on the standby NameNode of HDFS is synchronized from the active
NameNode.

B. HDFS stores the first replica on the nearest node.

C. HDFS consists of NameNodes, DataNodes, and clients.

D. HDFS is ideal for Write Once Read Many (WORM) tasks.

Answer:B C D

4. The Hadoop-based open source big data platform provides distributed data computing
and storage capabilities. Which of the following are distributed storage components?

A. HDFS

B. MapReduce

C. HBase

D. Spark

Answer:A C

5. What are highlights of MapReduce?

A. Easy to program

B. Outstanding scalability

C. Real-time computing

D. High fault tolerance

Answer:A B D

6. When the Flume process is cascaded, which of the following sink types are used to
receive data sent by the last-hop Flume?

A. avro sink

B. Null sink

C. thrift sink

D. HDFS sink

Answer:A C

7. Which of the following methods or APIs are provided by Loader to manage tasks?

A. Web UI

B. Linux CLI

C. Java API

D. REST API

Answer:A B C D

8. Which resources can be managed by YARN in Hadoop?

A. CPU

B. Network

C. Memory

D. Disk space

Answer:A C

9. If the Hive service status is Bad on the FusionInsight HD Manager page, what are the
possible causes?

A. The HDFS service is unavailable.

B. The DBService is unavailable.

C. The Metastore instance is unavailable.

D. The HBase service is unavailable.

Answer:A B C D

10. Which of the following types of cache can be used in the search operation?

A. Query Result Cache

B. Document Cache

C. Filter Cache

D. Index Cache

Answer:A B C

11. Which statements about the standby NameNode in FusionInsight HD are correct?

A. The standby NameNode serves as the hot spare for the active NameNode.

B. The active and standby NameNodes must be deployed on the same node.

C. The standby NameNode helps the active NameNode merge edit logs, reducing the
startup time of the active NameNode.

D. The standby NameNode does not have specific memory requirements.

Answer:A C

12. Which are permanent processes of Spark?

A. SparkResource

B. JDBCServer

C. JobHistory

D. NodeManager

Answer:B C

13. When using the kafka-topics.sh alter command to alter a topic configuration, which of
the following are mandatory parameters?

A. Partition

B. Topic information

C. Topic name

D. ZooKeeper address

Answer:B C D

14. What information does the KeyValue of HBase HFiles contain?

A. Key Type

B. Value

C. Key

D. TimeStamp

Answer:A B C D
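
The parts listed in this answer can be pictured with a simplified in-memory model. This is a sketch only: it is not the on-disk byte layout of an HFile, and the class and field names are illustrative. It assumes the commonly described structure in which the key is made up of the row, column family, qualifier, timestamp, and key type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyValue:
    """Illustrative model of one HBase cell; not the real storage format."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int   # milliseconds since the epoch
    key_type: str    # for example "Put" or "Delete"
    value: bytes

    def key(self):
        # The key part groups everything except the value.
        return (self.row, self.family, self.qualifier, self.timestamp, self.key_type)


cell = KeyValue(b"row1", b"cf", b"name", 1700000000000, "Put", b"alice")
print(cell.key())
print(cell.value)
```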

15. What are the main features of the YARN capacity scheduler?

A. Flexibility

B. Multi-tenancy

C. Capacity assurance

D. Dynamic update of configuration files

Answer:A B C D
