Right or Wrong
1. RDD data generated in file systems compatible with Hadoop can be partially
updated by invoking RDD operators.
Answer:Wrong
2. If there are two queues of the same level and the same capacity on YARN, the
scheduler preferentially allocates resources to the one with more tasks. For example, the
capacities of Q1 and Q2 are 50, and Q1 has 10 tasks that occupy 40 and Q2 has 2 tasks
that occupy 30. In this case, the scheduler preferentially allocates resources to Q1.
Answer:Wrong
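Note on question 2: the arithmetic behind the answer can be sketched as follows. This is only an illustration, assuming the scheduler favors the queue with the lowest ratio of used resources to configured capacity rather than the one with more tasks. The helper `pick_queue` and the dictionary layout are hypothetical and are not part of any YARN API.

```python
# Hypothetical helper (not a YARN API): models the assumption that the
# scheduler prefers the queue with the lowest used/capacity ratio.
def pick_queue(queues):
    """Return the name of the queue with the lowest usage ratio."""
    return min(queues, key=lambda name: queues[name]["used"] / queues[name]["capacity"])


# Values taken from the question: both capacities are 50.
queues = {
    "Q1": {"capacity": 50, "used": 40, "tasks": 10},  # ratio 0.8
    "Q2": {"capacity": 50, "used": 30, "tasks": 2},   # ratio 0.6
}

preferred = pick_queue(queues)
print(preferred)
```

Under this assumption Q2 is chosen, because its usage ratio is lower. The number of tasks does not enter the comparison.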
Answer:Right
4. If a Loader job fails to be executed, the data imported during the job execution will
not be automatically deleted and must be manually deleted.
Answer:Wrong
Answer:Right
Single Choice
A. Producer
B. Broker
C. ZooKeeper
D. Consumer
Answer:B
2. Which statement is correct about the basic operations of Hive table creation?
A. Once a table has been created, you cannot change its name.
B. Once a table has been created, you cannot change its column names.
C. Once a table has been created, you cannot add columns to it.
Answer:D
3. Which of the following is the default block size of HDFS in the FusionInsight HD
system?
A. 256 MB
B. 128 MB
C. 32 MB
D. 64 MB
Answer:B
4. Which statement about file uploading from a client to HDFS in Hadoop is correct?
A. The file data of the client is transmitted to DataNodes via the NameNode.
B. The client writes files to each DataNode based on the DataNode address sequence
and then the DataNode divides files into multiple blocks.
C. The client divides files into multiple blocks and writes data to each DataNode based on
the DataNode address sequence.
D. The client only uploads data to a DataNode and the NameNode copies blocks.
Answer:C
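Note on question 4: the client-side split described in option C can be sketched as follows. This is a simplified illustration, not the HDFS client implementation. The function `split_into_blocks` is hypothetical, and the 128 MB block size is taken from question 3.

```python
# Block size assumed from question 3 (128 MB).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block.

    Hypothetical helper: it only models how a file of file_size bytes
    would be cut into fixed-size blocks on the client side.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MB file is cut into two full blocks and one partial block.
blocks = split_into_blocks(300 * 1024 * 1024)
for offset, length in blocks:
    print(offset, length)
```

Each block would then be written to the DataNodes whose addresses the NameNode returns. The NameNode itself does not carry the file data.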
5. In FusionInsight HD, when Loader imports files from the SFTP server, which of the
following file types does not require code and data conversion and delivers the most
rapid speed?
A. sequence_file
B. graph_file
C. text_file
D. binary_file
Answer:D
6. In the FusionInsight HD system, the minimum processing unit of HBase is Region.
Where is the routing information between the User Region and RegionServer stored?
A. HDFS
B. ZooKeeper
C. Master
D. META table
Answer:D
A. Junk data is generated after a Loader job fails to be executed. You need to clear it
manually.
B. If an exception occurs in the Loader service after Loader submits a job to YARN, the
job will fail to be executed.
C. After Loader submits a job to YARN and before the job execution is complete, other
jobs cannot be submitted.
D. If a Mapper task fails to be executed after Loader submits a job to YARN, the task
will retry automatically.
Answer:D
A. CPU
B. Memory
C. Disk space
D. Container
Answer:D
9. Which of the following commands is used to check the integrity of data blocks in
HDFS?
A. hdfs fsck /
Answer:A
10. In Flink, the ( ) interface is used for stream data processing, and the ( ) interface
is used for batch processing.
Answer:B
11. Which of the following calculation tasks is MapReduce good at dealing with?
A. Offline computing
B. Iterative computing
C. Streaming computing
Answer:A
12. If you need to view the current user and permission group of HBase in
FusionInsight HD, what command can you run in the HBase shell?
A. whoami
B. get_user
C. user_permission
D. who
Answer:A
13. Which of the following is used to divide stages when Spark applications are
running?
A. taskSet
B. task
C. shuffle
D. action
Answer:C
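Note on question 13: the idea that a shuffle marks a stage boundary can be sketched with a toy model. This is not Spark code. The function `split_into_stages` and the `needs_shuffle` flag are hypothetical, and the model simplifies by placing the whole shuffle operator in the new stage.

```python
# Toy model (not the Spark scheduler): each step is (operator, needs_shuffle).
def split_into_stages(lineage):
    """Group operators into stages, starting a new stage at every shuffle."""
    stages = [[]]
    for op, needs_shuffle in lineage:
        if needs_shuffle:
            stages.append([])  # a shuffle boundary opens a new stage
        stages[-1].append(op)
    return stages


lineage = [
    ("textFile", False),
    ("flatMap", False),
    ("map", False),
    ("reduceByKey", True),  # wide dependency, requires a shuffle
    ("map", False),
]

stages = split_into_stages(lineage)
for index, ops in enumerate(stages):
    print(index, ops)
```

In this model the narrow operators before `reduceByKey` are pipelined into one stage, and everything from the shuffle onward forms a second stage.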
14. Which parameter should you configure to set the maximum resource usage of
QueueA in YARN?
A. yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent
B. yarn.scheduler.capacity.root.QueueA.user-limit-factor
C. yarn.scheduler.capacity.root.QueueA.state
D. yarn.scheduler.capacity.root.QueueA.maximum-capacity
Answer:D
15. Which statement about the common table and external table in Hive is incorrect?
B. When you delete a common table, the metadata and data in the table are deleted at
the same time.
D. When you delete an external table, only the data in the external table is deleted.
Answer:D
A. reduceByKey
B. Join
C. reduce
D. distinct
Answer:C
17. Which of the following programming languages is used for Spark implementation?
A. Java
B. C++
C. C
D. Scala
Answer:D
18. Which of the following service processes manages the HBase Region of the Hadoop
platform?
A. HMaster
B. ZooKeeper
C. DataNode
D. RegionServer
Answer:D
19. Which of the following is the abstraction of YARN resources in the Hadoop
system?
A. Disk space
B. Container
C. Memory
D. CPU
Answer:B
20. Which statement is incorrect about the Hive log collection on the FusionInsight HD
Manager page?
A. You can specify a time period for log collection. For example, only logs generated
from January 1, 2016 to January 10, 2016 are collected.
B. You can specify the IP address of a node for log collection. For example, only logs
generated in a specified IP address are downloaded.
C. You can specify a user for log collection. For example, only logs generated by userA
are downloaded.
D. You can specify an instance for log collection, for example, a specified instance for
collecting MetaStore logs.
Answer:C
B. You can insert the result set of other tables into Hive tables.
C. You can directly insert a single record into a Hive table by CLI.
D. You can directly load files in the local path to Hive tables.
Answer:C
22. Which of the following is not a MapReduce feature?
A. High scalability
B. Real-time computing
C. Easy to program
Answer:B
A. NameNode
B. DataNode
C. JobTracker
D. ZooKeeper
Answer:B
24. Which of the following statements is incorrect about the function for configuring
services on FusionInsight Manager?
B. The saved configuration takes effect after the services are restarted.
Answer:D
25. Before and after the execution of Loader jobs, which node needs to communicate
with external data sources?
B. Neither A nor B
D. Both A and B
Answer:D
Multiple Choice
1. Which of the following processes are contained in the HBase service of FusionInsight
HD?
A. DataNode
B. Slave
C. HMaster
D. RegionServer
Answer:C D
D. Data mining (user behavior analysis, interest analysis, and partition demonstration)
Answer:B C D
Answer:B C D
4. The Hadoop-based open source big data platform provides distributed data computing
and storage capabilities. Which of the following are distributed storage components?
A. HDFS
B. MapReduce
C. HBase
D. Spark
Answer:A C
5. What are highlights of MapReduce?
A. Easy to program
B. Outstanding scalability
C. Real-time computing
Answer:A B D
6. When the Flume process is cascaded, which of the following sink types are used to
receive data sent by the last-hop Flume?
A. avro sink
B. Null sink
C. thrift sink
D. HDFS sink
Answer:A C
7. Which of the following methods or APIs are provided by Loader to manage tasks?
A. Web UI
B. Linux CLI
C. Java API
D. REST API
Answer:A B C D
A. CPU
B. Network
C. Memory
D. Disk space
Answer:A C
9. If the Hive service status is Bad on the FusionInsight HD Manager page, what are the
possible causes?
Answer:A B C D
10. Which of the following types of cache can be used in the search operation?
B. Document Cache
C. Filter Cache
D. Index Cache
Answer:A B C
11. Which statements about the standby NameNode in FusionInsight HD are correct?
A. The standby NameNode serves as the hot spare for the active NameNode.
B. The active and standby NameNodes must be deployed on the same node.
C. The standby NameNode helps the active NameNode merge edit logs, reducing the
startup time of the active NameNode.
Answer:A C
A. SparkResource
B. JDBCServer
C. JobHistory
D. NodeManager
Answer:B C
13. When using the kafka-topics.sh --alter command to alter a topic configuration, which of
the following are mandatory parameters?
A. Partition
B. Topic information
C. Topic name
D. Zookeeper address
Answer:B C D
A. Key Type
B. Value
C. Key
D. TimeStamp
Answer:A B C D
15. What are the main features of the YARN capacity scheduler?
A. Flexibility
B. Multi-tenancy
C. Capacity assurance
Answer:A B C D