You are on page 1of 64

1.

Before you create a Jupyter notebook in Watson Studio, which two items are
necessary? (Two answers)
A. Project
B. file
C. scala
D. url
E. spark instance
2.What is the architecture of Watson Studio centered on?
A. Collaborator
B. Projects
C. Analytic assets
D. data assets
3.Which Watson Studio offering used to be available through something known as
IBM Bluemix?
A. Watson Studio Cloud
B. // // Local
C. // // Desktop
D. // // Business
4.You need to add a collaborator to your project. What do you need?
A. The email of the collaborator
B. The list of deployments
C. list of your saved bookmarks
D. your project ID
5.Where does the unstructured data of a project reside in Watson Studio?
A. Wrapper
B. Tables
C. Database
D. Object Storage
6.What does the user interface for Jupyter look like to a user?
A. Common desktop app
B. Database Interface
C. App in web browser
D. Linux SSH session
7.What is a “magic” command used for in Jupyter?
A. Running common statistical analyses.
B. Autoconfig
C. extending the core language with shortcuts
D. Parsing and loading data into a notebook.
9.Why might a data scientist need a particular kind of GPU (graphics processing
unit)?
A. To display a simple bar chart of data on the screen
B. to collect video for use in streaming data applications
C. to perform certain data transformations quickly
D. to input commands to a data science notebook
10.What is a markdown cell used for in a data science notebook?
A. Documenting the computational process
B. writing code to transform data
C. holding the output of a computation
D. configuring data connections
11.Which is an advantage that Zeppelin holds over Jupyter?
A. Users must auth
B. Notebooks can be used by multiple ppl at the same time
C. notebooks can be connected to big data engines such as spark
D. Zeppelin is able to use the R language.
12.Which statement about Apache Spark is true?
A. it supports HDFS, MS-SQL, and Oracle
B. it is much faster than MapReduce for complex applications on disk
C. It runs on hadoop clusters with ram drives configured on each DataNode
D. It features API’S for C++ and .NET
13.Which two are valid watches for ZNodes in ZooKeeper?
A. NodeExpired.
B. NodeDeleted.
C. NodeRefreshed
D. NodeChildrenChanged
14.Which component of an Hadoop system is the primary cause of poor
performance?
A. disk latency
B. network
C. CPU
D. Ram
15.What is an example of a NoSQL datastore of the “Document Store” type?
A. Cassandra(column stores)
B. REDIS(key value)
C. HBase(column store)
D. MongoDB(document store)
16.What is the preferred replacement for Flume?
A. Hortonworks Data Flow
B. Storm
C. NiFi
D. Druid
17.Which statement describes an example of an application using streaming data?
A. An application evaluating sensor data in real-time
B. one time export and import of a db
C. a web application that supports 10,000 users
D. a system that stores many records in a database
18.Which Spark Core function provides the main element of Spark API?
A. MLlib
B. RDD
C. Mesos
D. YARN
19.Which statement is true about the combiner phase of the MapReduce architecture?
A. it aggregates all input data before it goes through the Map Phase
B. it reduces the amount of data that is sent to the Reducer task nodes
C. It determines the size and distribution of data split in the map phase
D. It is performed after the reducer phase to produce the ...
20.Apache Spark provides a single, unifying platform for which three of the following
types of operations? (Three answers)
A. transaction processing
B. ACID transaction
C. machine learning
D. record locking
E. batch processing
F. graph operations
21.Which is the java class prefix for the MapReduce v1 APIs?
A. org.apache.hadoop.mapred(v1)
B. org.apache.mr
C. org.apache.hadoop.mr
D. org.apache.mapreduce(v2)
22.Which two factors in a Hadoop cluster increase performance most significantly?
A. solid state disks
B. immediate failover of failed disk
C. parallel reading of large data
D. data redundancy on management nodes
E. high-speed networking between nodes
F. large number of small data
23.Which statement accurately describes how ZooKeeper works?
A. There can be more than one leader server at a time.
B. .all servers keep a copy of shared data memory
C. clients connect to multiple servers at the same time
D. Writes to a leader server will always succeed
24.Hadoop 2 consists of which three open-source sub-projects maintained by the
Apache Software Foundation?
A. HDFS
B. Hive
C. BigSQL
D. MapReduce
E. YARN
F. CloudBreak
25.Hadoop uses which two Google technologies as its foundation?
A. Google File System
B. HBase
C. YARN
D. Ambari
E. MapReduce
26.Which component of the Apache Ambari architecture provides statistical data to
the dashboard about the performance of a Hadoop cluster?

A. Ambari Server
B. Ambari metric system

27.What are two security features Apache Ranger provides?


Authorization, auditing

28.Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks
among all the applications in the system?
ResourceManager
29.Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns?
A. Sequence files
B. Parquet
C. RCFile
D. Flat
30.Which feature makes Apache Spark much easier to use than MapReduce?

A. Suitable for transaction processing.


B. Applications run in-memory
C. Libraries that support SQL queries
D. APIs for Scala, Python, C++, and .NET

31.Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases?
A. Oozie
B. Sqoop
C. Accumulo
D. HBase
32.Which two are attributes of streaming data?
A. Simple, numeric data.
B. Data is processed in batch.
C. Requires extremely rapid processing.
D. Sent in high volume.
33.Which type of cell can be used to document and comment on a process in a
Jupyter notebook?
A. output
B. markdown
C. code
D. kernel
34.Which component of the Spark Unified Stack allows developers to intermix
structured database queries with Spark's programming language?

A. MLlib
B. Mesos
C. Java
D. Spark SQL

35.What is the default directory in HDFS where tables are stored?

A. /app/hive/warehouse/data
B. /app/hive/warehouse/bigsql
C. /app/hive/warehouse/schema
D. /app/hive/warehouse/

36.What is an advantage of the ORC file format?

A. Big SQL can exploit advanced features


B. Efficient compression
C. Supported by multiple I/O engines
D. Data interchange outside Hadoop

37.What two security functions does Apache Knox provide?


A. Management of Kerberos in the cluster.
B. Database field access auditing.
C. Proxying services.
D. API and perimeter security.

38.Which Apache Hadoop application provides an SQL-like interface to allow


abstraction of data on semi-structured data in a Hadoop datastore?

A. YARN
B. Spark
C. Hive
D. Pig

39.Which two commands would you use to give or remove certain privileges to/from a
user?

A. REVOKE
B. GRANT
C. INSERT
D. LOAD
E. SELECT
40.When connecting to an external database in a federation, you need to use the
correct database driver and protocol. What is this federation component called in Big
SQL?
wrapper

41.Using the Java SQL Shell, which command will connect to a database called
mybigdata?
./jsqsh mybigdata

42.Which Big SQL feature allows users to join a Hadoop data set to data in external
databases?
A. Integration
B. Impersonation
C. Grant/Revoke privileges
D. Fluid query

43.Which definition best describes RCAC?


A. It limits the rows or columns returned based on certain criteria.
B. It grants or revokes certain user privileges.
C. It grants or revokes certain directory privileges.
D. It limits access by using views and stored procedures.

44.You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true?
bigsql.alltables.io.doAs
bigsql.impersonation.create.table.grant.public

45.Which command creates a user-defined schema function?


CREATE FUNCTION

46.You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format?
STORED AS parquetfile

47.Which tool would you use to create a connection to your Big SQL database?
DSM
48.Which directory permissions need to be set to allow all users to create their own
schema?
777

49.How many Big SQL management nodes do you need at minimum?


A. 1
B. 4
C. 3
D. 2

50.You need to determine the permission setting for a new schema directory. Which
tool would you use?
umask

51.What are Big SQL database tables organized into?


Schemas

52.What is the first step in a data science pipeline?


acquisition

53.What command is used to list the "magic" commands in Jupyter?


%lsmagic

54.Under the MapReduce v1 programming model, which optional phase is executed


simultaneously with the Shuffle phase?
combiner

55.Which hardware feature on an Hadoop datanode is recommended for cost efficient


performance?
JBOD

56.Which statement is true about Hortonworks Data Platform (HDP)?


It is a Hadoop distribution based on a centralized architecture with YARN at its core.

57.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
58.What are two ways the command-line parameters for a Sqoop invocation can be
simplified?
Include the --options-file command line argument.
Place the commands in a file.

59.If a Hadoop node goes down, which Ambari component will notify the
Administrator?
Ambari Alert Framework
60.Which three programming languages are directly supported by Apache Spark?
(Three answers)
Scala
Python
Java

61.Which Apache Hadoop application provides a high-level programming language


for data transformation on unstructured data?
Pig

62.Which component of an Hadoop system is the primary cause of poor


performance?
disk latency

63.Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are


components of which daemon?
ResourceManager

65.What is the final agent in a Flume chain named?


Collector

66..Which component of the Hortonworks Data Platform (HDP) is the architectural


center of Hadoop and provides resource management and a central platform for
Hadoop applications?
YARN

68.What are two primary limitations of MapReduce v1?


Scalability
Resource utilization

69.Which description characterizes a function provided by Apache Ambari?


A wizard for installing Hadoop services on host servers.

70.How can a Sqoop invocation be constrained to only run one mapper?


Use the -m 1 parameter.
71.What does the split-by parameter tell Sqoop?
The column to use as the primary key

72.Which NoSQL datastore type began as an implementation of Google's BigTable


that can store any type of data and scale to many petabytes?
HBase

73.Under the MapReduce v1 programming model, which shows the proper order of
the full set of MapReduce phases?
Map -> Combine -> Shuffle -> Reduce

74.Which computing technology provides Hadoop's high performance?


Parallel Processing

75.Which statement is true about MapReduce v1 APIs?


MapReduce v1 APIs are implemented by applications which are largely independent
of the execution environment.

76.Under the YARN/MRv2 framework, the JobTracker functions are split into which
two daemons?
ResourceManager
ApplicationMaster

77.Which component of the Apache Ambari architecture integrates with an


organization's LDAP or Active Directory service?
Authorization Provider

78.Which Apache Hadoop component can potentially replace an RDBMS as a large


Hadoop datastore and is particularly good for "sparse data"?
HBase

79.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
80.What are three IBM value-add components to the Hortonworks Data Platform
(HDP)?
Big Replicate
Big Match
Big SQL

81.Which three are a part of the Five Pillars of Security?


Audit
Data Protection
Administration

82.What are two services provided by ZooKeeper?


Providing distributed synchronization.
Maintaining configuration information.

83.Under the YARN/MRv2 framework, which daemon is tasked with negotiating with
the NodeManager(s) to execute and monitor tasks?
A. TaskManager
B. ResourceManager
C. JobMaster
D. ApplicationMaster

84.Which statement best describes a Big SQL database table?


A directory with zero or more data files.

85.You have a distributed file system (DFS) and need to set permissions on the the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run?

A. hdfs dfs -chmod 755 /hive/warehouse


B. hdfs dfs -chmod 666 /hive/warehouse
C. hdfs dfs -chmod 770 /hive/warehouse
D. hdfs dfs -chmod 700 /hive/warehouse

86.What is an example of a Key-value type of NoSQL datastore?


REDIS

87.Under the MapReduce v1 programming model, what happens in a "Reduce" step?


Data is aggregated by worker nodes.
88.Apache Spark can run on which two of the following cluster managers? (Two
answers)
Hadoop YARN
Apache Mesos

89.What are two common issues in distributed systems? (Two answers)


Finding a particular node within the cluster.
Partial failure of the nodes during execution.

90.Apache Spark can run on which two of the following cluster managers?
A. Apache Mesos
B. oneSIS
C. Nomad
D. Linux Cluster Manager
E. Hadoop YARN
V11.2
Unit 1. Introduction to big data

Uempty

Review answers
1. True or False: the number of Vs of big data are exactly four.

2. Data that can be stored and processed in a fixed format is


called:
A. Structured
B. Semi-structured
C. Unstructured
D. Machine generated

3. True or False: Agriculture is one of the industry sectors that


are using big data and analytics to help to improve and
transform their industries.

Introduction to big data © Copyright IBM Corporation 2021

Figure 1-46. Review answers

© Copyright IBM Corp. 2016, 2021 1-60


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 1. Introduction to big data

Uempty

Review answers (cont.)


4. Hadoop is good for:
A. Processing transactions (random access)
B. Massive amounts of data through parallelism
C. Processing lots of small files
D. Intensive calculations with little data
E. Low latency data access

5. True or False: One of Hadoop main characteristics is that


applications are written in low-level language code.

Introduction to big data © Copyright IBM Corporation 2021

Figure 1-47. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 1-61


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Introduction to Hortonworks Data Platform (HDP)

Uempty

Review answers
1. Which of thes components of HDP provides data access
capabilities?
A. MapReduce
B. Falcon
C. Ranger
D. Ambari
2. Identify the component that is a messaging system used for
real-time data pipelines
A. Nifi
B. Sqoop
C. Kafka
D. None of the following
3. True or False: Big Match is added value from IBM.

4. True or False: IBM BigIntegrate Provides data quality


features of Information Server.

Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

Figure 2-57. Review answers

© Copyright IBM Corp. 2016, 2021 2-68


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Introduction to Hortonworks Data Platform (HDP)

Uempty

Review answers
5. IBM BigQuality provides scalable engine to
A. Manage
B. Design
C. Connect
D. Cleanse

Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

Figure 2-58. Review answers

© Copyright IBM Corp. 2016, 2021 2-69


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 3. Introduction to Apache Ambari

Uempty

Review answers
1. True or False: Apache Ambari is backed by RESTful APIs for
developers to easily integrate with their own applications.
2. Which functions does AMS provide?
A. Monitors the health and status of the Hadoop cluster.
B. Starts, stops, and reconfigures Hadoop services across the cluster.
C. Collects, aggregates, and serves Hadoop and system metrics.
D. Handles the configuration of Hadoop services for the cluster.
3. Which page from the Apache Ambari UI enables you to check the
versions of the software that is installed on your cluster?
A. Cluster Admin > Stack and Versions
B. admin > Service Accounts
C. Services
D. Hosts

Introduction to Apache Ambari © Copyright IBM Corporation 2021

Figure 3-30. Review answers

Write your answers here:


1.
2.
3.

© Copyright IBM Corp. 2016, 2021 3-37


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 3. Introduction to Apache Ambari

Uempty

Review answers (cont.)


4. True or False: Creating users through the Apache Ambari Web
UI also creates the user on the HDFS.
5. True or False: You can use the cURL commands to issue
commands to Apache Ambari.

Introduction to Apache Ambari © Copyright IBM Corporation 2021

Figure 3-31. Review answers (cont.)

Write your answers here:


1.
2.
3.

© Copyright IBM Corp. 2016, 2021 3-38


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 4. Apache Hadoop and HDFS

Uempty

Review answers
1. True or False: Hadoop systems are designed for using a
single server.
2. What is the default number of replicas in a Hadoop system?
A. 1
B. 2
C. 3
D. 4
3. True or False: One of the Hadoop goals is fault tolerance by
detecting faults and applying quick and automatic recovery.
4. True or False: At least two NameNodes are required for a
stand-alone Hadoop cluster.
5. The default Hadoop block size is:
A. 16
B. 32
C. 64
D. 128

Apache Hadoop and HDFS © Copyright IBM Corporation 2021

Figure 4-34. Review answers

© Copyright IBM Corp. 2016, 2021 4-49


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 5. MapReduce and YARN

Uempty

Review answers
1. Which of the following phases in a MapReduce job is
optional?
A. Map
B. Shuffle
C. Reduce
D. Combiner
2. True or False: Interactive, online, and streaming
applications are not allowed to run on Hadoop v2
3. The JobTracker in MRv1 is replaced by which components
in YARN? (Select all that apply.)
A. ResourceManager
B. NodeManager
C. ApplicationMaster
D. TaskTracker

MapReduce and YARN © Copyright IBM Corporation 2021

Figure 5-63. Review answers

Write your answers here:


1. D. (slide 13)
2. False (slide 48)
3. A. and C

© Copyright IBM Corp. 2016, 2021 5-75


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 5. MapReduce and YARN

Uempty

Review answers (cont.)


4. True or False: The major change from Hadoop v1 to
Hadoop v2 is the separation of cluster and resource
management from the execution and data processing
environment.
5. True or False: It is possible to run unmodified MapReduce
v1 jobs by using the same MapReduce API and CLI in
Hadoop v2.

MapReduce and YARN © Copyright IBM Corporation 2021

Figure 5-64. Review answers (cont.)

Write your answers here:


4. True (slide 57)
5. True (slide 49)

© Copyright IBM Corp. 2016, 2021 5-76


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 6. Introduction to Apache Spark

Uempty

Review answers
1. True or False: Ease of use is one of the benefits of using
Apache Spark.
2. Which language is supported by Apache Spark?
A. C++
B. C#
C. Java
D. Node.js
3. True or False: Scala is the primary abstraction of Apache
Spark.
4. In RDD actions, which function returns all the elements of
the data set as an array of the driver program?
A. Collect
B. Take
C. Count
D. Reduce
5. True or False: Referencing a data set is one of the methods
to create RDD.
Introduction to Apache Spark © Copyright IBM Corporation 2021

Figure 6-62. Review answers

© Copyright IBM Corp. 2016, 2021 6-84


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 7. Storing and querying data

Uempty

Review answers
1. What is the data representation format of an RC or ORC
file?
A. Row-based encoding
B. Record-based encoding
C. Column-based storage
D. NoSQL data store
2. True or False: A NoSQL database is designed for those
developers that do not want to use SQL.
3. HBase is an example of which of the following NoSQL data
store type?
A. Key-value store
B. Graph store
C. Column store
D. Document store

Storing and querying data © Copyright IBM Corporation 2021

Figure 7-71. Review answers

Write your answers here:


1.
2.
3.

© Copyright IBM Corp. 2016, 2021 7-86


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 7. Storing and querying data

Uempty

Review answers
4. Which database provides an SQL for Hadoop interface?
A. Hbase
B. Apache Hive
C. Cloudant
D. MongoDB
5. True or False: R is a real programming language, and
Python is an interactive environment for doing statistics.

Storing and querying data © Copyright IBM Corporation 2021

Figure 7-72. Review answers

Write your answers here:


1.
2.
3.

© Copyright IBM Corp. 2016, 2021 7-87


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 8. Security and governance

Uempty

Review answers
1. Kerberos is used by Hadoop for:
A. Authentication
B. Authorization
C. Auditing
D. Data protection
2. ______ is used by Hadoop for API and perimeter security.
A. Apache Ambari
B. Apache Knox
C. Apache Ranger
D. Data Steward Studio
3. True or False: Kerberos provides automation and
management of Apache Ambari in the Hadoop cluster.

Security and governance © Copyright IBM Corporation 2021

Figure 8-23. Review answers

© Copyright IBM Corp. 2016, 2021 8-30


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 8. Security and governance

Uempty

Review answers (cont.)


4. ______ is a common set of services to manage, secure, and
govern data assets across multiple tiers and types.
A. Data Services Catalog
B. Data Lifecycle Manager
C. DataPlane Service
D. Data Steward Studio
5. True or False: Ethnic or racial origin and cards or numbers are
types of sensitive personal information (SPI).

Security and governance © Copyright IBM Corporation 2021

Figure 8-24. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 8-31


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 9. Stream computing

Uempty

Review answers
1. True or False: IBM Streams needs Apache Storm or Apache
Spark to provide the analytics
2. True or False: Streaming data is limited to sensors,
cameras, and video.
3. What are the differences between NiFi and MiNiFi?
A. NiFi is small and has low resource consumption.
B. NiFi is subproject of MiNiFi.
C. NiFi is a disk-based and microbatch ETL tool.
D. They are the same.

Stream computing © Copyright IBM Corporation 2021

Figure 9-30. Review answers

© Copyright IBM Corp. 2016, 2021 9-42


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 9. Stream computing

Uempty

Review answers (cont.)


4. True or False: Development support is one of the features
that IBM Streams provides as a streaming data platform.
5. True or False: IBM Streams uses a Java engine.

Stream computing © Copyright IBM Corporation 2021

Figure 9-31. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 9-43


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 1. Using IBM Db2 Big SQL to access Apache Hadoop Distributed File System data

Uempty

Review answers
1. What are the reasons that your organization would use Db2
Big SQL?
A. Want to access your Hadoop data without using MapReduce.
B. It has superior SQL-on-Hadoop performance to optimize data
ingestion and query performance for your enterprise.
C. No deep learning curve because it is compliant with industry
standards for SQL.
D. No need to change the format or migrate the data out of
Hadoop to do any work on the data.
E. All the above.
2. True or False: For optimal performance and function, the
ORC file format is recommended for Db2 Big SQL.
3. True or False: You can access Db2 Big SQL through Db2
Big SQL console only.

Using IBM Db2 Big SQL to access Apache Hadoop Distributed File
© Copyright IBM Corporation 2021
System data

Figure 1-25. Review answers

© Copyright IBM Corp. 2016, 2021 1-36


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 1. Using IBM Db2 Big SQL to access Apache Hadoop Distributed File System data

Uempty

Review answers (cont.)


4. What is the feature that Db2 Big SQL uses to query data
efficiently on Hadoop and combine data that is spread
across different enterprise data warehouses?
A. Self-tuning memory manager (STMM).
B. Query optimizer.
C. Federation capability.
D. Workload manager (WLM).
5. Which of the following items is responsible for
communicating with the Hive metastore to gather metadata,
and to bridge the RDBMS domain and the Hadoop domain?
A. Db2 Big SQL head.
B. SQL engine's query optimizer.
C. Db2 Big SQL worker.
D. Db2 Big SQL scheduler.

Using IBM Db2 Big SQL to access Apache Hadoop Distributed File
© Copyright IBM Corporation 2021
System data

Figure 1-26. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 1-37


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Creating IBM Db2 Big SQL schemas and tables

Uempty

Review answers
1. True or False: The INSERT operation is the recommended
method for getting data into your Db2 Big SQL table for best
performance.

2. What does the EXTERNAL keyword indicate, and when it is


added to the CREATE TABLE statement?
A. The table is not managed by the database manager.
B. When the table is dropped, the definition of that table is
removed, but the data remains untouched.
C. It specifies the directory within the DFS to store the data files.
D. A and B.

3. True or False: Using the default STRING data type causes


performance degradation.

Creating IBM Db2 Big SQL schemas and tables © Copyright IBM Corporation 2021

Figure 2-30. Review answers

© Copyright IBM Corp. 2016, 2021 2-43


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Creating IBM Db2 Big SQL schemas and tables

Uempty

Review answers (cont.)


4. Which of the flowing data types does Db2 Big SQL support?
A. TINYINT.
B. CHAR.
C. TIMESTAMP.
D. ARRAY.
E. All the above.

5. True or False: The PARTITIONED BY clause creates a


directory in the DFS that contains data for each unique value
in the column that is being partitioned, which enhances
performance for the query's search criteria.

Creating IBM Db2 Big SQL schemas and tables © Copyright IBM Corporation 2021

Figure 2-31. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 2-44


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 3. File formats and querying IBM Db2 Big SQL tables

Uempty

Review answers
1. Which file format is supported by Db2 Big SQL?
A. Text
B. Sequence
C. Parquet
D. ORC
E. Avro
F. All the above

2. Which file format provides excellent performance when used


with Db2 Big SQL?
A. Parquet
B. Avro
C. ORC
D. A and B
E. A and C

File formats and querying IBM Db2 Big SQL tables © Copyright IBM Corporation 2021

Figure 3-36. Review answers

© Copyright IBM Corp. 2016, 2021 3-61


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 3. File formats and querying IBM Db2 Big SQL tables

Uempty

Review answers (cont.)


3. True or False: Parquet relies on schemas for communicating
the structure of the data.

4. Which file format maintains extra metadata to recognize


record boundaries?
A. Text
B. Sequence
C. Parquet
D. ORC
E. Avro.

5. True or False: The ORC format uses type-specific encoders


for each column and divides the file into large stripes.
.

File formats and querying IBM Db2 Big SQL tables © Copyright IBM Corporation 2021

Figure 3-37. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 3-62


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 4. Configuring IBM Db2 Big SQL security

Uempty

Review answers
1. You can control authorization of Db2 Big SQL objects on
which of the following levels?
A. With authorization in the distributed file system.
B. Authorization with the GRANT command.
C. Authorization at the row and column levels.
D. By using VIEWS or STORED PROCEDURES
E. All the above.

2. True or False: You can Kerberize the cluster only after


installing Db2 Big SQL.

3. True or False: RCAC controls access to a table at the row


level, column level, or both.

Configuring IBM Db2 Big SQL security © Copyright IBM Corporation 2021

Figure 4-48. Review answers

© Copyright IBM Corp. 2016, 2021 4-66


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 4. Configuring IBM Db2 Big SQL security

Uempty

Review answers (cont.)


4. Which of the following items is true about Apache Ranger?
A. You must first install Apache Ranger, and then you can enable
the Db2 Big SQL Apache Ranger plug-in
B. It is a framework to enable, monitor, and manage
comprehensive data security across the Hadoop platform.
C. It can audit all access to Db2 Big SQL tables, views, and
nicknames.
D. All the above.

5. True or False: UMASK is used to determine the permissions


settings for new directories and files that are created.

Configuring IBM Db2 Big SQL security © Copyright IBM Corporation 2021

Figure 4-49. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 4-67


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 5. Data federation with IBM Db2 Big SQL

Uempty

Review answers
1. What are the characteristics of a federated system?
A. Transparent
B. Extensible
C. Autonomous
D. High performance
E. High function
F. All the above

2. Which of the following products are supported data sources?


A. Db2
B. Oracle
C. Teradata
D. MS SQL Server
E. All the above

3. True or False: User mappings are used to authenticate to


the remote data source.

Data federation with IBM Db2 Big SQL © Copyright IBM Corporation 2021

Figure 5-25. Review answers

© Copyright IBM Corp. 2016, 2021 5-37


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 5. Data federation with IBM Db2 Big SQL

Uempty

Review answers (cont.)


4. True or False: The federated database manages the
federated system and often sends parts of the requests it
receives to the data sources for processing.

5. Which of the following items is responsible for defining the


data sources to the federated database, besides defining the
property and values of the connection?
A. Wrappers
B. Server definitions
C. User mappings
D. Nicknames

6. True or False: Nicknames can be used for wrappers and


servers.

Data federation with IBM Db2 Big SQL © Copyright IBM Corporation 2021

Figure 5-26. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 5-38


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 1. Introduction to IBM Watson Studio

Uempty

Review answers
1. True or False: Watson Studio is designed only for the data
scientist persona.

2. True or False: Object storage provides the space where


unstructured data for your project is stored.

3. True or False: Collaborators can be given different access


levels.

4. True or False: Collaboration within Watson Studio is an


optional add-on component that must be purchased.

5. True or False: Watson Studio Gallery provides access to


sample Notebooks and data sets that you can use.

Introduction to IBM Watson Studio © Copyright IBM Corporation 2021

Figure 1-34. Review answers

1. True or false: Watson Studio is designed only for the data scientist persona.
False: The data engineer, the data scientist, the business analyst, and the app developer can
use Watson Studio. Each persona has primary tools that help them do their job. For example,
the data scientist's main tool is Watson Studio, but other personas can use it as it relates to their
jobs. Also, there might be a team of different personas. Whatever the case is, you must decide
what tool is correct for the job regardless of the personas. The definitions of personas can vary
between different companies and evolve over time.
2. True or False: Object storage provides the space where unstructured data for your project is
stored.
3. True or False: Collaborators can be given certain access levels.
True: Add collaborators to your project by using their email addresses. If they have an existing
account on IBM Cloud, they are added immediately. Otherwise, they receive an invite to create
a Watson Studio account. Choose the permissions for each collaborator. The Admin role can
control project assets, collaborators, and settings. The Editor role can control project assets.
The Viewer role can view the project. Collaborators can be removed from a project or have their
permissions updated.
4. True or False: Collaboration within Watson Studio is an optional add-on component that must
be purchased.
False: Collaboration is a standard feature in Watson Studio.

© Copyright IBM Corp. 2016, 2021 1-56


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Analyzing data with IBM Watson Studio

Uempty

Review answers
1. Which of the following statements are true about Apache Spark
environments in Watson Studio?
A. Apache Spark environments can be accessed by using the S3
API.
B. Apache Spark environments define the hardware and software
configurations to start custom Apache Spark clusters on demand.
C. Apache Spark environments provide the storage that is required
by Watson Studio projects.
D. Apache Spark environments in Watson Studio are used by tools
such as Notebooks
E. Users must always create their own Apache Spark environment
definition.
F. Watson Studio provides default Apache Spark environment
definitions to get started quickly.
2. True or False: Multiple Notebooks cannot be associated with
the same Apache Spark environment definition.
3. True or False: /tmp is not a shared file system and cannot be
accessed by Apache Spark executors.

Analyzing data with IBM Watson Studio © Copyright IBM Corporation 2021

Figure 2-42. Review answers

Write your answers here:


1. B, D, and F
2. False (see slide 8)
3. True (see slide 10)

© Copyright IBM Corp. 2016, 2021 2-56


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 2. Analyzing data with IBM Watson Studio

Uempty

Review answers (cont.)


4. True or False: IBM Cloud Block Storage can be associated
with a Watson Studio project.
5. ____ are used to organize the data in Cloud Object Storage.
A. Objects
B. Device sets
C. Buckets
D. Storage pools
E. IOPS.

Analyzing data with IBM Watson Studio © Copyright IBM Corporation 2021

Figure 2-43. Review answers (cont.)

Write your answers here:


4. False. Currently, the only choice for project storage is IBM Cloud Object Storage.
5. C.

© Copyright IBM Corp. 2016, 2021 2-57


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
02/02/2023 23:17 ChatGPT Conversation

ChatGPT Conversation
Time: 2023-02-02T22:15:20.570Z

Generated by ChatGPT Exporter

blockchain :pptx The response must be short.

I'm sorry, I am a text-based AI and am unable to create or provide PowerPoint


presentations. However, I can provide information on blockchain technology if you
have any specific questions.

Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.

c. Requires extremely rapid processing. d. Sent in high volume.

What are two primary limitations of MapReduce v1? a. Number of TaskTrackers


limited to 1,000 b. Resource utilization c. Workloads limited to MapReduce d.
Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs

d. Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs.

Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System

d. Ambari Metrics System.

Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL

d. Spark SQL.
file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 1/10
02/02/2023 23:17 ChatGPT Conversation

Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are


components of which daemon? a. ResourceManager b. ApplicationMaster c.
ScheduleManager d. TaskManager

a. ResourceManager.

Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns? a. SequenceFiles b. RCFile c.
Parquet d. Flat

a. SequenceFiles.

Hadoop uses which two Google technologies as its foundation? a. HBase b. Google
File System c. YARN d. MapReduce e. Ambari

d. MapReduce b. Google File System.

What are two security features Apache Ranger provides? a. Authentication b.


Availability c. Authorization d. Auditing

a. Authentication c. Authorization.

Which component of the HDFS architecture regulates client access to files? a.


DataNode b. NameNode c. SlaveNode d. WorkerNode

b. NameNode.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 2/10
02/02/2023 23:17 ChatGPT Conversation

Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety

c. Veracity.

Which statement is true about MapReduce v1 APIs? a. MapReduce v1 APIs provide


a flexible execution environment to run MapReduce. b. MapReduce v1 APIs define
how MapReduce jobs are executed. c. MapReduce v1 APIs cannot be used with
YARN. d. MapReduce v1 APIs are implemented by applications which are largely
independent of the execution environment.

b. MapReduce v1 APIs define how MapReduce jobs are executed.

Which statement is true about the Combiner phase of the MapReduce architecture?
a. It is performed after the Reducer phase to produce the final output. b. It reduces
the amount of data that is sent to the Reducer task nodes. c. It aggregates all input
data before it goes through the Map phase. d. It determines the size and distribution
of data split in the Map phase.

b. It reduces the amount of data that is sent to the Reducer task nodes.

Which description characterizes a function provided by Apache Ambari? a. Moves


large amounts of streaming event data. b. Moves information to/from structured
databases. c. A wizard for installing Hadoop services on host servers. d. A messaging
system for real-time data pipelines.

c. A wizard for installing Hadoop services on host servers.

Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.
ResourceManager c. TaskManager d. JobMaster

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 3/10
02/02/2023 23:17 ChatGPT Conversation

a. ApplicationMaster.

What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML

b. Unstructured data.

Which Spark Core function provides the main element of Spark API? a. MLlib b.
Mesos c. YARN d. RDD

d. RDD (Resilient Distributed Datasets).

Hadoop 2 consists of which three open-source sub-projects maintained by the


Apache Software Foundation? a. Big SQL b. Hive c. YARN d. MapReduce e.
Cloudbreak f. HDFS

Hadoop 2 consists of three open-source sub-projects maintained by the Apache


Software Foundation: HDFS, YARN, and MapReduce.

Which Apache Hadoop application provides an SQL-like interface to allow abstraction


of data on semi-structured data in a Hadoop datastore? a. YARN b. Pig c. Hive d.
Spark

c. Hive

Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager

Under the YARN/MRv2 framework, the JobTracker functions are split into two
daemons: the ResourceManager and the ApplicationMaster.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 4/10
02/02/2023 23:17 ChatGPT Conversation

Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce

a. Map -> Combine -> Shuffle -> Reduce

Which is the java class prefix for the MapReduce v1 APIs? a.


org.apache.hadoop.mapred b. org.apache.mapreduce c. org.apache.hadoop.mr d.
org.apache.mr

a. org.apache.hadoop.mapred

Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN

Apache Spark can run on the following two cluster managers:

a. Apache Mesos b. Hadoop YARN

Which Apache Hadoop application provides a high-level programming language for


data transformation on unstructured data? a. Sqoop b. Hive c. Pig d. Zookeeper

c. Pig

Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing

Apache Spark provides a single, unifying platform for the following three types of
operations:

a. Batch processing b. Machine learning c. Graph operations


file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 5/10
02/02/2023 23:17 ChatGPT Conversation

What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.

a. Management of Kerberos in the cluster. c. API and perimeter security.

Under the MapReduce v1 programming model, which optional phase is executed


simultaneously with the Shuffle phase? a. Combiner b. Map c. Reduce d. Split

a. Combiner

Which component of the Apache Ambari architecture integrates with an organization's


LDAP or Active Directory service? a. Postgres RDBMS b. Ambari Alert Framework c.
REST API d. Authorization Provider

d. Authorization Provider

Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety

e. Variety

What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN

a. Db2 Big SQL

b. Big Replicate

c. Big Index

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 6/10
02/02/2023 23:17 ChatGPT Conversation

Which Apache Hadoop component can potentially replace an RDBMS as a large


Hadoop datastore and is particularly good for "sparse data"? a. Spark b. HBase c.
MapReduce d. Ambari

b. HBase

Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN

d. YARN

Which statement describes an example of an application using streaming data? a. A


system that stores many records in a database. b. One time export and import of a
database. c. A web application that supports 10,000 users. d. An application
evaluating sensor data in real-time.

d. An application evaluating sensor data in real-time.

Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase

c. Sqoop

Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit

The three components that are a part of the Five Pillars of Security are:

1. Data Protection
2. Administration
3. Audit

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 7/10
02/02/2023 23:17 ChatGPT Conversation

What is an example of a NoSQL datastore of the "Document Store" type? a. HBase b.


MongoDB c. Cassandra d. REDIS

b. MongoDB

You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT

You would use the "hadoop fs -chmod" command in HDFS to determine the
permission setting for a new schema directory in Hadoop.

choose only one correct answer without explaining You need to determine the
permission setting for a new schema directory. Which tool would you use? a. umask
b. HDFS c. Kerberos d. GRANT

b. HDFS

Which definition best describes RCAC? a. It grants or revokes certain directory


privileges. b. It grants or revokes certain user privileges. c. It limits the rows or
columns returned based on certain criteria. d. It limits access by using views and
stored procedures.

c. It limits the rows or columns returned based on certain criteria.

Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables

The correct command is `./jsqsh mybigdata`.

You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true? a. DB2_ATS_ENABLE b.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 8/10
02/02/2023 23:17 ChatGPT Conversation

bigsql.impersonation.create.table.grant.public c. DB2COMPOPT d.
bigsql.alltables.io.doAs e. $BIGSQL_HOME/conf

d. bigsql.alltables.io.doAs e. bigsql.impersonation.enabled

Which directory permissions need to be set to allow all users to create their own
schema? a. 700 b. 755 c. 777 d. 666

c. 777

When connecting to an external database in a federation, you need to use the correct
database driver and protocol. What is this federation component called in Db2 Big
SQL? a. Data source b. Wrapper c. Nickname d. User mapping

a. Data source

What is the default directory in HDFS where tables are stored? a.


/apps/hive/warehouse/ b. /apps/hive/warehouse/bigsql c.
/apps/hive/warehouse/schema d. /apps/hive/warehouse/data

The default directory in HDFS where tables are stored is "/apps/hive/warehouse/".

Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.

d. A container for any record format.

You have a distributed file system (DFS) and need to set permissions on the the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse
file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 9/10
02/02/2023 23:17 ChatGPT Conversation

a. hdfs dfs -chmod 700 /hive/warehouse

What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas

Db2 Big SQL database tables are organized into schemas.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 10/10
02/02/2023 23:18 ChatGPT Conversation

ChatGPT Conversation
Time: 2023-02-02T22:17:37.412Z

Generated by ChatGPT Exporter

Which statement is true about Hortonworks Data Platform (HDP)? a. It is a powerful


platform for managing large volumes of structured data. b. It is engineered and
developed by IBM's BigInsights team. c. It is a Hadoop distribution based on a
centralized architecture with YARN at its core. d. It is designed specifically for IBM Big
Data customers.

c. It is a Hadoop distribution based on a centralized architecture with YARN at its


core.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-17-37.html 1/1
02/02/2023 23:18 ChatGPT Conversation

ChatGPT Conversation
Time: 2023-02-02T21:38:12.326Z

Generated by ChatGPT Exporter

Which three programming languages are directly supported by Apache Spark? a.


Scala b. C# c. Python d. C++ e. .NET f. Java

a. Scala b. Python c. Java

Which Apache Hadoop component can potentially replace an RDBMS as a large


Hadoop datastore and is particularly good for "sparse data"? a. Spark b. HBase c.
MapReduce d. Ambari

b. HBase

Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL

d. Spark SQL

Which feature makes Apache Spark much easier to use than MapReduce? a.
Applications run in-memory. b. APIs for Scala, Python, C++, and .NET. c. Libraries
that support SQL queries. d. Suitable for transaction processing.

a. Applications run in-memory.

Which component of the HDFS architecture regulates client access to files? a.


DataNode b. NameNode c. SlaveNode d. WorkerNode

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 1/12
02/02/2023 23:18 ChatGPT Conversation

b. NameNode

What is an example of a NoSQL datastore of the "Document Store" type? a. HBase b.


MongoDB c. Cassandra d. REDIS

b. MongoDB

What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN

a. Db2 Big SQL b. Big Replicate c. Big Index

Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager

a. ResourceManager b. ApplicationMaster

Which component of an Hadoop system is the primary cause of poor performance? a.


CPU b. Network c. RAM d. Disk latency

d. Disk latency

Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are


components of which daemon? a. ResourceManager b. ApplicationMaster c.
ScheduleManager d. TaskManager

a. ResourceManager

Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 2/12
02/02/2023 23:18 ChatGPT Conversation

ResourceManager c. TaskManager d. JobMaster

a. ApplicationMaster

Which NoSQL datastore type began as an implementation of Google's BigTable that


can store any type of data and scale to many petabytes? a. CouchDB b. HBase c.
MemcacheD d. Riak

b. HBase

Hadoop 2 consists of which three open-source sub-projects maintained by the


Apache Software Foundation? a. Big SQL b. Hive c. YARN d. MapReduce e.
Cloudbreak f. HDFS

c. YARN d. MapReduce f. HDFS

If a Hadoop node goes down, which Ambari component will notify the Administrator?
a. REST API b. Ambari Metrics System c. Ambari Alert Framework d. Ambari Wizard

c. Ambari Alert Framework

Which Apache Hadoop application provides a high-level programming language for


data transformation on unstructured data? a. Sqoop b. Hive c. Pig d. Zookeeper

c. Pig

Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 3/12
02/02/2023 23:18 ChatGPT Conversation

c. Machine learning d. Graph operations f. Batch processing

Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN

b. Apache Mesos e. Hadoop YARN

Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit

The three that are a part of the Five Pillars of Security are:

Data Protection
Administration
Audit

What are two primary limitations of MapReduce v1? a. Number of TaskTrackers


limited to 1,000 b. Resource utilization c. Workloads limited to MapReduce d.
Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs

d. Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs

Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety

c. Veracity

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 4/12
02/02/2023 23:18 ChatGPT Conversation

Which statement is true about MapReduce v1 APIs? a. MapReduce v1 APIs provide


a flexible execution environment to run MapReduce. b. MapReduce v1 APIs define
how MapReduce jobs are executed. c. MapReduce v1 APIs cannot be used with
YARN. d. MapReduce v1 APIs are implemented by applications which are largely
independent of the execution environment.

b. MapReduce v1 APIs define how MapReduce jobs are executed.

Under the MapReduce v1 programming model, what happens in a "Reduce" step? a.


Data is aggregated by worker nodes. b. Input is split into pieces. c. Worker nodes
process pieces in parallel. d. Worker nodes store results on their own local file
systems.

In the "Reduce" step of the MapReduce v1 programming model, data is aggregated


by worker nodes. The output from the "Map" step, which is usually a large number of
intermediate key-value pairs, is shuffled, sorted, and then reduced into a smaller set
of values. The "Reduce" step takes the intermediate key-value pairs and combines
the values associated with the same key, aggregating the data into a smaller set of
values.

What are two security features Apache Ranger provides? a. Authentication b.


Availability c. Authorization d. Auditing

Apache Ranger provides two security features:

a. Authentication: It enables secure authentication and login for users and


applications to access Hadoop data and metadata.

c. Authorization: It provides fine-grained access control to data and metadata stored


in Hadoop by enforcing defined policies based on user, role, and data access
permissions.

Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 5/12
02/02/2023 23:18 ChatGPT Conversation

d. Ambari Metrics System

What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce? a. Python b. Spark c. Pig d.
Hive

b. Spark

Which statement is true about Hortonworks Data Platform (HDP)? a. It is a powerful


platform for managing large volumes of structured data. b. It is engineered and
developed by IBM's BigInsights team. c. It is a Hadoop distribution based on a
centralized architecture with YARN at its core. d. It is designed specifically for IBM Big
Data customers.

c. It is a Hadoop distribution based on a centralized architecture with YARN at its


core.

Which component of the Apache Ambari architecture integrates with an organization's


LDAP or Active Directory service? a. Postgres RDBMS b. Ambari Alert Framework c.
REST API d. Authorization Provider

d. Authorization Provider

What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.

Apache Knox provides two security functions: API and perimeter security, and
proxying services.

Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 6/12
02/02/2023 23:18 ChatGPT Conversation

d. YARN

Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce

a. Map -> Combine -> Shuffle -> Reduce

Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase

c. Sqoop

Which statement describes an example of an application using streaming data? a. A


system that stores many records in a database. b. One time export and import of a
database. c. A web application that supports 10,000 users. d. An application
evaluating sensor data in real-time.

d. An application evaluating sensor data in real-time.

Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.

c. Requires extremely rapid processing.

d. Sent in high volume.

Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 7/12
02/02/2023 23:18 ChatGPT Conversation

e. Variety

Which computing technology provides Hadoop's high performance? a. Online


Analytical Processing b. Parallel Processing c. RAID-0 d. Online Transactional
Processing

Parallel Processing provides Hadoop's high performance.

What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML

Unstructured data

Which definition best describes RCAC? a. It grants or revokes certain directory


privileges. b. It grants or revokes certain user privileges. c. It limits the rows or
columns returned based on certain criteria. d. It limits access by using views and
stored procedures.

c. It limits the rows or columns returned based on certain criteria.

You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT

d. GRANT

What is an advantage of the ORC file format? a. Db2 Big SQL can exploit advanced
features b. Data interchange outside Hadoop c. Supported by multiple I/O engines d.
Efficient compression

d. Efficient compression

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 8/12
02/02/2023 23:18 ChatGPT Conversation

Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.

D. A container for any record format.

Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables

b. ./jsqsh go mybigdata

What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas

Db2 Big SQL database tables are organized into schemas.

You are creating a new table and need to specify a directory within the DFS to store
its data files. Which keyword would you use? a. EXTERNAL b. LOCATION c.
PARTITIONED BY d. HADOOP

b. LOCATION

You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format? a. STORED AS parquetfile b.
STORED AS parquet c. CREATE AS parquetfile d. CREATE AS parquet

b. STORED AS parquet

Which statement is NOT true about Db2 Big SQL? a. It supports a variety of storage
formats. b. It supports comprehensive and standard SQL. c. It provides powerful
optimization and performance. d. It uses MapReduce to access your Hadoop data.

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 9/12
02/02/2023 23:18 ChatGPT Conversation

d. It uses MapReduce to access your Hadoop data.

Db2 Big SQL does not use MapReduce to access Hadoop data, it uses its own
optimized parallel processing engine.

Which tool would you use to create a connection to your Db2 Big SQL database? a.
Jupyter b. Ambari c. Scheduler d. Db2 Big SQL console

d. Db2 Big SQL console

Which of the following operations is not parallelized and recommended to be used


just for testing simple and quick operations? a. INSERT INTO … SELECT FROM b.
INSERT INTO … VALUES (…) c. CREATE TABLE … AS SELECT … d. LOAD

b. INSERT INTO … VALUES (…)

Which Db2 Big SQL feature allows users to join a Hadoop data set to data in external
databases? a. Impersonation b. Fluid query c. Grant/Revoke privileges d. Integration

d. Integration

You have a distributed file system (DFS) and need to set permissions on the the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse

a. hdfs dfs -chmod 700 /hive/warehouse

Which two commands would you use to give or remove certain privileges to/from a
user? a. INSERT b. REVOKE c. GRANT d. SELECT e. LOAD

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 10/12
02/02/2023 23:18 ChatGPT Conversation

c. GRANT b. REVOKE

Which type of cell can be used to document and comment on a process in a Jupyter
notebook? a. Kernel b. Code c. Markdown d. Output

c. Markdown

What is the architecture of Watson Studio centered on? a. Projects b. Collaborators c.


Data Assets d. Analytic Assets

a. Projects

Which of the following products is a cloud-native solution that enables you to put your
data to work quickly and efficiently? It can run anywhere, like on Red Hat OpenShift
cluster, in the cloud, or on-premises. a. Hadoop b. Cloud Pak for Data c. Big SQL d.
Spark

b. Cloud Pak for Data

What is the field of study that gives computers the ability to learn without being
explicitly programmed? a. Yarn b. SQL c. Hadoop d. Machine Learning

d. Machine Learning

In data science and the AI domain, which of the following products help you with data
preparation in a fully managed IBM Cloud environment? a. Watson Studio on IBM
Cloud Pak for Data b. Watson Studio Cloud (IBM Cloud) c. Watson Studio Local d.
Watson Studio Desktop e. Watson Studio professional

b. Watson Studio Cloud (IBM Cloud)

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 11/12
02/02/2023 23:18 ChatGPT Conversation

Which of the following products do you use to organize your resources; work with
data; and add collaborators, data assets, and tools to derive insights from data? a.
Pig b. Big SQL project c. Watson Studio Project d. Cloud Object Storage

c. Watson Studio Project

Which machine learning approach detects patterns and relationships between data
without using labeled data? a. Supervised Learning b. Unsupervised Learning c.
Semi-supervised Learning d. Reinforcement Learning

b. Unsupervised Learning

Which of the following products provides integration with GitHub and an interface for
downloading Notebooks? a. Flume b. Pig c. Watson Studio d. Hive

c. Watson Studio provides integration with GitHub and an interface for downloading
Notebooks.

Which of the following products can be deployed in the private or public cloud of your
choice, including IBM Cloud™, AWS, Microsoft Azure, or Google Cloud? a. Watson
Studio Cloud (IBM Cloud) b. Watson Studio on IBM Cloud Pak for Data c. Watson
Studio Local d. Watson Studio Desktop

b. Watson Studio on IBM Cloud Pak for Data

file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T21-38-12.html 12/12

You might also like