Big Datafffff

1.
Before you create a Jupyter notebook in Watson Studio, which two items are
necessary? (Two answers)
A. Project
B. file
C. scala
D. url
E. spark instance
2.What is the architecture of Watson Studio centered on?
A. Collaborator
B. Projects
C. Analytic assets
D. data assets
3.Which Watson Studio offering used to be available through something known as
IBM Bluemix?
A. Watson Studio Cloud
B. // // Local
C. // // Desktop
D. // // Business
4.You need to add a collaborator to your project. What do you need?
A. The email of the collaborator
B. The list of deployments
C. list of your saved bookmarks
D. your project ID
5.Where does the unstructured data of a project reside in Watson Studio?
A. Wrapper
B. Tables
C. Database
D. Object Storage
6.What does the user interface for Jupyter look like to a user?
A. Common desktop app
B. Database Interface
C. App in web browser
D. Linux SSH session
7.What is a “magic” command used for in Jupyter?
A. Running common statistical analyses.
B. Autoconfig
C. extending the core language with shortcuts
D. Parsing and loading data into a notebook.
9.Why might a data scientist need a particular kind of GPU (graphics processing
unit)?
A. To display a simple bar chart of data on the screen
B. to collect video for use in streaming data applications
C. to perform certain data transformations quickly
D. to input commands to a data science notebook
10.What is a markdown cell used for in a data science notebook?
A. Documenting the computational process
B. writing code to transform data
C. holding the output of a computation
D. configuring data connections
11.Which is an advantage that Zeppelin holds over Jupyter?
A. Users must auth
B. Notebooks can be used by multiple ppl at the same time
C. notebooks can be connected to big data engines such as spark
D. Zeppelin is able to use the R language.
12.Which statement about Apache Spark is true?
A. it supports HDFS, MS-SQL, and Oracle
B. it is much faster than MapReduce for complex applications on disk
C. It runs on hadoop clusters with ram drives configured on each DataNode
D. It features API’S for C++ and .NET
13.Which two are valid watches for ZNodes in ZooKeeper?
A. NodeExpired.
B. NodeDeleted.
C. NodeRefreshed
D. NodeChildrenChanged
14.Which component of an Hadoop system is the primary cause of poor
performance?
A. disk latency
B. network
C. CPU
D. Ram
15.What is an example of a NoSQL datastore of the “Document Store” type?
A. Cassandra(column stores)
B. REDIS(key value)
C. HBase(column store)
D. MongoDB(document store)
16.What is the preferred replacement for Flume?
A. Hortonworks Data Flow
B. Storm
C. NiFi
D. Druid
17.Which statement describes an example of an application using streaming data?
A. An application evaluating sensor data in real-time
B. one time export and import of a db
C. a web application that supports 10,000 users
D. a system that stores many records in a database
18.Which Spark Core function provides the main element of Spark API?
A. MLlib
B. RDD
C. Mesos
D. YARN
19.Which statement is true about the combiner phase of the MapReduce architecture?
A. it aggregates all input data before it goes through the Map Phase
B. it reduces the amount of data that is sent to the Reducer task nodes
C. It determines the size and distribution of data split in the map phase
D. It is performed after the reducer phase to produce the ...
20.Apache Spark provides a single, unifying platform for which three of the following
types of operations? (Three answers)
A. transaction processing
B. ACID transaction
C. machine learning
D. record locking
E. batch processing
F. graph operations
21.Which is the java class prefix for the MapReduce v1 APIs?
A. org.apache.hadoop.mapred(v1)
B. org.apache.mr
C. org.apache.hadoop.mr
D. org.apache.mapreduce(v2)
22.Which two factors in a Hadoop cluster increase performance most significantly?
A. solid state disks
B. immediate failover of failed disk
C. parallel reading of large data
D. data redundancy on management nodes
E. high-speed networking between nodes
F. large number of small data
23.Which statement accurately describes how ZooKeeper works?
A. There can be more than one leader server at a time.
B. .all servers keep a copy of shared data memory
C. clients connect to multiple servers at the same time
D. Writes to a leader server will always succeed
24.Hadoop 2 consists of which three open-source sub-projects maintained by the
Apache Software Foundation?
A. HDFS
B. Hive
C. BigSQL
D. MapReduce
E. YARN
F. CloudBreak
25.Hadoop uses which two Google technologies as its foundation?
A. Google File System
B. HBase
C. YARN
D. Ambari
E. MapReduce
26.Which component of the Apache Ambari architecture provides statistical data to
the dashboard about the performance of a Hadoop cluster?
A. Ambari Server
B. Ambari metric system
27.What are two security features Apache Ranger provides?

Authorization, auditing
28.Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks
among all the applications in the system?
ResourceManager
29.Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns?
A. Sequence files
B. Parquet
C. RCFile
D. Flat
30.Which feature makes Apache Spark much easier to use than MapReduce?
A. Suitable for transaction processing.

B. Applications run in-memory
C. Libraries that support SQL queries
D. APIs for Scala, Python, C++, and .NET
31.Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases?
A. Oozie
B. Sqoop
C. Accumulo
D. HBase
32.Which two are attributes of streaming data?
A. Simple, numeric data.
B. Data is processed in batch.
C. Requires extremely rapid processing.
D. Sent in high volume.
33.Which type of cell can be used to document and comment on a process in a
Jupyter notebook?
A. output
B. markdown
C. code
D. kernel
34.Which component of the Spark Unified Stack allows developers to intermix
structured database queries with Spark's programming language?
A. MLlib
B. Mesos
C. Java
D. Spark SQL
35.What is the default directory in HDFS where tables are stored?
A. /app/hive/warehouse/data
B. /app/hive/warehouse/bigsql
C. /app/hive/warehouse/schema
D. /app/hive/warehouse/
36.What is an advantage of the ORC file format?
A. Big SQL can exploit advanced features

B. Efficient compression
C. Supported by multiple I/O engines
D. Data interchange outside Hadoop
37.What two security functions does Apache Knox provide?

A. Management of Kerberos in the cluster.
B. Database field access auditing.
C. Proxying services.
D. API and perimeter security.
38.Which Apache Hadoop application provides an SQL-like interface to allow

abstraction of data on semi-structured data in a Hadoop datastore?
A. YARN
B. Spark
C. Hive
D. Pig
39.Which two commands would you use to give or remove certain privileges to/from a
user?
A. REVOKE
B. GRANT
C. INSERT
D. LOAD
E. SELECT
40.When connecting to an external database in a federation, you need to use the
correct database driver and protocol. What is this federation component called in Big
SQL?
wrapper
41.Using the Java SQL Shell, which command will connect to a database called
mybigdata?
./jsqsh mybigdata
42.Which Big SQL feature allows users to join a Hadoop data set to data in external
databases?
A. Integration
B. Impersonation
C. Grant/Revoke privileges
D. Fluid query
43.Which definition best describes RCAC?

A. It limits the rows or columns returned based on certain criteria.
B. It grants or revokes certain user privileges.
C. It grants or revokes certain directory privileges.
D. It limits access by using views and stored procedures.
44.You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true?
bigsql.alltables.io.doAs
bigsql.impersonation.create.table.grant.public
45.Which command creates a user-defined schema function?

CREATE FUNCTION
46.You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format?
STORED AS parquetfile
47.Which tool would you use to create a connection to your Big SQL database?
DSM
48.Which directory permissions need to be set to allow all users to create their own
schema?
777
49.How many Big SQL management nodes do you need at minimum?

A. 1
B. 4
C. 3
D. 2
50.You need to determine the permission setting for a new schema directory. Which
tool would you use?
umask
51.What are Big SQL database tables organized into?

Schemas
52.What is the first step in a data science pipeline?

acquisition
53.What command is used to list the "magic" commands in Jupyter?

%lsmagic
54.Under the MapReduce v1 programming model, which optional phase is executed

simultaneously with the Shuffle phase?
combiner
55.Which hardware feature on an Hadoop datanode is recommended for cost efficient

performance?
JBOD
56.Which statement is true about Hortonworks Data Platform (HDP)?

It is a Hadoop distribution based on a centralized architecture with YARN at its core.
57.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
58.What are two ways the command-line parameters for a Sqoop invocation can be
simplified?
Include the --options-file command line argument.
Place the commands in a file.
59.If a Hadoop node goes down, which Ambari component will notify the
Administrator?
Ambari Alert Framework
60.Which three programming languages are directly supported by Apache Spark?
(Three answers)
Scala
Python
Java
61.Which Apache Hadoop application provides a high-level programming language

for data transformation on unstructured data?
Pig
62.Which component of an Hadoop system is the primary cause of poor

performance?
disk latency
63.Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are

components of which daemon?
ResourceManager
65.What is the final agent in a Flume chain named?

Collector
66..Which component of the Hortonworks Data Platform (HDP) is the architectural

center of Hadoop and provides resource management and a central platform for
Hadoop applications?
YARN
68.What are two primary limitations of MapReduce v1?

Scalability
Resource utilization
69.Which description characterizes a function provided by Apache Ambari?

A wizard for installing Hadoop services on host servers.
70.How can a Sqoop invocation be constrained to only run one mapper?

Use the -m 1 parameter.
71.What does the split-by parameter tell Sqoop?
The column to use as the primary key
72.Which NoSQL datastore type began as an implementation of Google's BigTable

that can store any type of data and scale to many petabytes?
HBase
73.Under the MapReduce v1 programming model, which shows the proper order of
the full set of MapReduce phases?
Map -> Combine -> Shuffle -> Reduce
74.Which computing technology provides Hadoop's high performance?

Parallel Processing
75.Which statement is true about MapReduce v1 APIs?

MapReduce v1 APIs are implemented by applications which are largely independent
of the execution environment.
76.Under the YARN/MRv2 framework, the JobTracker functions are split into which
two daemons?
ResourceManager
ApplicationMaster
77.Which component of the Apache Ambari architecture integrates with an

organization's LDAP or Active Directory service?
Authorization Provider
78.Which Apache Hadoop component can potentially replace an RDBMS as a large

Hadoop datastore and is particularly good for "sparse data"?
HBase
79.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
80.What are three IBM value-add components to the Hortonworks Data Platform
(HDP)?
Big Replicate
Big Match
Big SQL
81.Which three are a part of the Five Pillars of Security?

Audit
Data Protection
Administration
82.What are two services provided by ZooKeeper?

Providing distributed synchronization.
Maintaining configuration information.
83.Under the YARN/MRv2 framework, which daemon is tasked with negotiating with
the NodeManager(s) to execute and monitor tasks?
A. TaskManager
B. ResourceManager
C. JobMaster
D. ApplicationMaster
84.Which statement best describes a Big SQL database table?

A directory with zero or more data files.
85.You have a distributed file system (DFS) and need to set permissions on the the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run?
A. hdfs dfs -chmod 755 /hive/warehouse

B. hdfs dfs -chmod 666 /hive/warehouse
C. hdfs dfs -chmod 770 /hive/warehouse
D. hdfs dfs -chmod 700 /hive/warehouse
86.What is an example of a Key-value type of NoSQL datastore?

REDIS
87.Under the MapReduce v1 programming model, what happens in a "Reduce" step?

Data is aggregated by worker nodes.
88.Apache Spark can run on which two of the following cluster managers? (Two
answers)
Hadoop YARN
Apache Mesos
89.What are two common issues in distributed systems? (Two answers)

Finding a particular node within the cluster.
Partial failure of the nodes during execution.
90.Apache Spark can run on which two of the following cluster managers?
A. Apache Mesos
B. oneSIS
C. Nomad
D. Linux Cluster Manager
E. Hadoop YARN
V11.2
Unit 1. Introduction to big data
Uempty
Review answers
1. True or False: the number of Vs of big data are exactly four.
2. Data that can be stored and processed in a fixed format is

called:
A. Structured
B. Semi-structured
C. Unstructured
D. Machine generated
3. True or False: Agriculture is one of the industry sectors that

are using big data and analytics to help to improve and
transform their industries.
Introduction to big data © Copyright IBM Corporation 2021
Figure 1-46. Review answers
© Copyright IBM Corp. 2016, 2021 1-60

Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
V11.2
Unit 1. Introduction to big data
Uempty
Review answers (cont.)

4. Hadoop is good for:
A. Processing transactions (random access)
B. Massive amounts of data through parallelism
C. Processing lots of small files
D. Intensive calculations with little data
E. Low latency data access
5. True or False: One of Hadoop main characteristics is that

applications are written in low-level language code.
Introduction to big data © Copyright IBM Corporation 2021
Figure 1-47. Review answers (cont.)

V11.2
Unit 2. Introduction to Hortonworks Data Platform (HDP)
Uempty
Review answers
1. Which of thes components of HDP provides data access
capabilities?
A. MapReduce
B. Falcon
C. Ranger
D. Ambari
2. Identify the component that is a messaging system used for
real-time data pipelines
A. Nifi
B. Sqoop
C. Kafka
D. None of the following
3. True or False: Big Match is added value from IBM.
4. True or False: IBM BigIntegrate Provides data quality

features of Information Server.
Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

V11.2
Unit 2. Introduction to Hortonworks Data Platform (HDP)
Uempty
Review answers
5. IBM BigQuality provides scalable engine to
A. Manage
B. Design
C. Connect
D. Cleanse
Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

V11.2
Unit 3. Introduction to Apache Ambari
Uempty
Review answers
1. True or False: Apache Ambari is backed by RESTful APIs for
developers to easily integrate with their own applications.
2. Which functions does AMS provide?
A. Monitors the health and status of the Hadoop cluster.
B. Starts, stops, and reconfigures Hadoop services across the cluster.
C. Collects, aggregates, and serves Hadoop and system metrics.
D. Handles the configuration of Hadoop services for the cluster.
3. Which page from the Apache Ambari UI enables you to check the
versions of the software that is installed on your cluster?
A. Cluster Admin > Stack and Versions
B. admin > Service Accounts
C. Services
D. Hosts
Introduction to Apache Ambari © Copyright IBM Corporation 2021
Write your answers here:

1.
2.
3.

V11.2
Unit 3. Introduction to Apache Ambari
Uempty

4. True or False: Creating users through the Apache Ambari Web
UI also creates the user on the HDFS.
5. True or False: You can use the cURL commands to issue
commands to Apache Ambari.
Introduction to Apache Ambari © Copyright IBM Corporation 2021

1.
2.
3.

V11.2
Unit 4. Apache Hadoop and HDFS
Uempty
Review answers
1. True or False: Hadoop systems are designed for using a
single server.
2. What is the default number of replicas in a Hadoop system?
A. 1
B. 2
C. 3
D. 4
3. True or False: One of the Hadoop goals is fault tolerance by
detecting faults and applying quick and automatic recovery.
4. True or False: At least two NameNodes are required for a
stand-alone Hadoop cluster.
5. The default Hadoop block size is:
A. 16
B. 32
C. 64
D. 128
Apache Hadoop and HDFS © Copyright IBM Corporation 2021

V11.2
Unit 5. MapReduce and YARN
Uempty
Review answers
1. Which of the following phases in a MapReduce job is
optional?
A. Map
B. Shuffle
C. Reduce
D. Combiner
2. True or False: Interactive, online, and streaming
applications are not allowed to run on Hadoop v2
3. The JobTracker in MRv1 is replaced by which components
in YARN? (Select all that apply.)
A. ResourceManager
B. NodeManager
C. ApplicationMaster
D. TaskTracker
MapReduce and YARN © Copyright IBM Corporation 2021

1. D. (slide 13)
2. False (slide 48)
3. A. and C

V11.2
Unit 5. MapReduce and YARN
Uempty

4. True or False: The major change from Hadoop v1 to
Hadoop v2 is the separation of cluster and resource
management from the execution and data processing
environment.
5. True or False: It is possible to run unmodified MapReduce
v1 jobs by using the same MapReduce API and CLI in
Hadoop v2.
MapReduce and YARN © Copyright IBM Corporation 2021

4. True (slide 57)
5. True (slide 49)

V11.2
Unit 6. Introduction to Apache Spark
Uempty
Review answers
1. True or False: Ease of use is one of the benefits of using
Apache Spark.
2. Which language is supported by Apache Spark?
A. C++
B. C#
C. Java
D. Node.js
3. True or False: Scala is the primary abstraction of Apache
Spark.
4. In RDD actions, which function returns all the elements of
the data set as an array of the driver program?
A. Collect
B. Take
C. Count
D. Reduce
5. True or False: Referencing a data set is one of the methods
to create RDD.
Introduction to Apache Spark © Copyright IBM Corporation 2021

V11.2
Unit 7. Storing and querying data
Uempty
Review answers
1. What is the data representation format of an RC or ORC
file?
A. Row-based encoding
B. Record-based encoding
C. Column-based storage
D. NoSQL data store
2. True or False: A NoSQL database is designed for those
developers that do not want to use SQL.
3. HBase is an example of which of the following NoSQL data
store type?
A. Key-value store
B. Graph store
C. Column store
D. Document store
Storing and querying data © Copyright IBM Corporation 2021

1.
2.
3.

V11.2
Unit 7. Storing and querying data
Uempty
Review answers
4. Which database provides an SQL for Hadoop interface?
A. Hbase
B. Apache Hive
C. Cloudant
D. MongoDB
5. True or False: R is a real programming language, and
Python is an interactive environment for doing statistics.
Storing and querying data © Copyright IBM Corporation 2021

1.
2.
3.

V11.2
Unit 8. Security and governance
Uempty
Review answers
1. Kerberos is used by Hadoop for:
A. Authentication
B. Authorization
C. Auditing
D. Data protection
2. ______ is used by Hadoop for API and perimeter security.
A. Apache Ambari
B. Apache Knox
C. Apache Ranger
D. Data Steward Studio
3. True or False: Kerberos provides automation and
management of Apache Ambari in the Hadoop cluster.
Security and governance © Copyright IBM Corporation 2021

V11.2
Unit 8. Security and governance
Uempty

4. ______ is a common set of services to manage, secure, and
govern data assets across multiple tiers and types.
A. Data Services Catalog
B. Data Lifecycle Manager
C. DataPlane Service
D. Data Steward Studio
5. True or False: Ethnic or racial origin and cards or numbers are
types of sensitive personal information (SPI).
Security and governance © Copyright IBM Corporation 2021

V11.2
Unit 9. Stream computing
Uempty
Review answers
1. True or False: IBM Streams needs Apache Storm or Apache
Spark to provide the analytics
2. True or False: Streaming data is limited to sensors,
cameras, and video.
3. What are the differences between NiFi and MiNiFi?
A. NiFi is small and has low resource consumption.
B. NiFi is subproject of MiNiFi.
C. NiFi is a disk-based and microbatch ETL tool.
D. They are the same.
Stream computing © Copyright IBM Corporation 2021

V11.2
Unit 9. Stream computing
Uempty

4. True or False: Development support is one of the features
that IBM Streams provides as a streaming data platform.
5. True or False: IBM Streams uses a Java engine.
Stream computing © Copyright IBM Corporation 2021

V11.2
Unit 1. Using IBM Db2 Big SQL to access Apache Hadoop Distributed File System data
Uempty
Review answers
1. What are the reasons that your organization would use Db2
Big SQL?
A. Want to access your Hadoop data without using MapReduce.
B. It has superior SQL-on-Hadoop performance to optimize data
ingestion and query performance for your enterprise.
C. No deep learning curve because it is compliant with industry
standards for SQL.
D. No need to change the format or migrate the data out of
Hadoop to do any work on the data.
E. All the above.
2. True or False: For optimal performance and function, the
ORC file format is recommended for Db2 Big SQL.
3. True or False: You can access Db2 Big SQL through Db2
Big SQL console only.
Using IBM Db2 Big SQL to access Apache Hadoop Distributed File
© Copyright IBM Corporation 2021
System data

V11.2
Unit 1. Using IBM Db2 Big SQL to access Apache Hadoop Distributed File System data
Uempty

4. What is the feature that Db2 Big SQL uses to query data
efficiently on Hadoop and combine data that is spread
across different enterprise data warehouses?
A. Self-tuning memory manager (STMM).
B. Query optimizer.
C. Federation capability.
D. Workload manager (WLM).
5. Which of the following items is responsible for
communicating with the Hive metastore to gather metadata,
and to bridge the RDBMS domain and the Hadoop domain?
A. Db2 Big SQL head.
B. SQL engine's query optimizer.
C. Db2 Big SQL worker.
D. Db2 Big SQL scheduler.
Using IBM Db2 Big SQL to access Apache Hadoop Distributed File
© Copyright IBM Corporation 2021
System data

V11.2
Unit 2. Creating IBM Db2 Big SQL schemas and tables
Uempty
Review answers
1. True or False: The INSERT operation is the recommended
method for getting data into your Db2 Big SQL table for best
performance.
2. What does the EXTERNAL keyword indicate, and when it is

added to the CREATE TABLE statement?
A. The table is not managed by the database manager.
B. When the table is dropped, the definition of that table is
removed, but the data remains untouched.
C. It specifies the directory within the DFS to store the data files.
D. A and B.
3. True or False: Using the default STRING data type causes

performance degradation.
Creating IBM Db2 Big SQL schemas and tables © Copyright IBM Corporation 2021

V11.2
Unit 2. Creating IBM Db2 Big SQL schemas and tables
Uempty

4. Which of the flowing data types does Db2 Big SQL support?
A. TINYINT.
B. CHAR.
C. TIMESTAMP.
D. ARRAY.
E. All the above.
5. True or False: The PARTITIONED BY clause creates a

directory in the DFS that contains data for each unique value
in the column that is being partitioned, which enhances
performance for the query's search criteria.
Creating IBM Db2 Big SQL schemas and tables © Copyright IBM Corporation 2021

V11.2
Unit 3. File formats and querying IBM Db2 Big SQL tables
Uempty
Review answers
1. Which file format is supported by Db2 Big SQL?
A. Text
B. Sequence
C. Parquet
D. ORC
E. Avro
F. All the above
2. Which file format provides excellent performance when used

with Db2 Big SQL?
A. Parquet
B. Avro
C. ORC
D. A and B
E. A and C
File formats and querying IBM Db2 Big SQL tables © Copyright IBM Corporation 2021

V11.2
Unit 3. File formats and querying IBM Db2 Big SQL tables
Uempty

3. True or False: Parquet relies on schemas for communicating
the structure of the data.
4. Which file format maintains extra metadata to recognize

record boundaries?
A. Text
B. Sequence
C. Parquet
D. ORC
E. Avro.
5. True or False: The ORC format uses type-specific encoders

for each column and divides the file into large stripes.
.
File formats and querying IBM Db2 Big SQL tables © Copyright IBM Corporation 2021

V11.2
Unit 4. Configuring IBM Db2 Big SQL security
Uempty
Review answers
1. You can control authorization of Db2 Big SQL objects on
which of the following levels?
A. With authorization in the distributed file system.
B. Authorization with the GRANT command.
C. Authorization at the row and column levels.
D. By using VIEWS or STORED PROCEDURES
E. All the above.
2. True or False: You can Kerberize the cluster only after

installing Db2 Big SQL.
3. True or False: RCAC controls access to a table at the row

level, column level, or both.
Configuring IBM Db2 Big SQL security © Copyright IBM Corporation 2021

V11.2
Unit 4. Configuring IBM Db2 Big SQL security
Uempty

4. Which of the following items is true about Apache Ranger?
A. You must first install Apache Ranger, and then you can enable
the Db2 Big SQL Apache Ranger plug-in
B. It is a framework to enable, monitor, and manage
comprehensive data security across the Hadoop platform.
C. It can audit all access to Db2 Big SQL tables, views, and
nicknames.
D. All the above.
5. True or False: UMASK is used to determine the permissions

settings for new directories and files that are created.
Configuring IBM Db2 Big SQL security © Copyright IBM Corporation 2021

V11.2
Unit 5. Data federation with IBM Db2 Big SQL
Uempty
Review answers
1. What are the characteristics of a federated system?
A. Transparent
B. Extensible
C. Autonomous
D. High performance
E. High function
F. All the above
2. Which of the following products are supported data sources?

A. Db2
B. Oracle
C. Teradata
D. MS SQL Server
E. All the above
3. True or False: User mappings are used to authenticate to

the remote data source.
Data federation with IBM Db2 Big SQL © Copyright IBM Corporation 2021

V11.2
Unit 5. Data federation with IBM Db2 Big SQL
Uempty

4. True or False: The federated database manages the
federated system and often sends parts of the requests it
receives to the data sources for processing.
5. Which of the following items is responsible for defining the

data sources to the federated database, besides defining the
property and values of the connection?
A. Wrappers
B. Server definitions
C. User mappings
D. Nicknames
6. True or False: Nicknames can be used for wrappers and

servers.
Data federation with IBM Db2 Big SQL © Copyright IBM Corporation 2021

V11.2
Unit 1. Introduction to IBM Watson Studio
Uempty
Review answers
1. True or False: Watson Studio is designed only for the data
scientist persona.
2. True or False: Object storage provides the space where

unstructured data for your project is stored.
3. True or False: Collaborators can be given different access

levels.
4. True or False: Collaboration within Watson Studio is an

optional add-on component that must be purchased.
5. True or False: Watson Studio Gallery provides access to

sample Notebooks and data sets that you can use.
Introduction to IBM Watson Studio © Copyright IBM Corporation 2021
1. True or false: Watson Studio is designed only for the data scientist persona.
False: The data engineer, the data scientist, the business analyst, and the app developer can
use Watson Studio. Each persona has primary tools that help them do their job. For example,
the data scientist's main tool is Watson Studio, but other personas can use it as it relates to their
jobs. Also, there might be a team of different personas. Whatever the case is, you must decide
what tool is correct for the job regardless of the personas. The definitions of personas can vary
between different companies and evolve over time.
2. True or False: Object storage provides the space where unstructured data for your project is
stored.
3. True or False: Collaborators can be given certain access levels.
True: Add collaborators to your project by using their email addresses. If they have an existing
account on IBM Cloud, they are added immediately. Otherwise, they receive an invite to create
a Watson Studio account. Choose the permissions for each collaborator. The Admin role can
control project assets, collaborators, and settings. The Editor role can control project assets.
The Viewer role can view the project. Collaborators can be removed from a project or have their
permissions updated.
4. True or False: Collaboration within Watson Studio is an optional add-on component that must
be purchased.
False: Collaboration is a standard feature in Watson Studio.

V11.2
Unit 2. Analyzing data with IBM Watson Studio
Uempty
Review answers
1. Which of the following statements are true about Apache Spark
environments in Watson Studio?
A. Apache Spark environments can be accessed by using the S3
API.
B. Apache Spark environments define the hardware and software
configurations to start custom Apache Spark clusters on demand.
C. Apache Spark environments provide the storage that is required
by Watson Studio projects.
D. Apache Spark environments in Watson Studio are used by tools
such as Notebooks
E. Users must always create their own Apache Spark environment
definition.
F. Watson Studio provides default Apache Spark environment
definitions to get started quickly.
2. True or False: Multiple Notebooks cannot be associated with
the same Apache Spark environment definition.
3. True or False: /tmp is not a shared file system and cannot be
accessed by Apache Spark executors.
Analyzing data with IBM Watson Studio © Copyright IBM Corporation 2021

1. B, D, and F
2. False (see slide 8)
3. True (see slide 10)

V11.2
Unit 2. Analyzing data with IBM Watson Studio
Uempty

4. True or False: IBM Cloud Block Storage can be associated
with a Watson Studio project.
5. ____ are used to organize the data in Cloud Object Storage.
A. Objects
B. Device sets
C. Buckets
D. Storage pools
E. IOPS.
Analyzing data with IBM Watson Studio © Copyright IBM Corporation 2021

4. False. Currently, the only choice for project storage is IBM Cloud Object Storage.
5. C.

02/02/2023 23:17 ChatGPT Conversation
ChatGPT Conversation
Time: 2023-02-02T22:15:20.570Z
Generated by ChatGPT Exporter
blockchain :pptx The response must be short.
I'm sorry, I am a text-based AI and am unable to create or provide PowerPoint

presentations. However, I can provide information on blockchain technology if you
have any specific questions.
Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.
c. Requires extremely rapid processing. d. Sent in high volume.
What are two primary limitations of MapReduce v1? a. Number of TaskTrackers

limited to 1,000 b. Resource utilization c. Workloads limited to MapReduce d.
Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs
d. Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs.
Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System
d. Ambari Metrics System.
Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL
d. Spark SQL.
file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 1/10
Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are

components of which daemon? a. ResourceManager b. ApplicationMaster c.
ScheduleManager d. TaskManager
a. ResourceManager.
Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns? a. SequenceFiles b. RCFile c.
Parquet d. Flat
a. SequenceFiles.
Hadoop uses which two Google technologies as its foundation? a. HBase b. Google
File System c. YARN d. MapReduce e. Ambari
d. MapReduce b. Google File System.
What are two security features Apache Ranger provides? a. Authentication b.

Availability c. Authorization d. Auditing
a. Authentication c. Authorization.
Which component of the HDFS architecture regulates client access to files? a.

DataNode b. NameNode c. SlaveNode d. WorkerNode
b. NameNode.
Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety
c. Veracity.
Which statement is true about MapReduce v1 APIs? a. MapReduce v1 APIs provide

a flexible execution environment to run MapReduce. b. MapReduce v1 APIs define
how MapReduce jobs are executed. c. MapReduce v1 APIs cannot be used with
YARN. d. MapReduce v1 APIs are implemented by applications which are largely
independent of the execution environment.
b. MapReduce v1 APIs define how MapReduce jobs are executed.
Which statement is true about the Combiner phase of the MapReduce architecture?
a. It is performed after the Reducer phase to produce the final output. b. It reduces
the amount of data that is sent to the Reducer task nodes. c. It aggregates all input
data before it goes through the Map phase. d. It determines the size and distribution
of data split in the Map phase.
b. It reduces the amount of data that is sent to the Reducer task nodes.
Which description characterizes a function provided by Apache Ambari? a. Moves

large amounts of streaming event data. b. Moves information to/from structured
databases. c. A wizard for installing Hadoop services on host servers. d. A messaging
system for real-time data pipelines.
c. A wizard for installing Hadoop services on host servers.
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.
ResourceManager c. TaskManager d. JobMaster
a. ApplicationMaster.
What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML
b. Unstructured data.
Which Spark Core function provides the main element of Spark API? a. MLlib b.
Mesos c. YARN d. RDD
d. RDD (Resilient Distributed Datasets).
Hadoop 2 consists of which three open-source sub-projects maintained by the

Apache Software Foundation? a. Big SQL b. Hive c. YARN d. MapReduce e.
Cloudbreak f. HDFS
Hadoop 2 consists of three open-source sub-projects maintained by the Apache

Software Foundation: HDFS, YARN, and MapReduce.
Which Apache Hadoop application provides an SQL-like interface to allow abstraction

of data on semi-structured data in a Hadoop datastore? a. YARN b. Pig c. Hive d.
Spark
c. Hive
Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager
Under the YARN/MRv2 framework, the JobTracker functions are split into two
daemons: the ResourceManager and the ApplicationMaster.
Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce
a. Map -> Combine -> Shuffle -> Reduce
Which is the java class prefix for the MapReduce v1 APIs? a.

org.apache.hadoop.mapred b. org.apache.mapreduce c. org.apache.hadoop.mr d.
org.apache.mr
a. org.apache.hadoop.mapred
Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN
Apache Spark can run on the following two cluster managers:
a. Apache Mesos b. Hadoop YARN
Which Apache Hadoop application provides a high-level programming language for

data transformation on unstructured data? a. Sqoop b. Hive c. Pig d. Zookeeper
c. Pig
Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing
Apache Spark provides a single, unifying platform for the following three types of
operations:
a. Batch processing b. Machine learning c. Graph operations

What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.
a. Management of Kerberos in the cluster. c. API and perimeter security.
Under the MapReduce v1 programming model, which optional phase is executed

simultaneously with the Shuffle phase? a. Combiner b. Map c. Reduce d. Split
a. Combiner
Which component of the Apache Ambari architecture integrates with an organization's

LDAP or Active Directory service? a. Postgres RDBMS b. Ambari Alert Framework c.
REST API d. Authorization Provider
d. Authorization Provider
Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety
e. Variety
What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN
a. Db2 Big SQL
b. Big Replicate
c. Big Index
Which Apache Hadoop component can potentially replace an RDBMS as a large

Hadoop datastore and is particularly good for "sparse data"? a. Spark b. HBase c.
MapReduce d. Ambari
b. HBase
Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN
d. YARN
Which statement describes an example of an application using streaming data? a. A

system that stores many records in a database. b. One time export and import of a
database. c. A web application that supports 10,000 users. d. An application
evaluating sensor data in real-time.
d. An application evaluating sensor data in real-time.
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase
c. Sqoop
Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit
The three components that are a part of the Five Pillars of Security are:
1. Data Protection
2. Administration
3. Audit
What is an example of a NoSQL datastore of the "Document Store" type? a. HBase b.

MongoDB c. Cassandra d. REDIS
b. MongoDB
You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT
You would use the "hadoop fs -chmod" command in HDFS to determine the
permission setting for a new schema directory in Hadoop.
choose only one correct answer without explaining You need to determine the
permission setting for a new schema directory. Which tool would you use? a. umask
b. HDFS c. Kerberos d. GRANT
b. HDFS
Which definition best describes RCAC? a. It grants or revokes certain directory

privileges. b. It grants or revokes certain user privileges. c. It limits the rows or
columns returned based on certain criteria. d. It limits access by using views and
stored procedures.
c. It limits the rows or columns returned based on certain criteria.
Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables
The correct command is `./jsqsh mybigdata`.
You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true? a. DB2_ATS_ENABLE b.
bigsql.impersonation.create.table.grant.public c. DB2COMPOPT d.
bigsql.alltables.io.doAs e. $BIGSQL_HOME/conf
d. bigsql.alltables.io.doAs e. bigsql.impersonation.enabled
Which directory permissions need to be set to allow all users to create their own
schema? a. 700 b. 755 c. 777 d. 666
c. 777
When connecting to an external database in a federation, you need to use the correct
database driver and protocol. What is this federation component called in Db2 Big
SQL? a. Data source b. Wrapper c. Nickname d. User mapping
a. Data source
What is the default directory in HDFS where tables are stored? a.

/apps/hive/warehouse/ b. /apps/hive/warehouse/bigsql c.
/apps/hive/warehouse/schema d. /apps/hive/warehouse/data
The default directory in HDFS where tables are stored is "/apps/hive/warehouse/".
Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.
d. A container for any record format.
You have a distributed file system (DFS) and need to set permissions on the the
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse
a. hdfs dfs -chmod 700 /hive/warehouse
What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas
Db2 Big SQL database tables are organized into schemas.
Time: 2023-02-02T22:17:37.412Z
Which statement is true about Hortonworks Data Platform (HDP)? a. It is a powerful

platform for managing large volumes of structured data. b. It is engineered and
developed by IBM's BigInsights team. c. It is a Hadoop distribution based on a
centralized architecture with YARN at its core. d. It is designed specifically for IBM Big
Data customers.
c. It is a Hadoop distribution based on a centralized architecture with YARN at its

core.
Time: 2023-02-02T21:38:12.326Z
Which three programming languages are directly supported by Apache Spark? a.

Scala b. C# c. Python d. C++ e. .NET f. Java
a. Scala b. Python c. Java
Which Apache Hadoop component can potentially replace an RDBMS as a large

Hadoop datastore and is particularly good for "sparse data"? a. Spark b. HBase c.
MapReduce d. Ambari
b. HBase
Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL
d. Spark SQL
Which feature makes Apache Spark much easier to use than MapReduce? a.
Applications run in-memory. b. APIs for Scala, Python, C++, and .NET. c. Libraries
that support SQL queries. d. Suitable for transaction processing.
a. Applications run in-memory.
Which component of the HDFS architecture regulates client access to files? a.

DataNode b. NameNode c. SlaveNode d. WorkerNode
b. NameNode
What is an example of a NoSQL datastore of the "Document Store" type? a. HBase b.

MongoDB c. Cassandra d. REDIS
b. MongoDB
What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN
a. Db2 Big SQL b. Big Replicate c. Big Index
Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager
a. ResourceManager b. ApplicationMaster
Which component of an Hadoop system is the primary cause of poor performance? a.

CPU b. Network c. RAM d. Disk latency
d. Disk latency
Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are

components of which daemon? a. ResourceManager b. ApplicationMaster c.
ScheduleManager d. TaskManager
a. ResourceManager
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.
ResourceManager c. TaskManager d. JobMaster
a. ApplicationMaster
Which NoSQL datastore type began as an implementation of Google's BigTable that

can store any type of data and scale to many petabytes? a. CouchDB b. HBase c.
MemcacheD d. Riak
b. HBase
Hadoop 2 consists of which three open-source sub-projects maintained by the

Apache Software Foundation? a. Big SQL b. Hive c. YARN d. MapReduce e.
Cloudbreak f. HDFS
c. YARN d. MapReduce f. HDFS
If a Hadoop node goes down, which Ambari component will notify the Administrator?
a. REST API b. Ambari Metrics System c. Ambari Alert Framework d. Ambari Wizard
c. Ambari Alert Framework
Which Apache Hadoop application provides a high-level programming language for

data transformation on unstructured data? a. Sqoop b. Hive c. Pig d. Zookeeper
c. Pig
Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing
c. Machine learning d. Graph operations f. Batch processing
Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN
b. Apache Mesos e. Hadoop YARN
Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit
The three that are a part of the Five Pillars of Security are:
Data Protection
Administration
Audit
What are two primary limitations of MapReduce v1? a. Number of TaskTrackers

limited to 1,000 b. Resource utilization c. Workloads limited to MapReduce d.
Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs
d. Scalability e. TaskTrackers can be a bottleneck to MapReduce jobs
Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety
c. Veracity
Which statement is true about MapReduce v1 APIs? a. MapReduce v1 APIs provide

a flexible execution environment to run MapReduce. b. MapReduce v1 APIs define
how MapReduce jobs are executed. c. MapReduce v1 APIs cannot be used with
YARN. d. MapReduce v1 APIs are implemented by applications which are largely
independent of the execution environment.
b. MapReduce v1 APIs define how MapReduce jobs are executed.
Under the MapReduce v1 programming model, what happens in a "Reduce" step? a.

Data is aggregated by worker nodes. b. Input is split into pieces. c. Worker nodes
process pieces in parallel. d. Worker nodes store results on their own local file
systems.
In the "Reduce" step of the MapReduce v1 programming model, data is aggregated

by worker nodes. The output from the "Map" step, which is usually a large number of
intermediate key-value pairs, is shuffled, sorted, and then reduced into a smaller set
of values. The "Reduce" step takes the intermediate key-value pairs and combines
the values associated with the same key, aggregating the data into a smaller set of
values.
What are two security features Apache Ranger provides? a. Authentication b.

Availability c. Authorization d. Auditing
Apache Ranger provides two security features:
a. Authentication: It enables secure authentication and login for users and

applications to access Hadoop data and metadata.
c. Authorization: It provides fine-grained access control to data and metadata stored

in Hadoop by enforcing defined policies based on user, role, and data access
permissions.
Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System
d. Ambari Metrics System
What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce? a. Python b. Spark c. Pig d.
Hive
b. Spark
Which statement is true about Hortonworks Data Platform (HDP)? a. It is a powerful

platform for managing large volumes of structured data. b. It is engineered and
developed by IBM's BigInsights team. c. It is a Hadoop distribution based on a
centralized architecture with YARN at its core. d. It is designed specifically for IBM Big
Data customers.
c. It is a Hadoop distribution based on a centralized architecture with YARN at its

core.
Which component of the Apache Ambari architecture integrates with an organization's

LDAP or Active Directory service? a. Postgres RDBMS b. Ambari Alert Framework c.
REST API d. Authorization Provider
d. Authorization Provider
What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.
Apache Knox provides two security functions: API and perimeter security, and
proxying services.
Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN
d. YARN
Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce
a. Map -> Combine -> Shuffle -> Reduce
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase
c. Sqoop
Which statement describes an example of an application using streaming data? a. A

system that stores many records in a database. b. One time export and import of a
database. c. A web application that supports 10,000 users. d. An application
evaluating sensor data in real-time.
d. An application evaluating sensor data in real-time.
Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.
c. Requires extremely rapid processing.
d. Sent in high volume.
Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety
e. Variety
Which computing technology provides Hadoop's high performance? a. Online

Analytical Processing b. Parallel Processing c. RAID-0 d. Online Transactional
Processing
Parallel Processing provides Hadoop's high performance.
What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML
Unstructured data
Which definition best describes RCAC? a. It grants or revokes certain directory

privileges. b. It grants or revokes certain user privileges. c. It limits the rows or
columns returned based on certain criteria. d. It limits access by using views and
stored procedures.
c. It limits the rows or columns returned based on certain criteria.
You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT
d. GRANT
What is an advantage of the ORC file format? a. Db2 Big SQL can exploit advanced
features b. Data interchange outside Hadoop c. Supported by multiple I/O engines d.
Efficient compression
d. Efficient compression
Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.
D. A container for any record format.
Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables
b. ./jsqsh go mybigdata
What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas
Db2 Big SQL database tables are organized into schemas.
You are creating a new table and need to specify a directory within the DFS to store
its data files. Which keyword would you use? a. EXTERNAL b. LOCATION c.
PARTITIONED BY d. HADOOP
b. LOCATION
You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format? a. STORED AS parquetfile b.
STORED AS parquet c. CREATE AS parquetfile d. CREATE AS parquet
b. STORED AS parquet
Which statement is NOT true about Db2 Big SQL? a. It supports a variety of storage
formats. b. It supports comprehensive and standard SQL. c. It provides powerful
optimization and performance. d. It uses MapReduce to access your Hadoop data.
d. It uses MapReduce to access your Hadoop data.
Db2 Big SQL does not use MapReduce to access Hadoop data, it uses its own
optimized parallel processing engine.
Which tool would you use to create a connection to your Db2 Big SQL database? a.
Jupyter b. Ambari c. Scheduler d. Db2 Big SQL console
d. Db2 Big SQL console
Which of the following operations is not parallelized and recommended to be used

just for testing simple and quick operations? a. INSERT INTO … SELECT FROM b.
INSERT INTO … VALUES (…) c. CREATE TABLE … AS SELECT … d. LOAD
b. INSERT INTO … VALUES (…)
Which Db2 Big SQL feature allows users to join a Hadoop data set to data in external
databases? a. Impersonation b. Fluid query c. Grant/Revoke privileges d. Integration
d. Integration
You have a distributed file system (DFS) and need to set permissions on the the
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse
a. hdfs dfs -chmod 700 /hive/warehouse
Which two commands would you use to give or remove certain privileges to/from a
user? a. INSERT b. REVOKE c. GRANT d. SELECT e. LOAD
c. GRANT b. REVOKE
Which type of cell can be used to document and comment on a process in a Jupyter
notebook? a. Kernel b. Code c. Markdown d. Output
c. Markdown
What is the architecture of Watson Studio centered on? a. Projects b. Collaborators c.

Data Assets d. Analytic Assets
a. Projects
Which of the following products is a cloud-native solution that enables you to put your
data to work quickly and efficiently? It can run anywhere, like on Red Hat OpenShift
cluster, in the cloud, or on-premises. a. Hadoop b. Cloud Pak for Data c. Big SQL d.
Spark
b. Cloud Pak for Data
What is the field of study that gives computers the ability to learn without being
explicitly programmed? a. Yarn b. SQL c. Hadoop d. Machine Learning
d. Machine Learning
In data science and the AI domain, which of the following products help you with data
preparation in a fully managed IBM Cloud environment? a. Watson Studio on IBM
Cloud Pak for Data b. Watson Studio Cloud (IBM Cloud) c. Watson Studio Local d.
Watson Studio Desktop e. Watson Studio professional
b. Watson Studio Cloud (IBM Cloud)
Which of the following products do you use to organize your resources; work with
data; and add collaborators, data assets, and tools to derive insights from data? a.
Pig b. Big SQL project c. Watson Studio Project d. Cloud Object Storage
c. Watson Studio Project
Which machine learning approach detects patterns and relationships between data
without using labeled data? a. Supervised Learning b. Unsupervised Learning c.
Semi-supervised Learning d. Reinforcement Learning
b. Unsupervised Learning
Which of the following products provides integration with GitHub and an interface for
downloading Notebooks? a. Flume b. Pig c. Watson Studio d. Hive
c. Watson Studio provides integration with GitHub and an interface for downloading
Notebooks.
Which of the following products can be deployed in the private or public cloud of your
choice, including IBM Cloud™, AWS, Microsoft Azure, or Google Cloud? a. Watson
Studio Cloud (IBM Cloud) b. Watson Studio on IBM Cloud Pak for Data c. Watson
Studio Local d. Watson Studio Desktop
b. Watson Studio on IBM Cloud Pak for Data

Big Datafffff

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Big Datafffff

Uploaded by

Copyright:

Available Formats

1.

27.What are two security features Apache Ranger provides?

A. Suitable for transaction processing.

35.What is the default directory in HDFS where tables are stored?

36.What is an advantage of the ORC file format?

A. Big SQL can exploit advanced features

37.What two security functions does Apache Knox provide?

38.Which Apache Hadoop application provides an SQL-like interface to allow

43.Which definition best describes RCAC?

45.Which command creates a user-defined schema function?

49.How many Big SQL management nodes do you need at minimum?

51.What are Big SQL database tables organized into?

52.What is the first step in a data science pipeline?

53.What command is used to list the "magic" commands in Jupyter?

54.Under the MapReduce v1 programming model, which optional phase is executed

55.Which hardware feature on an Hadoop datanode is recommended for cost efficient

56.Which statement is true about Hortonworks Data Platform (HDP)?

61.Which Apache Hadoop application provides a high-level programming language

62.Which component of an Hadoop system is the primary cause of poor

63.Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are

65.What is the final agent in a Flume chain named?

66..Which component of the Hortonworks Data Platform (HDP) is the architectural

68.What are two primary limitations of MapReduce v1?

69.Which description characterizes a function provided by Apache Ambari?

70.How can a Sqoop invocation be constrained to only run one mapper?

72.Which NoSQL datastore type began as an implementation of Google's BigTable

74.Which computing technology provides Hadoop's high performance?

75.Which statement is true about MapReduce v1 APIs?

77.Which component of the Apache Ambari architecture integrates with an

78.Which Apache Hadoop component can potentially replace an RDBMS as a large

81.Which three are a part of the Five Pillars of Security?

82.What are two services provided by ZooKeeper?

84.Which statement best describes a Big SQL database table?

A. hdfs dfs -chmod 755 /hive/warehouse

86.What is an example of a Key-value type of NoSQL datastore?

87.Under the MapReduce v1 programming model, what happens in a "Reduce" step?

89.What are two common issues in distributed systems? (Two answers)

2. Data that can be stored and processed in a fixed format is

3. True or False: Agriculture is one of the industry sectors that

Introduction to big data © Copyright IBM Corporation 2021

Figure 1-46. Review answers

© Copyright IBM Corp. 2016, 2021 1-60

Review answers (cont.)

5. True or False: One of Hadoop main characteristics is that

Introduction to big data © Copyright IBM Corporation 2021

Figure 1-47. Review answers (cont.)

© Copyright IBM Corp. 2016, 2021 1-61

4. True or False: IBM BigIntegrate Provides data quality

Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

Figure 2-57. Review answers

© Copyright IBM Corp. 2016, 2021 2-68

Introduction to Hortonworks Data Platform (HDP) © Copyright IBM Corporation 2021

Figure 2-58. Review answers

© Copyright IBM Corp. 2016, 2021 2-69

Introduction to Apache Ambari © Copyright IBM Corporation 2021

Figure 3-30. Review answers

Write your answers here:

© Copyright IBM Corp. 2016, 2021 3-37

Review answers (cont.)

Introduction to Apache Ambari © Copyright IBM Corporation 2021

Figure 3-31. Review answers (cont.)

Write your answers here:

© Copyright IBM Corp. 2016, 2021 3-38