Big Data
1.Before you create a Jupyter notebook in Watson Studio, which two items are
necessary? (Two answers)
A. Project
B. File
C. Scala
D. URL
E. Spark instance
2.What is the architecture of Watson Studio centered on?
A. Collaborator
B. Projects
C. Analytic assets
D. data assets
3.Which Watson Studio offering used to be available through something known as
IBM Bluemix?
A. Watson Studio Cloud
B. Watson Studio Local
C. Watson Studio Desktop
D. Watson Studio Business
4.You need to add a collaborator to your project. What do you need?
A. The email of the collaborator
B. The list of deployments
C. list of your saved bookmarks
D. your project ID
5.Where does the unstructured data of a project reside in Watson Studio?
A. Wrapper
B. Tables
C. Database
D. Object Storage
6.What does the user interface for Jupyter look like to a user?
A. Common desktop app
B. Database Interface
C. App in web browser
D. Linux SSH session
7.What is a “magic” command used for in Jupyter?
A. Running common statistical analyses.
B. Autoconfig
C. extending the core language with shortcuts
D. Parsing and loading data into a notebook.
9.Why might a data scientist need a particular kind of GPU (graphics processing
unit)?
A. To display a simple bar chart of data on the screen
B. to collect video for use in streaming data applications
C. to perform certain data transformations quickly
D. to input commands to a data science notebook
10.What is a markdown cell used for in a data science notebook?
A. Documenting the computational process
B. writing code to transform data
C. holding the output of a computation
D. configuring data connections
11.Which is an advantage that Zeppelin holds over Jupyter?
A. Users must authenticate
B. Notebooks can be used by multiple people at the same time
C. Notebooks can be connected to big data engines such as Spark
D. Zeppelin is able to use the R language.
12.Which statement about Apache Spark is true?
A. It supports HDFS, MS-SQL, and Oracle
B. It is much faster than MapReduce for complex applications on disk
C. It runs on Hadoop clusters with RAM drives configured on each DataNode
D. It features APIs for C++ and .NET
13.Which two are valid watches for ZNodes in ZooKeeper?
A. NodeExpired
B. NodeDeleted
C. NodeRefreshed
D. NodeChildrenChanged
14.Which component of a Hadoop system is the primary cause of poor
performance?
A. Disk latency
B. Network
C. CPU
D. RAM
15.What is an example of a NoSQL datastore of the “Document Store” type?
A. Cassandra (column store)
B. Redis (key-value store)
C. HBase (column store)
D. MongoDB (document store)
16.What is the preferred replacement for Flume?
A. Hortonworks Data Flow
B. Storm
C. NiFi
D. Druid
17.Which statement describes an example of an application using streaming data?
A. An application evaluating sensor data in real-time
B. one time export and import of a db
C. a web application that supports 10,000 users
D. a system that stores many records in a database
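As a rough illustration of option A, the sketch below handles sensor readings one at a time as they arrive and keeps only a small rolling window in memory, instead of storing everything first and querying it later. The function name, readings, window size, and threshold are all made up for the example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(readings: Iterable[float], window: int = 3,
                   threshold: float = 30.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the rolling mean exceeds threshold.

    Each reading is processed once, on arrival; only the last `window`
    values are retained.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        recent.append(value)
        mean = sum(recent) / len(recent)
        if mean > threshold:
            yield i, mean


# Hypothetical temperature readings; a real source would be a socket or queue.
sample = [20.0, 22.0, 35.0, 40.0, 41.0, 25.0]
alerts = list(rolling_alerts(sample))
alert_indexes = [i for i, _ in alerts]
```

Because `rolling_alerts` is a generator, it works the same way on an endless source as on this short list.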
18.Which Spark Core function provides the main element of Spark API?
A. MLlib
B. RDD
C. Mesos
D. YARN
19.Which statement is true about the combiner phase of the MapReduce architecture?
A. it aggregates all input data before it goes through the Map Phase
B. it reduces the amount of data that is sent to the Reducer task nodes
C. It determines the size and distribution of data split in the map phase
D. It is performed after the reducer phase to produce the ...
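The effect of a combiner can be sketched with a word count simulated in plain Python. This is not Hadoop code: the function names and the two input splits are invented for the illustration, and each split stands in for the data handled by one mapper node.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Sum counts locally on the mapper node, before anything is shuffled."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle(pairs):
    """Group values by key, as happens on the way to the reducers."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Two hypothetical input splits, one per mapper.
splits = [
    ["big data big cluster", "big data"],
    ["data node data node"],
]

mapped = [map_phase(s) for s in splits]
combined = [combine(m) for m in mapped]

records_without_combiner = sum(len(m) for m in mapped)
records_with_combiner = sum(len(c) for c in combined)

result = reduce_phase(shuffle(p for c in combined for p in c))
plain_result = reduce_phase(shuffle(p for m in mapped for p in m))
```

The final counts are identical with or without the combiner; only the number of records that cross the network to the reducers changes.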
20.Apache Spark provides a single, unifying platform for which three of the following
types of operations? (Three answers)
A. transaction processing
B. ACID transaction
C. machine learning
D. record locking
E. batch processing
F. graph operations
21.Which is the Java class prefix for the MapReduce v1 APIs?
A. org.apache.hadoop.mapred (v1)
B. org.apache.mr
C. org.apache.hadoop.mr
D. org.apache.mapreduce (v2)
22.Which two factors in a Hadoop cluster increase performance most significantly?
A. solid state disks
B. immediate failover of failed disk
C. parallel reading of large data
D. data redundancy on management nodes
E. high-speed networking between nodes
F. large number of small data
23.Which statement accurately describes how ZooKeeper works?
A. There can be more than one leader server at a time.
B. All servers keep a copy of the shared data in memory
C. clients connect to multiple servers at the same time
D. Writes to a leader server will always succeed
24.Hadoop 2 consists of which three open-source sub-projects maintained by the
Apache Software Foundation?
A. HDFS
B. Hive
C. BigSQL
D. MapReduce
E. YARN
F. CloudBreak
25.Hadoop uses which two Google technologies as its foundation?
A. Google File System
B. HBase
C. YARN
D. Ambari
E. MapReduce
26.Which component of the Apache Ambari architecture provides statistical data to
the dashboard about the performance of a Hadoop cluster?
A. Ambari Server
B. Ambari metric system
28.Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks
among all the applications in the system?
ResourceManager
29.Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns?
A. Sequence files
B. Parquet
C. RCFile
D. Flat
30.Which feature makes Apache Spark much easier to use than MapReduce?
31.Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases?
A. Oozie
B. Sqoop
C. Accumulo
D. HBase
32.Which two are attributes of streaming data?
A. Simple, numeric data.
B. Data is processed in batch.
C. Requires extremely rapid processing.
D. Sent in high volume.
33.Which type of cell can be used to document and comment on a process in a
Jupyter notebook?
A. output
B. markdown
C. code
D. kernel
34.Which component of the Spark Unified Stack allows developers to intermix
structured database queries with Spark's programming language?
A. MLlib
B. Mesos
C. Java
D. Spark SQL
A. /app/hive/warehouse/data
B. /app/hive/warehouse/bigsql
C. /app/hive/warehouse/schema
D. /app/hive/warehouse/
A. YARN
B. Spark
C. Hive
D. Pig
39.Which two commands would you use to give or remove certain privileges to/from a
user?
A. REVOKE
B. GRANT
C. INSERT
D. LOAD
E. SELECT
40.When connecting to an external database in a federation, you need to use the
correct database driver and protocol. What is this federation component called in Big
SQL?
wrapper
41.Using the Java SQL Shell, which command will connect to a database called
mybigdata?
./jsqsh mybigdata
42.Which Big SQL feature allows users to join a Hadoop data set to data in external
databases?
A. Integration
B. Impersonation
C. Grant/Revoke privileges
D. Fluid query
44.You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true?
bigsql.alltables.io.doAs
bigsql.impersonation.create.table.grant.public
46.You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format?
STORED AS parquetfile
47.Which tool would you use to create a connection to your Big SQL database?
DSM
48.Which directory permissions need to be set to allow all users to create their own
schema?
777
50.You need to determine the permission setting for a new schema directory. Which
tool would you use?
umask
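Questions 48 and 50 both depend on how a umask interacts with permission bits. A minimal sketch, assuming POSIX-style semantics in which the umask clears bits from the requested mode; the helper name `effective_mode` is hypothetical:

```python
def effective_mode(requested: int, umask: int) -> int:
    """Return the permission bits that remain after the umask clears its bits."""
    return requested & ~umask & 0o777


# A umask of 022 removes write access for group and others.
dir_mode = effective_mode(0o777, 0o022)

# A umask of 000 removes nothing, leaving the directory fully open.
open_mode = effective_mode(0o777, 0o000)

print(oct(dir_mode), oct(open_mode))
```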
57.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
58.What are two ways the command-line parameters for a Sqoop invocation can be
simplified?
Include the --options-file command line argument.
Place the commands in a file.
59.If a Hadoop node goes down, which Ambari component will notify the
Administrator?
Ambari Alert Framework
60.Which three programming languages are directly supported by Apache Spark?
(Three answers)
Scala
Python
Java
73.Under the MapReduce v1 programming model, which shows the proper order of
the full set of MapReduce phases?
Map -> Combine -> Shuffle -> Reduce
76.Under the YARN/MRv2 framework, the JobTracker functions are split into which
two daemons?
ResourceManager
ApplicationMaster
79.What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce?
Spark
80.What are three IBM value-add components to the Hortonworks Data Platform
(HDP)?
Big Replicate
Big Match
Big SQL
83.Under the YARN/MRv2 framework, which daemon is tasked with negotiating with
the NodeManager(s) to execute and monitor tasks?
A. TaskManager
B. ResourceManager
C. JobMaster
D. ApplicationMaster
85.You have a distributed file system (DFS) and need to set permissions on the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run?
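The octal modes offered for this question elsewhere in the document (700, 770, 755, 666) can be decoded with Python's standard `stat` module. This sketch only renders the bits as an `ls`-style string; it assumes the directory is owned by the bigsql user and does not touch any file system. The helper name `describe` is made up for the example.

```python
import stat


def describe(mode: int) -> str:
    """Render octal directory permission bits as an ls-style string."""
    return stat.filemode(stat.S_IFDIR | mode)


for mode in (0o700, 0o770, 0o755, 0o666):
    print(oct(mode), describe(mode))
```

Only 700 leaves the group and other positions empty, so only the owner keeps read, write, and execute access.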
90.Apache Spark can run on which two of the following cluster managers?
A. Apache Mesos
B. oneSIS
C. Nomad
D. Linux Cluster Manager
E. Hadoop YARN
V11.2
Unit 1. Introduction to big data
Review answers
1. True or False: the number of Vs of big data are exactly four.
Review answers
1. Which of these components of HDP provides data access
capabilities?
A. MapReduce
B. Falcon
C. Ranger
D. Ambari
2. Identify the component that is a messaging system used for
real-time data pipelines
A. NiFi
B. Sqoop
C. Kafka
D. None of the above
3. True or False: Big Match is added value from IBM.
Review answers
5. IBM BigQuality provides a scalable engine to
A. Manage
B. Design
C. Connect
D. Cleanse
Review answers
1. True or False: Apache Ambari is backed by RESTful APIs for
developers to easily integrate with their own applications.
2. Which functions does AMS provide?
A. Monitors the health and status of the Hadoop cluster.
B. Starts, stops, and reconfigures Hadoop services across the cluster.
C. Collects, aggregates, and serves Hadoop and system metrics.
D. Handles the configuration of Hadoop services for the cluster.
3. Which page from the Apache Ambari UI enables you to check the
versions of the software that is installed on your cluster?
A. Cluster Admin > Stack and Versions
B. admin > Service Accounts
C. Services
D. Hosts
Review answers
1. True or False: Hadoop systems are designed for using a
single server.
2. What is the default number of replicas in a Hadoop system?
A. 1
B. 2
C. 3
D. 4
3. True or False: One of the Hadoop goals is fault tolerance by
detecting faults and applying quick and automatic recovery.
4. True or False: At least two NameNodes are required for a
stand-alone Hadoop cluster.
5. The default Hadoop block size (in MB) is:
A. 16
B. 32
C. 64
D. 128
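Questions 2 and 5 combine into a quick storage estimate. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the 1000 MB file and the helper name `hdfs_footprint` are made up for the example.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default block size (Hadoop 2.x)
REPLICATION = 3      # assumed default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION):
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    The last block holds only the leftover data, so raw storage is the
    file size times the replication factor, not blocks * block size.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication, file_size_mb * replication


blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)
```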
Review answers
1. Which of the following phases in a MapReduce job is
optional?
A. Map
B. Shuffle
C. Reduce
D. Combiner
2. True or False: Interactive, online, and streaming
applications are not allowed to run on Hadoop v2
3. The JobTracker in MRv1 is replaced by which components
in YARN? (Select all that apply.)
A. ResourceManager
B. NodeManager
C. ApplicationMaster
D. TaskTracker
Review answers
1. True or False: Ease of use is one of the benefits of using
Apache Spark.
2. Which language is supported by Apache Spark?
A. C++
B. C#
C. Java
D. Node.js
3. True or False: Scala is the primary abstraction of Apache
Spark.
4. In RDD actions, which function returns all the elements of
the data set as an array of the driver program?
A. Collect
B. Take
C. Count
D. Reduce
5. True or False: Referencing a data set is one of the methods
to create RDD.
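The four actions listed in question 4 differ in what they return to the driver. The class below is a plain-Python stand-in, not Spark itself: `ToyRDD` is a hypothetical name, and it only mirrors the names and return shapes of the `collect`, `take`, `count`, and `reduce` actions on a small in-memory list.

```python
from functools import reduce as fold


class ToyRDD:
    """Illustrative stand-in for an RDD; holds its data in a local list."""

    def __init__(self, items):
        self._items = list(items)

    def collect(self):
        """Return every element to the caller."""
        return list(self._items)

    def take(self, n):
        """Return only the first n elements."""
        return self._items[:n]

    def count(self):
        """Return the number of elements."""
        return len(self._items)

    def reduce(self, fn):
        """Combine all elements pairwise with fn into a single value."""
        return fold(fn, self._items)


rdd = ToyRDD([1, 2, 3, 4])
everything = rdd.collect()
first_two = rdd.take(2)
size = rdd.count()
total = rdd.reduce(lambda a, b: a + b)
```

On a real cluster, `collect` pulls the whole data set into the driver's memory, which is why `take` is usually the safer choice for a quick look at large data.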
Introduction to Apache Spark © Copyright IBM Corporation 2021
Review answers
1. What is the data representation format of an RC or ORC
file?
A. Row-based encoding
B. Record-based encoding
C. Column-based storage
D. NoSQL data store
2. True or False: A NoSQL database is designed for those
developers that do not want to use SQL.
3. HBase is an example of which of the following NoSQL data
store type?
A. Key-value store
B. Graph store
C. Column store
D. Document store
Review answers
4. Which database provides an SQL for Hadoop interface?
A. HBase
B. Apache Hive
C. Cloudant
D. MongoDB
5. True or False: R is a real programming language, and
Python is an interactive environment for doing statistics.
Review answers
1. Kerberos is used by Hadoop for:
A. Authentication
B. Authorization
C. Auditing
D. Data protection
2. ______ is used by Hadoop for API and perimeter security.
A. Apache Ambari
B. Apache Knox
C. Apache Ranger
D. Data Steward Studio
3. True or False: Kerberos provides automation and
management of Apache Ambari in the Hadoop cluster.
Review answers
1. True or False: IBM Streams needs Apache Storm or Apache
Spark to provide the analytics
2. True or False: Streaming data is limited to sensors,
cameras, and video.
3. What are the differences between NiFi and MiNiFi?
A. NiFi is small and has low resource consumption.
B. NiFi is subproject of MiNiFi.
C. NiFi is a disk-based and microbatch ETL tool.
D. They are the same.
Review answers
1. What are the reasons that your organization would use Db2
Big SQL?
A. Want to access your Hadoop data without using MapReduce.
B. It has superior SQL-on-Hadoop performance to optimize data
ingestion and query performance for your enterprise.
C. No deep learning curve because it is compliant with industry
standards for SQL.
D. No need to change the format or migrate the data out of
Hadoop to do any work on the data.
E. All the above.
2. True or False: For optimal performance and function, the
ORC file format is recommended for Db2 Big SQL.
3. True or False: You can access Db2 Big SQL through Db2
Big SQL console only.
Using IBM Db2 Big SQL to access Apache Hadoop Distributed File System data © Copyright IBM Corporation 2021
Review answers
1. True or False: The INSERT operation is the recommended
method for getting data into your Db2 Big SQL table for best
performance.
Creating IBM Db2 Big SQL schemas and tables © Copyright IBM Corporation 2021
Review answers
1. Which file format is supported by Db2 Big SQL?
A. Text
B. Sequence
C. Parquet
D. ORC
E. Avro
F. All the above
File formats and querying IBM Db2 Big SQL tables © Copyright IBM Corporation 2021
Review answers
1. You can control authorization of Db2 Big SQL objects on
which of the following levels?
A. With authorization in the distributed file system.
B. Authorization with the GRANT command.
C. Authorization at the row and column levels.
D. By using VIEWS or STORED PROCEDURES
E. All the above.
Configuring IBM Db2 Big SQL security © Copyright IBM Corporation 2021
Review answers
1. What are the characteristics of a federated system?
A. Transparent
B. Extensible
C. Autonomous
D. High performance
E. High function
F. All the above
Data federation with IBM Db2 Big SQL © Copyright IBM Corporation 2021
Review answers
1. True or False: Watson Studio is designed only for the data
scientist persona.
False: The data engineer, the data scientist, the business analyst, and the app developer can
use Watson Studio. Each persona has primary tools that help them do their job. For example,
the data scientist's main tool is Watson Studio, but other personas can use it as it relates to their
jobs. Also, there might be a team of different personas. Whatever the case is, you must decide
what tool is correct for the job regardless of the personas. The definitions of personas can vary
between different companies and evolve over time.
2. True or False: Object storage provides the space where unstructured data for your project is
stored.
3. True or False: Collaborators can be given certain access levels.
True: Add collaborators to your project by using their email addresses. If they have an existing
account on IBM Cloud, they are added immediately. Otherwise, they receive an invite to create
a Watson Studio account. Choose the permissions for each collaborator. The Admin role can
control project assets, collaborators, and settings. The Editor role can control project assets.
The Viewer role can view the project. Collaborators can be removed from a project or have their
permissions updated.
4. True or False: Collaboration within Watson Studio is an optional add-on component that must
be purchased.
False: Collaboration is a standard feature in Watson Studio.
Review answers
1. Which of the following statements are true about Apache Spark
environments in Watson Studio?
A. Apache Spark environments can be accessed by using the S3
API.
B. Apache Spark environments define the hardware and software
configurations to start custom Apache Spark clusters on demand.
C. Apache Spark environments provide the storage that is required
by Watson Studio projects.
D. Apache Spark environments in Watson Studio are used by tools
such as Notebooks
E. Users must always create their own Apache Spark environment
definition.
F. Watson Studio provides default Apache Spark environment
definitions to get started quickly.
2. True or False: Multiple Notebooks cannot be associated with
the same Apache Spark environment definition.
3. True or False: /tmp is not a shared file system and cannot be
accessed by Apache Spark executors.
Analyzing data with IBM Watson Studio © Copyright IBM Corporation 2021
ChatGPT Conversation
Time: 2023-02-02T22:15:20.570Z
Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.
Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System
Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL
d. Spark SQL.
file:///C:/Users/pablo/Downloads/ChatGPT-2023-02-02T22-15-20.html 1/10
02/02/2023 23:17 ChatGPT Conversation
a. ResourceManager.
Which data encoding format supports exact storage of all data in binary
representations such as VARBINARY columns? a. SequenceFiles b. RCFile c.
Parquet d. Flat
a. SequenceFiles.
Hadoop uses which two Google technologies as its foundation? a. HBase b. Google
File System c. YARN d. MapReduce e. Ambari
a. Authentication c. Authorization.
b. NameNode.
Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety
c. Veracity.
Which statement is true about the Combiner phase of the MapReduce architecture?
a. It is performed after the Reducer phase to produce the final output. b. It reduces
the amount of data that is sent to the Reducer task nodes. c. It aggregates all input
data before it goes through the Map phase. d. It determines the size and distribution
of data split in the Map phase.
b. It reduces the amount of data that is sent to the Reducer task nodes.
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.
ResourceManager c. TaskManager d. JobMaster
a. ApplicationMaster.
What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML
b. Unstructured data.
Which Spark Core function provides the main element of Spark API? a. MLlib b.
Mesos c. YARN d. RDD
c. Hive
Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager
Under the YARN/MRv2 framework, the JobTracker functions are split into two
daemons: the ResourceManager and the ApplicationMaster.
Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce
a. org.apache.hadoop.mapred
Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN
c. Pig
Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing
Apache Spark provides a single, unifying platform for the following three types of
operations:
What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.
a. Combiner
d. Authorization Provider
Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety
e. Variety
What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN
b. Big Replicate
c. Big Index
b. HBase
Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN
d. YARN
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase
c. Sqoop
Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit
The three components that are a part of the Five Pillars of Security are:
1. Data Protection
2. Administration
3. Audit
b. MongoDB
You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT
You would use the "hadoop fs -chmod" command in HDFS to determine the
permission setting for a new schema directory in Hadoop.
choose only one correct answer without explaining You need to determine the
permission setting for a new schema directory. Which tool would you use? a. umask
b. HDFS c. Kerberos d. GRANT
b. HDFS
Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables
You need to enable impersonation. Which two properties in the bigsql-conf.xml file
need to be marked true? a. DB2_ATS_ENABLE b.
bigsql.impersonation.create.table.grant.public c. DB2COMPOPT d.
bigsql.alltables.io.doAs e. $BIGSQL_HOME/conf
d. bigsql.alltables.io.doAs e. bigsql.impersonation.enabled
Which directory permissions need to be set to allow all users to create their own
schema? a. 700 b. 755 c. 777 d. 666
c. 777
When connecting to an external database in a federation, you need to use the correct
database driver and protocol. What is this federation component called in Db2 Big
SQL? a. Data source b. Wrapper c. Nickname d. User mapping
a. Data source
Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.
You have a distributed file system (DFS) and need to set permissions on the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse
What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas
ChatGPT Conversation
Time: 2023-02-02T22:17:37.412Z
ChatGPT Conversation
Time: 2023-02-02T21:38:12.326Z
b. HBase
Which component of the Spark Unified Stack allows developers to intermix structured
database queries with Spark's programming language? a. Mesos b. Java c. MLlib d.
Spark SQL
d. Spark SQL
Which feature makes Apache Spark much easier to use than MapReduce? a.
Applications run in-memory. b. APIs for Scala, Python, C++, and .NET. c. Libraries
that support SQL queries. d. Suitable for transaction processing.
b. NameNode
b. MongoDB
What are three IBM value-add components to the Hortonworks Data Platform (HDP)?
a. Db2 Big SQL b. Big Replicate c. Big Index d. Big Match e. Big Data f. Big YARN
Under the YARN/MRv2 framework, the JobTracker functions are split into which two
daemons? a. ResourceManager b. ApplicationMaster c. TaskManager d. JobMaster
e. ScheduleManager
a. ResourceManager b. ApplicationMaster
d. Disk latency
a. ResourceManager
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the
NodeManager(s) to execute and monitor tasks? a. ApplicationMaster b.
a. ApplicationMaster
b. HBase
If a Hadoop node goes down, which Ambari component will notify the Administrator?
a. REST API b. Ambari Metrics System c. Ambari Alert Framework d. Ambari Wizard
c. Pig
Apache Spark provides a single, unifying platform for which three of the following
types of operations? a. ACID transactions b. Transaction processing c. Machine
learning d. Graph operations e. Record locking f. Batch processing
Apache Spark can run on which two of the following cluster managers? a. oneSIS b.
Apache Mesos c. Nomad d. Linux Cluster Manager e. Hadoop YARN
Which three are a part of the Five Pillars of Security? a. Speed b. Resiliency c. Data
Protection d. Administration e. Audit
The three that are a part of the Five Pillars of Security are:
Data Protection
Administration
Audit
Which of the "Five Vs" of big data describes the possibility that such a huge amount
of data has some uncertainty that is associated with it? a. Value b. Volume c. Veracity
d. Velocity e. Variety
c. Veracity
Which component of the Apache Ambari architecture provides statistical data to the
dashboard about the performance of a Hadoop cluster? a. Ambari Wizard b. Ambari
Server c. Ambari Alert Framework d. Ambari Metrics System
What is the name of the Hadoop-related Apache project that utilizes an in-memory
architecture to run applications faster than MapReduce? a. Python b. Spark c. Pig d.
Hive
b. Spark
d. Authorization Provider
What two security functions does Apache Knox provide? a. Management of Kerberos
in the cluster. b. Proxying services. c. API and perimeter security. d. Database field
access auditing.
Apache Knox provides two security functions: API and perimeter security, and
proxying services.
Which component of the Hortonworks Data Platform (HDP) is the architectural center
of Hadoop and provides resource management and a central platform for Hadoop
applications? a. HDFS b. MapReduce c. HBase d. YARN
d. YARN
Under the MapReduce v1 programming model, which shows the proper order of the
full set of MapReduce phases? a. Map -> Combine -> Shuffle -> Reduce b. Map ->
Combine -> Reduce -> Shuffle c. Map -> Split -> Reduce -> Combine d. Split -> Map
-> Combine -> Reduce
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2,
MySQL, or other databases? a. Oozie b. Accumulo c. Sqoop d. HBase
c. Sqoop
Which two are attributes of streaming data? a. Data is processed in batch. b. Simple,
numeric data. c. Requires extremely rapid processing. d. Sent in high volume.
Which of the "Five Vs" of big data is related to different formats of data, from
traditional documents and databases to semi-structured and unstructured data? a.
Value b. Volume c. Veracity d. Velocity e. Variety
e. Variety
What is the type of data that has an unknown form and cannot be stored in an
RDBMS and analyzed unless it is transformed into a structured format? a. Structured
data b. Unstructured data c. Semi-structured data d. JSON e. XML
Unstructured data
You need to determine the permission setting for a new schema directory. Which tool
would you use? a. umask b. HDFS c. Kerberos d. GRANT
d. GRANT
What is an advantage of the ORC file format? a. Db2 Big SQL can exploit advanced
features b. Data interchange outside Hadoop c. Supported by multiple I/O engines d.
Efficient compression
d. Efficient compression
Which statement best describes a Db2 Big SQL database table? a. The defined
format and rules around a delimited file. b. A data type of a column describing its
value. c. A directory with zero or more data files. d. A container for any record format.
Using the Java SQL Shell, which command will connect to a database called
mybigdata? a. ./jsqsh mybigdata b. ./jsqsh go mybigdata c. ./java mybigdata d. ./java
tables
b. ./jsqsh go mybigdata
What are Db2 Big SQL database tables organized into? a. Files b. Hives c.
Directories d. Schemas
You are creating a new table and need to specify a directory within the DFS to store
its data files. Which keyword would you use? a. EXTERNAL b. LOCATION c.
PARTITIONED BY d. HADOOP
b. LOCATION
You are creating a new table and need to format it with parquet. Which partial SQL
statement would create the table in parquet format? a. STORED AS parquetfile b.
STORED AS parquet c. CREATE AS parquetfile d. CREATE AS parquet
b. STORED AS parquet
Which statement is NOT true about Db2 Big SQL? a. It supports a variety of storage
formats. b. It supports comprehensive and standard SQL. c. It provides powerful
optimization and performance. d. It uses MapReduce to access your Hadoop data.
Db2 Big SQL does not use MapReduce to access Hadoop data, it uses its own
optimized parallel processing engine.
Which tool would you use to create a connection to your Db2 Big SQL database? a.
Jupyter b. Ambari c. Scheduler d. Db2 Big SQL console
Which Db2 Big SQL feature allows users to join a Hadoop data set to data in external
databases? a. Impersonation b. Fluid query c. Grant/Revoke privileges d. Integration
d. Integration
You have a distributed file system (DFS) and need to set permissions on the
/hive/warehouse directory to allow access to ONLY the bigsql user. Which command
would you run? a. hdfs dfs -chmod 700 /hive/warehouse b. hdfs dfs -chmod 770
/hive/warehouse c. hdfs dfs -chmod 755 /hive/warehouse d. hdfs dfs -chmod 666
/hive/warehouse
Which two commands would you use to give or remove certain privileges to/from a
user? a. INSERT b. REVOKE c. GRANT d. SELECT e. LOAD
c. GRANT b. REVOKE
Which type of cell can be used to document and comment on a process in a Jupyter
notebook? a. Kernel b. Code c. Markdown d. Output
c. Markdown
a. Projects
Which of the following products is a cloud-native solution that enables you to put your
data to work quickly and efficiently? It can run anywhere, like on Red Hat OpenShift
cluster, in the cloud, or on-premises. a. Hadoop b. Cloud Pak for Data c. Big SQL d.
Spark
What is the field of study that gives computers the ability to learn without being
explicitly programmed? a. Yarn b. SQL c. Hadoop d. Machine Learning
d. Machine Learning
In data science and the AI domain, which of the following products help you with data
preparation in a fully managed IBM Cloud environment? a. Watson Studio on IBM
Cloud Pak for Data b. Watson Studio Cloud (IBM Cloud) c. Watson Studio Local d.
Watson Studio Desktop e. Watson Studio professional
Which of the following products do you use to organize your resources; work with
data; and add collaborators, data assets, and tools to derive insights from data? a.
Pig b. Big SQL project c. Watson Studio Project d. Cloud Object Storage
Which machine learning approach detects patterns and relationships between data
without using labeled data? a. Supervised Learning b. Unsupervised Learning c.
Semi-supervised Learning d. Reinforcement Learning
b. Unsupervised Learning
Which of the following products provides integration with GitHub and an interface for
downloading Notebooks? a. Flume b. Pig c. Watson Studio d. Hive
c. Watson Studio provides integration with GitHub and an interface for downloading
Notebooks.
Which of the following products can be deployed in the private or public cloud of your
choice, including IBM Cloud™, AWS, Microsoft Azure, or Google Cloud? a. Watson
Studio Cloud (IBM Cloud) b. Watson Studio on IBM Cloud Pak for Data c. Watson
Studio Local d. Watson Studio Desktop