
21/06/2022 16:49 Big data engineer ibm exploree Flashcards | Quizlet

Big data engineer ibm exploree


Terms in this list (190)

C. It limits the rows or columns returned based on
certain criteria.
Which definition best describes
RCAC?
A. It limits access by using
views and stored procedures.
B. It grants or revokes certain
directory privileges.
C. It limits the rows or columns
returned based on certain
criteria.
D. It grants or revokes certain
user privileges

A. hdfs dfs -chmod 700 /hive/warehouse

You have a distributed file
system (DFS) and need to set
permissions on the
/hive/warehouse directory to
allow access to ONLY the
bigsql user. Which command
would you run?
A. hdfs dfs -chmod 700
/hive/warehouse
B. hdfs dfs -chmod 666
/hive/warehouse
C. hdfs dfs -chmod 770
/hive/warehouse
D. hdfs dfs -chmod 755
/hive/warehouse
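
As a rough illustration of why 700 is the only listed mode that restricts the directory to its owner, the octal digits can be decoded with Python's standard `stat` module. This is only a sketch that interprets the mode bits; it assumes the directory is owned by the bigsql user and does not talk to HDFS. The helper name `describe` is made up for this example.

```python
import stat


def describe(mode: int) -> str:
    """Render an octal directory mode as an ls-style permission string."""
    return stat.filemode(stat.S_IFDIR | mode)


# Decode each mode offered in the question above.
for mode in (0o700, 0o666, 0o770, 0o755):
    print(oct(mode), describe(mode))
```

Only 700 leaves the group and other positions empty, so only the owner can read, write, or enter the directory.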

A. Efficient compression

What is an advantage of the
ORC file format?
A. Efficient compression
B. Big SQL can exploit
advanced features
C. Supported by multiple I/O
engines
D. Data interchange outside
Hadoop
https://quizlet.com/in/558154874/big-data-engineer-ibm-exploree-flash-cards/ 1/30

C. bigsql.alltables.io.doAs
D. bigsql.impersonation.create.table.grant.public
You need to enable impersonation. Which
properties in the bigsql-conf.xml file need
to be marked true?
A. $BIGSQL_HOME/conf
B. DB2COMPOPT
C. bigsql.alltables.io.doAs
D. bigsql.impersonation.create.table.grant.public
E. DB2_ATS_ENABLE

C. STORED AS parquetfile

You are creating a new table
and need to format it with
parquet. Which partial SQL
statement would create the
table in parquet format?
A. STORED AS parquet
B. CREATE AS parquetfile
C. STORED AS parquetfile
D. CREATE AS parquet

B. 777
Which directory permissions
need to be set to allow all
users to create their own
schema?
A. 666
B. 777
C. 700
D. 755

A. umask
You need to determine the
permission setting for a new
schema directory. Which tool
would you use?
A. umask
B. GRANT
C. HDFS
D. Kerberos
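
A umask works by clearing bits from the mode a new directory would otherwise get. The sketch below shows that arithmetic, assuming a base directory mode of 777 and an illustrative umask of 022; both values are assumptions for the example, not settings taken from the course material.

```python
def effective_mode(base: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return base & ~umask


# Illustrative values: base mode 777 for a new directory, umask 022.
print(oct(effective_mode(0o777, 0o022)))  # 0o755
```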

A. ./jsqsh mybigdata

Using the Java SQL Shell,
which command will connect to
a database called mybigdata?
A. ./jsqsh mybigdata
B. ./jsqsh go mybigdata
C. ./java mybigdata
D. ./java tables

B. GRANT
C. REVOKE
Which two commands would
you use to give or remove
certain privileges to/from a
user?
A. INSERT
B. GRANT
C. REVOKE
D. LOAD
E. SELECT

A. Schemas
What are Big SQL database
tables organized into?
A. Schemas
B. Directories
C. Files
D. Hives

A. Wrapper

When connecting to an
external database in a
federation, you need to use the
correct database driver and
protocol. What is this
federation component called in
Big SQL?
A. Wrapper
B. Data source
C. User mapping
D. Nickname

A. A directory with zero or more data files

Which statement best
describes a Big SQL database
table?
A. A directory with zero or more
data files.
B. The defined format and
rules around a delimited file.
C. A container for any record
format.
D. A data type of a column
describing its value

B. DSM
Which tool would you use to
create a connection to your Big
SQL database?
A. Jupyter

B. DSM
C. Ambari
D. Scheduler

B. /apps/hive/warehouse/

What is the default directory in
HDFS where tables are
stored?
A.
/apps/hive/warehouse/schema
B. /apps/hive/warehouse/
C. /apps/hive/warehouse/data
D. /apps/hive/warehouse/bigsql

D. CREATE FUNCTION

Which command creates a
user-defined schema function?
A. ALTER MODULE PUBLISH
FUNCTION
B. TRANSLATE FUNCTION
C. ALTER MODULE ADD
FUNCTION
D. CREATE FUNCTION

A. graph operations
C. batch processing
D. machine learning
Apache Spark provides a
single, unifying platform for
which three of the following
types of operations?
A. graph operations
B. record locking
C. batch processing
D. machine learning
E. ACID transactions
F. transaction processing

C. org.apache.hadoop.mapred

Which is the Java class prefix
for the MapReduce v1 APIs?
A. org.apache.hadoop.mr
B. org.apache.mapreduce
C. org.apache.hadoop.mapred
D. org.apache.mr

A. Hortonworks Data Flow


What is the preferred
replacement for Flume?
A. Hortonworks Data Flow
B. Storm
C. NiFi
D. Druid

D. ResourceManager

Under the YARN/MRv2
framework, which daemon
arbitrates the execution of
tasks among all the
applications in the system?
A. ScheduleManager
B. ApplicationMaster
C. JobMaster
D. ResourceManager

C. Ambari Metrics System

Which component of the
Apache Ambari architecture
provides statistical data to the
dashboard about the
performance of a Hadoop
cluster?
A. Ambari Wizard
B. Ambari Alert Framework
C. Ambari Metrics System
D. Ambari Server

A. ResourceManager

Under the YARN/MRv2
framework, the Scheduler and
ApplicationsManager are
components of which daemon?
A. ResourceManager
B. ApplicationMaster
C. TaskManager
D. ScheduleManager

A. Scala
B. Java
D. Python
Which three programming
languages are directly
supported by Apache Spark?
A. Scala
B. Java
C. C++
D. Python
E. .NET
F. C#

B. ResourceManager
E. ApplicationMaster
Under the YARN/MRv2
framework, the JobTracker
functions are split into which
two daemons?
A. JobMaster
B. ResourceManager
C. ScheduleManager
D. TaskManager
E. ApplicationMaster

B. Finding a particular node within the cluster.
D. Partial failure of the nodes during execution.

What are two common issues
in distributed systems?
A. Reduced performance when
compared to a single server.
B. Finding a particular node
within the cluster.
C. Distributed systems are
harder to scale up.
D. Partial failure of the nodes
during execution.

B. disk latency
Which component of a
Hadoop system is the primary
cause of poor performance?
A. CPU
B. disk latency
C. network
D. RAM

A. Hive

Which Apache Hadoop
application provides an SQL-like
interface to allow
abstraction of data on semi-
structured data in a Hadoop
datastore?
A. Hive
B. YARN
C. Spark
D. Pig

B. Spark SQL

Which component of the Spark
Unified Stack allows
developers to intermix
structured database queries
with Spark's programming
language?
A. Mesos
B. Spark SQL
C. Java
D. MLlib


B. Google File System
E. MapReduce
Hadoop uses which two
Google technologies as its
foundation?
A. YARN
B. Google File System
C. Ambari
D. HBase
E. MapReduce

C. high-speed networking between nodes
F. parallel reading of large data files

Which two factors in a Hadoop
cluster increase performance
most significantly?
A. large number of small data
files
B. data redundancy on
management nodes
C. high-speed networking
between nodes
D. solid state disks
E. immediate failover of failed
disks
F. parallel reading of large data
files

A. RDD
Which Spark Core function
provides the main element of
Spark API?
A. RDD
B. MLlib
C. YARN
D. Mesos

A. Proxying services.
B. API and perimeter security.
What two security functions
does Apache Knox provide?
A. Proxying services.
B. API and perimeter security.
C. Management of Kerberos in
the cluster.
D. Database field access
auditing.

A. Place the commands in a file.
D. Include the --options-file command line
argument.
What are two ways the
command-line parameters for a
Sqoop invocation can be
simplified?
A. Place the commands in a
file.
B. Run Sqoop using the vi
editor.
C. Use the --import-command
line argument.
D. Include the --options-file
command line argument.

C. Collector
What is the final agent in a
Flume chain named?
A. Stream
B. Agent
C. Collector
D. Source

D. Map -> Combine -> Shuffle -> Reduce

Under the MapReduce v1
programming model, which
shows the proper order of the
full set of MapReduce phases?
A. Map -> Combine -> Reduce
-> Shuffle
B. Split -> Map -> Combine ->
Reduce
C. Map -> Split -> Reduce ->
Combine
D. Map -> Combine -> Shuffle ->
Reduce
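
The Map, Combine, Shuffle, Reduce ordering can be illustrated with a small in-memory word count. This is a single-process sketch for study purposes, not Hadoop's implementation; the function names and the two sample input splits are invented for the example.

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate one mapper's output before it is sent on."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())


def shuffle_phase(mapper_outputs):
    """Shuffle: group the values from all mappers by key."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


splits = ["big data big sql", "big sql spark"]
combined = [combine_phase(map_phase(s)) for s in splits]
result = reduce_phase(shuffle_phase(combined))
print(result)
```

In the first split the combiner collapses the two ("big", 1) pairs into one ("big", 2) pair, which is the sense in which the Combine step reduces the data sent to the reducers.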

C. MapReduce v1 APIs are implemented by
applications which are largely independent of the
execution environment.

Which statement is true about
MapReduce v1 APIs?
A. MapReduce v1 APIs define
how MapReduce jobs are
executed.
B. MapReduce v1 APIs provide
a flexible execution
environment to run
MapReduce.
C. MapReduce v1 APIs are
implemented by applications
which are largely independent
of the execution environment.
D. MapReduce v1 APIs cannot
be used with YARN.


D. Ambari Alert Framework


If a Hadoop node goes down,
which Ambari component will
notify the Administrator?
A. Ambari Wizard
B. REST API
C. Ambari Metrics System
D. Ambari Alert Framework

B. MongoDB
What is an example of a
NoSQL datastore of the
"Document Store" type?
A. HBase
B. MongoDB
C. REDIS
D. Cassandra

D. Pig

Which Apache Hadoop
application provides a high-level
programming language
for data transformation on
unstructured data?
A. Sqoop
B. Hive
C. Zookeeper
D. Pig

D. JBOD
Which hardware feature on a
Hadoop datanode is
recommended for cost efficient
performance?
A. SSD
B. RAID
C. LVM
D. JBOD

D. REDIS
What is an example of a Key-
value type of NoSQL
datastore?
A. MongoDB
B. Sesame
C. Neo4j
D. REDIS

C. SequenceFiles
Which data encoding format
supports exact storage of all
data in binary representations
such as VARBINARY
columns?
A. Parquet
B. RCFile
C. SequenceFiles
D. Flat

A. Authorization Provider

Which component of the
Apache Ambari architecture
integrates with an
organization's LDAP or Active
Directory service?
A. Authorization Provider
B. Postgres RDBMS
C. REST API
D. Ambari Alert Framework

D. Libraries that support SQL queries.

Which feature makes Apache
Spark much easier to use than
MapReduce?
A. Suitable for transaction
processing.
B. APIs for Scala, Python, C++,
and .NET.
C. Applications run in-memory.
D. Libraries that support SQL
queries.

C. The column to use as the primary key.

What does the split-by
parameter tell Sqoop?
A. The number of rows to
commit per transaction.
B. The table name to export
from the database.
C. The column to use as the
primary key.
D. The number of rows to send
to each mapper.
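
Sqoop's documentation describes --split-by as naming the column used to divide an import into per-mapper ranges, with the table's primary key as the default choice. The sketch below shows the idea for an integer column: the column's minimum and maximum are cut into contiguous ranges, one per mapper. It illustrates the concept only and is not Sqoop's actual code; `split_ranges` and the sample values are hypothetical.

```python
def split_ranges(lo: int, hi: int, num_mappers: int):
    """Cut the inclusive range [lo, hi] into contiguous half-open ranges."""
    total = hi - lo + 1
    size, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` ranges take one additional value each.
        end = start + size + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# Hypothetical id column with values 0..999, imported by 4 mappers.
for low, high in split_ranges(0, 999, 4):
    print(f"WHERE id >= {low} AND id < {high}")
```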

A. Sqoop
Which Hadoop ecosystem tool
can import data into a Hadoop
cluster from a DB2, MySQL, or
other databases?
A. Sqoop
B. HBase
C. Accumulo
D. Oozie


C. HBase
Which NoSQL datastore type
began as an implementation of
Google's BigTable that can
store any type of data and
scale to many petabytes?
A. MemcacheD
B. CouchDB
C. HBase
D. Riak

D. Use the -m 1 parameter.

How can a Sqoop invocation
be constrained to only run one
mapper?
A. Use the --limit mapper=1
parameter.
B. Use the -mapper 1
parameter.
C. Use the --single parameter.
D. Use the -m 1 parameter.

B. YARN
E. MapReduce
F. HDFS
Hadoop 2 consists of which
three open-source sub-projects
maintained by the Apache
Software Foundation?
A. Big SQL
B. YARN
C. Hive
D. Cloudbreak
E. MapReduce
F. HDFS

B. Scalability
E. Resource utilization

What are two primary
limitations of MapReduce v1?
A. TaskTrackers can be a
bottleneck to MapReduce jobs
B. Scalability
C. Number of TaskTrackers
limited to 1,000
D. Workloads limited to
MapReduce
E. Resource utilization

B. It reduces the amount of data that is sent to the
Reducer task nodes.

Which statement is true about
the Combiner phase of the
MapReduce architecture?
A. It determines the size and
distribution of data split in the
Map phase.
B. It reduces the amount of
data that is sent to the Reducer
task nodes.
C. It aggregates all input data
before it goes through the Map
phase.
D. It is performed after
the Reducer phase to produce
the final output.

C. Projects
What is the architecture of
Watson Studio centered on?
A. Data Assets
B. Collaborators
C. Projects
D. Analytic Assets

D. The email address of the collaborator

You need to add a collaborator
to your project. What do you
need?
A. The list of deployments
B. A list of your saved
bookmarks
C. Your project ID
D. The email address of the
collaborator

B. Markdown

Which type of cell can be used
to document and comment on
a process in a Jupyter
notebook?
A. Code
B. Markdown
C. Kernel
D. Output

B. Spark Instance
D. Project
Before you create a Jupyter
notebook in Watson Studio,
which two items are
necessary?
A. File
B. Spark Instance
C. Scala
D. Project
E. URL

A. Watson Studio Cloud

Which Watson Studio offering
used to be available through
something known as IBM
Bluemix?
A. Watson Studio Cloud
B. Watson Studio Business
C. Watson Studio Local
D. Watson Studio Desktop

A. Extending the core language with shortcuts.

What is a "magic" command
used for in Jupyter?
A. Extending the core
language with shortcuts.
B. Parsing and loading data
into a notebook.
C. Autoconfiguring data
connections using a registry.
D. Running common statistical
analyses.

D. To perform certain data transformation quickly.

Why might a data scientist
need a particular kind of GPU
(graphics processing unit)?
A. To display a simple bar chart
of data on the screen.
B. To collect video for use in
streaming data applications.
C. To input commands to a
data science notebook.
D. To perform certain data
transformation quickly.

A. %lsmagic
What command is used to list
the "magic" commands in
Jupyter?
A. %lsmagic
B. %list-all-magic
C. %dirmagic
D. %list-magic

C. Notebooks can be used by multiple people at
the same time.
Which is an advantage that
Zeppelin holds over Jupyter?
A. Zeppelin is able to use the
R language.
B. Users must authenticate
before using a notebook.
C. Notebooks can be used by
multiple people at the same
time.
D. Notebooks can be
connected to big data engines
such as Spark.

A. App in web browser.


What does the user interface
for Jupyter look like to a user?
A. App in web browser.
B. Common desktop app.
C. Linux SSH session.
D. Database interface.

C. Combiner

Under the MapReduce v1
programming model, which
optional phase is executed
simultaneously with the Shuffle
phase?
A. Split
B. Map
C. Combiner
D. Reduce

D. ResourceManager

Under the YARN/MRv2
framework, the Scheduler and
ApplicationsManager are
components of which daemon?
A. ApplicationMaster
B. TaskManager
C. ScheduleManager
D. ResourceManager

B. Data is aggregated by worker nodes.

Under the MapReduce v1
programming model, what
happens in a "Reduce" step?
A. Worker nodes process
pieces in parallel.
B. Data is aggregated by
worker nodes.
C. Worker nodes store results
on their own local file systems.
D. Input is split into pieces.

B. It is much faster than MapReduce for complex
applications on disk.
Which statement about Apache
Spark is true?
A. It supports HDFS, MS-SQL,
and Oracle.
B. It is much faster than
MapReduce for complex
applications on disk.
C. It runs on Hadoop clusters
with RAM drives configured on
each DataNode.
D. It features APIs for C++ and
.NET.

B. Administration
D. Data Protection
E. Audit
Which three are a part of the
Five Pillars of Security?
A. Resiliency
B. Administration
C. Speed
D. Data Protection
E. Audit

C. All servers keep a copy of the shared data in
memory
Which statement accurately
describes how ZooKeeper
works?
A. Writes to a leader server will
always succeed.
B. There can be more than one
leader server at a time.
C. All servers keep a copy of
the shared data in memory.
D. Clients connect to multiple
servers at the same time.

A. Sent in high volume.
B. Requires extremely rapid processing.
Which two are attributes of
streaming data?
A. Sent in high volume.
B. Requires extremely rapid
processing.
C. Simple, numeric data.
D. Data is processed in batch.

B. Spark

What is the name of the
Hadoop-related Apache project
that utilizes an in-memory
architecture to run applications
faster than MapReduce?
A. Pig
B. Spark
C. Hive
D. Python

A. It is a Hadoop distribution based on a centralized
architecture with YARN at its core.

Which statement is true about
Hortonworks Data Platform
(HDP)?
A. It is a Hadoop distribution
based on a centralized
architecture with YARN at its
core.
B. It is a powerful platform for
managing large volumes of
structured data.
C. It is engineered and
developed by IBM's BigInsights
team.
D. It is designed specifically for
IBM Big Data customers.

B. Maintaining configuration information.
C. Providing distributed synchronization.
What are two services
provided by ZooKeeper?
A. Loading bulk data into a
Hadoop cluster.
B. Maintaining configuration
information.
C. Providing distributed
synchronization.
D. Authenticating and auditing
user access.

A. NodeChildrenChanged
B. NodeDeleted
Which two are valid watches
for ZNodes in ZooKeeper?
A. NodeChildrenChanged
B. NodeDeleted
C. NodeRefreshed
D. NodeExpired

A. Authorization
B. Auditing
What are two security features
Apache Ranger provides?
A. Authorization
B. Auditing
C. Authentication
D. Availability

A. Apache Mesos
C. Hadoop YARN
Apache Spark can run on
which two of the following
cluster managers?
A. Apache Mesos
B. oneSIS
C. Hadoop YARN
D. Linux Cluster Manager
E. Nomad

B. HBase

Which Apache Hadoop
component can potentially
replace an RDBMS as a large
Hadoop datastore and is
particularly good for "sparse
data"?
A. MapReduce
B. HBase
C. Spark
D. Ambari

D. Parallel Processing

Which computing technology
provides Hadoop's high
performance?
A. RAID-0
B. Online Transactional
Processing
C. Online Analytical Processing
D. Parallel Processing

A. Fluid query

Which Big SQL feature allows
users to join a Hadoop data set
to data in external databases?
A. Fluid query
B. Grant/Revoke privileges
C. Integration
D. Impersonation

D. Object Storage
Where does the unstructured
data of a project reside in
Watson Studio?
A. Database
B. Wrapper
C. Tables
D. Object Storage

A. Acquisition

What is the first step in a data
science pipeline?
A. Acquisition
B. Analytics
C. Exploration
D. Manipulation

A. Documenting the computational process.

What is a markdown cell used
for in a data science notebook?
A. Documenting the
computational process.
B. Holding the output of a
computation.
C. Configuring data
connections.
D. Writing code to transform
data.

A. Parallel Processing

Which computing technology
provides Hadoop's high
performance?
A. Parallel Processing
B. Online Transactional
Processing
C. RAID-0
D. Online Analytical Processing

A. A wizard for installing Hadoop services on host
servers.

Which description
characterizes a function
provided by Apache Ambari?
A. A wizard for installing
Hadoop services on host
servers.
B. A messaging system for
real-time data pipelines.
C. Moves large amounts of
streaming event data.
D. Moves information to/from
structured databases.

D. Data is aggregated by worker nodes.

Under the MapReduce v1
programming model, what
happens in a "Reduce" step?
A. Worker nodes process
pieces in parallel.
B. Input is split into pieces.
C. Worker nodes store results
on their own local file systems.
D. Data is aggregated by
worker nodes.

A. Hortonworks Data Flow


What is the preferred
replacement for Flume?
A. Hortonworks Data Flow
B. NiFi
C. Druid
D. Storm

B. Big Match
D. Big SQL
F. Big Replicate
What are three IBM value-add
components to the
Hortonworks Data Platform
(HDP)?

A. Big YARN
B. Big Match
C. Big Index
D. Big SQL
E. Big Data
F. Big Replicate

A. An application evaluating sensor data in real-time.
Which statement describes an
example of an application
using streaming data?

A. An application evaluating
sensor data in real-time.
B. A web application that
supports 10,000 users.
C. A system that stores many
records in a database.
D. One time export and import
of a database.

D. 1
How many Big SQL
management nodes do you
need at minimum?

A. 3
B. 4
C. 2
D. 1

B. Include the --options-file command line
argument.
D. Place the commands in a file.
What are two ways the
command-line parameters for a
Sqoop invocation can be
simplified?
A. Use the --import-command
line argument.
B. Include the --options-file
command line argument.
C. Run Sqoop using the vi
editor.
D. Place the commands in a
file.

D. Libraries that support SQL queries.

Which feature makes Apache
Spark much easier to use than
MapReduce?
A. APIs for Scala, Python, C++,
and .NET.
B. Applications run in-memory.
C. Suitable for transaction
processing.
D. Libraries that support SQL
queries.

A. MongoDB
What is an example of a
NoSQL datastore of the
"Document Store" type?
A. MongoDB
B. REDIS
C. Cassandra
D. HBase

B. YARN

Which component of the
Hortonworks Data Platform
(HDP) is the architectural
center of Hadoop and provides
resource management and a
central platform for Hadoop
applications?
A. HDFS
B. YARN
C. HBase
D. MapReduce


B. ApplicationMaster

Under the YARN/MRv2
framework, which daemon is
tasked with negotiating with the
NodeManager(s) to execute
and monitor tasks?
A. ResourceManager
B. ApplicationMaster
C. TaskManager
D. JobMaster

C. Markdown

Which type of cell can be used
to document and comment on
a process in a Jupyter
notebook?
A. Output
B. Kernel
C. Markdown
D. Code

D. A directory with zero or more data files.

Which statement best
describes a Big SQL database
table?
A. A container for any record
format.
B. The defined format and
rules around a delimited file.
C. A data type of a column
describing its value.
D. A directory with zero or
more data files.

A. Efficient compression
What is an advantage of the
ORC file format?
A. Efficient compression
B. Data interchange outside
Hadoop
C. Big SQL can exploit
advanced features
D. Supported by multiple I/O
engines

A. The column to use as the primary key.

What does the split-by
parameter tell Sqoop?
A. The column to use as the
primary key.
B. The number of rows to
commit per transaction.
C. The table name to export
from the database.
D. The number of rows to send
to each mapper.

B. It is much faster than MapReduce for complex
applications on disk.
Which statement about Apache
Spark is true?
A. It runs on Hadoop clusters
with RAM drives configured on
each DataNode.
B. It is much faster than
MapReduce for complex
applications on disk.
C. It supports HDFS, MS-SQL, and
Oracle.
D. It features APIs for C++ and
.NET.

A data file that is not changing.
What is meant by data at rest?

Spread data across a large cluster of computers.
Run your programs on the nodes that have the data.
Which two are the driving principles of MapReduce?

Transformations
Which Spark RDD operation creates a directed acyclic
graph through lazy evaluations?

Fields must be positioned at a fixed offset from the beginning of the record.
What is one disadvantage to using CSV formatted data in a
Hadoop data store?

Parquet
ORC
Which two of the following are column-based data encoding
formats?

The data is spread out and replicated across the cluster.
Which statement describes the action performed by HDFS
when data is written to the Hadoop cluster?

Postgres RDBMS
Which component of the Apache Ambari architecture
stores the cluster configurations?

Specific rows and columns using a query.
All rows of a table.
Which two of the following can Sqoop import from a relational
database? (select two)

Spark Streaming
Which component of the Spark unified stack provides
processing of data arriving at the system in real-time?

It is used for provisioning, managing, and monitoring.....
Which statement describes the purpose of Ambari?

Messages tweeted on Twitter.
Web server logs.
Photos posted on Insta.
What are three examples of Big Data? (select three)

Avro
Which of the following data encoding formats is a
compact, binary format that supports interoperability with
multiple programming languages and versioning?

Scala
What is the native programming language for Spark?

NameNode
Which component of the HDFS architecture manages the file
system namespace and metadata?

Email address
Medical record number
Which two are examples of personally identifiable
information (PII)? (select two)

Lambda functions
What is the name of the Scala programming feature that
provides functions with no names?

Non-conventional methods used by business and
organizations to capture, manage, process and
make sense of a large volume of data
Which statement describes Big Data as it is used in the
modern business world?

Manages storage and transmission of intermediate output.
Under the MapReduce v1 architecture, which function is
performed by the TaskTracker?

HDFS links the disks on multiple nodes into one large file system.
Which statement is true about the Hadoop Distributed File
System (HDFS)?

Ambari
Which Hortonworks Data Platform (HDP) component
provides a common web user interface for applications
running on a Hadoop cluster?

REST APIs
Which feature allows application developers to easily
use the Ambari interface to integrate Hadoop provisioning,
management and monitoring capabilities into their own
applications?

Actions
Which Spark RDD operation returns values after performing
the evaluations?

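The difference between lazily recorded transformations and an action that triggers evaluation can be mimicked in plain Python. `LazyDataset` below is a hypothetical stand-in written for this note, not the Spark API: `map` and `filter` only record work, and `collect` plays the role of an action.

```python
class LazyDataset:
    """Hypothetical RDD stand-in: records transformations, runs them on an action."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: extend the recorded lineage, compute nothing.
        return LazyDataset(self._data, self._ops + (("map", fn),))

    def filter(self, fn):
        # Transformation: extend the recorded lineage, compute nothing.
        return LazyDataset(self._data, self._ops + (("filter", fn),))

    def collect(self):
        # Action: replay the recorded steps and return concrete values.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # track when the function actually runs
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
pending = len(calls)   # still 0: no transformation has executed yet
result = ds.collect()  # the action triggers evaluation
print(pending, result)
```

In real Spark code the same pattern appears as chained calls such as `map` and `filter` followed by an action such as `collect` or `count`.
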
JobTracker
Under the MapReduce v1 architecture, which element of
MapReduce controls job execution on multiple slaves?

It increases available processing power.
It adds capacity to the file system.
In a Hadoop cluster, which two are the result of adding more
nodes to the cluster? (select two)

Accepts MapReduce jobs submitted by clients.
Under the MapReduce v1 architecture, which function is
performed by the JobTracker?

Data stream management and processing.
What is the Hortonworks DataFlow package used for?

zkCli.sh
What OS command starts the ZooKeeper command-line
interface?

Kerberos
What is an authentication mechanism in Hortonworks
Data Platform?

Manage, secure and govern data stored across all storage environments.
What is Hortonworks DataPlane Services (DPS) used for?

Log files.
Cookies.
Browser cache.
What are three examples of "Data Exhaust"? (select three)

ls /
What ZK CLI command is used to list all the ZNodes at the top
level of the ZooKeeper hierarchy, in the ZooKeeper
command-line interface?

CSV.
Avro.
Which two of the following are row-based data encoding
formats? (select two)

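Row-based formats such as CSV and Avro keep all fields of one record together, while column-based formats such as Parquet and ORC keep all values of one column together. The sketch below shows the two layouts for the same made-up records; it is only a mental model and does not reproduce any real file format.

```python
# Three made-up records used only for illustration.
records = [
    {"id": 1, "name": "ann", "score": 10},
    {"id": 2, "name": "bob", "score": 20},
    {"id": 3, "name": "cy", "score": 30},
]

# Row-based layout: one tuple per record, fields stored side by side.
row_layout = [tuple(r.values()) for r in records]

# Column-based layout: one list per column, values stored side by side.
column_layout = {key: [r[key] for r in records] for key in records[0]}

print(row_layout)
print(column_layout)

# An aggregate over one column only has to touch that column's values.
total_score = sum(column_layout["score"])
```
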
DataNode
Which component of the HDFS architecture manages storage
attached to the nodes?

Input is processed as individual splits.
Under the MapReduce v1 programming model, what
happens in the Map step?

Processing large volumes of data with high throughput.
Able to use inexpensive commodity hardware.
Which two descriptions are advantages of Hadoop? (select two)

$$
What must surround LaTeX code so that it appears on its
own line in a Jupyter notebook?

SciPy
What Python package has support for linear
algebra, optimization, mathematical integration and
statistics?

import
What Python statement is used to add a library to the current
code cell?

Data modeling.
Machine learning.
Which areas of expertise are attributed to a data scientist?
(select two)

Substantive expertise.
Math and statistics knowledge.
Hacking skills.
Which three main areas make up data science according to
Drew Conway? (select three)

String
Which data type can cause significant performance
degradation and should be avoided?

LOAD
Which command is used to populate a Big SQL table?

Parquet
Which file format has the highest performance?

Oracle
Teradata
Which two of the following data sources are currently
supported by Big SQL?

Command line.
Ambari web interface.
Which two options can be used to start and stop Big SQL?

Apache Hive
Which type of foundation does Big SQL build on?

SMALLINT
Which data type is BOOLEAN defined as in a Big SQL
database?

The data is not human readable.
Which statement describes a sequence file?

CREATE NICKNAME
Which command would you run to make a remote table
accessible using an alias?

Ambari
Which tool should you use to
enable Kerberos security?

Delimited
Which file format contains human-readable data where
the column values are separated by a comma?

User-Defined
Which type of function promotes code re-use and
reduces query complexity?

EXTERNAL
You need to create a table that is not managed by the Big SQL
database manager. Which keyword would you use to
create the table?

Impersonation
Which feature allows the bigsql user to securely access data in
Hadoop on behalf of another user?

Apache Ranger
You need to monitor and manage data security across a
Hadoop platform. Which tool would you use?

Python
You can import preinstalled R libraries if you are using which
languages? (select two)

The permalink
When sharing a notebook, what will always point to the
most recent version of the notebook?

IBM Cloud
Where must a Spark
configuration be set up first?
Spark service
When creating a Watson
Studio project, what do you
need to specify?
PixieDust
Which visualization library is
developed by IBM as an add-
on to Python notebooks?
Collaborators
Who can access your data or
notebooks in your Watson
Studio project?
Collaborators
Who can control Watson Studio
project assets?
import
What Python statement is used
to add a library to the current
code cell?
"""
What can be used to surround
a multi-line string in a Python
code cell by appearing before
and after the multi-line string?
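
A minimal sketch of a multi-line string in a Python code cell, delimited by three double-quote characters before and after the text. The query text and the table name `employees` are hypothetical and serve only as sample content.

```python
# A triple-quoted string may span several lines; the same three quote
# characters open and close it. The SQL text is only sample content.
query = """SELECT name,
       salary
FROM employees"""

# Line breaks inside the delimiters are kept as part of the string.
lines = query.splitlines()
print(len(lines))
```

Three single-quote characters (`'''`) work the same way, as long as the opening and closing delimiters match.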

http://localhost:8888/
What is the default web
location for a local Jupyter
instance?
JobTracker
Under the MapReduce v1
architecture, which element of
the system manages the map
and reduce functions?

Copy any appropriate JDBC driver JAR
to $SQOOP_HOME/lib.
What must be done before
using Sqoop to import from a
relational database?
REST APIs
Which feature allows
application developers to easily
use the Ambari interface to
integrate Hadoop
provisioning, management,
and monitoring capabilities into
their own application?

HDFS links the disks on multiple nodes into one
large file system
Which statement is true about
the Hadoop Distributed File
System (HDFS)?
Scala.
Python.
Which two Spark libraries
provide a native shell? (select
two)
Data munging
What is the term for the
process of converting data
from one "raw" format to
another format making it more
appropriate and valuable for a
variety of downstream
purposes such as analytics
and that allows for efficient
consumption of the data?

It is a distributed collection of elements that are
parallelized across the cluster.
Which statement is true about
Spark's resilient distributed
dataset (RDD)?
MLlib
Which component of the Spark
unified stack supports learning
algorithms such as logistic
regression, naive Bayes
classification, and SVM?

Ambari
Which Hortonworks Data
Platform (HDP) component
provides a common web user
interface for applications
running on a Hadoop cluster?

Files are split into blocks.
There are at least 3 replicas of each unit of data.
Which two are features of the
Hadoop Distributed File
System (HDFS)? (select two)
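
The two features above can be illustrated with a little arithmetic. This is a rough sketch, not HDFS code: the helper names are hypothetical, and the 128 MB block size and replication factor of 3 are assumed values (common defaults, configurable per cluster).

```python
MB = 1024 * 1024

def hdfs_block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of blocks a file is split into; the last block may be partial.

    The 128 MB block size is an assumed default for this sketch.
    """
    # Integer ceiling division avoids floating-point rounding.
    return -(-file_size_bytes // block_size_bytes)

def raw_storage_bytes(file_size_bytes, replication=3):
    """Total bytes stored across the cluster when every block is replicated."""
    return file_size_bytes * replication

# Example: a 300 MB file under the assumed defaults.
size = 300 * MB
print(hdfs_block_count(size))         # blocks holding the file
print(raw_storage_bytes(size) // MB)  # raw storage in MB across all replicas
```

With these assumed values, a 300 MB file occupies three blocks and consumes 900 MB of raw cluster storage.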
ZOOKEEPER_HOME
Which environment variable
needs to be set to properly
start ZooKeeper?
Value
Which of the Five V's of Big
data describes the real
purpose of deriving business
insight from big data?

Lambda functions
What is the name of the Scala
programming feature that
provides functions with no
names?

NameNode
Which component of the HDFS
architecture manages the file
system namespace and
metadata?
To aid in the high availability of the ResourceManager.
How does MapReduce use
ZooKeeper?
HDFS
Which element of Hadoop is
responsible for spreading data
across the cluster?
The Hive metastore
In Big SQL, what is used for
table definitions, location, and
storage format of input files?

Apache HIVE
Which type of foundation does
Big SQL build on?
bigsql
The Big SQL head node has a
set of processes running. What
is the name of the service ID
running these processes?

CREATE WRAPPER
You need to define a server to
act as the medium between an
application and a data source
in a Big SQL federation. Which
command would you use?

Kerberos
Which Big SQL authentication
mode is designed to provide
strong authentication for
client/server applications by
using secret-key cryptography?

The Hive metastore
In Big SQL, what is used for
table definitions, locations, and
storage format of input files?

Apache Ranger
You need to monitor and
manage data security across a
Hadoop platform. Which tool
would you use?

Collect and send data into a stream.
What is the primary purpose of
Apache NiFi?
SQL
What is the default data format
Sqoop parses to export data to
a database?

3 replicas, 2 on the same rack, 1 on a different rack
Under the HDFS storage
model, what is the default
method of replication?
Quick data exploration tasks that can be
reproduced
For what are interactive
notebooks used by data
scientists?
Facilitates SQL-based queries
Which is the primary
advantage of using column-
based data formats over
record-based formats?

What is the default number of
rows Sqoop will export per
transaction?
