
C2090-101

Number: C2090-101
Passing Score: 800
Time Limit: 120 min
File Version: 1

C2090-101

https://www.gratisexam.com/

Exam A

QUESTION 1
Which statement is TRUE concerning optimizing the load performance?

A. You can improve the performance by increasing the number of map tasks assigned to the load
B. When loading large files the number of files that you load does not impact the performance of the LOAD HADOOP statement
C. You can improve the performance by decreasing the number of map tasks that are assigned to the load and adjusting the heap size
D. It is advantageous to run the LOAD HADOOP statement directly pointing to large files located in the host file system as opposed to copying the files to the DFS
prior to load

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.3/com.ibm.swg.im.bigsql.doc/doc/bigsql_loadperf.html

QUESTION 2
Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?

A. InfoSphere Streams can both read from and write data to HDFS
B. The Streams Big Data toolkit operators that interface with HDFS use Apache Flume to integrate with Hadoop
C. Streams applications never need to be concerned with making the data schemas consistent with those on Hadoop
D. Big SQL can be used to preprocess the data as it flows through InfoSphere Streams before the data lands in HDFS

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 3
Which of the following is TRUE about storing an Apache Spark object in serialized form?

A. It is advised to use Java serialization over Kryo serialization
B. Storing the object in serialized form will lead to faster access times
C. Storing the object in serialized form will lead to slower access times
D. All of the above

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://spark.apache.org/docs/latest/rdd-programming-guide.html
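Spark's serialized storage levels trade CPU for space: a serialized object is compact, but it must be deserialized on every access, which makes access slower. A rough sketch using Python's stdlib pickle (an analogy only, not Spark's actual mechanism):

```python
import pickle

# A hypothetical cached dataset: a list of records.
records = [{"id": i, "name": f"user{i}"} for i in range(1000)]

# Serialized storage: one compact byte buffer instead of
# many in-memory objects (akin to Spark's MEMORY_ONLY_SER).
blob = pickle.dumps(records)

# Every access now pays a deserialization cost first.
restored = pickle.loads(blob)

assert isinstance(blob, bytes)
assert restored == records
```

The same trade-off drives Spark's advice to prefer Kryo over Java serialization: the bytes are smaller, but reading them back always costs CPU.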

QUESTION 4
Which ONE of the following statements regarding Sqoop is TRUE?

A. HBase is not supported as an import target
B. Data imported using Sqoop is always written to a single Hive partition
C. Sqoop can be used to retrieve rows newer than some previously imported set of rows
D. Sqoop can only append new rows to a database table when exporting back to a database

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html

QUESTION 5
Which of the following statements regarding Big SQL is TRUE?

A. Big SQL doesn’t support stored procedures
B. Big SQL can be deployed on a subset of data nodes in the BigInsights cluster

C. Big SQL provides a SQL-on-Hadoop environment based on map reduce
D. Only tables created or loaded via Big SQL can be accessed via Big SQL

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://books.google.com.pk/books?id=t13nCQAAQBAJ&pg=PA3&lpg=PA3&dq=Big+SQL+can+be+deployed+on+a+subset+of+data+nodes+in+the
+BigInsights+cluster&source=bl&ots=RBbad0Xkel&sig=pMgmgDNLGUrkvOSXoVBj64xTMgk&hl=en&sa=X&redir_esc=y#v=onepage&q=Big%20SQL%20can%
20be%20deployed%20on%20a%20subset%20of%20data%20nodes%20in%20the%20BigInsights%20cluster&f=false

QUESTION 6
The number of partitions created by dynamic partitioning in Hive can be controlled by which of the following?

A. hive.exec.max.dynamic.partitions.pernode
B. hive.exec.max.dynamic.partitions
C. hive.exec.max.created.files
D. All of the above

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://resources.zaloni.com/blog/partitioning-in-hive

QUESTION 7
Which of the following Jaql operators groups one or more arrays based on key values and applies an aggregate expression?

A. join
B. group
C. expand
D. transform

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://books.google.com.pk/books?id=Qj-5BQAAQBAJ&pg=PA174&lpg=PA174&dq=Jaq+operators+groups+one+or+more+arrays+based+on+key
+values+and+applies+an+aggregate
+expression&source=bl&ots=zobr8AZzWy&sig=ZRCIH9ee4Un3Aam1hX8TzxfrfQI&hl=en&sa=X&redir_esc=y#v=onepage&q=Jaq%20operators%20groups%20one
%20or%20more%20arrays%20based%20on%20key%20values%20and%20applies%20an%20aggregate%20expression&f=false

QUESTION 8
Which of the following are CRUD operations available in HBase? (Choose two.)

A. HTable.Put
B. HTable.Read
C. HTable.Delete
D. HTable.Update
E. HTable.Remove

Correct Answer: AC
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.tutorialspoint.com/hbase/hbase_client_api.htm

QUESTION 9
A Resilient Distributed Dataset supports which of the following?

A. Creating a new dataset from an old one
B. Returning a computed value to the driver program
C. Both “Creating a new dataset from an old one” and “Returning a computed value to the driver program”
D. Neither “Creating a new dataset from an old one” nor “Returning a computed value to the driver program”

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://spark.apache.org/docs/latest/rdd-programming-guide.html (RDD operations)
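The two RDD operation families — transformations, which create a new dataset from an old one, and actions, which return a computed value to the driver — can be mimicked with a toy Python class. MiniRDD and its method bodies are illustrative only, not Spark APIs:

```python
from functools import reduce as _reduce

class MiniRDD:
    """Toy, immutable dataset mimicking the shape of the RDD API."""
    def __init__(self, data):
        self._data = tuple(data)  # immutable snapshot

    # Transformations: return a NEW dataset; the original is unchanged.
    def map(self, f):
        return MiniRDD(f(x) for x in self._data)

    def filter(self, pred):
        return MiniRDD(x for x in self._data if pred(x))

    # Actions: return a value to the "driver program".
    def reduce(self, f):
        return _reduce(f, self._data)

    def collect(self):
        return list(self._data)

nums = MiniRDD([1, 2, 3, 4])
doubled = nums.map(lambda x: x * 2)          # transformation -> new dataset
total = doubled.reduce(lambda a, b: a + b)   # action -> value to driver

assert nums.collect() == [1, 2, 3, 4]        # original untouched (immutable)
assert doubled.collect() == [2, 4, 6, 8]
assert total == 20
```

Note the immutability on display here: this is the same property tested in Question 23 below.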

QUESTION 10
In order for an SPSS Modeler stream to be incorporated for use in an InfoSphere Streams application leveraging SPSS Modeler Solution Publisher, you need to:

A. add a Type node
B. insert any Output node
C. add a Table node as the terminal node
D. Make the terminal node a scoring branch

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 11
Which of the following Hive data types is directly supported in Big SQL without any changes?

A. INT
B. STRING
C. STRUCT
D. BOOLEAN

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.1/com.ibm.swg.im.bigsql.dev.doc/doc/biga_numbers.html

QUESTION 12
The GPFS implementation of Data Management API is compliant to which Open Group storage management Standard?

A. XSH
B. XBD
C. XDSM
D. X / Open

Correct Answer: C
Section: (none)

Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs400.doc/bl1dmp_intro.htm

QUESTION 13
When we create a new table in Hive, which clause can be used in HiveQL to indicate the storage file format?

A. SAVE AS
B. MAKE AS
C. FORMAT AS
D. STORED AS

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 14
Given a file named readme.txt, which command will copy the readme.txt file to the <user> directory on the HDFS?

A. hadoop fs -cp readme.txt hdfs://test.ibm.com:9000/<user>
B. hadoop fs -cp hdfs://test.ibm.com:9000/<user> readme.txt
C. hadoop fs -put readme.txt hdfs://test.ibm.com:9000/<user>
D. hadoop fs -put hdfs://test.ibm.com:9000/<user> readme.text

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:

QUESTION 15
Which of the following is the most effective method for improving query performance on large Hive tables?

A. Indexing

B. Bucketing
C. Partitioning
D. De-normalizing data

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://dzone.com/articles/how-to-improve-hive-query-performance-with-hadoop
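The reason partitioning and bucketing speed up large Hive tables is pruning: a predicate on the partition key lets the engine read only one slice of the data instead of scanning the whole table. A minimal Python sketch, with a toy dict standing in for a date-partitioned table:

```python
# Toy table "partitioned" by date: each partition is a separate
# bucket of rows, as Hive lays partitions out in separate directories.
table = {
    "2023-01-01": [("a", 1), ("b", 2)],
    "2023-01-02": [("c", 3)],
    "2023-01-03": [("d", 4), ("e", 5)],
}

def query(table, date):
    """Only the matching partition is scanned; the rest is skipped."""
    return list(table[date])

assert query(table, "2023-01-02") == [("c", 3)]
```

Without the partition-key predicate, every row in every partition would have to be read, which is the full-table-scan cost that partitioning avoids.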

QUESTION 16
Which one of the following is NOT provided by the SerDe interface?

A. SerDe interface has to be built using C or C++ language
B. Allows SQL-style queries across data that is often not appropriate for a relational database
C. Serializer takes a Java object that Big SQL has been working with, and turns it into a format that BigSQL can write to HDFS
D. Deserializer interface takes a string or binary representation of a record, and translates it into a Java object that Big SQL can manipulate

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:

QUESTION 17
Which of the following are capabilities of the Apache Spark project?

A. Large scale machine learning
B. Large scale graph processing
C. Live data stream processing
D. All of the above

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://spark.apache.org/

QUESTION 18
Which of the following techniques is NOT employed by Big SQL to improve performance?

A. Query Optimization
B. Predicate Push down
C. Compression efficiency
D. Load data into DB2 and return the data

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSZLC2_7.0.0/com.ibm.commerce.developer.soa.doc/refs/rsdperformanceworkspaces.htm

QUESTION 19
When embedding SPSS models within InfoSphere Streams, what SPSS product must be installed on the same machine with InfoSphere Streams?

A. SPSS Modeler
B. SPSS Solution Publisher
C. SPSS Accelerator for InfoSphere Streams
D. None, the SPSS software runs remotely to the Streams machine

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:

QUESTION 20
Which of the following statements regarding Sqoop is TRUE? (Choose two.)

A. All columns in a table must be imported
B. Sqoop bypasses MapReduce for enhanced performance

C. Each row from a source table is represented as a separate record in HDFS
D. When using a password file, the file containing the password must reside in HDFS
E. Multiple options files can be specified when invoking Sqoop from the command line

Correct Answer: CE
Section: (none)
Explanation

Explanation/Reference:
Reference: https://data-flair.training/blogs/apache-sqoop-tutorial/

QUESTION 21
Use of Bulk Load in HBase for loading large volume of data will result in which of the following?

A. It will use less CPU but will use more network resource
B. It will use less network resource but more CPU
C. It will behave same way as using HBase API for loading large volume of data
D. None of the above

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 22
Extracting structured data from various databases into a “sandbox” location without writing code can be performed using which tool included with BigInsights?

A. Flume
B. Data Click
C. DataStage
D. Big SQL Load

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:

QUESTION 23
Which of the following is TRUE about a Resilient Distributed Dataset?

A. It is always mutable
B. It is always immutable
C. It can be configured to be either mutable or immutable
D. It can be changed from mutable to immutable state during its life cycle

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: http://beyondcorner.com/learn-apache-spark/spark-rdd-resilient-distributed-datasets/

QUESTION 24
How are insights derived from Big Match moved to an MDM system?

A. Extract insights from HBase and load into MDM through an API call
B. Extract insights from Hive and load into MDM using standard tooling
C. Extract insights from HDFS and load into MDM by simulating delta load
D. Extract insights from HBase and load into MDM using standard MDM batch processing tool

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:

QUESTION 25
Which of the following statements about MapReduce is true?

A. MapReduce source programs must be written in Java
B. The output from MapReduce is one or more files stored in the DFS

C. MapReduce programs always have four phases: Mapper, Shuffle, Combiner, and Reducer
D. Intermediate files, sent from Map tasks to Reduce tasks, are replicated with the number of copies equal to the number of Reducers

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:

QUESTION 26
Consider the following query:

curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation&fl=id"

What is the restricted field?

A. id
B. json
C. indent
D. foundation

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://lucene.apache.org/solr/6_3_0/quickstart.html
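The query string can be decomposed with Python's urllib.parse to see each parameter's role: q is the search term, wt the response writer (format), indent a formatting flag, and fl the field list that restricts which fields are returned:

```python
from urllib.parse import urlparse, parse_qs

url = ("http://localhost:8983/solr/gettingstarted/select"
       "?wt=json&indent=true&q=foundation&fl=id")

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(urlparse(url).query)

assert params["q"] == ["foundation"]   # the search term
assert params["wt"] == ["json"]        # response format
assert params["fl"] == ["id"]          # fields restricted to "id" only
```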

QUESTION 27
Which tool below can be used for extracting data directly from an RDBMS and placing a copy within BigInsights as a ready-to-query table?

A. Flume
B. Sqoop
C. NZ Load
D. Distributed Copy

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/da/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.import.doc/doc/data_warehouse_sqoop.html

QUESTION 28
Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?

A. InfoSphere Streams utilizes Spark Streaming to interface to Hadoop
B. The HDFS operators in InfoSphere Streams use Hadoop Java APIs to access HDFS or GPFS
C. Incoming streams from InfoSphere Streams into Hadoop are not buffered to ensure low latency
D. Since InfoSphere Streams processes data in Streams tuple format, Hadoop must store the data in this format

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJU_4.0.1/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streamsx.hdfs/
tk$com.ibm.streamsx.hdfs.html

QUESTION 29
What is a method for loading RDBMS data into an HBase table?

A. HDFS LOAD
B. SQOOP IMPORT
C. LOAD HADOOP USING
D. Hadoop jar hbase-VERSION.jar importtsv

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:

QUESTION 30
What is the primary purpose of Flume in the Hadoop ecosystem?

A. To stream data from Hadoop
B. To move static files from the local file system into HDFS
C. To import data from a relational database or data warehouse into HDFS
D. To capture log data as it is written to log files and move them into HDFS

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://blogs.apache.org/flume/entry/flume_ng_architecture

QUESTION 31
For a customer satisfaction application, a large eCommerce company is loading click stream data into MySQL and running Cognos Reports which are accessed by
50 plus customer service reps (CSR). The IT team decided to move the application to the new BigInsights-based Hadoop cluster and load the data into a Hive
Warehouse and run the queries. Which one of the following would enable them to reuse the SQL queries without extensive rewrite?

A. Pig
B. NoSQL
C. HiveQL
D. Big SQL

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 32
Which of the following statements is TRUE regarding search visualization with Apache Hue?

A. Hue utilizes Java libraries which must be installed on your system
B. The Hue Beeswax application enables you to perform queries on HBase
C. For optimal performance, the Hue Server should be one of the nodes within your Hadoop cluster
D. The Hue Sqoop UI allows transferring data from a relational database to Hadoop, but not from Hadoop to a relational database

Correct Answer: D

Section: (none)
Explanation

Explanation/Reference:
Reference: https://gethue.com/move-data-in-out-your-hadoop-cluster-with-the-sqoop/

QUESTION 33
When creating a configuration file for a Flume agent, which of the following must be configured?

A. An interceptor
B. A database configuration file
C. A source, a sink, and a channel
D. All of the above

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Explanation:
Flume agents consist of three elements: a source, a channel, and a sink. The channel connects the source to the sink. You must configure each element in the
Flume agent. Different source, channel, and sink types have different configurations

Reference: https://www.ibm.com/support/knowledgecenter/SSCKRH_1.0.3/alerts/c_fcai_modflumeagents.html
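The source/channel/sink pipeline described in the explanation can be sketched in Python, with a queue standing in for the channel. This is illustrative only — a real Flume agent is declared in a properties file, not coded this way:

```python
from queue import Queue

# Channel: buffers events between the source and the sink.
channel = Queue()

def source(events):
    """Source: receives incoming events and puts them on the channel."""
    for e in events:
        channel.put(e)

def sink():
    """Sink: drains the channel and delivers events to the destination."""
    out = []
    while not channel.empty():
        out.append(channel.get())
    return out

source(["login ok", "login failed"])
assert sink() == ["login ok", "login failed"]
```

The decoupling is the point of the design: the channel absorbs bursts from the source so the sink can drain at its own pace, which is why all three elements must be configured for every agent.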

QUESTION 34
When loading data into Big SQL, which statement is TRUE concerning the underlying storage mechanisms supported?

A. Big SQL supports .DB2 files
B. Big SQL supports Parquet files
C. Big SQL natively supports XML file format
D. Big SQL supports file stored in PDF format

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSCRJT_5.0.3/com.ibm.swg.im.bigsql.doc/doc/biga_fileformats.html

QUESTION 35
Which source operator detects SPSS Collaboration and Deployment Services notification events for a specific SPSS Modeler file and downloads the indicated file
version for the refreshed scoring branch?

A. SPSSPublish operator
B. SPSSScoring operator
C. SPSSModeler operator
D. SPSSRepository operator

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SS69YH_8.1.0/cads_infosphere_ddita/cads/infosphere/is_overview.html

QUESTION 36
Which is a benefit of row oriented table design?

A. When writing a new row, if all of the row data is supplied at the same time the entire row can be written with a single disk seek
B. When columns of a single row are required at the same time, the entire row can be retrieved with a single disk seek regardless of row size
C. When new values of a column are supplied for all rows at once, that column data can be written efficiently and replace old column data without touching any
other columns for the rows
D. When an aggregate needs to be computed over many rows but only a notably smaller subset of all columns of data, reading that smaller subset of data can be
faster than reading all data

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: http://www.ijoart.org/docs/Column-Oriented-Databases-to-Gain-High-Performance-for-Data-Warehouse-System.pdf (7)
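The row-versus-column trade-off behind these options can be shown with two toy Python layouts of the same data (illustrative structures, not an actual storage engine): a row store keeps a whole record together, while a column store keeps each column together, so aggregates touch only the columns they need.

```python
rows = [("alice", 30, "NYC"), ("bob", 25, "LA")]   # row-oriented layout
cols = {                                           # column-oriented layout
    "name": ["alice", "bob"],
    "age":  [30, 25],
    "city": ["NYC", "LA"],
}

# Row store: fetching one whole record is a single contiguous access.
assert rows[1] == ("bob", 25, "LA")

# Column store: an aggregate reads only the one column it needs,
# skipping name and city entirely (options C and D in the question).
avg_age = sum(cols["age"]) / len(cols["age"])
assert avg_age == 27.5
```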

QUESTION 37
What Redaction feature needs to be selected when manually redacting a form through the Optim Review Tool?

A. Text Redaction
B. Image Redaction
C. Region Redaction

D. Redact by Information Type

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

QUESTION 38
Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?

A. InfoSphere Streams utilizes Flume to interface to Hadoop
B. The HDFSFileSink operator writes files in parallel to a Hadoop Distributed File System
C. Buffering techniques are used to process incoming streams from InfoSphere Streams
D. When you use the HDFS operators to access GPFS, you must install InfoSphere Streams on an InfoSphere Big Insights data node

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://books.google.com.pk/books?id=JWfRAgAAQBAJ&pg=PA147&lpg=PA147&dq=IBm+Buffering+techniques+are+used+to+process+incoming
+streams+from+InfoSphere+Streams&source=bl&ots=Z6XhA0-
Owk&sig=ACfU3U3T9ydrZHWTMvB31qQOyf6FtoDQgw&hl=en&sa=X&ved=2ahUKEwj6wsnvqvfoAhVUSxUIHWNDAycQ6AEwA3oECBEQAQ#v=onepage&q=IBm
%20Buffering%20techniques%20are%20used%20to%20process%20incoming%20streams%20from%20InfoSphere%20Streams&f=false

QUESTION 39
Which BigInsights components are essential for Big Match operations?

A. Social Data Analytics (SDA)
B. HDFS, HBase, and BigSheets
C. HDFS and HBase
D. MapReduce framework, BigInsights cluster management

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:

Reference: https://www.ibm.com/support/knowledgecenter/SSWSR9_11.3.0/com.ibm.swg.im.mdmhs.pmebi.doc/topics/pme_bi_architecture.html

QUESTION 40
Which statement is TRUE when loading data into Hadoop and creating Big SQL tables?

A. It is optional to have INSERT privileges granted to LOAD into a table with the APPEND option
B. You can either have INSERT or DELETE privileges granted to LOAD into a table with the OVERWRITE option
C. Authentication information is not necessary when you connect to a secured InfoSphere BigInsights cluster
D. By using the LOAD HADOOP USING command, you can import data from external data sources into target Big SQL tables

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.1/com.ibm.swg.im.bigsql.db2biga.doc/doc/biga_load_from.html

QUESTION 41
For what purpose SPSS models are embedded within InfoSphere Streams application?

A. To provide high availability
B. To score streaming data using existing models
C. To create new models based on streaming data
D. To ingest and parse binary and other complex data types

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/developerworks/data/tutorials/dm-1109spssscoringinfospherestreams1/dm-1109spssscoringinfospherestreams1-pdf.pdf

QUESTION 42
Which of the following is most commonly used by Hadoop to move data between clusters?

A. Pig
B. FTP
C. JAQL
D. distcp

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://developer.ibm.com/hadoop/2016/02/05/fast-can-data-transferred-hadoop-clusters-using-distcp/

QUESTION 43
A large Telecom company wants to store data from multiple databases into Hadoop. They plan to do bulk loads of data into Hadoop and run analytical queries.
Which data store would be ideal for this scenario?

A. Hive
B. HBase
C. BigSheets
D. Apache Spark

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://developer.ibm.com/recipes/tutorials/big-data-and-hadoop-on-ibm-cloud/

QUESTION 44
Which of the following is not a data-processing operation that is supported in Pig Latin?

A. filter
B. joins
C. group by
D. logistic regression

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://pig.apache.org/docs/r0.15.0/basic.html

QUESTION 45

Which expression is making proper use of the "LOAD" function in Pig Latin?

A. A = LOAD data -> name,age,gpa
B. A = LOAD('data',[name, age, gpa]);
C. A = LOAD 'data' -> {name:chararray, age:int, gpa:float};
D. A = LOAD 'data' AS (name:chararray, age:int, gpa:float);

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://pig.apache.org/docs/r0.17.0/basic.html

QUESTION 46
The input to the Reduce function is __________.

A. Sorted by key
B. Sorted by value
C. Grouped by mapper
D. Shuffled into random order

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSZUMP_7.3.0/foundations_sym/symphony_developer_objects.html
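The map/shuffle/reduce contract — reducer input arrives grouped and sorted by key — can be demonstrated with a pure-Python word count. This mimics the data flow only, not Hadoop's implementation:

```python
from itertools import groupby
from operator import itemgetter

docs = ["big data", "big sql"]

# Map phase: emit (word, 1) pairs.
pairs = [(w, 1) for d in docs for w in d.split()]

# Shuffle/sort phase: the framework sorts pairs by key before
# handing them to the reducers, so each key's values are contiguous.
pairs.sort(key=itemgetter(0))

# Reduce phase: sum the counts for each key's group.
counts = {k: sum(v for _, v in grp)
          for k, grp in groupby(pairs, key=itemgetter(0))}

assert counts == {"big": 2, "data": 1, "sql": 1}
```

groupby only works here because the pairs were sorted first — the same reason the reduce function in MapReduce can rely on receiving its input sorted by key.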

QUESTION 47
What is a standard method of monitoring sensitive data in Hadoop?

A. Use any database monitoring tool
B. Use HBASE GET or hadoop fs -cat commands
C. Use STAP on each Hadoop Node
D. All of the above

Correct Answer: C
Section: (none)

Explanation

Explanation/Reference:
Reference: https://www.ibm.com/developerworks/data/library/techarticle/dm-1210bigdatasecurity/index.html

QUESTION 48
To improve Big SQL query performance one should specify the INPUTFORMAT and OUTPUTFORMAT during which of the following?

A. SELECT statements
B. When the file is initially stored in HDFS
C. CREATE and ALTER table statements
D. While the files are accessed in your program

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Explanation:
The Hadoop environment within the IBM® InfoSphere BigInsights Version 3.0 can read a large number of storage formats. This flexibility is partially because of the
INPUTFORMAT and OUTPUTFORMAT classes that you can specify on the CREATE and ALTER table statements and because of the use of installed and
customized SerDe classes. The file formats listed here are available either by using explicit SQL syntax, such as STORED AS PARQUETFILE, or by using installed
interfaces, such as Avro.

Reference: https://developer.ibm.com/hadoop/2014/09/19/big-sql-3-0-file-formats-usage-performance/

QUESTION 49
Which Big SQL file format could be expected to result in a longer running query?

A. Text
B. Avro
C. Parquet
D. Sequence_text

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/developerworks/data/library/techarticle/dm-1510-parquet-big-sql/index.html

QUESTION 50
Which ONE of the following statements regarding Sqoop is TRUE?

A. By default, data is compressed with Sqoop
B. Sqoop can only read committed transactions from a source database, not uncommitted ones
C. When performing parallel imports, Sqoop always uses the primary key column in a table as the splitting column
D. When performing parallel imports, each degree of parallelism corresponds to a concurrent database connection

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Explanation:
By default, Sqoop uses the read committed transaction isolation in the mappers to import data. This may not be ideal in all ETL workflows, and it may be desirable
to reduce the isolation guarantees. The --relaxed-isolation option can be used to instruct Sqoop to use the read uncommitted isolation level.

Reference: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

QUESTION 51
Which of the following must happen before the Big SQL EXPLAIN command can execute?

A. Run the ANALYZE command
B. Set the COMPATIBILITY_MODE global variable
C. Execute the SET HADOOP PROPERTY command
D. Call the SYSPROC.SYSINSTALLOBJECTS procedure

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://developer.ibm.com/hadoop/docs/biginsights-ibm-open-platform/getting-started/tutorials/big-sql-hadoop-tutorial/getting-started-with-big-sql-4-0-
lab-6-understanding-and-influencing-data-plans/

QUESTION 52
In which of the following languages is the Flume open-source software written?

A. C / C++

B. Primarily Java
C. Python and related tools
D. A combination of C++ and Perl

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Reference: https://en.wikipedia.org/wiki/Apache_Flume

QUESTION 53
What keyword is used when loading in Hive from a single file on the local file system?

(Hint: If we leave off this keyword with the LOAD command, Hive assumes the location we are referring to is on HDFS)

A. LOCAL
B. INSERT
C. DIRECTORY
D. OVERWRITE

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://books.google.com.pk/books?id=i6NODQAAQBAJ&pg=PA426&lpg=PA426&dq=keyword+is+used+when+loading+in+Hive+from+single+file+on
+the+local+file+system&source=bl&ots=2Gr_lfxt-
C&sig=ACfU3U2VvuP0cOVrf2jyPFyDmwmuXRpbvw&hl=en&sa=X&ved=2ahUKEwjV_eaVxvfoAhUP2KQKHZwtBPsQ6AEwA3oECBYQNg#v=onepage&q=keyword
%20is%20used%20when%20loading%20in%20Hive%20from%20single%20file%20on%20the%20local%20file%20system&f=false

QUESTION 54
Which component of BigInsights is able to mask data items so as to restrict viewing of sensitive data?

A. Flume
B. HDFS
C. Oozie
D. Big SQL

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://developer.ibm.com/hadoop/docs/biginsights-ibm-open-platform/biginsights-value-add/big-sql/maximize-hadoop-data-security-ibm-infosphere-
biginsights/

QUESTION 55
When indexing a Hive Table, which of the following is TRUE?

A. Hive tables do not support indexes
B. It increases query speed without the need for additional disk space
C. It does not increase query speed but makes data insert/delete/update faster
D. It increases query speed but requires additional processing time for data insert/update/delete and needs more disk space

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://acadgild.com/blog/indexing-in-hive

QUESTION 56
Which of the following statements is TRUE regarding the use of Data Click to load data into BigInsights?

A. Data Click uses Map Reduce to create a Hive table
B. Only relational databases are supported as sources for Data Click
C. Only whole tables can be moved from a relational database source, not individual columns within the table
D. If the metadata that you specify in InfoSphere Data Click activities changes, you need to reimport the metadata by using InfoSphere Metadata Asset Manager

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.dataclick.doc/topics/createblueprint.html

QUESTION 57
The Distributed File Copy application copies files to and from a remote source to the InfoSphere BigInsights distributed file system by using data movement options.

Which of the following is the complete list of options for moving data with Distributed File Copy?

A. File Transfer Protocol (FTP) and SFTP
B. Hadoop Distributed File System (HDFS) and FTP
C. General Parallel File System (GPFS), FTP, and SFTP
D. Hadoop Distributed File System (HDFS), FTP, SFTP, GPFS, and also copy files to and from your local file system

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/en/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Less_imp_distcopy.html

QUESTION 58
Which of the following is NOT TRUE regarding InfoSphere Streams composite operators that are annotated as consistent regions?

A. Disconnected subgraphs within the composite participate in the consistent region
B. The composite operator and the reachability graph of the composite operator participate in a single consistent region
C. Merging composite operators into a single consistent region can yield unexpected results and is therefore not recommended
D. Operators outside of the composite may participate in the consistent region if they are within the composite's reachability graph

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/consistentcompositeoperators.html

QUESTION 59
Suppose that you have some log files that you need to load into HBase.

What tool could you use to perform a bulk load of the log files?

A. Import
B. Fastload
C. ImportTsv
D. None of the above

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSWSR9_11.6.0/com.ibm.swg.im.mdmhs.pmebi.doc/topics/running_automatic.html

QUESTION 60
One option to import data to your BigInsights cluster is to use Sqoop.

Which of the following examples shows a data exchange with a DB2 database by using the built-in DB2 connector?

A. $SQOOP_HOME/bin/sqoop export --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/
dataFile.csv
B. $SQOOP_HOME/bin/sqoop import -connect jdbc:db2://db2.my.com:50000/testdb -username db2user -password db2pwd -table db2tbl -split-by tbl_primarykey -
target-dir sqoopimports
C. $SQOOP_HOME/bin/sqoop import --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --password db2pwd --table db2tbl --split-by
tbl_primarykey --target-dir sqoopimports
D. $BIGINSIGHTS_HOME/bin/sqoop import --connect jdbc:db2://db2.my.com:50000/testdb -- username db2user --password db2pwd --table db2tbl --split-by
tbl_primarykey --target-dir sqoopimports

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.import.doc/doc/data_warehouse_sqoop.html

QUESTION 61
You have social media data in a plain text file. You want to figure out the number of times the word "Watson" is mentioned in it.

Which of the following steps will correctly allow you to visualize that data?

A. Create a master workbook, add a filter sheet, run the workbook, add chart
B. Create a master workbook, create a child workbook, add a filter sheet, add chart
C. Create a master workbook, edit the workbook, add a filter sheet, run the workbook, add chart
D. Create a master workbook, create a child workbook, add a filter sheet, run the workbook, add chart

Correct Answer: C

Section: (none)
Explanation

Explanation/Reference:

QUESTION 62
A Resilient Distributed Dataset is a good solution for which of the following?

A. A search
B. An SQL based data transformation
C. High-throughput, write-oriented use cases
D. An interactive query and iterative algorithms

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf

QUESTION 63
Which of the following is NOT TRUE regarding the Job Control Plane operator?

A. User-defined checkpoint logic can be coded in the logic section
B. It must be added to each application that includes a consistent region
C. It controls the notifications to set of operators that are included in a consistent region
D. It has no input or output ports and appears as a stand-alone operator in an application graph

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Reference: https://www.ibm.com/support/knowledgecenter/SSCRJU_4.3.0/com.ibm.streams.dev.doc/doc/jobcontrolplane.html

QUESTION 64
Which one of the following file formats is optimal for querying tables with many columns and performing aggregation operations such as SUM() and AVG()?

A. Text

B. Avro
C. JSON
D. PARQUET

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
