Data Science and Analytics (Dr. Vishwanath Karad MIT World Peace University)
Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip,
LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
Explanation: The Hadoop framework itself is mostly written in the Java programming language,
with some native code in C and command line utilities written as shell scripts.
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets
reliably, and to stream those data sets at high bandwidth to user
4. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and
matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
Explanation: The Apache Mahout project’s goal is to build a scalable machine learning tool.
Big Data
5. Point out the correct statement:
a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework output files are divided into lines or records
d) None of the mentioned
View Answer
Answer: b
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging into
the hundreds and thousands.
6. Hadoop is a framework that works with a variety of related tools. Common cohorts include:
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one to run
HBase and the other to run Hive.
7. ________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned
View Answer
Answer: a
Explanation: Facebook has many Hadoop clusters, the largest among them is the one that is used
for Data warehousing.
Explanation: Prism automatically replicates and moves data wherever it’s needed across a vast
network of computing facilities.
10. ________ is a platform for constructing data flows for extract, transform, and load (ETL)
processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
View Answer
Answer: c
Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs.
Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data
summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible
file systems.
12. ________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned
View Answer
Answer: c
Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the
name “Cascalog” is a contraction of Cascading and Datalog.
Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute
Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation
commands, either through the AWS Web Console or through command-line tools.
14. ________ is the most popular high-level Java API in the Hadoop ecosystem.
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
View Answer
Answer: d
Explanation: Cascading hides many of the complexities of MapReduce programming behind more
intuitive pipes and data flow abstractions.
15. ________ is a general-purpose computing model and runtime system for distributed data analytics.
a) Mapreduce
b) Drill
c) Oozie
d) None of the mentioned
View Answer
Answer: a
Explanation: MapReduce provides a flexible and scalable foundation for analytics, from traditional
reporting to leading-edge machine learning algorithms.
16. The Pig Latin scripting language is not only a higher-level data flow language but also has
operators similar to:
a) SQL
b) JSON
c) XML
d) All of the mentioned
View Answer
Answer: a
Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL
and the low-level procedural style of MapReduce.
17. ________ jobs are optimized for scalability but not latency.
a) Mapreduce
b) Drill
c) Oozie
d) Hive
View Answer
Answer: d
Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of
MapReduce.
18. A ________ node acts as the slave and is responsible for executing a Task assigned to it
by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
View Answer
Answer: c
Explanation: TaskTracker receives the information necessary for execution of a Task from the
JobTracker, executes the Task, and sends the results back to the JobTracker.
19. The ________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
View Answer
Answer: a
Explanation: Reduce function collates the work and resolves the results.
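For illustration, here is a minimal sketch of such a Reduce() function using the classic org.apache.hadoop.mapred API; the class name and the sum-of-counts logic are assumptions for the example, not part of the question.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer: consolidates the partial counts emitted by the Map() tasks.
public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get(); // collate the per-mapper results for this key
    }
    output.collect(key, new IntWritable(sum));
  }
}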
20. Although the Hadoop framework is implemented in Java, MapReduce applications need not
be written in:
a) Java
b) C
c) C#
d) None of the mentioned
View Answer
Answer: a
21. ________ is a utility which allows users to create and run jobs with any executable as the mapper
and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop
distribution.
22. _________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
View Answer
Answer: a
Explanation: Maps are the individual tasks that transform input records into intermediate records.
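A minimal sketch of a Mapper in the classic org.apache.hadoop.mapred API; the word-count logic is an illustrative assumption. Each map() call transforms one input record into intermediate key/value pairs.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: turns each input line into (word, 1) intermediate pairs.
public class TokenMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable offset, Text line,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      output.collect(word, ONE); // emit an intermediate key/value pair
    }
  }
}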
23. The number of maps is usually driven by the total size of:
a) inputs
b) outputs
c) tasks
d) None of the mentioned
View Answer
Answer: a
Explanation: Total size of inputs means total number of blocks of the input files.
Explanation: The default partitioner in Hadoop is the HashPartitioner which has a method called
getPartition to partition.
25. Mapper implementations are passed the JobConf for the job via the ________ method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned
View Answer
Answer: b
Explanation: In the Shuffle phase the framework fetches the relevant partition of the output of all
the mappers via HTTP.
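As a sketch of the JobConfigurable.configure hook from question 25 (the "example.threshold" property is hypothetical): the framework calls configure(JobConf) once per task, before any map() or reduce() calls, so implementations can read job parameters.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// MapReduceBase provides a no-op configure(); mappers/reducers override it
// to pull settings out of the job's JobConf.
public abstract class ThresholdMapperBase extends MapReduceBase {
  protected int threshold;

  @Override
  public void configure(JobConf job) {
    // "example.threshold" is a made-up job property used for illustration.
    threshold = job.getInt("example.threshold", 10);
  }
}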
28. The output of the ________ is not sorted in the MapReduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned
View Answer
Answer: d
Explanation: The output of the reduce task is typically written to the FileSystem. The output of the
Reducer is not sorted.
29. Which of the following phases occur simultaneously?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned
View Answer
Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being
fetched they are merged.
30. Mapper and Reducer implementations can use the ________ to report progress or just
indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
View Answer
Answer: c
Explanation: Reporter is a facility for MapReduce applications to report progress, set
application-level status messages and update Counters.
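A sketch of how a long-running map() might use the Reporter (the per-record processing loop is assumed): calling progress() tells the framework the task is still alive so it is not assumed timed out and killed.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper whose records take a long time to process.
public class SlowMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    reporter.setStatus("processing record at offset " + key.get()); // status message
    for (int step = 0; step < 1000; step++) {
      // ... expensive per-record work would go here ...
      reporter.progress(); // heartbeat: just indicates the task is alive
    }
    reporter.incrCounter("demo", "records", 1); // update an application Counter
    output.collect(new Text("processed"), value);
  }
}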
31. ________ is a generalization of the facility provided by the MapReduce framework to collect data
output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
View Answer
Answer: b
Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
32. ________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
33. ________ are highly resilient and eliminate the single-point-of-failure risk with traditional
Hadoop deployments.
a) EMR
b) Isilon solutions
c) AWS
d) None of the mentioned
View Answer
Answer: b
Explanation: Enterprise data protection and security options, including file system auditing and
data-at-rest encryption to address compliance requirements, are also provided by Isilon solutions.
34. Which is the most popular NoSQL database for a scalable big data store with Hadoop?
a) Hbase
b) MongoDB
c) Cassandra
d) None of the mentioned
View Answer
Answer: a
Explanation: HBase is the Hadoop database: a distributed, scalable Big Data store that lets you
host very large tables — billions of rows multiplied by millions of columns — on clusters built
with commodity hardware.
35. The ________ can also be used to distribute both jars and native libraries for use in
the map and/or reduce tasks.
a) DataCache
c) DistributedCache
d) All of the mentioned
View Answer
Answer: c
Explanation: The child-jvm always has its current working directory added to the java.library.path
and LD_LIBRARY_PATH.
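A sketch of distributing a jar and a native library via the DistributedCache (the paths are hypothetical); a cached native library is then found through java.library.path / LD_LIBRARY_PATH as the explanation notes.

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CacheSetup.class);
    // Ship a jar to every task and put it on the task classpath.
    DistributedCache.addFileToClassPath(new Path("/apps/lib/helper.jar"), conf);
    // Ship a native library; the '#' fragment sets its link name in the
    // task's working directory, which is on java.library.path.
    DistributedCache.addCacheFile(new URI("/apps/native/libdemo.so#libdemo.so"), conf);
  }
}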
Explanation: Google Bigtable leverages the distributed data storage provided by the Google File
System.
Hadoop Streaming
Explanation: Place the generic options before the streaming options, otherwise the command will
fail.
38. Which of the following Hadoop streaming command option parameters is required?
a) output directoryname
b) mapper executable
c) input directoryname
d) All of the mentioned
Answer: d
Explanation: The input and output locations and the mapper executable are all required parameters.
39. To set an environment variable in a streaming command use:
a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/
View Answer
Answer: c
40. The ________ class allows the Map/Reduce framework to partition the map outputs based on
certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
d) None of the mentioned
View Answer
Answer: b
Explanation: The primary key is used for partitioning, and the combination of the primary and
secondary keys is used for sorting.
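A sketch of wiring this up in Java with the classic API (the field numbers and separator are illustrative): partition on the first two '.'-separated key fields, so records sharing that primary key reach the same reducer.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;

public class PartitionerSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf(PartitionerSetup.class);
    conf.setPartitionerClass(KeyFieldBasedPartitioner.class);
    // Treat the map output key as '.'-separated fields.
    conf.set("map.output.key.field.separator", ".");
    // Partition on fields 1-2 only (the "primary key").
    conf.setKeyFieldPartitionerOptions("-k1,2");
  }
}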
41. Which of the following classes provides a subset of features provided by the
Unix/GNU sort?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
d) All of the mentioned
View Answer
Answer: c
Explanation: Hadoop has a library class, KeyFieldBasedComparator, that is useful for many
applications.
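A sketch of the matching comparator setup (the sort spec is illustrative): the options mirror Unix sort flags, here numeric and reversed on the second key field.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;

public class ComparatorSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf(ComparatorSetup.class);
    conf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
    // Like Unix sort -k2,2nr: sort on field 2, numerically, in reverse.
    conf.setKeyFieldComparatorOptions("-k2,2nr");
  }
}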
c) Reduce
d) None of the mentioned
View Answer
Answer: b
Explanation: Aggregate provides a special reducer class and a special combiner class, and a list of
simple aggregators that perform aggregations such as “sum”, “max”, “min” and so on over a
sequence of values.
Explanation: The map function defined in the class treats each input key/value pair as a list of
fields.
Introduction to HDFS
44. A ________ serves as the master, and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
View Answer
Answer: b
Explanation: All the metadata related to HDFS including the information about data nodes, files
stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
Explanation: The Secondary NameNode periodically merges the fsimage with the edits log; it is
not a standby NameNode and does not by itself provide high availability.
Explanation: The NameNode is aware of the files to which the blocks stored on it belong.
48. Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned
View Answer
Answer: a
Explanation: HDFS can be used for storing archive data since it is cheaper, as HDFS allows storing
the data on low-cost commodity hardware while ensuring a high degree of fault tolerance.
49. The need for data replication can arise in various scenarios like:
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
View Answer
Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-
tolerance.
50. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
View Answer
Answer: a
Explanation: A DataNode stores data in the Hadoop file system. A functional filesystem has more
than one DataNode, with data replicated across them.
51. HDFS provides a command line interface called ________ used to interact with
HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned
View Answer
Answer: b
Explanation: The File System (FS) shell includes various shell-like commands that directly interact
with the Hadoop Distributed File System (HDFS).
52. HDFS is implemented in the ________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java and any computer which can run Java can host a
NameNode/DataNode on it.
Java Interface
53. The output of the reduce task is typically written to the FileSystem via:
a) OutputCollector
b) InputCollector
c) OutputCollect
d) All of the mentioned
View Answer
Answer: a
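Beyond OutputCollector, the broader Java interface to HDFS is the FileSystem class; here is a minimal read sketch (the path is supplied by the caller, and the class name is an assumption for the example).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml etc.
    FileSystem fs = FileSystem.get(conf);     // HDFS when fs.defaultFS points at it
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(new Path(args[0]))))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // print the file's contents line by line
      }
    }
  }
}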
54. Applications can use the ________ provided to report progress or just indicate that they
are alive.
a) Collector
b) Reporter
c) Dashboard
d) None of the mentioned
View Answer
Answer: b
Explanation: In scenarios where the application takes a significant amount of time to process
individual key/value pairs, this is crucial, since the framework might otherwise assume that the
task has timed out and kill it.
Data Flow
55. ________ is a programming model designed for processing large volumes of data in parallel by
dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene
View Answer
Answer: b
57. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) all of the mentioned
View Answer
Answer: a
Explanation: Map-Reduce jobs are submitted to the job-tracker.
58. The default InputFormat is ________, which treats each line of input as a new value, with
the associated key being the byte offset.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned
View Answer
Answer: b
Explanation: A RecordReader is little more than an iterator over records, and the map task uses
one to generate record key-value pairs.
60. Output of the mapper is first written on the local disk for sorting and ________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing
View Answer
Answer: a
Explanation: All values corresponding to the same key will go to the same reducer.
Hadoop Archives
61. The ________ guarantees that excess resources taken from a queue will be restored to
it within N minutes of its need for them.
a) capacitor
b) scheduler
c) datanode
d) none of the mentioned
View Answer
Answer: b
Explanation: Free resources can be allocated to any queue beyond its guaranteed capacity.
62. ________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
a) Flow Scheduler
b) Data Scheduler
c) Capacity Scheduler
d) None of the mentioned
View Answer
Answer: c
Explanation: The Capacity Scheduler supports multiple queues, where a job is submitted to a
queue.
Data Integrity
63. The __________ machine is a single point of failure for an HDFS cluster.
64. The ________ and the EditLog are central data structures of HDFS.
a) DsImage
b) FsImage
c) FsImages
d) All of the mentioned
View Answer
Answer: b
Explanation: A corruption of these files can cause the HDFS instance to be non-functional.
66. HDFS, by default, replicates each data block ________ times on different nodes and on at
least ________ racks.
a) 3,2
b) 1,2
c) 2,3
d) All of the mentioned
View Answer
Answer: a
Explanation: HDFS has a simple yet robust architecture that was explicitly designed for data reliability.
67. The HDFS file system is temporarily unavailable whenever the HDFS ________ is down.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
View Answer
Answer: b
Explanation: When the HDFS NameNode is restarted it recovers its metadata.
Serialization
71. The _______ method in the ModelCountReducer class “reduces” the values the mapper
collects into a derived value.
a) count
b) add
c) reduce
d) all of the mentioned
View Answer
Answer: c
Explanation: In some cases, it can be a simple sum of the values.
Metrics in HBase
74. You can delete a column family from a table using the ________ method of the
HBaseAdmin class.
a) delColumn()
MapReduce Development-2
MapReduce Features-1
80. ________ is a generalization of the facility provided by the MapReduce framework to collect data
output by the Mapper or the Reducer.
a) OutputCompactor
b) OutputCollector
c) InputCollector
d) All of the mentioned
View Answer
Answer: b
Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
83. Maximum virtual memory of the launched child-task is specified using:
a) mapv
b) mapred
c) mapvim
d) All of the mentioned
View Answer
Answer: b
Explanation: Admins can also specify the maximum virtual memory of the launched child-task,
and any sub-process it launches recursively, using mapred.child.ulimit.
84. ________ is the percentage of memory relative to the maximum heap size in which map outputs
may be retained during the reduce.
a) mapred.job.shuffle.merge.percent
b) mapred.job.reduce.input.buffer.percent
c) mapred.inmem.merge.threshold
d) io.sort.factor
View Answer
Answer: b
Explanation: When the reduce begins, map outputs will be merged to disk until those that remain
are under the resource limit this defines.
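A sketch of setting this property programmatically (the 0.7 value is an arbitrary tuning choice for illustration, not a recommendation):

import org.apache.hadoop.mapred.JobConf;

public class ReduceBufferSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Allow up to 70% of the reducer heap to retain map outputs
    // instead of spilling them to disk before the reduce begins.
    conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.70f);
  }
}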
MapReduce Features-2
87. Jobs can enable task JVMs to be reused by specifying the job configuration:
a) mapred.job.recycle.jvm.num.tasks
b) mapissue.job.reuse.jvm.num.tasks
c) mapred.job.reuse.jvm.num.tasks
d) all of the mentioned
View Answer
Answer: c
Explanation: Task performance can improve significantly when JVMs are reused via
mapred.job.reuse.jvm.num.tasks.
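JobConf also exposes this setting directly; a minimal sketch (-1 is the documented "no limit" value, meaning the JVM is reused for all of a job's tasks):

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Equivalent to mapred.job.reuse.jvm.num.tasks=-1:
    // reuse each task JVM for an unlimited number of the job's tasks.
    conf.setNumTasksToExecutePerJvm(-1);
  }
}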
88. During the execution of a streaming job, the names of the ________ parameters are
transformed.
a) vmap
b) mapvim
c) mapreduce
d) mapred
View Answer
Answer: d
Explanation: To get the values in a streaming job’s mapper/reducer, use the parameter names with
the dots replaced by underscores (for example, mapred.job.id becomes mapred_job_id).
Hadoop Cluster-2
91. Hadoop data is not sequenced and is in 64 MB to 256 MB block sizes of delimited record
values, with schema applied on read based on:
a) HCatalog
b) Hive
c) Hbase
d) All of the mentioned
Answer: a
Monitoring HDFS
92. For YARN, the ________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
Answer: c
Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows
information about the NameNode itself.
Answer: c
Explanation: The Secondary NameNode periodically merges the fsimage with the edits log; it is
not a standby NameNode.
96) For ________, the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned
97) Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned
Answer: a
Explanation: HDFS can be used for storing archive data since it is cheaper, as HDFS allows storing
the data on low-cost commodity hardware while ensuring a high degree of fault tolerance.
98) The need for data replication can arise in various scenarios like:
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-
tolerance.
99) During start-up, the ________ loads the file system state from the fsimage and the
edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java; any computer which can run Java can host a
NameNode/DataNode.
HDFS Maintenance
Explanation: Manager’s Service feature presents health and performance data in a variety of
formats.
Introduction to Pig
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes:
interactive mode and batch mode.
Explanation: You can run Pig in either mode using the “pig” command (the bin/pig shell script) or
the “java” command (java -cp pig.jar ...).
105) Pig Latin statements are generally organized in one of the following ways:
a) A LOAD statement to read data from the file system
b) A series of “transformation” statements to process the data
Pig Latin
114) Which of the following operators is used to view the MapReduce execution plans?
a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN
Answer: d
Explanation: EXPLAIN displays execution plans.
120) Which of the following commands is used to show the values of keys used in Pig?
a) set
b) declare
c) display
d) All of the mentioned
Answer: a
Explanation: All Pig and Hadoop properties can be set, either in the Pig script or via the Grunt
command line.
121) Use the ________ command to run a Pig script that can interact with the Grunt shell
(interactive mode).
a) fetch
b) declare
c) run
d) all of the mentioned
Answer: c
Explanation: With the run command, every store triggers execution.
a) exec
b) execute
c) error
d) throw
Answer: a
Explanation: With the exec command, store statements will not trigger execution; rather, the entire
script is parsed before execution starts.
124) Which of the following is correct syntax for parameter substitution using cmd?
a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script
b) {%declare | %default} param_name param_value
c) {%declare | %default} param_name param_value cmd
Pig in practice
125) Pig Latin is ________ and fits very naturally in the pipeline paradigm, while SQL is
instead declarative.
a) functional
b) procedural
c) declarative
d) all of the mentioned
Answer: b
Explanation: In SQL users can specify that data from two tables must be joined, but not what join
implementation to use.
Introduction to Hive
126) Which of the following commands sets the value of a particular configuration variable (key)?
a) set -v
b) set <key>=<value>
c) set
d) reset
Answer: b
Explanation: If you misspell the variable name, the CLI will not show an error.
a) Batch
b) Interactive shell
c) Multiple
Answer: b
HiveQL-2
129) Hive-specific commands can be run from Beeline when the ________ Hive driver is used.
a) ODBC
b) JDBC
c) ODBC-JDBC
d) All of the Mentioned
Answer: b
Explanation: Hive-specific commands are the same as Hive CLI commands.
Introduction to HBase
135) The ________ Server assigns regions to the region servers and takes the help of Apache
ZooKeeper for this task.
a) Region
b) Master
c) Zookeeper
d) All of the mentioned
Answer: b
Explanation: Master Server maintains the state of the cluster by negotiating the load balancing.
137) ________ is an RPC framework that defines a compact binary serialization format used to persist data
139. You can delete a column family from a table using the ________ method of the
HBaseAdmin class.
a) delColumn()
b) removeColumn()
c) deleteColumn()
d) all of the mentioned
View Answer
Answer: c
Explanation: Alter command also can be used to delete a column family.
Answer: c
Explanation: The data storage will be in the form of regions (tables). These regions will be split up
and stored in region servers.
142. A ________ server is a machine that keeps a copy of the state of the entire system
and persists this information in local log files.
a) Master
b) Region
c) Zookeeper
d) All of the mentioned
View Answer
Answer: c
Explanation: A very large Hadoop cluster can be supported by multiple ZooKeeper servers.
144. ________ has a design policy of using ZooKeeper only for transient data.
a) Hive
b) Impala
c) Hbase
d) Oozie
View Answer
Answer: c
Explanation: If HBase’s ZooKeeper data is removed, only the transient operations are affected;
data can continue to be written to and read from HBase.
4.1. ________ leverages Spark Core’s fast scheduling capability to perform streaming analytics.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs
View Answer
Answer: b
Explanation: Spark Streaming ingests data in mini-batches and performs RDD transformations on
those mini-batches of data.
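A minimal Java sketch of the mini-batch model (the host/port, batch interval, and filter are assumptions): each batch interval yields an RDD to which ordinary transformations are applied.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MiniBatchDemo {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("MiniBatchDemo").setMaster("local[2]");
    // Group the stream into 1-second mini-batches, each backed by an RDD.
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));
    JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999);
    // An ordinary RDD-style transformation, applied to every mini-batch.
    JavaDStream<String> errors = lines.filter(line -> line.contains("ERROR"));
    errors.print();
    ssc.start();
    ssc.awaitTermination();
  }
}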
153. GraphX provides an API for expressing graph computation that can model the ________
abstraction.
a) GaAdt
154. Which of the following storage policies is used for both storage and compute?
a) Hot
b) Cold
c) Warm
d) All_SSD
Answer: a
Explanation: When a block is hot, all replicas are stored in DISK.
155. Which of the following is used to list out the storage policies?
a) hdfs storagepolicies
b) hdfs storage
c) hd storagepolicies
d) all of the mentioned
Answer: a
Explanation: The hdfs storagepolicies command takes no arguments.
156. Which of the following commands can be used to get the storage policy of a file or a directory?
a) hdfs dfsadmin -getStoragePolicy path
b) hdfs dfsadmin -setStoragePolicy path policyName
c) hdfs dfsadmin -listStoragePolicy path policyName
d) all of the mentioned
Answer: a
Explanation: Here path refers to either a directory or a file.
157. Which of the following methods is used to get the user-specified job name?
a) getJobName()
b) getJobState()
c) getPriority()
d) all of the mentioned
Answer: a
Explanation: getPriority() is used to get the scheduling info of the job.
158. The number of maps is usually driven by the total size of:
a) inputs
b) outputs
c) tasks
159. ________ is a utility which allows users to create and run jobs with any executable as the mapper
and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop
distribution.
160. The ________ part of MapReduce is responsible for processing one or more chunks of data and
producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
Answer: a
Explanation: Map Task in MapReduce is performed using the Map() function.
161. The ________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: Reduce function collates the work and resolves the results.
Answer: c
Explanation: The Scheduler is a pure scheduler in the sense that it performs no monitoring or
tracking of status for the application.
169. Which of the following methods adds a path or paths to the list of inputs?
a) setInputPaths()
b) addInputPath()
c) setInput()
d) none of the mentioned
Answer: b
Explanation: FileInputFormat offers four static convenience methods for setting a JobConf’s input
paths.
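A sketch of the addInputPath/setInputPaths convenience methods (the paths are hypothetical): add appends to the input list, while set replaces it.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputPathSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // addInputPath() appends one path at a time to the list of inputs.
    FileInputFormat.addInputPath(conf, new Path("/data/2023"));
    FileInputFormat.addInputPath(conf, new Path("/data/2024"));
    // setInputPaths() replaces the whole list in one call.
    FileInputFormat.setInputPaths(conf, new Path("/data/all"));
  }
}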
170. The split size is normally the size of an ________ block, which is appropriate for most
applications.
a) Generic
b) Task
c) Library
d) HDFS
Answer: d
Explanation: FileInputFormat splits only large files (here “large” means larger than an HDFS
block).
180. Mapper implementations are passed the JobConf for the job via the ________ method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned
Answer: b
Explanation: The JobConfigurable.configure method is overridden so that implementations can initialize themselves.
1. An open source build tool for Scala projects is:
a) simple build tool
b) sequential build tool
c) complex build script tool
d) none

2. Which of the following are tuples?
a) val exam = (1, 1)
b) val exam = ("one", "two", "three")
c) val exam = (1, "element", 10.2)
d) none

3. Which of the following is a transformation?
a) take(n)
b) top()
c) countByValue()
d) mapPartitionsWithIndex()

15. Hive supports complex index types.
a) True
b) False

16. What is the correct statement to access HDFS from the Hive CLI?
a) -/hdfs -ls /user/hduser/input
b) -/hadoop dfs -ls /user/hduser/input
c) Hadoop -ls /user/hduser/input
d) dfs -ls /user/hduser/input

25. In order to use bucketing in a Hive session we should set the parameter SET hive.enforce.bucketing=true.
a) True
b) False

29. Denormalizing data in Hive:
a) only improves the performance
b) avoids multiple disk seeks and improves the performance
c) avoids unrelated data
d) avoids multiple disk seeks

30. Hive supports external tables.
a) True
b) False

37. Point out the wrong statement:
a) To run Pig in local mode, you need access to a single machine
b) The DISPLAY operator will display the result to your terminal screen
c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation
d) All the listed options

41. Pig supports the following types of joins:
a) Inner join
b) Left outer join
c) Right outer join
d) All the listed options

46. Pig Latin statements are generally organized in one of the following ways:
a) A LOAD statement to read data
b) A series of "transformation" statements
c) A DUMP statement to display results
d) All the listed options

49. Point out the correct statement:
a) LoadMeta has methods to convert byte arrays to specific types
b) The Pig load/store API is aligned with the Hadoop InputFormat class only
c) LoadPush has methods to push operations from the Pig runtime into loader implementations
d) All the listed options

53. Pig can be called from Java.
a) True
b) False

62. In the MapReduce framework, can the map and reduce functions be run in any order?
a) No, because the output of the reduce function is the input of the map function
b) Yes, because in functional programming, the order of execution is not important
c) No, because the output of the map function is the input of the reduce function
d) Yes, because the functions use KVPs as input and output, so order is not important

71. ________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both mapper and reducer
d) None

82. Point out the wrong statement:
a) It is legal to set the number of reduce tasks to zero if no reduction is desired
b) The output of map-tasks goes directly to the FileSystem
c) The MR framework does not sort the map outputs before writing them out to the FileSystem
d) None

92. Correct syntax for parameter substitution using cmd:
a) pig {-param param_name=param_value | -param_file file_name} [-debug | -dryrun] script
b) {%declare | %default} param_name param_value
c) {%declare | %default} param_name param_value cmd
d) all the listed options

101. A client reading data from the HDFS filesystem in Hadoop will:
a) get the data from the namenode
b) get the block location from the datanode
c) get only the block location from the namenode
d) get both the data and the block location from the namenode

109. pig -x tez_local will enable ________ mode in Pig.
a) Tez
b) mapreduce
c) Local
d) None of the options

114. True about HDFS:
a) HDFS is based on the Google File System
b) HDFS is written in Java
c) Sits on top of a native file system
d) All the listed options

120. Which of the following components provides support for automatic execution of the workflows based on events and the presence of system resources?
a) Zookeeper
b) Oozie Coordinator
c) Ambari
d) All of the options

125. Which of the following statements is true about SerDes in Hive?
a) SerDe is a library for serialization and deserialization
b) SerDe can be customized to allow Hive to understand your data
c) A SerDe is a mechanism that Hive uses to parse data
d) All of the options listed

131. YARN stands for:
a) Yahoo another resource name
b) Yet another resource negotiator
c) Yet another resource need
d) Yahoo archived resource name

135. What is the default data replication factor?
a) 1
b) 2
c) 3
d) 4

139. What are the complex data types Hive supports?
a) Array
b) Maps
c) Structs
d) All of the options