
1. Question
Which of the following is a platform for analyzing large data sets that consists of a high-level language
for expressing data analysis programs?
Pig Latin
Oozie
Pig
Hive
2. Question
Pig Latin scripting language is not only a higher-level data flow language but also has operators
similar to
JSON
SQL
XML
3. Question
Which of the following is a data flow scripting language for analyzing unstructured data?
Mahout
Hive
Pig
4. Question
Which of the following commands is used to show the values of keys used in Pig?
Set
Declare
Display
5. Question
Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive
mode).
Fetch
Declare
Run
6. Question
Which of the following command can be used for debugging?

Exec
Execute
Error
Throw
7. Question
____________ method will be called by Pig both in the front end and back end to pass a unique
signature to the Loader.

relativeToAbsolutePath()
setUdfContextSignature()
getCacheFiles()
getShipFiles()
8. Question
Which of the following is a framework for collecting and storing script-level statistics for Pig Latin?
Pig Stats
PStatistics
Pig Statistics
9. Question
Which among the following is a simple xUnit framework that enables you to easily test your Pig scripts?
PigUnit
PigXUnit
PigUnitX
10. Question
Which of the following will compile PigUnit?
$pig_trunk ant pigunit-jar
$pig_tr ant pigunit-jar
$pig_ ant pigunit-jar
11. Question
PigUnit runs in Pig’s _______ mode by default.

Local
Tez
MapReduce
12. Question
Pig mainly operates in how many modes?
2
3
4
5
13. Question
You can run Pig in batch mode using
Pig shell command
Pig scripts
Pig options
14. Question
Which of the following functions is used to read data in Pig?
WRITE
READ
LOAD
15. Question
You can run Pig in interactive mode using which of the following shells?
Grunt
FS
HDFS
1. Question
Which of the following will run pig in local mode?
$ pig -x tez_local
$ pig -x local
$ pig
None of the above

2. Question
Which of the following platforms is used for constructing data flows for extract, transform, and load
(ETL) processing and analysis of large datasets?
Pig Latin
Pig
Oozie
Hive
3. Question
Which of the following is a component of the Pig execution environment?
Pig Scripts
Parser
Optimizer
All of the above
4. Question
Which among the following is a way of executing a Pig script?
Embedded Script
Grunt Shell
Script File
All of the above
5. Question
Which of the following are diagnostic operators in Pig?
DUMP
DESCRIBE
EXPLAIN
All of the above
6. Question
‘ILLUSTRATE’ runs a MapReduce job
False
True
7. Question
Which of the following are relational operators in Pig?

DUMP
DISTINCT
DESCRIBE
All of the above
8. Question
Which of the following is an execution mode available in Pig?
Local Mode
Map Mode
Reduce Mode
None of the above
9. Question
Pig script is
Case sensitive
Case insensitive
Both the above
None of the above
10. Question
A collection of tuples is called
Map
Bag
Tuples
All of the above
11. Question
Apache Pig reduces the length of code by using a multi-query approach
True
False
12. Question
Which of the following is a feature of Pig?
Rich Set of Operators
Extensibility
Optimization opportunities
All of the above
13. Question
Does Pig give any warning when there is a type mismatch or missing field?
Yes
No
14. Question
Which among the following are complex data types supported by Pig Latin?
Tuple
Bag
Map
All of the above
15. Question
In the Hadoop architecture, what is the primary purpose of Pig?
To move data into HDFS
To provide a high level scripting language on the top of MR
To run workflows
To move streaming data into HDFS

1. Which of the following statements is correct?


Pig is an execution engine that replaces the MapReduce core in Hadoop.
Pig is an execution engine that utilizes the MapReduce core in Hadoop.
Pig is an execution engine that compiles Pig Latin scripts into database queries.
Pig is an execution engine that compiles Pig Latin scripts into HDFS.
2. Which of the following statements about Pig are not correct?
In general, to implement a task, the number of lines of code in Pig and Hadoop are roughly the same.
Pig makes use of Hadoop job chaining.
Code written for the Pig engine is compiled into Hadoop jobs.
Code written for the Pig engine is directly compiled into machine code.
3. Let's take the following file dataset.txt:
Frank,19,44,1st_year,12
John,23,,2nd_year,-1
Tom,21,,,0
and the following Pig Latin script:
A = load 'dataset.txt' using PigStorage(',');
B = filter A by $1>20;
C = group B by $2;
dump C;
How many records will be generated as output when running this script?
a. 0
b. 1
c. 2
d. 3
4. Let's consider the file above once more. You are tasked with writing a Pig Latin script that outputs
the unique names (first column) occurring in this file. Which Pig Latin operators do you use (choose
the minimum number)?
foreach, distinct
filter, distinct
foreach, filter
foreach
filter
5. Which of the following definitions of complex data types in Pig are correct?
Tuple: a set of key/value pairs
Tuple: an ordered set of fields.
Bag: a collection of key/value pairs.
Bag: an ordered set of fields.
Map: an ordered set of fields.
Map: a collection of tuples.
6. Which guarantee that Hadoop provides does Pig break?
Calls to the Reducer's reduce() method only occur after the last Mapper has finished running.
All values associated with a single key are processed by the same Reducer.
The Combiner (if defined) may run multiple times, on the Map-side as well as the Reduce-side.
Task stragglers due to slow machines (not data skew) can be sped up through speculative execution.
7. The file 'complex.txt' contains the following two lines (tab delimited):
TUDELFT EWI [adres#Mekelweg,number#4,buildingColor#redblue] {(computer science),
(mathematics), (electronics)}
TUDELFT 3ME [number#2,adres#Mekelweg,postcode#2628CD] {(mechanical engineering),
(maritime engineering), (materials engineering)}
What is the output of the following Pig script?
complex = load 'complex.txt' as (uni:chararray, faculty:chararray, location:map[],
departments:bag{dlist:(d:chararray)});
A = foreach complex generate uni, flatten(location#'street');
dump A;
a. ()
b. ()
()
c. (TUDELFT,)
(TUDELFT,)
d. (TUDELFT)
8. Assume you want to join two datasets within a Pig script: Data set 1 consists of all Wikipedia edits
(information about how a single Wikipedia page is edited) captured for all languages in one log file
across all the years of Wikipedia's existence (billions of lines of log data); one line contains the
following fields [Unique ID,Wikipedia URL,Edit Timestamp,Editing UserID,Number of Words
Added]
. The lines are ordered in ascending order by the Editing UserID
. Data set 2 consists of information about Wikipedia articles written in Danish (less than 100,000
articles overall): [Unique ID,Wikipedia URL,Wikipedia Title]
. The join should be performed on [Wikipedia URL]
and the generated data set should look as follows: [Edit Timestamp,Wikipedia URL,Wikipedia Title]
. Which join is the most efficient one to use here (assuming a Hadoop cluster with 20 machines, each
one with about 4GB of memory and 1TB of disk space)?
sort-merge join
skew join
fragment-replicate join
default join
9. Assume you want to join two datasets within a Pig script: Data set 1 has an entry per Wikipedia
article with the following information: [Wikipedia URL,Last Edit Timestamp,Editing UserID,Number
of Words Added]
. The lines are ordered in ascending order by URL. Data set 2 also has one line per Wikipedia
article and contains the following: [Unique ID,Wikipedia URL,Wikipedia Title]
. The lines are ordered in ascending order by URL. The join should be performed on [Wikipedia URL]
and the generated data set should look as follows: [Last Edit Timestamp,Wikipedia URL,Wikipedia
Title]
. Which join is the most efficient one to use here (assuming a Hadoop cluster with 20 machines, each
one with about 4GB of memory and 1TB of disk space)?
sort-merge join
skew join
fragment-replicate join
default join
10. Which of the following statements about Pig is correct?
Pig always generates the same number of Hadoop jobs given a particular script, independent of the
amount/type of data that is being processed.
Pig replaces the MapReduce core with its own execution engine.
Pig may generate a different number of Hadoop jobs given a particular script, dependent on the
amount/type of data that is being processed.
When doing a default join, Pig will detect which join-type is probably the most efficient.
11. Specific static Java functions can be used in Pig like UDFs. Take a look at the following Pig script:
define hex InvokeForString('java.lang.Integer.toHexString','int');
nums = load 'numbers' as (n:int);
inHex = foreach nums generate hex(n);
Apart from these three lines of code, what additional coding responsibilities do we have as developer
here?
We need to write InvokeForString().
We need to register the jar containing java.lang.Integer.
We need to write the toHexString() functionality, extending java.lang.Integer.
There is nothing else to be done.
12. Which of the following definitions of complex data types in Pig are correct?
Tuple: a set of key/value pairs.
Tuple: an ordered set of fields.
Bag: a collection of tuples.
Bag: an ordered set of fields.
Map: a set of key/value pairs.
Map: a collection of tuples.

1. What is Apache Pig?


a) A database management system
b) A data processing platform
c) A distributed file system
d) A web server
Answer: b) A data processing platform
2. What language is used in Apache Pig?
a) Python
b) Java
c) Perl
d) Pig Latin
Answer: d) Pig Latin
Explanation: Pig Latin is the language used in Apache Pig for expressing data processing workflows.
3. Which of the following statements is true about Apache Pig?
a) It is an alternative to Hadoop
b) It can only process structured data
c) It supports multiple programming languages
d) It is not scalable
Answer: c) It supports multiple programming languages
Explanation: Apache Pig supports multiple programming languages such as Pig Latin, Python, and
Java.
4. What is the main advantage of using Apache Pig?
a) Faster data processing
b) Easier programming
c) Reduced data storage requirements
d) Better security
Answer: b) Easier programming
Explanation: Apache Pig provides a simpler programming model for processing large datasets,
making it easier to write data processing workflows.
5. What is the function of the Pig Latin statement “GROUP”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Joins two datasets
d) Performs a cross-product of two datasets
Answer: a) Groups data based on a specified key
Explanation: The “GROUP” statement in Pig Latin groups data based on a specified key, allowing for
aggregation and analysis.
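As a concrete illustration, here is a minimal Pig Latin sketch of GROUP; the file name and schema are hypothetical:
-- load comma-separated (store, amount) records
sales = LOAD 'sales.txt' USING PigStorage(',') AS (store:chararray, amount:int);
by_store = GROUP sales BY store;   -- one group per distinct store key
totals = FOREACH by_store GENERATE group, SUM(sales.amount);
DUMP totals;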
6. What is the function of the Pig Latin statement “FILTER”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Filters data based on a specified condition
d) Performs a cross-product of two datasets
Answer: c) Filters data based on a specified condition
Explanation: The “FILTER” statement in Pig Latin filters data based on a specified condition,
allowing for data subset selection.
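Continuing the hypothetical sales relation from the sketch above, FILTER is a single statement:
-- keep only records whose amount exceeds 100
big_sales = FILTER sales BY amount > 100;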
7. What is the function of the Pig Latin statement “FOREACH”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Applies a transformation to each record
d) Performs a cross-product of two datasets
Answer: c) Applies a transformation to each record
Explanation: The “FOREACH” statement in Pig Latin applies a transformation to each record in a
dataset, allowing for data cleaning and transformation.
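Again using the hypothetical sales relation, a FOREACH ... GENERATE sketch:
-- transform every record: uppercase the store name, double the amount
labels = FOREACH sales GENERATE UPPER(store) AS store, amount * 2 AS doubled;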
8. What is the function of the Pig Latin statement “JOIN”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Joins two datasets based on a common key
d) Performs a cross-product of two datasets
Answer: c) Joins two datasets based on a common key
Explanation: The “JOIN” statement in Pig Latin joins two datasets based on a common key, allowing
for data integration.
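A JOIN sketch on a shared key, adding a second hypothetical relation:
-- inner join the two relations on the store column
stores = LOAD 'stores.txt' USING PigStorage(',') AS (store:chararray, city:chararray);
joined = JOIN sales BY store, stores BY store;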
9. What is the function of the Pig Latin statement “ORDER”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Filters data based on a specified condition
d) Performs a cross-product of two datasets
Answer: b) Sorts data in ascending order
Explanation: The “ORDER” statement in Pig Latin sorts data in ascending order based on a specified
key.
10. What is the function of the Pig Latin statement “LIMIT”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Filters data based on a specified condition
d) Limits the number of records returned
Answer: d) Limits the number of records returned
Explanation: The “LIMIT” statement in Pig Latin limits the number of records returned from a
dataset.
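ORDER and LIMIT are often combined; a sketch against the same hypothetical relation:
-- sort ascending by amount, then keep only the first 10 records
ordered = ORDER sales BY amount ASC;
top10 = LIMIT ordered 10;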
11. Which of the following statements is true about Pig Latin scripts?
a) They can be executed only on a single node
b) They must be written in Java
c) They can be run on a cluster of nodes
d) They require a web interface to execute
Answer: c) They can be run on a cluster of nodes
Explanation: Pig Latin scripts can be run on a cluster of nodes, allowing for distributed data
processing.
12. What is the name of the component in Apache Pig that translates Pig Latin scripts into
MapReduce jobs?
a) Pig Compiler
b) Pig Executor
c) Pig Runner
d) Pig Transformer
Answer: a) Pig Compiler
Explanation: The Pig Compiler component in Apache Pig translates Pig Latin scripts into
MapReduce jobs.
13. Which of the following statements is true about Pig Latin UDFs (User-Defined Functions)?
a) They can only be written in Java
b) They can be written in multiple programming languages
c) They are not allowed in Pig Latin scripts
d) They are pre-built functions provided by Pig
Answer: b) They can be written in multiple programming languages
Explanation: Pig Latin UDFs can be written in multiple programming languages such as Java,
Python, and JavaScript.
14. What is the function of the Pig Latin statement “DESCRIBE”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Provides metadata about a dataset
d) Performs a cross-product of two datasets
Answer: c) Provides metadata about a dataset
Explanation: The “DESCRIBE” statement in Pig Latin provides metadata about a dataset, including
schema information and data types.
15. Which of the following statements is true about Apache Pig Latin schemas?
a) They cannot be defined by the user
b) They must be defined using JSON
c) They are optional
d) They must be defined for all datasets
Answer: c) They are optional
Explanation: Schemas in Apache Pig Latin are optional and can be defined by the user if necessary.
16. What is the function of the Pig Latin statement “EXPLAIN”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Provides a detailed explanation of the execution plan for a Pig Latin script
d) Performs a cross-product of two datasets
Answer: c) Provides a detailed explanation of the execution plan for a Pig Latin script
Explanation: The “EXPLAIN” statement in Pig Latin provides a detailed explanation of the
execution plan for a Pig Latin script.
17. Which of the following statements is true about Pig Latin LOAD statements?
a) They are not required for reading data into Pig
b) They are used to write data to a file
c) They must be written in Java
d) They specify the location and format of the input data
Answer: d) They specify the location and format of the input data
Explanation: Pig Latin LOAD statements specify the location and format of the input data to be read
into Pig.
18. What is the function of the Pig Latin statement “STORE”?
a) Groups data based on a specified key
b) Sorts data in ascending order
c) Writes data to a file
d) Performs a cross-product of two datasets
Answer: c) Writes data to a file
Explanation: The “STORE” statement in Pig Latin writes the output data to a file.
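A LOAD/STORE round trip, with hypothetical paths:
-- read comma-delimited input, write it back tab-delimited
raw = LOAD '/data/in' USING PigStorage(',') AS (id:int, name:chararray);
STORE raw INTO '/data/out' USING PigStorage('\t');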
19. Which of the following Pig Latin statements is used to group data based on a specified key?
a) GROUP BY
b) SORT BY
c) LIMIT
d) FOREACH
Answer: a) GROUP BY
Explanation: The “GROUP BY” statement in Pig Latin is used to group data based on a specified
key.
20. Which of the following Pig Latin statements is used to sort data in ascending order?
a) GROUP BY
b) SORT BY
c) LIMIT
d) FOREACH
Answer: b) SORT BY
Explanation: The “SORT BY” statement in Pig Latin is used to sort data in ascending order.
21. Which of the following Pig Latin statements is used to filter data based on a specified
condition?
a) GROUP BY
b) SORT BY
c) LIMIT
d) FILTER
Answer: d) FILTER
Explanation: The “FILTER” statement in Pig Latin is used to filter data based on a specified
condition.
22. Which of the following Pig Latin statements is used to join two datasets?
a) JOIN
b) UNION
c) CROSS
d) MERGE
Answer: a) JOIN
Explanation: The “JOIN” statement in Pig Latin is used to join two datasets.
23. Which of the following Pig Latin statements is used to combine two datasets?
a) JOIN
b) UNION
c) CROSS
d) MERGE
Answer: b) UNION
Explanation: The “UNION” statement in Pig Latin is used to combine two datasets.
24. Which of the following Pig Latin statements is used to perform a cross-product of two
datasets?
a) JOIN
b) UNION
c) CROSS
d) MERGE
Answer: c) CROSS
Explanation: The “CROSS” statement in Pig Latin is used to perform a cross-product of two datasets.
25. Which of the following Pig Latin statements is used to apply a function to each record in a
dataset?
a) GROUP BY
b) SORT BY
c) LIMIT
d) FOREACH
Answer: d) FOREACH
Explanation: The “FOREACH” statement in Pig Latin is used to apply a function to each record in a
dataset.
26. Which of the following Pig Latin statements is used to aggregate data based on a specified
key?
a) GROUP BY
b) SORT BY
c) LIMIT
d) FOREACH
Answer: a) GROUP BY
Explanation: The “GROUP BY” statement in Pig Latin is used to aggregate data based on a specified
key.
27. Which of the following Pig Latin statements is used to compute the sum of a specified
column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: a) SUM
Explanation: The “SUM” statement in Pig Latin is used to compute the sum of a specified column.
28. Which of the following Pig Latin statements is used to compute the average of a specified
column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: b) AVG
Explanation: The “AVG” statement in Pig Latin is used to compute the average of a specified
column.
29. Which of the following Pig Latin statements is used to compute the maximum value of a
specified column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: c) MAX
Explanation: The “MAX” statement in Pig Latin is used to compute the maximum value of a
specified column.
30. Which of the following Pig Latin statements is used to compute the minimum value of a
specified column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: d) MIN
Explanation: The “MIN” statement in Pig Latin is used to compute the minimum value of a specified
column.
31. Which of the following Pig Latin statements is used to load data from a Hadoop Distributed
File System (HDFS)?
a) LOAD
b) STORE
c) DUMP
d) FILTER
Answer: a) LOAD
Explanation: The “LOAD” statement in Pig Latin is used to load data from a Hadoop Distributed File
System (HDFS).
32. Which of the following Pig Latin statements is used to store data in a Hadoop Distributed
File System (HDFS)?
a) LOAD
b) STORE
c) DUMP
d) FILTER
Answer: b) STORE
Explanation: The “STORE” statement in Pig Latin is used to store data in a Hadoop Distributed File
System (HDFS).
33. Which of the following Pig Latin statements is used to display data on the console?
a) LOAD
b) STORE
c) DUMP
d) FILTER
Answer: c) DUMP
Explanation: The “DUMP” statement in Pig Latin is used to display data on the console.
34. Which of the following Pig Latin statements is used to remove duplicate records from a
dataset?
a) DISTINCT
b) GROUP BY
c) SORT BY
d) LIMIT
Answer: a) DISTINCT
Explanation: The “DISTINCT” statement in Pig Latin is used to remove duplicate records from a
dataset.
35. Which of the following Pig Latin statements is used to limit the number of records in a
dataset?
a) DISTINCT
b) GROUP BY
c) SORT BY
d) LIMIT
Answer: d) LIMIT
Explanation: The “LIMIT” statement in Pig Latin is used to limit the number of records in a dataset.
36. Which of the following Pig Latin statements is used to split a dataset into multiple datasets
based on a specified condition?
a) SPLIT
b) JOIN
c) UNION
d) CROSS
Answer: a) SPLIT
Explanation: The “SPLIT” statement in Pig Latin is used to split a dataset into multiple datasets
based on a specified condition.
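A SPLIT sketch on the hypothetical sales relation used earlier:
-- route each record into every relation whose condition it satisfies
SPLIT sales INTO small IF amount < 100, large IF amount >= 100;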
37. Which of the following Pig Latin statements is used to define a user-defined function?
a) DEFINE
b) REGISTER
c) LOAD
d) STORE
Answer: a) DEFINE
Explanation: The “DEFINE” statement in Pig Latin is used to define a user-defined function.
38. Which of the following Pig Latin statements is used to register a user-defined function?
a) DEFINE
b) REGISTER
c) LOAD
d) STORE
Answer: b) REGISTER
Explanation: The “REGISTER” statement in Pig Latin is used to register a user-defined function.
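REGISTER and DEFINE typically appear together; the jar, class, and alias below are hypothetical:
-- make the UDF jar visible to Pig, then bind a short alias to the class
REGISTER myudfs.jar;
DEFINE ToUpper com.example.pig.ToUpper();
shouted = FOREACH sales GENERATE ToUpper(store);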
39. Which of the following Pig Latin statements is used to load a user-defined function?
a) DEFINE
b) REGISTER
c) LOAD
d) STORE
Answer: c) LOAD
Explanation: The “LOAD” statement in Pig Latin is used to load a user-defined function.
40. Which of the following Pig Latin statements is used to store a user-defined function?
a) DEFINE
b) REGISTER
c) LOAD
d) STORE
Answer: d) STORE
Explanation: The “STORE” statement in Pig Latin is used to store data to a file.
41. Which of the following Pig Latin statements is used to perform a left outer join?
a) JOIN
b) COGROUP
c) CROSS
d) UNION
Answer: b) COGROUP
Explanation: The “COGROUP” statement in Pig Latin is used to perform a left outer join.
42. Which of the following Pig Latin statements is used to perform a right outer join?
a) JOIN
b) COGROUP
c) CROSS
d) UNION
Answer: b) COGROUP
Explanation: The “COGROUP” statement in Pig Latin is used to perform a right outer join.
43. Which of the following Pig Latin statements is used to perform a full outer join?
a) JOIN
b) COGROUP
c) CROSS
d) UNION
Answer: b) COGROUP
Explanation: The “COGROUP” statement in Pig Latin is used to perform a full outer join.
44. Which of the following Pig Latin statements is used to perform a self-join?
a) JOIN
b) COGROUP
c) CROSS
d) UNION
Answer: a) JOIN
Explanation: The “JOIN” statement in Pig Latin is used to join a dataset with itself.
45. Which of the following Pig Latin statements is used to filter out records that do not match a
specified condition?
a) DISTINCT
b) GROUP BY
c) FILTER
d) LIMIT
Answer: c) FILTER
Explanation: The “FILTER” statement in Pig Latin is used to filter out records that do not match a
specified condition.
46. Which of the following Pig Latin statements is used to sort a dataset based on a specified
column?
a) DISTINCT
b) GROUP BY
c) SORT BY
d) LIMIT
Answer: c) SORT BY
Explanation: The “SORT BY” statement in Pig Latin is used to sort a dataset based on a specified
column.
47. Which of the following Pig Latin statements is used to group a dataset based on a specified
column?
a) DISTINCT
b) GROUP BY
c) SORT BY
d) LIMIT
Answer: b) GROUP BY
Explanation: The “GROUP BY” statement in Pig Latin is used to group a dataset based on a specified
column.
48. Which of the following Pig Latin statements is used to generate a new dataset by combining
two or more datasets?
a) JOIN
b) COGROUP
c) CROSS
d) UNION
Answer: d) UNION
Explanation: The “UNION” statement in Pig Latin is used to generate a new dataset by combining
two or more datasets.
49. Which of the following Pig Latin statements is used to calculate the average value of a
specified column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: b) AVG
Explanation: The “AVG” statement in Pig Latin is used to calculate the average value of a specified
column.
50. Which of the following Pig Latin statements is used to calculate the total sum of a specified
column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: a) SUM
Explanation: The “SUM” statement in Pig Latin is used to calculate the total sum of a specified
column.
51. Which of the following Pig Latin statements is used to calculate the maximum value of a
specified column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: c) MAX
Explanation: The “MAX” statement in Pig Latin is used to calculate the maximum value of a
specified column.
52. Which of the following Pig Latin statements is used to calculate the minimum value of a
specified column?
a) SUM
b) AVG
c) MAX
d) MIN
Answer: d) MIN
Explanation: The “MIN” statement in Pig Latin is used to calculate the minimum value of a specified
column.
53. Which of the following Pig Latin statements is used to flatten a nested column in a dataset?
a) FLATTEN
b) NEST
c) GROUP
d) ORDER
Answer: a) FLATTEN
Explanation: The “FLATTEN” statement in Pig Latin is used to flatten a nested column in a dataset.
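A FLATTEN sketch, echoing the complex.txt bag schema quoted earlier in this set:
-- un-nest the departments bag: each inner tuple becomes its own output record
complex = LOAD 'complex.txt' AS (uni:chararray, departments:bag{dlist:(d:chararray)});
flat = FOREACH complex GENERATE uni, FLATTEN(departments);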
54. Which of the following Pig Latin statements is used to generate a schema for a dataset?
a) DESCRIBE
b) ILLUSTRATE
c) DUMP
d) EXPLAIN
Answer: a) DESCRIBE
Explanation: The “DESCRIBE” statement in Pig Latin is used to generate a schema for a dataset.
55. Which of the following Pig Latin statements is used to visualize a sample of a dataset?
a) DESCRIBE
b) ILLUSTRATE
c) DUMP
d) EXPLAIN
Answer: b) ILLUSTRATE
Explanation: The “ILLUSTRATE” statement in Pig Latin is used to visualize a sample of a dataset.
56. Which of the following Pig Latin statements is used to output the contents of a dataset to the
console?
a) DESCRIBE
b) ILLUSTRATE
c) DUMP
d) EXPLAIN
Answer: c) DUMP
Explanation: The “DUMP” statement in Pig Latin is used to output the contents of a dataset to the
console.
57. Which of the following Pig Latin statements is used to display the logical execution plan for a
Pig Latin script?
a) DESCRIBE
b) ILLUSTRATE
c) DUMP
d) EXPLAIN
Answer: d) EXPLAIN
Explanation: The “EXPLAIN” statement in Pig Latin is used to display the logical execution plan for
a Pig Latin script.
58. Which of the following Pig Latin statements is used to store the result of a Pig Latin script to
a file system?
a) STORE
b) SAVE
c) OUTPUT
d) WRITE
Answer: a) STORE
Explanation: The “STORE” statement in Pig Latin is used to store the result of a Pig Latin script to a
file system.
59. Which of the following Pig Latin statements is used to load a dataset from a file system?
a) LOAD
b) INPUT
c) GET
d) FETCH
Answer: a) LOAD
Explanation: The “LOAD” statement in Pig Latin is used to load a dataset from a file system.
60. Which of the following Pig Latin statements is used to specify the format of the data being
loaded?
a) FORMAT
b) TYPE
c) SCHEMA
d) USING
Answer: d) USING

1. Apache Pig is a high-level platform for creating programs that run on


A. Apache Hadoop
B. Apache Hive
C. Java
D. Python
Explanation
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. Hadoop is an
open-source framework that allows for distributed processing and storage of large datasets across
clusters of computers. Apache Pig provides a high-level language called Pig Latin, which simplifies
the process of writing MapReduce programs on Hadoop. By using Pig, developers can write complex
data transformations and analysis tasks more easily, without having to write low-level Java code.
Therefore, the correct answer is Apache Hadoop
2. Pig can execute its Hadoop jobs on
A. Sql
B. Java
C. MapReduce
D. HTML
Explanation
Pig can execute its Hadoop jobs on MapReduce. MapReduce is a programming model and software
framework used for processing large amounts of data in parallel across a cluster of computers. Pig is a
high-level data flow scripting language that allows developers to express their data transformations
using a language called Pig Latin. Pig Latin scripts are then compiled into MapReduce jobs, which are
executed on the Hadoop cluster. Therefore, MapReduce is the correct answer as it is the underlying
framework used by Pig to execute its Hadoop jobs.
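As a concrete sketch of that compilation step, the classic word count below is a complete Pig Latin script that Pig turns into MapReduce jobs; the file names are hypothetical:
-- TOKENIZE yields a bag of words per line; FLATTEN turns the bag into rows
lines = LOAD 'input.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group, COUNT(words);
STORE counts INTO 'wordcount_out';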
3. The language for this platform is called
A. Pig Java
B. Pig Gana
C. Pig Latin
D. Pig Greek
Explanation
The correct answer is Pig Latin. Pig Latin is a high-level scripting language used for analyzing large
datasets in Apache Hadoop. It provides a simplified and expressive language for data manipulation
and analysis. Pig Latin is designed to work with Pig, a platform for analyzing and processing big data.
It allows users to write complex data transformations using a simple and intuitive syntax, making it
easier to work with big data
4. Apache Pig was released in what year
A. 2007
B. 2008
C. 2009
D. 2010
Explanation
Apache Pig was released in 2008.
5. Pig runs on the following operating systems except
A. OS X
B. Linux
C. Android
D. Microsoft
Explanation
Android is not one of the operating systems on which Pig runs. Pig is a high-level platform for
analyzing large datasets in Apache Hadoop, and it primarily runs on OS X, Linux, and Microsoft
operating systems. However, Android is a mobile operating system designed for smartphones and
tablets, and it is not compatible with Pig
6. One of these operating systems runs Pig
A. Android
B. Linux
C. Java
D. Ubuntu
Explanation
Linux is the correct answer because it is an operating system that is commonly used for running Pig, a
high-level platform for analyzing large datasets in Apache Hadoop. Android is a mobile operating
system, Java is a programming language, and Ubuntu is a Linux distribution, none of which are
specifically known for running Pig
7. Pig is a type of __ software
A. Data management
B. Data transfer
C. Data storage
D. Data analysis
Explanation
The correct answer is "Data analysis" because Pig is a type of software that is commonly used for
analyzing large datasets. Pig is a high-level platform for creating MapReduce programs used with
Apache Hadoop, which is a framework for processing and analyzing big data. Pig scripts are written in
a language called Pig Latin, which provides a simplified and expressive way to perform data analysis
tasks. Therefore, Pig is specifically designed for data analysis purposes.
8. Pig was initially developed by
A. Facebook
B. Yahoo
C. Microsoft
D. Twitter
Explanation
Pig was initially developed by Yahoo.
9. Pig is developed by
A. Apache software foundation
B. Yahoo software foundation
C. Facebook
D. Twitter
Explanation
Pig is developed by the Apache Software Foundation. The Apache Software Foundation is a non-profit
organization that supports the development of open-source software projects. They provide resources
and infrastructure for developers to collaborate and contribute to various projects, including Pig. Pig is
a high-level platform for analyzing large datasets in Apache Hadoop. It provides a scripting language
called Pig Latin, which allows users to express complex data transformations and analysis tasks. The
Apache Software Foundation's involvement ensures that Pig is continuously maintained, improved,
and supported by a community of developers.
10. Pig license is
A. Apache license 2
B. Facebook license 2
C. Apache license 3
D. Twitter license 2

1. You can run Pig in interactive mode using the ______ shell
Grunt
HDFS
FS
Hadoop

2. Multiple choice


Which of the following is the default mode?
mapreduce
local
tez
All of the mentioned

3. Multiple choice


Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive
mode)
run
fetch
declare
all of the mentioned
4. Multiple choice
What are the different complex data types in Pig?
map
tuple
bag
All of these

6. Multiple choice
Which of the following is a platform for analyzing large data sets that consists
of a high-level language for expressing data analysis programs?
pig latin
pig
oozie
hive

7. Multiple choice


Pig Latin scripting language is not only a higher-level data flow language but also has operators
similar to
json
sql
xml
mapreduce

8. Multiple choice
Which of the following is a data flow scripting language for analyzing unstructured data?
mahout
hbase

pig
hive
9. Multiple choice
Pig mainly operates in how many modes?
2
3
4
5

10. Multiple choice


You can run Pig in batch mode using
pig shell command
pig script
pig option
11. Multiple choice
Which of the following functions is used to read data in Pig?
load
read
write
append
12. Multiple choice
Which of the following will run pig in local mode?
$ pig -x tez_local
$ pig -x local
$ pig
$ pig -x mapreduce
13. Multiple choice
Which of the following platforms is used for constructing data flows for extract,
transform, and load (ETL) processing and analysis of large datasets?
pig

oozie
hive
pig latin
14. Multiple choice
Which of the following is a component of the Pig execution environment?
pig script
parser
optimizer
all of mentioned
15. Multiple choice
Which among the following is a way of executing a Pig script?
embedded script
grunt shell
script file
all of the above
16. Multiple choice
Which of the following is an execution mode available in Pig?
local mode
map mode
reduce mode
none of the above
17. Multiple choice
A collection of tuples is called
TUPLE
MAP
BAG
ALL OF THE ABOVE

18. Multiple choice
Which of the following is a feature of Pig?
Rich Set of Operators
Extensibility
Optimization opportunities
All of the above
19. Multiple choice
Which among the following are complex data types supported by Pig Latin?
TUPLE
BAG
MAP
ALL OF THE ABOVE

20. Multiple choice
In the Hadoop architecture, what is the primary purpose of Pig?
To move data into HDFS
To provide a high level scripting language on the top of MR
To run workflows
To move streaming data into HDFS

1. Pig mainly operates in how many modes?


a) Two
b) Three
c) Four
d) Five
Answer: a
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) using various modes:
interactive and batch mode.
2. Point out the correct statement.
a) You can run Pig in either mode using the “pig” command
b) You can run Pig in batch mode using the Grunt shell
c) You can run Pig in interactive mode using the FS shell
d) None of the mentioned
Answer: a
Explanation: You can run Pig in either mode using the “pig” command (the bin/pig Perl script) or the
“java” command (java -cp pig.jar …).
3. You can run Pig in batch mode using __________
a) Pig shell command
b) Pig scripts
c) Pig options
d) All of the mentioned
Answer: b
Explanation: Pig script contains Pig Latin statements.
4. Pig Latin statements are generally organized in one of the following ways?
a) A LOAD statement to read data from the file system
b) A series of “transformation” statements to process the data
c) A DUMP statement to view results or a STORE statement to save the results
d) All of the mentioned
Answer: d
Explanation: A DUMP or STORE statement is required to generate output.
5. Point out the wrong statement.
a) To run Pig in local mode, you need access to a single machine
b) The DISPLAY operator will display the results to your terminal screen
c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation
d) All of the mentioned.
Answer: b
Explanation: The DUMP operator will display the results to your terminal screen.
6. Which of the following functions is used to read data in Pig?
a) WRITE
b) READ
c) LOAD
d) None of the mentioned
Answer: c
Explanation: PigStorage is the default load function.
7. You can run Pig in interactive mode using the ______ shell.
a) Grunt
b) FS
c) HDFS
d) None of the mentioned
Answer: a
Explanation: Invoke the Grunt shell using the “pig” command (as shown below) and then enter your
Pig Latin statements and Pig commands interactively at the command line.
8. Which of the following is the default mode?
a) Mapreduce
b) Tez
c) Local
d) All of the mentioned
Answer: a
Explanation: To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS
installation.
9. Which of the following will run pig in local mode?
a) $ pig -x local …
b) $ pig -x tez_local …
c) $ pig …
d) None of the mentioned
Answer: a
Explanation: Specify local mode using the -x flag (pig -x local).
10. $ pig -x tez_local … will enable ________ mode in Pig.
a) Mapreduce
b) Tez
c) Local
d) None of the mentioned
Answer: d
Explanation: Tez Local Mode is similar to local mode, except internally Pig will invoke tez runtime
engine.

1. Question
The results of a Hive query can be stored as

Local File
HDFS file
Both the above
Can not be stored
2. Question
If the database contains some tables then it can be forced to drop without dropping the tables by using
the keyword
RESTRICT
OVERWRITE
F DROP
CASCADE
3. Question
Users can pass configuration information to the SerDe using
SET SERDEPROPERTIES
WITH SERDEPROPERTIES
BY SERDEPROPERTIES
CONFIG SERDEPROPERTIES
4. Question
The property set to true to run Hive in local mode, so that it runs without creating a MapReduce job, is
hive.exec.mode.local.auto
hive.exec.mode.local.override
hive.exec.mode.local.settings
hive.exec.mode.local.config
5. Question
Which kind of keys (constraints) can Hive have?
Primary Keys
Foreign Keys
Unique Keys
None of the above
6. Question
What is the disadvantage of using too many partitions in Hive tables?
It slows down the namenode
Storage space is wasted
Join queries become slow
All of the above
7. Question
The default delimiter in Hive to separate the elements in a STRUCT is
'\001'
'\002'
'\003'
'\004'
8. Question
By default when a database is dropped in Hive
The tables are also deleted
The directory is deleted if there are no tables
The HDFS blocks are formatted
None of the above
9. Question
The main advantage of creating table partition is
Effective storage memory utilization
Faster query performance
Less RAM required by namenode
Simpler query syntax
10. Question
If the schema of the table does not match with the data types present in the file containing the table
then Hive
Automatically drops the file
Automatically corrects the data
Reports Null values for mismatched data
Does not allow any query to run on the table
11. Question
A view in Hive can be seen by using
SHOW TABLES
SHOW VIEWS
DESCRIBE VIEWS
VIEW VIEWS
12. Question
If an Index is dropped then
The underlying table is also dropped
The directory containing the index is deleted
The underlying table is not dropped
Error is thrown by hive
13. Question
Which file controls the logging of Mapreduce Tasks?
hive-log4j.properties
hive-exec-log4j.properties
hive-cli-log4j.properties
hive-create-log4j.properties
14. Question
What Hive cannot offer
Storing data in tables and columns
Online transaction processing
Handling date time data
Partitioning stored data
15. Question
To see the partition keys present in a Hive table, the command used is
Describe
Describe extended
Show
Show extended

1. Question
For optimizing a join of three tables, the largest table should be placed as

The first table in the join clause


Second table in the join clause
Third table in the join clause
Does not matter
2. Question
Which of the following hint is used to optimize the join queries
/* joinlast(table_name) */
/* joinfirst(table_name) */
/* streamtable(table_name) */
/* cacheable(table_name) */
3. Question
Calling a Unix bash script inside a Hive query is an example of
Hive Pipeline
Hive Caching
Hive Forking
Hive Streaming
4. Question
Hive uses _________ for logging.
logj4
log4l
log4i
log4j
5. Question
HiveServer2 introduced in Hive 0.11 has a new CLI called
BeeLine
SqlLine
HiveLine
ClilLine
6. Question
In which mode does HiveServer2 only accept valid Thrift calls?

Remote
HTTP
Embedded
Interactive
7. Question
Which of the following data type is supported by Hive?
map
record
string
enum
8. Question
Which of the following is not a complex data type in Hive?
Matrix
Array
Map
STRUCT
9. Question
Each database created in hive is stored as
A file
A directory
A HDFS block
A jar file
10. Question
When a partition is archived in Hive it
Reduces space through compression
Reduces the length of records
Reduces the number of files stored
Reduces the block size
11. Question
When a Hive query joins 3 tables, how many MapReduce jobs will be started?

1
2
3
12. Question
The reverse() function reverses a string passed to it in a Hive query. This is an example of
Standard UDF
Aggregate UDF
Table Generating UDF
None of the above
13. Question
Hive can be accessed remotely by using programs written in C++, Ruby, etc., over a single port. This is
achieved by using
HiveServer
HiveMetaStore
HiveWeb
Hive Streaming
14. Question
The Thrift service component in Hive is used for
Moving hive data files between different servers
Use multiple hive versions
Submit hive queries from a remote client
Installing hive
15. Question
The query “SHOW DATABASES LIKE 'h.*';” gives the output with database names
Containing h in their name
Starting with h
Ending with h
Containing 'h.'

1. Question
Is it possible to change the default location of managed tables in Hive?
Yes
No
2. Question
Which among the following commands is used to change settings within a Hive session?
RESET
SET
3. Question
How do you change a column's data type in Hive?
ALTER and CHANGE
ALTER
CHANGE
4. Question
Which of the following are data types in Hive?
ARRAY
STRUCT
MAP
All of the above
5. Question
Which of the following are the key components of the Hive architecture?
User Interface
Metastore
Driver
All of the above
6. Question
Are multiline comments supported in Hive?
Yes
No
7. Question
Can we run UNIX shell commands from Hive?
Yes
No
8. Question
Which of the following are commonly used Hive services?
Command Line Interface (cli)
Hive Web Interface (hwi)
HiveServer (hiveserver)
All of the above
9. Question
Explode in Hive is used to convert complex data types into desired table formats.
True
False
10. Question
Is it possible to overwrite Hadoop MapReduce configuration in Hive?
Yes
No
11. Question
Point out the correct statement
Hive is not a relational database, but a query engine that supports the parts of SQL
Hive is a relational database with SQL support
Pig is a relational database with SQL support
None of the above
12. Question
Which of the following is used to analyze data stored in a Hadoop cluster using SQL-like queries?
Mahout
Hive
Pig
All of the above
13. Question
If an Index is dropped then

The directory containing the index is deleted


The underlying table is not dropped
The underlying table is also dropped
Error is thrown by hive
14. Question
If the schema of the table does not match with the data types present in the file containing the table
then Hive
Automatically drops the file
Automatically corrects the data
Reports Null values for mismatched data
Does not allow any query to run on the table
15. Question
By default when a database is dropped in Hive
The tables are also deleted
The directory is deleted if there are no tables
The HDFS blocks are formatted
None of the above

1. Multiple choice
Hive was created by
Facebook
Google
Amazon
Yahoo!

2. Multiple choice


Hive is a
Data Warehousing Tool
DataBase Management Tool
Data Scrapping Tool
Hadoop Data Tool

3. Multiple choice
Hive doesn't make use of the following:
HDFS for Storage
MapReduce for Execution
Stores metadata in a RDBMS
GPU for Processing
4. Multiple choice
HQL is ______________ to SQL
Similar
Dissimilar
5. Multiple choice
Hive __________ SQL queries _________ MapReduce Jobs
create, convert into
compiles, into
execute, from
interpret, of
6. Multiple choice
Which of them is not a Hive feature?
easy to code
support rich datatypes
supports group-by
UDF not supported
7. Multiple choice
A database is a namespace for
tables
fields
records
keys
8. Multiple choice
Separation of data on the basis of a specific attribute is
Partition
Bucketing
Fragmentation
Slicing
9. Multiple choice
Separation of data on the basis of a mathematical hash function is
Partition
Bucketing
Fragmentation
Slicing
10. Multiple choice
What is the right visual charter of HIVE Application?
11. Multiple choice
What Hive is not
online transaction processing
Data Warehousing tool
Similar to SQL
Analyze log data

12. Multiple choice


Hive is suitable for
real-time queries
row-level updates
queries over small data sets
analyze historical data

13. Multiple choice


Applications of Apache Hive
Log processing
OLTP
Billing systems
Processing Web Forms

14. Multiple choice


Hive enables easy data summarization, ad-hoc querying and analysis of large volumes of data.
true
false
15. Multiple choice
In _________, due to equal volumes of data in each partition, joins at the map side will be quicker.
bucketing
partitioning
16. Multiple choice
The results of a Hive query can be stored as
Local File
HDFS file
Both the above

Can not be stored


17. Multiple choice
If the database contains some tables then it can be forced to drop without dropping the tables by using
the keyword
RESTRICT
OVERWRITE
F DROP
CASCADE

18. Multiple choice


Users can pass configuration information to the SerDe using
SET SERDEPROPERTIES
WITH SERDEPROPERTIES
BY SERDEPROPERTIES
CONFIG SERDEPROPERTIES
19. Multiple choice
Point out the wrong statement:
There are four namespaces for variables in Hive
Custom variables can be created in a separate namespace with the define
Custom variables can also be created in a separate namespace with hivevar
None of the mentioned

20. Multiple choice


A user creates a UDF which accepts arguments of different data types, each time it is run. It is an
example of
Aggregate Function
Generic Function
Standard UDF
Super Functions
21. Multiple choice
_______ supports a new command shell Beeline that works with HiveServer2

HiveServer2
HiveServer3
HiveServer4
None of the mentioned
22. Multiple choice
The below expression in the where clause RLIKE '.*(Chicago|Ontario).*'; gives the results which
match
words containing both Chicago and Ontario
words containing either Chicago or Ontario
words Ending with Chicago or Ontario
words starting with Chicago or Ontario

23. Multiple choice


The partitioning of a table in Hive creates more
subdirectories under the database name
subdirectories under the table name
files under database name
files under the table name

24. Multiple choice


The clause used to limit the number of rows returned by a query is
Rownum
Restrict
Maxrow
Limit

25. Multiple choice


In ______ mode HiveServer2 only accepts valid Thrift calls.
Remote
HTTP
Embedded
Interactive

26. Multiple choice


Which of the following scenarios are not prevented by enabling strict mode in Hive?
Scanning all the partitions
Generating random sample of data
Running an order by clause without a LIMIT
Cartesian product
27. Multiple choice
The property set to true to run Hive in local mode, so that it runs without creating a MapReduce
job, is
hive.exec.mode.local.auto
hive.exec.mode.local.override
hive.exec.mode.local.settings
hive.exec.mode.local.config

28. Multiple choice
Which kind of keys (constraints) can Hive have?
Primary Keys
Foreign Keys
Unique Keys
None of the above

29. Multiple choice


What is the disadvantage of using too many partitions in Hive tables?
It slows down the namenode
Storage space is wasted
Join queries become slow
All of the above

30. Multiple choice
Which of the following is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs?
Pig Latin
Pig
Oozie
Hive

31. Multiple choice
Pig mainly operates in how many modes?
Two
Three
Four
Five
32. Multiple choice
Point out the correct statement.
You can run Pig in either mode using the “pig” command
You can run Pig in batch mode using the Grunt shell
You can run Pig in interactive mode using the FS shell
None of the mentioned
33. Multiple choice
You can run Pig in batch mode using __________
Pig shell command
Pig scripts
Pig options
All of the mentioned

34. Multiple choice


Pig Latin statements are generally organized in one of the following ways?

A LOAD statement to read data from the file system


A series of “transformation” statements to process the data
A DUMP statement to view results or a STORE statement to save the results
All of the mentioned

35. Multiple choice


Point out the wrong statement.
To run Pig in local mode, you need access to a single machine
The DISPLAY operator will display the results to your terminal screen
To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation
All of the mentioned

36. Multiple choice
Which of the following functions is used to read data in Pig?
WRITE
READ
LOAD
None of the mentioned
37. Multiple choice
$ pig -x tez_local … will enable ________ mode in Pig.
Mapreduce
Tez
Local
None of the mentioned

38. Multiple choice


What are the various diagnostic operators available in Apache Pig?
Dump Operator
Describe Operator
Explain Operator

All of these

39. Multiple choice
If the data has fewer elements than the specified schema elements in Pig, then?
Pig will not do any thing
It will pad the end of the record columns with nulls
Pig will throw an error
Pig will warn you before it throws an error

40. Multiple choice
rel = sample xrel 0.1;
samples 10% random records
samples first 10% records
samples last 10% records
None of the above

41. Multiple choice


Point out the correct statement
LoadMeta has methods to convert byte arrays to specific types
LoadPush has methods to push operations from Pig runtime into loader implementations
The Pig load/store API is aligned with Hadoop’s InputFormat class only
All of the mentioned
42. Multiple choice
Which of the following platforms is used for constructing data flows for extract, transform, and load
(ETL) processing and analysis of large datasets?
Pig
Pig Latin
Oozie
Hive

43. Multiple choice
‘ILLUSTRATE’ runs a MapReduce job
True
False
44. Multiple choice
Pig script is
Case sensitive
Case insensitive
Both the above
None of the above
45. Multiple choice
Apache Pig reduces the length of code by using a multi-query approach
True
False

https://www.tutorialspoint.com/hive/hive_online_quiz.htm
Q 1 - In Hive, when the schema does not match the file content
A - It cannot read the file
B - It reads only the string data type
C - it throws an error and stops reading the file
D - It returns null values for mismatched fields.
Answer : D
Explanation
Instead of returning error, Hive returns null values for mismatch between schema and actual data.
Q 2 - If the database contains some tables then it can be forced to drop without dropping the
tables by using the keyword
A - RESTRICT
B - OVERWRITE
C - F DROP
D - CASCADE
Answer : D
Explanation
CASCADE clause drops the tables first before dropping the database
Q 3 - The "strict" mode when querying a partitioned table is used to
A - stop queries of partitioned tables without a where clause
B - automatically add a where clause to the queries on a partitioned table
C - Limit the result of a query on partitioned table to 100
D - Ignore any error in the name of the partitioned table
Answer : A
Explanation
The strict mode is designed to avoid long running jobs.
Q 4 - When a partition is archived in Hive it
A - Reduces space through compression
B - Reduces the block size
C - reduces the length of records
D - reduces the number of files stored
Answer : D
Explanation
Archiving merges the files into one directory.
Q 5 - To select all columns starting with the word 'Sell' from the table GROSS_SELL, the query
is
A - select '$Sell*' from GROSS_SELL
B - select 'Sell*' from GROSS_SELL
C - select 'sell.*' from GROSS_SELL
D - select 'sell[*]' from GROSS_SELL
Answer : C
Explanation
Hive supports Java-based regular expressions for querying its metadata.
Q 6 - The name of a view in Hive
A - can be same as the name of another table in the same database
B - cannot be same as the name of another table in the same database
C - cannot contain a number
D - cannot be more than 10 character long
Answer : B
Explanation
Views and tables are treated similarly in the hive metadata
Q 7 - The identifiers in HiveQL are
A - case sensitive
B - case insensitive
C - sometimes case sensitive
D - Depends on the Hadoop environment

Answer : A
Explanation
Hive is case insensitive
Q 8 - Setting the local mode execution to true causes
A - All tasks are executed on data available closest to the namenode
B - All tasks are executed only on a single machine
C - All the data files are cached on a datanode before query execution
D - Random data is used for query execution
Answer : B
Explanation
Local mode avoids creating a MapReduce job while running the job on a single machine.
Q 9 - A Table Generating Function is a Function that
A - Takes one or more columns from a row and returns a single value
B - Takes one or more columns from many rows and returns a single value
C - Takes zero or more inputs and produces multiple columns or rows of output
D - Detects the type of input programmatically and provides appropriate response.
Q 10 - To add a new user defined Function permanently to Hive, we need to
A - Create a new version of Hive
B - Add the .class Java code to FunctionRegistry
C - Add the .jar Java code to FunctionRegistry
D - Add the .jar java code to $HOME/.hiverc
Answer : B
Explanation
Functionregistry holds the list of all permanent functions

1. Apache Hive is a data warehouse software project built on top of


A. Apache groove
B. Apache Hadoop
C. Apache net
D. Apache loof
2. Hive was initially developed by
A. Facebook
B. Twitter
C. Amazon
D. Microsoft
Explanation
Hive was initially developed by Facebook.
3. Hive is written in what language
A. Linux
B. Python
C. Java
D. Gama
Explanation
Hive is a data warehouse infrastructure tool that is built on top of Hadoop. It provides a SQL-like
interface to query and analyze large datasets stored in Hadoop. Hive is written in Java, which makes it
platform-independent and allows it to run on any system that supports Java. This choice of language
ensures that Hive can be easily integrated with other Java-based tools and frameworks in the Hadoop
ecosystem.
4. Hive converts queries to all except
A. Apache tez
B. Spark ten
C. Map reduce
D. Spark jobs
5. By default, Hive stores metadata in an embedded
A. Apache tez
B. Apache hood
C. Apache derby
D. Apache hadoop
Explanation
Hive, by default, stores its metadata in an embedded Apache Derby database. Apache Derby is a
lightweight, Java-based relational database management system (RDBMS) that is included with Hive.
It is used to store and manage the metadata, such as table schemas, partitions, and column statistics,
for Hive tables. This allows Hive to efficiently query and analyze large datasets stored in Apache
Hadoop.
6. Apache Hive supports analysis of large data sets stored in
Hadoop's
A. HDFS
B. HDPS
C. HDFC
D. HFSP
Explanation
Apache Hive is a data warehouse infrastructure that provides tools to enable easy data summarization,
querying, and analysis of large datasets stored in Hadoop. Hadoop Distributed File System (HDFS) is
the primary storage system used by Hadoop, and it is designed to store and process large amounts of
data across multiple machines. Therefore, Apache Hive supports analysis of large data sets stored in
Hadoop's HDFS.
7. Other companies that use Hive include
A. Whatsapp
B. Twitter
C. Netflix
D. WeChat
8. Hive is a type of
A. Social media
B. Data interpreter
C. Data warehouse
D. Instant messaging
9. Hive has how many execution engines
A. 2
B. 3
C. 4
D. 5
Explanation
Hive has three execution engines. These execution engines are responsible for processing and
executing queries in Hive. The three execution engines in Hive are MapReduce, Tez, and Spark. Each
engine has its own advantages and can be chosen based on the specific requirements of the query and
the underlying infrastructure. MapReduce is the default execution engine, while Tez and Spark provide
faster and more efficient processing capabilities.
10. Major components of the Hive architecture includes the following
except
A. Metastore
B. Drivers
C. Compiler
D. Interpreter
Explanation
The Hive architecture consists of several major components that work together to process and analyze
data. These components include the Metastore, which stores metadata about the tables and partitions in
Hive, the Drivers, which handle the execution of Hive queries, and the Compiler, which translates
HiveQL queries into MapReduce jobs. The Interpreter, on the other hand, is not a part of the Hive
architecture. It is a component of other systems like Apache Zeppelin, which allows users to
interactively run queries and visualize data.

https://www.freshersnow.com/hive-quiz/
Top 60 Hive Multiple Choice Questions | Practice Online Quiz
1. What is Hive?
A. A data processing tool
B. A database management system
C. A distributed computing system
D. A cloud computing service
Answer: A. A data processing tool
Explanation: Hive is a data processing tool that provides an SQL-like interface to Hadoop, allowing
users to query and analyze large datasets stored in Hadoop Distributed File System (HDFS).
2. Which of the following is NOT a data warehouse system that can be integrated with Hive?
A. Apache HBase
B. Apache Cassandra
C. Apache Druid
D. Apache Kylin
Answer: B. Apache Cassandra
Explanation: Hive can integrate with various data warehouse systems, including Apache HBase,
Apache Druid, and Apache Kylin, but not Apache Cassandra, which is a NoSQL database.
3. What is the language used to write Hive queries?
A. Java
B. Python
C. SQL
D. HiveQL
Answer: D. HiveQL
Explanation: Hive provides a SQL-like interface called HiveQL, which allows users to write queries
to analyze data stored in Hadoop.
4. Which of the following is a Hive built-in function for filtering data based on multiple
conditions?
A. BETWEEN
B. IN
C. LIKE
D. CASE
Answer: D. CASE
Explanation: The CASE function in Hive allows users to filter data based on multiple conditions. It
works like a switch statement in other programming languages.
5. Which of the following commands is used to create a new database in Hive?
A. CREATE TABLE
B. CREATE PARTITION
C. CREATE DATABASE
D. CREATE VIEW
Answer: C. CREATE DATABASE
Explanation: The CREATE DATABASE command is used to create a new database in Hive.
6. What is the default file format used by Hive to store data in HDFS?
A. CSV
B. Avro
C. Parquet
D. ORC
Answer: D. ORC
Explanation: The intended answer here is ORC (Optimized Row Columnar). Note, however, that stock Hive actually defaults to TEXTFILE; ORC is the default only where hive.default.fileformat has been set to ORC, as many Hadoop distributions do.
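In practice it is safest to declare the format explicitly; a short sketch with an illustrative table name:
CREATE TABLE events_orc (id INT, payload STRING) STORED AS ORC;
SET hive.default.fileformat=ORC;  -- make ORC the default for this session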
7. What is a Hive partition?
A. A subset of data in a Hive table
B. A type of Hive table
C. A directory in HDFS
D. A Hive database
Answer: A. A subset of data in a Hive table
Explanation: A Hive partition is a subset of data in a Hive table that is based on a specific column
value.
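Concretely, each partition value maps to its own directory under the table's HDFS location; a sketch with illustrative names:
CREATE TABLE logs (msg STRING) PARTITIONED BY (dt STRING);
-- rows with dt='2024-01-01' land under .../logs/dt=2024-01-01/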
8. Which of the following commands is used to create a Hive table?
A. CREATE DATABASE
B. CREATE PARTITION
C. CREATE VIEW
D. CREATE TABLE
Answer: D. CREATE TABLE
Explanation: The CREATE TABLE command is used to create a new table in Hive.
9. Which of the following is NOT a supported file format for storing data in Hive?
A. CSV
B. JSON
C. XML
D. YAML
Answer: D. YAML
Explanation: Hive supports various file formats for storing data, including CSV, JSON, and XML,
but not YAML.
10. What is Hive metastore?
A. A tool for managing Hive databases
B. A file format for storing Hive metadata
C. A component that stores metadata for Hive tables and partitions
D. A Hive server that processes queries
Answer: C. A component that stores metadata for Hive tables and partitions
Explanation: Hive metastore is a component that stores metadata for Hive tables and partitions,
including table schemas, column definitions, and partition locations.
11. Which of the following commands is used to load data into a Hive table?
A. INSERT INTO
B. LOAD DATA
C. CREATE TABLE
D. ALTER TABLE
Answer: B. LOAD DATA
Explanation: The LOAD DATA command is used to load data into a Hive table from an external file.
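The full Hive statement uses the INPATH keyword (LOCAL reads from the client machine rather than HDFS); paths and table name below are illustrative:
LOAD DATA LOCAL INPATH '/tmp/users.csv' INTO TABLE users;
LOAD DATA INPATH '/data/users' OVERWRITE INTO TABLE users;  -- from HDFS, replacing existing data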
12. Which of the following is NOT a data type supported by Hive?
A. BOOLEAN
B. CHAR
C. ARRAY
D. FLOAT
Answer: B. CHAR
Explanation: This is the intended answer, but it only holds for old releases: Hive added a CHAR type in version 0.13.0, so modern Hive supports all four listed types (BOOLEAN, CHAR, ARRAY, and FLOAT).
13. What is the purpose of Hive’s EXPLAIN command?
A. To execute a Hive query
B. To display the query plan for a Hive query
C. To debug a Hive query
D. To optimize a Hive query
Answer: B. To display the query plan for a Hive query
Explanation: The EXPLAIN command in Hive is used to display the query plan for a Hive query,
showing how the query will be executed and which operations will be used.
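Usage is simply a prefix on the query; for example:
EXPLAIN SELECT col1, COUNT(*) FROM my_table GROUP BY col1;
-- prints the plan (stages and operators) without executing the query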
14. Which of the following commands is used to remove a Hive table?
A. DROP DATABASE
B. DROP PARTITION
C. DROP VIEW
D. DROP TABLE
Answer: D. DROP TABLE
Explanation: The DROP TABLE command is used to remove a Hive table.
15. Which of the following is NOT a Hive function for manipulating strings?
A. SUBSTRING
B. LENGTH
C. CONCAT
D. ADD
Answer: D. ADD
Explanation: Hive provides various built-in functions for manipulating strings, including
SUBSTRING, LENGTH, and CONCAT, but not ADD.
16. Which of the following commands is used to create an external table in Hive?
A. CREATE TABLE
B. CREATE EXTERNAL TABLE
C. CREATE MANAGED TABLE
D. CREATE TEMPORARY TABLE
Answer: B. CREATE EXTERNAL TABLE
Explanation: The CREATE EXTERNAL TABLE command is used to create an external table in
Hive, which points to data stored outside of Hive.
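A minimal sketch, assuming the data already sits at an illustrative HDFS path; dropping an external table removes only the metadata, not the files:
CREATE EXTERNAL TABLE raw_events (id INT, body STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw_events';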
17. What is the purpose of Hive’s GROUP BY clause?
A. To group data based on specific column values
B. To sort data based on specific column values
C. To filter data based on specific column values
D. To join multiple tables based on specific column values
Answer: A. To group data based on specific column values
Explanation: The GROUP BY clause in Hive is used to group data based on specific column values,
allowing users to aggregate and summarize data.
18. Which of the following commands is used to rename a Hive table?
A. RENAME TABLE
B. ALTER TABLE
C. UPDATE TABLE
D. MODIFY TABLE
Answer: B. ALTER TABLE
Explanation: Hive has no RENAME TABLE statement; a table is renamed with an ALTER TABLE clause, ALTER TABLE old_name RENAME TO new_name.
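The statement in full, with illustrative names:
ALTER TABLE my_table RENAME TO my_table_v2;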
19. Which of the following is NOT a supported join type in Hive?
A. INNER JOIN
B. LEFT OUTER JOIN
C. RIGHT OUTER JOIN
D. FULL OUTER JOIN
Answer: D. FULL OUTER JOIN (per the source)
Explanation: The source marks FULL OUTER JOIN as unsupported, but this is outdated: Hive supports INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER joins, so on a current release none of the listed options is actually unsupported.
20. Which of the following commands is used to add a new column to a Hive table?
A. ADD COLUMN
B. ALTER COLUMN
C. MODIFY COLUMN
D. CHANGE COLUMN
Answer: A. ADD COLUMN
Explanation: This is the closest option; the actual Hive statement is an ALTER TABLE clause with a plural COLUMNS keyword, ALTER TABLE my_table ADD COLUMNS (col type).
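The statement in full; the column name and comment are illustrative:
ALTER TABLE my_table ADD COLUMNS (new_col STRING COMMENT 'added later');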
21. Which of the following is NOT a Hive data format for storing data in HDFS?
A. ORC
B. Parquet
C. Avro
D. JSON
Answer: D. JSON
Explanation: ORC, Parquet, and Avro are native Hive storage formats. JSON data can still be read and written, but through a SerDe (such as JsonSerDe) rather than as a dedicated columnar file format.
22. What is the purpose of Hive’s HAVING clause?
A. To group data based on specific column values
B. To sort data based on specific column values
C. To filter data based on specific column values
D. To limit the number of results returned by a query
Answer: C. To filter data based on specific column values
Explanation: The HAVING clause in Hive filters the groups produced by the GROUP BY clause, typically on aggregate values such as SUM or COUNT; WHERE, by contrast, filters individual rows before grouping.
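A short sketch contrasting the two filters, with illustrative names:
SELECT col1, SUM(col2) AS total
FROM my_table
WHERE col2 > 0          -- row filter, applied before grouping
GROUP BY col1
HAVING SUM(col2) > 100; -- group filter, applied after aggregation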
23. Which of the following is a valid way to insert data into a Hive table?
A. INSERT INTO my_table VALUES (1, ‘hello’, true)
B. LOAD DATA INPATH ‘/path/to/data’ INTO TABLE my_table
C. COPY FROM ‘/path/to/data’ TO TABLE my_table
D. IMPORT DATA ‘/path/to/data’ INTO TABLE my_table
Answer: B. LOAD DATA INPATH ‘/path/to/data’ INTO TABLE my_table
Explanation: The LOAD DATA INPATH command is used to insert data into a Hive table from an external file. (On Hive 0.14 and later, option A is also accepted, since INSERT INTO ... VALUES is supported.)
24. Which of the following commands is used to list all of the tables in a Hive database?
A. SHOW DATABASES
B. SHOW TABLES
C. DESCRIBE DATABASE
D. DESCRIBE TABLE
Answer: B. SHOW TABLES
Explanation: The SHOW TABLES command is used to list all of the tables in a Hive database.
25. Which of the following is NOT a Hive function for working with dates and times?
A. YEAR
B. MONTH
C. HOUR
D. CONCAT
Answer: D. CONCAT
Explanation: Hive provides various built-in functions for working with dates and times, including
YEAR, MONTH, and HOUR, but not CONCAT.
26. Which of the following is a valid Hive query to select all of the columns from a table called
my_table?
A. SELECT * FROM my_table
B. SELECT ALL FROM my_table
C. SELECT COLUMNS FROM my_table
D. SELECT DATA FROM my_table
Answer: A. SELECT * FROM my_table
Explanation: The SELECT * FROM command is used to select all of the columns from a table in
Hive.
27. Which of the following commands is used to add a new partition to a Hive table?
A. ADD PARTITION
B. ALTER PARTITION
C. MODIFY PARTITION
D. CHANGE PARTITION
Answer: A. ADD PARTITION
Explanation: The ADD PARTITION command is used to add a new partition to a Hive table.
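The command is issued through ALTER TABLE; the partition value and path below are illustrative, and LOCATION is optional:
ALTER TABLE logs ADD PARTITION (dt='2024-01-01') LOCATION '/data/logs/2024-01-01';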
28. Which of the following is a valid way to create a Hive table with a custom delimiter?
A. CREATE TABLE my_table (col1 INT, col2 STRING) DELIMITER ‘,’
B. CREATE TABLE my_table (col1 INT, col2 STRING) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ‘,’
C. CREATE TABLE my_table (col1 INT, col2 STRING) TERMINATED BY ‘,’
D. CREATE TABLE my_table (col1 INT, col2 STRING) DELIMITED BY ‘,’
Answer: B. CREATE TABLE my_table (col1 INT, col2 STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
Explanation: The ROW FORMAT DELIMITED FIELDS TERMINATED BY command is used to
create a Hive table with a custom delimiter.
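A fuller sketch showing how the clause composes with a storage format (names illustrative):
CREATE TABLE my_table (col1 INT, col2 STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;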
29. Which of the following is a valid Hive query to select the top 10 rows from a table called
my_table?
A. SELECT * FROM my_table LIMIT 10
B. SELECT TOP 10 FROM my_table
C. SELECT FIRST 10 FROM my_table
D. SELECT ROW
Answer: A. SELECT * FROM my_table LIMIT 10
Explanation: The LIMIT clause is used to limit the number of rows returned by a Hive query, and it
can be used with the SELECT statement to select the top N rows from a table.
30. Which of the following commands is used to drop a Hive table?
A. DROP TABLE my_table
B. REMOVE TABLE my_table
C. DELETE TABLE my_table
D. DESTROY TABLE my_table
Answer: A. DROP TABLE my_table
Explanation: The DROP TABLE command is used to drop a Hive table.
31. Which of the following commands is used to list all of the databases in Hive?
A. SHOW DATABASES
B. LIST DATABASES
C. DESCRIBE DATABASES
D. DISPLAY DATABASES
Answer: A. SHOW DATABASES
Explanation: The SHOW DATABASES command is used to list all of the databases in Hive.
32. Which of the following is a valid way to create a Hive table that is partitioned by date?
A. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED BY (date_col DATE)
B. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED ON date_col
C. CREATE TABLE my_table (col1 INT, col2 STRING) DATE PARTITIONED
D. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED BY date_col
Answer: A. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED BY (date_col DATE)
Explanation: The PARTITIONED BY clause takes a parenthesised, typed column list; option D omits both the parentheses and the type, so it is not valid HiveQL.
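The corrected statement, plus how a partition is targeted on load (values illustrative; the VALUES form requires Hive 0.14 or later):
CREATE TABLE my_table (col1 INT, col2 STRING)
PARTITIONED BY (date_col DATE);
INSERT INTO my_table PARTITION (date_col='2024-01-01') VALUES (1, 'a');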
33. Which of the following commands is used to modify the structure of a Hive table?
A. MODIFY TABLE
B. ALTER TABLE
C. CHANGE TABLE
D. UPDATE TABLE
Answer: B. ALTER TABLE
Explanation: The ALTER TABLE command is used to modify the structure of a Hive table, such as
adding or dropping columns.
34. Which of the following is a valid Hive query to select the distinct values of a column from a
table called my_table?
A. SELECT DISTINCT col1 FROM my_table
B. SELECT UNIQUE col1 FROM my_table
C. SELECT ALL DISTINCT col1 FROM my_table
D. SELECT DISTINCT ALL col1 FROM my_table
Answer: A. SELECT DISTINCT col1 FROM my_table
Explanation: The SELECT DISTINCT command is used to select the distinct values of a column
from a table in Hive.
35. Which of the following commands is used to set the delimiter for a Hive query output file?
A. SET DELIMITER
B. SET TERMINATOR
C. SET OUTPUT DELIMITER
D. SET OUTPUT TERMINATOR
Answer: C. SET OUTPUT DELIMITER (per the source)
Explanation: This is the source's intended answer, but Hive has no such command. In practice an output delimiter is chosen with ROW FORMAT DELIMITED FIELDS TERMINATED BY on an INSERT OVERWRITE [LOCAL] DIRECTORY statement.
36. Which of the following is a valid Hive query to join two tables called table1 and table2 on a
common column called col1?
A. SELECT * FROM table1, table2 WHERE table1.col1 = table2.col1
B. SELECT * FROM table1 JOIN table2 ON table1.col1 = table2.col1
C. SELECT * FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1
D. All of the above
Answer: D. All of the above
Explanation: All of the above options are valid ways to join two tables in Hive.
37. Which of the following is a valid Hive query to filter rows in a table called my_table where
the value of col1 is greater than 10?
A. SELECT * FROM my_table WHERE col1 > 10
B. SELECT * FROM my_table HAVING col1 > 10
C. SELECT * FROM my_table FILTER col1 > 10
D. All of the above
Answer: A. SELECT * FROM my_table WHERE col1 > 10
Explanation: The WHERE clause is used to filter rows in Hive, and the > operator can be used to
compare the value of a column to a specific value.
38. Which of the following is a valid Hive query to group the rows in a table called my_table by
the values in col1 and calculate the sum of col2 for each group?
A. SELECT col1, SUM(col2) FROM my_table GROUP BY col1
B. SELECT col1, AVG(col2) FROM my_table GROUP BY col1
C. SELECT col1, MAX(col2) FROM my_table GROUP BY col1
D. All of the above
Answer: A. SELECT col1, SUM(col2) FROM my_table GROUP BY col1
Explanation: The GROUP BY clause is used to group the rows in Hive by the values in one or more
columns, and aggregate functions like SUM can be used to calculate the sum of another column for
each group.
39. Which of the following commands is used to create a Hive database?
A. CREATE DATABASE my_db
B. MAKE DATABASE my_db
C. ADD DATABASE my_db
D. BUILD DATABASE my_db
Answer: A. CREATE DATABASE my_db
Explanation: The CREATE DATABASE command is used to create a Hive database.
40. Which of the following is a valid Hive query to order the rows in a table called my_table by
the values in col1 in descending order?
A. SELECT * FROM my_table ORDER BY col1 DESC
B. SELECT * FROM my_table SORT BY col1 DESC
C. SELECT * FROM my_table ARRANGE BY col1 DESC
D. SELECT * FROM my_table GROUP BY col1 DESC
Answer: A. SELECT * FROM my_table ORDER BY col1 DESC
Explanation: The ORDER BY clause is used to order the rows in Hive by the values in one or more
columns, and the DESC keyword can be used to order the rows in descending order.
41. Which of the following commands is used to load data into a Hive table from a file?
A. LOAD DATA my_table FROM ‘/path/to/file’
B. INSERT DATA my_table FROM ‘/path/to/file’
C. LOAD DATA INFILE ‘/path/to/file’ INTO TABLE my_table
D. INSERT INTO my_table FROM ‘/path/to/file’
Answer: C. LOAD DATA INFILE '/path/to/file' INTO TABLE my_table (per the source)
Explanation: Option C is the closest match, but INFILE is MySQL syntax; the actual Hive keyword is INPATH, as in LOAD DATA [LOCAL] INPATH '/path/to/file' INTO TABLE my_table.
42. Which of the following is a valid Hive query to select the top 10 rows from a table called
my_table, ordered by the values in col1 in descending order?
A. SELECT * FROM my_table ORDER BY col1 DESC LIMIT 10
B. SELECT * FROM my_table ORDER BY col1 DESC FETCH FIRST 10 ROWS ONLY
C. SELECT * FROM my_table ORDER BY col1 DESC ROWS 10
D. SELECT * FROM my_table ORDER BY col1 DESC TOP 10
Answer: A. SELECT * FROM my_table ORDER BY col1 DESC LIMIT 10
Explanation: The LIMIT clause can be used with the SELECT statement to select the top N rows
from a table in Hive, and the ORDER BY clause can be used to order the rows by the values
43. Which of the following Hive functions is used to calculate the average value of a column?
A. SUM()
B. COUNT()
C. AVG()
D. MAX()
Answer: C. AVG()
Explanation: The AVG() function is used to calculate the average value of a column in Hive.
44. Which of the following commands is used to create a Hive table that is partitioned by the
values in a specific column?
A. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED BY (col3 INT)
B. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITION col3 BY (INT)
C. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITION BY col3 INT
D. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITION (col3 INT)
Answer: A. CREATE TABLE my_table (col1 INT, col2 STRING) PARTITIONED BY (col3 INT)
Explanation: The PARTITIONED BY clause is used to create a Hive table that is partitioned by the
values in a specific column.
45. Which of the following Hive functions is used to calculate the maximum value of a column?
A. SUM()
B. COUNT()
C. AVG()
D. MAX()
Answer: D. MAX()
Explanation: The MAX() function is used to calculate the maximum value of a column in Hive.
46. Which of the following commands is used to drop a Hive database?
A. DROP DATABASE my_db
B. DELETE DATABASE my_db
C. REMOVE DATABASE my_db
D. ERASE DATABASE my_db
Answer: A. DROP DATABASE my_db
Explanation: The DROP DATABASE command is used to drop a Hive database.
47. Which of the following is a valid Hive query to join two tables called table1 and table2 on the
values in col1?
A. SELECT * FROM table1 JOIN table2 ON table1.col1 = table2.col1
B. SELECT * FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1
C. SELECT * FROM table1 LEFT OUTER JOIN table2 ON table1.col1 = table2.col1
D. All of the above
Answer: D. All of the above
Explanation: All three of these queries are valid ways to join two tables in Hive.
48. Which of the following Hive functions is used to calculate the total number of rows in a
table?
A. SUM()
B. COUNT()
C. AVG()
D. MAX()
Answer: B. COUNT()
Explanation: The COUNT() function is used to calculate the total number of rows in a table in Hive.
49. Which of the following commands is used to insert data into a Hive table?
A. INSERT DATA INTO my_table VALUES (1, ‘value1’), (2, ‘value2’)
B. INSERT INTO my_table VALUES (1, ‘value1’), (2, ‘value2’)
C. INSERT my_table VALUES (1, ‘value1’), (2, ‘value2’)
D. None of the above
Answer: B. INSERT INTO my_table VALUES (1, ‘value1’), (2, ‘value2’)
Explanation: The INSERT INTO command is used to insert data into a Hive table.
50. Which of the following Hive functions is used to calculate the minimum value of a column?
A. SUM()
B. COUNT()
C. AVG()
D. MIN()
Answer: D. MIN()
Explanation: The MIN() function is used to calculate the minimum value of a column in Hive.
51. Which of the following commands is used to view the data in a Hive table?
A. SHOW DATA my_table
B. SELECT * FROM my_table
C. VIEW DATA my_table
D. DESCRIBE my_table
Answer: B. SELECT * FROM my_table
Explanation: The SELECT command is used to view the data in a Hive table.
52. Which of the following is a valid Hive query to filter rows in a table where col1 is equal to 1?
A. SELECT * FROM my_table WHERE col1 = 1
B. SELECT * FROM my_table HAVING col1 = 1
C. SELECT * FROM my_table GROUP BY col1 HAVING col1 = 1
D. None of the above
Answer: A. SELECT * FROM my_table WHERE col1 = 1
Explanation: The WHERE clause is used to filter rows in a Hive table based on a condition.
53. Which of the following Hive functions is used to concatenate two or more strings together?
A. CONCAT()
B. SUBSTR()
C. UPPER()
D. LOWER()
Answer: A. CONCAT()
Explanation: The CONCAT() function is used to concatenate two or more strings together in Hive.
54. Which of the following commands is used to view the structure of a Hive table?
A. SHOW my_table STRUCTURE
B. DESCRIBE my_table
C. VIEW my_table STRUCTURE
D. None of the above
Answer: B. DESCRIBE my_table
Explanation: The DESCRIBE command is used to view the structure of a Hive table.
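Usage, with an illustrative table name; the FORMATTED variant prints extended metadata such as location and storage format:
DESCRIBE my_table;
DESCRIBE FORMATTED my_table;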
55. Which of the following Hive functions is used to return a substring of a string?
A. CONCAT()
B. SUBSTR()
C. UPPER()
D. LOWER()
Answer: B. SUBSTR()
Explanation: The SUBSTR() function is used to return a substring of a string in Hive.
56. Which of the following commands is used to view the list of tables in a Hive database?
A. SHOW TABLES my_db
B. LIST TABLES my_db
C. DESCRIBE DATABASE my_db
D. None of the above
Answer: A. SHOW TABLES my_db (per the source)
Explanation: SHOW TABLES lists the tables in a database; the precise Hive syntax for a database other than the current one is SHOW TABLES IN my_db.
57. Which of the following Hive functions is used to convert a string to uppercase?
A. CONCAT()
B. SUBSTR()
C. UPPER()
D. LOWER()
Answer: C. UPPER()
Explanation: The UPPER() function is used to convert a string to uppercase in Hive.
58. Which of the following Hive functions is used to convert a string to lowercase?
A. CONCAT()
B. SUBSTR()
C. UPPER()
D. LOWER()
Answer: D. LOWER()
Explanation: The LOWER() function is used to convert a string to lowercase in Hive.
59. Which of the following commands is used to create a new Hive table?
A. CREATE my_table
B. ADD my_table
C. CREATE TABLE my_table
D. None of the above
Answer: C. CREATE TABLE my_table
Explanation: The CREATE TABLE command is used to create a new Hive table.
60. Which of the following commands is used to load data into a Hive table from an external file?
A. LOAD DATA INFILE ‘file_path’ INTO TABLE my_table
B. LOAD DATA INTO TABLE my_table FROM ‘file_path’
C. INSERT DATA INTO my_table FROM ‘file_path’
D. None of the above
Answer: A. LOAD DATA INFILE 'file_path' INTO TABLE my_table (per the source)
Explanation: As in question 41 above, the source uses MySQL's INFILE keyword; the correct Hive statement is LOAD DATA [LOCAL] INPATH 'file_path' INTO TABLE my_table.
https://www.sanfoundry.com/hadoop-questions-answers-introduction-hive/
1. Which of the following command sets the value of a particular configuration variable (key)?
a) set -v
b) set <key>=<value>
c) set
d) reset
Answer: b
Explanation: If you misspell the variable name, the CLI will not show an error.
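Usage sketch, using a real Hive property chosen for illustration:
set hive.cli.print.header=true;  -- assign a value to a configuration variable
set hive.cli.print.header;       -- print the variable's current value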
2. Point out the correct statement.
a) Hive Commands are non-SQL statement such as setting a property or adding a resource
b) Set -v prints a list of configuration variables that are overridden by the user or Hive
c) Set sets a list of variables that are overridden by the user or Hive
d) None of the mentioned
Answer: a
Explanation: Commands can be used in HiveQL scripts or directly in the CLI or Beeline.
3. Which of the following operator executes a shell command from the Hive shell?
a) |
b) !
c) ^
d) +
Answer: b
Explanation: Exclamation operator is for execution of command.
4. Which of the following will remove the resource(s) from the distributed cache?
a) delete FILE[S] <filepath>*
b) delete JAR[S] <filepath>*
c) delete ARCHIVE[S] <filepath>*
d) all of the mentioned
Answer: d
Explanation: Delete command is used to remove existing resource.
5. Point out the wrong statement.
a) source FILE <filepath> executes a script file inside the CLI
b) bfs <bfs command> executes a dfs command from the Hive shell
c) hive is Query language similar to SQL
d) none of the mentioned
Answer: b
Explanation: dfs <dfs command> executes a dfs command from the Hive shell.
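Both forms can be mixed freely at the Hive prompt; a couple of examples:
dfs -ls /user/hive/warehouse;  -- HDFS command via the dfs keyword
!pwd;                          -- arbitrary shell command via the ! operator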
6. _________ is a shell utility which can be used to run Hive queries in either interactive or batch
mode.
a) $HIVE/bin/hive
b) $HIVE_HOME/hive
c) $HIVE_HOME/bin/hive
d) All of the mentioned
Answer: c
Explanation: Various types of command line operations are available in the shell utility.
7. Which of the following is a command line option?
a) -d, --define <key=value>
b) -e, --define <key=value>
c) -f, --define <key=value>
d) None of the mentioned
Answer: a
Explanation: Defines a variable substitution to apply to Hive commands, e.g. -d A=B or --define A=B.
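A usage sketch with an illustrative table name; variables defined with -d/--define live in the hivevar namespace and are referenced as ${hivevar:name}:
hive -d tbl=my_table -e 'SELECT * FROM ${hivevar:tbl} LIMIT 5;'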
8. Which additional command line option is available as of Hive 0.10.0?
a) --database <dbname>
b) --db <dbname>
c) --dbase <dbname>
d) All of the mentioned
Answer: a
Explanation: The --database option specifies which database to use.
9. The CLI when invoked without the -i option will attempt to load $HIVE_HOME/bin/.hiverc and
$HOME/.hiverc as _______ files.
a) processing
b) termination
c) initialization
d) none of the mentioned
Answer: c
Explanation: Hiverc file is loaded as per options selected.
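A minimal .hiverc sketch; each line is an ordinary Hive command run before the prompt appears (the jar path is hypothetical):
SET hive.cli.print.header=true;      -- show column headers in results
SET hive.cli.print.current.db=true;  -- show the current database in the prompt
ADD JAR /opt/udfs/my-udfs.jar;       -- hypothetical: preload custom UDFs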
10. When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters _______ mode.
a) Batch
b) Interactive shell
c) Multiple
d) None of the mentioned
Answer: b
Explanation: Without -e (an inline query) or -f (a script file), the CLI starts an interactive shell session.