
PIG, GRUNT, HIVE

PRESENTED BY: AKILA 20SPCS01


PIG
 .​Pig is a high-level data processing language which makes it easy for developers to
write data analysis scripts, which are translated into MapReduce programs by the
Pig compiler
 Pig includes: ​
 a high-level language (called Pig Latin) for expressing data analysis programs​
a complier which produces sequences of MapReduce programs from the pig
scripts.​ Insert or Drag & Drop your photo
 Pig can be executed either in local mode or MapReduce mode.​
  In local mode, Pig runs inside a single JVM process on a local machine. ​
 Local mode is useful for development purpose and testing the scripts with small
data files on a single machine. ​
 MapReduce mode requires a Hadoop cluster. ​
 In MapReduce mode, Pig can analyze data stored in HDFS. ​
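A minimal sketch of launching Pig in each mode from the command line (the script name wordcount.pig is illustrative):

pig -x local wordcount.pig          # local mode: single JVM, local filesystem
pig -x mapreduce wordcount.pig      # MapReduce mode: runs on the Hadoop cluster, reads HDFS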

The Pig compiler translates Pig scripts into MapReduce programs, which are executed on a Hadoop cluster.
Pig provides an interactive shell, called Grunt, for developing Pig scripts.
Data Types in Pig
Pig supports simple data types such as int, long, float, double, chararray, bytearray, boolean, and datetime, and complex data types such as tuple, bag, and map. The simple data types work the same way as in other programming languages.
Complex data types
Tuple
A tuple is an ordered set of fields. A tuple is represented with parentheses.
Bag
A bag is an unordered collection of tuples. A bag is represented with curly braces.
Map
A map is a set of key-value pairs. A map is represented with square brackets, and a # is used to separate the key and value.
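Illustrative literal values for the three complex types (the field values are made up):

tuple:  (John,18,4.0F)
bag:    {(John,18,4.0F),(Mary,19,3.8F)}
map:    [name#John,age#18]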

Data Filtering & Analysis
• The FOREACH operator is used to process each row in a relation, and its GENERATE clause is used to define the fields and generate a new row from the original.
• The FILTER operator is used to filter out tuples from a relation based on the specified condition.
• The GROUP operator can be used to group data in one or more relations.
• The UNION operator can be used to merge the contents of two or more relations.
• The JOIN operator is used to join two relations.
• Pig provides various built-in functions such as AVG, MIN, MAX, SUM, and COUNT. A short script combining these operators is sketched below.
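A minimal sketch, assuming a tab-separated input file students.txt with name, age, and gpa columns (file name and schema are illustrative):

students = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);
adults   = FILTER students BY age >= 18;
by_age   = GROUP adults BY age;
stats    = FOREACH by_age GENERATE group AS age, COUNT(adults) AS n, AVG(adults.gpa) AS avg_gpa;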

Storing Results
• To save the results on the filesystem, the STORE operator is used.
• Pig uses a lazy evaluation strategy and delays the evaluation of expressions until a STORE or DUMP operator triggers the results to be stored or displayed.

Debugging Operators
• The DUMP operator is used to dump the results on the console. DUMP is used in interactive mode for debugging purposes.
• The DESCRIBE operator is used to view the schema of a relation.
• The EXPLAIN operator is used to view the logical, physical, and MapReduce execution plans for computing a relation.
• The ILLUSTRATE operator is used to display the step-by-step execution of statements to compute a relation with a small sample of data.
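Continuing the sketch above, the stats relation could be persisted or inspected like this (the output path is illustrative); nothing runs until the STORE or DUMP line is reached:

STORE stats INTO 'output/age_stats';   -- triggers evaluation and writes to the filesystem
DUMP stats;                            -- or evaluate and print to the console
DESCRIBE stats;                        -- view the schema of the relation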

Apache Pig Grunt Shell Commands
To write Pig Latin scripts, we use the Grunt shell of Apache Pig. Before that, note that by using sh and fs we can invoke shell and filesystem commands from within Grunt.
i. sh Command
We can invoke any shell command from the Grunt shell using the sh command. Note, however, that we cannot execute commands that are part of the shell environment (e.g. cd) using the sh command.
Syntax
The syntax of the sh command is:
grunt> sh shell command parameters
Example
By using the sh option, we can invoke the ls command of the Linux shell from the Grunt shell. Here, it lists out the files in the /pig/bin/ directory.
grunt> sh ls
pig
pig_1444799121955.log
pig.cmd
pig.py

ii. help Command
The help command gives you a list of Pig commands or Pig properties.
Syntax
By using the help command, we can get a list of Pig commands:
grunt> help
iii. history Command
This is a very useful command: it displays a list of the statements executed so far since the Grunt shell was invoked.
iv. set Command
Basically, to show/assign values to keys, we use the set command in Pig.

ii. fs Command
Moreover, we can invoke any filesystem shell command from the Grunt shell by using the fs command.
Syntax
The syntax of the fs command is:
grunt> fs File System command parameters
Example
By using the fs command, we can invoke the ls command of HDFS from the Grunt shell. Here, it lists the files in the HDFS root directory.
grunt> fs -ls

Utility Commands
The Grunt shell offers a set of utility commands. These include clear, help, history, quit, and set. Also, there are some commands to control Pig from the Grunt shell, such as exec, kill, and run.
i. clear Command
In order to clear the screen of the Grunt shell, we use the clear command.
Syntax
The syntax of the clear command is:
grunt> clear

There are several keys we can set values for using the set command, such as:
default_parallel
By passing any whole number as a value to this key, we can set the number of reducers for a MapReduce job.
debug
By passing on/off to this key, we can turn the debugging feature in Pig on or off.
job.name
By passing a string value to this key, we can set the job name for the required job.
job.priority
By passing one of the following values to this key, we can set the job priority of a job:
very_low
low
normal
high
very_high
stream.skippath
For streaming, by passing the desired path in the form of a string to this key, we can set the path from which the data is not to be transferred.
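For instance, a few of these keys could be set from the Grunt shell as follows (the values are illustrative):

grunt> set default_parallel 10
grunt> set job.name 'student-analysis'
grunt> set job.priority high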

v. quit Command
We can quit from the Grunt shell using this command.
Syntax
To quit from the Grunt shell:
grunt> quit

vi. exec Command
Using the exec command, we can execute Pig scripts from the Grunt shell.
Syntax
The syntax of the utility command exec is:
grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
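A hedged example, assuming a parameterized script /scripts/student_report.pig that reads an $input parameter (both names are illustrative):

grunt> exec -param input=students.txt /scripts/student_report.pig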

vii. kill Command
By using this command, we can kill a job from the Grunt shell.
Syntax
Given below is the syntax of the kill command:
grunt> kill JobId
viii. run Command
By using the run command, we can run a Pig script from the Grunt shell.
Syntax
The syntax of the run command is:
grunt> run [-param param_name = param_value] [-param_file file_name] script
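A sketch parallel to the exec example above (script path and parameter are again illustrative):

grunt> run -param input=students.txt /scripts/student_report.pig

Unlike exec, run executes the script in the current Grunt shell context, so aliases defined in the script remain accessible in the shell afterwards and its statements enter the command history.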

HIVE
• What is Hive? Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
• Initially Hive was developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
• Features of Hive: It stores schema in a database and processed data into HDFS (Hadoop Distributed File System). It is designed for OLAP.
• It provides an SQL-type language for querying called HiveQL or HQL. It is familiar, fast, scalable, and extensible.
• Architecture of Hive
• User Interface - Hive is a data warehouse infrastructure software that can create interaction between user and HDFS. The user interfaces that Hive supports are Hive Web UI, Hive command line, and Hive HD Insight.
• HiveQL Process Engine - HiveQL is similar to SQL for querying on schema info in the Metastore. It is one of the replacements of the traditional approach for MapReduce programs: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and process it.

• Execution Engine - The conjunction part of the HiveQL Process Engine and MapReduce is the Hive Execution Engine. The execution engine processes the query and generates results the same as MapReduce results. It uses the flavor of MapReduce.

• Working of Hive:
• Execute Query - The Hive interface, such as Command Line or Web UI, sends the query to the Driver to execute.
• Get Plan - The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan or the requirement of the query.
• Get Metadata - The compiler sends a metadata request to the Metastore.
• Send Metadata - The Metastore sends metadata as a response to the compiler.
• Send Plan - The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
• Execute Plan - The driver sends the execute plan to the execution engine.
• Execute Job - Internally, the process of executing the job is a MapReduce job. The execution engine sends the job to the JobTracker, which is in the Name node, and it assigns this job to the TaskTracker, which is in the Data node. Here, the query executes the MapReduce job.
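As a hedged illustration of a query that would flow through these steps and be compiled into a MapReduce job (the employee table and its dept column are assumed for illustration):

hive> SELECT dept, COUNT(*) FROM employee GROUP BY dept;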

• Metadata Ops - Meanwhile, during execution, the execution engine can execute metadata operations with the Metastore.
• Fetch Result - The execution engine receives the results from the Data nodes.
• Send Results - The execution engine sends those resultant values to the driver.
• Send Results - The driver sends the results to the Hive interfaces.
• Hive Data Types - All the data types in Hive are classified into four types: Column Types, Literals, Null Values, and Complex Types.
• Column Types: Integral Types - Integer type data can be specified using the integral data type INT. When the data range exceeds the range of INT, you need to use BIGINT, and if the data range is smaller than that of INT, you use SMALLINT. TINYINT is smaller than SMALLINT.
• Literals: Floating Point Types - Floating point types are nothing but numbers with decimal points. Generally, this type of data is composed of the DOUBLE data type. Decimal Type - Decimal type data is nothing but a floating point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^-308 to 10^308.
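A hedged sketch of a table declaration using these column types (the table and column names are illustrative):

hive> CREATE TABLE employee (id INT, salary BIGINT, age TINYINT, rating DOUBLE, bonus DECIMAL(10,2));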

• Complex Types: Arrays - Arrays in Hive are used the same way they are used in Java. Syntax: ARRAY<data_type>. Maps - Maps in Hive are similar to Java Maps. Syntax: MAP<primitive_type, data_type>. Structs - Structs in Hive are similar to using complex data with comments. Syntax: STRUCT<col_name : data_type [COMMENT col_comment, ...]>. A table sketch using these types follows below.
• Partition - Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partitions, it is easy to query a portion of the data. Adding a partition - Syntax: hive> ALTER TABLE employee ADD PARTITION (year='2013') location '/2012/part2012'; Dropping a partition - Syntax: hive> ALTER TABLE employee DROP [IF EXISTS] PARTITION (year='2013');
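A hedged sketch combining the three complex types in one table definition (the table and column names are illustrative):

hive> CREATE TABLE staff (
        name STRING,
        skills ARRAY<STRING>,
        phones MAP<STRING, STRING>,
        address STRUCT<street:STRING, city:STRING>
      );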

THANK YOU

