Lecture on “Introduction to Pig”
Hadoop Ecosystem
Pig Introduction
Developed at Yahoo around 2006
Atom
An atom is any single value in Pig Latin, irrespective of its data type. An atom can be used as a string or as a number, and it is stored as a string. The atomic (scalar) types in Pig are int, long, float, double, chararray, and bytearray. A field is a piece of data, that is, a simple atomic value in Pig.
For example: ‘Shubham’ or ‘25’.
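To make the idea concrete, the Python sketch below (an illustration, not Pig itself) treats an atom as raw text and interprets it as one of the scalar types listed above. The PIG_SCALARS table and the cast_atom function are hypothetical names, and returning None for text that is not a valid number is only a stand-in for Pig's null.

```python
from typing import Union

# Hypothetical mapping from Pig Latin scalar type names to the Python types
# used to model them. Pig's int/long and float/double differ in range and
# precision; Python's int and float do not capture that difference.
PIG_SCALARS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "chararray": str,
    "bytearray": bytes,
}

Atom = Union[int, float, str, bytes, None]


def cast_atom(raw: str, pig_type: str) -> Atom:
    """Interpret a raw text atom as the given Pig scalar type.

    Returns None (a stand-in for Pig's null) when the text is not a valid
    value for a numeric target type.
    """
    target = PIG_SCALARS[pig_type]
    if target is str:
        return raw                      # chararray: keep the text as-is
    if target is bytes:
        return raw.encode("utf-8")      # bytearray: raw bytes
    try:
        return target(raw)              # numeric types: parse the text
    except ValueError:
        return None
```

For instance, the same atom ‘25’ can be read as the chararray "25" or as the int 25, while ‘Shubham’ read as an int yields None.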
Pig Data Model
Tuple
Bag
Relation
A relation is a bag (more specifically, an outer bag).
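In Pig, a tuple is an ordered set of fields, a bag is an unordered collection of tuples that may contain duplicates, and a relation is an outer bag. The Python sketch below models these with built-in types purely to illustrate the structure; it is not how Pig stores data, and the bag helper is a hypothetical name.

```python
from collections import Counter

# Atom: a single scalar value, e.g. 'Shubham' or 25.
name, age = "Shubham", 25

# Tuple: an ordered set of fields; a Python tuple models this directly.
t1 = (name, age)
t2 = ("Raja", 30)


def bag(*tuples):
    """Model a Pig bag as a multiset: order is ignored, duplicates are kept."""
    return Counter(tuples)


# Bag: an unordered collection of tuples.
b = bag(t1, t2)

# A bag can itself be a field of a tuple (an "inner bag").
grouped = ("group_a", b)

# Relation: an outer bag of tuples, here containing a duplicate of t1.
students = bag(t1, t2, t1)
```

Because a Counter compares contents rather than order, bag(t1, t2) equals bag(t2, t1), while students still records that t1 appears twice.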
Pig Pros and Cons
Advantages of Apache Pig
Less development time
Easy to learn
Procedural language
Dataflow
Easy to control execution
UDFs
Usage of Hadoop features
Limitations of Apache Pig
Errors of Pig
Not mature
Support
Minor one
Implicit data schema
Delay in execution
Pig Commands
Statement       Description
Load            Read data from the file system
Store           Write data to the file system
Dump            Write output to stdout
Foreach         Apply an expression to each record and generate one or more records
Filter          Apply a predicate to each record and remove records for which it is false
Group/Cogroup   Collect records with the same key from one or more inputs
Join            Join two or more inputs based on a key
Order           Sort records based on a key
Distinct        Remove duplicate records
Union           Merge two or more inputs
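The relational statements in the table can be approximated on in-memory lists of Python tuples. The sketch below is a simplified, single-machine model (the pig_* function names are hypothetical); real Pig compiles these statements into Hadoop jobs, and Load, Store and Dump are left out because they only perform I/O. Only single-input Group is modelled, not Cogroup.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple  # a Pig tuple, modelled as a Python tuple


def pig_filter(rel: Iterable[Row], pred: Callable[[Row], bool]) -> List[Row]:
    # FILTER: keep only the records for which the predicate is true
    return [r for r in rel if pred(r)]


def pig_foreach(rel: Iterable[Row], expr: Callable[[Row], Row]) -> List[Row]:
    # FOREACH ... GENERATE: apply an expression to every record
    return [expr(r) for r in rel]


def pig_group(rel: Iterable[Row], key: Callable[[Row], Hashable]
              ) -> List[Tuple[Hashable, List[Row]]]:
    # GROUP: one output record per key, paired with the bag of matching records
    groups: Dict[Hashable, List[Row]] = {}
    for r in rel:
        groups.setdefault(key(r), []).append(r)
    return list(groups.items())


def pig_join(left: Iterable[Row], right: Iterable[Row],
             lkey: Callable[[Row], Hashable],
             rkey: Callable[[Row], Hashable]) -> List[Row]:
    # JOIN (inner): concatenate the fields of records whose keys match
    index: Dict[Hashable, List[Row]] = {}
    for r in right:
        index.setdefault(rkey(r), []).append(r)
    return [l + r for l in left for r in index.get(lkey(l), [])]


def pig_order(rel: Iterable[Row], key: Callable[[Row], object],
              desc: bool = False) -> List[Row]:
    # ORDER ... BY: sort records on a key
    return sorted(rel, key=key, reverse=desc)


def pig_distinct(rel: Iterable[Row]) -> List[Row]:
    # DISTINCT: drop duplicate records (first occurrence kept here;
    # Pig itself does not guarantee any output order)
    return list(dict.fromkeys(rel))


def pig_union(*rels: Iterable[Row]) -> List[Row]:
    # UNION: concatenate the inputs; duplicates are kept
    return [r for rel in rels for r in rel]
```

For example, with users = [("alice", 31), ("bob", 17), ("carol", 25)], pig_filter(users, lambda r: r[1] >= 18) keeps the records for alice and carol.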
Pig Latin Example
Suppose we have a table
urls: (url, category, pagerank)
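As a hypothetical illustration of a query over this table, the Python sketch below finds the average pagerank per category among urls whose pagerank exceeds 0.2. The threshold, the sample rows and the aliases good_urls, grouped and avg_rank are assumptions; each comment gives a Pig Latin statement that the Python step mirrors.

```python
from collections import defaultdict

# Hypothetical sample rows for urls: (url, category, pagerank)
urls = [
    ("www.a.com", "news", 0.9),
    ("www.b.com", "news", 0.3),
    ("www.c.com", "sports", 0.1),
    ("www.d.com", "sports", 0.5),
]

# Pig Latin: good_urls = FILTER urls BY pagerank > 0.2;
good_urls = [u for u in urls if u[2] > 0.2]

# Pig Latin: grouped = GROUP good_urls BY category;
grouped = defaultdict(list)
for u in good_urls:
    grouped[u[1]].append(u)

# Pig Latin: avg_rank = FOREACH grouped GENERATE group, AVG(good_urls.pagerank);
avg_rank = {
    category: sum(u[2] for u in rows) / len(rows)
    for category, rows in grouped.items()
}
```

Here www.c.com is dropped by the filter, so the news category averages about 0.6 and sports averages 0.5.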
Prerequisite:
Hadoop and Java must be installed on your system before you install Apache Pig.
Installation
Step 1: Download Apache Pig from the link below:
https://downloads.apache.org/pig/pig-0.17.0/
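Add the following environment variables to your .bashrc file, adjusting PIG_HOME to the directory where Pig was extracted:
export PIG_HOME=/home/virendra/pig-0.17.0
export PIG_CLASSPATH=$HADOOP_HOME/conf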
Step 5: Run the command below so that the changes take effect in the current terminal.
Command: source .bashrc
Step 6: Check the Pig version. This tests that Apache Pig was installed correctly.
Command: pig -version
Step 7: View the Pig help to see all of the pig command options.
Command: pig -help
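After these steps, a quick way to double-check the setup is to inspect the environment variables set during installation. The Python sketch below is one hypothetical helper (check_pig_env is not part of Pig); it also assumes $PIG_HOME/bin has been added to PATH, which is needed for the pig command to be found.

```python
import os
import shutil
from typing import List, Mapping


def check_pig_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the Pig environment (empty if none found)."""
    problems = []
    # The variables exported in .bashrc must be set and non-empty.
    for var in ("PIG_HOME", "PIG_CLASSPATH"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    # PIG_HOME should point at an existing directory.
    pig_home = env.get("PIG_HOME")
    if pig_home and not os.path.isdir(pig_home):
        problems.append(f"PIG_HOME points to a missing directory: {pig_home}")
    # shutil.which searches only the PATH string taken from env.
    if shutil.which("pig", path=env.get("PATH", "")) is None:
        problems.append("'pig' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_pig_env()
    print("\n".join(issues) if issues else "Pig environment looks OK")
```

Run in the same terminal after Step 5; an empty result means the variables are set and pig is reachable, though Step 6 remains the definitive check.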