
UDFs:

Some functions are built into Pig, such as COUNT, SUM and MAX, so I can use them directly in
my Pig script.

Other functions are UDFs (user-defined functions), many of which are included in piggybank.

Piggybank is a collection of user-defined functions contributed by Pig users around the world and
distributed with Pig, for example Reverse and UPPER.

If I want to use a UDF included in piggybank, I need to do two things:-

• Use the REGISTER command to register the jar that contains the UDF.

• Refer to the function by its full path, like org.apache.pig.piggybank.evaluation….

If I do not want to use the REGISTER command, there are two options:-

1. Put piggybank.jar (or the jar containing my UDFs) in the lib folder of Pig; then I do not need
the REGISTER command.
2. Set the property “pig.additional.jars” in the pig.properties file to the path of my piggybank
jar or UDF jar. I can also set it on the command line, like
pig -Dpig.additional.jars=<path of jar>, and then it applies only to that particular session.

If I do not want to give the full path of the function, there are again two options:-

1. Use the DEFINE operator to give the function a short alias.

2. Set the property “udf.import.list” in the pig.properties file to the package of that function.
I can also set it on the command line, like
pig -Dudf.import.list=<package of the function>, and then it applies only to that particular
session.

Q. What is Pig?
Ans: - Pig is a platform with a high-level language, Pig Latin, used to analyse and process large
data sets by applying a series of data transformations to them.

Q. What are the Pig philosophies?


Ans: - The following are the Pig philosophies:-

• Pigs eat anything
• Pigs live anywhere
• Pigs are domestic animals
• Pigs fly

Pigs eat anything means Pig can operate on any data, whether it is structured, unstructured or
semi-structured.

Pigs live anywhere means Pig is designed as a language for parallel processing that is not tied to a
particular framework. It can run on MapReduce, Spark or Apache Tez.

Pigs are domestic animals means Pig is designed to be easily controlled and modified by its users.
Pig allows integration of user code wherever possible. It supports UDFs written in Java or in a
scripting language that can be compiled down to Java, like Jython.
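
As a rough illustration of the scripting-language route, here is a minimal sketch of a UDF written in Python for Pig's Jython engine. The function name reverse_text, the output field name "reversed" and the file name udfs.py are hypothetical. Under Jython, Pig supplies the outputSchema decorator itself; the fallback below is only an assumption-based stand-in so the file can also be tried as plain Python.

```python
# udfs.py (hypothetical file name): a UDF sketch for Pig's Jython engine.

try:
    outputSchema  # supplied by Pig when the script is registered using jython
except NameError:
    # Stand-in for running outside Pig; it only records the schema string.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("reversed:chararray")
def reverse_text(value):
    """Return the input string reversed, or None for a null input."""
    if value is None:
        return None
    return value[::-1]
```

In a Pig script, a file like this would typically be registered with a statement such as REGISTER 'udfs.py' USING jython AS myfuncs; and then called as myfuncs.reverse_text(name), where myfuncs and name are again placeholder names.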

Pigs fly means Pig processes data quickly. We want to consistently improve performance, and not
implement features in ways that weigh Pig down so it can’t fly.

Q. What is local mode in pig?


Ans: - Running Pig locally on your machine is known as local mode. Local mode is useful for
prototyping and debugging your Pig Latin scripts. Some people also use it for small data when they
want to apply the same processing as for large data, so that their data pipeline is consistent across
data of different sizes, but do not want to waste cluster resources on small files and small jobs.

Q. Difference between exec and run command


Ans: - exec command
exec <script name>
In this case, the aliases and relations defined in the script are not known to the Grunt shell. If we
use those names in the interactive session after running the script, Grunt does not recognise them
and shows an error. The commands in the script file also do not appear in the Grunt shell
history.

run command
run <script name>

In this case, the aliases and relations defined in the script are known to the Grunt shell, so we
can use those names in the interactive session after running the script. The commands in the script
are also saved in the Grunt history.

Q. What is MAP data type?


Ans: - A map is a set of key-value pairs, where the key is always of type chararray and the value can
be of any data type. The key is used as an index to look up the related value.

Q. What is Tuple data type?
Ans: - It is a fixed-length, ordered collection of elements. A tuple is divided into fields. It is
analogous to a row in SQL, with the fields as columns.

Q. What is Bag data type?


Ans: - A bag is an unordered collection of tuples.
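
To make the three complex types concrete, here is a minimal sketch in plain Python. It assumes the way Pig's Jython UDF support hands these types to a script: a map arrives as a dict, a tuple as a Python tuple and a bag as a list of tuples. The sample values and the function name total_score are made up for illustration.

```python
# A tuple: fixed-length, ordered fields (like a row with columns).
student = ("alice", 21)

# A map: chararray (string) keys, values of any type, looked up by key.
details = {"city": "Pune", "semester": 3}

# A bag: a collection of tuples; Pig gives no ordering guarantee.
scores = [("maths", 80), ("physics", 70), ("maths", 90)]


def total_score(bag):
    """Sum the second field of every tuple in the bag (order does not matter)."""
    return sum(score for _subject, score in bag)


name = student[0]            # fields of a tuple are read by position
city = details["city"]       # values of a map are read by key
total = total_score(scores)  # 80 + 70 + 90
```

Because a bag is unordered, a function over it should give the same answer however the tuples are arranged, which is why the sketch uses a sum and not anything that depends on position.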

Q. Comments in Pig?
Ans: - Pig Latin supports both SQL-style single-line comments (--) and Java-style multiline comments
(/* … */)
