
1. Data file: Employee_data.txt in HDFS. Its contents are:

• 001,mehul,chourey,21,9848022337,Hyderabad
• 002,Ankur,Dutta,22,9848022338,Kolkata
• 003,Shubham,Sengar,22,9848022339,Delhi
• 004,Prerna,Tripathi,21,9848022330,Pune
• 005,Sagar,Joshi,23,9848022336,Bhubaneswar
• 006,Monika,sharma,23,9848022335,Chennai
• 007,pulkit,pawar,24,9848022334,trivandrum
• 008,Roshan,Shaikh,24,9848022333,Chennai

2. Using the LOAD operator, we have read it into a relation named Employee. The schema lists all six fields of the data file, including age.
grunt> Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee_data.txt' USING
PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray
);

3. Describe the relation named Employee. Then verify the schema.


grunt> describe Employee;

4. Output #1
Employee: {id: int,firstname: chararray,lastname: chararray,age: int,phone: chararray,city: chararray}
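The effect of loading a comma-delimited line with a typed schema can be sketched in plain Python. This is only an emulation under stated assumptions, not Pig itself: `SCHEMA` and `parse_line` are hypothetical names, and the schema is assumed to cover all six fields found in the data file (including age).

```python
# Hypothetical schema mirroring the six comma-separated fields of the data file.
SCHEMA = [("id", int), ("firstname", str), ("lastname", str),
          ("age", int), ("phone", str), ("city", str)]

def parse_line(line, delimiter=","):
    """Split one text line on the delimiter and cast each field to its declared type."""
    values = line.strip().split(delimiter)
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, values)}

record = parse_line("001,mehul,chourey,21,9848022337,Hyderabad")
print(record)
# {'id': 1, 'firstname': 'mehul', 'lastname': 'chourey', 'age': 21,
#  'phone': '9848022337', 'city': 'Hyderabad'}
```

Declaring the phone number as text rather than as an integer keeps it from being treated as a quantity, which matches the chararray type used for it in the schema.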

5. Explain the relation named Employee to display its logical, physical, and MapReduce execution plans.
grunt> explain Employee;

6. Illustrate the relation named Employee:


grunt> illustrate Employee;

7. Output #3
grunt> illustrate Employee;

8. Group the records/tuples in the relation Employee_details by age. (Employee_details is loaded in step 17 from a file with the same contents.)


grunt> group_data = GROUP Employee_details by age;

9. Verify the relation group_data.


grunt> Dump group_data;
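What the grouped relation is expected to contain can be sketched with a plain-Python emulation of grouping by age. This is not Pig code; `EMPLOYEES` and `group_by` are hypothetical names, and the rows are copied from the data file listed earlier.

```python
from collections import defaultdict

# (id, firstname, lastname, age, phone, city), copied from the data file.
EMPLOYEES = [
    (1, "mehul", "chourey", 21, "9848022337", "Hyderabad"),
    (2, "Ankur", "Dutta", 22, "9848022338", "Kolkata"),
    (3, "Shubham", "Sengar", 22, "9848022339", "Delhi"),
    (4, "Prerna", "Tripathi", 21, "9848022330", "Pune"),
    (5, "Sagar", "Joshi", 23, "9848022336", "Bhubaneswar"),
    (6, "Monika", "sharma", 23, "9848022335", "Chennai"),
    (7, "pulkit", "pawar", 24, "9848022334", "trivandrum"),
    (8, "Roshan", "Shaikh", 24, "9848022333", "Chennai"),
]

def group_by(rows, key):
    """Collect rows into one bag (list) per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)

# Field 3 is age; each result pairs a group key with the bag of matching rows.
group_data = group_by(EMPLOYEES, key=lambda row: row[3])
for age, bag in sorted(group_data.items()):
    print(age, [row[1] for row in bag])
# 21 ['mehul', 'Prerna']
# 22 ['Ankur', 'Shubham']
# 23 ['Sagar', 'Monika']
# 24 ['pulkit', 'Roshan']
```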

10. Use the Describe command to see the schema of the relation.


grunt> Describe group_data;

11. Use the Illustrate command to get a sample illustration of the schema.


grunt> Illustrate group_data;

12. Group the relation by age and city.


grunt> group_multiple = GROUP Employee_details by (age, city);
13. Verify the content of the relation named group_multiple.
grunt> Dump group_multiple;
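Grouping by two fields uses a composite key. The following plain-Python sketch (an emulation, not Pig; the names are hypothetical and only the columns needed here are kept) suggests that, for this data, every (age, city) pair is distinct, so each group holds a single record.

```python
from collections import defaultdict

# (firstname, age, city), taken from the employee data file.
EMPLOYEES = [
    ("mehul", 21, "Hyderabad"), ("Ankur", 22, "Kolkata"),
    ("Shubham", 22, "Delhi"), ("Prerna", 21, "Pune"),
    ("Sagar", 23, "Bhubaneswar"), ("Monika", 23, "Chennai"),
    ("pulkit", 24, "trivandrum"), ("Roshan", 24, "Chennai"),
]

group_multiple = defaultdict(list)
for firstname, age, city in EMPLOYEES:
    # The group key is the tuple (age, city), as in GROUP ... by (age, city).
    group_multiple[(age, city)].append((firstname, age, city))

for key, bag in sorted(group_multiple.items()):
    print(key, bag)

print(len(group_multiple))  # 8 groups, one record each
```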

14. Group all the tuples of a relation into a single group using All.


grunt> group_all = GROUP Employee_details All;

15. Verify the content of the relation group_all.


grunt> Dump group_all;
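Grouping with All yields a single group whose key is the literal `all` and whose bag holds every tuple. A minimal Python emulation of that shape, with hypothetical names and a reduced set of columns, might look like this:

```python
# (id, firstname, age), taken from the employee data file.
EMPLOYEES = [
    (1, "mehul", 21), (2, "Ankur", 22), (3, "Shubham", 22), (4, "Prerna", 21),
    (5, "Sagar", 23), (6, "Monika", 23), (7, "pulkit", 24), (8, "Roshan", 24),
]

# One tuple: the constant key "all" paired with a bag of every input row.
group_all = ("all", list(EMPLOYEES))

key, bag = group_all
print(key, len(bag))  # all 8
```

This single-group shape is what makes it convenient for whole-relation aggregates such as a total row count.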

16. Two files, Employee_details.txt and Clients_details.txt, are in the HDFS directory /pig_data/.
• Employee_details.txt
001,mehul,chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhubaneswar
006,Monika,sharma,23,9848022335,Chennai
007,pulkit,pawar,24,9848022334,trivandrum
008,Roshan,Shaikh,24,9848022333,Chennai

• Clients_details.txt
001,Kajal,22,new york
002,Vaishnavi,23,Kolkata
003,Twinkle,23,Tokyo
004,Manish,25,London
005,Purva,23,Bhubaneswar
006,Vishal,22,Chennai

17. Load the files into Pig


grunt> Employee_details = LOAD
'hdfs://localhost:9000/pig_data/Employee_details.txt' USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray,
city:chararray);
grunt> Clients_details = LOAD 'hdfs://localhost:9000/pig_data/Clients_details.txt'
USING PigStorage(',')
as (id:int, name:chararray, age:int, city:chararray);

18. Group the records/tuples of the relations Employee_details and Clients_details by age using COGROUP


grunt> cogroup_data = COGROUP Employee_details by age, Clients_details by age;
grunt> Dump cogroup_data;
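The result of co-grouping the two relations by age can be sketched in plain Python. This is an emulation under assumptions, not Pig: `cogroup` is a hypothetical helper, and only the name and age columns are kept. Each output tuple holds the key, the bag of matching employees, and the bag of matching clients; a bag is empty when one side has no record for that age.

```python
# (firstname, age) from Employee_details.txt and (name, age) from Clients_details.txt.
EMPLOYEES = [("mehul", 21), ("Ankur", 22), ("Shubham", 22), ("Prerna", 21),
             ("Sagar", 23), ("Monika", 23), ("pulkit", 24), ("Roshan", 24)]
CLIENTS = [("Kajal", 22), ("Vaishnavi", 23), ("Twinkle", 23),
           ("Manish", 25), ("Purva", 23), ("Vishal", 22)]

def cogroup(left, right, left_key, right_key):
    """Return (key, left_bag, right_bag) for every key seen on either side."""
    keys = sorted({left_key(r) for r in left} | {right_key(r) for r in right})
    return [(k,
             [r for r in left if left_key(r) == k],
             [r for r in right if right_key(r) == k])
            for k in keys]

cogroup_data = cogroup(EMPLOYEES, CLIENTS, lambda r: r[1], lambda r: r[1])
for age, emp_bag, client_bag in cogroup_data:
    print(age, [e[0] for e in emp_bag], [c[0] for c in client_bag])
# 21 ['mehul', 'Prerna'] []
# 22 ['Ankur', 'Shubham'] ['Kajal', 'Vishal']
# 23 ['Sagar', 'Monika'] ['Vaishnavi', 'Twinkle', 'Purva']
# 24 ['pulkit', 'Roshan'] []
# 25 [] ['Manish']
```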

19. Two files, namely Users.txt and orders.txt, are in HDFS.


• Users.txt
1,Sanjeev,32,Ahmedabad,2000.00
2,Ankit,25,Delhi,1500.00
3,Raj,23,Kota,2000.00
4,Sumit,25,Mumbai,6500.00
5,Pankaj,27,Bhopal,8500.00
6,Vishnu,22,MP,4500.00
7,Ravi,24,Indore,10000.00
• orders.txt
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060

20. CROSS operator, applied to the relations Users and orders (assumed to be loaded from the two files above)


grunt> cross_data = CROSS Users, orders;
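A cross product pairs every tuple of the first relation with every tuple of the second, so the size of the result is the product of the two input sizes. The following plain-Python sketch emulates this with `itertools.product`; it is not Pig, and the variable names are hypothetical. The rows are copied from the two files listed above.

```python
from itertools import product

# Rows copied from Users.txt and orders.txt (field names are not given in the text).
USERS = [
    (1, "Sanjeev", 32, "Ahmedabad", 2000.00),
    (2, "Ankit", 25, "Delhi", 1500.00),
    (3, "Raj", 23, "Kota", 2000.00),
    (4, "Sumit", 25, "Mumbai", 6500.00),
    (5, "Pankaj", 27, "Bhopal", 8500.00),
    (6, "Vishnu", 22, "MP", 4500.00),
    (7, "Ravi", 24, "Indore", 10000.00),
]
ORDERS = [
    (102, "2009-10-08 00:00:00", 3, 3000),
    (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560),
    (103, "2008-05-20 00:00:00", 4, 2060),
]

# Each output tuple is the fields of a user followed by the fields of an order.
cross_data = [user + order for user, order in product(USERS, ORDERS)]

print(len(cross_data))  # 28, that is 7 users x 4 orders
print(cross_data[0])
```

Because the output grows multiplicatively, a cross product on large relations can become expensive quickly.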

21. Verify the content of the relation cross_data.


grunt> Dump cross_data;

22. Two files, namely Employee_data1.txt and Employee_data2.txt, are in the /pig_data/ directory of
HDFS.
• Employee_data1.txt
001,mehul,chourey,9848022337,Hyderabad
002,Ankur,Dutta,9848022338,Kolkata
003,Shubham,Sengar,9848022339,Delhi
004,Prerna,Tripathi,9848022330,Pune
005,Sagar,Joshi,9848022336,Bhubaneswar
006,Monika,sharma,9848022335,Chennai

• Employee_data2.txt

7,Prachi,Yadav,9848022334,trivandrum
8,Avikal,Singh,9848022333,Chennai

23. UNION operator, applied to the relations Employee1 and Employee2 (assumed to be loaded from the two files above)


grunt> Employee = UNION Employee1, Employee2;
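A union simply merges the tuples of both relations into one. In Pig the result does not preserve tuple order and does not remove duplicates. The plain-Python sketch below emulates the merge with list concatenation; it is not Pig, the names are hypothetical, and only the first three columns of each file are kept.

```python
# (id, firstname, lastname) from Employee_data1.txt and Employee_data2.txt.
EMPLOYEE1 = [
    (1, "mehul", "chourey"), (2, "Ankur", "Dutta"), (3, "Shubham", "Sengar"),
    (4, "Prerna", "Tripathi"), (5, "Sagar", "Joshi"), (6, "Monika", "sharma"),
]
EMPLOYEE2 = [(7, "Prachi", "Yadav"), (8, "Avikal", "Singh")]

# Concatenation keeps every tuple from both inputs, duplicates included.
employee = EMPLOYEE1 + EMPLOYEE2

print(len(employee))            # 8
print(sorted(r[0] for r in employee))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

For the merge to be meaningful, both inputs should have the same number of fields with compatible types, as the two files here do.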

24. SPLIT operator


grunt> SPLIT Employee_details into Employee_details1 if age<23, Employee_details2 if (22<age and
age<25);

25. Verify the result of the SPLIT operator


grunt> Dump Employee_details1;
grunt> Dump Employee_details2;
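The partitioning performed by a split can be sketched in plain Python. This is an emulation, not Pig, and it rests on one assumption stated loudly: the second branch is taken to select ages strictly between 22 and 25 (that is, `22 < age < 25`). Each condition is evaluated independently, so a tuple could land in both outputs, or in neither, if the conditions allowed it.

```python
# (firstname, age) from Employee_details.txt.
EMPLOYEE_DETAILS = [("mehul", 21), ("Ankur", 22), ("Shubham", 22), ("Prerna", 21),
                    ("Sagar", 23), ("Monika", 23), ("pulkit", 24), ("Roshan", 24)]

# Assumed conditions: first branch age < 23, second branch 22 < age < 25.
employee_details1 = [row for row in EMPLOYEE_DETAILS if row[1] < 23]
employee_details2 = [row for row in EMPLOYEE_DETAILS if 22 < row[1] < 25]

print([name for name, _ in employee_details1])
# ['mehul', 'Ankur', 'Shubham', 'Prerna']
print([name for name, _ in employee_details2])
# ['Sagar', 'Monika', 'pulkit', 'Roshan']
```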

26. Limit operator


grunt> limit_data = LIMIT Employee_details 4;
grunt> Dump limit_data;
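Limiting a relation keeps at most the requested number of tuples. In Pig there is no guarantee about which tuples are returned unless the relation has been ordered first. The plain-Python sketch below emulates the operation with `itertools.islice`, which, unlike Pig, always takes the first rows in list order; the names are hypothetical.

```python
from itertools import islice

# (id, firstname, age) from Employee_details.txt.
EMPLOYEE_DETAILS = [
    (1, "mehul", 21), (2, "Ankur", 22), (3, "Shubham", 22), (4, "Prerna", 21),
    (5, "Sagar", 23), (6, "Monika", 23), (7, "pulkit", 24), (8, "Roshan", 24),
]

# Take at most 4 rows; fewer are returned if the input is shorter.
limit_data = list(islice(EMPLOYEE_DETAILS, 4))

print(len(limit_data))  # 4
print(limit_data)
```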
