You are on page 1of 2

PIG Exercise # 1

@Create the below dataset and save the file as “store_transactions.txt”


person, dstore, spent
A, S, 3.3
A, S, 4.7
B, S, 1.2
B, T, 3.4
C, Z, 1.1
C, T, 5.5
D, R, 1.1

@ Apache Pig Script:

a) List total sales per department stores:

grunt>
data = LOAD 'cloudera/hadoop/store_transactions.txt' using
PigStorage(',') as (person:chararray, dstores:chararray, spent:float);

grp = FOREACH (GROUP data BY dstores) {


GENERATE group, COUNT(data.person) AS visitors,
(FLOAT)SUM(data.spent) AS revenue;
};

dump grp;

@Apache Pig Output on Grunt Shell:


The output has dept. store, customer count, total sales.

( R,1,1.1)
( S,3,9.2)
( T,2,8.9)
( Z,1,1.1)

@ Apache Pig Script:

b) List total sales per customer:

data = LOAD 'Documents/store_transactions.txt' using PigStorage(',') as


(person:chararray, dstores:chararray, spent:float);

grp = FOREACH (GROUP data BY person) {


GENERATE group, COUNT(data.person) AS visitors,
(FLOAT)SUM(data.spent) AS revenue; -- dereference is required to
original structure columns
};
dump grp;

@Apache Pig Output on Grunt Shell: The output has customer id, total
transactions per customer, total sales.

(A,2,8.0)
(B,2,4.6000004)
(C,2,6.6)
(D,1,1.1)

You might also like