Objectives
This program enables the participants to implement the learnings of the Big Data Technology - Hadoop course.
The primary objective of the project is to enhance the participant’s skills in HDFS, Pig, Hive, Sqoop.
Procedure
View Apache Sample Log
Refer to apache_sample.pdf
Understand Apache Logs
Refer to apache_desc.pdf
Source: http://httpd.apache.org/docs/2.2/logs.html
Use Dataset as given
Refer to section Apache Data Sets
Parse & Analyze
Refer to section Procedure
Analytics Requirement
Refer to section Analytics Requirement
Generate Project Report
Refer to section Project Report
Apache Data Sets
apache_workset.log - small Apache log to create your prototype
usask_access_log.gz - compressed file containing "UofS_access_log", an Apache log of approx. 233 MB
Note: "UofS_access_log" to be renamed as "apache_dataset.log"
Site: Web logs from NASA Web Site
Source: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
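The decompress-and-rename step in the note above can be done in one pass. The following is a minimal Python sketch, assuming usask_access_log.gz is a plain gzip stream (not a tar archive). The helper name extract_log is hypothetical and is not part of the course material.

```python
import gzip
import shutil
from pathlib import Path


def extract_log(archive_path, target_path):
    """Decompress a gzip file and write its contents to target_path.

    Writing straight to the target name covers the rename step,
    because a gzip stream holds a single file.
    """
    archive_path = Path(archive_path)
    target_path = Path(target_path)
    with gzip.open(archive_path, "rb") as src, open(target_path, "wb") as dst:
        # Stream in chunks so the large log is never held fully in memory.
        shutil.copyfileobj(src, dst)
    return target_path
```

Calling extract_log("usask_access_log.gz", "apache_dataset.log") in the folder that holds the archive should leave the uncompressed log under the required name.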
Procedure
Copy the log file to Linux and then to an HDFS folder of your choice.
Parse the log file using Pig. The format of the data frame should match the “apache_http.xlsx”.
Store the parsed data in the HDFS folder “apache_http”.
The CSV file should have the following fields: Host-IP, Date (dd-mmm-yyyy format), Time (hh:mm:ss format), Protocol, URL, HttpVers, Status, Bytes. TimeZone is to be ignored.
Import into Hive by creating hive-warehouse data.
Provide analysis results as per section Analytics Requirement below.
Copy results to the local file system or local MySQL as per requirement.
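To check what the parsed output should look like before writing the Pig script, the field list above can be prototyped in plain Python. This sketch is not the Pig/piggybank solution the project asks for. It assumes the input follows the Apache Common Log Format, that "Protocol" means the first token of the request (e.g. GET), that "HttpVers" is the last token, and that a "-" byte count is written as 0. The names LOG_PATTERN, parse_line, and to_csv are illustrative only.

```python
import csv
import io
import re

# Common Log Format: host ident authuser [date:time zone] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<day>\d{2})/(?P<mon>[A-Za-z]{3})/(?P<year>\d{4}):'
    r'(?P<time>\d{2}:\d{2}:\d{2}) [+-]\d{4}\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

FIELDS = ["Host-IP", "Date", "Time", "Protocol", "URL", "HttpVers", "Status", "Bytes"]


def parse_line(line):
    """Return a dict of the eight fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    parts = match.group("request").split()
    # Assumption: request is "METHOD URL VERSION"; missing tokens become "".
    method = parts[0] if len(parts) >= 1 else ""
    url = parts[1] if len(parts) >= 2 else ""
    version = parts[2] if len(parts) >= 3 else ""
    raw_bytes = match.group("bytes")
    return {
        "Host-IP": match.group("host"),
        # dd-mmm-yyyy; the time zone is matched by the pattern but discarded.
        "Date": "-".join([match.group("day"), match.group("mon"), match.group("year")]),
        "Time": match.group("time"),
        "Protocol": method,
        "URL": url,
        "HttpVers": version,
        "Status": match.group("status"),
        # Assumption: "-" (no body sent) is stored as 0.
        "Bytes": "0" if raw_bytes == "-" else raw_bytes,
    }


def to_csv(lines):
    """Parse an iterable of log lines and return CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        row = parse_line(line)
        if row is not None:  # Skip lines that are not in Common Log Format.
            writer.writerow(row)
    return out.getvalue()
```

Running to_csv over the lines of apache_workset.log gives a small reference output that the result of the Pig job can be compared against, field by field.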
Jar File
Please use "piggybank-apache.jar" provided in “apache folder” for parsing the Apache log in Pig.
http://ossec-docs.readthedocs.org/en/latest/log_samples/apache/apache.html
http://hadooptutorial.info/processing-logs-in-pig/#Example_Use_case_of_CommonLogLoader
http://kickstarthadoop.blogspot.in/2011/06/analyzing-apache-logs-with-pig.html