You are on page 1of 7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem

DeZyre
Callus18663132409(USTollFree)

Tutorials

ContactUs

Home

Competition

Blog

SignIn

InterestedinlearningmoreaboutBigDataandHadoopCertificationTraining

REQUESTINFO

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoop
Ecosystem
15Oct2014
HadooptechnologyisthebuzzwordthesedaysbutmostoftheITprofessionalsstillarenotawareofthekeycomponentsthatcomprisethe
HadoopEcosystem.Notonlythis,fewofthepeopleareaswellofthethoughtthatBigDataandHadoopareoneandthesame.
JustbeforewejumpontoadetaileddiscussionofthekeycomponentsoftheHadoopEcosystemandtrytounderstandthedifferences
betweenthemletushaveanunderstandingonwhatisHadoopandwhatisBigData.

Generallydatatobestoredinthedatabaseiscategorizedinto3typesnamelyStructuredData,SemiStructuredDataandUnstructured
Data.
StructuredDataisnothingbutdatathatcanbestoredindatabases,forinstance,thetransactionrecordsofanyonlinepurchasethatyou
makecanbestoredinadatabasewhereasdatathatcanonlybepartiallystoredinthedatabaseisreferredtoassemistructureddata,for
instance,thedatathatispresentintheXMLrecordscanbestoredpartiallyinthedatabase.
AnyotherformofdatathatcannotbecategorizedasStructuredorsemistructuredisreferredtoasUnstructuredData,forinstance,thedata
fromSocialNetworkingwebsitesortheweblogswhichcannotbeanalyzedorstoredforprocessinginthedatabasesareexamplesof
unstructureddata.

WegenerallyrefertoUnstructuredDataasBigDataandtheframeworkthatisusedforprocessingBigDataispopularlyknownas
Hadoop.
HadoopEcosystemcomprisesofthefollowingkeycomponents:
1)MapReduceFramework

2)HDFS(HadoopDistributedFileSystem)
3)Hive
4)HBase

5)Pig
6)Flume
7)Sqoop

8)Oozie
9)ZooKeeper

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

1/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem

TheHadoopEcoSystem

ImageCredit:jennyxiaozhang.com/6thingsyouneedtoknowabouthadoop/
ClickheretoknowmoreaboutourIBMCertifiedHadoopDevelopercourse
InthispostwewilldiscussaboutthetwomajorkeycomponentsofHadoopi.e.HiveandPigandhaveadetailedunderstandingofthe
differencebetweenPigandHive.

HIVEHadoop
HiveHadoopwasfoundedbyJeffHammerbacherwhowasworkingwithFacebook.WhenworkingwithFacebookherealizedthatthey
receivehugeamountsofdataonadailybasisandthereneedstobeamechanismwhichcanstore,mineandhelpanalysisofthedata.This
ideatomineandanalyzehugeamountsofdatagavebirthtoHive.ItisHivethathasenabledFacebooktodealwith10sofTerabytesofData
onadailybasiswithease.
HiveissimilartoaSQLInterfaceinHadoop.ThedatathatisstoredinHBasecomponentoftheHadoopEcosystemcanbeaccessed
throughHive.HiveisofgreatusefordeveloperswhoarenotwellversedwiththeMapReduceframeworkforwritingdataqueriesthatare
transformedintoMapReducejobsinHadoop.

WecanconsiderHiveasaDataWarehousingpackagethatisconstructedontopofHadoopforanalyzinghugeamountsofdata.Hiveis
mainlydevelopedforuserswhoarecomfortableinusingSQL.ThebestthingaboutHiveisthatitconceptualizesthecomplexityofHadoop
becausetheusersneednotwriteMapReduceprogramswhenusingHivesoanyonewhoisnotfamiliarwithJavaProgrammingand
HadoopAPIscanalsomakethebestuseofHive.
WecansummarizeHiveas:
a)ADataWarehouseInfrastructure

b)DefinerofaQueryLanguagepopularlyknownasHiveQL(similartoSQL)
C)Providesuswithvarioustoolsforeasyextraction,transformationandloadingofdata.
d)Hiveallowsitsuserstoembedcustomizedmappersandreducers.

WhatmakesHiveHadooppopular?
HiveHadoopprovidestheuserswithstrongandpowerfulstatisticsfunctions.
HiveHadoopislikeSQL,soforanySQLdeveloperthelearningcurveforHivewillalmostbenegligible.
HiveHadoopcanbeintegratedwithHBaseforqueryingthedatainHBasewhereasthisisnotpossiblewithPig.IncaseofPig,afunction
namedHbaseStorage()willbeusedforloadingthedatafromHBase.
HiveHadoophasgainedpopularityasitissupportedbyHue.

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

2/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem
HiveHadoophasvarioususergroupssuchasCNET,Last.fm,Facebook,andDiggandsoon.
PIGHadoop

PigHadoopwasdevelopedbyYahoointheyear2006sothattheycanhaveanadhocmethodforcreatingandexecutingMapReducejobs
onhugedatasets.ThemainmotivebehinddevelopingPigwastocutdownonthetimerequiredfordevelopmentviaitsmultiquery
approach.PigisahighleveldataflowsystemthatrendersyouasimplelanguageplatformpopularlyknownasPigLatinthatcanbeusedfor
manipulatingdataandqueries.
PigisusedbyMicrosoft,YahooandGoogle,tocollectandstorelargedatasetsintheformofwebcrawls,clickstreamsandsearchlogs.
Pigattimesfindsitsusageinadhocanalysisandprocessingofinformation.
WhatmakesPigHadooppopular?

PigHadoopfollowsamultiqueryapproachthusitcutsdownonthenumbertimesthedataisscanned.
PigHadoopisveryeasytolearnreadandwriteifyouarefamiliarwithSQL.
PigprovidestheuserswithawiderangeofnesteddatatypessuchasMaps,TuplesandBagsthatarenotpresentinMapReducealong
withsomemajordataoperationssuchasOrdering,Filters,andJoins.
PerformanceofPigisonparwiththeperformanceofrawMapReduce.
Pighasvarioususergroupsforinstance90%ofYahoosMapReduceisdonebyPig,80%ofTwittersMapReduceisalsodonebyPig
andvariousothercompaniessuchasSalesforce,LinkedIn,AOLandNokiaalsoemployPig.
DifferencebetweenPigandHive
ThebelowtabulardatawillgiveyouanoverviewonthebasicdifferencebetweenPigandHive:

PigvsHive
ImageCredit:srinimf.com/2013/08/23/pighivedifferences...

Pigvs.Hive

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

3/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem

DependingonyourpurposeandtypeofdatayoucaneitherchoosetouseHiveHadoopcomponentorPigHadoopComponentbasedonthe
belowdifferences:

1)HiveHadoopComponentisusedmainlybydataanalystswhereasPigHadoopComponentisgenerallyusedbyResearchersand
Programmers.
2)HiveHadoopComponentisusedforcompletelystructuredDatawhereasPigHadoopComponentisusedforsemistructureddata.
3)HiveHadoopComponenthasadeclarativeSQLishlanguage(HiveQL)whereasPigHadoopComponenthasaproceduraldataflow
language(PigLatin)

4)HiveHadoopComponentismainlyusedforcreatingreportswhereasPigHadoopComponentismainlyusedforprogramming.
5)HiveHadoopComponentoperatesontheserversideofanyclusterwhereasPigHadoopComponentoperatesontheclientsideofany
cluster.
6)HiveHadoopComponentishelpfulforETLwhereasPigHadoopisgreatforETLbecauseofitspowerfultransformationandprocessing
capabilities.

7)HivecanstartanoptionalthriftbasedserverthatcansendqueriesfromanynookandcornerdirectlytotheHiveserverwhichwillexecute

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

4/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem
themwhereasthisfeatureisnotavailablewithPig.
8)HivedirectlyleveragesSQLexpertiseandthuscanbelearnteasilywhereasPigisalsoSQLlikebutvariestoagreatextentandthusitwill
takesometimeeffortstomasterPig.

9)HivemakesuseofexactvariationoftheSQLDLLlanguagebydefiningthetablesbeforehandandstoringtheschemadetailsinanylocal
databasewhereasincaseofPigthereisnodedicatedmetadatadatabaseandtheschemasordatatypeswillbedefinedinthescriptitself.
10)TheHiveHadoopcomponenthasaprovisionforpartitionssothatyoucanprocessthesubsetofdatabydateorinanalphabeticalorder
whereasPigHadoopcomponentdoesnothaveanynotionforpartitionsthoughmightbeonecanachievethisthroughfilters.
11)PigsupportsAvrowhereasHivedoesnot.

12)PigcanbeinstalledeasilyoverHiveasitiscompletelybasedonshellinteraction
13)PigHadoopComponentrendersuserswithsampledataforeachscenarioandeachstepthroughitsIllustratefunctionwhereasthis
featureisnotincorporatedwiththeHiveHadoopComponent.
14)HivehassmartinbuiltfeaturesonaccessingrawdatabutincaseofPigLatinScriptswearenotprettysurethataccessingrawdatais
asfastaswithHiveQL.

15)Youcanjoin,orderandsortdatadynamicallyinanaggregatedmannerwithHiveandPighoweverPigalsoprovidesyouanadditional
COGROUPfeatureforperformingouterjoins.
ToconcludewithafterhavingunderstoodthedifferencebetweenPigandHive,tomebothHiveHadoopandPigHadoopComponentwillhelp
youachievethesamegoals,wecansaythatPigisascriptkiddyandHivecomesin,innateforallthenaturaldatabasedevelopers.Whenit
comestoaccesschoices,HiveissaidtohavemorefeaturesoverPig.BoththeHiveandPigcomponentsarereportedlyhavingnearabout
thesamenumberofcommittersineveryprojectandlikelyinthenearfuturewearegoingtoseegreatadvancementsinbothonthe
developmentfront.
ClickheretoknowmoreaboutourIBMCertifiedHadoopDevelopercourse

PREVIOUS

NEXT

Follow

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

5/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem
4Comments

DeZyre

Share

Recommend 1

Login

SortbyNewest

Jointhediscussion
rubalsood amonthago

HieeFolks,
I'mamtechstudentandimademythesisinbigdataandthereianalysesocialdatathroughhiveandishowthatittakelesstimeto
fetchdataascomparedtoexistingtool,butnowihavetocompareitwithotherandshowhiveisbetterthanothertool!!soanyone
tellmehowicancompareandfromwhichtoolicancompare!!

Reply Share

manikandankvpbca 6monthsago

NIceartical...

Reply Share

Newuser 8monthsago

GoodArticle.
Howhbasecanbecomparedwithpigandhive?

Reply Share

vijaykaushik>Newuser 4monthsago

Hbaseisadatabase.pigandhivearetools/languagesusedtoread/writetoHbase.

Subscribe

Reply Share

AddDisqustoyoursite

Privacy

Hadoop2.0(YARN)FrameworkTheGatewaytoEasier

HowBigDataAnalysishelpedincreaseWalmartsSalesturnover?

ProgrammingforHadoopUsers

DataScienceProgramming:PythonvsR

DeZyreInSync:HowtobecomeaDataScientist

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

6/7

8/25/2015

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem

DifferencebetweenPigandHiveTheTwoKeyComponentsofHadoopEcosystem

HadoopMapReducevs.ApacheSparkWhoWinstheBattle?

http://www.dezyre.com/article/differencebetweenpigandhivethetwokeycomponentsofhadoopecosystem/79

7/7