
Marko A. Rodriguez
Supporting the Graph Landscape

On Graph Computing
January 9, 2013

The concept of a graph has been around since the dawn of mechanical computing
and for many decades prior in the domain of pure mathematics. Due in large part to
this golden age of databases, graphs are becoming increasingly popular in software
engineering. Graph databases provide a way to persist and process graph data.
However, the graph database is not the only way in which graphs can be stored and
analyzed. Graph computing has a history prior to the use of graph databases and
has a future that is not necessarily entangled with typical database concerns. There
are numerous graph technologies that each have their respective benefits and drawbacks.
Leveraging the right technology at the right time is required for effective graph computing.

Structure: Modeling Real-World Scenarios with Graphs

A graph (http://en.wikipedia.org/wiki/Graph_(abstract_data_type)) (or network
(http://en.wikipedia.org/wiki/Complex_network)) is a data structure. It is composed of vertices (dots)
and edges (lines). Many real-world scenarios can be modeled as a graph. This is not necessarily inherent
to some objective nature of reality, but primarily predicated on the fact that humans subjectively
interpret the world in terms of objects (vertices) and their respective relationships to one another (edges)
(an argument against (http://www.amazon.com/Different-Universe-Reinventing-Physics-Bottom/dp/0465038298)
this idea). The popular data model used in graph computing is the property
graph (http://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model). The following examples
demonstrate graph modeling via three different scenarios.
A Software Graph
Stephen (https://github.com/spmallette) is a member of a graph-oriented engineering group called
TinkerPop (http://tinkerpop.com). Stephen contributes to Rexster (http://rexster.tinkerpop.com).
Rexster is related to other projects via software dependencies. When a user finds a bug in Rexster, they
issue a ticket (https://github.com/tinkerpop/rexster/issues). This description of a collaborative coding
environment can be conveniently captured by a graph. The vertices (or things) are people, organizations,
projects, and tickets. The edges (or relationships) are, for example, memberships, dependencies, and
issues. A graph can be visualized using dots and lines and the scenario described above is diagrammed
below.
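The scenario above can also be written down as data. The following is a minimal sketch of a property graph in plain Python, not the Blueprints API. The vertex identifiers, edge labels, property keys, the ticket, and the specific dependency edge are all illustrative assumptions.

```python
# Vertices: identifier -> properties. All names here are illustrative assumptions.
vertices = {
    "stephen":    {"type": "person"},
    "tinkerpop":  {"type": "organization"},
    "rexster":    {"type": "project"},
    "blueprints": {"type": "project"},
    "ticket-1":   {"type": "ticket"},  # hypothetical ticket
}

# Edges: (source vertex, label, target vertex, properties).
edges = [
    ("stephen", "memberOf", "tinkerpop", {}),
    ("stephen", "contributesTo", "rexster", {}),
    ("rexster", "dependsOn", "blueprints", {}),  # assumed dependency, for illustration
    ("ticket-1", "issuedOn", "rexster", {"state": "open"}),
]


def out_neighbors(vertex, label):
    """Vertices reached from `vertex` over outgoing edges carrying `label`."""
    return [dst for src, lbl, dst, _ in edges if src == vertex and lbl == label]


def in_neighbors(vertex, label):
    """Vertices that point at `vertex` over edges carrying `label`."""
    return [src for src, lbl, dst, _ in edges if dst == vertex and lbl == label]


print(out_neighbors("stephen", "contributesTo"))
print(in_neighbors("rexster", "issuedOn"))
```

Both vertices and edges carry key/value properties, which is what distinguishes a property graph from a bare set of dots and lines.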

A Discussion Graph
Matthias (https://github.com/mbroecheler) is interested in graphs. He is the CTO of Aurelius
(http://thinkaurelius.com) and the project lead for the graph database Titan
(http://thinkaurelius.github.com/titan/). Aurelius has a mailing list
(https://groups.google.com/forum/#!forum/aureliusgraphs). On this mailing list, people discuss graph
theory and technology. Matthias contributes to a discussion. His contributions beget more contributions.
In a recursive manner, the mailing list manifests itself as a tree
(http://en.wikipedia.org/wiki/Tree_(graph_theory)). Moreover, the unstructured text of the messages
makes reference to shared concepts.

A Concept Graph
A graph can be used to denote the relationships between arbitrary concepts, even the concepts related to
graph. For example, note how concepts (in italics) are related in the sentences to follow. A graph can be
represented as an adjacency list (http://en.wikipedia.org/wiki/Adjacency_list). The general way in
which graphs are processed is via graph traversals (http://en.wikipedia.org/wiki/Graph_traversal).
There are two general types of graph traversals: depth-first
(http://en.wikipedia.org/wiki/Depth-first_search) and breadth-first
(http://en.wikipedia.org/wiki/Breadth-first_search). Graphs can be
persisted in a software system known as a graph database
(http://en.wikipedia.org/wiki/Graph_database). Graph databases organize information in a manner
different from the relational databases (http://en.wikipedia.org/wiki/Relational_databases) of common
software knowledge. In the diagram below, the concepts related to graph are linked to one another,
demonstrating that concept relationships form a graph.
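The concepts just named can be illustrated in a few lines. The sketch below stores a small concept graph as an adjacency list and walks it breadth-first and depth-first. The particular edges are an assumption drawn loosely from the sentences above, and the function names are hypothetical.

```python
from collections import deque

# Undirected concept graph as an adjacency list (edges are illustrative).
adjacency = {
    "graph": ["adjacency list", "graph traversal", "graph database"],
    "adjacency list": ["graph"],
    "graph traversal": ["graph", "depth-first", "breadth-first"],
    "depth-first": ["graph traversal"],
    "breadth-first": ["graph traversal"],
    "graph database": ["graph", "relational database"],
    "relational database": ["graph database"],
}


def breadth_first(start):
    """Visit vertices level by level, nearest to `start` first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def depth_first(start):
    """Follow one branch as far as possible before backtracking."""
    order, seen, stack = [], set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        # Push neighbors in reverse so the first-listed neighbor is explored first.
        stack.extend(reversed(adjacency[vertex]))
    return order


print(breadth_first("graph"))
print(depth_first("graph"))
```

The two walks visit the same vertices in different orders: breadth-first reaches all three direct neighbors of graph before anything else, while depth-first exhausts the graph traversal branch before it ever visits graph database.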

A Multi-Domain Graph

The three previous scenarios (software, discussion, and concept) are representations of real-world
systems (e.g. GitHub (http://github.com/), Google Groups (http://groups.google.com), and Wikipedia
(http://wikipedia.org/)). These seemingly disparate models can be seamlessly integrated into a single
atomic graph structure by means of shared vertices. For instance, in the associated diagram, Gremlin
(http://gremlin.tinkerpop.com) is a Titan dependency, Titan is developed by Matthias, and Matthias
writes messages on Aurelius' mailing list (software merges with discussion). Next, Blueprints
(http://blueprints.tinkerpop.com) is a Titan dependency and Titan is tagged graph (software merges
with concept). The dotted lines identify other such cross-domain linkages that demonstrate how a
universal model is created when vertices are shared across domains. The integrated, universal model can
be subjected to processes that provide richer (perhaps, more intelligent) services than what any
individual model could provide alone.
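Integration by shared vertices can be sketched as a set union. In the example below each domain is a set of (source, label, target) edges, and the domains join wherever they use the same vertex identifier. The edge labels, the message vertex, and the concept edge are assumptions made for illustration.

```python
# Each domain is a set of (source, label, target) edges. Labels are illustrative.
software = {
    ("titan", "dependsOn", "gremlin"),
    ("titan", "dependsOn", "blueprints"),
    ("matthias", "develops", "titan"),
}
discussion = {
    ("matthias", "wrote", "message-1"),  # hypothetical message
    ("message-1", "postedTo", "aurelius-list"),
}
concept = {
    ("titan", "taggedWith", "graph"),
    ("graph", "relatedTo", "graph traversal"),
}


def vertices_of(edges):
    """All vertex identifiers that appear in a set of edges."""
    return {vertex for src, _, dst in edges for vertex in (src, dst)}


def connected(edges, a, b):
    """True if some path links a and b, ignoring edge direction."""
    neighbors = {}
    for src, _, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen, stack = {a}, [a]
    while stack:
        vertex = stack.pop()
        if vertex == b:
            return True
        for neighbor in neighbors.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return False


# Shared identifiers are what stitch the domains into one graph.
integrated = software | discussion | concept

print(vertices_of(software) & vertices_of(discussion))
print(connected(software, "gremlin", "aurelius-list"))
print(connected(integrated, "gremlin", "aurelius-list"))
```

Within the software model alone there is no path from Gremlin to the mailing list. In the integrated model there is one, because the Matthias vertex belongs to both domains.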

Process: Solving Real-World Problems with Traversals
What has been presented thus far is a single graph model of a set of
interrelated domains. A model is only useful if there are processes
that can leverage it to solve problems. Much like data needs
algorithms, a graph needs a traversal
(http://en.wikipedia.org/wiki/Graph_traversal). A traversal is an
algorithmic/directed walk over the graph such that paths are
determined (called derivations) or information is gleaned (called
statistics). Even the human visual system viewing a graph
visualization (http://en.wikipedia.org/wiki/Graph_drawing) is a
traversal engine leveraging saccadic
(http://en.wikipedia.org/wiki/Saccade) movements to identify patterns. However, as graphs grow large
and problems demand precise logic, visualizations and the human's internal calculator break down. A
collection of traversal examples is presented next that solve typical problems in the previously
discussed domains.
Determining Circular Dependencies
With the growth of open source software and the ease by which modules can be incorporated into
projects, circular dependencies (http://en.wikipedia.org/wiki/Circular_dependency) abound and can
lead to problems in software engineering. A circular dependency occurs when project A depends on
project B and, through some dependency path, project B depends on project A. When dependencies are
represented graphically, a traversal can easily identify such circularities (e.g. in the diagram below,
A->B->D->G->A is a cycle (http://en.wikipedia.org/wiki/Cycle_(graph_theory))).
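One common way to find such a circularity is a depth-first traversal that remembers which projects are on the path currently being explored. The sketch below is one possible implementation under that assumption. Only the cycle A->B->D->G->A comes from the text; project C and the function name are hypothetical.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of projects, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    path = []

    def visit(project):
        color[project] = GRAY
        path.append(project)
        for dep in dependencies.get(project, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # `dep` is already on the current path, so the path loops back to it.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[project] = BLACK
        return None

    for project in dependencies:
        if color.get(project, WHITE) == WHITE:
            cycle = visit(project)
            if cycle:
                return cycle
    return None


# "X depends on Y" is stored as Y appearing in dependencies[X].
dependencies = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["G"],
    "G": ["A"],
}

print(find_cycle(dependencies))
```

The recursion keeps the sketch short. For very deep dependency chains an explicit stack would avoid Python's recursion limit.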

Ranking Discussion Contributors
Mailing lists are composed of individuals with varying levels of participation and competence. When a
mailing list is focused on learning through discussion, simply writing a message is not necessarily a sign
of positive contribution. If an author's messages spawn replies, then it can be interpreted that the author
is contributing discussion-worthy material. However, if an author's messages end the conversation, then
they may be contributing non-sequiturs or information that is not allowing the discussion to flourish. In
the associated diagram, the beige vertices are authors and their respective number is a unique author id.

One way to rank contributors on a mailing list is to count the number of
messages they have posted (the author's out-degree
(http://en.wikipedia.org/wiki/Centrality#Degree_centrality) to messages
in the mailing list). However, if the ranking must account for fruitful
contributions, then authors can be ranked by the depth of the discussion
their messages spawn (the tree depth of the author's messages). Finally,
note that other techniques such as sentiment
(http://en.wikipedia.org/wiki/Sentiment_analysis) and concept
(http://en.wikipedia.org/wiki/Formal_concept_analysis) analysis can be included in order to
understand the intention and meaning of a message.
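Both rankings can be computed from a reply tree. The sketch below uses a made-up thread of five messages, and it assumes that an author's score is the deepest reply chain beneath any one of their messages. Summing or averaging depths would be equally defensible readings.

```python
from collections import Counter, defaultdict

# message id -> (author id, id of the message it replies to, or None).
# The thread itself is invented for illustration.
messages = {
    "m1": (1, None),
    "m2": (2, "m1"),
    "m3": (1, "m2"),
    "m4": (3, "m1"),
    "m5": (2, None),
}

# Invert the reply-to relation: parent message -> list of direct replies.
replies = defaultdict(list)
for message_id, (_, parent) in messages.items():
    if parent is not None:
        replies[parent].append(message_id)


def depth_below(message_id):
    """Number of reply levels beneath a message (0 if nobody replied)."""
    return max((1 + depth_below(r) for r in replies.get(message_id, [])), default=0)


# Ranking 1: out-degree, i.e. how many messages each author posted.
posts_per_author = Counter(author for author, _ in messages.values())

# Ranking 2: the deepest discussion spawned by any of the author's messages.
depth_per_author = defaultdict(int)
for message_id, (author, _) in messages.items():
    depth_per_author[author] = max(depth_per_author[author], depth_below(message_id))

print(posts_per_author.most_common())
print(sorted(depth_per_author.items(), key=lambda item: item[1], reverse=True))
```

In this thread authors 1 and 2 tie on message count, but author 1 ranks higher by depth because message m1 started a two-level discussion.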
Finding Related Concepts
Stephen's understanding of graphs was developed while working on
TinkerPop's graph technology stack. Nowadays he is interested in
learning more about the theoretical aspects of graphs. Via his web
browser, he visits the graph
(http://en.wikipedia.org/wiki/Graph_(mathematics)) Wikipedia page. In
a manual fashion, Stephen clicks links and reads articles: depth-first,
graph traversals, adjacency lists, etc. He realizes that pages reference each
other and that some concepts are more related to others due to
Wikipedia's link structure. The manual process of walking links can be
automated using a graph traversal. Instead of clicking, a traversal can
start at the graph vertex, emanate outwards, and report which concepts
have been touched the most. The concept that has seen the most flow
(http://en.wikipedia.org/wiki/Spreading_activation) is a concept that has many ties (i.e. paths) to graph
(see priors algorithms (http://dl.acm.org/citation.cfm?id=956782)). With such a traversal, Stephen can be
provided a ranked list of graph-related concepts. This traversal is analogous to a wave diffusing over a
body of water, albeit real-world graph topologies are rarely as simple as a two-dimensional plane (see
lattice (http://en.wikipedia.org/wiki/Lattice_graph)).
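A simplified version of this outward-emanating traversal counts, for every concept, how many walks of bounded length reach it from the source. The sketch below assumes an unweighted count with no decay and uses a handful of made-up page links. A real spreading-activation model would typically attenuate the flow at each step.

```python
from collections import Counter


def spread(links, source, steps):
    """Count how many walks of length 1..steps from `source` touch each vertex."""
    touched = Counter()
    frontier = Counter({source: 1})  # vertex -> number of walks currently ending there
    for _ in range(steps):
        next_frontier = Counter()
        for vertex, walks in frontier.items():
            for neighbor in links.get(vertex, []):
                next_frontier[neighbor] += walks
        touched.update(next_frontier)
        frontier = next_frontier
    touched.pop(source, None)  # the source itself is not a recommendation
    return touched.most_common()


# Directed page links; invented for illustration.
links = {
    "graph": ["graph traversal", "adjacency list", "graph database"],
    "graph traversal": ["depth-first", "breadth-first", "adjacency list"],
    "adjacency list": ["graph"],
    "graph database": ["graph traversal"],
    "depth-first": ["graph traversal"],
    "breadth-first": ["graph traversal"],
}

for concept, count in spread(links, "graph", steps=2):
    print(concept, count)
```

After two steps, graph traversal and adjacency list have each been touched twice, so they rank above the concepts reached by a single walk.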
A Multi-Domain Traversal
The different graph models discussed previously (i.e. software, discussion, and concept) were integrated
into a single world model via shared vertices. Analogously, the aforementioned graph traversals can be
composed to yield a solution to a cross-domain problem. For example:
Recommend me projects to participate in that maintain a proper dependency structure, have engaging
contributors promoting the space, and are conceptually related to technologies I've worked on previously.
This type of problem solving is possible when a heterogeneous network of things is linked together and
effectively moved within. The means of linking and moving are the graph and the traversal, respectively.
To conclude this section, other useful traversal examples are provided.
Compute a stability rank for a project based on the number of issues it has and the number of issues its
dependencies have, and so on in a recursive manner.
Cluster projects according to shared (or similar) concepts between them.
Recommend a team of developers for an upcoming project that will use X dependencies and is related to Y
concepts.
Rank issues by the number of projects that each issue's submitter has contributed to.
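The first of these examples, the recursive stability rank, can be sketched as follows. The scoring formula is an assumption: a project's score is its own issue count plus a decayed share of its dependencies' scores, so a higher score means a less stable project. The issue counts, dependency edges, and decay factor are invented, and the dependency graph is assumed to be free of cycles.

```python
from functools import lru_cache

# Hypothetical open-issue counts and dependency edges (assumed acyclic).
issues = {"rexster": 12, "gremlin": 3, "blueprints": 5, "pipes": 1}
depends_on = {
    "rexster": ("blueprints", "gremlin"),
    "gremlin": ("pipes", "blueprints"),
    "blueprints": (),
    "pipes": (),
}

DECAY = 0.5  # assumed weight given to issues one dependency hop away


@lru_cache(maxsize=None)
def instability(project):
    """Own issues plus a decayed sum of the dependencies' scores, recursively."""
    inherited = sum(instability(dep) for dep in depends_on[project])
    return issues[project] + DECAY * inherited


for project in sorted(issues, key=instability):
    print(project, instability(project))
```

Memoizing with lru_cache means each project is scored once even when several projects share a dependency, as blueprints is shared here.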

Graph Computing Technologies
The practice of computing is about riding the fine line between two entangled quantities: space and time.
In the world of graph computing, the same tradeoffs exist. This section will discuss various graph
technologies in order to identify what is gained and sacrificed with each choice. Moreover, a few
example technologies are presented. Note that many more technologies exist and the mentioned
examples are by no means exhaustive.
In-Memory Graph Toolkits
In-memory graph toolkits are single-user systems that are oriented towards
graph analysis and visualization. They usually provide implementations of the
numerous graph algorithms defined in the graph theory
(http://en.wikipedia.org/wiki/Graph_theory) and network science
(http://en.wikipedia.org/wiki/Network_science) literature (see Wikipedia's list
of graph algorithms (http://en.wikipedia.org/wiki/Category:Graph_algorithms)). The limiting factor of
these tools is that they can only operate on graphs that can be stored in local, main memory. While this
can be large (millions of edges), it is not always sufficient. If the source graph data set is too large to fit
into main memory, then subsets are typically isolated and processed using such in-memory graph
toolkits.
Examples: JUNG (http://jung.sourceforge.net/), NetworkX (http://networkx.lanl.gov/), iGraph
(http://igraph.sourceforge.net/), Fulgora (coming soon)
[+] Rich graph algorithm libraries
[+] Rich graph visualization libraries
[+] Different memory representations for different space/time tradeoffs
[-] Constrained to graphs that can fit into main memory
[-] Interaction is normally very code heavy
Real-Time Graph Databases

Graph databases are perhaps the most popular incarnation of a graph computing technology. They
provide transactional semantics such as ACID (typical of local databases) and eventual consistency
(typical of distributed databases). Unlike in-memory graph toolkits, graph databases make use of the
disk to persist the graph. On reasonable machines, local graph databases can support a couple billion
edges while distributed systems can handle hundreds of billions of edges. At this scale and with
multi-user concurrency, where random access to disk and memory are at play, global graph algorithms
are not feasible. What is feasible is local graph algorithms/traversals. Instead of traversing the entire
graph, some set of vertices serve as the source (or root) of the traversal.
Examples: Neo4j (http://neo4j.org/), OrientDB (http://www.orientdb.org/), InfiniteGraph
(http://objectivity.com/INFINITEGRAPH), DEX (http://www.sparsitytechnologies.com/dex.php),
Titan (http://thinkaurelius.github.com/titan/)
[+] Optimized for local neighborhood analyses (egocentric traversals)
[+] Optimized for handling numerous concurrent users
[+] Interactions are via graph-oriented query/traversal languages
[-] Global graph analytics are inefficient due to random disk interactions
[-] Large computational overhead due to database functionality (e.g. transactional semantics)
Batch Processing Graph Frameworks

Batch processing graph frameworks make use of a compute cluster. Most of the popular frameworks in
this space leverage Hadoop (http://hadoop.apache.org) for storage (HDFS) and processing
(MapReduce). These systems are oriented towards global analytics, that is, computations that touch the
entire graph dataset and, in many instances, touch the entire graph many times over (iterative
algorithms). Such analyses do not run in real-time. However, because they perform global scans of the
data, they can leverage sequential reads from disk (see The Pathology of Big Data
(http://queue.acm.org/detail.cfm?id=1563874)). Finally, like the in-memory systems, they are oriented
towards the data scientist or, in a production setting, towards feeding results back into a real-time graph
database.
Examples: Hama (http://hama.apache.org/), Giraph (http://incubator.apache.org/giraph/), GraphLab
(http://graphlab.org/), Faunus (http://thinkaurelius.github.com/faunus/)
[+] Optimized for global graph analytics
[+] Process graphs represented across a machine cluster
[+] Leverage sequential access to disk for fast read times
[-] Do not support multiple concurrent users
[-] Are not real-time graph computing systems
This section presented different graph computing solutions. It is important to note that there also exist
hardware solutions like Convey's MX Series (http://www.conveycomputer.com/products/mxseries/)
and Cray's YARC (http://www.yarcdata.com/) graph engines. The technologies discussed all
share one important theme: they are focused on processing graph data. The tradeoffs of each category
are determined by the limits set forth by modern hardware/software and, ultimately, theoretical
computer science.

Conclusion
To the adept, graph computing is not only a set of technologies, but a way of thinking about the world in
terms of graphs and the processes therein in terms of traversals. As data is becoming more accessible, it
is easier to build richer models of the environment. What is becoming more difficult is storing that data
in a form that can be conveniently and efficiently processed by different computing systems. There are
many situations in which graphs are a natural foundation for modeling. When a model is a graph, then
the numerous graph computing technologies can be applied to it.

Acknowledgement
Mike Loukides (http://radar.oreilly.com/mikel) of O'Reilly was kind enough to review multiple
versions of this article and, in doing so, made the article all the better.