INTRODUCTION TO LINUX CLUSTERING

DOCUMENT RELEASE 1.1

Copyright 2008 Jethro Carr
This document may be freely distributed provided that it is not modified and that full credit is given
to the original author.
If you publish this document anywhere, please do let me know via email, and if it is published in a
physical medium, sending me a copy would be appreciated.
Email: jethro.carr@jethrocarr.com
Website: www.jethrocarr.com

Table of Contents
1 Introduction
2 About clusters
3 Advantages and reasons for clustering
4 Clustering fundamentals
  4.1 Basics
  4.2 Important clustering components
    4.2.1 Failover
    4.2.2 Fencing
    4.2.3 Split-Brain
    4.2.4 Quorum
5 Cluster management software
  5.1 Red Hat Cluster Suite
    5.1.1 Management and Configuration
    5.1.2 luci and ricci
    5.1.3 system-config-cluster
    5.1.4 Load balancing
6 Combining Xen with clusters
  6.1 VMs as part of main cluster
  6.2 VMs run as a separate cluster
7 Storage Management
  7.1 Centralised storage
    7.1.1 SAN - Storage Area Network
    7.1.2 NAS - Network Attached Storage
    7.1.3 Commodity Network File Share
  7.2 Access methods for centralised storage
    7.2.1 Accessing SAN Directly
    7.2.2 Accessing NAS with iSCSI
    7.2.3 Accessing NAS with other protocols
  7.3 Distributed Storage
    7.3.1 AFS - Andrew File System
    7.3.2 Lustre
    7.3.3 Coda
  7.4 Replicated Storage
    7.4.1 Distributed Replicated Block Device (DRBD)
  7.5 Clustered Filesystems
    7.5.1 CLVM
    7.5.2 GFS - Global File System
    7.5.3 ext4 (Under development)
8 Cluster Examples and Challenges
  8.1 Two-node HA cluster with DRBD
  8.2 Five-node HA cluster with DRBD
  8.3 Five-node HA cluster with DRBD + Xen
  8.4 Geographically distributed clusters
    8.4.1 Two-node distributed cluster
    8.4.2 Three+ node distributed cluster
9 Further Reference

1 Introduction
One of the oldest problems of computing is designing failure-proof computing systems. Over the
years, many different methods have been developed.
Many of these you will be familiar with, including:
- Spare hardware, typically enough to mirror the production hardware.
- Fault-tolerant hardware, using a number of spare components.
- Software failover features built into individual programs.
Hardware solutions are typically expensive (and not always 100% reliable). Application-specific
failover methods can often add maintenance hassles, and they do nothing to fix the problem that
some of your programs may have no failover capabilities at all.
In recent years, a few new options have become available, in particular Linux clustering and
virtualization.
This document presents the key factors in designing and implementing effective Linux clusters.
Please note the following:

- Clustering is a relatively new topic to me, and I have not had a lot of experience deploying
  and maintaining clusters.
  This document is the result of my research into the options available and looks at what
  solutions could be developed. If you have experience with clusters and have found
  technologies that do or do not work well in practice, please supply me with feedback so I
  can extend this document and make it more useful.

- This document mentions Red Hat Cluster Manager a bit, but doesn't go into details about the
  other major option, Linux-HA. However, most of the concepts and terminology are applicable
  to both solutions.
- This document does not cover the technical details of configuring clusters; it is more of a
  high-level design view.

This document will also cover the use of clustering together with Xen virtualization for maximum
advantage.
The aim of this document is to provide you with an understanding of cluster solutions so that you
are empowered to identify applicable technologies and decide on the best approach to using them.

2 About clusters
There are three main reasons to use clustering:
- Better performance.
- Fault tolerance through high-availability services.
- Optimal usage of disk resources.
Often you will hear about high-performance computing solutions using Linux clusters to create
small supercomputers; such systems are usually referred to as Beowulf clusters.
These systems typically run highly customised applications which are designed to run on multiple
computer systems at once, and are beyond the scope of this document.
This document will cover the following:
- High-availability clusters.
- Shared storage solutions.
- Other clustering considerations.

3 Advantages and reasons for clustering
Clustering provides a number of advantages over traditional standalone server configurations. First
there are a number of obvious ones:

- High availability of system services.

  The cluster management software will handle service failures and will quickly bring up the
  service on alternate hardware.

- Better utilisation of system resources.

  Services can be spread around all the nodes: heavily loaded nodes can be lightened by moving
  services away, and lightly loaded nodes can have more services started on them.
  When you combine Xen with clustering, even more options become available, as you can
  split one node into many dozens of nodes.

- Optimisation of disk resources.

  Rather than having lots of small local disks, storage can be centralised or distributed across
  all machines, making better use of the storage available.

There are also some other useful advantages:

- Usability of older hardware or whitebox hardware.

  IT departments like to purchase new servers from a big-name vendor like IBM or Dell, who
  will then provide 5 years of hardware replacement and service. However, once this 5-year
  period is over, the hardware is no longer supported and the customer either has to carry their
  own spare parts (which can be too expensive for a small organisation) or upgrade to newer
  hardware, which requires time, money and effort to migrate all the data and applications.
  With clustering, if the hardware fails, another node will take up the work, so companies can
  build their IT infrastructure around older hardware or custom-built hardware, saving a lot of
  money.
  Some hosting providers may even decide to leave old hosts in the cluster and just keep
  adding new ones, and only pull out the old ones once they fail or once they reach an age
  where they are uneconomical to keep running.

- Faster, more efficient system administration.

  In server farms where all the servers are configured individually, it can often be quite hard to
  migrate a service from one computer to another, which can sometimes be required for
  security or performance reasons.
  However, in a clustered environment, all the servers are typically identical. Plus, any service
  that has been set up in the cluster can be moved from one host to another by the simple
  execution of a single command.

4 Clustering fundamentals
4.1 Basics
High-availability clustering is a complex topic, and it is important to fully understand the key
concepts behind it.
The basics are simple:
- Each computer is called a node.
- Two or more nodes form a cluster.
- In the event of a failure of any one of the nodes, the remaining nodes will take up the
  work being performed by the dead node.
What makes clustering complex is how the cluster handles node failures, shared disk storage
and situations such as split-brain.
A cluster typically works as follows:
- All the nodes run the cluster management software (eg: Linux-HA or Red Hat Cluster
  Suite), which controls things such as heartbeats, application starting/stopping and
  keeping quorum (more on this later).

- One of the nodes runs an administration application that allows you to manually
  add/remove nodes and provides the ability to manually move applications from one
  node to another.
- In the event of a failure of a node, the other nodes fence it, which involves using a
  hardware device like a managed power switch to physically turn the node off. This is
  done to prevent the node from writing to any of the storage devices and corrupting
  the data.
- The other nodes then decide which node should run the applications that were on the
  dead node, and one of the nodes will be chosen (depending on the config) and will
  start up the application.

- All the cluster nodes have access to a central storage array (eg: a SAN or network
  attached storage). This storage location runs a clustered filesystem which allows all
  the nodes to read and write at the same time.
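
The failure-handling sequence above (notice a silent node, fence it, then restart its applications elsewhere) can be sketched as a toy model in Python. This is only an illustration of the idea, not how Linux-HA or Red Hat Cluster Suite are implemented: the Cluster class, the 10-second heartbeat timeout and the fence callback are hypothetical names and values chosen for the example.

```python
import time


class Cluster:
    """Toy model of heartbeat monitoring, fencing and service relocation."""

    def __init__(self, nodes, fence, timeout=10.0):
        now = time.monotonic()
        # last_seen maps node name -> time of its most recent heartbeat
        self.last_seen = {name: now for name in nodes}
        self.services = {}      # service name -> node currently running it
        self.fence = fence      # hypothetical callback that powers a node off
        self.timeout = timeout  # assumed: seconds of silence before fencing

    def heartbeat(self, node, when=None):
        """Record that a heartbeat arrived from `node`."""
        self.last_seen[node] = time.monotonic() if when is None else when

    def check(self, now=None):
        """Fence silent nodes and move their services to a survivor."""
        now = time.monotonic() if now is None else now
        dead = [n for n, seen in self.last_seen.items()
                if now - seen > self.timeout]
        for node in dead:
            self.fence(node)          # fence first, so it cannot touch storage
            del self.last_seen[node]  # the node is no longer a member
        survivors = sorted(self.last_seen)
        for service, node in list(self.services.items()):
            if node in dead:
                # A real cluster suite picks the target from its configuration;
                # this sketch simply takes the first surviving node, if any.
                self.services[service] = survivors[0] if survivors else None
        return dead
```

Fencing happens before any service is restarted, which mirrors the ordering described above: the dead node must be guaranteed off before another node starts writing to the shared storage.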

4.2 Important clustering components
4.2.1 FAILOVER

There are 3 types of failover method:
1. Hot failover

   In a hot failover, the application is written specially for clustering and is able to
   continue running on another node, without any interruption to client services.
   Hot failover is not often found in commonly used applications, and is usually found
   in specially written programs for banking or telco situations.
2. Warm failover

   Warm failover is what solutions like Linux-HA and Red Hat Cluster Suite provide:
   the application doesn't have an instant recovery feature, but the cluster suite quickly
   restarts the application on another piece of running hardware, with minimal client
   disruption.
   Using a cluster management solution, the application does not need to be written to
   support clustering, so you can provide redundancy for any service you desire.
   This can sometimes cause a small outage, as some applications can't tolerate the
   change of the server in the background. In other cases, the application is able to
   continue on, with no interruption to the client users with the exception of a bit of a
   delay (eg: NFS is good at not being affected by a server change behind the scenes).
3. Cold failover

   Cold failover is commonly used as a solution for redundancy where a cluster was not
   able to be set up. In a cold failover, the dead computer needs to be powered down, and
   a spare computer started up. This is usually a manual process.
4.2.2 FENCING

When a node crashes or becomes unresponsive, it MUST be quickly powered off or blocked
from the storage device (fencing).
Fencing is required because, if the cluster assumes the node has crashed and reallocates its
services and IO, the node could cause havoc and possibly disk corruption if it were to
wake up.
Therefore a fencing device must be available so that the cluster can do STONITH (Shoot
The Other Node In The Head) by doing an instant power-off of the node.
Various devices exist. Smart power switches are usually used, and scripts exist in Red Hat
Cluster Suite that can connect to a number of commonly available devices to shut down
servers.
Other fencing methods include fencing at the SAN level as well as Xen VM fencing;
however, it is recommended that power fencing be used rather than SAN fencing,
as it will guarantee that the node is completely killed and not doing anything
unwanted.

4.2.3 SPLIT-BRAIN

Split-brain is a nasty problem found in clustering, and requires careful thought to prevent.
Consider the following scenario:
1. A two-node cluster exists: one server in city A, one server in city B.
2. The internet link between the two cities falls over. Neither server can contact the
   other.
3. Each server assumes the other one is down, and both resume activities as the master.
4. When the link comes back online, data corruption occurs.
Other nasty problems can occur if the two nodes are still able to fence each other via the
out-of-band management system, as you may end up with each node repeatedly powering off the
other node.
To prevent this from happening, we have a solution called quorum.
4.2.4 QUORUM

Quorum is effectively a scoring method, where each node in the cluster has a number of
votes (by default one). Each online cluster node adds its votes to the quorum count, and as
long as the quorum count is larger than 50% of the combined votes, the cluster is intact.
If the count falls to 50% or below, the cluster has lost quorum and all services will shut
down and become unavailable.
This is actually a desired feature. Consider a cluster with 10 servers, where a score of 6
is needed to maintain quorum. In the event of the network suffering a failure and causing the
cluster to split into two, the smaller part will shut down and the larger part will continue on.
This prevents split-brain in clusters.
What about situations where there is an even number of nodes, such as 10 nodes? It would
be possible for the network to split into two equally sized clusters. Therefore, any cluster
with an even number of nodes requires one of the nodes to have an additional vote in the
quorum to unbalance the quorum voting.
Effectively, quorum allows you to make an even cluster uneven when it comes to failovers,
so that split-brain will not occur.
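
The vote counting described above can be worked through with a short Python sketch. It is a simplified model of the "more than 50% of the combined votes" rule only; the function names and the node1 to node10 naming are assumptions made for the example, not part of any cluster suite.

```python
def has_quorum(online_votes, total_votes):
    """Return True if the online votes are more than 50% of all votes."""
    return 2 * online_votes > total_votes


def partition_has_quorum(partition, votes):
    """Check one side of a network split.

    `votes` maps every configured node to its vote count;
    `partition` lists the nodes that can still see each other.
    """
    total = sum(votes.values())
    online = sum(votes[node] for node in partition)
    return has_quorum(online, total)


# Ten nodes with one vote each: six votes are needed, so an even 5/5
# split leaves neither half with quorum.
votes = {f"node{i}": 1 for i in range(1, 11)}
half_a = [f"node{i}" for i in range(1, 6)]
half_b = [f"node{i}" for i in range(6, 11)]
print(partition_has_quorum(half_a, votes), partition_has_quorum(half_b, votes))
# prints: False False

# Giving node1 an additional vote (11 votes in total) unbalances the split:
votes["node1"] = 2
print(partition_has_quorum(half_a, votes), partition_has_quorum(half_b, votes))
# prints: True False
```

With the extra vote, the half containing node1 holds 6 of 11 votes and carries on, while the other half holds 5 and shuts its services down.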
But what happens when you only have a two-node cluster? A failure of either node would
cause the cluster to lose quorum, as both machines have equal votes. In this case, you need a
tie-breaker vote to determine the master.
There are two ways this can be done:
1. If using a SAN/NAS, the central storage device can be used to provide the tie-breaker
   vote by using a custom filesystem called a quorum disk.
2. Set up a heuristic test. This can be any program (typically something like ping) that
   can provide another vote if the program completes successfully. In a two-node
   cluster, the most likely candidate would be to ping the network gateway, or a remote
[Figure: six hosts (host1 to host6) connected via switch1 and switch2. After a network split, hosts 1-4 (Votes = 4) form an active cluster; hosts 5-6 (Votes = 2) get fenced by the others.]
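
The heuristic tie-breaker for a two-node cluster can be modelled in the same way. The sketch below assumes two nodes with one vote each plus one tie-breaker vote (three expected votes in total), and that a node may only claim the tie-breaker vote while its heuristic passes. The function names are hypothetical, the ping flags are those of the common Linux ping command, and 192.0.2.1 is a placeholder gateway address; real cluster suites implement this differently.

```python
import subprocess


def ping_ok(host):
    """Heuristic: True if a single ping to `host` succeeds (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def node_is_quorate(own_votes, heuristic, expected_votes=3):
    """Decide whether this node may keep running services.

    The node adds the tie-breaker vote only if `heuristic()` returns True,
    then applies the usual "more than 50% of expected votes" rule.
    """
    votes = own_votes + (1 if heuristic() else 0)
    return 2 * votes > expected_votes


# Example use on a node with one vote (placeholder gateway address):
# node_is_quorate(1, lambda: ping_ok("192.0.2.1"))
```

A node that has lost its network link fails the ping, stays at one vote out of three and stops, while a node that can still reach the gateway reaches two votes and continues as the master.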

5 Cluster management software
To control the cluster and the movement of services, a cluster management application is required.
Linux has two main options available: Linux-HA and Red Hat Cluster Suite.
Red Hat Cluster Suite is only found on Red Hat's distributions (Fedora, RHEL and other derivatives
such as CentOS), whereas Linux-HA is found in a wider range of distributions.
Most of the concepts and ideas are the same between these two solutions, so the knowledge gained
using one is likely to make it easy to use the other.

5.1 Red Hat Cluster Suite
Cluster Suite is designed for creating high-availability clusters, and the default steps to
configure a service will result in an HA service, depending on the number of nodes you put
into the failover domain.
5.1.1 MANAGEMENT AND CONFIGURATION

Configuration of the cluster is controlled by the /etc/cluster/cluster.conf file, which is an
XML format file. Once changed, you can run a command to redistribute the file to all the
other nodes in the cluster.
However, most people do configuration using either system-config-cluster (a GTK GUI
application) or luci, which is a web-based utility for cluster configuration.
The configuration of the cluster is broken into the same components regardless of the
configuration method chosen:

- Resources

  Resources are anything that makes up a service. For example, a resource may be:
  1. An IP address.
  2. A mount point.
  3. A system service (eg: httpd).

- Service

  A service is a group of resources that have been given a name. The service
  configuration allows you to set what order the resources are started/stopped in and gives
  you a name that you can use to control the service.

- Failover domains

  When a cluster node fails, the services that are running on it need to be migrated to
  other machines. The cluster management software will look at a list of other nodes
  called the failover domain, and will select one from the list to run the service.
  The list can also be prioritised if desired: both the order in which servers should be
  used in a failure, and whether or not the service should be moved back to a
  higher-priority node when one becomes available.
  In the event of the service running out of online hosts in the failover domain, the
  service will be stopped until a node comes back online.

- Fencing devices

  A fencing device is used to power off or reset unresponsive/crashed cluster nodes.
  This is typically something like a network-controlled power strip, or an out-of-band
  management card in the server.
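
The failover domain behaviour described above (a prioritised node list, optional failback, and stopping the service when no host is left) can be sketched as a small Python function. The function and parameter names are hypothetical and only illustrate the selection logic; they are not taken from the Cluster Suite configuration.

```python
def choose_node(failover_domain, online, current=None, failback=True):
    """Pick the node that should run a service.

    failover_domain: node names in priority order (highest priority first).
    online:          set of nodes that are currently up.
    current:         node running the service now, or None.
    failback:        move back to a higher-priority node when one returns.
    Returns a node name, or None if the service has to stay stopped.
    """
    candidates = [node for node in failover_domain if node in online]
    if not candidates:
        return None         # no online hosts left in the failover domain
    if current in candidates and not failback:
        return current      # stay put while the current node is healthy
    return candidates[0]    # highest-priority node that is online


domain = ["node1", "node2", "node3"]
print(choose_node(domain, {"node2", "node3"}, current="node1"))
# prints: node2
```

With failback disabled, a service that has moved to node2 stays there even after node1 returns, which avoids a second interruption at the cost of leaving the preferred node idle.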

5.1.2 LUCI AND RICCI

Luci does not have to be run on the cluster itself, although that is the recommended method,
as you can cluster luci and thus always be able to administer the cluster.
The ricci daemon runs on all the nodes and allows luci to communicate with the nodes to
configure them.
Luci is smart enough to install the packages it requires via yum on the nodes when you add
them to the cluster, which makes setup easier.
5.1.3 SYSTEM-CONFIG-CLUSTER

Luci seems to be replacing system-config-cluster as the favourite program to use, but at this
stage system-config-cluster is a capable GTK GUI application for cluster configuration.
You can run it on any cluster node; once you save your changes, you can then click a button
to send out the new configuration to all the cluster nodes.
5.1.4 LOAD BALANCING

Red Hat Cluster Suite is focused on providing HA servers and doesn't provide any special
features for doing load balancing.
However, you can set up load balancing by the following method:

- Set up a device on the network (eg: a reverse proxy or session-balancing application)
  that passes the session traffic to one of the cluster nodes. This device may be a
  hardware device or a standalone PC, maybe even a two-node cluster to ensure HA
  of the load balancer!
- Set up multiple services in the cluster suite, matching the number of nodes you want to
  load balance.
- Configure the services to belong to various failover domains. For example, you may
  not want to fail over some services as long as you run one instance at a minimum, so
  you set up one service with a failover domain and set up the rest of the services to only
  run on the one node.
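
As a rough illustration of what the balancing device in the first step does, the sketch below hands out cluster nodes in round-robin order and skips any node reported as down. It is a minimal model under stated assumptions: the RoundRobinBalancer class and the is_up callback are hypothetical, and a real reverse proxy or session balancer would also track sessions and run its own health checks.

```python
class RoundRobinBalancer:
    """Hand out cluster nodes in turn, skipping nodes that are down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.position = 0   # index of the next node to try

    def next_node(self, is_up):
        """Return the next healthy node, or None if every node is down."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self.position]
            self.position = (self.position + 1) % len(self.nodes)
            if is_up(node):
                return node
        return None


balancer = RoundRobinBalancer(["node1", "node2", "node3"])
healthy = {"node1", "node3"}   # pretend node2 has failed
print([balancer.next_node(lambda n: n in healthy) for _ in range(3)])
# prints: ['node1', 'node3', 'node1']
```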

6 Combining Xen with clusters
Virtualization technology is becoming increasingly popular due to the reduced costs and better
utilisation of hardware resources.
Linux has various solutions for virtualization; one popular option is Xen, which comes bundled
with a number of distributions. So the next step is to combine the advantages of Xen with the
advantages of clustering.
Whilst there are numerous ways this can be done (and the best solution will depend on what you
need), one basic method is to configure all the physical machines as Xen host servers and place
them into a cluster. (Ideally, the host servers should also be in the same Xen domain to allow live
migration of VMs from one server to another.)
The cluster management software can then be configured to treat virtual machines as services,
moving them between cluster nodes and restarting dead VMs on other machines.
However, this only provides basic failover services: a VM will only be moved to another host if the
whole VM becomes unavailable, or if the host node crashes.
For more fine-grained control, there are two options:

6.1 VMs as part of main cluster
Add all the virtual machines to the cluster as nodes, alongside the physical servers. The
services can be configured using failover domains to only fail over to virtual machines.
This means you only have one cluster, but it will introduce a lot more complexity into the
cluster configuration and administration.

6.2 VMs run as a separate cluster
The other option is to run the cluster software on the Xen VMs themselves. This can be
useful in that it allows you to configure multiple clusters on top of the main cluster, which
may be appealing to hosting providers who can offer customers their own private two-node
clusters.
This method is also useful for systems that intend to run a large number of services on top of a
single VM, and it provides the ability to migrate individual services from one VM to another.
The disadvantage is that more overhead is introduced by running the additional clustering
software on all the nodes, and it may become more time-consuming to manage.

7 Storage Management
Storage management may appear to be a separate topic, but it is in fact a very important part of a
cluster's design.
For a cluster, it is very important that data remains intact and accessible by all the nodes. One major
topic is the use of a cluster-capable filesystem such as GFS.
It is also important to choose the correct storage media for the cluster, taking future growth into
consideration. Is performance or reliability more important? Are all the nodes in the same premises,
or do you require a distributed storage solution that will work across the internet? Does the data
need to be replicated in real time between the nodes?
There is a variety of solutions available (such as SANs); however, all solutions fit into one of the
following three categories:
- Centralised storage.
- Distributed storage.
- Replicated storage (this can sometimes be a feature of either of the two categories above).

7.1 Centralised storage
Centralised storage involves having one or more devices providing storage to all the other
nodes. A typical example is a single array of disks such as a SAN or NAS, which all the nodes
connect to for storage.
Centralised storage solutions are often found in enterprise server installations, with many
medium to large organisations using something like a SAN for their data storage needs.
Centralised storage is popular for a number of reasons:
- It is more cost-effective to purchase a single array of disks than to purchase disks for each
  server.
- A central location allows for easier backups and mirroring.
- Easy to configure, easy to maintain: if you need to add more storage, there's only
  one device to upgrade.
However, there is a common problem with centralised storage: often there will be just a
single device providing the storage (often because the cost of purchasing redundant hardware
is too high; devices such as SANs are not cheap).
This introduces a single point of failure: if a hardware fault occurs in the device, it could
cripple the entire cluster, since all nodes rely on it. To deal with this, you either need to be
prepared for the possibility (and cost due to downtime) of a device failure or invest in
redundant hardware.

Centralised storage systems typically export the disk space as a block device which appears
on the nodes as a local disk, which then needs to be partitioned and formatted so you can
run whatever filesystem you wish on top of it.
This differs from network filesystems like NFS, which appear as mountable filesystems
and cannot be partitioned or have other filesystems on top of them.
7.1.1 SAN - STORAGE AREA NETWORK

A SAN is a hardware device consisting of a number of hard drives in RAID. The SAN is
then attached to each cluster node by fibre channel.
Advantages

- High-speed performance.
- Directly attached, so no issues due to network loss, congestion, etc.
- Tried and tested technology.

Disadvantages

- Expensive: every node needs to have a fibre channel communication card installed, and
  both the SAN and the fibre channel hardware are expensive.
- Limited scalability: the number of cluster nodes possible is limited by the number
  of interfaces on the SAN.
Redundancy

For proper redundancy, to prevent an outage in the event of a hardware failure, it is
necessary to purchase two SANs which are capable of mirroring each other, and to have
fibre channel cards in the servers capable of talking to both SANs.
Without this redundancy, all the work and resources put into developing a high-availability
cluster will be wasted when the SAN dies.

[Figure: host1 and host2 attached to a SAN or NAS, with replication to a hot spare.]

Ideal Use
Suitable for use in clusters where all the nodes are located on the same physical site, as well
as in clusters requiring maximum I/O performance.
However, for budget-conscious organisations, a NAS may be a better option.

7.1.2 NAS - NETWORK ATTACHED STORAGE

A NAS is a hardware device consisting of a RAID array of hard drives (like a SAN).
However, instead of using fibre channel, it connects to a standard ethernet network and
supports protocols like iSCSI or ATA over Ethernet.
It still appears as a block device, which is configured to appear as a real disk on the server.
Advantages
- Can be a lot cheaper than a SAN.
- No physical limit to the number of nodes possible (instead, limited by the number
  and speed of the ethernet interfaces and network).
- Commodity servers work out of the box.
- With high-speed networks like 10 gigabit ethernet, performance can surpass fibre channel
  SANs.
Disadvantages
- A network outage can prevent access to the storage. However, the risk of this can be
  reduced by running the NAS traffic on a network separate from the normal network
  (dedicated hardware, etc.).
- Some network storage protocols like iSCSI introduce overheads. For example, iSCSI
  is a TCP/IP protocol, which adds the overhead of TCP windowing, headers, etc. Other
  protocols like ATA over Ethernet add less overhead, and this should definitely be a factor
  to consider when evaluating NAS.
Redundancy
For proper redundancy, a second NAS should be purchased and made to mirror the primary
NAS. They can either be set to have a floating IP address, or both of them can have iSCSI
exports which are then multipathed on the nodes (so the node can choose which one to use).
If you require additional redundancy for the nodes, a second ethernet card can be installed
in each node and both cards can be put into bonded mode.
In bonded mode, both interfaces work together to provide one virtual interface; if one
interface fails, the other one will continue to work.
With many servers coming with dual ethernet interfaces out of the box, there is usually little
need for further hardware investment for the nodes.
Ideal Use
From a high-level view, a SAN and a NAS are very similar devices, so make your choice
based on what will give you the best value for your investment given your needs.
A budget-minded organisation may find a NAS over 100 Mbit/1000 Mbit ethernet provides the
best result, whereas others may find the performance of a SAN to be better.
Also consider future expandability: if you decide to grow the cluster in the future, a SAN
will probably be less expandable than a NAS. It's also easier to upgrade the speed of an
ethernet network with a faster switch, or more ethernet cards in the servers.
7.1.3 COMMODITY NETWORK FILE SHARE

Another option for centralised storage is the use of a commodity server with a large number
of hard drives using RAID.
This can be used in two different ways:
1. Using a block-sharing software solution such as iSCSI, ATAoE or GNBD, to
   effectively create a cheap NAS using standard computer components.
2. Running a network filesystem such as CIFS or NFS.
Advantages
- Cheap and simple. You can use a standard off-the-shelf computer from a local store to
  build this.
Disadvantages
- Does not provide full redundancy in the event of failure of components such as the
  motherboard or CPU. However, this can be offset using something like DRBD, which is
  covered further on.
- Usually less performance than a SAN or NAS, which is tuned to allow maximum IO.
- iSCSI software targets might not be supported by your Linux vendor and may suffer
  performance issues.
Redundancy
This tends to be limited to the hardware redundancy of the server. Typically a server will only have
disk redundancy (and ethernet via use of bonded interfaces), although some more expensive
models offer PSU and even CPU redundancy.
One solution that allows good redundancy is to run two identical servers with DRBD and
replicate the filesystems between the servers; effectively, both servers will have exactly the
same data on them.
Ideal Use
An iSCSI software target running on a server can provide a cheap NAS emulator for use in
development environments.
This solution could be used anywhere a NAS is; however, it will require careful tuning and
smart hardware purchases to get optimal performance. One example that will make a
difference is whether or not your ethernet cards have TCP offload engines, which will
increase performance when using a TCP-based storage service like iSCSI.

7.2 Access methods for centralised storage
Whilst centralised storage can sometimes provide the data via a network filesystem, it is
more common to use iSCSI or a SAN.
Both iSCSI and SAN provide access to the storage as if it were a local disk. It is then
necessary to run a cluster-capable filesystem, such as GFS, on top of them.
Note of interest: It is possible to have a non-clustered filesystem on a shared drive, but
seriously bad issues would occur if you accidentally mounted it in two places at one time.
7.2.1 ACCESSING SAN DIRECTLY

A decent SAN allows the administrator to split the SAN into a number of logical hard
drives, and then export only the desired drives to each node.
The drives appear on the node just like any locally connected SCSI/SATA drive.
7.2.2 ACCESSING NAS WITH iSCSI

Like a SAN, many NASes can be configured to split the storage into a number of logical
drives.
Because the NAS is not connected locally, it uses a TCP/IP protocol called iSCSI. This
means iSCSI can be routed, and even transferred over the internet (although the performance
of this would be terrible without a high-speed, low-latency link).
iSCSI is used by attaching an iSCSI target. Once connected, the iSCSI export appears just
like a local SCSI hard drive.
It is important to note that the naming of the drives may change, thus it is important to
use udev to ensure stable naming.
Further information about how to identify and name iSCSI devices using udev can be found
in the scsi_id man page.
7.2.3 ACCESSING NAS WITH OTHER PROTOCOLS

If you are using some other protocol, such as ATA over Ethernet, you will need to run
software on the nodes to make the NAS shares appear as block devices on the server, which
is similar to iSCSI in concept.

7.3 DistributedStorage
Distributedstoragetakesanotherapproachtothestoragemechanism,andinsteadofhavinga
centrallocationofdata,thedataisspreadacrossallthenodesinthecluster,oftenincluding
someformofredundancyinordertobeabletocopewiththefailureofanode.
Unfortunately,thisredundancycomesatacostdistributedstoragesolutionsarecomplex,
andhavetobeabletohandleissuessuchasthefailureofnodes,delaysinthenetwork
linkingthemachinesandlockingissues.
Note:YoumayhavecomeacrossDRBD,whichisatwonodeblockreplicationsolution.
Thisiscoveredinthenextsectionofthisdocument,underReplicatedStorage.
7.3.1 AFS - ANDREW FILE SYSTEM

AFS is a distributed filesystem which caches data locally on machines. Currently there are
two different implementations of AFS for Linux:
- OpenAFS (IBM Public License)
- AFS implementation in the vanilla kernel (under development)
The caching provides increased speed and limited offline access in the event of network
failures, but the servers do not replicate themselves (although that could be achieved with
DRBD).
However, due to the types of file locking used, it is not suitable for large shared databases,
and cannot handle a single file being updated by multiple clients.
AFS is better suited to services such as mail servers using maildir, where each email is
stored as an individual file.

7.3.2 LUSTRE

Lustre is a distributed filesystem suited to building massive (many thousands of nodes)
storage clusters.
Lustre is quite a complex technology to set up and unfortunately does not provide its own
data replication system. If data replication is required, then another technology like DRBD is
needed to perform the replication between individual nodes, which does limit the scalability.

[Figure: Lustre storage pool shared by host1, host2, host3 and host4]

7.3.3 CODA

Coda is an interesting filesystem with features for allowing offline data caching for client
computers, as well as server replication.
Unfortunately, Coda has only really been deployed in research situations and is therefore not
recommended for running in a production environment, but is worth a mention here.

7.4 Replicated Storage
Some distributed and some centralised storage systems have inbuilt methods for data
mirroring (eg: two SANs with hardware mirroring enabled). However, there are also
software solutions that run at the block level and which can mirror any filesystem on top of
them, the most popular one being Distributed Replicated Block Device (DRBD).
7.4.1 DISTRIBUTED REPLICATED BLOCK DEVICE (DRBD)

DRBD is commonly used to provide block level disk replication on two node clusters, by
mirroring the disks between the servers, ensuring they have identical data on them.
Unfortunately, DRBD suffers from the limitation of only supporting up to two nodes,
although there is a commercial closed source version released by the developers that
allows the addition of a third node.
This makes DRBD very useful for creating HA two node servers, but not useful for creating
a large shared storage area for large multi-node clusters.

[Figure: block level replication between sda2 on host1 and sda2 on host2]

DRBD is an ideal solution for a two node cluster that is geographically separated, such as
mail or web servers.

DRBD can be configured to work in one of two ways:
- Primary/Secondary: The storage device can only be mounted on one node (primary)
  at any time; the two nodes simply mirror the storage. In the event of the primary
  server going offline, the secondary server can become the primary. This is controlled
  by the cluster management software.
- Primary/Primary: In recent versions of DRBD, it is now possible for both nodes to
  run as primary so both nodes can read/write. This requires use of a cluster capable
  filesystem such as GFS to run on top of DRBD.
The method of writing can be configured: the default is to only count a write as complete
once both nodes have been written to, but other options can be chosen in order to improve
performance at the cost of reliability.
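The trade-off can be sketched with a small model. DRBD calls these write modes protocols
A, B and C. The Python below is only a simplified model of when each protocol reports a
write as complete; the class and function names are invented for illustration and are not
part of DRBD.

```python
from dataclasses import dataclass


@dataclass
class WriteState:
    """Progress of a single write (hypothetical model, not a DRBD structure)."""
    local_disk_done: bool   # data has reached the local disk
    sent_to_peer: bool      # data has been handed to the local TCP send buffer
    peer_received: bool     # peer node has the data in memory
    peer_disk_done: bool    # peer node has written the data to its disk


def write_complete(protocol: str, state: WriteState) -> bool:
    """Return True if the given protocol would treat the write as complete."""
    if protocol == "A":   # asynchronous: fastest, least safe
        return state.local_disk_done and state.sent_to_peer
    if protocol == "B":   # memory synchronous: peer has received the data
        return state.local_disk_done and state.peer_received
    if protocol == "C":   # synchronous: both disks have the data
        return state.local_disk_done and state.peer_disk_done
    raise ValueError(f"unknown protocol: {protocol!r}")


if __name__ == "__main__":
    in_flight = WriteState(True, True, False, False)
    for proto in "ABC":
        print(proto, write_complete(proto, in_flight))
```

Protocol C matches the behaviour described above (both nodes written). A and B acknowledge
earlier, so they are faster over slow links but can lose the most recent writes if the
primary fails at the wrong moment.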
DRBD is commonly used with the Linux HA cluster management software, however it is
possible to make it work with Redhat Cluster Suite by preparing a start/stop script for it.
Because DRBD only supports two nodes, in the event of requiring three or more nodes, there
are some methods that can be used to work around this limitation:
1. Set up two DRBD nodes that handle all the storage, and have all the other nodes connect
   to the two storage nodes using a network filesystem like NFS or some other protocol like
   ATA over Ethernet or GNBD (effectively creating your own replicated NAS device).
2. Set up all the nodes in pairs: each pair mirrors with DRBD and then runs a
   distributed filesystem such as Lustre or AFS on top of them. This will always have
   the weakness that the failure of both nodes in a pair would cause failure of the entire
   array, but otherwise the failure of any one node in any of the pairs will not disrupt the
   filesystem services.
3. Run DRBD on top of DRBD: this is not a recommendation, it is mentioned here
   because sometimes people do this. DON'T DO THIS. It introduces a huge number of
   problems and limitations, as well as the unknown stability of running DRBD on top
   of DRBD yet again.
4. Modify the DRBD codebase in order to add support for additional nodes. There does
   not appear to be any obvious theoretical reason why this wouldn't be possible; it
   should just be a case of adding additional nodes and perhaps applying modifications
   to the distributed lock manager to make it suitable for three nodes.
   There would obviously be more of a performance impact due to the increased amount of
   overhead for each node added, however as technology advances, this should become
   less of a problem.
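The failure behaviour of option 2 can be checked with a few lines of Python. This is a toy
model, under the assumption that the distributed filesystem needs at least one live node
from every DRBD pair; the node names are hypothetical.

```python
def array_available(pairs, failed):
    """Return True if every DRBD pair still has at least one working node.

    pairs  -- list of (node_a, node_b) tuples, one tuple per DRBD pair
    failed -- set of node names that are currently down
    """
    return all(not (a in failed and b in failed) for a, b in pairs)


if __name__ == "__main__":
    pairs = [("node1", "node2"), ("node3", "node4")]
    print(array_available(pairs, {"node1", "node3"}))  # one node lost in each pair
    print(array_available(pairs, {"node1", "node2"}))  # a whole pair lost
```

Losing one node from each pair leaves the array running, while losing both nodes of a
single pair takes it down, even though the same number of machines failed in both cases.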

7.5 Clustered Filesystems
When using a block level storage system like iSCSI, SAN, GNBD or DRBD, a cluster
capable filesystem should be used, in order to allow multiple nodes to read and write at the
same time.
A clustered filesystem differs from a conventional filesystem by including features to handle
file locking and journalling from multiple nodes.
7.5.1 CLVM

It is possible to run LVM on top of a centralised block storage device, by enabling clustered
locking in the LVM configuration and running the CLVM service together with CMAN for
clustering.
Once clustered LVM is enabled, it can be used in the exact same way as conventional LVM.
7.5.2 GFS - GLOBAL FILE SYSTEM

GFS is a clustered filesystem developed and supported by Redhat, and available on RHEL. It
is fully open source, and Redhat are currently working on getting it merged with the
mainstream kernel.
GFS has a number of powerful features that make it ideal for use in production clusters:
- Tried and tested technology, fully supported for customers of Redhat.
- Scales up to hundreds of cluster nodes.
- Supports extended access control lists.
- Supports user quotas.
- Dynamic symlinks (known as Context Dependent Path Names) which allow the
  symlink to point to different locations depending on various variables of the node
  using it. Ideal for node dependent configuration.
7.5.3 EXT4 (UNDER DEVELOPMENT)

Most people are familiar with ext3, which is the default Linux filesystem for almost all
distributions.
In 2007 development started on ext4, which will fix the limitations of ext3. One of the new
features that is being developed with this release is support for clustered filesystems.
However, it is likely that ext4 will not be ready for production use for a number of years and
is only mentioned here for the reader's interest.

8 Cluster Examples and Challenges
There are numerous ways you can configure a cluster, which will depend on your requirements
and budget. There are also some complex requirements if you wish to have virtualization with
the cluster, as well as when building geographically separated clusters.
This document details a number of examples of cluster designs that may be suitable for you,
to give you an understanding of what is possible, along with discussion of the problems and
limitations of each design.

8.1 Two node HA cluster with DRBD
A common high availability requirement is to make a particular server survive any hardware
failure. The solution to this is to add a second identical server and set up a HA cluster
between them.
To make a two node high availability cluster work, we have the following set of
requirements:
- Data must be accessible by both servers, with both servers being able to read/write at
  the same time.
- If an individual service dies, it should resume on the secondary server.
- If one server suffers a complete failure, the secondary server should resume all tasks.
Solution:
- Both servers identical hardware, running Redhat Cluster Suite.
- Local root and swap filesystems, remaining disk space turned into a DRBD block
  device set up with both nodes running as primary.
- DRBD device running GFS with journal space for 2 nodes, mounted on both nodes
  at the same time.
- Each service configured in cluster suite with a floating IP address. Any service that
  fails will relocate to the second node.
- In the event of a full server failure, the second node will resume all services.
Notes:

- DRBD is used primary/primary with GFS so that both servers can be running
  services at once. This may be undesirable if you only have one floating
  IP address; in this scenario you need to group all the services using that IP together.
  In that case, it would also be okay to run DRBD as primary/secondary with a
  traditional filesystem like ext3 or xfs and have all the services configured as a single
  resource, so that they all fail over together.

- In the event of any node failing, Redhat Cluster Suite will move all the services to
  the alternate node and switch DRBD from secondary to primary on the new master
  node.
- A network failure runs the risk of a split brain situation: if neither node can see
  the other, they will both try and become master. To fix this, run ping to a remote
  server to provide a tie breaking third vote (Quorum heuristics).
- If you have a SAN or NAS, instead of storing the data locally and replicating
  with DRBD, the attached storage device could be used instead.
- The SAN could be set up with a non-clustered or a clustered filesystem, the difference
  being that a clustered filesystem is required if you require both servers to be able to
  run services at once.
- If using a SAN, the cluster is scalable to more than two nodes, but the SAN could
  become a single point of failure for the cluster, and SANs are a lot more expensive than
  software solutions like DRBD.

[Figure: host1 and host2, each with a local sda disk mirrored by DRBD, attached to a switch
and a smart power switch and sharing a floating IP, with a remote host on the internet used
for the quorum ping]
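The tie breaking vote can be modelled as follows. This is a simplified sketch of the idea,
not how Redhat Cluster Suite implements quorum heuristics; it assumes one vote per node
plus one vote for a successful ping to the remote host, and the function name is invented
for the example.

```python
def node_may_run_services(peer_visible: bool, ping_ok: bool) -> bool:
    """Decide whether this node holds a majority of the three available votes.

    Votes (assumed weights): this node = 1, the peer node = 1, remote ping = 1.
    """
    total_votes = 3
    votes_seen = 1                      # a node always counts its own vote
    votes_seen += 1 if peer_visible else 0
    votes_seen += 1 if ping_ok else 0
    # Strict majority: more than half of all votes must be visible.
    return votes_seen * 2 > total_votes


if __name__ == "__main__":
    # The link between the nodes is cut; only one of them can still reach the remote host.
    print("node with working uplink:", node_may_run_services(False, True))
    print("isolated node:", node_may_run_services(False, False))
```

With three votes in play, at most one side of a network split can hold a majority, so only
that side carries on running services.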
Suitable Environments:
- Making any mission critical server HA.
- Any business or organisation that cannot tolerate hardware downtime of their
  production server.
- E-commerce servers that need to provide mail/websites/databases.
- Small hosting organisations (larger ones should use designs like the one below).

8.2 Five node HA cluster with DRBD
One of the problems with DRBD is that it only works for two node clusters. It is possible to
add a third node if the commercial version is purchased, but no DRBD solution exists which
can work with more than three nodes.
When building clusters, it is more economical to have a single multi-node cluster rather than
many two node clusters, as only one computer needs to be set aside for spare resources.
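The saving is easy to put into numbers. The sketch below assumes each workload needs one
machine's worth of capacity and that the cluster only has to survive a single machine
failure at a time; both assumptions, and the function name, are made up for this
illustration.

```python
def machines_needed(workloads: int) -> dict:
    """Compare machine counts for the two layouts under the stated assumptions."""
    if workloads < 1:
        raise ValueError("at least one workload is required")
    return {
        # Every workload gets its own active node plus a dedicated standby node.
        "two_node_clusters": 2 * workloads,
        # All workloads share a single spare node.
        "single_cluster": workloads + 1,
    }


if __name__ == "__main__":
    print(machines_needed(4))
```

For four workloads this gives eight machines when built as separate two node clusters,
against five machines in one cluster.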

Some installations use SANs, which limits the cluster size to the number of interfaces on the
SAN. However, SANs are very expensive and require special hardware.
A cheaper solution is to build two computers with plenty of storage in them using off the
shelf parts and then to use DRBD to create what is effectively a HA NAS. These two storage
nodes mirror each other and can transparently tolerate either one of the two nodes failing.
These two storage nodes can then export the available storage using a network filesystem
like NFS or a block level service like GNBD, which the rest of the nodes can use.
Depending on your applications, there may be no need to have local disks in any of the
servers and they can all run directly off the network.
Here is an example for a five node cluster using DRBD for storage, providing a range of
services such as HTTP and mail.
Solution:
- Two nodes with RAID 5 hard drive storage in each node: storage nodes.
- Three nodes with no disks: production nodes.
- Storage nodes are set up in Primary/Secondary mode, with LVM and ext3 on top of the
  DRBD layer with NFS exports of the data.
- Storage nodes provide user authentication via NIS/LDAP/Kerberos.
- Storage nodes provide pxelinux and DHCP for network booting.
- Quorum votes are set up in such a way that failure of both storage nodes will cause a
  cluster failure resulting in all services stopping.
- Production nodes boot off the active storage node using netboot and mount the root
  filesystem using NFS. All the production nodes run the same software build.
- Services are spread across the three production nodes: if any node fails, the services
  are resumed on another one.
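One way to get the quorum behaviour described in the list above is to give the storage
nodes more votes than the production nodes. The weights below (3 votes per storage node, 1
per production node) are hypothetical values chosen for this sketch, not figures from a
real deployment, and the node names are invented.

```python
# Hypothetical vote weights: storage nodes carry more votes than production nodes.
VOTES = {"storage1": 3, "storage2": 3, "prod1": 1, "prod2": 1, "prod3": 1}


def cluster_has_quorum(alive, votes=VOTES):
    """Return True if the nodes in `alive` hold a strict majority of all votes."""
    total = sum(votes.values())
    seen = sum(votes[node] for node in alive)
    return seen * 2 > total


if __name__ == "__main__":
    print(cluster_has_quorum({"prod1", "prod2", "prod3"}))              # both storage nodes down
    print(cluster_has_quorum({"storage1", "prod1", "prod2", "prod3"}))  # one storage node down
```

With these weights the total is 9 votes, so 5 are needed. The three production nodes alone
only hold 3 votes and the cluster stops, while losing a single storage node still leaves 6.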
Notes:

- The above design can also be used with a small two interface SAN. The SAN can be
  connected to both storage nodes instead of using local disks and the data then
  exported via NFS.
- To increase redundancy, two SANs could be used, with one connected to each storage
  server and mirroring done either between the SANs themselves or using DRBD on
  the storage nodes. However, standard hard drives will usually be cheaper and thus
  will probably be a better solution.

Suitable Environments:
- Ideal for hosting providers, in particular shared web hosting and email.
- Ideal for large companies to increase server availability and to centralise storage.

8.3 Five node HA cluster with DRBD + Xen
The five node DRBD cluster design above can be extended to become a HA Xen cluster. The
three production nodes can be configured as Xen hosts, with the Xen guests being booted
from the network and using NFS for storage (just like the hosts themselves).
Cluster Suite runs on the host nodes, and in the event of a Xen VM failing or the entire
host itself suffering a failure, Cluster Suite can start each Xen VM on an alternate host, with
each Xen VM being a cluster node and running one or more services.
The Xen hosts can be configured to be part of the same domain, which also allows live
migration of Xen VMs, so if one host server is being heavily loaded, some VMs can be
moved to another host whilst they are still running, with no downtime at all.
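To make the relocation step concrete, here is a small sketch of one possible placement
policy: when a host fails, each of its VMs is restarted on whichever surviving host is
running the fewest VMs. The policy, the function name and the host/VM names are
assumptions for illustration; Cluster Suite makes this decision according to its own
configuration.

```python
def relocate_vms(placement, failed_host):
    """Return a new placement with the failed host's VMs spread over the survivors.

    placement   -- dict mapping host name to a list of VM names
    failed_host -- name of the host that has gone down
    """
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving hosts to relocate to")
    for vm in placement.get(failed_host, []):
        # Pick the host with the fewest VMs; break ties by host name for determinism.
        target = min(survivors, key=lambda h: (len(survivors[h]), h))
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    before = {"host1": ["vm1", "vm2"], "host2": ["vm3"], "host3": []}
    print(relocate_vms(before, "host1"))
```

The same function could be used to plan a live migration away from an overloaded host by
treating that host as the one to be emptied.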
Notes:

- Optionally, instead of having the Xen VMs belong to the same cluster as the hosts,
  the Xen VMs could be set up in their own cluster, with the Xen hosts running a separate
  cluster of their own, leaving the host cluster to only deal with each Xen VM as a whole.
  See the Xen section of this document for details on advantages/disadvantages with this
  method.

Suitable Environments:
- Ideal for hosting providers.
- Ideal for large companies to increase server utilisation and availability.
- Ideal for IT companies that need a large number of servers for various application and
  development needs.

8.4 Geographically distributed clusters
All the cluster designs shown so far have been for use in one physical location: the cluster
nodes are all sitting in a rack, connected via ethernet, and are able to support floating IP
addresses because there is only one route into the cluster.
However, a common desire is to have geographically distributed clusters to prevent failure of
a single site taking the whole cluster offline.
Typical uses for this might be:
- A company with offices in two cities would like to have one server at each office
  with the data mirrored between them.
- An e-commerce website wanting replicated email, website and database services
  between two sites to ensure availability.
However, there are some big issues with a geographically separated cluster that need to be
solved:

- Internet connections are slow: data needs to be mirrored at both sites in a way that is
  bandwidth friendly, and the data transferred when changes are made needs to be minimal.
- Internet connections fail/have outages fairly frequently. Any solution must be able to
  handle this without split brain issues.
- DRBD is an ideal candidate for distributed clusters, but is only able to scale to two
  nodes (or three using the commercial version). This causes a complication for
  organisations with more than three offices that they want mirrored servers in.
- You can't float IP addresses around a country unless you're an ISP. But even they
  can't float an IP address to a server on the other side of the world.
- Fencing is more complicated: a secondary (independent) management network is
  useful in order to be able to communicate with fencing devices in other locations.
  Without a secondary connection, a crashed server cannot be fenced if the production
  network goes down, although a running server will stop clustering if it loses
  quorum, so split brain is not an issue.

How can we solve this? There are a number of solutions:

- Use a distributed, locally caching filesystem like AFS, which will cache commonly
  accessed data.
- Have two main offices running DRBD, with all other offices connecting using a
  network filesystem, perhaps assisted by a proxy/caching device for improving
  performance at the sites.
  Devices such as WAN accelerators can be used to make optimal use of network
  performance and some models have internal hard drives that can cache some data
  like HTTP or SMB traffic.

- Float DNS names rather than IP addresses. When a node which is providing public
  services (eg: websites) goes down, have the new node providing the service
  connect to the DNS server and change the A records.
  The problem with this method is that DNS changes can take some time to
  propagate across the internet. This can be reduced by setting your DNS Time To Live
  (TTL) to a low value (eg: 5 mins), but it may not be honoured by all DNS caching
  servers (although as a general rule it is).

  In addition, it is possible to set up Round Robin DNS servers, which can allow you to
  load balance between your geographically separated servers. This is good for
  services such as HTTP or read-only database requests.
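The A record change for a floating DNS name could be scripted with the nsupdate utility
that ships with BIND. The Python sketch below only builds the text of an update script;
the server name, host name, IP address and TTL are placeholder values, and a real setup
would also need the DNS server to be configured to accept authenticated dynamic updates.

```python
def build_nsupdate_script(server, name, new_ip, ttl=300):
    """Build nsupdate commands that repoint an A record at a new address.

    ttl defaults to 300 seconds (5 minutes) so that caches pick up a change quickly.
    """
    fqdn = name if name.endswith(".") else name + "."
    lines = [
        f"server {server}",
        f"update delete {fqdn} A",              # drop the record for the failed node
        f"update add {fqdn} {ttl} A {new_ip}",  # point the name at the new node
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; 192.0.2.10 is an address reserved for documentation.
    print(build_nsupdate_script("ns1.example.com", "www.example.com", "192.0.2.10"), end="")
```

The cluster's start script for a service would feed this text to nsupdate on standard
input once the service has been started on the new node.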
8.4.1 TWO NODE DISTRIBUTED CLUSTER

As discussed above, a two node distributed cluster is able to use DRBD data replication,
which solves the major challenge of mirroring the data.
This two node setup is an ideal solution for organisations which want an offsite mirror,
as it provides both redundancy and load balancing. It is also suitable
for organisations which have two offices and want identical servers at both locations.
This example assumes the following environment:
- Two geographically separated servers, identical hardware.
- Network between both servers controlled by an outside party (therefore floating IP
  addresses are not possible).
- No plans to scale beyond two nodes.
Requirements:
- Have a server at both locations.
- Both servers need to offer file sharing to the local networks at each office with
  Samba.
- Mail, DB and HTTP services required to be HA.
Solution:
- Servers set up with DRBD in primary/primary setup (both servers can read/write).
- GFS filesystem to allow both nodes simultaneous access to the data.
- Use ping to a remote server to determine a tie breaker third vote (Quorum heuristics).
- Floating DNS names for services.
- Cluster suite capable of migrating individual services.
- Mail, DB and HTTP services will only run on one server at any time (otherwise
  interesting lock file issues would occur).
- In the event of either server suffering a failure, the services will be relocated to the
  alternate server and the DNS records will be adjusted to redirect traffic.
- Samba runs on both servers providing access for the local network. In the event of a
  server failure, the DNS name for the local network server will be changed to be the
  alternate server.

[Figure: host1 and host2, each with a local sda disk and a smart power switch, running DRBD
across a VPN. Other labels: Internet, remote host for quorum ping, www.example.com (floating
A record), GSM out of band management network]

8.4.2 THREE+ NODE DISTRIBUTED CLUSTER

The two node distributed cluster detailed above will work fine with two nodes, but what
about when three or more nodes are required?
We want data replication between the servers, but DRBD will only work with two nodes in
primary/primary mode. This gives us two options:
1. Run DRBD between the two main servers, and the other servers can connect to one
   of the two servers via a network filesystem like NFS.
   This is quite simple, but has the obvious flaw of disk I/O being limited to the speed
   of the WAN connection, as well as the transfers causing data cap usage on
   non-flat-rate connections.
   Depending on your usage and requirements, this may or may not be a problem.
2. Run a distributed filesystem with replication support on all the nodes.
   Unfortunately there are currently no production-ready distributed filesystems available
   which can provide both replication and distributed data.
   Therefore, the next best solution is using a distributed filesystem, with DRBD
   underneath it to provide the redundancy.
   This is covered in more detail in the DRBD section earlier in this
   document, but basically you divide the cluster nodes into pairs.
   Each pair runs DRBD primary/primary to ensure data replication. On top of that, you
   run a distributed filesystem such as Lustre or AFS. However, this can be quite
   complex and has the weakness of being limited by having replication only on 2
   nodes.
Three+ node distributed clusters can be quite complex and require a lot of planning to make
sure they will work reliably and have speedy access to storage. The best solution will depend
greatly upon the applications you need to run.
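To judge whether the WAN limit in option 1 is a problem for you, a back of the envelope
calculation helps. The sketch below ignores protocol overhead and latency, so real
transfers will be slower; the file size and link speeds are example values only.

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_sec: float) -> float:
    """Best case time to move a file over a link, ignoring overhead and latency."""
    if link_megabits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    # 1 byte = 8 bits, so a megabyte takes 8 megabits of link capacity.
    return size_megabytes * 8 / link_megabits_per_sec


if __name__ == "__main__":
    for label, speed in [("10 Mbit/s WAN", 10), ("1000 Mbit/s LAN", 1000)]:
        print(f"100 MB over {label}: {transfer_seconds(100, speed):.1f} s")
```

A 100 MB file that takes under a second to read over a gigabit LAN needs at least 80
seconds over a 10 Mbit/s WAN link, which is why local caching or replication matters so
much for remote sites.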

9 Further Reference
The following resources are good further reading for information on setting up cluster
solutions:
Cluster Management:
- Redhat clustering guides (includes info on GFS)
  http://www.redhat.com/docs/manuals/csgfs/indexmaster.html
- Linux HA documentation
  http://www.linuxha.org/

Storage:
- DRBD (block layer replication)
  http://www.drbd.org/documentation.html
- AFS (distributed filesystem)
  http://en.wikipedia.org/wiki/Andrew_file_system
- Lustre (distributed filesystem)
  http://www.lustre.org
Training/Courses:
Additionally, if you are an RHCE, Redhat's RH436 training is a good course that teaches you
how to configure clusters and shared storage on RHEL with Redhat Cluster Suite.
https://www.redhat.com/courses/rh436_red_hat_enterprise_clustering_and_storage_management/
