Introduction to Linux Clustering
DOCUMENT RELEASE 1.1
Copyright 2008 Jethro Carr

This document may be freely distributed provided that it is not modified and that full credit is given to the original author.

If you publish this document anywhere, please do let me know via email, and if it is published in a physical medium, sending me a copy would be appreciated.

Email: jethro.carr@jethrocarr.com
Website: www.jethrocarr.com
Table of Contents

1 Introduction
2 About clusters
3 Advantages and reasons for clustering
4 Clustering fundamentals
  4.1 Basics
  4.2 Important clustering components
    4.2.1 Failover
    4.2.2 Fencing
    4.2.3 Split Brain
    4.2.4 Quorum
5 Cluster management software
  5.1 Redhat Cluster Suite
    5.1.1 Management and Configuration
    5.1.2 luci and ricci
    5.1.3 system-config-cluster
    5.1.4 Load balancing
6 Combining Xen with clusters
  6.1 VMs as part of main cluster
  6.2 VMs run as a separate cluster
7 Storage Management
  7.1 Centralised storage
    7.1.1 SAN - Storage Area Network
    7.1.2 NAS - Network Attached Storage
    7.1.3 Commodity Network File Share
  7.2 Access methods for centralised storage
    7.2.1 Accessing SAN directly
    7.2.2 Accessing NAS with iSCSI
    7.2.3 Accessing NAS with other protocols
  7.3 Distributed Storage
    7.3.1 AFS - Andrew File System
    7.3.2 Lustre
    7.3.3 Coda
  7.4 Replicated Storage
    7.4.1 Distributed Replicated Block Device (DRBD)
  7.5 Clustered Filesystems
    7.5.1 CLVM
    7.5.2 GFS - Global File System
    7.5.3 ext4 (under development)
8 Cluster Examples and Challenges
  8.1 Two node HA cluster with DRBD
  8.2 Five node HA cluster with DRBD
  8.3 Five node HA cluster with DRBD + Xen
  8.4 Geographically distributed clusters
    8.4.1 Two node distributed cluster
    8.4.2 Three+ node distributed cluster
9 Further Reference
1 Introduction

One of the oldest problems of computing is designing failure-proof computing systems. Over the years, many different methods have been developed.

Many of these you will be familiar with, including:

- Spare hardware - typically enough to mirror the production hardware.
- Fault tolerant hardware - by using a number of spare components.
- Software failover features built into individual programs.

Hardware solutions are typically expensive (and not always 100% reliable). Application-specific failover methods can often add maintenance hassles, as well as doing nothing to fix the problem that some of your programs may have no failover capabilities at all.

In recent years, a few new options have become available - in particular, Linux clustering and virtualization.

This document presents the key factors in implementing effective Linux clusters and design. Please note the following:

- Clustering is a relatively new topic to me, and I have not had a lot of experience deploying and maintaining clusters.
- This document is the result of my research into the options available and looks at what solutions could be developed. If you have experience with clusters and have found technologies that do or do not work well in practice, please supply me with feedback so I can extend this document and make it more useful.
- This document mentions Redhat Cluster Suite a bit, but doesn't go into details about the other major option, Linux-HA. However, most of the concepts and terminology are applicable to both solutions.
- This document does not cover the technical details of configuring clusters; it is more of a high-level design view.

This document will also cover the use of clustering together with Xen virtualization for maximum advantage.

The aim of this document is to provide you with an understanding of cluster solutions so that you are empowered to identify applicable technologies and decide on the best approach to use them.
2 About clusters

There are three main reasons to use clustering:

- Better performance.
- Fault tolerance through high availability services.
- Optimal usage of disk resources.

Often you will hear about high performance computing solutions using Linux clusters to create small supercomputers - such systems are usually referred to as Beowulf clusters. These systems typically run highly customised applications which are designed to run on multiple computer systems at once, and are beyond the scope of this document.

This document will cover the following:

- High availability clusters.
- Shared storage solutions.
- Other clustering considerations.
3 Advantages and reasons for clustering

Clustering provides a number of advantages over traditional standalone server configurations. First there are a number of obvious ones:

- High availability of system services.
  The cluster management software will handle service failures and will quickly bring up the service on alternate hardware.

- Better utilisation of system resources.
  Services can be spread around all the nodes - heavy nodes can be lightened by moving services away, and lightly loaded nodes can have more services started on them. When you combine Xen with clustering, even more options become available, as you can split one node into many dozens of nodes.

- Optimisation of disk resources.
  Rather than having lots of small local disks, storage can be centralised or distributed across all machines, making better use of the storage available.

There are also some other useful advantages:

- Usability of older hardware or whitebox hardware.
  IT departments like to purchase new servers from a big name vendor like IBM or Dell, who will then provide 5 years of hardware replacement and service. However, once this 5 year period is over, the hardware is no longer supported and the customer either has to carry their own spare parts (which can be too expensive for a small organisation) or upgrade to newer hardware, which requires time, money and effort to migrate all the data and applications.
  With clustering, if the hardware fails, another node will take up the work, so companies can build their IT infrastructure around older hardware or custom built hardware, saving a lot of money. Some hosting providers may even decide to leave old hosts in the cluster and just keep adding new ones, only pulling out the old ones once they fail, or once they reach an age where they are uneconomical to keep running.

- Faster, more efficient system administration.
  In server farms where all the servers are configured individually, it can often be quite hard to migrate a service from one computer to another, which can sometimes be required due to security or performance reasons.
  However, in a clustered environment, all the servers are typically identical. Plus, any service that has been set up in the cluster can be moved from one host to another by the simple execution of a single command.
4 Clustering fundamentals

4.1 Basics

High availability clustering is a complex topic, and it is important to fully understand the key concepts behind it.

The basics are simple:

- Each computer is called a node.
- Two or more nodes form a cluster.
- In the event of a failure of any one of the nodes, the remaining nodes will take up the work being performed by the dead node.

What makes clustering complex is how the cluster handles node failures, shared disk storage and situations such as split brain.

A cluster typically works as follows:

- All the nodes run the cluster management software (eg: Linux-HA or Redhat Cluster Suite), which controls things such as heartbeats, application starting/stopping and keeping quorum (more on this later).
- One of the nodes runs an administration application that allows you to manually add/remove nodes and provides the ability to manually move applications from one node to another.
- In the event of a failure with a node, the other nodes fence it, which involves using a hardware device like a managed power switch to physically turn the node off. This is done to prevent the node from writing to any of the storage devices and corrupting the data.
- The other nodes then decide which node should run the applications that were on the dead node, and one of the nodes will be chosen (depending on the config) to start up the application.
- All the cluster nodes have access to a central storage array (eg: a SAN or network attached storage). This storage location runs a clustered filesystem which allows all the nodes to read and write at the same time.
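The failure handling steps above can be sketched in a few lines of Python. This is a minimal illustration only, not a real cluster manager API - all class and function names here are hypothetical, and a real implementation (eg: rgmanager) drives this from heartbeat timeouts, fencing agents and quorum checks:

```python
# Sketch of the failure-handling sequence described above.
# Names are illustrative only - not a real cluster manager API.

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.services = set()

def handle_failure(dead, nodes, placement):
    """Fence a dead node, then restart its services elsewhere."""
    dead.alive = False            # stand-in for a real power-off (fencing)
    survivors = [n for n in nodes if n.alive]
    for service in sorted(dead.services):
        # the least-loaded survivor is chosen here; a real cluster
        # consults the failover domain configuration instead
        target = min(survivors, key=lambda n: len(n.services))
        target.services.add(service)
        placement[service] = target
    dead.services.clear()

# Usage: three nodes, one dies, its services move to the survivors.
a, b, c = Node("a"), Node("b"), Node("c")
a.services = {"httpd", "mysqld"}
placement = {"httpd": a, "mysqld": a}
handle_failure(a, [a, b, c], placement)
print({s: n.name for s, n in sorted(placement.items())})
# {'httpd': 'b', 'mysqld': 'c'}
```

Note the ordering: the dead node is fenced before its services are restarted elsewhere, for the reasons covered in the fencing section below.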
4.2 Important clustering components

4.2.1 Failover

There are three types of failover methods.
1. Hot failover
   In a hot failover, the application is written specially for clustering and is able to continue running on another node without any interruption to client services.
   Hot failover is not often found in commonly used applications, and is usually found in specially written programs for banking or telco situations.

2. Warm failover
   Warm failover is what solutions like Linux-HA and Redhat Cluster Suite provide - the application doesn't have an instant recovery feature, but the cluster suite quickly restarts the application on another piece of running hardware with minimal client disruption.
   Using a cluster management solution, the application does not need to be written to support clustering, so you can provide redundancy for any service you desire.
   This can sometimes cause a small outage, as some applications can't tolerate the change of the server in the background. In other cases, the application is able to continue on with no interruption to the client users, with the exception of a bit of a delay (eg: NFS is good at not being affected by a server change behind the scenes).

3. Cold failover
   Cold failover is commonly used as a solution for redundancy where a cluster was not able to be set up. In a cold failover, the dead computer needs to be powered down and a spare computer started up. This is usually a manual process.
4.2.2 Fencing

When a node crashes or becomes unresponsive, it MUST be quickly powered off or blocked from the storage device (fencing).

Fencing is required because if the cluster assumes the node has crashed and reallocates its services and I/O, the server waking up again could cause havoc and possibly disk corruption.

Therefore a fencing device must be available so that the cluster can do STONITH - Shoot The Other Node In The Head - by doing an instant power-off of the node.

Various devices exist - smart power switches are usually used, and scripts exist in Redhat Cluster Suite that can connect to a number of commonly available devices to shut down servers.

Other fencing devices include fencing at the SAN level as well as Xen VM fencing; however it is recommended that power fencing be used rather than SAN fencing, as it will guarantee that the node is completely killed and not doing anything unwanted.
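In Redhat Cluster Suite, fencing devices and each node's fencing method are declared in the cluster configuration file. A sketch of what this looks like for a managed power switch - the device name, IP address, credentials and port number below are hypothetical, while fence_apc is one of the agents bundled with Cluster Suite for APC switches:

```xml
<!-- Fragment of a cluster configuration - illustrative values only -->
<clusternodes>
  <clusternode name="node1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc-switch" port="1"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- a managed power switch, driven by the fence_apc agent -->
  <fencedevice name="apc-switch" agent="fence_apc"
               ipaddr="10.0.0.50" login="fenceuser" passwd="example"/>
</fencedevices>
```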
4.2.3 Split Brain

Split brain is a nasty problem found in clustering, and requires careful thought to prevent.

Consider the following scenario:

1. A two node cluster exists - one server in city A, one server in city B.
2. The internet link between the two cities falls over. Neither server can contact the other.
3. Each server assumes the other one is down, and both resume activities as the master.
4. When the link comes back online, data corruption occurs.

Other nasty problems can occur if the two nodes are still able to fence each other via the out-of-band management system, as you may end up with each node repeatedly powering off the other node.

To prevent this from happening, we have a solution called quorum.
4.2.4 Quorum

Quorum is effectively a scoring method, where each node in the cluster has a number of votes (by default, one). Each online cluster node adds its votes to the quorum count, and as long as the quorum count is larger than 50% of the combined votes, the cluster is intact.

If the cluster falls below this threshold, it has lost quorum, and all services will shut down and become unavailable.

This is actually a desired feature. Consider a cluster with 10 servers where, to maintain quorum, a score of 6 is needed. In the event of the network suffering a failure and causing the cluster to split into two, the smaller half will shut down and the larger half will continue on. This prevents split brain in clusters.

What about situations where there are an even number of nodes, such as 10 nodes? It would be possible for the network to split into two equally sized clusters. Therefore, any cluster with an even number of nodes requires at least one of the nodes to have an additional vote in the quorum to unbalance the quorum voting.

Effectively, quorum allows you to make an even cluster uneven when it comes to failovers, so that split brain will not occur.
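The voting arithmetic above is simple enough to sketch directly (a hypothetical helper, not a real cluster API):

```python
# Quorum check as described above: a partition keeps quorum only if
# its votes are more than half of the total votes in the cluster.

def has_quorum(partition_votes, total_votes):
    return partition_votes > total_votes / 2

# 10 nodes with one vote each, split 6/4 by a network failure:
print(has_quorum(6, 10))   # True  - the larger half stays up
print(has_quorum(4, 10))   # False - the smaller half shuts down

# An even 5/5 split would leave BOTH halves without quorum, which is
# why one node is given an extra vote (total becomes 11):
print(has_quorum(5, 10), has_quorum(6, 11))   # False True
```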
But what happens when you only have a two node cluster? A failure of either node would cause the cluster to lose quorum, as both machines have equal votes. In this case, you need a tiebreaker vote to determine the master.

There are two ways this can be done:

1. If using a SAN/NAS, the central storage device can be used to provide the tiebreaker vote by using a custom filesystem called a quorum disk.
2. Set up a heuristics test. This can be any program (typically something like ping) that can provide another vote if the program completes successfully. In a two node cluster, the most likely candidate would be to ping the network gateway, or a remote switch.

[Figure: a six node cluster split across two switches. Hosts 1-4 behind switch1 (votes = 4) form an active cluster; hosts 5-6 behind switch2 (votes = 2) get fenced by the others.]
5 Cluster management software

To control the cluster and the movement of services, a cluster management application is required. Linux has two main options available - Linux-HA and Redhat Cluster Suite.

Redhat Cluster Suite is only found on Redhat's distributions - Fedora, RHEL and derivatives such as CentOS - whereas Linux-HA is found in a wider range of distributions.

Most of the concepts and ideas between these two solutions are the same, so the knowledge gained using one is likely to make it easy to use the other.

5.1 Redhat Cluster Suite

Cluster Suite is designed for creating high availability clusters, and the default steps to configure a service will result in an HA service, depending on the number of nodes you put into the failover domain.
5.1.1 Management and Configuration

Configuration of the cluster is controlled by the /etc/cluster/cluster.conf file, which is an XML format file. Once changed, you can run a command to redistribute the file to all the other nodes in the cluster.

However, most people do configuration using either system-config-cluster (a GTK GUI application) or luci, which is a web based utility for cluster configuration.

The configuration of the cluster is broken into the same components regardless of the configuration method chosen:

- Resources
  Resources are anything that makes up a service. For example, a resource may be:
  1. An IP address.
  2. A mount point.
  3. A system service (eg: httpd).

- Services
  A service is a group of resources that have been given a name. The service configuration allows you to set what order the resources are started/stopped in, and gives you a name that you can use to control the service.
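As a sketch, a simple web service grouping the three example resource types above might be declared like this in the XML configuration (the names, device, mount point and address are all hypothetical):

```xml
<!-- Fragment of /etc/cluster/cluster.conf - illustrative values only -->
<rm>
  <resources>
    <ip address="10.0.0.100" monitor_link="1"/>
    <fs name="webdata" device="/dev/sdb1" mountpoint="/var/www" fstype="ext3"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </resources>
  <!-- the service groups the resources and names them as one unit -->
  <service name="webserver" autostart="1" domain="webfarm">
    <ip ref="10.0.0.100"/>
    <fs ref="webdata"/>
    <script ref="httpd"/>
  </service>
</rm>
```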
- Failover domains
  When a cluster node fails, the services that are running on it need to be migrated to other machines. The cluster management software will look at a list of other nodes, called the failover domain, and will select one from the list to run the service.
  The list can also be prioritised if desired - both the order of which servers should be used in a failure, as well as whether or not the service should be moved back to a higher priority node when one becomes available.
  In the event of the service running out of online hosts in the failover domain, the service will be stopped until a node comes back online.

- Fencing devices
  A fencing device is used to power off or reset unresponsive/crashed cluster nodes. This is typically something like a network controlled power strip, or an out-of-band management card in the server.
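The failover domain selection described above can be sketched as follows. This is illustrative only - the real behaviour in Cluster Suite also depends on the domain's ordered/restricted flags and the failback setting:

```python
# Pick a node from a prioritised failover domain, as described above.
# Lower priority number = more preferred node.

def select_node(domain, online):
    """domain: list of (priority, node_name); online: set of node names."""
    candidates = [(prio, name) for prio, name in domain if name in online]
    if not candidates:
        return None          # service stops until a node comes back
    return min(candidates)[1]

webfarm = [(1, "node1"), (2, "node2"), (3, "node3")]
print(select_node(webfarm, {"node1", "node2", "node3"}))  # node1
print(select_node(webfarm, {"node2", "node3"}))           # node1 dead -> node2
print(select_node(webfarm, set()))                        # None: no hosts left
```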
5.1.2 luci and ricci

luci does not have to be run on the cluster itself, although that is the recommended method, as you can cluster luci and thus always be able to administer the cluster.

The ricci daemon runs on all the nodes and allows luci to communicate with the nodes to configure them.

luci is smart enough to install the packages it requires via yum on the nodes when you add them to the cluster, which makes setup easier.
5.1.3 system-config-cluster

luci seems to be replacing system-config-cluster as the favourite program to use, but at this stage system-config-cluster is a capable GTK GUI application for cluster configuration. You can run it on any cluster node; once you save your changes, you can then click a button to send out the new configuration to all the cluster nodes.
5.1.4 Load balancing

Redhat Cluster Suite is focused on providing HA services and doesn't provide any special features for doing load balancing.

However, you can set up load balancing with the following method:

- Set up a device on the network (eg: a reverse proxy or session balancing application) that passes the session traffic to one of the cluster nodes. This device may be a hardware device or a standalone PC - maybe even a two node cluster to ensure HA of the load balancer!
- Set up multiple services in the cluster suite for the number of nodes you want to load balance.
- Configure the services to belong to various failover domains - for example, you may not want to fail over some services as long as you run one instance at a minimum, so you set up one service with a failover domain and set up the rest of the services to only run on the one node.
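The balancing device itself sits outside Cluster Suite; conceptually it just rotates incoming sessions across the per-node service addresses. A trivial round-robin sketch (the addresses are hypothetical):

```python
from itertools import cycle

# Round-robin selection across the load balanced cluster services.
backends = cycle(["10.0.0.101", "10.0.0.102", "10.0.0.103"])

first_four = [next(backends) for _ in range(4)]
print(first_four)   # wraps back to the first backend on the 4th session
```

A real balancer would additionally health-check each backend and skip addresses whose cluster service is currently failed over or stopped.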
6 Combining Xen with clusters

Virtualization technology is becoming increasingly popular due to the reduced costs and better utilisation of hardware resources.

Linux has various solutions for virtualization - one popular option is Xen, which comes bundled with a number of distributions. So the next step is to combine the advantages of Xen with the advantages of clustering.

There are numerous ways this can be done, and the best solution will depend on what you need. One basic method is to configure all the physical machines as Xen host servers and place them into a cluster. (Ideally, the host servers should also be in the same Xen domain to allow live migration of VMs from one server to another.)

The cluster management software can then be configured to treat virtual machines as services - moving them between cluster nodes and restarting dead VMs on other machines.

However, this only provides basic failover services - a VM will only be moved to another host if the whole VM becomes unavailable, or if the host node crashes. For more fine-grained control, there are two options:

6.1 VMs as part of main cluster

Add all the virtual machines to the cluster as nodes, alongside the physical servers. The services can be configured using failover domains to only fail over to virtual machines.

This means you only have one cluster, but it will introduce a lot more complexity into the cluster configuration and administration.

6.2 VMs run as a separate cluster

The other option is to run the cluster software on the Xen VMs themselves. This can be useful in that it allows you to configure multiple clusters on top of the main cluster, which may be appealing to hosting providers who can offer customers their own private two node clusters.

This method is also useful for systems that intend to run a large number of services on top of a single VM, and provides the ability to migrate individual services from one VM to another.

The disadvantage is that more overhead is introduced by running the additional clustering software on all the nodes, and it may become more time consuming to manage.
7 Storage Management

Storage management may appear to be a separate topic, but it is in fact a very important part of a cluster's design.

For a cluster, it is very important that data remains intact and accessible by all the nodes. One major topic is the use of a cluster-capable filesystem such as GFS.

It is also important to choose the correct storage media for the cluster, taking future growth into consideration. Is performance or reliability more important? Are all the nodes in the same premises, or do you require a distributed storage solution that will work across the internet? Does the data need to be replicated in real time between the nodes?

There is a variety of solutions available (such as SANs), however all solutions fit into one of the following three categories:

- Centralised storage.
- Distributed storage.
- Replicated storage (this can sometimes be a feature of either of the two categories above).

7.1 Centralised storage

Centralised storage involves having one or more devices providing storage to all the other nodes. A typical example is a single array of disks, such as a SAN or NAS, which all the nodes connect to for storage.

Centralised storage solutions are often found in enterprise server installations, with many medium to large organisations using something like a SAN for their data storage needs.

Centralised storage is popular for a number of reasons:

- More cost effective to purchase a single array of disks than purchasing disks for each server.
- A central location allows for easier backups and mirroring.
- Easy to configure, easy to maintain - if you need to add more storage, there's only one device to upgrade.

However, there is a common problem with centralised storage: often there will be just a single device providing the storage (often due to the cost of purchasing redundant hardware being too high - devices such as SANs are not cheap).

This introduces a single point of failure - if a hardware fault occurs in the device, it could cripple the entire cluster, since all nodes rely on it. To prevent this, you either need to be prepared for the possibility (and cost due to downtime) of a device failure, or invest in redundant hardware.

Centralised storage systems typically export the disk space as a block device, which appears on the nodes as a local disk and then needs to be partitioned and formatted so you can run whatever filesystem you wish on top of it.

This differs from network filesystems like NFS, which appear as mountable filesystems and cannot be partitioned or have other filesystems on top of them.
7.1.1 SAN - Storage Area Network

A SAN is a hardware device consisting of a number of hard drives in RAID. The SAN is then attached to each cluster node by fibre channel.

Advantages
- High speed performance.
- Directly attached, so no issues due to network loss, congestion, etc.
- Tried and tested technology.

Disadvantages
- Expensive - every node needs to have a fibre channel communication card installed, and both the SAN and the fibre channel hardware are expensive.
- Limited scalability - the number of cluster nodes possible is limited by the number of interfaces on the SAN.

Redundancy
For proper redundancy - to prevent an outage in the event of a hardware failure - it is required to purchase two SANs which are capable of mirroring each other, and to have fibre channel cards in the servers capable of talking to both SANs. Without this redundancy, all the work and resources put into developing a high availability cluster will be wasted when the SAN dies.

[Figure: host1 and host2 attached to a SAN or NAS, with replication to a hot spare storage device.]

Ideal Use
Suitable for use in clusters where all the nodes are located on the same physical site, as well as being suitable for use in clusters requiring maximum I/O performance.

However, for budget conscious organisations, a NAS may be a better option.
7.1.2 NAS - Network Attached Storage

A NAS is a hardware device consisting of a RAID array of hard drives (like a SAN). However, instead of using fibre channel, it connects to a standard ethernet network and supports protocols like iSCSI or ATA-over-Ethernet.

It still appears as a block device, which is configured to appear as a real disk on the server.

Advantages
- Can be a lot cheaper than a SAN.
- No physical limit to the number of nodes possible (instead, limited by the number and speed of the ethernet interfaces and network).
- Commodity servers work out of the box.
- With high speed networks like 10-gigabit ethernet, performance can surpass fibre channel SANs.

Disadvantages
- A network outage can prevent access to the storage. However, the risk of this can be reduced by running the NAS traffic on a separate network to the normal network (dedicated hardware, etc.).
- Some network storage protocols like iSCSI introduce overheads - for example, iSCSI is a TCP/IP protocol, which adds the overhead of TCP windowing, headers, etc. Other protocols like ATA-over-Ethernet add less overhead, and this should definitely be a factor to consider when evaluating NAS.

Redundancy
For proper redundancy, a second NAS should be purchased and made to mirror the primary NAS. They can either be set to have a floating IP address, or both of them can have iSCSI exports which are then multipathed on the nodes (so the node can choose which one to use).

If you require additional redundancy for the nodes, a second ethernet card can be installed into the nodes and both cards can be put into bonded mode. In bonded mode, both interfaces work together to provide one virtual interface - if an interface fails, the other one will continue to work.

With many servers coming with dual ethernet interfaces out of the box, there is usually little need for further hardware investment for the nodes.
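On a Redhat-style distribution, bonded mode is configured roughly as follows. This is a sketch using RHEL5-era file locations; the bonding mode and interface names will vary by setup:

```
# /etc/modprobe.conf - load the bonding driver (sketch only)
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-eth0 - enslave eth0 to bond0
# (a matching ifcfg-eth1 enslaves the second card the same way)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
```

Here mode=active-backup matches the failover behaviour described above (one interface carries traffic, the other takes over on failure); other modes can aggregate bandwidth instead.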
Ideal Use
From a high level view, a SAN and a NAS are very similar devices, so make your choices based on what will give you the best value for your investment based on your needs.

A budget minded organisation may find a NAS over 100mbit/1000mbit ethernet provides the best result, whereas others may find the performance of a SAN to be better.

Also consider future expandability - if you decide to grow the cluster in the future, a SAN will probably be less expandable than a NAS. It's also easier to upgrade the speed of an ethernet network with a faster switch, or more ethernet cards in the servers.
7.1.3 Commodity Network File Share

Another option for centralised storage is the use of a commodity server with a large number of hard drives in RAID.

This can be used in two different ways:

1. Using a block sharing software solution such as iSCSI, ATAoE or GNBD, to effectively create a cheap NAS using standard computer components.
2. Running a network filesystem such as CIFS or NFS.

Advantages
- Cheap and simple. You can use a standard off-the-shelf computer from a local store to build this.

Disadvantages
- Does not provide full redundancy in the event of failure of components such as the motherboard or CPU. However, this can be offset using something like DRBD, which is covered further on.
- Usually less performance than a SAN or NAS, which is tuned to allow maximum I/O.
- iSCSI software targets might not be supported by your Linux vendor and may suffer performance issues.

Redundancy
This tends to be limited to the hardware redundancy of the server. Typically a server will only have disk redundancy (and ethernet, via use of bonded interfaces), although some more expensive models offer PSU and even CPU redundancy.

One solution that allows good redundancy is to run two identical servers with DRBD and replicate the filesystems between the servers - effectively both servers will have exactly the same data on them.

Ideal Use
An iSCSI software target running on a server can provide a cheap NAS emulator for use in development environments.

This solution could be used anywhere a NAS is, however it will require careful tuning and smart hardware purchases to get optimal performance. One example that will make a difference is whether or not your ethernet cards have TCP offload engines, which will increase performance when using a TCP based storage service like iSCSI.
7.2 Access methods for centralised storage

Whilst centralised storage can sometimes provide the data via a network filesystem, it is more common to use iSCSI or a SAN.

Both iSCSI and SAN provide access to the storage as if it was a local disk. It is then necessary to run a cluster-capable filesystem on top of them, such as GFS.

Note of interest: it is possible to have a non-clustered filesystem on a shared drive, but seriously bad issues would occur if you accidentally mounted it in two places at one time.
7.2.1 Accessing SAN directly

A decent SAN allows the administrator to split the SAN into a number of logical hard drives, and then export only the desired drives to each node.

The drives appear on the node just like any locally connected SCSI/SATA drive.
7.2.2 Accessing NAS with iSCSI

Like a SAN, many NASes can be configured to split the storage into a number of logical drives.

Because the NAS is not connected locally, it uses a TCP/IP protocol called iSCSI. This means iSCSI can be routed, and even transferred over the internet (although the performance of this would be terrible without a high speed, low latency link).

iSCSI is used by attaching to an iSCSI target. Once connected, the iSCSI export appears just like a local SCSI hard drive.

It is important to note that the naming of the drives may change, thus it is important to use udev to ensure stable naming.

Further information about how to identify and name iSCSI devices using udev can be found in the scsi_id man page.
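For example, a udev rule keyed on the serial number reported by scsi_id can pin a stable name to the iSCSI disk. This is a sketch only - the rule syntax differs between udev versions, and the WWID and symlink name shown here are made up:

```
# /etc/udev/rules.d/20-iscsi-names.rules (sketch; WWID is hypothetical)
KERNEL=="sd*", SUBSYSTEMS=="scsi", \
  PROGRAM=="/sbin/scsi_id -g -u -s /block/%k", \
  RESULT=="360a98000486e2f66d", SYMLINK+="iscsi/shared0"
```

The node can then always mount /dev/iscsi/shared0 regardless of which sdX name the kernel assigned after this boot.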
7.2.3 Accessing NAS with other protocols

If you are using some other protocol, such as ATA-over-Ethernet, you will need to run software on the nodes to make the NAS shares appear as block devices on the server, which is similar to iSCSI in concept.
7.3 Distributed Storage

Distributed storage takes another approach to the storage mechanism: instead of having a central location for the data, the data is spread across all the nodes in the cluster, often including some form of redundancy in order to be able to cope with the failure of a node.

Unfortunately, this redundancy comes at a cost - distributed storage solutions are complex, and have to be able to handle issues such as the failure of nodes, delays in the network linking the machines, and locking issues.

Note: you may have come across DRBD, which is a two node block replication solution. This is covered in the next section of this document, under Replicated Storage.
7.3.1 AFS - Andrew File System

AFS is a distributed filesystem which caches data locally on machines. Currently there are two different implementations of AFS for Linux:

- OpenAFS (IBM Public License).
- The AFS implementation in the vanilla kernel (under development).

The caching provides increased speed and limited offline access in the event of network failures, but the servers do not replicate themselves (although that could be achieved with DRBD).

However, due to the types of file locking used, it is not suitable for large shared databases, and cannot handle a single file being updated by multiple clients.

AFS was designed to run services such as mail servers using maildir, where each email is stored as an individual file.
7.3.2 LUSTRE

Lustre is a distributed filesystem suited to creating massive (many thousands of nodes) distributed filesystems.

Lustre is quite a complex technology to set up, and unfortunately does not provide its own data replication system. If data replication is required, then another technology like DRBD is needed to perform the replication between individual nodes, which does limit the scalability.
[Diagram: a Lustre storage pool spanning host1, host2, host3 and host4]
7.3.3 CODA

Coda is an interesting filesystem with features for allowing offline data caching for client computers, as well as server replication.

Unfortunately, Coda has only really been deployed in research situations and is therefore not suitable for running in a production environment, but is worth a mention here.
7.4 Replicated Storage

Some distributed and some centralised storage systems have inbuilt methods for data mirroring (eg: two SANs with hardware mirroring enabled). However, there are also software solutions that run at the block level and which can mirror any filesystem on top of them, the most popular one being Distributed Replicated Block Device (DRBD).
7.4.1 DISTRIBUTED REPLICATED BLOCK DEVICE (DRBD)

DRBD is commonly used to provide block level disk replication on two node clusters, by mirroring the disks between the servers, ensuring they hold identical data.

Unfortunately, DRBD suffers from the limitation of only supporting up to two nodes, although there is a commercial closed source version released by the developers that allows the addition of a third node.

This makes DRBD very useful for creating HA two node servers, but not useful for creating a large shared storage area for large multi node clusters.

[Diagram: block level replication of sda2 between host1 and host2]

DRBD is an ideal solution for a two node cluster that is geographically separated, such as mail or web servers.
DRBD can be configured to work in one of two ways:

- Primary/Secondary: The storage device can only be mounted on one node (the primary) at any time; the two nodes simply mirror the storage. In the event of the primary server going offline, the secondary server can become the primary. This is controlled by the cluster management software.

- Primary/Primary: In recent versions of DRBD, it is now possible for both nodes to run as primary, so both nodes can read/write. This requires a cluster capable filesystem such as GFS to run on top of DRBD.

The method of writing can be configured: the default is to only count a write as complete once both nodes have been written to, but other options can be chosen in order to improve performance at the cost of reliability.
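To illustrate, a minimal drbd.conf resource for a primary/primary setup might look like the following sketch. The hostnames, IP addresses and backing partition are hypothetical; protocol C is the default synchronous mode, while protocols A and B trade reliability for performance:

```
resource r0 {
  protocol C;                  # write completes only once both nodes have it
  net {
    allow-two-primaries;       # required for primary/primary operation
  }
  on host1 {
    device    /dev/drbd0;
    disk      /dev/sda2;       # local backing partition
    address   192.168.1.1:7789;
    meta-disk internal;
  }
  on host2 {
    device    /dev/drbd0;
    disk      /dev/sda2;
    address   192.168.1.2:7789;
    meta-disk internal;
  }
}
```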
DRBD is commonly used with the Linux-HA cluster management software; however, it is possible to make it work with Redhat Cluster Suite by preparing a start/stop script for it.
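Such a start/stop script could be as simple as the following sketch (the resource name r0 is hypothetical, and a production script would also need to check exit codes and wait for the peer):

```shell
#!/bin/sh
# Minimal DRBD start/stop wrapper for use as a cluster service script.
case "$1" in
  start)
    drbdadm up r0        # attach backing disk and connect to the peer
    drbdadm primary r0   # promote this node
    ;;
  stop)
    drbdadm secondary r0 # demote before shutting the resource down
    drbdadm down r0
    ;;
  status)
    drbdadm state r0
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 1
    ;;
esac
```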
Because DRBD only supports two nodes, in the event of requiring three or more nodes, there are some methods that can be used to work around this limitation:

1. Set up two DRBD nodes that handle all the storage, and have all the other nodes connect to the two storage nodes using a network filesystem like NFS or some other protocol like ATA over ethernet or GNBD (effectively creating your own replicated NAS device).

2. Set up all the nodes in pairs: each pair mirrors with DRBD and then runs a distributed filesystem such as Lustre or AFS on top of them. This will always have the weakness that the failure of both nodes in a pair would cause failure of the entire array, but otherwise the failure of any one node in any of the pairs will not disrupt the filesystem services.

3. Run DRBD on top of DRBD: this is not a recommendation, it is mentioned here because sometimes people do this. DON'T DO THIS. It introduces a huge number of problems and limitations, as well as the unknown stability of running DRBD on top of DRBD yet again.

4. Modify the DRBD codebase in order to add support for additional nodes. There does not appear to be any obvious theoretical reason why this wouldn't be possible; it should just be a case of adding additional nodes and perhaps applying modifications to the distributed lock manager to make it suitable for three nodes.
There would obviously be more of a performance impact due to the increased amount of overhead for each node added; however, as technology advances, this should become less of a problem.
7.5 Clustered Filesystems

When using a block level storage system like iSCSI, SAN, GNBD or DRBD, a cluster capable filesystem should be used in order to allow multiple nodes to read and write at the same time.

A clustered filesystem differs from a conventional filesystem by including features to handle file locking and journalling from multiple nodes.
7.5.1 CLVM

It is possible to run LVM on top of a centralised block storage device by enabling clustered locking in the LVM configuration and running the CLVM service together with CMAN for clustering.

Once clustered LVM is enabled, it can be used in exactly the same way as conventional LVM.
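Concretely, enabling clustered locking is a one line change in /etc/lvm/lvm.conf, after which the clvmd service is started on every node. A sketch, using the RHEL service conventions and hypothetical volume names:

```shell
# In /etc/lvm/lvm.conf on every node, switch to the clustered locking library:
#   locking_type = 3

# With CMAN already running, start clustered LVM on each node:
service clvmd start

# Volume groups created with the clustered flag are then visible cluster wide:
vgcreate -c y shared_vg /dev/sdb
lvcreate -L 10G -n shared_lv shared_vg
```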
7.5.2 GFS (GLOBAL FILE SYSTEM)

GFS is a clustered filesystem developed and supported by Redhat, and available on RHEL. It is fully open source, and Redhat are currently working on getting it merged into the mainstream kernel.

GFS has a number of powerful features that make it ideal for use in production clusters:

- Tried and tested technology, fully supported for customers of Redhat.
- Scales up to hundreds of cluster nodes.
- Supports extended access control lists.
- Supports user quotas.
- Dynamic symlinks (known as Context Dependent Path Names) which allow the symlink to point to different locations depending on various variables of the node using it. Ideal for node dependent configuration.
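As a sketch of a Context Dependent Path Name (the mount point and directory names below are hypothetical): the special @hostname component is expanded differently on each node that follows the link.

```shell
cd /mnt/gfs
mkdir node1 node2          # one configuration directory per cluster node
ln -s @hostname config     # a Context Dependent Path Name
# On the node named "node1", /mnt/gfs/config now resolves to /mnt/gfs/node1,
# while on "node2" the same path resolves to /mnt/gfs/node2.
```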
7.5.3 EXT4 (UNDER DEVELOPMENT)

Most people are familiar with ext3, which is the default Linux filesystem for almost all distributions.

In 2007, development started on ext4, which will fix the limitations of ext3. One of the new features being developed for this release is support for clustered filesystems.

However, it is likely that ext4 will not be ready for production use for a number of years, and it is only mentioned here for the reader's interest.
8 Cluster Examples and Challenges

There are numerous ways you can configure a cluster, which will depend on your requirements and budget. There are also some complex requirements if you wish to have virtualization within the cluster, as well as when building geographically separated clusters.

This document details a number of example cluster designs that may be suitable for you, to give you an understanding of what is possible, as well as discussions of the problems and limitations of each design.
8.1 Two node HA cluster with DRBD

A common high availability requirement is to make a particular server survive any hardware failure. The solution to this is to add a second identical server and set up a HA cluster between them.

To make a two node high availability cluster work, we have the following set of requirements:

- Data must be accessible by both servers, with both servers being able to read/write at the same time.
- If an individual service dies, it should resume on the secondary server.
- If one server suffers a complete failure, the secondary server should resume all tasks.

Solution:

- Both servers identical hardware, running Redhat Cluster Suite.
- Local root and swap filesystems; the remaining disk space turned into a DRBD block device set up with both nodes running as primary.
- DRBD device running GFS with journal space for 2 nodes, and both nodes mounted at the same time.
- Each service configured in Cluster Suite with a floating IP address. Any service that fails will relocate to the second node.
- In the event of a full server failure, the second node will resume all services.
Notes:

- DRBD is used primary/primary with GFS so that both servers can be running services at once. This may be undesirable if you only have one floating IP address; in this scenario you need to group all the services using that IP together.
- In that case, it would also be okay to run DRBD as primary/secondary with a traditional filesystem like ext3 or xfs, and have all the services configured as a single resource so they all failover together.
In the event of any node failing, Redhat Cluster Suite will move all the services to the alternate node and switch DRBD from secondary to primary on the new master node.

A network failure runs the risk of a split brain situation: if neither node can see the other, they will both try to become master. To fix this, run a ping to a remote server to provide a tie breaking third vote (quorum heuristics).

If you have a SAN or NAS, instead of storing the data locally and replicating with DRBD, the attached storage device could be used instead.

[Diagram: two node cluster (host1, host2) replicating sda with DRBD, a floating IP, a smart power switch for fencing, and a remote host on the internet used for the quorum ping]
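With Redhat Cluster Suite, the quorum ping described above can be expressed as a qdiskd heuristic in cluster.conf, roughly as follows. The remote host address and the timing values are hypothetical; note that qdiskd itself also expects a small shared quorum partition, so this fits most naturally where some shared storage exists:

```xml
<!-- Fragment of /etc/cluster/cluster.conf -->
<quorumd interval="1" tko="10" votes="1" label="qdisk">
  <!-- The node keeps its quorum vote only while the remote host answers -->
  <heuristic program="ping -c1 -w1 192.0.2.1" score="1" interval="2"/>
</quorumd>
```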
- The SAN could be set up with a non clustered or a clustered filesystem, the difference being that a clustered filesystem is required if you require both servers to be able to run services at once.
- If using a SAN, the cluster is scalable to more than two nodes, but the SAN could become a single point of failure for the cluster, and SANs are a lot more expensive than software solutions like DRBD.

Suitable Environments:

- Making any mission critical server HA.
- Any business or organisation that cannot tolerate hardware downtime of their production server.
- Ecommerce servers that need to provide mail/websites/databases.
- Small hosting organisations (larger ones should use designs like the one below).
8.2 Five node HA cluster with DRBD

One of the problems with DRBD is that it only works for two node clusters. It is possible to add a third node if the commercial version is purchased, but no DRBD solution exists which can handle more than three nodes at most.

When building clusters, it is more economical to have a single multi node cluster rather than many two node clusters, as only one computer needs to be set aside for spare resources.
Some installations use SANs, which limits the cluster size by the number of interfaces on the SAN. However, SANs are very expensive and require special hardware.

A cheaper solution is to build two computers with plenty of storage in them using off the shelf parts, and then to use DRBD to create what is effectively a HA NAS. These two storage nodes mirror each other and can transparently tolerate either one of the two nodes failing.

These two storage nodes can then export the available storage using a network filesystem like NFS or a block level service like GNBD, which the rest of the nodes can use.

Depending on your applications, there may be no need to have local disks in any of the servers, and they can all run directly off the network.

Here is an example of a five node cluster using DRBD for storage, providing a range of services such as HTTP and MAIL.
Solution:

- Two nodes with RAID 5 hard drive storage in each node: the storage nodes.
- Three nodes with no disks: the production nodes.
- Storage nodes are set up in Primary/Secondary mode, with LVM and ext3 on top of the DRBD layer, and with NFS exports of the data.
- Storage nodes provide user authentication via NIS/LDAP/Kerberos.
- Storage nodes provide pxelinux and DHCP for network booting.
- Quorum votes are set up in such a way that failure of both storage nodes will cause a cluster failure, resulting in all services stopping.
- Production nodes boot off the active storage node using netboot and mount the root filesystem using NFS. All the production nodes run the same software build.
- Services are spread across the three production nodes; if any node fails, the services are resumed on another one.
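The network booting part of this design can be sketched with a pxelinux configuration along the following lines (the paths, NFS server address and export name are hypothetical):

```
# /tftpboot/pxelinux.cfg/default on the active storage node
DEFAULT cluster
LABEL cluster
  KERNEL vmlinuz
  APPEND initrd=initrd.img ip=dhcp root=/dev/nfs nfsroot=192.168.1.10:/exports/nodeimage ro
```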
Notes:

- The above design can also be used with a small two interface SAN. The SAN can be connected to both storage nodes instead of using local disks, and the data then exported via NFS.
- To increase redundancy, two SANs could be used, with one connected to each storage server and mirroring done either between the SANs themselves or using DRBD on the storage nodes. However, standard hard drives will usually be cheaper and thus will probably be a better solution.
Suitable Environments:

- Ideal for hosting providers, in particular shared web hosting and email.
- Ideal for large companies to increase server availability and to centralise storage.
8.3 Five node HA cluster with DRBD + Xen

The five node DRBD cluster design above can be extended to become a HA Xen cluster. The three production nodes can be configured as Xen hosts, with the Xen guests being booted from the network and using NFS for storage (just like the hosts themselves).

Cluster Suite runs on the host nodes, and in the event of a Xen VM failing or the entire host itself suffering a failure, Cluster Suite can start each Xen VM on an alternate host, with each Xen VM being a cluster node and running one or more services.

The Xen hosts can be configured to be part of the same domain, which also allows live migration of Xen VMs: if one host server is heavily loaded, some VMs can be moved to another host whilst they are still running, with no downtime at all.
Notes:

- Optionally, instead of having the Xen VMs belong to the same cluster as the hosts, the Xen VMs could be set up in their own cluster, with each Xen host running its own cluster, leaving the host cluster to only deal with each Xen VM as a whole. See the Xen section of this document for details on the advantages/disadvantages of this method.

Suitable Environments:

- Ideal for hosting providers.
- Ideal for large companies to increase server utilisation and availability.
- Ideal for IT companies that need large numbers of servers for various application and development needs.
8.4 Geographically distributed clusters

All the cluster designs shown so far have been for use in one physical location: the cluster nodes are all sitting in a rack, connected via ethernet, and able to support floating IP addresses because there is only one route into the cluster.

However, a common desire is to have geographically distributed clusters, to prevent failure of a single site taking the whole cluster offline.

Typical uses for this might be:

- A company with offices in two cities would like to have one server at each office, with the data mirrored between them.
- An ecommerce website wanting replicated email, website and database services between two sites to ensure availability.
However, there are some big issues with a geographically separated cluster that need to be solved:

- Internet connections are slow: data needs to be mirrored at both sites in a way that is bandwidth friendly, and the data transferred when changes are made needs to be minimal.
- Internet connections fail/have outages fairly frequently. Any solution must be able to handle this without split brain issues.
- DRBD is an ideal candidate for distributed clusters, but is only able to scale to two nodes (or three using the commercial version). This causes a complication for organisations with more than three offices that they want mirrored servers in.
- You can't float an IP address around a country unless you're an ISP. But even they can't float an IP address to a server on the other side of the world.
- Fencing is more complicated: a secondary (independent) management network is useful in order to be able to communicate with fencing devices in other locations. Without a secondary connection, a crashed server cannot be fenced if the production network goes down, although a running server will stop clustering if it loses quorum, so split brain is not an issue.
How can we solve this? There are a number of solutions:

- Use a distributed, locally caching filesystem like AFS, which will cache commonly accessed data.
- Have two main offices running DRBD, and all other offices must connect using a network filesystem, perhaps assisted by a proxy/caching device for improving performance at the sites.
- Devices such as WAN accelerators can be used to make optimal use of network performance, and some models have internal hard drives that can cache some data like HTTP or SMB traffic.
- Float DNS names rather than IP addresses. When a node goes down which is providing public services (eg: websites), have the new node providing the service connect to the DNS server and change the A records.

The problem with this method is that DNS changes can take some time to synchronise across the web. This can be reduced by setting your DNS Time To Live (TTL) to a low value (eg: 5 mins), but it may not be honoured by all DNS caching servers (although as a general rule it is).
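Where the DNS server supports dynamic updates, the failover node can change the A record itself with nsupdate, along these lines (the zone, key file and addresses are hypothetical):

```shell
# Point www.example.com at the surviving node, with a 5 minute TTL.
nsupdate -k /etc/cluster/ddns.key <<EOF
server ns1.example.com
update delete www.example.com A
update add www.example.com 300 A 192.0.2.2
send
EOF
```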
In addition, it is possible to set up Round Robin DNS servers, which can allow you to load balance between your geographically separated servers. This is good for services such as HTTP or read only database requests.
8.4.1 TWO NODE DISTRIBUTED CLUSTER

As discussed above, a two node distributed cluster is able to use DRBD data replication, which solves the major challenge of mirroring the data.

This two node setup is an ideal solution for organisations which want the redundancy of an offsite mirror, suitable for providing both redundancy and load balancing. It is also suitable for organisations which have two offices and want identical servers at both locations.

This example assumes the following environment:

- Two geographically separated servers, identical hardware.
- Network between both servers controlled by an outside party (therefore floating IP addresses are not possible).
- No plans to scale beyond two nodes.
Requirements:

- Have a server at both locations.
- Both servers need to offer file sharing to the local networks at each office with Samba.
- Mail, DB and HTTP services required to be HA.

Solution:

- Servers set up with DRBD in a primary/primary setup (both servers can read/write).
- GFS filesystem to allow both nodes simultaneous access to the data.
- Use a ping to a remote server to determine the tie breaker third vote (quorum heuristics).
- Floating DNS names for services.
- Cluster Suite capable of migrating individual services.
- MAIL, DB and HTTP services will only run on one server at any time (otherwise interesting lock file issues would occur).
- In the event of either server suffering a failure, the services will be relocated to the alternate server and the DNS records will be adjusted to redirect traffic.
- Samba runs on both servers, providing access for the local network. In the event of a server failure, the DNS name for the local network server will be changed to the alternate server.
[Diagram: two geographically separated nodes (host1, host2) replicating sda with DRBD across a VPN, a floating A record for www.example.com, a remote host for the quorum ping, smart power switches for fencing, and a GSM out of band management network]
8.4.2 THREE+ NODE DISTRIBUTED CLUSTER

The two node distributed cluster detailed above will work fine with two nodes, but what about when three or more nodes are required?

We want data replication between the servers, but DRBD will only work with two nodes in primary/primary mode. This gives us two options:

1. Run DRBD between the two main servers; the other servers can connect to one of the two servers via a network filesystem like NFS.
This is quite simple, but has the obvious flaw of disk I/O being limited to the speed of the WAN connection, as well as the transfers causing data cap usage on non flat rate connections.
Depending on your usage and requirements, this may or may not be a problem.

2. Run a distributed filesystem with replication support on all the nodes.
Unfortunately, there are currently no distributed filesystems available which can provide both replication and distributed data.
Therefore, the next best solution is using a distributed filesystem with DRBD underneath it to provide the redundancy.
This is covered in more detail in the DRBD filesystem section earlier in this document, but basically you divide the cluster nodes into pairs.

Each pair runs DRBD primary/primary to ensure data replication. On top of that, you run a distributed filesystem such as Lustre or AFS. However, this can be quite complex, and it has the weakness of being limited by having replication on only 2 nodes.

Three+ node distributed clusters can be quite complex and require a lot of planning to make sure they will work reliably and have speedy access to storage. The best solution will depend greatly upon the applications you need to run.
9 Further Reference

The following resources are good further reading for information on setting up cluster solutions:

Cluster Management:

- Redhat clustering guides (includes info on GFS)
  http://www.redhat.com/docs/manuals/csgfs/indexmaster.html
- Linux-HA documentation
  http://www.linux-ha.org/

Storage:

- DRBD (block layer replication)
  http://www.drbd.org/documentation.html
- AFS (distributed filesystem)
  http://en.wikipedia.org/wiki/Andrew_file_system
- Lustre (distributed filesystem)
  http://www.lustre.org

Training/Courses:

Additionally, if you are an RHCE, Redhat's RH436 training is a good course that teaches you how to configure clusters and shared storage on RHEL with Redhat Cluster Suite.
https://www.redhat.com/courses/rh436_red_hat_enterprise_clustering_and_storage_management/