Introduction to Linux Clustering
DOCUMENT RELEASE 1.1
Copyright 2008 Jethro Carr

This document may be freely distributed provided that it is not modified and that full credit is given to the original author.

If you publish this document anywhere, please do let me know via email, and if it is published in a physical medium, sending me a copy would be appreciated.

Email: jethro.carr@jethrocarr.com
Website: www.jethrocarr.com
Table of Contents

1 Introduction
2 About clusters
3 Advantages and reasons for clustering
4 Clustering fundamentals
  4.1 Basics
  4.2 Important clustering components
    4.2.1 Failover
    4.2.2 Fencing
    4.2.3 Split Brain
    4.2.4 Quorum
5 Cluster management software
  5.1 Redhat Cluster Suite
    5.1.1 Management and Configuration
    5.1.2 luci and ricci
    5.1.3 system-config-cluster
    5.1.4 Load balancing
6 Combining Xen with clusters
  6.1 VMs as part of main cluster
  6.2 VMs run as a separate cluster
7 Storage Management
  7.1 Centralised storage
    7.1.1 SAN - Storage Area Network
    7.1.2 NAS - Network Attached Storage
    7.1.3 Commodity Network File Share
  7.2 Access methods for centralised storage
    7.2.1 Accessing SAN directly
    7.2.2 Accessing NAS with iSCSI
    7.2.3 Accessing NAS with other protocols
  7.3 Distributed Storage
    7.3.1 AFS - Andrew File System
    7.3.2 Lustre
    7.3.3 Coda
  7.4 Replicated Storage
    7.4.1 Distributed Replicated Block Device (DRBD)
  7.5 Clustered Filesystems
    7.5.1 CLVM
    7.5.2 GFS - Global File System
    7.5.3 ext4 (under development)
8 Cluster Examples and Challenges
  8.1 Two node HA cluster with DRBD
  8.2 Five node HA cluster with DRBD
  8.3 Five node HA cluster with DRBD + Xen
  8.4 Geographically distributed clusters
    8.4.1 Two node distributed cluster
    8.4.2 Three+ node distributed cluster
9 Further Reference
1 Introduction

One of the oldest problems of computing is designing failure-proof computing systems. Over the years, many different methods have been developed.

Many of these you will be familiar with, including:

- Spare hardware - typically enough to mirror the production hardware.
- Fault tolerant hardware - by using a number of spare components.
- Software failover features built into individual programs.

Hardware solutions are typically expensive (and not always 100% reliable). Application-specific failover methods can often add maintenance hassles, as well as doing nothing to fix the problem that some of your programs may have no failover capabilities at all.

In recent years, a few new options have become available - in particular, Linux clustering and virtualization.

This document presents the key factors in implementing effective Linux clusters and design. Please note the following:

- Clustering is a relatively new topic to me, and I have not had a lot of experience deploying and maintaining clusters.
- This document is the result of my research into the options available and looks at what solutions could be developed. If you have experience with clusters and have found technologies that do or do not work well in practice, please supply me with feedback so I can extend this document and make it more useful.
- This document mentions Redhat Cluster Suite a bit, but doesn't go into details about the other major option, Linux-HA. However, most of the concepts and terminology are applicable to both solutions.
- This document does not cover the technical details of configuring clusters; it is more of a high-level design view.

This document will also cover the use of clustering together with Xen virtualization for maximum advantage.

The aim of this document is to provide you with an understanding of cluster solutions so that you are empowered to identify applicable technologies and decide on the best approach to use them.
2 About clusters

There are three main reasons to use clustering:

- Better performance.
- Fault tolerance through high availability services.
- Optimal usage of disk resources.

Often you will hear about high performance computing solutions using Linux clusters to create small supercomputers - such systems are usually referred to as Beowulf clusters. These systems typically run highly customised applications which are designed to run on multiple computer systems at once, and are beyond the scope of this document.

This document will cover the following:

- High availability clusters.
- Shared storage solutions.
- Other clustering considerations.
3 Advantages and reasons for clustering

Clustering provides a number of advantages over traditional standalone server configurations. First there are a number of obvious ones:

- High availability of system services.
  The cluster management software will handle service failures and will quickly bring up the service on alternate hardware.

- Better utilisation of system resources.
  Services can be spread around all the nodes - heavy nodes can be lightened by moving services away, and lightly loaded nodes can have more services started on them. When you combine Xen with clustering, even more options become available, as you can split one node into many dozens of nodes.

- Optimisation of disk resources.
  Rather than having lots of small local disks, storage can be centralised or distributed across all machines, making better use of the storage available.

There are also some other useful advantages:

- Usability of older hardware or whitebox hardware.
  IT departments like to purchase new servers from a big name vendor like IBM or Dell, who will then provide 5 years of hardware replacement and service. However, once this 5 year period is over, the hardware is no longer supported and the customer either has to carry their own spare parts (which can be too expensive for a small organisation) or upgrade to newer hardware, which requires time, money and effort to migrate all the data and applications.
  With clustering, if the hardware fails, another node will take up the work, so companies can build their IT infrastructure around older hardware or custom built hardware, saving a lot of money. Some hosting providers may even decide to leave old hosts in the cluster and just keep adding new ones, only pulling out the old ones once they fail, or once they reach an age where they are uneconomical to keep running.

- Faster, more efficient system administration.
  In server farms where all the servers are configured individually, it can often be quite hard to migrate a service from one computer to another, which can sometimes be required due to security or performance reasons.
  However, in a clustered environment, all the servers are typically identical. Plus, any service that has been set up in the cluster can be moved from one host to another by the simple execution of a single command.
4 Clustering fundamentals

4.1 Basics

High availability clustering is a complex topic, and it is important to fully understand the key concepts behind it.

The basics are simple:

- Each computer is called a node.
- Two or more nodes form a cluster.
- In the event of a failure of any one of the nodes, the remaining nodes will take up the work being performed by the dead node.

What makes clustering complex is how the cluster handles node failures, shared disk storage and situations such as split brain.

A cluster typically works as follows:

- All the nodes run the cluster management software (eg: Linux-HA or Redhat Cluster Suite), which controls things such as heartbeats, application starting/stopping and keeping quorum (more on this later).
- One of the nodes runs an administration application that allows you to manually add/remove nodes and provides the ability to manually move applications from one node to another.
- In the event of a failure with a node, the other nodes fence it, which involves using a hardware device like a managed power switch to physically turn the node off. This is done to prevent the node from writing to any of the storage devices and corrupting the data.
- The other nodes then decide which node should run the applications that were on the dead node, and one of the nodes will be chosen (depending on the config) to start up the application.
- All the cluster nodes have access to a central storage array (eg: a SAN or network attached storage). This storage location runs a clustered filesystem which allows all the nodes to read and write at the same time.
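The failure handling steps above can be sketched in a few lines of Python. This is a minimal illustration only, not a real cluster manager API - all class and function names here are hypothetical, and a real implementation (eg: rgmanager) drives this from heartbeat timeouts, fencing agents and quorum checks:

```python
# Sketch of the failure-handling sequence described above.
# Names are illustrative only - not a real cluster manager API.

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.services = set()

def handle_failure(dead, nodes, placement):
    """Fence a dead node, then restart its services elsewhere."""
    dead.alive = False            # stand-in for a real power-off (fencing)
    survivors = [n for n in nodes if n.alive]
    for service in sorted(dead.services):
        # the least-loaded survivor is chosen here; a real cluster
        # consults the failover domain configuration instead
        target = min(survivors, key=lambda n: len(n.services))
        target.services.add(service)
        placement[service] = target
    dead.services.clear()

# Usage: three nodes, one dies, its services move to the survivors.
a, b, c = Node("a"), Node("b"), Node("c")
a.services = {"httpd", "mysqld"}
placement = {"httpd": a, "mysqld": a}
handle_failure(a, [a, b, c], placement)
print({s: n.name for s, n in sorted(placement.items())})
# {'httpd': 'b', 'mysqld': 'c'}
```

Note the ordering: the dead node is fenced before its services are restarted elsewhere, for the reasons covered in the fencing section below.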
4.2 Important clustering components

4.2.1 Failover

There are three types of failover methods.
1. Hot failover
   In a hot failover, the application is written specially for clustering and is able to continue running on another node without any interruption to client services.
   Hot failover is not often found in commonly used applications, and is usually found in specially written programs for banking or telco situations.

2. Warm failover
   Warm failover is what solutions like Linux-HA and Redhat Cluster Suite provide - the application doesn't have an instant recovery feature, but the cluster suite quickly restarts the application on another piece of running hardware with minimal client disruption.
   Using a cluster management solution, the application does not need to be written to support clustering, so you can provide redundancy for any service you desire.
   This can sometimes cause a small outage, as some applications can't tolerate the change of the server in the background. In other cases, the application is able to continue on with no interruption to the client users, with the exception of a bit of a delay (eg: NFS is good at not being affected by a server change behind the scenes).

3. Cold failover
   Cold failover is commonly used as a solution for redundancy where a cluster was not able to be set up. In a cold failover, the dead computer needs to be powered down and a spare computer started up. This is usually a manual process.
4.2.2 Fencing

When a node crashes or becomes unresponsive, it MUST be quickly powered off or blocked from the storage device (fencing).

Fencing is required because if the cluster assumes the node has crashed and reallocates its services and I/O, the server waking up again could cause havoc and possibly disk corruption.

Therefore a fencing device must be available so that the cluster can do STONITH - Shoot The Other Node In The Head - by doing an instant power-off of the node.

Various devices exist - smart power switches are usually used, and scripts exist in Redhat Cluster Suite that can connect to a number of commonly available devices to shut down servers.

Other fencing devices include fencing at the SAN level as well as Xen VM fencing; however it is recommended that power fencing be used rather than SAN fencing, as it will guarantee that the node is completely killed and not doing anything unwanted.
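In Redhat Cluster Suite, fencing devices and each node's fencing method are declared in the cluster configuration file. A sketch of what this looks like for a managed power switch - the device name, IP address, credentials and port number below are hypothetical, while fence_apc is one of the agents bundled with Cluster Suite for APC switches:

```xml
<!-- Fragment of a cluster configuration - illustrative values only -->
<clusternodes>
  <clusternode name="node1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc-switch" port="1"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- a managed power switch, driven by the fence_apc agent -->
  <fencedevice name="apc-switch" agent="fence_apc"
               ipaddr="10.0.0.50" login="fenceuser" passwd="example"/>
</fencedevices>
```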
4.2.3 Split Brain

Split brain is a nasty problem found in clustering, and requires careful thought to prevent.

Consider the following scenario:

1. A two node cluster exists - one server in city A, one server in city B.
2. The internet link between the two cities falls over. Neither server can contact the other.
3. Each server assumes the other one is down, and both resume activities as the master.
4. When the link comes back online, data corruption occurs.

Other nasty problems can occur if the two nodes are still able to fence each other via the out-of-band management system, as you may end up with each node repeatedly powering off the other node.

To prevent this from happening, we have a solution called quorum.
4.2.4 Quorum

Quorum is effectively a scoring method, where each node in the cluster has a number of votes (by default, one). Each online cluster node adds its votes to the quorum count, and as long as the quorum count is larger than 50% of the combined votes, the cluster is intact.

If the cluster falls below this threshold, it has lost quorum, and all services will shut down and become unavailable.

This is actually a desired feature. Consider a cluster with 10 servers where, to maintain quorum, a score of 6 is needed. In the event of the network suffering a failure and causing the cluster to split into two, the smaller half will shut down and the larger half will continue on. This prevents split brain in clusters.

What about situations where there are an even number of nodes, such as 10 nodes? It would be possible for the network to split into two equally sized clusters. Therefore, any cluster with an even number of nodes requires at least one of the nodes to have an additional vote in the quorum to unbalance the quorum voting.

Effectively, quorum allows you to make an even cluster uneven when it comes to failovers, so that split brain will not occur.
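The voting arithmetic above is simple enough to sketch directly (a hypothetical helper, not a real cluster API):

```python
# Quorum check as described above: a partition keeps quorum only if
# its votes are more than half of the total votes in the cluster.

def has_quorum(partition_votes, total_votes):
    return partition_votes > total_votes / 2

# 10 nodes with one vote each, split 6/4 by a network failure:
print(has_quorum(6, 10))   # True  - the larger half stays up
print(has_quorum(4, 10))   # False - the smaller half shuts down

# An even 5/5 split would leave BOTH halves without quorum, which is
# why one node is given an extra vote (total becomes 11):
print(has_quorum(5, 10), has_quorum(6, 11))   # False True
```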
But what happens when you only have a two node cluster? A failure of either node would cause the cluster to lose quorum, as both machines have equal votes. In this case, you need a tiebreaker vote to determine the master.

There are two ways this can be done:

1. If using a SAN/NAS, the central storage device can be used to provide the tiebreaker vote by using a custom filesystem called a quorum disk.
2. Set up a heuristics test. This can be any program (typically something like ping) that can provide another vote if the program completes successfully. In a two node cluster, the most likely candidate would be to ping the network gateway, or a remote switch.

[Figure: a six node cluster split across two switches. Hosts 1-4 behind switch1 (votes = 4) form an active cluster; hosts 5-6 behind switch2 (votes = 2) get fenced by the others.]
5 Cluster management software

To control the cluster and the movement of services, a cluster management application is required. Linux has two main options available - Linux-HA and Redhat Cluster Suite.

Redhat Cluster Suite is only found on Redhat's distributions - Fedora, RHEL and derivatives such as CentOS - whereas Linux-HA is found in a wider range of distributions.

Most of the concepts and ideas between these two solutions are the same, so the knowledge gained using one is likely to make it easy to use the other.

5.1 Redhat Cluster Suite

Cluster Suite is designed for creating high availability clusters, and the default steps to configure a service will result in an HA service, depending on the number of nodes you put into the failover domain.
5.1.1 Management and Configuration

Configuration of the cluster is controlled by the /etc/cluster/cluster.conf file, which is an XML format file. Once changed, you can run a command to redistribute the file to all the other nodes in the cluster.

However, most people do configuration using either system-config-cluster (a GTK GUI application) or luci, which is a web based utility for cluster configuration.

The configuration of the cluster is broken into the same components regardless of the configuration method chosen:

- Resources
  Resources are anything that makes up a service. For example, a resource may be:
  1. An IP address.
  2. A mount point.
  3. A system service (eg: httpd).

- Services
  A service is a group of resources that have been given a name. The service configuration allows you to set what order the resources are started/stopped in, and gives you a name that you can use to control the service.
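As a sketch, a simple web service grouping the three example resource types above might be declared like this in the XML configuration (the names, device, mount point and address are all hypothetical):

```xml
<!-- Fragment of /etc/cluster/cluster.conf - illustrative values only -->
<rm>
  <resources>
    <ip address="10.0.0.100" monitor_link="1"/>
    <fs name="webdata" device="/dev/sdb1" mountpoint="/var/www" fstype="ext3"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </resources>
  <!-- the service groups the resources and names them as one unit -->
  <service name="webserver" autostart="1" domain="webfarm">
    <ip ref="10.0.0.100"/>
    <fs ref="webdata"/>
    <script ref="httpd"/>
  </service>
</rm>
```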
- Failover domains
  When a cluster node fails, the services that are running on it need to be migrated to other machines. The cluster management software will look at a list of other nodes, called the failover domain, and will select one from the list to run the service.
  The list can also be prioritised if desired - both the order of which servers should be used in a failure, as well as whether or not the service should be moved back to a higher priority node when one becomes available.
  In the event of the service running out of online hosts in the failover domain, the service will be stopped until a node comes back online.

- Fencing devices
  A fencing device is used to power off or reset unresponsive/crashed cluster nodes. This is typically something like a network controlled power strip, or an out-of-band management card in the server.
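The failover domain selection described above can be sketched as follows. This is illustrative only - the real behaviour in Cluster Suite also depends on the domain's ordered/restricted flags and the failback setting:

```python
# Pick a node from a prioritised failover domain, as described above.
# Lower priority number = more preferred node.

def select_node(domain, online):
    """domain: list of (priority, node_name); online: set of node names."""
    candidates = [(prio, name) for prio, name in domain if name in online]
    if not candidates:
        return None          # service stops until a node comes back
    return min(candidates)[1]

webfarm = [(1, "node1"), (2, "node2"), (3, "node3")]
print(select_node(webfarm, {"node1", "node2", "node3"}))  # node1
print(select_node(webfarm, {"node2", "node3"}))           # node1 dead -> node2
print(select_node(webfarm, set()))                        # None: no hosts left
```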
5.1.2 luci and ricci

luci does not have to be run on the cluster itself, although that is the recommended method, as you can cluster luci and thus always be able to administer the cluster.

The ricci daemon runs on all the nodes and allows luci to communicate with the nodes to configure them.

luci is smart enough to install the packages it requires via yum on the nodes when you add them to the cluster, which makes setup easier.
5.1.3 system-config-cluster

luci seems to be replacing system-config-cluster as the favourite program to use, but at this stage system-config-cluster is a capable GTK GUI application for cluster configuration. You can run it on any cluster node; once you save your changes, you can then click a button to send out the new configuration to all the cluster nodes.
5.1.4 Load balancing

Redhat Cluster Suite is focused on providing HA services and doesn't provide any special features for doing load balancing.

However, you can set up load balancing with the following method:

- Set up a device on the network (eg: a reverse proxy or session balancing application) that passes the session traffic to one of the cluster nodes. This device may be a hardware device or a standalone PC - maybe even a two node cluster to ensure HA of the load balancer!
- Set up multiple services in the cluster suite for the number of nodes you want to load balance.
- Configure the services to belong to various failover domains - for example, you may not want to fail over some services as long as you run one instance at a minimum, so you set up one service with a failover domain and set up the rest of the services to only run on the one node.
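The balancing device itself sits outside Cluster Suite; conceptually it just rotates incoming sessions across the per-node service addresses. A trivial round-robin sketch (the addresses are hypothetical):

```python
from itertools import cycle

# Round-robin selection across the load balanced cluster services.
backends = cycle(["10.0.0.101", "10.0.0.102", "10.0.0.103"])

first_four = [next(backends) for _ in range(4)]
print(first_four)   # wraps back to the first backend on the 4th session
```

A real balancer would additionally health-check each backend and skip addresses whose cluster service is currently failed over or stopped.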
6 Combining Xen with clusters

Virtualization technology is becoming increasingly popular due to the reduced costs and better utilisation of hardware resources.

Linux has various solutions for virtualization - one popular option is Xen, which comes bundled with a number of distributions. So the next step is to combine the advantages of Xen with the advantages of clustering.

There are numerous ways this can be done, and the best solution will depend on what you need. One basic method is to configure all the physical machines as Xen host servers and place them into a cluster. (Ideally, the host servers should also be in the same Xen domain to allow live migration of VMs from one server to another.)

The cluster management software can then be configured to treat virtual machines as services - moving them between cluster nodes and restarting dead VMs on other machines.

However, this only provides basic failover services - a VM will only be moved to another host if the whole VM becomes unavailable, or if the host node crashes. For more fine-grained control, there are two options:

6.1 VMs as part of main cluster

Add all the virtual machines to the cluster as nodes, alongside the physical servers. The services can be configured using failover domains to only fail over to virtual machines.

This means you only have one cluster, but it will introduce a lot more complexity into the cluster configuration and administration.

6.2 VMs run as a separate cluster

The other option is to run the cluster software on the Xen VMs themselves. This can be useful in that it allows you to configure multiple clusters on top of the main cluster, which may be appealing to hosting providers who can offer customers their own private two node clusters.

This method is also useful for systems that intend to run a large number of services on top of a single VM, and provides the ability to migrate individual services from one VM to another.

The disadvantage is that more overhead is introduced by running the additional clustering software on all the nodes, and it may become more time consuming to manage.
7 Storage Management

Storage management may appear to be a separate topic, but it is in fact a very important part of a cluster's design.

For a cluster, it is very important that data remains intact and accessible by all the nodes. One major topic is the use of a cluster-capable filesystem such as GFS.

It is also important to choose the correct storage media for the cluster, taking future growth into consideration. Is performance or reliability more important? Are all the nodes in the same premises, or do you require a distributed storage solution that will work across the internet? Does the data need to be replicated in real time between the nodes?

There is a variety of solutions available (such as SANs), however all solutions fit into one of the following three categories:

- Centralised storage.
- Distributed storage.
- Replicated storage (this can sometimes be a feature of either of the two categories above).

7.1 Centralised storage

Centralised storage involves having one or more devices providing storage to all the other nodes. A typical example is a single array of disks, such as a SAN or NAS, which all the nodes connect to for storage.

Centralised storage solutions are often found in enterprise server installations, with many medium to large organisations using something like a SAN for their data storage needs.

Centralised storage is popular for a number of reasons:

- More cost effective to purchase a single array of disks than purchasing disks for each server.
- A central location allows for easier backups and mirroring.
- Easy to configure, easy to maintain - if you need to add more storage, there's only one device to upgrade.

However, there is a common problem with centralised storage: often there will be just a single device providing the storage (often due to the cost of purchasing redundant hardware being too high - devices such as SANs are not cheap).

This introduces a single point of failure - if a hardware fault occurs in the device, it could cripple the entire cluster, since all nodes rely on it. To prevent this, you either need to be prepared for the possibility (and cost due to downtime) of a device failure, or invest in redundant hardware.

Centralised storage systems typically export the disk space as a block device, which appears on the nodes as a local disk and then needs to be partitioned and formatted so you can run whatever filesystem you wish on top of it.

This differs from network filesystems like NFS, which appear as mountable filesystems and cannot be partitioned or have other filesystems on top of them.
7.1.1 SAN - Storage Area Network

A SAN is a hardware device consisting of a number of hard drives in RAID. The SAN is then attached to each cluster node by fibre channel.

Advantages
- High speed performance.
- Directly attached, so no issues due to network loss, congestion, etc.
- Tried and tested technology.

Disadvantages
- Expensive - every node needs to have a fibre channel communication card installed, and both the SAN and the fibre channel hardware are expensive.
- Limited scalability - the number of cluster nodes possible is limited by the number of interfaces on the SAN.

Redundancy
For proper redundancy - to prevent an outage in the event of a hardware failure - it is required to purchase two SANs which are capable of mirroring each other, and to have fibre channel cards in the servers capable of talking to both SANs. Without this redundancy, all the work and resources put into developing a high availability cluster will be wasted when the SAN dies.

[Figure: host1 and host2 attached to a SAN or NAS, with replication to a hot spare storage device.]

Ideal Use
Suitable for use in clusters where all the nodes are located on the same physical site, as well as being suitable for use in clusters requiring maximum I/O performance.

However, for budget conscious organisations, a NAS may be a better option.
7.1.2 NAS - Network Attached Storage

A NAS is a hardware device consisting of a RAID array of hard drives (like a SAN). However, instead of using fibre channel, it connects to a standard ethernet network and supports protocols like iSCSI or ATA-over-Ethernet.

It still appears as a block device, which is configured to appear as a real disk on the server.

Advantages
- Can be a lot cheaper than a SAN.
- No physical limit to the number of nodes possible (instead, limited by the number and speed of the ethernet interfaces and network).
- Commodity servers work out of the box.
- With high speed networks like 10-gigabit ethernet, performance can surpass fibre channel SANs.

Disadvantages
- A network outage can prevent access to the storage. However, the risk of this can be reduced by running the NAS traffic on a separate network to the normal network (dedicated hardware, etc.).
- Some network storage protocols like iSCSI introduce overheads - for example, iSCSI is a TCP/IP protocol, which adds the overhead of TCP windowing, headers, etc. Other protocols like ATA-over-Ethernet add less overhead, and this should definitely be a factor to consider when evaluating NAS.

Redundancy
For proper redundancy, a second NAS should be purchased and made to mirror the primary NAS. They can either be set to have a floating IP address, or both of them can have iSCSI exports which are then multipathed on the nodes (so the node can choose which one to use).

If you require additional redundancy for the nodes, a second ethernet card can be installed into the nodes and both cards can be put into bonded mode. In bonded mode, both interfaces work together to provide one virtual interface - if an interface fails, the other one will continue to work.

With many servers coming with dual ethernet interfaces out of the box, there is usually little need for further hardware investment for the nodes.
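On a Redhat-style distribution, bonded mode is configured roughly as follows. This is a sketch using RHEL5-era file locations; the bonding mode and interface names will vary by setup:

```
# /etc/modprobe.conf - load the bonding driver (sketch only)
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-eth0 - enslave eth0 to bond0
# (a matching ifcfg-eth1 enslaves the second card the same way)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
```

Here mode=active-backup matches the failover behaviour described above (one interface carries traffic, the other takes over on failure); other modes can aggregate bandwidth instead.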
Ideal Use
From a high level view, a SAN and a NAS are very similar devices, so make your choices based on what will give you the best value for your investment based on your needs.

A budget minded organisation may find a NAS over 100mbit/1000mbit ethernet provides the best result, whereas others may find the performance of a SAN to be better.

Also consider future expandability - if you decide to grow the cluster in the future, a SAN will probably be less expandable than a NAS. It's also easier to upgrade the speed of an ethernet network with a faster switch, or more ethernet cards in the servers.
7.1.3 Commodity Network File Share

Another option for centralised storage is the use of a commodity server with a large number of hard drives in RAID.

This can be used in two different ways:

1. Using a block sharing software solution such as iSCSI, ATAoE or GNBD, to effectively create a cheap NAS using standard computer components.
2. Running a network filesystem such as CIFS or NFS.

Advantages
- Cheap and simple. You can use a standard off-the-shelf computer from a local store to build this.

Disadvantages
- Does not provide full redundancy in the event of failure of components such as the motherboard or CPU. However, this can be offset using something like DRBD, which is covered further on.
- Usually less performance than a SAN or NAS, which is tuned to allow maximum I/O.
- iSCSI software targets might not be supported by your Linux vendor and may suffer performance issues.

Redundancy
This tends to be limited to the hardware redundancy of the server. Typically a server will only have disk redundancy (and ethernet, via use of bonded interfaces), although some more expensive models offer PSU and even CPU redundancy.

One solution that allows good redundancy is to run two identical servers with DRBD and replicate the filesystems between the servers - effectively both servers will have exactly the same data on them.

Ideal Use
An iSCSI software target running on a server can provide a cheap NAS emulator for use in development environments.

This solution could be used anywhere a NAS is, however it will require careful tuning and smart hardware purchases to get optimal performance. One example that will make a difference is whether or not your ethernet cards have TCP offload engines, which will increase performance when using a TCP based storage service like iSCSI.
7.2 Access methods for centralised storage

Whilst centralised storage can sometimes provide the data via a network filesystem, it is more common to use iSCSI or a SAN.

Both iSCSI and SAN provide access to the storage as if it was a local disk. It is then necessary to run a cluster-capable filesystem on top of them, such as GFS.

Note of interest: it is possible to have a non-clustered filesystem on a shared drive, but seriously bad issues would occur if you accidentally mounted it in two places at one time.
7.2.1 Accessing SAN directly

A decent SAN allows the administrator to split the SAN into a number of logical hard drives, and then export only the desired drives to each node.

The drives appear on the node just like any locally connected SCSI/SATA drive.
7.2.2 Accessing NAS with iSCSI

Like a SAN, many NASes can be configured to split the storage into a number of logical drives.

Because the NAS is not connected locally, it uses a TCP/IP protocol called iSCSI. This means iSCSI can be routed, and even transferred over the internet (although the performance of this would be terrible without a high speed, low latency link).

iSCSI is used by attaching to an iSCSI target. Once connected, the iSCSI export appears just like a local SCSI hard drive.

It is important to note that the naming of the drives may change, thus it is important to use udev to ensure stable naming.

Further information about how to identify and name iSCSI devices using udev can be found in the scsi_id man page.
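For example, a udev rule keyed on the serial number reported by scsi_id can pin a stable name to the iSCSI disk. This is a sketch only - the rule syntax differs between udev versions, and the WWID and symlink name shown here are made up:

```
# /etc/udev/rules.d/20-iscsi-names.rules (sketch; WWID is hypothetical)
KERNEL=="sd*", SUBSYSTEMS=="scsi", \
  PROGRAM=="/sbin/scsi_id -g -u -s /block/%k", \
  RESULT=="360a98000486e2f66d", SYMLINK+="iscsi/shared0"
```

The node can then always mount /dev/iscsi/shared0 regardless of which sdX name the kernel assigned after this boot.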
7.2.3 Accessing NAS with other protocols

If you are using some other protocol, such as ATA-over-Ethernet, you will need to run software on the nodes to make the NAS shares appear as block devices on the server, which is similar to iSCSI in concept.
7.3 Distributed Storage

Distributed storage takes another approach to the storage mechanism: instead of having a central location for the data, the data is spread across all the nodes in the cluster, often including some form of redundancy in order to be able to cope with the failure of a node.

Unfortunately, this redundancy comes at a cost - distributed storage solutions are complex, and have to be able to handle issues such as the failure of nodes, delays in the network linking the machines, and locking issues.

Note: you may have come across DRBD, which is a two node block replication solution. This is covered in the next section of this document, under Replicated Storage.
7.3.1 AFS - Andrew File System

AFS is a distributed filesystem which caches data locally on machines. Currently there are two different implementations of AFS for Linux:

- OpenAFS (IBM Public License).
- The AFS implementation in the vanilla kernel (under development).

The caching provides increased speed and limited offline access in the event of network failures, but the servers do not replicate themselves (although that could be achieved with DRBD).

However, due to the types of file locking used, it is not suitable for large shared databases, and cannot handle a single file being updated by multiple clients.

AFS was designed to run services such as mail servers using maildir, where each email is stored as an individual file.
7.3.2 LUSTRE

Lustre is a distributed filesystem suited to creating massive (many thousands of nodes) distributed filesystems.

Lustre is quite a complex technology to set up, and unfortunately does not provide its own data replication system. If data replication is required, then another technology like DRBD is needed to perform the replication between individual nodes, which does limit the scalability.
[Diagram: a Lustre storage pool spanning host1, host2, host3 and host4]
7.3.3 CODA

Coda is an interesting filesystem with features for allowing offline data caching for client computers, as well as server replication.

Unfortunately, Coda has only really been deployed in research situations and is therefore not suitable for running in a production environment, but is worth a mention here.
7.4 Replicated Storage

Some distributed and some centralised storage systems have inbuilt methods for data mirroring (eg: two SANs with hardware mirroring enabled). However, there are also software solutions that run at the block level and which can mirror any filesystem on top of them, the most popular one being Distributed Replicated Block Device (DRBD).
7.4.1 DISTRIBUTED REPLICATED BLOCK DEVICE (DRBD)

DRBD is commonly used to provide block level disk replication on two node clusters, by mirroring the disks between the servers, ensuring they hold identical data.

Unfortunately, DRBD suffers from the limitation of only supporting up to two nodes, although there is a commercial closed source version released by the developers that allows the addition of a third node.

This makes DRBD very useful for creating HA two node servers, but not useful for creating a large shared storage area for large multi node clusters.

[Diagram: block level replication of sda2 between host1 and host2]

DRBD is an ideal solution for a two node cluster that is geographically separated, such as mail or web servers.
DRBD can be configured to work in one of two ways:

- Primary/Secondary: The storage device can only be mounted on one node (the primary) at any time; the two nodes simply mirror the storage. In the event of the primary server going offline, the secondary server can become the primary. This is controlled by the cluster management software.

- Primary/Primary: In recent versions of DRBD, it is now possible for both nodes to run as primary, so both nodes can read/write. This requires a cluster capable filesystem such as GFS to run on top of DRBD.

The method of writing can be configured: the default is to only count a write as complete once both nodes have been written to, but other options can be chosen in order to improve performance at the cost of reliability.
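To illustrate, a minimal drbd.conf resource for a primary/primary setup might look like the following sketch. The hostnames, IP addresses and backing partition are hypothetical; protocol C is the default synchronous mode, while protocols A and B trade reliability for performance:

```
resource r0 {
  protocol C;                  # write completes only once both nodes have it
  net {
    allow-two-primaries;       # required for primary/primary operation
  }
  on host1 {
    device    /dev/drbd0;
    disk      /dev/sda2;       # local backing partition
    address   192.168.1.1:7789;
    meta-disk internal;
  }
  on host2 {
    device    /dev/drbd0;
    disk      /dev/sda2;
    address   192.168.1.2:7789;
    meta-disk internal;
  }
}
```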
DRBD is commonly used with the Linux-HA cluster management software; however, it is possible to make it work with Redhat Cluster Suite by preparing a start/stop script for it.
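Such a start/stop script could be as simple as the following sketch (the resource name r0 is hypothetical, and a production script would also need to check exit codes and wait for the peer):

```shell
#!/bin/sh
# Minimal DRBD start/stop wrapper for use as a cluster service script.
case "$1" in
  start)
    drbdadm up r0        # attach backing disk and connect to the peer
    drbdadm primary r0   # promote this node
    ;;
  stop)
    drbdadm secondary r0 # demote before shutting the resource down
    drbdadm down r0
    ;;
  status)
    drbdadm state r0
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 1
    ;;
esac
```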
Because DRBD only supports two nodes, in the event of requiring three or more nodes, there are some methods that can be used to work around this limitation:

1. Set up two DRBD nodes that handle all the storage, and have all the other nodes connect to the two storage nodes using a network filesystem like NFS or some other protocol like ATA over ethernet or GNBD (effectively creating your own replicated NAS device).

2. Set up all the nodes in pairs: each pair mirrors with DRBD and then runs a distributed filesystem such as Lustre or AFS on top of them. This will always have the weakness that the failure of both nodes in a pair would cause failure of the entire array, but otherwise the failure of any one node in any of the pairs will not disrupt the filesystem services.

3. Run DRBD on top of DRBD: this is not a recommendation, it is mentioned here because sometimes people do this. DON'T DO THIS. It introduces a huge number of problems and limitations, as well as the unknown stability of running DRBD on top of DRBD yet again.

4. Modify the DRBD codebase in order to add support for additional nodes. There does not appear to be any obvious theoretical reason why this wouldn't be possible; it should just be a case of adding additional nodes and perhaps applying modifications to the distributed lock manager to make it suitable for three nodes.
There would obviously be more of a performance impact due to the increased amount of overhead for each node added; however, as technology advances, this should become less of a problem.
7.5 Clustered Filesystems

When using a block level storage system like iSCSI, SAN, GNBD or DRBD, a cluster capable filesystem should be used in order to allow multiple nodes to read and write at the same time.

A clustered filesystem differs from a conventional filesystem by including features to handle file locking and journalling from multiple nodes.
7.5.1 CLVM

It is possible to run LVM on top of a centralised block storage device by enabling clustered locking in the LVM configuration and running the CLVM service together with CMAN for clustering.

Once clustered LVM is enabled, it can be used in exactly the same way as conventional LVM.
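Concretely, enabling clustered locking is a one line change in /etc/lvm/lvm.conf, after which the clvmd service is started on every node. A sketch, using the RHEL service conventions and hypothetical volume names:

```shell
# In /etc/lvm/lvm.conf on every node, switch to the clustered locking library:
#   locking_type = 3

# With CMAN already running, start clustered LVM on each node:
service clvmd start

# Volume groups created with the clustered flag are then visible cluster wide:
vgcreate -c y shared_vg /dev/sdb
lvcreate -L 10G -n shared_lv shared_vg
```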
7.5.2 GFS (GLOBAL FILE SYSTEM)

GFS is a clustered filesystem developed and supported by Redhat, and available on RHEL. It is fully open source, and Redhat are currently working on getting it merged into the mainstream kernel.

GFS has a number of powerful features that make it ideal for use in production clusters:

- Tried and tested technology, fully supported for customers of Redhat.
- Scales up to hundreds of cluster nodes.
- Supports extended access control lists.
- Supports user quotas.
- Dynamic symlinks (known as Context Dependent Path Names) which allow the symlink to point to different locations depending on various variables of the node using it. Ideal for node dependent configuration.
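As a sketch of a Context Dependent Path Name (the mount point and directory names below are hypothetical): the special @hostname component is expanded differently on each node that follows the link.

```shell
cd /mnt/gfs
mkdir node1 node2          # one configuration directory per cluster node
ln -s @hostname config     # a Context Dependent Path Name
# On the node named "node1", /mnt/gfs/config now resolves to /mnt/gfs/node1,
# while on "node2" the same path resolves to /mnt/gfs/node2.
```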
7.5.3 EXT4 (UNDER DEVELOPMENT)

Most people are familiar with ext3, which is the default Linux filesystem for almost all distributions.

In 2007, development started on ext4, which will fix the limitations of ext3. One of the new features being developed for this release is support for clustered filesystems.

However, it is likely that ext4 will not be ready for production use for a number of years, and it is only mentioned here for the reader's interest.
8 Cluster Examples and Challenges

There are numerous ways you can configure a cluster, which will depend on your requirements and budget. There are also some complex requirements if you wish to have virtualization within the cluster, as well as when building geographically separated clusters.

This document details a number of example cluster designs that may be suitable for you, to give you an understanding of what is possible, as well as discussions of the problems and limitations of each design.
8.1 Two node HA cluster with DRBD

A common high availability requirement is to make a particular server survive any hardware failure. The solution to this is to add a second identical server and set up a HA cluster between them.

To make a two node high availability cluster work, we have the following set of requirements:

- Data must be accessible by both servers, with both servers being able to read/write at the same time.
- If an individual service dies, it should resume on the secondary server.
- If one server suffers a complete failure, the secondary server should resume all tasks.

Solution:

- Both servers identical hardware, running Redhat Cluster Suite.
- Local root and swap filesystems; the remaining disk space turned into a DRBD block device set up with both nodes running as primary.
- DRBD device running GFS with journal space for 2 nodes, and both nodes mounted at the same time.
- Each service configured in Cluster Suite with a floating IP address. Any service that fails will relocate to the second node.
- In the event of a full server failure, the second node will resume all services.
Notes:

- DRBD is used primary/primary with GFS so that both servers can be running services at once. This may be undesirable if you only have one floating IP address; in this scenario you need to group all the services using that IP together.
- In that case, it would also be okay to run DRBD as primary/secondary with a traditional filesystem like ext3 or xfs, and have all the services configured as a single resource so they all failover together.
In the event of any node failing, Redhat Cluster Suite will move all the services to the alternate node and switch DRBD from secondary to primary on the new master node.

A network failure runs the risk of a split brain situation: if neither node can see the other, they will both try to become master. To fix this, run a ping to a remote server to provide a tie breaking third vote (quorum heuristics).

If you have a SAN or NAS, instead of storing the data locally and replicating with DRBD, the attached storage device could be used instead.

[Diagram: two node cluster (host1, host2) replicating sda with DRBD, a floating IP, a smart power switch for fencing, and a remote host on the internet used for the quorum ping]
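With Redhat Cluster Suite, the quorum ping described above can be expressed as a qdiskd heuristic in cluster.conf, roughly as follows. The remote host address and the timing values are hypothetical; note that qdiskd itself also expects a small shared quorum partition, so this fits most naturally where some shared storage exists:

```xml
<!-- Fragment of /etc/cluster/cluster.conf -->
<quorumd interval="1" tko="10" votes="1" label="qdisk">
  <!-- The node keeps its quorum vote only while the remote host answers -->
  <heuristic program="ping -c1 -w1 192.0.2.1" score="1" interval="2"/>
</quorumd>
```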
- The SAN could be set up with a non clustered or a clustered filesystem, the difference being that a clustered filesystem is required if you require both servers to be able to run services at once.
- If using a SAN, the cluster is scalable to more than two nodes, but the SAN could become a single point of failure for the cluster, and SANs are a lot more expensive than software solutions like DRBD.

Suitable Environments:

- Making any mission critical server HA.
- Any business or organisation that cannot tolerate hardware downtime of their production server.
- Ecommerce servers that need to provide mail/websites/databases.
- Small hosting organisations (larger ones should use designs like the one below).
8.2 Five node HA cluster with DRBD

One of the problems with DRBD is that it only works for two node clusters. It is possible to add a third node if the commercial version is purchased, but no DRBD solution exists which can handle more than three nodes at most.

When building clusters, it is more economical to have a single multi node cluster rather than many two node clusters, as only one computer needs to be set aside for spare resources.
Some installations use SANs, which limits the cluster size by the number of interfaces on the SAN. However, SANs are very expensive and require special hardware.

A cheaper solution is to build two computers with plenty of storage in them using off the shelf parts, and then to use DRBD to create what is effectively a HA NAS. These two storage nodes mirror each other and can transparently tolerate either one of the two nodes failing.

These two storage nodes can then export the available storage using a network filesystem like NFS or a block level service like GNBD, which the rest of the nodes can use.

Depending on your applications, there may be no need to have local disks in any of the servers, and they can all run directly off the network.

Here is an example of a five node cluster using DRBD for storage, providing a range of services such as HTTP and MAIL.
Solution:

- Two nodes with RAID 5 hard drive storage in each node: the storage nodes.
- Three nodes with no disks: the production nodes.
- Storage nodes are set up in Primary/Secondary mode, with LVM and ext3 on top of the DRBD layer, and with NFS exports of the data.
- Storage nodes provide user authentication via NIS/LDAP/Kerberos.
- Storage nodes provide pxelinux and DHCP for network booting.
- Quorum votes are set up in such a way that failure of both storage nodes will cause a cluster failure, resulting in all services stopping.
- Production nodes boot off the active storage node using netboot and mount the root filesystem using NFS. All the production nodes run the same software build.
- Services are spread across the three production nodes; if any node fails, the services are resumed on another one.
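The network booting part of this design can be sketched with a pxelinux configuration along the following lines (the paths, NFS server address and export name are hypothetical):

```
# /tftpboot/pxelinux.cfg/default on the active storage node
DEFAULT cluster
LABEL cluster
  KERNEL vmlinuz
  APPEND initrd=initrd.img ip=dhcp root=/dev/nfs nfsroot=192.168.1.10:/exports/nodeimage ro
```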
Notes:

- The above design can also be used with a small two interface SAN. The SAN can be connected to both storage nodes instead of using local disks, and the data then exported via NFS.
- To increase redundancy, two SANs could be used, with one connected to each storage server and mirroring done either between the SANs themselves or using DRBD on the storage nodes. However, standard hard drives will usually be cheaper and thus will probably be a better solution.
Suitable Environments:

- Ideal for hosting providers, in particular shared web hosting and email.
- Ideal for large companies to increase server availability and to centralise storage.
8.3 Five node HA cluster with DRBD + Xen

The five node DRBD cluster design above can be extended to become a HA Xen cluster. The three production nodes can be configured as Xen hosts, with the Xen guests being booted from the network and using NFS for storage (just like the hosts themselves).

Cluster Suite runs on the host nodes, and in the event of a Xen VM failing or the entire host itself suffering a failure, Cluster Suite can start each Xen VM on an alternate host, with each Xen VM being a cluster node and running one or more services.

The Xen hosts can be configured to be part of the same domain, which also allows live migration of Xen VMs: if one host server is heavily loaded, some VMs can be moved to another host whilst they are still running, with no downtime at all.
Notes:

- Optionally, instead of having the Xen VMs belong to the same cluster as the hosts, the Xen VMs could be set up in their own cluster, with each Xen host running its own cluster, leaving the host cluster to only deal with each Xen VM as a whole. See the Xen section of this document for details on the advantages/disadvantages of this method.

Suitable Environments:

- Ideal for hosting providers.
- Ideal for large companies to increase server utilisation and availability.
- Ideal for IT companies that need large numbers of servers for various application and development needs.
8.4 Geographically distributed clusters

All the cluster designs shown so far have been for use in one physical location: the cluster nodes are all sitting in a rack, connected via ethernet, and able to support floating IP addresses because there is only one route into the cluster.

However, a common desire is to have geographically distributed clusters, to prevent failure of a single site taking the whole cluster offline.

Typical uses for this might be:

- A company with offices in two cities would like to have one server at each office, with the data mirrored between them.
- An ecommerce website wanting replicated email, website and database services between two sites to ensure availability.
However, there are some big issues with a geographically separated cluster that need to be solved:

- Internet connections are slow: data needs to be mirrored at both sites in a way that is bandwidth friendly, and the data transferred when changes are made needs to be minimal.
- Internet connections fail/have outages fairly frequently. Any solution must be able to handle this without split brain issues.
- DRBD is an ideal candidate for distributed clusters, but is only able to scale to two nodes (or three using the commercial version). This causes a complication for organisations with more than three offices that they want mirrored servers in.
- You can't float an IP address around a country unless you're an ISP. But even they can't float an IP address to a server on the other side of the world.
- Fencing is more complicated: a secondary (independent) management network is useful in order to be able to communicate with fencing devices in other locations. Without a secondary connection, a crashed server cannot be fenced if the production network goes down, although a running server will stop clustering if it loses quorum, so split brain is not an issue.
How can we solve this? There are a number of solutions:

- Use a distributed, locally caching filesystem like AFS, which will cache commonly accessed data.
- Have two main offices running DRBD, and all other offices must connect using a network filesystem, perhaps assisted by a proxy/caching device for improving performance at the sites.
- Devices such as WAN accelerators can be used to make optimal use of network performance, and some models have internal hard drives that can cache some data like HTTP or SMB traffic.
- Float DNS names rather than IP addresses. When a node goes down which is providing public services (eg: websites), have the new node providing the service connect to the DNS server and change the A records.

The problem with this method is that DNS changes can take some time to synchronise across the web. This can be reduced by setting your DNS Time To Live (TTL) to a low value (eg: 5 mins), but it may not be honoured by all DNS caching servers (although as a general rule it is).
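Where the DNS server supports dynamic updates, the failover node can change the A record itself with nsupdate, along these lines (the zone, key file and addresses are hypothetical):

```shell
# Point www.example.com at the surviving node, with a 5 minute TTL.
nsupdate -k /etc/cluster/ddns.key <<EOF
server ns1.example.com
update delete www.example.com A
update add www.example.com 300 A 192.0.2.2
send
EOF
```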
In addition, it is possible to set up Round Robin DNS servers, which can allow you to load balance between your geographically separated servers. This is good for services such as HTTP or read only database requests.
8.4.1 TWO NODE DISTRIBUTED CLUSTER

As discussed above, a two node distributed cluster is able to use DRBD data replication, which solves the major challenge of mirroring the data.

This two node setup is an ideal solution for organisations which want the redundancy of an offsite mirror, suitable for providing both redundancy and load balancing. It is also suitable for organisations which have two offices and want identical servers at both locations.

This example assumes the following environment:

- Two geographically separated servers, identical hardware.
- Network between both servers controlled by an outside party (therefore floating IP addresses are not possible).
- No plans to scale beyond two nodes.
Requirements:

- Have a server at both locations.
- Both servers need to offer file sharing to the local networks at each office with Samba.
- Mail, DB and HTTP services required to be HA.

Solution:

- Servers set up with DRBD in a primary/primary setup (both servers can read/write).
- GFS filesystem to allow both nodes simultaneous access to the data.
- Use a ping to a remote server to determine the tie breaker third vote (quorum heuristics).
- Floating DNS names for services.
- Cluster Suite capable of migrating individual services.
- MAIL, DB and HTTP services will only run on one server at any time (otherwise interesting lock file issues would occur).
- In the event of either server suffering a failure, the services will be relocated to the alternate server and the DNS records will be adjusted to redirect traffic.
- Samba runs on both servers, providing access for the local network. In the event of a server failure, the DNS name for the local network server will be changed to the alternate server.
[Diagram: two geographically separated nodes (host1, host2) replicating sda with DRBD across a VPN, a floating A record for www.example.com, a remote host for the quorum ping, smart power switches for fencing, and a GSM out of band management network]
8.4.2 THREE+ NODE DISTRIBUTED CLUSTER

The two node distributed cluster detailed above will work fine with two nodes, but what about when three or more nodes are required?

We want data replication between the servers, but DRBD will only work with two nodes in primary/primary mode. This gives us two options:

1. Run DRBD between the two main servers; the other servers can connect to one of the two servers via a network filesystem like NFS.
This is quite simple, but has the obvious flaw of disk I/O being limited to the speed of the WAN connection, as well as the transfers causing data cap usage on non flat rate connections.
Depending on your usage and requirements, this may or may not be a problem.

2. Run a distributed filesystem with replication support on all the nodes.
Unfortunately, there are currently no distributed filesystems available which can provide both replication and distributed data.
Therefore, the next best solution is using a distributed filesystem with DRBD underneath it to provide the redundancy.
This is covered in more detail in the DRBD filesystem section earlier in this document, but basically you divide the cluster nodes into pairs.

Each pair runs DRBD primary/primary to ensure data replication. On top of that, you run a distributed filesystem such as Lustre or AFS. However, this can be quite complex, and it has the weakness of being limited by having replication on only 2 nodes.

Three+ node distributed clusters can be quite complex and require a lot of planning to make sure they will work reliably and have speedy access to storage. The best solution will depend greatly upon the applications you need to run.
9 Further Reference

The following resources are good further reading for information on setting up cluster solutions:

Cluster Management:

- Redhat clustering guides (includes info on GFS)
  http://www.redhat.com/docs/manuals/csgfs/indexmaster.html
- Linux-HA documentation
  http://www.linux-ha.org/

Storage:

- DRBD (block layer replication)
  http://www.drbd.org/documentation.html
- AFS (distributed filesystem)
  http://en.wikipedia.org/wiki/Andrew_file_system
- Lustre (distributed filesystem)
  http://www.lustre.org

Training/Courses:

Additionally, if you are an RHCE, Redhat's RH436 training is a good course that teaches you how to configure clusters and shared storage on RHEL with Redhat Cluster Suite.
https://www.redhat.com/courses/rh436_red_hat_enterprise_clustering_and_storage_management/