Microsoft SQL Server AlwaysOn Solutions Guide For High Availability and Disaster Recovery
LeRoy Tuttle, Jr.
Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra
Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako,
Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith
(ServiceU), Juergen Thomas, Benjamin Wright-Jones
Contents
High Availability and Disaster Recovery Concepts
Describing High Availability
Planned vs. Unplanned Downtime
Degraded Availability
Quantifying Downtime
Recovery Objectives
Justifying ROI or Opportunity Cost
Monitoring Availability Health
Planning for Disaster Recovery
Overview: High Availability with Microsoft SQL Server 2012
SQL Server AlwaysOn
Significantly Reduce Planned Downtime
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Easy Deployment and Management
Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
Infrastructure Availability
Windows Operating System
Windows Server Failover Clustering
WSFC Cluster Validation Wizard
WSFC Quorum Modes and Voting Configuration
WSFC Disaster Recovery through Forced Quorum
SQL Server Instance Level Protection
Availability Improvements - SQL Server Instances
AlwaysOn Failover Cluster Instances
Database Availability
AlwaysOn Availability Groups
Availability Group Failover
Availability Group Listener
Availability Improvements - Databases
Client Connectivity Recommendations
Conclusion
High Availability and Disaster Recovery Concepts
You can make the best selection of a database technology for a high availability and disaster recovery
solution when all stakeholders have a shared understanding of the related business drivers, challenges,
and objectives of planning, managing, and measuring RTO and RPO objectives.
Readers who are familiar with these concepts can move ahead to the Overview: High Availability with
Microsoft SQL Server 2012 section of this paper.
Describing High Availability
For a given software application or service, high availability is ultimately measured in terms of the
end user's experience and expectations. The tangible and perceived business impact of downtime may
be expressed in terms of information loss, property damage, decreased productivity, opportunity costs,
contractual damages, or the loss of goodwill.
The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A
sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with
technical capabilities and infrastructure costs.
A platform is considered highly available per the agreement and expectations of customers and
stakeholders. The availability of a system can be expressed as this calculation:
\[ \text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\% \]
The resulting value is often expressed by industry in terms of the number of 9's that the solution
provides, which is meant to convey an annual number of minutes of possible uptime, or conversely,
minutes of downtime.
Number of 9's    Availability Percentage    Total Annual Downtime
2                99%                        3 days, 15 hours
3                99.9%                      8 hours, 45 minutes
4                99.99%                     52 minutes, 34 seconds
5                99.999%                    5 minutes, 15 seconds
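The downtime figures in the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming a 365-day year and rounding to the nearest second; the function names are illustrative and do not come from the paper.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def annual_downtime_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (2 -> 99%)."""
    # Unavailability is 10^-N, so downtime is the year divided by 10^N.
    return SECONDS_PER_YEAR / (10 ** nines)


def breakdown(seconds: float) -> tuple:
    """Split a duration into (days, hours, minutes, seconds), rounded to the second."""
    total = round(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return days, hours, minutes, secs


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, breakdown(annual_downtime_seconds(n)))
```

The table above truncates the first two rows to the two largest units, so the sketch reports slightly more detail than the table for 99% and 99.9%.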
Planned vs. Unplanned Downtime
System outages are either anticipated and planned for, or they are the result of an unplanned
failure. Downtime need not be considered negatively if it is appropriately managed. There are two key
types of foreseeable downtime:
Planned maintenance. A time window is preannounced and coordinated for planned maintenance
tasks such as software patching, hardware upgrades, password updates, offline re-indexing, data
loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational
procedures should minimize downtime and prevent any data loss. Planned maintenance activities
can be seen as investments needed to prevent or mitigate other potentially more severe unplanned
outage scenarios.
Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or
uncontrollable, or that are foreseeable, but considered either too unlikely to occur, or are
considered to have an acceptable impact. A robust high availability solution detects these types of
failures, automatically recovers from the outage, and then reestablishes fault tolerance.
When establishing SLAs for high availability, you should calculate separate key performance
indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you
to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned
downtime.
Degraded Availability
High availability should not be considered as an all-or-nothing proposition. As an alternative to a
complete outage, it is often acceptable to the end user for a system to be partially available, or to have
limited functionality or degraded performance. These varying degrees of availability include:
Read-only and deferred operations. During a maintenance window, or during a phased disaster
recovery, data retrieval is still possible, but new workflows and background processing may be
temporarily halted or queued.
Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a
partial platform failure, limited hardware resources may be over-committed or under-sized. User
experience may suffer, but work may still get done in a less productive manner.
Partial, transient, or impending failures. Robustness in the application logic or hardware stack that
retries or self-corrects upon encountering an error. These types of issues may appear to the end user
as data latency or poor application responsiveness.
Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers
of the solution stack (infrastructure, platform, and application), or horizontally between different
functional components. Users may experience partial success or degradation, depending upon the
features or components that are affected.
The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded
availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.
Quantifying Downtime
When downtime does occur, either planned or unplanned, the primary business goal is to bring the
system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With
unplanned downtime, you must balance the time and effort needed to determine why the outage
occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop
investigating the outage or performing maintenance tasks, recover from the outage by bringing the
system back online, and if needed, reestablish fault tolerance.
Recovery Objectives
Data redundancy is a key component of a high availability database solution. Transactional activity on
your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary
instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be
lost on the secondary instances due to delays in data propagation.
You can both measure the impact and set recovery goals in terms of how long it takes to get back in
business, and how much time latency there is in the last transaction recovered:
Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the
system back online in at least a read-only capacity to facilitate investigation of the failure. However,
the primary goal is to restore full service to the point that new transactions can take place.
Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is
the time gap or latency between the last committed data transaction before the failure and the
most recent data recovered after the failure. The actual data loss can vary depending upon the
workload on the system at the time of the failure, the type of failure, and the type of high
availability solution used.
You should use RTO and RPO values as goals that indicate business tolerance for downtime and
acceptable data loss, and as metrics for monitoring availability health.
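As an illustration of the two definitions above, the sketch below derives the achieved recovery time and recovery point from four timestamps and compares them with the goals. The function and parameter names are assumptions made for this example; they are not part of any SQL Server tooling.

```python
from datetime import datetime, timedelta


def measure_outage(failure_at, service_restored_at,
                   last_commit_before_failure, last_recovered_commit):
    """Return (recovery_time, recovery_point) as timedeltas.

    recovery_time:  how long the outage lasted.
    recovery_point: gap between the last commit before the failure and the
                    most recent commit that survived recovery.
    """
    recovery_time = service_restored_at - failure_at
    recovery_point = last_commit_before_failure - last_recovered_commit
    return recovery_time, recovery_point


def meets_objectives(recovery_time, recovery_point, rto, rpo):
    """True only when both measured values are within their objectives."""
    return recovery_time <= rto and recovery_point <= rpo


if __name__ == "__main__":
    t0 = datetime(2012, 1, 1, 12, 0, 0)          # hypothetical failure time
    rt, rp = measure_outage(
        failure_at=t0,
        service_restored_at=t0 + timedelta(minutes=4),
        last_commit_before_failure=t0 - timedelta(seconds=1),
        last_recovered_commit=t0 - timedelta(seconds=11),
    )
    print(rt, rp, meets_objectives(rt, rp, timedelta(minutes=5), timedelta(seconds=30)))
```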
Justifying ROI or Opportunity Cost
The business costs of downtime may be either financial or in the form of customer goodwill. These costs
may accrue with time, or they may be incurred at a certain point in the outage window. In addition to
projecting the cost of incurring an outage with a given recovery time and data recovery point, you can
also calculate the business process and infrastructure investments needed to attain your RTO and RPO
goals or to avoid the outage altogether. These investment themes should include:
Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the
first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure,
distributing workloads across isolated points of failure, and planned downtime for preventive
maintenance.
Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on
the customer experience through automatic and transparent recovery.
Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can
be leveraged for read-only workloads, or to improve overall system performance by distributing
workloads across all available hardware.
For given RTO and RPO goals, the needed availability and recovery investments, combined with the
projected costs of downtime, can be expressed and justified as a function of time. During an actual
outage, this allows you to make cost-based decisions based on the elapsed downtime.
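One way to picture downtime cost as a function of time is to combine a cost that accrues per minute with costs that are incurred once at certain points in the outage window. The sketch below is a simplified model under that assumption; the rates, milestone amounts, and function names are hypothetical and would need to be replaced by figures from your own business analysis.

```python
def downtime_cost(elapsed_minutes, cost_per_minute, milestone_costs):
    """Projected cost of an outage after `elapsed_minutes`.

    milestone_costs is a list of (minute, cost) pairs: one-time costs that
    are incurred once the outage has lasted at least `minute` minutes,
    for example a contractual penalty.
    """
    accrued = elapsed_minutes * cost_per_minute
    one_time = sum(cost for minute, cost in milestone_costs
                   if elapsed_minutes >= minute)
    return accrued + one_time


def break_even_minute(investment, cost_per_minute, milestone_costs, horizon_minutes):
    """First whole minute at which outage cost reaches the investment, or None."""
    for minute in range(horizon_minutes + 1):
        if downtime_cost(minute, cost_per_minute, milestone_costs) >= investment:
            return minute
    return None


if __name__ == "__main__":
    milestones = [(15, 5000), (60, 20000)]   # hypothetical penalties
    print(downtime_cost(30, 100, milestones))
    print(break_even_minute(10000, 100, milestones, 240))
```

Comparing the break-even point with the RTO goal gives a rough sense of whether an availability investment pays for itself within a single outage.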
Monitoring Availability Health
From an operational point of view, during an actual outage, you should not attempt to consider all
relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data
latency on your standby instances as a proxy for expected RPO.
In the event of an outage, you should also limit the initial time spent investigating the root cause during
the outage, and instead focus on validating the health of your recovery environment, and then rely upon
detailed system logs and secondary copies of data for subsequent forensic analysis.
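The idea of using standby data latency as a proxy for expected RPO can be sketched as a simple threshold check. The example assumes that latency per standby instance has already been collected by some monitoring process; the replica names, the warning ratio, and the function name are illustrative assumptions.

```python
def rpo_status(latency_seconds_by_replica, rpo_seconds, warn_ratio=0.8):
    """Classify each standby instance by its data latency relative to the RPO goal.

    Returns a dict of replica name -> "ok", "warning", or "breach".
    warn_ratio is an assumed early-warning threshold (80% of the RPO).
    """
    status = {}
    for replica, latency in latency_seconds_by_replica.items():
        if latency > rpo_seconds:
            status[replica] = "breach"      # expected data loss exceeds the goal
        elif latency >= warn_ratio * rpo_seconds:
            status[replica] = "warning"     # approaching the goal
        else:
            status[replica] = "ok"
    return status


if __name__ == "__main__":
    observed = {"standby-a": 2.0, "standby-b": 27.0, "standby-c": 45.0}
    print(rpo_status(observed, rpo_seconds=30))
```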
Planning for Disaster Recovery
While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address
what is done to reestablish high availability after the outage.
As much as possible, disaster recovery procedures and responsibilities should be formulated before an
actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or
manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope
of a sound disaster recovery plan should include:
Granularity of failure and recovery. Depending upon the location and type of failure, you can take
corrective action at different levels; that is, data center, infrastructure, platform, application, or
workload.
Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and
diagnostic queries should all be readily accessible by appropriate parties.
Coordination of dependencies. Within the application stack, and across stakeholders, what are the
system and business dependencies?
Decision tree. A predetermined, repeatable, validated decision tree that includes role
responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery
steps.
Validation. After taking steps to recover from the outage, what must be done to verify that the
system has returned to normal operations?
Documentation. Capture all of the above items in a set of documentation, with sufficient detail and
clarity so that a third-party team can execute the recovery plan with minimal assistance. This type
of documentation is commonly referred to as a run book or a cookbook.
Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations
for RTO goals, and consider regular rotation of hosting the primary production site on the primary
and each of the disaster recovery sites.
Overview: High Availability with Microsoft SQL Server 2012
Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications
and protection of critical data from unplanned and planned downtime. SQL Server provides a set of
features and capabilities that can help achieve those goals while keeping the cost and complexity low.
Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the
deeper coverage in the SQL Server AlwaysOn Layers of Protection section of this paper.
SQL Server AlwaysOn
AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It
can provide data and hardware redundancy within and across data centers, and improves application
failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility
in configuration and enables reuse of existing hardware investments.
An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at
both the database and the instance level:
AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database
mirroring and help ensure availability of application databases, and they enable zero data loss
through log-based data movement for data protection without shared disks.
Availability groups provide an integrated set of options including automatic and manual failover of a
logical group of databases, support for up to four secondary replicas, fast application failover, and
automatic page repair.
AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and
support multisite clustering across subnets, which enables cross-data-center failover of SQL Server
instances. Faster and more predictable instance failover is another key benefit that enables faster
application recovery.
Significantly Reduce Planned Downtime
The key reason for application downtime in any organization is planned downtime caused by operating
system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the
outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and
enabling more online maintenance operations:
Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal,
streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This
operating system configuration can reduce planned downtime by minimizing operating system
patching requirements by as much as 60 percent.
Online Operations. Enhanced support for online operations like LOB re-indexing and adding columns
with default values helps to reduce downtime during database maintenance operations.
Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of
instances, which helps significantly to reduce application downtime.
SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the
additional benefit of Live Migration, which enables you to migrate virtual machines between hosts
with zero downtime. Administrators can perform maintenance operations on the host without
impacting applications.
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn
Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers
for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The
ability to simultaneously utilize both the primary and secondary database replicas helps improve
performance of all workloads due to better resource balancing across your server hardware
investments.
Easy Deployment and Management
Features such as the Configuration Wizard, support for the Windows PowerShell command-line
interface, dashboards, dynamic management views (DMVs), policy-based management, and System
Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities
The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key
drivers in selecting a SQL Server technology for your high availability and disaster recovery solution.
This table offers a rough comparison of the type of results that those different solutions may achieve:

High Availability and Disaster Recovery                 Potential Data   Potential Recovery      Automatic   Readable
SQL Server Solution                                     Loss (RPO)       Time (RTO)              Failover    Secondaries (1)
AlwaysOn Availability Group - synchronous-commit        Zero             Seconds                 Yes (4)     0-2
AlwaysOn Availability Group - asynchronous-commit       Seconds          Minutes                 No          0-4
AlwaysOn Failover Cluster Instance                      NA (5)           Seconds to minutes      Yes         NA
Database Mirroring (2) - High-safety (sync + witness)   Zero             Seconds                 Yes         NA
Database Mirroring (2) - High-performance (async)       Seconds (6)      Minutes (6)             No          NA
Log Shipping                                            Minutes (6)      Minutes to hours (6)    No          Not during a restore
Backup, Copy, Restore (3)                               Hours (6)        Hours to days (6)       No          Not during a restore

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
SQL Server AlwaysOn Layers of Protection
SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical
and physical layers of infrastructure and application components. Historically, it has been a common
practice to have a separation of duties and responsibilities for the various involved audiences and roles,
such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper is organized to walk through a deeper description of each of those layers, and
to offer rationale and guidance for your design discussions and implementation decisions.
A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:
Infrastructure level. Server-level fault tolerance and intra-node network communication leverage
Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server
instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that
host the FCI are attached to robust symmetric shared storage (SAN or SMB).
Database level. An availability group is a set of user databases that fail over together. An availability
group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an
instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
Client connectivity. Database client applications can connect directly to a SQL Server instance
network name, or they may connect to a virtual network name (VNN) that is bound to an availability
group listener. The VNN abstracts the WSFC cluster and availability group topology,
logically redirecting connection requests to the appropriate SQL Server instance and database replica.
The logical topology of a representative AlwaysOn solution is illustrated in this diagram:
Infrastructure Availability
Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows
Server operating system and WSFC as a platform technology. More than ever before, successful
Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.
Windows Operating System
SQL Server relies upon the Windows platform to provide foundational infrastructure and services for
networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and
capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server
2008 R2 Standard operating system, Windows Server 2008 R2 Enterprise operating system, and
Windows Server 2008 R2 Datacenter operating system.
For more information, see: Hardware and Software Requirements for Installing SQL Server
2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).
Windows Server Core Installation Option
As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation
option in Windows Server 2008 or later. The Server Core installation option provides a minimal
environment for running specific server roles with limited functionality and very limited GUI application
support. By default, only necessary services and a command prompt environment are enabled.
This mode of operation reduces the operating system attack surface and system overhead, and it can
significantly reduce ongoing maintenance, servicing, and patching requirements.
A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment,
configuration, administration, and maintenance of SQL Server and of the operating system must be
done using a scripting environment such as Windows PowerShell, or through the use of command-line or
remote tools.
Optimizing SQL Server for Private Cloud
High availability and disaster recovery scenarios are increasingly critical in the Private Cloud
environment. Deploy SQL Server to your Private Cloud to help ensure that your computer, network, and
storage resources are used efficiently, reducing both physical footprint and capital and operational
expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources
on demand without compromising control.
In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL
Server also supports Live Migration, which is the ability to move virtual machines between hosts with no
discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing - Optimizing SQL Server for Private
Cloud (http://www.microsoft.com/SqlServerPrivateCloud).
Windows Server Failover Clustering
Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high
availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server.
If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be
automatically or manually transferred to another available node in a process known as failover. With
AlwaysOn solutions, this process applies to both FCIs and to availability groups.
The nodes in the WSFC cluster work together to collectively provide these types of capabilities:
Distributed metadata and notifications. WSFC service and hosted application metadata is
maintained on each node in the cluster. This metadata includes WSFC configuration and status in
addition to hosted application settings. Changes to the metadata or status on one node are
automatically propagated to the other nodes in the cluster.
Resource management. Individual nodes in the cluster may provide physical resources such as
direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted
applications, such as SQL Server, register themselves as a cluster resource, and they can configure
startup and health dependencies upon other resources.
Health monitoring. Inter-node and primary node health detection is accomplished through a
combination of heartbeat-style network communications and resource monitoring. The overall
health of the cluster is determined by the votes of a quorum of nodes in the cluster.
Failover coordination. Each resource is configured to be hosted on a primary node, and each can be
automatically or manually transferred to one or more secondary nodes. A health-based failover
policy controls automatic transfer of resource ownership between nodes. Nodes and hosted
applications are notified when failover occurs so that they can react appropriately.
For more information, see Windows Server | Failover Clustering and Node
Balancing (http://www.microsoft.com/windowsserver2008/en/us/failoverclusteringmain.aspx).
Note: It is now critically important that database administrators understand the inner workings of WSFC
clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery
steps are all intrinsically tied to your WSFC configuration.
WSFC Storage Configurations
Windows Server Failover Clustering relies upon each node in the cluster to manage its connected
storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely
robust, and therefore if the storage device attached to a node is unavailable, the cluster node is
considered to be at fault.
For write-based operations, a disk volume is logically attached to a single cluster node at a time using a
SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a
node fails, logical ownership of the disk volume can be transferred to another node in the cluster.
SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration
combinations, including:
Direct-attached vs. remote. Storage devices are directly physically attached to the server, or they
are presented by a remote device through a network or host bus adapter (HBA). Remote storage
technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as
well as Server Message Block (SMB) file share based solutions.
Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk
volume configuration and file paths are presented to each node in the cluster. The physical
implementation and capacity of the underlying disk volumes can vary.
Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the
cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of
compliant shared storage devices can be transferred from one node to another using SCSI-3
protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file
sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared
volume.
Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node
owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now
deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated,
local or remote storage.
WSFC Resource Health Detection and Failover
Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A
variety of circumstances may indicate a cluster resource failure, including: power failure, disk or memory
errors, network communication errors, misconfiguration, or nonresponsive services.
You can make WSFC cluster resources such as networks, storage, or services dependent upon one
another. The cumulative health of a resource is determined by successive rollup of its health with the
health of each of its resource dependencies.
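The successive rollup described above can be pictured as a recursive walk over the dependency graph: a resource is cumulatively healthy only if it is healthy itself and every resource it depends on is cumulatively healthy. This is a conceptual sketch only, not the WSFC implementation; the resource names and data structures are assumptions chosen for illustration.

```python
def cumulative_health(resource, own_health, depends_on, visiting=None):
    """Roll up a resource's own health with the health of its dependencies.

    own_health: dict of resource name -> bool (health reported by the resource).
    depends_on: dict of resource name -> list of resources it depends on.
    """
    visiting = set() if visiting is None else visiting
    if resource in visiting:
        raise ValueError(f"dependency cycle at {resource!r}")
    visiting.add(resource)
    try:
        # Healthy only if this resource and all of its dependencies are healthy.
        return own_health[resource] and all(
            cumulative_health(dep, own_health, depends_on, visiting)
            for dep in depends_on.get(resource, ())
        )
    finally:
        visiting.discard(resource)


if __name__ == "__main__":
    depends_on = {                       # hypothetical dependency graph
        "sql-service": ["network-name", "disk"],
        "network-name": ["ip-address"],
    }
    own = {"sql-service": True, "network-name": True, "ip-address": False, "disk": True}
    print(cumulative_health("sql-service", own, depends_on))  # IP address failure rolls up
```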
For AlwaysOn Availability Groups, the availability group and the availability group listener are registered
as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL
Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the
instance's virtual network name resource.
If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the
configured failover policy causes the cluster service to do one of the following:
Restart the resource on the current node.
Set the resource offline.
Initiate an automatic failover of the resource and its dependencies to another node.
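The three outcomes listed above can be sketched as a small decision function that counts recent failures inside a sliding time window. The thresholds, parameter names, and the rule for choosing between failover and offline are assumptions made for this illustration; the actual behavior depends on the failover policy configured for the resource in your cluster.

```python
def choose_action(failure_times, now, period_seconds, max_restarts, other_node_available):
    """Pick a policy action from the failures seen within the recent period.

    failure_times: timestamps (seconds) of previous failures of the resource.
    Returns "restart", "failover", or "offline".
    """
    recent = [t for t in failure_times if now - t <= period_seconds]
    if len(recent) <= max_restarts:
        return "restart"            # still within the allowed restart budget
    if other_node_available:
        return "failover"           # move the resource and its dependencies
    return "offline"                # nowhere to go: take the resource offline


if __name__ == "__main__":
    print(choose_action([0, 100, 200], now=210, period_seconds=900,
                        max_restarts=1, other_node_available=True))
```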
Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the
overall health of the cluster.
WSFCClusterValidationWizard
TheclustervalidationwizardisafeaturethatisintegratedintofailoverclusteringinWindowsServer
2008andWindowsServer2008R2.Itisakeytoolforadatabaseadministratortousetohelpensure
thataclean,healthy,stableWSFCenvironmentexists,beforedeployingaSQLServerAlwaysOnsolution.
Withtheclustervalidationwizard,youcanrunasetoffocusedtestsoneitheracollectionofservers
thatyouintendtouseasnodesinacluster,oronanexistingcluster.Thisprocessteststheunderlying
hardwareandsoftwaredirectly,andindividually,toobtainanaccurateassessmentofhowwellaWSFC
clusterwouldbesupportedonagivenconfiguration.
Thisvalidationprocessconsistsofaseriesoftestsanddatacollectiononeachnodeinthesecategories:
Inventory.InformationonBIOSversions,environmentlevels,hostbustadapters,RAM,operating
systemversions,devices,services,drivers,andsoon.
Network.InformationonNICbindingorder,networkcommunications,IPconfiguration,andfirewall
configuration.ValidatesinternodecommunicationsonallNICs.
Storage.Informationondisks,drivecapacity,accesslatency,filessystems,andsoon.ValidatesSCSI
commands,diskfailoverfunctionality,andsymmetricorasymmetricstorageconfiguration.
Systemconfiguration.ValidatesActiveDirectoryconfiguration,thatdriversaresigned,memory
dumpsettings,requiredoperatingsystemfeaturesandservices,compatibleprocessorarchitecture,
andservicepackandWindowsSoftwareUpdatelevels.
The results of these validation tests give you the information needed to fine-tune a cluster configuration,
track the configuration, and identify potential cluster configuration issues before they cause downtime.
You can save a report of the test results as an HTML document for later reference.
You should run these tests before and after you make any changes to the WSFC configuration, before you
install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required
by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC
cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster
(http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).
Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based
geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to
apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation
steps.
For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability
Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).
WSFC Quorum Modes and Voting Configuration
WSFC uses a quorum-based approach to monitor overall cluster health and maximize node-level fault
tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very
important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster
recovery solution.
Cluster Health Detection by Quorum
Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's
health status with the other nodes. Unresponsive nodes are considered to be in a failed state.
A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health
and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means
that the cluster is healthy enough to provide node-level fault tolerance.
The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be
maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over
to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also
causes all SQL Server instances registered with the cluster to be stopped.
Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring
it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section
later in this paper.
Quorum Modes
A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum
voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes
in the cluster.
One of the following quorum modes determines what constitutes a quorum of votes:
Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the
cluster to be healthy.
Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file
share is also configured as a voting witness, and connectivity from any node to that share is also
counted as an affirmative vote. More than half of the possible votes must be affirmative for the
cluster to be healthy.
As a best practice, the witness file share should not reside on any node in the cluster, and it should
be visible to all nodes in the cluster.
Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster
resource is also designated as a voting witness, and connectivity from any node to that shared disk is
also counted as an affirmative vote. More than half of the possible votes must be affirmative for the
cluster to be healthy.
Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to
that shared disk is counted as an affirmative vote.
For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a
Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).
Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk,
you should generally use the Node Majority quorum mode if you have an odd number of voting nodes,
or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
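The vote arithmetic behind the quorum modes above can be sketched in a few lines of Python. This is only an illustration, assuming that every voting node and every witness carries exactly one vote. The function name has_quorum and its parameters are hypothetical and do not correspond to any WSFC API.

```python
def has_quorum(mode, voting_nodes, nodes_online, witness_online=False):
    """Return True if the affirmative votes satisfy the given quorum mode.

    mode           -- "NodeMajority", "NodeAndFileShareMajority",
                      "NodeAndDiskMajority", or "DiskOnly" (illustrative labels)
    voting_nodes   -- total number of voting nodes in the cluster
    nodes_online   -- voting nodes that currently vote affirmatively
    witness_online -- whether the file share or disk witness is reachable
    """
    if mode == "DiskOnly":
        # Only connectivity to the shared witness disk counts.
        return witness_online
    if mode == "NodeMajority":
        possible, affirmative = voting_nodes, nodes_online
    elif mode in ("NodeAndFileShareMajority", "NodeAndDiskMajority"):
        # The witness contributes one additional possible vote.
        possible = voting_nodes + 1
        affirmative = nodes_online + (1 if witness_online else 0)
    else:
        raise ValueError(f"unknown quorum mode: {mode}")
    # More than half of the possible votes must be affirmative.
    return 2 * affirmative > possible
```

For example, with four voting nodes of which two are online, has_quorum("NodeMajority", 4, 2) is False, while has_quorum("NodeAndFileShareMajority", 4, 2, True) is True. This is why a witness is suggested for an even number of voting nodes.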
Voting and Non-Voting Nodes
By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file
share witness, and disk witness has a single vote in determining the overall cluster health. The quorum
discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on
cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the
cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given
moment, from the perspective of each node, some of the other nodes may appear to be offline, or
appear to be in the process of failover, or appear unresponsive due to a network communication
failure. A key function of the quorum vote is to determine whether the apparent state of each node in
the WSFC cluster is indeed the actual state of that node.
For all of the quorum models except Disk Only, the effectiveness of a quorum vote depends on reliable
communications among all of the voting nodes in the cluster. You should trust the quorum vote when all
nodes are on the same physical subnet.
However, if a node on another subnet is seen as non-responsive in a quorum vote, but it is actually
online and otherwise healthy, that is most likely due to a network communications failure between
subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that
network communications failure may effectively create more than one set (or subset) of voting nodes.
If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a
split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and
in conflict with one another.
Note: The split-brain scenario is possible only if a system administrator manually performs a forced
quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the
quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum
section later in this paper.
To simplify your quorum configuration and increase uptime, you may want to adjust each node's
NodeWeight setting (a value of 0 or 1) so that the node's vote is not counted towards the quorum.
Recommended Adjustments to Quorum Voting
To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in
sequential order:
1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is
the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as
the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary
disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to
take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL
Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible
ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration
that does not support a healthy quorum.
For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight
Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).
You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to
include or exclude its vote.
Note: SQL Server exposes several system dynamic management views (DMVs) that can help you
administer settings related to WSFC cluster configuration and node quorum voting.
For more information, see Monitor Availability Groups
(http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).
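As a rough illustration, guidelines 1 through 5 above can be applied mechanically, in order, to a description of the cluster nodes. The sketch below assumes that a later guideline overrides an earlier one, so a secondary-site node loses its vote even if it is a failover target. That is one reading of "in sequential order," not a rule stated in this paper. The Node class, its fields, and recommend_votes are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hosts_primary: bool = False              # primary replica or preferred FCI owner
    automatic_failover_target: bool = False  # could own the primary/FCI after automatic failover
    at_secondary_site: bool = False          # resides at a disaster recovery site


def recommend_votes(nodes):
    """Return ({node name: NodeWeight}, needs_witness) by applying the guidelines in order."""
    weights = {}
    for node in nodes:
        weight = 0                           # 1. no vote by default
        if node.hosts_primary:               # 2. include all primary nodes
            weight = 1
        if node.automatic_failover_target:   # 3. include possible automatic failover owners
            weight = 1
        if node.at_secondary_site:           # 4. exclude secondary site nodes
            weight = 0
        weights[node.name] = weight
    # 5. an even number of votes suggests adding a witness to prevent ties
    needs_witness = sum(weights.values()) % 2 == 0
    return weights, needs_witness
```

Guideline 6 is not modeled. After a failover, the same function would be run again with the updated node roles.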
WSFC Disaster Recovery through Forced Quorum
Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving
several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL
Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster
cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC
cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may
have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to
communicate with a quorum.
To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least
one node under the existing configuration. In a disaster scenario, you may need to reconfigure or
identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC
cluster to reflect the surviving cluster topology as well.
You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that
took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets
you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.
This type of disaster recovery process should include the following steps:
1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are
non-responsive and which cluster nodes are online and available for post-disaster use, and then
examine the Windows event logs and the SQL Server system logs. Where practical, you should
preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node,
manually force the cluster to come online using the forced quorum procedure. To minimize potential
data loss, select a node that was last hosting an availability group primary replica.
For more information, see Force a WSFC Cluster to Start Without a
Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC
cluster achieves a majority of votes and automatically transitions to a regular quorum mode of
operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to
specify the forced quorum option when you start the cluster service on the other nodes.
As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to
synchronize the new cluster configuration state. Remember to do this one node at a time to prevent
potential race conditions in resolving the last known state of the cluster.
Note: Ensure that each node that you start can communicate with the other newly online nodes, or
you run the risk of creating more than one quorum node set; that is, a split-brain scenario. If your
findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the
cluster using the forced quorum procedure, and if you corrected the root cause of the quorum
failure, you do not need to make changes to the original quorum mode and node vote configuration.
Otherwise, you should evaluate the newly recovered cluster node and availability replica topology,
and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC
cluster service on unrecovered nodes offline, or set their node votes to zero.
Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored
back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster
Manager, or the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate
DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come
back online on their own as part of the regular SQL Server startup process. The recovery of other
databases may require additional manual steps.
You can minimize potential data loss and recovery time for the availability group replicas by bringing
them back online in this sequence, if possible: primary replica, synchronous secondary replicas,
asynchronous secondary replicas.
6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from
the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust
related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group
replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate
the transaction log past the last known point of the farthest-behind availability replica. If a failed
replica is not repaired or removed from the availability group, the transaction logs will grow and you
will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high
availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and
Windows event logs to determine the root cause of the failure, and to document actual Recovery Point
and Recovery Time experiences.
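Two details from the steps above lend themselves to a small sketch: the recommended order for bringing replicas back online (step 5), and the way a failed replica holds back log truncation (the note in step 6). The Python below is a simplified model, not a recovery tool. The replica names, the role labels, and the use of plain integers in place of log sequence numbers are all illustrative assumptions.

```python
# Recommended bring-online order from step 5; the labels are illustrative.
RECOVERY_ORDER = {
    "primary": 0,
    "synchronous_secondary": 1,
    "asynchronous_secondary": 2,
}


def recovery_sequence(replicas):
    """Sort (name, kind) pairs into the recommended bring-online order."""
    return sorted(replicas, key=lambda replica: RECOVERY_ORDER[replica[1]])


def log_truncation_point(last_known_lsn_by_replica):
    """The log cannot be truncated past the farthest-behind replica.

    last_known_lsn_by_replica maps a replica name to the last point that
    replica is known to have reached (modeled here as an integer).
    """
    return min(last_known_lsn_by_replica.values())
```

In this model, a failed replica whose last known point stays fixed pins log_truncation_point at that value while the healthy replicas keep advancing. This is the log growth that the note in step 6 warns about.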
SQL Server Instance-Level Protection
The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities
and features offered by Microsoft SQL Server 2012 and its integration with Windows Server
infrastructure components.
Availability Improvements - SQL Server Instances
These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn
Failover Cluster Instances, as well as for standalone instances that host AlwaysOn Availability Groups.
These improvements represent enhancements for managing and troubleshooting failover scenarios:
Flexible Failover Policy. The output of the new system stored procedure used for robust failure
detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of
a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the
SQL Server instance; ranging from relative tolerance of errors, to being sensitive to any SQL Server
internal component error.
You can configure failover to be triggered by any one of a range of error levels, including: server
down, server unresponsive, critical error, moderate error, or any qualified error. The
FailureConditionLevel property can be used for FCI or availability group failover policies.
Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any
service-level failure caused failover.
For more information, see Failover Policy for Failover Cluster Instances
(http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).
Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system
configuration views, DMVs, performance counters, and an extended event health session that
captures and dumps information needed to troubleshoot, tune, and monitor your AlwaysOn
deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions
(http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes
(http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).
SMB file share support. You can place database files on a Windows Server 2008 or later remote file
share for both standalone and failover cluster instances, negating the need for a separate drive
letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a
physical server for a virtual machine guest operating system. With the right configuration, I/O
performance can very nearly approximate that of direct-attached storage.
For more information, see SQL Databases on File Shares - It's time to reconsider the scenario
(http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).
Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server
resource group; you must take separate measures to ensure the availability of the file share. If the
file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group
listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP
addresses, regardless of online or offline state, are registered with DNS under the same virtual
network name. Client calls to resolve the virtual network name in DNS return all of the registered IP
addresses in a varying round-robin sequence.
AlwaysOn Failover Cluster Instances
The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability
of a SQL Server instance hosted on local server and storage hardware within a single data center.
An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover
Clustering (WSFC) cluster, but only active on one node at a time. Client applications connect to a virtual
network name and virtual IP address that are owned by the active cluster node.
Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster
service also replicates relevant changes from the active instance's entries in the Windows registry to each
installed node. Each node that the FCI is installed on is designated as a possible owner of the instance
and its resources, within a preferred failover sequence.
Database files are stored on shared symmetrical storage volumes that are registered as a resource with the
WSFC cluster, and are owned by the node that currently hosts the FCI.
For more information, see AlwaysOn Failover Cluster Instances
(http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process
If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC
cluster service using this high-level process to do a failover:
1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration
indicates a failed state. By default, a service restart is attempted before a failover to another node is
initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server
service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and
its dependent network and shared storage resources is transferred to the next preferred node
owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup
procedures. If it does not come back online within a pending timeout period, the cluster service puts
the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode
while transaction log redo operations are applied and uncommitted transactions are rolled back.
FCI Improvements
Previous versions of SQL Server have offered an FCI installation option; however, several feature
enhancements in SQL Server 2012 improve availability, robustness, and serviceability:
Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one
subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network
interface is available; this is known as an OR cluster resource dependency.
Prior versions of SQL Server required that all network interfaces be functional for the SQL Server
service to start or fail over, and that they all exist on the same subnet or VLAN.
Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet
clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate
data and coordinate storage failover between cluster nodes.
For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance
(http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).
Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection
to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system
stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic
information.
Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a
simple one-way polling process. In this process, the WSFC cluster service periodically created a new
SQL client connection to the instance, queried the server name, and then disconnected. A failure to
connect, or a query timeout, for whatever reason, triggered a failover with very little available
diagnostic information.
For more information, see sp_server_diagnostics
(http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).
There is now broader support for FCI storage scenarios:
Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The
specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource
dependency during setup.
tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such
as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage
volume that failed over with other system databases.
Note: The location of tempdb is stored in the master database, which moves between nodes during
failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential
node owners, or else the SQL Server service will not start on some nodes.
Database Availability
The high availability capabilities offered by the infrastructure and SQL Server instance-level components
work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of
options for explicitly protecting database data and data-tier applications.
AlwaysOn Availability Groups
An availability group is a set of user databases that fail over together from one SQL Server instance to
another within the same WSFC cluster. Client applications can connect to the availability group's
databases through a WSFC virtual network name, known as an availability group listener, which abstracts
the underlying SQL Server instances.
AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring,
failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server
instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it
does not require the use of symmetrical shared storage.
For more information, see Overview of AlwaysOn Availability Groups
(http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).
Availability Replicas and Roles
Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the
user databases in the availability group. A SQL Server instance can host only one availability replica from
a given availability group, but multiple availability groups may reside on the same instance. The SQL
Server instance must have dedicated (non-shared) storage volumes.
One of the availability replicas serves in the role of primary replica. It is designated as the master copy of
the availability group databases and is enabled for read/write operations.
An availability group can contain from one to four additional read-only availability replicas that each
separately serve in the role of a secondary replica.
Availability Replica Synchronization
The contents of each database in an availability group are synchronized from the primary replica to each
of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all
databases in the availability group must be set to the full recovery model.
Secondary replicas are initialized with a full backup and restore of the primary replica's databases and
transaction logs. As new transactions are committed on the primary replica, the corresponding portion
of the transaction log is cached, queued, and then sent over the network to a database mirroring
endpoint on each of the secondary replica nodes.
In this manner, new entries in the primary replica transaction log are appended onto each of the
secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence
number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log
has been hardened and flushed to the remote disk.
Note: Each availability replica has its own set of independent transaction log redo threads that are not
part of the availability replica synchronization process. You may perceive delays in the log redo process
on the secondary replicas as data latency.
In addition to having a role of primary or secondary, each availability replica also has an availability
mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN
statement:
Synchronous-commit mode. The primary replica commits a given transaction only after all
synchronous-commit secondary replicas acknowledge that they have finished hardening their
respective transaction logs past that transaction's LSN. An availability group can have up to 2
synchronous-commit secondary replicas.
Synchronous-commit mode introduces transaction latency on the primary replica databases, but it
ensures that there is no data loss on the secondary replicas for committed transactions.
Asynchronous-commit mode. The primary replica commits transactions after hardening the local
transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary
replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit
secondary replicas, but no more than a total of 4 secondary replicas of any type.
Asynchronous-commit mode minimizes transaction latency on the primary replica databases but
allows the secondary replica transaction logs to lag behind, making some data loss possible.
For more information, see Availability Modes
(http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
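The latency trade-off between the two availability modes can be made concrete with a small model. The following Python sketch is a simplification that ignores queuing and network jitter. The function names and the millisecond figures in the example are hypothetical, while the replica limits come from the text above.

```python
def commit_latency_ms(local_harden_ms, secondaries):
    """Estimate when the primary replica can acknowledge a commit.

    local_harden_ms -- time for the primary to harden its own log
    secondaries     -- list of (availability_mode, harden_ack_ms) tuples, where
                       harden_ack_ms is the time until the primary receives
                       that replica's hardening acknowledgement
    """
    sync_acks = [ack for mode, ack in secondaries if mode == "synchronous"]
    # The primary waits for its own log flush and for every synchronous-commit
    # secondary; asynchronous-commit secondaries never delay the commit.
    return max([local_harden_ms] + sync_acks)


def replica_counts_valid(secondaries):
    """Check the limits stated above: at most 2 synchronous-commit secondaries
    and at most 4 secondary replicas in total."""
    sync_count = sum(1 for mode, _ in secondaries if mode == "synchronous")
    return sync_count <= 2 and len(secondaries) <= 4
```

With a 2 ms local flush, one synchronous-commit secondary that acknowledges after 5 ms, and one asynchronous-commit secondary that lags by 80 ms, the modeled commit latency is 5 ms. If the synchronous-commit secondary is switched to asynchronous commit, the latency drops to 2 ms, at the cost of possible data loss.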
The overall health of the data flow between the availability replicas is indicated by the synchronization
state of each replica. You will most likely experience data loss if you fail over to a secondary replica with
a synchronization state of anything other than Synchronized or Synchronizing.
Each secondary replica's synchronization stream has a session-timeout property. When a secondary
replica configured for a synchronous-commit availability mode fails with a session timeout, it is
temporarily marked internally as asynchronous. This is done so that the secondary replica failure does
not impact hardening of the transaction log on the primary replica. After that secondary replica is
healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit
mode operations.
Availability Group Failover
The availability group and a corresponding virtual network name are registered as resources in the
WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health
and failover policy of the primary replica.
An availability group failover policy uses the FailureConditionLevel property to indicate the severity
tolerance level for a failure affecting the availability group, in conjunction with the
sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
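One way to picture the FailureConditionLevel setting is as a threshold on an ordered scale of the error levels named earlier in this paper: server down, server unresponsive, critical error, moderate error, and any qualified error. The Python sketch below assumes that these levels are ordered from most to least severe, and that a policy configured at a given level also reacts to every more severe condition. The numeric values 1 through 5, the enum, and the failover_indicated function are illustrative assumptions, not the product's implementation.

```python
from enum import IntEnum


class FailureConditionLevel(IntEnum):
    # Lower values are more severe; the numbering is an assumption of this sketch.
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def failover_indicated(configured_level, observed_condition):
    """Return True if the observed condition is at least as severe as the
    least severe condition the configured policy reacts to."""
    return observed_condition <= configured_level
```

Under this model, a policy set to CRITICAL_ERROR indicates failover for a server-down condition but tolerates a moderate error, while a policy set to ANY_QUALIFIED_ERROR reacts to every level.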
In the event of a failover, instead of transferring ownership of shared physical resources to another
node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over
the role of primary replica. The availability group's virtual network name resource is then transferred to
that instance. All client connections to the involved availability replicas are reset.
Based upon the current health, synchronization state, and availability mode of the replicas, each replica
has a composite failover readiness state that indicates the potential for data loss. This replica health
information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states
system view.
Each availability replica also has a configured failover mode, which governs replica behavior when
failover is indicated.
Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn
configuration because the secondary replica transaction log is already hardened and
synchronized. Open transactions on the primary replica are rolled back, and the primary replica role
is transferred to a secondary replica without any user intervention.
The primary and secondary replicas must be set to automatic failover mode, and both must be set
to synchronous-commit availability mode. The synchronization state between the replicas must be
Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is
blocked to prevent a potential race condition between availability group and FCI failovers.
Manual failover. This allows the administrator to assess the state of the primary replica, and make a
decision to deliberately fail over to a secondary replica or not.
Depending upon the availability mode and synchronization state, you have these choices:
o Planned manual failover (without data loss). You can perform this type of failover only if both
the primary and secondary replicas are healthy and in a Synchronized state. This is functionally
equivalent to an automatic failover.
o Forced manual failover (allowing potential data loss). This is the only form of failover that is
possible if the target secondary replica is in asynchronous-commit availability mode, or if it is
not synchronized with the primary replica.
Warning: You should use this failover option in a disaster recovery situation only. If the primary
replica is healthy and available, you should change the availability mode of the involved replicas
to synchronous commit and then perform a planned manual failover.
For more information, see Perform a Forced Manual Failover of an Availability
Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
You must perform a manual failover if any of the following conditions are true about either the primary
replica or the secondary replica that you want to fail over to:
Failover mode is set to manual.
Availability mode is set to asynchronous commit.
Replica resides on an FCI.
For more information, see Failover Modes (AlwaysOn Availability
Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
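The failover rules in this section combine several independent conditions, so it can help to see them as a single decision function. The Python sketch below encodes only the conditions stated in the text. It assumes replica health is already reflected in the synchronization state, and it treats a forced manual failover as always available as a last resort. That last point is a simplification made for this sketch. The Replica class and failover_options are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    failover_mode: str       # "automatic" or "manual"
    availability_mode: str   # "synchronous" or "asynchronous"
    on_fci: bool = False     # True if the replica resides on an FCI


def failover_options(primary, secondary, sync_state, quorum_healthy):
    """Return the forms of failover available toward `secondary`."""
    both_synchronous = (primary.availability_mode == "synchronous"
                        and secondary.availability_mode == "synchronous")
    synchronized = both_synchronous and sync_state == "Synchronized"
    automatic = (synchronized
                 and quorum_healthy
                 and primary.failover_mode == "automatic"
                 and secondary.failover_mode == "automatic"
                 and not primary.on_fci
                 and not secondary.on_fci)
    options = []
    if automatic:
        options.append("automatic")
    if synchronized:
        options.append("planned manual")
    # Forced manual failover allows data loss; it is listed last on purpose.
    options.append("forced manual")
    return options
```

For a target in asynchronous-commit mode, the function returns only "forced manual", which matches the statement that this is the only form of failover possible in that case.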
Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the
secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary
replicas until the primary replica is set to synchronous-commit mode.
Availability Group Listener
An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a
database in the availability group. The VNN cluster resource is owned by the SQL Server instance on
which the primary replica resides.
The virtual network name is registered with DNS only during availability group listener creation or during
configuration changes. All virtual IP addresses that are defined in the availability group listener are
registered with DNS under the same virtual network name.
To use the availability group listener, a client connection request must specify the virtual network name
as the server, and a database name that is in the availability group. By default, this should result in a
connection to the SQL Server instance that is hosting the primary replica.
At runtime, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to
the virtual network name. The client then attempts to connect to each of the IP addresses, until it is
successful, or until it reaches the connection timeout. The client will attempt to make these connections
in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
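The benefit of parallel connection attempts is easiest to see with numbers. The Python sketch below models each IP address returned by DNS as an attempt that either succeeds or fails after some number of milliseconds, and compares sequential with parallel attempts. It does not open any network connections and it ignores the overall connection timeout. The function name and the timings used in the example are hypothetical.

```python
def time_to_connect(attempts, parallel):
    """Estimate the time until a client connects through the listener.

    attempts -- list of (succeeds, duration_ms) tuples, one per IP address,
                in the order returned by DNS
    parallel -- True to model MultiSubnetFailover=True behavior
    Returns the elapsed milliseconds until the first success, or None if
    every attempt fails.
    """
    successes = [ms for ok, ms in attempts if ok]
    if not successes:
        return None
    if parallel:
        # All addresses are tried at once; the fastest success wins.
        return min(successes)
    elapsed = 0
    for ok, ms in attempts:
        # Each attempt must finish (or fail) before the next one starts.
        elapsed += ms
        if ok:
            return elapsed
```

If the first address belongs to an offline subnet and fails after 21,000 ms while the second accepts the connection in 50 ms, the sequential estimate is 21,050 ms and the parallel estimate is 50 ms.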
In the event of a failover, client connections are reset on the server, ownership of the availability group
listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is
bound to the new instance's virtual IP addresses and TCP ports.
For more information, see Client Connectivity and Application Failover
(http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).
Application Intent Filtering
While connecting through the availability group listener, the application can specify whether its intent is
to both read and write data or whether it will exclusively perform read-only operations. If not specified,
the default application intent for the client is read-write.
For the primary role and secondary role of each availability replica, you can also specify a connection
access property that will be used as a connection-level filter on the client's application intent. By default,
invalid application intent and connection access combinations result in a refused connection. SQL Server
should filter out client connection requests using the following rules.
While the availability replica is in the primary role, and connection access is equal to:
Allow any application intent. Do not filter any client connections for application intent.
Allow only explicit read/write intent. If client specifies read-only, reject connection.
While the availability replica is in the secondary role, and connection access is equal to:
No connections allowed. Refuse all connections; replica is used only for disaster recovery.
Allow any application intent. Do not filter any client connections for application intent.
Read-only application intent. If client does not specify read-only, reject connection.
For more information, see Configure Connection Access on an Availability
Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).
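The filtering rules above amount to a small lookup on three inputs: the replica's current role, its connection access setting for that role, and the client's application intent. A minimal Python sketch follows. The string labels such as "allow_any" and "read_only" are shorthand invented for this example and are not SQL Server option names.

```python
def connection_accepted(role, connection_access, intent="readwrite"):
    """Apply the connection access filter for a replica in the given role.

    role              -- "primary" or "secondary"
    connection_access -- primary: "allow_any" or "read_write_only"
                         secondary: "none", "allow_any", or "read_only"
    intent            -- "readwrite" (the default) or "readonly"
    """
    if role == "primary":
        if connection_access == "allow_any":
            return True
        if connection_access == "read_write_only":
            # Reject clients that explicitly specify read-only intent.
            return intent != "readonly"
    elif role == "secondary":
        if connection_access == "none":
            return False
        if connection_access == "allow_any":
            return True
        if connection_access == "read_only":
            # Reject clients that do not specify read-only intent.
            return intent == "readonly"
    raise ValueError(f"unsupported combination: {role}, {connection_access}")
```

For example, a client that does not specify an intent defaults to read-write, so connection_accepted("secondary", "read_only") returns False.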
Application Intent Read-Only Routing
A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby
hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your
secondary replicas for read-only access, you can offload significant workloads from your primary
replicas.
Workloads that can be readily adapted to run off of a read-only secondary replica include: reporting,
database backups, database consistency checks, index fragmentation analysis, data pipeline extraction,
operational support, and ad hoc queries.
For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server
instance endpoints to be applied while that replica is in the primary role. If present, this list is used to
redirect client connection requests that specify read-only application intent to the first available
secondary replica in the list that satisfies the application intent filters noted earlier.
Note: The read-only routing redirection is performed by the availability group listener, which is bound
to the primary replica. If the primary replica is offline, client redirection will not function.
For more information, see Configure Read-Only Routing on an Availability Group (SQL Server)
(http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
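
The routing decision described above can be modeled as a walk over the routing list. In this Python sketch, the `Replica` class, the `route_connection` function, and the access labels (taken from the T-SQL ALLOW_CONNECTIONS options) are illustrative assumptions. The source does not say what happens when no replica in the list qualifies, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass

# Secondary-role connection access settings that admit a read-only client.
READABLE_SECONDARY_ACCESS = {"ALL", "READ_ONLY"}


@dataclass
class Replica:
    name: str
    online: bool = True
    secondary_access: str = "NO"  # "NO", "ALL", or "READ_ONLY"


def route_connection(primary, routing_list, application_intent="ReadWrite"):
    """Return the name of the replica that should receive the connection."""
    # The listener is bound to the primary, so nothing can be routed without it.
    if not primary.online:
        raise ConnectionError("primary replica is offline; redirection cannot function")
    # Only read-only intent is redirected, and only when a routing list exists.
    # A connection that stays on the primary is still subject to the primary's
    # own connection access filter, which this sketch does not model.
    if application_intent != "ReadOnly" or not routing_list:
        return primary.name
    # Walk the list in order and pick the first available, readable secondary.
    for replica in routing_list:
        if replica.online and replica.secondary_access in READABLE_SECONDARY_ACCESS:
            return replica.name
    return None  # behavior in this case is not specified by the source
```

With a routing list of an offline replica, a replica that refuses connections, and a readable replica, a read-only client lands on the third entry, while a read-write client stays on the primary.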
Availability Improvements - Databases
SQL Server 2012 has a number of feature enhancements that are specific to database configuration and
capabilities.
The following improvement reduces recovery time:
• Predictable Recovery Time. You can set a target recovery time interval per database, which is used
  to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs
  periodically, based upon estimated time needed to recover the transaction log in the event of a
  restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each
  checkpoint, and increasing recovery time (RTO) predictability.
  Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval,
  irrespective of transaction volume or load, which could lead to unpredictable recovery times.
  For more information, see Database Checkpoints
  (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
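
As a rough illustration, the target recovery time is set per database with the T-SQL `ALTER DATABASE ... SET TARGET_RECOVERY_TIME` option. The Python helper below only builds that statement as text. The function name and the database name in the usage note are hypothetical, and actually running the statement against a server is left to whatever client library you use.

```python
def target_recovery_time_sql(database, value, unit="SECONDS"):
    """Build an ALTER DATABASE statement that sets the target recovery time.

    A value of 0 leaves the database on its default checkpoint behavior.
    """
    if unit not in ("SECONDS", "MINUTES"):
        raise ValueError("unit must be SECONDS or MINUTES")
    if not isinstance(value, int) or value < 0:
        raise ValueError("value must be a non-negative integer")
    # Quote the identifier with brackets, doubling any closing bracket inside it.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET TARGET_RECOVERY_TIME = {value} {unit};"
```

For a hypothetical database named Sales, `target_recovery_time_sql("Sales", 60)` yields `ALTER DATABASE [Sales] SET TARGET_RECOVERY_TIME = 60 SECONDS;`.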
These improvements mitigate common scenarios that can drive planned downtime:
• Online index operations for LOB columns. Indexes that contain columns with varbinary(max),
  varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
• Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a
  default value to a SQL Server 2012 database table, only a schema lock is required to update system
  metadata; all rows do not have to be populated during the ALTER TABLE statement.
  SQL Server will physically persist the default column value only if a row is actually modified or
  re-indexed. Queries return the default value from metadata, unless an actual column value exists.
The following improvement provides broader support for storage scenarios:
• Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it
  unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of
  errors by asynchronously requesting and applying a fresh copy of the affected data pages from a
  different availability replica.
  Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced
  to support multiple replicas.
  For more information, see Automatic Page Repair
  (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
Client Connectivity Recommendations
Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012
AlwaysOn technologies:
• AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS)
  protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn
  features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02,
  and the SQL Native Client 11.0.
• Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection
  strings to enable client libraries to attempt to connect in parallel to all IP addresses that are
  registered for the availability group listener or for an FCI that has IP addresses in multiple subnets.
• Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only
  workloads from your primary replica onto the secondary replicas.
• Legacy client connection timeout. Legacy client database libraries do not implement parallel
  connection attempts, so when multiple IP addresses are present, they try to connect to each of
  them sequentially, until they encounter a TCP timeout, or until they make a successful connection.
  You should adjust your connection timeout on legacy clients to accommodate the potential
  sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15
  seconds + 21 seconds for every secondary replica.
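
The recommendations above can be pulled together in a short Python sketch. It applies the legacy timeout rule (15 seconds plus 21 seconds per secondary replica) and assembles an ADO.NET-style connection string. The function names, the listener and database names in the usage note, the default port, and the use of integrated security are all illustrative assumptions. ODBC and other providers spell some keywords differently.

```python
def legacy_connect_timeout_seconds(secondary_replicas):
    """Minimum connection timeout for a legacy client: 15 s + 21 s per secondary."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must not be negative")
    return 15 + 21 * secondary_replicas


def build_connection_string(listener, database, *, read_only=False,
                            legacy_client=False, secondary_replicas=0, port=1433):
    """Assemble a connection string that targets an availability group listener."""
    parts = [
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "Integrated Security=SSPI",
    ]
    if legacy_client:
        # Legacy libraries try each IP address in turn, so lengthen the timeout.
        parts.append(f"Connect Timeout={legacy_connect_timeout_seconds(secondary_replicas)}")
    else:
        # AlwaysOn-aware libraries can try all listener IP addresses in parallel.
        parts.append("MultiSubnetFailover=True")
        if read_only:
            # Declare read-only intent so the workload can be offloaded.
            parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)
```

With two secondary replicas, a legacy client would need a timeout of at least 15 + 21 × 2 = 57 seconds. An AlwaysOn-aware reporting client could instead call `build_connection_string("AGListener", "Sales", read_only=True)`.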
Conclusion
This white paper has established the baseline context for how to reduce planned and unplanned
downtime, maximize application availability, and provide data protection using SQL Server 2012
AlwaysOn high availability and disaster recovery solutions.
Many of the business drivers and challenges of planning, managing, and measuring a highly available
database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery
Time Objectives (RTO).
SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level
that can help your organization address common high availability and disaster recovery scenarios, in a
manner that can be well justified using RPO and RTO goals.