Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery
LeRoy Tuttle, Jr.
Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra
Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.
Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper
E-book publication date: May 2012


Copyright 2012 by Microsoft Corporation


All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means
without the written permission of the publisher.

Microsoft and the trademarks listed at
http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the
Microsoft group of companies. All other marks are property of their respective owners.
The example companies, organizations, products, domain names, email addresses, logos, people, places, and events
depicted herein are fictitious. No association with any real company, organization, product, domain name, email address,
logo, person, place, or event is intended or should be inferred.
This book expresses the authors' views and opinions. The information contained in this book is provided without any
express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers or distributors will
be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Contents
High Availability and Disaster Recovery Concepts
Describing High Availability
Planned vs. Unplanned Downtime
Degraded Availability

Quantifying Downtime
Recovery Objectives
Justifying ROI or Opportunity Cost
Monitoring Availability Health
Planning for Disaster Recovery

Overview: High Availability with Microsoft SQL Server 2012
SQL Server AlwaysOn
Significantly Reduce Planned Downtime
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Easy Deployment and Management
Contrasting RPO and RTO Capabilities

SQL Server AlwaysOn Layers of Protection
Infrastructure Availability
Windows Operating System
Windows Server Failover Clustering
WSFC Cluster Validation Wizard
WSFC Quorum Modes and Voting Configuration
WSFC Disaster Recovery through Forced Quorum

SQL Server Instance Level Protection
Availability Improvements - SQL Server Instances
AlwaysOn Failover Cluster Instances

Database Availability
AlwaysOn Availability Groups
Availability Group Failover
Availability Group Listener
Availability Improvements - Databases

Client Connectivity Recommendations
Conclusion

Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

High Availability and Disaster Recovery Concepts
You can make the best selection of a database technology for a high availability and disaster recovery
solution when all stakeholders have a shared understanding of the related business drivers, challenges,
and objectives of planning, managing, and measuring RTO and RPO objectives.
Readers who are familiar with these concepts can move ahead to the Overview: High Availability with
Microsoft SQL Server 2012 section of this paper.

Describing High Availability
For a given software application or service, high availability is ultimately measured in terms of the
end user's experience and expectations. The tangible and perceived business impact of downtime may
be expressed in terms of information loss, property damage, decreased productivity, opportunity costs,
contractual damages, or the loss of goodwill.
The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A
sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with
technical capabilities and infrastructure costs.
A platform is considered highly available per the agreement and expectations of customers and
stakeholders. The availability of a system can be expressed as this calculation:

$$\text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\%$$

The resulting value is often expressed in the industry as the number of 9s that the solution
provides, which conveys an annual number of minutes of possible uptime or, conversely, minutes of
downtime.
Number of 9s  Availability Percentage  Total Annual Downtime
2             99%                      3 days, 15 hours
3             99.9%                    8 hours, 45 minutes
4             99.99%                   52 minutes, 34 seconds
5             99.999%                  5 minutes, 15 seconds
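
The relationship between an availability percentage and annual downtime can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: it assumes a 365-day year, and the function names are hypothetical rather than part of any SQL Server tooling.

```python
from datetime import timedelta

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def availability_percent(actual_uptime: float, expected_uptime: float) -> float:
    """Availability as a percentage: actual uptime / expected uptime x 100."""
    if expected_uptime <= 0:
        raise ValueError("expected_uptime must be positive")
    return actual_uptime / expected_uptime * 100.0


def annual_downtime(availability_pct: float) -> timedelta:
    """Downtime per 365-day year implied by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    downtime_minutes = MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0
    return timedelta(minutes=downtime_minutes)


if __name__ == "__main__":
    # Print the implied annual downtime for two to five 9s.
    for nines, pct in [(2, 99.0), (3, 99.9), (4, 99.99), (5, 99.999)]:
        print(f"{nines} nines ({pct}%): {annual_downtime(pct)} per year")
```

Because floating-point arithmetic is involved, the printed durations may differ from the table by fractions of a second.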

Planned vs. Unplanned Downtime
System outages are either anticipated and planned for, or they are the result of an unplanned
failure. Downtime need not be considered negatively if it is appropriately managed. There are two key
types of foreseeable downtime:

Planned maintenance. A time window is pre-announced and coordinated for planned maintenance
tasks such as software patching, hardware upgrades, password updates, offline re-indexing, data
loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational
procedures should minimize downtime and prevent any data loss. Planned maintenance activities
can be seen as investments needed to prevent or mitigate other potentially more severe unplanned
outage scenarios.

Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or
uncontrollable, or that are foreseeable, but considered either too unlikely to occur, or are
considered to have an acceptable impact. A robust high availability solution detects these types of
failures, automatically recovers from the outage, and then re-establishes fault tolerance.

When establishing SLAs for high availability, you should calculate separate key performance
indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you
to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned
downtime.
Degraded Availability
High availability should not be considered as an all-or-nothing proposition. As an alternative to a
complete outage, it is often acceptable to the end user for a system to be partially available, or to have
limited functionality or degraded performance. These varying degrees of availability include:

Read-only and deferred operations. During a maintenance window, or during a phased disaster
recovery, data retrieval is still possible, but new workflows and background processing may be
temporarily halted or queued.

Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a
partial platform failure, limited hardware resources may be over-committed or under-sized. User
experience may suffer, but work may still get done in a less productive manner.

Partial, transient, or impending failures. The application logic or hardware stack may be robust
enough to retry or self-correct upon encountering an error. These types of issues may appear to the
end user as data latency or poor application responsiveness.

Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers
of the solution stack (infrastructure, platform, and application), or horizontally between different
functional components. Users may experience partial success or degradation, depending upon the
features or components that are affected.

The acceptability of these sub-optimal scenarios should be considered as part of a spectrum of degraded
availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.

Quantifying Downtime
When downtime does occur, either planned or unplanned, the primary business goal is to bring the
system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With
unplanned downtime, you must balance the time and effort needed to determine why the outage
occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop
investigating the outage or performing maintenance tasks, recover from the outage by bringing the
system back online, and if needed, re-establish fault tolerance.
Recovery Objectives
Data redundancy is a key component of a high availability database solution. Transactional activity on
your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary
instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be
lost on the secondary instances due to delays in data propagation.
You can both measure the impact and set recovery goals in terms of how long it takes to get back in
business, and how much time latency there is in the last transaction recovered:

Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the
system back online in at least a read-only capacity to facilitate investigation of the failure. However,
the primary goal is to restore full service to the point that new transactions can take place.

Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is
the time gap or latency between the last committed data transaction before the failure and the
most recent data recovered after the failure. The actual data loss can vary depending upon the
workload on the system at the time of the failure, the type of failure, and the type of high
availability solution used.

You should use RTO and RPO values as goals that indicate business tolerance for downtime and
acceptable data loss, and as metrics for monitoring availability health.
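
As a rough illustration of how these two values can be measured after an incident and compared with the agreed objectives, consider the Python sketch below. The OutageRecord structure, its field names, and the sample timestamps are hypothetical; they only mirror the definitions given above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageRecord:
    """Timestamps captured for one outage (hypothetical structure)."""
    outage_start: datetime                # service became unavailable
    service_restored: datetime            # new transactions can take place again
    last_commit_before_failure: datetime  # last transaction committed before the failure
    last_commit_recovered: datetime       # most recent transaction present after recovery


def recovery_time(outage: OutageRecord) -> timedelta:
    """Measured recovery time: the duration of the outage."""
    return outage.service_restored - outage.outage_start


def recovery_point(outage: OutageRecord) -> timedelta:
    """Measured recovery point: the time gap of data that was not recovered."""
    return outage.last_commit_before_failure - outage.last_commit_recovered


def meets_objectives(outage: OutageRecord, rto: timedelta, rpo: timedelta) -> bool:
    """True when both measured values are within the agreed objectives."""
    return recovery_time(outage) <= rto and recovery_point(outage) <= rpo


# Illustrative values only.
example = OutageRecord(
    outage_start=datetime(2012, 5, 1, 10, 0, 0),
    service_restored=datetime(2012, 5, 1, 10, 4, 0),
    last_commit_before_failure=datetime(2012, 5, 1, 9, 59, 58),
    last_commit_recovered=datetime(2012, 5, 1, 9, 59, 50),
)
```

With these sample values, the outage lasted four minutes and eight seconds of committed work was not recovered, so it would satisfy an RTO of five minutes and an RPO of ten seconds.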
Justifying ROI or Opportunity Cost
The business costs of downtime may be either financial or in the form of customer goodwill. These costs
may accrue with time, or they may be incurred at a certain point in the outage window. In addition to
projecting the cost of incurring an outage with a given recovery time and data recovery point, you can
also calculate the business process and infrastructure investments needed to attain your RTO and RPO
goals or to avoid the outage altogether. These investment themes should include:

Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the
first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure,
distributing workloads across isolated points of failure, and planned downtime for preventive
maintenance.

Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on
the customer experience through automatic and transparent recovery.

Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can
be leveraged for read-only workloads, or to improve overall system performance by distributing
workloads across all available hardware.

For given RTO and RPO goals, the needed availability and recovery investments, combined with the
projected costs of downtime, can be expressed and justified as a function of time. During an actual
outage, this allows you to make cost-based decisions based on the elapsed downtime.
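
One simple way to express downtime cost as a function of time is sketched below in Python. The cost model (a per-minute rate plus one-time penalties that trigger at fixed points in the outage window) and all names and figures are assumptions made for illustration; a real model would come from your own business analysis.

```python
from typing import List, Tuple


def downtime_cost(elapsed_minutes: float,
                  cost_per_minute: float,
                  one_time_penalties: List[Tuple[float, float]]) -> float:
    """Projected cost of an outage after `elapsed_minutes`.

    cost_per_minute    -- cost that accrues with time (hypothetical rate)
    one_time_penalties -- (threshold_minutes, cost) pairs incurred once the
                          outage reaches a certain point, for example an SLA penalty
    """
    accrued = elapsed_minutes * cost_per_minute
    triggered = sum(cost for threshold, cost in one_time_penalties
                    if elapsed_minutes >= threshold)
    return accrued + triggered


def recovery_action_justified(elapsed_minutes: float,
                              cost_per_minute: float,
                              one_time_penalties: List[Tuple[float, float]],
                              recovery_action_cost: float) -> bool:
    """True once the projected downtime cost reaches the cost of the recovery action."""
    projected = downtime_cost(elapsed_minutes, cost_per_minute, one_time_penalties)
    return projected >= recovery_action_cost
```

For example, with an assumed rate of 100 per minute and a one-time penalty of 5,000 at the 60-minute mark, a recovery action costing 4,000 would not be justified at 30 minutes but would be at 45 minutes.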
Monitoring Availability Health
From an operational point of view, during an actual outage, you should not attempt to consider all
relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data
latency on your standby instances as a proxy for expected RPO.
In the event of an outage, you should also limit the initial time spent investigating the root cause during
the outage, and instead focus on validating the health of your recovery environment, and then rely upon
detailed system logs and secondary copies of data for subsequent forensic analysis.
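
Using data latency as a proxy for expected RPO can be as simple as comparing each standby's current lag with the agreed threshold. The Python sketch below assumes that the lag values, in seconds, have already been collected by some monitoring process; the function name, the status labels, and the 80 percent warning ratio are hypothetical.

```python
from typing import Dict


def rpo_health(lag_seconds_by_replica: Dict[str, float],
               rpo_seconds: float,
               warn_ratio: float = 0.8) -> Dict[str, str]:
    """Classify each standby by its data latency relative to the RPO goal.

    Returns "ok", "warning" (lag at or above warn_ratio of the RPO),
    or "breach" (lag above the RPO) for each replica.
    """
    status = {}
    for replica, lag in lag_seconds_by_replica.items():
        if lag > rpo_seconds:
            status[replica] = "breach"
        elif lag >= rpo_seconds * warn_ratio:
            status[replica] = "warning"
        else:
            status[replica] = "ok"
    return status
```

A warning band below the threshold gives operators time to react before the RPO goal is actually exceeded.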
Planning for Disaster Recovery
While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address
what is done to re-establish high availability after the outage.
As much as possible, disaster recovery procedures and responsibilities should be formulated before an
actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or
manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope
of a sound disaster recovery plan should include:

Granularity of failure and recovery. Depending upon the location and type of failure, you can take
corrective action at different levels; that is, data center, infrastructure, platform, application, or
workload.

Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and
diagnostic queries should all be readily accessible by appropriate parties.

Coordination of dependencies. Within the application stack, and across stakeholders, what are the
system and business dependencies?

Decision tree. A predetermined, repeatable, validated decision tree that includes role
responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery
steps.

Validation. After taking steps to recover from the outage, what must be done to verify that the
system has returned to normal operations?

Documentation. Capture all of the above items in a set of documentation, with sufficient detail and
clarity so that a third-party team can execute the recovery plan with minimal assistance. This type
of documentation is commonly referred to as a run book or a cook book.

Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations
for RTO goals, and consider regular rotation of hosting the primary production site on the primary
and each of the disaster recovery sites.

Overview: High Availability with Microsoft SQL Server 2012
Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications
and protection of critical data from unplanned and planned downtime. SQL Server provides a set of
features and capabilities that can help achieve those goals while keeping the cost and complexity low.
Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the
deeper coverage in the SQL Server AlwaysOn Layers of Protection section of this paper.
SQL Server AlwaysOn
AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It
can provide data and hardware redundancy within and across datacenters, and improves application
failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility
in configuration and enables reuse of existing hardware investments.
An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at
both the database and the instance level:

AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database
mirroring and help ensure availability of application databases, and they enable zero data loss
through log-based data movement for data protection without shared disks.
Availability groups provide an integrated set of options including automatic and manual failover of a
logical group of databases, support for up to four secondary replicas, fast application failover, and
automatic page repair.

AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and
support multisite clustering across subnets, which enables cross-datacenter failover of SQL Server
instances. Faster and more predictable instance failover is another key benefit that enables faster
application recovery.

Significantly Reduce Planned Downtime
The key reason for application downtime in any organization is planned downtime caused by operating
system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the
outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and
enabling more online maintenance operations:

Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal,
streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This
operating system configuration can reduce planned downtime by minimizing operating system
patching requirements by as much as 60 percent.

Online Operations. Enhanced support for online operations like LOB re-indexing and adding columns
with default values helps to reduce downtime during database maintenance operations.

Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of
instances, which helps significantly to reduce application downtime.

SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the
additional benefit of Live Migration, which enables you to migrate virtual machines between hosts
with zero downtime. Administrators can perform maintenance operations on the host without
impacting applications.

Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn
Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers
for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The
ability to simultaneously utilize both the primary and secondary database replicas helps improve
performance of all workloads due to better resource balancing across your server hardware
investments.
Easy Deployment and Management
Features such as the Configuration Wizard, support for the Windows PowerShell command-line
interface, dashboards, dynamic management views (DMVs), policy-based management, and System
Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities
The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key
drivers in selecting a SQL Server technology for your high availability and disaster recovery solution.
This table offers a rough comparison of the type of results that those different solutions may achieve:

High Availability and Disaster Recovery                Potential Data  Potential Recovery    Automatic  Readable
SQL Server Solution                                    Loss (RPO)      Time (RTO)            Failover   Secondaries (1)
AlwaysOn Availability Group - synchronous-commit       Zero            Seconds               Yes (4)    0-2
AlwaysOn Availability Group - asynchronous-commit      Seconds         Minutes               No         0-4
AlwaysOn Failover Cluster Instance                     NA (5)          Seconds to minutes    Yes        NA
Database Mirroring (2) - High-safety (sync + witness)  Zero            Seconds               Yes        NA
Database Mirroring (2) - High-performance (async)      Seconds (6)     Minutes (6)           No         NA
Log Shipping                                           Minutes (6)     Minutes to hours (6)  No         Not during a restore
Backup, Copy, Restore (3)                              Hours (6)       Hours to days (6)     No         Not during a restore

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
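
The comparison above can be treated as a small lookup table when shortlisting technologies against business goals. The Python sketch below is a coarse illustration only: it reduces each cell to an order of magnitude, uses the worst case of any range, skips the FCI row for RPO purposes because its data loss depends on the storage system, and ignores the workload-dependent caveats in the footnotes. The names MAGNITUDE, Solution, and candidates are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Coarse ordering of the magnitudes used in the table (smaller is better).
MAGNITUDE = {"zero": 0, "seconds": 1, "minutes": 2, "hours": 3, "days": 4}


class Solution(NamedTuple):
    name: str
    rpo: Optional[str]        # None where the table says NA
    rto: str                  # worst case of any range given in the table
    automatic_failover: bool


SOLUTIONS = [
    Solution("AlwaysOn Availability Group, synchronous-commit", "zero", "seconds", True),
    Solution("AlwaysOn Availability Group, asynchronous-commit", "seconds", "minutes", False),
    Solution("AlwaysOn Failover Cluster Instance", None, "minutes", True),
    Solution("Database Mirroring, high-safety (sync + witness)", "zero", "seconds", True),
    Solution("Database Mirroring, high-performance (async)", "seconds", "minutes", False),
    Solution("Log Shipping", "minutes", "hours", False),
    Solution("Backup, Copy, Restore", "hours", "days", False),
]


def candidates(max_rpo: str, max_rto: str,
               need_automatic_failover: bool = False) -> List[str]:
    """Names of solutions whose rough RPO and RTO fit within the stated goals."""
    result = []
    for s in SOLUTIONS:
        if s.rpo is None:
            continue  # data loss depends on the storage system; not comparable here
        if MAGNITUDE[s.rpo] > MAGNITUDE[max_rpo]:
            continue
        if MAGNITUDE[s.rto] > MAGNITUDE[max_rto]:
            continue
        if need_automatic_failover and not s.automatic_failover:
            continue
        result.append(s.name)
    return result
```

A shortlist produced this way is a starting point for discussion, not a substitute for testing against your own workload and failover procedures.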

SQL Server AlwaysOn Layers of Protection
SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical
and physical layers of infrastructure and application components. Historically, it has been a common
practice to have a separation of duties and responsibilities for the various involved audiences and roles,
such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper is organized to walk through a deeper description of each of those layers, and
to offer rationale and guidance for your design discussions and implementation decisions.
A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:

Infrastructure level. Server-level fault tolerance and intra-node network communication leverage
Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.

SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server
instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that
host the FCI are attached to robust symmetric shared storage (SAN or SMB).

Database level. An availability group is a set of user databases that fail over together. An availability
group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an
instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.

Client connectivity. Database client applications can connect directly to a SQL Server instance
network name, or they may connect to a virtual network name (VNN) that is bound to an availability
group listener. The VNN abstracts the WSFC cluster and availability group topology,
logically redirecting connection requests to the appropriate SQL Server instance and database replica.

The logical topology of a representative AlwaysOn solution is illustrated in this diagram:


Infrastructure Availability
Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows
Server operating system and WSFC as a platform technology. More than ever before, successful
Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.
Windows Operating System
SQL Server relies upon the Windows platform to provide foundational infrastructure and services for
networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and
capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server
2008 R2 Standard operating system, Windows Server 2008 R2 Enterprise operating system, and
Windows Server 2008 R2 Datacenter operating system.
For more information, see: Hardware and Software Requirements for Installing SQL Server
2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).
Windows Server Core Installation Option
As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation
option in Windows Server 2008 or later. The Server Core installation option provides a minimal
environment for running specific server roles with limited functionality and very limited GUI application
support. By default, only necessary services and a command prompt environment are enabled.
This mode of operation reduces the operating system attack surface and system overhead, and it can
significantly reduce ongoing maintenance, servicing, and patching requirements.
A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment,
configuration, administration, and maintenance of SQL Server and of the operating system must be
done using a scripting environment such as Windows PowerShell, or through the use of command-line or
remote tools.
Optimizing SQL Server for Private Cloud
High availability and disaster recovery scenarios are increasingly critical in the Private Cloud
environment. Deploy SQL Server to your Private Cloud to help ensure that your computer, network, and
storage resources are used efficiently, reducing both physical footprint and capital and operational
expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources
on demand without compromising control.
In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL
Server also supports Live Migration, which is the ability to move virtual machines between hosts with no
discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing - Optimizing SQL Server for Private
Cloud (http://www.microsoft.com/SqlServerPrivateCloud).

Windows Server Failover Clustering
Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high
availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server.
If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be
automatically or manually transferred to another available node in a process known as failover. With
AlwaysOn solutions, this process applies to both FCIs and to availability groups.
The nodes in the WSFC cluster work together to collectively provide these types of capabilities:

Distributed metadata and notifications. WSFC service and hosted application metadata is
maintained on each node in the cluster. This metadata includes WSFC configuration and status in
addition to hosted application settings. Changes to the metadata or status on one node are
automatically propagated to the other nodes in the cluster.

Resource management. Individual nodes in the cluster may provide physical resources such as
direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted
applications, such as SQL Server, register themselves as a cluster resource, and they can configure
startup and health dependencies upon other resources.

Health monitoring. Inter-node and primary node health detection is accomplished through a
combination of heartbeat-style network communications and resource monitoring. The overall
health of the cluster is determined by the votes of a quorum of nodes in the cluster.

Failover coordination. Each resource is configured to be hosted on a primary node, and each can be
automatically or manually transferred to one or more secondary nodes. A health-based failover
policy controls automatic transfer of resource ownership between nodes. Nodes and hosted
applications are notified when failover occurs so that they can react appropriately.

For more information, see Windows Server | Failover Clustering and Node
Balancing (http://www.microsoft.com/windowsserver2008/en/us/failoverclusteringmain.aspx).
Note: It is now critically important that database administrators understand the inner workings of WSFC
clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery
steps are all intrinsically tied to your WSFC configuration.
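
The idea that cluster health is decided by the votes of a quorum of nodes can be illustrated with a simple-majority check. The Python sketch below assumes one vote per node and a plain majority rule; actual WSFC quorum modes and voting configurations are more varied, and the function names here are hypothetical.

```python
from typing import Dict


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Simple-majority rule: more than half of all configured votes are present."""
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and a positive total_votes")
    return votes_online > total_votes // 2


def cluster_has_quorum(node_is_online: Dict[str, bool]) -> bool:
    """Assumes every node holds exactly one vote (an illustrative simplification)."""
    return has_quorum(sum(node_is_online.values()), len(node_is_online))
```

Under this rule, a three-node cluster tolerates the loss of one node, while a four-node cluster that splits evenly into two halves has no majority on either side.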
WSFC Storage Configurations
Windows Server Failover Clustering relies upon each node in the cluster to manage its connected
storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely
robust, and therefore if the storage device attached to a node is unavailable, the cluster node is
considered to be at fault.
For write-based operations, a disk volume is logically attached to a single cluster node at a time using a
SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a
node fails, logical ownership of the disk volume can be transferred to another node in the cluster.
SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration
combinations, including:

Direct-attached vs. remote. Storage devices are directly physically attached to the server, or they
are presented by a remote device through a network or host bus adapter (HBA). Remote storage
technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as
well as Server Message Block (SMB) file share based solutions.

Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk
volume configuration and file paths are presented to each node in the cluster. The physical
implementation and capacity of the underlying disk volumes can vary.

Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the
cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of
compliant shared storage devices can be transferred from one node to another using SCSI-3
protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file
sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared
volume.

Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node
owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now
deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated,
local or remote storage.
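
The symmetric storage definition above amounts to a set comparison: every node must be presented with the same logical volumes and file paths. The Python sketch below assumes that you have already gathered, by whatever means, the set of volume paths visible to each node; the function name and the sample paths are hypothetical.

```python
from typing import Dict, Set


def is_symmetric(volumes_by_node: Dict[str, Set[str]]) -> bool:
    """True when every node is presented with exactly the same volume paths."""
    views = list(volumes_by_node.values())
    return all(view == views[0] for view in views[1:])


# Illustrative inventory: node2 is missing the log volume.
inventory = {
    "node1": {"D:\\Data", "L:\\Logs"},
    "node2": {"D:\\Data"},
}
```

The check compares only the logical presentation; as noted above, the physical implementation and capacity behind each volume can differ.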
WSFCResourceHealthDetectionandFailover
EachresourceinaWSFCclusternodecanreportitsstatusandhealth,periodicallyorondemand.A
varietyofcircumstancesmayindicateaclusterresourcefailure,including:powerfailure,diskormemory
errors,networkcommunicationerrors,misconfiguration,ornonresponsiveservices.
YoucanmakeWSFCclusterresourcessuchasnetworks,storage,orservicesdependentuponone
another.Thecumulativehealthofaresourceisdeterminedbysuccessiverollupofitshealthwiththe
healthofeachofitsresourcedependencies.
ForAlwaysOnAvailabilityGroups,theavailabilitygroupandtheavailabilitygrouplistenerareregistered
asWSFCclusterresources.ForAlwaysOnFailoverClusterInstances,theSQLServerserviceandtheSQL
ServerAgentserviceareregisteredasWSFCclusterresources,andbotharemadedependentuponthe
instancesvirtualnetworknameresource.
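
The successive rollup of health across resource dependencies can be pictured as a recursive check: a resource is healthy only if it and everything it depends upon are healthy. The Python sketch below is a conceptual illustration, not how the cluster service is implemented; the resource names and the dependency of the network name on an IP address are assumptions added for the example.

```python
from typing import Dict, FrozenSet, List


def cumulative_health(resource: str,
                      own_health: Dict[str, bool],
                      depends_on: Dict[str, List[str]],
                      _path: FrozenSet[str] = frozenset()) -> bool:
    """Roll up a resource's own health with the health of its dependencies."""
    if resource in _path:
        raise ValueError(f"circular dependency involving {resource!r}")
    if not own_health[resource]:
        return False
    path = _path | {resource}
    return all(cumulative_health(dep, own_health, depends_on, path)
               for dep in depends_on.get(resource, []))


# Hypothetical FCI-style dependency graph.
depends_on = {
    "sql_server_service": ["virtual_network_name"],
    "sql_server_agent": ["virtual_network_name"],
    "virtual_network_name": ["ip_address"],
}
own_health = {
    "sql_server_service": True,
    "sql_server_agent": True,
    "virtual_network_name": True,
    "ip_address": False,
}
```

In this example both services report healthy on their own, yet their cumulative health is unhealthy because a resource further down the dependency chain has failed.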
IfaWSFCclusterresourceexperiencesasetnumberoferrorsorfailuresoveraperiodoftime,the
configuredfailoverpolicycausestheclusterservicetodooneofthefollowing:

Restarttheresourceonthecurrentnode.
Settheresourceoffline.
Initiateanautomaticfailoveroftheresourceanditsdependenciestoanothernode.
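
One way to picture a policy of this kind is a counter over a sliding time window. The following Python sketch is an assumption for illustration only: the class `FailoverPolicy`, its thresholds, and the rule for choosing among the three actions are hypothetical and do not reproduce the actual WSFC policy settings.

```python
from collections import deque


class FailoverPolicy:
    """Hypothetical policy: tolerate a few failures per period, then escalate."""

    def __init__(self, max_failures: int, period_seconds: float):
        self.max_failures = max_failures
        self.period_seconds = period_seconds
        self._failures = deque()  # timestamps of recent failures

    def on_failure(self, now: float, other_node_available: bool) -> str:
        self._failures.append(now)
        # Forget failures that fall outside the sliding window.
        while now - self._failures[0] > self.period_seconds:
            self._failures.popleft()
        if len(self._failures) <= self.max_failures:
            return "restart"    # restart the resource on the current node
        if other_node_available:
            return "failover"   # move the resource and its dependencies
        return "offline"        # nowhere to go: set the resource offline


policy = FailoverPolicy(max_failures=2, period_seconds=60)
for t in (0, 10, 20):
    print(t, policy.on_failure(now=t, other_node_available=True))
```

The third failure inside the 60-second window exceeds the threshold in this sketch, so the policy escalates from a restart to a failover.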

Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the
overall health of the cluster.
WSFC Cluster Validation Wizard
The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server
2008 and Windows Server 2008 R2. It is a key tool for a database administrator to use to help ensure
that a clean, healthy, stable WSFC environment exists before deploying a SQL Server AlwaysOn solution.
With the cluster validation wizard, you can run a set of focused tests on either a collection of servers
that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying
hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC
cluster would be supported on a given configuration.
This validation process consists of a series of tests and data collection on each node in these categories:

Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating
system versions, devices, services, drivers, and so on.

Network. Information on NIC binding order, network communications, IP configuration, and firewall
configuration. Validates inter-node communications on all NICs.

Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI
commands, disk failover functionality, and symmetric or asymmetric storage configuration.

System configuration. Validates Active Directory configuration, that drivers are signed, memory
dump settings, required operating system features and services, compatible processor architecture,
and service pack and Windows Software Update levels.

The results of these validation tests give you information needed to fine-tune a cluster configuration,
track the configuration, and identify potential cluster configuration issues before they cause downtime.
You can save a report of the test results as an HTML document for later reference.
You should run these tests before and after you make any changes to WSFC configuration, before you
install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required
by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC
cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster
(http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).
Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based
geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to
apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation
steps.
For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability
Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).

WSFC Quorum Modes and Voting Configuration
WSFC uses a quorum-based approach to monitoring overall cluster health and maximizing node-level fault
tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very
important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster
recovery solution.
Cluster Health Detection by Quorum
Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's
health status with the other nodes. Unresponsive nodes are considered to be in a failed state.
A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health
and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means
that the cluster is healthy enough to provide node-level fault tolerance.
The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be
maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over
to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also
causes all SQL Server instances registered with the cluster to be stopped.
Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring
it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section
later in this paper.
Quorum Modes
A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum
voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes
in the cluster.
One of the following quorum modes determines what constitutes a quorum of votes:

Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the
cluster to be healthy.

Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file
share is also configured as a voting witness, and connectivity from any node to that share is also
counted as an affirmative vote. More than half of the possible votes must be affirmative for the
cluster to be healthy.
As a best practice, the witness file share should not reside on any node in the cluster, and it should
be visible to all nodes in the cluster.

Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster
resource is also designated as a voting witness, and connectivity from any node to that shared disk is
also counted as an affirmative vote. More than half of the possible votes must be affirmative for the
cluster to be healthy.

Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to
that shared disk is counted as an affirmative vote.

For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a
Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).
Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk,
you should generally use the Node Majority quorum mode if you have an odd number of voting nodes,
or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
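The vote arithmetic behind the four quorum modes, and the odd/even rule of thumb in the note, can be sketched in Python as follows. This is a simplified model under stated assumptions: the function names and mode strings are hypothetical, each voter has exactly one vote, and the Disk Only case is modeled as "at least one node is up and can reach the witness disk".

```python
def has_quorum(mode: str, voting_nodes: int, nodes_up: int,
               witness_up: bool = False) -> bool:
    """Return True if the affirmative votes satisfy the given quorum mode."""
    if mode == "NodeMajority":
        # More than half of the voting nodes must vote affirmatively.
        return 2 * nodes_up > voting_nodes
    if mode in ("NodeAndFileShareMajority", "NodeAndDiskMajority"):
        # The witness adds one possible vote to the total.
        possible_votes = voting_nodes + 1
        affirmative = nodes_up + (1 if witness_up else 0)
        return 2 * affirmative > possible_votes
    if mode == "DiskOnly":
        # Simplification: only connectivity to the witness disk matters.
        return witness_up and nodes_up >= 1
    raise ValueError(f"unknown quorum mode: {mode}")


def suggested_mode(voting_nodes: int) -> str:
    """Rule of thumb from the note: odd node count vs. even node count."""
    if voting_nodes % 2 == 1:
        return "NodeMajority"
    return "NodeAndFileShareMajority"


print(has_quorum("NodeMajority", voting_nodes=4, nodes_up=2))
print(has_quorum("NodeAndFileShareMajority", voting_nodes=4, nodes_up=2,
                 witness_up=True))
print(suggested_mode(4))
```

With four voting nodes and two of them down, Node Majority cannot reach a majority in this model, whereas the extra witness vote in Node and File Share Majority breaks the tie.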
Voting and Non-Voting Nodes
By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file
share witness, and disk witness has a single vote in determining the overall cluster health. The quorum
discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on
cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the
cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given
moment, from the perspective of each node, some of the other nodes may appear to be offline, or
appear to be in the process of failover, or appear unresponsive due to a network communication
failure. A key function of the quorum vote is to determine whether the apparent state of each node in
the WSFC cluster is indeed the actual state of that node.
For all of the quorum modes except Disk Only, the effectiveness of a quorum vote depends on reliable
communications among all of the voting nodes in the cluster. You should trust the quorum vote when all
nodes are on the same physical subnet.
However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually
online and otherwise healthy, that is most likely due to a network communications failure between
subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that
network communications failure may effectively create more than one set (or subset) of voting nodes.
If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a
split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and
in conflict with one another.
Note: The split-brain scenario is possible only if a system administrator manually performs a forced
quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the
quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum
section later in this paper.
To simplify your quorum configuration and increase uptime, you may want to set the NodeWeight
setting of selected nodes to 0 (the setting accepts a value of 0 or 1) so that those nodes' votes are not
counted towards the quorum.

Recommended Adjustments to Quorum Voting
To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in
sequential order:
1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is
the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as
the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary
disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to
take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL
Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible
ties in the quorum vote.
6. Re-assess vote assignments post-failover. You do not want to fail over into a cluster configuration
that does not support a healthy quorum.
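As a rough illustration of applying guidelines 1 through 5 in order, the Python sketch below assigns a vote of 0 or 1 to each node and then checks whether a witness is needed to avoid a tie. The `Node` fields, node names, and helper functions are hypothetical, and the strict exclusion of secondary-site nodes is a simplification of the "in general" wording of guideline 4.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    hosts_primary: bool = False              # guideline 2
    automatic_failover_target: bool = False  # guideline 3
    at_secondary_site: bool = False          # guideline 4


def assign_votes(nodes: List[Node]) -> Dict[str, int]:
    votes = {}
    for node in nodes:
        vote = 0                                  # 1. no vote by default
        if node.hosts_primary or node.automatic_failover_target:
            vote = 1                              # 2 and 3. justified voters
        if node.at_secondary_site:
            vote = 0                              # 4. exclude the DR site
        votes[node.name] = vote
    return votes


def needs_witness(votes: Dict[str, int]) -> bool:
    # 5. An even vote total can tie, so a witness vote should be added.
    return sum(votes.values()) % 2 == 0


nodes = [Node("NODE1", hosts_primary=True),
         Node("NODE2", automatic_failover_target=True),
         Node("DRNODE1", at_secondary_site=True)]
votes = assign_votes(nodes)
print(votes, needs_witness(votes))
```

Guideline 6 corresponds to rerunning this kind of assessment against the topology that exists after a failover.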
For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight
Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).
You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to
include or exclude its vote.
Note: SQL Server exposes several system dynamic management views (DMVs) that can help you
administer settings related to WSFC cluster configuration and node quorum voting.
For more information, see Monitor Availability Groups
(http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).

WSFC Disaster Recovery through Forced Quorum
Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving
several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL
Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster
cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC
cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may
have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to
communicate with a quorum.
To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least
one node under the existing configuration. In a disaster scenario, you may need to reconfigure or
identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC
cluster to reflect the surviving cluster topology as well.
You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that
took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets
you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.
This type of disaster recovery process should include the following steps:
1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are
nonresponsive and which cluster nodes are online and available for post-disaster use, and then
examine the Windows event logs and the SQL Server system logs. Where practical, you should
preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node,
manually force the cluster to come online using the forced quorum procedure. To minimize potential
data loss, select a node that was last hosting an availability group primary replica.
For more information, see Force a WSFC Cluster to Start Without a
Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC
cluster achieves a majority of votes and automatically transitions to a regular quorum mode of
operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to
specify the forced quorum option when you start the cluster service on the other nodes.
As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to
synchronize the new cluster configuration state. Remember to do this one node at a time to prevent
potential race conditions in resolving the last known state of the cluster.
Note: Ensure that each node that you start can communicate with the other newly online nodes, or
you run the risk of creating more than one quorum node set; that is, a split-brain scenario. If your
findings in step 1 are accurate, this should not occur.

4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the
cluster using the forced quorum procedure, and if you corrected the root cause of the quorum
failure, you do not need to make changes to the original quorum mode and node vote configuration.
Otherwise, you should evaluate the newly recovered cluster node and availability replica topology,
and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC
cluster service on unrecovered nodes offline, or set their node votes to zero.
Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored
back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster
Manager, or the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate
DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come
back online on their own as part of the regular SQL Server startup process. The recovery of other
databases may require additional manual steps.
You can minimize potential data loss and recovery time for the availability group replicas by bringing
them back online in this sequence, if possible: primary replica, synchronous secondary replicas,
asynchronous secondary replicas.
6) Repair or replace failed components and re-validate the cluster. Now that you have recovered from
the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust
related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group
replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate
the transaction log past the last known point of the farthest-behind availability replica. If a failed
replica is not repaired or removed from the availability group, the transaction logs will grow and you
will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high
availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and
Windows event logs to determine the root cause of the failure, and to document actual Recovery Point
and Recovery Time experiences.
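
The replica recovery sequence recommended in step 5 amounts to a simple priority ordering. The Python sketch below shows one way to derive that order for a runbook; the replica names and the role labels are hypothetical placeholders, not values taken from SQL Server.

```python
from typing import List, Tuple

# Lower number = bring online earlier (order taken from step 5).
RECOVERY_PRIORITY = {
    "primary": 0,
    "synchronous_secondary": 1,
    "asynchronous_secondary": 2,
}


def recovery_order(replicas: List[Tuple[str, str]]) -> List[str]:
    """Return replica names in the suggested bring-online sequence."""
    ordered = sorted(replicas, key=lambda r: RECOVERY_PRIORITY[r[1]])
    return [name for name, _ in ordered]


replicas = [("DR-SQL1", "asynchronous_secondary"),
            ("SQL2", "synchronous_secondary"),
            ("SQL1", "primary")]
print(recovery_order(replicas))
```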

SQL Server Instance Level Protection
The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities
and features offered by Microsoft SQL Server 2012 and its integration with Windows Server
infrastructure components.
Availability Improvements - SQL Server Instances
These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn
Failover Cluster Instances and standalone instances that host AlwaysOn Availability Groups.
These improvements represent enhancements for managing and troubleshooting failover scenarios:

Flexible Failover Policy. The output of the new system stored procedure used for robust failure
detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of
a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the
SQL Server instance, ranging from relative tolerance of errors to being sensitive to any SQL Server
internal component error.
You can configure failover to be triggered by any one of a range of error levels, including: server
down, server unresponsive, critical error, moderate error, or any qualified error. The
FailureConditionLevel property can be used for FCI or availability group failover policies.
Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any
service-level failure caused failover.
For more information, see Failover Policy for Failover Cluster Instances
(http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).

Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system
configuration views, DMVs, performance counters, and an extended event health session that
captures and dumps information needed to troubleshoot, tune, and monitor your AlwaysOn
deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions
(http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes
(http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).

SMB file share support. You can place database files on a Windows Server 2008 or later remote file
share for both standalone and failover cluster instances, negating the need for a separate drive
letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a
physical server for a virtual machine guest operating system. With the right configuration, I/O
performance can very nearly approximate that of direct-attached storage.
For more information, see SQL Databases on File Shares - It's time to reconsider the
scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabaseson
filesharesitstimetoreconsiderthescenario.aspx).

Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server
resource group; you must take separate measures to ensure the availability of the file share. If the
file share becomes unavailable, SQL Server throws an I/O exception and goes offline.

WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group
listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP
addresses, regardless of online or offline state, are registered with DNS under the same virtual
network name. Client calls to resolve the virtual network name in DNS return all of the registered IP
addresses in a varying round-robin sequence.

AlwaysOn Failover Cluster Instances
The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability
of a SQL Server instance hosted on local server and storage hardware within a single data center.
An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover
Clustering (WSFC) cluster, but only active on one node at a time. Client applications connect to a virtual
network name and virtual IP address that are owned by the active cluster node.
Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster
service also replicates relevant changes from the active instance's entries in the Windows registry to each
installed node. Each node that the FCI is installed on is designated as a possible owner of the instance
and its resources, within a preferred failover sequence.
Database files are stored on shared symmetrical storage volumes that are registered as a resource with
the WSFC cluster, and are owned by the node that currently hosts the FCI.
For more information, see AlwaysOn Failover Cluster Instances
(http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process
If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC
cluster service using this high-level process to do a failover:
1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration
indicates a failed state. By default, a service restart is attempted before a failover to another node is
initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server
service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and
its dependent network and shared storage resources is transferred to the next preferred node
owner of the FCI.

5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup
procedures. If it does not come back online within a pending timeout period, the cluster service puts
the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode
while transaction log redo operations are applied and uncommitted transactions are rolled back.
FCI Improvements
Previous versions of SQL Server have offered an FCI installation option; however, several feature
enhancements in SQL Server 2012 improve availability robustness and serviceability:

Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one
subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network
interface is available; this is known as an OR cluster resource dependency.
Prior versions of SQL Server required that all network interfaces be functional for the SQL Server
service to start or fail over, and that they all exist on the same subnet or VLAN.
Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet
clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate
data and coordinate storage failover between cluster nodes.
For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster
Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012
alwayson_3a00_multisitefailoverclusterinstance.aspx).

Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection
to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system
stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic
information.
Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a
simple one-way polling process. In this process, the WSFC cluster service periodically created a new
SQL client connection to the instance, queried the server name, and then disconnected. A failure to
connect, or a query timeout, for whatever reason, triggered a failover with very little available
diagnostic information.
For more information, see sp_server_diagnostics
(http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).

There is now broader support for FCI storage scenarios:

Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The
specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource
dependency during setup.

tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such
as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.

Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage
volume that failed over with other system databases.
Note: The location of tempdb is stored in the master database, which moves between nodes during
failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential
node owners, or else the SQL Server service will not start on some nodes.

Database Availability
The high availability capabilities offered by the infrastructure and SQL Server instance-level components
work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of
options for explicitly protecting database data and data-tier applications.
AlwaysOn Availability Groups
An availability group is a set of user databases that fail over together from one SQL Server instance to
another within the same WSFC cluster. Client applications can connect to the availability group's
databases through a WSFC virtual network name, known as an availability group listener, which abstracts
the underlying SQL Server instances.
AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring,
failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server
instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it
does not require the use of symmetrical shared storage.
For more information, see Overview of AlwaysOn Availability Groups
(http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).
Availability Replicas and Roles
Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the
user databases in the availability group. A SQL Server instance can host only one availability replica from
a given availability group, but multiple availability groups may reside on the same instance. The SQL
Server instance must have dedicated (non-shared) storage volumes.
One of the availability replicas serves in the role of primary replica. It is designated as the master copy of
the availability group databases and is enabled for read/write operations.
An availability group can contain from one to four additional read-only availability replicas that each
separately serve in the role of a secondary replica.
Availability Replica Synchronization
The contents of each database in an availability group are synchronized from the primary replica to each
of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all
databases in the availability group must be set to the full recovery model.
Secondary replicas are initialized with a full backup and restore of the primary replica's databases and
transaction logs. As new transactions are committed on the primary replica, the corresponding portion
of the transaction log is cached, queued, and then sent over the network to a database mirroring
endpoint on each of the secondary replica nodes.
In this manner, new entries in the primary replica transaction log are appended onto each of the
secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence
number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log
has been hardened and flushed to the remote disk.
Note: Each availability replica has its own set of independent transaction log redo threads that are not
part of the availability replica synchronization process. You may perceive delays in the log redo process
on the secondary replicas as data latency.
In addition to having a role of primary or secondary, each availability replica also has an availability
mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN
statement:

Synchronous-commit mode. The primary replica commits a given transaction only after all
synchronous-commit secondary replicas acknowledge that they have finished hardening their
respective transaction logs past that transaction's LSN. An availability group can have up to 2
synchronous-commit secondary replicas.
Synchronous-commit mode introduces transaction latency on the primary replica databases, but it
ensures that there is no data loss on the secondary replicas for committed transactions.

Asynchronous-commit mode. The primary replica commits transactions after hardening the local
transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary
replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit
secondary replicas, but no more than a total of 4 secondary replicas of any type.
Asynchronous-commit mode minimizes transaction latency on the primary replica databases but
allows the secondary replica transaction logs to lag behind, making some data loss possible.
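
The difference between the two availability modes comes down to which hardened-LSN watermarks the primary waits for before it acknowledges a commit. The Python sketch below models only that decision. It is a simplification under stated assumptions: LSNs are represented as plain integers, and the `Secondary` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Secondary:
    name: str
    mode: str          # "synchronous" or "asynchronous"
    hardened_lsn: int  # last LSN this replica reported as hardened


def can_commit(txn_lsn: int, primary_hardened_lsn: int,
               secondaries: List[Secondary]) -> bool:
    """Commit only when the primary and every synchronous-commit secondary
    have hardened their logs up to the transaction's LSN."""
    if primary_hardened_lsn < txn_lsn:
        return False
    return all(s.hardened_lsn >= txn_lsn
               for s in secondaries if s.mode == "synchronous")


def unprotected_lsns(primary_hardened_lsn: int, secondary: Secondary) -> int:
    """How far a secondary lags; a non-zero lag means possible data loss
    if that secondary were forced to become the primary."""
    return max(0, primary_hardened_lsn - secondary.hardened_lsn)


replicas = [Secondary("SQL2", "synchronous", hardened_lsn=100),
            Secondary("DR-SQL1", "asynchronous", hardened_lsn=90)]
print(can_commit(txn_lsn=100, primary_hardened_lsn=100, secondaries=replicas))
print(can_commit(txn_lsn=101, primary_hardened_lsn=101, secondaries=replicas))
print(unprotected_lsns(100, replicas[1]))
```

In this model the lagging asynchronous secondary never delays a commit, but its lag is exactly the window of potential data loss.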

For more information, see Availability Modes
(http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
The overall health of the data flow between the availability replicas is indicated by the synchronization
state of each replica. You will most likely experience data loss if you fail over to a secondary replica with
a synchronization state of anything other than Synchronized or Synchronizing.
Each secondary replica's synchronization stream has a session timeout property. When a secondary
replica configured for synchronous-commit availability mode fails with a session timeout, it is
temporarily marked internally as asynchronous. This is done so that the secondary replica failure does
not impact hardening of the transaction log on the primary replica. After that secondary replica is
healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit
mode operations.
Availability Group Failover
The availability group and a corresponding virtual network name are registered as resources in the
WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health
and failover policy of the primary replica.
An availability group failover policy uses the FailureConditionLevel property to indicate the severity
tolerance level for a failure affecting the availability group, in conjunction with the
sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
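A severity tolerance level of this kind can be modeled as an ordered scale, where the configured level decides which observed conditions are severe enough to trigger failover. The Python sketch below uses the five conditions named earlier in this paper; the numeric values 1 through 5, the enum member names, and the assumption that each level includes all more severe conditions are illustrative and should be checked against the product documentation.

```python
from enum import IntEnum


class FailureConditionLevel(IntEnum):
    # Assumed ordering: most severe condition first, most sensitive level last.
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def should_fail_over(configured: FailureConditionLevel,
                     observed: FailureConditionLevel) -> bool:
    """A configured level tolerates conditions that are less severe than
    itself and triggers failover on anything at or above its severity."""
    return observed <= configured


policy = FailureConditionLevel.CRITICAL_ERROR
print(should_fail_over(policy, FailureConditionLevel.SERVER_DOWN))
print(should_fail_over(policy, FailureConditionLevel.MODERATE_ERROR))
```

A lower configured level therefore expresses more tolerance of errors, and the highest level reacts to any qualified error.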

In the event of a failover, instead of transferring ownership of shared physical resources to another
node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over
the role of primary replica. The availability group's virtual network name resource is then transferred to
that instance. All client connections to the involved availability replicas are reset.
Based upon the current health, synchronization state, and availability mode of the replicas, each replica
has a composite failover readiness state that indicates the potential for data loss. This replica health
information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states
system view.
Each availability replica also has a configured failover mode, which governs replica behavior when
failover is indicated.

Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn
configuration because the secondary replica transaction log is already hardened and
synchronized. Open transactions on the primary replica are rolled back, and the primary replica role
is transferred to a secondary replica without any user intervention.
The primary and secondary replicas must be set to automatic failover mode, and both must be set
to synchronous-commit availability mode. The synchronization state between the replicas must be
Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is
blocked to prevent a potential race condition between availability group and FCI failovers.

Manual failover. This allows the administrator to assess the state of the primary replica, and make a
decision to deliberately fail over to a secondary replica or not.
Depending upon the availability mode and synchronization state, you have these choices:

Planned manual failover (without data loss). You can perform this type of failover only if both
the primary and secondary replicas are healthy and in a Synchronized state. This is functionally
equivalent to an automatic failover.

Forced manual failover (allowing potential data loss). This is the only form of failover that is
possible if the target secondary replica is in asynchronous-commit availability mode, or if it is
not synchronized with the primary replica.
Warning: You should use this failover option in a disaster recovery situation only. If the primary
replica is healthy and available, you should change the availability mode of the involved replicas
to synchronous-commit and then perform a planned manual failover.
For more information, see Perform a Forced Manual Failover of an Availability
Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).

You must perform a manual failover if any of the following conditions are true about either the primary
replica or the secondary replica that you want to fail over to:

Failover mode is set to manual.
Availability mode is set to asynchronous-commit.
Replica resides on an FCI.
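
The conditions that separate automatic, planned manual, and forced manual failover can be collected into a small decision function. The Python sketch below is a reading of the rules stated in this section, not product code; the `Replica` fields, the string values, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    failover_mode: str       # "automatic" or "manual"
    availability_mode: str   # "synchronous" or "asynchronous"
    on_fci: bool = False
    synchronization_state: str = "Synchronized"


def automatic_failover_possible(primary: Replica, secondary: Replica,
                                quorum_healthy: bool) -> bool:
    """All conditions listed for automatic failover must hold on both replicas."""
    both_eligible = all(r.failover_mode == "automatic"
                        and r.availability_mode == "synchronous"
                        and not r.on_fci
                        for r in (primary, secondary))
    return (both_eligible and quorum_healthy
            and secondary.synchronization_state == "Synchronized")


def manual_failover_kind(primary: Replica, secondary: Replica) -> str:
    """Planned (no data loss) only when both replicas are synchronous-commit
    and the target is Synchronized; otherwise only a forced failover remains."""
    synchronized = (primary.availability_mode == "synchronous"
                    and secondary.availability_mode == "synchronous"
                    and secondary.synchronization_state == "Synchronized")
    return "planned" if synchronized else "forced"


primary = Replica("automatic", "synchronous")
sync_secondary = Replica("automatic", "synchronous")
dr_secondary = Replica("manual", "asynchronous",
                       synchronization_state="Synchronizing")
print(automatic_failover_possible(primary, sync_secondary, quorum_healthy=True))
print(manual_failover_kind(primary, dr_secondary))
```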

For more information, see Failover Modes (AlwaysOn Availability
Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the
secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary
replicas until the primary replica is set to synchronous-commit mode.
Availability Group Listener
An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a
database in the availability group. The VNN cluster resource is owned by the SQL Server instance on
which the primary replica resides.
The virtual network name is registered with DNS only during availability group listener creation or during
configuration changes. All virtual IP addresses that are defined in the availability group listener are
registered with DNS under the same virtual network name.
To use the availability group listener, a client connection request must specify the virtual network name
as the server, and a database name that is in the availability group. By default, this should result in a
connection to the SQL Server instance that is hosting the primary replica.
At runtime, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to
the virtual network name. The client then attempts to connect to each of the IP addresses, until it is
successful, or until it reaches the connection timeout. The client will attempt to make these connections
in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
Intheeventofafailover,clientconnectionsareresetontheserver,ownershipoftheavailabilitygroup
listenermoveswiththeprimaryreplicaroletoanewSQLServerinstance,andtheVNNendpointis
boundtothenewinstancesvirtualIPaddressesandTCPports.
Formoreinformation,seeClientConnectivityandApplicationFailover(http://msdn.microsoft.com/en
us/library/hh213417(SQL.110).aspx).
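
To see why parallel connection attempts can shorten client failover, the sketch below compares the two strategies on a list of simulated attempts. It is a toy timing model, not a network client: the function name, the tuple layout, and the 21-second and 0.5-second figures are assumptions for illustration, and the overall connection timeout is deliberately ignored.

```python
from typing import List, Optional, Tuple

# (reachable, seconds until the attempt succeeds or times out); illustrative only
Attempt = Tuple[bool, float]


def time_to_connect(attempts: List[Attempt], parallel: bool) -> Optional[float]:
    """Return the simulated seconds until the first successful connection.

    Returns None if no address in the list is reachable.
    """
    if parallel:
        # All addresses are tried at once, so the fastest success wins.
        successes = [seconds for reachable, seconds in attempts if reachable]
        return min(successes) if successes else None

    # Sequential: each failed attempt must time out before the next one starts.
    elapsed = 0.0
    for reachable, seconds in attempts:
        elapsed += seconds
        if reachable:
            return elapsed
    return None


# One unreachable address (assumed 21 s timeout) listed before a reachable one.
attempts = [(False, 21.0), (True, 0.5)]
print(time_to_connect(attempts, parallel=False))  # 21.5
print(time_to_connect(attempts, parallel=True))   # 0.5
```

In this model the sequential client pays for every unreachable address listed ahead of the reachable one, while the parallel client waits only for the fastest success.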

Application Intent Filtering

While connecting through the availability group listener, the application can specify whether its
intent is to both read and write data or whether it will exclusively perform read-only operations.
If not specified, the default application intent for the client is read-write.
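
As a rough illustration of how a client states its intent, the sketch below assembles a keyword=value connection string that names the listener as the server. It only builds text and does not open a connection. The ApplicationIntent and MultiSubnetFailover keywords are the ones named in this guide; the function name, the listener name AgListener, and the database name SalesDb are hypothetical, authentication keywords are omitted, and exact keyword spellings can vary between client libraries.

```python
def build_connection_string(listener: str, database: str,
                            read_only: bool = False,
                            multi_subnet_failover: bool = False) -> str:
    """Assemble a connection string that targets an availability group listener."""
    # Read-write is the default intent when the application does not specify one.
    intent = "ReadOnly" if read_only else "ReadWrite"
    parts = [
        f"Server={listener}",      # the listener's virtual network name
        f"Database={database}",    # a database in the availability group
        f"ApplicationIntent={intent}",
    ]
    if multi_subnet_failover:
        parts.append("MultiSubnetFailover=True")
    return ";".join(parts)


# Hypothetical names, for illustration only.
print(build_connection_string("AgListener", "SalesDb", read_only=True))
# Server=AgListener;Database=SalesDb;ApplicationIntent=ReadOnly
```
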
For the primary role and secondary role of each availability replica, you can also specify a
connection access property that will be used as a connection-level filter on the client's
application intent. By default, invalid application intent and connection access combinations result
in a refused connection. SQL Server should filter out client connection requests using the following
rules.

While the availability replica is in the primary role, and connection access is equal to:

Allow any application intent. Do not filter any client connections for application intent.
Allow only explicit read/write intent. If client specifies read-only, reject connection.

While the availability replica is in the secondary role, and connection access is equal to:

No connections allowed. Refuse all connections; replica is used only for disaster recovery.
Allow any application intent. Do not filter any client connections for application intent.
Read-only application intent. If client does not specify read-only, reject connection.
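
The filtering rules above amount to a small decision table keyed on replica role, connection access, and client intent. The sketch below encodes that table. It is an illustration of the rules as stated here, not SQL Server's implementation, and the string labels ("all", "read_write", "none", "read_only") are hypothetical names rather than actual option values.

```python
def connection_allowed(role: str, connection_access: str,
                       intent: str = "read_write") -> bool:
    """Apply the connection access filter for a replica in the given role.

    role:              "primary" or "secondary"
    connection_access: primary: "all" or "read_write";
                       secondary: "none", "all", or "read_only"
    intent:            "read_write" (the client default) or "read_only"
    """
    if intent not in ("read_write", "read_only"):
        raise ValueError(f"unknown application intent: {intent!r}")

    if role == "primary":
        if connection_access == "all":
            return True                    # no filtering on intent
        if connection_access == "read_write":
            return intent != "read_only"   # reject read-only clients
    elif role == "secondary":
        if connection_access == "none":
            return False                   # disaster-recovery-only replica
        if connection_access == "all":
            return True                    # no filtering on intent
        if connection_access == "read_only":
            return intent == "read_only"   # reject clients without read-only intent

    raise ValueError(f"invalid combination: {role!r}, {connection_access!r}")
```

Under these rules, a client that leaves its intent at the read-write default is refused by a secondary replica whose connection access is restricted to read-only intent.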

For more information, see Configure Connection Access on an Availability Replica
(http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).

Application Intent Read-Only Routing

A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby
hardware infrastructure for purposes other than disaster recovery. By configuring one or more of
your secondary replicas for read-only access, you can offload significant workloads from your
primary replicas.

Workloads that can be readily adapted to run off of a read-only secondary replica include:
reporting, database backups, database consistency checks, index fragmentation analysis, data
pipeline extraction, operational support, and ad hoc queries.

For each availability replica, you can optionally configure a sequential read-only routing list of
SQL Server instance endpoints to be applied while that replica is in the primary role. If present,
this list is used to redirect client connection requests that specify read-only application intent
to the first available secondary replica in the list that satisfies the application intent filters
noted earlier.

Note: The read-only routing redirection is performed by the availability group listener, which is
bound to the primary replica. If the primary replica is offline, client redirection will not
function.
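
The routing behavior described in this section can be sketched as a first-match walk over the routing list. This is a simplified model under stated assumptions: the function name, the dictionaries used to represent replica state, and the replica names are hypothetical, and returning None when nothing qualifies is a modeling choice rather than a description of what the listener actually does in that case.

```python
from typing import Dict, List, Optional


def route_read_only(routing_list: List[str],
                    available: Dict[str, bool],
                    accepts_read_only: Dict[str, bool],
                    primary_online: bool = True) -> Optional[str]:
    """Pick the first listed secondary that is available and accepts read-only intent.

    routing_list:      ordered replica names configured on the current primary
    available:         replica name -> whether the replica is currently reachable
    accepts_read_only: replica name -> whether its secondary-role connection
                       access admits read-only intent
    """
    # Redirection is performed through the listener, which is bound to the primary.
    if not primary_online:
        return None

    for name in routing_list:
        if available.get(name, False) and accepts_read_only.get(name, False):
            return name
    return None


# Hypothetical replicas: SQL2 is down, so the request is routed to SQL3.
target = route_read_only(
    ["SQL2", "SQL3"],
    available={"SQL2": False, "SQL3": True},
    accepts_read_only={"SQL2": True, "SQL3": True},
)
print(target)  # SQL3
```
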
For more information, see Configure Read-Only Routing on an Availability Group (SQL Server)
(http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).

Availability Improvements - Databases

SQL Server 2012 has a number of feature enhancements that are specific to database configuration and
capabilities.

The following improvement reduces recovery time:

Predictable Recovery Time. You can set a target recovery time interval per database, which is used
to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs
periodically, based upon estimated time needed to recover the transaction log in the event of a
restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each
checkpoint, and increasing recovery time (RTO) predictability.
Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval,
irrespective of transaction volume or load, which could lead to unpredictable recovery times.
For more information, see Database Checkpoints
(http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).

These improvements mitigate common scenarios that can drive planned downtime:

Online index operations for LOB columns. Indexes that contain columns with varbinary(max),
varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.

Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a
default value to a SQL Server 2012 database table, only a schema lock is required to update system
metadata; all rows do not have to be populated during the ALTER TABLE statement.
SQL Server will physically persist the default column value only if a row is actually modified or
re-indexed. Queries return the default value from metadata, unless an actual column value exists.
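
The metadata-only behavior for new NOT NULL columns can be pictured with a toy in-memory model. The sketch below is purely conceptual and says nothing about SQL Server's storage engine: the Table class, its methods, and the sample column names are hypothetical. It shows the column being added without touching any row, reads falling back to the default held in metadata, and the default being persisted into a row only when that row is modified.

```python
class Table:
    """Toy model of a table whose new-column defaults live in metadata."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]  # physically stored values
        self.column_defaults = {}                # metadata: column -> default

    def add_not_null_column(self, name, default):
        # Metadata-only change: no existing row is rewritten.
        self.column_defaults[name] = default

    def read(self, row_index, column):
        row = self.rows[row_index]
        if column in row:
            return row[column]                   # an actual stored value exists
        return self.column_defaults[column]      # fall back to the metadata default

    def update(self, row_index, column, value):
        row = self.rows[row_index]
        # Modifying the row persists any defaults it has not stored yet.
        for col, default in self.column_defaults.items():
            row.setdefault(col, default)
        row[column] = value


table = Table([{"id": 1}, {"id": 2}])
table.add_not_null_column("status", "new")
print(table.rows)               # [{'id': 1}, {'id': 2}]
print(table.read(0, "status"))  # new
table.update(1, "id", 20)
print(table.rows[1])            # {'id': 20, 'status': 'new'}
```
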

Here is an example of broader support for storage scenarios:

Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it
unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of
errors by asynchronously requesting and applying a fresh copy of the affected data pages from a
different availability replica.
Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now
enhanced to support multiple replicas.
For more information, see Automatic Page Repair
(http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).

Client Connectivity Recommendations

Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server
2012 AlwaysOn technologies:

AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS)
protocol version 7.4 or newer. This should provide the desired client-side functionality for
AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET
Framework 4.02, and the SQL Native Client 11.0.

Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings
to enable client libraries to attempt to connect in parallel to all IP addresses that are registered
for the availability group listener or for an FCI that has IP addresses in multiple subnets.

Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only
workloads from your primary replica onto the secondary replicas.

Legacy client connection timeout. Legacy client database libraries do not implement parallel
connection attempts, so when multiple IP addresses are present, they try to connect to each of them
sequentially, until they encounter a TCP timeout, or until they make a successful connection. You
should adjust your connection timeout on legacy clients to accommodate the potential sequential
timeouts and retries when multiple IP addresses are present, to a value that is at least 15
seconds + 21 seconds for every secondary replica.
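
The legacy client timeout guidance is simple arithmetic, worked through in the sketch below. The function name is hypothetical; the 15-second base and the 21 seconds per secondary replica come directly from the recommendation above, and the result is a minimum, not a tuned value.

```python
def minimum_legacy_timeout_seconds(secondary_replicas: int) -> int:
    """Minimum connection timeout for a legacy client, per the guideline above."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must not be negative")
    return 15 + 21 * secondary_replicas


# Worked examples: 15 + 21 * 2 = 57 and 15 + 21 * 4 = 99.
print(minimum_legacy_timeout_seconds(2))  # 57
print(minimum_legacy_timeout_seconds(4))  # 99
```

With two secondary replicas, for instance, a legacy client's connection timeout should be set to at least 57 seconds.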

Conclusion

This white paper has established the baseline context for how to reduce planned and unplanned
downtime, maximize application availability, and provide data protection using SQL Server 2012
AlwaysOn high availability and disaster recovery solutions.

Many of the business drivers and challenges of planning, managing, and measuring a highly available
database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery
Time Objectives (RTO).

SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database
level that can help your organization address common high availability and disaster recovery
scenarios, in a manner that can be well justified using RPO and RTO goals.

For more information:


http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5
(excellent), how would you rate this paper and why have you given it this rating? For example:

Are you rating it high due to having good examples, excellent screen shots, clear writing,
or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback will help us improve the quality of white papers we release.
Send feedback.

Version 1.1, 21 February 2012.
