3 Ways to Replicate Databases & Which One You Should Prefer
Introduction
One of the most significant issues that most organizations face today is maintaining
High Data Availability and Accessibility across their infrastructure. As a result,
organizations have a growing need to scale up their systems and ensure easy data
access at all times. Data Replication is one such technique: it allows you to store the
same data at multiple locations in order to increase Data Availability, Accessibility,
Reliability, and System Resilience.
Data Replication is commonly used for Disaster Recovery, ensuring that an accurate
backup is available at all times, especially in the event of a hardware failure or a
system breach. Having a replica can also speed up data access and optimize your
server performance. Thus, there is an increasing need for Data Replication in modern
organizations.
There are numerous ways in which users can replicate a Database. Here, we will look
into the 3 industry standards to replicate Databases:
1) Full-Table Replication
Full-Table Replication is a method of replicating the entire table, including New,
Updated, and Existing data, from the source to the destination. All records in the table
will be chosen for extraction, regardless of whether they are new or recently updated.
Small Databases, such as those backing a website CDN (Content Delivery Network),
utilize this method of Replication. Let’s look at the scenarios where you can use
Full-Table Replication along with its limitations:
When to use Full-Table Replication?
Full-Table Replication could be a suitable fit if you have the following requirements:
➔ The records are permanently removed from the source.
➔ A column for Key-based Incremental Replication does not exist in the Database.
➔ The source does not support Log-Based Replication.
➔ For Database integrations backed by MongoDB, the _id field only contains one
data type.
Limitations
Full-Table Replication provides users with multiple benefits that help boost
Performance and enhance Data Availability. However, it also poses several limitations
that you should be aware of before implementing it as your Replication Method. Below
are the limitations associated with this Replication Method:
➔ Latency Issues: When utilizing Full-Table Replication on large tables, data can
only be extracted as fast as the source returns it. This implies that if a Database or
SaaS (Software-as-a-Service) platform returns data slowly, Replication Latency would
likely rise.
➔ Increased Row Consumption: Full-Table Replication replicates the entire table
during each Replication Job, regardless of whether individual records have been
updated or not. This leads to increased row consumption.
➔ High Bandwidth: Full-Table Replication takes up a large amount of network
bandwidth and computing resources, as the entire table is replicated in
each Replication operation.
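To make the row-consumption point concrete, here is a minimal sketch of a Full-Table Replication job using Python's built-in sqlite3 module. The `customers` table, its columns, and the `full_table_replicate` helper are hypothetical names chosen for illustration; a real pipeline would read from your actual source and write to your actual destination.

```python
import sqlite3

def full_table_replicate(source, dest, table):
    """Copy every row of `table` from source to dest, replacing what was there."""
    rows = source.execute(f"SELECT id, name FROM {table}").fetchall()
    with dest:  # one transaction, so the replica is swapped in a single step
        dest.execute(f"DELETE FROM {table}")
        dest.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    return len(rows)  # rows moved by this job

# Hypothetical source and destination, both in memory for the example.
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for conn in (source, dest):
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)
source.commit()
first_job = full_table_replicate(source, dest, "customers")

# One row changes and one row is permanently removed at the source.
source.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
source.execute("DELETE FROM customers WHERE id = 3")
source.commit()
second_job = full_table_replicate(source, dest, "customers")

replica = dest.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

In this sketch the second job still transfers every remaining row even though only one was updated, which is the row-consumption cost described above. On the other hand, the permanently removed row disappears from the replica without any extra bookkeeping, which is why this method suits sources with hard deletes.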
2) Log-Based Replication
Log-Based Replication is a type of Replication that is applicable only to Database
sources. This method replicates data using information from the Database Log File,
which records the Database changes. It is best suited when the Database structure is
relatively static. There are two methods to achieve Log-Based Data Replication:
➔ Statement-Based Replication: Statement-Based Replication tracks and stores
all commands, queries, and actions that modify the Database. It generates
replicas by re-running all of these statements in the same sequence in which
they appeared.
➔ Row-Based Replication: Row-Based Replication keeps track of every row-level
change (rows inserted, updated, or deleted) and saves it in a Log file. Procedures
using a Row-Based Replication mechanism perform Replication by iterating over each
Log Message in the sequence in which they were received.
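The difference between the two methods can be sketched in a few lines of Python with the built-in sqlite3 module. Everything here is an illustrative assumption: the `accounts` table, the in-memory `statement_log` and `row_log` lists, and the before/after comparison that stands in for the row images a real Database engine would write to its own log.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    return conn

primary = make_db()
statement_log = []  # statement-based: the SQL text and parameters
row_log = []        # row-based: (operation, id, resulting balance)

def run(sql, params=()):
    """Execute on the primary and record the change in both log styles."""
    before = dict(primary.execute("SELECT id, balance FROM accounts"))
    primary.execute(sql, params)
    primary.commit()
    after = dict(primary.execute("SELECT id, balance FROM accounts"))
    statement_log.append((sql, params))
    # Stand-in for the engine's row logging: diff the table before and after.
    for key in before.keys() - after.keys():
        row_log.append(("DELETE", key, None))
    for key, balance in after.items():
        if key not in before:
            row_log.append(("INSERT", key, balance))
        elif before[key] != balance:
            row_log.append(("UPDATE", key, balance))

run("INSERT INTO accounts VALUES (?, ?)", (1, 100))
run("INSERT INTO accounts VALUES (?, ?)", (2, 50))
run("UPDATE accounts SET balance = balance + 10")  # one statement, two rows
run("DELETE FROM accounts WHERE id = ?", (2,))

# Statement-based replica: re-run the same statements in the same order.
statement_replica = make_db()
for sql, params in statement_log:
    statement_replica.execute(sql, params)

# Row-based replica: apply each logged row change in order.
row_replica = make_db()
for op, key, balance in row_log:
    if op == "DELETE":
        row_replica.execute("DELETE FROM accounts WHERE id = ?", (key,))
    else:
        row_replica.execute(
            "INSERT OR REPLACE INTO accounts VALUES (?, ?)", (key, balance)
        )

def snapshot(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

Both replicas end in the same state as the primary, but the logs differ in size: the single UPDATE statement is one entry in the statement log and two entries in the row log, one per affected row.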
When to use Log-Based Replication?
Log-Based Replication could be a suitable fit if you have the following requirements:
➔ Your Database supports Log-based Incremental Replication. Databases that do
include Amazon DynamoDB, Microsoft SQL Server, MongoDB, MySQL, Oracle, and
PostgreSQL.
➔ The object being replicated is a table, not a view.
➔ The table's structure is infrequently altered.
➔ Only supported event types like DELETE, INSERT, and UPDATE are used to make
changes in the records.
Limitations
Log-based Replication brings many benefits and helps you ensure accurate and
seamless data availability. However, it poses some use-case specific limitations
that you must keep in mind before choosing it as your Replication Method.
Below are the limitations associated with this Replication Method:
➔ Limited Database Support: Only specific Databases like Amazon DynamoDB,
Microsoft SQL Server, MongoDB, MySQL, Oracle, and PostgreSQL support
Log-Based Replication.
➔ Limited Database Event Types: Log-Based Replication supports only the DELETE,
INSERT and UPDATE event types. This means that other event types will not be
written to the Log.
➔ Manual Intervention for Structural Changes: You'll need to reset the table from
the Table Settings page each time the structure of a source table changes. This
will trigger a Full Re-Replication of the table, ensuring that any structural
changes are properly recorded.
3) CDC (Change Data Capture)
Change Data Capture (CDC) is a method for capturing Database changes and
ensuring that those changes are replicated to the destination. CDC reduces the
amount of resources required for the ETL (Extract, Transform, and Load) process by
leveraging the Binary Log (binlog) of a Source Database or by using trigger functions to
ingest only the data that has changed since the previous ETL operation.
Furthermore, CDC Replication can detect Metadata changes such as Schema
Migrations (column name changes, table attribute changes, etc.) and accurately
update the Target Database to reflect the schema changes. There are 3 common
methods for CDC:
➔ Log-Based CDC
➔ Trigger-Based CDC
➔ Custom CDC Script
A) Log-Based CDC
In Log-Based CDC, a Transaction Log keeps track of every change users make to the
data. A Log-Based CDC system analyses changes in Logs and sends them in real-time
to the target Data Warehouse/Destination. Some of the advantages and
disadvantages of Log-Based CDC include:
Advantages
➔ Data Integrity: Transactions are either fully executed or not at all due to the
atomicity property. Moreover, as all changes are recorded in Transaction Logs,
replicating them ensures that no changes will be missed.
➔ Minimum Latency: Transactions are recorded in the Transaction Logs either
immediately or before they are committed. CDC can replicate in real-time or
near-real-time by continuously monitoring and replicating the records.
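The Data Integrity point above can be illustrated with a small, self-contained Python sketch. The `LogEntry` record and the list-based log are assumptions made for this example only; real Transaction Logs (such as a binlog) have their own formats and are read through database-specific tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    txn: int          # transaction id
    op: str           # "INSERT", "UPDATE", "DELETE", or "COMMIT"
    key: int = 0
    value: str = ""

def replicate_committed(log):
    """Replay a transaction log in order, applying only committed transactions."""
    target = {}    # the replica, keyed by primary key
    pending = {}   # txn id -> changes seen so far but not yet committed
    for entry in log:
        if entry.op == "COMMIT":
            # The whole transaction is applied at once, or never.
            for change in pending.pop(entry.txn, []):
                if change.op == "DELETE":
                    target.pop(change.key, None)
                else:
                    target[change.key] = change.value
        else:
            pending.setdefault(entry.txn, []).append(entry)
    return target

log = [
    LogEntry(1, "INSERT", 10, "alice"),
    LogEntry(2, "INSERT", 20, "bob"),       # transaction 2 never commits
    LogEntry(1, "UPDATE", 10, "alice-v2"),
    LogEntry(1, "COMMIT"),
    LogEntry(3, "INSERT", 30, "carol"),
    LogEntry(3, "DELETE", 30),
    LogEntry(3, "INSERT", 31, "dave"),
    LogEntry(3, "COMMIT"),
]
target = replicate_committed(log)
```

Because changes are buffered until their COMMIT entry is seen, the uncommitted insert from transaction 2 never reaches the replica, while every change in the committed transactions is applied in log order.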
Disadvantages
B) Trigger-Based CDC
Triggers are functions that are used to capture changes in response to events. For
example, you may set up a table trigger to replicate data to a destination table when
"AFTER INSERT" or "AFTER UPDATE" commands are executed on the source table. Some
of the advantages and disadvantages of Trigger-Based CDC include:
Advantages
➔ Minimal Latency: Trigger-based CDC generates replicas almost at the same
time as the original data is created. The reason for this is that the same
transaction is committed to both the Original and Replica Databases.
Disadvantages
➔ Overhead Issue: Each table requires its own set of triggers. Thus, if your Source
Database is large, the implementation and management overhead might be
costly.
➔ Reliability Issue: Some operations never activate a trigger. For example, on SQL
Server, you can't set up a trigger for TRUNCATE operations.
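As a rough illustration of the "AFTER INSERT" and "AFTER UPDATE" triggers described above, the following sketch uses SQLite through Python's built-in sqlite3 module. The `orders` and `orders_replica` tables and the trigger names are hypothetical, and for simplicity the replica table lives in the same Database as the source table; trigger syntax varies between Database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orders_replica (id INTEGER PRIMARY KEY, status TEXT);

-- Copy each new row to the replica as part of the same transaction.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO orders_replica (id, status) VALUES (NEW.id, NEW.status);
END;

-- Mirror each update onto the matching replica row.
CREATE TRIGGER orders_after_update AFTER UPDATE ON orders
BEGIN
    UPDATE orders_replica SET status = NEW.status WHERE id = NEW.id;
END;
""")

with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'new')")
    conn.execute("INSERT INTO orders VALUES (2, 'new')")
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

replica = conn.execute(
    "SELECT id, status FROM orders_replica ORDER BY id"
).fetchall()

# No AFTER DELETE trigger was defined, so this change is not captured.
with conn:
    conn.execute("DELETE FROM orders WHERE id = 2")

replica_after_delete = conn.execute(
    "SELECT id, status FROM orders_replica ORDER BY id"
).fetchall()
```

The replica follows inserts and updates immediately, but the delete goes unnoticed because no trigger covers it. This mirrors the overhead and reliability concerns listed above: every table and every event type you care about needs its own trigger, and any operation without one is silently missed.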
C) Custom CDC Script
A developer can build a CDC solution that depends on a key field or fields to indicate
that data in a row has changed. Stored Procedures are used in custom CDC scripts to
initiate the queries that recognize specified changes. Some of the advantages and
disadvantages of the Custom CDC script include:
Advantages
➔ High Levels of Specificity: A developer can specify several change-tracking
columns, for example, one for the timestamp of the last update and another to
indicate whether the change contains a specific feature. Many more can be added
in the same way, and this kind of customization is straightforward in Custom
CDC Replication.
Disadvantages
➔ Reliability Issue: If the script is not manually updated, schema changes will not
be automatically reflected in the Replication, which might cause the process to fail.
➔ Performance Issue: Stored Procedures are frequently designed as Transactions.
Executing them places additional load on the Database, thus reducing
Database performance.
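A custom CDC script built around a change-tracking column might look roughly like the sketch below. It is an illustration only: the `products` table, the integer `updated_at` column (a logical timestamp), and the `sync_changes` function are hypothetical, and a plain Python function stands in for the Stored Procedure mentioned above because SQLite has none.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER, updated_at INTEGER)"
)
target = {}  # the replica: product id -> price

def sync_changes(last_seen):
    """Copy rows changed after `last_seen`; return the new high-water mark and row count."""
    rows = source.execute(
        "SELECT id, price, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, price, updated_at in rows:
        target[row_id] = price
        last_seen = max(last_seen, updated_at)
    return last_seen, len(rows)

source.executemany(
    "INSERT INTO products VALUES (?, ?, ?)", [(1, 100, 1), (2, 250, 2)]
)
mark, copied_first = sync_changes(0)       # both rows are new

source.execute("UPDATE products SET price = 120, updated_at = 3 WHERE id = 1")
mark, copied_second = sync_changes(mark)   # only the changed row is read
```

The script is easy to tailor, since any extra tracking column can be added to the WHERE clause. It also shows the reliability concern in miniature: the query names its columns explicitly, so a schema change at the source requires a manual edit, and rows that are permanently deleted never appear in the result at all.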
Need more information?
For any further queries on Database Replication, you can reach out to us anytime!
Further, if you want to take Hevo for a spin or gain more insights about Hevo and its
offerings, you can check out our website. You will also be able to leverage our
Intercom-powered live chat service backed by our exceptional 24/7 support team that
will help clarify any queries and doubts that cross your mind.