Distributed architecture with a Multi-Master approach
www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 2 of 41
After the first tests we decided to throw away the old Master-Slave architecture because it went against the OrientDB philosophy: it doesn't scale.
...and Multi-Master*
* http://en.wikipedia.org/wiki/Multi-master_replication
Say wow!
...but conflicts
Fortunately, we found some smart ways to resolve conflicts without ending up in a Blood Bath
The actors
Leader Node: only 1 per cluster. Checks the other nodes and notifies changes to the Peer Nodes. Can be any server node in the cluster, usually the first to start.
Peer Node: any server node in the cluster. Has a permanent connection to the Leader Node.
Client: connected to a Server Node, no matter whether it is the Leader or a Peer.
How is the cluster of nodes composed and managed?
Cluster auto-discovering
At startup each Server Node sends an IP multicast message to discover whether a Leader Node is available. If one is, the Leader Node connects to the new server, which becomes a Peer Node; otherwise the new server becomes the Leader Node.
[Figure: Server #1 (Leader) and Server #2 (Peer), each hosting several databases (DB)]
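The join-or-lead decision can be sketched in Java, the language OrientDB itself is written in. This is a minimal in-memory model under stated assumptions: `DiscoveryChannel`, `ServerNode` and `Role` are hypothetical names, and the channel replaces the real IP multicast exchange (datagrams, timeouts) with a simple list lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Role a server plays after startup.
enum Role { LEADER, PEER }

// Hypothetical in-memory stand-in for the IP multicast channel. A real node
// would send a datagram and wait a bounded time for a Leader to answer.
class DiscoveryChannel {
    private final List<ServerNode> nodes = new ArrayList<>();

    // Answers the discovery message: the current Leader, if any.
    Optional<ServerNode> findLeader() {
        return nodes.stream().filter(n -> n.role() == Role.LEADER).findFirst();
    }

    void register(ServerNode node) {
        nodes.add(node);
    }
}

// Hypothetical server node; only the startup decision is modeled.
class ServerNode {
    private final String name;
    private final List<ServerNode> peers = new ArrayList<>(); // used only by the Leader
    private Role role;
    private ServerNode leader;

    ServerNode(String name) {
        this.name = name;
    }

    // Startup: look for a Leader; join it as a Peer, otherwise become the Leader.
    void start(DiscoveryChannel channel) {
        Optional<ServerNode> existing = channel.findLeader();
        if (existing.isPresent()) {
            role = Role.PEER;
            leader = existing.get();
            leader.peers.add(this); // the Leader connects to the new node
        } else {
            role = Role.LEADER;
            leader = this;
        }
        channel.register(this);
    }

    Role role() { return role; }
    ServerNode leader() { return leader; }
    List<ServerNode> peers() { return peers; }
    String name() { return name; }
}
```

Starting `Server #1` and then `Server #2` on the same channel makes the first one the Leader and the second one a Peer connected to it.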
One Leader, multiple Peers
The first node to start is always the Leader, but in case of failure any other node can be elected. The Leader Node polls all the servers to verify their status and alerts all the Peer Nodes at every change in the cluster composition.
[Figure: Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer), each hosting several databases (DB)]
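A sketch of the Leader's polling loop, assuming a hypothetical `LeaderMonitor` that receives one status map per polling round (how servers are actually probed is left out). Peers are notified only when the set of live servers changes, matching "alerts all the Peer Nodes at every change".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical Leader-side membership tracker. Each poll receives the observed
// status of every server (true = responded); when the set of live servers
// changes, every registered Peer is notified with the new composition.
class LeaderMonitor {
    private Set<String> liveServers = new LinkedHashSet<>();
    private final List<Consumer<Set<String>>> peers = new ArrayList<>();

    void addPeerListener(Consumer<Set<String>> peer) {
        peers.add(peer);
    }

    // One polling round; returns true if the cluster composition changed.
    boolean poll(Map<String, Boolean> statusByServer) {
        Set<String> nowLive = new LinkedHashSet<>();
        statusByServer.forEach((server, alive) -> {
            if (alive) nowLive.add(server);
        });
        if (nowLive.equals(liveServers)) {
            return false; // nothing to broadcast
        }
        liveServers = nowLive;
        Set<String> snapshot = Collections.unmodifiableSet(new LinkedHashSet<>(nowLive));
        peers.forEach(p -> p.accept(snapshot)); // alert every Peer
        return true;
    }
}
```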
Asymmetric clustering
Each database can be clustered across multiple server nodes, and databases can be moved across servers. The replication strategy has per-database/per-server granularity.
This means Server #2 could replicate database B asynchronously to Server #3 and database A synchronously to Server #1.
[Figure: databases A, B and C distributed across Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
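The per-database/per-server granularity can be pictured as a table kept by each server. `ReplicationPlan` and `ReplicationMode` are hypothetical names for illustration, not OrientDB's configuration format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical replication modes.
enum ReplicationMode { SYNC, ASYNC }

// Hypothetical per-database/per-server replication table, as held by one
// server: for each database, which target servers receive its changes and how.
class ReplicationPlan {
    private final Map<String, Map<String, ReplicationMode>> plan = new HashMap<>();

    ReplicationPlan replicate(String database, String targetServer, ReplicationMode mode) {
        plan.computeIfAbsent(database, db -> new HashMap<>()).put(targetServer, mode);
        return this;
    }

    // Empty if this database is not replicated to that server.
    Optional<ReplicationMode> modeFor(String database, String targetServer) {
        return Optional.ofNullable(plan.getOrDefault(database, Map.of()).get(targetServer));
    }
}
```

For the example above, Server #2 would hold `new ReplicationPlan().replicate("B", "Server #3", ReplicationMode.ASYNC).replicate("A", "Server #1", ReplicationMode.SYNC)`.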
Distributed configuration
The cluster configuration is broadcast from the Leader Node to all the Peer Nodes.
The Peer Nodes broadcast it to all their connected clients.
Everybody knows who has the database
[Figure: Clients #1, #2 and #3 connected to Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
Security
To join a cluster, a Server Node must be configured with the cluster name and password
Broadcast messages are encrypted using the password
The password never crosses the network: it's stored in the configuration file
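One way to encrypt broadcasts with a shared password, sketched with the standard `javax.crypto` API. The slide does not say which cipher OrientDB uses: AES-GCM with a PBKDF2-derived key, and using the cluster name as the salt, are assumptions chosen for illustration. Only the derived key touches the messages, so the password itself never goes on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch: encrypt discovery broadcasts with a key derived from the cluster
// password. AES-GCM and PBKDF2 are illustrative choices (assumptions), not
// necessarily what OrientDB uses.
class BroadcastCipher {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    BroadcastCipher(String clusterName, char[] password) {
        try {
            // The cluster name doubles as the salt so every node derives the same key.
            PBEKeySpec spec = new PBEKeySpec(password,
                    clusterName.getBytes(StandardCharsets.UTF_8), 65_536, 128);
            byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            key = new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Packet layout: 12-byte random IV followed by ciphertext + GCM tag.
    byte[] encrypt(String message) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Empty if the packet was not produced with the same name + password.
    Optional<String> decrypt(byte[] packet) {
        if (packet.length < IV_BYTES) return Optional.empty();
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, packet, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(packet, IV_BYTES, packet.length - IV_BYTES);
            return Optional.of(new String(pt, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) { // includes a failed GCM tag check
            return Optional.empty();
        }
    }
}
```

Because the key depends on both the name and the password, a node configured for another cluster simply cannot read the packets, which is also one way to keep separate clusters invisible to each other.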
Leader election
Each Peer Node continuously checks its connection with the Leader Node
If the connection is lost, the Peer tries to elect itself as the new Leader Node
A split network (two Leaders) is resolved using a simple algorithm
[Figure: split network with two Leaders, Server #1 at 192.168.0.10:2424 and Server #2 at 192.168.10.27:2424]
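The slide doesn't describe the split-network algorithm, so the following is only a hypothetical example of a "simple algorithm": when two Leaders meet, each compares the two addresses and the lower one (IPv4 octets, then port) keeps the role. Since both sides apply the same rule, they agree without extra coordination. `SplitResolver` is an invented name.

```java
import java.util.Arrays;

// Hypothetical split-network tie-break, NOT necessarily OrientDB's rule:
// the Leader with the lower IPv4 address (then port) stays Leader.
class SplitResolver {
    static final class Endpoint {
        final int[] octets;
        final int port;

        Endpoint(String hostAndPort) {
            String[] hp = hostAndPort.split(":");
            // Compare numerically, octet by octet; string order would be wrong in general.
            octets = Arrays.stream(hp[0].split("\\.")).mapToInt(Integer::parseInt).toArray();
            port = Integer.parseInt(hp[1]);
        }
    }

    // Returns true if 'self' stays Leader after meeting Leader 'other'.
    static boolean staysLeader(String self, String other) {
        Endpoint a = new Endpoint(self), b = new Endpoint(other);
        int cmp = Arrays.compare(a.octets, b.octets);
        if (cmp == 0) cmp = Integer.compare(a.port, b.port);
        return cmp < 0;
    }
}
```

With the addresses in the figure, 192.168.0.10:2424 would stay Leader and 192.168.10.27:2424 would step back to Peer.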
Multiple clusters
Multiple separate clusters can coexist in the same network
Clusters can't see each other: they are separate boxes
A cluster is identified by its name + password
Fail-over
Clients know about the other nodes, so they transparently switch to a working server. No error is sent to the client application.
Running transactions will be repeated transparently too (v1.2)
[Figure: Server #1 with DB-1 and Server #2 with DB-2]
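Client-side fail-over can be sketched as a retry over the list of known servers. `FailoverClient` is a hypothetical class; it treats any `RuntimeException` as "server unavailable", which a real driver would narrow to connection errors. Repeating running transactions (v1.2) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side fail-over: the client knows every server in the
// cluster (from the broadcast configuration) and retries the same request on
// the next server when one fails, so the application sees no error.
class FailoverClient<S> {
    private final List<S> servers;
    private int current = 0; // index of the server currently in use

    FailoverClient(List<S> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers configured");
        this.servers = new ArrayList<>(servers);
    }

    <R> R execute(Function<S, R> request) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            try {
                return request.apply(servers.get(current));
            } catch (RuntimeException e) { // assumed to mean "server unavailable"
                last = e;
                current = (current + 1) % servers.size(); // switch to the next server
            }
        }
        throw last; // every known server failed
    }

    S currentServer() {
        return servers.get(current);
    }
}
```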
How does replication work?
Synchronous Replication
Guarantees that the two databases are always consistent
More expensive than asynchronous replication, because the first server waits for the second server's answer before sending the ACK back to the client. Once it receives the ACK, the client can be sure the data is stored on multiple nodes.
Synchronous Replication steps
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
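The synchronous steps can be modeled in memory, assuming hypothetical `InMemoryDb` and `SyncServer` classes where a direct method call stands in for the network round trip to Server #2. The point is the ordering: the ACK (step 6) is returned only after the replica has confirmed (step 5).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one database copy (DB-1, DB-2).
class InMemoryDb {
    private final Map<String, String> records = new HashMap<>();

    void update(String rid, String value) { records.put(rid, value); }
    String read(String rid) { return records.get(rid); }
}

// Hypothetical first server: acknowledges the client only after the replica
// has confirmed its own write.
class SyncServer {
    private final InMemoryDb local;
    private final InMemoryDb replica; // stands in for Server #2 and DB-2

    SyncServer(InMemoryDb local, InMemoryDb replica) {
        this.local = local;
        this.replica = replica;
    }

    String update(String rid, String value) {
        local.update(rid, value);                  // 2) update DB-1
        boolean replicaOk = propagate(rid, value); // 3)-5) propagate and wait
        if (!replicaOk) throw new IllegalStateException("replica did not confirm " + rid);
        return "OK";                               // 6) ACK to the client
    }

    // In a real cluster this would be a blocking network call to Server #2.
    private boolean propagate(String rid, String value) {
        replica.update(rid, value);                // 4) update DB-2
        return true;                               // 5) OK back to Server #1
    }
}
```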
Asynchronous Replication
Changes are propagated without waiting for the answer
The two databases may be inconsistent for a few milliseconds
For this reason it's called “Eventually Consistent”
It's much less expensive than synchronous replication.
Asynchronous Replication steps
(4a and 4b are executed in parallel)
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4a) Server #1 sends back OK to Client #1
4b) Server #2 updates the record in DB-2
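The asynchronous variant differs only in when the client gets its answer. In this sketch (a hypothetical `AsyncServer`, with one `ConcurrentHashMap` per database) the propagation is handed to an `ExecutorService` via `CompletableFuture.runAsync`, so `update` returns at once (4a) while DB-2 is written in the background (4b).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical first server with asynchronous replication to one replica.
class AsyncServer implements AutoCloseable {
    private final Map<String, String> db1 = new ConcurrentHashMap<>();
    private final Map<String, String> db2 = new ConcurrentHashMap<>(); // stands in for Server #2
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    // Writes DB-1 (2), schedules propagation (3, 4b) and returns without waiting (4a).
    // The future completes once DB-2 has caught up; the client normally ignores it.
    CompletableFuture<Void> update(String rid, String value) {
        db1.put(rid, value);
        return CompletableFuture.runAsync(() -> db2.put(rid, value), replicator);
    }

    String readLocal(String rid) { return db1.get(rid); }
    String readReplica(String rid) { return db2.get(rid); }

    @Override
    public void close() {
        replicator.shutdown();
    }
}
```

Waiting on the returned future with `join()` is only useful in tests that need to observe the eventually consistent state.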
Error Management
During replication the second server could get an error, due either to a conflict (the record was modified at the same moment by another client) or to an I/O problem. In this case the error is logged to disk so it can be fixed later.
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #1 sends back OK to Client #1
5) Server #2 tries to update the record in DB-2
6) The error is logged
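A sketch of "log the error to disk to fix it later", assuming a hypothetical tab-separated log file and a `ReplicationErrorLog` helper built on `java.nio.file.Files`. The client already has its OK (step 4), so a failed replica update is recorded instead of being reported back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper: apply a replicated update; if it fails (conflict or
// I/O error), append one line to an on-disk log instead of failing the client.
class ReplicationErrorLog {
    private final Path file;

    ReplicationErrorLog(Path file) {
        this.file = file;
    }

    // Returns true if the update was applied, false if it was logged for later.
    boolean applyOrLog(String rid, String value, BiConsumer<String, String> applyToReplica) {
        try {
            applyToReplica.accept(rid, value);
            return true;
        } catch (RuntimeException e) {
            String line = Instant.now() + "\t" + rid + "\t" + value + "\t"
                    + e.getMessage() + System.lineSeparator();
            try {
                Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException io) {
                throw new UncheckedIOException(io);
            }
            return false;
        }
    }

    // Logged entries still waiting to be fixed.
    List<String> pending() {
        try {
            return Files.exists(file) ? Files.readAllLines(file, StandardCharsets.UTF_8) : List.of();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```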
Conflict Management
During replication, conflicts can happen if two clients update the same record at the same time
The conflict resolution strategy is pluggable: provide an implementation of the OConflictResolver interface
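To show what a pluggable strategy looks like, here is a hypothetical resolver interface with one example strategy. The interface shape is invented for illustration and is not the signature of OrientDB's `OConflictResolver`; "highest version wins, ties broken by server name" is likewise just one possible policy, not necessarily OrientDB's default.

```java
// Hypothetical view of one version of a record, as seen by a resolver.
final class RecordVersion {
    final String rid;
    final int version;
    final String content;
    final String origin; // server the change came from

    RecordVersion(String rid, int version, String content, String origin) {
        this.rid = rid;
        this.version = version;
        this.content = content;
        this.origin = origin;
    }
}

// Hypothetical plug-in point (NOT OrientDB's OConflictResolver signature).
interface ConflictResolver {
    RecordVersion resolve(RecordVersion local, RecordVersion remote);
}

// Example strategy: the higher version wins; on a tie, the lexicographically
// smaller origin server wins, so every node picks the same winner.
class HighestVersionWins implements ConflictResolver {
    @Override
    public RecordVersion resolve(RecordVersion local, RecordVersion remote) {
        if (!local.rid.equals(remote.rid)) throw new IllegalArgumentException("different records");
        if (local.version != remote.version) {
            return local.version > remote.version ? local : remote;
        }
        return local.origin.compareTo(remote.origin) <= 0 ? local : remote;
    }
}
```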
Conflict Management
Default strategy
Manual control of conflicts
like SVN/Git tools
Display the diff of 2 databases
> compare database db1 db2
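Conceptually, a database diff sorts record ids into three groups. This illustration uses in-memory maps from record id to content and a hypothetical `DatabaseDiff` class; it says nothing about how the console's `compare database` command is implemented.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: what a diff between two databases reports.
class DatabaseDiff {
    final Set<String> onlyInFirst = new TreeSet<>();
    final Set<String> onlyInSecond = new TreeSet<>();
    final Set<String> different = new TreeSet<>();

    static DatabaseDiff compare(Map<String, String> db1, Map<String, String> db2) {
        DatabaseDiff d = new DatabaseDiff();
        db1.forEach((rid, content) -> {
            if (!db2.containsKey(rid)) d.onlyInFirst.add(rid);                         // missing in db2
            else if (!Objects.equals(content, db2.get(rid))) d.different.add(rid);     // content differs
        });
        db2.keySet().stream().filter(rid -> !db1.containsKey(rid)).forEach(d.onlyInSecond::add);
        return d;
    }

    boolean identical() {
        return onlyInFirst.isEmpty() && onlyInSecond.isEmpty() && different.isEmpty();
    }
}
```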
How nodes are re-aligned
[Figure: a Client updates a record on a cluster of Server #1 and Server #2]
On restart, the node asks the Leader which servers it has to synchronize with
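The Leader's answer to a restarting node can be sketched as a lookup in a registry of which servers host which databases. `ClusterRegistry` and `syncSourcesFor` are hypothetical names; the actual re-alignment protocol (what is transferred, and how) is not shown on the slides.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Leader-side registry used during re-alignment.
class ClusterRegistry {
    private final Map<String, Set<String>> serversByDatabase = new LinkedHashMap<>();

    void host(String database, String server) {
        serversByDatabase.computeIfAbsent(database, db -> new LinkedHashSet<>()).add(server);
    }

    // For every database the restarting server hosts, the other servers holding a copy.
    Map<String, Set<String>> syncSourcesFor(String restartingServer) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        serversByDatabase.forEach((db, servers) -> {
            if (servers.contains(restartingServer)) {
                Set<String> others = new LinkedHashSet<>(servers);
                others.remove(restartingServer);
                if (!others.isEmpty()) result.put(db, others);
            }
        });
        return result;
    }
}
```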
To be consistent or not to be, that is the question
Always consistent
use it as a Master-Slave
Server #1 (Master): read/write. All changes go to this server, avoiding conflicts.
Replica: read only, consistent. Leave it as a replica. Since it's always aligned, it's the best candidate as the new Master if Server #1 is unavailable.
Read-only scaling
using many asynchronous replicas
Server #1 (Master): read + write. All changes go to this server, avoiding conflicts.
Server #2 (Synch Slave): read only.
Server #3 to Server #N (Asynch Slaves): read only, eventually consistent. Replication cost close to zero.
Clients send writes to the Master and can read from any server.
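A client-side router for this layout could look like the following hypothetical `ReadWriteRouter`: writes always go to the Master, reads rotate over the read-only slaves. Reads served by asynchronous slaves may return slightly stale data, which is the accepted price of eventual consistency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: one Master for writes, round-robin reads over slaves.
class ReadWriteRouter {
    private final String master;
    private final List<String> readReplicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> readReplicas) {
        this.master = master;
        this.readReplicas = new ArrayList<>(readReplicas);
    }

    String serverForWrite() {
        return master;
    }

    String serverForRead() {
        if (readReplicas.isEmpty()) return master; // fall back to the Master
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), readReplicas.size());
        return readReplicas.get(i);
    }
}
```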
Read/Write scaling
Multi-Master + conflict handling
Server #1, Server #2 and Server #3 are all Masters (read + write); clients can connect to any of them.
Read/Write scaling + sharding
Multi-Master, no conflicts! :-)
Server USA: Master of customers_usa (clients read + write)
Server CHI: Master of customers_china (clients read + write)
Writes on customers_china go to Server CHI, the only Master of that cluster.
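Sharding avoids conflicts because each record has exactly one Master. A hypothetical `ShardRouter` makes that explicit; the country-to-cluster rule (`"US"`, `"CN"`) is an assumption for the example, while the cluster and server names come from the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write router for the sharded layout: each cluster has exactly
// one Master server, so two servers never accept writes for the same record.
class ShardRouter {
    private final Map<String, String> masterByCluster;

    ShardRouter(Map<String, String> masterByCluster) {
        this.masterByCluster = new HashMap<>(masterByCluster);
    }

    // Hypothetical sharding rule: pick the cluster from the customer's country.
    static String clusterFor(String country) {
        switch (country) {
            case "US": return "customers_usa";
            case "CN": return "customers_china";
            default: throw new IllegalArgumentException("no shard for " + country);
        }
    }

    String masterForWrite(String country) {
        return masterByCluster.get(clusterFor(country));
    }
}
```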
Multi-Master + Sharding = big scale, high availability and no conflicts
NuvolaBase.com (beta)
The first Graph Database on the Cloud
always available
a few seconds to set it up
use it from Web & Mobile apps
Luca Garulli
Author of the OrientDB and Roma <Meta> Framework Open Source projects