
rev 1.

Distributed architecture
with a Multi-Master approach

Available in version 1.0


(planned for December 2011)
www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 of 41
Where is the previous OrientDB
Master/Slave architecture?

After the first tests we decided to
throw away the old Master/Slave
architecture because it went
against the OrientDB philosophy:

it doesn't scale

and

it's hard to configure properly

So what's next?
We've re-designed the entire distributed
architecture to make it work as

Multi-Master* *http://en.wikipedia.org/wiki/Multi-master_replication

to be released in version 1.0


(December 2011)
In the Multi-Master architecture

any node can read/write to the database

this scales horizontally

adding nodes is straightforward

Say wow!

...but

you have to fight


with

conflicts
Fortunately we found some
smart ways to resolve conflicts without
falling into a

Blood Bath

The actors

Leader Node: only 1 Leader per cluster; it checks the other nodes and
notifies changes to the Peer Nodes. Can be any server node in the
cluster, usually the first to start

Peer Node: any server node in the cluster. Has a permanent
connection to the Leader Node

Client: clients are connected to server nodes, no matter if Leader
or Peer

Database: where data are stored

Synchronous replication: the server node propagates
changes waiting for the response from the remote server,
then sends the ACK to the client

Asynchronous replication: the server node propagates
changes and sends the ACK to the client without waiting
for the response from the remote server

How is the cluster
of nodes
composed
and
managed?
Cluster auto-discovery
At start-up each Server Node sends an IP multicast message to discover
whether any Leader Node is available to join the cluster. If one is available, the
node connects to it and becomes a Peer Node; otherwise it becomes
the Leader Node.

[Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases]
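The start-up decision above can be sketched as follows. The multicast transport is injected as two callables so the logic runs without real network traffic; the timeout value is an illustrative assumption, not OrientDB's actual setting.

```python
DISCOVERY_TIMEOUT = 2.0  # seconds to wait for a Leader's answer (illustrative)

def discover_role(send_probe, wait_for_leader):
    """Decide this node's role at start-up: become a Peer if a Leader
    answers the multicast probe, otherwise become the Leader.

    send_probe / wait_for_leader are injected so this sketch can be run
    without real IP multicast traffic."""
    send_probe()
    leader_addr = wait_for_leader(DISCOVERY_TIMEOUT)
    if leader_addr is not None:
        return ("peer", leader_addr)   # connect permanently to the Leader
    return ("leader", None)            # nobody answered: take the Leader role

# First node to start finds no Leader; the second one finds the first.
print(discover_role(lambda: None, lambda t: None))
print(discover_role(lambda: None, lambda t: "192.168.0.10:2424"))
```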

One Leader, multiple Peers
The first node to start is always the Leader, but in case of failure any other
node can be elected. The Leader Node polls all the servers to verify their status
and alerts all the Peer Nodes at every change in the cluster composition.

[Diagram: Server #1 (Leader) with Server #2 and Server #3 (Peers), each hosting several databases]
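The Leader's polling loop can be sketched like this; the ping and notify callables stand in for the real status checks and cluster-change alerts.

```python
def poll_peers(peers, ping, notify):
    """Leader-side check (sketch): verify every peer's status and, when the
    composition changed, alert the surviving peers with the new member list."""
    alive = [p for p in peers if ping(p)]
    if set(alive) != set(peers):
        for p in alive:
            notify(p, alive)   # broadcast the new cluster composition
    return alive
```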

Asymmetric clustering
Each database can be clustered on multiple server nodes. Databases can be moved
across servers. The replication strategy has per-database/per-server granularity.
This means Server #2 could replicate database B asynchronously
to Server #3 and database A synchronously to Server #1.

[Diagram: Server #1 (Leader) hosts databases A and C; Server #2 (Peer) hosts A, B and C; Server #3 (Peer) hosts B]
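The per-database/per-server granularity can be pictured as a lookup table; the table below mirrors the slide's example for Server #2 (names and format are illustrative, not OrientDB's configuration syntax).

```python
# Hypothetical replication table for Server #2, matching the slide's example.
REPLICATION = {
    ("A", "Server #1"): "synch",    # database A replicated synchronously
    ("B", "Server #3"): "asynch",   # database B replicated asynchronously
}

def replication_mode(database, target_server):
    """Return how `database` is replicated toward `target_server`, if at all."""
    return REPLICATION.get((database, target_server))

print(replication_mode("B", "Server #3"))  # asynch
```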

Distributed configuration
The cluster configuration is broadcast from the Leader Node to all the Peer Nodes.
Peer Nodes broadcast it to all the connected clients.
Everybody knows which servers host each database

[Diagram: Server #1 (Leader), Server #2 and Server #3 (Peers), with Clients #1, #2 and #3 connected]

Security
To join a cluster a Server Node has to be configured with the cluster name and password
Broadcast messages are encrypted using the password
The password doesn't cross the network: it's stored in the configuration file

[Diagram: Server #2 (Peer) can join the cluster of Server #1 (Leader) ONLY if it knows the cluster name and password]

Leader election
Each Peer Node continuously checks the connection with the Leader Node
If the connection is lost, it tries to elect itself as the new Leader Node
Network splits are resolved using a simple algorithm

Example: Server #1 (192.168.0.10:2424) and Server #2 (192.168.10.27:2424) both
claim the leadership after a split. Server #1 takes the leadership
because it has the lower ID, where ID = <ip-address>:<port>
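The split-resolution rule is simple enough to sketch in one line; comparing the `<ip-address>:<port>` strings lexicographically is an assumption of this sketch, since the slide only says "lower ID".

```python
def resolve_split(leader_ids):
    """After a network split, the node with the lowest ID keeps the
    leadership. IDs are "<ip-address>:<port>" strings; lexicographic
    comparison is an assumption of this sketch."""
    return min(leader_ids)

print(resolve_split(["192.168.0.10:2424", "192.168.10.27:2424"]))
# 192.168.0.10:2424
```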

Multiple clusters
Multiple separate clusters can coexist in the same network
Clusters can't see each other: they are separate boxes
A cluster is identified by its name + password

[Diagram: Cluster 'A' (password 'aaa') and Cluster 'B' (password 'bbb'), each with its own Leader and two Peers, coexisting on the same network]
Fail-over
Clients know about the other nodes, so they transparently switch
to a working server. No error is sent to the client app.
Running transactions will be transparently repeated too (v1.2)

[Diagram: Clients #1-#4 connected to Server #1 (DB-1) and Server #2 (DB-2)]

How does the replication work?
Synchronous Replication
Guarantees the two databases are always consistent
More expensive than asynchronous because the first server
waits for the second server's answer before sending back
the ACK to the client. After the ACK the client is sure
the data is stored on multiple nodes at the same time

[Diagram: Server #1 (DB-1) replicating synchronously to Server #2 (DB-2)]

Synchronous Replication: steps

1) Client #1 sends an update-record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
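The six steps above can be sketched as a function; `propagate` stands in for the network call to the second server, and the databases are plain dicts for illustration.

```python
def update_sync(record, local_db, propagate):
    """Synchronous replication (sketch of steps 2-6): the OK to the client
    is returned only after the remote server has confirmed the update."""
    local_db[record["id"]] = record      # 2) update the local database
    if not propagate(record):            # 3-5) propagate and wait for OK
        raise RuntimeError("remote update failed")
    return "OK"                          # 6) now answer the client

db1, db2 = {}, {}
ack = update_sync({"id": "10:20", "name": "Rome"}, db1,
                  lambda r: db2.setdefault(r["id"], r) is not None)
print(ack, db1 == db2)  # OK True
```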

Asynchronous Replication
Changes are propagated without waiting for the answer
The two databases could be inconsistent for a few milliseconds
For this reason it's called "Eventually Consistent"
It's much less expensive than synchronous replication

[Diagram: Server #1 (DB-1) replicating asynchronously to Server #2 (DB-2)]

Asynchronous Replication: steps
(4a and 4b are executed in parallel)

1) Client #1 sends an update-record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4a) Server #1 sends back OK to Client #1
4b) Server #2 updates the record in DB-2
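The parallel 4a/4b split can be sketched with a background thread; the `join()` at the end exists only so this sketch can inspect the result, a real server would not wait.

```python
import threading

def update_async(record, local_db, propagate):
    """Asynchronous replication (sketch): the OK goes back to the client
    immediately (4a) while the remote update runs in parallel (4b)."""
    local_db[record["id"]] = record                      # 2) local update
    worker = threading.Thread(target=propagate, args=(record,))
    worker.start()                                       # 3 + 4b) in parallel
    return "OK", worker                                  # 4a) answer now

db1, db2 = {}, {}
ack, worker = update_async({"id": "10:20"}, db1,
                           lambda r: db2.update({r["id"]: r}))
worker.join()   # only for this sketch: wait so we can compare the databases
print(ack, db1 == db2)  # OK True
```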

Error Management
During replication the second server could get an error due to a
conflict (the record was modified at the same moment by another client)
or an I/O problem. In this case the error is logged to disk to be fixed later.

1) Client #1 sends an update-record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #1 sends back OK to Client #1
5) Server #2 updates the record in DB-2
6) on error, Server #2 logs the failed operation to the Synch Log
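A sketch of the remote-side apply with the Synch Log fallback; detecting a conflict through a per-record version number is an assumption of this sketch, not the documented mechanism.

```python
def apply_remote(record, db, synch_log):
    """Remote-side apply (sketch): on a conflict the failed operation is
    appended to the Synch Log to be fixed later. The version-number
    conflict check is an illustrative assumption."""
    current = db.get(record["id"])
    if current is not None and current["version"] >= record["version"]:
        synch_log.append((record, "conflict"))   # 6) log the error
        return False
    db[record["id"]] = record                    # 5) update the record
    return True
```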

Conflict Management
During replication, conflicts can happen if two clients
update the same record at the same time
The conflict resolution strategy can be plugged in by providing
implementations of the OConflictResolver interface

[Diagram: Server #2 applies the pluggable Conflict Strategy before writing to DB-2]
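The real OConflictResolver is a Java interface; the Python class below only illustrates the pluggable-strategy idea, with a hypothetical "last writer wins" policy that is NOT OrientDB's default.

```python
class LastWriterWinsResolver:
    """Hypothetical pluggable strategy: keep whichever copy has the
    newer timestamp. Illustrative only; not the actual interface."""
    def resolve(self, local, remote):
        return remote if remote["ts"] >= local["ts"] else local

resolver = LastWriterWinsResolver()
winner = resolver.resolve({"ts": 1, "v": "a"}, {"ts": 2, "v": "b"})
print(winner["v"])  # b
```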

Conflict Management: default strategy

The default implementation merges the records: in case the same
fields are changed, the oldest document wins and the newest is
written into the Synch Log
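The default merge can be sketched with plain dicts standing in for documents; how the real implementation merges nested fields is not shown in the slide, so this only captures the "oldest wins, newest goes to the Synch Log" rule.

```python
def default_merge(oldest, newest, synch_log):
    """Default strategy (sketch): merge the two records; where the same
    fields were changed the oldest document wins, and the newest copy
    is written into the Synch Log for later review."""
    merged = dict(newest)
    merged.update(oldest)        # oldest wins on overlapping fields
    synch_log.append(newest)     # keep the losing copy in the Synch Log
    return merged

log = []
print(default_merge({"city": "Rome"}, {"city": "Milan", "zip": "20121"}, log))
# {'city': 'Rome', 'zip': '20121'}
```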

Manual control of conflicts
like SVN/Git tools

Display the diff of two databases

> compare database db1 db2

Copy a record across databases

> copy record #10:20@db1 to #10:20@db2

Copy an entire cluster across databases

> copy cluster city@db1 to city@db2

Merge two records across databases

> merge records #10:20@db1 #10:20@db2 to #10:20@db1

How are nodes re-aligned

once up again after a failure,

shutdown or network problem?
During replication all operations
are logged using a

unique op-id with the format <node>#<serial>

[Diagram: a Client updates a record; Server #1 and Server #2 both write the operation to their Operation Logs with op-id 192.168.0.10:2424#123232]
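Generating op-ids in the `<node>#<serial>` format can be sketched as a small factory; starting the serial at 1 is an assumption of this sketch.

```python
import itertools

def opid_factory(node_id):
    """Generate op-ids in the '<node>#<serial>' format from the slide,
    e.g. '192.168.0.10:2424#123232'. Starting serial is an assumption."""
    serial = itertools.count(1)
    return lambda: f"{node_id}#{next(serial)}"

next_opid = opid_factory("192.168.0.10:2424")
print(next_opid())  # 192.168.0.10:2424#1
print(next_opid())  # 192.168.0.10:2424#2
```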

On restart the node asks the Leader
which servers to synchronize with

op-ids are used to work out which operations were missed

[Diagram: Server #1 (Operation Log at op-id 192.168.1.11:2424#9569) and Server #2 (Operation Log at op-id 192.168.0.10:2424#123232) compare op-ids to find the missed operations]
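Computing the missed operations from the op-ids can be sketched as a filter over the operation log; the log is modelled as a plain list of op-id strings for illustration.

```python
def missed_operations(operation_log, last_seen_opid):
    """Re-alignment sketch: replay every logged operation from the same
    node whose serial is newer than the last op-id the returning node saw."""
    node, serial = last_seen_opid.rsplit("#", 1)
    return [op for op in operation_log
            if op.startswith(node + "#")
            and int(op.rsplit("#", 1)[1]) > int(serial)]

log = ["192.168.0.10:2424#123231", "192.168.0.10:2424#123232",
       "192.168.0.10:2424#123233"]
print(missed_operations(log, "192.168.0.10:2424#123232"))
# ['192.168.0.10:2424#123233']
```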

To be
consistent
or not to be,
that is
the question

Always consistent
use it as Master/Slave

Server #1 (Master, read + write): all changes go through this server, avoiding conflicts
Server #2 (Synch Slave, read only): always consistent; since it's always aligned it's the best candidate as new master if Server #1 is unavailable

Replication is one-way only

Perfect for analysis, business intelligence and reports

Read-only scaling
using many asynchronous replicas

Server #1 (Master, read + write): all changes go through this server, avoiding conflicts
Server #2 (Synch Slave, read only)
Servers #3..#N (Asynch Slaves, read only): eventually consistent, with a replication cost close to zero

Read/Write scaling
Multi-Master + handling conflicts

Servers #1, #2 and #3 are all Masters (read + write), each serving its own clients

Read/Write scaling + sharding
Multi-Master, no conflicts! :-)

Server USA (Master, read + write) holds the customers_usa cluster: all writes on customers_usa go here
Server CHI (Master, read + write) holds the customers_china cluster: all writes on customers_china go here
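The no-conflict property comes from routing every write for a cluster to a single master; a hypothetical routing table matching the slide's example:

```python
# Each cluster is written only on its own master, so writes never conflict.
# Names mirror the slide's example; the table itself is illustrative.
SHARDS = {
    "customers_usa": "Server USA",
    "customers_china": "Server CHI",
}

def route_write(cluster_name):
    """Return the only master allowed to accept writes for this cluster."""
    return SHARDS[cluster_name]

print(route_write("customers_china"))  # Server CHI
```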

Multi-Master + Sharding
=
massive scalability with high availability and no conflicts
NuvolaBase.com (beta)

The first
Graph Database
on the Cloud

always available
a few seconds to set it up
use it from Web & Mobile apps

Luca Garulli
Author of the OrientDB and
Roma <Meta> Framework
Open Source projects

Member of JSR#12 (JDO 1.0)
and JSR#243 (JDO 2.0)

www.twitter.com/lgarulli

CEO at NuvolaBase Ltd
@London, UK and @Rome, Italy
