Agenda
Quick Sniff at PXC
What is PXC?
[Feature diagram]
● Multi-master
● Enhanced security protection
● Flexible network topology (geo-distributed)
● Auto-node provisioning
● Performance tuned
Failure Scenarios and their recovery
Scenario: New node fails to connect to cluster
Joiner log
[log screenshot]
● Administrator reviews configuration settings: things like the IP addresses are sane and valid.
● The DONOR log doesn't have any traces of the JOINER trying to join.
● Culprit: SELinux/AppArmor
● Solution-1:
○ Set the mode to PERMISSIVE or DISABLED.
● Solution-2:
○ Configure the policy to allow access in ENFORCING mode (sketched below).
○ Related blogs:
■ "Lock Down: Enforcing SELinux with Percona XtraDB Cluster" — it probes which permissions are needed and adds rules accordingly.
■ "Lock Down: Enforcing AppArmor with Percona XtraDB Cluster"
■ With this approach we can keep SELinux enabled in enforcing mode. (You can also refer to the SELinux configuration on the Codership site.)
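A minimal sketch of both directions on an SELinux box (standard RHEL tooling; the mysqld_t domain is the usual one for Percona Server — verify it on your system):

    # Solution-1: switch SELinux to permissive at runtime (lost on reboot)
    setenforce 0
    # ...or persistently (takes effect on reboot)
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config

    # Solution-2: keep ENFORCING; temporarily mark only the mysqld domain
    # permissive, capture the denials the SST/IST run triggers, and build a
    # local policy module from them (the approach the blog posts describe)
    semanage permissive -a mysqld_t
    # ...run the failing JOIN once, then:
    grep mysqld /var/log/audit/audit.log | audit2allow -M pxc_local
    semodule -i pxc_local.pp
    semanage permissive -d mysqld_t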
Scenario: Catching up cluster (SST, IST)
● SST: a complete copy-over of the data directory.
○ SST has multiple external components: the SST script, XtraBackup, the network layer, etc. Some of these are outside the control of the PXC process.
#1: Joiner log
[log screenshot]
SST failed on the DONOR: wsrep_sst_auth not set on the DONOR.
#2: Donor log
[log screenshot]
Possible causes (see the sketch below):
● The specified wsrep_sst_auth user doesn't exist.
● Credentials are wrong.
● Insufficient privileges.
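A minimal sketch of a working SST auth setup (the user name and password are placeholders; the grant list follows the Percona documentation for the xtrabackup-v2 method; note that wsrep_sst_auth was removed in PXC 8.0):

    # my.cnf on every node (PXC 5.7 style)
    [mysqld]
    wsrep_sst_method = xtrabackup-v2
    wsrep_sst_auth   = "sstuser:s3cretPass"

    -- On the DONOR, create the user with the privileges the SST script needs:
    CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';
    GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT
        ON *.* TO 'sstuser'@'localhost';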
#3: Joiner log
[log screenshot]
Trying to get an old-version JOINER to join from a new-version DONOR. (Not supported.)
#4: Donor log and Joiner log
[log screenshots]
#5
[log screenshots]
Scenario: Cluster doesn't come up on restart
● All your nodes are located in the same data center (DC).
● The DC hits a power failure and all nodes are restarted.
● On restart, the recovery flow is executed to recover the wsrep coordinates.
● A close look at the log shows the original bootstrapping node has safe_to_bootstrap set to 0, so it refuses to come up (recovery sketch below).
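A minimal recovery sketch (paths assume the default datadir /var/lib/mysql; the systemd unit name follows Percona's packaging):

    # On each node, check the saved state; seqno -1 means an unclean shutdown
    cat /var/lib/mysql/grastate.dat
    #   seqno:             123456
    #   safe_to_bootstrap: 0

    # On the most-advanced node only, mark it safe and bootstrap a new cluster
    sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
    systemctl start mysql@bootstrap.service

    # Then start the remaining nodes normally; they catch up via IST/SST
    systemctl start mysql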
● With wsrep_cluster_address=<node-ip> and pc.recovery=true (the default), the nodes can restore the previous primary component automatically on a full-cluster restart.
● But with pc.recovery=false, the cluster has to be bootstrapped manually.
Scenario: Data inconsistency
● 2 kinds of inconsistencies:
○ Physical inconsistency: hardware issues.
○ Logical inconsistency: data issues (an example sketch follows).
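A minimal sketch of how a logical inconsistency typically creeps in: a local write applied with replication switched off (table and values are hypothetical):

    -- On one node only: disable replication for this session (dangerous!)
    SET SESSION wsrep_on = OFF;
    DELETE FROM orders WHERE id = 42;   -- this row still exists on every other node
    SET SESSION wsrep_on = ON;

    -- Later, a replicated UPDATE touching orders.id = 42 fails to apply here;
    -- the node detects the inconsistency and aborts to protect the cluster.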
[Diagram] Inconsistency detected on one node: the node is isolated (SHUTDOWN) while the cluster remains healthy and running.
[Diagram] Inconsistency detected on two nodes: shutdown, state marked as UNSAFE, non-prim shutdown.
[Diagram] CLUSTER RESTORED.
63
Scenario: Another aspect of data inconsistency
64
Scenario: Another aspect of data inconsistency
65
Scenario: Another aspect of data inconsistency
Transaction
upto X - 1
Transaction
upto X
66
Scenario: Another aspect of data inconsistency
Transaction X caused
inconsistency so it
never made it to
these nodes.
Transaction
upto X - 1
Transaction
upto X
67
Scenario: Another aspect of data inconsistency
Transaction
upto X - 1
Transaction
upto X
68
Scenario: Another aspect of data inconsistency
Membership rejected
as new coming node
has one extra
transaction than
cluster state.
Transaction
upto X - 1
Transaction
upto X
69
[Diagram] The 2-node cluster is up and starts processing transactions, moving the cluster state from X to X + 3.
[Diagram] Now the node gets membership, and it even joins through IST. How? The node has transactions up to X and the cluster says it has transactions up to X + 3. A joining node doesn't evaluate the data; it all depends on the seqno.
[Diagram] All three nodes report trx-seqno = x: transactions with the same seqno but a different update.
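A minimal sketch for comparing positions before letting such a node rejoin (paths and values are illustrative):

    -- On a running cluster node: current committed position
    SHOW STATUS LIKE 'wsrep_last_committed';
    SHOW STATUS LIKE 'wsrep_cluster_state_uuid';

    # On the stopped node: the position it would rejoin with
    cat /var/lib/mysql/grastate.dat

    # If the stopped node's seqno is ahead of the cluster, don't just restart it:
    # the seqno alone cannot tell whether the extra transactions hold the same data.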
Scenario: Delayed purging
[Diagram] Gcache: a staging area that holds replicated transactions.
[Diagram] A transaction is replicated and staged on every node.
[Diagram] Transactions can later be removed from the gcache.
● Each node, at a configured interval, notifies the other nodes of the cluster about its transaction-committed status.
● Accordingly, each node evaluates the cluster-level lowest watermark and initiates a gcache purge.
[Diagram] N1_purged_upto: x+1, N2_purged_upto: x+1, N3_purged_upto: x ⇒ cluster-purge-water-mark = x, and accordingly all nodes will purge their local gcache up to x.
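A minimal sketch for sizing and watching the gcache (gcache.size is a Galera provider option; the 2G value is illustrative):

    # my.cnf: a larger gcache lets a briefly-absent node rejoin via IST instead of SST
    [mysqld]
    wsrep_provider_options = "gcache.size=2G"

    -- At runtime: the oldest seqno still held in this node's gcache.
    -- A joiner whose position is at or above this value can catch up via IST.
    SHOW STATUS LIKE 'wsrep_local_cached_downto';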
[Diagram] STOP: one node stops processing gcache transactions, and transactions start to pile up in its gcache.
● FTWRL, RSU, … any action that causes a node to pause and desync.
● Given that one of the nodes is not making progress, it does not emit its transaction-committed status.
● This freezes the cluster-purge-water-mark, since the lowest transaction remains locked down.
● This means that, although the other nodes are making progress, their galera cache continues to pile up (see the sketch below).
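A minimal sketch of one way this arises and how to spot it (the session and monitoring queries are illustrative):

    -- Session on one node: a long FTWRL pauses that node
    FLUSH TABLES WITH READ LOCK;   -- node desyncs; it stops reporting commit progress
    -- ...long-running backup...
    UNLOCK TABLES;

    -- Meanwhile, on the other nodes, the purge watermark is pinned, so
    -- wsrep_local_cached_downto stops advancing while the gcache grows:
    SHOW STATUS LIKE 'wsrep_local_cached_downto';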
[Diagram] The node resumes gcache transaction processing and a force purge is done.
Scenario: Network latency and related failures
[Log screenshots] Why? What caused this weird behavior?
How group communication monitors nodes (all of these are runtime configurable; a tuning sketch follows):
● Each node monitors the other nodes of the cluster every inactive_check_period (0.5 seconds).
● If a node detects a delay in the response from a given node, it tries to add it to the delayed list; it waits for delayed_margin (1 second) before actually adding it.
● If a node is not reachable from a given node past peer_timeout (3 seconds), the cluster enables relaying of messages.
● If all nodes vote for the said node's inactivity (suspect_timeout, 5 seconds), it is pronounced DEAD.
● Even if the node becomes active again, it takes delayed_keep_period (30 seconds) to remove it from the list.
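A minimal tuning sketch (these map to Galera's evs.* provider options; durations use the ISO-8601 PT format, and the values shown are illustrative for a high-latency link):

    # my.cnf: relax liveness timeouts for geo-distributed nodes
    [mysqld]
    wsrep_provider_options = "evs.suspect_timeout=PT15S;evs.inactive_timeout=PT30S;evs.delayed_keep_period=PT45S"

    -- Several of these can also be changed at runtime:
    SET GLOBAL wsrep_provider_options = 'evs.delayed_margin=PT5S';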
[Diagram] Latency: two nodes are < 1 ms apart; the third node is 7 sec away.
#1: [log screenshots]
#2: [Diagram] Latency: two nodes are < 1 ms apart; the third node is 2 sec away. [Log screenshots]
Flaky nodes are not good for overall transaction processing. (They can cause certification failures.)
Scenario: Blocking Transaction and related failures
● A transaction first executes on the local node. During this execution the transaction doesn't block other, non-dependent transactions.
● The transaction replicates after it has been executed on the local node but before it is committed. Replication involves transporting the write-set (binlog) to the other nodes.
[Diagram] The write-set is applied and committed on N2.
● To maintain data consistency across the cluster, the protocol needs transactions to commit in the same order on all the nodes.
● The bigger the transaction, the bigger the backlog of small transactions behind it; this would eventually cause FLOW_CONTROL (see the monitoring sketch below).
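A minimal monitoring sketch (these are standard wsrep status counters; the interpretation notes are illustrative):

    -- Fraction of time the cluster was paused by flow control since FLUSH STATUS
    SHOW STATUS LIKE 'wsrep_flow_control_paused';
    -- Flow-control messages this node sent (i.e., this node was the bottleneck)
    SHOW STATUS LIKE 'wsrep_flow_control_sent';
    -- The receive/apply queue whose growth triggers flow control (see gcs.fc_limit)
    SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';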
● For bulk data loads, use LOAD DATA INFILE, which causes an intermediate commit every 10K rows. Note: a random failure can then cause partial data to get committed (sketch below).
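A minimal sketch (wsrep_load_data_splitting is the variable behind the 10K-row intermediate commits; the table and file are hypothetical):

    -- Split the bulk load into implicit 10,000-row transactions
    SET GLOBAL wsrep_load_data_splitting = ON;
    LOAD DATA INFILE '/tmp/orders.csv' INTO TABLE orders
      FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
    -- If the load dies midway, rows from already-committed chunks remain;
    -- plan for cleanup or an idempotent retry.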
One last important note
● The majority of errors are due to misconfiguration or differences in configuration between nodes.
● PXC recommends the same configuration on all nodes of the cluster.
PXC Genie: You Wish, We Implement
● We would like to hear what you want next in PXC.
● Is there a specific module where you expect improvement?
● How can Percona help you with PXC or HA?
● Log issues (mark them as a new improvement): https://jira.percona.com/projects/PXC/issue
● The PXC forum is another way to reach us.
Questions and Answers
Thank You, Sponsors!!