MySQL NDB Cluster: Never Install a Management Node on the Same Host as a Data Node (Doc ID 2439490.1)

In this Document
  Purpose
  Scope
  Details
    Node Failure Handling
    Examples
    Split-Brain Scenario
    Host Failure
    Arbitrator on Independent Host
    Conclusion
  References

APPLIES TO:

MySQL Cluster - Version 7.1 and later
Information in this document applies to any platform.

PURPOSE
Avoid the downtime that can be caused by installing the management nodes on the same hosts as the data nodes.

SCOPE

For the DBA familiar with MySQL NDB Cluster who is seeking guidance on where to install the management node.

DETAILS

In MySQL NDB Cluster, the management node (ndb_mgmd) is a lightweight process that, among other things, handles the configuration of the cluster. Since it is lightweight, it can be tempting to install it together with one of the other nodes. However, if you want a high-availability setup, you should never install it on the same host as a data node (ndbd or ndbmtd). Doing so can cause a total cluster outage in situations where the cluster could otherwise have survived.

The first sign of trouble occurs when you start the management nodes. The following warning is printed to standard output:

2018-08-22 18:04:14 [MgmtSrvr] WARNING  -- at line 46: Cluster configuration warning:
  arbitrator with id 49 and db node with id 1 on same host 192.0.2.1
  arbitrator with id 50 and db node with id 2 on same host 192.0.2.2
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.
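As an illustration, a config.ini along the following lines produces the warning above. This is only a minimal sketch: the node IDs and host addresses are taken from the warning, while the remaining values (such as NoOfReplicas) are assumed for the example.

  # Both hosts run one management node and one data node: the layout this note warns against.
  [ndbd default]
  NoOfReplicas = 2

  [ndb_mgmd]
  NodeId = 49
  HostName = 192.0.2.1

  [ndb_mgmd]
  NodeId = 50
  HostName = 192.0.2.2

  [ndbd]
  NodeId = 1
  HostName = 192.0.2.1

  [ndbd]
  NodeId = 2
  HostName = 192.0.2.2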
To understand why this setup can cause a complete cluster shutdown, it is necessary first to review how a node failure is
handled in MySQL NDB Cluster.

Node Failure Handling

When a data node fails, the cluster automatically evaluates whether it is safe to continue. In this respect, a node failure can be either a data node crashing or a network outage that means one or more nodes cannot be seen by the other nodes.

A clean node shutdown (such as when using the recommended STOP command in the management client) is not subject to the
evaluation as the other nodes will be notified of the shutdown.
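For example, a planned shutdown of data node 2 can be requested from the management client; because the other nodes are informed about it, no arbitration takes place (the node ID is simply the one used in the examples below):

  shell> ndb_mgm -e "2 STOP"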

So, how does MySQL NDB Cluster decide whether the cluster can continue after a node failure? And if there are two halves, which one gets to continue? Assuming two replicas (NoOfReplicas = 2), the rules are quite simple. The data nodes are allowed to continue if the following conditions are fulfilled:

The data nodes have all the data (i.e. there is at least one data node from each node group).
The data nodes constitute a majority of the data nodes, or they are the first to contact the arbitrator.

For the group of data nodes to hold all the data, there must be one data node from each node group available. A node group is a
group of NoOfReplicas nodes that share data. The arbitration process refers to the process of contacting the arbitrator
(typically a management node) – the first half to make contact will remain online.

This is all a bit abstract, so let’s take a look at a couple of examples.

Examples
Consider a cluster with two data nodes and two management nodes. Most of the examples will have a management node installed on each of the hosts with the data nodes. The last example will, by contrast, have the management nodes on separate hosts.

The starting point is thus a cluster using two hosts each with one data node and one management node as shown in this figure:

The green colour indicates that the data node is online. The management node with Node Id 49 is shown in blue because it is the arbitrator, and the yellow management node is on standby.

This is where the problem with the setup starts to show. The arbitrator is the node that becomes involved when exactly half of the data nodes are available after a node failure. In that case, the data node(s) have to contact the arbitrator to confirm whether it is OK to continue. This avoids a split-brain scenario where two halves each hold all the data; in that case it is imperative that one half is shut down, or the data can start to diverge. The half that is first to contact the arbitrator survives; the other is killed (STONITH: shoot the other node in the head).
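If you want to check which node is currently acting as the arbitrator, the ndbinfo.membership table (available from MySQL NDB Cluster 7.1, where the ndbinfo database was introduced) reports it from each data node's point of view. The exact column selection below is an assumption and should be verified against your release:

  mysql> SELECT node_id, arbitrator, arb_state, arb_connected
         FROM ndbinfo.membership;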

So, let’s look at a potential split-brain scenario.

Split-Brain Scenario

A split-brain scenario can occur when the network between the two halves of the cluster is lost as shown in the next figure:

In this case the network connection between the two hosts is lost. Since both data nodes have all the data, arbitration is required to decide which one can continue. The data node with Id 1 can still contact the arbitrator as it is on the same host, but Node Id 2 cannot (it would need to use the network that is down). So, Node Id 1 gets to continue whereas Node Id 2 is shut down.

So far so good. This is what is expected. A single point of failure does not lead to a cluster outage. However, what happens if we consider a complete host failure instead of a network failure?

Host Failure

Consider now a case where there is a hardware failure on Host A, or someone pulls the power by accident. This causes the whole host to shut down, taking both the data node and the management node with it. What happens in this case?
The first thought is that it will not be an issue. Node Id 2 has all the data, so surely it will continue, right? No, that is not so. The result is a total cluster outage as shown in the following figure:

Why does this happen? When Host A crashes, so does the arbitrator management node. Since Node Id 2 does not on its own
constitute a majority of the data nodes, it must contact the arbitrator to be allowed to remain online.

You may think it can use the management node with Node Id 50 as the arbitrator, but that will not happen: while a node failure is being handled, under no circumstances can the arbitrator be changed. The nodes on Host B cannot know whether they are unable to see the nodes on Host A because of a network failure (as in the previous example) or because those nodes are dead. So, they have to assume the other nodes are still alive, or sooner or later there would be a split-brain cluster with both halves online.

Important: The arbitrator will never change while the cluster handles a node failure.

So, the data node with Id 2 has no other option than to shut itself down, and there is a total cluster outage. A single point of
failure has caused a total failure. That is not the idea of a high availability cluster.

What could have been done to prevent the cluster outage? Let’s reconsider the case where the arbitrator is on a third
independent host.

Arbitrator on Independent Host

The picture changes completely if the management nodes are installed on Hosts C and D instead of Hosts A and B. For simplicity, the management node with Node Id 50 is left out, as it is just a spectator while the node failure is being handled. In this case the scenario is:

Here Node Id 2 can still contact the arbitrator. Node Id 1 is dead, so it will not compete to win the arbitration, and the end result is that Node Id 2 remains online. The situation is back to one where a single point of failure does not bring down the whole cluster.
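For comparison, a sketch of the recommended layout keeps the management nodes on their own hosts. The addresses used here for Hosts C and D (192.0.2.3 and 192.0.2.4) are invented for the illustration; the point is only that no management node shares a host with a data node.

  [ndbd default]
  NoOfReplicas = 2

  # Host C: management node only
  [ndb_mgmd]
  NodeId = 49
  HostName = 192.0.2.3

  # Host D: management node only
  [ndb_mgmd]
  NodeId = 50
  HostName = 192.0.2.4

  # Host A: data node only
  [ndbd]
  NodeId = 1
  HostName = 192.0.2.1

  # Host B: data node only
  [ndbd]
  NodeId = 2
  HostName = 192.0.2.2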

 
Conclusion

If you want your cluster to have the best chance of surviving a problem with one of the hosts, never install the management nodes on the same hosts as the data nodes. One of the management nodes will also act as the arbitrator. Since the arbitrator cannot change while the cluster is handling a node failure, a crash of the host with the arbitrator will cause a complete cluster shutdown whenever arbitration is required.

When you consider what constitutes a host, you should look at physical hosts. Installing the management node in a different virtual machine on the same physical host offers little extra protection compared to the case where they are installed in the same virtual machine or on the same bare-metal host.

So, to conclude: make sure your management nodes are on completely different physical hosts from your data nodes.

REFERENCES

NOTE:1926680.1 - Best Practices For MySQL NDB Cluster



Related Products

Oracle Database Products > MySQL > MySQL Cluster > MySQL Cluster > Installation and Configuration

Keywords
AVAILABILITY; CLUSTER; HIGH AVAILABILITY; MYSQL; SHUTDOWN; SPLIT-BRAIN
