

Journey to 1,000 nodes for IBM Cloud Private

Guang Ya Liu
Apr 26, 2018 · 7 min read

Kubernetes is becoming more mature, and the size of a single Kubernetes cluster is growing from hundreds of nodes to thousands of nodes.

This article documents the enablement of IBM Cloud Private, which is based on Kubernetes, to support 1,000 nodes in a single cluster.

Test Environment
IBM Cloud Private 2.1.0.2 (Kubernetes 1.9.1), released in March 2018, was used to test 500 nodes in a single cluster.

Four types of nodes were used in the IBM Cloud Private scalability
cluster:

• Master Node: This type of node controls the worker nodes in a cluster through processes such as resource allocation and state maintenance. Master nodes primarily run Kubernetes core services such as the apiserver, controller manager, and scheduler. They also run lightweight services such as the auth service and catalog service.

• Management Node: This type of node is optional. It hosts management services such as monitoring, metering, and logging. Dedicated management nodes help prevent the master nodes from becoming overloaded.

• Proxy Node: This type of node is primarily used to run the ingress
controller. Use of a proxy node enables you to access services
inside IBM Cloud Private from outside of the cluster.

• Worker Node: This type of node works as a Kubernetes agent that provides an environment for running user applications in a container.

The environment included implementation of the following:

• Container networks managed by Calico


• Calico used node-to-node mesh to connect the Border Gateway Protocol (BGP) routers on all nodes

• Calico Version 2.6.6 was installed and used the etcd Version 2 API

• One etcd cluster was shared by both Kubernetes and Calico.

The following topology provides a visual representation of the various components of the 500-node cluster:

All functions worked well with 500 nodes in one IBM Cloud Private cluster.

Issues related to support of 1,000-node cluster

The following issues arose when attempting to scale the cluster to 1,000 nodes:

• Node-to-node mesh stopped working when there were more than 700 nodes in the cluster.

• In a cluster with 1,000 nodes, the full mesh requires each node to maintain BGP sessions with all 999 other nodes, which is too many. This results in failure to start the Calico node.

• etcd load became very high when scaled up to 1,000 nodes, and the Kubernetes APIServer stopped responding.

• 1,000 Calico nodes result in many reads and writes for the shared
etcd.

• After deleting Calico, etcd load returned to normal.

Solutions for scaling up to 1,000 nodes


Do not use Calico node-to-node mesh in large clusters
Test results from the 1,000-node scenario prompted an investigation of node-to-node mesh. The Calico community suggested that node-to-node mesh can be used with clusters of fewer than 200 nodes; clusters containing more than 200 nodes should use Route Reflector mode. Each Route Reflector can manage a group of Calico nodes, so there are no mesh connections between individual Calico nodes. Based on testing, one Route Reflector can serve 1,000 nodes. Testing is ongoing to determine whether one Route Reflector can support 2,000 or more nodes. A configuration sketch appears at the end of this section.

For many Calico deployments, the use of a Route Reflector is not required. However, for large-scale deployments a full mesh of BGP peerings between each of your Calico nodes will become untenable. In this case, Route Reflectors allow you to remove the full mesh and scale up the size of the cluster.

The screenshot below is a discussion with @projectcalico on Twitter:

You may be concerned about the relative performance of node-to-node mesh and Route Reflectors. When tested, there was no performance difference between the two. The major difference between the two modes is in management effort: the only negative effect of using Route Reflectors is that you must run and manage them. The majority of users, especially those with small clusters, will benefit from the simplification of using node-to-node mesh instead of requiring a Route Reflector.
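As a reference, here is a minimal sketch of switching a Calico Version 3 cluster from node-to-node mesh to a global Route Reflector peer with calicoctl. The peer address and AS number are illustrative placeholders, not values from this test environment:

calicoctl apply -f - <<EOF
# Turn off the full BGP mesh between Calico nodes.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 64512
EOF

calicoctl apply -f - <<EOF
# Peer every Calico node with the Route Reflector instead.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: route-reflector-peer
spec:
  peerIP: 192.0.2.10   # Route Reflector address (placeholder)
  asNumber: 64512
EOF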

Upgrading Calico from Version 2.6.6 to Version 3.0.4
The Calico Version 3 changelog indicates that Calico Version 3 supports the etcd Version 3 API. With the etcd Version 3 API, applications use the new gRPC API to access the multi-version concurrency control (MVCC) store, which provides more features and improved performance. See the etcd documentation for more information about the etcd Version 3 API.
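The API difference is visible on the etcd side: Calico Version 2.6 stores its data through the etcd Version 2 API, while Calico Version 3 stores it through the Version 3 API, so the two keyspaces are inspected with different etcdctl modes. A quick check, with an illustrative endpoint (Calico also shipped a calico-upgrade tool to migrate existing data during the upgrade; see the References):

# Keyspace written through the etcd v2 API (Calico v2.6.x)
ETCDCTL_API=2 etcdctl --endpoints=http://127.0.0.1:2379 ls /calico

# Keyspace written through the etcd v3 API (Calico v3.x)
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 \
  get /calico --prefix --keys-only | head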

Testing was done to compare the performance of Calico Version 2.6.6 with the etcd Version 2 API (Table 1) and Calico Version 3.0.4 with the etcd Version 3 API (Table 2).

The test results showed improved performance with Calico Version 3.0.4: latency dropped to less than half of that measured with Calico Version 2, and queries per second (QPS) more than doubled.

Separate etcd for Calico and Kubernetes (optional)
Test results show that the shared etcd used by Kubernetes and Calico
creates very high loads in a large cluster. The intention was to enable
Calico with a separate etcd, which is also a best practice proposed by
the Calico community.


However, after upgrading from Calico Version 2.6.6 to Version 3.0.4, the etcd dedicated to Calico no longer experienced high loads from the Route Reflectors, indicating that a separate etcd for Calico is not strictly necessary.
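For those who still want the separation, a minimal sketch is shown below. It relies on the standard ETCD_ENDPOINTS environment variable of the calico/node container; the endpoint addresses and certificate paths are placeholders:

# Excerpt from a calico/node DaemonSet spec: point Calico at a
# dedicated etcd cluster instead of the one used by Kubernetes.
env:
  - name: ETCD_ENDPOINTS
    value: "https://calico-etcd-0:2379,https://calico-etcd-1:2379"
  - name: ETCD_CA_CERT_FILE
    value: "/calico-secrets/etcd-ca"
  - name: ETCD_CERT_FILE
    value: "/calico-secrets/etcd-cert"
  - name: ETCD_KEY_FILE
    value: "/calico-secrets/etcd-key"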

During the test, 20,000 Liberty pods were created in the 1,000-node
cluster, which means that each worker node was running 20 pods.

The following test results are based on Calico Version 3.0.4 with 1,000 nodes and Route Reflectors, with a separate etcd for Calico.

top value for Kubernetes APIServer (on leader master) without workload

top - 02:41:23 up 6 days, 23:27, 12 users,  load average: 8.03, 7.26, 8.21
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 18.2 us,  4.5 sy,  0.0 ni, 76.4 id,  0.4 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem : 13203342+total, 67974048 free, 13182128 used, 50877248 buff/cache
KiB Swap: 13421568+total, 13421446+free,     1208 used. 11707584+avail Mem

  PID USER   PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
 3716 root   20   0 21.661g 6.045g 73296 S 108.3  4.8   3117:20 hyperkube

top value for APIServer (on leader master) with workload

top - 04:47:52 up 7 days, 1:34, 12 users,  load average: 64.92, 68.68, 66.98
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 62.9 us, 21.3 sy,  0.0 ni, 11.2 id,  0.2 wa,  0.0 hi,  4.4 si,  0.0 st
KiB Mem : 13203342+total, 36378248 free, 42160692 used, 53494488 buff/cache
KiB Swap: 13421568+total, 13421446+free,     1208 used. 88023200 avail Mem

  PID USER   PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
 3716 root   20   0 31.356g 0.026t 73552 S  1923 21.3   5165:31 hyperkube
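Note that "Tasks: 1 total" in both snapshots means top was scoped to a single process rather than the whole system. Assuming the apiserver PID shown above, the equivalent command is:

# Watch only the apiserver process (PID 3716 in the snapshots above);
# in this environment the apiserver ran inside hyperkube.
top -p 3716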


etcd memory load change for Kubernetes

The memory graph above indicates that when workload is submitted to IBM Cloud Private, etcd memory increases due to frequent write and read operations. However, after all workloads are running, the memory returns to a stable value.

etcd DB size change for Kubernetes

When 20,000 pods were started in IBM Cloud Private, the Kubernetes etcd DB size increased from 1.3 GB to approximately 1.45 GB.
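With the etcd Version 3 API, the DB size can be read directly from etcd itself; etcdctl reports it per endpoint. The endpoint address below is illustrative:

# Prints a table that includes a DB SIZE column for each endpoint.
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  endpoint status --write-out=table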

etcd memory load change for Calico Route Reflector


The chart above shows that even with 20,000 pods submitted to the IBM Cloud Private cluster, the etcd serving the Calico Route Reflectors used only 500 MB of memory.

etcd DB size change for Calico Route Reflector

The Calico Route Reflector etcd DB size increased by approximately 40 MB with 20,000 pods.

The tests above indicate that the Calico Route Reflector does not contribute much load to etcd, so you can use a shared etcd for Kubernetes and Calico with Route Reflectors. However, it is recommended that you use a separate etcd for Calico, to be sure that Calico cannot impact the Kubernetes APIServer through a shared etcd.

Topology Change


You can easily extend IBM Cloud Private to support 1,000+ nodes in
one cluster.

The following two diagrams illustrate the major topology changes needed to support 1,000+ nodes with IBM Cloud Private.

The changes are:

• Use Calico Route Reflectors for large-scale clusters.

• Upgrade Calico from Version 2 to Version 3 to leverage the etcd Version 3 API for better performance.

• Use a separate etcd for Calico; this is recommended for production.
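After the switch, the topology is easy to verify on any node: calicoctl node status lists the node's BGP peerings, and with Route Reflectors in place it should show only the reflector peers (peer type "global") instead of one mesh peer for every other node. Illustrative output with a placeholder reflector address:

sudo calicoctl node status

Calico process is running.

IPv4 BGP status
+--------------+-----------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE |  SINCE   |    INFO     |
+--------------+-----------+-------+----------+-------------+
| 192.0.2.10   | global    | up    | 03:14:07 | Established |
+--------------+-----------+-------+----------+-------------+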

Old Topology

New Topology

Summary
Kubernetes currently claims support for 5,000 nodes in one cluster. However, different configurations of a Kubernetes cluster, such as different network management technologies and deployment topologies, may limit the cluster size. More performance and deployment topology tuning is yet to be done for large-scale clusters.

IBM Cloud Private 2.1.0.3 (Kubernetes 1.10.0) supports 1,000+ nodes in one cluster with the configurations discussed in this article. A new Medium blog with details on setting up an IBM Cloud Private cluster with 1,000+ nodes is coming soon.

References
• Route Reflector Cluster Issues

• Route Reflector and Node-to-Node Mesh Comparison

• Migrating etcd data when moving from Node-to-Node Mesh to Route Reflector

• Route Reflector Guidance
