Journey to 1,000 nodes for IBM Cloud Private
Guang Ya Liu
Apr 26, 2018 · 7 min read
Test Environment
IBM Cloud Private 2.1.0.2 (Kubernetes 1.9.1), released in March 2018,
was used to test 500 nodes in a single cluster.
Four types of nodes were used in the IBM Cloud Private scalability
cluster:
• Proxy Node: This type of node is primarily used to run the ingress
controller. Use of a proxy node enables you to access services
inside IBM Cloud Private from outside of the cluster.
https://medium.com/ibm-cloud/journey-to-1000-nodes-for-ibm-cloud-private-5294138047d5 1/10
31/05/2019 Journey to 1,000 nodes for IBM Cloud Private – IBM Cloud – Medium
All functions in IBM Cloud Private worked well with 500 nodes in one
cluster.
• 1,000 Calico nodes result in many reads and writes for the shared
etcd.
Solutions for scaling up to 1,000 nodes
Do not use Calico node-to-node mesh in large clusters
Test results from the 1,000-node scenario prompted an investigation
into node-to-node mesh. The Calico community suggests using
node-to-node mesh only in clusters of fewer than 200 nodes; clusters
with more than 200 nodes should use Route Reflector mode. Each Route
Reflector manages a group of Calico nodes, so the nodes peer with the
reflector instead of forming mesh connections with one another.
Based on testing, one Route Reflector can serve 1,000 nodes. Testing
is ongoing to see whether one Route Reflector can support 2,000 or
more nodes.
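To make the scaling difference concrete, here is a minimal sketch that counts BGP peerings in each mode. It assumes that in node-to-node mesh every node peers with every other node, and that with reflectors every node peers with every reflector while the reflectors peer among themselves. The function names are illustrative and are not part of Calico.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Number of peerings when every node peers with every other node."""
    return nodes * (nodes - 1) // 2


def reflector_sessions(nodes: int, reflectors: int = 1) -> int:
    """Number of peerings when nodes peer only with the reflectors.

    Assumption: each node peers with every reflector, and the
    reflectors form a full mesh among themselves.
    """
    return nodes * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    for n in (200, 500, 1000):
        print(n, full_mesh_sessions(n), reflector_sessions(n))
```

Under these assumptions, 1,000 nodes in a full mesh need 499,500 peerings (999 per node), while a single reflector needs 1,000 (one per node).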
Upgrading Calico from V2.6.6 to V3.0.4
The Calico Version 3 changelog indicates that Calico Version 3 supports
the etcd Version 3 API. With the etcd Version 3 API, applications use
the new gRPC-based API to access the Multi-Version Concurrency Control
(MVCC) store, which provides more features and improved performance.
See the etcd documentation for more information about the etcd
Version 3 API.
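As a rough illustration of what a multi-version store means, the toy model below keeps every version of a key under a store-wide revision counter, so a reader can ask for the value as of an earlier revision. This is only a sketch of the idea: the class, its methods, and the example key are hypothetical and are not etcd's implementation or client API.

```python
from typing import Dict, List, Optional, Tuple


class ToyMVCCStore:
    """Toy multi-version key-value store (illustration only, not etcd)."""

    def __init__(self) -> None:
        self.revision = 0
        # key -> list of (revision, value), in increasing revision order
        self._history: Dict[str, List[Tuple[int, str]]] = {}

    def put(self, key: str, value: str) -> int:
        """Store a new version of key and return the new store revision."""
        self.revision += 1
        self._history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key: str, at_revision: Optional[int] = None) -> Optional[str]:
        """Return the value of key as of at_revision (default: latest)."""
        limit = self.revision if at_revision is None else at_revision
        result = None
        for rev, value in self._history.get(key, []):
            if rev > limit:
                break
            result = value
        return result


if __name__ == "__main__":
    store = ToyMVCCStore()
    first = store.put("/example/node-1", "up")    # hypothetical key
    store.put("/example/node-1", "down")
    print(store.get("/example/node-1"))                     # latest version
    print(store.get("/example/node-1", at_revision=first))  # older version
```

Because older versions stay readable, a reader can work from a consistent snapshot while writers keep adding new revisions.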
Separate etcd for Calico and Kubernetes (optional)
Test results show that the shared etcd used by Kubernetes and Calico
comes under very high load in a large cluster. The plan was therefore
to run Calico with a separate etcd, which is also a best practice
proposed by the Calico community.
During the test, 20,000 Liberty pods were created in the 1,000-node
cluster, an average of 20 pods per worker node.
The following test results are based on Calico V3.0.4 with 1,000 nodes
and a Route Reflector, with a separate etcd for Calico.
[Figure: top value for Kubernetes APIServer (on leader master) without workload]
When 20,000 pods were started in IBM Cloud Private, the DB size for
the Kubernetes etcd grew from 1.3 GB to approximately 1.45 GB.
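As a back-of-the-envelope check, the figures above can be turned into a per-pod estimate. The sketch assumes that "G" means decimal gigabytes and that all of the growth is attributable to the 20,000 pods; both are assumptions, not measurements from the test.

```python
DB_BEFORE_GB = 1.3   # Kubernetes etcd DB size before the workload
DB_AFTER_GB = 1.45   # approximate DB size after the pods were started
PODS = 20_000
NODES = 1_000

# Assumption: 1 GB = 10**9 bytes, and all growth comes from the pods.
growth_bytes = (DB_AFTER_GB - DB_BEFORE_GB) * 1e9
per_pod_kb = growth_bytes / PODS / 1e3
pods_per_node = PODS // NODES

print(f"pods per node: {pods_per_node}")
print(f"etcd growth per pod: about {per_pod_kb:.1f} KB")
```

Under these assumptions the growth works out to roughly 7.5 KB of etcd storage per pod, at 20 pods per node.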
The chart above shows that even when 20,000 pods were submitted in
the IBM Cloud Private cluster, the etcd for Calico with a Route
Reflector used only 500 MB of memory.
The test above indicates that Calico with a Route Reflector does not
add much load to etcd, so you can use a shared etcd for Kubernetes
and Calico with a Route Reflector. However, it is recommended that
you use a separate etcd for Calico, so that Calico cannot affect the
Kubernetes APIServer through a shared etcd.
Topology Change
You can easily extend IBM Cloud Private to support 1,000+ nodes in
one cluster.
Old Topology
New Topology
Summary
Kubernetes currently claims support for up to 5,000 nodes in one
cluster. However, different configurations of the Kubernetes cluster,
such as the network management technology and the deployment
topology, may limit the achievable cluster size. More performance and
deployment topology tuning is yet to be done for large-scale
clusters.
A Medium blog post with details on setting up an IBM Cloud Private
cluster with 1,000+ nodes is coming soon.
References
• Router Reflector Cluster Issues