Section 5 : Cluster maintenance

OS Upgrades

● When performing maintenance on nodes, like upgrading base
software or applying patches, you can take nodes down to perform
these tasks on them.
○ If a node is down for more than 5 minutes, the pods hosted on it
are terminated as they are considered dead.
◆ The time k8s waits for a node to come back online before
evicting its pods is called the pod-eviction-timeout.


◆ This option is set on the Kube-controller-manager with a
default value of 5 minutes (see the sketch after this list).


○ If the pods were part of a replicaSet, they are recreated on other
nodes.
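
● A quick way to check this setting on a kubeadm-based cluster (the manifest path below is the usual kubeadm location, so treat it as an assumption) :-

# the kube-controller-manager runs as a static pod on kubeadm clusters
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml
# if the flag is not set explicitly, the default applies :
#   --pod-eviction-timeout=5m0s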
● Some of the commands used to temporarily take nodes down for
maintenance are:-
○ k drain <node-name>
◆ k drain --ignore-daemonsets <node-name> : in case there
are daemonSets present on the node.


◆ k drain --force <node-name> : in case there are pods which
are not part of a replicaSet.


◆ k drain --ignore-daemonsets --force <node-name> : in case
both of the above are present on the node.

● Meaning of the commands used (a workflow sketch follows this list) :-


○ drain : when you drain a node, the pods on it are gracefully terminated
and recreated on other nodes.


◆ The node is also marked unschedulable (cordoned).
◆ The command used to mark a node unschedulable is :-

– k cordon <node-name>
– Unlike drain, it does not recreate pods on other nodes; it
simply makes sure that no new pods can be scheduled on
the node.
◆ This means no new pods can be scheduled on the node unless you
uncordon it, with the command :-


– k uncordon <node-name>
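
● A minimal sketch of the full maintenance workflow, assuming a node named node01 (the name is only an example) :-

# move workloads elsewhere and mark the node unschedulable
k drain node01 --ignore-daemonsets

# perform the OS upgrade or patching on node01, then reboot it

# allow new pods to be scheduled on the node again
k uncordon node01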

K8s software versions


● When we install k8s, we are installing a specific version of the API
Server. How does the k8s project manage Kubernetes releases ?
○ The Kubernetes version (v1.26.2) consists of 3 parts :
◆ Major version : v1
◆ Minor version : 26

– Released every few months with new features.


◆ Patch version : 2

– Released more often with critical bug fixes.

○ Kubernetes follows a standard software release version procedure.


◆ Every few months it comes out with new features and
functionalities through a minor release.


◆ Stable releases commonly take the form of : v1.minor-version.0
– v1.25.0
– v1.26.0 etc.
◆ There are also alpha and beta releases :
– Alpha release : consists of bug fixes and improvements. In
this release the new features are disabled by default and
may be buggy.
– Beta release : code is well tested and new features are
enabled by default. This release leads to the main stable
release.

○ In any release, the core control-plane components usually have
the same version number, except for externally maintained dependencies
like ETCD and CoreDNS.
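
● A quick way to check which versions are running in a cluster (the version numbers shown are only examples) :-

# client and server (kube-apiserver) versions
k version
# Server Version: v1.26.2

# kubelet version reported per node
k get nodes
# NAME     STATUS   ROLES    AGE   VERSION
# node01   Ready    <none>   10d   v1.26.2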

K8s cluster upgrade introduction

● All core control-plane components must not have a version greater
than that of the Kube-apiserver.
○ Kube-controller-manager and Kube-scheduler can be at most one
minor version lower than the Kube-apiserver.


◆ If the api-server is at v1.20, they can be at v1.20 or v1.19.

○ Kube-proxy and the Kubelet can be at most two minor versions lower
than the Kube-apiserver.
◆ If the api-server is at v1.20, they can be at v1.20, v1.19 or v1.18.
○ Kubectl can be one minor version above or below the Kube-apiserver.
◆ If the api-server is at v1.20, it can be at v1.21, v1.20 or v1.19.

● When should one upgrade the cluster ?


○ K8s provides support for the current stable version and the 2 minor
versions below it.
○ If the current stable version is v1.26.0 then K8s only provides
support for v1.25 and v1.24.

● Upgrading a cluster can be done by :-


. Using kubeadm for a cluster deployed using kubeadm.
. Updating each component of the cluster individually, if the cluster
was built from scratch (a.k.a. The Hard Way).

● Upgrading a cluster involves two major steps :-


. Upgrading the control-plane / master nodes.
◆ When upgrading the master node, all control-plane
components are down.
◆ Apps hosted on worker nodes are not affected when upgrading the
master node, but pods cannot be recreated if they go down.

. Upgrading the worker nodes.


◆ There are 3 strategies when upgrading worker nodes :

. Strategy 1 : Upgrading all of them at once.


– That is to shut them down and then bring them back up
after performing upgrades.
– This strategy requires application downtime for the
duration of the upgrade.
– Once the upgrade is complete, nodes are back up, new
pods are scheduled and users can resume access.
. Strategy 2 : Upgrading one node at a time.
– Each node is first drained of all workloads.
– Workloads from the node are recreated on other nodes.
– Node is upgraded and then brought back for new pods
to be scheduled.
. Strategy 3 : Upgraded nodes are added to the cluster.
– Workloads are shifted to a new upgraded node.
– After workloads are rescheduled to the new node, the
node with older version is decommissioned.
– This is convenient in a cloud environment where you
can easily provision new nodes and decommission old
ones.

● Upgrading a cluster with kubeadm :-


○ Steps used when upgrading the master node include (see the consolidated command sketch at the end of this section) :-
. kubeadm upgrade plan
– Command to see the current versions of kubeadm and the
control-plane components, and the version they can be upgraded to.
– It also tells you that, after upgrading all components, you
should manually upgrade the Kubelet version on each node in
the cluster, since kubeadm does not install or upgrade
Kubelets.
– You must upgrade the kubeadm tool itself before upgrading
the cluster.

. apt-get upgrade kubeadm=1.minor.0-00 : upgrade kubeadm
to one minor version above the current one.
. kubeadm upgrade apply v1.minor.0 : use the command in
the kubeadm upgrade plan output to upgrade the cluster.
– This pulls the necessary images and upgrades the cluster
components.
– Once complete, your components are now at the version
provided in the command.
– But if you run the k get nodes command you’ll see that the
previous version is shown.
◆ This is because the output shows the version of the
Kubelet on the master node.
◆ In a cluster set up with kubeadm, a Kubelet is present
on the master node since the control-plane components
run as pods.

. apt-get upgrade kubelet=1.minor.0-00 kubectl=1.minor.0-00 :
upgrade the Kubelet and Kubectl on the master node.
. systemctl restart kubelet : restart the Kubelet service.
. k get nodes : to confirm that the Kubelet was upgraded.

○ Steps used when upgrading the worker nodes include :-


. k drain <worker-node1> : move workload elsewhere and mark
node1 unschedulable to perform upgrades as follows :-
◆ apt-get upgrade kubeadm=1.minor.0-00
◆ apt-get upgrade kubelet=1.minor.0-00
◆ kubeadm upgrade node config --kubelet-version

v1.minor.0
◆ systemctl restart kubelet
◆ k uncordon <worker-node1>

. k drain <worker-node2> : move workloads elsewhere and
mark node2 unschedulable to perform upgrades as follows :-
◆ apt-get upgrade kubeadm=1.minor.0-00
◆ apt-get upgrade kubelet=1.minor.0-00
◆ kubeadm upgrade node config --kubelet-version
v1.minor.0
◆ systemctl restart kubelet
◆ k uncordon <worker-node2>

○ There is a script for that in UbuntuVM


○ Github Documentation coming soon…
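
● A consolidated sketch of the kubeadm upgrade sequence above (1.minor.0 is the same version placeholder used in these notes, not a literal value) :-

# on the master node
apt-get upgrade kubeadm=1.minor.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.minor.0
apt-get upgrade kubelet=1.minor.0-00 kubectl=1.minor.0-00
systemctl restart kubelet
k get nodes

# for each worker node (drain/uncordon run from the master, the rest on the worker)
k drain <worker-node> --ignore-daemonsets
apt-get upgrade kubeadm=1.minor.0-00
apt-get upgrade kubelet=1.minor.0-00
kubeadm upgrade node config --kubelet-version v1.minor.0
systemctl restart kubelet
k uncordon <worker-node>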

Backup and restore methods with ETCD

● You can create a backup of your k8s cluster either by :-


. querying the kube-apiserver
◆ k get all --all-namespaces -o yaml > all-deployment-services.yaml
◆ This is ideal in a managed k8s environment, such as a cloud
environment, where you might not have access to the ETCD cluster.


. Taking snapshots of the entire ETCD cluster as seen below :-

● You can back up the k8s cluster info in the ETCD database with the
following commands :-
○ First run export ETCDCTL_API=3 : set an environment variable so
that you won’t have to type it in every etcdctl command.


○ CMD : snapshot save <snapshot-name>.db, provided by the
etcdctl utility.
◆ After running this command, a snapshot file is created in the
current directory.
◆ You can also save it somewhere else by providing the full path.
○ To view the status of the captured snapshot :-
◆ snapshot status <snapshot-name>.db
○ To restore the snapshot :-

. Stop kube-apiserver service.


◆ systemctl stop kube-apiserver
◆ This is because the restore process will require you to restart
the ETCD cluster, and the kube-apiserver depends on it.

. Restore the snapshot by running :-


◆ CMD : etcdctl snapshot restore /path/to/<snapshot-name>.db
--data-dir=/var/lib/etcd-from-backup
◆ Upon running this command, a new data directory is
created.
◆ This initializes a new cluster configuration and configures the
members of the ETCD cluster as new members of a new
cluster.
– This is to prevent a new member from accidentally
joining an existing cluster.

. Configure the etcd service to use the new data directory.


◆ Edit the --data-dir option and set it to /var/lib/etcd-from-backup.

. Reload the service daemon and restart the etcd service


◆ systemctl daemon-reload
◆ systemctl restart etcd.service

. Restart kube-apiserver service


◆ systemctl start kube-apiserver
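
● A minimal sketch of the restore sequence above, using the snapshot placeholder and target data directory from these notes :-

systemctl stop kube-apiserver
ETCDCTL_API=3 etcdctl snapshot restore /path/to/<snapshot-name>.db \
  --data-dir=/var/lib/etcd-from-backup
# edit the etcd service so that --data-dir points to /var/lib/etcd-from-backup
systemctl daemon-reload
systemctl restart etcd.service
systemctl start kube-apiserver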

● NOTE : Remember to specify the endpoints (to the ETCD cluster),
cacert, cert and key files in every etcdctl command.
● Some important etcdctl options include :-

etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
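
● For example, a complete backup command combining these options with the snapshot save operation described earlier (a sketch; the snapshot name is a placeholder) :-

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save <snapshot-name>.db

# verify the snapshot afterwards
ETCDCTL_API=3 etcdctl snapshot status <snapshot-name>.db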

● Restoring the ETCD database on an ETCD cluster is pretty similar, with
these steps :-
. Extracting the ETCD backup to /var/lib/<etcd-backup-name>
with all necessary options such as :-
◆ endpoint IP
◆ cacert
◆ cert
◆ key
◆ data-dir
◆ path to the backup

. Configure the extracted etcd backup directory so that its owner and
group are etcd.
◆ chown -R etcd:etcd /var/lib/<etcd-backup-name>

. Edit the service file at /etc/systemd/system/etcd.service and
configure the new data-dir path.

. Reload the daemon config and restart the etcd service.


◆ systemctl daemon-reload
◆ systemctl restart etcd.service
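
● A short verification sketch for these steps, assuming the backup was extracted to /var/lib/etcd-from-backup (the directory name is only an example) :-

ls -ld /var/lib/etcd-from-backup                 # owner and group should be etcd
grep data-dir /etc/systemd/system/etcd.service   # should point at the new data directory
systemctl status etcd.service                    # should be active after the restart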
