Section 5 : Cluster maintenance

OS Upgrades

● When performing maintenance on nodes, like upgrading base
software or applying patches, you can take nodes down to perform
these tasks on them.
○ If a node is down for more than 5 minutes, the pods hosted on it
are terminated as they are considered dead.
◆ The time k8s waits for a node to come back online before
evicting its pods is called the pod-eviction-timeout.


◆ This option is set on the Kube-controller-manager with a
default value of 5 minutes (see the sketch after this list).


○ If the pods were part of a replicaSet, they are recreated on other
nodes.
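
● A quick way to check this setting on a kubeadm-based cluster (the manifest path below is the usual kubeadm location, so treat it as an assumption) :-

# the kube-controller-manager runs as a static pod on kubeadm clusters
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml
# if the flag is not set explicitly, the default applies :
#   --pod-eviction-timeout=5m0s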
● Some of the commands used to temporarily take nodes down for
maintenance are:-
○ k drain <node-name>
◆ k drain --ignore-daemonsets <node-name> : in case there
are daemonSets present on the node.


◆ k drain --force <node-name> : in case there are pods which
are not part of a replicaSet.


◆ k drain --ignore-daemonsets --force <node-name> : in case
both of the above are present on the node.

● Meaning of the commands used (a workflow sketch follows this list) :-


○ drain : when you drain a node, the pods on it are gracefully terminated
and recreated on other nodes.


◆ The node is also marked unschedulable (cordoned).
◆ The command used to mark a node unschedulable is :-

– k cordon <node-name>
– Unlike drain, it does not recreate pods on other nodes; it
simply makes sure that no new pods can be scheduled on
the node.
◆ This means no new pods can be scheduled on the node unless you
uncordon it, with the command :-


– k uncordon <node-name>
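
● A minimal sketch of the full maintenance workflow, assuming a node named node01 (the name is only an example) :-

# move workloads elsewhere and mark the node unschedulable
k drain node01 --ignore-daemonsets

# perform the OS upgrade or patching on node01, then reboot it

# allow new pods to be scheduled on the node again
k uncordon node01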

K8s software versions


● When we install k8s, we are installing a specific version of the API
Server. How does the k8s project manage Kubernetes releases ?
○ The Kubernetes version (v1.26.2) consists of 3 parts :
◆ Major version : v1
◆ Minor version : 26

– Released every few months with new features.


◆ Patch version : 2

– Released more often with critical bug fixes.

○ Kubernetes follows a standard software release version procedure.


◆ Every few months it comes out with new features and
functionalities through a minor release.


◆ Stable releases commonly take the form of : v1.minor-version.0
– v1.25.0
– v1.26.0 etc.
◆ There are also alpha and beta releases :
– Alpha release : consists of bug fixes and improvements. In
this release the new features are disabled by default and
may be buggy.
– Beta release : code is well tested and new features are
enabled by default. This release leads to the main stable
release.

○ In any release, the core control-plane components usually have
the same version number, except for externally maintained dependencies
like ETCD and CoreDNS.
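
● A quick way to check which versions are running in a cluster (the version numbers shown are only examples) :-

# client and server (kube-apiserver) versions
k version
# Server Version: v1.26.2

# kubelet version reported per node
k get nodes
# NAME     STATUS   ROLES    AGE   VERSION
# node01   Ready    <none>   10d   v1.26.2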

K8s cluster upgrade introduction

● All core control-plane components must not have a version greater
than that of the Kube-apiserver.
○ Kube-controller-manager and Kube-scheduler can be at most one
minor version lower than the Kube-apiserver.


◆ If the api-server is at v1.20, they can be at v1.20 or v1.19.

○ Kube-proxy and the Kubelet can be at most two minor versions lower
than the Kube-apiserver.
◆ If the api-server is at v1.20, they can be at v1.20, v1.19 or v1.18.
○ Kubectl can be one minor version above or below the Kube-apiserver.
◆ If the api-server is at v1.20, it can be at v1.21, v1.20 or v1.19.

● When should one upgrade the cluster ?


○ K8s provides support for the current stable version and the 2 minor
versions below it.
○ If the current stable version is v1.26.0 then K8s only provides
support for v1.25 and v1.24.

● Upgrading a cluster can be done by :-


. Using kubeadm for a cluster deployed using kubeadm.
. Updating each component of the cluster individually, if the cluster
was built from scratch (a.k.a. The Hard Way).

● Upgrading a cluster involves two major steps :-


. Upgrading the control-plane / master nodes.
◆ When upgrading the master node, all control-plane
components are down.
◆ Apps hosted on worker nodes are not affected when upgrading the
master node, but pods cannot be recreated if they go down.

. Upgrading the worker nodes.


◆ There are 3 strategies when upgrading worker nodes :

. Strategy 1 : Upgrading all of them at once.


– That is to shut them down and then bring them back up
after performing upgrades.
– This strategy requires application downtime for the
duration of the upgrade.
– Once the upgrade is complete, nodes are back up, new
pods are scheduled and users can resume access.
. Strategy 2 : Upgrading one node at a time.
– Each node is first drained of all workloads.
– Workloads from the node are recreated on other nodes.
– Node is upgraded and then brought back for new pods
to be scheduled.
. Strategy 3 : Upgraded nodes are added to the cluster.
– Workloads are shifted to a new upgraded node.
– After workloads are rescheduled to the new node, the
node with older version is decommissioned.
– This is convenient in a cloud environment where you
can easily provision new nodes and decommission old
ones.

● Upgrading a cluster with kubeadm :-


○ Steps used when upgrading the master node include (see the consolidated command sketch at the end of this section) :-
. kubeadm upgrade plan
– Command to see the current versions of kubeadm and the
control-plane components, and the version they can be upgraded to.
– It also tells you that, after upgrading all components, you
should manually upgrade the Kubelet version on each node in
the cluster, since kubeadm does not install or upgrade
Kubelets.
– You must upgrade the kubeadm tool itself before upgrading
the cluster.

. apt-get upgrade kubeadm=1.minor.0-00 : upgrade kubeadm
to one minor version above the current one.
. kubeadm upgrade apply v1.minor.0 : use the command in
the kubeadm upgrade plan output to upgrade the cluster.
– This pulls the necessary images and upgrades the cluster
components.
– Once complete, your components are now at the version
provided in the command.
– But if you run the k get nodes command you’ll see that the
previous version is shown.
◆ This is because the output shows the version of the
Kubelet on the master node.
◆ In a cluster set up with kubeadm, a Kubelet is present
on the master node since the control-plane components
run as pods.

. apt-get upgrade kubelet=1.minor.0-00 kubectl=1.minor.0-00 :
upgrade the Kubelet and Kubectl on the master node.
. systemctl restart kubelet : restart the Kubelet service.
. k get nodes : to confirm that the Kubelet was upgraded.

○ Steps used when upgrading the worker nodes include :-


. k drain <worker-node1> : move workload elsewhere and mark
node1 unschedulable to perform upgrades as follows :-
◆ apt-get upgrade kubeadm=1.minor.0-00
◆ apt-get upgrade kubelet=1.minor.0-00
◆ kubeadm upgrade node config --kubelet-version

v1.minor.0
◆ systemctl restart kubelet
◆ k uncordon <worker-node1>

. k drain <worker-node2> : move workloads elsewhere and
mark node2 unschedulable to perform upgrades as follows :-
◆ apt-get upgrade kubeadm=1.minor.0-00
◆ apt-get upgrade kubelet=1.minor.0-00
◆ kubeadm upgrade node config --kubelet-version
v1.minor.0
◆ systemctl restart kubelet
◆ k uncordon <worker-node2>

○ There is a script for that in UbuntuVM


○ Github Documentation coming soon…
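
● A consolidated sketch of the kubeadm upgrade sequence above (1.minor.0 is the same version placeholder used in these notes, not a literal value) :-

# on the master node
apt-get upgrade kubeadm=1.minor.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.minor.0
apt-get upgrade kubelet=1.minor.0-00 kubectl=1.minor.0-00
systemctl restart kubelet
k get nodes

# for each worker node (drain/uncordon run from the master, the rest on the worker)
k drain <worker-node> --ignore-daemonsets
apt-get upgrade kubeadm=1.minor.0-00
apt-get upgrade kubelet=1.minor.0-00
kubeadm upgrade node config --kubelet-version v1.minor.0
systemctl restart kubelet
k uncordon <worker-node>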

Backup and restore methods with ETCD

● You can create a backup of your k8s cluster either by :-


. querying the kube-apiserver
◆ k get all --all-namespaces -o yaml > all-deployment-services.yaml
◆ This is ideal in a managed k8s environment, such as a cloud
environment, where you might not have access to the ETCD cluster.


. Taking snapshots of the entire ETCD cluster as seen below :-

● You can back up the k8s cluster info in the ETCD database with the
following commands :-
○ First run export ETCDCTL_API=3 : set an environment variable so
that you won’t have to type it in every etcdctl command.


○ CMD : snapshot save <snapshot-name>.db, provided by the
etcdctl utility.
◆ After running this command, a snapshot file is created in the
current directory.
◆ You can also save it somewhere else by providing the full path.
○ To view the status of the captured snapshot :-
◆ snapshot status <snapshot-name>.db
○ To restore the snapshot :-

. Stop kube-apiserver service.


◆ systemctl stop kube-apiserver
◆ This is because the restore process will require you to restart
the ETCD cluster, and the kube-apiserver depends on it.

. Restore the snapshot by running :-


◆ CMD : etcdctl snapshot restore /path/to/<snapshot-name>.db
--data-dir=/var/lib/etcd-from-backup
◆ Upon running this command, a new data directory is
created.
◆ This initializes a new cluster configuration and configures the
members of the ETCD cluster as new members of a new
cluster.
– This is to prevent a new member from accidentally
joining an existing cluster.

. Configure the etcd service to use the new data directory.


◆ Edit the --data-dir option and set it to /var/lib/etcd-from-backup.

. Reload the service daemon and restart the etcd service


◆ systemctl daemon-reload
◆ systemctl restart etcd.service

. Restart kube-apiserver service


◆ systemctl start kube-apiserver
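
● A minimal sketch of the restore sequence above, using the snapshot placeholder and target data directory from these notes :-

systemctl stop kube-apiserver
ETCDCTL_API=3 etcdctl snapshot restore /path/to/<snapshot-name>.db \
  --data-dir=/var/lib/etcd-from-backup
# edit the etcd service so that --data-dir points to /var/lib/etcd-from-backup
systemctl daemon-reload
systemctl restart etcd.service
systemctl start kube-apiserver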

● NOTE : Remember to specify the endpoints (to the ETCD cluster),
cacert, cert and key files in every etcdctl command.
● Some important etcdctl options include :-

etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
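
● For example, a complete backup command combining these options with the snapshot save operation described earlier (a sketch; the snapshot name is a placeholder) :-

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save <snapshot-name>.db

# verify the snapshot afterwards
ETCDCTL_API=3 etcdctl snapshot status <snapshot-name>.db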

● Restoring the ETCD database on an ETCD cluster is pretty similar, with
these steps :-
. Extracting the ETCD backup to /var/lib/<etcd-backup-name>
with all necessary options such as :-
◆ endpoint IP
◆ cacert
◆ cert
◆ key
◆ data-dir
◆ path to the backup

. Configure the extracted etcd backup directory so that its owner and
group are etcd.
◆ chown -R etcd:etcd /var/lib/<etcd-backup-name>

. Edit the service file at /etc/systemd/system/etcd.service and
configure the new data-dir path.

. Reload the daemon config and restart the etcd service.


◆ systemctl daemon-reload
◆ systemctl restart etcd.service
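
● A short verification sketch for these steps, assuming the backup was extracted to /var/lib/etcd-from-backup (the directory name is only an example) :-

ls -ld /var/lib/etcd-from-backup                 # owner and group should be etcd
grep data-dir /etc/systemd/system/etcd.service   # should point at the new data directory
systemctl status etcd.service                    # should be active after the restart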
