
Openstack/Ceph

Admin
Installation & Deployment
TRAINING OBJECTIVES: OPENSTACK
● Understand how the OpenStack project works and what it makes possible
● Understand how each of the OpenStack components works
● Be able to make the right configuration choices
● Know how to manually deploy an OpenStack cloud to provide IaaS
● Know the best practices for deploying OpenStack
● Be able to determine the origin of an error in OpenStack
● Know how to react when facing a problem
TRAINING PREREQUISITES
● Linux system administration skills (such as CentOS)
○ Package management
○ Editing configuration files and managing services
○ LVM (Logical Volume Management) and filesystems
○ Ansible
○ Docker
● Basic knowledge of:
○ Virtualization: KVM (Kernel-Based Virtual Machine), libvirt
○ Networking: iptables, namespaces
○ SQL
Ansible Linux Automation
What can I do using Ansible?
Ansible automates technologies you use
Inventory
● Ansible works against multiple systems in an inventory
● Inventory is usually file based
● Can have multiple groups
● Can have variables for each group or even host
# Static inventory example:
[myservers]
10.42.0.2
10.42.0.6
10.42.0.7
Understanding Inventory - Basic
Understanding Inventory - Variables
Understanding Inventory - Groups
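As an illustration of groups and variables (a sketch with hypothetical hosts and values, not taken from the original slides):

# Inventory with groups, a host variable and a group variable:
[webservers]
web01.example.com http_port=8080
web02.example.com

[dbservers]
db01.example.com

[webservers:vars]
ansible_user=admin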
Ad-Hoc Commands PING
The Ansible Command
Ad-Hoc Commands
-m MODULE_NAME, --module-name=MODULE_NAME
Name of the module to execute in the ad-hoc command
-a MODULE_ARGS, --args=MODULE_ARGS
Module arguments for the ad-hoc command
-b, --become
Run the ad-hoc command with privilege escalation (the default method is sudo)
-e EXTRA_VARS, --extra-vars=EXTRA_VARS
Set additional variables as key=value or YAML/JSON
Example
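A minimal sketch of ad-hoc usage, assuming the [myservers] group from the earlier inventory and a file named inventory:

$ ansible myservers -i inventory -m ping
$ ansible myservers -i inventory -m dnf -a "name=chrony state=present" -b
$ ansible myservers -i inventory -m copy -a "src=/etc/hosts dest=/tmp/hosts"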
An Ansible Playbook
An Ansible Playbook (2)
An Ansible Playbook (3)
Running an Ansible Playbook
The most important colors of Ansible
Running an Ansible Playbook
An Ansible Playbook Variable Example
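For reference alongside the playbook slides, a minimal playbook sketch using a variable (hypothetical names):

---
- name: Deploy a web server
  hosts: myservers
  become: true
  vars:
    http_port: 8080
  tasks:
    - name: Install httpd
      dnf:
        name: httpd
        state: present
    - name: Ensure httpd is running and enabled
      service:
        name: httpd
        state: started
        enabled: true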
Facts
Just like variables, but coming from the host itself
Gather facts on target machine
Conditionals via VARS
Conditionals with facts
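For example, a task can be guarded by a fact gathered from the target host (sketch, assuming fact gathering is enabled):

- name: Install Apache on RedHat-family hosts only
  dnf:
    name: httpd
    state: present
  when: ansible_facts['os_family'] == "RedHat"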
Variables & Templates
Roles
● Roles: Think Ansible packages
● Roles provide Ansible with a way to load tasks, handlers, and variables
from separate files.
● Roles group content, allowing easy sharing of code with others
● Roles make larger projects more manageable
● Roles can be developed in parallel by different administrators
Role structure
● Defaults: default variables with the lowest precedence (e.g. port)
● Handlers: contains all handlers
● Meta: role metadata, including dependencies on other roles
● Tasks: plays or tasks
  Tip: it is common to include tasks in main.yml with a "when" condition (e.g. OS == xyz); see the sketch below
● Templates: templates to deploy
● Tests: place for playbook tests
● Vars: variables (e.g. override port)

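A sketch of the tip above, using a hypothetical role whose tasks/main.yml dispatches on the OS family:

# roles/myrole/tasks/main.yml (hypothetical role)
- name: Include RedHat-specific tasks
  include_tasks: redhat.yml
  when: ansible_facts['os_family'] == "RedHat"

- name: Include Debian-specific tasks
  include_tasks: debian.yml
  when: ansible_facts['os_family'] == "Debian"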
Docker
Ceph Storage
The Ceph architecture
● Ceph Storage cluster is a distributed data object store designed to provide
excellent performance, reliability and scalability.
● Distributed object stores are the future of storage, because they
accommodate unstructured data, and because clients can use modern
object interfaces and legacy interfaces simultaneously.
○ APIs in many languages (C/C++, Java, Python)
○ RESTful interfaces (S3/Swift)
○ Block device interface
○ Filesystem interface
SDS & HCI Market
Ceph components
● Ceph OSD Daemon: Ceph OSDs store data on behalf of Ceph clients.
Additionally, Ceph OSDs utilize the CPU, memory and networking of Ceph
nodes to perform data replication, erasure coding, rebalancing, recovery,
monitoring and reporting functions.
● Ceph Monitor: A Ceph Monitor maintains a master copy of the Ceph
Storage cluster map with the current state of the Ceph Storage cluster.
● Ceph Manager: The Ceph Manager maintains detailed information about
placement groups, process metadata and host metadata in lieu of the
Ceph Monitor—significantly improving performance at scale. The Ceph
Manager handles execution of many of the read-only Ceph CLI queries,
such as placement group statistics. The Ceph Manager also provides the
RESTful monitoring APIs.
Ceph components (2)
● Ceph MDS: Ceph Metadata Server (MDS) manages metadata related to
files stored on the Ceph File System (CephFS). The Ceph MDS daemon
also coordinates access to the shared storage cluster.
● Ceph Object Gateway: Ceph Object Gateway is an object storage
interface built on top of librados to provide applications with a RESTful
access point to the Ceph storage cluster. The Ceph Object Gateway
supports two interfaces:
○ S3: Provides object storage functionality with an interface that is compatible
with a large subset of the Amazon S3 RESTful API.
○ Swift: Provides object storage functionality with an interface that is
compatible with a large subset of the OpenStack Swift API.
Ceph Architecture
Ceph Client
Ceph client interfaces read data from and write data to the Ceph Storage
cluster. Clients need the following data to communicate with the Ceph
Storage cluster:

● The Ceph configuration file, or the cluster name (usually ceph) and the
monitor address.
● The pool name.
● The user name and the path to the secret key.
Ceph Client (2)
Ceph clients maintain object IDs and the pool names where they store the
objects. However, they do not need to maintain an object-to-OSD index or
communicate with a centralized object index to lookup object locations. To
store and retrieve data, Ceph clients access a Ceph Monitor and retrieve the
latest copy of the Ceph Storage cluster map. Then, Ceph clients provide an
object name and pool name to librados, which computes an object’s
placement group and the primary OSD for storing and retrieving data using
the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The
Ceph client connects to the primary OSD where it may perform read and write
operations. There is no intermediary server, broker or bus between the client
and the OSD.
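The CRUSH computation can be inspected from the command line; for example, assuming a pool named images and an object named my-object:

# ceph osd map images my-object

The output shows the placement group and the acting set of OSDs for that object.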
Ceph Client (3)
When an OSD stores data, it receives data from a Ceph client—whether the
client is a Ceph Block Device, a Ceph Object Gateway, a Ceph Filesystem or
another interface—and it stores the data as an object.

Ceph OSDs store all data as objects in a flat namespace. There are no
hierarchies of directories. An object has a cluster-wide unique identifier,
binary data, and metadata consisting of a set of name/value pairs.
Ceph Cluster
A Ceph Storage cluster can have a large number of Ceph nodes for limitless
scalability, high availability and performance.

● Write and read data
● Compress data
● Ensure durability by replicating or erasure coding data
● Monitor and report on cluster health—also called 'heartbeating'
● Redistribute data dynamically—also called 'backfilling'
● Ensure data integrity
● Recover from failures.
Ceph pools
The Ceph storage cluster stores data objects in logical partitions called 'Pools.' Ceph
administrators can create pools for particular types of data, such as for block devices, object
gateways, or simply just to separate one group of users from another.

● Pool Type: Ceph can maintain multiple copies of an object, or it can use erasure coding
to ensure durability.
● Placement Groups: a Ceph pool might store millions of data objects or more. Ceph
must handle many types of operations, including data durability via replicas or erasure
code, data integrity by scrubbing or CRC checks, replication, rebalancing and recovery.
Consequently, managing data on a per-object basis presents a scalability and
performance bottleneck. Ceph addresses this bottleneck by sharding a pool into
placement groups. System administrators set the placement group count when
creating or modifying a pool.
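For example, a replicated pool can be created with an explicit placement group count (hypothetical pool name and PG count):

# ceph osd pool create mypool 128 128 replicated
# ceph osd pool set mypool size 3
# ceph osd pool application enable mypool rbd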
Ceph pools (2)
● CRUSH Ruleset: CRUSH can detect failure domains. CRUSH organizes OSDs hierarchically into nodes, chassis, and racks. CRUSH enables Ceph OSDs to store object copies across failure domains.
● Durability: Ceph provides high data durability in two ways:
○ Replica pools will store multiple deep copies of an object using the CRUSH
failure domain to physically separate one data object copy from another.
○ Erasure coded pools store each object as K+M chunks, where K represents
data chunks and M represents coding chunks.
Ceph replication
ERASURE CODE
Ceph authentication
To identify users and protect against man-in-the-middle attacks, Ceph
provides its cephx authentication system, which authenticates users and
daemons.

Cephx uses shared secret keys for authentication, meaning both the client and
the monitor cluster have a copy of the client’s secret key.
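Cephx keys are managed with the ceph auth commands; a sketch creating and inspecting a key for a hypothetical client:

# ceph auth get-or-create client.app1 mon 'profile rbd' osd 'profile rbd pool=images'
# ceph auth get client.app1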
Ceph placement groups
A PG is a subset of a pool that serves to contain a collection of objects. Ceph
shards a pool into a series of PGs.

When a system administrator creates a pool, CRUSH creates a user-defined number of PGs for the pool. For example, a pool with 100 PGs means that each PG contains approximately 1% of the pool’s data.

The number of PGs has a performance impact when Ceph needs to move a PG
from one OSD to another OSD.
Ceph CRUSH ruleset
Ceph assigns a CRUSH ruleset to a pool. When a Ceph client stores or retrieves
data in a pool, Ceph identifies the CRUSH ruleset for storing and retrieving
data. As Ceph processes the CRUSH rule, it identifies the primary OSD that
contains the placement group for an object. That enables the client to connect
directly to the OSD, access the placement group and read or write object data.
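A replicated CRUSH rule can also be created and assigned to a pool manually; a sketch with hypothetical rule and pool names:

# ceph osd crush rule create-replicated ssd-rule default host ssd
# ceph osd pool set mypool crush_rule ssd-rule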
Ceph ObjectStore
Ceph implements several concrete methods for storing data:

● FileStore: A production-grade implementation using a filesystem to store object data.
● BlueStore: A production-grade implementation using a raw block device to store object data.
● Memstore: A developer implementation for testing read/write operations directly in RAM.
Ceph BlueStore
BlueStore stores data as:

● Object Data: In BlueStore, Ceph stores objects as blocks directly on a raw block device.
● Block Database: An object’s unique identifier is a key in the block
database. The values in the block database consist of a series of block
addresses that refer to the stored object data, the object’s placement
group, and object metadata.
● Write-ahead Log (WAL): similar to the journaling functionality of
FileStore.
Installing Ceph Storage on CentOS Linux
Ceph Storage considerations
Ceph Storage cluster considerations include:
● Understanding the hardware and network requirements.
● Understanding what types of workloads work well with a Ceph Storage cluster. Ceph Storage can be used for different workloads based on a particular business need.
● Developing a storage strategy for the data. Ceph Storage can support multiple storage strategies (IOPS-optimized, capacity-optimized).
● Data durability (replication, erasure coding).
● Faster is better. Bigger is better. High durability is better.
Docker local registry
Install Docker:
# dnf install docker-ce --nobest -y
# systemctl enable --now docker
Start a local container registry:
# docker run -d -p 4000:5000 --restart=always --name registry registry:2
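To verify the registry is reachable on the host port used throughout this deck (assuming the 4000:5000 port mapping above):

# curl http://localhost:4000/v2/_catalog

An empty registry returns {"repositories":[]}.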
Ceph Docker images
Pull the Ceph Storage image, Prometheus image, and Dashboard image:

# docker pull docker.io/ceph/daemon:latest-pacific

# docker pull docker.io/grafana/grafana:6.7.4

# docker pull docker.io/ceph/ceph-grafana:latest

# docker pull docker.io/prom/alertmanager:v0.16.2

# docker pull docker.io/prom/prometheus:v2.7.2

# docker pull docker.io/prom/node-exporter:v0.17.0


Docker Configuration
/etc/docker/daemon.json
{
"log-opts": {
"max-file": "5",
"max-size": "50m"
},
"insecure-registries" : ["192.168.1.15:4000", "10.10.1.15:4000"],
"live-restore": true
}
Docker Tag
docker image tag ceph/daemon:latest-pacific 10.20.1.33:4000/ceph/daemon:latest-pacific

docker image tag grafana/grafana:6.7.4 10.20.1.33:4000/grafana/grafana:6.7.4

docker image tag ceph/ceph-grafana:latest 10.20.1.33:4000/ceph/ceph-grafana:latest

docker image tag prom/alertmanager:v0.16.2 10.20.1.33:4000/prom/alertmanager:v0.16.2

docker image tag prom/prometheus:v2.7.2 10.20.1.33:4000/prom/prometheus:v2.7.2

docker image tag prom/node-exporter:v0.17.0 10.20.1.33:4000/prom/node-exporter:v0.17.0


Docker Push
docker image push 10.20.1.33:4000/ceph/daemon:latest-pacific

docker image push 10.20.1.33:4000/grafana/grafana:6.7.4

docker image push 10.20.1.33:4000/ceph/ceph-grafana:latest

docker image push 10.20.1.33:4000/prom/alertmanager:v0.16.2

docker image push 10.20.1.33:4000/prom/prometheus:v2.7.2

docker image push 10.20.1.33:4000/prom/node-exporter:v0.17.0


Verifying the network configuration for Ceph Storage
All Ceph Storage nodes require a public network. You must have a network
interface card configured to a public network where Ceph clients can reach
Ceph monitors and Ceph OSD nodes.

You might have a network interface card for a cluster network so that Ceph
can conduct heart-beating, peering, replication, and recovery on a network
separate from the public network.
Ansible
Ansible installation:

# dnf install python3-devel libffi-devel gcc openssl-devel python3-libselinux

# pip3 install setuptools

# pip3 install setuptools-rust

# pip3 install wheel

# pip3 install ansible==2.9.27


Ansible user with sudo access
Ansible must be able to log into all the Ceph Storage nodes as a user that has
root privileges to install software and create configuration files without
prompting for a password. You must create an Ansible user with
password-less root access on all nodes in the storage cluster when deploying
and configuring a Ceph Storage cluster with Ansible.
# cat << EOF >/etc/sudoers.d/admin
admin ALL = (root) NOPASSWD:ALL
EOF
# chmod 0440 /etc/sudoers.d/admin
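The sudoers snippet above assumes a user named admin already exists on each node; a minimal sketch to create it:

# useradd admin
# passwd admin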
Enabling password-less SSH for Ansible
Generate an SSH key pair on the Ansible administration node and distribute
the public key to each node in the storage cluster so that Ansible can access
the nodes without being prompted for a password.

[ansible@admin ~]$ ssh-keygen

[ansible@admin ~]$ ssh-copy-id admin@ceph-mon01


Installing Ceph Storage using Ansible
Use the Ansible application with the ceph-ansible playbook to install Ceph Storage on bare metal or in containers. A Ceph Storage cluster used in production must have a minimum of three monitor nodes and three OSD nodes containing multiple OSD daemons.
Ceph-Ansible
Install ceph-ansible via packages

[root@admin ~]# dnf install ceph-ansible


[root@admin ~]# cd /usr/share/ceph-ansible
[root@admin ceph-ansible]# cp group_vars/all.yml.sample group_vars/all.yml
[root@admin ceph-ansible]# cp group_vars/osds.yml.sample group_vars/osds.yml
[root@admin ceph-ansible]# cp site-container.yml.sample site-container.yml
Ceph-Ansible (2)
Install ceph-ansible via Git

$ git clone https://github.com/ceph/ceph-ansible.git


$ git checkout $branch
$ pip install -r requirements.txt
Ceph-ansible Configuration
● Open for editing the group_vars/all.yml file
monitor_interface: bond1.10
monitor_address_block: 10.10.1.0/24
is_hci: true
hci_safety_factor: 0.2
osd_memory_target: 4294967296
public_network: 10.10.1.0/24
cluster_network: 10.10.2.0/24
Ceph-ansible Configuration (2)
radosgw_interface: bond1.10

radosgw_address_block: 10.10.1.0/24

ceph_docker_image: "ceph/daemon"

ceph_docker_image_tag: latest-pacific

ceph_docker_registry: 10.10.1.15:4000

containerized_deployment: True
Ceph-ansible Configuration for Openstack
openstack_config: true
openstack_glance_pool:
name: "images"
pg_autoscale_mode: False
application: "rbd"
pg_num: 128
pgp_num: 128
target_size_ratio: 5.00
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (2)
openstack_cinder_pool:
name: "volumes"
pg_autoscale_mode: False
application: "rbd"
pg_num: 1024
pgp_num: 1024
target_size_ratio: 42.80
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (3)
openstack_nova_pool:
name: "vms"
pg_autoscale_mode: False
application: "rbd"
pg_num: 256
pgp_num: 256
target_size_ratio: 10.00
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (4)
openstack_cinder_backup_pool:
name: "backups"
pg_autoscale_mode: False
application: "rbd"
pg_num: 512
pgp_num: 512
target_size_ratio: 18.00
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (5)
openstack_gnocchi_pool:
name: "metrics"
pg_autoscale_mode: False
application: "rbd"
pg_num: 32
pgp_num: 32
target_size_ratio: 0.10
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (6)
openstack_cephfs_data_pool:
name: "cephfs_data"
pg_autoscale_mode: False
application: "cephfs"
pg_num: 256
pgp_num: 256
target_size_ratio: 10.00
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (7)
openstack_cephfs_metadata_pool:
name: "cephfs_metadata"
pg_autoscale_mode: False
application: "cephfs"
pg_num: 32
pgp_num: 32
target_size_ratio: 0.10
rule_name: "SSD"
Ceph-ansible Configuration for Openstack (8)
openstack_pools:
- "{{ openstack_glance_pool }}"
- "{{ openstack_cinder_pool }}"
- "{{ openstack_nova_pool }}"
- "{{ openstack_cinder_backup_pool }}"
- "{{ openstack_gnocchi_pool }}"
- "{{ openstack_cephfs_data_pool }}"
- "{{ openstack_cephfs_metadata_pool }}"
Ceph-ansible Configuration for Openstack (9)
openstack_keys:
- { name: client.glance, caps: { mon: "profile rbd", osd: "profile rbd pool={{
openstack_cinder_pool.name }}, profile rbd pool={{ openstack_glance_pool.name }}"}, mode: "0600" }
- { name: client.cinder, caps: { mon: "profile rbd", osd: "profile rbd pool={{
openstack_cinder_pool.name }}, profile rbd pool={{ openstack_nova_pool.name }}, profile rbd pool={{
openstack_glance_pool.name }}"}, mode: "0600" }
- { name: client.cinder-backup, caps: { mon: "profile rbd", osd: "profile rbd pool={{
openstack_cinder_backup_pool.name }}"}, mode: "0600" }
- { name: client.gnocchi, caps: { mon: "profile rbd", osd: "profile rbd pool={{
openstack_gnocchi_pool.name }}"}, mode: "0600", }
- { name: client.openstack, caps: { mon: "profile rbd", osd: "profile rbd pool={{
openstack_glance_pool.name }}, profile rbd pool={{ openstack_nova_pool.name }}, profile rbd pool={{
openstack_cinder_pool.name }}, profile rbd pool={{ openstack_cinder_backup_pool.name }}"}, mode:
"0600" }
Ceph-ansible Configuration for Dashboard
dashboard_enabled: True
dashboard_protocol: https
dashboard_port: 8443
dashboard_network: "192.168.1.0/24"
dashboard_admin_user: admin
dashboard_admin_user_ro: true
dashboard_admin_password: ertyuiop
Ceph-ansible Configuration for Dashboard (2)
dashboard_crt: '/root/work/site-central/chaininv.crt'
dashboard_key: '/root/work/site-central/cloud_cerist_dz.priv'
dashboard_grafana_api_no_ssl_verify: false
dashboard_frontend_vip: '192.168.1.1'
node_exporter_container_image:
"10.10.1.15:4000/prom/node-exporter:v0.17.0"
Ceph-ansible Configuration for Grafana
grafana_admin_password: ertyuiop
grafana_crt: '/root/work/site-central/chaininv.crt'
grafana_key: '/root/work/site-central/cloud_cerist_dz.priv'
grafana_server_fqdn: 'grafanasrv.cloud.cerist.dz'
grafana_container_image: "10.10.1.15:4000/grafana/grafana:6.7.4"
prometheus_container_image: "10.10.1.15:4000/prom/prometheus:v2.7.2"
alertmanager_container_image: "10.10.1.15:4000/prom/alertmanager:v0.16.2"
Ceph-ansible OSD Configuration
● Open for editing the group_vars/osds.yml file

copy_admin_key: true

devices:

- /dev/nvme0n1

- /dev/nvme1n1

- /dev/nvme2n1

- /dev/nvme3n1
Ceph-ansible OSD Configuration (2)
crush_rule_config: true

crush_rule_ssd:

name: SSD

root: default

type: chassis

class: ssd

default: true
Ceph-ansible OSD Configuration (3)
crush_rules:

- "{{ crush_rule_ssd }}"

create_crush_tree: true
Ceph-ansible inventory
● /etc/ansible/hosts

[mgrs]

ceph-mona

ceph-monb

ceph-monc

[mons]

ceph-mona

ceph-monb

ceph-monc
Ceph-ansible inventory (2)
[osds]

computehci01 osd_crush_location="{ 'root': 'default', 'rack': 'rack1', 'chassis': 'chassis1', 'host': 'computehci01' }"

computehci02 osd_crush_location="{ 'root': 'default', 'rack': 'rack1', 'chassis': 'chassis1', 'host': 'computehci02' }"

computehci03 osd_crush_location="{ 'root': 'default', 'rack': 'rack1', 'chassis': 'chassis1', 'host': 'computehci03' }"

computehci04 osd_crush_location="{ 'root': 'default', 'rack': 'rack1', 'chassis': 'chassis2', 'host': 'computehci04' }"
Ceph-ansible inventory (3)
[grafana-server]

ceph-mona

● Verify that Ansible can reach the Ceph nodes:

[ansible@admin ceph-ansible]$ ansible all -m ping -i /etc/ansible/hosts


Ceph container deployment
Container deployments:

[ansible@admin ceph-ansible]$ ansible-playbook site-container.yml -e container_package_name=docker-ce -e container_binary=docker -e container_service_name=docker -i /etc/ansible/hosts
Ceph container deployment (2)
Verify the status of the Ceph storage cluster

[root@mon ~]# docker exec ceph-mon-controllera ceph health

[root@mon ~]# docker exec ceph-mon-controllera ceph -s

● To mute health warnings

[root@mon ~]# docker exec ceph-mon-controllera ceph config set mon auth_allow_insecure_global_id_reclaim false
Installing Metadata servers
● Add a new section [mdss] to the /etc/ansible/hosts file
[mdss]
ceph-mona
ceph-monb
ceph-monc
● Create a copy of the group_vars/mdss.yml.sample file named mdss.yml
# cp group_vars/mdss.yml.sample group_vars/mdss.yml
$ ansible-playbook site-container.yml --limit mdss -i /etc/ansible/hosts
Installing the Ceph Object Gateway
● Add gateway hosts to the /etc/ansible/hosts file under the [rgws] section
to identify their roles to Ansible
[rgws]
ceph-mona
ceph-monb
ceph-monc
● Create the rgws.yml file from the sample file
# cp group_vars/rgws.yml.sample group_vars/rgws.yml
Installing the Ceph Object Gateway (2)
● Open and edit the group_vars/rgws.yml file

copy_admin_key: true

$ ansible-playbook site-container.yml --limit rgws -i /etc/ansible/hosts


Installing the NFS-Ganesha Gateway
The Ceph NFS Ganesha Gateway is an NFS interface built on top of the Ceph
Object Gateway to provide applications with a POSIX filesystem interface to
the Ceph Object Gateway

● Create the nfss.yml file from the sample file:

# cd /usr/share/ceph-ansible/group_vars

# cp nfss.yml.sample nfss.yml
Installing the NFS-Ganesha Gateway (2)
● Add gateway hosts to the /etc/ansible/hosts file under an [nfss] group to
identify their group membership to Ansible
[nfss]
ceph-mona
ceph-monb
● Open nfss.yml
copy_admin_key: true
$ ansible-playbook site-container.yml --limit nfss -i hosts
Adding osd
Adding new OSDs on an existing host or adding a new OSD node can be achieved by running the main playbook with the --limit Ansible option.

The command would look like the following:

# ansible-playbook -i <your-inventory> site-container.yml --limit <node>


Shrinking osd
Shrinking OSDs can be done by using the shrink-osd.yml playbook provided in
infrastructure-playbooks directory

The variable osd_to_kill is a comma-separated list of OSD IDs that must be passed to the playbook:

$ ansible-playbook -i hosts infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=1,2,3
Purging the cluster
ceph-ansible provides a playbook in infrastructure-playbooks for purging a
Ceph cluster: purge-container-cluster.yml
$ ansible-playbook -i hosts infrastructure-playbooks/purge-container-cluster.yml
Upgrading the ceph cluster
ceph-ansible provides a playbook in infrastructure-playbooks for upgrading a
Ceph cluster: rolling_update.yml

This playbook can be used for both minor upgrades (X.Y to X.Z) and major upgrades (X to Y).

Before running a major upgrade, you need to update the ceph-ansible version first.

$ ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml


OPENSTACK: THE PROJECT
Introduction Openstack
The OpenStack project is an open source cloud computing platform for all
types of clouds, which aims to be simple to implement, massively scalable,
and feature rich. Developers and cloud computing technologists from around
the world create the OpenStack project.

OpenStack provides an Infrastructure-as-a-Service (IaaS) solution through a set of interrelated services. Each service offers an Application Programming Interface (API) that facilitates this integration. Depending on your needs, you can install some or all services.
OpenStack Services
OpenStack Services (2)
OpenStack Services (3)
OpenStack Services (4)
OpenStack Services (5)
Deployment Tools
User Stories
City Network currently runs well over 10,000 VMs in eight different locations from New York to Tokyo. Their focus is the European enterprise market, where regulatory challenges often add a little extra work around how each workload is handled.

Vexxhost has a public cloud spanning over two regions, as well as numerous private
clouds that have been deployed and managed all over the world. Overall, they’ve
started managing an aggregate of over 100,000 cores.

Blizzard Entertainment currently has 12,000 compute nodes on OpenStack distributed globally, and even managed to upgrade five releases in one jump in 2019 to start using Rocky. The team at Blizzard is also dedicated to contributing upstream.
User Stories (2)
Walmart’s OpenStack private cloud has over 800,000 cores. Its team developed a
tool called Galaxy for its multi-cloud setup that minimizes the time to detect issues
on OpenStack clouds.
OVH is an OpenStack powered public cloud provider managing 27 datacenters
running over 300,000 cores in production.
CERN: After six years and 13 upgrades, the CERN cloud now covers 11 OpenStack
projects adding containers, bare metal, block, share, workflows, networking and file
system storage.
SK Telecom runs many different open infrastructure clusters based on TACO (SKT All
Container Orchestrator, it is a containerized, declarative, cloud infrastructure
lifecycle manager fully leveraging Kubernetes, OpenStack and Airship).
User Stories (3)
Adobe IT has five OpenStack clusters spread across three locations in North America
and Asia. Of these clusters, three are in production. Over the last five years it grew
1000% and presently hosts 13,000+ VMs on 500+ physical hypervisors. Their Kubernetes
implementations grew exponentially in the last few years and now account for
thousands of nodes.
Openstack : Kolla-ansible
Introduction to Kolla and Kolla-Ansible
Kolla and Kolla-Ansible are related OpenStack projects for building and
running containerized OpenStack clouds.

● Builds container images for OpenStack services.
● Follows the application container pattern.
● One application per container (neutron-server, neutron-dhcp-agent, etc.).
● Container images can be built from upstream source or binary (yum/apt).
● During builds, images can be fully modified as needed.
Kolla-Ansible
Kolla provides container images, but does not deploy or configure the services
running in containers.
Kolla-Ansible project provides opinionated (but fully customizable and
extensible) Ansible playbooks for operators to deploy OpenStack private
clouds.
● Playbooks deploy and manage OpenStack services running in containers.
● Broad support for the variety of OpenStack services.
● Template as little as possible to get to a functional deployment (avoid
"customization madness").
● Instead, use a clean and simple default configuration override approach.
Install dependencies
Install Python build dependencies:
# sudo dnf install python3-devel libffi-devel gcc openssl-devel python3-libselinux
Install pip:
# sudo dnf install python3-pip
Ensure the latest version of pip is installed:
# sudo pip3 install -U pip
Install Ansible. Kolla Ansible requires at least Ansible 2.9:
# sudo pip3 install 'ansible>=2.9,<2.10'
Install Kolla-ansible
Install kolla-ansible and its dependencies using pip:

# sudo pip3 install kolla-ansible==12.0 (For wallaby version)

Create the /etc/kolla directory:

# sudo mkdir -p /etc/kolla

# sudo chown $USER:$USER /etc/kolla


Install Kolla-ansible
Copy globals.yml and passwords.yml to /etc/kolla directory:

# cp -r /usr/local/share/kolla-ansible/etc_examples/kolla/* /etc/kolla

Copy multinode inventory files to the current directory:

# cp /usr/local/share/kolla-ansible/ansible/inventory/* .
Prepare initial configuration
The next step is to prepare our inventory file. An inventory is an Ansible file
where we specify hosts and the groups that they belong to. We can use this to
define node roles and access credentials.
Inventory
● Edit the first section of multinode
[control]
192.168.2.23
192.168.2.27
192.168.2.31

[network]
192.168.2.23
192.168.2.31

[compute]
192.168.2.2
192.168.2.3
192.168.2.4
Inventory (2)
● Check whether the configuration of inventory is correct or not, run:

# ansible -i multinode all -m ping


Kolla passwords
Passwords used in our deployment are stored in /etc/kolla/passwords.yml file.
All passwords are blank in this file and have to be filled either manually or by
running random password generator:

# kolla-genpwd
Kolla options: globals.yml
globals.yml is the main configuration file for Kolla Ansible. There are a few
options that are required to deploy Kolla Ansible:
###############
# Kolla options
###############
# Valid options are ['centos', 'debian', 'rhel', 'ubuntu']
kolla_base_distro: "centos"
# Valid options are [ binary, source ]
kolla_install_type: "source"
openstack_release: "wallaby"
kolla_internal_vip_address: "10.10.3.1"
kolla_internal_fqdn: "dashint.cloud.cerist.dz"
kolla_external_vip_address: "193.194.66.1"
kolla_external_fqdn: "dash.cloud.cerist.dz"
Docker options: globals.yml
docker_registry: 192.168.1.15:4000

#docker_registry_insecure: "{{ 'yes' if docker_registry else 'no' }}"

#docker_registry_username:

# docker_registry_password is set in the passwords.yml file.

# Namespace of images:

#docker_namespace: "kolla"
Networking Options: globals.yml
network_interface: "bond0"

kolla_external_vip_interface: "bond1"

api_interface: "bond1.30"

storage_interface: "bond1.10"

tunnel_interface: "bond1.40"

neutron_external_interface: "bond2"

# Valid options are [ openvswitch, ovn, linuxbridge, vmware_nsxv, vmware_nsxv3, vmware_dvs ]

neutron_plugin_agent: "openvswitch"
keepalived options: globals.yml
keepalived_virtual_router_id: "51"
TLS options: globals.yml
kolla_enable_tls_internal: "yes"
kolla_enable_tls_external: "yes"
# node_config=/etc/kolla
kolla_certificates_dir: "{{ node_config }}/certificates"
kolla_external_fqdn_cert: "{{ kolla_certificates_dir }}/haproxy.pem"
kolla_internal_fqdn_cert: "{{ kolla_certificates_dir }}/haproxy-internal.pem"
kolla_admin_openrc_cacert: "{{ kolla_certificates_dir }}/ca.pem"
kolla_copy_ca_into_containers: "yes"
Backend TLS options: globals.yml
kolla_enable_tls_backend: "yes"

kolla_verify_tls_backend: "no"

kolla_tls_backend_cert: "{{ kolla_certificates_dir }}/backend-cert.pem"

kolla_tls_backend_key: "{{ kolla_certificates_dir }}/backend-key.pem"


OpenStack options: globals.yml
# Enable core OpenStack services. This includes: glance, keystone, neutron, nova, heat, and horizon.

enable_openstack_core: "yes"

enable_glance: "{{ enable_openstack_core | bool }}"

enable_hacluster: "yes"

enable_haproxy: "yes"

enable_aodh: "yes"

enable_barbican: "yes"

enable_ceilometer: "yes"
OpenStack options (2) : globals.yml
enable_cinder: "yes"

enable_cinder_backup: "yes"

enable_designate: "yes"

enable_gnocchi: "yes"

enable_gnocchi_statsd: "yes"

enable_magnum: "yes"

enable_manila: "yes"

enable_manila_backend_generic: "yes"

enable_mariabackup: "yes"
OpenStack options (3) : globals.yml
enable_masakari: "yes"
enable_neutron_vpnaas: "yes"
enable_neutron_qos: "yes"
enable_neutron_agent_ha: "yes"
enable_neutron_provider_networks: "yes"
enable_neutron_segments: "yes"
enable_octavia: "yes"
enable_trove: "yes"
Ceph options : globals.yml
external_ceph_cephx_enabled: "yes"
# Glance
ceph_glance_keyring: "ceph.client.glance.keyring"
ceph_glance_user: "glance"
ceph_glance_pool_name: "images"
Ceph options (2) : globals.yml
# Cinder

ceph_cinder_keyring: "ceph.client.cinder.keyring"

ceph_cinder_user: "cinder"

ceph_cinder_pool_name: "volumes"

ceph_cinder_backup_keyring: "ceph.client.cinder-backup.keyring"

ceph_cinder_backup_user: "cinder-backup"

ceph_cinder_backup_pool_name: "backups"
Ceph options (3) : globals.yml
# Nova

ceph_nova_keyring: "{{ ceph_cinder_keyring }}"

ceph_nova_user: "cinder"

ceph_nova_pool_name: "vms"
Ceph options (4) : globals.yml
# Gnocchi

ceph_gnocchi_keyring: "ceph.client.gnocchi.keyring"

ceph_gnocchi_user: "gnocchi"

ceph_gnocchi_pool_name: "metrics"

# Manila

ceph_manila_keyring: "ceph.client.manila.keyring"

ceph_manila_user: "manila"
Ceph options (5) : globals.yml
# Glance - Image Options
# Configure image backend.
glance_backend_ceph: "yes"
glance_backend_file: "no"
# Gnocchi options
gnocchi_backend_storage: "ceph"
Ceph options (6) : globals.yml
# Cinder - Block Storage Options

cinder_backend_ceph: "yes"

cinder_backup_driver: "ceph"

# Nova - Compute Options

nova_backend_ceph: "yes"

nova_compute_virt_type: "kvm"
Openstack/Ceph Integration
Deployment
After the configuration is set, we can proceed to the deployment phase. First we need to set up basic host-level dependencies, like Docker. Kolla Ansible provides a playbook that will install all required services in the correct versions.
● pull images for containers
# kolla-ansible pull
● Bootstrap servers with kolla deploy dependencies:
# kolla-ansible -i ./multinode bootstrap-servers
● Do pre-deployment checks for hosts:
# kolla-ansible -i ./multinode prechecks
Deployment (2)
● Finally proceed to actual OpenStack deployment:
# kolla-ansible -i ./multinode deploy
● OpenStack requires an openrc file where credentials for admin user are set. To
generate this file:
# kolla-ansible -i ./multinode post-deploy
● Install the OpenStack CLI client:
# sudo pip3 install python-openstackclient
● There is a script that will create example networks, images, and so on:
# /usr/local/share/kolla-ansible/init-runonce
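Once the openrc file has been generated, the CLI can be used against the new cloud; a sketch assuming the example resources created by init-runonce (cirros image, m1.tiny flavor, demo-net network):

# source /etc/kolla/admin-openrc.sh
# openstack service list
# openstack server create --image cirros --flavor m1.tiny --network demo-net demo1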
Kolla Ansible CLI
kolla-ansible -i INVENTORY deploy is used to deploy and start all Kolla containers.

kolla-ansible -i INVENTORY destroy is used to clean up containers and volumes in the cluster.

kolla-ansible -i INVENTORY mariadb_recovery is used to recover a completely stopped mariadb cluster.

kolla-ansible -i INVENTORY prechecks is used to check if all requirements are met before deploying each of the OpenStack services.

kolla-ansible -i INVENTORY post-deploy is used to do post-deploy tasks on the deploy node to get the admin openrc file.
Kolla Ansible CLI (2)
kolla-ansible -i INVENTORY pull is used to pull all images for containers.

kolla-ansible -i INVENTORY reconfigure is used to reconfigure OpenStack services.

kolla-ansible -i INVENTORY upgrade is used to upgrade an existing OpenStack environment.

kolla-ansible -i INVENTORY check is used to do post-deployment smoke tests.

kolla-ansible -i INVENTORY stop is used to stop running containers.

kolla-ansible -i INVENTORY prune-images is used to prune orphaned Docker images on hosts.
Advanced Configuration
OpenStack Service Configuration in Kolla
Kolla allows the operator to override configuration of services. Kolla will
generally look for a file in /etc/kolla/config/<< config file >>,
/etc/kolla/config/<< service name >>/<< config file >> or
/etc/kolla/config/<< service name >>/<< hostname >>/<< config file >>
OpenStack Service Configuration in Kolla (2)
For example, in the case of nova.conf the following locations are supported,
assuming that you have services using nova.conf running on hosts called
controller01, controller02 and controller03:

● /etc/kolla/config/nova.conf
● /etc/kolla/config/nova/controller01/nova.conf
● /etc/kolla/config/nova/controller02/nova.conf
● /etc/kolla/config/nova/controller03/nova.conf
● /etc/kolla/config/nova/nova-scheduler.conf
OpenStack Service Configuration in Kolla (3)
If the operator wants to configure compute node cpu and ram allocation ratio
on host compute05, the operator needs to create file
/etc/kolla/config/nova/compute05/nova.conf with content:

[DEFAULT]
cpu_allocation_ratio = 16.0
ram_allocation_ratio = 5.0

# kolla-ansible -i ./multinode reconfigure --limit compute05


OpenStack Service Configuration in Kolla (4)
Kolla allows the operator to override configuration globally for all services. It
will look for a file called /etc/kolla/config/global.conf.

For example to modify database pool size connection for all services, the
operator needs to create /etc/kolla/config/global.conf with content:

[database]
max_pool_size = 100
TLS
When an OpenStack service exposes an API endpoint, Kolla Ansible will
configure HAProxy for that service to listen on the internal and/or external VIP
address. The HAProxy container load-balances requests on the VIPs to the
nodes running the service container.
There are two different layers of TLS configuration for OpenStack APIs:
● Enabling TLS on the internal and/or external VIP, so communication
between an OpenStack client and the HAProxy listening on the VIP is
secure.
● Enabling TLS on the backend network, so communication between
HAProxy and the backend API services is secure.
TLS (2)
Generating a Private Certificate Authority

# kolla-ansible -i multinode certificates

The certificates role performs the following actions:

● Generates a test root Certificate Authority
● Generates the internal/external certificates, which are signed by the root CA
● If back-end TLS is enabled, generates the back-end certificate signed by the root CA
TLS (3)
The combined certificate will be generated and stored in the
/etc/kolla/certificates/ directory, and a copy of the CA certificate (root.crt)
will be stored in the /etc/kolla/certificates/ca/ directory
MariaDB database backup and restore
Kolla Ansible can facilitate either full or incremental backups of data hosted in
MariaDB. It achieves this using Mariabackup, a tool designed to allow for ‘hot
backups’ - an approach which means that consistent backups can be taken
without any downtime for your database or your cloud.

By default, backups will be performed on the first node in your Galera cluster
or on the MariaDB node itself if you just have the one. Backup files are saved
to a dedicated Docker volume - mariadb_backup - and it’s the contents of this
that you should target for transferring backups elsewhere.
Enabling Backup Functionality
For backups to work, some reconfiguration of MariaDB is required - this is to
enable appropriate permissions for the backup client, and also to create an
additional database in order to store backup information.

Firstly, enable backups via globals.yml:

enable_mariabackup: "yes"

# kolla-ansible -i INVENTORY reconfigure -t mariadb


Backup Procedure
To perform a full backup, run the following command:
# kolla-ansible -i INVENTORY mariadb_backup
Or to perform an incremental backup:
# kolla-ansible -i INVENTORY mariadb_backup --incremental
Kolla doesn’t currently manage the scheduling of these backups, so you’ll need to configure an appropriate scheduler (e.g. cron) to run these commands on your behalf should you require regular snapshots of your data. A suggested schedule would be:
● Daily full, retained for two weeks
● Hourly incremental, retained for one day
Restoring Full backups
# docker run --rm -it --volumes-from mariadb --name dbrestore --volume \
mariadb_backup:/backup kolla/centos-binary-mariadb:wallaby /bin/bash

(dbrestore) $ cd /backup

(dbrestore) $ rm -rf /backup/restore

(dbrestore) $ mkdir -p /backup/restore/full

(dbrestore) $ gunzip mysqlbackup-04-10-20.qp.xbc.xbs.gz

(dbrestore) $ mbstream -x -C /backup/restore/full/ < mysqlbackup-04-10-20.qp.xbc.xbs

(dbrestore) $ mariabackup --prepare --target-dir /backup/restore/full


Restoring Full backups (2)
# docker stop mariadb
# docker run --rm -it --volumes-from mariadb --name dbrestore --volume
mariadb_backup:/backup kolla/centos-binary-mariadb:wallaby /bin/bash
(dbrestore) $ rm -rf /var/lib/mysql/*
(dbrestore) $ rm -rf /var/lib/mysql/\.[^\.]*
(dbrestore) $ mariabackup --copy-back --target-dir /backup/restore/full
# docker start mariadb
# docker logs mariadb
Restoring Incremental backups
# docker run --rm -it --volumes-from mariadb --name dbrestore --volume \
mariadb_backup:/backup --tmpfs /backup/restore kolla/centos-binary-mariadb:wallaby \
/bin/bash

(dbrestore) $ cd /backup

(dbrestore) $ rm -rf /backup/restore

(dbrestore) $ mkdir -p /backup/restore/full

(dbrestore) $ mkdir -p /backup/restore/inc

(dbrestore) $ gunzip mysqlbackup-06-11-20-1541505206.qp.xbc.xbs.gz


Restoring Incremental backups (2)
(dbrestore) $ gunzip incremental-11-mysqlbackup-06-11-20-1541505223.qp.xbc.xbs.gz
(dbrestore) $ mbstream -x -C /backup/restore/full/ < \
mysqlbackup-06-11-20-1541505206.qp.xbc.xbs
(dbrestore) $ mbstream -x -C /backup/restore/inc < \
incremental-11-mysqlbackup-06-11-20-1541505223.qp.xbc.xbs
(dbrestore) $ mariabackup --prepare --target-dir /backup/restore/full
(dbrestore) $ mariabackup --prepare --incremental-dir=/backup/restore/inc --target-dir
/backup/restore/full
Troubleshooting
The status of containers after deployment can be determined on the
deployment targets by executing:

# docker ps -a

The logs can be examined by executing:

# docker logs <container-name>

Container shell access:

# docker exec -it fluentd bash


Troubleshooting (2)
The log volume “kolla_logs” is linked to
/var/lib/docker/volumes/kolla_logs/_data on the host. You can find all kolla
logs in there.

When enable_central_logging is enabled, to view the logs in a web browser using Kibana, go to:

https://<kolla_external_vip_address>:5601

Authenticate using <kibana_user> and <kibana_password>.

<kibana_password> can be found in /etc/kolla/passwords.yml

Troubleshooting (3)
When Kibana is opened for the first time, it requires creating a default index
pattern. To view, analyse and search logs, at least one index pattern has to be
created. To match indices stored in ElasticSearch, we suggest using the
following configuration:

● Index pattern - flog-*
● Time Filter field name - @timestamp
● Expand index pattern when searching [DEPRECATED] - not checked
● Use event times to create index names [DEPRECATED] - not checked

After setting parameters, one can create an index with the Create button.
Upgrade procedure
The kolla-ansible package itself should be upgraded first. This will include reviewing some of the configuration and inventory files. On the operator/master node, a backup of the /etc/kolla directory may be desirable.
# pip install --upgrade kolla-ansible==13.0
Files that need manual updating are:
● /etc/kolla/globals.yml
● /etc/kolla/passwords.yml
Upgrade procedure (2)
Run the command to pull the updated images

# kolla-ansible pull

Perform the Upgrade

# kolla-ansible -i multinode upgrade


Passwords
kolla-mergepwd is used to merge passwords from an old installation with newly generated passwords during an upgrade of a Kolla release. The workflow is:

● Save old passwords from /etc/kolla/passwords.yml into passwords.yml.old.
● Generate new passwords via kolla-genpwd as passwords.yml.new.
● Merge passwords.yml.old and passwords.yml.new into /etc/kolla/passwords.yml.
Passwords (2)
# mv /etc/kolla/passwords.yml passwords.yml.old

# cp kolla-ansible/etc/kolla/passwords.yml passwords.yml.new

# kolla-genpwd -p passwords.yml.new

# kolla-mergepwd --old passwords.yml.old --new passwords.yml.new --final /etc/kolla/passwords.yml
Tools
Kolla ships with several utilities intended to facilitate ease of operation.
tools/cleanup-containers is used to remove deployed containers from the system.
This can be useful when you want to do a new clean deployment. It will preserve the
registry and the locally built images in the registry, but will remove all running Kolla
containers from the local Docker daemon. It also removes the named volumes.
tools/cleanup-host is used to remove remnants of network changes triggered on
the Docker host when the neutron-agents containers are launched. This can be
useful when you want to do a new clean deployment, particularly one changing the
network topology.
tools/cleanup-images --all is used to remove all Docker images built by Kolla from
the local Docker cache.
Maintenance
Here’s a quick list of various to-do items for each hour, day, week, month, and
year. Please note that these tasks are neither required nor definitive but
helpful ideas:

● Hourly
○ Check your monitoring system for alerts and act on them.
○ Check your ticket queue for new tickets.
● Daily
○ Check for instances in a failed or weird state and investigate why.
○ Check for security patches and apply them as needed.
Maintenance (2)
● Weekly
○ Check cloud usage:
○ User quotas
○ Disk space
○ Image usage
○ Large instances
○ Network usage (bandwidth and IP usage)
○ Verify your alert mechanisms are still working.
● Monthly
○ Check usage and trends over the past month.
○ Check for user accounts that should be removed.
○ Check for operator accounts that should be removed.
Maintenance (3)
● Quarterly
○ Review usage and trends over the past quarter.
○ Prepare any quarterly reports on usage and statistics.
○ Review and plan any necessary cloud additions.
○ Review and plan any major OpenStack upgrades.
● Semiannually
○ Upgrade OpenStack.
○ Clean up after an OpenStack upgrade (any unused or new services to be aware of?).
