
Ceph Storage: Serving and Operating It as Part of a PaaS

Hyun Ha @ naver
In-House Platform: Computing, Tools, Monitoring, Security, Storage, DB

In-House PaaS Platform - PASTA: Computing, Tools, Monitoring, Security, Storage, DB
Goal: In-House PaaS Platform - PASTA, containerized (stateless / stateful)

Mission: Persistent Volumes for (stateful) containers

(Diagram: containers store their data on Ceph-backed persistent volumes.)
Why Ceph?

• Strong consistency → Block, Filesystem
• Open-source ecosystem
• In the Linux kernel source
• Kernel mount / FUSE mount / RBD kernel map


PaaS platforms where Ceph is applied

• Docker Swarm Farm (RBD, CephFS)
• Jenkins Farm (RBD)
• Elasticsearch Farm (RBD)
• Deep storage for Druid (CephFS)
• etc.
Use Case #1: Service Creation Flow

Components: Client, PASTA, Keystone, Cinder, Ceph, Docker Swarm, Docker Registry, and on each host the Docker volume plugin, /dev/rbd0, and the containers.

1. Login (Client → PASTA)
2. Authentication / authorization (Keystone)
3. Service creation request
4. Container creation (Docker Swarm)
5. Image download / container creation
6. Volume creation request
7. Authentication / authorization
8. Volume creation
9. Attach
10. mkfs, mount
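From the client's point of view the whole flow is driven through the Docker volume plugin. As a rough illustration only — the plugin name "rbd" and the option names below are assumptions, not the in-house plugin's actual API — creating a service backed by an RBD volume looks something like:

$ docker volume create --driver rbd --opt size=10G vol01
$ docker service create --name myapp \
      --mount type=volume,source=vol01,target=/data,volume-driver=rbd \
      myimage:latest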
Use Case #2: Ceph UI
Use Case #3: Ceph CLI

$ ceph-cli -h
NAME:
   ceph-cli - use ceph volume for your PM/VM!
USAGE:
   ceph-cli [global options] command [command options] [arguments...]
COMMANDS:
   auth    Certify with `TOKEN`
   show    Gets detailed information about the given `VOLUME ID` or `VOLUME NAME`
   list    Get volume list
   create  Create volume
   delete  Delete volume by `VOLUME ID` or `VOLUME NAME`
   attach  Attach volume
   detach  Detach volume
   extend  Extend volume
   reset   Reset volume
GLOBAL OPTIONS:
   --debug, -d  Enable debug logging
   --help, -h   Show help
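A typical session with the in-house CLI might look like the following; the exact flags and argument forms are illustrative guesses based on the command list above, not the tool's documented syntax:

$ ceph-cli auth ${TOKEN}       # authenticate first
$ ceph-cli create vol01 10     # create a 10 GB volume (argument form assumed)
$ ceph-cli attach vol01        # map it on the current host
$ ceph-cli show vol01          # inspect the result
$ ceph-cli list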
Ceph - Components

(Diagram: MON, MGR, MDS, and OSD daemons distributed across the cluster hosts.)
Ceph – Pools

(Diagram: one Ceph cluster serving an SSD pool, an HDD pool, and a hybrid pool to applications, with SSD OSDs (osd.0, osd.1) and HDD OSDs (osd.2–osd.5) spread across Host #1–#3.)

Uses the Device Classes feature, available from Luminous onwards.
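For reference, a minimal sketch of how SSD and HDD pools can be separated with device classes on Luminous or later; the rule and pool names are made up for illustration:

# create CRUSH rules restricted to a device class, then point each pool at its rule
$ ceph osd crush rule create-replicated ssd_rule default host ssd
$ ceph osd crush rule create-replicated hdd_rule default host hdd
$ ceph osd pool set ssd_pool crush_rule ssd_rule
$ ceph osd pool set hdd_pool crush_rule hdd_rule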


Operations / Troubleshooting
• Multi-mapped volumes
• Upgrading Ceph
• Network failures
• scrub / deep-scrub
• RBD image recovery
• Monitor failure / recovery
ISSUE #1: Multi-Mapped Volume

A volume is mapped to a container on Host #1. When Host #1 hangs or is cut off from the network, its container is rescheduled to another host and the same volume is mapped there as well. Once Host #1 recovers, the volume ends up mapped from two hosts at the same time.
Preventing multi-map by blacklisting the host

• Add blacklist

$ ceph osd blacklist add ${client_ip} 10
blacklisting ${client_ip}:0/0 until 2018-05-02 (10 sec)

$ ceph osd blacklist ls
listed 1 entries
${client_ip}:0/0 2018-05-02 15:42:12.935377

• Automation: a Docker event triggers a Lambda function, which adds the blacklist entry in Ceph (Docker → Lambda → Ceph); a sketch of the call follows.
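A minimal sketch of what such an automation hook could run when a node is declared unhealthy; the variables and the 600-second duration are placeholders, and the trigger wiring (Docker events, the Lambda function) is out of scope here:

# blacklist the failed client before its volume is re-mapped elsewhere
$ ceph osd blacklist add ${failed_client_ip} 600
# confirm the stale watcher is gone before mapping on the new host
$ rbd -p volumes status ${volume_uuid}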


Managing Multi-Mapped Volumes

• Checking for multi-mapped volumes

$ rbd -p volumes status ${volume-uuid}
Watchers:
    watcher=${client_host_1_ip}:0/1015181303 client.259635408 cookie=18446462598732840991
    watcher=${client_host_2_ip}:0/4152018459 client.201522571 cookie=18446462598732841309

• Monitoring / alarm (see the sketch below)
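A minimal monitoring sketch, assuming the images live in a pool named "volumes": it flags any image watched by more than one client. A real alarm would feed this into the monitoring system instead of printing.

# alert on volumes that are mapped (watched) by more than one client
for vol in $(rbd -p volumes ls); do
    n=$(rbd -p volumes status "${vol}" | grep -c 'watcher=')
    [ "${n}" -gt 1 ] && echo "ALERT: ${vol} is watched by ${n} clients"
done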
ISSUE #2: Upgrading Ceph

Upgrade policy:

OpenStack releases: Mitaka (2016.4), Newton (2016.10), Ocata (2017.2), Pike (2017.8), Queens (2018.2), Rocky (2018.8)

Our upgrade path: Mitaka → Newton (2017.2) → Jewel (2017.3) → Ocata (2017.10) → Luminous (2018.4)

Ceph releases: Jewel LTS v10.2.0 (2016.4), Kraken (2017.4, EOL), Luminous LTS v12.2.0 (2017.8), Mimic v13.2.0 (2018.6.1), Nautilus LTS (planned)
Upgrade preparation:

• ceph/src/vstart.sh
$ MON=1 MDS=1 ../src/vstart.sh -d -n -x

• Ceph-ansible (https://github.com/ceph/ceph-ansible)

• Kolla (https://github.com/openstack/kolla)

(https://review.openstack.org/#/c/566810/)
Points to watch: Kolla

✓ ceph osd set noout
✓ ceph osd set norebalance
✓ Deploy one by one
✓ Health check (see the sketch below)
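A minimal sketch of the flags and checks around a rolling upgrade, assuming daemons are redeployed one at a time; this is the manual equivalent, not the Kolla playbook itself:

# before touching any daemon
$ ceph osd set noout
$ ceph osd set norebalance

# upgrade/redeploy one daemon at a time, waiting for health in between
$ ceph health detail

# once every daemon is upgraded
$ ceph osd unset norebalance
$ ceph osd unset noout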
Points to watch: Configuration

• osd crush update on start = false (default: true)
• osd class update on start = false (default: true)
• osd_beacon_report_interval = 200 (default: 300)
• mon_osd_report_timeout = 300 (default: 900)
• ceph osd set-full-ratio 0.95
• ceph osd set-backfillfull-ratio 0.90
• ceph osd set-nearfull-ratio 0.70

✓ Check: custom configuration vs. default configuration (see the check below)
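If your release supports the admin-socket config diff command, one quick way to see exactly which options differ from the built-in defaults on a given daemon (the daemon name here is just an example):

# show settings that deviate from the defaults on osd.0
$ ceph daemon osd.0 config diff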


ISSUE #3: Network Failures

When the network fails

Ceph is network-attached storage! When a client loses its network connection, the kernel client (libceph) behind /dev/rbd0 keeps retrying the monitors and floods the log:

libceph: mon1 socket closed (con state CONNECTING)
libceph: mon2 socket closed (con state CONNECTING)
libceph: mon2 socket closed (con state CONNECTING)
libceph: mon0 socket closed (con state CONNECTING)
libceph: mon1 socket closed (con state CONNECTING)
...
When the network fails

http://tracker.ceph.com/issues/20927#change-96952
The same issue in Rook (storage orchestration for Kubernetes):
only rbd force unmap is supported; full-force unmap will be applied once its development is complete.

# works only when there is no in-flight I/O
$ rbd unmap -o force

# works even with in-flight I/O
$ rbd unmap -o full-force

https://github.com/rook/rook/pull/1179/files#diff-dabdd325e9ee838bb51e4f3f6c5b046cR142
How did we solve it? (Linux kernel 4.11 or later)

Use the rbd kernel (sysfs) map instead of rbd map, and apply the "osd_request_timeout" option (3600 seconds).

http://docs.ceph.com/docs/argonaut/rbd/rbd-ko/
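As a rough sketch only: with the sysfs interface (shown in full on the next slides), the timeout can be passed in the option string. The monitor address, credentials, pool, and image names below are placeholders, and the exact option syntax should be checked against your kernel version:

# kernel map with osd_request_timeout so I/O errors out instead of hanging forever
$ echo "${mon_ip} name=admin,key=client.admin,osd_request_timeout=3600 volumes vol01" \
      > /sys/bus/rbd/add_single_major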
Issue 1 with rbd kernel map:
the keyring is exposed as-is on the client

$ cat /sys/bus/rbd/devices/0/config_info
[mon_ip] name=admin,secret=AQDnZHxxAAy4OUSreyDDE6YMwKOT4Bug==

➢ Solution: use keyutils

$ cat config_info
[mon_ip] name=admin,key=client.admin volumes vol01 -
Issue 2 with rbd kernel map:
each device gets a different major number

$ ls -al /dev/rbd*
brw------- 1 root root 252, 0 Dec 5 21:16 /dev/rbd0
brw------- 1 root root 251, 0 Dec 5 21:16 /dev/rbd1
brw------- 1 root root 242, 0 Dec 5 21:17 /dev/rbd10
brw------- 1 root root 241, 0 Dec 7 15:43 /dev/rbd11
brw------- 1 root root 240, 0 Dec 18 11:49 /dev/rbd12

➢ Solution: use single_major

$ echo "${mon_ip} name=admin,secret=*** volumes vol01" > /sys/bus/rbd/add_single_major
$ ls -la /dev/rbd*
brw-rw----. 1 root disk 252, 0 Feb 8 02:14 /dev/rbd0
brw-rw----. 1 root disk 252, 16 Feb 8 02:13 /dev/rbd1
brw-rw----. 1 root disk 252, 32 Feb 8 02:14 /dev/rbd2
brw-rw----. 1 root disk 252, 48 Feb 8 02:20 /dev/rbd3
brw-rw----. 1 root disk 252, 64 Feb 8 02:29 /dev/rbd4
ISSUE #4: scrub / deep-scrub

Performance impact during deep scrub

"3 slow requests are blocked > 32 sec"

(Chart: disk read throughput in MB/s vs. the number of PGs being deep-scrubbed concurrently.)
Configuration & manual schedule

➢ Configuration

• osd scrub chunk min = 1
• osd scrub chunk max = 1

➢ Set noscrub / nodeep-scrub

$ ceph osd pool set $pool nodeep-scrub

➢ Manual schedule (a sketch follows)

• Scrub: every PG is scrubbed once every 2 days
• Deep scrub: every PG is deep-scrubbed once every 34 days (at most 1 at a time)
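A minimal sketch of what such a manual scheduler can do, assuming it runs periodically from cron and jq is available; the JSON layout of `ceph pg dump` differs between releases, so the filter may need adjusting:

# deep-scrub exactly one PG per run: the one whose last deep scrub is oldest
pgid=$(ceph pg dump -f json 2>/dev/null \
    | jq -r '[.pg_stats[]?, .pg_map.pg_stats[]?] | sort_by(.last_deep_scrub_stamp) | .[0].pgid')
ceph pg deep-scrub "${pgid}"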
ISSUE #5: RBD Image Recovery

How Ceph stores objects - directory layout

/var/lib/ceph/osd/ceph-4/current/3.72_head/
  (OSD dir / OSD ID "ceph-4" / PG ID "3.72")

rbd_data.2576d643c9869.0000000000000000__head_22269772__3
  (block_name_prefix "2576d643c9869" / sequence number / hash "22269772" / pool ID "3")

How to recover Ceph data

1. Find the rbd image's block_name_prefix

$ find /var/lib/ceph/osd -type f -name *vol01*
/var/lib/ceph/osd/ceph-4/current/3.7a_head/rbd_id.vol01__head_E10E397A__3

$ hexdump -C /var/lib/ceph/osd/ceph-4/current/3.7a_head/rbd\\uid.vol01__head_E10E397A__3
00000000 0d 00 00 00 32 35 37 36 64 36 34 33 63 39 38 36 |....2576d643c986|
00000010 39 |9|

block_name_prefix: 2576d643c9869

2. Get every object with that block_name_prefix

3. Create an image from the collected objects and create the device (a sketch follows)
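A minimal sketch of steps 2-3, assuming the cluster still serves reads (so rados can fetch the objects), a pool named "volumes", the default 4 MiB object size, and a made-up image size; with a dead cluster the same idea applies to the object files collected from the OSD directories:

# reassemble an RBD image from its rados objects into a flat file
prefix=2576d643c9869                      # from step 1
size=10737418240                          # image size in bytes (example: 10 GiB)
obj_size=$((4 * 1024 * 1024))             # default RBD object size (4 MiB)
truncate -s "${size}" vol01.img
for obj in $(rados -p volumes ls | grep "^rbd_data\.${prefix}\."); do
    seq=$((16#${obj##*.}))                # hex sequence number at the end of the object name
    rados -p volumes get "${obj}" /tmp/obj
    dd if=/tmp/obj of=vol01.img bs=${obj_size} seek=${seq} conv=notrunc status=none
done
# the flat file can then be imported back:  rbd import vol01.img volumes/vol01-restored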


ISSUE #6: Monitor Failure / Recovery

Mon failure scenario

With 5 monitors, losing a minority keeps quorum: Service: Fine.
Losing a majority of the monitors breaks quorum: Service: Failure
(client I/O keeps working, but rbd map/unmap is no longer possible).
Mon failure recovery

1. Stop all mons and shrink the monmap to a single mon

$ monmaptool --rm ${mon_id} /tmp/monmap

2. Inject the new monmap

$ ceph-mon --inject-monmap /tmp/oo

3. Then redeploy all the mons to form a 5-member quorum again (a fuller sketch follows)
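For reference, a hedged end-to-end sketch of that single-mon recovery, assuming systemd-managed mons, a surviving monitor with id "mon0", and dead peers "mon1" and "mon2"; the ids and paths are placeholders:

# on every mon host: stop the monitor daemons
systemctl stop ceph-mon.target

# on the surviving mon: extract its map, drop the dead peers, inject it back, restart
ceph-mon -i mon0 --extract-monmap /tmp/monmap
monmaptool --rm mon1 --rm mon2 /tmp/monmap
ceph-mon -i mon0 --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon0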


Mon failure recovery

What if every mon is dead?

Restore from a backup (local and remote backups of the mon store).
Mon backup

(Diagram: /var/lib/ceph/mon/ceph-{mon_id} on each mon is backed up to a local directory and to an RBD device in a backup pool on a remote Ceph cluster.)

Backup policy:
- Backup schedule: twice a day (02:00 and 03:00)
- Retention: 1 week
- Two mons take turns performing the backup
- Remote backup goes to a backup_pool on a different Ceph cluster
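A minimal sketch of what one backup run could look like, assuming systemd-managed mons, a /local_backup directory, and an already-mapped RBD device from the remote cluster's backup_pool mounted at /remote_backup; whether the mon is stopped during the copy is a local policy choice:

# back up this monitor's store (mon id, paths, and schedule are placeholders)
systemctl stop ceph-mon@${mon_id}
tar czf /local_backup/mon-${mon_id}-$(date +%F-%H%M).tar.gz /var/lib/ceph/mon/ceph-${mon_id}
systemctl start ceph-mon@${mon_id}

# copy to the remote backup volume and prune anything older than a week
cp /local_backup/mon-${mon_id}-*.tar.gz /remote_backup/
find /local_backup /remote_backup -name 'mon-*.tar.gz' -mtime +7 -delete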
Thank you.

Q&A

하 현 / Hyun Ha

hyun.ha@navercorp.com
