2021.05.10
a. Problem: rook-ceph-osd-19-5b8c7f4787-klrfr POD status is CrashLoopBackOff
- Environment
Kubernetes 1.16.15, Rook Ceph 1.3.8
[iap@iap01 ~]$ k get pod -n rook-ceph -o wide | egrep 'NAME|osd-[0-9]'
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-12-686858c5dd-hsxh7 1/1 Running 1 37h 10.244.10.105 iap10 <none> <none>
rook-ceph-osd-13-584d4ff974-wdtq9 1/1 Running 1 37h 10.244.10.125 iap10 <none> <none>
rook-ceph-osd-14-7696cd8cbc-g228g 1/1 Running 1 37h 10.244.10.109 iap10 <none> <none>
rook-ceph-osd-15-6f944f9fb8-lxmrb 1/1 Running 1 37h 10.244.10.108 iap10 <none> <none>
rook-ceph-osd-16-75cf4c8897-v545x 1/1 Running 51 28d 10.244.10.111 iap11 <none> <none>
rook-ceph-osd-17-69dd5bb885-fp6ht 1/1 Running 14 28d 10.244.10.113 iap11 <none> <none>
rook-ceph-osd-18-8c4c6f89-dvdqv 1/1 Running 16 28d 10.244.10.117 iap11 <none> <none>
rook-ceph-osd-19-5b8c7f4787-klrfr 0/1 CrashLoopBackOff 214 28d 10.244.10.119 iap11 <none> <none>
[iap@iap01 ~]$
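The cluster's own view of the OSD can be cross-checked from the toolbox pod. A minimal sketch, assuming the standard rook-ceph-tools deployment (label app=rook-ceph-tools) from the Rook examples is installed:
[iap@iap01 ~]$ TOOLS=$(k get pod -n rook-ceph -l app=rook-ceph-tools -o name)
[iap@iap01 ~]$ k exec -it $TOOLS -n rook-ceph -- ceph health detail
[iap@iap01 ~]$ k exec -it $TOOLS -n rook-ceph -- ceph osd tree | egrep 'STATUS|osd.19'
A crash-looping OSD pod normally shows up here as osd.19 being down.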
b. Cause analysis
- Error message
[iap@iap01 ~]$ k logs rook-ceph-osd-19-5b8c7f4787-klrfr -n rook-ceph
debug 2021-05-08 01:57:00.979 7f6f2781da80 0 set uid:gid to 167:167 (ceph:ceph)
debug 2021-05-08 01:57:00.979 7f6f2781da80 0 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable), process ceph-osd, pid 84758
debug 2021-05-08 01:57:00.979 7f6f2781da80 0 pidfile_write: ignore empty --pid-file
debug 2021-05-08 01:57:01.015 7f6f2781da80 -1 bluestore(/var/lib/ceph/osd/ceph-19/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-19/block: (5) Input/output error
debug 2021-05-08 01:57:01.015 7f6f2781da80 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-19: (2) No such file or directory
[iap@iap01 ~]$
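The first error is the important one: _read_bdev_label got (5) Input/output error, i.e. reading the block device itself failed at the kernel level; the following "unable to open OSD superblock" is just a consequence of that failed read. This can be verified directly on the node that hosts osd.19 (iap11). A hedged sketch; /dev/sdb is an assumption, substitute the actual backing disk from lsblk:
[root@iap11 ~]# dmesg | egrep -i 'i/o error' | tail          # kernel-level read errors from the disk
[root@iap11 ~]# dd if=/dev/sdb of=/dev/null bs=1M count=100  # raw read test; fails on a dying disk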
- Information on the Running POD rook-ceph-osd-12-686858c5dd-hsxh7, for comparison
[iap@iap01 ~]$ k exec rook-ceph-osd-12-686858c5dd-hsxh7 -n rook-ceph -it -- ls /var/lib/ceph/osd/ceph-12 -l
total 24
lrwxrwxrwx 1 ceph ceph 92 May 10 00:19 block -> /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c
-rw------- 1 ceph ceph 37 May 10 00:19 ceph_fsid
-rw------- 1 ceph ceph 37 May 10 00:19 fsid
-rw------- 1 ceph ceph 56 May 10 00:19 keyring
-rw------- 1 ceph ceph 6 May 10 00:19 ready
-rw------- 1 ceph ceph 10 May 10 00:19 type
-rw------- 1 ceph ceph 3 May 10 00:19 whoami
[iap@iap01 ~]$ k describe pod rook-ceph-osd-12-686858c5dd-hsxh7 -n rook-ceph
Name: rook-ceph-osd-12-686858c5dd-hsxh7
...
Node: iap10/14.52.244.213
...
Mounts:
/dev from devices (rw)
...
devices:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
...
[iap@iap01 ~]$
[root@iap10 ~]# ls -ld /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c
lrwxrwxrwx 1 root root 7 May 10 12:37 /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c -> ../dm-6
[root@iap10 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 893.8G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 892.8G 0 part
├─centos-root 253:0 0 811.4G 0 lvm /
├─centos-swap 253:1 0 31.3G 0 lvm
└─centos-home 253:5 0 50G 0 lvm /home
sdb 8:16 0 14.6T 0 disk
└─ceph--785f0302--e6a1--4efd--bb0e--8ed51f9c9c5b-... 253:6 0 14.6T 0 lvm
...
[root@iap10 ~]#
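The same mapping can be resolved on iap11 for the broken OSD: ceph-volume records the OSD id in LVM tags, so the logical volume (and therefore the physical disk) behind osd.19 can be looked up with lvs. A sketch, run as root on iap11:
[root@iap11 ~]# lvs -o lv_path,lv_tags | grep 'ceph.osd_id=19'
[root@iap11 ~]# lsblk    # on a failed disk the LV is missing or unreadable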
- When the OS was restarted, booting failed with a message that an error had occurred on a specific disk (the 4th one)
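SMART data is a quick way to confirm the disk itself is broken before pulling it. A sketch; /dev/sdb again stands in for the disk identified above:
[root@iap11 ~]# smartctl -H /dev/sdb    # overall health self-assessment
[root@iap11 ~]# smartctl -A /dev/sdb | egrep 'Reallocated|Pending|Uncorrectable'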
c. Workaround: replace the faulty disk
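Swapping the disk alone is not enough; the dead OSD also has to be removed from the Ceph cluster so Rook can prepare a new OSD on the replacement disk. A minimal sketch of the usual sequence (ceph commands run from the toolbox pod; double-check the ID before purging):
ceph osd out osd.19                         # mark out, then wait for rebalance (watch ceph -s)
ceph osd purge 19 --yes-i-really-mean-it    # removes CRUSH entry, auth key, and OSD id
k delete deployment rook-ceph-osd-19 -n rook-ceph
k delete pod -n rook-ceph -l app=rook-ceph-operator   # restart operator to re-run OSD prepare on the new disk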