Rook Ceph - rook-ceph-osd POD is CrashLoopBackOff

by 여행을 떠나자! 2021. 9. 16.

2021.05.10

a. Problem: rook-ceph-osd-19-5b8c7f4787-klrfr POD status is CrashLoopBackOff

- Environments

   Kubernetes 1.16.15, Rook Ceph 1.3.8

 

[iap@iap01 ~]$ k get pod -n rook-ceph -o wide | egrep 'NAME|osd-[0-9]'
  NAME                                READY   STATUS             RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
  rook-ceph-osd-12-686858c5dd-hsxh7   1/1     Running            1          37h   10.244.10.105   iap10   <none>           <none>
  rook-ceph-osd-13-584d4ff974-wdtq9   1/1     Running            1          37h   10.244.10.125   iap10   <none>           <none>
  rook-ceph-osd-14-7696cd8cbc-g228g   1/1     Running            1          37h   10.244.10.109   iap10   <none>           <none>
  rook-ceph-osd-15-6f944f9fb8-lxmrb   1/1     Running            1          37h   10.244.10.108   iap10   <none>           <none>
  rook-ceph-osd-16-75cf4c8897-v545x   1/1     Running            51         28d   10.244.10.111   iap11   <none>           <none>
  rook-ceph-osd-17-69dd5bb885-fp6ht   1/1     Running            14         28d   10.244.10.113   iap11   <none>           <none>
  rook-ceph-osd-18-8c4c6f89-dvdqv     1/1     Running            16         28d   10.244.10.117   iap11   <none>           <none>
  rook-ceph-osd-19-5b8c7f4787-klrfr   0/1     CrashLoopBackOff   214        28d   10.244.10.119   iap11   <none>           <none>
  [iap@iap01 ~]$

b. Cause analysis

- Error message

 [iap@iap01 ~]$ k logs rook-ceph-osd-19-5b8c7f4787-klrfr -n rook-ceph
 debug 2021-05-08 01:57:00.979 7f6f2781da80  0 set uid:gid to 167:167 (ceph:ceph)
 debug 2021-05-08 01:57:00.979 7f6f2781da80  0 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable), process ceph-osd, pid 84758
 debug 2021-05-08 01:57:00.979 7f6f2781da80  0 pidfile_write: ignore empty --pid-file
 debug 2021-05-08 01:57:01.015 7f6f2781da80 -1 bluestore(/var/lib/ceph/osd/ceph-19/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-19/block: (5) Input/output error
 debug 2021-05-08 01:57:01.015 7f6f2781da80 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-19: (2) No such file or directory
 [iap@iap01 ~]$
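
- The (5) Input/output error while reading the bluestore block device points at the disk itself rather than at Ceph. To corroborate this at the host level, kernel logs and SMART health on the node running osd-19 (iap11) can be checked. A minimal sketch, assuming smartmontools is installed on the node and /dev/sdX stands in for the OSD's backing disk:

 [root@iap11 ~]# dmesg | grep -iE 'i/o error|medium error'   # kernel-level read failures on the device
 [root@iap11 ~]# smartctl -H /dev/sdX                        # /dev/sdX: placeholder for the failing disk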

 

- Information on the rook-ceph-osd-12-686858c5dd-hsxh7 POD, which is in the Running state

 [iap@iap01 ~]$ k exec rook-ceph-osd-12-686858c5dd-hsxh7 -n rook-ceph -it -- ls /var/lib/ceph/osd/ceph-12 -l
 total 24
 lrwxrwxrwx 1 ceph ceph 92 May 10 00:19 block -> /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c
 -rw------- 1 ceph ceph 37 May 10 00:19 ceph_fsid
 -rw------- 1 ceph ceph 37 May 10 00:19 fsid
 -rw------- 1 ceph ceph 56 May 10 00:19 keyring
 -rw------- 1 ceph ceph  6 May 10 00:19 ready
 -rw------- 1 ceph ceph 10 May 10 00:19 type
 -rw------- 1 ceph ceph  3 May 10 00:19 whoami

 [iap@iap01 ~]$ k describe pod rook-ceph-osd-12-686858c5dd-hsxh7 -n rook-ceph
 Name:         rook-ceph-osd-12-686858c5dd-hsxh7
 ...
 Node:         iap10/14.52.244.213
 ...
 Mounts:
       /dev from devices (rw)
 ...
 devices:
     Type:          HostPath (bare host directory volume)
     Path:          /dev
     HostPathType:
 ...
 [iap@iap01 ~]$
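
- The crash-looping osd-19 cannot be inspected with k exec, but the same block-device mapping can be looked up from the Ceph side. A minimal sketch, assuming the rook-ceph-tools toolbox deployment is running in the cluster:

 [iap@iap01 ~]$ TOOLS=$(k -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
 [iap@iap01 ~]$ k -n rook-ceph exec -it $TOOLS -- ceph osd metadata 19 | egrep '"hostname"|"devices"|"bluestore_bdev_partition_path"'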

 

 [root@iap10 ~]# ls -ld /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c
 lrwxrwxrwx 1 root root 7 May 10 12:37 /dev/ceph-785f0302-e6a1-4efd-bb0e-8ed51f9c9c5b/osd-data-f497643a-143c-4642-bc24-56d73aef3a5c -> ../dm-6
 [root@iap10 ~]# lsblk
 NAME                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
 sda                                                     8:0    0 893.8G  0 disk
 ├─sda1                                                  8:1    0     1G  0 part /boot
 └─sda2                                                  8:2    0 892.8G  0 part
   ├─centos-root                                       253:0    0 811.4G  0 lvm  /
   ├─centos-swap                                       253:1    0  31.3G  0 lvm
   └─centos-home                                       253:5    0    50G  0 lvm  /home
 sdb                                                     8:16   0  14.6T  0 disk
 └─ceph--785f0302--e6a1--4efd--bb0e--8ed51f9c9c5b-...  253:6    0  14.6T  0 lvm
 ...
 [root@iap10 ~]#
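
- On the failing node (iap11) the same walk ends at the LV backing osd-19, and a direct read from that device is a quick way to reproduce the I/O error outside of Ceph. A sketch, with /dev/dm-X as a placeholder for the LV found via lsblk on iap11:

 [root@iap11 ~]# dd if=/dev/dm-X of=/dev/null bs=4K count=256 iflag=direct   # an 'Input/output error' here confirms the disk, not Ceph, is at fault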

 

  • When the OS was rebooted, booting failed with a message reporting an error on a specific disk (the 4th one).
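
- Before pulling the drive it helps to confirm which physical device backs the failing LV; one way, using standard LVM and udev tooling (device names are placeholders):

 [root@iap11 ~]# lvs -o +devices | grep ceph          # which PV (e.g. /dev/sdX) backs the ceph LV
 [root@iap11 ~]# ls -l /dev/disk/by-path/ | grep sdX  # physical port/slot of that device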

 

c. Workaround: Replace the faulty disk
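
Replacing the hardware is only half of the fix: the dead OSD also has to be removed from the Ceph cluster so Rook can provision a new one on the new disk. A rough sketch of the usual sequence (ceph commands run from the rook-ceph-tools pod, kubectl from the admin host; verify each step against the Rook documentation for your version before running):

 # 1. Mark the OSD out and wait for recovery to finish (PGs active+clean)
 ceph osd out osd.19
 ceph status

 # 2. Remove the OSD from the CRUSH map, auth keys, and OSD map in one step
 ceph osd purge 19 --yes-i-really-mean-it

 # 3. Delete the orphaned deployment; Rook will not do this automatically
 kubectl -n rook-ceph delete deployment rook-ceph-osd-19

 # 4. After swapping the physical disk, restart the operator so it re-runs OSD provisioning
 kubectl -n rook-ceph rollout restart deployment rook-ceph-operator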
