
Rook Ceph - pgs undersized

by 여행을 떠나자! 2021. 9. 16.

2020.12.31

a. Problem: pgs undersized

- Environment

  Kubernetes 1.16.15, Rook Ceph 1.3.8
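
  All of the ceph commands below are run inside the Rook toolbox pod. A minimal sketch for opening a shell there, assuming the default rook-ceph namespace and the standard app=rook-ceph-tools label from Rook's toolbox manifest:

  # find the toolbox pod and open a shell in it
  $ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
        -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash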

 

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph status

    cluster:

      id:     1ef6e249-005e-477e-999b-b874f9fa0854

      health: HEALTH_WARN

              Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized

  …

 

 

b. Cause analysis

- undersized

   The placement group has fewer copies than the configured pool replication level.

   https://docs.ceph.com/en/latest/rados/operations/pg-states/
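
  Since "undersized" is measured against the pool's configured replication level, it helps to confirm what each pool expects. Both commands are standard ceph CLI; 'replicapool' is only an example pool name:

  # replication settings for every pool (check the "size" and "min_size" fields)
  ceph osd pool ls detail

  # or query one pool directly
  ceph osd pool get replicapool size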

 

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail

  HEALTH_WARN Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized

  PG_DEGRADED Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized

    pg 7.8 is stuck undersized for 15747.618868, current state active+undersized, last acting [14,17]

    pg 7.14 is stuck undersized for 15952.931322, current state active+undersized, last acting [13,17]

    pg 8.e is stuck undersized for 15952.943984, current state active+undersized+degraded, last acting [18,13]

    pg 8.1c is stuck undersized for 15952.943809, current state active+undersized+degraded, last acting [18,15]

    pg 9.5 is stuck undersized for 15952.625823, current state active+undersized, last acting [16,12]

  …

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump | egrep 'PG_STAT|undersized'

  PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES       OMAP_BYTES* OMAP_KEYS* LOG  DISK_LOG STATE                       STATE_STAMP                VERSION        REPORTED       UP         UP_PRIMARY ACTING     ACTING_PRIMARY LAST_SCRUB     SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN

  7.14          0                  0        0         0       0           0           0          0    0        0          active+undersized 2020-12-30 02:57:39.199343            0'0    30503:29647    [13,17]         13    [13,17]             13            0'0 2020-12-29 18:57:13.901750             0'0 2020-12-29 18:57:13.901750             0

  9.1b          0                  0        0         0       0           0           0          0    0        0          active+undersized 2020-12-30 02:57:39.585109            0'0    30503:28143    [19,12]         19    [19,12]             19            0'0 2020-12-28 23:42:56.728311             0'0 2020-12-22 14:25:19.047234             0

  …

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
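
  On recent Ceph releases the same information can be pulled without grepping the full dump, since ceph pg ls accepts a state filter; a quick sketch:

  # list only the PGs currently undersized (or degraded)
  ceph pg ls undersized
  ceph pg ls degraded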

 

  https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#stuck-placement-groups

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck stale

  ok

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck inactive

  ok

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck unclean

  ok

  PG_STAT STATE                      UP      UP_PRIMARY ACTING  ACTING_PRIMARY

  7.8              active+undersized [14,17]         14 [14,17]             14

  7.14             active+undersized [13,17]         13 [13,17]             13

  8.1c    active+undersized+degraded [18,15]         18 [18,15]             18

  9.16             active+undersized [12,17]         12 [12,17]             12

  …

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#

 

  For stuck unclean placement groups, there is usually something preventing recovery from completing, like unfound objects (see Unfound Objects).

  Since the queries below show no unfound objects, running 'ceph pg 7.8 mark_unfound_lost revert | delete' would not resolve the issue.

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 query | egrep recovery_state -A9

        "recovery_state": [

            {

                "name": "Started/Primary/Active",

                "enter_time": "2020-12-28 09:40:26.685231",

                "might_have_unfound": [

                    {

                        "osd": "19",

                        "status": "not queried"

                    }

                ],
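
  The "not queried" status above points at osd.19, so it is worth confirming that the OSD is up and reachable before digging further; both are standard ceph commands:

  ceph osd find 19           # host and CRUSH location of osd.19
  ceph tell osd.19 version   # quick liveness check against the daemon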

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 list_unfound

  {

        "num_missing": 0,

        "num_unfound": 0,

        "objects": [],

        "more": false

  }

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
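
  Checking PGs one by one gets tedious. A small loop repeats the unfound check for every stuck PG; this is a sketch, and the awk pattern assumes the dump_stuck table layout shown above:

  # print the unfound count for every stuck-unclean PG
  for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}'); do
      echo "== pg $pg =="
      ceph pg "$pg" list_unfound | grep num_unfound
  done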

 

 

c. Solution

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg repeer 7.8

       

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail

  HEALTH_WARN Reduced data availability: 1 pg peering; Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized

  PG_AVAILABILITY Reduced data availability: 1 pg peering

      pg 7.8 is stuck peering for 2307.535700, current state remapped+peering, last acting [14]

  PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized

      pg 7.14 is stuck undersized for 19406.872563, current state active+undersized, last acting [13,17]

  …

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail

  HEALTH_WARN Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized

  PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized

      pg 7.14 is stuck undersized for 19428.889166, current state active+undersized, last acting [13,17]

  …
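
  Rather than repeering each PG by hand, the same dump_stuck parsing can drive a loop; a sketch under the same assumption about the table layout:

  # repeer every PG that is stuck unclean
  for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}'); do
      ceph pg repeer "$pg"
  done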

 

  Ceph status after running repeer on all of the affected PGs:

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph -s

      cluster:

        id:     1ef6e249-005e-477e-999b-b874f9fa0854

        health: HEALTH_OK

    …

  [root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
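
  As a final check, dump_stuck should now come back with nothing but "ok":

  ceph pg dump_stuck unclean   # expect only "ok", no PG rows
  ceph -s                      # expect HEALTH_OK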
