2020.12.31
a. Problem: pgs undersized
- Environment
Kubernetes 1.16.15, Rook Ceph 1.3.8
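All commands below are run inside the Rook toolbox pod. A minimal way to open a shell in it (assuming the toolbox was deployed with the default app=rook-ceph-tools label in the rook-ceph namespace, which may differ in other setups):
kubectl -n rook-ceph exec -it \
  $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash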
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph status
cluster:
id: 1ef6e249-005e-477e-999b-b874f9fa0854
health: HEALTH_WARN
Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
…
b. Cause analysis
- undersized
The placement group has fewer copies than the configured pool replication level.
https://docs.ceph.com/en/latest/rados/operations/pg-states/
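In other words, the acting set reported for each PG (e.g. [14,17], only two OSDs) is smaller than the pool's replica count. A quick way to compare the two, and to check whether any OSD is down or out:
ceph osd pool ls detail   # 'size' is the configured replica count per pool
ceph osd tree             # up/down and in/out state of every OSD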
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
PG_DEGRADED Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
pg 7.8 is stuck undersized for 15747.618868, current state active+undersized, last acting [14,17]
pg 7.14 is stuck undersized for 15952.931322, current state active+undersized, last acting [13,17]
pg 8.e is stuck undersized for 15952.943984, current state active+undersized+degraded, last acting [18,13]
pg 8.1c is stuck undersized for 15952.943809, current state active+undersized+degraded, last acting [18,15]
pg 9.5 is stuck undersized for 15952.625823, current state active+undersized, last acting [16,12]
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump | egrep 'PG_STAT|undersized'
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
7.14 0 0 0 0 0 0 0 0 0 0 active+undersized 2020-12-30 02:57:39.199343 0'0 30503:29647 [13,17] 13 [13,17] 13 0'0 2020-12-29 18:57:13.901750 0'0 2020-12-29 18:57:13.901750 0
9.1b 0 0 0 0 0 0 0 0 0 0 active+undersized 2020-12-30 02:57:39.585109 0'0 30503:28143 [19,12] 19 [19,12] 19 0'0 2020-12-28 23:42:56.728311 0'0 2020-12-22 14:25:19.047234 0
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
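If the installed Ceph release supports state filters in 'ceph pg ls' (Nautilus and later should), the same list can be pulled without grepping the full dump:
ceph pg ls undersized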
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#stuck-placement-groups
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck stale
ok
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck inactive
ok
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck unclean
ok
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
7.8 active+undersized [14,17] 14 [14,17] 14
7.14 active+undersized [13,17] 13 [13,17] 13
8.1c active+undersized+degraded [18,15] 18 [18,15] 18
9.16 active+undersized [12,17] 12 [12,17] 12
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
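dump_stuck should also accept the undersized and degraded states directly, which narrows the output to exactly the PGs in question:
ceph pg dump_stuck undersized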
For stuck unclean placement groups, there is usually something preventing recovery from completing, like unfound objects (see Unfound Objects)
Since the queries show no unfound objects, running 'ceph pg 7.8 mark_unfound_lost revert|delete' would not resolve this; a check of the recovery state of one of the stuck PGs is shown below.
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 query | egrep recovery_state -A9
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2020-12-28 09:40:26.685231",
"might_have_unfound": [
{
"osd": "19",
"status": "not queried"
}
],
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 list_unfound
{
"num_missing": 0,
"num_unfound": 0,
"objects": [],
"more": false
}
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
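Before doing anything to the PGs it is worth confirming that none of them reports unfound objects. A rough loop over the stuck PGs (a sketch that simply reuses the dump_stuck output format shown above):
for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$2 ~ /undersized/ {print $1}'); do
  echo "== $pg =="
  ceph pg "$pg" list_unfound
done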
c. Solution
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg repeer 7.8
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg peering; Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
PG_AVAILABILITY Reduced data availability: 1 pg peering
pg 7.8 is stuck peering for 2307.535700, current state remapped+peering, last acting [14]
PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
pg 7.14 is stuck undersized for 19406.872563, current state active+undersized, last acting [13,17]
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
pg 7.14 is stuck undersized for 19428.889166, current state active+undersized, last acting [13,17]
…
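Rather than typing each PG ID by hand, the repeer step can be looped over the same list. This is a sketch built on the dump_stuck output shown earlier, not the exact commands used at the time:
for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$2 ~ /undersized/ {print $1}'); do
  ceph pg repeer "$pg"
done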
Ceph status after running repeer on every affected PG:
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph -s
cluster:
id: 1ef6e249-005e-477e-999b-b874f9fa0854
health: HEALTH_OK
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#