2020.12.31
a. Problem: pgs undersized
- Environment
Kubernetes 1.16.15, Rook Ceph 1.3.8
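All commands below are run inside the Rook toolbox pod. A minimal way to open a shell in it (assuming the toolbox was deployed with the default app=rook-ceph-tools label in the rook-ceph namespace, which may differ in other setups):
kubectl -n rook-ceph exec -it \
  $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash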
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph status
cluster:
id: 1ef6e249-005e-477e-999b-b874f9fa0854
health: HEALTH_WARN
Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
…
b. Cause analysis
- undersized
The placement group has fewer copies than the configured pool replication level.
https://docs.ceph.com/en/latest/rados/operations/pg-states/
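In other words, the acting set reported for each PG (e.g. [14,17], only two OSDs) is smaller than the pool's replica count. A quick way to compare the two, and to check whether any OSD is down or out:
ceph osd pool ls detail   # 'size' is the configured replica count per pool
ceph osd tree             # up/down and in/out state of every OSD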
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
PG_DEGRADED Degraded data redundancy: 2/1036142 objects degraded (0.000%), 2 pgs degraded, 14 pgs undersized
pg 7.8 is stuck undersized for 15747.618868, current state active+undersized, last acting [14,17]
pg 7.14 is stuck undersized for 15952.931322, current state active+undersized, last acting [13,17]
pg 8.e is stuck undersized for 15952.943984, current state active+undersized+degraded, last acting [18,13]
pg 8.1c is stuck undersized for 15952.943809, current state active+undersized+degraded, last acting [18,15]
pg 9.5 is stuck undersized for 15952.625823, current state active+undersized, last acting [16,12]
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump | egrep 'PG_STAT|undersized'
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
7.14 0 0 0 0 0 0 0 0 0 0 active+undersized 2020-12-30 02:57:39.199343 0'0 30503:29647 [13,17] 13 [13,17] 13 0'0 2020-12-29 18:57:13.901750 0'0 2020-12-29 18:57:13.901750 0
9.1b 0 0 0 0 0 0 0 0 0 0 active+undersized 2020-12-30 02:57:39.585109 0'0 30503:28143 [19,12] 19 [19,12] 19 0'0 2020-12-28 23:42:56.728311 0'0 2020-12-22 14:25:19.047234 0
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
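If the installed Ceph release supports state filters in 'ceph pg ls' (Nautilus and later should), the same list can be pulled without grepping the full dump:
ceph pg ls undersized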
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#stuck-placement-groups
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck stale
ok
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck inactive
ok
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg dump_stuck unclean
ok
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
7.8 active+undersized [14,17] 14 [14,17] 14
7.14 active+undersized [13,17] 13 [13,17] 13
8.1c active+undersized+degraded [18,15] 18 [18,15] 18
9.16 active+undersized [12,17] 12 [12,17] 12
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
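dump_stuck should also accept the undersized and degraded states directly, which narrows the output to exactly the PGs in question:
ceph pg dump_stuck undersized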
For stuck unclean placement groups, there is usually something preventing recovery from completing, like unfound objects (see Unfound Objects)
Since the queries show no unfound objects, running 'ceph pg 7.8 mark_unfound_lost revert|delete' would not resolve this; a check of the recovery state of one of the stuck PGs is shown below.
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 query | egrep recovery_state -A9
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2020-12-28 09:40:26.685231",
"might_have_unfound": [
{
"osd": "19",
"status": "not queried"
}
],
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg 9.16 list_unfound
{
"num_missing": 0,
"num_unfound": 0,
"objects": [],
"more": false
}
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#
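Before doing anything to the PGs it is worth confirming that none of them reports unfound objects. A rough loop over the stuck PGs (a sketch that simply reuses the dump_stuck output format shown above):
for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$2 ~ /undersized/ {print $1}'); do
  echo "== $pg =="
  ceph pg "$pg" list_unfound
done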
c. Solution
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph pg repeer 7.8
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg peering; Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
PG_AVAILABILITY Reduced data availability: 1 pg peering
pg 7.8 is stuck peering for 2307.535700, current state remapped+peering, last acting [14]
PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
pg 7.14 is stuck undersized for 19406.872563, current state active+undersized, last acting [13,17]
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph health detail
HEALTH_WARN Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
PG_DEGRADED Degraded data redundancy: 2/1036144 objects degraded (0.000%), 2 pgs degraded, 10 pgs undersized
pg 7.14 is stuck undersized for 19428.889166, current state active+undersized, last acting [13,17]
…
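Rather than typing each PG ID by hand, the repeer step can be looped over the same list. This is a sketch built on the dump_stuck output shown earlier, not the exact commands used at the time:
for pg in $(ceph pg dump_stuck unclean 2>/dev/null | awk '$2 ~ /undersized/ {print $1}'); do
  ceph pg repeer "$pg"
done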
Ceph status after running repeer on every affected PG:
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]# ceph -s
cluster:
id: 1ef6e249-005e-477e-999b-b874f9fa0854
health: HEALTH_OK
…
[root@rook-ceph-tools-79d7c49c8d-4c4x5 /]#