
Redis - corrupted cluster config file

by 여행을 떠나자! 2021. 10. 2.

1. Environment

- Helm chart redis-cluster-6.0.3 / Redis 6.2.3
- Helm 3.3.1 / Kubernetes 1.16.15

 

 

2. Problem

- ptts-redis-cluster-1 is stuck in CrashLoopBackOff

$ k get pod -n ptts -l app.kubernetes.io/name=redis-cluster -o wide
NAME                   READY   STATUS             RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
ptts-redis-cluster-0   2/2     Running            0          25d    10.244.4.203   iap10   <none>           <none>
ptts-redis-cluster-1   1/2     CrashLoopBackOff   6          9m6s   10.244.6.150   iap13   <none>           <none>
ptts-redis-cluster-2   2/2     Running            0          21h    10.244.3.43    iap11   <none>           <none>
ptts-redis-cluster-3   2/2     Running            0          21h    10.244.6.149   iap13   <none>           <none>
ptts-redis-cluster-4   2/2     Running            0          21h    10.244.2.91    iap12   <none>           <none>
ptts-redis-cluster-5   2/2     Running            0          21h    10.244.3.42    iap11   <none>           <none>
$
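Spotting the broken Pod in a longer listing can be automated. The snippet below is a minimal Python sketch, not part of the original procedure: a hypothetical helper `not_ready_pods` that reads the plain-text `kubectl get pod` table and returns the Pods whose READY count is incomplete or whose STATUS is not Running. It assumes the default column headers shown above.

```python
from typing import List


def not_ready_pods(kubectl_output: str) -> List[str]:
    """Return Pod names whose READY count is incomplete or whose STATUS is not Running.

    Hypothetical helper: expects the table printed by `kubectl get pod` (with or
    without `-o wide`); NAME, READY and STATUS are located via the header row.
    """
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    bad = []
    for row in lines[1:]:
        cols = row.split()
        ready, total = cols[ready_i].split("/")  # e.g. "1/2" -> "1", "2"
        if ready != total or cols[status_i] != "Running":
            bad.append(cols[name_i])
    return bad
```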

 

- Impact analysis

   ✓ The failed Pod is one of the slaves, and the cluster keeps serving requests normally.

$ export REDIS_PASSWORD=$(kubectl get secret --namespace ptts ptts-redis-cluster -o jsonpath="{.data.redis-password}" | base64 --decode)
$ kubectl run --namespace ptts ptts-redis-cluster-client --rm --tty -i --restart='Never' \
                   --env REDIS_PASSWORD=$REDIS_PASSWORD --image docker.io/bitnami/redis-cluster:6.2.3-debian-10-r0 -- bash
I have no name!@ptts-redis-cluster-client:/$ redis-cli -c -h ptts-redis-cluster -a $REDIS_PASSWORD cluster nodes
3be1be6256d939340183ab6cf75725efe3877b45 10.244.4.203:6379@16379 master        - 0 1632794305752 1  connected 0-5460
e19c381e213ca6c8939e7d3b1e344f7d32c24154 10.244.3.42:6379@16379  myself,master - 0 1632794308000 13 connected 5461-10922
9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 10.244.6.149:6379@16379 master        - 0 1632794309770 14 connected 10923-16383
2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 10.244.2.91:6379@16379  slave         3be1be6256d939340183ab6cf75725efe3877b45 0             1632794307761 1  connected
1af64755438c681e5b14e16af7ea544f8818f536 10.244.5.86:6379@16379  slave,fail    e19c381e213ca6c8939e7d3b1e344f7d32c24154 1632716466039 1632716466039 13 connected
715c5a4e67d57fbe53d0f4d7cc05dfd1f27ab568 10.244.3.43:6379@16379  slave         9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 0             1632794308765 14 connected
I have no name!@ptts-redis-cluster-client:/$
I have no name!@ptts-redis-cluster-client:/$ redis-cli -c -h ptts-redis-cluster -a $REDIS_PASSWORD
ptts-redis-cluster:6379> set key1 hello
-> Redirected to slot [9189] located at 10.244.3.42:6379
OK
10.244.3.42:6379> del key1
(integer) 1
10.244.3.42:6379>
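The `-> Redirected to slot [9189]` line comes from how Redis Cluster distributes keys: per the Redis Cluster specification, a key's slot is CRC16(key) mod 16384 (XMODEM variant of CRC16), and if the key contains a non-empty `{...}` hash tag, only the text inside the first such tag is hashed. A minimal Python sketch of that mapping (the function names are my own), handy for predicting which master a test key will land on:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (polynomial 0x1021, initial value 0), bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring a non-empty {hash tag}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # "{}" (empty tag) means: hash the whole key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    print(key_slot("key1"))  # 9189, the slot redis-cli reported for key1 above
```

Slot 9189 falls in the range 5461-10922 owned by 10.244.3.42, which is why the client was redirected there.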

   ✓ CLUSTER NODES output

       ▷ Format: <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slot> ... <slot>

       (Reference: the "Redis cluster" post, 3. Configure Redis cluster > c. Connect using the Redis CLI)
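The format above can be parsed directly, for example to pick out the failed node's ID, which is exactly what the recovery in section 4 needs. A minimal Python sketch (`ClusterNode`, `parse_cluster_nodes` and `failed_nodes` are hypothetical names), assuming it is fed the raw CLUSTER NODES lines without the shell prompt:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    node_id: str
    address: str          # ip:port@cport
    flags: List[str]      # e.g. ["myself", "master"] or ["slave", "fail"]
    master_id: str        # "-" when the node is a master
    ping_sent: int
    pong_recv: int
    config_epoch: int
    link_state: str
    slots: List[str] = field(default_factory=list)


def parse_cluster_nodes(text: str) -> List[ClusterNode]:
    """Parse raw CLUSTER NODES output (one node per line, no shell prompt)."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        nodes.append(ClusterNode(
            node_id=parts[0],
            address=parts[1],
            flags=parts[2].split(","),
            master_id=parts[3],
            ping_sent=int(parts[4]),
            pong_recv=int(parts[5]),
            config_epoch=int(parts[6]),
            link_state=parts[7],
            slots=parts[8:],
        ))
    return nodes


def failed_nodes(nodes: List[ClusterNode]) -> List[ClusterNode]:
    """Nodes marked fail (agreed failure) or fail? (suspected by the reporting node)."""
    return [n for n in nodes if "fail" in n.flags or "fail?" in n.flags]
```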

 

 

3. Cause analysis

- Error message: "Unrecoverable error: corrupted cluster config file."

$ k logs ptts-redis-cluster-1 -n ptts -c ptts-redis-cluster
redis-cluster 01:52:42.82
redis-cluster 01:52:42.82 Welcome to the Bitnami redis-cluster container
redis-cluster 01:52:42.83 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-redis-cluster
redis-cluster 01:52:42.83 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-redis-cluster/issues
redis-cluster 01:52:42.83
redis-cluster 01:52:42.84 INFO  ==> ** Starting Redis setup **
redis-cluster 01:52:42.87 INFO  ==> Initializing Redis
redis-cluster 01:52:42.89 INFO  ==> Setting Redis config file
Changing old IP 10.244.4.203 by the new one 10.244.4.203
Changing old IP 10.244.6.150 by the new one 10.244.6.150
Changing old IP 10.244.3.43 by the new one 10.244.3.43
Changing old IP 10.244.6.149 by the new one 10.244.6.149
Changing old IP 10.244.2.91 by the new one 10.244.2.91
Changing old IP 10.244.3.42 by the new one 10.244.3.42
redis-cluster 01:52:43.11 INFO  ==> ** Redis setup finished! **

1:C 28 Sep 2021 01:52:43.155 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 28 Sep 2021 01:52:43.155 # Redis version=6.2.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 28 Sep 2021 01:52:43.155 # Configuration loaded
1:M 28 Sep 2021 01:52:43.156 * monotonic clock: POSIX clock_gettime
1:M 28 Sep 2021 01:52:43.159 # Unrecoverable error: corrupted cluster config file.

 

- ptts-redis-cluster-1 Pod spec

$ k describe pod ptts-redis-cluster-1 -n ptts
Name:         ptts-redis-cluster-1
...
Containers:
  ptts-redis-cluster:
  ...
    Mounts:
      /bitnami/redis/data from redis-data (rw)
  ...
Volumes:
  redis-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  redis-data-ptts-redis-cluster-1
  ...
$ k get pvc redis-data-ptts-redis-cluster-1 -n ptts
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
redis-data-ptts-redis-cluster-1   Bound    pvc-12e6d749-5ef5-4f4d-adc9-c6ff3bda71f2   8Gi        RWO            nfs-sc-iap     29d
$
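With the nfs-client provisioner, each PersistentVolume gets its own subdirectory on the NFS export, and the directory names used in this post follow the pattern `<namespace>-<pvc name>-<pv name>`. Assuming that default naming and that the export is mounted at /nfs_01 as in this environment, a small Python sketch (hypothetical helper) for locating a Pod's data directory from the VOLUME column above. The PV name can also be printed with `kubectl get pvc redis-data-ptts-redis-cluster-1 -n ptts -o jsonpath='{.spec.volumeName}'`.

```python
import os


def nfs_volume_dir(nfs_root: str, namespace: str, pvc_name: str, pv_name: str) -> str:
    """Build the NFS subdirectory for a PV, assuming <namespace>-<pvc>-<pv> naming."""
    return os.path.join(nfs_root, "-".join([namespace, pvc_name, pv_name]))


if __name__ == "__main__":
    print(nfs_volume_dir(
        "/nfs_01",                                   # mount point of the NAS export (site-specific)
        "ptts",                                      # PVC namespace
        "redis-data-ptts-redis-cluster-1",           # PVC name
        "pvc-12e6d749-5ef5-4f4d-adc9-c6ff3bda71f2",  # PV name from the VOLUME column
    ))
```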

 

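- Inspecting the redis-data-ptts-redis-cluster-1 PVC

   ✓ Because the StorageClass uses the NFS-Client provisioner, the volume can be accessed directly on the NAS, as shown below.

   ✓ The error occurs because nodes.conf in the PVC is empty (1 byte, just a line break), so Redis finds no entry describing the node itself.

   ✓ Why nodes.conf was emptied has not been determined.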

$ cd /nfs_01/ptts-redis-data-ptts-redis-cluster-1-pvc-12e6d749-5ef5-4f4d-adc9-c6ff3bda71f2
$ ll
total 16
-rw-r--r--. 1 1001 root  92 Sep 23 13:58 appendonly.aof
-rw-r--r--. 1 1001 root 175 Sep 23 13:58 dump.rdb
-rw-r--r--. 1 1001 root   1 Oct  1 14:01 nodes.conf
-rw-r--r--. 1 1001 root  92 Sep 23 13:47 temp-rewriteaof-bg-4022.aof
$ cat nodes.conf

$
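Why does a 1-byte nodes.conf abort startup instead of letting Redis start as a fresh node? As far as I can tell from the Redis 6.2 source (clusterLoadConfig in cluster.c), a zero-length file is treated as "no config yet", blank lines are skipped, a node line needs at least 8 fields, and after loading there must be a node flagged `myself`; a file holding only a newline fails that last check and triggers the "corrupted cluster config file" error. The Python sketch below approximates those checks (it is not Redis's parser) so every nodes.conf on the NAS can be checked before restarting Pods; `check_nodes_conf` and `scan_nfs_root` are hypothetical names.

```python
from pathlib import Path
from typing import Dict


def check_nodes_conf(text: str) -> str:
    """Roughly mirror the load-time checks Redis 6.2 applies to nodes.conf.

    Returns "empty" (zero bytes: Redis would start as a new node),
    "corrupted" (Redis would refuse to start) or "ok".
    """
    if text == "":
        return "empty"
    has_myself = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # blank lines are skipped
        if parts[0] == "vars":
            continue  # epoch bookkeeping, not a node entry
        if len(parts) < 8:
            return "corrupted"  # a node line needs at least 8 fields
        if "myself" in parts[2].split(","):
            has_myself = True
    return "ok" if has_myself else "corrupted"


def scan_nfs_root(nfs_root: str) -> Dict[str, str]:
    """Check every <volume dir>/nodes.conf directly under nfs_root."""
    return {p.parent.name: check_nodes_conf(p.read_text())
            for p in Path(nfs_root).glob("*/nodes.conf")}
```

For example, `scan_nfs_root("/nfs_01")` would return one status per Redis volume directory.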

 

 

4. Solution

- Fix nodes.conf and restart ptts-redis-cluster-1

   Copy nodes.conf from the volume of ptts-redis-cluster-0 (PVC redis-data-ptts-redis-cluster-0), then edit it so that the myself flag marks the failed node instead of ptts-redis-cluster-0's own entry.

$ cd /nfs_01/ptts-redis-data-ptts-redis-cluster-1-pvc-12e6d749-5ef5-4f4d-adc9-c6ff3bda71f2
$ cp ../ptts-redis-data-ptts-redis-cluster-0-pvc-a3ae600f-a38b-46e3-90ab-0889247b07c4/nodes.conf .
$ cat nodes.conf
9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 10.244.6.149:6379@16379 master - 0 1633070274495 14 connected 10923-16383
2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 10.244.2.91:6379@16379 master - 0 1633070274000 15 connected 0-5460
3be1be6256d939340183ab6cf75725efe3877b45 10.244.4.10:6379@16379 myself,slave 2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 0 1633070274000 15 connected
e19c381e213ca6c8939e7d3b1e344f7d32c24154 10.244.3.42:6379@16379 master - 0 1633070272000 13 connected 5461-10922
1af64755438c681e5b14e16af7ea544f8818f536 10.244.5.86:6379@16379 slave,fail e19c381e213ca6c8939e7d3b1e344f7d32c24154 1632716466039 1632716466039 13 connected
715c5a4e67d57fbe53d0f4d7cc05dfd1f27ab568 10.244.3.43:6379@16379 slave 9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 0 1633070274000 14 connected
vars currentEpoch 15 lastVoteEpoch 14
$ vi nodes.conf
9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 10.244.6.149:6379@16379 master - 0 1633070274495 14 connected 10923-16383
2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 10.244.2.91:6379@16379 master - 0 1633070274000 15 connected 0-5460
3be1be6256d939340183ab6cf75725efe3877b45 10.244.4.10:6379@16379 slave 2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 0 1633070274000 15 connected
e19c381e213ca6c8939e7d3b1e344f7d32c24154 10.244.3.42:6379@16379 master - 0 1633070272000 13 connected 5461-10922
1af64755438c681e5b14e16af7ea544f8818f536 10.244.5.86:6379@16379 myself,slave,fail e19c381e213ca6c8939e7d3b1e344f7d32c24154 1632716466039 1632716466039 13 connected
715c5a4e67d57fbe53d0f4d7cc05dfd1f27ab568 10.244.3.43:6379@16379 slave 9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 0 1633070274000 14 connected
vars currentEpoch 15 lastVoteEpoch 14
$ 
$ k delete pod ptts-redis-cluster-1 -n ptts
$
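The manual vi edit above can also be scripted, which avoids typos in 40-character node IDs. A minimal Python sketch (hypothetical helper `move_myself_flag`, not part of Redis or the Bitnami image): it removes `myself` from whichever line carries it and prepends it to the target node's flags. The target ID 1af64755438c681e5b14e16af7ea544f8818f536 is the `slave,fail` entry from the earlier CLUSTER NODES output. Like the manual edit, it leaves the `fail` flag, epochs and IP addresses untouched, and it assumes fields are separated by single spaces, as in the file shown above.

```python
from typing import List


def move_myself_flag(nodes_conf: str, target_id: str) -> str:
    """Return nodes.conf text with the "myself" flag moved onto target_id's line.

    Only the flags field (3rd column) is changed; epochs, addresses and the
    "vars" line are left as they were.
    """
    out: List[str] = []
    found = False
    for line in nodes_conf.splitlines():
        parts = line.split(" ")
        if len(parts) >= 3 and parts[0] != "vars":
            flags = [f for f in parts[2].split(",") if f != "myself"]
            if parts[0] == target_id:
                flags.insert(0, "myself")  # Redis lists myself first
                found = True
            parts[2] = ",".join(flags)
            line = " ".join(parts)
        out.append(line)
    if not found:
        raise ValueError(f"node {target_id} not found in nodes.conf")
    return "\n".join(out) + "\n"


# Usage sketch, run inside the failed node's volume directory:
#   from pathlib import Path
#   conf = Path("nodes.conf")
#   conf.write_text(move_myself_flag(conf.read_text(), "1af64755438c681e5b14e16af7ea544f8818f536"))
```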

 

- Checking the status of ptts-redis-cluster-1

   ✓ After the restart, ptts-redis-cluster-1 was assigned a new IP address, 10.244.6.214.

   ✓ During setup, the container replaced the old IP address (10.244.5.86) with the new one:

       Changing old IP 10.244.5.86 by the new one 10.244.6.214

$ k get pod -n ptts -l app.kubernetes.io/name=redis-cluster
NAME                   READY   STATUS    RESTARTS   AGE
ptts-redis-cluster-0   2/2     Running   0          4d1h
ptts-redis-cluster-1   2/2     Running   0          6m26s
ptts-redis-cluster-2   2/2     Running   0          4d1h
ptts-redis-cluster-3   2/2     Running   0          4d1h
ptts-redis-cluster-4   2/2     Running   0          4d1h
ptts-redis-cluster-5   2/2     Running   0          4d1h
$ k describe pod ptts-redis-cluster-1 -n ptts | grep "^IP:"
IP:           10.244.6.214
$ k logs ptts-redis-cluster-1 -n ptts -c ptts-redis-cluster
COPYING FILE
redis-cluster 06:30:54.68
redis-cluster 06:30:54.68 Welcome to the Bitnami redis-cluster container
redis-cluster 06:30:54.68 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-redis-cluster
redis-cluster 06:30:54.68 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-redis-cluster/issues
redis-cluster 06:30:54.68
redis-cluster 06:30:54.69 INFO  ==> ** Starting Redis setup **
redis-cluster 06:30:54.71 INFO  ==> Initializing Redis
redis-cluster 06:30:54.75 INFO  ==> Setting Redis config file
Changing old IP 10.244.4.10 by the new one 10.244.4.10
Changing old IP 10.244.5.86 by the new one 10.244.6.214
Changing old IP 10.244.3.43 by the new one 10.244.3.43
Changing old IP 10.244.6.149 by the new one 10.244.6.149
Changing old IP 10.244.2.91 by the new one 10.244.2.91
Changing old IP 10.244.3.42 by the new one 10.244.3.42
redis-cluster 06:30:54.99 INFO  ==> ** Redis setup finished! **

1:C 01 Oct 2021 06:30:55.036 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 01 Oct 2021 06:30:55.037 # Redis version=6.2.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 01 Oct 2021 06:30:55.037 # Configuration loaded
1:M 01 Oct 2021 06:30:55.037 * monotonic clock: POSIX clock_gettime
1:M 01 Oct 2021 06:30:55.040 * Node configuration loaded, I'm 1af64755438c681e5b14e16af7ea544f8818f536
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.2.3 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in cluster mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           https://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 01 Oct 2021 06:30:55.042 # Server initialized
1:M 01 Oct 2021 06:30:55.080 * Reading RDB preamble from AOF file...
1:M 01 Oct 2021 06:30:55.080 * Loading RDB produced by version 6.2.3
1:M 01 Oct 2021 06:30:55.080 * RDB age 312 seconds
1:M 01 Oct 2021 06:30:55.080 * RDB memory usage when created 2.47 Mb
1:M 01 Oct 2021 06:30:55.080 * RDB has an AOF tail
1:M 01 Oct 2021 06:30:55.080 * Reading the remaining AOF tail...
1:M 01 Oct 2021 06:30:55.080 * DB loaded from append only file: 0.038 seconds
1:M 01 Oct 2021 06:30:55.080 # I have keys for slot 1860, but the slot is assigned to another node. Setting it to importing state.
1:M 01 Oct 2021 06:30:55.081 * Ready to accept connections
1:M 01 Oct 2021 06:30:55.083 # Cluster state changed: ok
$
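Pod IPs change whenever a Pod is recreated, which is why the setup step logs `Changing old IP ... by the new one ...` before Redis loads its cluster configuration. The Bitnami script itself is not reproduced here; purely as an illustration of that kind of rewrite, the hypothetical Python helper below swaps one node address in nodes.conf text. It only matches a whole IP that is followed by `:`, so replacing 10.244.5.8 would not also rewrite 10.244.5.86.

```python
import re


def replace_node_ip(nodes_conf: str, old_ip: str, new_ip: str) -> str:
    """Rewrite old_ip to new_ip only where it appears as a node address (ip:port)."""
    # The lookbehind rejects a preceding digit or dot, the lookahead requires ':',
    # so only a complete IP in the "ip:port@cport" column is replaced.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?=:)")
    return pattern.sub(new_ip, nodes_conf)
```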

 

- Redis cluster node status is back to normal

$ k exec ptts-redis-cluster-client -n ptts -it -- bash
I have no name!@ptts-redis-cluster-client:/$ redis-cli -c -h ptts-redis-cluster -a $REDIS_PASSWORD cluster nodes
2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 10.244.2.91:6379@16379  master - 0 1633070128000 15 connected 0-5460
9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 10.244.6.149:6379@16379 master - 0 1633070126000 14 connected 10923-16383
e19c381e213ca6c8939e7d3b1e344f7d32c24154 10.244.3.42:6379@16379  myself,master - 0 1633070129000 13 connected 5461-10922
1af64755438c681e5b14e16af7ea544f8818f536 10.244.6.214:6379@16379 slave e19c381e213ca6c8939e7d3b1e344f7d32c24154 0 1633070128877 13 connected
715c5a4e67d57fbe53d0f4d7cc05dfd1f27ab568 10.244.3.43:6379@16379  slave 9f6177e6163dbffee8aa7b9c88f1e4d8a0ab1dde 0 1633070126865 14 connected
3be1be6256d939340183ab6cf75725efe3877b45 10.244.4.10:6379@16379  slave 2d7f14c0e5d617db128ca6bd51e4eac47ddf89d1 0 1633070129883 15 connected
I have no name!@ptts-redis-cluster-client:/$
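The check above is done by eye. As a minimal Python sketch (hypothetical `cluster_health`, not a Redis tool), the same CLUSTER NODES output can be checked for fail flags, non-connected links, unassigned slots out of the 16384, and masters without a healthy replica. It expects the raw node lines only, without the shell prompt. For a live cluster, `redis-cli --cluster check <host>:<port>` performs a similar built-in check.

```python
from typing import List

TOTAL_SLOTS = 16384


def cluster_health(cluster_nodes_output: str) -> List[str]:
    """Return human-readable problems found in CLUSTER NODES output; [] means healthy."""
    problems: List[str] = []
    covered = set()
    masters, replicated = set(), set()
    for line in cluster_nodes_output.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        node_id, flags, master_id, link = parts[0], parts[2].split(","), parts[3], parts[7]
        failed = "fail" in flags or "fail?" in flags
        if failed:
            problems.append(f"{node_id} is flagged {parts[2]}")
        if link != "connected":
            problems.append(f"{node_id} link state is {link}")
        if "master" in flags:
            masters.add(node_id)
            for slot_range in parts[8:]:
                if slot_range.startswith("["):
                    continue  # migrating/importing marker, not an owned range
                lo, _, hi = slot_range.partition("-")
                covered.update(range(int(lo), int(hi or lo) + 1))
        elif master_id != "-" and not failed:
            replicated.add(master_id)  # only healthy replicas count
    if len(covered) != TOTAL_SLOTS:
        problems.append(f"{TOTAL_SLOTS - len(covered)} slots are not assigned")
    for master in sorted(masters - replicated):
        problems.append(f"master {master} has no healthy replica")
    return problems
```

Applied to the output before the fix, it would report the `slave,fail` node and, as a consequence, the master e19c381e... left without a healthy replica; applied to the output above, it returns an empty list.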

 
