Velero and restic - 'signal: killed'

by 여행을 떠나자! 2021. 12. 16.

1. Overview

- Identify the cause of an error that occurred while running velero backup, and fix it.

 

 

2. Environment

- Velero 1.7 & restic 0.12.0

- MinIO 2021-11-09T03:21:45Z

- Kubernetes 1.16.15

 

 

3. Problem

- A 'signal: killed' error occurred while backing up the volume of a PostgreSQL Pod.

   Detailed error message: pod volume backup failed: error running restic backup, stderr=: signal: killed

$ k logs -n velero velero-77bd5cd848-k54rk -f | grep 'level=error'
…
time="2021-12-16T06:24:38Z" 
    level=error 
    msg="Error backing up item" 
    backup=velero/pgo-211216 
    error="pod volume backup failed: error running restic backup, stderr=: signal: killed" 
    error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184" 
    error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" 
    logSource="pkg/backup/backup.go:435" 
    name=emo-dev-67b8456944-xv8qj
…

 

- The error above was raised by the restic pod, which performs the actual volume backup. The restic pod's log is shown below.

$ k logs daemonsets/restic -n velero -f
…
time="2021-12-16T06:16:53Z" level=info msg="Looking for most recent completed pod volume backup for this PVC" backup=velero/pgo-211216 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:340" name=pgo-211216-fwthd namespace=velero pvcUID=0ee1af5c-22c5-4f6a-a1f5-fedd8f37d878
time="2021-12-16T06:16:53Z" level=info msg="No completed pod volume backup found for PVC" backup=velero/pgo-211216 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:370" name=pgo-211216-fwthd namespace=velero pvcUID=0ee1af5c-22c5-4f6a-a1f5-fedd8f37d878
time="2021-12-16T06:16:53Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=velero/pgo-211216 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:277" name=pgo-211216-fwthd namespace=velero
time="2021-12-16T06:24:28Z" 
    level=error 
    msg="Error running command=restic backup --repo=s3:https://api.acp.kt.co.kr:9002/k8s-ext/restic/pgo 
                                             --password-file=/tmp/credentials/velero/velero-restic-credentials-repository-password 
                                             --cacert=/tmp/cacert-default077612050 
                                             --cache-dir=/scratch/.cache/restic . 
                                             --tag=pod-uid=5c9521d3-5b8f-4136-a49e-db63d902caf7 
                                             --tag=pvc-uid=0ee1af5c-22c5-4f6a-a1f5-fedd8f37d878 
                                             --tag=volume=pgdata 
                                             --tag=backup=pgo-211216 
                                             --tag=backup-uid=add7c4e2-b8c8-41e7-b1f1-f810ee130e98 
                                             --tag=ns=pgo 
                                             --tag=pod=emo-dev-67b8456944-xv8qj 
                                             --host=velero 
                                             --json, 
         stdout={\"message_type\":\"status\",\"seconds_elapsed\":2,\"percent_done\":0,\"total_files\":1,\"total_bytes\":3}\n
                {\"message_type\":\"status\",\"seconds_elapsed\":2,\"percent_done\":0,\"total_files\":2,\"total_bytes\":8195}\n
                …
                {\"message_type\":\"status\",\"seconds_elapsed\":453,\"seconds_remaining\":470,\"percent_done\":0.49040538451497323,\"total_files\":3488,\"files_done\":1247,\"total_bytes\":61138275673,\"bytes_done\":29982539590,\"current_files\":[\"/emo-dev/base/16397/39421\",\"/emo-dev/base/16397/39423\"]},
         stderr=" backup=velero/pgo-211216 
                  controller=pod-volume-backup 
                  error="signal: killed" 
                  error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/pod_volume_backup_controller.go:291" 
                  error.function="github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup" 
                  logSource="pkg/controller/pod_volume_backup_controller.go:291" 
    name=pgo-211216-fwthd 
    namespace=velero
…
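The last status message in the log above shows how far restic got before it was killed (about 49% of roughly 61 GB). As a minimal sketch, that value can be pulled out of the log with standard text tools. `last_percent_done` is a hypothetical helper name, and the pattern assumes the escaped JSON layout seen in the log above.

```shell
# Hypothetical helper: read log text on stdin and print the last
# "percent_done" value reported by restic before the process ended.
last_percent_done() {
  grep -o 'percent_done[^0-9]*[0-9.][0-9.]*' |  # every percent_done field, one per line
    tail -n 1 |                                # keep only the most recent one
    grep -o '[0-9.][0-9.]*$'                   # strip the key, leave the number
}
```

For example, `kubectl logs daemonsets/restic -n velero | grep 'signal: killed' | last_percent_done` would print the progress fraction of the failed backup, assuming the error line is still in the pod's log.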

 

 

4. Cause

- During the backup, the restic pod's memory usage climbs up to its memory limit (Limits.memory 1Gi). The restic process was killed because it ran out of memory.

$ k describe daemonsets.apps restic -n velero | grep Limits -A5
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  512Mi
$

   The memory usage of the restic-psfgj Pod keeps increasing.

Every 2.0s: kubectl top pod -n velero --sort-by memory          ysjeon-Dev.local: Thu Dec 16 16:10:39 2021

NAME                      CPU(cores)   MEMORY(bytes)
velero-77bd5cd848-k54rk   88m          394Mi
restic-psfgj              998m         286Mi
restic-24656              1m           212Mi
restic-cv72n              3m           160Mi
restic-6c6bv              2m           92Mi
restic-pdlls              5m           69Mi

...

Every 2.0s: kubectl top pod -n velero --sort-by memory         ysjeon-Dev.local: Thu Dec 16 16:17:40 2021

NAME                      CPU(cores)   MEMORY(bytes)
restic-psfgj              1006m        919Mi
velero-77bd5cd848-k54rk   118m         390Mi
restic-24656              5m           212Mi
restic-cv72n              2m           156Mi
restic-6c6bv              2m           92Mi
restic-pdlls              2m           68Mi
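To compare the two snapshots above against the 1Gi limit, the usage can be expressed as a percentage of the limit. The helper below is a minimal sketch: `mem_pct_of_limit` is a hypothetical name (not part of kubectl or Velero), and it only understands the Ki, Mi and Gi suffixes.

```shell
# Hypothetical helper: print memory usage as a percentage of the limit.
# Usage: mem_pct_of_limit <used> <limit>, e.g. mem_pct_of_limit 919Mi 1Gi
mem_pct_of_limit() {
  awk -v used="$1" -v limit="$2" '
    # Convert a quantity such as "919Mi" or "1Gi" to MiB; -1 on unknown unit.
    function mib(q,   n, u) {
      n = q; sub(/[A-Za-z]+$/, "", n); n = n + 0   # numeric part
      u = q; sub(/^[0-9.]+/, "", u)                # unit suffix
      if (u == "Ki") return n / 1024
      if (u == "Mi") return n
      if (u == "Gi") return n * 1024
      return -1
    }
    BEGIN {
      u = mib(used); l = mib(limit)
      if (u < 0 || l <= 0) exit 1                  # unsupported input
      printf "%.1f\n", 100 * u / l
    }'
}
```

With the values above, `mem_pct_of_limit 286Mi 1Gi` prints 27.9 and `mem_pct_of_limit 919Mi 1Gi` prints 89.7, so the pod went from under a third of its limit to nearly 90% of it in about seven minutes.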

 

 

5. Resolution

- Raised the restic Pod's memory limit from 1Gi to 4Gi. CPU usage had also reached its limit, so the CPU limit was raised from 1 to 4.

$ k edit daemonsets.apps restic -n velero
…
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: 500m
            memory: 512Mi
$
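The same change can also be applied without opening an editor, which is easier to repeat or script. The following is a sketch under assumptions: `set_restic_limits` is a hypothetical wrapper name, it uses `kubectl` directly rather than the `k` alias, and it keeps the requests at the values shown above.

```shell
# Hypothetical wrapper around `kubectl set resources` for the restic DaemonSet.
# Usage: set_restic_limits <cpu-limit> <memory-limit>
set_restic_limits() {
  cpu="$1"; mem="$2"
  # Requests stay at the existing values (500m CPU, 512Mi memory).
  kubectl set resources daemonset restic -n velero \
    --limits="cpu=${cpu},memory=${mem}" \
    --requests="cpu=500m,memory=512Mi"
}
```

Running `set_restic_limits 4 4Gi` followed by `kubectl rollout status daemonset/restic -n velero` would apply the new limits and wait for the restic pods to be recreated. Note that a change made directly on the DaemonSet may be lost if Velero is later reinstalled from its original manifests.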

 
