
Backing up and restoring K8s on-premise with Velero and restic

by 여행을 떠나자! 2021. 12. 8.

1. Overview

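- What is Velero (https://velero.io/)?

   ✓ Velero is a tool that backs up Kubernetes resources and persistent volumes to object storage.

   ✓ Velero consists of a client CLI (command-line interface) that runs locally and a server that runs in the Kubernetes cluster.

   ✓ In cloud environments, it backs up PVs (Persistent Volumes) by taking snapshots with the block-storage snapshot feature offered by the cloud provider.

   ✓ In on-premise environments, Velero uses the open-source tool restic to back up PVs to object storage. (Velero 1.5 added the option to back up all pod volumes with restic by default.)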

       https://velero.io/blog/velero-1.5-for-and-by-community/ 

 

- Use cases

   ✓ Take backups of your cluster and restore in case of loss.

   ✓ Migrate cluster resources to other clusters.

   ✓ Replicate your production cluster to development and testing clusters.

 

- Storage providers supported by Velero (https://velero.io/docs/v1.7/supported-providers/)

   ✓ Velero supported providers: AWS, GCP, MS Azure,...

   ✓ Community supported providers: AlibabaCloud, OpenEBS, OpenStack,...

   ✓ S3-Compatible object store providers: MinIO, Ceph RADOS,...

 

 

2. Environment

- Velero 1.7 & restic 0.12.0

- MinIO 2021-11-09T03:21:45Z

- Kubernetes 1.16.15

- quay.io/external_storage/nfs-client-provisioner v3.1.0-k8s1.11: PV (Persistent Volume) provisioner on NFS

 

 

3. Velero setup

a. MinIO object storage setup

- MinIO setup procedure

   See the 'MinIO - Distributed Mode' document.

- MinIO server information

   ✓ Address: api.acp.kt.co.kr:9000

   ✓ Account used by Velero (aws_access_key_id/aws_secret_access_key): velero / *****

   ✓ Bucket used by Velero: velero

   ✓ Region: ap-northeast-2

 

b. Installing the Velero CLI

$ wget https://github.com/vmware-tanzu/velero/releases/download/v1.7.0/velero-v1.7.0-linux-amd64.tar.gz
...
$ tar xzvf velero-v1.7.0-linux-amd64.tar.gz
x velero-v1.7.0-linux-amd64/LICENSE
x velero-v1.7.0-linux-amd64/examples/README.md
x velero-v1.7.0-linux-amd64/examples/minio
x velero-v1.7.0-linux-amd64/examples/minio/00-minio-deployment.yaml
x velero-v1.7.0-linux-amd64/examples/nginx-app
x velero-v1.7.0-linux-amd64/examples/nginx-app/README.md
x velero-v1.7.0-linux-amd64/examples/nginx-app/base.yaml
x velero-v1.7.0-linux-amd64/examples/nginx-app/with-pv.yaml
x velero-v1.7.0-linux-amd64/velero
$ mv velero-v1.7.0-linux-amd64/velero ~/bin/
$

 

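c. Installing and configuring the Velero server components

- Copy the MinIO certificate and create the credentials file

   ✓ The MinIO server used as object storage has a self-signed certificate, so Velero needs MinIO's certificate (ca.crt).

   ✓ Create a credentials file holding the MinIO account that Velero will use for backups.

       Velero connects to MinIO with this account and backs up the Kubernetes resources.

       restic also connects to MinIO with this account and backs up the Kubernetes PVs.

            restic could not log in when the password contained special characters.

            See Case #1 in Trouble shooting below for details.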

$ scp minio@iap12:~/.minio/certs/ca.crt .
minio@iap12's password:
ca.crt                    100% 1269     1.0MB/s   00:00
$ vi credentials-velero
[default]
aws_access_key_id = OU32SEVWIRYSNFQKBEB9
aws_secret_access_key = s9T1+dtQNTHJkas4j14Z2B3usWy5MBljXixmEl74
$
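The two steps above can be scripted. Below is a minimal sketch, assuming you pass the MinIO access key and secret key as arguments. The function names `is_restic_safe_key` and `write_velero_credentials` are hypothetical and are not part of Velero. The post does not say which special characters broke restic, so the check is deliberately conservative: it warns on anything other than letters and digits and does not block.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Velero or restic.

# Succeeds only if the key is non-empty and contains letters/digits only.
# Conservative heuristic based on the author's restic login failure (Case #1).
is_restic_safe_key() {
  case "$1" in
    '' | *[!A-Za-z0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Writes an AWS-style credentials file for `velero install --secret-file`.
# Usage: write_velero_credentials FILE ACCESS_KEY SECRET_KEY
write_velero_credentials() {
  local file="$1" access_key="$2" secret_key="$3"
  if ! is_restic_safe_key "$secret_key"; then
    echo "warning: secret key has non-alphanumeric characters; restic may fail to log in" >&2
  fi
  # umask 077 in a subshell: a newly created file is readable by its owner only.
  (umask 077 && printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
    "$access_key" "$secret_key" > "$file")
}
```

Usage: `write_velero_credentials credentials-velero "<access key>" "<secret key>"`. If the file already exists, its permissions are left as they are.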

 

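- Installing the Velero server components

   ✓ Install the server components into Kubernetes with the 'velero install' command.

   ✓ MinIO, which is AWS S3-compatible, is used as the object storage for backups.

   ✓ Set the options as follows. (https://velero.io/docs/v1.7/customize-installation/)

       ▷ plugins: the AWS plugin version compatible with the Velero server version being installed (v1.3.0 for Velero 1.7)

       ▷ bucket, backup-location-config, cacert, secret-file: connection details for the storage that holds the backup data (the MinIO server)

       ▷ use-volume-snapshots: set to false because volume snapshots are not used

       ▷ use-restic: enables restic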

$ velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.3.0 \
  --bucket velero \
  --backup-location-config region=ap-northeast-2,s3ForcePathStyle="true",s3Url=https://api.acp.kt.co.kr:9000 \
  --cacert ca.crt \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --use-restic
I1120 07:54:17.744074   16192 request.go:655] Throttling request took 1.149216168s, request: GET:https://14.52.244.136:7443/apis/batch/v1?timeout=32s
I1120 07:54:27.943938   16192 request.go:655] Throttling request took 11.348477825s, request: GET:https://14.52.244.136:7443/apis/apiextensions.k8s.io/v1?timeout=32s
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
...
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
$
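If you reinstall Velero or repeat the setup on another cluster, it can help to keep the MinIO endpoint, bucket and region in variables. The sketch below prints the same arguments as the command above, one per line, and uses the values from this post as defaults. The function name `velero_install_args` and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Defaults mirror the values used in this post; override them via the environment.
MINIO_URL="${MINIO_URL:-https://api.acp.kt.co.kr:9000}"
BUCKET="${BUCKET:-velero}"
REGION="${REGION:-ap-northeast-2}"
AWS_PLUGIN="${AWS_PLUGIN:-velero/velero-plugin-for-aws:v1.3.0}"

# Prints the `velero install` arguments, one per line.
# The shell drops the quotes in s3ForcePathStyle="true", so the value is plain "true".
velero_install_args() {
  printf '%s\n' install \
    --provider aws \
    --plugins "$AWS_PLUGIN" \
    --bucket "$BUCKET" \
    --backup-location-config "region=${REGION},s3ForcePathStyle=true,s3Url=${MINIO_URL}" \
    --cacert ca.crt \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --use-restic
}
```

In bash, `mapfile -t args < <(velero_install_args) && velero "${args[@]}"` runs the install. Appending `--dry-run -o yaml` to the `velero install` command prints the generated manifests without applying them, which lets you review them first.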

 

- Verifying the Velero server installation

$ kubectl get deployments.apps -n velero
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
velero   1/1     1            1           5m12s
$ kubectl get daemonsets.apps -n velero
NAME     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
restic   5         5         4       5            4           <none>          14s
$
$ kubectl logs deployments/velero -n velero -f
...
time="2021-11-16T11:34:03Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
time="2021-11-16T11:34:03Z" level=info msg="Backup storage location valid, marking as available" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:121"

   Once Velero connects to MinIO successfully, PHASE is shown as Available.

$ kubectl get backupstoragelocations -n velero
NAME      PHASE       LAST VALIDATED   AGE    DEFAULT
default   Available   28s              111s   true
$
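When automating the installation, you may want to wait until the backup storage location is Available before creating backups. The sketch below parses the table printed by `kubectl get backupstoragelocations -n velero`, exactly as shown above. The function names are hypothetical, and the 10-second poll interval and 30 retries are arbitrary choices.

```shell
#!/usr/bin/env bash
# Reads `kubectl get backupstoragelocations -n velero` output on stdin and prints
# the PHASE column of the named location; exits 1 if the location is not listed.
bsl_phase() {
  awk -v n="${1:-default}" 'NR > 1 && $1 == n { print $2; found = 1 } END { exit !found }'
}

# Hypothetical helper: poll until the location reports Available or retries run out.
wait_for_bsl() {
  local name="${1:-default}" tries="${2:-30}" i=0 phase=""
  while [ "$i" -lt "$tries" ]; do
    phase=$(kubectl get backupstoragelocations -n velero | bsl_phase "$name") || phase=""
    [ "$phase" = "Available" ] && return 0
    i=$((i + 1))
    sleep 10
  done
  echo "backup storage location '$name' is not Available (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example: `wait_for_bsl default && velero backup create ...`. Reading `.status.phase` with `-o jsonpath` would avoid depending on the column layout.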

 

- Velero CLI auto-completion

$ sudo yum install bash-completion -y
$ velero completion bash | sudo tee /etc/bash_completion.d/velero > /dev/null
$ vi ~/.bash_profile
...
#
# Velero CLI autocompletion
source <(velero completion bash)
alias v=velero
complete -F __start_velero v
...
$ source ~/.bash_profile

 

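- Selecting which PVs to back up

   ✓ When backing up PVs with restic, Velero targets the volumes mounted in pods, excluding the following volume types:

       the default service account token, kubernetes secrets, and config maps

   ✓ Opt-in approach (default)

      Only the volumes listed in the pod's annotation are backed up. The value is the volume name from the pod spec; separate several names with commas.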

      kubectl -n your_namespace annotate pod/your_pod_name backup.velero.io/backup-volumes=pvc1-vm

$ k describe pod nginx-deployment-f96b7fd86-w9k7f -n nginx-example
Name:         nginx-deployment-f96b7fd86-w9k7f
Namespace:    nginx-example
...
Containers:
  nginx:
    Mounts:
      /var/log/nginx from nginx-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5j6pb (ro)
...
Volumes:
  nginx-logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nginx-logs
    ReadOnly:   false
$
$ kubectl -n nginx-example annotate pod/nginx-deployment-f96b7fd86-w9k7f backup.velero.io/backup-volumes=nginx-logs
pod/nginx-deployment-f96b7fd86-w9k7f annotated
$
$ k describe pod nginx-deployment-f96b7fd86-w9k7f -n nginx-example | grep -i annotation -A1
Annotations:  backup.velero.io/backup-volumes: nginx-logs
Status:       Running
$
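In the Velero restic documentation, the opt-in annotation key is `backup.velero.io/backup-volumes`. Its value is a comma-separated list of pod volume names. For pods with several PVCs, the sketch below collects every PVC-backed volume name from the pod spec and writes the annotation in one step. The function names are hypothetical, and the sketch assumes kubectl's jsonpath prints an empty string for volumes without a `persistentVolumeClaim`.

```shell
#!/usr/bin/env bash
# Reads "<volume name>\t<claimName>" lines on stdin and prints the names of the
# PVC-backed volumes as a comma-separated list (empty claimName = not a PVC).
pvc_volumes_csv() {
  awk -F '\t' '$2 != "" { names = names (names == "" ? "" : ",") $1 } END { print names }'
}

# Hypothetical helper: opt in every PVC-backed volume of one pod for restic backup.
annotate_pvc_volumes() {
  local ns="$1" pod="$2" vols
  vols=$(kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{range .spec.volumes[*]}{.name}{"\t"}{.persistentVolumeClaim.claimName}{"\n"}{end}' \
    | pvc_volumes_csv)
  if [ -z "$vols" ]; then
    echo "no PVC-backed volumes in $ns/$pod" >&2
    return 1
  fi
  kubectl -n "$ns" annotate pod/"$pod" --overwrite "backup.velero.io/backup-volumes=$vols"
}
```

For example: `annotate_pvc_volumes nginx-example nginx-deployment-f96b7fd86-w9k7f`. The annotation lives on the pod, so it disappears when the Deployment replaces the pod. Putting it in the Deployment's pod template (`spec.template.metadata.annotations`) keeps it across restarts.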

   ✓ Opt-out approach

      All volumes mounted in the pod are backed up. To exclude a volume from the backup, declare it in the pod's annotation.

      kubectl -n your_namespace annotate pod/your_pod_name backup.velero.io/backup-volumes-excludes=pvc1-vm

      For opt-out behavior, add the '--default-volumes-to-restic' option to the velero server arguments as shown below.

$ kubectl edit deployments.apps -n velero velero
...
    spec:
      containers:
      - args:
        - server
        - --features=
        - --default-volumes-to-restic
...
$
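`kubectl edit` is interactive. The sketch below adds the same argument non-interactively with a JSON Patch and skips the change if the flag is already enabled. It assumes the velero server is the first container (index 0) in the Deployment, as in the output above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if the container args on stdin enable --default-volumes-to-restic
# (bare flag or =true); accepts both JSON and Go-style array output from kubectl.
has_restic_default_flag() {
  grep -qE -- '--default-volumes-to-restic([^=]|=true|$)'
}

# Hypothetical helper: append the flag to the velero server args unless already present.
enable_default_volumes_to_restic() {
  if kubectl -n velero get deployment velero \
      -o jsonpath='{.spec.template.spec.containers[0].args}' | has_restic_default_flag; then
    printf '%s\n' "--default-volumes-to-restic is already set"
    return 0
  fi
  # JSON Patch "add" with index "-" appends to the end of the args array.
  kubectl -n velero patch deployment velero --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--default-volumes-to-restic"}]'
}
```

Since Velero 1.5, the same flag can also be given to `velero install` (together with `--use-restic`), or to a single `velero backup create`. Either way avoids editing the Deployment.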

 

 

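4. Backup and restore test

a. Setting up the backup environment

- Clone the velero git repository and deploy the nginx example, which uses a PV, to Kubernetes.

   The example pod has backup hook commands set in its annotations, and these cause an error when the PV is allocated on NAS. Delete the annotations before taking the backup. See Case #2 in Trouble shooting below for details.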

$ git clone https://github.com/vmware-tanzu/velero.git
...
$ cd velero
$ kubectl apply -f examples/nginx-app/with-pv.yaml
namespace/nginx-example created
persistentvolumeclaim/nginx-logs created
deployment.apps/nginx-deployment created
service/my-nginx created
$ kubectl get all -n nginx-example
NAME                                   READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-f96b7fd86-mfvjf   2/2     Running   0          18s

NAME               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
service/my-nginx   LoadBalancer   10.96.239.180   14.52.244.141   80:30865/TCP   18s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   1/1     1            1           18s

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-f96b7fd86   1         1         1       18s
$ kubectl get pvc -n nginx-example
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-logs   Bound    pvc-4a7e5c60-e536-4650-b656-5aa3bd2bc82e   50Mi       RWO            nfs-sc-iap     41s
$

 

- Generate some data in the PV (nginx access log entries) so that the restore can be verified afterwards.

$ kubectl get svc -n nginx-example
NAME       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
my-nginx   LoadBalancer   10.107.128.20   14.52.244.141   80:32422/TCP   8d
$ curl http://14.52.244.141
...
$ curl http://14.52.244.141/test
...
$ kubectl exec nginx-deployment-f96b7fd86-w9k7f -c nginx -n nginx-example -it -- cat /var/log/nginx/access.log
10.244.0.0 - - [08/Dec/2021:06:39:59 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
10.244.0.0 - - [08/Dec/2021:06:40:04 +0000] "GET /test HTTP/1.1" 404 153 "-" "curl/7.29.0" "-"
$
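Comparing the access log by eye works for two lines, but a checksum is easier for larger data. The sketch below finds the pod by its `app=nginx` label, which the example Deployment sets according to the `kubectl describe` output in Case #2. It does not use a fixed pod name, because the name changes after a restore. The function name `nginx_log_sha256` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SHA-256 of the nginx access log in the first
# pod labelled app=nginx (namespace defaults to nginx-example).
nginx_log_sha256() {
  local ns="${1:-nginx-example}" pod
  pod=$(kubectl get pod -n "$ns" -l app=nginx -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || return 1
  kubectl exec -n "$ns" "$pod" -c nginx -- cat /var/log/nginx/access.log \
    | sha256sum | awk '{ print $1 }'
}
```

Record `before=$(nginx_log_sha256)` now. After the restore, `[ "$(nginx_log_sha256)" = "$before" ] && echo "access.log restored intact"` confirms that the data came back unchanged.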

 

b. Backup

- Back up the resources and PVs of the nginx-example namespace with the velero command.

$ velero backup create nginx-backup --include-namespaces nginx-example --wait 
Backup request "nginx-backup" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...........................................
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe nginx-backup` and `velero backup logs nginx-backup`.
$

 

- Verify that the backup completed successfully.

$ velero backup get
NAME           STATUS      ERRORS  WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   0       0          2021-12-08 15:41:03 +0900 KST   29d       default            <none>
$
$ velero backup describe nginx-backup --details --insecure-skip-tls-verify
Name:         nginx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.16.15
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=16

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  nginx-example
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>
Storage Location:  default
Velero-Native Snapshot PVs:  auto
TTL:  720h0m0s
Hooks:  <none>
Backup Format Version:  1.1.0

Started:    2021-12-08 15:41:03 +0900 KST
Completed:  2021-12-08 15:41:25 +0900 KST
Expiration:  2022-01-07 15:41:01 +0900 KST

Total items to be backed up:  28
Items backed up:              28

Resource List:
  apps/v1/Deployment:
    - nginx-example/nginx-deployment
  apps/v1/ReplicaSet:
    - nginx-example/nginx-deployment-f96b7fd86
  v1/Endpoints:
    - nginx-example/my-nginx
  v1/Event:
    - nginx-example/my-nginx.16beb4037732fbba
    - nginx-example/my-nginx.16beb403777deda8
    - nginx-example/my-nginx.16beb403782597d7
    - nginx-example/my-nginx.16beb40670b26ce6
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb4036a87ef74
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb4040574ec52
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb4040b8c72b7
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb40471b2ba74
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb405c288d1be
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb405c90c19fd
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb405f21d0a21
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb405f24f8440
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb405f900e905
    - nginx-example/nginx-deployment-f96b7fd86-zswfj.16beb4062a8dac4f
    - nginx-example/nginx-logs.16beb40360c0fcfe
    - nginx-example/nginx-logs.16beb40361448917
    - nginx-example/nginx-logs.16beb40363c4210c
  v1/Namespace:
    - nginx-example
  v1/PersistentVolume:
    - pvc-2642f490-41d6-4bb0-8c7b-14752a3c7a60
  v1/PersistentVolumeClaim:
    - nginx-example/nginx-logs
  v1/Pod:
    - nginx-example/nginx-deployment-f96b7fd86-zswfj
  v1/Secret:
    - nginx-example/default-token-2v85d
    - nginx-example/istio.default
  v1/Service:
    - nginx-example/my-nginx
  v1/ServiceAccount:
    - nginx-example/default

Velero-Native Snapshots: <none included>

Restic Backups:
  Completed:
    nginx-example/nginx-deployment-f96b7fd86-zswfj: nginx-logs
$
$ velero backup logs nginx-backup --insecure-skip-tls-verify | head
time="2021-12-08T06:41:03Z" level=info msg="Setting up backup temp file" backup=velero/nginx-backup logSource="pkg/controller/backup_controller.go:556"
time="2021-12-08T06:41:03Z" level=info msg="Setting up plugin manager" backup=velero/nginx-backup logSource="pkg/controller/backup_controller.go:563"
...
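To check the result from a script instead of reading the table, the sketch below parses `velero backup get` output in the column order shown above. It succeeds only for a backup that is Completed with zero errors. The function name `backup_succeeded` is hypothetical. `velero backup get` also accepts `-o json`, which is sturdier than column positions if `jq` is available.

```shell
#!/usr/bin/env bash
# Reads `velero backup get` output on stdin. Prints "<STATUS> errors=<n> warnings=<n>"
# for the named backup and succeeds only if it is Completed with zero errors.
backup_succeeded() {
  awk -v n="$1" '
    NR > 1 && $1 == n {
      found = 1
      print $2, "errors=" $3, "warnings=" $4
      ok = ($2 == "Completed" && $3 == 0)
    }
    END { exit !(found && ok) }'
}
```

For example: `velero backup get | backup_succeeded nginx-backup || echo "nginx-backup did not complete cleanly"`.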


 

- Browse the backup data on the MinIO server.

   ✓ Kubernetes resources

      Path: Buckets > velero / backups / nginx-backup

   ✓ PV

      A directory is created for each Kubernetes namespace, and the PVs referenced by the pods are stored in it in restic's repository format.

      Path: Buckets > velero / restic / nginx-example
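Each namespace's restic repository follows the pattern visible in the `restic init` command in the Case #1 log below: `s3:<MinIO URL>/<bucket>/restic/<namespace>`. The sketch below builds that identifier and uses it to list snapshots with the restic CLI. The function names are hypothetical. The secret name `velero-restic-credentials` and key `repository-password` are inferred from the password-file path in that log. The sketch also assumes `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` are exported with the MinIO account.

```shell
#!/usr/bin/env bash
# Builds the restic repository URL for a namespace, following the pattern in the
# Case #1 log: s3:<url>/<bucket>/restic/<namespace> (a trailing "/" on url is dropped).
restic_repo_for_ns() {
  local s3_url="${1%/}" bucket="$2" ns="$3"
  printf 's3:%s/%s/restic/%s\n' "$s3_url" "$bucket" "$ns"
}

# Hypothetical helper: list restic snapshots of one namespace with the restic CLI.
# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported, ca.crt is in the
# current directory, and the password is in secret velero-restic-credentials.
list_restic_snapshots() {
  local ns="$1" pwfile rc
  pwfile=$(mktemp) || return 1
  kubectl -n velero get secret velero-restic-credentials \
    -o jsonpath='{.data.repository-password}' | base64 -d > "$pwfile"
  restic -r "$(restic_repo_for_ns https://api.acp.kt.co.kr:9000 velero "$ns")" \
    --password-file "$pwfile" --cacert ca.crt snapshots
  rc=$?
  rm -f "$pwfile"
  return "$rc"
}
```

For example: `list_restic_snapshots nginx-example`. The MinIO URL and bucket in the function are the values from this post.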

 

c. Simulating a failure

- To simulate a failure, delete the backed-up namespace.

$ kubectl delete ns nginx-example
namespace "nginx-example" deleted
$

 

d. Restore

- List the Velero backups.

$ velero backup get
NAME           STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   0        0          2021-12-08 15:41:03 +0900 KST   29d       default            <none>
$

 

- Restore the resources of the nginx-example namespace and check the restore result.

$ velero restore create --from-backup nginx-backup --wait
Restore request "nginx-backup-20211208155305" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.....................................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe nginx-backup-20211208155305` and `velero restore logs nginx-backup-20211208155305`.
$
$ velero restore describe nginx-backup-20211208155305 --details
Name:         nginx-backup-20211208155305
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  11
Items restored:              11

Started:    2021-12-08 15:53:05 +0900 KST
Completed:  2021-12-08 15:53:42 +0900 KST

Backup:  nginx-backup

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores:
  Completed:
    nginx-example/nginx-deployment-f96b7fd86-zswfj: nginx-logs

Preserve Service NodePorts:  auto
$

 

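- Verify that the nginx-example namespace was restored correctly.

   When the Service was recreated during the restore, a new Cluster IP and NodePort were assigned.

   To restore the NodePort from the time of the backup, specify the '--preserve-nodeports' option on 'velero restore create'.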

$ kubectl get all -n nginx-example
NAME                                   READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-f96b7fd86-hslwk   2/2     Running   0          4m55s

NAME               TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
service/my-nginx   LoadBalancer   10.101.210.192   14.52.244.141   80:31099/TCP   4m54s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   1/1     1            1           4m55s

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-f96b7fd86   1         1         1       4m55s
$ 
$ kubectl exec -n nginx-example nginx-deployment-f96b7fd86-zswfj -it -- cat /var/log/nginx/access.log
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-f96b7fd86-zswfj -n nginx-example' to see all of the containers in this pod.
10.244.0.0 - - [08/Dec/2021:06:39:59 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
10.244.0.0 - - [08/Dec/2021:06:40:04 +0000] "GET /test HTTP/1.1" 404 153 "-" "curl/7.29.0" "-"
$

 

 

5. Scheduled backups

- For periodic backups, run the 'velero schedule create' command.

   The interval is given in cron format; the following example backs up the nginx-example namespace every 10 minutes.

$ velero schedule create nginx-crontab --include-namespaces nginx-example --schedule="*/10 * * * *"
Schedule "nginx-crontab" created successfully.
$
$ velero schedule get
NAME            STATUS    CREATED                         SCHEDULE      BACKUP TTL   LAST BACKUP   SELECTOR
nginx-crontab   Enabled   2021-11-16 21:15:59 +0900 KST   0/5 * * * *   720h0m0s     37s ago       <none>
$
$ velero backup get | egrep 'NAME|nginx-crontab'
NAME                           STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION SELECTOR
nginx-crontab-20211208080042   Completed   0        0          2021-12-08 17:00:42 +0900 KST   29d       default <none>
nginx-crontab-20211208075042   Completed   0        0          2021-12-08 16:50:42 +0900 KST   29d       default <none>
nginx-crontab-20211208074042   Completed   0        0          2021-12-08 16:40:42 +0900 KST   29d       default <none>
nginx-crontab-20211208073042   Completed   0        0          2021-12-08 16:30:42 +0900 KST   29d       default <none>
nginx-crontab-20211208072937   Completed   0        0          2021-12-08 16:29:37 +0900 KST   29d       default <none>
$
$ velero schedule delete nginx-crontab
Are you sure you want to continue (Y/N)? y
Schedule deleted: nginx
$
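Retention is worth checking when scheduling. Each scheduled backup keeps the default TTL of 720h (the BACKUP TTL column above). A 10-minute schedule therefore accumulates about 720 × 60 / 10 = 4320 backups before the oldest ones expire. The hypothetical helper below does that arithmetic. `velero schedule create` accepts `--ttl` to shorten the retention.

```shell
#!/usr/bin/env bash
# Approximate number of backups a schedule retains at steady state:
# TTL (hours) converted to minutes, divided by the schedule interval (minutes).
retained_backups() {
  local interval_min="$1" ttl_hours="$2"
  echo $(( ttl_hours * 60 / interval_min ))
}
```

`retained_backups 10 720` prints 4320. With `--ttl 24h0m0s` on the same schedule, `retained_backups 10 24` prints 144.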

 

 

6. Trouble shooting

- Case #1

   Problem: "restic repository is not ready, The request signature we calculated does not match the signature you provided"

$ velero backup logs nginx-backup --insecure-skip-tls-verify
...
time="2021-12-01T02:32:59Z" level=info msg="Processing item" backup=velero/nginx-backup logSource="pkg/backup/backup.go:358" name=ml-pipeline-ui-artifact-84f654dd94-84mwv namespace=admin progress= resource=pods
time="2021-12-01T02:32:59Z" level=info msg="Backing up item" backup=velero/nginx-backup logSource="pkg/backup/item_backupper.go:121" name=ml-pipeline-ui-artifact-84f654dd94-84mwv namespace=admin resource=pods
time="2021-12-01T02:32:59Z" level=info msg="Executing custom action" backup=velero/nginx-backup logSource="pkg/backup/item_backupper.go:327" name=ml-pipeline-ui-artifact-84f654dd94-84mwv namespace=admin resource=pods
time="2021-12-01T02:32:59Z" level=info msg="Executing podAction" backup=velero/nginx-backup cmd=/velero logSource="pkg/backup/pod_action.go:51" pluginName=velero
time="2021-12-01T02:32:59Z" level=info msg="Done executing podAction" backup=velero/nginx-backup cmd=/velero logSource="pkg/backup/pod_action.go:77" pluginName=velero
time="2021-12-01T02:32:59Z" level=info msg="1 errors encountered backup up item" backup=velero/nginx-backup logSource="pkg/backup/backup.go:431" name=ml-pipeline-ui-artifact-84f654dd94-84mwv
time="2021-12-01T02:32:59Z" level=error msg="Error backing up item"
                                        backup=velero/nginx-backup
                                        error="restic repository is not ready: error running command=restic init --repo=s3:https://api.acp.kt.co.kr:9000/velero/restic/admin --password-file=/tmp/credentials/velero/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default225758965 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:https://api.acp.kt.co.kr:9000/velero/restic/admin failed: client.BucketExists: The request signature we calculated does not match the signature you provided. Check your key and signing method.\n\n: exit status 1"
                                        error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/repository_ensurer.go:144"
                                        error.function="github.com/vmware-tanzu/velero/pkg/restic.(*repositoryEnsurer).EnsureRepo"
                                        logSource="pkg/backup/backup.go:435" name=ml-pipeline-ui-artifact-84f654dd94-84mwv
...
$

 

   Cause:

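      Velero backs up Kubernetes resources to object storage and, when restic is enabled, also backs up PVs to object storage.

      restic therefore needs its own access to the object storage, and in this environment the error above occurred when the password of the object storage account contained special characters.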

 

   Solution:

      The password of the account created in MinIO could not be changed, so as a workaround a MinIO service account was created and its credentials were passed to 'velero install'.
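When many namespaces are backed up, the same error can appear for several restic repositories. The hypothetical filter below scans `velero backup logs` output for this signature error. It lists each repository whose `restic init` failed, based on the message format in the log above.

```shell
#!/usr/bin/env bash
# Reads `velero backup logs` output on stdin and prints, once each, the restic
# repositories whose `restic init` failed with the S3 signature-mismatch error.
signature_failed_repos() {
  grep 'does not match the signature you provided' \
    | grep -o 'create repository at s3:[^ ]* failed' \
    | awk '{ print $4 }' \
    | sort -u
}
```

For example: `velero backup logs nginx-backup --insecure-skip-tls-verify | signature_failed_repos`.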

 

- Case #2

   Problem: "freeze failed: Operation not supported"

$ velero backup logs nginx-backup --insecure-skip-tls-verify
...
time="2021-11-30T08:55:05Z" level=info msg="Processing item" backup=velero/nginx-backup logSource="pkg/backup/backup.go:358" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example progress= resource=pods
time="2021-11-30T08:55:05Z" level=info msg="Backing up item" backup=velero/nginx-backup logSource="pkg/backup/item_backupper.go:121" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example resource=pods
time="2021-11-30T08:55:05Z" level=info msg="running exec hook" backup=velero/nginx-backup hookCommand="[/sbin/fsfreeze --freeze /var/log/nginx]" hookContainer=fsfreeze hookName="<from-annotation>" hookOnError=Fail hookPhase=pre hookSource=annotation hookTimeout="{30s}" hookType=exec logSource="pkg/podexec/pod_command_executor.go:124" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example resource=pods
time="2021-11-30T08:55:06Z" level=info msg="stdout: " backup=velero/nginx-backup hookCommand="[/sbin/fsfreeze --freeze /var/log/nginx]" hookContainer=fsfreeze hookName="<from-annotation>" hookOnError=Fail hookPhase=pre hookSource=annotation hookTimeout="{30s}" hookType=exec logSource="pkg/podexec/pod_command_executor.go:171" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example resource=pods
time="2021-11-30T08:55:06Z" level=info msg="stderr: fsfreeze: /var/log/nginx: freeze failed: Operation not supported\n" backup=velero/nginx-backup hookCommand="[/sbin/fsfreeze --freeze /var/log/nginx]" hookContainer=fsfreeze hookName="<from-annotation>" hookOnError=Fail hookPhase=pre hookSource=annotation hookTimeout="{30s}" hookType=exec logSource="pkg/podexec/pod_command_executor.go:172" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example resource=pods
time="2021-11-30T08:55:06Z" level=error msg="Error executing hook" backup=velero/nginx-backup error="command terminated with exit code 1" hookPhase=pre hookSource=annotation hookType=exec logSource="internal/hook/item_hook_handler.go:206" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example resource=pods
time="2021-11-30T08:55:06Z" level=error msg="Error backing up item" backup=velero/nginx-backup error="command terminated with exit code 1" logSource="pkg/backup/backup.go:441" name=nginx-deployment-f96b7fd86-w9k7f
time="2021-11-30T08:55:06Z" level=info msg="Backed up 1 items out of an estimated total of 11 (estimate will change throughout the backup)" backup=velero/nginx-backup logSource="pkg/backup/backup.go:398" name=nginx-deployment-f96b7fd86-w9k7f namespace=nginx-example progress= resource=pods
...
$

 

   Cause: 

      The example pod is annotated to run the fsfreeze command during PV backup to keep the backup consistent.

      The volume to back up is allocated on NAS (NFS), and fsfreeze does not support NAS, which caused the error.

$ kubectl describe pod nginx-deployment-f96b7fd86-jxz9 -n nginx-example
Name:         nginx-deployment-f96b7fd86-466tz
Namespace:    nginx-example
Priority:     0
Node:         iap14/14.52.244.50
Start Time:   Sat, 20 Nov 2021 09:40:07 +0900
Labels:       app=nginx
              pod-template-hash=f96b7fd86
Annotations:  post.hook.backup.velero.io/command: ["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]
              post.hook.backup.velero.io/container: fsfreeze
              pre.hook.backup.velero.io/command: ["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]
              pre.hook.backup.velero.io/container: fsfreeze
...
$ kubectl exec nginx-deployment-f96b7fd86-jxz9j -n nginx-example -c fsfreeze -it -- bash
root@nginx-deployment-f96b7fd86-jxz9j:/# /sbin/fsfreeze --freeze /var/log/nginx
fsfreeze: /var/log/nginx: freeze failed: Operation not supported
root@nginx-deployment-f96b7fd86-jxz9j:/#
# df -h | egrep 'File|nfs'
Filesystem               Size  Used Avail Use% Mounted on
14.52.244.215:/nfs_01     21T  6.8T   15T  33% /nfs_01
14.52.244.215:/nfs_02     12T  2.7T  9.3T  23% /nfs_02
14.52.244.215:/nfs_03    3.0T   83G  3.0T   3% /nfs_03
# fsfreeze --freeze /nfs_02
fsfreeze: /nfs_02: freeze failed: Operation not supported
# mkdir /temp
# fsfreeze --freeze /temp && fsfreeze --unfreeze /temp
#
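As the test shows, fsfreeze failed on the NFS mount and succeeded on a local directory. Before relying on fsfreeze hooks, you could check the file-system type of the hooked path. The sketch below uses GNU `stat -f -c %T`, which prints a human-readable file-system type such as `nfs`. It assumes the hook container image ships GNU coreutils, and the function names are hypothetical. Only NFS is treated as unsupported, because that is the only case tested in this post.

```shell
#!/usr/bin/env bash
# Succeeds for NFS file-system types (as printed by `stat -f -c %T`), where the
# fsfreeze test above failed with "Operation not supported".
is_nfs_type() {
  case "$1" in
    nfs*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the file-system type of a path inside a pod container.
pod_fs_type() {
  local ns="$1" pod="$2" container="$3" path="$4"
  kubectl exec -n "$ns" "$pod" -c "$container" -- stat -f -c %T "$path"
}
```

For example: `t=$(pod_fs_type nginx-example <pod> fsfreeze /var/log/nginx) && is_nfs_type "$t" && echo "fsfreeze hooks will fail on $t"`.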

 

   Solution:

      Delete the backup hook settings from the pod's annotations. If they are not removed, the hook error prevents restic from backing up the PV.

      Without the freeze, however, the backup may potentially be inconsistent.
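The example pods are created by the nginx-deployment Deployment, so the hook annotations have to be removed from its pod template. Removing them from a running pod alone would not survive a restart. The sketch below builds a JSON Patch that removes the four hook annotations shown in the `kubectl describe` output above. Inside a JSON Pointer path (RFC 6901), the `/` in each annotation key must be escaped as `~1`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Escapes a key for a JSON Pointer (RFC 6901): "~" -> "~0" first, then "/" -> "~1".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Prints a JSON Patch that removes the given pod-template annotations.
remove_template_annotations_patch() {
  local key ops=""
  for key in "$@"; do
    ops="${ops:+$ops,}{\"op\":\"remove\",\"path\":\"/spec/template/metadata/annotations/$(json_pointer_escape "$key")\"}"
  done
  printf '[%s]\n' "$ops"
}

# Hypothetical helper: strip the example's fsfreeze hook annotations from the Deployment.
strip_backup_hooks() {
  kubectl -n "${1:-nginx-example}" patch deployment "${2:-nginx-deployment}" --type=json \
    -p "$(remove_template_annotations_patch \
      pre.hook.backup.velero.io/container pre.hook.backup.velero.io/command \
      post.hook.backup.velero.io/container post.hook.backup.velero.io/command)"
}
```

A JSON Patch `remove` fails if a path does not exist, so run `strip_backup_hooks` only once. Changing the pod template triggers a rollout that replaces the pod.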

 
