1. Overview
- This post looks at the canary rollout feature provided by Kubeflow KFServing.
- Canary release: a deployment strategy in which a new model version is first exposed to a small group of users, so that problems can be detected quickly.
https://m.blog.naver.com/muchine98/220262491992
2. Environments
- Kubernetes 1.16.15
- Kubeflow 1.2
- nfs-client-provisioner v3.1.0 (dynamic provisioning of Kubernetes Persistent Volumes)
Reference: NFS-Client Provisioner
3. Canary rollout test
- Create a PVC and deploy the flowers model
With the NFS-Client provisioner, the PV can be accessed directly from the OS.
The NAS mount point is '/nfs_01', and the PV directory for kfserving-pvc is 'yoosung-jeon-kfserving-pvc-pvc-a83bf69f-3b8f-41be-ba51-e76402a36f6d'.
Copy the TensorFlow flowers model into it. The directory layout must follow the 'model_name/{numbers}/*' format.
$ vi kfserving-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kfserving-pvc
  namespace: yoosung-jeon
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-sc-iap
$ k apply -f kfserving-pvc.yaml
...
$ cd /nfs_01/yoosung-jeon-kfserving-pvc-pvc-a83bf69f-3b8f-41be-ba51-e76402a36f6d
$ gsutil cp -r gs://kfserving-samples/models/tensorflow/flowers .
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/saved_model.pb...
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/variables/variables.data-00000-of-00001...
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/variables/variables.index...
\ [3 files][109.5 MiB/109.5 MiB] 2.8 MiB/s
Operation completed over 3 objects/109.5 MiB.
$ tree flowers
flowers
└── 0001
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

2 directories, 3 files
$
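The 'model_name/{numbers}/*' rule can be checked locally before an InferenceService points at the directory. The following is a minimal sketch, not part of KFServing: `find_model_versions` is a hypothetical helper, and it assumes a TensorFlow SavedModel layout in which each numeric version directory holds a `saved_model.pb`, as in the `tree` output above.

```python
from __future__ import annotations

import re
from pathlib import Path

# A version directory name must consist of ASCII digits only, e.g. "0001".
_VERSION_DIR = re.compile(r"[0-9]+")


def find_model_versions(model_dir: str) -> list[int]:
    """Return the sorted numeric versions under model_dir that contain a SavedModel.

    Assumes the 'model_name/{numbers}/*' layout with a TensorFlow SavedModel
    ('saved_model.pb') inside each version directory. Raises ValueError when
    no such version directory exists.
    """
    versions = []
    for child in Path(model_dir).iterdir():
        # Skip files and non-numeric directories such as "variables" or "tmp".
        if not child.is_dir() or not _VERSION_DIR.fullmatch(child.name):
            continue
        if (child / "saved_model.pb").is_file():
            versions.append(int(child.name))  # "0001" -> 1
    if not versions:
        raise ValueError(f"no '<number>/saved_model.pb' found under {model_dir}")
    return sorted(versions)
```

For the directory shown above, `find_model_versions("/nfs_01/yoosung-jeon-kfserving-pvc-pvc-a83bf69f-3b8f-41be-ba51-e76402a36f6d/flowers")` should return `[1]`.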
- Deploy the InferenceService (Knative service)
$ vi canary.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
$ k apply -f canary.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                53d
$ k logs flowers-sample-predictor-default-jqcsk-deployment-554c9447hbfts -n yoosung-jeon -c kfserving-container | grep 'Reading SavedModel'
2021-08-13 07:38:21.433280: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /mnt/models/0001
2021-08-13 07:38:21.478291: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /mnt/models/0001
$
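The log shows the predictor loading the model from `/mnt/models/0001`, i.e. the `flowers` directory referenced by `storageUri` is what the container sees under `/mnt/models`. To make the `pvc://` URI form explicit, here is a small sketch assuming the `pvc://<claim-name>/<path>` shape used in canary.yaml; `parse_pvc_uri` is a hypothetical helper for illustration, not KFServing's storage-initializer code.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_pvc_uri(uri: str) -> tuple[str, str]:
    """Split a 'pvc://<claim-name>/<path>' storageUri into (claim_name, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "pvc" or not parsed.netloc:
        raise ValueError(f"not a pvc:// URI: {uri!r}")
    # The 'host' part is the PVC name; the remainder is the path inside the volume.
    return parsed.netloc, parsed.path.lstrip("/")


print(parse_pvc_uri("pvc://kfserving-pvc/flowers"))  # ('kfserving-pvc', 'flowers')
```

The returned path is relative to the PV directory on the NAS, which is why the model was copied to `/nfs_01/<PV directory>/flowers`.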
- Copy the new model (flowers-2)
$ gsutil cp -r gs://kfserving-samples/models/tensorflow/flowers-2 .
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/saved_model.pb...
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/variables/variables.data-00000-of-00001...
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/variables/variables.index...
\ [3 files][109.5 MiB/109.5 MiB] 2.8 MiB/s
Operation completed over 3 objects/109.5 MiB.
$
- Canary deployment
Describe the canary deployment in the "canary:" section and set the share of traffic it receives in the 'canaryTrafficPercent' field.
$ vi canary.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
  canaryTrafficPercent: 10
  canary:
    predictor:
      # 10% of traffic is sent to this model
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-2"
        runtimeVersion: "2.5.1"
$ k apply -f canary.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                              READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.example.com   True    90                10               3d21h
$ k get pod -n yoosung-jeon | egrep 'NAME|flowers'
NAME                                                              READY   STATUS    RESTARTS   AGE
flowers-sample-predictor-canary-qr7k2-deployment-859bcf968bj9mh   2/2     Running   0          50s
flowers-sample-predictor-default-jqcsk-deployment-554c9447hbfts   2/2     Running   0          3d21h
$
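With `canaryTrafficPercent: 10` the InferenceService reports a 90/10 split and both a default and a canary predictor pod are running. The actual routing is done by the serving infrastructure, not by user code; the sketch below only simulates a weighted split to show what share of requests each predictor should see over many calls. `route_request` and `simulate` are hypothetical names for this illustration.

```python
import random
from collections import Counter


def route_request(canary_percent: int, rng: random.Random) -> str:
    """Pick 'canary' with probability canary_percent/100, otherwise 'default'."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # rng.random() is uniform in [0, 1), so this is True for canary_percent% of draws.
    return "canary" if rng.random() * 100 < canary_percent else "default"


def simulate(canary_percent: int, n_requests: int, seed: int = 0) -> Counter:
    """Route n_requests with a seeded RNG and count how many land on each predictor."""
    rng = random.Random(seed)
    return Counter(route_request(canary_percent, rng) for _ in range(n_requests))


counts = simulate(canary_percent=10, n_requests=10_000)
print(counts)
```

With 10,000 simulated requests, roughly 1,000 should go to the canary; the exact number varies with the seed, which is why a canary should be judged over enough traffic rather than a handful of calls.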
- Pinned canary
The canary model can also be pinned so that it receives no traffic.
$ vi canary-pinned.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
  # Defaults to zero, so can also be omitted or explicitly set to zero.
  canaryTrafficPercent: 0
  canary:
    predictor:
      # Canary is created but no traffic is directly forwarded.
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-new"
        runtimeVersion: "2.5.1"
$ k apply -f canary-pinned.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                5d
$
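The manifest comment says `canaryTrafficPercent` defaults to zero, and the status shows DEFAULT TRAFFIC 100 with no canary share. A minimal sketch of how the two traffic columns follow from the spec, assuming the v1alpha2 field layout used in this post and a spec already loaded into a Python dict (for example with a YAML parser); `traffic_split` is a hypothetical helper that only mirrors the behaviour visible in the `k get` outputs here.

```python
from __future__ import annotations


def traffic_split(spec: dict) -> tuple[int, int]:
    """Return (default_percent, canary_percent) for a v1alpha2-style spec dict.

    An omitted canaryTrafficPercent counts as 0. With 0 the canary is pinned:
    its predictor exists but receives no traffic.
    """
    canary = int(spec.get("canaryTrafficPercent", 0))
    if not 0 <= canary <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    return 100 - canary, canary


pinned_spec = {
    "default": {"predictor": {"tensorflow": {
        "storageUri": "pvc://kfserving-pvc/flowers", "runtimeVersion": "2.5.1"}}},
    "canaryTrafficPercent": 0,
    "canary": {"predictor": {"tensorflow": {
        "storageUri": "pvc://kfserving-pvc/flowers-new", "runtimeVersion": "2.5.1"}}},
}
print(traffic_split(pinned_spec))  # (100, 0)
```

A pinned canary is useful for warming up or smoke-testing the new model before any user traffic is shifted to it.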
- Promoting canary
The canary model can also be promoted to become the default model.
$ vi canary-promotion.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  # This is now the promoted canary model
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-2"
        runtimeVersion: "2.5.1"
$ k apply -f canary-promotion.yaml
…
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                5d
$ k get pod -n yoosung-jeon | grep flower
flowers-sample-predictor-canary-x7b2t-deployment-6cb9b9496kf4hm   2/2   Terminating   0   20m
flowers-sample-predictor-default-g6pk8-deployment-6c95fd5bt2wv7   2/2   Running       0   5d
$
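Compared with the canary manifest, the promotion manifest moves the canary predictor into `default` and drops both the `canary` section and `canaryTrafficPercent`. A minimal sketch of that transformation on a spec dict, assuming the v1alpha2 layout used in this post; `promote_canary` is a hypothetical helper, and its result would still have to be written back to YAML and applied with `k apply`.

```python
import copy


def promote_canary(spec: dict) -> dict:
    """Return a new spec in which the canary predictor becomes the default.

    The input dict is left unchanged. 'canary' and 'canaryTrafficPercent'
    are removed, so all traffic goes to the promoted model.
    """
    if "canary" not in spec:
        raise ValueError("spec has no canary to promote")
    promoted = copy.deepcopy(spec)
    promoted["default"] = promoted.pop("canary")
    promoted.pop("canaryTrafficPercent", None)
    return promoted


canary_spec = {
    "default": {"predictor": {"tensorflow": {
        "storageUri": "pvc://kfserving-pvc/flowers", "runtimeVersion": "2.5.1"}}},
    "canaryTrafficPercent": 10,
    "canary": {"predictor": {"tensorflow": {
        "storageUri": "pvc://kfserving-pvc/flowers-2", "runtimeVersion": "2.5.1"}}},
}
promoted = promote_canary(canary_spec)
print(promoted["default"]["predictor"]["tensorflow"]["storageUri"])
# pvc://kfserving-pvc/flowers-2
```

Working on a deep copy keeps the original canary spec available, which makes it easy to roll back by re-applying it if the promoted model misbehaves.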