KFServing - Canary rollout test

by 여행을 떠나자! 2021. 10. 17.

1. Overview

- This post looks at the canary rollout feature provided by Kubeflow KFServing.

- A canary release is a deployment strategy that, when a new version of a model is deployed, serves it to only a small group of users first so that problems can be detected quickly.

   https://m.blog.naver.com/muchine98/220262491992

 

 

2. Environments

- Kubernetes 1.16.15

- Kubeflow 1.2

- nfs-client-provisioner v3.1.0 (dynamic provisioning of Kubernetes Persistent Volumes)

  Reference: NFS-Client Provisioner

 

 

3. Canary rollout test

- Create the PVC and deploy the flowers model

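   When the NFS Client provisioner is used, the PV can be accessed directly from the OS.

   The NAS mount point is '/nfs_01', and the PV bound to kfserving-pvc is 'yoosung-jeon-kfserving-pvc-pvc-a83bf69f-3b8f-41be-ba51-e76402a36f6d'.

   Copy the TensorFlow flowers model into it. The directory structure must follow the 'model_name/{numbers}/*' format. (In the commands below, 'k' is presumably a shell alias for kubectl.)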

$ vi kfserving-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kfserving-pvc
  namespace: yoosung-jeon
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-sc-iap
$ k apply -f kfserving-pvc.yaml
...
$ cd /nfs_01/yoosung-jeon-kfserving-pvc-pvc-a83bf69f-3b8f-41be-ba51-e76402a36f6d
$ gsutil cp -r gs://kfserving-samples/models/tensorflow/flowers .
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/saved_model.pb...
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/variables/variables.data-00000-of-00001...
Copying gs://kfserving-samples/models/tensorflow/flowers/0001/variables/variables.index...
\ [3 files][109.5 MiB/109.5 MiB]    2.8 MiB/s
Operation completed over 3 objects/109.5 MiB.
$ tree flowers
flowers
└── 0001
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

2 directories, 3 files
$
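
The 'model_name/{numbers}/*' rule can be checked before the InferenceService is created. The following is a minimal Python sketch, not part of KFServing: the function name find_model_versions is hypothetical, and it assumes that a valid version directory has a purely numeric name and contains a saved_model.pb, as in the tree output above.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def find_model_versions(model_dir: Path) -> List[int]:
    """Return the numeric version subdirectories that contain a saved_model.pb."""
    versions = []
    for child in model_dir.iterdir():
        # Only purely numeric names count, matching 'model_name/{numbers}/*'.
        if child.is_dir() and re.fullmatch(r"\d+", child.name):
            if (child / "saved_model.pb").is_file():
                versions.append(int(child.name))
    return sorted(versions)


if __name__ == "__main__":
    # Build a throwaway copy of the flowers layout and check it.
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "flowers"
        (model / "0001" / "variables").mkdir(parents=True)
        (model / "0001" / "saved_model.pb").touch()
        print(find_model_versions(model))
```

An empty result for a model directory would suggest that the layout does not match what the predictor expects to find under the storageUri.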

 

- Deploy the Knative service

$ vi canary.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
$ k apply -f canary.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                53d
$ k logs flowers-sample-predictor-default-jqcsk-deployment-554c9447hbfts -n yoosung-jeon -c kfserving-container | grep 'Reading SavedModel'
2021-08-13 07:38:21.433280: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /mnt/models/0001
2021-08-13 07:38:21.478291: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /mnt/models/0001
$

 

- Deploy the new model

$ gsutil cp -r gs://kfserving-samples/models/tensorflow/flowers-2 .
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/saved_model.pb...
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/variables/variables.data-00000-of-00001...
Copying gs://kfserving-samples/models/tensorflow/flowers-2/0001/variables/variables.index...
\ [3 files][109.5 MiB/109.5 MiB]    2.8 MiB/s
Operation completed over 3 objects/109.5 MiB.
$

 

- Canary deployment

   Describe the model to be canary-deployed in the "canary:" section, and set the share of traffic it should receive in the 'canaryTrafficPercent' field.

$ vi canary.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
  canaryTrafficPercent: 10
  canary:
    predictor:
      # 10% of traffic is sent to this model
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-2"
        runtimeVersion: "2.5.1"
$ k apply -f canary.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME            URL                                              READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample  http://flowers-sample.yoosung-jeon.example.com   True    90                10               3d21h
$ k get pod -n yoosung-jeon | egrep 'NAME|flowers'
NAME                                                              READY   STATUS    RESTARTS   AGE
flowers-sample-predictor-canary-qr7k2-deployment-859bcf968bj9mh   2/2     Running   0          50s
flowers-sample-predictor-default-jqcsk-deployment-554c9447hbfts   2/2     Running   0          3d21h
$
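
To get a feel for what a 90/10 split means for individual requests, the following Python sketch simulates per-request weighted routing. It is only an illustration under the assumption that each request is routed independently with the configured probability; it does not reproduce how Knative or Istio actually implement the split, and the name simulate_split is hypothetical.

```python
import random
from collections import Counter


def simulate_split(canary_percent: int, requests: int, seed: int = 0) -> Counter:
    """Route each simulated request to 'canary' with probability canary_percent/100."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    rng = random.Random(seed)  # seeded so that a run is repeatable
    counts = Counter({"default": 0, "canary": 0})
    for _ in range(requests):
        # rng.random() is in [0, 1), so 0 never routes to canary and 100 always does.
        target = "canary" if rng.random() * 100 < canary_percent else "default"
        counts[target] += 1
    return counts


if __name__ == "__main__":
    counts = simulate_split(canary_percent=10, requests=10_000)
    print("default:", counts["default"], "canary:", counts["canary"])
```

With a small number of requests the observed ratio can deviate noticeably from 90/10, which is worth keeping in mind when judging a canary from only a handful of calls.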

 

- Pinned canary

   The canary model can also be pinned, so that it is created but receives no traffic.

$ vi canary-pinned.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers"
        runtimeVersion: "2.5.1"
  # Defaults to zero, so can also be omitted or explicitly set to zero.
  canaryTrafficPercent: 0
  canary:
    predictor:
      # Canary is created but no traffic is directly forwarded.
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-2"
        runtimeVersion: "2.5.1"
$ k apply -f canary-pinned.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME                 URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample       http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                5d
$

 

- Promoting canary

   The canary model can also be promoted by making it the default model and removing the canary section.

$ vi canary-promotion.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
  namespace: yoosung-jeon
spec:
  # This is now the promoted canary model
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://kfserving-pvc/flowers-2"
        runtimeVersion: "2.5.1"
$ k apply -f canary-promotion.yaml
...
$ k get inferenceservices.serving.kubeflow.org -n yoosung-jeon
NAME             URL                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   http://flowers-sample.yoosung-jeon.kf-serv.acp.kt.co.kr   True    100                                5d
$ k get pod -n yoosung-jeon| grep flower
flowers-sample-predictor-canary-x7b2t-deployment-6cb9b9496kf4hm   2/2     Terminating   0          20m
flowers-sample-predictor-default-g6pk8-deployment-6c95fd5bt2wv7   2/2     Running       0          5d
$
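
The manifests used in this post differ only in the canary section and the traffic percentage, so they can be generated rather than edited by hand. The following Python sketch is one possible way to do that; the helper names build_inference_service and promote are hypothetical, and the field names and values are copied from the YAML in this post. The output is JSON, which kubectl also accepts as a manifest format.

```python
import json
from typing import Optional


def build_inference_service(default_uri: str,
                            canary_uri: Optional[str] = None,
                            canary_percent: int = 0) -> dict:
    """Build a v1alpha2 InferenceService manifest like the ones in this post."""
    def predictor(uri: str) -> dict:
        return {"predictor": {"tensorflow": {"storageUri": uri,
                                             "runtimeVersion": "2.5.1"}}}

    spec = {"default": predictor(default_uri)}
    if canary_uri is not None:
        spec["canaryTrafficPercent"] = canary_percent
        spec["canary"] = predictor(canary_uri)
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha2",
        "kind": "InferenceService",
        "metadata": {"name": "flowers-sample", "namespace": "yoosung-jeon"},
        "spec": spec,
    }


def promote(manifest: dict) -> dict:
    """Return a copy whose default is the old canary and which has no canary section."""
    if "canary" not in manifest["spec"]:
        raise ValueError("manifest has no canary to promote")
    promoted = json.loads(json.dumps(manifest))  # deep copy via JSON round trip
    promoted["spec"] = {"default": promoted["spec"]["canary"]}
    return promoted


if __name__ == "__main__":
    canary = build_inference_service("pvc://kfserving-pvc/flowers",
                                     "pvc://kfserving-pvc/flowers-2", 10)
    print(json.dumps(promote(canary), indent=2))
```

Calling build_inference_service with canary_percent=0 yields the pinned variant, and promote drops both the canary section and canaryTrafficPercent, which mirrors the promotion manifest above.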

 
