
Kubeflow 1.0 Features #5 (KFServing, TFServing)

by 여행을 떠나자! 2021. 9. 25.

2020.03.13

 

1. Model serving overview

- https://v1-0-branch.kubeflow.org/docs/components/serving/overview/

- Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core.

   Alternatively, you can use a standalone model serving system. 

 

a. Multi-framework model serving

- In the documentation's feature-comparison table, a check mark (✓) indicates that the system (KFServing or Seldon Core) supports the feature specified in that row.

 

b. Standalone model serving system

- TensorFlow Serving

   For TensorFlow models you can use TensorFlow Serving for real-time prediction. However, if you plan to use multiple frameworks, you should consider KFServing or Seldon Core. A minimal standalone sketch follows.
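
   A minimal standalone sketch, not from the original post (the /models/my_model path and my_model name are placeholders): serve a SavedModel with the official TensorFlow Serving image and call its REST predict API.

# REST API listens on 8501; MODEL_NAME selects the model under /models/.
$ docker run -p 8501:8501 \
    -v /models/my_model:/models/my_model \
    -e MODEL_NAME=my_model tensorflow/serving
# For a single-input model, each row in "instances" is one input tensor.
$ curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
    http://localhost:8501/v1/models/my_model:predict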

- NVIDIA TensorRT Inference Server (https://developer.nvidia.com/tensorrt)

   NVIDIA TensorRT Inference Server is a REST and gRPC service for deep-learning inferencing of TensorRT, TensorFlow and Caffe2 models. 

   NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
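
   A hedged launch sketch, not from the original post (the image tag is illustrative, the model-repository flag spelling varied across releases, and /path/to/models is a placeholder):

# Serve a model repository with the TensorRT Inference Server container
# (later renamed Triton). 8000 = HTTP/REST, 8001 = gRPC, 8002 = metrics.
$ docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tensorrtserver:19.10-py3 \
    trtserver --model-store=/models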

 

 

2. KFServing 

- https://v1-0-branch.kubeflow.org/docs/components/serving/kfserving/

- https://github.com/kubeflow/kfserving/blob/master/README.md

- KFServing provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. 

   It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.

- Encapsulate the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your ML deployments.

- Knative Serving provides 

   ✓ Event triggered functions on Kubernetes

   ✓ Scale to and from zero

   ✓ Queue based autoscaling for GPUs and TPUs

   ✓ Traditional CPU autoscaling if desired (traditional scaling is hard to apply uniformly across disparate devices: GPU, CPU, TPU); see the annotation sketch after this list
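
   These knobs are exposed per revision through autoscaling annotations. A hedged sketch using standard Knative annotations (the service name and image are placeholders; the exact apiVersion depends on the Knative release bundled with your Kubeflow):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: greeter
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"    # target in-flight requests per replica
        autoscaling.knative.dev/minScale: "0"   # "0" permits scale-to-zero
        autoscaling.knative.dev/maxScale: "5"
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go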

 

- Predict on an InferenceService using TensorFlow

   https://github.com/kubeflow/kfserving/tree/master/docs/samples/tensorflow

$ git clone https://github.com/kubeflow/kfserving.git
$ cd ./kfserving/docs/samples/tensorflow
$ cat tensorflow.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
$ kubectl apply -f tensorflow.yaml -n kubeflow
inferenceservice.serving.kubeflow.org/flowers-sample created
$ kubectl get inferenceservices -n kubeflow     # or kubectl get -f tensorflow.yaml -n kubeflow
NAME             URL   READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample         False                                      5m34s
$
$ MODEL_NAME=flowers-sample
$ INPUT_PATH=@./input.json
$ SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n kubeflow -o jsonpath='{.status.url}' | cut -d "/" -f 3)
$
$ CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
##
## The call failed because of items a and b under “5. Troubleshooting” below.

$ CLUSTER_IP=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
*   Trying 35.231.87.229...
* TCP_NODELAY set
* Connected to 35.231.87.229 (35.231.87.229) port 80 (#0)
> GET /v1/models/ HTTP/1.1
> Host: flowers-sample-predictor-default.kubeflow.example.com
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< location: http://flowers-sample-predictor-default.kubeflow.example.com/v1/models/
< date: Thu, 19 Mar 2020 05:22:10 GMT
< server: istio-envoy
< content-length: 0
<
* Curl_http_done: called premature == 0
* Connection #0 to host 35.231.87.229 left intact
$
##
## A Not Found error occurred; will retest later.
$ kubectl describe inferenceservice flowers-sample -n kubeflow
Name:         flowers-sample
Namespace:    kubeflow
Labels:       <none>
…
  Status:
  Canary:
  Conditions:
    Last Transition Time:  2020-03-18T07:50:44Z
    Message:               Configuration "flowers-sample-predictor-default" is waiting for a Revision to become ready.
    Reason:                RevisionMissing
    Status:                Unknown
    Type:                  DefaultPredictorReady
    Last Transition Time:  2020-03-18T07:50:43Z
    Message:               Failed to reconcile predictor
    Reason:                PredictorHostnameUnknown
    Status:                False
    Type:                  Ready
    Last Transition Time:  2020-03-18T07:50:43Z
    Message:               Failed to reconcile predictor
    Reason:                PredictorHostnameUnknown
    Status:                False
    Type:                  RoutesReady
  Default:
    Predictor:
      Name:  flowers-sample-predictor-default-trg2d
Events:      <none>
$

     Because the error above could not be resolved, the canary deployment test was not carried out; refer to the documentation. For reference, a canary sketch follows.
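
   Under the v1alpha2 schema the upstream flowers sample splits traffic between spec.default and spec.canary. A hedged sketch based on that sample (untested here, for the reason above; the flowers-2 model path comes from the upstream example):

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
  canaryTrafficPercent: 10     # route 10% of traffic to the canary predictor
  canary:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers-2"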

 

- Scale change test

   Scale-to-zero is one of the main properties making Knative a serverless platform. 

   Assuming a service (the Greeter service in Knative's tutorial example) has been deployed, once no more traffic is seen going into that service, we'd like to scale it down to zero replicas. That's called scale-to-zero.
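
   A quick way to observe this (a hedged sketch; the deployment name comes from the flowers-sample run above): send one request, then watch the Knative-managed deployment; with no further traffic the autoscaler should bring it back to 0/0 after its stable window. Note that the manual kubectl scale used below only probes the deployment; Knative's autoscaler reconciles replica counts it owns.

$ kubectl get deployment flowers-sample-predictor-default-trg2d-deployment -n kubeflow -w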

   ✓ Initial state

$ kubectl get kservice -n kubeflow
NAME                               URL                                                             LATESTCREATED                            LATESTREADY   READY     REASON
flowers-sample-predictor-default   http://flowers-sample-predictor-default.kubeflow.example.com   flowers-sample-predictor-default-trg2d                 Unknown   RevisionMissing
$ kubectl get deployment flowers-sample-predictor-default-trg2d-deployment -n kubeflow
NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
flowers-sample-predictor-default-trg2d-deployment   0/0     0            0           16h
$ kubectl describe deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
Name:                   flowers-sample-predictor-default-trg2d-deployment
Namespace:              kubeflow
…
Replicas:               0 desired | 0 updated | 0 total | 0 available | 0 unavailable
…
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   flowers-sample-predictor-default-trg2d-deployment-54567d5587 (0/0 replicas created)
Events:          <none>
$ kubectl get rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
NAME                                                           DESIRED   CURRENT   READY   AGE
flowers-sample-predictor-default-trg2d-deployment-54567d5587   0         0         0       2d21h
$

   ✓ Scale to 1

$ kubectl scale --replicas=1 deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
deployment.extensions/flowers-sample-predictor-default-trg2d-deployment scaled
$ kubectl get rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
NAME                                                           DESIRED   CURRENT   READY   AGE
flowers-sample-predictor-default-trg2d-deployment-54567d5587   1         1         0       2d21h
$ kubectl describe rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
…
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  4m23s  replicaset-controller  Created pod: flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp
$ k get pod flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp -n kubeflow
NAME                                                              READY   STATUS    RESTARTS   AGE
flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp   1/2     Running   0          20m
$ kubectl describe pod flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp -n kubeflow
…
Events:
  Type     Reason     Age      From                                                          Message
  ----     ------     ----     ----                                                          -------
  Normal   Scheduled  5m35s    default-scheduler                                             Successfully assigned kubeflow/flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp to gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl
  Normal   Pulled     5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Container image "index.docker.io/tensorflow/serving@sha256:ea44bf657f8cff7b07df12361749ea94628185352836bb08065345f5c8284bae" already present on machine
  Normal   Created    5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Created container kfserving-container
  Normal   Started    5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Started container kfserving-container
  Normal   Pulled     5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:792f6945c7bc73a49a470a5b955c39c8bd174705743abf5fb71aa0f4c04128eb" already present on machine
  Normal   Created    5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Created container queue-proxy
  Normal   Started    5m34s    kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Started container queue-proxy
  Warning  Unhealthy  27s (x30 over 5m22s)  kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Readiness probe failed: probe returned not ready
$ kubectl describe deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow | grep Replicas
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
  Available      False   MinimumReplicasUnavailable
$
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
…
$

   ✓ Scale back to 0

$ kubectl scale --replicas=0 deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
$ kubectl rollout history deployment.v1.apps/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
deployment.apps/flowers-sample-predictor-default-trg2d-deployment
REVISION  CHANGE-CAUSE
1         <none>
$ kubectl rollout history deployment.v1.apps/flowers-sample-predictor-default-trg2d-deployment -n kubeflow --revision=1
…
$ kubectl rollout status deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
…
$

 

 

3. TensorFlow Serving (TFServing)

- https://v1-0-branch.kubeflow.org/docs/components/serving/tfserving_new/

- Serving a model involves the following resources:

   ✓ A deployment to deploy the model using TFServing

   ✓ A K8s service to create an endpoint for the deployment

   ✓ An Istio virtual service to route traffic to the model and expose it through the Istio gateway

   ✓ An Istio DestinationRule for traffic splitting

 

- Deploying TFServing 

   The example contains three configurations for Google Cloud Storage (GCS) access: 

   volumes (secret user-gcp-sa), volumeMounts, and env (GOOGLE_APPLICATION_CREDENTIALS).

$ vi tfserving.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist
  name: mnist-service
  namespace: kubeflow
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: mnist-v1
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=gs://kubeflow-examples-data/mnist
        command:
        - /usr/bin/tensorflow_model_server
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        volumeMounts:
        - name: gcp-credentials
          mountPath: /secret/gcp-credentials
      volumes:
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mnist-service
  namespace: kubeflow
spec:
  host: mnist-service
  subsets:
  - labels:
      version: v1
    name: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mnist-service
  namespace: kubeflow
spec:
  gateways:
  - kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - method:
        exact: POST
      uri:
        prefix: /tfserving/models/mnist
    rewrite:
      uri: /v1/models/mnist:predict
    route:
    - destination:
        host: mnist-service
        port:
          number: 8500
        subset: v1
      weight: 100
$ kubectl apply -f tfserving.yaml
service/mnist-service created
deployment.apps/mnist-v1 created
destinationrule.networking.istio.io/mnist-service created
virtualservice.networking.istio.io/mnist-service created
$
$ kubectl get svc mnist-service -n kubeflow
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
mnist-service      ClusterIP   10.63.252.129   <none>        9000/TCP,8500/TCP    14s
$ kubectl get deployment mnist-v1 -n kubeflow
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
mnist-v1   1/1     1            1           55s
$ kubectl describe deployment mnist-v1 -n kubeflow
…
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   mnist-v1-9f989d696 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  109s  deployment-controller  Scaled up replica set mnist-v1-9f989d696 to 1
$ kubectl get rs mnist-v1-9f989d696 -n kubeflow
NAME                 DESIRED   CURRENT   READY   AGE
mnist-v1-9f989d696   1         1         1       5m37s
$ kubectl describe rs mnist-v1-9f989d696 -n kubeflow
…
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  3m25s  replicaset-controller  Created pod: mnist-v1-9f989d696-6qdqw
$ k get pod mnist-v1-9f989d696-6qdqw -n kubeflow
NAME                        READY   STATUS              RESTARTS   AGE
mnist-v1-9f989d696-6qdqw   1/1     Running   0          4m6s
$ k describe pod mnist-v1-9f989d696-6qdqw -n kubeflow
…
Labels:             app=mnist
                    pod-template-hash=9f989d696
                    version=v1
…
Events:
  Type    Reason     Age    From                                                          Message
  ----    ------     ----   ----                                                          -------
  Normal  Scheduled  4m36s  default-scheduler                                             Successfully assigned kubeflow/mnist-v1-9f989d696-6qdqw to gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl
  Normal  Pulling    4m35s  kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Pulling image "tensorflow/serving:1.11.1"
  Normal  Pulled     4m28s  kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Successfully pulled image "tensorflow/serving:1.11.1"
  Normal  Created    4m26s  kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Created container mnist
  Normal  Started    4m26s  kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl  Started container mnist
$ kubectl get dr -n kubeflow
NAME            HOST            AGE
mnist-service   mnist-service   98s
$ kubectl get vs mnist-service -n kubeflow
NAME            GATEWAYS             HOSTS   AGE
mnist-service   [kubeflow-gateway]   [*]     3m51s
$

 

- Sending a prediction request

a. Port-forward to the pod

$ kubectl get services -n kubeflow --selector=app=mnist
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
mnist-service   ClusterIP   10.63.242.50   <none>        9000/TCP,8500/TCP   139m
$ kubectl get pod -n kubeflow --selector=app=mnist
NAME                       READY   STATUS    RESTARTS   AGE
mnist-v1-9f989d696-6qdqw   1/1     Running   0          140m
$ kubectl port-forward mnist-v1-9f989d696-6qdqw 8500:8500 -n kubeflow > /dev/null 2>&1 &
  [1] 1798
$ curl -X POST -d @input.json http://127.0.0.1:8500/v1/models/mnist:predict
{ "error": "Failed to process element: 0 key: image_bytes of \'instances\' list. Error: Invalid argument:
JSON object: does not have named input: image_bytes" }
$
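
     The error means the keys in input.json do not match the model's serving signature: the deployed model defines no input named image_bytes. TF Serving's standard REST metadata endpoint shows the expected signature (sketch; it reuses the port-forward already running above):

$ curl http://127.0.0.1:8500/v1/models/mnist/metadata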

 

b. API server proxy (kubectl proxy)

$ k proxy --port=8080 &
[1] 893
Starting to serve on 127.0.0.1:8080
The service’s proxy URL:
  http://kubernetes_master_address/api/v1/namespaces/namespace_name/services/[https:]service_name[:port_name]/proxy
$ curl -X POST -d @input.json http://localhost:8080/api/v1/namespaces/kubeflow/services/mnist-service:8500/proxy/v1/models/mnist:predict
{ "error": "Failed to process element: 0 key: image_bytes of \'instances\' list. Error: Invalid argument:
JSON object: does not have named input: image_bytes" }
$
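
c. Istio gateway

   The VirtualService defined earlier matches POST /tfserving/models/mnist on kubeflow-gateway and rewrites it to TFServing's predict URI, so the same model should be callable through the Istio ingress gateway. A hedged sketch ($INGRESS_HOST is a placeholder for whatever address your istio-ingressgateway is reachable at, e.g. a LoadBalancer IP or a node IP plus the HTTP node port):

$ curl -X POST -d @input.json http://$INGRESS_HOST/tfserving/models/mnist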

 

 

4. Seldon Core Serving

- Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices.

- Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box, including advanced metrics, request logging, explainers, outlier detectors, A/B tests, canaries and more; see the sketch below.

- https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html
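
   For comparison with the KFServing InferenceService above, a hedged sketch of a SeldonDeployment using a prepackaged model server, based on Seldon's upstream sklearn iris example (not run as part of this post):

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn-iris
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier
      implementation: SKLEARN_SERVER    # Seldon's prepackaged scikit-learn server
      modelUri: gs://seldon-models/sklearn/iris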

 

 

5. Troubleshooting

a. SERVICE_HOSTNAME is null

   ✓ Problem:

        The READY value of flowers-sample should be True and its URL should be populated; a bug fix for this issue is in progress (https://github.com/kubeflow/kfserving/issues/734)

$ SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n kubeflow -o jsonpath='{.status.url}' | cut -d "/" -f 3)
$ echo $SERVICE_HOSTNAME
$ kubectl get inferenceservices -n kubeflow     # or kubectl get -f tensorflow.yaml -n kubeflow
NAME             URL   READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample         False                                      5m34s
$

   ✓ Workaround (in progress):   

        The call was retried using the model's URL from the Knative serving list below, but it returned an HTTP 404 (Not Found) error.

$ kubectl get kservice -n kubeflow
NAME                               URL                                                            LATESTCREATED                            LATESTREADY   READY     REASON
flowers-sample-predictor-default   http://flowers-sample-predictor-default.kubeflow.example.com   flowers-sample-predictor-default-trg2d                 Unknown   RevisionMissing
$ kubectl api-resources
NAME                              SHORTNAMES         APIGROUP                           NAMESPACED   KIND
…
services                          kservice,ksvc      serving.knative.dev                true         Service
$

 

b. CLUSTER_IP is null

   ✓ Problem:

$ CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo $CLUSTER_IP
$

   ✓ Workaround:   

$ kubectl -n istio-system get service | grep ingressgateway
NAME                      TYPE          CLUSTER-IP      EXTERNAL-IP     PORT(S)                                        AGE
istio-ingressgateway      NodePort      10.63.254.254   <none>          15020:30777/TCP,80:31380/TCP,443:31390/TCP,…   23h
kfserving-ingressgateway  LoadBalancer  10.63.247.89    35.231.87.229   15020:30509/TCP,80:32380/TCP,443:32390/TCP,…   23h
$ CLUSTER_IP=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo $CLUSTER_IP
35.231.87.229
$
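
   If neither gateway exposes a LoadBalancer IP, a common fallback is a node IP plus the gateway's HTTP node port (hedged sketch; http2 is the usual name of the HTTP port in Istio's gateway Service):

$ NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
$ NODE_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$NODE_IP:$NODE_PORT/v1/models/$MODEL_NAME:predict -d $INPUT_PATH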

 

c. KFServing deployment error

   ✓ Problem:

$ kubectl apply -f tensorflow.yaml
Error from server: error when creating "tensorflow.yaml": admission webhook "inferenceservice.kfserving-webhook-server.validator" denied the request: Cannot create the Inferenceservice "flowers-sample" in namespace "default": the namespace lacks label "serving.kubeflow.org/inferenceservice: enabled"

   ✓ Workaround: specify the namespace explicitly

$ kubectl apply -f tensorflow.yaml -n kubeflow
inferenceservice.serving.kubeflow.org/flowers-sample created
$
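
   Alternatively, label the target namespace so the webhook accepts InferenceServices there (the label key and value come straight from the error message above):

$ kubectl label namespace default serving.kubeflow.org/inferenceservice=enabled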

 
