2020.03.13
1. Model serving overview
- https://v1-0-branch.kubeflow.org/docs/reference/pytorchjob/v1/pytorch/
- Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core.
Alternatively, you can use a standalone model serving system.
a. Multi-framework model serving
- A check mark (✓) indicates that the system (KFServing or Seldon Core) supports the feature specified in that row.
b. Standalone model serving system
- TensorFlow Serving
For TensorFlow models you can use TensorFlow Serving for real-time prediction. However, if you plan to use multiple frameworks, you should consider KFServing or Seldon Core (a standalone TF Serving sketch follows this list).
- NVIDIA TensorRT Inference Server (https://developer.nvidia.com/tensorrt)
NVIDIA TensorRT Inference Server is a REST and gRPC service for deep-learning inference on TensorRT, TensorFlow, and Caffe2 models.
NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
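For reference, a standalone TensorFlow Serving instance can also run outside Kubernetes entirely. A minimal sketch, assuming Docker is available and a SavedModel sits under ./models/half_plus_two (the model name and local path are illustrative; the flow follows the TensorFlow Serving docs):
$ docker run -d -p 8501:8501 \
    --mount type=bind,source=$(pwd)/models/half_plus_two,target=/models/half_plus_two \
    -e MODEL_NAME=half_plus_two -t tensorflow/serving
$ # Query the REST predict API; the body follows TF Serving's
$ # {"instances": [...]} convention.
$ curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    http://localhost:8501/v1/models/half_plus_two:predict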
2. KFServing
- https://v1-0-branch.kubeflow.org/docs/components/serving/kfserving/
- https://github.com/kubeflow/kfserving/blob/master/README.md
- KFServing provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks.
It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.
- It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU autoscaling, scale-to-zero, and canary rollouts to your ML deployments.
- Knative Serving provides
✓ Event triggered functions on Kubernetes
✓ Scale to and from zero
✓ Queue based autoscaling for GPUs and TPUs
✓ Traditional CPU autoscaling if desired (traditional scaling is hard to apply uniformly across disparate devices: GPU, CPU, TPU)
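Scale bounds surface on the InferenceService itself. A minimal sketch, assuming the v1alpha2 predictor accepts minReplicas/maxReplicas as in the kfserving samples (treat the field names as assumptions): minReplicas: 0 opts the predictor into Knative scale-to-zero, and maxReplicas caps the queue-based autoscaler.
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      minReplicas: 0   # allow scale-to-zero when idle (assumed field)
      maxReplicas: 3   # upper bound for the autoscaler (assumed field)
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"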
- Predict on an InferenceService using TensorFlow
https://github.com/kubeflow/kfserving/tree/master/docs/samples/tensorflow
$ git clone https://github.com/kubeflow/kfserving.git
$ cd ./kfserving/docs/samples/tensorflow
$ cat tensorflow.yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
name: "flowers-sample"
spec:
default:
predictor:
tensorflow:
storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
$ kubectl apply -f tensorflow.yaml -n kubeflow
inferenceservice.serving.kubeflow.org/flowers-sample created
$ kubectl get inferenceservices -n kubeflow # or kubectl get -f tensorflow.yaml -n kubeflow
NAME             URL   READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample         False                                      5m34s
$
$ MODEL_NAME=flowers-sample
$ INPUT_PATH=@./input.json
$ SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n kubeflow -o jsonpath='{.status.url}' | cut -d "/" -f 3)
$
$ CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
##
## The call above did not go through because of items (a) and (b) in “5. Troubleshooting” below.
$ CLUSTER_IP=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
* Trying 35.231.87.229...
* TCP_NODELAY set
* Connected to 35.231.87.229 (35.231.87.229) port 80 (#0)
> GET /v1/models/ HTTP/1.1
> Host: flowers-sample-predictor-default.kubeflow.example.com
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< location: http://flowers-sample-predictor-default.kubeflow.example.com/v1/models/
< date: Thu, 19 Mar 2020 05:22:10 GMT
< server: istio-envoy
< content-length: 0
<
* Curl_http_done: called premature == 0
* Connection #0 to host 35.231.87.229 left intact
$
##
## A 404 Not Found error occurred; to be retested later.
$ kubectl describe inferenceservice flowers-sample -n kubeflow
Name: flowers-sample
Namespace: kubeflow
Labels: <none>
…
Status:
Canary:
Conditions:
Last Transition Time: 2020-03-18T07:50:44Z
Message: Configuration "flowers-sample-predictor-default" is waiting for a Revision to become ready.
Reason: RevisionMissing
Status: Unknown
Type: DefaultPredictorReady
Last Transition Time: 2020-03-18T07:50:43Z
Message: Failed to reconcile predictor
Reason: PredictorHostnameUnknown
Status: False
Type: Ready
Last Transition Time: 2020-03-18T07:50:43Z
Message: Failed to reconcile predictor
Reason: PredictorHostnameUnknown
Status: False
Type: RoutesReady
Default:
Predictor:
Name: flowers-sample-predictor-default-trg2d
Events: <none>
$
The canary deployment test could not be carried out because the error above was never resolved. Refer to the documentation.
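For the record, the kfserving flowers sample describes the canary rollout planned here as a second predictor plus a traffic split on the same InferenceService. A sketch based on that sample (canaryTrafficPercent and the flowers-2 model URI come from the v1alpha2 sample and should be treated as assumptions):
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  canaryTrafficPercent: 10   # send 10% of traffic to the canary
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
  canary:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers-2"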
- Scaling test
Scale-to-zero is one of the main properties making Knative a serverless platform.
Assuming that the Greeter service has been deployed, once no more traffic is seen going into that service, we’d like to scale this service down to zero replicas. That’s called scale-to-zero.
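How quickly an idle revision is reaped is governed by Knative's autoscaler configuration. A quick way to inspect it (the ConfigMap name and keys such as enable-scale-to-zero and scale-to-zero-grace-period are per the Knative docs):
$ kubectl -n knative-serving get configmap config-autoscaler -o yaml
$ # Look for enable-scale-to-zero (must be "true") and
$ # scale-to-zero-grace-period (how long an idle revision keeps its pods).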
✓ Initial state
$ kubectl get kservice -n kubeflow
NAME                               URL                                                             LATESTCREATED                            LATESTREADY   READY     REASON
flowers-sample-predictor-default   http://flowers-sample-predictor-default.kubeflow.example.com   flowers-sample-predictor-default-trg2d                 Unknown   RevisionMissing
$ kubectl get deployment flowers-sample-predictor-default-trg2d-deployment -n kubeflow
NAME READY UP-TO-DATE AVAILABLE AGE
flowers-sample-predictor-default-trg2d-deployment 0/0 0 0 16h
$ kubectl describe deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
Name: flowers-sample-predictor-default-trg2d-deployment
Namespace: kubeflow
…
Replicas: 0 desired | 0 updated | 0 total | 0 available | 0 unavailable
…
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: flowers-sample-predictor-default-trg2d-deployment-54567d5587 (0/0 replicas created)
Events: <none>
$ kubectl get rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
NAME DESIRED CURRENT READY AGE
flowers-sample-predictor-default-trg2d-deployment-54567d5587 0 0 0 2d21h
$
✓ Scale to 1
$ kubectl scale --replicas=1 deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
deployment.extensions/flowers-sample-predictor-default-trg2d-deployment scaled
$ kubectl get rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
NAME DESIRED CURRENT READY AGE
flowers-sample-predictor-default-trg2d-deployment-54567d5587 1 1 0 2d21h
$ kubectl describe rs flowers-sample-predictor-default-trg2d-deployment-54567d5587 -n kubeflow
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 4m23s replicaset-controller Created pod: flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp
$ k get pod flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp -n kubeflow
NAME READY STATUS RESTARTS AGE
flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp 1/2 Running 0 20m
$ kubectl describe pod flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp -n kubeflow
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m35s default-scheduler Successfully assigned kubeflow/flowers-sample-predictor-default-trg2d-deployment-54567d55czdkp to gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl
Normal Pulled 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Container image "index.docker.io/tensorflow/serving@sha256:ea44bf657f8cff7b07df12361749ea94628185352836bb08065345f5c8284bae" already present on machine
Normal Created 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Created container kfserving-container
Normal Started 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Started container kfserving-container
Normal Pulled 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:792f6945c7bc73a49a470a5b955c39c8bd174705743abf5fb71aa0f4c04128eb" already present on machine
Normal Created 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Created container queue-proxy
Normal Started 5m34s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Started container queue-proxy
Warning Unhealthy 27s (x30 over 5m22s) kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Readiness probe failed: probe returned not ready
$ kubectl describe deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow | grep Replicas
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
Available False MinimumReplicasUnavailable
$
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict
…
$
✓ Scale to 0
$ kubectl scale --replicas=0 deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
$ kubectl rollout history deployment.v1.apps/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
deployment.apps/flowers-sample-predictor-default-trg2d-deployment
REVISION CHANGE-CAUSE
1 <none>
$ kubectl rollout history deployment.v1.apps/flowers-sample-predictor-default-trg2d-deployment -n kubeflow --revision=1
…
$ kubectl rollout status deployment/flowers-sample-predictor-default-trg2d-deployment -n kubeflow
…
$
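Note that for a Knative-managed revision the autoscaler, not the user, owns the replica count, so a manual kubectl scale as above is only a stopgap for inspection; the autoscaler is expected to reconcile the deployment back based on observed traffic. One way to watch that happen:
$ kubectl get deployment flowers-sample-predictor-default-trg2d-deployment -n kubeflow -w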
3. TensorFlow Serving (TFServing)
- https://v1-0-branch.kubeflow.org/docs/components/serving/tfserving_new/
- Serving a model
✓ A deployment to deploy the model using TFServing
✓ A K8s service to create an endpoint for the model server
✓ An Istio virtual service to route traffic to the model and expose it through the Istio gateway
✓ An Istio DestinationRule for doing traffic splitting
- Deploying TFServing
The example contains three configurations for Google Cloud Storage (GCS) access:
volumes (secret user-gcp-sa), volumeMounts, and env (GOOGLE_APPLICATION_CREDENTIALS).
$ vi tfserving.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist
  name: mnist-service
  namespace: kubeflow
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: mnist-v1
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=gs://kubeflow-examples-data/mnist
        command:
        - /usr/bin/tensorflow_model_server
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        volumeMounts:
        - name: gcp-credentials
          mountPath: /secret/gcp-credentials
      volumes:
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  host: mnist-service
  subsets:
  - labels:
      version: v1
    name: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  gateways:
  - kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - method:
        exact: POST
      uri:
        prefix: /tfserving/models/mnist
    rewrite:
      uri: /v1/models/mnist:predict
    route:
    - destination:
        host: mnist-service
        port:
          number: 8500
        subset: v1
      weight: 100
$ kubectl apply -f tfserving.yaml
service/mnist-service created
deployment.apps/mnist-v1 created
destinationrule.networking.istio.io/mnist-service created
virtualservice.networking.istio.io/mnist-service created
$
$ kubectl get svc mnist-service -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mnist-service ClusterIP 10.63.252.129 <none> 9000/TCP,8500/TCP 14s
$ kubectl get deployment mnist-v1 -n kubeflow
NAME READY UP-TO-DATE AVAILABLE AGE
mnist-v1 1/1 1 1 55s
$ kubectl describe deployment mnist-v1 -n kubeflow
…
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: mnist-v1-9f989d696 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 109s deployment-controller Scaled up replica set mnist-v1-9f989d696 to 1
$ kubectl get rs mnist-v1-9f989d696 -n kubeflow
NAME DESIRED CURRENT READY AGE
mnist-v1-9f989d696 1 1 1 5m37s
$ kubectl describe rs mnist-v1-9f989d696 -n kubeflow
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 3m25s replicaset-controller Created pod: mnist-v1-9f989d696-6qdqw
$ k get pod mnist-v1-9f989d696-6qdqw -n kubeflow
NAME READY STATUS RESTARTS AGE
mnist-v1-9f989d696-6qdqw 1/1 Running 0 4m6s
$ k describe pod mnist-v1-9f989d696-6qdqw -n kubeflow
…
Labels: app=mnist
pod-template-hash=9f989d696
version=v1
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m36s default-scheduler Successfully assigned kubeflow/mnist-v1-9f989d696-6qdqw to gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl
Normal Pulling 4m35s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Pulling image "tensorflow/serving:1.11.1"
Normal Pulled 4m28s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Successfully pulled image "tensorflow/serving:1.11.1"
Normal Created 4m26s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Created container mnist
Normal Started 4m26s kubelet, gke-my-kubeflow-my-kubeflow-cpu-pool--5b5549e0-scbl Started container mnist
$ kubectl get dr -n kubeflow
NAME HOST AGE
mnist-service mnist-service 98s
$ kubectl get vs mnist-service -n kubeflow
NAME GATEWAYS HOSTS AGE
mnist-service [kubeflow-gateway] [*] 3m51s
$
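Note that the VirtualService above also exposes the model through kubeflow-gateway: a POST to the /tfserving/models/mnist prefix is rewritten to /v1/models/mnist:predict and routed to mnist-service:8500. So a third way to send a request (a sketch, assuming the Istio ingress is reachable as CLUSTER_IP per section 2 and the gateway matches host '*'):
$ curl -X POST -d @input.json http://$CLUSTER_IP/tfserving/models/mnist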
- Sending prediction request
a. Pod (port-forward)
$ kubectl get services -n kubeflow --selector=app=mnist
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mnist-service ClusterIP 10.63.242.50 <none> 9000/TCP,8500/TCP 139m
$ kubectl get pod -n kubeflow --selector=app=mnist
NAME READY STATUS RESTARTS AGE
mnist-v1-9f989d696-6qdqw 1/1 Running 0 140m
$ kubectl port-forward mnist-v1-9f989d696-6qdqw 8500:8500 -n kubeflow 2>&1 > /dev/null &
[1] 1798
$ curl -X POST -d @input.json http://127.0.0.1:8500/v1/models/mnist:predict
{ "error": "Failed to process element: 0 key: image_bytes of \'instances\' list. Error: Invalid argument:
JSON object: does not have named input: image_bytes" }
$
b. Proxy
$ k proxy --port=8080 &
[1] 893
Starting to serve on 127.0.0.1:8080
The service’s proxy URL:
http://kubernetes_master_address/api/v1/namespaces/namespace_name/services/[https:]service_name[:port_name]/proxy
$ curl -X POST -d @input.json http://localhost:8080/api/v1/namespaces/kubeflow/services/mnist-service:8500/proxy/v1/models/mnist:predict
{ "error": "Failed to process element: 0 key: image_bytes of \'instances\' list. Error: Invalid argument:
JSON object: does not have named input: image_bytes" }
$
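The error in both cases says the model has no input tensor named image_bytes, i.e. the payload does not match the SavedModel's serving signature. A way to check the expected input names, assuming the model can be copied locally with gsutil (the local path and the <version> directory are illustrative placeholders):
$ gsutil cp -r gs://kubeflow-examples-data/mnist ./mnist-model
$ saved_model_cli show --dir ./mnist-model/<version> \
    --tag_set serve --signature_def serving_default
$ # The inputs listed there are the keys to use in the
$ # {"instances": [{...}]} request body.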
4. Seldon Core Serving
- Seldon core converts your ML models (Tensorflow, Pytorch, H2o, etc.) or language wrappers (Python, Java, etc.) into production REST/GRPC microservices.
- Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more.
- https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html
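A minimal SeldonDeployment sketch, based on the prepackaged-server example in the Seldon Core docs (the apiVersion, SKLEARN_SERVER implementation, and sample modelUri are taken from those docs and should be treated as assumptions for the version bundled with Kubeflow 1.0):
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn-iris
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier
      implementation: SKLEARN_SERVER   # prepackaged scikit-learn model server
      modelUri: gs://seldon-models/sklearn/iris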
5. Troubleshooting
a. SERVICE_HOSTNAME is null
✓ Problem:
The READY value of flowers-sample should be True and a URL should be assigned; a fix for this bug is in progress (https://github.com/kubeflow/kfserving/issues/734)
$ SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
$ echo $SERVICE_HOSTNAME
$ kubectl get inferenceservices -n kubeflow # or kubectl get -f tensorflow.yaml -n kubeflow
NAME             URL   READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample         False                                      5m34s
$
✓ Workaround (in progress):
Calling the model with the URL taken from the Knative serving list below still returned an HTTP 404 (Not Found) error
$ kubectl get kservice -n kubeflow
NAME                               URL                                                             LATESTCREATED                            LATESTREADY   READY     REASON
flowers-sample-predictor-default   http://flowers-sample-predictor-default.kubeflow.example.com   flowers-sample-predictor-default-trg2d                 Unknown   RevisionMissing
$ kubectl api-resources
NAME       SHORTNAMES      APIGROUP              NAMESPACED   KIND
…
services   kservice,ksvc   serving.knative.dev   true         Service
$
b. CLUSTER_IP is null
✓ Problem:
$ CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo $CLUSTER_IP
$
✓ Workaround:
$ kubectl -n istio-system get service | grep ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway NodePort 10.63.254.254 <none> 15020:30777/TCP,80:31380/TCP,443:31390/TCP,… 23h
kfserving-ingressgateway LoadBalancer 10.63.247.89 35.231.87.229 15020:30509/TCP,80:32380/TCP,443:32390/TCP,… 23h
$ CLUSTER_IP=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo $CLUSTER_IP
35.231.87.229
$
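Since istio-ingressgateway is of type NodePort here, another option (following the Istio docs' NodePort pattern; whether a node IP is reachable from your client, and the port name "http2", are assumptions to verify against your install) is to target a node address plus the HTTP nodePort:
$ INGRESS_HOST=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
$ INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$INGRESS_HOST:$INGRESS_PORT/v1/models/$MODEL_NAME:predict -d $INPUT_PATH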
c. KFServing deployment error
✓ Problem:
$ kubectl apply -f tensorflow.yaml
Error from server: error when creating "tensorflow.yaml": admission webhook "inferenceservice.kfserving-webhook-server.validator" denied the request: Cannot create the Inferenceservice "flowers-sample" in namespace "default": the namespace lacks label "serving.kubeflow.org/inferenceservice: enabled"
✓ Workaround: specify the namespace explicitly
$ kubectl apply -f tensorflow.yaml -n kubeflow
inferenceservice.serving.kubeflow.org/flowers-sample created
$
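Alternatively, the error message itself points at the other fix: give the target namespace the label the webhook checks for, after which the InferenceService can be created there (shown for the default namespace):
$ kubectl label namespace default serving.kubeflow.org/inferenceservice=enabled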