
Kubeflow (21)

KFServing - Canary rollout test 1. Overview - This post examines the canary rollout feature provided by Kubeflow KFServing. - A canary release is a deployment strategy that, when a new model version is rolled out, serves it to a small subset of users first so that risks can be detected quickly. https://m.blog.naver.com/muchine98/220262491992 2. Environments - Kubernetes 1.16.15 - Kubeflow 1.2 - nfs-client-provisioner v3.1.0 (dynamic provisioning of Kubernetes Persistent Volumes) Reference: NFS-Client Provisioner 3. Canary rollout test - PVC .. 2021. 10. 17.
KFServing - Deep dive 1. Kubeflow KFServing? - Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core. - KFServing enables serverless inferencing on Kubernetes ✓ Encapsulate the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your .. 2021. 10. 14.
Dex authentication / bypass 1. Overview - Dex? (https://github.com/dexidp/dex) Dex is an identity service that uses OpenID Connect to drive authentication for other apps. - Dex setup: when Kubeflow (v1.2) is deployed with 'kfctl_istio_dex.v1.2.0.yaml', dex (v2.22) is installed along with it. - Dex authentication scope: Dex authentication is required when calling the following types of resources. ✓ Kubeflow dashboard login ✓ KFServing (runs on Knative) ✓ Knative serving (runs on Istio) ✓ Istio Virtual service - Applying.. 2021. 9. 29.
Distributed training case #4 (From KF Jupyter, PyTorch) 2021.06.30 1. Distributed training (PyTorch) from a Kubeflow Jupyter environment - Environments ✓ Remote - development environment: Kubeflow Jupyter (CPU) ✓ Remote - training environment: Kubeflow 1.2 (The machine learning toolkit for Kubernetes) / Kubernetes 1.16.15, Nexus (Private docker registry), Nvidia V100 / Driver 450.80, cuda 11.2, cuDNN 8.1.0 - Flow (PyTorch) ✓ Remote (Kubeflow Jupyter) a. Docker Image build b. Docker Image Push c. Kube.. 2021. 9. 27.
Distributed training case #3 (In Jupyter) 2021.06.25 1. Distributed training (TensorFlow) in a Kubeflow Jupyter environment with a GPU allocated - Environments ✓ Remote - development environment: Kubeflow Jupyter (GPU) ✓ Remote - training environment: Kubeflow 1.2 (The machine learning toolkit for Kubernetes) / Kubernetes 1.16.15, Harbor 2.2.1 (Private docker registry), Nvidia V100 / Driver 450.80, cuda 11.2, cuDNN 8.1.0, CentOS 7.8 - Related technology - TensorFlow MirroredStrategy (Data parallelism) 2. Prerequisites a.. 2021. 9. 26.
Distributed training case #2 (From KF Jupyter, TensorFlow) 2021.06.24 1. Distributed training (TensorFlow) from a Kubeflow Jupyter environment - Environments ✓ Remote - development environment: Kubeflow Jupyter ✓ Remote - training environment: Kubeflow 1.2 (The machine learning toolkit for Kubernetes) / Kubernetes 1.16.15, Nexus (Private docker registry), Nvidia V100 / Driver 450.80, cuda 11.2, cuDNN 8.1.0, CentOS 7.8 - Flow (TensorFlow) ✓ Remote (Kubeflow Jupyter) a. Docker Image build b. Docker Image P.. 2021. 9. 26.
Distributed training case #1 (From MacOS) 2021.06.24 1. Distributed training (TensorFlow) from a local environment - Environments ✓ Local - development environment: Python 3.8.5, Jupyter / PyCharm (optional) ✓ Remote - training environment: Kubeflow 1.2 (The machine learning toolkit for Kubernetes) / Kubernetes 1.16.15, Master nodes: 3, Worker nodes: 4, Harbor 2.2.1 (Private docker registry), Nvidia V100 / Driver 450.80, cuda 11.2, cuDNN 8.1.0, CentOS 7.8 - Flow (based on TensorFlow) ✓ Local (M.. 2021. 9. 26.
Kubeflow 1.0 features #6 (Metadata) 2020.03.23 1. Kubeflow Metadata? - Tracking and managing metadata of machine learning workflows in Kubeflow - Metadata means information about executions (runs), models, datasets, and other artifacts. Artifacts are the files and objects that form the inputs and outputs of the components in your ML workflow. 2. Try the Metadata SDK in a sample Jupyter notebook - Download demo.ipynb from GitHub Te.. 2021. 9. 25.
Kubeflow 1.0 features #5 (KFServing, TFServing) 2020.03.13 1. Model serving overview - https://v1-0-branch.kubeflow.org/docs/reference/pytorchjob/v1/pytorch/ - Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core. Alternatively, you can use a standalone model serving system. a. Multi-framework model serving - A check mark (✓) indicates that the system (KFServing or Seldon Core) suppor.. 2021. 9. 25.
Kubeflow 1.0 features #4 (PyTorch Training) 2020.03.09 1. PyTorchJob? - The Kubernetes custom resource used for PyTorch training in Kubeflow - https://v1-0-branch.kubeflow.org/docs/reference/pytorchjob/v1/pytorch/ 2. Running PyTorch training - https://v1-0-branch.kubeflow.org/docs/components/training/pytorch/components/training/pytorch/ a. Start Cloud Shell b. Verify that PyTorch support is included in your Kubeflow deployment $ kubectl get crd | head.. 2021. 9. 25.
Kubeflow 1.0 features #3 (Katib) 2020.03.09 1. Kubeflow Katib? - Katib is used for automated tuning of an ML model’s hyperparameters. Hyperparameters are the variables that control the model training process. For example: ✓ Learning rate. ✓ Number of layers in a neural network. ✓ Number of nodes in each layer. Hyperparameter values are not learned. Hyperparameter tuning is the process of optimizing the hyperparameter values to maxim.. 2021. 9. 25.
Kubeflow 1.0 features #2 (TF-Job, TF-Serving, Kubeflow pipeline) 2020.02.26 1. Overview - To understand the Pipeline feature of Kubeflow installed on GKE, I worked through the example at the site below - Using Kubeflow for Financial Time Series (https://github.com/kubeflow/examples/tree/master/financial_time_series) - This example covers the following concepts: a. Deploying Kubeflow to a GKE cluster b. Exploration via JupyterHub (prospect data, preprocess data, develop ML model) c. Training several tensorflow models.. 2021. 9. 24.
Kubeflow 1.0 features #1 (Jupyter notebook) 2020.03.12 1. References - https://v1-0-branch.kubeflow.org/docs/notebooks/setup/ 2. Creating a notebook server a. Start Cloud Shell b. Check the URL $ kubectl -n istio-system get ingress NAME HOSTS ADDRESS PORTS AGE envoy-ingress my-kubeflow.endpoints.my-kubeflow-269301.cloud.goog 34.107.211.135 80 6m42s c. Access Kubeflow (URL: my-kubeflow.endpoints.my-kubeflow-269301.cloud.goog) d. Create a Jupyter notebook server a.. 2021. 9. 24.
Running the MNIST using distributed training 2021.5.28 1. Running the MNIST on-prem Jupyter notebook - The MNIST on-prem notebook builds a Docker image, launches a TFJob to train a model, and creates an InferenceService (KFServing) to deploy the trained model. - https://v1-2-branch.kubeflow.org/docs/started/workstation/minikube-linux/#running-the-mnist-on-prem-jupyter-notebook a. Prerequisites - Step 1: Set up Python environment in MacOS y.. 2021. 9. 24.
Adding a custom image to Kubeflow Jupyter 2021.06.29 1. Overview - Add a custom image for creating a Kubeflow Notebook Server - References https://www.kangwoo.kr/tag/jupyter/ https://github.com/kubeflow/kubeflow/tree/v1.2.0/components/tensorflow-notebook-image https://towardsdatascience.com/make-kubeflow-into-your-own-data-science-workspace-cc8162969e29 - Custom Image spec ✓ Base image: tensorflow/tensorflow:2.5.0-gpu-jupyter (cuda 11.2, cuDNN 8.1.0),.. 2021. 9. 24.
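
The canary release idea in the KFServing canary rollout entry above (send a small share of traffic to the new model version first) can be illustrated with a small sketch. This is not the KFServing API: KFServing performs the traffic split itself, and the function name `pick_revision`, the request IDs, and the 10% share are all hypothetical, chosen only for illustration.

```python
import hashlib


def pick_revision(request_id: str, canary_percent: int) -> str:
    """Route a request to the "canary" or "default" revision (hypothetical helper).

    The request ID is hashed into one of 100 buckets, so the same ID always
    lands on the same revision, and roughly canary_percent of all IDs go to
    the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "default"


if __name__ == "__main__":
    # Assumed example: 10,000 made-up request IDs with a 10% canary share.
    ids = [f"req-{i}" for i in range(10_000)]
    canary_hits = sum(pick_revision(r, 10) == "canary" for r in ids)
    print(f"canary share: {canary_hits / len(ids):.1%}")
```

Hashing the request ID rather than drawing a random number keeps routing stable per request ID, which makes it easier to compare the two versions' behavior; raising `canary_percent` step by step toward 100 corresponds to promoting the new version.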