Setting Up Kubeflow 1.2 in Minikube

2021. 9. 24.



1. Overview

- Set up Kubeflow on top of Minikube


   Using the options described in the documentation causes an error when accessing the Kubeflow dashboard (the service-account-signing-key-file value; see Troubleshooting)



2. Environments

- minikube 1.17.1

- kubernetes 1.16.15

- kubeflow 1.2

- macOS 11.2



3. Install Kubeflow

a. Prerequisites

- Recommended resources

   8 cores, 16GB RAM, 250GB storage

- Minimum resources

   6 cores, 10 GB RAM, 30GB storage
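
The minimum values above can be checked mechanically before creating the cluster. The following is a minimal sketch, not part of the original procedure: the function name `meets_minimum` and the `MIN_*` variables are hypothetical, and the three arguments (cores, RAM in GB, disk in GB) are whatever you plan to pass to minikube.

```shell
# Minimum values from the list above (cores, RAM in GB, disk in GB).
MIN_CORES=6
MIN_MEM_GB=10
MIN_DISK_GB=30

# meets_minimum CORES MEM_GB DISK_GB  (hypothetical helper)
# Prints one line per shortfall and returns non-zero if any value is below the minimum.
meets_minimum() {
  ok=0
  [ "$1" -ge "$MIN_CORES" ]   || { echo "cores: $1 < $MIN_CORES"; ok=1; }
  [ "$2" -ge "$MIN_MEM_GB" ]  || { echo "memory: ${2}GB < ${MIN_MEM_GB}GB"; ok=1; }
  [ "$3" -ge "$MIN_DISK_GB" ] || { echo "disk: ${3}GB < ${MIN_DISK_GB}GB"; ok=1; }
  return "$ok"
}
```

For example, `meets_minimum 8 16 250` succeeds silently, while `meets_minimum 6 8 40` reports a memory shortfall.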


b. Start minikube

- When creating the Kubernetes cluster in minikube, set the minimum CPU/memory/disk values needed to install Kubeflow.

- For service-account-signing-key-file, use sa.key instead of the apiserver.key given in the installation documentation (based on the documentation as of 4.21)

yoosungjeon@ysjeon-Dev ~ % minikube start --driver=hyperkit --kubernetes-version=1.16.15 \
    --cpus=6 --memory=8g --disk-size=40g --profile kf \
    --extra-config=apiserver.service-account-issuer=api \
    --extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key \
    --extra-config=apiserver.service-account-api-audiences=api
😄  [kf] minikube v1.17.1 on Darwin 11.2.3
✨  Using the hyperkit driver based on user configuration
👍  Starting control plane node kf in cluster kf
🔥  Creating hyperkit VM (CPUs=6, Memory=8192MB, Disk=40960MB) ...
🐳  Preparing Kubernetes v1.16.15 on Docker 20.10.2 ...
    ▪ apiserver.service-account-issuer=api
    ▪ apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key
    ▪ apiserver.service-account-api-audiences=api
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: storage-provisioner, default-storageclass

❗  /usr/local/bin/kubectl is version 1.20.4, which may have incompatibilites with Kubernetes 1.16.15.
    ▪ Want kubectl v1.16.15? Try 'minikube kubectl -- get pods -A'
🏄  Done! kubectl is now configured to use "kf" cluster and "default" namespace by default
yoosungjeon@ysjeon-Dev ~ % minikube profile list
| Profile  | VM Driver | Runtime |      IP       | Port | Version  | Status  | Nodes |
| kf       | hyperkit  | docker  | | 8443 | v1.16.15 | Running |     1 |
| minikube | hyperkit  | docker  | | 8443 | v1.16.15 | Paused  |     1 |
yoosungjeon@ysjeon-Dev acp-kubeflow %
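
The kubectl warning in the output above comes from the gap between the client (1.20.4) and the cluster (1.16.15). Kubernetes' version skew policy supports kubectl within one minor version of the API server, so the gap can be quantified with a small sketch like the one below. The helper names `minor_of` and `minor_skew` are hypothetical and only illustrate the comparison.

```shell
# minor_of VERSION: print the minor component of a version such as "1.20.4" or "v1.16.15".
minor_of() {
  echo "${1#v}" | cut -d. -f2
}

# minor_skew CLIENT SERVER: print the absolute difference between the two minor versions.
minor_skew() {
  c=$(minor_of "$1")
  s=$(minor_of "$2")
  d=$((c - s))
  if [ "$d" -lt 0 ]; then
    d=$((0 - d))
  fi
  echo "$d"
}
```

A result greater than 1 means the client is outside the supported range; in that case the bundled client suggested by minikube itself ('minikube kubectl -- get pods -A') is one way to avoid the mismatch.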


c. Installation of Kubeflow

% wget https://github.com/kubeflow/kfctl/archive/refs/tags/v1.2.0.zip
% unzip v1.2.0.zip
% wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
% tar xzf kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
% ./kfctl version
kfctl v1.2.0-0-gbc038f9

% export KF_NAME=acp-kubeflow
% export BASE_DIR=/Users/yoosungjeon/Private/k8s-oss/kf-deployments
% export KF_DIR=${BASE_DIR}/${KF_NAME}
% export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"
% mkdir -p ${KF_DIR}
% cd ${KF_DIR}
% ../kfctl build -V -f ${CONFIG_URI}
% export CONFIG_FILE=${KF_DIR}/kfctl_k8s_istio.v1.2.0.yaml
% ../kfctl apply -V -f ${CONFIG_FILE}
INFO[0174] Successfully applied application spartakus    filename="kustomize/kustomize.go:291"
INFO[0174] Applied the configuration Successfully!       filename="cmd/apply.go:75"

# List the namespaces used by Kubeflow
% k get ns | egrep -v "default|kube-"
NAME              STATUS   AGE
cert-manager      Active   17m
istio-system      Active   16m
knative-serving   Active   13m
kubeflow          Active   17m

# Check Istio-related settings: the kubeflow namespace has the 'istio-injection=enabled' label set
% k get ns --show-labels
NAME              STATUS   AGE   LABELS
cert-manager      Active   14h   app.kubernetes.io/component=cert-manager,app.kubernetes.io/name=cert-manager,kustomize.component=cert-manager
default           Active   14h   <none>
istio-system      Active   14h   kustomize.component=cluster-local-gateway
knative-serving   Active   14h   app.kubernetes.io/component=knative-serving-install,app.kubernetes.io/name=knative-serving-install,kustomize.component=knative,serving.knative.dev/release=v0.14.3
kube-node-lease   Active   14h   <none>
kube-public       Active   14h   <none>
kube-system       Active   14h   <none>
kubeflow          Active   14h   control-plane=kubeflow,istio-injection=enabled,katib-metricscollector-injection=enabled
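
Once the namespaces exist, it helps to confirm that the pods inside them have actually become ready before moving on to the dashboard. The sketch below is an assumption-laden helper, not part of the original steps: `count_not_ready` is a hypothetical name, and it expects the single-namespace column layout of `kubectl get pods -n <namespace>` (NAME, READY, STATUS, ...), not the `-A` layout.

```shell
# count_not_ready: read `kubectl get pods -n <namespace>` output on stdin and print
# the number of pods whose READY column (e.g. "0/1") has fewer ready containers
# than expected. Pods in the Completed state are skipped, since finished jobs
# never report ready containers.
count_not_ready() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, r, "/")
         if (r[1] != r[2]) n++
       }
       END { print n + 0 }'
}
```

Usage would look like `k get pods -n istio-system | count_not_ready`; a non-zero count for istio-system is the situation examined in the Troubleshooting section.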


d. Launch of Kubeflow central dashboard

% export INGRESS_HOST=$(minikube ip -p kf)
% export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
% curl -v http://${INGRESS_HOST}:${INGRESS_PORT}/
*   Trying
* Connected to ( port 31380 (#0)
> GET / HTTP/1.1
> Host:
> User-Agent: curl/7.64.1
> Accept: */*
< HTTP/1.1 200 OK
< x-powered-by: Express
< accept-ranges: bytes
< cache-control: public, max-age=0
< last-modified: Thu, 06 Aug 2020 03:45:40 GMT
< etag: W/"599-173c1dfdaa0"
< content-type: text/html; charset=UTF-8
< content-length: 1433
< date: Mon, 26 Apr 2021 08:41:51 GMT
< x-envoy-upstream-service-time: 46
< server: istio-envoy
* Connection #0 to host left intact
<!doctype html><html lang="en"><head>...

- URL:
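
The dashboard address is built from the two variables exported above. As a minimal sketch, assuming the dashboard is served over plain HTTP on the ingress gateway's http2 nodePort, the URL and its status code could be obtained as follows. The helper names `dashboard_url` and `http_status` are hypothetical.

```shell
# dashboard_url HOST PORT: print the HTTP URL of the Istio ingress gateway.
dashboard_url() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

# http_status URL: print only the HTTP status code returned for URL.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

Running `http_status "$(dashboard_url "$INGRESS_HOST" "$INGRESS_PORT")"` should print 200 when the dashboard is reachable, and 503 in the failure case covered in Troubleshooting.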



4. Troubleshooting

- TS #1

   ▷ Problem: HTTP 503 error when accessing the Kubeflow dashboard

   ▷ Cause: the dashboard access error results from the following chain of failures

           ⇢ istio-ingressgateway pod: readiness probe failure

           ⇢ istio-ingressgateway pod: Envoy proxy not ready

           ⇢ istio-ingressgateway pod: failed to get root cert, authenticate failure

           ⇢ istio-pilot pod: authenticate failure

% k get pod istio-ingressgateway-85d57dc8bc-cf476 -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-85d57dc8bc-cf476   0/1     Running   0          14h
% k describe pod istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system | grep Events -A10
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  76s (x1201 over 41m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
% k logs istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system -f | \
  egrep "Envoy proxy is NOT ready|failed to get root cert|request authenticate failure|connection failure|no healthy upstream"
[2021-04-23 02:51:43.217][55][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, failed to get root cert
2021-04-23T02:51:45.102752Z   info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2021-04-23 02:51:45.869][55][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2021-04-23T02:51:47.101803Z   info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2021-04-23T02:51:49.103548Z   info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
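
The egrep filter above shows the raw lines; counting them per failure signature makes the pattern easier to see at a glance. The following is a minimal sketch with a hypothetical helper name, `summarize_gateway_log`, that matches three of the strings used in the filter.

```shell
# summarize_gateway_log: read istio-ingressgateway logs on stdin and print how many
# lines match each failure signature from the egrep filter above.
summarize_gateway_log() {
  awk '/Envoy proxy is NOT ready/ { not_ready++ }
       /failed to get root cert/  { root_cert++ }
       /connection failure/       { conn_fail++ }
       END {
         printf "not_ready=%d root_cert=%d connection_failure=%d\n",
                not_ready, root_cert, conn_fail
       }'
}
```

It reads a finite stream, so it would be used without `-f`, for example `k logs <ingressgateway pod> -n istio-system | summarize_gateway_log`.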
% k exec istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system -it -- bash
root@istio-ingressgateway-85d57dc8bc-tt9mc:/# curl http://localhost:15020/healthz/ready -v
*   Trying
* Connected to localhost ( port 15020 (#0)
> GET /healthz/ready HTTP/1.1
> Host: localhost:15020
> User-Agent: curl/7.47.0
> Accept: */*
< HTTP/1.1 503 Service Unavailable
< Date: Fri, 23 Apr 2021 06:34:58 GMT
< Content-Length: 0
* Connection #0 to host localhost left intact
root@istio-ingressgateway-85d57dc8bc-tt9mc:~# ps -ef
root  1    0     0 05:46 ?    00:00:03 /usr/local/bin/pilot-agent proxy router --domain istio-system.svc.cluster.local …
root  23   1     0 05:46 ?    00:00:26 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json …
root@istio-ingressgateway-85d57dc8bc-tt9mc:/# grep istio-pilot /etc/istio/proxy/envoy-rev0.json
"socket_address": {"address": "istio-pilot.istio-system", "port_value": 15011}



   ▷ Solution

       - https://github.com/kubeflow/kubeflow/issues/5447

          we changed the flag extra-config=apiserver.service-account-signing-key-file  from /var/lib/minikube/certs/apiserver.key to /var/lib/minikube/certs/sa.key

% minikube ssh -p kf
                         _             _
            _         _ ( )           ( )
  ___ ___  (_)  ___  (_)| |/')  _   _ | |_      __
/' _ ` _ `\| |/' _ `\| || , <  ( ) ( )| '_`\  /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )(  ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)

$ ls /var/lib/minikube/certs/apiserver.*
/var/lib/minikube/certs/apiserver.crt  /var/lib/minikube/certs/apiserver.key
$ ls /var/lib/minikube/certs/sa.*
/var/lib/minikube/certs/sa.key  /var/lib/minikube/certs/sa.pub
$ sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep service-account
    - --service-account-api-audiences=api
    - --service-account-issuer=api
    - --service-account-key-file=/var/lib/minikube/certs/sa.pub
    - --service-account-signing-key-file=/var/lib/minikube/certs/apiserver.key
$ sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
    - --service-account-issuer=api
    - --service-account-key-file=/var/lib/minikube/certs/sa.pub
    - --service-account-signing-key-file=/var/lib/minikube/certs/sa.key  # changed from apiserver.key
$ exit
### The kubelet detects the change to kube-apiserver.yaml and restarts the kube-apiserver pod, which applies the change
% k describe pod kube-apiserver-kf -n kube-system | grep service-account-
% k delete pod istio-citadel-6c468575db-98w4q -n istio-system
pod "istio-citadel-6c468575db-98w4q" deleted
% k delete pod istio-pilot-77bc8867cf-rc5v4 -n istio-system
pod "istio-pilot-77bc8867cf-rc5v4" deleted
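
The manual vi edit above can also be expressed as a pair of stream filters, which makes the change easy to preview before touching the manifest. This is a sketch under the assumption that the flag appears as a single YAML list item exactly as in the transcript; the function names `signing_key_file` and `use_sa_key` are hypothetical.

```shell
# signing_key_file: read a kube-apiserver manifest on stdin and print the value of
# --service-account-signing-key-file (prints nothing if the flag is absent).
signing_key_file() {
  sed -n 's/^ *- --service-account-signing-key-file=//p'
}

# use_sa_key: read a manifest on stdin and write it to stdout with the signing key
# switched from apiserver.key to sa.key; all other lines pass through unchanged.
use_sa_key() {
  sed '/--service-account-signing-key-file=/s/apiserver\.key$/sa.key/'
}
```

With the functions defined inside the minikube VM shell, `sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | signing_key_file` shows the current value, and piping through `use_sa_key` first previews the corrected manifest without writing anything.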


- Istio Architecture v1.9 (Kubeflow 1.2 ships with Istio 1.3)

- Pilot abstracts platform-specific service discovery mechanisms and synthesizes them into a standard format that any sidecar conforming with the Envoy API can consume.

- Istio-related pods

   ✓ istio-pilot-77bc8867cf-rc5v4

      ⇢ istio-proxy container

            /usr/local/bin/pilot-agent proxy    <= listens and serves on port 15011

      ⇢ discovery container

            /usr/local/bin/pilot-discovery discovery

   ✓ istio-ingressgateway-bf8654559-5rcp5

       /usr/local/bin/pilot-agent proxy router 

       Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30

   ✓ ml-pipeline-viewer-crd-754d85df8d-24lpp

      ⇢ istio-proxy container 

           /usr/local/bin/pilot-agent proxy sidecar

           Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30

      ⇢ ml-pipeline-viewer-crd container

   ✓ ml-pipeline-visualizationserver-769546b47b-qf5ll

      ⇢ istio-proxy container 

           /usr/local/bin/pilot-agent proxy sidecar

           Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30

      ⇢ ml-pipeline-visualizationserver container

