2021.04.27
1. Overview
- Set up Kubeflow on top of Minikube
- https://v1-2-branch.kubeflow.org/docs/started/workstation/minikube-linux/
Using the options exactly as described in the document above causes an error when accessing the Kubeflow dashboard (the service-account-signing-key-file value; see Troubleshooting)
2. Environments
- minikube 1.17.1
- kubernetes 1.16.15
- kubeflow 1.2
- macOS 11.2
3. Install Kubeflow
a. Prerequisites
- Recommended resources
8 cores, 16GB RAM, 250GB storage
- Minimum resources
6 cores, 10GB RAM, 30GB storage
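- To check whether the workstation meets these numbers before starting, something like the following can be run on macOS (a quick sketch; hw.ncpu and hw.memsize are the standard sysctl keys):
% sysctl -n hw.ncpu                                   # logical CPU count
% echo "$(($(sysctl -n hw.memsize) / 1073741824))GB"  # physical RAM in GB
% df -h .                                             # free disk space on the current filesystem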
b. Start minikube
- When building the Kubernetes cluster in minikube, set at least the minimum CPU/memory/disk values required to install Kubeflow, and
- for service-account-signing-key-file, use sa.key instead of the apiserver.key given in the installation document (as of the 2021-04-21 revision of the document; see Troubleshooting)
yoosungjeon@ysjeon-Dev ~ % minikube start --driver=hyperkit --kubernetes-version=1.16.15 \
--cpus=6 --memory=8g --disk-size=40g --profile kf \
--extra-config=apiserver.service-account-issuer=api \
--extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key \
--extra-config=apiserver.service-account-api-audiences=api
😄 [kf] minikube v1.17.1 on Darwin 11.2.3
✨ Using the hyperkit driver based on user configuration
👍 Starting control plane node kf in cluster kf
🔥 Creating hyperkit VM (CPUs=6, Memory=8192MB, Disk=40960MB) ...
🐳 Preparing Kubernetes v1.16.15 on Docker 20.10.2 ...
▪ apiserver.service-account-issuer=api
▪ apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key
▪ apiserver.service-account-api-audiences=api
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass
❗ /usr/local/bin/kubectl is version 1.20.4, which may have incompatibilites with Kubernetes 1.16.15.
▪ Want kubectl v1.16.15? Try 'minikube kubectl -- get pods -A'
🏄 Done! kubectl is now configured to use "kf" cluster and "default" namespace by default
yoosungjeon@ysjeon-Dev ~ % minikube profile list
|----------|-----------|---------|---------------|------|----------|---------|-------|
| Profile | VM Driver | Runtime | IP | Port | Version | Status | Nodes |
|----------|-----------|---------|---------------|------|----------|---------|-------|
| kf | hyperkit | docker | 192.168.64.27 | 8443 | v1.16.15 | Running | 1 |
| minikube | hyperkit | docker | 192.168.64.26 | 8443 | v1.16.15 | Paused | 1 |
|----------|-----------|---------|---------------|------|----------|---------|-------|
yoosungjeon@ysjeon-Dev acp-kubeflow %
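- Before installing Kubeflow, it is worth a minimal health check on the new cluster; the kf node should report Ready:
% kubectl get nodes -o wide
% minikube status -p kf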
c. Installation of Kubeflow
% wget https://github.com/kubeflow/kfctl/archive/refs/tags/v1.2.0.zip
% unzip v1.2.0.zip
% wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
% tar xzf kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
% ./kfctl version
kfctl v1.2.0-0-gbc038f9
%
% export KF_NAME=acp-kubeflow
% export BASE_DIR=/Users/yoosungjeon/Private/k8s-oss/kf-deployments
% export KF_DIR=${BASE_DIR}/${KF_NAME}
% export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"
% mkdir -p ${KF_DIR}
% cd ${KF_DIR}
% ../kfctl build -V -f ${CONFIG_URI}
% export CONFIG_FILE=${KF_DIR}/kfctl_k8s_istio.v1.2.0.yaml
% ../kfctl apply -V -f ${CONFIG_FILE}
…
INFO[0174] Successfully applied application spartakus filename="kustomize/kustomize.go:291"
INFO[0174] Applied the configuration Successfully! filename="cmd/apply.go:75"
%
# List the namespaces used by Kubeflow (k is used as a shell alias for kubectl below)
% k get ns | egrep -v "default|kube-"
NAME STATUS AGE
cert-manager Active 17m
istio-system Active 16m
knative-serving Active 13m
kubeflow Active 17m
%
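# The first start pulls many images, so the pods can take a while to settle. A quick sketch to
# see what is still starting or failing (Completed rows are finished jobs and can be ignored):
% k get pods -n kubeflow
% k get pods -A | egrep -v "Running|Completed"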
# Check the Istio-related setup: the kubeflow namespace has the 'istio-injection=enabled' label set
% k get ns --show-labels
NAME STATUS AGE LABELS
cert-manager Active 14h app.kubernetes.io/component=cert-manager,app.kubernetes.io/name=cert-manager,kustomize.component=cert-manager
default Active 14h <none>
istio-system Active 14h kustomize.component=cluster-local-gateway
knative-serving Active 14h app.kubernetes.io/component=knative-serving-install,app.kubernetes.io/name=knative-serving-install,kustomize.component=knative,serving.knative.dev/release=v0.14.3
kube-node-lease Active 14h <none>
kube-public Active 14h <none>
kube-system Active 14h <none>
kubeflow Active 14h control-plane=kubeflow,istio-injection=enabled,katib-metricscollector-injection=enabled
%
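# Sidecar injection can be verified directly: because of 'istio-injection=enabled', every pod in
# the kubeflow namespace should list an istio-proxy container next to its application container.
# A sketch using jsonpath:
% k get pods -n kubeflow -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' | head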
d. Launch of the Kubeflow central dashboard
% export INGRESS_HOST=$(minikube ip -p kf)
% export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
% curl $INGRESS_HOST:$INGRESS_PORT -v
* Trying 192.168.64.32...
* TCP_NODELAY set
* Connected to 192.168.64.32 (192.168.64.32) port 31380 (#0)
> GET / HTTP/1.1
> Host: 192.168.64.32:31380
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< x-powered-by: Express
< accept-ranges: bytes
< cache-control: public, max-age=0
< last-modified: Thu, 06 Aug 2020 03:45:40 GMT
< etag: W/"599-173c1dfdaa0"
< content-type: text/html; charset=UTF-8
< content-length: 1433
< date: Mon, 26 Apr 2021 08:41:51 GMT
< x-envoy-upstream-service-time: 46
< server: istio-envoy
<
* Connection #0 to host 192.168.64.32 left intact
<!doctype html><html lang="en"><head>...
%
- Dashboard URL: http://192.168.64.32:31380
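- On macOS the dashboard can then be opened straight from the shell, reusing the variables exported above:
% open http://${INGRESS_HOST}:${INGRESS_PORT}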
4. Troubleshooting
- TS #1
▷ Problem: HTTP 503 error when accessing the Kubeflow dashboard
▷ Cause: the istio-ingressgateway POD cannot authenticate to istio-pilot and never becomes ready
⇢ istio-ingressgateway POD: readiness probe failure
⇢ istio-ingressgateway POD: Envoy proxy not ready
⇢ istio-ingressgateway POD: failed to get root cert, authenticate failure
⇢ istio-pilot POD: authenticate failure
% k get pod istio-ingressgateway-85d57dc8bc-cf476 -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ingressgateway-85d57dc8bc-cf476 0/1 Running 0 14h
% k describe pod istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system | grep Events -A10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 76s (x1201 over 41m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
% k logs istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system -f | \
egrep "Envoy proxy is NOT ready|failed to get root cert|request authenticate failure|connection failure|no healthy upstream"
[2021-04-23 02:51:43.217][55][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, failed to get root cert
2021-04-23T02:51:45.102752Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2021-04-23 02:51:45.869][55][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2021-04-23T02:51:47.101803Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2021-04-23T02:51:49.103548Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
…
%
% k exec istio-ingressgateway-85d57dc8bc-tt9mc -n istio-system -it -- bash
root@istio-ingressgateway-85d57dc8bc-tt9mc:/# curl http://localhost:15020/healthz/ready -v
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 15020 (#0)
> GET /healthz/ready HTTP/1.1
> Host: localhost:15020
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Date: Fri, 23 Apr 2021 06:34:58 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact
root@istio-ingressgateway-85d57dc8bc-tt9mc:~# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:46 ? 00:00:03 /usr/local/bin/pilot-agent proxy router --domain istio-system.svc.cluster.local …
root 23 1 0 05:46 ? 00:00:26 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json …
…
root@istio-ingressgateway-85d57dc8bc-tt9mc:/# grep istio-pilot /etc/istio/proxy/envoy-rev0.json
"spiffe://cluster.local/ns/istio-system/sa/istio-pilot-service-account"
"socket_address": {"address": "istio-pilot.istio-system", "port_value": 15011}
root@istio-ingressgateway-85d57dc8bc-tt9mc:/#
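- Since the gateway cannot reach istio-pilot on port 15011, the pilot side is worth checking as well. A sketch, assuming the istio-pilot deployment and its discovery container (see the POD breakdown at the end of this section):
% k logs deploy/istio-pilot -n istio-system -c discovery --tail=20
% k get endpoints istio-pilot -n istio-system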
▷ Solution
- https://github.com/kubeflow/kubeflow/issues/5447
"we changed the flag extra-config=apiserver.service-account-signing-key-file from /var/lib/minikube/certs/apiserver.key to /var/lib/minikube/certs/sa.key"
- Why this works: projected service-account tokens are validated against --service-account-key-file (sa.pub), so they must be signed with the matching private key (sa.key). Signing them with apiserver.key makes token validation fail, which is what the "authenticate failure" messages from istio-pilot reflect.
% minikube ssh -p kf
_ _
_ _ ( ) ( )
___ ___ (_) ___ (_)| |/') _ _ | |_ __
/' _ ` _ `\| |/' _ `\| || , < ( ) ( )| '_`\ /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )( ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)
$ ls /var/lib/minikube/certs/apiserver.*
/var/lib/minikube/certs/apiserver.crt /var/lib/minikube/certs/apiserver.key
$ ls /var/lib/minikube/certs/sa.*
/var/lib/minikube/certs/sa.key /var/lib/minikube/certs/sa.pub
$
$ sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep service-account
- --service-account-api-audiences=api
- --service-account-issuer=api
- --service-account-key-file=/var/lib/minikube/certs/sa.pub
- --service-account-signing-key-file=/var/lib/minikube/certs/apiserver.key
$ sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
…
- --service-account-issuer=api
- --service-account-key-file=/var/lib/minikube/certs/sa.pub
- --service-account-signing-key-file=/var/lib/minikube/certs/sa.key # changed from apiserver.key
…
$ exit
### kubelet detects the change to kube-apiserver.yaml and restarts the kube-apiserver POD, so the change is applied automatically
% k describe pod kube-apiserver-kf -n kube-system | grep service-account-
--service-account-api-audiences=api
--service-account-issuer=api
--service-account-key-file=/var/lib/minikube/certs/sa.pub
--service-account-signing-key-file=/var/lib/minikube/certs/sa.key
%
% k delete pod istio-citadel-6c468575db-98w4q -n istio-system
pod "istio-citadel-6c468575db-98w4q" deleted
% k delete pod istio-pilot-77bc8867cf-rc5v4 -n istio-system
pod "istio-pilot-77bc8867cf-rc5v4" deleted
%
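- For reference, the same manifest edit can be scripted instead of editing by hand inside the VM. A sketch, assuming the kf profile and the stock certificate paths; the sed pattern targets only the signing-key flag, because apiserver.key is also referenced by the TLS flags in the same manifest. Quoting through minikube ssh can be shell-sensitive, so re-verify the flags with the describe command above:
% minikube ssh -p kf -- "sudo sed -i 's|signing-key-file=/var/lib/minikube/certs/apiserver.key|signing-key-file=/var/lib/minikube/certs/sa.key|' /etc/kubernetes/manifests/kube-apiserver.yaml"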
- Istio Architecture v1.9 (Kubeflow 1.2 ships with Istio 1.3)
- Pilot abstracts platform-specific service discovery mechanisms and synthesizes them into a standard format that any sidecar conforming with the Envoy API can consume.
- Istio-related PODs
✓ istio-pilot-77bc8867cf-rc5v4
⇢ istio-proxy container
/usr/local/bin/pilot-agent proxy <= listens on and serves port 15011
⇢ discovery container
/usr/local/bin/pilot-discovery discovery
✓ istio-ingressgateway-bf8654559-5rcp5
/usr/local/bin/pilot-agent proxy router
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
✓ ml-pipeline-viewer-crd-754d85df8d-24lpp
⇢ istio-proxy container
/usr/local/bin/pilot-agent proxy sidecar
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
⇢ ml-pipeline-viewer-crd container
✓ ml-pipeline-visualizationserver-769546b47b-qf5ll
⇢ istio-proxy container
/usr/local/bin/pilot-agent proxy sidecar
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
⇢ ml-pipeline-visualizationserver container
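- Once istio-citadel and istio-pilot have been restarted, the readiness endpoint that returned 503 earlier should return 200. A quick re-check (the pod name is illustrative; substitute the current istio-ingressgateway pod):
% k exec <istio-ingressgateway-pod> -n istio-system -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost:15020/healthz/ready   # expect 200 once Envoy has received its config from pilot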