
istio - Envoy excessive CPU usage

by 여행을 떠나자! 2021. 9. 15.

2020.12.01

a. Problem: Worker node - performance degradation caused by excessive CPU usage

- Environments

  Kubernetes 1.16.15, istio 1.3

- Impact

   When the Rook Ceph rook-ceph-mon-o pod runs on node iap04, it responds so slowly that it is dropped from the quorum, which triggers a fail-over.

 

  [root@iap04 ~]# top

  top - 10:46:25 up 6 days, 17:58,  1 user,  load average: 73.37, 77.52, 79.46

  Tasks: 403 total,  19 running, 382 sleeping,   0 stopped,   2 zombie

  %Cpu(s): 90.1 us,  8.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  1.6 si,  0.0 st

  KiB Mem : 32490092 total,  6647024 free, 16691484 used,  9151584 buff/cache

  KiB Swap:        0 total,        0 free,        0 used. 13772944 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

  22552 1337      20   0  444656 313108  20312 R  85.0  1.0  16:14.61 envoy

  22891 1337      20   0  392428 261624  20308 R  85.0  0.8  16:06.74 envoy

  24234 1337      20   0  444712 315224  20312 R  85.0  1.0  15:35.31 envoy

  24256 1337      20   0  445736 314428  20304 R  85.0  1.0  15:55.10 envoy

  24590 1337      20   0  448804 317332  20312 R  85.0  1.0  15:32.17 envoy

  25626 1337      20   0  449828 318916  20316 R  85.0  1.0  15:59.56 envoy

  22889 1337      20   0  379172 247632  20304 R  75.0  0.8  16:01.93 envoy

  17206 root      20   0 3779588 391124  23104 S  20.0  1.2   1373:21 kubelet

  …
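
The interactive top view above can be condensed into a per-command total, which makes it easier to see that the envoy processes together account for most of the CPU. The following is a minimal sketch, assuming procps `ps` is available on the node; the function name `cpu_by_command` is illustrative and not part of the original note. Note that `ps` reports %CPU averaged over each process's lifetime, so the numbers will not match top exactly.

```shell
# Sum %CPU per command name from "ps -eo pcpu,comm" style input on stdin.
# Prints "<total %CPU> <process count> <command>", highest total first.
# Only the first word of the command name is used as the key.
cpu_by_command() {
  awk 'NR > 1 { cpu[$2] += $1; n[$2]++ }
       END { for (c in cpu) printf "%.1f %d %s\n", cpu[c], n[c], c }' |
    sort -rn
}

# Usage on a node:
#   ps -eo pcpu,comm | cpu_by_command | head -5
```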

 

 

b. Cause analysis: excessive CPU usage in the Istio envoy container

  [root@iap04 ~]# ps -ef | grep 22552 | grep -v grep | cut -c-100

  1337  22552 22412 88 10:28 ?   00:16:56 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.

  [root@iap04 ~]# ps -ef | grep 22412 | grep -v grep | cut -c-100

  1337  22412 22380  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain kn

  1337  22552 22412 88 10:28 ?   00:17:03 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.

  [root@iap04 ~]# ps -ef | grep "/usr/local/bin/pilot-agent proxy sidecar" | grep -v grep | cut -c-130

  1337  22412 22380  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  22736 22682  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  22757 22707  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  24015 23965  0 10:28 ?   00:00:02 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  24028 23982  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  24412 24377  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  1337  25513 25471  0 10:28 ?   00:00:01 /usr/local/bin/pilot-agent proxy sidecar --domain knative-serving.svc.cluster.local

  [root@iap04 ~]#
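
In the output above, the namespace of each sidecar can be read from the `--domain` argument of pilot-agent. A minimal sketch that counts sidecars per namespace is shown below. It assumes the domain always has the form `<namespace>.svc.<cluster domain>`, as it does in this output; the function name `sidecars_by_namespace` is illustrative.

```shell
# Count pilot-agent sidecars per namespace.
# Reads process listings (e.g. "ps -ef" lines) on stdin and takes the
# namespace from the "--domain <namespace>.svc.<cluster domain>" argument.
sidecars_by_namespace() {
  grep -o -e '--domain [^ ]*' |
    awk '{ sub(/\.svc\..*/, "", $2); print $2 }' |
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage on a node:
#   ps -e -o args | grep '[p]ilot-agent proxy sidecar' | sidecars_by_namespace
```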

 

    # Find the namespaces configured for Istio envoy (sidecar) injection and check for abnormal pods

  [iap@iap01 ~]$ kubectl get namespace -L istio-injection | grep enabled

  admin                    Active   117d    enabled

  knative-serving          Active   117d    enabled

  [iap@iap01 ~]$ k get pod -n knative-serving -o wide | grep "1/2"

  activator-6dc4884-77wtg  1/2  Running   3   16h    10.244.6.217  iap04  <none> <none>

  activator-6dc4884-pnrt5  1/2  Running   12  3d16h  10.244.6.205  iap04  <none> <none>

  activator-6dc4884-w78wr  1/2  Running   3   16h    10.244.6.216  iap04  <none> <none>

  activator-6dc4884-wr9k4  1/2  Running   3   16h    10.244.6.214  iap04  <none> <none>

  activator-6dc4884-zrlbh  1/2  Running   3   16h    10.244.6.215  iap04  <none> <none>

  activator-6dc4884-zz9js  1/2  Running   4   16h    10.244.6.213  iap04  <none> <none>

  [iap@iap01 ~]$ k get pod -n knative-serving -o wide | grep activator | wc -l

  20

  [iap@iap01 ~]$ k get deployments.apps activator -n knative-serving

  NAME        READY   UP-TO-DATE   AVAILABLE   AGE

  activator   16/20   20           16          117d

  [iap@iap01 ~]$ k describe pod activator-6dc4884-77wtg -n knative-serving

  …

    istio-proxy:

      Container ID:  docker://474fcacc7b235a02c51a7a3e789f0a27c7c28e11d6126136d12787b4d48ac927

      Image:         docker.io/istio/proxyv2:1.5.8

  …

      State:          Running

        Started:      Tue, 01 Dec 2020 10:28:13 +0900

      Last State:     Terminated

        Reason:       Completed

        Exit Code:    0

        Started:      Mon, 30 Nov 2020 18:19:55 +0900

        Finished:     Tue, 01 Dec 2020 10:28:07 +0900

      Ready:          False

      Restart Count:  1

  …
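
The `grep "1/2"` filter used above only matches pods with exactly two containers. A slightly more general sketch compares the two numbers in the READY column instead. The function name `not_ready_pods` is illustrative; it will also list pods that have finished normally (for example `0/1 Completed`), so the STATUS column is printed alongside.

```shell
# Print pods whose READY column shows fewer ready containers than total.
# Reads "kubectl get pod" table output on stdin (NAME READY STATUS ...).
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2, $3 }'
}

# Usage:
#   kubectl get pod -n knative-serving | not_ready_pods
```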

 

    # Confirm that the envoy process is consuming excessive CPU

  [iap@iap01 ~]$ k exec activator-6dc4884-77wtg -c istio-proxy -n knative-serving -it -- sh

  $ top

  top - 01:52:45 up 6 days, 18:04,  0 users,  load average: 73.67, 77.36, 78.76

  Tasks:   4 total,   2 running,   2 sleeping,   0 stopped,   0 zombie

  %Cpu(s): 91.8 us,  7.1 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  1.2 si,  0.0 st

  KiB Mem : 32490092 total,  5735552 free, 17246084 used,  9508456 buff/cache

  KiB Swap:        0 total,        0 free,        0 used. 13201572 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

     25 istio-p+  20   0  542956 412108  20312 R  97.3  1.3  21:18.08 envoy

      1 istio-p+  20   0  158516  28904  16552 S   0.3  0.1   0:01.96 pilot-agent

  …

 

 

c. Solution: rolling restart of the activator deployment

  [iap@iap01 ~]$ k rollout restart deployment activator -n knative-serving

  deployment.apps/activator restarted

  [iap@iap01 ~]$
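
`kubectl rollout restart` returns as soon as the restart is requested. To wait until the replacement pods are actually ready, it can be combined with `kubectl rollout status`. The wrapper below is a minimal sketch; the function name `restart_and_wait` and the 5m default timeout are assumptions, not part of the original procedure.

```shell
# Restart a deployment and block until the new pods are rolled out.
# Arguments: <deployment> <namespace> [timeout, default 5m]
restart_and_wait() {
  local deploy=$1 ns=$2 timeout=${3:-5m}
  kubectl rollout restart "deployment/$deploy" -n "$ns" &&
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}

# Usage:
#   restart_and_wait activator knative-serving
```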

 

    The excessive CPU usage was resolved; however, iap04/iap05 are low-spec nodes, so their CPU utilization remains high.

  [root@iap04 ~]# lscpu | grep -i socket

  Core(s) per socket:    4

  Socket(s):             1

  [root@iap04 ~]#

  [root@iap10 ~]# lscpu | grep -i socket

  Core(s) per socket:    20

  Socket(s):             2

  [root@iap10 ~]#

 

  [iap@iap01 ~]$ k get deployments.apps activator -n knative-serving

  NAME        READY   UP-TO-DATE   AVAILABLE   AGE

  activator   20/20   20           20          117d

  [iap@iap01 ~]$ k get pod -n knative-serving -o wide | grep activator | grep -v grep | tr -s ' ' | cut -d' ' -f 7 | sort | uniq -c

        5 iap04

        4 iap05

        8 iap10

        3 iap11
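
The per-node count above depends on the node name being in the 7th column of the table output. A sketch that avoids the column arithmetic is to ask kubectl for the node name only and count the result. The function name `pods_per_node` is illustrative, and the `app=activator` label selector in the usage line is an assumption about how the activator pods are labelled.

```shell
# Count pods per node. Reads one node name per line on stdin and
# prints "<count> <node>", sorted by node name.
pods_per_node() {
  awk '{ n[$1]++ } END { for (k in n) print n[k], k }' | sort -k2
}

# Usage (label selector is an assumption):
#   kubectl get pod -n knative-serving -l app=activator \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | pods_per_node
```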

  [iap@iap01 ~]$ ~/bin/check-cpu.sh

         procs -----------memory-------------- ---swap--- -----io----- ----syste---- ------cpu-----

  node    r  b   swpd   free   buff   cache      si   so     bi    bo   in     cs    us sy  id wa st

  iap04:  9  8      0 4000816  420852 11492244    0    0      0   154   16678  30269 71 10   7 12  0

  iap05:  2 13      0 256456   117000 21081552    0    0      0    96   11376  22950 31  5  10 54  0

  iap06:  1  2      0 3699588  283164 13359000    0    0      0    52    8015  20149 24  3  62 12  0

  iap07:  0  0      0 7800952    6824 18504192    0    0      0     0    2387   6741  0  0 100  0  0

  iap08: 25  0      0 10890024   7072 18057196    0    0      0 16436   29340  33996 24  3  73  0  0

  iap09:  0  0      0 17706208  22344 12057468    0    0      0     8    1674   3914  1  0  99  0  0

  iap10:  2  1      0 38194324  13944  1099520    0    0  28672 20844   20330  15851  5  1  93  1  0

  iap11:  4  2      0 5350500   47616  6421024    0    0      0 29168   31602 110691  4  2  92  2  0

  [iap@iap01 ~]$
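
The contents of `~/bin/check-cpu.sh` are not shown in this note. Since its columns match those of `vmstat`, a hypothetical stand-in could look like the sketch below. It assumes passwordless SSH from the admin host to every node; the function name `check_cpu` is illustrative, and the header lines of the original output are not reproduced.

```shell
# Hypothetical stand-in for ~/bin/check-cpu.sh (the original script is not shown).
# Prints one vmstat sample per node, prefixed with the node name.
check_cpu() {
  local node
  for node in "$@"; do
    # "vmstat 1 2" prints two samples; the first is the average since boot,
    # so keep only the last line (the 1-second sample).
    printf '%s: %s\n' "$node" "$(ssh "$node" vmstat 1 2 | tail -n 1)"
  done
}

# Usage:
#   check_cpu iap04 iap05 iap10 iap11
```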

  [iap@iap01 ~]$ k exec activator-6c8699d66-g9q2c -c istio-proxy -n knative-serving -it -- sh

  $ top

  top - 04:05:26 up 6 days, 20:17,  0 users,  load average: 23.39, 22.03, 19.40

  Tasks:   4 total,   2 running,   2 sleeping,   0 stopped,   0 zombie

  %Cpu(s): 57.6 us,  4.3 sy,  0.0 ni, 20.4 id, 16.6 wa,  0.0 hi,  1.1 si,  0.0 st

  KiB Mem : 32490092 total,  2193476 free, 18648272 used, 11648344 buff/cache

  KiB Swap:        0 total,        0 free,        0 used. 11751124 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

     22 istio-p+  20   0  215340  76296  20384 R  83.7  0.2 105:44.26 envoy
