K8s - Slab memory leakage

by 여행을 떠나자! 2021. 9. 16.

2020.12.02

a. Problem: POD - cannot allocate memory

- Environments

   Kubernetes 1.16.15, CentOS 7.8 / 7.9, Docker 19.03 / 20.10

- Leakage observed

   CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 19.03 (iap10, iap11)

   CentOS 7.9 / 3.10.0-1160.15.2.el7.x86_64 / Docker 20.10.3

- Leakage not observed

   CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 18.06 (iap04 ~ iap09)

 

 [iap@iap01 ~]$ k describe pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph | egrep Events -A10

  Events:

   Type     Reason                    Age                    From               Message

   ----     ------                    ----                   ----               -------

   Normal   Scheduled                 <unknown>              default-scheduler  Successfully assigned rook-ceph/rook-ceph-osd-prepare-iap11-b69k7 to iap11

   Warning  FailedCreatePodContainer  3m24s (x255 over 58m)  kubelet, iap11     unable to ensure pod container exists: failed to create container for [kubepods besteffort podbde5ed67-dd1e-4d41-ba41-cade1108e04c] : mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c: cannot allocate memory

 [iap@iap01 ~]$ k get pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph -o wide

   NAME                                READY   STATUS     RESTARTS   AGE    IP       NODE    NOMINATED NODE   READINESS GATES

    rook-ceph-osd-prepare-iap11-b69k7   0/1     Init:0/1   0          128m   <none>   iap11   <none>           <none>

   [iap@iap01 ~]$
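
 Other pods and nodes hit by the same failure can be located from cluster events instead of describing pods one by one; a minimal sketch, assuming the kubectl alias k used above and filtering on the event reason shown in the describe output:

   # List FailedCreatePodContainer events in all namespaces and show which node reported them
   k get events --all-namespaces --field-selector reason=FailedCreatePodContainer \
     -o custom-columns=NODE:.source.host,NAMESPACE:.metadata.namespace,POD:.involvedObject.name,MESSAGE:.message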

Or, as observed on another node (gmd01):

   [root@gmd01 ~]# k describe nodes gmd01 | egrep Events -A 8

   Events:

     Type    Reason                   Age                  From            Message

     ----    ------                   ----                 ----            -------

     Normal  Starting                 16m                  kubelet, gmd01  Starting kubelet.

     Normal  NodeAllocatableEnforced  16m                  kubelet, gmd01  Updated Node Allocatable limit across pods

     Normal  NodeHasNoDiskPressure    15m (x7 over 16m)    kubelet, gmd01  Node gmd01 status is now: NodeHasNoDiskPressure

     Normal  NodeHasSufficientPID     15m (x8 over 16m)    kubelet, gmd01  Node gmd01 status is now: NodeHasSufficientPID

     Normal  NodeHasSufficientMemory  64s (x129 over 16m)  kubelet, gmd01  Node gmd01 status is now: NodeHasSufficientMemory

   [root@gmd01 ~]# journalctl -u kubelet -f

   Mar 26 13:43:25 gmd01 kubelet[13204]: E0326 13:43:25.865256   13204 kubelet_node_status.go:94] Unable to register node "gmd01" with API server: Node "gmd01" is invalid: [status.capacity.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:67294654464, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"65717436Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.pods: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:61483446272, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"60042428Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]

   …

 

 

b. Cause analysis: Kernel bug

       If too many memory cgroups have been leaked, the kernel can no longer create new ones and mkdir on the cgroup filesystem fails with "Cannot allocate memory". Note that this is not an actual memory shortage (free still shows gigabytes available below); the affected kernels leak kernel-memory (kmem) cgroup accounting structures until no new memory cgroup can be allocated.

        https://bugs.centos.org/view.php?id=17780

        https://bugzilla.redhat.com/show_bug.cgi?id=1507149
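
  A quick way to gauge how many memory cgroups the kernel is still holding (including leaked ones that are no longer visible) is to compare the kernel's own counter with the directories present in the cgroup filesystem; a minimal sketch, assuming the cgroup v1 layout used on these nodes:

   # Memory cgroups the kernel is still tracking (leaked/zombie cgroups included)
   awk '$1 == "memory" { print "kernel memory cgroups:", $3 }' /proc/cgroups
   # Memory cgroup directories actually visible to userspace
   echo "visible cgroup directories: $(find /sys/fs/cgroup/memory -type d | wc -l)"

  A large gap between the two numbers, together with the exploded /sys/kernel/slab entry count shown below, points at the leak described in the bug reports above.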

 

   [root@iap11 ~]# mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c

   mkdir: cannot create directory '/sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c': Cannot allocate memory

   [root@iap11 ~]# df -h | egrep "Filesystem|cgroup"

   Filesystem               Size  Used Avail Use% Mounted on

   tmpfs                     32G     0   32G   0% /sys/fs/cgroup

   [root@iap11 ~]# free  -h

                 total        used        free      shared  buff/cache   available

   Mem:            62G         46G        6.1G        3.1G         10G         12G

   Swap:            0B          0B          0B

   [root@iap11 ~]# 

   [root@iap11 ~]# ls /sys/kernel/slab | wc -l

   184990

   [root@iap11 ~]# slabtop -s -c

    Active / Total Objects (% used)    : 18607885 / 30427076 (61.2%)

    Active / Total Slabs (% used)      : 598690 / 598690 (100.0%)

    Active / Total Caches (% used)     : 125 / 178 (70.2%)

    Active / Total Size (% used)       : 4896567.75K / 10496016.66K (46.7%)

    Minimum / Average / Maximum Object : 0.01K / 0.34K / 15.25K

     OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

   6018034 6017202  99%    0.12K  88502   68    708016K kernfs_node_cache

   3973200 387465   9%    0.25K  62082    64    993312K kmalloc-256

   3863152 389465  10%    0.50K  61228    64   1959296K kmalloc-512

   2233182 392360  17%    0.19K  53172    42    425376K kmalloc-192

   …

 

   [iap@iap01 ~]$ sudo ssh root@iap04 ls /sys/kernel/slab | wc -l

   349

   [iap@iap01 ~]$ sudo ssh root@iap05 ls /sys/kernel/slab | wc -l

   344

   [iap@iap01 ~]$ sudo ssh root@iap06 ls /sys/kernel/slab | wc -l

   337

   [iap@iap01 ~]$ sudo ssh root@iap07 ls /sys/kernel/slab | wc -l

   328

   [iap@iap01 ~]$ sudo ssh root@iap08 ls /sys/kernel/slab | wc -l

   343

   [iap@iap01 ~]$ sudo ssh root@iap09 ls /sys/kernel/slab | wc -l

   328

   [iap@iap01 ~]$ sudo ssh root@iap10 ls /sys/kernel/slab | wc -l

   129786

   [iap@iap01 ~]$ sudo ssh root@iap11 ls /sys/kernel/slab | wc -l

   184990

   [iap@iap01 ~]$
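
  The same per-node check can be scripted instead of running ssh by hand; a small sketch using the node names and access pattern from the capture above:

   # Healthy nodes report a few hundred slab caches; leaking nodes report tens of thousands
   for node in iap04 iap05 iap06 iap07 iap08 iap09 iap10 iap11; do
     printf '%s: %s\n' "${node}" "$(sudo ssh root@${node} ls /sys/kernel/slab | wc -l)"
   done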

 

 

c. Solution: set "cgroup.memory=nokmem" and reboot

    - Disable kernel memory accounting:

      According to https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt, passing cgroup.memory=nokmem to the kernel at boot time should achieve this.
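
    The parameter can also be appended with grubby instead of editing /etc/default/grub by hand; a sketch for CentOS 7 (the manual edit below is what was actually applied here):

      # Append the flag to every installed kernel's boot entry
      grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
      # grubby edits the generated grub.cfg only, so keep GRUB_CMDLINE_LINUX in
      # /etc/default/grub in sync or a later grub2-mkconfig will drop the flag

    On UEFI installs, note that grub2-mkconfig writes to /boot/efi/EFI/centos/grub.cfg rather than /boot/grub2/grub.cfg.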

 

  [root@iap11 ~]# cat /proc/cmdline

  BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8

  [root@iap11 ~]# vi /etc/default/grub

  GRUB_TIMEOUT=5

  …

  GRUB_CMDLINE_LINUX="crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem"

  GRUB_DISABLE_RECOVERY="true"

  [root@iap11 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

  Generating grub configuration file ...

  Found linux image: /boot/vmlinuz-3.10.0-1127.el7.x86_64

  Found initrd image: /boot/initramfs-3.10.0-1127.el7.x86_64.img

  Found linux image: /boot/vmlinuz-0-rescue-25f834765d864b15b88bd778cf7d612b

  Found initrd image: /boot/initramfs-0-rescue-25f834765d864b15b88bd778cf7d612b.img

  [root@iap11 ~]# reboot

 

  [root@iap11 ~]# cat /proc/cmdline

  BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem

  [root@iap11 ~]#
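
  Once the node is back, the original failure can be re-tested directly; a minimal sketch (the cgroup name is a throwaway example, not something kubelet creates):

   # Creating a memory cgroup should succeed again with kmem accounting disabled
   mkdir /sys/fs/cgroup/memory/leak-check && echo "cgroup creation OK"
   rmdir /sys/fs/cgroup/memory/leak-check
   # The slab cache count should also be back to a few hundred entries
   ls /sys/kernel/slab | wc -l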

 
