2020.12.02
a. Problem: POD - cannot allocate memory
- Environments
Kubernetes 1.16.15, CentOS 7.8 / 7.9, Docker 19.03 / 20.10
- Leakage observed
CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 19.03 (iap10, iap11)
CentOS 7.9 / 3.10.0-1160.15.2.el7.x86_64 / Docker 20.10.3
- No leakage
CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 18.06 (iap04 ~ iap09)
[iap@iap01 ~]$ k describe pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph | egrep Events -A10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned rook-ceph/rook-ceph-osd-prepare-iap11-b69k7 to iap11
Warning FailedCreatePodContainer 3m24s (x255 over 58m) kubelet, iap11 unable to ensure pod container exists: failed to create container for [kubepods besteffort podbde5ed67-dd1e-4d41-ba41-cade1108e04c] : mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c: cannot allocate memory
[iap@iap01 ~]$ k get pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-prepare-iap11-b69k7 0/1 Init:0/1 0 128m <none> iap11 <none> <none>
[iap@iap01 ~]$
or
[root@gmd01 ~]# k describe nodes gmd01 | egrep Events -A 8
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 16m kubelet, gmd01 Starting kubelet.
Normal NodeAllocatableEnforced 16m kubelet, gmd01 Updated Node Allocatable limit across pods
Normal NodeHasNoDiskPressure 15m (x7 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 15m (x8 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 64s (x129 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasSufficientMemory
[root@gmd01 ~]# journalctl -u kubelet -f
Mar 26 13:43:25 gmd01 kubelet[13204]: E0326 13:43:25.865256 13204 kubelet_node_status.go:94] Unable to register node "gmd01" with API server: Node "gmd01" is invalid: [status.capacity.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:67294654464, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"65717436Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.pods: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:61483446272, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"60042428Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]
…
b. Cause analysis: Kernel bug
If too many memory cgroups have leaked, new memory cgroups can no longer be created and the attempt fails with "Cannot allocate memory".
https://bugs.centos.org/view.php?id=17780
https://bugzilla.redhat.com/show_bug.cgi?id=1507149
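One way to quantify the leak is the num_cgroups column in /proc/cgroups. A minimal sketch (the four-column format subsys_name / hierarchy / num_cgroups / enabled is assumed; the file argument exists only so the helper can be tested against a sample file):

```shell
# Print the number of live memory cgroups. On an affected node this
# count keeps growing even as pods come and go, until new cgroups
# can no longer be created.
count_memory_cgroups() {
  awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}
```

Example: `count_memory_cgroups` on a healthy node stays in the low hundreds on a cluster of this size.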
[root@iap11 ~]# mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c
mkdir: cannot create directory '/sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c': Cannot allocate memory
[root@iap11 ~]# df -h | egrep "Filesystem|cgroup"
Filesystem Size Used Avail Use% Mounted on
tmpfs 32G 0 32G 0% /sys/fs/cgroup
[root@iap11 ~]# free -h
total used free shared buff/cache available
Mem: 62G 46G 6.1G 3.1G 10G 12G
Swap: 0B 0B 0B
[root@iap11 ~]#
[root@iap11 ~]# ls /sys/kernel/slab | wc -l
184990
[root@iap11 ~]# slabtop -s -c
Active / Total Objects (% used) : 18607885 / 30427076 (61.2%)
Active / Total Slabs (% used) : 598690 / 598690 (100.0%)
Active / Total Caches (% used) : 125 / 178 (70.2%)
Active / Total Size (% used) : 4896567.75K / 10496016.66K (46.7%)
Minimum / Average / Maximum Object : 0.01K / 0.34K / 15.25K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
6018034 6017202 99% 0.12K 88502 68 708016K kernfs_node_cache
3973200 387465 9% 0.25K 62082 64 993312K kmalloc-256
3863152 389465 10% 0.50K 61228 64 1959296K kmalloc-512
2233182 392360 17% 0.19K 53172 42 425376K kmalloc-192
…
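The slabtop figures can also be read from /proc/slabinfo directly, which is easier to script. A hedged sketch (assumes the standard slabinfo column order name / active_objs / num_objs / objsize; reading /proc/slabinfo requires root, and the file argument is only for testing):

```shell
# Approximate memory held by one slab cache, in KiB (num_objs * objsize).
# Usage: slab_cache_kb kernfs_node_cache
slab_cache_kb() {
  awk -v cache="$1" '$1 == cache { printf "%d\n", $3 * $4 / 1024 }' "${2:-/proc/slabinfo}"
}
```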
[iap@iap01 ~]$ sudo ssh root@iap04 ls /sys/kernel/slab | wc -l
349
[iap@iap01 ~]$ sudo ssh root@iap05 ls /sys/kernel/slab | wc -l
344
[iap@iap01 ~]$ sudo ssh root@iap06 ls /sys/kernel/slab | wc -l
337
[iap@iap01 ~]$ sudo ssh root@iap07 ls /sys/kernel/slab | wc -l
328
[iap@iap01 ~]$ sudo ssh root@iap08 ls /sys/kernel/slab | wc -l
343
[iap@iap01 ~]$ sudo ssh root@iap09 ls /sys/kernel/slab | wc -l
328
[iap@iap01 ~]$ sudo ssh root@iap10 ls /sys/kernel/slab | wc -l
129786
[iap@iap01 ~]$ sudo ssh root@iap11 ls /sys/kernel/slab | wc -l
184990
[iap@iap01 ~]$
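The per-node checks above can be folded into one loop (a sketch; node names and root SSH access as in the session above):

```shell
# Print "<node> <slab-dir-count>" for each node given on the command line.
# Counts in the low hundreds are normal; six-digit counts indicate the leak.
slab_counts() {
  for node in "$@"; do
    printf '%s %s\n' "$node" "$(ssh "root@$node" ls /sys/kernel/slab | wc -l)"
  done
}
# Example: slab_counts iap04 iap05 iap06 iap07 iap08 iap09 iap10 iap11
```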
c. Solution: set "cgroup.memory=nokmem" and reboot
- Try disabling kernel memory accounting:
according to https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt,
passing cgroup.memory=nokmem to the kernel at boot time should achieve this.
[root@iap11 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
[root@iap11 ~]# vi /etc/default/grub
GRUB_TIMEOUT=5
…
GRUB_CMDLINE_LINUX="crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem"
GRUB_DISABLE_RECOVERY="true"
[root@iap11 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1127.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1127.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-25f834765d864b15b88bd778cf7d612b
Found initrd image: /boot/initramfs-0-rescue-25f834765d864b15b88bd778cf7d612b.img
[root@iap11 ~]# reboot
[root@iap11 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem
[root@iap11 ~]#
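To script the same post-reboot verification across many nodes, a small check helper (sketch; it reads /proc/cmdline by default, the file argument is only for testing):

```shell
# Succeed (exit 0) if the running kernel was booted with cgroup.memory=nokmem.
nokmem_enabled() {
  grep -qw 'cgroup.memory=nokmem' "${1:-/proc/cmdline}"
}
# Example: nokmem_enabled || echo "nokmem missing - node still needs the fix"
```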