2020.12.02
a. Problem: POD - cannot allocate memory
- Environments
Kubernetes 1.16.15, CentOS 7.8 / 7.9, Docker 19.03 / 20.10
- Leakage observed
CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 19.03 (iap10, iap11)
CentOS 7.9 / 3.10.0-1160.15.2.el7.x86_64 / Docker 20.10.3
- No leakage
CentOS 7.8 / 3.10.0-1127.el7.x86_64 / Docker 18.06 (iap04 ~ iap09)
[iap@iap01 ~]$ k describe pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph | egrep Events -A10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned rook-ceph/rook-ceph-osd-prepare-iap11-b69k7 to iap11
Warning FailedCreatePodContainer 3m24s (x255 over 58m) kubelet, iap11 unable to ensure pod container exists: failed to create container for [kubepods besteffort podbde5ed67-dd1e-4d41-ba41-cade1108e04c] : mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c: cannot allocate memory
[iap@iap01 ~]$ k get pod rook-ceph-osd-prepare-iap11-b69k7 -n rook-ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-prepare-iap11-b69k7 0/1 Init:0/1 0 128m <none> iap11 <none> <none>
[iap@iap01 ~]$
or
[root@gmd01 ~]# k describe nodes gmd01 | egrep Events -A 8
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 16m kubelet, gmd01 Starting kubelet.
Normal NodeAllocatableEnforced 16m kubelet, gmd01 Updated Node Allocatable limit across pods
Normal NodeHasNoDiskPressure 15m (x7 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 15m (x8 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 64s (x129 over 16m) kubelet, gmd01 Node gmd01 status is now: NodeHasSufficientMemory
[root@gmd01 ~]# journalctl -u kubelet -f
Mar 26 13:43:25 gmd01 kubelet[13204]: E0326 13:43:25.865256 13204 kubelet_node_status.go:94] Unable to register node "gmd01" with API server: Node "gmd01" is invalid: [status.capacity.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:67294654464, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"65717436Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.pods: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:2485125120, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2370Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:61483446272, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"60042428Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.nvidia.com/gpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:2, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"2", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]
…
b. Cause analysis: Kernel bug
If too many memory cgroups have leaked, new memory cgroups can no longer be created and the attempt fails with "Cannot allocate memory".
https://bugs.centos.org/view.php?id=17780
https://bugzilla.redhat.com/show_bug.cgi?id=1507149
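One way to quantify the leak is the num_cgroups column in /proc/cgroups. A minimal sketch (the four-column format subsys_name / hierarchy / num_cgroups / enabled is assumed; the file argument exists only so the helper can be tested against a sample file):

```shell
# Print the number of live memory cgroups. On an affected node this
# count keeps growing even as pods come and go, until new cgroups
# can no longer be created.
count_memory_cgroups() {
  awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}
```

Example: `count_memory_cgroups` on a healthy node stays in the low hundreds on a cluster of this size.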
[root@iap11 ~]# mkdir /sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c
mkdir: cannot create directory '/sys/fs/cgroup/memory/kubepods/besteffort/podbde5ed67-dd1e-4d41-ba41-cade1108e04c': Cannot allocate memory
[root@iap11 ~]# df -h | egrep "Filesystem|cgroup"
Filesystem Size Used Avail Use% Mounted on
tmpfs 32G 0 32G 0% /sys/fs/cgroup
[root@iap11 ~]# free -h
total used free shared buff/cache available
Mem: 62G 46G 6.1G 3.1G 10G 12G
Swap: 0B 0B 0B
[root@iap11 ~]#
[root@iap11 ~]# ls /sys/kernel/slab | wc -l
184990
[root@iap11 ~]# slabtop -s -c
Active / Total Objects (% used) : 18607885 / 30427076 (61.2%)
Active / Total Slabs (% used) : 598690 / 598690 (100.0%)
Active / Total Caches (% used) : 125 / 178 (70.2%)
Active / Total Size (% used) : 4896567.75K / 10496016.66K (46.7%)
Minimum / Average / Maximum Object : 0.01K / 0.34K / 15.25K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
6018034 6017202 99% 0.12K 88502 68 708016K kernfs_node_cache
3973200 387465 9% 0.25K 62082 64 993312K kmalloc-256
3863152 389465 10% 0.50K 61228 64 1959296K kmalloc-512
2233182 392360 17% 0.19K 53172 42 425376K kmalloc-192
…
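The slabtop figures can also be read from /proc/slabinfo directly, which is easier to script. A hedged sketch (assumes the standard slabinfo column order name / active_objs / num_objs / objsize; reading /proc/slabinfo requires root, and the file argument is only for testing):

```shell
# Approximate memory held by one slab cache, in KiB (num_objs * objsize).
# Usage: slab_cache_kb kernfs_node_cache
slab_cache_kb() {
  awk -v cache="$1" '$1 == cache { printf "%d\n", $3 * $4 / 1024 }' "${2:-/proc/slabinfo}"
}
```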
[iap@iap01 ~]$ sudo ssh root@iap04 ls /sys/kernel/slab | wc -l
349
[iap@iap01 ~]$ sudo ssh root@iap05 ls /sys/kernel/slab | wc -l
344
[iap@iap01 ~]$ sudo ssh root@iap06 ls /sys/kernel/slab | wc -l
337
[iap@iap01 ~]$ sudo ssh root@iap07 ls /sys/kernel/slab | wc -l
328
[iap@iap01 ~]$ sudo ssh root@iap08 ls /sys/kernel/slab | wc -l
343
[iap@iap01 ~]$ sudo ssh root@iap09 ls /sys/kernel/slab | wc -l
328
[iap@iap01 ~]$ sudo ssh root@iap10 ls /sys/kernel/slab | wc -l
129786
[iap@iap01 ~]$ sudo ssh root@iap11 ls /sys/kernel/slab | wc -l
184990
[iap@iap01 ~]$
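The per-node checks above can be folded into one loop (a sketch; node names and root SSH access as in the session above):

```shell
# Print "<node> <slab-dir-count>" for each node given on the command line.
# Counts in the low hundreds are normal; six-digit counts indicate the leak.
slab_counts() {
  for node in "$@"; do
    printf '%s %s\n' "$node" "$(ssh "root@$node" ls /sys/kernel/slab | wc -l)"
  done
}
# Example: slab_counts iap04 iap05 iap06 iap07 iap08 iap09 iap10 iap11
```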
c. Solution: set "cgroup.memory=nokmem" and reboot
- Try disabling kernel memory accounting:
according to https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt,
passing cgroup.memory=nokmem to the kernel at boot time should achieve this.
[root@iap11 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
[root@iap11 ~]# vi /etc/default/grub
GRUB_TIMEOUT=5
…
GRUB_CMDLINE_LINUX="crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem"
GRUB_DISABLE_RECOVERY="true"
[root@iap11 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1127.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1127.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-25f834765d864b15b88bd778cf7d612b
Found initrd image: /boot/initramfs-0-rescue-25f834765d864b15b88bd778cf7d612b.img
[root@iap11 ~]# reboot
[root@iap11 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 cgroup.memory=nokmem
[root@iap11 ~]#
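To script the same post-reboot verification across many nodes, a small check helper (sketch; it reads /proc/cmdline by default, the file argument is only for testing):

```shell
# Succeed (exit 0) if the running kernel was booted with cgroup.memory=nokmem.
nokmem_enabled() {
  grep -qw 'cgroup.memory=nokmem' "${1:-/proc/cmdline}"
}
# Example: nokmem_enabled || echo "nokmem missing - node still needs the fix"
```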