The issue affects not only Docker, but sometimes the whole system.
Issue 1
Sometimes when accessing the remote server I am unable to execute commands on the terminal and receive the error "-bash: fork: retry: Resource temporarily unavailable" multiple times for simple commands such as cd, ls, and others. After some time the issue goes away, and later it comes back.
Issue 2
Another issue, which I think is part of the same underlying problem, is that sometimes I can't manage Docker properly: when I execute commands such as docker ps, docker images and others, I receive the error "runtime/cgo: pthread_create failed: Resource temporarily unavailable".
Issue 3
Sometimes when I try to build a Docker image, the build fails, either at the start or at some step during the build, also with the error "runtime/cgo: pthread_create failed: Resource temporarily unavailable".
The remote server runs on Ubuntu 18.04.6 LTS.
RAM Memory
Running free -m gives the RAM information in MB:
              total        used        free      shared  buff/cache   available
Mem:          32768        3333       24768         133        4665       29300
Swap:             0           0           0
CPU
Running lscpu gives the CPU information:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             1
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 1
Model name:            AMD EPYC 7453 28-Core Processor
Stepping:              1
CPU MHz:               1999.725
BogoMIPS:              5489.89
Virtualization:        AMD-V
Hypervisor vendor:     Parallels
Virtualization type:   container
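lscpu reports the virtualization type as container with Parallels as the hypervisor vendor, which suggests the server is itself a container whose host may enforce a process cap that ulimit does not show. The sketch below assumes the platform exposes /proc/user_beancounters, as OpenVZ/Virtuozzo containers do; that is an assumption about this server, and the file may not exist here. The helper name numproc_row is made up for illustration. It prints the numproc row: tasks currently held, the peak, the barrier, the limit, and the failure count.

```shell
#!/bin/sh
# Hypothetical helper: extract the numproc row from beancounter-style text on stdin.
# Columns after the resource name are: held maxheld barrier limit failcnt.
numproc_row() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "numproc") {
                printf "held=%s maxheld=%s barrier=%s limit=%s failcnt=%s\n",
                       $(i+1), $(i+2), $(i+3), $(i+4), $(i+5)
            }
    }'
}

# Assumption: this file only exists on OpenVZ/Virtuozzo-style containers
# and usually needs root to read.
if [ -r /proc/user_beancounters ]; then
    numproc_row < /proc/user_beancounters
else
    echo "/proc/user_beancounters not readable; this check does not apply"
fi
```

A held value close to the limit, or a non-zero failcnt, would be consistent with the fork and pthread_create errors, although it would not prove the cause on its own.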
Docker
Docker version: Docker version 24.0.2, build cb74dfc
Docker compose version: Docker Compose version v2.21.0
Limits
Running ulimit -a gives the following:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4124817
max locked memory       (kbytes, -l) 262144
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62987
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
The open files limit was increased from 1024 to 32768 and the max locked memory limit was increased from 65536 to 262144. The issue still persists after these increases.
I increased those numbers by editing the /etc/security/limits.conf file, which now contains:
x_project soft nofile 32768
x_project - memlock 262144
root soft nofile 32768
root - memlock 262144
I had to edit this file because limits set with the ulimit command did not persist.
The /etc/security/limits.d/ directory is empty.
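fork and pthread_create typically return "Resource temporarily unavailable" (EAGAIN) when a cap on the number of processes or threads is reached, which is a different kind of limit from nofile and memlock. As a rough sketch, the following counts the threads currently alive and prints that number next to the limits visible from inside the system. The function names are illustrative, and the count is only a snapshot that can change while the script runs.

```shell
#!/usr/bin/env bash
# count_threads [proc_root]: sum the "Threads:" field of every <pid>/status file.
count_threads() {
    local proc_root="${1:-/proc}"
    local total=0 n f
    for f in "$proc_root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read; treat that as 0.
        n=$(awk '/^Threads:/ {print $2}' "$f" 2>/dev/null) || n=0
        total=$(( total + ${n:-0} ))
    done
    echo "$total"
}

# read_limit <file>: print a kernel limit, or "n/a" if the file is missing.
read_limit() {
    cat "$1" 2>/dev/null || echo "n/a"
}

echo "threads alive now:   $(count_threads)"
echo "ulimit -u (nproc):   $(ulimit -u 2>/dev/null || echo n/a)"
echo "kernel.threads-max:  $(read_limit /proc/sys/kernel/threads-max)"
echo "kernel.pid_max:      $(read_limit /proc/sys/kernel/pid_max)"
```

Running it once while the system is healthy and once right after an error would show whether the thread count approaches any of these values. If it stays far below all of them, the cap is probably enforced somewhere this script cannot see.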
cAdvisor observation
A container running cAdvisor was set up on the remote server to monitor the containers' resource usage.
When Issue 1 was occurring on the server and suddenly stopped, I opened cAdvisor to see what had happened and checked the containers' memory usage:
Between 2:07:45 and 2:08:00 it is possible to see that the cyan line decreases. That was the moment at which Issue 1 stopped.
Current situation
There are 24 containers running on the server. They are built with docker compose. The docker compose files don't set any memory/CPU limits for the containers. They all seem to work fine, but I'm unable to start or build new containers.
To demonstrate this, these are the results of trying to run docker run hello-world:
root@h2877818:~# docker run hello-world
docker: Error response from daemon: failed to create task for container: failed to create shim task: ttrpc: closed: unknown.
ERRO[0001] error waiting for container:
root@h2877818:~# docker run hello-world
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/724798e971a7ba8d31d6f3f1b900d01152efee06a64adedad7c589afe9054bfd/log.json: no such file or directory): fork/exec /usr/bin/runc: resource temporarily unavailable: unknown.
ERRO[0000] error waiting for container:
root@h2877818:~# docker run hello-world
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/04d0e67f08bac10bed19c650868af6bfbc27757584bb096846ed1d8c8602bad8/log.json: no such file or directory): runc did not terminate successfully: exit status 2: unknown.
ERRO[0000] error waiting for container:
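Because fork/exec of runc fails with "resource temporarily unavailable" while 24 containers are running, it may be worth checking how many processes and threads each container holds. docker stats can print this through its PIDs column. The helper below (rank_by_pids, a name invented for this sketch) sorts "name pids" lines from largest to smallest and appends a total.

```shell
#!/usr/bin/env bash
# rank_by_pids: read "name pids" lines on stdin, print them sorted by pids
# (largest first) followed by a TOTAL line. Non-numeric pids values are skipped.
rank_by_pids() {
    sort -k2,2 -n -r | awk '
        $2 ~ /^[0-9]+$/ { print $1, $2; total += $2 }
        END             { print "TOTAL", total + 0 }'
}

# Only query Docker if the CLI is installed.
if command -v docker >/dev/null 2>&1; then
    docker stats --no-stream --format '{{.Name}} {{.PIDs}}' 2>/dev/null | rank_by_pids
fi
```

While the errors are occurring the docker CLI may itself fail to start, so this is best run when the system is responsive. A single container with an unusually high count would be a candidate for closer inspection.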
I don't know why this issue is happening. I think it might be related to some limit configuration, since the server has enough resources to run the current services, and I have also been monitoring resource usage with htop, where usage is always low.
Right now, as I'm typing this, Issue 3 is happening on the server. It's uncertain when it will stop and when Issues 1-3 will happen again.