Problem: we want to make sure that a given process cannot consume more than X% CPU.
Some sources recommend the nice command/syscall for this, but that seems inappropriate: nice only changes the process's scheduling priority, which does not guarantee that the process stays below some threshold of CPU time. There is also the cpulimit utility by Angelo Marletta, though it now appears to be unmaintained. From a brief glance at the code, cpulimit seems to work by letting the process run for some interval, pausing it, waiting, and then resuming it. It does this through the OS signal mechanism, sending SIGSTOP and SIGCONT signals to the process. According to the manpage, this can occasionally cause weirdness.
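To make the stop/resume idea concrete, here is a rough sketch of that kind of duty cycle in plain shell. It is only my illustration of the mechanism, not cpulimit's actual code; throttle_sketch and its arguments are names I made up.

```shell
# Rough sketch of duty-cycle throttling with SIGSTOP/SIGCONT.
# throttle_sketch is a hypothetical helper, not part of cpulimit.
# Usage: throttle_sketch PID RUN_SECONDS PAUSE_SECONDS CYCLES
throttle_sketch() {
    pid=$1
    run=$2
    pause=$3
    cycles=$4
    i=0
    while [ "$i" -lt "$cycles" ]; do
        # Give up if the target no longer exists.
        kill -0 "$pid" 2>/dev/null || return 1
        sleep "$run"                               # let the process run
        kill -s STOP "$pid" 2>/dev/null || return 1
        sleep "$pause"                             # keep it paused
        kill -s CONT "$pid" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Example: run 1 second, pause 9 seconds, for 5 cycles (roughly a 10% duty cycle).
# throttle_sketch 12345 1 9 5
```

The process is either fully running or fully stopped at any moment, so the limit only holds on average over a cycle, which is probably part of why the approach can behave oddly.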
It seems like the best choice in 2023 is to use the Linux kernel's control group (cgroup) feature. It is exposed as a pseudo-filesystem that allows us to create a hierarchy of process groups and attach resource controllers to them so as to control access to system resources like CPU, memory, I/O, etc. We can manage cgroups ourselves using standard tools (everything is a file!), but there's also a systemd way to consider. Let's first consider the DIY method.
cgroup comes in two flavors: version 1 and version 2. I'll only deal with version 2. By default, the pseudo-fs is mounted at /sys/fs/cgroup on most systems. It can be mounted manually with mount -t cgroup2 none MOUNTPOINT.
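If you're not sure where (or whether) cgroup v2 is mounted, one way to find out is to look for the cgroup2 filesystem type in the mount table. A minimal sketch, where cgroup2_mountpoints is a hypothetical helper and the optional file argument exists only so it can be tried against a sample file:

```shell
# Print the mount point of every cgroup v2 filesystem.
# cgroup2_mountpoints is a hypothetical helper name.
# Usage: cgroup2_mountpoints [MOUNTS_FILE]   (defaults to /proc/mounts)
cgroup2_mountpoints() {
    mounts_file=${1:-/proc/mounts}
    # Fields per line: device mountpoint fstype options dump pass
    awk '$3 == "cgroup2" { print $2 }' "$mounts_file"
}

# Example:
# cgroup2_mountpoints
```

If this prints nothing, the system is probably on version 1 only, or nothing has mounted the pseudo-fs yet.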
You can see by running ls there that there's a bunch of automatically generated files like cgroup.controllers, cgroup.subtree_control, and so on. These files tell us useful information about cgroups and also let us configure them.
To see what resource controllers are available to a cgroup, try cat /sys/fs/cgroup/cgroup.controllers: you should get something like cpuset cpu io memory hugetlb pids rdma misc.
Let's make a cgroup: mkdir /sys/fs/cgroup/newgroup. If you try ls newgroup, you'll see that, like above, it has been filled automatically with various files.
Let's enable the cpu resource controller on the newgroup cgroup. First, let's check whether the cpu resource controller is already available to newgroup. The result of cat newgroup/cgroup.controllers should contain cpu somewhere in a space-separated list of controller names.
If it is present, the controller is already enabled for newgroup and there is nothing more to do.
If it isn't present, then check the parent cgroup (one directory up, here /sys/fs/cgroup) and ensure that cpu is listed in its cgroup.controllers. Enable access for its children by running echo "+cpu" > cgroup.subtree_control in the parent's directory. Note that the write goes to the parent's cgroup.subtree_control, not newgroup's: a cgroup's controllers are switched on from one level up.
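In cgroup v2, a controller shows up in a child only once it has been enabled in the parent's cgroup.subtree_control. Here is a small sketch that combines the check and the write; enable_controller is a hypothetical helper of mine, not a standard tool.

```shell
# Enable controller NAME for the children of PARENT_DIR, but only if
# PARENT_DIR itself has that controller available.
# enable_controller is a hypothetical helper name.
# Usage: enable_controller PARENT_DIR NAME
enable_controller() {
    parent=$1
    name=$2
    # cgroup.controllers is one space-separated line of controller names.
    for c in $(cat "$parent/cgroup.controllers"); do
        if [ "$c" = "$name" ]; then
            echo "+$name" > "$parent/cgroup.subtree_control"
            return $?
        fi
    done
    echo "controller '$name' is not available in $parent" >&2
    return 1
}

# Example (as root):
# enable_controller /sys/fs/cgroup cpu
```

After a successful call, cat newgroup/cgroup.controllers should list the controller.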
Once enabled, we should see various files like cpu.idle, cpu.max, cpu.max.burst, and more inside newgroup.
Processes are added to a cgroup simply by writing their pids to cgroup.procs.
Let's add our shell to newgroup: from inside the newgroup directory, run echo $$ > cgroup.procs.
Now if you cat cgroup.procs, you'll see two entries: one pid for the shell and another for the cat process, which the shell started and which therefore inherits its cgroup.
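The same two steps, moving a process in and checking that it landed, can be wrapped up as below. Both function names are hypothetical; the cgroup directory is passed in explicitly so nothing depends on the current directory.

```shell
# Move process PID into the cgroup at CGROUP_DIR.
# add_to_cgroup and in_cgroup are hypothetical helper names.
# Usage: add_to_cgroup CGROUP_DIR PID
add_to_cgroup() {
    # The kernel expects one pid per write.
    echo "$2" > "$1/cgroup.procs"
}

# Succeed if PID is listed in CGROUP_DIR/cgroup.procs.
# Usage: in_cgroup CGROUP_DIR PID
in_cgroup() {
    # -x matches whole lines only, so 42 does not match 4242.
    grep -qx "$2" "$1/cgroup.procs"
}

# Example (as root), moving the current shell:
# add_to_cgroup /sys/fs/cgroup/newgroup $$
# in_cgroup /sys/fs/cgroup/newgroup $$ && echo "shell is in newgroup"
```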
Finally, let's limit the CPU bandwidth. This is done by specifying a quota: how long, in microseconds, all processes in the cgroup combined may run during one period. The period, also in microseconds, is configurable too. First let's see what settings are there by default: cat cpu.max should give something like max 100000, which means there is no quota: processes in this cgroup can run for the entirety of each 100000-microsecond period.
Let's limit processes to 10% of this 100000-microsecond (100 ms) period. Execute echo "10000 100000" > cpu.max.
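The two numbers in cpu.max are just quota and period, so a percentage limit is simple arithmetic: quota = period * percent / 100. A tiny sketch, with cpu_max_for_percent being a name I made up and 100000 assumed as the default period:

```shell
# Print a "QUOTA PERIOD" pair suitable for writing to cpu.max.
# cpu_max_for_percent is a hypothetical helper name.
# Usage: cpu_max_for_percent PERCENT [PERIOD_US]
cpu_max_for_percent() {
    percent=$1
    period=${2:-100000}          # microseconds
    echo "$((period * percent / 100)) $period"
}

# Example (as root, from inside the cgroup's directory):
# cpu_max_for_percent 10 > cpu.max     # writes "10000 100000"
```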
We can then test it out by running stress-ng --cpu-load 100 --cpu 1 in the shell that's inside the cgroup, and checking htop in another terminal, where the stress-ng worker should sit at roughly 10% CPU.
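Besides eyeballing htop, the cgroup keeps its own counters: with the cpu controller enabled, cpu.stat includes nr_periods, nr_throttled, and throttled_usec. Here is a sketch that pulls out just those fields; throttle_summary is a hypothetical helper name.

```shell
# Print the throttling counters from a cgroup's cpu.stat as key=value lines.
# throttle_summary is a hypothetical helper name.
# Usage: throttle_summary CGROUP_DIR
throttle_summary() {
    awk '$1 == "nr_periods" || $1 == "nr_throttled" || $1 == "throttled_usec" {
        print $1 "=" $2
    }' "$1/cpu.stat"
}

# Example:
# throttle_summary /sys/fs/cgroup/newgroup
```

If the limit is biting, nr_throttled should keep growing while the stress test runs.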
When we're done, we can delete a cgroup using rmdir. Note that only leaves can be deleted: if the cgroup contains children, then all children must first be removed. Furthermore, the cgroup must not have any processes in it, so you have to move any remaining processes out first by writing their pids to the parent's cgroup.procs file.
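The move-everything-to-the-parent step could look roughly like this. evacuate_cgroup is a hypothetical helper; it assumes the parent is simply the directory one level up and ignores pids that vanish along the way.

```shell
# Move every process in CGROUP_DIR up to its parent cgroup.
# evacuate_cgroup is a hypothetical helper name.
# Usage: evacuate_cgroup CGROUP_DIR
evacuate_cgroup() {
    cg=$1
    parent=$(dirname "$cg")
    # Read the pid list once, then move each pid with its own write.
    for pid in $(cat "$cg/cgroup.procs"); do
        # A process may have exited in the meantime; ignore that failure.
        echo "$pid" > "$parent/cgroup.procs" 2>/dev/null || true
    done
}

# Example (as root):
# evacuate_cgroup /sys/fs/cgroup/newgroup && rmdir /sys/fs/cgroup/newgroup
```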
The cgroup system is a powerful Linux kernel feature that forms one of the building blocks of containers (along with namespaces and union filesystems). We can do a lot more with it than just limiting CPU!