Limit CPU time of process group: answers

【Question title】: Limit CPU time of process group
【Posted】: 2017-08-07 20:27:47
【Question】:

Is there a way to limit the absolute CPU time (in CPU seconds) spent in a process group?

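ulimit -t 10; ./my-process looks like a good option, but if my-process forks, every process in the process group gets its own limit. By forking every 9 seconds, the process group as a whole can use an arbitrary amount of CPU time.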
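The inheritance behaviour described above can be observed with a short sketch. It assumes a POSIX shell where `ulimit -t` without `-S` or `-H` sets both the soft and the hard limit. The limit value is inherited by a child, but the kernel counts CPU time against it per process, so every new child starts with a fresh budget.

```sh
#!/bin/sh
# Sketch: the CPU limit set with `ulimit -t` is inherited by children,
# but each process is charged only for its own CPU time.
child_limit=$(
    ulimit -t 10          # 10 CPU seconds, applies to this subshell only
    sh -c 'ulimit -t'     # a freshly started child reports the inherited limit
)
echo "limit seen by child: ${child_limit}s"
```

The child reports the same 10-second limit while its own CPU time counter starts at zero, which is why a process that keeps forking is never stopped by the limit.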

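The accepted answer to a similar question says to use cgroups but does not explain how. However, another answer (Limit total CPU usage with cgroups) says that this is not possible with cgroups and that only relative CPU usage can be limited (for example, 0.2 seconds out of every 1 second).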

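Liran Funaro suggested using a long period for cpu.cfs_period_us (https://stackoverflow.com/a/43660834/892961), but the quota parameter can be at most 1 second. So even with a long period, I don't see how to set a CPU time limit of 10 seconds or an hour.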

If neither ulimit nor cgroups can do this, is there another way?

【Question discussion】:

Tags: linux time limit ulimit cgroups

【Solution 1】:

You can do this with cgroups. As root, run:

# Create cgroup
cgcreate -g cpu:/limited

# set shares (cpu limit)
cgset -r cpu.shares=256 limited

# run your program
cgexec -g cpu:limited /my/hungry/program

You can also use the cpulimit program, which periodically freezes your process. cgroups are the most advanced approach.

To set a fixed CPU share:

cgcreate -g cpu:/fixedlimit
# allow a fixed 25% of one CPU
cgset -r cpu.cfs_quota_us=25000,cpu.cfs_period_us=100000 fixedlimit
cgexec -g cpu:fixedlimit /my/hungry/program
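The quota and period values above encode a rate: the group may run for cpu.cfs_quota_us microseconds in every window of cpu.cfs_period_us microseconds. The following sketch shows the arithmetic. The helper name `cfs_quota_for` is hypothetical and not part of any cgroup tool.

```sh
#!/bin/sh
# cfs_quota_for PERCENT [PERIOD_US]
# Hypothetical helper: prints the cpu.cfs_quota_us value that grants
# PERCENT of a single CPU for the given period (default 100000 us).
cfs_quota_for() {
    percent=$1
    period_us=${2:-100000}
    echo $(( period_us * percent / 100 ))
}

cfs_quota_for 25            # prints 25000, the value used above
cfs_quota_for 150 100000    # prints 150000, i.e. one and a half CPUs
```

Because the quota is refilled at the start of every period, this caps how fast CPU time is consumed but not the total amount consumed over the lifetime of the group.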

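As it turns out, the goal is to cap the run time at a given number of seconds while measuring it. After setting the desired cgroup limits (to get a fair sandbox), you can achieve this by running: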

( ( time -p timeout 20 cgexec -g cpu:fixedlimit /program/to/test ) 2>&1 ) | grep user

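After 20 seconds of wall-clock time the program is stopped no matter what, and we can parse the user time (or the system or real time) to evaluate its performance.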
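Since CPU time is the sum of user and system time, grepping for the user line alone undercounts it. A minimal sketch of parsing both lines from `time -p` output follows. The function name `cpu_seconds` is hypothetical, and the sample input stands in for the output of a real run.

```sh
#!/bin/sh
# cpu_seconds: reads `time -p` output on stdin and prints the sum of the
# "user" and "sys" lines, i.e. the CPU seconds consumed.
cpu_seconds() {
    LC_ALL=C awk '$1 == "user" || $1 == "sys" { total += $2 }
                  END { printf "%.2f\n", total }'
}

# Sample `time -p` output (made-up numbers, for illustration only).
printf 'real 20.01\nuser 3.50\nsys 0.25\n' | cpu_seconds    # prints 3.75
```

In the pipeline above, `grep user` could be replaced by `cpu_seconds`, provided the tested program writes nothing to stderr that starts with the words user or sys.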

【Discussion】:

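According to the documentation, shares are a relative measure. So if I want to limit my program to 10 CPU seconds, setting cpu.shares=10 will not kill it after 10 seconds. It only ensures that my program gets 10% of the CPU time that a program with cpu.shares=100 gets. Am I reading this wrong?
Sure. I was aiming at the "how" part :) Of course, you can use the cpu.cfs_quota_us + cpu.cfs_period_us pair if that suits your needs better.
If that is not what you need, could you clarify your main goal a bit? What are you trying to achieve?
I want an absolute CPU time limit like "run this program for 10 seconds" rather than a relative one like "give this program 50% of the available CPU time". The underlying goal is evaluating a programming competition: "Your program has 10 seconds on the CPU to solve this problem. Can you finish in time?" I could run every program for longer (e.g. 20 seconds of wall-clock time) and measure or check afterwards, but that seems very indirect and causes problems when a process is interrupted for more than 10 seconds while waiting for a busy hard disk.
The point of the time limit is to stop tasks that run for too long. time only measures after the program has terminated. If we don't stop them at some point, programs could literally run for CPU years in this competition. Or a bug could send a program into an infinite loop so that it never stops. As for fairness: whether the competition rules are good as they are is not really part of this question, but I think they are fair enough. Everything runs on the same hardware, and we only compare the results with each other, not with results from experiments on other hardware.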

【Solution 2】:

This does not answer the question directly, but refers to the discussion of what the OP actually needs.

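If your competition ignores everything except CPU time, it may be fundamentally flawed. For example, one could simply cache results on the primary storage device. Since you don't count storage access time, such a program may use the fewest CPU cycles yet perform worse in practice. The perfect crime would be to simply send the data over the internet to another computer, which computes the task and returns the answer. That would complete the task in seemingly zero cycles. What you actually want is to measure "real" time and give this process the highest priority on your system (or run it as the only process).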

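When checking students' homework, we simply used an unrealistically generous time limit (e.g. 5 minutes for a program that should take 10 seconds), and killed the process and failed the submission if it did not finish in time.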
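The generous wall-clock approach can be sketched with GNU coreutils `timeout`, which exits with status 124 when it had to stop the command. The function name `judge` and the verdict strings are hypothetical. The sketch also assumes that the verdict may share stdout with the tested program.

```sh
#!/bin/sh
# judge WALL_LIMIT_S COMMAND [ARG ...]
# Hypothetical grading helper: runs COMMAND under a wall-clock limit and
# prints a verdict. Relies on GNU timeout returning 124 on a timeout.
judge() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "FAIL: wall-clock limit of ${limit}s exceeded"
    elif [ "$status" -ne 0 ]; then
        echo "FAIL: exit status $status"
    else
        echo "PASS"
    fi
}

judge 1 sleep 3    # prints: FAIL: wall-clock limit of 1s exceeded
judge 5 true       # prints: PASS
```

For the homework setting described above, the call would look like `judge 300 ./submission`, with 300 seconds standing in for the 5-minute limit.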

If you want to pick a winner, simply rerun the best competitors several times to make sure their results are valid.

【Discussion】:

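Thanks for the reply. The competition is over now, so this is just in case someone finds this with a similar problem. We limited the number and size of files a program may write, and prohibited internet access. We also had a higher limit on wall-clock time (see my answer) and looked into the cases where a run hit the wall-clock limit rather than the CPU time limit. The limits had to be tighter in our case, though: running all competitors took about a week on a 400-CPU cluster, so we could not afford a 30-fold limit (5 minutes instead of 10 seconds) or rerunning multiple competitors.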

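【Solution 3】:

I found a solution that works for me. It is still far from perfect (read the caveats before using it). I am somewhat new to bash scripting, so any comments on this are welcome.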

#!/bin/bash
#
# This script tries to limit the CPU time of a process group similar to
# ulimit but counting the time spent in spawned processes against the
# limit. It works by creating a temporary cgroup to run the process in
# and checking on the used CPU time of that process group. Instead of
# polling in regular intervals, the monitoring process assumes that no
# time is lost to I/O (i.e., wall clock time = CPU time) and checks in
# after the time limit. It then updates its assumption by comparing the
# actual CPU usage to the time limit and waiting again. This is repeated
# until the CPU usage exceeds its limit or the monitored process
# terminates. Once the main process terminates, all remaining processes
# in the temporary cgroup are killed.
#
# NOTE: this script still has some major limitations.
# 1) The monitored process can exceed the limit by up to one second
#    since every iteration of the monitoring process takes at least that
#    long. It can exceed the limit by an additional second by ignoring
#    the SIGXCPU signal sent when hitting the (soft) limit but this is
#    configurable below.
# 2) It assumes there is only one CPU core. On a system with n cores
#    waiting for t seconds gives the process n*t seconds on the CPU.
#    This could be fixed by figuring out how many CPUs the process is
#    allowed to use (using the cpuset cgroup) and dividing the remaining
#    time by that. Since sleep has a resolution of 1 second, this would
#    still introduce an error of up to n seconds.


set -e

if [ "$#" -lt 2 ]; then
    echo "Usage: $(basename "$0") TIME_LIMIT_IN_S COMMAND [ ARG ... ]"
    exit 1
fi
TIME_LIMIT=$1
shift

# To simulate a hard time limit, set KILL_WAIT to 0. If KILL_WAIT is
# non-zero, TIME_LIMIT is the soft limit and TIME_LIMIT + KILL_WAIT is
# the hard limit.
KILL_WAIT=1

# Update as necessary. The script needs permissions to create cgroups
# in the cpuacct hierarchy in a subgroup "timelimit". To create it use:
#   sudo cgcreate -a $USER -t $USER -g cpuacct:timelimit
CGROUPS_ROOT=/sys/fs/cgroup
LOCAL_CPUACCT_GROUP=timelimit/timelimited_$$
LOCAL_CGROUP_TASKS=$CGROUPS_ROOT/cpuacct/$LOCAL_CPUACCT_GROUP/tasks

kill_monitored_cgroup() {
    SIGNAL=$1
    kill -$SIGNAL $(cat $LOCAL_CGROUP_TASKS) 2> /dev/null
}

get_cpu_usage() {
    cgget -nv -r cpuacct.usage $LOCAL_CPUACCT_GROUP
}

# Create a cgroup to measure the CPU time of the monitored process.
cgcreate -a $USER -t $USER -g cpuacct:$LOCAL_CPUACCT_GROUP


# Start the monitored process. In case it fails, we still have to clean
# up, so we disable exiting on errors.
set +e
(
    set -e
    # In case the process doesn't fork a ulimit is more exact. If the
    # process forks, the ulimit still applies to each child process.
    ulimit -t $(($TIME_LIMIT + $KILL_WAIT))
    ulimit -S -t $TIME_LIMIT
    cgexec -g cpuacct:$LOCAL_CPUACCT_GROUP --sticky "$@"
)&
MONITORED_PID=$!

# Start the monitoring process
(
    REMAINING_TIME=$TIME_LIMIT
    while [ "$REMAINING_TIME" -gt "0" ]; do
        # Wait $REMAINING_TIME seconds for the monitored process to
        # terminate. On a single CPU the CPU time cannot exceed the
        # wall clock time. It might be less, though. In that case, we
        # will go through the loop again.
        sleep $REMAINING_TIME
        CPU_USAGE=$(get_cpu_usage)
        REMAINING_TIME=$(($TIME_LIMIT - $CPU_USAGE / 1000000000))
    done

    # Time limit exceeded. Kill the monitored cgroup.
    if  [ "$KILL_WAIT" -gt "0" ]; then
        kill_monitored_cgroup XCPU
        sleep $KILL_WAIT
    fi
    kill_monitored_cgroup KILL
)&
MONITOR_PID=$!

# Wait for the monitored job to exit (either on its own or because it
# was killed by the monitor).
wait $MONITORED_PID
EXIT_CODE=$?

# Kill all remaining tasks in the monitored cgroup and the monitor.
kill_monitored_cgroup KILL
kill -KILL $MONITOR_PID 2> /dev/null
wait $MONITOR_PID 2>/dev/null

# Report actual CPU usage.
set -e
CPU_USAGE=$(get_cpu_usage)
echo "Total CPU usage: $(($CPU_USAGE / 1000000))ms"

# Clean up and exit with the return code of the monitored process.
cgdelete cpuacct:$LOCAL_CPUACCT_GROUP
exit $EXIT_CODE
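Limitation 2 in the script's header suggests dividing the remaining time by the number of cores. A minimal sketch of that calculation follows, under the assumption that the core count is passed in explicitly (for example from `nproc`, which may overestimate if a cpuset restricts the group). The function name `next_sleep_s` is hypothetical and not part of the script above.

```sh
#!/bin/sh
# next_sleep_s LIMIT_S USAGE_NS NCORES
# Hypothetical helper: prints how many whole seconds the monitor can sleep
# before the cgroup could reach LIMIT_S CPU seconds if all NCORES cores
# were busy. USAGE_NS is the cpuacct.usage value (nanoseconds).
# Prints 0 once the limit is reached.
next_sleep_s() {
    limit_s=$1
    usage_ns=$2
    ncores=$3
    remaining_ns=$(( limit_s * 1000000000 - usage_ns ))
    if [ "$remaining_ns" -le 0 ]; then
        echo 0
        return
    fi
    sleep_s=$(( remaining_ns / (ncores * 1000000000) ))
    # Sleep at least one second so the loop keeps its 1 s resolution;
    # this is where the error of up to NCORES seconds comes from.
    if [ "$sleep_s" -lt 1 ]; then
        sleep_s=1
    fi
    echo "$sleep_s"
}

next_sleep_s 10 0 4             # prints 2
next_sleep_s 10 9500000000 4    # prints 1
next_sleep_s 10 10000000000 4   # prints 0
```

In the monitoring loop, the result would replace REMAINING_TIME as the argument to sleep, with the loop ending when it prints 0.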

【Discussion】: