Why is getpid implemented as a system call? Answers

【Question title】: Why is getpid implemented as a system call?
【Posted】: 2017-06-24 12:28:58
【Question description】:

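I'm taking my first operating systems class, so hopefully I don't have any big misconceptions here.

I was wondering why getpid() is implemented as a system call in Linux. As I understand it, certain functions are made into system calls because they access or change information that the operating system might want to protect, so control has to be transferred to the kernel.

But as far as I can tell, getpid() simply returns the process ID of the calling process. Is there any case where a process would not be allowed to see this information? Wouldn't it be safe to make getpid() an ordinary user-space function?

Thanks for the help.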

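【Question comments】:

It isn't about protection. It's simply that process internals are implemented in the kernel, not in a user-space library.

Tags: linux operating-system kernel system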

【Answer 1】:

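The only way to implement getpid() without a system call on every invocation is to make one system call first and cache its result. Every later call to getpid() then returns the cached value without entering the kernel.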
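
To make the caching idea concrete, here is a minimal sketch. The name cached_getpid is hypothetical and is not a glibc function; the sketch also ignores threads and fork(), so it is only an illustration of the idea, not how any library actually does it.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library: performs the real
 * system call once and serves later calls from a static variable.
 * Not thread-safe and not fork-aware; for illustration only. */
static pid_t cached_getpid(void)
{
    /* getpid() never returns 0, so 0 can mean "not cached yet". */
    static pid_t cached = 0;

    if (cached == 0)
        cached = getpid();   /* the only call that enters the kernel */
    return cached;
}
```

Note that after a fork() the child would inherit the parent's cached value, so this naive version would report the parent's PID in the child.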

However, the Linux man-pages project explains why getpid() is not cached:

   From glibc version 2.3.4 up to and including version 2.24, the glibc
   wrapper function for getpid() cached PIDs, with the goal of avoiding
   additional system calls when a process calls getpid() repeatedly.
   Normally this caching was invisible, but its correct operation relied
   on support in the wrapper functions for fork(2), vfork(2), and
   clone(2): if an application bypassed the glibc wrappers for these
   system calls by using syscall(2), then a call to getpid() in the
   child would return the wrong value (to be precise: it would return
   the PID of the parent process).  In addition, there were cases where
   getpid() could return the wrong value even when invoking clone(2) via
   the glibc wrapper function.  (For a discussion of one such case, see
   BUGS in clone(2).)  Furthermore, the complexity of the caching code
   had been the source of a few bugs within glibc over the years.

   Because of the aforementioned problems, since glibc version 2.25, the
   PID cache is removed: calls to getpid() always invoke the actual
   system call, rather than returning a cached value.
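
The requirement behind that note can be checked with a small sketch: a child must see its own PID, not its parent's. The function name child_sees_own_pid is made up for this example, and it uses the ordinary fork() wrapper rather than the raw syscall(2) path the man page describes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees a PID different from its parent's
 * and sees the parent through getppid(); 0 if not; -1 on error. */
static int child_sees_own_pid(void)
{
    pid_t parent = getpid();
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        /* In the child: a stale cached value would equal `parent`. */
        int ok = (getpid() != parent) && (getppid() == parent);
        _exit(ok ? 0 : 1);
    }

    /* In the parent: collect the child's verdict from its exit status. */
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Any cache therefore has to be invalidated whenever a new process is created, which is exactly the part that was fragile when the creation bypassed the glibc wrappers.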

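In short, if getpid() were cached it could return wrong values (even if the cache itself were implemented perfectly and no program were allowed to write to it), and the caching code has been a source of bugs in the past.

Most of the time you only need to call getpid() once in a process. If you use the result many times, save it in a variable (application-level caching!).

Cheers!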

【Comments】:

【Answer 2】:

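getpid() might just read from some location, but someone has to write to that location. To prevent any old process from writing garbage to a location that the operating system relies on, that location must be protected from user-mode access. For an application to reach it, the access has to happen in kernel mode. Therefore it has to be done as a system call.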

【Comments】:

【Answer 3】:

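I don't think exposing the PID to a process is a security concern. Process address space isolation is enforced by the operating system. If I remember correctly, the first call to getpid() is a system call, but later calls to getpid() are cached (probably by libc) and handled locally.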

【Comments】:

【Answer 4】:

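As the other answers explain, a process's PID is internal kernel data, and a user-space process has to access it through a system call; otherwise it would be at risk of malicious writes.

However, there is one wrong assumption that must be corrected:

"getpid() simply returns the process ID of the calling process."

In fact, PIDs are considerably more complicated than one might expect, for two reasons:

Namespaces. They are one of the key foundations of container technologies such as Docker.
Thread groups, process groups, and sessions.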

【Comments】: