在最近的 Linux 内核上测试 msync 使用的好方法是什么？答案

【问题标题】：What is a good way to test the use of msync on recent Linux kernels?在最近的 Linux 内核上测试 msync 使用的好方法是什么？
【发布时间】：2011-07-09 02:42:01
【问题描述】：

我在 Linux 2.6 上的应用程序中使用 msync 以确保发生崩溃时的一致性。我需要彻底测试我对 msync 的使用，但实现似乎正在为我刷新所有相关页面。有没有办法防止将 mmap 页面自动刷新到磁盘上，从而暴露我对 msync 的错误使用？

【问题讨论】：

标签： c linux kernel mmap

【解决方案1】：

向@samold 道歉，“swappiness”与此无关。 Swappiness 只会影响内核在内存不足时如何权衡交换脏匿名页面和驱逐页面缓存页面。

您需要使用Linux VM tunables controlling the pdflush task。对于初学者，我建议：

sysctl -w vm.dirty_writeback_centisecs=360000

默认情况下，vm.dirty_writeback_centisecs 是 3000，这意味着内核会将任何超过 30 秒的脏页视为“太旧”并尝试将其刷新到磁盘。通过将其延长至 1 小时，您应该能够完全避免将脏页刷新到磁盘，至少在短暂的测试期间是这样。除了...

sysctl -w vm.dirty_background_ratio=80

默认情况下，vm.dirty_background_ratio 为 10，即 10%。这意味着当超过 10% 的物理内存被脏页占用时，内核会认为它需要忙于将某些内容刷新到磁盘，即使它比dirty_writeback_centisecs 更年轻。将这个提高到 80 或 90，内核应该愿意容忍大部分 RAM 被脏页占用。（不过，我不会将这个 设置得太高，因为我敢打赌，没有人会这样做，这可能会引发奇怪的行为。）除了...

sysctl -w vm.dirty_ratio=90

默认情况下，vm.dirty_ratio 是 40，这意味着一旦 40% 的 RAM 是脏页，尝试创建更多脏页的进程将阻塞，直到某些东西被驱逐。始终使这个大于dirty_background_ratio。嗯，想一想，把这个放在那个之前，只是为了确保这个总是更大。

这就是我最初的建议。无论如何，您的内核可能会开始逐出页面； Linux VM 是一头神秘的野兽，似乎在每个版本中都会进行调整。希望这提供了一个起点。

请参阅内核源代码中的 Documentation/sysctl/vm.txt 以获取 VM 可调参数的完整列表。（最好参考您实际使用的内核版本的文档。）

最后，使用/proc/PID/pagemap interface随时查看哪些页面实际上是脏的。

【讨论】：

这似乎是对上述问题的回答，但我想我问错了问题。如果没有自动刷新（这是我对最坏情况的假设），我真正想要的是一种查看关闭/打开电源后文件内容的方法。但是，据我了解，即使未刷新页面，更改也可能会反映到其他进程中。那么问题是，我如何获得不包括未刷新更改的文件快照？如果需要的话，我很乐意为 mmap 编写一个虚假的包装器，但我不想重新发明轮子。
好吧，O_DIRECT 会让你绕过页面缓存读取文件，但我不确定它可能对页面缓存有什么影响对...请注意，所有这些读取需要 512 字节对齐（或者是 4096？我忘了）。因此，在这些设置（防止脏页被刷新）、/proc/PID/pagemap（确定哪些页是脏的）和 O_DIRECT（直接从磁盘读取文件数据）之间，您可能能够得到您想要的.

【解决方案2】：

一些猜测：

您可以通过/proc/sys/vm/swappiness 可调参数来调整系统的swappiness：

   /proc/sys/vm/swappiness
          The value in this file controls how aggressively the
          kernel will swap memory pages.  Higher values increase
          agressiveness, lower values descrease aggressiveness.
          The default value is 60.

（哇。proc(5) 需要通过拼写检查器运行。）

如果将swappiness 设置为0 不起作用，则有更多可调旋钮； Documentation/laptops/laptop-mode.txt 文件很好地描述了 laptop_mode 脚本的行为：

To increase the effectiveness of the laptop_mode strategy, the laptop_mode
control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
/proc/sys/vm to about 10 minutes (by default), which means that pages that are
dirtied are not forced to be written to disk as often. The control script also
changes the dirty background ratio, so that background writeback of dirty pages
is not done anymore. Combined with a higher commit value (also 10 minutes) for
ext3 or ReiserFS filesystems (also done automatically by the control script),
this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.

您可能希望将这些数字发挥到极致；如果您真的对应用程序的行为感到好奇，那么将这些值设置得相当高听起来很合理，然后看看sync(1) 命令完成后需要多长时间。但这些是系统范围的可调参数——其他应用程序可能不会这么高兴。

【讨论】：