C/C++ - Why is the heap so big when I'm allocating space for a single int? (Answers)

【Question title】: C/C++ - Why is the heap so big when I'm allocating space for a single int?
【Posted】: 2014-07-19 05:22:57
【Question】:

I am currently using gdb to see the effects of low-level code. Right now I am doing the following:

int* pointer = (int*)calloc(1, sizeof(int));

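However, when I inspect memory with info proc mappings in gdb, I see the following right after what I believe is the .text section (because Objfile shows the name of the binary I am debugging):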

...
Start Addr    End Addr    Size     Offset    Objfile
0x602000      0x623000    0x21000  0x0       [heap]

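All I did was allocate space for a single int, so why is the heap this big?

The strangest part is that the heap size stays the same even when I call calloc(1000, sizeof(int)).

PS: I am running Ubuntu 14.04 on an x86_64 machine. I am compiling the source with g++ (yes, I know I shouldn't use calloc in C++; this is just a test).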

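【Question discussion】:

Why would the system speculatively allocate only a tiny amount of memory when you are quite likely to allocate more later?
It is not just that you might need more space; you very probably will.
@user3688293: The heap is a data structure, like an array or a linked list, and like a linked list it has next-element pointers and so on. Those have to be stored alongside your data. Small allocations such as a single int are actually quite wasteful.
Now, that waste does not explain why your heap starts at 132 kB. That has to do with the cost of requesting memory from the operating system. Your C library's allocator grabs a big chunk from the OS up front to avoid paying that cost often.
Ben's description is typical of most implementations. The runtime probably implements a sub-allocator that obtains a fairly large minimum heap block through a system call and carves it up into smaller requests with its own sub-allocation algorithm. Most decent runtimes do this in one form or another, because system heap management has historically been expensive.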

Tags: c++ c gdb heap-memory

【Solution 1】:

All I did was allocate space for a single int, so why is the heap this big?

I ran a simple test on Linux. When calloc is called, glibc calls sbrk() to obtain memory from the OS:

(gdb) bt
#0  0x0000003a1d8e0a0a in brk () from /lib64/libc.so.6
#1  0x0000003a1d8e0ad7 in sbrk () from /lib64/libc.so.6
#2  0x0000003a1d87da49 in __default_morecore () from /lib64/libc.so.6
#3  0x0000003a1d87a0aa in _int_malloc () from /lib64/libc.so.6
#4  0x0000003a1d87a991 in malloc () from /lib64/libc.so.6
#5  0x0000003a1d87a89a in calloc () from /lib64/libc.so.6
#6  0x000000000040053a in main () at main.c:6

But glibc does not ask the OS for exactly the 4 bytes you requested. It computes its own size. This is how it is done in glibc:

  /* Request enough space for nb + pad + overhead */
  size = nb + mp_.top_pad + MINSIZE;

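mp_.top_pad defaults to 128*1024 bytes, and that is the main reason the system allocates 0x21000 bytes when you ask for 4 bytes.

You can adjust mp_.top_pad by calling mallopt. This is from the mallopt documentation: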
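
The arithmetic can be checked with a small sketch. It is not taken from glibc: the helper names round_up_to_page and first_heap_size are made up for illustration, and the inputs are assumptions for x86_64 glibc (a 4-byte request normalized to nb = 32 bytes, MINSIZE = 32 bytes, a 4096-byte page, and the sum rounded up to a page boundary before the break is moved).

```c
#include <stddef.h>

/* Rounds n up to the next multiple of page.
 * Assumes page is a power of two (4096 in the examples here). */
static size_t round_up_to_page(size_t n, size_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/* Mirrors "size = nb + mp_.top_pad + MINSIZE" from the glibc excerpt,
 * followed by the page rounding. All parameters are passed in explicitly
 * because their real values depend on the platform and glibc version. */
size_t first_heap_size(size_t nb, size_t top_pad, size_t minsize, size_t page)
{
    return round_up_to_page(nb + top_pad + minsize, page);
}
```

Under these assumptions, first_heap_size(32, 128 * 1024, 32, 4096) gives 0x21000, the heap size shown in the question, and first_heap_size(32, 1, 32, 4096) gives 0x1000, a single page.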

M_TOP_PAD

This parameter defines the amount of padding to employ when
calling sbrk(2) to modify the program break.  (The measurement
unit for this parameter is bytes.)  This parameter has an
effect in the following circumstances:

*  When the program break is increased, then M_TOP_PAD bytes
 are added to the sbrk(2) request.

In either case, the amount of padding is always rounded to a
system page boundary.

So I changed your program and added a call to mallopt:

#include <stdlib.h>
#include <malloc.h>
int main()
{
  mallopt(M_TOP_PAD, 1);
  int* pointer = (int*)calloc(1, sizeof(int));
  return 0;
}

I set the padding to 1 byte; according to the documentation, it is "always rounded to a system page boundary".

This is what gdb tells me about the program:

      Start Addr           End Addr       Size     Offset objfile
        0x601000           0x602000     0x1000        0x0 [heap]

So now the heap is 4096 bytes, exactly the size of a page on my system:

(gdb) !getconf PAGE_SIZE
4096
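
The same growth can also be observed without gdb by reading the program break with sbrk(0) before and after the allocation. This is a minimal sketch, assuming Linux with glibc and an allocator that grows the heap through the program break; the function name break_growth_for_calloc is made up for illustration.

```c
#define _DEFAULT_SOURCE   /* makes glibc declare sbrk() under strict -std modes */
#include <stdlib.h>
#include <unistd.h>

/* Returns how many bytes the program break moved while calloc
 * allocated n ints. A result of 0 means the request was served from
 * memory the allocator had already obtained from the OS. */
long break_growth_for_calloc(size_t n)
{
    char *before = sbrk(0);            /* current program break */
    int *p = calloc(n, sizeof(int));
    char *after = sbrk(0);             /* program break after the allocation */

    free(p);
    return (long)(after - before);
}
```

Called once with n = 1 at the start of main, this would be expected to report the initial heap size; a later call with n = 1000 would be expected to report 0, since the request fits in the space already obtained. The exact numbers depend on the glibc version and on whether anything allocated earlier in the process.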

Useful links:

http://man7.org/linux/man-pages/man3/mallopt.3.html

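【Discussion】:

Does doing this affect performance? I would think that with this setting it has to call sbrk more often?
I think it may have a performance impact. This is from the documentation: Modifying M_TOP_PAD is a trade-off between increasing the number of system calls (when the parameter is set low) and wasting unused memory at the top of the heap (when the parameter is set high).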
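
One rough way to see this trade-off is to count how often the program break moves during a series of small allocations. This is a sketch under assumptions (Linux, glibc, allocations served from the main heap rather than mmap); the function name count_break_moves is made up for illustration, and the memory is deliberately leaked so that the heap has to keep growing.

```c
#define _DEFAULT_SOURCE   /* makes glibc declare sbrk() under strict -std modes */
#include <malloc.h>
#include <stdlib.h>
#include <unistd.h>

/* Sets M_TOP_PAD to top_pad, then makes `count` allocations of `size`
 * bytes each and returns how many times the program break changed.
 * Each change corresponds to at least one brk/sbrk system call. */
int count_break_moves(int top_pad, int count, size_t size)
{
    mallopt(M_TOP_PAD, top_pad);

    int moves = 0;
    void *last = sbrk(0);
    for (int i = 0; i < count; i++) {
        void *p = malloc(size);        /* intentionally never freed */
        if (p == NULL)
            break;
        *(volatile char *)p = 0;       /* keeps the compiler from eliding malloc */

        void *now = sbrk(0);
        if (now != last) {
            moves++;
            last = now;
        }
    }
    return moves;
}
```

With a padding of 1 byte the break would be expected to move every few allocations, while a padding of 1 MiB should need far fewer moves for the same workload, at the cost of a larger heap that may never be used.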

【Solution 2】:

Since you mention C/C++, it is better to use the following construct:

int* pointer = new int(1);

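【Discussion】:

Yes, but the OP wants to know why calloc allocates so much memory, not how to make the code better.
This post may help: stackoverflow.com/questions/12490534/…
This does not explain why int* pointer = (int*)calloc(1, sizeof(int)); allocates 0x21000 bytes.
new is not necessarily better than calloc. The latter may let the OS zero-initialize the memory block lazily, whereas the best new can do is touch all of the memory and zero it immediately, causing a bunch of page faults.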