【Question title】: pybind11 - Identify and remove memory leak in C++ wrapper
【Posted】: 2021-07-10 14:43:28
【Question description】:

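I have a simple C++ function that I am trying to wrap with pybind11 (the ehvi3d_sliceupdate function from the KMAC library). It sits deep inside a loop and is called several hundred thousand to a million times from my Python module. Unfortunately, it appears to be leaking memory (12+ GB after about 700k calls), and I am not sure what the cause might be. The wrapper I compile looks like this: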

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <iostream>
#include "helper.h"
#include "ehvi_calculations.h"
#include "ehvi_sliceupdate.h"

namespace py = pybind11;


// Copied from main.cc
//Checks if p dominates P. Removes points dominated by p from P and return the number of points removed.
int checkdominance(deque<individual*> & P, individual* p){
  int nr = 0;
  for (int i=P.size()-1;i>=0;i--){
    if (p->f[0] >= P[i]->f[0] && p->f[1] >= P[i]->f[1] && p->f[2] >= P[i]->f[2]){
      cerr << "Individual " << (i+1) << " is dominated or the same as another point; removing." << endl;
      P.erase(P.begin()+i);
      nr++;
    }
  }
  return nr;
}


// Wrap the ehvi3d_sliceupdate function - not sure how to pass straight in
double wrap_ehvi3d_sliceupdate(py::array_t<double> y_par, py::array_t<double> ref_point, py::array_t<double> mean_vector, py::array_t<double> std_dev) {

  deque<individual*> nd_samples;

  // Get y_par and feed by individual via numpy direct access
  // https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html
  auto yp = y_par.unchecked<2>(); // y_par must have ndim = 2
  
  for (py::ssize_t i = 0; i < yp.shape(0); i++) {
    individual * tempvidual = new individual;
    tempvidual->f[0] = yp(i, 0);
    tempvidual->f[1] = yp(i, 1);
    tempvidual->f[2] = yp(i, 2);
    // cerr << i << ": " << yp(i, 0) << " " << yp(i, 1) << " " << yp(i, 2) << endl;
    checkdominance(nd_samples, tempvidual);
    nd_samples.push_back(tempvidual);
  }

  // Marshall ref_point, mean_vector, and std_dev into an array
  // (might be better ways to do this..)
  auto rp = ref_point.unchecked<1>(); // ref_point must have ndim = 1, len 3
  double r [] = {rp(0), rp(1), rp(2)};
  
  auto mv = mean_vector.unchecked<1>(); // mean_vector must have ndim = 1, len 3
  double mu [] = {mv(0), mv(1), mv(2)};

  auto sd = std_dev.unchecked<1>(); // std_dev must have ndim = 1, len 3
  double s [] = {sd(0), sd(1), sd(2)};
  
  double hvi = ehvi3d_sliceupdate(nd_samples, r, mu, s);
  return hvi;
  }

  
PYBIND11_MODULE(kmac, m) {
    // module docstring
    m.doc() = "EHVI using KMAC";

    // define EHVI slice update function
    m.def("ehvi3d_sliceupdate", &wrap_ehvi3d_sliceupdate, "O(n^3) slice-update scheme for calculating the EHVI.");
    
}

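There may be a simpler way to wrap this, since I just cobbled together bits I found in the pybind11 docs and here on SO. I am not very familiar with C++, so I may well have made other egregious coding mistakes when creating arrays or passing pointers with my limited knowledge. Am I creating something that needs to be cleaned up on every call? At first I thought I might need to encapsulate my numpy arrays as in this earlier post, but I only return a double, so there is no numpy array to deal with on the Python side.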


Edit

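I tried changing the tempvidual block to use stack memory (I believe that is what it is called?), since I read that it cleans up after itself, by doing: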

    individual tempvidual;
    tempvidual.f[0] = yp(i, 0);
    tempvidual.f[1] = yp(i, 1);
    tempvidual.f[2] = yp(i, 2);
    checkdominance(nd_samples, &tempvidual);
    nd_samples.push_back(&tempvidual);

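Just before returning hvi at the end, I also tried adding nd_samples.clear(); to empty the deque before returning to Python, but memory still grows on every call to the wrapper. Is there anything else that needs to be cleaned up?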
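A note on the attempt above: a stack-allocated object cannot leak, but a loop-local object is destroyed at the end of each iteration, so the pointers stored in the deque dangle and reading them later is undefined behavior. One possible alternative, sketched below under assumptions, is to keep the points by value in a container that outlives the deque and store pointers into it. The `individual` struct here is a hypothetical stand-in for the library's type, and `sum_first_objective` and the flat input layout are invented for illustration only.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the library's `individual` struct.
struct individual {
    double f[3];
};

// `flat` holds n points as x0, y0, z0, x1, y1, z1, ... (an assumed layout).
double sum_first_objective(const std::vector<double>& flat) {
    const std::size_t n = flat.size() / 3;

    std::vector<individual> storage;     // owns every point by value
    storage.reserve(n);                  // no reallocation below, so addresses stay valid

    std::deque<individual*> nd_samples;  // non-owning pointers, as a raw-pointer API expects
    for (std::size_t i = 0; i < n; ++i) {
        individual p{};
        p.f[0] = flat[3 * i];
        p.f[1] = flat[3 * i + 1];
        p.f[2] = flat[3 * i + 2];
        storage.push_back(p);                   // copy into long-lived storage
        nd_samples.push_back(&storage.back());  // valid until `storage` is destroyed
    }

    double total = 0.0;
    for (const individual* p : nd_samples) {
        total += p->f[0];
    }
    return total;
}  // `storage` frees everything here; nothing to delete by hand
```

Since this pattern allocates nothing with new, any remaining memory growth would have to come from elsewhere, for example from the wrapped library.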


Edit 2

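So part of the problem is the library itself, which according to valgrind leaks about 4 KB per call. Thanks to the amazing help of @ajum on the pybind11 gitter (shout out), who actually walked me through refactoring much of the code to use shared_ptr and make_shared rather than fixing all the leaking raw pointers in the library one by one. That also required a small update to the wrapper, see below. Unfortunately, even with a (I think) leak-free library and the updated wrapper, I still get the following report: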

==1932812== LEAK SUMMARY:
==1932812==    definitely lost: 676 bytes in 1 blocks
==1932812==    indirectly lost: 0 bytes in 0 blocks
==1932812==      possibly lost: 145,291 bytes in 80 blocks
==1932812==    still reachable: 1,725,888 bytes in 1,013 blocks

That is less than before, but I do not know what is causing it.

Edited part of the wrapper:

// Copied from main.cc
//Checks if p dominates P. Removes points dominated by p from P and return the number of points removed.
int checkdominance(deque<shared_ptr<individual>> & P, shared_ptr<individual> p){
  int nr = 0;
  for (int i=P.size()-1;i>=0;i--){
    if (p->f[0] >= P[i]->f[0] && p->f[1] >= P[i]->f[1] && p->f[2] >= P[i]->f[2]){
      cerr << "Individual " << (i+1) << " is dominated or the same as another point; removing." << endl;
      P.erase(P.begin()+i);
      nr++;
    }
  }
  return nr;
}


// Wrap the ehvi3d_sliceupdate function - not sure how to pass straight in
double wrap_ehvi3d_sliceupdate(py::array_t<double> y_par, py::array_t<double> ref_point, py::array_t<double> mean_vector, py::array_t<double> std_dev) {

  // deque<individual*> nd_samples;
  deque<shared_ptr<individual>> nd_samples;
  
  // Get y_par and feed by individual via numpy direct access
  // https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html
  auto yp = y_par.unchecked<2>(); // y_par must have ndim = 2
  
  for (py::ssize_t i = 0; i < yp.shape(0); i++) {
    auto tempvidual = make_shared<individual>();
    // individual * tempvidual = new individual;
    tempvidual->f[0] = yp(i, 0);
    tempvidual->f[1] = yp(i, 1);
    tempvidual->f[2] = yp(i, 2);
    // cerr << i << ": " << yp(i, 0) << " " << yp(i, 1) << " " << yp(i, 2) << endl;
    // cerr << i << ": " << tempvidual->f[0] << " " << tempvidual->f[1] << " " << tempvidual->f[2] << endl;
    checkdominance(nd_samples, tempvidual);
    nd_samples.push_back(tempvidual);
  }

  // Marshall ref_point, mean_vector, and std_dev into an array
  // (might be better ways to do this..)
  auto rp = ref_point.unchecked<1>(); // ref_point must have ndim = 1, len 3
  double r [] = {rp(0), rp(1), rp(2)};
  
  auto mv = mean_vector.unchecked<1>(); // mean_vector must have ndim = 1, len 3
  double mu [] = {mv(0), mv(1), mv(2)};

  auto sd = std_dev.unchecked<1>(); // std_dev must have ndim = 1, len 3
  double s [] = {sd(0), sd(1), sd(2)};
  
  double hvi = ehvi3d_sliceupdate(nd_samples, r, mu, s);
  
  return hvi;
  }

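I cannot pinpoint the problem from the output of running valgrind on the Python test script. An excerpt of the output containing the definitely lost block looks like this: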

==1932812== 676 bytes in 1 blocks are definitely lost in loss record 212 of 485
==1932812==    at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==1932812==    by 0x2D595F: _PyMem_RawWcsdup (obmalloc.c:592)
==1932812==    by 0x166786: _PyCoreConfig_Copy.cold (main.c:2535)
==1932812==    by 0x34C4C7: _Py_InitializeCore (pylifecycle.c:850)
==1932812==    by 0x34CCB3: pymain_init (main.c:3041)
==1932812==    by 0x3503EB: pymain_main (main.c:3063)
==1932812==    by 0x35085B: _Py_UnixMain (main.c:3103)
==1932812==    by 0x5A137B2: (below main) (in /usr/lib64/libc-2.28.so)
==1932812== 
==1932812== 688 bytes in 1 blocks are possibly lost in loss record 214 of 485
==1932812==    at 0x4C33419: realloc (vg_replace_malloc.c:834)
==1932812==    by 0x21E8F8: _PyObject_GC_Resize (gcmodule.c:1758)
==1932812==    by 0x2345DA: UnknownInlinedFun (frameobject.c:726)
==1932812==    by 0x2345DA: UnknownInlinedFun (call.c:272)
==1932812==    by 0x2345DA: _PyFunction_FastCallKeywords (call.c:408)
==1932812==    by 0x2979C7: call_function (ceval.c:4616)
==1932812==    by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812==    by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812==    by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812==    by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812==    by 0x2979C7: call_function (ceval.c:4616)
==1932812==    by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812==    by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812==    by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812==    by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812==    by 0x2979C7: call_function (ceval.c:4616)
==1932812==    by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812==    by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812==    by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812==    by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812== 
==1932812== 1,056 bytes in 2 blocks are possibly lost in loss record 350 of 485
==1932812==    at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==1932812==    by 0x221130: UnknownInlinedFun (obmalloc.c:520)
==1932812==    by 0x221130: UnknownInlinedFun (obmalloc.c:1584)
==1932812==    by 0x221130: UnknownInlinedFun (obmalloc.c:1576)
==1932812==    by 0x221130: UnknownInlinedFun (obmalloc.c:633)
==1932812==    by 0x221130: UnknownInlinedFun (gcmodule.c:1693)
==1932812==    by 0x221130: UnknownInlinedFun (gcmodule.c:1715)
==1932812==    by 0x221130: _PyObject_GC_NewVar (gcmodule.c:1744)
==1932812==    by 0x2344F2: UnknownInlinedFun (frameobject.c:713)
==1932812==    by 0x2344F2: UnknownInlinedFun (call.c:272)
==1932812==    by 0x2344F2: _PyFunction_FastCallKeywords (call.c:408)
==1932812==    by 0x2979C7: call_function (ceval.c:4616)
==1932812==    by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812==    by 0x206EAC: UnknownInlinedFun (ceval.c:547)
==1932812==    by 0x206EAC: UnknownInlinedFun (call.c:283)
==1932812==    by 0x206EAC: _PyFunction_FastCallDict (call.c:322)
==1932812==    by 0x20F1BA: UnknownInlinedFun (call.c:98)
==1932812==    by 0x20F1BA: object_vacall (call.c:1200)
==1932812==    by 0x28E2E6: _PyObject_CallMethodIdObjArgs (call.c:1250)
==1932812==    by 0x1FC4A6: UnknownInlinedFun (import.c:1652)
==1932812==    by 0x1FC4A6: PyImport_ImportModuleLevelObject (import.c:1764)
==1932812==    by 0x2C069F: UnknownInlinedFun (ceval.c:4770)
==1932812==    by 0x2C069F: _PyEval_EvalFrameDefault (ceval.c:2600)
==1932812==    by 0x205AF1: UnknownInlinedFun (ceval.c:547)
==1932812==    by 0x205AF1: _PyEval_EvalCodeWithName (ceval.c:3930)
==1932812==    by 0x206D08: PyEval_EvalCodeEx (ceval.c:3959)

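Is this due to pybind11 itself, or to the way I am calling it?

P.S. I am not sure whether appending edits or replacing the original question with the (long) update is the preferred style here. Thanks!

【Comments】: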

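  • Your rewritten code that no longer uses new should not leak anymore. Is the leak still as severe?
  • Hi @nanofarad - it turns out the library itself is very leaky! I have made some fixes and will add more information to the question. Unfortunately, it still leaks after applying the pybind11 wrapper.
  • I have now realized that, with the library leak-free and new no longer in use, memory usage is stable (or at least not growing at a noticeable rate) when I actually run a test loop that calls the wrapper from Python inside while True. The valgrind output may be spurious when run against Python, but the problem now seems to be solved. I will edit the question accordingly.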

Tags: python c++ numpy pybind11


【Solution 1】:

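It turns out that removing uses of new wherever possible (and adding delete where new had to stay), and replacing all raw pointers with shared_ptr and make_shared in both the base library and the wrapper, solved the problem. Using these smart pointers instead of raw pointers appears to free the memory automatically once the variable goes out of scope (knowledgeable C++ users can correct me in the comments).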
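To make the "freed automatically" behavior concrete, here is a minimal sketch that counts live objects. `tracked_individual` and `live_inside_scope` are hypothetical names introduced only for this illustration and are not part of the KMAC library.

```cpp
#include <deque>
#include <memory>

// Hypothetical stand-in that counts how many instances are currently alive.
struct tracked_individual {
    static int live;
    double f[3] = {0.0, 0.0, 0.0};
    tracked_individual() { ++live; }
    tracked_individual(const tracked_individual&) = delete;  // keep the count honest
    ~tracked_individual() { --live; }
};
int tracked_individual::live = 0;

// Creates n objects, erases one, and reports how many are still alive
// while the deque is in scope.
int live_inside_scope(int n) {
    std::deque<std::shared_ptr<tracked_individual>> samples;
    for (int i = 0; i < n; ++i) {
        samples.push_back(std::make_shared<tracked_individual>());
    }
    if (!samples.empty()) {
        // Erasing drops the last reference, so that object is destroyed right away.
        // With a deque of raw pointers, the same erase would leak the object.
        samples.erase(samples.begin());
    }
    return tracked_individual::live;
}  // the deque is destroyed here and every remaining object goes with it
```

Under these assumptions, calling `live_inside_scope(5)` should report 4 live objects inside the function, and the counter should be back at 0 once it returns.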

This may be basic/obvious to C++ coders, but for non-C++ users/beginners (and for my own records, in case I forget), the fix was:

//Change declarations like these:
// vector<mus*> pdf;
vector<shared_ptr<mus>> pdf;
// mus * tempmus = new mus;
auto tempmus = make_shared<mus>();
// newind = new specialind;
auto newind = make_shared<specialind>();
// deque<specialind*> Px, Py, Pz; 
deque<shared_ptr<specialind>> Px, Py, Pz;

// Replace function signatures and headers like this
// int checkdominance(deque<individual*> & P, individual* p);
int checkdominance(deque<shared_ptr<individual>> & P, shared_ptr<individual> p);

// Parts of structs like this
struct specialind{
  // individual *point;
  std::shared_ptr<individual> point;
};

// Couldn't figure out how to change this one to remove new as it was needed in a later scope... 
// Added delete at the end after it looked like it wasn't needed
Pstruct = new thingy[n*n];
// ...
delete [] Pstruct;  // Added this at the end.
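For the last case, one option that may avoid new entirely is std::vector, which frees its buffer when it goes out of scope, including on early returns and exceptions. The sketch below assumes a trivial `thingy` struct (the real one in the library will differ), and `fill_and_sum` is a hypothetical function used only to exercise the buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in; the library's real struct has different members.
struct thingy {
    double value = 0.0;
};

double fill_and_sum(std::size_t n) {
    // Replaces: Pstruct = new thingy[n*n];
    std::vector<thingy> Pstruct(n * n);

    for (std::size_t i = 0; i < Pstruct.size(); ++i) {
        Pstruct[i].value = 1.0;
    }

    // Code that still expects a thingy* can use the underlying buffer.
    thingy* raw = Pstruct.data();

    double total = 0.0;
    for (std::size_t i = 0; i < Pstruct.size(); ++i) {
        total += raw[i].value;
    }
    return total;
}  // no delete [] needed; the vector releases its memory here
```

If the buffer is needed in a later scope, the vector could be declared in the outer scope and sized afterwards with `resize` or `assign`.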

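While doing this I initially ran into a lot of segmentation faults. I was able to track down the lines causing them using this SO post.

Calling valgrind --leak-check=full --track-origins=yes python test.py still produces the leak messages shown in Edit 2, where test.py is just a simple loop (plus the input numpy ndarrays):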

while True:
    hvi = kmac.ehvi3d_sliceupdate(dat, ref_point, mean_vector, std_dev)

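However, memory consumption actually looks stable and no longer grows. (I am not sure why valgrind reports these spurious messages, but they do not seem to affect memory noticeably at run time.) I can now run python test.py for several minutes and it stays stable at around 15 MB.

Thanks to pybind11 and Adam Thompson for walking me through the basics.

【Comments】: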

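  • A small tip: if you do not need true shared semantics, you could consider using unique_ptr in some places for better performance, although you may need std::move in places because they are not copyable.
  • @nanofarad Thanks! I will have to look into how to do that. Is it a drop-in replacement for shared_ptr and make_shared? (Or are they the same thing?)
  • It is an alternative to shared_ptr; there is a make_unique as of C++14, otherwise it would be something like std::unique_ptr<individual>(new individual). A tutorial on unique_ptr should have the exact syntax.
  • @nanofarad Thanks! If I ever come back to C++, that is something I will keep an eye out for!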
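As a rough illustration of the unique_ptr suggestion in the comments above, the sketch below builds a deque of uniquely owned points. It assumes C++14 for std::make_unique; `individual` is a hypothetical stand-in for the library's struct, and `build_samples` is an invented helper.

```cpp
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for the library's `individual` struct.
struct individual {
    double f[3];
};

std::deque<std::unique_ptr<individual>> build_samples(int n) {
    std::deque<std::unique_ptr<individual>> samples;
    for (int i = 0; i < n; ++i) {
        auto p = std::make_unique<individual>();  // C++14; members start zeroed
        p->f[0] = p->f[1] = p->f[2] = static_cast<double>(i);
        // unique_ptr cannot be copied, so ownership is moved into the deque.
        // After this line `p` is null and must not be dereferenced.
        samples.push_back(std::move(p));
    }
    return samples;  // the deque itself is moved out, not copied
}
```

Each point has exactly one owner here, so there is no reference count to maintain. The trade-off is that functions taking these pointers need either a reference to the container or a non-owning raw pointer obtained with `get()`.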