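【Posted】: 2018-08-02 09:49:54
【Question】:
I am working on a parallel code. In my main function I have a loop over time steps, and at the beginning of it I need to copy a class with the assignment operator. But somehow, at time step 4, one of the processors hits a double free or corruption error while the others are fine; the error occurs in std::set and std::map. Below are part of the code and the main loop.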
#include <cstddef>
#include <map>
#include <set>

class Mesh
{
public:
    const Mesh &operator=(const Mesh &mesh);
    std::set<size_t> ghostSet;
    std::map<size_t, size_t> localIndex;
};
Assignment operator:
const Mesh &Mesh::operator=(const Mesh &mesh)
{
    std::set<size_t>().swap(ghostSet); ///BUG here
    std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here
    for(auto const &it : mesh.localIndex)
        localIndex[it.first] = it.second;
    for(auto const &it : mesh.ghostSet)
        ghostSet.insert(it);
    return *this;
}
Main function:
int main(int argc, char *argv[])
{
    Mesh ms, ms_gh;
    /// Some operation to ms;
    for(size_t t = 0; t != 10; t++)
    {
        /// Some operation to ms;
        ms_gh = ms;
        /// Some operation to ms_gh;
    }
}
#0 0x00002aaab2405207 in raise () from /lib64/libc.so.6
#1 0x00002aaab24068f8 in abort () from /lib64/libc.so.6
#2 0x00002aaab2447cc7 in __libc_message () from /lib64/libc.so.6
#3 0x00002aaab2450429 in _int_free () from /lib64/libc.so.6
#4 0x000000000041bfba in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned long> >::deallocate (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/ext/new_allocator.h:110
#5 0x000000000041835c in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_put_node (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:374
#6 0x000000000041276e in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_destroy_node (this=0x7fffffff8b50, __p=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:422
#7 0x000000000040c8ad in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x7131c0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1127
#8 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72f410)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#9 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72b760)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#10 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x70fce0)
at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
#11 0x00000000004080c4 in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::~_Rb_tree (this=0x7fffffff8b50, __in_chrg=<optimized out>)
at /usr/include/c++/4.8.2/bits/stl_tree.h:671
#12 0x0000000000407bbc in std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >::~set (this=0x7fffffff8b50,
__in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_set.h:90
#13 0x0000000000405003 in Mesh::operator= (this=0x7fffffffa8a0, mesh=...)
at mesh.cpp:73
#14 0x000000000048eb98 in DynamicMesh::reattach_ghost (mpi_comm=1140850688,
ms=..., cn=..., ms_gh=..., gh=..., cn_gh=..., ale=..., t=4)
at dynamicMesh.cpp:273
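In this case, frame #13 of the backtrace corresponds to the std::set swap.
My question is why this error does not appear at the first time step, and why it does not appear on all processors. In addition, the error sometimes shows up at the std::map line instead.
Also, the code runs successfully on my macOS and Linux laptops; it fails only on the HPC system.
【Discussion】: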
-
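Since std::set and std::map both have assignment operators, I suggest you follow the rule of zero and use the compiler-generated default assignment operator.
-
"I am working on a parallel code": neither std::set nor std::map is thread-safe. Are you synchronizing access to them? There is no sign of synchronization in your operator=.
-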
Oh, and fix the UB you have from not returning anything from the operator function. And return a non-const reference.
-
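Your assignment operator is declared to return a const reference, which makes no sense at all. Also, there is no actual return statement in it. If your compiler does not warn you about that, consider switching compilers. Swapping with a temporary does not make much sense either; you can just call clear(). You can assign vectors and maps with the = operator, so there is no need to write your own (inefficient) copy loops. In fact, your class can drop the user-defined assignment and use the default one, since there are no unmanaged pointers in it. I doubt any of this is the cause of the crash; these are just things to keep in mind.
-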
There is no indication that this problem is related to MPI. Maybe it is not even in the code you provided. Please prepare a minimal reproducible example.
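The rule-of-zero suggestion made in the comments could look roughly like the sketch below. It assumes Mesh holds only the two containers shown in the question; the function copies_are_independent is a hypothetical helper added purely for illustration and is not part of the original code. As the commenters point out, this simplification may well not be the cause of the crash, so treat it as cleanup rather than as a confirmed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Rule-of-zero variant of the Mesh class from the question (assumption: the
// class owns no raw resources). No operator= is declared, so the compiler
// generates a member-wise copy assignment that returns Mesh&.
class Mesh
{
public:
    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical helper, not in the original code: mimics the time loop from
// the question and checks that the copy and the source stay independent.
bool copies_are_independent()
{
    Mesh ms, ms_gh;
    ms.ghostSet = {1, 2, 3};
    ms.localIndex[10] = 0;

    for (std::size_t t = 0; t != 10; t++)
    {
        ms.localIndex[t] = t;            // some operation on ms
        ms_gh = ms;                      // compiler-generated copy assignment
        ms_gh.ghostSet.insert(100 + t);  // some operation on ms_gh only
    }

    // ms must be unaffected by changes made to ms_gh after the copy.
    return ms.ghostSet.size() == 3
        && ms_gh.ghostSet.size() == 4
        && ms_gh.localIndex == ms.localIndex;
}
```

Calling copies_are_independent() should return true. If the double free persists with the default assignment in place, that would suggest the heap is being corrupted somewhere outside the code shown, which is where a minimal reproducible example would help.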
Tags: c++ mpi assignment-operator