Adding elements to vectors safely from C++11 threads: answers

【Question title】: Adding elements to vectors safely from C++11 threads
【Posted】: 2017-12-15 19:45:45
【Question】:

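My program needs to generate a large number of sample strings, and since generating a string is computationally intensive, I would like to parallelize the process. My code looks like this: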

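#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>
using namespace std;

mutex mtx;

void my_thread(vector<string> &V, int length)
{
    string s = generate_some_string(length);  //computationally intensive part

    //Protect the shared vector while appending
    mtx.lock();
    V.push_back(s);
    mtx.unlock();
}

int main()
{
    vector<string> S;

    while (S.size() < 1000)
    {
        //Launch a batch of 10 threads, then wait for all of them
        vector<thread> ths;
        ths.resize(10);

        for (int i = 0; i < 10; i++)
        {
            ths[i] = thread(my_thread, ref(S), 100);
        }

        for (auto &th : ths) th.join();
    }
}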

I get a "double free or corruption" error when I run it.

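【Comments】:

generate_some_string is missing, and I believe that is where your problem lies. Your use of the mutex is fine-grained, not exception-safe, and possibly a performance bottleneck, but it is not wrong.
Having fully working code would make it easier to help.
Agreed, the actual vector access seems well protected. Using an array of std::future might be easier, though.
The function comes from a scientific C library, and I don't know its inner workings. It may use non-automatic memory allocation. It didn't occur to me that this could be the cause.
Then you may need a deallocator? The error does sound like something being freed twice, though.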
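
As a rough illustration of the std::future suggestion in the comments above, a minimal sketch might look like the following. It assumes the generator is safe to call concurrently, which may not hold for the asker's library. The body of generate_some_string here is a hypothetical stand-in, and generate_all is a name introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the asker's generate_some_string. It touches no
// shared state, so concurrent calls are safe.
std::string generate_some_string(int length) {
    assert(length >= 0);
    return std::string(static_cast<std::size_t>(length), 'x');
}

// Launch `count` tasks and collect their results in launch order. Only the
// calling thread touches the result vector, so no mutex is needed.
std::vector<std::string> generate_all(int count, int length) {
    assert(count >= 0);
    std::vector<std::future<std::string>> futures;
    futures.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        futures.push_back(
            std::async(std::launch::async, generate_some_string, length));
    }

    std::vector<std::string> results;
    results.reserve(futures.size());
    for (auto &f : futures) {
        results.push_back(f.get());  // blocks until that task has finished
    }
    return results;
}
```

Calling generate_all(10, 100) would correspond to one batch of the loop in the question. Because std::launch::async typically starts one thread per task, a large workload is probably better split into batches than launched all at once.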

Tags: c++ multithreading c++11 vector memory-management

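【Answer 1】:

Your code

Your use of threads looks generally correct, so the problem is most likely that generate_some_string modifies global state. You can address this in either of the following ways:

Use a better library.
Use MPI for the parallelism, since it launches processes that each have their own memory.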
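
To make "global state" concrete, here is a small sketch of the kind of hidden shared state a C-style library function can carry. The function legacy_generate and its static buffer are hypothetical and are not taken from the asker's library, whose internals are unknown; the sketch only shows why two calls can end up sharing memory.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical C-style generator that formats into a single static buffer.
// Every call returns a pointer to the same memory.
const char *legacy_generate(int value) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "sample-%d", value);
    return buffer;
}

// Copy the result into an owned std::string right away. This protects later
// single-threaded code from being overwritten by the next call, but it does
// NOT make concurrent calls safe: two threads can still write the buffer at
// the same time.
std::string generate_copy(int value) {
    return std::string(legacy_generate(value));
}
```

Two successive calls to legacy_generate return the same pointer, so the second call overwrites the first result. With threads, the same overlap happens in the middle of a call, which is one way a function that looks harmless can corrupt memory.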

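Parallel philosophy

In retrospect the above seems obvious, which raises the question of why it was not immediately apparent. I think this has to do with the way you implemented your parallelism.

C++11 threads give you a great deal of flexibility, but they also require you to structure the parallelism explicitly. Most of the time, that is not what you want. It is easier and less error-prone to tell the compiler how your code can be parallelized and let it handle the low-level details.

The following shows how to do this with OpenMP: an industry-standard set of compiler directives that is supported by the major modern compilers and widely used in high-performance computing.

You will notice that the code is generally easier to read than what you wrote, and therefore easier to debug.

All of the code below is compiled with the following command (adjust as appropriate for your compiler):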

g++ -O3 main.cpp -fopenmp

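Solution 0: Use a simpler form of parallelism

First, I recommend that you use OpenMP for the parallelism. It is an industry standard, takes most of the pain out of dealing with threads, and lets you express parallelism at a conceptual level.

Solution 1: Private memory

You can solve your problem by having each thread write to its own private memory and then merging the private memories together. This avoids the mutex entirely, which may result in faster code and may sidestep the problem you are having altogether.

Note that each thread generates several computationally intensive strings, but this work is divided automatically among the available threads. Here is the code: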

#include <vector>
#include <string>
#include <omp.h>
#include <cmath>
#include <thread>
#include <chrono>
#include <iostream>

const int STRINGS_PER_LENGTH = 10;
const int MAX_STRING_LENGTH  = 50;

using namespace std::chrono_literals;

//Computationally intensive string generation. Note that this function
//CANNOT have a global state, or the threads will maul it.
std::string GenerateSomeString(int length){
  double sum=0;
  for(int i=0;i<length;i++){
    std::this_thread::sleep_for(2ms);
    sum+=std::sqrt(i);
  }
  return std::to_string(sum);
}

int main(){
  //Build a vector that contains vectors of strings. Each thread will have its
  //own vector of strings
  std::vector< std::vector<std::string> > vecs(omp_get_max_threads());

  //Loop over lengths
  for(int length=10;length<MAX_STRING_LENGTH;length++){
    //Progress so the user does not get impatient
    std::cout<<length<<std::endl;
    //Parallelize across all cores
    #pragma omp parallel for
    for(int i=0;i<STRINGS_PER_LENGTH;i++){
      //Each thread independently generates its string and puts it into its own
      //private memory space
      vecs[omp_get_thread_num()].push_back(GenerateSomeString(length));
    }
  }

  //Merge all the threads' results together
  std::vector<std::string> S;
  for(auto &v: vecs)
    S.insert(S.end(),v.begin(),v.end());

  //Throw away the thread private memory
  vecs.clear();
  vecs.shrink_to_fit();
}
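
The same private-memory idea can also be written with plain C++11 threads. The following is a minimal sketch, not a drop-in replacement for the OpenMP version: make_string is a hypothetical stand-in for the expensive generator, generate_private is a name introduced for this example, and the work is split by hand into a fixed number of strings per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive generator; it has no shared state.
std::string make_string(int length) {
    return std::string(static_cast<std::size_t>(length), 'x');
}

// Each worker appends only to its own vector, so no locking is required.
// The per-thread vectors are merged after every thread has been joined.
std::vector<std::string> generate_private(int num_threads, int per_thread,
                                          int length) {
    assert(num_threads > 0 && per_thread >= 0 && length >= 0);
    std::vector<std::vector<std::string>> vecs(
        static_cast<std::size_t>(num_threads));

    std::vector<std::thread> workers;
    workers.reserve(vecs.size());
    for (std::size_t t = 0; t < vecs.size(); ++t) {
        // vecs is never resized while threads run, so this pointer stays valid.
        std::vector<std::string> *mine = &vecs[t];
        workers.emplace_back([mine, per_thread, length] {
            for (int i = 0; i < per_thread; ++i) {
                mine->push_back(make_string(length));
            }
        });
    }
    for (auto &w : workers) w.join();

    // Serial merge of all private results.
    std::vector<std::string> merged;
    for (auto &v : vecs) {
        merged.insert(merged.end(), v.begin(), v.end());
    }
    return merged;
}
```

Unlike the OpenMP loop, this version does not balance the load automatically; every thread simply produces per_thread strings.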

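Solution 2: Use a reduction

We can define a custom reduction operator that merges vectors. Using this operator in the parallel part of our code lets us eliminate the vector of vectors and the cleanup afterwards. Instead, as threads finish their work, OpenMP safely combines their results.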

#include <vector>
#include <string>
#include <omp.h>
#include <cmath>
#include <thread>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;

const int STRINGS_PER_LENGTH = 10;
const int MAX_STRING_LENGTH  = 50;    

//Computationally intensive string generation. Note that this function
//CANNOT have a global state, or the threads will maul it.
std::string GenerateSomeString(int length){
  double sum=0;
  for(int i=0;i<length;i++){
    std::this_thread::sleep_for(2ms);
    sum+=std::sqrt(i);
  }
  return std::to_string(sum);
}

int main(){
  //Global vector, must not be accessed by individual threads
  std::vector<std::string> S;

  #pragma omp declare reduction (merge : std::vector<std::string> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

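  //Loop over lengths
  for(int length=10;length<MAX_STRING_LENGTH;length++){
    //Progress so the user does not get impatient
    std::cout<<length<<std::endl;
    //Each thread gets its own private copy of this vector; OpenMP merges the
    //copies back into it when the loop finishes
    std::vector<std::string> private_memory;
    //Parallelize across all cores
    #pragma omp parallel for reduction(merge: private_memory)
    for(int i=0;i<STRINGS_PER_LENGTH;i++){
      //Each thread independently generates its string and puts it into its own
      //private copy
      private_memory.push_back(GenerateSomeString(length));
    }
    //Append this length's merged results to the global vector (serial code)
    S.insert(S.end(), private_memory.begin(), private_memory.end());
  }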
}

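Solution 3: Use critical

We can do away with the reduction altogether by placing the push_back inside a critical section, which restricts that part of the code to one thread at a time.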

//Compile with g++ -O3 main.cpp -fopenmp
#include <vector>
#include <string>
#include <omp.h>
#include <cmath>
#include <thread>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;

const int STRINGS_PER_LENGTH = 10;
const int MAX_STRING_LENGTH  = 50;    

//Computationally intensive string generation. Note that this function
//CANNOT have a global state, or the threads will maul it.
std::string GenerateSomeString(int length){
  double sum=0;
  for(int i=0;i<length;i++){
    std::this_thread::sleep_for(2ms);
    sum+=std::sqrt(i);
  }
  return std::to_string(sum);
}

int main(){
  //Global vector, touched by the threads only inside the critical section
  std::vector<std::string> S;

  //Loop over lengths
  for(int length=10;length<MAX_STRING_LENGTH;length++){
    //Progress so the user does not get impatient
    std::cout<<length<<std::endl;
    //Parallelize across all cores
    #pragma omp parallel for
    for(int i=0;i<STRINGS_PER_LENGTH;i++){
      //Each thread independently generates its string into a local variable
      const auto temp = GenerateSomeString(length);
      //Only one thread can access this part of the code at a time
      #pragma omp critical
      S.push_back(temp);
    }
  }
}
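
For comparison, the closest C++11 counterpart to the critical section is a mutex held through std::lock_guard. The sketch below is only an illustration under that assumption: build_string is a hypothetical stand-in for the generator, and locked_worker and generate_locked are names introduced for this example. Compared with manual lock() and unlock() calls, the guard releases the mutex even if push_back throws.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive generator; it has no shared state.
std::string build_string(int length) {
    return std::string(static_cast<std::size_t>(length), 'x');
}

// Generate outside the lock, then hold the mutex only for the push_back.
void locked_worker(std::vector<std::string> &out, std::mutex &mtx, int length) {
    std::string s = build_string(length);
    std::lock_guard<std::mutex> lock(mtx);  // released on scope exit, even on throw
    out.push_back(std::move(s));
}

// Run `num_threads` workers that all append to one shared vector.
std::vector<std::string> generate_locked(int num_threads, int length) {
    assert(num_threads >= 0 && length >= 0);
    std::vector<std::string> out;
    std::mutex mtx;

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(locked_worker, std::ref(out), std::ref(mtx), length);
    }
    for (auto &th : threads) th.join();
    return out;
}
```

As with the critical section, the lock only protects the vector. It does nothing for a generator that itself modifies global state.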

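【Comments】:

Using an API that predates C++11 to solve a C++11 problem does not seem like a good approach.
@GemTaylor: I'm not sure I understand your comment. I am using an industry-standard multicore parallelism technology that is standard across all current compilers and is well regarded and widely used in high-performance computing. Its latest version postdates C++11.
C++11 threads give you a great deal of flexibility, but the OP does not seem to need it. The result is unnecessary pain. In any case, the techniques I discuss first also apply to C++11 threads, though the OP will then have to bear the pain of using them.
The problem is not in the synchronization of the vector.
@FrançoisAndrieux: I read the OP as associating parallelism and threads with the problem. If using complicated, user-specified parallelism makes the real problem harder to find, then the parallelism is the problem.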