Improve Time Efficiency of Driver Program: Answers

【Question Title】: Improve Time Efficiency of Driver Program
【Posted】: 2020-01-25 18:38:39
【Question Description】:

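Sorry for the vague title. Essentially, I'm trying to improve the time (and overall) efficiency of a C++ driver program that does the following: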

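It reads a file line by line using ifstream.
Processing the lines individually is essential to my program, so I currently have 4 separate calls to getline.
It uses string streams to read each line into a vector of integers.
Finally, it converts the vectors into linked lists of integers. Is there a way, or a function, to read the integers from the file directly into a linked list of ints?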

Here is the driver code:

int main(int argc, char *argv[])
{
    ifstream infile(argv[1]);

    vector<int> vals_add;
    vector<int> vals_remove;

    //Driver Code
    if(infile.is_open()){

        string line;
        int n;
        getline(infile, line);
        istringstream iss (line);


        getline(infile, line);
        istringstream iss2 (line);
        while (iss2 >> n){
            vals_add.push_back(n);
        }

        getline(infile, line);
        istringstream iss3 (line);

        getline(infile, line);
        istringstream iss4 (line);
        while (iss4 >> n){
            vals_remove.push_back(n);
        }


        int array_add[vals_add.size()];
        copy(vals_add.begin(), vals_add.end(), array_add);


        int array_remove[vals_remove.size()];
        copy(vals_remove.begin(), vals_remove.end(), array_remove);



        Node *ptr = CnvrtVectoList(array_add, sizeof(array_add)/sizeof(int));
        print(ptr);
        cout << "\n";

        for(int i = 0; i < vals_remove.size(); i++){
           deleteNode(&ptr, vals_remove[i]);
        }


        print(ptr);
        cout << "\n";

    }
}

Here is a small sample input:

7

6 18 5 20 48 2 97

8

3 6 9 12 28 5 7 10

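Lines 2 and 4 must be processed as separate lists, and lines 1 and 3 are the sizes of those lists (memory must be allocated dynamically, so the sizes must match the input exactly).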

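【Question Comments】:

int array_add[vals_add.size()]; -- this line, and the others like it, are not valid C++. The size of an array in C++ must be a constant expression, not a value computed at runtime. You are already using std::vector, so these should be vectors too.
You don't have to read lines individually in order to process them individually. I have never bothered much with reading files fast, because it is usually slow anyway, but I would try reading the contents in one chunk and only then splitting it into lines.
@PaulMcKenzie Thanks - I didn't know that! I was trying to create the array so that it always has exactly the size of the vector, which lets it be sized dynamically. So I should convert the vector vals_add, rather than array_add, into the linked list?
if (file.is_open()) is an antipattern; avoid it. It doesn't actually check enough.
@foreknownas_463035818 -> Thanks, so essentially I would call getline -> read the whole chunk -> use istringstream to parse the whole chunk by its individual lines?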

Tags: c++ linked-list processing-efficiency

【Solution 1】:

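First: why use a custom list data structure at all? It is very likely half-baked, i.e. it has no support for allocators, and will therefore be much harder to adapt for good performance. Just use std::list for a doubly-linked list, or std::forward_list for a singly-linked list. Easy.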
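As a minimal sketch of what that suggestion could look like for the question's input, the helper below reads one line of integers straight into a std::forward_list, with no intermediate vector or array. The name read_int_line is hypothetical, and the sketch assumes that each line holds only whitespace-separated integers.

```cpp
#include <cassert>
#include <forward_list>
#include <istream>
#include <sstream>
#include <string>

// Reads one line from `in` and stores its whitespace-separated integers in a
// std::forward_list, preserving their order. Returns an empty list if no line
// could be read. read_int_line is a hypothetical helper name.
std::forward_list<int> read_int_line(std::istream &in)
{
   std::forward_list<int> result;
   std::string line;
   if (!std::getline(in, line)) return result;

   std::istringstream fields(line);
   auto tail = result.before_begin();            // iterator to the last node so far
   for (int value; fields >> value;)
      tail = result.insert_after(tail, value);   // O(1) append after the tail
   return result;
}
```

Keeping an iterator to the tail avoids the push_front followed by reverse() pattern, at the cost of one extra local variable. For the question's file, the count lines could be skipped with a plain std::getline and the two data lines read with two calls to this helper.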

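You seem to imply several requirements:

The values of type T (e.g. int) are to be stored in a linked list - either std::list<T> or std::forward_list<T> (not a list of raw Nodes).
The data should not be copied unnecessarily - i.e. memory blocks should not be reallocated.
The parsing should be parallelizable, although this only makes sense for fast data sources where I/O won't dwarf the CPU time.

The ideas are then:

Use a custom allocator that carves memory into contiguous segments, each able to hold multiple list nodes.
Parse the entire file into linked lists that use the above allocator. The lists allocate memory segments on demand. Every newline starts a new list.
Return the 2nd and 4th lists (i.e. the lists of elements on the 2nd and 4th lines).

It is worth noting that the lines containing the element counts are unnecessary. Of course, that data could be passed to the allocator to pre-allocate enough memory segments, but this rules out parallelization, since parallel parsers don't know where the element counts are - those are only found once the data parsed in parallel has been reconciled. Yes, with a small modification this parsing can be fully parallelized. How cool is that!

Let's start with the simplest part: parsing the file to produce the lists. The example below uses a std::istringstream over an internally generated text view of the dataset, but parse could of course be passed a std::ifstream as well.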

// https://github.com/KubaO/stackoverflown/tree/master/questions/linked-list-allocator-58100610
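// Requires C++17: emplace_back and emplace_front are used for their returned reference.
#include <algorithm>     // std::equal, std::max
#include <cassert>       // assert
#include <cctype>        // isspace
#include <forward_list>
#include <iostream>
#include <memory>        // std::allocator, std::shared_ptr
#include <sstream>
#include <string>
#include <vector>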

using element_type = int;

template <typename allocator> using list_type = std::forward_list<element_type, allocator>;

template <typename allocator>
std::vector<list_type<allocator>> parse(std::istream &in, allocator alloc)
{
   using list_t = list_type<allocator>;
   std::vector<list_t> lists;
   element_type el;
   list_t *list = {};
   do {
      in >> el;
      if (in.good()) {
         if (!list) list = &lists.emplace_back(alloc);
         list->push_front(std::move(el));
      }
      while (in.good()) {
         int c = in.get();
         if (!isspace(c)) {
            in.unget();
            break;
         }
         else if (c=='\n') list = {};
      }
   } while (in.good() && !in.eof());
   for (auto &list : lists) list.reverse();
   return lists;
}

Then, to test it:

const std::vector<std::vector<element_type>> test_data = {
   {6, 18, 5, 20, 48, 2, 97},
   {3, 6, 9, 12, 28, 5, 7, 10}
};

template <typename allocator = std::allocator<element_type>>
void test(const std::string &str, allocator alloc = {})
{
   std::istringstream input{str};
   auto lists = parse(input, alloc);
   assert(lists.size() == 4);
   lists.erase(lists.begin()+2); // remove the 3rd list
   lists.erase(lists.begin()+0); // remove the 1st list
   for (int i = 0; i < test_data.size(); i++)
      assert(std::equal(test_data[i].begin(), test_data[i].end(), lists[i].begin()));
}

std::string generate_input()
{
   std::stringstream s;
   for (auto &data : test_data) {
      s << data.size() << "\n";
      for (const element_type &el : data) s << el << " ";
      s << "\n";
   }
   return s.str();
}

Now, let's look at a custom allocator:

class segment_allocator_base
{
protected:
   static constexpr size_t segment_size = 128;
   using segment = std::vector<char>;
   struct free_node {
      free_node *next;
      free_node() = delete;
      free_node(const free_node &) = delete;
      free_node &operator=(const free_node &) = delete;
      free_node *stepped_by(size_t element_size, int n) const {
         auto *p = const_cast<free_node*>(this);
         return reinterpret_cast<free_node*>(reinterpret_cast<char*>(p) + (n * element_size));
      }
   };
   struct segment_store {
      size_t element_size;
      free_node *free = {};
      explicit segment_store(size_t element_size) : element_size(element_size) {}
      std::forward_list<segment> segments;
   };
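   template <typename T> static constexpr size_t size_for() {
      // Each slot must be able to hold either a T or a free_node, so take the
      // larger size and round it up to a multiple of the stricter alignment.
      constexpr size_t element_align = std::max(alignof(free_node), alignof(T));
      constexpr size_t raw_size = std::max(sizeof(T), sizeof(free_node));
      return (raw_size + element_align - 1) / element_align * element_align;
   }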
   struct pimpl {
      std::vector<segment_store> stores;
      template <typename T> segment_store &store_for() {
         constexpr size_t element_size = size_for<T>();
         for (auto &s : stores)
            if (s.element_size == element_size) return s;
         return stores.emplace_back(element_size);
      }
   };
   std::shared_ptr<pimpl> dp{new pimpl};
};

template<typename T>
class segment_allocator : public segment_allocator_base
{
   segment_store *d = {};
   static constexpr size_t element_size = size_for<T>();
   static free_node *advanced(free_node *p, int n) { return p->stepped_by(element_size, n); }
   static free_node *&advance(free_node *&p, int n) { return (p = advanced(p, n)); }
   void mark_free(free_node *free_start, size_t n)
   {
      auto *p = free_start;
      for (; n; n--) p = (p->next = advanced(p, 1));
      advanced(p, -1)->next = d->free;
      d->free = free_start;
   }
public:
   using value_type = T;
   using pointer = T*;
   template <typename U> struct rebind {
      using other = segment_allocator<U>;
   };
   segment_allocator() : d(&dp->store_for<T>()) {}
   segment_allocator(segment_allocator &&o) = default;
   segment_allocator(const segment_allocator &o) = default;
   segment_allocator &operator=(const segment_allocator &o) {
      dp = o.dp;
      d = o.d;
      return *this;
   }
   template <typename U> segment_allocator(const segment_allocator<U> &o) :
      segment_allocator_base(o), d(&dp->store_for<T>()) {}
   pointer allocate(const size_t n) {
      if (n == 0) return {};
      if (d->free) {
         // look for a sufficiently long contiguous region
         auto **base_ref = &d->free;
         auto *base = *base_ref;
         do {
            auto *p = base;
            for (auto need = n; need; need--) {
               auto *const prev = p;
               auto *const next = prev->next;
               advance(p, 1);
               if (need > 1 && next != p) {
                  base_ref = &(prev->next);
                  base = next;
                  break;
               } else if (need == 1) {
                  *base_ref = next; // remove this region from the free list
                  return reinterpret_cast<pointer>(base);
               }
            }
         } while (base);
      }
      // generate a new segment, guaranteed to contain enough space
      size_t count = std::max(n, segment_size);
      // the segment is a byte buffer, so size it in bytes rather than in elements
      auto &segment = d->segments.emplace_front(count * element_size);
      auto *const start = reinterpret_cast<free_node*>(segment.data());
      if (count > n)
         mark_free(advanced(start, n), count - n);
      else
         d->free = nullptr;
      return reinterpret_cast<pointer>(start);
   }
   void deallocate(pointer ptr, std::size_t n) {
      mark_free(reinterpret_cast<free_node*>(ptr), n);
   }

   using propagate_on_container_copy_assignment = std::true_type;
   using propagate_on_container_move_assignment = std::true_type;
};

For the small amount of test data we have, the allocator allocates a segment only... once!

To test:

int main()
{  
   auto test_input_str = generate_input();
   std::cout << test_input_str << std::endl;
   test(test_input_str);
   test<segment_allocator<element_type>>(test_input_str);
   return 0;
}

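Parallelization would build on the allocator above: start multiple threads and, in each one, invoke parse on its own allocator, with each parser starting at a different point in the file. Once parsing is done, the allocators would have to merge their segment lists so that they compare equal. At that point, the linked lists could be combined using the usual methods. Apart from the thread startup overhead, the parallelization would cost next to nothing, and combining the data afterwards would involve no copying. But I leave that as an exercise for the reader.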

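【Comments】:

【Solution 2】:

There are quite a few things that can be improved.

First, remove the unnecessary code: you are not using iss and iss3. Next, your array_add and array_remove seem to be redundant. Use the vectors directly.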

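If you have a rough idea of how many values will be read on average, reserve space in the vectors to avoid repeated resizing and copying (in fact, your input seems to contain these numbers; use that information instead of throwing it away!). You can also replace your while reading loops with std::copy and std::istream_iterators.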
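A minimal sketch of those two suggestions combined might look like the following. The name parse_line and its expected_count parameter are hypothetical; the count is assumed to be the number read from the preceding line of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Parses the integers on one line into a vector. `expected_count` is the size
// announced on the preceding input line; it is only a hint for reserve(), so a
// wrong value costs performance but not correctness.
// parse_line is a hypothetical helper name.
std::vector<int> parse_line(const std::string &line, std::size_t expected_count)
{
   std::vector<int> values;
   values.reserve(expected_count);   // one allocation instead of repeated growth

   std::istringstream fields(line);
   // Copy until extraction fails, i.e. until the end of the line.
   std::copy(std::istream_iterator<int>(fields), std::istream_iterator<int>(),
             std::back_inserter(values));
   return values;
}
```

With the sample input, parse_line(line, 7) would be called for the second line and parse_line(line, 8) for the fourth.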

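You haven't shown how CnvrtVectoList is implemented, but in general linked lists are not particularly efficient because they lack locality: they scatter data all over the heap. Contiguous containers (= vectors) are almost always more efficient, even when you need to remove elements from the middle. Try using a vector instead and time the performance carefully.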
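For comparison, a vector-only version of the removal step could be sketched as below. It assumes that the goal is to drop every element whose value appears in the removal list; the question does not show deleteNode, so whether it removes one occurrence or all of them is unknown. The name remove_all is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Removes from `values` every element whose value occurs in `to_remove`,
// keeping the remaining elements in their original order.
// remove_all is a hypothetical helper name.
void remove_all(std::vector<int> &values, const std::vector<int> &to_remove)
{
   auto is_unwanted = [&to_remove](int v) {
      return std::find(to_remove.begin(), to_remove.end(), v) != to_remove.end();
   };
   // Erase-remove idiom: compact the kept elements to the front in a single
   // pass, then cut off the leftover tail.
   values.erase(std::remove_if(values.begin(), values.end(), is_unwanted),
                values.end());
}
```

The linear std::find makes this O(n*m) in the two sizes, which is usually fine for short removal lists; the elements are moved at most once each, rather than once per deleted value.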

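Finally, can you sort the values? If so, you can implement the removal of values much more efficiently, either with repeated calls to std::lower_bound or with a single call to std::set_difference.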
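If sorting is acceptable, the std::set_difference variant could be sketched as follows. The name sorted_difference is hypothetical, and the result is in ascending order rather than in input order.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns the elements of `values` left over after removing those listed in
// `to_remove`, in ascending order. Duplicates are matched one for one: a value
// occurring m times in `values` and n times in `to_remove` is kept
// max(m - n, 0) times. Both arguments are taken by value because they are
// sorted locally. sorted_difference is a hypothetical helper name.
std::vector<int> sorted_difference(std::vector<int> values, std::vector<int> to_remove)
{
   std::sort(values.begin(), values.end());
   std::sort(to_remove.begin(), to_remove.end());

   std::vector<int> kept;
   kept.reserve(values.size());   // upper bound on the result size
   std::set_difference(values.begin(), values.end(),
                       to_remove.begin(), to_remove.end(),
                       std::back_inserter(kept));
   return kept;
}
```

After the two sorts, the difference itself is a single linear merge-like pass over both ranges.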

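If (and only if!) the overhead really is in reading the numbers from the file, restructure your IO code and do not read lines separately (that way you avoid a lot of redundant allocations). Instead, scan the input file directly (optionally using a buffer or memory mapping) and manually keep track of how many newline characters you have encountered. You can then use the strtod family of functions to scan the numbers from the input read buffer.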
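One way this scan could look is sketched below, under the assumption that the whole file has already been read into a std::string. It uses std::strtol, the integer counterpart of strtod, and the name scan_lines is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Splits a whole-file text buffer into one vector of integers per line, calling
// std::strtol directly on the buffer instead of getline + istringstream.
// Assumes the buffer holds only decimal integers separated by spaces, tabs,
// carriage returns and newlines; scanning stops at the first other character.
// Empty lines in the middle of the buffer yield empty vectors.
// scan_lines is a hypothetical helper name.
std::vector<std::vector<int>> scan_lines(const std::string &buffer)
{
   std::vector<std::vector<int>> lines(1);
   const char *p = buffer.c_str();   // null-terminated, so strtol cannot overrun
   while (*p != '\0') {
      if (*p == '\n') {              // each newline starts the next list
         lines.emplace_back();
         ++p;
      } else if (*p == ' ' || *p == '\t' || *p == '\r') {
         ++p;                        // skip separators within a line
      } else {
         char *end = nullptr;
         const long value = std::strtol(p, &end, 10);
         if (end == p) break;        // not a number: stop scanning
         lines.back().push_back(static_cast<int>(value));
         p = end;
      }
   }
   if (lines.back().empty()) lines.pop_back();   // entry left by a trailing newline
   return lines;
}
```

The buffer itself could be filled with, for example, std::string buffer((std::istreambuf_iterator<char>(infile)), std::istreambuf_iterator<char>()); the values on the 2nd and 4th lines are then lines[1] and lines[3]. Values outside the range of int are not checked in this sketch.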

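Alternatively, if you can assume that the input is correct, you can avoid reading separate lines by using the information provided in the file: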

// Needs <algorithm>, <iterator> and <vector>. your_list stands for the
// destination container; ptr and deleteNode are the ones from the question.
int add_num;
infile >> add_num;
std::copy_n(std::istream_iterator<int>(infile), add_num, std::inserter(your_list, std::end(your_list)));

int del_num;
infile >> del_num;
std::vector<int> to_delete(del_num);
std::copy_n(std::istream_iterator<int>(infile), del_num, to_delete.begin());
for (auto const n : to_delete) {
    deleteNode(&ptr, n);
}

【Comments】: