How do I build a vector<> of search results from a source vector<>? (Answers)

【Question title】: How do I build a vector<> of search results from a source vector<>?
【Posted】: 2013-03-08 21:12:56
【Question description】:

Consider this example:

std::vector<Student> students;
// populate students from a data source
std::vector<Student> searched(students.size());
auto s = std::copy_if(students.begin(), students.end(), searched.begin(),
    [](const Student &stud) {
        return stud.getFirstName().find("an") != std::string::npos;
    });
searched.resize(std::distance(searched.begin(), s));

I have the following questions:

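1 - Is it OK to allocate as much memory for the searched vector as for the initial vector? There may be 500 objects that are not small, and possibly none of them satisfies the search criteria. Is there another way?
2 - When copying into the searched vector, the copy assignment operator is called and, obviously, a copy is made. What if 400 of those 500 objects satisfy the search criteria? Isn't that just a waste of memory?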

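I am a C++ noob, so I may be saying something stupid. I don't see why anyone would ever use vector<T> where T is an object. I would always use vector<shared_ptr<T>>. If T is a primitive type like int, I guess using vector<T> is fairly straightforward.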

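I chose this example because I think it is very general: you always have to pull some data out of a database, an XML file, or some other source. Would you ever use vector<T> in your data access layer, or vector<shared_ptr<T>>?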

【Question discussion】:

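Honestly, I would use std::back_inserter(searched) as the output iterator for copy_if and drop the initial sizing entirely.
A good response to most of this (especially #2) comes down to the question of why you are making copies in the first place. If possible, avoid copying entirely and use something like a transform_if (not part of the standard library, so it would have to be hand-written) to filter and process the subset, rather than just creating and storing the subset.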

Tags: c++ c++11

【Solution 1】:

Regarding your first question:

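1 - Is it OK to allocate as much memory for the searched vector as for the initial vector? There may be 500 objects that are not small, and possibly none of them satisfies the search criteria. Is there another way?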

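You could use a back-insert iterator, created for the searched vector with the standard function std::back_inserter(). The vector then starts out empty and grows only as matching elements are appended: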

#include <vector>
#include <string>
#include <algorithm>
#include <iterator> // This is the header to include for std::back_inserter()

// Just a dummy definition of your Student class,
// to make this example compile...
struct Student
{
    std::string getFirstName() const { return "hello"; }
};

int main()
{
    std::vector<Student> students;

    std::vector<Student> searched;
    //                   ^^^^^^^^^
    //                   Watch out: no parentheses here, or you will be
    //                   declaring a function accepting no arguments and
    //                   returning a std::vector<Student>

    auto s = std::copy_if(
        students.begin(),
        students.end(),
        std::back_inserter(searched),
    //  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //  Returns an insert iterator
        [] (const Student &stud) 
        { 
            return stud.getFirstName().find("an") != std::string::npos; 
        });
}
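
To see the effect on actual data, here is a minimal sketch of the same idea wrapped in a function. The `Student` struct with a public `firstName` member and the helper name `searchByFirstName` are assumptions made for illustration; they do not come from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical Student type, for illustration only.
struct Student
{
    std::string firstName;
    const std::string& getFirstName() const { return firstName; }
};

// Hypothetical helper: copies every student whose first name contains
// `needle` into a new vector. The result starts empty and grows only
// when a match is appended, so nothing is allocated for non-matches.
std::vector<Student> searchByFirstName(const std::vector<Student>& students,
                                       const std::string& needle)
{
    std::vector<Student> searched;
    std::copy_if(students.begin(), students.end(),
                 std::back_inserter(searched),
                 [&needle](const Student& stud)
                 {
                     return stud.getFirstName().find(needle) != std::string::npos;
                 });
    return searched;
}
```

For a source vector holding students named "Dana", "Brian", and "Joe", searching for "an" yields a vector of two copies ("Dana" and "Brian"). Note that `std::string::find` is case-sensitive, so a name such as "Anna" would not match.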

Regarding your second question:

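2 - When copying into the searched vector, the copy assignment operator is called and, obviously, a copy is made. What if 400 of those 500 objects satisfy the search criteria? Isn't that just a waste of memory?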

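Well, if you have no statistical information about the selectivity of your predicate, there is not much you can do about it. Of course, if your goal is simply to process all the students for which a certain predicate is true, then you should use std::for_each() on the source vector rather than creating a separate vector: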

std::for_each(students.begin(), students.end(), [] (const Student &stud) 
{ 
    if (stud.getFirstName().find("an") != std::string::npos)
    {
        // ...
    }
});

However, whether this approach satisfies your requirements depends on your particular application.
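
If a separate result container is needed but copying the objects is the concern, one possible middle ground (not proposed in the original answer) is to collect non-owning pointers into the source vector. The sketch below assumes a hypothetical `Student` struct and a hypothetical helper named `findByFirstName`.

```cpp
#include <string>
#include <vector>

// Hypothetical Student type, for illustration only.
struct Student
{
    std::string firstName;
    const std::string& getFirstName() const { return firstName; }
};

// Hypothetical helper: collects pointers to the matching students
// without copying any Student object.
// The pointers stay valid only while `students` is alive and is not
// resized or otherwise reallocated.
std::vector<const Student*> findByFirstName(const std::vector<Student>& students,
                                            const std::string& needle)
{
    std::vector<const Student*> matches;
    for (const Student& stud : students)
    {
        if (stud.getFirstName().find(needle) != std::string::npos)
        {
            matches.push_back(&stud); // stores an address, not a copy
        }
    }
    return matches;
}
```

The trade-off is lifetime management: the result only refers to elements of the source vector, so it must not outlive that vector or be used after the vector grows.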

I don't see why anyone would ever use vector<T> where T is an object. I would always use vector<shared_ptr<T>>.

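Whether to use (smart) pointers rather than values depends on whether or not you need reference semantics (apart from possible performance considerations about copying and moving those objects around). From the information you provided, it is not clear whether that is the case, so it may or may not be a good idea.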
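
The difference between the two semantics can be sketched as follows. The `Student` struct and the helper names `filterByValue` and `filterShared` are hypothetical and serve only to contrast the two designs.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical Student type, for illustration only.
struct Student
{
    std::string firstName;
};

// Value semantics: the result holds independent copies, so changing a
// result element leaves the source vector untouched.
std::vector<Student> filterByValue(const std::vector<Student>& students,
                                   const std::string& needle)
{
    std::vector<Student> result;
    for (const Student& s : students)
    {
        if (s.firstName.find(needle) != std::string::npos)
            result.push_back(s); // copies the Student
    }
    return result;
}

// Reference semantics: the result shares the objects with the source,
// so a change made through one vector is visible through the other.
std::vector<std::shared_ptr<Student>> filterShared(
    const std::vector<std::shared_ptr<Student>>& students,
    const std::string& needle)
{
    std::vector<std::shared_ptr<Student>> result;
    for (const auto& s : students)
    {
        if (s && s->firstName.find(needle) != std::string::npos)
            result.push_back(s); // copies the pointer, not the Student
    }
    return result;
}
```

With `filterByValue`, renaming a student in the result does not affect the original. With `filterShared`, the same rename shows up in both vectors, which is only desirable if shared state is actually what the application needs.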

【Discussion】:

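@JackWillson #2 depends entirely on whether you want the results to stay independent of the original container when they are modified. If you need to make changes to them that should not propagate to the objects in the original container, shared pointers are not a good move. If you are fine with modifications being reflected in both vectors, or if you do not intend to modify them at all, then shared pointers are worth considering if memory footprint is a concern. But first make sure it really is a concern. Don't over-optimize unless you know there is a problem.
Thanks, everyone, for the concise explanations.
@JackWillson: I tried to answer the second question as well; I hope the answer makes sense.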

【Solution 2】:

What are you going to do with all of those students?

Just do this:

for(Student& student: students) {
    if(student.firstNameMatches("an")) {
        //.. do something
    }
}
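
The loop above relies on a `firstNameMatches` member that the answer does not define. A minimal sketch of what it might look like follows, assuming a substring match like the one in the question; the class layout and the `countMatches` helper are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Student class, for illustration only.
class Student
{
public:
    explicit Student(std::string firstName) : firstName_(std::move(firstName)) {}

    // Assumed behavior: true if the first name contains `needle`
    // (case-sensitive substring match).
    bool firstNameMatches(const std::string& needle) const
    {
        return firstName_.find(needle) != std::string::npos;
    }

private:
    std::string firstName_;
};

// Processes the matching students in place, with no second vector.
// Counting stands in for the "do something" step.
std::size_t countMatches(const std::vector<Student>& students,
                         const std::string& needle)
{
    std::size_t count = 0;
    for (const Student& student : students)
    {
        if (student.firstNameMatches(needle))
        {
            ++count;
        }
    }
    return count;
}
```

Iterating by const reference avoids copying each element and makes it explicit that the loop does not modify the students.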
