C++: Select a subset of a std::vector, based on predefined element indices

【Question title】: C++: Select a subset of a std::vector, based on predefined element indices
【Posted】: 2012-03-10 22:20:05
【Question】:

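I am looking for an efficient way to trim an existing std::vector, or to copy a subset of it. An element belongs in the subset (is kept) if its index is listed in a separate, predefined std::vector.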

e.g. std::vector<std::string> Test = {"A", "B", "C", "D", "E"};

std::vector<int> SelectionV = {1, 2, 5};

Result = {"A", "B", "E"}

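I will be doing this on a very large vector, and probably on a regular basis, so I am looking for the most efficient method possible.

I am also considering an alternative, but again I am not sure of an efficient way to do it...

The object Test is populated (in my case from a third-party-defined object) in a single pass with an iterator (no direct element access is possible). I was wondering whether it is possible to add to the Test vector only those elements whose running count appears in SelectionV.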

e.g.

int count = 0;

for (it = Iterator.begin(); it != Iterator.end(); ++it, ++count) {
    if (count is a number contained in SelectionV)
        add *it to Test
}

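But I think this would mean a pass through SelectionV on every iteration, which is far less efficient than simply adding all the elements and selecting the subset afterwards.

Any help is much appreciated.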

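【Comments】:

Does SelectionV need to be a vector? Is it static while Test/Result is being populated?
How big is SelectionV compared to Test? (Just a few elements? Nearly all of them?)
No, SelectionV does not need to be a vector. Yes, it is static while Test/Result is being populated. SelectionV may be a large fraction of Test, but that is defined at run time and can be any percentage, although it will certainly contain at least 1000 indices.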

Tags: c++ performance vector sample subset

【Solution 1】:

You can also use the standard library:

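// Size the result up front; its elements are default-constructed (empty strings).
std::vector<std::string> Result(SelectionV.size());

// Capture Test by reference so the vector is not copied; pos is a zero-based index.
std::transform(SelectionV.begin(), SelectionV.end(), Result.begin(), [&Test](size_t pos) { return Test[pos]; });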

【Comments】:

This is a really neat solution!

【Solution 2】:

You can sort your SelectionV vector in ascending order, and then rewrite your for loop as follows:

// Assumes SelectionV is sorted in ascending order and contains no duplicates.
std::size_t index = 0, nextInSelectionV = 0;
for (it = Iterator.begin(); nextInSelectionV < SelectionV.size() && it != Iterator.end(); ++it) {
    if (index == SelectionV[nextInSelectionV]) {
        add *it to Test
        nextInSelectionV++;
    }
    index++;
}
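The loop above can be turned into compilable code along the following lines. This is only a sketch under stated assumptions: the function name `select_in_one_pass` is made up for illustration, indices are taken to be zero-based, and the index list must already be sorted in ascending order. Repeated indices are collapsed into a single copy, and indices past the end of the range are silently ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the original answer): copies from
// [first, last) the elements whose zero-based position is listed in
// `sorted_indices`, visiting every element of the range at most once.
template <typename InputIt>
std::vector<typename std::iterator_traits<InputIt>::value_type>
select_in_one_pass(InputIt first, InputIt last,
                   const std::vector<std::size_t>& sorted_indices)
{
    assert(std::is_sorted(sorted_indices.begin(), sorted_indices.end()));

    std::vector<typename std::iterator_traits<InputIt>::value_type> out;
    out.reserve(sorted_indices.size());

    const std::size_t n = sorted_indices.size();
    std::size_t position = 0;  // position of the element `first` points at
    std::size_t next = 0;      // next entry of sorted_indices to match

    // Stop as soon as the range or the index list is exhausted.
    for (; first != last && next < n; ++first, ++position) {
        if (position == sorted_indices[next]) {
            out.push_back(*first);
            // Step past this index and any repeats of it.
            while (next < n && sorted_indices[next] == position) {
                ++next;
            }
        }
    }
    return out;
}
```

Because only `++` and `*` are applied to the iterator, the same loop should also work when the source can only be walked once, which is the situation described in the question.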

【Comments】:

【Solution 3】:

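It depends on the size of Test, on the size of SelectionV (as a percentage of Test), and on whether the elements of SelectionV repeat. You may be able to optimize by computing the complement of SelectionV (NOT SelectionV) and working with that instead.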
Note that in your example, because SelectionV holds indices rather than values, each lookup is already O(1) (which is already a huge advantage).
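If Test and SelectionV do not change, and if they are large, you could also split SelectionV across n threads, let each thread look up its values in Test independently, and combine the individual outputs afterwards (not unlike map-reduce). The downside may be a loss of CPU cache hit rate.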
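A rough sketch of that threaded split, using std::thread. The function name `parallel_select` and its `thread_count` parameter are assumptions made for illustration, indices are taken to be zero-based, and whether this beats the single-threaded version would have to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the original answer): computes
// out[i] = source[indices[i]], giving each worker thread one contiguous
// chunk of `indices`. T must be default-constructible and copy-assignable.
// Not suitable for T = bool, because std::vector<bool> packs its elements
// and neighbouring writes from different threads would race.
template <typename T>
std::vector<T> parallel_select(const std::vector<T>& source,
                               const std::vector<std::size_t>& indices,
                               std::size_t thread_count)
{
    assert(thread_count > 0);
    std::vector<T> out(indices.size());

    // Chunk size rounded up so that every index is covered.
    const std::size_t chunk = (indices.size() + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < indices.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, indices.size());
        // Each worker writes only to its own slice of `out`, so no locking
        // is needed; `source` and `indices` are only read.
        workers.emplace_back([&source, &indices, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                assert(indices[i] < source.size());
                out[i] = source[indices[i]];
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return out;
}
```

Since every thread fills a fixed slice of the output, the "combine" step is free here: the result is already in the same order as the index list. On some toolchains the program has to be built with thread support enabled (for example `-pthread` with GCC or Clang).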
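On repeated calls, you may want to take the difference between the old SelectionV and the new SelectionV and operate only on that difference. This kind of caching optimization pays off when little changes between iterations.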
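One way the old/new difference could be computed is with std::set_difference, sketched below. The names `SelectionDelta` and `diff_selections` are hypothetical, and the sketch assumes both selections are kept sorted in ascending order; how the delta is then applied to a cached result is left open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper type: the indices that enter and leave the selection.
struct SelectionDelta {
    std::vector<std::size_t> added;    // in new_selection but not in old_selection
    std::vector<std::size_t> removed;  // in old_selection but not in new_selection
};

// Hypothetical helper: both inputs must be sorted in ascending order,
// which std::set_difference requires.
inline SelectionDelta diff_selections(const std::vector<std::size_t>& old_selection,
                                      const std::vector<std::size_t>& new_selection)
{
    assert(std::is_sorted(old_selection.begin(), old_selection.end()));
    assert(std::is_sorted(new_selection.begin(), new_selection.end()));

    SelectionDelta delta;
    std::set_difference(new_selection.begin(), new_selection.end(),
                        old_selection.begin(), old_selection.end(),
                        std::back_inserter(delta.added));
    std::set_difference(old_selection.begin(), old_selection.end(),
                        new_selection.begin(), new_selection.end(),
                        std::back_inserter(delta.removed));
    return delta;
}
```

Each call is linear in the combined length of the two lists, so it is only worthwhile when the resulting delta is much smaller than the full selection.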

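Most importantly, make sure you actually need to optimize this before you spend time on it (or, worse, complicate your code).

Other parts of your application (I/O, for example) are very likely to be many times slower.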

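【Comments】:

1) The relative size of the required subset will vary and is decided at run time.
2) What will happen initially is that SelectionV stays the same while the same subset is selected from many different Tests.
@oracle3001 Have you actually profiled your sample? Setting up SelectionV and Test (especially when reading from disk or a database) may well take longer than the selection algorithm itself.
The objects I am sampling from are a series of mesh objects (defined by the library at OpenMesh.org). Once they are loaded into memory, I repeatedly try to sample subsets of vertices from these objects.

【Solution 4】:

Perhaps the following will be useful to someone in the future: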

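#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Select a single element by (zero-based) index.
template<typename T>
T vector_select(const std::vector<T>& vector, const std::size_t index)
{
  assert(index < vector.size());
  return vector[index];
}

// Function object that maps an index to the corresponding element of a vector.
template<typename T>
class VectorSelector
{
public:
  VectorSelector(const std::vector<T>& v) : _v(&v) { }
  T operator()(const std::size_t index) const { return vector_select(*_v, index); }
private:
  const std::vector<T>* _v;
};

// Select several elements: out[i] == vector[index[i]].
template<typename T>
std::vector<T> vector_select(const std::vector<T>& vector,
                             const std::vector<std::size_t>& index)
{
  // Skip the range check for an empty index list: max_element would return
  // end(), which must not be dereferenced.
  assert(index.empty() ||
         *std::max_element(index.begin(), index.end()) < vector.size());
  std::vector<T> out(index.size());
  std::transform(index.begin(), index.end(), out.begin(),
                 VectorSelector<T>(vector));
  return out;
}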
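The code above returns a copy. The question also mentions trimming the existing vector, so here is a minimal sketch of an in-place variant. The name `keep_only` is hypothetical, and the sketch assumes zero-based indices that are strictly increasing (sorted, no duplicates) and all within range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original answer): keeps only the
// elements of `v` at the given positions and erases everything else,
// without allocating a second vector.
template <typename T>
void keep_only(std::vector<T>& v, const std::vector<std::size_t>& sorted_indices)
{
    // Strictly increasing: no adjacent pair (a, b) with a >= b.
    assert(std::adjacent_find(sorted_indices.begin(), sorted_indices.end(),
                              std::greater_equal<std::size_t>()) == sorted_indices.end());
    assert(sorted_indices.empty() || sorted_indices.back() < v.size());

    std::size_t write = 0;  // next slot to fill at the front of `v`
    for (std::size_t read : sorted_indices) {
        // For strictly increasing indices, read >= write always holds, so no
        // element is overwritten before it has been moved to its new slot.
        if (read != write) {
            v[write] = std::move(v[read]);
        }
        ++write;
    }
    // Drop the leftover tail.
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(write), v.end());
}
```

Elements are moved rather than copied, which may matter when T is expensive to copy; the vector's capacity is left unchanged.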

【Comments】: