Answers: indices of the k largest elements in an unsorted length-n array

【Question title】: Indices of the k largest elements in an unsorted length n array
【Posted】: 2013-01-31 21:51:15
【Question description】:

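I need to find the indices of the k largest elements of an unsorted, length-n array/vector in C++, with k < n.

Implementing it without nth_element() seems to require iterating over the whole array once, populating a list of indices of the largest elements at each step.

Is there anything in the standard C++ library that makes this a one-liner, or some other clever way to implement it myself in just a few lines? In my particular case, k = 3 and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.

It looks like "Mark the top N elements of an unsorted array" is probably the closest posting I can find on SO; the postings there are in Python and PHP.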

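【Question discussion】:

Can the vector be modified? nth_element does a partial sort in place, so it modifies the vector.
The vector can be modified, but the end result needs to be the indices (of the original vector) of the k largest elements.
This is just a selection algorithm. Typically you'd use heap select or quick select. See stackoverflow.com/q/7746648/56778 for a similar question. There is an answer with a good C++ solution (using priority_queue).
By the way, if k = 3 and n = 6, you're probably best off just sorting the array and picking the top 3 items. As you say, efficiency isn't a huge concern, and the difference between O(kn) and O(n) is negligible for numbers this small.

Tags: c++ arrays max indices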
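
For small inputs such as k = 3 and n = 6, the sort-and-take-the-front suggestion above could be sketched as follows. Because the question asks for indices rather than values, this sketch sorts a vector of indices by the values they refer to instead of sorting the data itself. The helper name top_k_by_sorting is hypothetical and does not come from the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest values, largest value first.
// Runs in O(n log n), which is fine for tiny inputs.
std::vector<std::size_t> top_k_by_sorting(const std::vector<double>& v, std::size_t k) {
    k = std::min(k, v.size());
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    // Sort the indices (not the values) so that larger values come first.
    std::sort(idx.begin(), idx.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
    idx.resize(k);  // keep only the first k indices
    return idx;
}
```

The input vector is left untouched, so the returned indices refer to the original order.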

【Solution 1】:

This should be an improved version of @hazelnusse's answer: it runs in O(n log k) instead of O(n log n).

#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
  // Min-heap of (value, index): the smallest of the kept elements is on top.
  typedef std::pair<double, int> Entry;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > q;
  std::size_t k = 5; // number of indices we need
  for (std::size_t i = 0; i < test.size(); ++i) {
    if (q.size() < k) {
      q.push(Entry(test[i], static_cast<int>(i)));
    } else if (q.top().first < test[i]) {
      // test[i] beats the smallest kept element, so it takes its place.
      q.pop();
      q.push(Entry(test[i], static_cast<int>(i)));
    }
  }
  k = q.size(); // fewer than k if the input has fewer than k elements
  std::vector<int> res(k);
  // The heap pops in ascending order, so fill res from the back: largest first.
  for (std::size_t i = 0; i < k; ++i) {
    res[k - i - 1] = q.top().second;
    q.pop();
  }
  for (std::size_t i = 0; i < k; ++i) {
    std::cout << res[i] << std::endl;
  }
}

Output (one index per line):

8
4
1
2
6

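【Discussion】:

【Solution 2】:

Although the following code might not meet the desired complexity constraints, it may be an interesting alternative to the priority queue mentioned above.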

#include <algorithm>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

std::vector<int> largestIndices(const std::vector<double>& values, int k) {
    std::vector<int> ret;

    // Pair every value with its original index.
    std::vector<std::pair<double, int>> q;
    int index = -1;
    std::transform(values.begin(), values.end(), std::back_inserter(q), [&](double val) {return std::make_pair(val, ++index); });
    // Compare by value only, so the heap is a max-heap on the values.
    auto functor = [](const std::pair<double, int>& a, const std::pair<double, int>& b) { return a.first < b.first; };
    std::make_heap(q.begin(), q.end(), functor);
    // Pop the current maximum k times, or until the heap is empty.
    for (int i = 0; i < k && !q.empty(); i++) {
        std::pop_heap(q.begin(), q.end(), functor);
        ret.push_back(q.back().second);
        q.pop_back();
    }

    return ret;
}

int main()
{
    std::vector<double> values = { 7,6,3,4,5,2,1,0 };
    auto ret=largestIndices(values, 4);
    std::copy(ret.begin(), ret.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

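【Discussion】:

【Solution 3】:

You can do this in O(n) time with a single order-statistic computation:

Let r be the k-th order statistic (here, the k-th largest value).
Initialize two empty lists, bigger and equal.
For each index i:
- If array[i] > r, add i to bigger
- If array[i] = r, add i to equal
Discard elements from equal until the sum of the lengths of the two lists is k.
Return the concatenation of the two lists.

Naturally, you only need one list if all items are distinct. If needed, you could combine the two lists into one, although that would make the code more complicated.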
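
The steps above can be sketched roughly as follows. The answer does not say how r is computed; using std::nth_element on a copy of the data is an assumption of this sketch, as is the helper name top_k_indices. The copy keeps the original order intact so that the indices remain meaningful.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: indices of the k largest values; ties for the
// k-th largest value are resolved in favor of the lowest indices.
std::vector<std::size_t> top_k_indices(const std::vector<double>& a, std::size_t k) {
    k = std::min(k, a.size());
    if (k == 0) return {};
    // Assumption: find r with nth_element on a copy (average O(n)).
    std::vector<double> copy(a);
    std::nth_element(copy.begin(), copy.begin() + (k - 1), copy.end(),
                     std::greater<double>());
    const double r = copy[k - 1];  // the k-th largest value

    std::vector<std::size_t> bigger, equal;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > r) bigger.push_back(i);
        else if (a[i] == r) equal.push_back(i);
    }
    // Discard surplus ties so that both lists together hold exactly k indices.
    equal.resize(k - bigger.size());
    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger;
}
```

With the input {5, 1, 9, 5, 7, 5} and k = 3, r is 5, bigger holds indices 2 and 4, and one of the three tied indices (index 0) fills the remaining slot.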

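【Discussion】:

【Solution 4】:

You can use the basis of the quicksort algorithm to do what you need, except that instead of reordering both partitions, you discard the entries that fall outside your desired range.

It's called "quick select", and here is a C++ implementation: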

#include <iostream>
using std::cout;
using std::endl;

int partition(int* input, int p, int r)
{
    int pivot = input[r];

    while ( p < r )
    {
        while ( input[p] < pivot )
            p++;

        while ( input[r] > pivot )
            r--;

        if ( input[p] == input[r] )
            p++;
        else if ( p < r ) {
            int tmp = input[p];
            input[p] = input[r];
            input[r] = tmp;
        }
    }

    return r;
}

// Returns the k-th smallest element (k is 1-based) of input[p..r].
int quick_select(int* input, int p, int r, int k)
{
    if ( p == r ) return input[p];
    int j = partition(input, p, r);
    int length = j - p + 1;
    if ( length == k ) return input[j];
    else if ( k < length ) return quick_select(input, p, j - 1, k);
    else  return quick_select(input, j + 1, r, k - length);
}

int main()
{
    int A1[] = { 100, 400, 300, 500, 200 };
    cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
    int A2[] = { 100, 400, 300, 500, 200 };
    cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
    int A3[] = { 100, 400, 300, 500, 200 };
    cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
    int A4[] = { 100, 400, 300, 500, 200 };
    cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
    int A5[] = { 100, 400, 300, 500, 200 };
    cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}

Output:

1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500

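EDIT

That particular implementation has an O(n) average run time; because of the way the pivot is selected, it shares quicksort's worst-case run time, O(n^2). By optimizing the pivot choice, your worst case also becomes O(n).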

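【Discussion】:

【Solution 5】:

The question contains a partial answer; namely, std::nth_element places the "n-th statistic" at position n, with the property that none of the elements preceding it are greater than it and none of the elements following it are less than it.

Therefore, a single call to std::nth_element is enough to get the k largest elements. The time complexity is O(n) on average, which is theoretically the smallest possible, since you have to visit each element at least once to find the smallest (or, in this case, the k smallest) elements. If you need these k elements in order, you have to sort them, which is O(k log(k)). So, in total, O(n + k log(k)).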

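【Discussion】:

This finds the k largest elements, whereas the OP asked for the indices of the k largest elements.
Well, you are right (looking at the question again). I don't know why I gave this answer in the first place or why people upvoted it. Most probably they misunderstood the question just as I did, and apparently this answer helped them in some way, so I'll leave it as it is.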
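
One way to adapt this answer to the indices requirement raised in the comment is to run std::nth_element over a vector of indices, comparing the values those indices point to. This also leaves the original vector unmodified. A minimal sketch, in which the helper name top_k_nth_element and its sorted flag are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest values.
// With sorted == true the indices are also ordered from largest to smallest value.
std::vector<std::size_t> top_k_nth_element(const std::vector<double>& v,
                                           std::size_t k, bool sorted = false) {
    k = std::min(k, v.size());
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    auto by_value_desc = [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; };
    // Afterwards the first k indices refer to the k largest values, in no particular order.
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(), by_value_desc);
    idx.resize(k);
    // Optional O(k log k) step if the caller wants them ordered.
    if (sorted) std::sort(idx.begin(), idx.end(), by_value_desc);
    return idx;
}
```

The selection step is O(n) on average and the optional sort adds O(k log k), matching the bound stated in the answer.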

【Solution 6】:

Here is my implementation, which does what I need and which I think is reasonably efficient:

#include <cstddef>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
  // Max-heap of (value, index) pairs: the largest value is always on top.
  std::priority_queue<std::pair<double, int>> q;
  for (std::size_t i = 0; i < test.size(); ++i) {
    q.push(std::pair<double, int>(test[i], static_cast<int>(i)));
  }
  int k = 3; // number of indices we need
  // Pop the k largest entries and report their original indices.
  for (int i = 0; i < k; ++i) {
    int ki = q.top().second;
    std::cout << "index[" << i << "] = " << ki << std::endl;
    q.pop();
  }
}

which gives the output:

index[0] = 3
index[1] = 1
index[2] = 0

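【Discussion】:

I timed an implementation using nth_element and one using partial_sort with a custom comparator... your implementation was faster.
There's no need to add all of the items to the priority queue. That makes the algorithm O(n log n). It can be done in O(n log k) if you don't add items that are smaller than the smallest item already in the queue. See stackoverflow.com/q/7746648/56778 for a discussion.
@JimMischel Maybe I'm missing something, but as far as I can tell, if I only add elements that are larger than the smallest element in the queue, I may end up missing some of the top-k elements. For example, if the first element I add to the priority queue is the largest element, it is at the same time the smallest element in the queue, and that would cause the algorithm not to add any other element.
@BananaCode: If you look at the linked answer, you'll see that you initially populate the priority queue with the first k elements. Then you apply the only-add-if-larger-than-smallest rule to the remaining elements.
What if there is a tie for the k-th largest element? It would be nice to have an extension of your method that handles this.

【Solution 7】:

The standard library won't give you a list of indices (it is designed to avoid passing around redundant data). However, if you're interested in the n largest elements, use some kind of partitioning (both std::partition and std::nth_element are O(n)):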

#include <iostream>
#include <algorithm>
#include <vector>

struct Pred {
    explicit Pred(int nth) : nth(nth) {}
    bool operator()(int k) const { return k >= nth; }
    int nth;
};

int main() {

    int n = 4;
    std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};

    // Puts the n-th largest element at position end - n; every element after it is >= it.
    std::nth_element(v.begin(), v.end() - n, v.end());

    // Reorders the range so that all elements >= the n-th largest come first.
    std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));

    for (auto it = v.begin(); it != v.end(); ++it)
        std::cout << *it << " ";
    std::cout << "\n";

    return 0;
}

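【Discussion】:

I specifically need the indices.
@hazelnusse You could define a struct type for your elements that stores both the value and the original index, and define a comparator for it.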
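
The suggestion in the last comment might look roughly like the following sketch. The struct Indexed, the comparator greater_value, and the helper top_n_indices are all hypothetical names chosen for illustration; the comparator is passed to std::nth_element as in the answer above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical element type: a value together with its position in the input.
struct Indexed {
    int value;
    std::size_t index;
};

// Comparator that orders elements so that larger values come first.
inline bool greater_value(const Indexed& a, const Indexed& b) {
    return a.value > b.value;
}

// Hypothetical helper: original indices of the n largest values, in no particular order.
std::vector<std::size_t> top_n_indices(const std::vector<int>& v, std::size_t n) {
    n = std::min(n, v.size());
    std::vector<Indexed> items;
    items.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) items.push_back(Indexed{v[i], i});
    // Afterwards the first n items are the n largest values (unordered).
    std::nth_element(items.begin(), items.begin() + n, items.end(), greater_value);
    std::vector<std::size_t> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) result.push_back(items[i].index);
    return result;
}
```

Because each element carries its own index, the reordering done by std::nth_element no longer loses the original positions.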