Find the cluster with the highest number of elements using kmeans? Answers

【Question title】: Find the cluster with the highest number of elements using kmeans?
【Posted】: 2013-07-17 17:11:34
【Question】:

I use the kmeans function to cluster 8-D vectors into a set of clusters:

 kmeans(Vectors, clusterCount, labels, TermCriteria(CV_TERMCRIT_EPS+CV_TERMCRIT_ITER, 100, 2), 10, KMEANS_PP_CENTERS, centers);

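For me, the most successful cluster is the one that contains the most vectors. So my question is: how do I find the most populated cluster? The labels parameter holds, for each vector, the index of the cluster it belongs to, and I suspect that counting frequencies from it will be time-consuming. Can anyone suggest an idea?

Currently I do it the straightforward way: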

int max = -1;
int index = -1;
vector<int> classes;
classes.resize(clusterCount);
for (int i=0;i<labels.rows;i++)
{
  int idx = labels.at<int>(i,0);
  classes[idx]++;
  if (classes[idx] > max)
  {
    max = classes[idx];
    index = idx;
  }
}

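Is there a faster solution than this?

【Comments】:

How many samples do you have? Even with millions of samples, finding the most frequent label should not take much time.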

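Tags: c++ performance opencv cluster-analysis k-means

【Solution 1】:

I was looking for the same thing and have not found anything substantially different (yet), but you can speed up your code:

Don't update your maximum on every iteration
Avoid intermediate variables (such as your int idx)

Here is my code: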

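// Requires <vector>. One zero-initialized counter per cluster.
std::vector<int> classes(clusterCount, 0);
// Assumes labels is a continuous single column of ints, as allocated by kmeans.
const int* labels_ptr = labels.ptr<int>(0);
// Pass 1: count the members of each cluster.
for (int i = 0; i < labels.rows; ++i)
    classes[*labels_ptr++]++;
// Pass 2: pick the largest counter.
int max = -1;
int index = -1;
for (int i = 0; i < clusterCount; ++i)
{
    if (classes[i] > max)
    {
        max = classes[i];
        index = i;
    }
}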

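This code gives the same result as yours and, on my machine (Intel Core i7), runs about 5 times faster than the code you posted (tested over 1000 runs on different images).

【Comments】:

@dervish: If my answer helped you, please consider accepting it.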
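
For reference, the same two-pass idea (count first, then look for the maximum once) can be sketched without any OpenCV dependency. The helper name largestCluster and its signature are made up for this illustration and are not part of OpenCV; it only assumes the labels are available as a plain array of ints in the range [0, clusterCount).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): returns the index of the
// cluster with the most members, or -1 if there is nothing to count.
// `labels` points to `n` cluster indices, each expected in [0, clusterCount).
int largestCluster(const int* labels, std::size_t n, int clusterCount)
{
    if (clusterCount <= 0 || n == 0)
        return -1;

    // Pass 1: histogram of labels, one counter per cluster.
    std::vector<int> counts(clusterCount, 0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(labels[i] >= 0 && labels[i] < clusterCount);
        ++counts[labels[i]];
    }

    // Pass 2: scan the counters once; ties resolve to the lowest index.
    return static_cast<int>(
        std::max_element(counts.begin(), counts.end()) - counts.begin());
}
```

With OpenCV, and assuming labels is the continuous single-column integer matrix that kmeans fills in, the call would look like largestCluster(labels.ptr<int>(0), labels.rows, clusterCount). The work is one pass over the N labels plus one pass over the K counters, so any speed difference from the single-loop version comes from a simpler inner loop, not from a different algorithm.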