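Comparing int arrays with high performance: answers

【Question title】: Compare arrays of int in high performance
【Posted】: 2011-09-07 12:19:00
【Question description】:

I can't remember, from my university days, the method for comparing two unsorted int arrays and finding the number of matches. Each value is unique within its own array, and both arrays are the same size.

For example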

int[] a1 = new[] {1, 2, 4, 5, 0};
int[] a2 = new[] {2, 4, 11, -6, 7};

int numOfMatches = FindMatchesInPerformanceOfNLogN(a1,a2);

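Does anyone remember?

【Discussion】:

Is there an upper bound on the values in the arrays, and can the same value occur more than once within an array? For unbounded values, IMHO there is no better solution than sorting both arrays (2 sorts = O(n log n)) and then comparing (= O(n)), which gives O(n log n) overall.
One sort (n log n) and then a binary search for each element (n elements * log n)?
I upvoted some of the answers here to make up for whoever downvoted all of them for (in my view) no reason.
May I ask how you do the comparison in n*log(n)?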
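
The "sort both, then compare" idea from the first comment could look roughly like the sketch below. The class and method names (`MergeMatchCounter`, `CountMatches`) are made up for illustration, and the code assumes, as the question states, that each value is unique within its own array.

```csharp
using System;

public static class MergeMatchCounter
{
    // Hypothetical helper: sorts copies of both arrays, then walks them
    // in lockstep. Sorting is O(n log n); the walk itself is O(n).
    public static int CountMatches(int[] a1, int[] a2)
    {
        int[] s1 = (int[])a1.Clone();
        int[] s2 = (int[])a2.Clone();
        Array.Sort(s1);
        Array.Sort(s2);

        int i = 0, j = 0, matches = 0;
        while (i < s1.Length && j < s2.Length)
        {
            if (s1[i] < s2[j])
            {
                i++;            // s1[i] cannot appear later in s2
            }
            else if (s1[i] > s2[j])
            {
                j++;            // s2[j] cannot appear later in s1
            }
            else
            {
                matches++;      // same value in both arrays
                i++;
                j++;
            }
        }
        return matches;
    }
}
```

With the arrays from the question, `MergeMatchCounter.CountMatches(a1, a2)` counts the shared values 2 and 4.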

【Tags】: c# arrays algorithm data-structures

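【Solution 1】:

If you can store the contents of one of the arrays in a HashMap, you can check whether each element of the other array is present by looking it up in the HashMap. This is O(n) on average.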
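
A minimal sketch of this idea in C#, assuming `HashSet<int>` as the stand-in for the HashMap mentioned above (only membership is needed, not key/value pairs). The names `HashMatchCounter` and `CountMatches` are illustrative, not from the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class HashMatchCounter
{
    // Hypothetical helper: counts values present in both arrays, assuming
    // each value is unique within its own array (as the question states).
    public static int CountMatches(int[] a1, int[] a2)
    {
        // Build the lookup set from one array: O(n) expected time.
        var seen = new HashSet<int>(a1);

        int matches = 0;
        foreach (int value in a2)
        {
            // Each lookup is O(1) on average.
            if (seen.Contains(value))
            {
                matches++;
            }
        }
        return matches;
    }
}
```

The trade-off is O(n) extra memory for the set in exchange for avoiding any sorting.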

【Discussion】:

If you are willing to tolerate a small probability of false positives, you can also use a Bloom filter. petarv.blogspot.com/2007/06/bloom-filter.html

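【Solution 2】:

One of the arrays has to be sorted so that you can do the comparison in n*log(n). That is, for each item in the unsorted array (n), you perform a binary search on the sorted array (log(n)). If both are unsorted, I don't see a way to do the comparison in n*log(n).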
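
As a rough sketch of the approach described above, the following C# sorts a copy of one array and binary-searches it for each element of the other. It relies on the standard `Array.Sort` and `Array.BinarySearch`; the class and method names are hypothetical.

```csharp
using System;

public static class BinarySearchMatchCounter
{
    // Hypothetical helper: sort one array, then binary-search it for
    // every element of the other array.
    public static int CountMatches(int[] a1, int[] a2)
    {
        // Sort a copy so the caller's array is left untouched: O(n log n).
        int[] sorted = (int[])a1.Clone();
        Array.Sort(sorted);

        int matches = 0;
        foreach (int value in a2)
        {
            // Array.BinarySearch returns a non-negative index when the
            // value is found: O(log n) per lookup.
            if (Array.BinarySearch(sorted, value) >= 0)
            {
                matches++;
            }
        }
        return matches;
    }
}
```

If the arrays differ in size, sorting the smaller one keeps the log factor small.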

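【Discussion】:

Actually it's m*log(n), where m = the size of the unsorted array and n = the size of the sorted array. If your sorted array is very small compared to the unsorted one, the performance again approaches O(n).
Isn't that n log(n) for the sort plus m log(n) for the searches?
If you do the sort as well, yes. But big O doesn't care about constants (stackoverflow.com/questions/22188851/…).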

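【Solution 3】:

How about this:

Concatenate the two arrays
Quicksort the result
Walk from array[1] to array[array.length - 1] and check array[i] against array[i-1]

If they are the same, you have a duplicate. This should also be O(n*log(n)), and it doesn't need a binary search for each check.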
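
The three steps above might be sketched in C# as follows. Note two assumptions: `Array.Sort` is used in place of a hand-written quicksort, and the adjacent-pair check only counts cross-array matches because each value is unique within its own array. The names are illustrative.

```csharp
using System;

public static class AdjacentMatchCounter
{
    // Hypothetical helper: concatenate, sort, then count equal neighbours.
    public static int CountMatches(int[] a1, int[] a2)
    {
        // Step 1: concatenate the two arrays.
        int[] combined = new int[a1.Length + a2.Length];
        Array.Copy(a1, 0, combined, 0, a1.Length);
        Array.Copy(a2, 0, combined, a1.Length, a2.Length);

        // Step 2: sort the result (Array.Sort stands in for quicksort).
        Array.Sort(combined);

        // Step 3: equal neighbours can only come from different arrays,
        // given the uniqueness assumption, so each pair is one match.
        int matches = 0;
        for (int i = 1; i < combined.Length; i++)
        {
            if (combined[i] == combined[i - 1])
            {
                matches++;
            }
        }
        return matches;
    }
}
```

If either input could contain repeated values, this count would be inflated, so the uniqueness guarantee from the question matters here.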

【Discussion】:

【Solution 4】:

You can use LINQ:

var a1 = new int[5] {1, 2, 4, 5, 0};
var a2 = new int[5] {2, 4, 11, -6, 7};
var matches = a1.Intersect(a2).Count();

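I'm not sure whether you're asking for a straightforward way or for the fastest/best way...

【Discussion】:

【Solution 5】:

There are two methods that I know of (reference: http://www2.cs.siu.edu/~mengxia/Courses%20PPT/220/carrano_ppt08.ppt):

Recursion (pseudocode)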

Algorithm to search a[first] through a[last] for desiredItem
if (there are no elements to search)
    return false
else if (desiredItem equals a[first])
    return true
else    
    return the result of searching a[first+1] through a[last]

Efficiency

May be O(log n) though I have not tried it.

Sequential search (pseudocode)

public boolean contains(Object anEntry)
{
    boolean found = false;
    for (int index = 0; !found && (index < length); index++) {
        if (anEntry.equals(entry[index]))
            found = true;
    }
    return found;
}

Efficiency of sequential search

Best case  O(1)
    Locate desired item first
Worst case  O(n)
    Must look at all the items
Average case O(n)
    Must look at half the items 
    O(n/2) is still O(n)

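Unless the data is sorted, I don't know of an O(log n) search algorithm.

【Discussion】:

The first algorithm is a recursive implementation of a simple for loop over all the items, so it has the same complexity: O(N). (By the way, I'm not the downvoter.)
@Don: Thanks for the comment.
By the way, if you use it for the comparison, which is what the original post is after, it's O(N^2). That's the worst you can do.

【Solution 6】:

I don't know whether this is the fastest way, but you can do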

int[] a1 = new []{1,2,4,5,0};
int[] a2 = new []{2,4,11,-6,7};
var result = a1.Intersect(a2).Count();

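It would be worth benchmarking this against other approaches that are optimized for int, because Intersect() operates on IEnumerable.

【Discussion】:

There seems to be a lot of random voting without comments. All of these answers look reasonable to me.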

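【Solution 7】:

This problem also lends itself to parallelization: spawn n1 threads and have each one compare one element of a1 against the n2 elements of a2, then sum the results. Probably slower, but worth considering: spawn n1 * n2 threads to do all the comparisons at once, then reduce. With P (presumably the number of processors) satisfying P >> max(n1, n2) in the first case and P >> n1 * n2 in the second, you can do the whole thing in O(n) in the first case and O(log n) in the second.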
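
A loose sketch of the first scheme in C#, assuming PLINQ (`AsParallel`) as the parallel runtime. PLINQ partitions the elements of a1 across thread-pool workers rather than spawning one thread per element, so this only approximates the n1-thread idea; the names are illustrative.

```csharp
using System;
using System.Linq;

public static class ParallelMatchCounter
{
    // Hypothetical helper: each element of a1 is scanned against all of
    // a2 (O(n2) per element), with the per-element checks run in parallel
    // and the hits summed by Count.
    public static int CountMatches(int[] a1, int[] a2)
    {
        return a1
            .AsParallel()
            .Count(x => Array.IndexOf(a2, x) >= 0);
    }
}
```

The total work is still O(n1 * n2), so for small arrays the hash-based or sort-based approaches will likely be faster on ordinary hardware.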

【Discussion】: