Performance of Find() vs. FirstOrDefault() [duplicate]

【Question title】: Performance of Find() vs. FirstOrDefault() [duplicate]
【Posted】: 2012-12-11 13:07:04
【Question】:

Similar question:
Find() vs. Where().FirstOrDefault()

I got an interesting result when searching for Diana in a large sequence of a simple reference type that has a single string property.

using System;
using System.Collections.Generic;
using System.Diagnostics; // Stopwatch
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        Stopwatch watch = new Stopwatch();
        const string diana = "Diana";

        // Repeat the measurement on every key press until Escape is pressed.
        while (Console.ReadKey().Key != ConsoleKey.Escape)
        {
            //Armour with 1000k++ customers. Wow, should be a product with a great success! :)
            var customers = (from i in Enumerable.Range(0, 1000000)
                             select new Customer
                             {
                                 Name = Guid.NewGuid().ToString()
                             }).ToList();

            customers.Insert(999000, new Customer { Name = diana }); // Putting Diana at the end :)

            //1. System.Linq.Enumerable.FirstOrDefault()
            watch.Restart();
            customers.FirstOrDefault(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", watch.ElapsedMilliseconds);

            //2. System.Collections.Generic.List<T>.Find()
            watch.Restart();
            customers.Find(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", watch.ElapsedMilliseconds);
        }
    }
}

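Is this because there is no enumerator overhead in List.Find(), or is it that plus something else?

Find() runs almost twice as fast. I hope the .NET team will not mark it obsolete in the future.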

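【Discussion】:

Try timing Find() before FirstOrDefault(). What are the results then?
@Oded Did that. Exactly the same. I also ran FirstOrDefault twice in a row, but it was still the same 23-24 ms (on my iCore5). It looks like there is no caching.
Interesting. Does the performance scale linearly with the list size (is FirstOrDefault always twice as slow for other list sizes, or is there a fixed 10 ms cost to using LINQ)?
On Mono the difference is even bigger: Diana was found in 30 ms with System.Collections.Generic.List<T>.Find(). Diana was found in 176 ms with System.Linq.Enumerable.FirstOrDefault().
FirstOrDefault makes three indirect calls per item; Find makes one.

Tags: c# .net performance linq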

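【Solution 1】:

I was able to reproduce your results, so I decompiled your program, and there is a difference between Find and FirstOrDefault.

First, here is the decompiled program. I made your data object an anonymous type just so it would compile.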

    List<<>f__AnonymousType0<string>> source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
    {
      var local_0 = new
      {
        Name = Guid.NewGuid().ToString()
      };
      return local_0;
    }));
    source.Insert(999000, new
    {
      Name = diana
    });
    stopwatch.Restart();
    Enumerable.FirstOrDefault(source, c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
    stopwatch.Restart();
    source.Find(c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);

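The key thing to notice here is that FirstOrDefault is called on Enumerable, whereas Find is called as a method on the source list.

So what does Find do? Here is the decompiled Find method: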

private T[] _items;

[__DynamicallyInvokable]
public T Find(Predicate<T> match)
{
  if (match == null)
    ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
  for (int index = 0; index < this._size; ++index)
  {
    if (match(this._items[index]))
      return this._items[index];
  }
  return default (T);
}

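So it iterates over an array of items, which makes sense, since a list is a wrapper around an array.

FirstOrDefault, on the Enumerable class, instead uses foreach to iterate over the items. That obtains an enumerator for the list and calls MoveNext on it. I think what you are seeing is the overhead of the enumerator.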

[__DynamicallyInvokable]
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
  if (source == null)
    throw Error.ArgumentNull("source");
  if (predicate == null)
    throw Error.ArgumentNull("predicate");
  foreach (TSource source1 in source)
  {
    if (predicate(source1))
      return source1;
  }
  return default (TSource);
}

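foreach is just syntactic sugar for using the enumerable pattern. Look at this image:

[Image: dotPeek navigation targets for the foreach keyword]

I clicked on foreach to see what it does, and you can see that dotPeek wants to take me to the enumerator/Current/MoveNext implementations, which makes sense.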

Other than that, they are basically the same (testing the passed-in predicate to see whether an item is the one you want).
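To make the enumerator overhead concrete, here is a minimal sketch of roughly what the foreach inside FirstOrDefault expands to when the source is only known as IEnumerable<TSource>. The class name ForeachSketch and the method FirstOrDefaultExpanded are hypothetical names used for illustration; this is not the framework's actual source.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachSketch
{
    // Hypothetical helper: a hand-expanded version of the foreach loop
    // shown in the decompiled FirstOrDefault. Not framework code.
    public static T FirstOrDefaultExpanded<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        // Once per call: obtain the enumerator through the interface.
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            // Per item: MoveNext() and Current go through the interface,
            // and predicate(...) is a delegate invocation.
            while (enumerator.MoveNext())
            {
                T item = enumerator.Current;
                if (predicate(item))
                    return item;
            }
        }
        return default(T);
    }

    public static void Main()
    {
        var names = new List<string> { "Alice", "Bob", "Diana" };
        Console.WriteLine(FirstOrDefaultExpanded(names, n => n == "Diana")); // prints "Diana"
        Console.WriteLine(names.Find(n => n == "Diana"));                    // prints "Diana"
    }
}
```

In this sketch each item costs a MoveNext() call, a Current access, and a predicate invocation, whereas the decompiled Find above only invokes the predicate on an array element. That is consistent with the "three indirect calls versus one" comment under the question.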

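【Discussion】:

Now it is 100% clear what the only difference between them is. I was expecting to see something else, something harder to identify. It is always interesting to see what happens under the hood of the .NET Framework. Thanks!
Praying for the day C# gets higher-kinded types.
To help clarify the performance difference: Find() on a list does not use LINQ. See @Chris Sinclair's answer.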

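【Solution 2】:

I'm betting that FirstOrDefault runs through the IEnumerable implementation, that is, it uses a standard foreach loop to do the checking. List<T>.Find() is not part of LINQ (http://msdn.microsoft.com/en-us/library/x0b5b5bc.aspx), and probably uses a standard for loop from 0 to Count (or another fast internal mechanism, probably operating directly on its internal/wrapped array). By avoiding the overhead of enumeration (and of the version checks that ensure the list has not been modified), the Find method is faster.

If you add a third test: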
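The version check mentioned above can be observed directly. The following is a small sketch, with a made-up list of integers, showing that enumerating a List<T> with foreach detects modification while an index-based for loop does not perform that check.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckSketch
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };

        try
        {
            foreach (int n in numbers)
            {
                if (n == 1)
                    numbers.Add(4); // modifies the list while it is being enumerated
            }
        }
        catch (InvalidOperationException ex)
        {
            // The enumerator notices the change on its next MoveNext() call.
            Console.WriteLine("foreach detected the modification: " + ex.Message);
        }

        // An index-based loop has no such check; it simply sees the new Count.
        for (int i = 0; i < numbers.Count; i++)
        {
            if (numbers[i] == 1)
                numbers.Add(5);
        }
        Console.WriteLine("Count after the for loop: " + numbers.Count);
    }
}
```

The check is cheap per item, but it is extra work that the for-based loop inside Find does not do on every step.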

//3. System.Collections.Generic.List<T> foreach
Func<Customer, bool> dianaCheck = c => c.Name == diana;
watch.Restart();
foreach(var c in customers)
{
    if (dianaCheck(c))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T> foreach.", watch.ElapsedMilliseconds);

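It runs at about the same speed as the first one (25 ms vs. 27 ms for FirstOrDefault).

EDIT: If I add an array loop, it gets very close to the speed of Find(), and given @devshorts' look at the source code, I think this is the explanation: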

//4. Array for loop
var customersArray = customers.ToArray();
watch.Restart();
int customersCount = customersArray.Length;
for (int i = 0; i < customersCount; i++)
{
    // Index the array itself (not the list) so the loop measures raw array access.
    if (dianaCheck(customersArray[i]))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with an array for loop.", watch.ElapsedMilliseconds);

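This is only 5.5% slower than the Find() method.

So, bottom line: looping through array elements is faster than dealing with foreach iteration overhead. (But both have their pros and cons, so just choose what makes sense logically for your code. Also, a small difference in speed will only rarely cause a problem, so just use what makes sense for maintainability/readability.)

【Discussion】: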

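Nice comparison with foreach and for. I have always been a fan of maintainability/readability over fighting for every nanosecond of performance :) But for really large sequences (especially when the code runs on a server), optimizing lookup performance is a common task, and having the option of code that is twice as fast makes me decide to stick with List.Find() for heavy operations.
@Arman I basically agree. In this case you would need 10 million entries to get a really perceptible performance hit (a few hundred milliseconds). Really, at that point you should probably drop this O(n) iteration in favor of some O(1) lookup, such as a dictionary keyed by name.
@ChrisSinclar Or even better, an O(sorry) algorithm :) I was more surprised by the other comment saying it takes 176 ms on Mono, and that is for the simplest possible single-property class. What happens with even 10k real customers on a server with 1000 concurrent clients (we often deal with similar scenarios)? That is the cost of LINQ, lambdas, delegates, iterators, enumerables, reflection and the other idioms that make life easier in C#.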
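The dictionary-based lookup suggested in the comments above could be sketched roughly as follows. It assumes that customer names are unique; the list size and the class name DictionaryLookupSketch are illustrative choices, not something from the original thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

public static class DictionaryLookupSketch
{
    public static void Main()
    {
        List<Customer> customers = Enumerable.Range(0, 1000)
            .Select(i => new Customer { Name = Guid.NewGuid().ToString() })
            .ToList();
        customers.Add(new Customer { Name = "Diana" });

        // Building the index is itself O(n), so it pays off only when
        // many lookups follow. Assumption: names are unique; ToDictionary
        // throws ArgumentException on a duplicate key.
        Dictionary<string, Customer> byName = customers.ToDictionary(c => c.Name);

        // Each lookup is now a hash lookup instead of a linear scan.
        Customer found;
        if (byName.TryGetValue("Diana", out found))
            Console.WriteLine("Found " + found.Name);
    }
}
```

If names can repeat, a lookup structure that groups customers per name (for example via ToLookup) would be needed instead.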