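[Posted]: 2011-01-08 06:53:50
[Question]:
I'm trying to learn more about LINQ by implementing Peter Norvig's spelling corrector in C#.
The first part involves taking a large file of words (about 1 million) and putting it into a dictionary, where the key is the word and the value is the number of occurrences.
I would normally do it like this: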
foreach (var word in allWords)
{
    if (wordCount.ContainsKey(word))
        wordCount[word]++;
    else
        wordCount.Add(word, 1);
}
where allWords is an IEnumerable<string>.
In LINQ, I'm currently doing it like this:
var wordCountLINQ = (from word in allWordsLINQ
                     group word by word
                     into groups
                     select groups).ToDictionary(g => g.Key, g => g.Count());
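I compared the two dictionaries by looking at every <key, value> pair, and they are identical, so both approaches produce the same results.
The foreach loop takes 3.82 secs and the LINQ query takes 4.49 secs.
I'm timing it with the Stopwatch class and running in RELEASE mode. I don't think the performance is bad; I'm just wondering whether there is a reason for the difference.
Am I writing the LINQ query inefficiently, or am I missing something?
Update: here is the full benchmark code sample: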
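A check like the dictionary comparison described above could look like the following sketch. It is only an illustration under stated assumptions: the names WordCountCheck, CountWithLoop, CountWithLinq and SameCounts are hypothetical and do not appear in the benchmark, and the small in-memory word array stands in for the real file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, not part of the original benchmark.
public static class WordCountCheck
{
    // Counts words with an explicit loop, mirroring the foreach version.
    public static Dictionary<string, int> CountWithLoop(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            if (counts.ContainsKey(word))
                counts[word]++;
            else
                counts.Add(word, 1);
        }
        return counts;
    }

    // Counts words with GroupBy + ToDictionary, mirroring the LINQ version.
    public static Dictionary<string, int> CountWithLinq(IEnumerable<string> words)
    {
        return words.GroupBy(w => w).ToDictionary(g => g.Key, g => g.Count());
    }

    // Returns true when both dictionaries hold exactly the same <key, value> pairs.
    public static bool SameCounts(Dictionary<string, int> a, Dictionary<string, int> b)
    {
        if (a.Count != b.Count)
            return false;
        foreach (var pair in a)
        {
            int other;
            if (!b.TryGetValue(pair.Key, out other) || other != pair.Value)
                return false;
        }
        return true;
    }

    public static void Main()
    {
        // Assumed sample data standing in for the words read from the file.
        var words = new[] { "the", "cat", "the", "hat", "cat", "the" };
        Console.WriteLine(SameCounts(CountWithLoop(words), CountWithLinq(words)));
    }
}
```

Checking the counts first and then looking up every pair of one dictionary in the other is enough to establish equality, because two dictionaries of the same size cannot differ if every key of the first maps to the same value in the second.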
// Requires: System, System.Collections.Generic, System.Diagnostics, System.IO,
// System.Linq, System.Text.RegularExpressions
public static void TestCode()
{
    // File can be downloaded from http://norvig.com/big.txt and consists of about a million words.
    const string fileName = @"path_to_file";
    var allWords = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                   select m.Value;

    var wordCount = new Dictionary<string, int>();
    var timer = new Stopwatch();
    timer.Start();
    foreach (var word in allWords)
    {
        if (wordCount.ContainsKey(word))
            wordCount[word]++;
        else
            wordCount.Add(word, 1);
    }
    timer.Stop();
    Console.WriteLine("foreach loop took {0:0.00} ms ({1:0.00} secs)\n",
        timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);

    // Make LINQ use a different enumerable (with exactly the same values);
    // if you don't, it suddenly becomes way faster, which I assume is a caching thing??
    var allWordsLINQ = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                       select m.Value;

    timer.Reset();
    timer.Start();
    var wordCountLINQ = (from word in allWordsLINQ
                         group word by word
                         into groups
                         select groups).ToDictionary(g => g.Key, g => g.Count());
    timer.Stop();
    Console.WriteLine("LINQ took {0:0.00} ms ({1:0.00} secs)\n",
        timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);
}
[Comments]:
-
Unless you post your benchmark code, it's impossible to comment on the difference.
-
I've just added it to the question for you.
-
Thanks for sharing the link to Peter Norvig's spelling corrector.
Tags: .net linq performance foreach