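【Posted】: 2016-09-20 22:26:56
【Question】:
The following C# program runs two variants of a benchmark. It was compiled with Visual Studio 2015 Update 2 in x64 Release mode and run on a Broadwell CPU under Windows 8.1. Both variants do the same thing: they total up the five million integers in an array.
The difference between the two benchmarks is that one version keeps the running total (a single long) on the stack, while the other keeps it on the heap. Neither version allocates anything; the total is simply added to while scanning along the array.
In testing, I see a consistent and significant performance difference between the variant with the total on the heap and the variant with the total on the stack. For some test sizes, the version with the total on the heap is three times slower.
Why is there such a performance difference between the two memory locations for the total?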
using System;
using System.Diagnostics;

namespace StackHeap
{
    class StackvHeap
    {
        static void Main(string[] args)
        {
            double stackAvgms, heapAvgms;

            // Warmup
            runBenchmark(out stackAvgms, out heapAvgms);

            // Run
            runBenchmark(out stackAvgms, out heapAvgms);

            Console.WriteLine($"Stack avg: {stackAvgms} ms\nHeap avg: {heapAvgms} ms");
        }

        private static void runBenchmark(out double stackAvgms, out double heapAvgms)
        {
            Benchmarker b = new Benchmarker();

            long stackTotalms = 0;
            int trials = 100;
            for (int i = 0; i < trials; ++i)
            {
                stackTotalms += b.stackTotaler();
            }

            long heapTotalms = 0;
            for (int i = 0; i < trials; ++i)
            {
                heapTotalms += b.heapTotaler();
            }

            stackAvgms = stackTotalms / (double)trials;
            heapAvgms = heapTotalms / (double)trials;
        }
    }

    class Benchmarker
    {
        long heapTotal;
        int[] vals = new int[5000000];

        public long heapTotaler()
        {
            setup();
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            for (int i = 0; i < vals.Length; ++i)
            {
                heapTotal += vals[i];
            }
            stopWatch.Stop();
            //Console.WriteLine($"{stopWatch.ElapsedMilliseconds} milliseconds with the counter on the heap");
            return stopWatch.ElapsedMilliseconds;
        }

        public long stackTotaler()
        {
            setup();
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            long stackTotal = 0;
            for (int i = 0; i < vals.Length; ++i)
            {
                stackTotal += vals[i];
            }
            stopWatch.Stop();
            //Console.WriteLine($"{stopWatch.ElapsedMilliseconds} milliseconds with the counter on the stack");
            return stopWatch.ElapsedMilliseconds;
        }

        private void setup()
        {
            heapTotal = 0;
            for (int i = 0; i < vals.Length; ++i)
            {
                vals[i] = i;
            }
        }
    }
}
【Comments】:
-
The "stack" totaler is almost certainly using a register. Have you looked at the disassembly?
-
All I know is that the answer has something to do with cache hits and misses.
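
One way to probe the register suggestion from the first comment without reading the disassembly is to accumulate into a local variable and store it to the field only once, after the loop. The sketch below is not from the original post: `LocalCopyTotaler`, `TotalIntoField`, and `TotalViaLocal` are hypothetical names, and whether the JIT actually keeps the local in a register is exactly the assumption being tested. If `TotalViaLocal` runs at roughly the speed of `stackTotaler()` from the question, that would be consistent with the register explanation; it would not prove it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical variant of the question's Benchmarker, for illustration only.
class LocalCopyTotaler
{
    long heapTotal;          // instance field, stored inside the heap object
    readonly int[] vals;

    public LocalCopyTotaler(int length)
    {
        vals = new int[length];
        for (int i = 0; i < vals.Length; ++i)
        {
            vals[i] = i;
        }
    }

    // Accumulates directly into the field, like heapTotaler() in the question.
    public long TotalIntoField()
    {
        heapTotal = 0;
        for (int i = 0; i < vals.Length; ++i)
        {
            heapTotal += vals[i];
        }
        return heapTotal;
    }

    // Accumulates into a local and writes the field once, after the loop.
    public long TotalViaLocal()
    {
        long total = 0;
        for (int i = 0; i < vals.Length; ++i)
        {
            total += vals[i];
        }
        heapTotal = total;
        return heapTotal;
    }
}

static class Program
{
    static void Main()
    {
        var t = new LocalCopyTotaler(5000000);

        // Warm-up calls so that JIT compilation is not included in the timings.
        t.TotalIntoField();
        t.TotalViaLocal();

        var sw = Stopwatch.StartNew();
        long fieldResult = t.TotalIntoField();
        sw.Stop();
        double fieldMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        long localResult = t.TotalViaLocal();
        sw.Stop();
        double localMs = sw.Elapsed.TotalMilliseconds;

        Console.WriteLine($"Field: {fieldMs} ms, local copy: {localMs} ms, totals equal: {fieldResult == localResult}");
    }
}
```

A single timed run per variant is noisy; averaging over many trials, as the question's `runBenchmark` does, would give steadier numbers. `Stopwatch.Elapsed.TotalMilliseconds` is used here instead of `ElapsedMilliseconds` so that sub-millisecond differences are not truncated.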
Tags: c# performance heap-memory stack-memory