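[Posted]: 2013-06-12 21:07:43
[Question]:
I am trying to understand CPU caches and cache lines with a C program, as I do for most C concepts. The program I use is shown below. I got the idea from this blog post: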
http://igoro.com/archive/gallery-of-processor-cache-effects/
The output of the program below on my machine is as follows. This is the output with CFLAGS="-g -O0 -Wall".
./cache
CPU time for loop 1 0.460000 secs.
CPU time for loop 2 (j = 8) 0.050000 secs.
CPU time for loop 2 (j = 9) 0.050000 secs.
CPU time for loop 2 (j = 10) 0.050000 secs.
CPU time for loop 2 (j = 11) 0.050000 secs.
CPU time for loop 2 (j = 12) 0.040000 secs.
CPU time for loop 2 (j = 13) 0.050000 secs.
CPU time for loop 2 (j = 14) 0.050000 secs.
CPU time for loop 2 (j = 15) 0.040000 secs.
CPU time for loop 2 (j = 16) 0.050000 secs.
CPU time for loop 2 (j = 17) 0.040000 secs.
CPU time for loop 2 (j = 18) 0.050000 secs.
CPU time for loop 2 (j = 19) 0.040000 secs.
CPU time for loop 2 (j = 20) 0.040000 secs.
CPU time for loop 2 (j = 21) 0.040000 secs.
CPU time for loop 2 (j = 22) 0.040000 secs.
CPU time for loop 2 (j = 23) 0.040000 secs.
CPU time for loop 2 (j = 24) 0.030000 secs.
CPU time for loop 2 (j = 25) 0.040000 secs.
CPU time for loop 2 (j = 26) 0.030000 secs.
CPU time for loop 2 (j = 27) 0.040000 secs.
CPU time for loop 2 (j = 28) 0.030000 secs.
CPU time for loop 2 (j = 29) 0.040000 secs.
CPU time for loop 2 (j = 30) 0.030000 secs.
CPU time for loop 2 (j = 31) 0.030000 secs.
Output with optimization (CFLAGS="-g -O3 -Wall"):
CPU time for loop 1 0.130000 secs.
CPU time for loop 2 (j = 8) 0.040000 secs.
CPU time for loop 2 (j = 9) 0.050000 secs.
CPU time for loop 2 (j = 10) 0.050000 secs.
CPU time for loop 2 (j = 11) 0.040000 secs.
CPU time for loop 2 (j = 12) 0.040000 secs.
CPU time for loop 2 (j = 13) 0.050000 secs.
CPU time for loop 2 (j = 14) 0.050000 secs.
CPU time for loop 2 (j = 15) 0.040000 secs.
CPU time for loop 2 (j = 16) 0.040000 secs.
CPU time for loop 2 (j = 17) 0.050000 secs.
CPU time for loop 2 (j = 18) 0.040000 secs.
CPU time for loop 2 (j = 19) 0.050000 secs.
CPU time for loop 2 (j = 20) 0.040000 secs.
CPU time for loop 2 (j = 21) 0.040000 secs.
CPU time for loop 2 (j = 22) 0.040000 secs.
CPU time for loop 2 (j = 23) 0.030000 secs.
CPU time for loop 2 (j = 24) 0.040000 secs.
CPU time for loop 2 (j = 25) 0.030000 secs.
CPU time for loop 2 (j = 26) 0.040000 secs.
CPU time for loop 2 (j = 27) 0.030000 secs.
CPU time for loop 2 (j = 28) 0.030000 secs.
CPU time for loop 2 (j = 29) 0.030000 secs.
CPU time for loop 2 (j = 30) 0.030000 secs.
CPU time for loop 2 (j = 31) 0.030000 secs.
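The blog states:
"The first loop multiplies every value in the array by 3, and the second loop multiplies only every 16th. The second loop only does about 6% of the work of the first loop, but on modern machines, the two for-loops take about the same time: 80 and 78 ms respectively on my machine."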
On my machine the two loops do not take about the same time. As the output shows, loop 1 takes 0.46 seconds (0.13 seconds with -O3), while loop 2 takes 0.03, 0.04, or 0.05 seconds depending on the value of j.
Why is this so?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>

#define MAX_SIZE (64*1024*1024)

int main()
{
    clock_t start, end;
    double cpu_time;
    int i = 0;
    int j = 0;
    /* MAX_SIZE array is too big for stack. This is an unfortunate rough edge of the way the stack works.
       It lives in a fixed-size buffer, set by the program executable's configuration according to the
       operating system, but its actual size is seldom checked against the available space. */
    /* int arr[MAX_SIZE]; */
    int *arr = (int*)malloc(MAX_SIZE * sizeof(int));

    /* CPU clock ticks count start */
    start = clock();
    /* Loop 1 */
    for (i = 0; i < MAX_SIZE; i++)
        arr[i] *= 3;
    /* CPU clock ticks count stop */
    end = clock();
    cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("CPU time for loop 1 %.6f secs.\n", cpu_time);

    for (j = 8 ; j < 32 ; j++)
    {
        /* CPU clock ticks count start */
        start = clock();
        /* Loop 2 */
        for (i = 0; i < MAX_SIZE; i += j)
            arr[i] *= 3;
        /* CPU clock ticks count stop */
        end = clock();
        cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("CPU time for loop 2 (j = %d) %.6f secs.\n", j, cpu_time);
    }
    return 0;
}
[Comments]:
-
Not really a duplicate. I posted another question about a segfault. I fixed that, and this question is about explaining the results.
-
@hit: Although the code in the two questions is the same, the questions actually being asked are quite different...
-
Could you report which compiler you are using, and the flags you are passing?
-
Wow, that is a very old compiler. Also, when you benchmark, you should compile with optimizations enabled. Dare I ask what hardware you are running this on?
Tags: c performance caching time