Java loop efficiency: answers

【Question title】: Java loop efficiency
【Posted】: 2013-04-18 19:43:27
【Question description】:

I am comparing the efficiency of nested for, while, and do-while loops in Java, and I have come across some strange results that I need help understanding.

public class Loops {
    public static void main(String[] args) {
        int L = 100000;    // number of iterations per loop
        // for loop
        double start = System.currentTimeMillis();
        long s1 = 0;
        for (int i=0; i < L; i++) {
            for (int j = 0; j < L; j++) {
                s1 += 1;
            }
        }
        double end = System.currentTimeMillis();
        String result1 = String.format("for loop: %.5f", (end-start) / 1000);
        System.out.println(s1);
        System.out.println(result1);

        // do-while loop
        double start1 = System.currentTimeMillis();
        int i = 0;
        long s2 = 0;
        do {
            i++;
            int j = 0;
            do {
                s2 += 1;
                j++;
            } while (j < L);
        } while (i < L);
        double end1 = System.currentTimeMillis();
        String result2 = String.format("do-while: %.5f", (end1-start1) / 1000);
        System.out.println(s2);
        System.out.println(result2);

        // while loop
        double start2 = System.currentTimeMillis();
        i = 0;
        long s3 = 0;
        while (i < L) {
            i++;
            int j = 0;
            while (j < L) {
                s3 += 1;
                j++;
            }
        }
        double end2 = System.currentTimeMillis();
        String result3 = String.format("while: %.5f", (end2-start2) / 1000);
        System.out.println(s3);
        System.out.println(result3);
    }
}

All three loops sum their counters to 10 billion; the results perplex me:

for loop: 6.48300

do-while: 0.41200

while: 9.71500

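Why is the do-while loop so much faster? The performance gap scales in proportion to any change to L. I have also run these loops independently, and they perform the same.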

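【Question discussion】:

I can't reproduce your numbers. Both while loops run at the same speed for me, and the for loop is slightly slower.
In any case, this is not a particularly good benchmark, because the compiler or the JIT may be able to remove the inner loop entirely.
That must be the case: some kind of optimization that is only applied to the do-while loop. I'd love to learn more about the mechanism, though.
Yeah, I'm not sure what is going on here. I'm more of a C and C++ person, and I have almost no experience digging into JVM/JIT weirdness.

Tags: java performance for-loop while-loop do-while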

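【Solution 1】:

I ran the code you provided and was surprised to see these performance differences. Out of curiosity I started investigating and found that, although these loops seem to do the same thing, there are some important differences between them.

My results after the first run of these loops were: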

for loop: 1.43100
do-while: 0.51300
while: 1.54500

But when I ran the three loops at least 10 times, the performance of each loop was almost the same.

for loop: 0.43200
do-while: 0.46100
while: 0.42900
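
The repeated-run measurement described above could look roughly like the following. This is only a sketch under stated assumptions: the class name WarmupTimingSketch, the helper timeSeconds, and the iteration count n = 5000 (much smaller than the question's 100000, to keep it quick) are all hypothetical, and it uses System.nanoTime and Java 8 method references rather than the question's original timing code.

```java
import java.util.function.IntToLongFunction;

public class WarmupTimingSketch {
    static long forLoop(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                s += 1;
            }
        }
        return s;
    }

    // Note: a do-while body always runs at least once, so this assumes n >= 1.
    static long doWhileLoop(int n) {
        long s = 0;
        int i = 0;
        do {
            int j = 0;
            do {
                s += 1;
            } while (++j < n);
        } while (++i < n);
        return s;
    }

    static long whileLoop(int n) {
        long s = 0;
        int i = 0;
        while (i++ < n) {
            int j = 0;
            while (j++ < n) {
                s += 1;
            }
        }
        return s;
    }

    // Times a single call and returns the elapsed time in seconds.
    // The sum is checked so that the loop result is actually used.
    static double timeSeconds(IntToLongFunction loop, int n) {
        long start = System.nanoTime();
        long sum = loop.applyAsLong(n);
        long elapsed = System.nanoTime() - start;
        if (sum != (long) n * n) {
            throw new AssertionError("unexpected sum " + sum);
        }
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        int n = 5_000; // assumed value, chosen only to keep the sketch fast
        for (int round = 1; round <= 10; round++) {
            System.out.printf("round %2d  for: %.5f  do-while: %.5f  while: %.5f%n",
                    round,
                    timeSeconds(WarmupTimingSketch::forLoop, n),
                    timeSeconds(WarmupTimingSketch::doWhileLoop, n),
                    timeSeconds(WarmupTimingSketch::whileLoop, n));
        }
    }
}
```

The early rounds include time spent interpreting and compiling the methods, so the later rounds are the more comparable ones.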

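The JIT is able to optimize these loops over time, but there must be some difference that gives them different initial performance. In fact, there are two differences:

The do-while loop makes fewer comparisons than the for and while loops

For simplicity, assume L = 1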

long s1 = 0;
for (int i=0; i < L; i++) {
    for (int j = 0; j < L; j++) {
        s1 += 1;
    }
}

outer loop: 0 < 1
inner loop: 0 < 1
inner loop: 1 < 1
outer loop: 1 < 1

4 comparisons in total

int i = 0;
long s2 = 0;
do {
    i++;
    int j = 0;
    do {
        s2 += 1;
        j++;
    } while (j < L);
} while (i < L);

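inner loop: 1 < 1
outer loop: 1 < 1

2 comparisons in total

Different generated bytecode

For further investigation I changed your class slightly, without affecting how it works.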
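
The comparison counts above can be checked mechanically. The following is a minimal sketch, not part of the original code: the class ComparisonCountSketch, the counter comparisons, and the helper less are hypothetical names introduced only to count how often each loop condition is evaluated.

```java
public class ComparisonCountSketch {
    // Hypothetical counter: incremented every time a loop condition is evaluated.
    static long comparisons;

    // Evaluates a < b and records that one comparison took place.
    static boolean less(int a, int b) {
        comparisons++;
        return a < b;
    }

    // Nested for loops, with the loop conditions routed through less().
    static long countForLoop(int L) {
        comparisons = 0;
        for (int i = 0; less(i, L); i++) {
            for (int j = 0; less(j, L); j++) {
                // body intentionally empty: only the comparisons are counted
            }
        }
        return comparisons;
    }

    // Nested do-while loops, structured like the question's version.
    static long countDoWhile(int L) {
        comparisons = 0;
        int i = 0;
        do {
            i++;
            int j = 0;
            do {
                j++;
            } while (less(j, L));
        } while (less(i, L));
        return comparisons;
    }

    public static void main(String[] args) {
        for (int L : new int[] {1, 2, 10}) {
            System.out.println("L=" + L
                    + " for: " + countForLoop(L)
                    + " do-while: " + countDoWhile(L));
        }
    }
}
```

For L = 1 this gives 4 and 2, matching the walk-through. In general, for L >= 1 the for loop evaluates (L + 1)^2 conditions and the do-while loop evaluates L(L + 1), so the two differ by L + 1 comparisons.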

public class Loops {
    final static int L = 100000; // number of iterations per loop

    public static void main(String[] args) {
        int round = 10;
        while (round-- > 0) {
            forLoop();
            doWhileLoop();
            whileLoop();
        }
    }

    private static long whileLoop() {
        int i = 0;
        long s3 = 0;
        while (i++ < L) {
            int j = 0;
            while (j++ < L) {
                s3 += 1;
            }
        }
        return s3;
    }

    private static long doWhileLoop() {
        int i = 0;
        long s2 = 0;
        do {
            int j = 0;
            do {
                s2 += 1;
            } while (++j < L);
        } while (++i < L);
        return s2;
    }

    private static long forLoop() {
        long s1 = 0;
        for (int i = 0; i < L; i++) {
            for (int j = 0; j < L; j++) {
                s1 += 1;
            }
        }
        return s1;
    }
}

Then I compiled it and ran javap -c -s -private -l Loops to get the bytecode.

First, the bytecode of doWhileLoop:

   0:   iconst_0        // push the int value 0 onto the stack
   1:   istore_1        // store int value into variable 1 (i)
   2:   lconst_0        // push the long 0 onto the stack
   3:   lstore_2        // store a long value in a local variable 2 (s2)
   4:   iconst_0        // push the int value 0 onto the stack
   5:   istore  4   // store int value into variable 4 (j)
   7:   lload_2     // load a long value from a local variable 2 (s2)
   8:   lconst_1        // push the long 1 onto the stack
   9:   ladd        // add two longs
   10:  lstore_2        // store a long value in a local variable 2 (s2)
   11:  iinc    4, 1    // increment local variable 4 (j) by signed byte 1
   14:  iload   4   // load an int value from a local variable 4 (j)
   16:  iload_0     // load an int value from a local variable 0 (L)
   17:  if_icmplt   7   // if value1 is less than value2, branch to instruction at 7
   20:  iinc    1, 1    // increment local variable 1 (i) by signed byte 1
   23:  iload_1     // load an int value from a local variable 1 (i)
   24:  iload_0     // load an int value from a local variable 0 (L)
   25:  if_icmplt   4   // if value1 is less than value2, branch to instruction at 4
   28:  lload_2     // load a long value from a local variable 2 (s2)
   29:  lreturn     // return a long value

Now the bytecode of whileLoop:

   0:   iconst_0        // push int value 0 onto the stack
   1:   istore_1        // store int value into variable 1 (i)
   2:   lconst_0        // push the long 0 onto the stack
   3:   lstore_2        // store a long value in a local variable 2 (s3)
   4:   goto        26
   7:   iconst_0        // push the int value 0 onto the stack
   8:   istore  4   // store int value into variable 4 (j)
   10:  goto        17
   13:  lload_2     // load a long value from a local variable 2 (s3)
   14:  lconst_1        // push the long 1 onto the stack
   15:  ladd        // add two longs
   16:  lstore_2        // store a long value in a local variable 2 (s3)
   17:  iload   4   // load an int value from a local variable 4 (j)
   19:  iinc    4, 1    // increment local variable 4 (j) by signed byte 1
   22:  iload_0     // load an int value from a local variable 0 (L)
   23:  if_icmplt   13  // if value1 is less than value2, branch to instruction at 13
   26:  iload_1     // load an int value from a local variable 1 (i)
   27:  iinc    1, 1    // increment local variable 1 by signed byte 1
   30:  iload_0     // load an int value from a local variable 0 (L)
   31:  if_icmplt   7   // if value1 is less than value2, branch to instruction at 7
   34:  lload_2     // load a long value from a local variable 2 (s3)
   35:  lreturn     // return a long value

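To make the output more readable, I added comments describing what each instruction does, based on the Java bytecode instruction listings.

If you look closely, you will notice one important difference between these two bytecode listings. The while loop (the same is true of the for loop) has its if statements (the if_icmplt instructions) at the end of the bytecode. This means that to check the exit condition of the first loop a goto to offset 26 has to be executed, and similarly a goto to offset 17 for the second loop.

The bytecode above was generated with javac 1.6.0_45 on Mac OS X.

Summary

I think the different number of comparisons, together with the goto instructions present in the bytecode of the while and for loops, is the cause of the performance difference between these loops.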
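
The control flow of the two listings can be traced with a small simulation. This is only a sketch: the class LoopShapeSketch, the Run holder, and the simulate methods are hypothetical names, and the switch on pc merely mimics the branch targets (bytecode offsets) of the listings. It models the bytecode as written by javac and says nothing about what the JIT later does with it.

```java
public class LoopShapeSketch {
    // Result of one simulated run: the computed sum and the number of
    // goto instructions executed along the way.
    static final class Run {
        final long sum;
        final long gotos;
        Run(long sum, long gotos) { this.sum = sum; this.gotos = gotos; }
    }

    // Follows the whileLoop listing; each case label is a bytecode offset.
    static Run simulateWhileLoop(int L) {
        int i = 0, j = 0;          // offsets 0-3 initialise i and the sum
        long s3 = 0, gotos = 0;
        int pc = 4;
        while (true) {
            switch (pc) {
                case 4:  gotos++; pc = 26; break;          // goto 26
                case 7:  j = 0; pc = 10; break;            // iconst_0, istore 4
                case 10: gotos++; pc = 17; break;          // goto 17
                case 13: s3 += 1; pc = 17; break;          // lload_2 ... lstore_2
                case 17: pc = (j++ < L) ? 13 : 26; break;  // iload 4, iinc 4, if_icmplt 13
                case 26: pc = (i++ < L) ? 7 : 34; break;   // iload_1, iinc 1, if_icmplt 7
                case 34: return new Run(s3, gotos);        // lload_2, lreturn
                default: throw new IllegalStateException("unexpected pc " + pc);
            }
        }
    }

    // Follows the doWhileLoop listing, which contains no goto at all.
    static Run simulateDoWhileLoop(int L) {
        int i = 0, j = 0;
        long s2 = 0;
        int pc = 4;
        while (true) {
            switch (pc) {
                case 4:  j = 0; pc = 7; break;                      // iconst_0, istore 4
                case 7:  s2 += 1; pc = (++j < L) ? 7 : 20; break;   // add, iinc 4, if_icmplt 7
                case 20: pc = (++i < L) ? 4 : 28; break;            // iinc 1, if_icmplt 4
                case 28: return new Run(s2, 0);                     // lload_2, lreturn
                default: throw new IllegalStateException("unexpected pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        Run w = simulateWhileLoop(3);
        Run d = simulateDoWhileLoop(3);
        System.out.println("while:    sum=" + w.sum + " gotos=" + w.gotos);
        System.out.println("do-while: sum=" + d.sum + " gotos=" + d.gotos);
    }
}
```

In this sketch, with L = 3 both shapes compute a sum of 9. The while shape executes 4 gotos (one on entry to the outer loop and one on each entry to the inner loop), and the do-while shape executes none.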

【Discussion】: