A case of compiler optimized bytecode: answers

【Question title】: A case of compiler optimized bytecode
【Posted】: 2014-03-28 08:46:57
【Question】:

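I once commented here.

I suggested that the limit a.length / 2 should be declared before the loop. Someone replied that he believed the compiler would optimize it anyway.

So I tried it.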

public class Loop1 {
    public static void main(final String[] args) {
        final String[] a = {};
        for (int i = 0; i < a.length / 2; i++) {
        }
    }
}

public class Loop2 {
    public static void main(final String[] args) {
        final String[] a = {};
        final int l = a.length / 2;
        for (int i = 0; i < l; i++) {
        }
    }
}

When I print these classes with javap, I get the following.

Loop1.javap.txt

...
     7: iload_2            <----- for loop?
     8: aload_1                 |
     9: arraylength        <----|---- a.length?
    10: iconst_2                |
    11: idiv                    |
    12: if_icmpge     21        |
    15: iinc          2, 1      |
    18: goto          7     -----
...

Loop2.javap.txt

...
     6: arraylength        <---- ---- a.length?
     7: iconst_2      
     8: idiv          
     9: istore_2      
    10: iconst_0                
    11: istore_3                
    12: iload_3            <----- for loop?
    13: iload_2                 |
    14: if_icmpge     23        |
    17: iinc          3, 1      |
    20: goto          12    -----
...

The problem is that I can't read bytecode.

Did the compiler really optimize the a.length / 2 part in Loop1.java?

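【Comments】:

No, it did not. The JIT optimizer might, though.
You probably meant a to be a String[], not a String.
The generated bytecode says nothing about the speed of the JIT-optimized code. You need to inspect the generated assembly.
The HotSpot JIT certainly performs loop-invariant code motion.
Unless the example was edited afterwards, it shows that no optimization happens at all, because this dead code has not been removed.

Tags: java optimization compiler-construction javap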

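【Solution 1】:

Although the actual answer ("No, it did not") has already been accepted, I was curious about this case and took it as an opportunity to dive into the world of JIT optimization and HotSpot disassembly.

So I created a class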

class Test03
{
    public static void main(String args[])
    {
        for (int i=1000; i<12000; i++)
        {
            int counter0 = callVar();
            System.out.println(counter0);
            int counter1 = callDiv();
            System.out.println(counter1);
        }
    }

    public static int callDiv()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        for (
            int i = 0;
            i < a.length / 2;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }

    public static int callVar()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        int x = a.length / 2;
        for (
            int i = 0;
            i < x;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }


}

and ran it with

java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly Test03

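(Note: for this to work, the "HotSpot disassembler" binary is required. Instructions for building it, as well as prebuilt binaries, can be found on the web.)

This creates a huge hotspot.log file containing all the information about the optimizations performed by the HotSpot compiler.

(Hint: this file is hard to analyze. However, someone has started building an excellent tool for analyzing HotSpot log files: https://github.com/AdoptOpenJDK/jitwatch)

In this case, I was only interested in the assembly code of the callDiv and callVar methods.

The assembly of the callDiv method looks like this (there is no need to actually read it...)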

Decoding compiled method 0x000000000269f890:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'callDiv' '()I' in 'Test03'
  #           [sp+0x20]  (sp of caller)
  0x000000000269f9e0: mov    %eax,-0x6000(%rsp)
  0x000000000269f9e7: push   %rbp
  0x000000000269f9e8: sub    $0x10,%rsp         ;*synchronization entry
                                                ; - Test03::callDiv@-1 (line 17)
  0x000000000269f9ec: mov    0x60(%r15),%r8
  0x000000000269f9f0: mov    %r8,%r10
  0x000000000269f9f3: add    $0x32c08,%r10
  0x000000000269f9fa: cmp    0x70(%r15),%r10
  0x000000000269f9fe: jae    0x000000000269fae5
  0x000000000269fa04: mov    %r10,0x60(%r15)
  0x000000000269fa08: prefetchnta 0xc0(%r10)
  0x000000000269fa10: movq   $0x1,(%r8)
  0x000000000269fa17: prefetchnta 0x100(%r10)
  0x000000000269fa1f: movl   $0xef5c0232,0x8(%r8)  ;   {oop({type array int})}
  0x000000000269fa27: prefetchnta 0x140(%r10)
  0x000000000269fa2f: movl   $0xcafe,0xc(%r8)
  0x000000000269fa37: prefetchnta 0x180(%r10)
  0x000000000269fa3f: mov    %r8,%rdi
  0x000000000269fa42: add    $0x10,%rdi
  0x000000000269fa46: mov    $0x657f,%ecx
  0x000000000269fa4b: xor    %eax,%eax
  0x000000000269fa4d: rep stos %rax,%es:(%rdi)  ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
  0x000000000269fa50: xor    %eax,%eax
  0x000000000269fa52: mov    $0x1,%r11d
  0x000000000269fa58: nopl   0x0(%rax,%rax,1)   ;*iload_0
                                                ; - Test03::callDiv@17 (line 24)
  0x000000000269fa60: add    0x10(%r8,%r11,4),%eax
  0x000000000269fa65: add    0x14(%r8,%r11,4),%eax
  0x000000000269fa6a: add    0x18(%r8,%r11,4),%eax
  0x000000000269fa6f: add    0x1c(%r8,%r11,4),%eax
  0x000000000269fa74: add    0x20(%r8,%r11,4),%eax
  0x000000000269fa79: add    0x24(%r8,%r11,4),%eax
  0x000000000269fa7e: add    0x28(%r8,%r11,4),%eax
  0x000000000269fa83: add    0x2c(%r8,%r11,4),%eax
  0x000000000269fa88: add    0x30(%r8,%r11,4),%eax
  0x000000000269fa8d: add    0x34(%r8,%r11,4),%eax
  0x000000000269fa92: add    0x38(%r8,%r11,4),%eax
  0x000000000269fa97: add    0x3c(%r8,%r11,4),%eax
  0x000000000269fa9c: add    0x40(%r8,%r11,4),%eax
  0x000000000269faa1: add    0x44(%r8,%r11,4),%eax
  0x000000000269faa6: add    0x48(%r8,%r11,4),%eax
  0x000000000269faab: add    0x4c(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callDiv@21 (line 24)
  0x000000000269fab0: add    $0x10,%r11d        ;*iinc
                                                ; - Test03::callDiv@23 (line 22)
  0x000000000269fab4: cmp    $0x6570,%r11d
  0x000000000269fabb: jl     0x000000000269fa60  ;*if_icmpge
                                                ; - Test03::callDiv@14 (line 21)
  0x000000000269fabd: cmp    $0x657f,%r11d
  0x000000000269fac4: jge    0x000000000269fad9
  0x000000000269fac6: xchg   %ax,%ax            ;*iload_0
                                                ; - Test03::callDiv@17 (line 24)
  0x000000000269fac8: add    0x10(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callDiv@21 (line 24)
  0x000000000269facd: inc    %r11d              ;*iinc
                                                ; - Test03::callDiv@23 (line 22)
  0x000000000269fad0: cmp    $0x657f,%r11d
  0x000000000269fad7: jl     0x000000000269fac8
  0x000000000269fad9: add    $0x10,%rsp
  0x000000000269fadd: pop    %rbp
  0x000000000269fade: test   %eax,-0x245fae4(%rip)        # 0x0000000000240000
                                                ;   {poll_return}
  0x000000000269fae4: retq   
  0x000000000269fae5: mov    $0xcafe,%r8d
  0x000000000269faeb: movabs $0x77ae01190,%rdx  ;   {oop({type array int})}
  0x000000000269faf5: xchg   %ax,%ax
  0x000000000269faf7: callq  0x000000000269e720  ; OopMap{off=284}
                                                ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
                                                ;   {runtime_call}
  0x000000000269fafc: mov    %rax,%r8
  0x000000000269faff: jmpq   0x000000000269fa50  ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
  0x000000000269fb04: mov    %rax,%rdx
  0x000000000269fb07: add    $0x10,%rsp
  0x000000000269fb0b: pop    %rbp
  0x000000000269fb0c: jmpq   0x00000000026a1760  ;   {runtime_call}
  0x000000000269fb11: hlt    
  0x000000000269fb12: hlt    
  0x000000000269fb13: hlt    
  0x000000000269fb14: hlt    
  0x000000000269fb15: hlt    
  0x000000000269fb16: hlt    
  0x000000000269fb17: hlt    
  0x000000000269fb18: hlt    
  0x000000000269fb19: hlt    
  0x000000000269fb1a: hlt    
  0x000000000269fb1b: hlt    
  0x000000000269fb1c: hlt    
  0x000000000269fb1d: hlt    
  0x000000000269fb1e: hlt    
  0x000000000269fb1f: hlt    
[Exception Handler]
[Stub Code]
  0x000000000269fb20: jmpq   0x000000000269e8e0  ;   {no_reloc}
[Deopt Handler Code]
  0x000000000269fb25: callq  0x000000000269fb2a
  0x000000000269fb2a: subq   $0x5,(%rsp)
  0x000000000269fb2f: jmpq   0x0000000002678d00  ;   {runtime_call}
  0x000000000269fb34: hlt    
  0x000000000269fb35: hlt    
  0x000000000269fb36: hlt    
  0x000000000269fb37: hlt    
<nmethod compile_id='1' compiler='C2' entry='0x000000000269f9e0' size='1000' address='0x000000000269f890' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callDiv ()I' bytes='31' count='5000' backedge_count='5000' iicount='10' stamp='0.736'/>
<writer thread='1316'/>

The assembly of the callVar method looks like this (there is no need to actually read it...)

Decoding compiled method 0x000000000269f490:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'callVar' '()I' in 'Test03'
  #           [sp+0x20]  (sp of caller)
  0x000000000269f5e0: mov    %eax,-0x6000(%rsp)
  0x000000000269f5e7: push   %rbp
  0x000000000269f5e8: sub    $0x10,%rsp         ;*synchronization entry
                                                ; - Test03::callVar@-1 (line 31)
  0x000000000269f5ec: mov    0x60(%r15),%r8
  0x000000000269f5f0: mov    %r8,%r10
  0x000000000269f5f3: add    $0x32c08,%r10
  0x000000000269f5fa: cmp    0x70(%r15),%r10
  0x000000000269f5fe: jae    0x000000000269f6e5
  0x000000000269f604: mov    %r10,0x60(%r15)
  0x000000000269f608: prefetchnta 0xc0(%r10)
  0x000000000269f610: movq   $0x1,(%r8)
  0x000000000269f617: prefetchnta 0x100(%r10)
  0x000000000269f61f: movl   $0xef5c0232,0x8(%r8)  ;   {oop({type array int})}
  0x000000000269f627: prefetchnta 0x140(%r10)
  0x000000000269f62f: movl   $0xcafe,0xc(%r8)
  0x000000000269f637: prefetchnta 0x180(%r10)
  0x000000000269f63f: mov    %r8,%rdi
  0x000000000269f642: add    $0x10,%rdi
  0x000000000269f646: mov    $0x657f,%ecx
  0x000000000269f64b: xor    %eax,%eax
  0x000000000269f64d: rep stos %rax,%es:(%rdi)  ;*newarray
                                                ; - Test03::callVar@4 (line 32)
  0x000000000269f650: xor    %eax,%eax
  0x000000000269f652: mov    $0x1,%r11d
  0x000000000269f658: nopl   0x0(%rax,%rax,1)   ;*iload_0
                                                ; - Test03::callVar@19 (line 39)
  0x000000000269f660: add    0x10(%r8,%r11,4),%eax
  0x000000000269f665: add    0x14(%r8,%r11,4),%eax
  0x000000000269f66a: add    0x18(%r8,%r11,4),%eax
  0x000000000269f66f: add    0x1c(%r8,%r11,4),%eax
  0x000000000269f674: add    0x20(%r8,%r11,4),%eax
  0x000000000269f679: add    0x24(%r8,%r11,4),%eax
  0x000000000269f67e: add    0x28(%r8,%r11,4),%eax
  0x000000000269f683: add    0x2c(%r8,%r11,4),%eax
  0x000000000269f688: add    0x30(%r8,%r11,4),%eax
  0x000000000269f68d: add    0x34(%r8,%r11,4),%eax
  0x000000000269f692: add    0x38(%r8,%r11,4),%eax
  0x000000000269f697: add    0x3c(%r8,%r11,4),%eax
  0x000000000269f69c: add    0x40(%r8,%r11,4),%eax
  0x000000000269f6a1: add    0x44(%r8,%r11,4),%eax
  0x000000000269f6a6: add    0x48(%r8,%r11,4),%eax
  0x000000000269f6ab: add    0x4c(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callVar@23 (line 39)
  0x000000000269f6b0: add    $0x10,%r11d        ;*iinc
                                                ; - Test03::callVar@25 (line 37)
  0x000000000269f6b4: cmp    $0x6570,%r11d
  0x000000000269f6bb: jl     0x000000000269f660  ;*if_icmpge
                                                ; - Test03::callVar@16 (line 36)
  0x000000000269f6bd: cmp    $0x657f,%r11d
  0x000000000269f6c4: jge    0x000000000269f6d9
  0x000000000269f6c6: xchg   %ax,%ax            ;*iload_0
                                                ; - Test03::callVar@19 (line 39)
  0x000000000269f6c8: add    0x10(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callVar@23 (line 39)
  0x000000000269f6cd: inc    %r11d              ;*iinc
                                                ; - Test03::callVar@25 (line 37)
  0x000000000269f6d0: cmp    $0x657f,%r11d
  0x000000000269f6d7: jl     0x000000000269f6c8
  0x000000000269f6d9: add    $0x10,%rsp
  0x000000000269f6dd: pop    %rbp
  0x000000000269f6de: test   %eax,-0x245f6e4(%rip)        # 0x0000000000240000
                                                ;   {poll_return}
  0x000000000269f6e4: retq   
  0x000000000269f6e5: mov    $0xcafe,%r8d
  0x000000000269f6eb: movabs $0x77ae01190,%rdx  ;   {oop({type array int})}
  0x000000000269f6f5: xchg   %ax,%ax
  0x000000000269f6f7: callq  0x000000000269e720  ; OopMap{off=284}
                                                ;*newarray
                                                ; - Test03::callVar@4 (line 32)
                                                ;   {runtime_call}
  0x000000000269f6fc: mov    %rax,%r8
  0x000000000269f6ff: jmpq   0x000000000269f650  ;*newarray
                                                ; - Test03::callVar@4 (line 32)
  0x000000000269f704: mov    %rax,%rdx
  0x000000000269f707: add    $0x10,%rsp
  0x000000000269f70b: pop    %rbp
  0x000000000269f70c: jmpq   0x00000000026a1760  ;   {runtime_call}
  0x000000000269f711: hlt    
  0x000000000269f712: hlt    
  0x000000000269f713: hlt    
  0x000000000269f714: hlt    
  0x000000000269f715: hlt    
  0x000000000269f716: hlt    
  0x000000000269f717: hlt    
  0x000000000269f718: hlt    
  0x000000000269f719: hlt    
  0x000000000269f71a: hlt    
  0x000000000269f71b: hlt    
  0x000000000269f71c: hlt    
  0x000000000269f71d: hlt    
  0x000000000269f71e: hlt    
  0x000000000269f71f: hlt    
[Exception Handler]
[Stub Code]
  0x000000000269f720: jmpq   0x000000000269e8e0  ;   {no_reloc}
[Deopt Handler Code]
  0x000000000269f725: callq  0x000000000269f72a
  0x000000000269f72a: subq   $0x5,(%rsp)
  0x000000000269f72f: jmpq   0x0000000002678d00  ;   {runtime_call}
  0x000000000269f734: hlt    
  0x000000000269f735: hlt    
  0x000000000269f736: hlt    
  0x000000000269f737: hlt    
<nmethod compile_id='2' compiler='C2' entry='0x000000000269f5e0' size='1000' address='0x000000000269f490' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callVar ()I' bytes='33' count='5000' backedge_count='5000' iicount='11' stamp='0.832'/>
<writer thread='10020'/>

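I have never really been familiar with x86 assembler (apart from some self-taught basics). But, for example, the JIT seems to unroll the loop into blocks of 16 elements; at least, that is what I think I see in the 16 add instructions.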
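To make the shape of those listings easier to follow, here is a rough Java rendering of what the unrolled code appears to do: a main loop that handles 16 elements per pass, followed by a cleanup loop for the remainder. This is only an illustrative sketch. The class and method names are made up, the inner `k` loop merely stands in for the 16 consecutive add instructions, and the fact that the listing starts its index at 1 is not modeled.

```java
class UnrollSketch {
    // Straightforward loop, in the style of callDiv/callVar.
    static int sumSimple(int[] a, int limit) {
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hand-written imitation of the unrolled shape: 16 adds per pass, then a tail loop.
    static int sumUnrolled(int[] a, int limit) {
        int sum = 0;
        int i = 0;
        // Main loop: only entered while a full block of 16 elements is left.
        for (; i < limit - 15; i += 16) {
            for (int k = 0; k < 16; k++) {
                sum += a[i + k];
            }
        }
        // Tail loop: handles the remaining (fewer than 16) elements one by one.
        for (; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[0xCAFE];
        for (int i = 0; i < a.length; i++) {
            a[i] = i % 7;
        }
        int limit = a.length / 2;
        System.out.println(sumSimple(a, limit) == sumUnrolled(a, limit));
    }
}
```

Both methods return the same sum; the second one simply mirrors the "16 adds, then compare, then a single-step loop" structure visible in the disassembly.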

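But the important point is: the instructions generated for both methods are identical. So, as expected, the JIT did optimize the division away.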
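A quick way to cross-check this is to compare the immediate values in the two listings with what the source implies. The sketch below is not part of the original experiment. The 16-byte array header and the 4-byte element size are assumptions inferred from the listing (the elements are addressed starting at offset 0x10 with a scale of 4), and reading 0x6570 as "limit minus 15" is an interpretation rather than documented behavior.

```java
class ConstantCheck {
    // The loop limit from the source: a.length / 2.
    static int loopBound(int arrayLength) {
        return arrayLength / 2;
    }

    // Assumed allocation size: 16-byte header plus 4 bytes per int element.
    static int allocationBytes(int arrayLength) {
        return 16 + 4 * arrayLength;
    }

    public static void main(String[] args) {
        int length = 0xCAFE;
        // cmp $0x657f,%r11d : the loop limit, already divided by two
        System.out.println(Integer.toHexString(loopBound(length)));
        // cmp $0x6570,%r11d : apparently the limit of the 16-way unrolled main loop
        System.out.println(Integer.toHexString(loopBound(length) - 15));
        // add $0x32c08,%r10 : size of the array allocation in bytes
        System.out.println(Integer.toHexString(allocationBytes(length)));
    }
}
```

The printed values are 657f, 6570 and 32c08, which match the immediates in both listings. Neither listing contains a division or shift instruction in the loop, so the limit was evidently computed at compile time.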

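Admittedly, this example is a bit boring: the array has a fixed length, so this optimization is particularly easy. (Well... not so "easy" that I could write a JIT-compiling VM capable of doing such things, but... you know what I mean.) I also tried to make the methods more interesting by changing them so that they accept the array length as a parameter: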

public static int callDiv(int arrayLength)
{
    final int a[] = new int[arrayLength];
    ...
}

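But in this case, there were at least slight differences between the two method variants. Although I am fairly sure that the division has been optimized away in this case as well, I am not completely sure, so I leave the final word to the assembly experts out there...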
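For anyone who wants to repeat that second experiment, the parameterized variants could look roughly like the following. The answer only shows the changed signature, so the method bodies, the class name `Test04` and the driver loop are assumptions reconstructed from `Test03`, not the author's exact code.

```java
class Test04 {
    // Variant with the division inside the loop condition.
    static int callDiv(int arrayLength) {
        int sum = 0;
        final int[] a = new int[arrayLength];
        for (int i = 0; i < a.length / 2; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Variant with the limit computed once before the loop.
    static int callVar(int arrayLength) {
        int sum = 0;
        final int[] a = new int[arrayLength];
        final int x = a.length / 2;
        for (int i = 0; i < x; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Call both methods often enough that the JIT is likely to compile them.
        for (int i = 1000; i < 12000; i++) {
            System.out.println(callVar(i));
            System.out.println(callDiv(i));
        }
    }
}
```

Running this with the same flags as above should produce a hotspot.log in which the two compiled methods can be compared side by side.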

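【Comments】:

This points to the general rule of "write idiomatic Java". The compiler and the JIT are designed to look for Java idioms and optimize them. Trying to second-guess the compiler, the JIT and the optimizer may a) do nothing or b) make things worse. Just write Java. The compiler and the JIT have more information than you do.
The HotSpot JIT on an x86 processor is just one of many possible environments in which this code may run. Some may be this smart, some may not. In any case, this is a very simple manual optimization that is easy to apply where performance matters.
Very few people in the world can really give profound advice here. One of them is Brian Goetz. His advice is: "Write dumb code" (oracle.com/technetwork/articles/javase/devinsight-1-139780.html). This is in line with what Will Hartung said. But I agree that you should not completely ignore the possible performance implications of the code you write, and once you have identified a time-critical section that is meant to run only on a certain target platform, you can try some (isolated!) manual optimizations, just to see what you can squeeze out in terms of performance.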

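【Solution 2】:

No, it did not.

In the first case, it computes the array length on every iteration. To optimize this, the compiler would at least need to make sure that the length of the array is not changed by anything inside the loop. Technically, the array reference is final and the length of an array cannot change, but it is still good practice to use form #2, which does not depend on the optimizer.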
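The difference at the source and bytecode level can be made visible with a small experiment. In the sketch below, the hypothetical helper `bound` stands in for `a.length / 2` and counts how often it is evaluated. It only illustrates the language semantics that javac preserves in the bytecode; it says nothing about what the JIT does afterwards.

```java
class BoundEvaluation {
    static int evaluations = 0;

    // Stand-in for a.length / 2 that records every evaluation.
    static int bound(int[] a) {
        evaluations++;
        return a.length / 2;
    }

    // Form #1: the bound expression is part of the loop condition.
    static int countInline(int[] a) {
        evaluations = 0;
        for (int i = 0; i < bound(a); i++) {
        }
        return evaluations;
    }

    // Form #2: the bound is computed once before the loop.
    static int countHoisted(int[] a) {
        evaluations = 0;
        final int l = bound(a);
        for (int i = 0; i < l; i++) {
        }
        return evaluations;
    }

    public static void main(String[] args) {
        int[] a = new int[10];
        System.out.println(countInline(a));  // 6: five passing checks plus the failing one
        System.out.println(countHoisted(a)); // 1
    }
}
```

This corresponds to the two javap listings in the question: in Loop1 the arraylength and idiv instructions sit between the loop target and the goto, while in Loop2 they appear once before the loop starts.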

【Comments】: