Java 1.8 safepoint timeout: answers

【Question title】: Java 1.8 safepoint timeout
【Posted】: 2015-08-18 01:45:58
【Question】:

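I seem to be hitting a situation where, after a few hours, the JVM tries indefinitely to reach a safepoint. However, if I run jstack with the -F option, the JVM seems to break out of the wait and continue executing.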

jdk1.8.0_45/bin/jstack -F 39924 >a.out

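I am using jdk1.8.0_45 on CentOS.

My questions are:

i) When jstack sends its interrupt, the JVM seems to break out of the indefinite safepoint wait. Why does it not get out of the wait without jstack? Is there a JVM option I can use to avoid the indefinite wait?

ii) Can I get a more explicit thread dump of the thread causing the problem? The output of the safepoint log does not seem accurate.

The options I am using are: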

-server
-XX:+AggressiveOpts
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1HeapWastePercent=5 
-XX:MaxGCPauseMillis=1000
-XX:G1HeapRegionSize=4M
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UnlockExperimentalVMOptions
-XX:G1LogLevel=finest
-Xmx6000m
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=999
-XX:+SafepointTimeout
-XX:+UnlockDiagnosticVMOptions
-XX:SafepointTimeoutDelay=20000
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1

Safepoint log:

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.115: G1IncCollectionPause             [     170          0              0    ]      [     0     0     0     0     8    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.125: RevokeBias                       [     170          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.127: RevokeBias                       [     170          1              1    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.131: RevokeBias                       [     170          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.955: RevokeBias                       [     169          0              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17772.160: BulkRevokeBias                   [     171          0              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17772.352: RevokeBias                       [     170          1              3    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17773.596: RevokeBias                       [     169          0              1    ]      [     0     0     0     0     0    ]  0

 # SafepointSynchronize::begin: Timeout detected:
 # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
 # SafepointSynchronize::begin: Threads which did not reach the safepoint:
 # "Thread-14" #115 prio=5 os_prio=0 tid=0x00007f20c8029000 nid=0x9cd0 runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE
 # SafepointSynchronize::begin: (End of list)
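
To cross-reference the thread named in the timeout message with OS-level tools, its `nid` can be converted to decimal. The following is a minimal sketch, assuming `nid` is the native (Linux) thread ID printed in hexadecimal, as in HotSpot thread dumps. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name): extracts the hexadecimal "nid"
// from a HotSpot thread line and converts it to a decimal thread ID.
final class SafepointNid {
    // Matches "nid=0x..." but not "tid=0x...".
    private static final Pattern NID = Pattern.compile("\\bnid=0x([0-9a-fA-F]+)");

    static OptionalLong nidToDecimal(String threadLine) {
        Matcher m = NID.matcher(threadLine);
        if (!m.find()) {
            return OptionalLong.empty(); // no nid on this line
        }
        return OptionalLong.of(Long.parseLong(m.group(1), 16));
    }

    public static void main(String[] args) {
        // Thread line copied from the timeout message above.
        String line = "# \"Thread-14\" #115 prio=5 os_prio=0 "
                + "tid=0x00007f20c8029000 nid=0x9cd0 runnable [0x0000000000000000]";
        nidToDecimal(line).ifPresent(id -> System.out.println("native thread id: " + id));
    }
}
```

For the line above this gives 40144, which can then be compared against a per-thread view of the process such as `top -H -p 39924`.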

After the jstack interrupt, this is what I see in the safepoint log:

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17779.826: G1IncCollectionPause             [     169          1              1    ]      [3315603     03315603     0     8    ]  1

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias                       [     169          2             13    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias                       [     169          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.441: RevokeBias                       [     184          3              4    ]      [     0     0     3     0     1    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.447: RevokeBias                       [     190          0              2    ]      [     0     0     4     0     2    ]  0
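
As a sanity check on the figures above, the spin time reported for the stalled G1IncCollectionPause can be compared with the gap between its timestamp and the next logged operation. This is only a sketch under stated assumptions: the timestamp column is taken to be seconds of JVM uptime, the `[time: ...]` columns are taken to be milliseconds, and the fused `03315603` is read as block = 0 followed by sync = 3315603, which is an interpretation of the overflowing column widths rather than something the log states. The class name is hypothetical.

```java
import java.math.BigDecimal;

// Illustrative helper (hypothetical name) for comparing safepoint timestamps.
final class SafepointGap {
    // Gap between two timestamps (seconds, exactly as printed in the log),
    // returned in milliseconds. BigDecimal avoids floating-point rounding.
    static long gapMillis(String earlierSeconds, String laterSeconds) {
        return new BigDecimal(laterSeconds)
                .subtract(new BigDecimal(earlierSeconds))
                .movePointRight(3)
                .longValueExact();
    }

    public static void main(String[] args) {
        // Timestamps of the stalled pause and of the next logged operation.
        long gap = gapMillis("17779.826", "21095.439");
        // Spin time as read from the G1IncCollectionPause line (assumed ms).
        long spin = 3315603L;
        System.out.println("gap=" + gap + " ms, spin=" + spin
                + " ms, difference=" + (gap - spin) + " ms");
    }
}
```

Under those assumptions the two figures agree to within 10 ms, which would mean roughly 55 minutes were spent waiting for threads to reach the safepoint.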

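【Comments】:

Do you have some sample code that reproduces the problem? Also, err... silly question, but what is a safepoint? A link summarizing it, or a short explanation, would be helpful.
Also, why so many JVM options? Are they just for experimenting, or an attempt to solve a real problem? If so, what is the problem?
This summarizes what a safepoint is: blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html. Some of the options are for the garbage collector, others are for debugging. The code is too complex to post, but there seems to be something inherent in the JVM whereby an interrupt somehow makes it behave normally again.
What stack does Thread-14 show when you run jstack -F? That might help. My answer to this question: stackoverflow.com/questions/30393470/… and the one I linked to from there may give you some additional pointers.
Your JVM options look a bit like they were copy-pasted together without being understood.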

Tags: java linux garbage-collection centos

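【Solution 1】:

Since you can resolve the problem by interrupting the VM, and you are on CentOS, it reminds me of this kernel bug.

The thread lists the following affected versions (assuming stock kernels):

RHEL 6 (and CentOS 6, and SL 6): 6.0-6.5 are fine. 6.6 is bad. 6.6.z is fine.

RHEL 7 (and CentOS 7, and SL 7): 7.1 is bad. As of yesterday, there does not appear to be a 7.x fix yet.

RHEL 5 (and CentOS 5, and SL 5): all versions are fine (including 5.11).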

【Comments】: