Java 8u40 Math.round() very slow

【Question title】: Java 8u40 Math.round() very slow
【Posted】: 2015-03-14 16:45:06
【Question】:

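I have a fairly simple hobby project written in Java 8 that makes extensive use of repeated Math.round() calls in one of its modes of operation. For example, one such mode spawns 4 threads and queues 48 runnable tasks through an ExecutorService, each of which runs something similar to the following block of code 2^31 times: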

int3 = Math.round(float1 + float2);
int3 = Math.round(float1 * float2);
int3 = Math.round(float1 / float2);

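That is not exactly how the real code looks (arrays and nested loops are involved), but you get the idea. Anyway, prior to Java 8u40, code resembling the above could complete the full run of roughly 103 billion instruction blocks in about 13 seconds on an AMD A10-7700k. With Java 8u40 it takes around 260 seconds to do the same thing. No changes to the code, nothing at all, just a Java update.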
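As a rough sanity check on these figures, the snippet below reproduces the arithmetic: 48 tasks of 2^31 iterations each come to about 103 billion blocks, and the two reported run times imply a slowdown of about 20x. This is only a sketch; the class and method names (WorkloadEstimate, totalBlocks, blocksPerSecond) are hypothetical, while the inputs (48 tasks, 2^31 iterations, 13 s, 260 s) are taken from the question.

```java
// Illustrative arithmetic only; WorkloadEstimate is a hypothetical name.
final class WorkloadEstimate {

    // Total number of blocks: tasks * 2^31 iterations per task.
    static long totalBlocks(int tasks) {
        return tasks * (1L << 31);
    }

    // Throughput in blocks per second for a given wall-clock time.
    static double blocksPerSecond(long blocks, double seconds) {
        return blocks / seconds;
    }

    public static void main(String[] args) {
        long blocks = totalBlocks(48);
        System.out.println("blocks = " + blocks);
        System.out.printf("before 8u40: %.2e blocks/s%n", blocksPerSecond(blocks, 13.0));
        System.out.printf("with 8u40:   %.2e blocks/s%n", blocksPerSecond(blocks, 260.0));
        System.out.printf("slowdown:    %.0fx%n", 260.0 / 13.0);
    }
}
```

A per-block time of well under a nanosecond across four threads, as the 13-second figure implies, is already a hint that not all of the work was actually being executed.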

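Has anyone else noticed Math.round() getting much slower, especially when it is used repeatedly? It is almost as though the JVM was doing some sort of optimization before that it no longer does. Maybe it was using SIMD prior to 8u40 and now it isn't?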

Edit: I have completed my second attempt at an MCVE. You can download the first attempt here:

https://www.dropbox.com/s/rm2ftcv8y6ye1bi/MathRoundMVCE.zip?dl=0

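The second attempt is below. My first attempt has been removed from this post because it was deemed too long and was prone to dead-code removal optimizations by the JVM (which apparently happen less often in 8u40).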

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MathRoundMVCE
{           
    static long grandtotal = 0;
    static long sumtotal = 0;

    static float[] float4 = new float[128];
    static float[] float5 = new float[128];
    static int[] int6 = new int[128];
    static int[] int7 = new int[128];
    static int[] int8 = new int[128];
    static long[] longarray = new long[480];

    final static int mil = 1000000;

    public static void main(String[] args)
    {       
        initmainarrays();
        OmniCode omni = new OmniCode();
        grandtotal = omni.runloops() / mil;
        System.out.println("Total sum of operations is " + sumtotal);
        System.out.println("Total execution time is " + grandtotal + " milliseconds");
    }   

    public static long siftarray(long[] larray)
    {
        long topnum = 0;
        long tempnum = 0;
        for (short i = 0; i < larray.length; i++)
        {
            tempnum = larray[i];
            if (tempnum > 0)
            {
                topnum += tempnum;
            }
        }
        topnum = topnum / Runtime.getRuntime().availableProcessors();
        return topnum;
    }

    public static void initmainarrays()
    {
        int k = 0;

        do
        {           
            float4[k] = (float)(Math.random() * 12) + 1f;
            float5[k] = (float)(Math.random() * 12) + 1f;
            int6[k] = 0;

            k++;
        }
        while (k < 128);        
    }       
}

class OmniCode extends Thread
{           
    volatile long totaltime = 0;
    final int standard = 16777216;
    final int warmup = 200000;

    byte threads = 0;

    public long runloops()
    {
        this.setPriority(MIN_PRIORITY);

        threads = (byte)Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(threads);

        for (short j = 0; j < 48; j++)
        {           
            executor.execute(new RoundFloatToIntAlternate(warmup, (byte)j));
        }

        executor.shutdown();

        while (!executor.isTerminated())
        {
            try
            {
                Thread.sleep(100);
            } 
            catch (InterruptedException e)
            {
                //Do nothing                
            }
        }

        executor = Executors.newFixedThreadPool(threads);

        for (short j = 0; j < 48; j++)
        {           
            executor.execute(new RoundFloatToIntAlternate(standard, (byte)j));          
        }

        executor.shutdown();

        while (!executor.isTerminated())
        {
            try
            {
                Thread.sleep(100);
            } 
            catch (InterruptedException e)
            {
                //Do nothing                
            }
        }

        totaltime = MathRoundMVCE.siftarray(MathRoundMVCE.longarray);   

        executor = null;
        Runtime.getRuntime().gc();
        return totaltime;
    }
}

class RoundFloatToIntAlternate extends Thread
{       
    int i = 0;
    int j = 0;
    int int3 = 0;
    int iterations = 0;
    byte thread = 0;

    public RoundFloatToIntAlternate(int cycles, byte threadnumber)
    {
        iterations = cycles;
        thread = threadnumber;
    }

    public void run()
    {
        this.setPriority(9);
        MathRoundMVCE.longarray[this.thread] = 0;
        mainloop();
        blankloop();    

    }

    public void blankloop()
    {
        j = 0;
        long timer = 0;
        long totaltimer = 0;

        do
        {   
            timer = System.nanoTime();
            i = 0;

            do
            {
                i++;
            }
            while (i < 128);
            totaltimer += System.nanoTime() - timer;            

            j++;
        }
        while (j < iterations);         

        MathRoundMVCE.longarray[this.thread] -= totaltimer;
    }

    public void mainloop()
    {
        j = 0;
        long timer = 0; 
        long totaltimer = 0;
        long localsum = 0;

        int[] int6 = new int[128];
        int[] int7 = new int[128];
        int[] int8 = new int[128];

        do
        {   
            timer = System.nanoTime();
            i = 0;

            do
            {
                int6[i] = Math.round(MathRoundMVCE.float4[i] + MathRoundMVCE.float5[i]);
                int7[i] = Math.round(MathRoundMVCE.float4[i] * MathRoundMVCE.float5[i]);
                int8[i] = Math.round(MathRoundMVCE.float4[i] / MathRoundMVCE.float5[i]);

                i++;
            }
            while (i < 128);
            totaltimer += System.nanoTime() - timer;

            for(short z = 0; z < 128; z++)
            {
                localsum += int6[z] + int7[z] + int8[z];
            }       

            j++;
        }
        while (j < iterations);         

        MathRoundMVCE.longarray[this.thread] += totaltimer;
        MathRoundMVCE.sumtotal = localsum;
    }
}

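Long story short, this code performs about the same in 8u25 and 8u40. As you can see, I now record the results of all calculations into arrays, then sum those arrays outside the timed portion of the loop into a local variable, which is written to a static variable at the end of the outer loop.

Under 8u25: Total execution time is 261545 milliseconds

Under 8u40: Total execution time is 266890 milliseconds

Test conditions were the same as before. So it would appear that 8u25 and 8u31 were performing dead-code removal that 8u40 stopped doing, causing the code to "slow down" in 8u40. This does not explain every weird little thing that has cropped up, but it appears to be the bulk of it. As an added bonus, the suggestions and answers provided here have inspired me to improve other parts of my hobby project, for which I am quite grateful. Thank you all!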

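【Comments】:

Can you provide an MCVE: stackoverflow.com/help/mcve
I have run the methods 10K, 100K, 1M and 10M times with Java 7 and Java 8 and got very similar results. That MCVE is definitely needed.
OK, I am starting a rewrite of the program that focuses only on the Math.round part. I will have it up as soon as I can. If possible, I will roll back to update 30 on my Windows partition. Or at least, I can try. . . Anyway, the code in question is here: dropbox.com/s/53zuk227qr4wdpn/mathtestersource03112015.zip?dl=0. The methods in question are in the classes RoundFloatToInt, RoundFloatToIntNoDiv, RoundFloatToIntAlternate, RoundFloatToIntNoDivAlternate and OmniLoop (the methods in OmniLoop are roundfloattointloop/roundfloattointloopalternate).
Math.sqrt() is also showing new behavior (read: worse performance) in 8u40, but it is not nearly as bad. I will try to include that in the MCVE as well.
OK, you asked for it; I will add the source code in an edit.

Tags: java jvm jit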

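【Answer 1】:

Casual benchmarking: you benchmark A, but actually measure B, and conclude that you have measured C.

Modern JVMs are very complex and perform all kinds of optimizations. If you try to measure a small piece of code, it is really hard to do so correctly without very detailed knowledge of what the JVM is doing. The culprit in many benchmarks is dead-code elimination: compilers are smart enough to deduce that some computations are redundant and eliminate them completely. Please read the following slides: http://shipilev.net/talks/jvmls-July2014-benchmarking.pdf. In order to "fix" Adam's microbenchmark (I still can't understand what it is measuring, and this "fix" does not take into account warm-up, OSR and many other microbenchmarking pitfalls) we have to print the result of the computation to system output: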

    int result = 0;
    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1e9; i++) {
        result += Math.round((float) i / (float) (i + 1));
    }
    long t1 = System.currentTimeMillis();
    System.out.println("result = " + result);
    System.out.println(String.format("%s, Math.round(float), %.1f ms", System.getProperty("java.version"), (t1 - t0)/1f));

Result:

result = 999999999
1.8.0_25, Math.round(float), 5251.0 ms

result = 999999999
1.8.0_40, Math.round(float), 3903.0 ms

The same "fix" applied to the original MCVE example:

It took 401772 milliseconds to complete edu.jvm.runtime.RoundFloatToInt. <==== 1.8.0_40

It took 410767 milliseconds to complete edu.jvm.runtime.RoundFloatToInt. <==== 1.8.0_25

If you want to measure the real cost of Math#round, you should write something like this (based on jmh):

package org.openjdk.jmh.samples;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.VerboseMode;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
public class RoundBench {

    float[] floats;
    int i;

    @Setup
    public void initI() {
        Random random = new Random(0xDEAD_BEEF);
        floats = new float[8096];
        for (int i = 0; i < floats.length; i++) {
            floats[i] = random.nextFloat();
        }
    }

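    @Benchmark
    public float baseline() {
        i++;
        // Note: this mask clears the low 8 bits, so after the increment
        // i is reset to 0 and the same element is read on every call.
        i = i & 0xFFFFFF00;
        return floats[i];
    }

    @Benchmark
    public int round() {
        i++;
        // Same index handling as in baseline(), so the two stay comparable.
        i = i & 0xFFFFFF00;
        return Math.round(floats[i]);
    }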

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(RoundBench.class.getName())
                .build();
        new Runner(options).run();
    }
}

My results are:

1.8.0_25
Benchmark            Mode  Cnt  Score   Error  Units
RoundBench.baseline  avgt    6  2.565 ± 0.028  ns/op
RoundBench.round     avgt    6  4.459 ± 0.065  ns/op

1.8.0_40 
Benchmark            Mode  Cnt  Score   Error  Units
RoundBench.baseline  avgt    6  2.589 ± 0.045  ns/op
RoundBench.round     avgt    6  4.588 ± 0.182  ns/op
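One way to read these tables is to subtract the baseline score from the round score, which gives a rough net cost per Math.round call. The sketch below does that with the scores reported above; the class and method names (NetRoundCost, net) are hypothetical, and the subtraction ignores the error margins, which are of the same order as the difference between the two versions.

```java
// Illustrative arithmetic on the JMH scores quoted above.
final class NetRoundCost {

    // Net cost of round() = score of the round benchmark minus the baseline score.
    static double net(double roundNs, double baselineNs) {
        return roundNs - baselineNs;
    }

    public static void main(String[] args) {
        double net25 = net(4.459, 2.565); // scores reported for 1.8.0_25
        double net40 = net(4.588, 2.589); // scores reported for 1.8.0_40
        System.out.printf("8u25 net: %.3f ns/op%n", net25);
        System.out.printf("8u40 net: %.3f ns/op%n", net40);
        System.out.printf("relative change: %.1f%%%n", (net40 / net25 - 1.0) * 100.0);
    }
}
```

By this estimate the net cost moves from about 1.9 ns to about 2.0 ns per call, a change of a few percent, which is nowhere near the 20x slowdown described in the question.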

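To find the root cause of the problem you can use https://github.com/AdoptOpenJDK/jitwatch/. To save some time, I can say that the size of the JITted code for Math#round increased in 8.0_40. This is almost unnoticeable for small methods, but in the case of huge methods an overly long stretch of machine code pollutes the instruction cache.

【Comments】:

Not really an "answer" to the question, but it at least points out what is wrong with the tests so far. Dead-code elimination (and, in fact, the point that most optimizations cannot be done at all for a single loop in the main method...) is important, and your measurements are roughly in line with my observations of the timings so far. I still insist on an MCVE (in contrast to the LNCNVE that has been available until now)...
I have had many problems with dead-code elimination/NOOP in this hobby project in the past. Usually execution time drops to zero or near zero, because the JVM takes a scorched-earth approach to the entire containing method. The only way I have been able to avoid that without using volatile is to use loops that are not unrolled at all or are only partially unrolled. One fully unrolled loop of 128 instructions will cause the JVM to discard everything. Up until now I had not seen the JVM selectively eliminate parts of a method or loop, but maybe I am seeing that now.
Another telltale sign of dead-code elimination/NOOP that I have observed has to do with CPU operating temperature. If the JVM is throwing out dead code, the CPU still shows 100% utilization during program execution, but the CPU operating temperature barely changes. Different types of operations cause different operating temperatures at 100% utilization, but a NOOP barely registers. That makes sense, since the CPU is not really doing anything at that point except iterating through an empty loop (if that). As time permits, I will attempt a rewrite that stores results in arrays. Something shorter.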

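【Answer 2】:

Based on the OP's MCVE

May be simplified further
Changed the int3 = statements to int3 += to reduce the chance of dead-code removal. With int3 =, 8u40 is 3 times slower than 8u31. With int3 +=, it is only 15% slower.
Print the result to further reduce the chance of dead-code removal optimizations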
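The difference between the two forms can be sketched as follows. With plain assignment, only the last store to the variable is needed to produce the final value, so a JIT compiler is in principle free to drop the earlier computations; with accumulation, every call feeds the result. This is a minimal sketch with hypothetical names (DeadStoreSketch, overwrite, accumulate), and it does not demonstrate what HotSpot actually removes in any particular version.

```java
// Minimal sketch of "int3 =" versus "int3 +="; all names are hypothetical.
final class DeadStoreSketch {

    // Each assignment overwrites the previous one: only the last
    // Math.round result is needed to produce the return value.
    static int overwrite(float[] a, float[] b) {
        int out = 0;
        for (int i = 0; i < a.length; i++) {
            out = Math.round(a[i] + b[i]);
        }
        return out;
    }

    // Every Math.round result contributes to the return value,
    // so no call can be skipped without changing it.
    static int accumulate(float[] a, float[] b) {
        int out = 0;
        for (int i = 0; i < a.length; i++) {
            out += Math.round(a[i] + b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1.2f, 2.6f, 3.5f};
        float[] b = {0.1f, 0.1f, 0.1f};
        // Printing the results keeps them observable outside the methods.
        System.out.println("overwrite  = " + overwrite(a, b));
        System.out.println("accumulate = " + accumulate(a, b));
    }
}
```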

Code

public class MathTime {
    static float[][] float1 = new float[8][16];
    static float[][] float2 = new float[8][16];

    public static void main(String[] args) {
        for (int j = 0; j < 8; j++) {
            for (int k = 0; k < 16; k++) {
                float1[j][k] = (float) (j + k);
                float2[j][k] = (float) (j + k);
            }
        }
        new Test().run();
    }

    private static class Test {
        int int3;

        public void run() {
            for (String test : new String[] { "warmup", "real" }) {

                long t0 = System.nanoTime();

                for (int count = 0; count < 1e7; count++) {
                    int i = count % 8;
                    int3 += Math.round(float1[i][0] + float2[i][0]);
                    int3 += Math.round(float1[i][1] + float2[i][1]);
                    int3 += Math.round(float1[i][2] + float2[i][2]);
                    int3 += Math.round(float1[i][3] + float2[i][3]);
                    int3 += Math.round(float1[i][4] + float2[i][4]);
                    int3 += Math.round(float1[i][5] + float2[i][5]);
                    int3 += Math.round(float1[i][6] + float2[i][6]);
                    int3 += Math.round(float1[i][7] + float2[i][7]);
                    int3 += Math.round(float1[i][8] + float2[i][8]);
                    int3 += Math.round(float1[i][9] + float2[i][9]);
                    int3 += Math.round(float1[i][10] + float2[i][10]);
                    int3 += Math.round(float1[i][11] + float2[i][11]);
                    int3 += Math.round(float1[i][12] + float2[i][12]);
                    int3 += Math.round(float1[i][13] + float2[i][13]);
                    int3 += Math.round(float1[i][14] + float2[i][14]);
                    int3 += Math.round(float1[i][15] + float2[i][15]);

                    int3 += Math.round(float1[i][0] * float2[i][0]);
                    int3 += Math.round(float1[i][1] * float2[i][1]);
                    int3 += Math.round(float1[i][2] * float2[i][2]);
                    int3 += Math.round(float1[i][3] * float2[i][3]);
                    int3 += Math.round(float1[i][4] * float2[i][4]);
                    int3 += Math.round(float1[i][5] * float2[i][5]);
                    int3 += Math.round(float1[i][6] * float2[i][6]);
                    int3 += Math.round(float1[i][7] * float2[i][7]);
                    int3 += Math.round(float1[i][8] * float2[i][8]);
                    int3 += Math.round(float1[i][9] * float2[i][9]);
                    int3 += Math.round(float1[i][10] * float2[i][10]);
                    int3 += Math.round(float1[i][11] * float2[i][11]);
                    int3 += Math.round(float1[i][12] * float2[i][12]);
                    int3 += Math.round(float1[i][13] * float2[i][13]);
                    int3 += Math.round(float1[i][14] * float2[i][14]);
                    int3 += Math.round(float1[i][15] * float2[i][15]);

                    int3 += Math.round(float1[i][0] / float2[i][0]);
                    int3 += Math.round(float1[i][1] / float2[i][1]);
                    int3 += Math.round(float1[i][2] / float2[i][2]);
                    int3 += Math.round(float1[i][3] / float2[i][3]);
                    int3 += Math.round(float1[i][4] / float2[i][4]);
                    int3 += Math.round(float1[i][5] / float2[i][5]);
                    int3 += Math.round(float1[i][6] / float2[i][6]);
                    int3 += Math.round(float1[i][7] / float2[i][7]);
                    int3 += Math.round(float1[i][8] / float2[i][8]);
                    int3 += Math.round(float1[i][9] / float2[i][9]);
                    int3 += Math.round(float1[i][10] / float2[i][10]);
                    int3 += Math.round(float1[i][11] / float2[i][11]);
                    int3 += Math.round(float1[i][12] / float2[i][12]);
                    int3 += Math.round(float1[i][13] / float2[i][13]);
                    int3 += Math.round(float1[i][14] / float2[i][14]);
                    int3 += Math.round(float1[i][15] / float2[i][15]);

                }
                long t1 = System.nanoTime();
                System.out.println(int3);
                System.out.println(String.format("%s, Math.round(float), %s, %.1f ms", System.getProperty("java.version"), test, (t1 - t0) / 1e6));
            }
        }
    }
}

Results

adam@brimstone:~$ ./jdk1.8.0_40/bin/javac MathTime.java;./jdk1.8.0_40/bin/java -cp . MathTime 
1.8.0_40, Math.round(float), warmup, 6846.4 ms
1.8.0_40, Math.round(float), real, 6058.6 ms
adam@brimstone:~$ ./jdk1.8.0_31/bin/javac MathTime.java;./jdk1.8.0_31/bin/java -cp . MathTime 
1.8.0_31, Math.round(float), warmup, 5717.9 ms
1.8.0_31, Math.round(float), real, 5282.7 ms
adam@brimstone:~$ ./jdk1.8.0_25/bin/javac MathTime.java;./jdk1.8.0_25/bin/java -cp . MathTime 
1.8.0_25, Math.round(float), warmup, 5702.4 ms
1.8.0_25, Math.round(float), real, 5262.2 ms

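Observations

For trivial uses of Math.round(float) I can find no difference in performance on my platform (Linux x86_64). The only difference is in the benchmark: my previous naive and incorrect benchmarks merely exposed differences in optimization behavior, as Ivan's answer and Marco13's comments point out.
8u40 is less aggressive at dead-code elimination than previous versions, which means that more code is executed in some corner cases and the run is therefore slower.
8u40 takes slightly longer to warm up, but once "there", it is quicker.

Source analysis

Surprisingly, Math.round(float) is a pure Java implementation rather than native, and the code is identical in 8u31 and 8u40.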

diff  jdk1.8.0_31/src/java/lang/Math.java jdk1.8.0_40/src/java/lang/Math.java
-no differences-

public static int round(float a) {
    int intBits = Float.floatToRawIntBits(a);
    int biasedExp = (intBits & FloatConsts.EXP_BIT_MASK)
            >> (FloatConsts.SIGNIFICAND_WIDTH - 1);
    int shift = (FloatConsts.SIGNIFICAND_WIDTH - 2
            + FloatConsts.EXP_BIAS) - biasedExp;
    if ((shift & -32) == 0) { // shift >= 0 && shift < 32
        // a is a finite number such that pow(2,-32) <= ulp(a) < 1
        int r = ((intBits & FloatConsts.SIGNIF_BIT_MASK)
                | (FloatConsts.SIGNIF_BIT_MASK + 1));
        if (intBits < 0) {
            r = -r;
        }
        // In the comments below each Java expression evaluates to the value
        // the corresponding mathematical expression:
        // (r) evaluates to a / ulp(a)
        // (r >> shift) evaluates to floor(a * 2)
        // ((r >> shift) + 1) evaluates to floor((a + 1/2) * 2)
        // (((r >> shift) + 1) >> 1) evaluates to floor(a + 1/2)
        return ((r >> shift) + 1) >> 1;
    } else {
        // a is either
        // - a finite number with abs(a) < exp(2,FloatConsts.SIGNIFICAND_WIDTH-32) < 1/2
        // - a finite number with ulp(a) >= 1 and hence a is a mathematical integer
        // - an infinity or NaN
        return (int) a;
    }
}
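To see the bit manipulation at work, the sketch below restates the method with the FloatConsts values written out as literals, since FloatConsts is an internal JDK class. The literal values are the standard IEEE 754 single-precision parameters; the class name RoundSketch and the sample inputs are assumptions made for illustration, not part of the JDK.

```java
// Stand-alone restatement of the algorithm above; RoundSketch is a hypothetical name.
final class RoundSketch {

    // IEEE 754 single-precision parameters, written as literals here.
    static final int EXP_BIT_MASK = 0x7F800000;    // 8 exponent bits
    static final int SIGNIF_BIT_MASK = 0x007FFFFF; // 23 stored significand bits
    static final int SIGNIFICAND_WIDTH = 24;       // including the implicit leading bit
    static final int EXP_BIAS = 127;

    static int round(float a) {
        int intBits = Float.floatToRawIntBits(a);
        int biasedExp = (intBits & EXP_BIT_MASK) >> (SIGNIFICAND_WIDTH - 1);
        int shift = (SIGNIFICAND_WIDTH - 2 + EXP_BIAS) - biasedExp;
        if ((shift & -32) == 0) { // shift >= 0 && shift < 32
            // Restore the implicit leading bit and apply the sign.
            int r = (intBits & SIGNIF_BIT_MASK) | (SIGNIF_BIT_MASK + 1);
            if (intBits < 0) {
                r = -r;
            }
            // floor(a * 2), then add 1 and halve: floor(a + 1/2).
            return ((r >> shift) + 1) >> 1;
        }
        // Tiny values, values that are already integers, infinities and NaN.
        return (int) a;
    }

    public static void main(String[] args) {
        float[] samples = {2.5f, -2.5f, 0.49999997f, -0.5f, 3.0f, 1e10f, Float.NaN};
        for (float s : samples) {
            System.out.println(s + " -> " + round(s) + " (Math.round: " + Math.round(s) + ")");
        }
    }
}
```

For 2.5f, for example, the biased exponent is 128, so shift is 21; the significand with its leading bit restored is 0xA00000, which shifted right by 21 gives 5, and (5 + 1) >> 1 yields 3.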

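【Comments】:

You should use System.nanoTime() for benchmarking, because System.currentTimeMillis() is not guaranteed to be monotonically increasing. This probably does not affect the interesting results you got here, but still.
@Adam Great contribution to resolving this question. Plus 1.
@Adam Could you dump the machine code of both versions, as described in the answers to this SO question? I think that would reveal the problem and where to go next. By checking the commits and logs of the Hotspot VM Mercurial repository here, I was not able to find what caused this.
Please warm up the JVM before testing, extract the test code into a separate method... There are many pitfalls in your microbenchmark. Writing a correct microbenchmark is not easy; see How do I write a correct micro-benchmark in Java?
To find the root cause of this regression you can use github.com/AdoptOpenJDK/jitwatch. To save some time, I can say that the size of the JITted code for Math#round increased in 8.0_40. This is almost unnoticeable for small methods, but in the case of huge methods an overly long stretch of machine code pollutes the instruction cache.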
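The advice in these comments (use System.nanoTime(), warm up first, keep the measured code in its own method) can be sketched roughly as below. This is a hand-rolled illustration under stated assumptions, not a substitute for a harness such as jmh: the class and method names (WarmupSketch, work, timeOnce), the multiplier 1.5f and the iteration counts are all made up for the example.

```java
// Hand-rolled timing sketch; all names and iteration counts are hypothetical.
final class WarmupSketch {

    // The measured work lives in its own method so the JIT can compile it as a unit.
    static int work(float[] data, int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.round(data[i % data.length] * 1.5f);
        }
        return result;
    }

    // Times one call with System.nanoTime(); the sink keeps the result observable.
    static long timeOnce(float[] data, int n, int[] sink) {
        long t0 = System.nanoTime();
        sink[0] += work(data, n);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        float[] data = new float[16];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 0.25f;
        }
        int[] sink = new int[1];
        int n = 1_000_000;

        // Warm-up runs: timings are discarded.
        for (int run = 0; run < 10; run++) {
            timeOnce(data, n, sink);
        }
        // Measured runs.
        for (int run = 0; run < 5; run++) {
            long ns = timeOnce(data, n, sink);
            System.out.printf("run %d: %.2f ms%n", run, ns / 1e6);
        }
        // Printing the sink makes the computed values part of the program's output.
        System.out.println("sink = " + sink[0]);
    }
}
```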

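【Answer 3】:

Not a definitive answer, but maybe another small contribution.

Originally, I walked through the whole chain as Adam did in his answer (see the history for details), tracing and comparing bytecode, implementations and run times. However, as pointed out in the comments, during my tests (on Win7/8), and with the "usual microbenchmark best practices", the performance difference was not as striking as suggested in the original question and in the first versions of the first answer.

However, there was a difference, so I created another small test: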

public class MathRoundPerformance {

    static final int size = 16;
    static float[] data = new float[size];

    public static void main(String[] args) {
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }

        for (int n=1000000; n<=100000000; n+=5000000)
        {
            long t0 = System.nanoTime();
            int result = runTest(n);
            long t1 = System.nanoTime();
            System.out.printf(
                "%s, Math.round(float), %s, %s, %.1f ms\n",
                System.getProperty("java.version"),
                n, result, (t1 - t0) / 1e6);
        }
    }

    public static int runTest(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            int i0 = (i+0) % size;
            int i1 = (i+1) % size;
            result += Math.round(data[i0] + data[i1]);
            result += Math.round(data[i0] * data[i1]);
            result += Math.round(data[i0] / data[i1]);
        }
        return result;
    }
}

The timing results (with some details omitted) were

...
1.8.0_31, Math.round(float), 96000000, -351934592, 504,8 ms

....
1.8.0_40, Math.round(float), 96000000, -351934592, 544,0 ms

I ran the example on a Hotspot disassembler VM, using

java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading
     -XX:+LogCompilation -XX:+PrintInlining -XX:+PrintAssembly
     MathRoundPerformance

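The important point is that the optimization is finished by the time the program ends (or at least, it seems to be finished). This means that the results of the last calls to the runTest method are printed without any additional JIT optimization being done between the calls.

I tried to find the differences by looking at the generated machine code. Most of the generated code is identical for both versions. But as Ivan pointed out, the number of instructions did increase in 8u40. I compared the source code of Hotspot versions u20 and u40. I thought there might be subtle differences in the intrinsics for floatToRawIntBits, but these files did not change. I thought that the recently added checks for AVX or SSE4.2 might influence the machine code generation in an unfortunate way, but... my assembler knowledge is not as good as I would like it to be, so I cannot make a definitive statement here. Overall, the generated machine code looks as if it was mainly reordered (that is, mainly changed structurally), but comparing the dumps manually is a real headache... (the addresses are all different, even when the instructions are largely the same).

(I would have liked to dump the machine code generated for the runTest method here, but there is some odd limit of 30k for one answer)

I will try to further analyze and compare the machine code dumps and the Hotspot code. But in the end, it will be hard to point the finger at "the" change that causes the performance degradation, both in terms of the machine code that executes more slowly and in terms of the Hotspot changes that cause the change in the machine code.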

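【Comments】:

Interesting. I reworked my MCVE to make it less long-winded and will post it here later, but I found that after making changes based on what I gathered from the replies here, my code runs about the same in 8u25 as in 8u40. It is a bit slower, but only by 5 seconds in an execution time of about 260 seconds. Thanks for continuing to look into this.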