【Question title】: Java 8 Streams: multiple filters vs. complex condition
【Posted】: 2014-07-26 03:02:30
【Question】:

Sometimes you want to filter a Stream with more than one condition:

myList.stream().filter(x -> x.size() > 10).filter(x -> x.isCool()) ...

or you can do the same with a complex condition and a single filter:

myList.stream().filter(x -> x.size() > 10 && x.isCool()) ...
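To make the two forms concrete, here is a minimal runnable sketch. The element type `Item` and its fields are hypothetical stand-ins for whatever `x` is in the question; all that matters is that it has `size()` and `isCool()`. Both pipelines should keep exactly the same elements, so the choice between them is about readability and performance, not semantics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterForms {
    // Hypothetical element type standing in for the question's x: it only needs size() and isCool().
    static final class Item {
        final String name;
        final int size;
        final boolean cool;

        Item(String name, int size, boolean cool) {
            this.name = name;
            this.size = size;
            this.cool = cool;
        }

        int size() { return size; }
        boolean isCool() { return cool; }
    }

    // Form 1: two chained filters, one condition each.
    static List<String> chained(List<Item> items) {
        return items.stream()
                .filter(x -> x.size() > 10)
                .filter(x -> x.isCool())
                .map(x -> x.name)
                .collect(Collectors.toList());
    }

    // Form 2: a single filter whose lambda combines both conditions with &&.
    static List<String> combined(List<Item> items) {
        return items.stream()
                .filter(x -> x.size() > 10 && x.isCool())
                .map(x -> x.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("a", 5, true),
                new Item("b", 20, false),
                new Item("c", 30, true));
        System.out.println(chained(items));   // [c]
        System.out.println(combined(items));  // [c]
    }
}
```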

My guess is that the second approach has better performance characteristics, but I don't know for sure.

The first approach wins on readability, but which one is better for performance?

【Question comments】:

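  • Write whichever code is more readable in this situation. The performance difference is minimal (and highly situational).
  • Forget about nano-optimizations and use highly readable and maintainable code. With streams, one should always use each operation separately, including filters.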

Tags: java lambda filter java-8 java-stream


【Solution 1】:

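The code that has to be executed for both alternatives is so similar that you can't reliably predict a result. The underlying object structure might differ, but that is no challenge for the HotSpot optimizer. So which one executes faster, if there is any difference at all, depends on other surrounding conditions.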

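Combining two filter instances creates more objects and hence more delegating code, but this can change if you use method references instead of lambda expressions, e.g. replacing filter(x -> x.isCool()) with filter(ItemType::isCool). That way you eliminate the synthetic delegating method created for your lambda expression. So combining two filters using two method references might create the same or less delegation code than a single filter invocation using a lambda expression with &&.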
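A sketch of the method-reference variant described above, assuming a hypothetical `ItemType` with an equally hypothetical `isBig()` helper that wraps the question's `size() > 10`. It only shows the shape of the code next to the single `&&` lambda; it measures nothing, and exactly how much delegation code is generated is up to the compiler and runtime.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MethodRefFilters {
    // Hypothetical element type; isBig() is an assumed helper wrapping "size() > 10".
    static final class ItemType {
        final String name;
        final int size;
        final boolean cool;

        ItemType(String name, int size, boolean cool) {
            this.name = name;
            this.size = size;
            this.cool = cool;
        }

        boolean isBig() { return size > 10; }
        boolean isCool() { return cool; }
    }

    // Two filters built from method references; javac typically links these directly
    // instead of generating a synthetic lambda$ method for each of them.
    static List<String> twoMethodRefFilters(List<ItemType> items) {
        return items.stream()
                .filter(ItemType::isBig)
                .filter(ItemType::isCool)
                .map(item -> item.name)
                .collect(Collectors.toList());
    }

    // One filter with a lambda combining both checks; javac compiles the lambda body into a synthetic method.
    static List<String> oneLambdaFilter(List<ItemType> items) {
        return items.stream()
                .filter(x -> x.isBig() && x.isCool())
                .map(item -> item.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ItemType> items = Arrays.asList(
                new ItemType("small", 5, true),
                new ItemType("dull", 20, false),
                new ItemType("match", 30, true));
        System.out.println(twoMethodRefFilters(items)); // [match]
        System.out.println(oneLambdaFilter(items));     // [match]
    }
}
```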

But, as said, this kind of overhead will be eliminated by the HotSpot optimizer and is negligible.

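In theory, two filters could be parallelized more easily than a single filter, but that is only relevant for rather computation-intensive tasks¹.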

So there is no simple answer.

The bottom line is: don't think about performance differences below the odor detection threshold. Use whatever is more readable.


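¹…and would require an implementation that processes subsequent stages in parallel, a road currently not taken by the standard Stream implementation.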

【Discussion】:

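  • Doesn't the code have to iterate over the resulting stream after each filter?
  • @Juan Carlos Diaz: No, streams don't work that way. Read about "lazy evaluation": intermediate operations don't do anything by themselves, they only alter the outcome of the terminal operation.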
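The lazy-evaluation point can be checked directly. This sketch (class and method names are hypothetical) logs every predicate call: nothing runs until the terminal `collect`, and each element travels through both filters before the next element is read, so there is a single pass rather than one pass per filter. The side effects inside `filter` are there only for illustration on a sequential stream; stream predicates should normally be stateless.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyFilters {
    // Builds a two-filter pipeline, then runs it, logging each predicate call in order.
    static List<String> trace(List<Integer> input) {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = input.stream()
                .filter(x -> { log.add("filter1(" + x + ")"); return x > 1; })
                .filter(x -> { log.add("filter2(" + x + ")"); return x % 2 == 0; });
        log.add("pipeline built");   // neither filter has run yet: intermediate operations are lazy
        List<Integer> result = pipeline.collect(Collectors.toList());   // the terminal operation drives a single pass
        log.add("result=" + result);
        return log;
    }

    public static void main(String[] args) {
        trace(Arrays.asList(1, 2, 3)).forEach(System.out::println);
    }
}
```

For the input `1, 2, 3` the log reads `pipeline built`, `filter1(1)`, `filter1(2)`, `filter2(2)`, `filter1(3)`, `filter2(3)`, `result=[2]`: element 2 passes through both filters before element 3 is even looked at.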
【Solution 2】:

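From a performance perspective, a complex filter condition is better, but the best performance comes from an old-fashioned for loop with a standard if clause. On a small array of 10 elements the difference can be around 2x; for large arrays the difference is not that big.
You can take a look at my GitHub project, where I ran performance tests for multiple array iteration options.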
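The GitHub benchmark code itself is not shown here, so the following is only a sketch of the two shapes being compared: a stream with one combined filter and a plain `for` loop with an `if` clause, both counting matches in an `int[]`. The class name and the condition `v > 10 && v % 2 == 0` are arbitrary placeholders, and no timing is done.

```java
import java.util.Arrays;

public class LoopVsStream {
    // Stream version: one filter with both conditions combined by &&.
    static long countWithStream(int[] values) {
        return Arrays.stream(values)
                .filter(v -> v > 10 && v % 2 == 0)
                .count();
    }

    // Old-style version: a plain for loop with an if clause and no stream machinery.
    static long countWithLoop(int[] values) {
        long count = 0;
        for (int v : values) {
            if (v > 10 && v % 2 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] values = {4, 11, 12, 20, 25, 30};
        System.out.println(countWithStream(values)); // 3
        System.out.println(countWithLoop(values));   // 3
    }
}
```

The loop avoids the per-element lambda and pipeline dispatch entirely, which is the overhead that matters most on very small arrays.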

[Charts: throughput in ops/s for a small array (10 elements), a medium array (10,000 elements), and a large array (1,000,000 elements)]

Note: the tests were run on:
  • 8 CPUs
  • 1 GB RAM
  • OS version: 16.04.1 LTS (Xenial Xerus)
  • Java version: 1.8.0_121
  • JVM: -XX:+UseG1GC -server -Xmx1024m -Xms1024m

UPDATE: Java 11 shows some performance improvements, but the overall dynamics stay the same.

Benchmark mode: Throughput, ops/time

【Discussion】:

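  • As I understand it, the minimum Ops/Sec is the best; is that right? Could you explain what these numbers (Ops/Sec) mean? An example would help.
  • A bit late, but @SpongeBob, Ops/Sec means operations per second, so the higher the Ops/Sec, the better.
  • Just to be sure: does this mean that parallel streams are not effective for streams with fewer than 10k elements?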
【Solution 3】:

This test shows that your second option can perform better. Findings first, then the code:

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=4142, min=29, average=41.420000, max=82}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=13315, min=117, average=133.150000, max=153}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10320, min=82, average=103.200000, max=127}

Now the code:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// The calls Temp::test1, Temp::test2 and Temp::test3 imply that these members live in a class named Temp.
public class Temp {

    enum Gender {
        FEMALE,
        MALE
    }

    static class User {
        Gender gender;
        int age;

        public User(Gender gender, int age) {
            this.gender = gender;
            this.age = age;
        }

        public Gender getGender() {
            return gender;
        }

        public void setGender(Gender gender) {
            this.gender = gender;
        }

        public int getAge() {
            return age;
        }

        public void setAge(int age) {
            this.age = age;
        }
    }

    // Variant 1: one filter, both conditions combined with &&
    static long test1(List<User> users) {
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter(u -> u.getGender() == Gender.FEMALE && u.getAge() % 2 == 0)
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    // Variant 2: two chained filters, one condition each
    static long test2(List<User> users) {
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter(u -> u.getGender() == Gender.FEMALE)
                .filter(u -> u.getAge() % 2 == 0)
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    // Variant 3: one filter with a predicate composed via Predicate.and
    static long test3(List<User> users) {
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter(((Predicate<User>) u -> u.getGender() == Gender.FEMALE).and(u -> u.getAge() % 2 == 0))
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    public static void main(String... args) {
        int size = 10000000;
        List<User> users = IntStream.range(0, size)
                .mapToObj(i -> i % 2 == 0 ? new User(Gender.MALE, i % 100) : new User(Gender.FEMALE, i % 100))
                .collect(Collectors.toCollection(() -> new ArrayList<>(size)));
        repeat("one filter with predicate of form u -> exp1 && exp2", users, Temp::test1, 100);
        repeat("two filters with predicates of form u -> exp1", users, Temp::test2, 100);
        repeat("one filter with predicate of form predOne.and(pred2)", users, Temp::test3, 100);
    }

    // Runs the test `iterations` times and prints count/sum/min/average/max of the elapsed milliseconds
    private static void repeat(String name, List<User> users, ToLongFunction<List<User>> test, int iterations) {
        System.out.println(name + ", list size " + users.size() + ", averaged over " + iterations + " runs: " + IntStream.range(0, iterations)
                .mapToLong(i -> test.applyAsLong(users))
                .summaryStatistics());
    }
}
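`test3` above builds its predicate with `Predicate.and`. According to the `Predicate` documentation, the composed predicate does not evaluate the second predicate when the first one returns false, so it short-circuits just like `&&`; the cast in `test3` is needed because a bare lambda has no type on which `.and` could be called. A small sketch (class, method and predicate names are hypothetical) that counts how often the second predicate actually runs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateAndDemo {
    // Filters with first.and(second) and returns how many times `second` was evaluated.
    static int secondPredicateCalls(List<Integer> input) {
        AtomicInteger calls = new AtomicInteger();
        Predicate<Integer> isLarge = x -> {
            calls.incrementAndGet();   // count every evaluation of the second predicate
            return x > 10;
        };
        // Same shape as test3: the cast gives the first lambda a type so that .and(...) can be called on it.
        Predicate<Integer> combined = ((Predicate<Integer>) x -> x % 2 == 0).and(isLarge);
        List<Integer> kept = input.stream().filter(combined).collect(Collectors.toList());
        System.out.println("kept: " + kept);
        return calls.get();
    }

    public static void main(String[] args) {
        // Only 12 and 20 are even, so isLarge should be evaluated twice, not four times.
        System.out.println("second predicate calls: " + secondPredicateCalls(Arrays.asList(3, 12, 7, 20)));
    }
}
```

With the input `3, 12, 7, 20` only the two even values reach the second predicate, so the extra cost of `and` over `&&` is the composed object and its delegation, not extra predicate evaluations.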

【Discussion】:

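  • Interesting: when I change the order to run test2 before test1, test1 runs slightly slower. It only seems faster when test1 runs first. Can anyone reproduce this or offer any insight?
  • This might be because the cost of HotSpot compilation is paid by whichever test runs first.
  • @Sperr you are right: when the order is changed, the results are unpredictable. However, when I ran the tests on three different threads, the complex filter always gave better results, regardless of which thread started first. Here are the results: Test #1: {count=100, sum=7207, min=65, average=72.070000, max=91} Test #3: {count=100, sum=7959, min=72, average=79.590000, max=97} Test #2: {count=100, sum=8869, min=79, average=88.690000, max=110}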
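The ordering effect discussed above is consistent with JIT warm-up: whichever test runs first pays for compilation. One way to reduce that bias in a hand-rolled test is to run each variant untimed a number of times before measuring, and to time with `System.nanoTime()` rather than `System.currentTimeMillis()`. The sketch below uses hypothetical names (`WarmupTiming`, `averageNanos`, `sink`) and arbitrary warm-up and run counts; it still cannot control everything a dedicated harness such as JMH handles, so its numbers should be read with caution.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class WarmupTiming {
    // Results are written here so the JIT cannot easily treat the measured work as dead code.
    static volatile long sink;

    // Runs the task untimed `warmups` times, then returns the mean duration of `runs` timed calls in nanoseconds.
    static double averageNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();   // untimed: gives HotSpot a chance to compile the pipeline first
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        LongSupplier combined = () -> data.stream().filter(x -> x % 2 == 0 && x % 3 == 0).count();
        LongSupplier chained = () -> data.stream().filter(x -> x % 2 == 0).filter(x -> x % 3 == 0).count();
        System.out.printf("combined: %.0f ns%n", averageNanos(combined, 20, 50));
        System.out.printf("chained:  %.0f ns%n", averageNanos(chained, 20, 50));
    }
}
```

Swapping the two `printf` lines is a quick way to check whether the order still changes the outcome once both variants have been warmed up.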
【Solution 4】:

Here are the results of running the sample test shared by @Hank D in all 6 possible orders. The predicate of the form u -> exp1 && exp2 has the lowest average time in every case.

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=3372, min=31, average=33.720000, max=47}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9150, min=85, average=91.500000, max=118}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9046, min=81, average=90.460000, max=150}

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8336, min=77, average=83.360000, max=189}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9094, min=84, average=90.940000, max=176}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10501, min=99, average=105.010000, max=136}

two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=11117, min=98, average=111.170000, max=238}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8346, min=77, average=83.460000, max=113}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9089, min=81, average=90.890000, max=137}

two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10434, min=98, average=104.340000, max=132}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9113, min=81, average=91.130000, max=179}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8258, min=77, average=82.580000, max=100}

one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9131, min=81, average=91.310000, max=139}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10265, min=97, average=102.650000, max=131}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8442, min=77, average=84.420000, max=156}

one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8553, min=81, average=85.530000, max=125}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8219, min=77, average=82.190000, max=142}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10305, min=97, average=103.050000, max=132}

【Discussion】:
