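There wouldn't be a problem (a problem meaning a wrong result), but as the API note says:
"Preserving stability for distinct() in parallel pipelines is relatively expensive."
But if performance is a concern and stability is not required (that is, it is acceptable for the result to order its elements differently from the collection it processed), you can follow the API note:
"removing the ordering constraint with BaseStream.unordered() may result in significantly more efficient execution for distinct() in parallel pipelines."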
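To make "stability" concrete, here is a minimal sketch comparing the two parallel variants on a tiny list. The class and method names (`DistinctStabilityDemo`, `orderedDistinct`, `unorderedDistinct`) are made up for illustration. The ordered variant keeps the first occurrence of each element in encounter order. The unordered variant returns the same elements, but their order should not be relied on.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

class DistinctStabilityDemo {

    // Ordered parallel distinct(): stable, so the first occurrence of each
    // element is kept and encounter order is preserved in the result.
    static List<String> orderedDistinct(List<String> input) {
        return input.parallelStream()
                .distinct()
                .collect(Collectors.toList());
    }

    // Unordered parallel distinct(): same set of elements, but there is no
    // guarantee about the order in which they appear in the result.
    static List<String> unorderedDistinct(List<String> input) {
        return input.parallelStream()
                .unordered()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "hat", "cat", "bat", "hat", "fat");

        List<String> ordered = orderedDistinct(words);
        List<String> unordered = unorderedDistinct(words);

        System.out.println("ordered:   " + ordered);
        System.out.println("unordered: " + unordered);
        // Compare as sets, because only the contents are guaranteed to match.
        System.out.println("same elements: "
                + new HashSet<>(ordered).equals(new HashSet<>(unordered)));
    }
}
```

If the code that consumes the result only cares about which elements are present, the unordered variant is a legitimate option. Whether it is actually faster is a separate question that has to be measured.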
So I thought, why not benchmark distinct() on parallel and sequential streams:
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.stream.Collectors;

public class DistinctBenchmark {
    public static void main(String[] args) {
        List<String> strList = Arrays.asList("cat", "nat", "hat", "tat", "heart", "fat", "bat", "lad", "crab", "snob");
        List<String> words = new Vector<>();
        int wordCount = 1_000_000; // number of words in the list `words`
        int avgIter = 10;          // iterations used to compute the average running time

        // Populate the list randomly with the strings in `strList`.
        // Note: Math.round makes the first and last entries half as likely as the others.
        for (int i = 0; i < wordCount; i++) {
            words.add(strList.get((int) Math.round(Math.random() * (strList.size() - 1))));
        }

        // Accumulated running times in ms:
        // pod = parallel ordered, pud = parallel unordered, sod = sequential ordered.
        long starttime, pod = 0, pud = 0, sod = 0;
        for (int i = 0; i < avgIter; i++) {
            starttime = System.currentTimeMillis();
            List<String> parallelOrderedDistinct =
                    words.parallelStream().distinct().collect(Collectors.toList());
            pod += System.currentTimeMillis() - starttime;

            starttime = System.currentTimeMillis();
            List<String> parallelUnorderedDistinct =
                    words.parallelStream().unordered().distinct().collect(Collectors.toList());
            pud += System.currentTimeMillis() - starttime;

            starttime = System.currentTimeMillis();
            List<String> sequentialOrderedDistinct =
                    words.stream().distinct().collect(Collectors.toList());
            sod += System.currentTimeMillis() - starttime;
        }

        System.out.println("Parallel ordered time in ms: " + pod / avgIter);
        System.out.println("Parallel unordered time in ms: " + pud / avgIter);
        System.out.println("Sequential implicitly ordered time in ms: " + sod / avgIter);
    }
}
The code above was compiled with OpenJDK 8 and run on the OpenJDK 8 JRE (no JVM-specific arguments) on a 6th-generation i3 (4 logical cores), and I got the results listed below.
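It seems that beyond a certain number of elements, ordered parallel is faster and, ironically, unordered parallel is the slowest. The reason (thanks to @Hulk) lies in how it is implemented (using a HashSet). So as a general rule, if you have only a few distinct elements and a great deal of duplication, you might benefit from running the stream in parallel.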
1)
Parallel ordered time in ms: 52
Parallel unordered time in ms: 81
Sequential implicitly ordered time in ms: 35
2)
Parallel ordered time in ms: 48
Parallel unordered time in ms: 83
Sequential implicitly ordered time in ms: 34
3)
Parallel ordered time in ms: 36
Parallel unordered time in ms: 70
Sequential implicitly ordered time in ms: 32
Unordered parallel was roughly twice as slow as the other two.
Then I raised wordCount to 5_000_000, and these were the results:
1)
Parallel ordered time in ms: 93
Parallel unordered time in ms: 363
Sequential implicitly ordered time in ms: 123
2)
Parallel ordered time in ms: 100
Parallel unordered time in ms: 363
Sequential implicitly ordered time in ms: 124
3)
Parallel ordered time in ms: 89
Parallel unordered time in ms: 365
Sequential implicitly ordered time in ms: 118
And then to 10_000_000:
1)
Parallel ordered time in ms: 148
Parallel unordered time in ms: 725
Sequential implicitly ordered time in ms: 218
2)
Parallel ordered time in ms: 150
Parallel unordered time in ms: 749
Sequential implicitly ordered time in ms: 224
3)
Parallel ordered time in ms: 143
Parallel unordered time in ms: 743
Sequential implicitly ordered time in ms: 222
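
The explanation above attributes the gap to how distinct() is implemented. As a rough mental model only, and explicitly an assumption rather than a reproduction of the JDK source, the sketch below contrasts two set-based ways of deduplicating in parallel. One fills a private set per worker and merges the sets in order. The other has every worker insert into a single shared concurrent set. The class and method names (`DistinctStrategiesSketch`, `mergePerThreadSets`, `sharedConcurrentSet`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DistinctStrategiesSketch {

    // Order-preserving strategy (illustrative): each worker fills its own
    // LinkedHashSet, and the partial sets are merged left to right, so the
    // first occurrence of every element keeps its encounter position.
    static <T> List<T> mergePerThreadSets(List<T> input) {
        LinkedHashSet<T> merged = input.parallelStream()
                .collect(LinkedHashSet::new, LinkedHashSet::add, LinkedHashSet::addAll);
        return new ArrayList<>(merged);
    }

    // Shared-set strategy (illustrative): all workers insert into one
    // concurrent set, so the result order is unspecified.
    // Note: the concurrent set does not accept null elements.
    static <T> List<T> sharedConcurrentSet(List<T> input) {
        Set<T> seen = ConcurrentHashMap.newKeySet();
        input.parallelStream().forEach(seen::add);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> strList = Arrays.asList(
                "cat", "nat", "hat", "tat", "heart", "fat", "bat", "lad", "crab", "snob");

        // Few distinct values with heavy duplication, as in the benchmark.
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            words.add(strList.get(i % strList.size()));
        }

        System.out.println("per-thread sets, merged: " + mergePerThreadSets(words));
        System.out.println("shared concurrent set:   " + sharedConcurrentSet(words));
    }
}
```

With only ten distinct values, every worker in the shared-set variant keeps touching the same handful of keys, while the per-thread variant works on private sets that stay tiny and are cheap to merge. That is one plausible reading of the numbers above, not a verified diagnosis. Confirming it would need a proper benchmark harness with warm-up iterations.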