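So far, none of the solutions handles the general case of intersecting N dictionaries.
So, if you want to handle the intersection of N arbitrary dictionaries: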
from functools import reduce
def dict_intersection(*dict_list):
    # Keep only the (key, value) pairs that are present in every dict
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)
a = {k:k for k in range(0,5)} # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
b = {k:k for k in range(2,7)} # {2: 2, 3: 3, 4: 4, 5: 5, 6: 6}
c = {k:k for k in range(3,8)} # {3: 3, 4: 4, 5: 5, 6: 6, 7: 7}
dict_intersection(a,b,c) # {3:3, 4:4}
# or if you have a list of dicts
dicts = [{k:k for k in range(0+n,5+n)} for n in (0,2,3)] # == [a,b,c]
dict_intersection(*dicts) # {3:3, 4:4}
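Using functools.reduce lets the operation complete in a single pass over the list of dictionaries, instead of the multiple loops used in some of the other solutions. It also avoids any additional conditional statements.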
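To make the single pass concrete, here is a minimal sketch of what reduce does, written out by hand. The dictionaries x, y and z are made up for illustration. Note that the intersection is over (key, value) pairs, so a key whose values differ is dropped, and that set operations on dict.items() need hashable values.

```python
from functools import reduce

def dict_intersection(*dict_list):
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)

# Illustrative inputs (not from the answer above)
x = {"a": 1, "b": 2, "c": 3}
y = {"a": 1, "b": 20, "c": 3}
z = {"a": 1, "c": 3, "d": 4}

# The two steps reduce performs for three dicts
step1 = dict(x.items() & y.items())      # 'b' is dropped: same key, different value
step2 = dict(step1.items() & z.items())  # 'd' only exists in z, so it never appears

assert step2 == dict_intersection(x, y, z)

# Unhashable values (for example lists) cannot take part in the set intersection
try:
    dict_intersection({"a": [1]}, {"a": [1]})
    unhashable_ok = True
except TypeError:
    unhashable_ok = False
```

Two edge cases follow from how reduce works: with a single dictionary it returns that same object rather than a copy, and with no arguments it raises a TypeError because there is no initial value.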
Trade-offs
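Changing dict_intersection_v1 to dict_intersection_v2, we can see that it runs faster for larger dictionaries and/or longer lists of dictionaries (designing a proper experiment to test which of the two is the bigger factor is beyond the scope of this solution). The performance gain comes from reducing the number of dictionary instantiations.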
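def dict_intersection_v1(*dict_list):
    # Builds a new dict after every pairwise intersection
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)
def dict_intersection_v2(*dict_list):
    # Intersects the item views as sets and builds a single dict at the end
    return dict(reduce(lambda a, b: a & b, (d.items() for d in dict_list)))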
dict_lst1 = [{k:k for k in range(0+n,5+n)} for n in (0,2,3)] # = [a,b,c]
dict_lst2 = [{k:k for k in range(0,50,n)} for n in range(1,5)]
dict_lst3 = [{k:k for k in range(0,500,n)} for n in range(1,41)] # step must start at 1; range(1,41) matches the 2155 kv pairs reported
dict_lst4 = [{k:k for k in range(0+n,500+n)} for n in range(400)]
| dict list | kv pair count | dict_intersection_v1 | dict_intersection_v2 | relative difference |
|---|---|---|---|---|
| dict_lst1 | 15 | 808 ns ± 4.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) | 821 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) | +1.6% |
| dict_lst2 | 105 | 3.14 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | 2.38 µs ± 5.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | -24.2% |
| dict_lst3 | 2155 | 36.9 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) | 25.1 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) | -32.0% |
| dict_lst4 | 200_000 | 9.08 ms ± 22 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | 4.88 ms ± 5.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | -46.3% |
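Timings of this kind can be gathered with the standard timeit module. The sketch below is one possible harness, not the exact setup used for the numbers above; best_time is a hypothetical helper name, and the absolute figures will differ by machine and Python version.

```python
from functools import reduce
import timeit

def dict_intersection_v1(*dict_list):
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)

def dict_intersection_v2(*dict_list):
    return dict(reduce(lambda a, b: a & b, (d.items() for d in dict_list)))

def best_time(func, dicts, number=1000, repeat=5):
    """Hypothetical helper: best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: func(*dicts))
    return min(timer.repeat(repeat=repeat, number=number)) / number

dict_lst2 = [{k: k for k in range(0, 50, n)} for n in range(1, 5)]

# Both versions must agree before their timings are worth comparing
result_v1 = dict_intersection_v1(*dict_lst2)
result_v2 = dict_intersection_v2(*dict_lst2)
assert result_v1 == result_v2

t1 = best_time(dict_intersection_v1, dict_lst2)
t2 = best_time(dict_intersection_v2, dict_lst2)
print(f"v1: {t1 * 1e6:.2f} µs, v2: {t2 * 1e6:.2f} µs, "
      f"relative difference: {(t2 - t1) / t1:+.1%}")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; the table above reports mean ± standard deviation instead, so the two are not directly comparable.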
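The regression for dict_lst1 comes mainly from the difference between the overhead of creating a dictionary after each intersection and the overhead of the dict.items() calls inside the generator (plus Python's general function call overhead).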
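Note: I did test using a precomputed list of dict.items() for the dictionaries instead of having v2 build the generator on the fly.
I tested passing in the precomputed list both outside and inside the timed section, and, while the difference is statistically significant, it is less than 30 μs and 10 μs respectively. If you are after gains of that size, looking at a different language or at Cython might be a better option.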
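For reference, the precomputed variant mentioned in the note could look roughly like this. It is a sketch under assumptions: items_intersection is a hypothetical name, and the caller is assumed to build the list of item views ahead of time.

```python
from functools import reduce

def items_intersection(items_views):
    """Hypothetical variant of v2 that takes precomputed dict.items() views."""
    return dict(reduce(lambda a, b: a & b, items_views))

dicts = [{k: k for k in range(0 + n, 5 + n)} for n in (0, 2, 3)]

# Built once, outside of whatever is being timed
precomputed = [d.items() for d in dicts]

result = items_intersection(precomputed)
```

Because item views are live views of their dictionaries, the precomputed list reflects later changes to the underlying dicts.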