Here are a few variations that count duplicates and ignore every value that is not in b.
from collections import Counter

# a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
a = [1, 4, 3, 1, 2, 4, 4, 5, 6, 6, 7, 7, 7, 7, 8, 9, 0, 1]
b = [1, 3, 6, 9]

counts = Counter()

# make counts order match b order
for item in b:
    counts[item] = 0

# count only the elements of a that also appear in b
for item in a:
    if item in b:
        counts[item] += 1

print("in 'b' order")
print([(k, v) for k, v in counts.items()])
print("in descending frequency order")
print(counts.most_common())
print("count all occurrences in a of elements that are also in b")
print(sum(counts.values()))
python count_b_in_a.py
in 'b' order
[(1, 3), (3, 1), (6, 2), (9, 1)]
in descending frequency order
[(1, 3), (6, 2), (3, 1), (9, 1)]
count all occurrences in a of elements that are also in b
7
Regarding your comment about performance, here is a comparison of scanning a list versus scanning a set in Python:
import datetime


def timestamp():
    return datetime.datetime.now()


def time_since(t):
    # Elapsed milliseconds. total_seconds() covers the whole duration,
    # whereas .microseconds is only the sub-second component.
    return int((timestamp() - t).total_seconds() * 1_000_000) // 1000


a = list(range(1_000_000))
b = set(a)
iterations = 10

t = timestamp()
for i in range(iterations):
    c = 974_152 in a
print("Finished {iterations} iterations of list scan in {duration}ms"
      .format(iterations=iterations, duration=time_since(t)))

t = timestamp()
for i in range(iterations):
    c = 974_152 in b
print("Finished {iterations} iterations of set scan in {duration}ms"
      .format(iterations=iterations, duration=time_since(t)))
python scan.py
Finished 10 iterations of list scan in 248ms
Finished 10 iterations of set scan in 0ms
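First point to note: Python is no slouch. Scanning roughly ten million list elements (10 passes over a 1,000,000-element list) in a quarter of a second on an old laptop is not bad. But it is still a linear scan.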
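Python sets are in a different class. If you take the // 1000 out of time_since() so that it reports microseconds, you will see that Python checks a 1-million-member set 10 times in under a microsecond. That is because a set membership test is a hash lookup rather than a scan. You will find other set operations lightning fast too. Use sets wherever they apply in Python: they are awesome.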
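If you would rather not edit time_since(), the same comparison can be sketched with the standard-library timeit module, which uses a higher-resolution clock. This is only a rough illustration: the names data_list, data_set and target are made up for this example, and the numbers you get will depend on your machine.

```python
import timeit

# Hypothetical names for this sketch; same sizes as the example above.
data_list = list(range(1_000_000))
data_set = set(data_list)
target = 974_152
iterations = 10

# timeit.timeit returns the total time in seconds for `number` calls.
list_seconds = timeit.timeit(lambda: target in data_list, number=iterations)
set_seconds = timeit.timeit(lambda: target in data_set, number=iterations)

print(f"list: {list_seconds * 1e6:.0f} microseconds for {iterations} lookups")
print(f"set:  {set_seconds * 1e6:.1f} microseconds for {iterations} lookups")
```

The set figure includes the overhead of calling the lambda, so treat it as an upper bound on the lookup cost itself.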
If you plan to apply the code above to larger lists where performance matters, the first thing to do is probably to convert b to a set.
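A minimal sketch of that change, assuming the same a and b as in the first example. The function name count_in_b and the variable wanted_set are introduced here purely for illustration:

```python
from collections import Counter

a = [1, 4, 3, 1, 2, 4, 4, 5, 6, 6, 7, 7, 7, 7, 8, 9, 0, 1]
b = [1, 3, 6, 9]


def count_in_b(values, wanted):
    """Count how often each element of `wanted` occurs in `values`."""
    # Membership tests against a set are O(1) on average,
    # instead of a linear scan of the list for every element.
    wanted_set = set(wanted)
    # Seed with zeros so the result keeps the order of `wanted`.
    counts = Counter({item: 0 for item in wanted})
    counts.update(item for item in values if item in wanted_set)
    return counts


counts = count_in_b(a, b)
print(list(counts.items()))
print(sum(counts.values()))
```

The list b is still used to seed the counter, so the "in 'b' order" output is unchanged; only the membership test goes through the set.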