Assuming the items are comma-separated values, you can build a frozenset from each row's items and use a Counter dict to get the counts:
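from collections import Counter
import csv

with open("test.csv") as f:
    next(f)  # skip the header row
    # The last column holds the comma-separated items; a frozenset
    # makes the key independent of the order the items were listed in.
    counts = Counter(frozenset(row[-1].split(","))
                     for row in csv.reader(f))

print(counts.most_common())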
If, based on the updated input, you want to count every pair of items in each row:
from collections import Counter
from itertools import combinations
import csv

def combs(s):
    # All unordered pairs of the comma-separated items in s.
    return combinations(s.split(","), 2)

with open("test.csv") as f:
    next(f)  # skip the header row
    counts = Counter(frozenset(t)
                     for row in csv.reader(f)
                     for t in combs(row[-1]))

# counts -> Counter({frozenset(['Cheese', 'Cookie']): 2, frozenset(['Cheese', 'Pie']): 1, frozenset(['Cookie', 'Pie']): 1})
print(counts.most_common())
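The order within a pair does not matter, because frozenset([1,2]) and frozenset([2,1]) compare equal and hash the same, so they are counted under the same key.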
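To illustrate why frozenset is used as the key rather than a plain tuple, here is a minimal sketch. The two-element `pairs` list is made-up illustration data, not taken from the CSV file.

```python
from collections import Counter

# Hypothetical data: the same pair of items, listed in a different order.
pairs = [("Cheese", "Cookie"), ("Cookie", "Cheese")]

# Tuples are order-sensitive, so the two listings become two separate keys.
as_tuples = Counter(pairs)

# frozensets ignore order, so both listings fall under one key.
as_sets = Counter(frozenset(p) for p in pairs)

print(len(as_tuples))                            # 2
print(as_sets[frozenset(["Cheese", "Cookie"])])  # 2
```

A plain set would not work here because it is mutable and therefore unhashable; frozenset is the hashable variant, which is what allows it to serve as a Counter key.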
If you want to consider all combinations of size 2 up to n, where n is the number of items in the row:
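from collections import Counter
from itertools import chain, combinations
import csv

def combs(s):
    indiv_items = s.split(",")
    # Chain together the combinations of every size from 2 up to len(indiv_items).
    return chain.from_iterable(combinations(indiv_items, i)
                               for i in range(2, len(indiv_items) + 1))

with open("test.csv") as f:
    next(f)  # skip the header row
    counts = Counter(frozenset(t)
                     for row in csv.reader(f)
                     for t in combs(row[-1]))

print(counts)
print(counts.most_common())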
For this input:
Receipt,Name,Address,Date,Time,Items
25007,A,ABC,pte,ltd,4/7/2016,10:40,"Cheese,Cookie,Pie"
25008,B,CCC,pte,ltd,4/7/2016,12:40,"Cheese,Cookie"
25009,B,CCC,pte,ltd,4/7/2016,12:40,"Cookie,Cheese,pizza"
25010,B,CCC,pte,ltd,4/7/2016,12:40,"Pie,Cheese,pizza"
you would get:
Counter({frozenset(['Cheese', 'Cookie']): 3, frozenset(['Cheese', 'pizza']): 2, frozenset(['Cheese', 'Pie']): 2, frozenset(['Cookie', 'Pie']): 1, frozenset(['Cheese', 'Cookie', 'Pie']): 1, frozenset(['Cookie', 'pizza']): 1, frozenset(['Pie', 'pizza']): 1, frozenset(['Cheese', 'Cookie', 'pizza']): 1, frozenset(['Cheese', 'Pie', 'pizza']): 1})
[(frozenset(['Cheese', 'Cookie']), 3), (frozenset(['Cheese', 'pizza']), 2), (frozenset(['Cheese', 'Pie']), 2), (frozenset(['Cookie', 'Pie']), 1), (frozenset(['Cheese', 'Cookie', 'Pie']), 1), (frozenset(['Cookie', 'pizza']), 1), (frozenset(['Pie', 'pizza']), 1), (frozenset(['Cheese', 'Cookie', 'pizza']), 1), (frozenset(['Cheese', 'Pie', 'pizza']), 1)]
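For anyone who wants to check these counts without creating test.csv, the following is a self-contained sketch that feeds the same sample rows through `io.StringIO`. The `SAMPLE` constant and the `count_item_sets` helper are names introduced here for illustration; the counting logic itself is the `combs` approach shown above.

```python
import csv
import io
from collections import Counter
from itertools import chain, combinations

# The sample input from above, held in memory instead of in test.csv.
SAMPLE = '''Receipt,Name,Address,Date,Time,Items
25007,A,ABC,pte,ltd,4/7/2016,10:40,"Cheese,Cookie,Pie"
25008,B,CCC,pte,ltd,4/7/2016,12:40,"Cheese,Cookie"
25009,B,CCC,pte,ltd,4/7/2016,12:40,"Cookie,Cheese,pizza"
25010,B,CCC,pte,ltd,4/7/2016,12:40,"Pie,Cheese,pizza"
'''

def combs(s):
    # Every combination of size 2 up to the number of items in s.
    indiv_items = s.split(",")
    return chain.from_iterable(combinations(indiv_items, i)
                               for i in range(2, len(indiv_items) + 1))

def count_item_sets(f):
    # Hypothetical helper: f is any file-like object with a header line.
    next(f)  # skip the header row
    return Counter(frozenset(t)
                   for row in csv.reader(f)
                   for t in combs(row[-1]))

counts = count_item_sets(io.StringIO(SAMPLE))
print(counts[frozenset(["Cheese", "Cookie"])])  # 3
print(len(counts))                              # 9 distinct item sets
```

On Python 3 the keys print as `frozenset({'Cheese', 'Cookie'})` rather than `frozenset(['Cheese', 'Cookie'])`; the counts themselves are the same.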