I'll propose a solution using Python.
First, let's implement the algorithm, then address the memory constraint.
import itertools
# Let's build a list with your pairs
collection_items = [("foo", 0), ("bar", 1), ("baz", 2)]
"""
A Python generator is a function that produces a sequence of results.
It maintains its local state, so it can resume exactly where it left off
each time the next value is requested. A generator can only be consumed once.
I will explain a little later why I use generators.
"""
collection_generator1 = (el for el in collection_items) # Create the first generator
# For example: next(collection_generator1) => ("foo", 0); next(collection_generator1) => ("bar", 1);
# next(collection_generator1) => ("baz", 2)
collection_generator2 = (el for el in collection_items) # Create the second generator
cartesian_product = itertools.product(collection_generator1, collection_generator2) # Create the cartesian product
for pair in cartesian_product:
    first_el, second_el = pair
    str_pair1, val_pair1 = first_el
    str_pair2, val_pair2 = second_el  # unpack the second element, not the first
    name = "{str_pair1}+{str_pair2}".format(str_pair1=str_pair1, str_pair2=str_pair2)
    item = (name, [first_el, second_el]) # Compose the item
    print(item)
# OUTPUT
('foo+foo', [('foo', 0), ('foo', 0)])
('foo+bar', [('foo', 0), ('bar', 1)])
('foo+baz', [('foo', 0), ('baz', 2)])
('bar+foo', [('bar', 1), ('foo', 0)])
('bar+bar', [('bar', 1), ('bar', 1)])
('bar+baz', [('bar', 1), ('baz', 2)])
('baz+foo', [('baz', 2), ('foo', 0)])
('baz+bar', [('baz', 2), ('bar', 1)])
('baz+baz', [('baz', 2), ('baz', 2)])
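To illustrate the docstring's point that a generator can't be used twice, here is a minimal sketch (the names gen, first_pass and second_pass are only illustrative). Once a generator has been consumed, iterating it again yields nothing, which is why two separate generators are created for the product.

```python
collection_items = [("foo", 0), ("bar", 1), ("baz", 2)]

gen = (el for el in collection_items)  # a single generator expression

first_pass = list(gen)   # consumes every element
second_pass = list(gen)  # the generator is already exhausted

print(first_pass)   # [('foo', 0), ('bar', 1), ('baz', 2)]
print(second_pass)  # []
```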
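Now let's address the memory problem.
Since you have a lot of data, a good idea is to store it in a file, writing one pair per line (as in your example).
Now let's read the file ("input.txt") and create generators from its data.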
file_generator_1 = (line.strip() for line in open("input.txt"))
file_generator_2 = (line.strip() for line in open("input.txt"))  # iterate lazily; readlines() would load the whole file
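Now the only change you need to make is to replace the variable names collection_generator1 and collection_generator2 with file_generator_1 and file_generator_2. Note that each line is read as a plain string, so it has to be split into a (name, value) pair before the unpacking in the loop will work.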
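One caveat worth knowing: according to the itertools documentation, itertools.product fully consumes its input iterables and keeps them in memory before producing results. For a very large file it may therefore help to stream the pairs with nested loops instead. Below is a minimal sketch under stated assumptions: each line has the hypothetical form name,value (the real format depends on your data), and the helper names parse_line, read_pairs and stream_product are made up for illustration.

```python
def parse_line(line):
    # Assumed line format: "name,value" (hypothetical; adapt to your file)
    name, value = line.strip().split(",")
    return name, int(value)


def read_pairs(path):
    # Yield one parsed pair at a time; the file is closed when iteration ends
    with open(path) as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_line(line)


def stream_product(path):
    # Re-read the file for every outer element instead of holding it in memory
    for first_el in read_pairs(path):
        for second_el in read_pairs(path):
            name = "{}+{}".format(first_el[0], second_el[0])
            yield (name, [first_el, second_el])
```

Iterating over stream_product("input.txt") then yields items of the form (name, [first_el, second_el]). This trades repeated reads of the file for holding only one pair from each pass in memory at a time.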