Posted: 2018-11-21 16:46:57
Question:
I have a nested list:
lists = [['a', 'b', 'c', 'd'],
         ['a', 'b', 'd', 'e'],
         ['a', 'b', 'd', 'f'],
         ['a', 'b', 'd', 'f', 'h', 'i']]
I know how to build a simple prefix tree (trie):
tree = {}
end = "END"
for lst in lists:
    d = tree
    for x in lst:
        d = d.setdefault(x, {})  # walk down, creating child dicts as needed
    d[end] = {}                  # mark that a complete sequence ends here
Result:
>>> from pprint import pprint
>>> pprint(tree)
{'a': {'b': {'c': {'d': {'END': {}}},
             'd': {'e': {'END': {}},
                   'f': {'END': {}, 'h': {'i': {'END': {}}}}}}}}
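The END key is what separates a complete stored sequence from a mere prefix: 'a b d f' is stored, while 'a b d' is only a path through the tree. A minimal sketch of lookups over the same nested-dict structure, assuming the hypothetical helper names build_trie, contains and has_prefix (they are not part of the question) and an END constant standing in for the question's end variable:

```python
END = "END"  # same terminator marker as the question's `end`

def build_trie(lists):
    """Build the nested-dict trie exactly as in the question."""
    tree = {}
    for lst in lists:
        d = tree
        for x in lst:
            d = d.setdefault(x, {})
        d[END] = {}
    return tree

def contains(tree, seq):
    """True only if `seq` was stored as a complete sequence."""
    d = tree
    for x in seq:
        if x == END or x not in d:  # the marker itself is not a real element
            return False
        d = d[x]
    return END in d

def has_prefix(tree, seq):
    """True if at least one stored sequence starts with `seq`."""
    d = tree
    for x in seq:
        if x == END or x not in d:
            return False
        d = d[x]
    return True

lists = [['a', 'b', 'c', 'd'],
         ['a', 'b', 'd', 'e'],
         ['a', 'b', 'd', 'f'],
         ['a', 'b', 'd', 'f', 'h', 'i']]
tree = build_trie(lists)
print(contains(tree, ['a', 'b', 'd', 'f']))  # True: a stored sequence
print(contains(tree, ['a', 'b', 'd']))       # False: only a prefix
print(has_prefix(tree, ['a', 'b', 'd']))     # True
```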
Now I can walk that tree recursively and join nodes for as long as a node has only one child (a sub-dictionary with exactly one element):
def join(d, pref=[]):  # pref is never mutated in place, so the [] default is safe
    if end in d:
        # a stored sequence ends here: emit the pending prefix as the last segment
        yield [' '.join(pref)] if pref else []
    for k, v in d.items():
        if len(v) == 1:
            for x in join(v, pref + [k]):  # add node to prefix
                yield x                    # yield next segment
        else:
            for x in join(v, []):          # reset prefix
                yield [' '.join(pref + [k])] + x  # yield node + prefix and next
Result:
>>> for x in join(tree):
...     print(x)
...
['a b', 'c d']
['a b', 'd', 'e']
['a b', 'd', 'f']
['a b', 'd', 'f', 'h i']
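Merging single-child chains like this is the path compression used by radix trees (Patricia tries). If a compressed tree is wanted instead of a flat list of segments, a minimal sketch could look like the following; the name compress is hypothetical, and it assumes the same END marker. Unlike a join that only runs while len(v) == 1, it stops at a child that contains END, since a stored sequence ends there:

```python
from pprint import pprint

END = "END"  # same terminator marker as the question's `end`

def compress(d):
    """Return a new trie in which every single-child chain is merged into
    one space-joined key (radix-tree style). END keys are kept as-is."""
    out = {}
    for k, v in d.items():
        if k == END:
            out[END] = {}
            continue
        label = [k]
        # Keep descending while the child has exactly one entry that is not END,
        # i.e. there is no branching and no stored sequence ends at this node.
        while len(v) == 1 and END not in v:
            (k2, v2), = v.items()
            label.append(k2)
            v = v2
        out[' '.join(label)] = compress(v)
    return out

tree = {'a': {'b': {'c': {'d': {END: {}}},
                    'd': {'e': {END: {}},
                          'f': {END: {}, 'h': {'i': {END: {}}}}}}}}
pprint(compress(tree))
```

For the example tree, the root key becomes 'a b', with 'c d' and 'd' beneath it and 'h i' under 'f', mirroring the segments printed by join.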
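What I need is an algorithm in which only common pairs of elements become a single node of the tree. Ideally, a node would have a minimum length of n1 and a maximum length of n2. Desired output: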
[['a b', 'c d'],
 ['a b', 'd e'],
 ['a b', 'd f'],
 ['a b', 'd f', 'h i']]
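Judging only from the desired output, one possible reading of "common pairs" is to cut every stored sequence into fixed-size groups aligned by depth. Because the cuts fall at the same depths on every path, sequences that share a prefix also share their leading groups ('a b', and 'd f' for the last two rows). A minimal sketch under that assumption; the names paths and chunk_paths and the single size parameter are hypothetical, and it does not implement separate n1/n2 bounds (a path of odd length simply ends in a shorter group):

```python
END = "END"  # same terminator marker as the question's `end`

def build_trie(lists):
    """Build the nested-dict trie exactly as in the question."""
    tree = {}
    for lst in lists:
        d = tree
        for x in lst:
            d = d.setdefault(x, {})
        d[END] = {}
    return tree

def paths(d, prefix=()):
    """Yield every stored sequence (root-to-END path) as a tuple,
    in dict insertion order."""
    for k, v in d.items():
        if k == END:
            yield prefix
        else:
            yield from paths(v, prefix + (k,))

def chunk_paths(tree, size=2):
    """Split each stored sequence into consecutive groups of `size` elements,
    starting from the root, so shared prefixes yield identical groups."""
    for p in paths(tree):
        yield [' '.join(p[i:i + size]) for i in range(0, len(p), size)]

lists = [['a', 'b', 'c', 'd'],
         ['a', 'b', 'd', 'e'],
         ['a', 'b', 'd', 'f'],
         ['a', 'b', 'd', 'f', 'h', 'i']]
for row in chunk_paths(build_trie(lists), size=2):
    print(row)
```

With size=2 this prints the four rows of the desired output; a larger size just produces longer groups (for example 'a b d' with size=3).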
Tags: python python-3.x algorithm trie