Recursively reduce list of tuples: answers

【Question title】: Recursively reduce list of tuples
【Posted】: 2013-10-23 13:53:29
【Question】:

So I have a list of tuples like this:

[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General'),
    ('1b', 'General', 'Basic'),
    ('1b', 'General', 'Basic', 'Data'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None'),
    ...
]

Each tuple has a length of 1 to 5. I need to recursively reduce the list to end up with:

[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None'),
    ...
]

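Basically, if the next row matches everything in the previous row plus one extra element, delete the previous row, until only the longest tuples of each hierarchy remain.

So in my example there are 3 rows with 1c as the first item of the tuple, and they are reduced to the longest one.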

【Comments】:

Have you tried anything? Where are you stuck?

Tags: python recursion tuples

【Solution 1】:

def is_subtuple(tup1, tup2):
    '''Return True if all the elements of tup1 are consecutively in tup2.'''
    if len(tup2) < len(tup1): return False
    try:
        offset = tup2.index(tup1[0])
    except ValueError:
        return False
    # This could be wrong if tup1[0] is in tup2, but doesn't start the subtuple.
    # You could solve this by recurring on the rest of tup2 if this is false, but
    # it doesn't apply to your input data.
    return tup1 == tup2[offset:offset+len(tup1)]

Then just filter your input list (named l here):

[t for i, t in enumerate(l) if not any(is_subtuple(t, t2) for t2 in l[i+1:])]

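Now, this list comprehension assumes the input list is ordered the way you showed it, with subtuples appearing earlier than the tuples that contain them. It is also somewhat expensive (O(n**2), I think), but it gets the job done.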
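One thing to watch for: is_subtuple treats two identical tuples as subtuples of each other, so with the sample data every repeated ('None', ...) separator row except the last one would be filtered out. Below is a minimal sketch of a variant that drops a row only when a later row strictly extends it. The names is_strict_prefix and drop_prefixes are hypothetical, not part of the answer above, and the approach is still O(n**2).

```python
def is_strict_prefix(short, long):
    """Return True if `long` starts with `short` and is strictly longer."""
    return len(short) < len(long) and long[:len(short)] == short


def drop_prefixes(rows):
    """Keep each row unless some later row extends it."""
    return [
        row
        for i, row in enumerate(rows)
        if not any(is_strict_prefix(row, later) for later in rows[i + 1:])
    ]


rows = [
    ('None', 'None'),
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('None', 'None'),
]
print(drop_prefixes(rows))
# [('None', 'None'), ('1c', 'General', 'Basic'), ('None', 'None')]
```

Because equal-length rows never count as a strict prefix, the duplicated separator rows survive.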

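【Comments】:

I like this! Could you give me an example based on the comment text you added to the code?
@TGxANAHEiiMx [(a,b,c), (a,a,b,c)] for defined a, b and c
I see what you're doing, but I'm having a hard time putting it in context. Could you show me using the code you provided above? Thanks again!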
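To put the [(a,b,c), (a,a,b,c)] counterexample in context, here is a small sketch. It repeats is_subtuple from the answer and adds is_subtuple_anywhere, a hypothetical helper (not from the answer) that tries every starting offset instead of only the first occurrence of tup1[0].

```python
def is_subtuple(tup1, tup2):
    """The answer's version: only checks the first occurrence of tup1[0]."""
    if len(tup2) < len(tup1):
        return False
    try:
        offset = tup2.index(tup1[0])
    except ValueError:
        return False
    return tup1 == tup2[offset:offset + len(tup1)]


def is_subtuple_anywhere(tup1, tup2):
    """Hypothetical variant: try every possible starting offset in tup2."""
    n = len(tup1)
    return any(tup1 == tup2[i:i + n] for i in range(len(tup2) - n + 1))


a, b, c = 'a', 'b', 'c'
# (a, b, c) does occur inside (a, a, b, c), starting at index 1, but the
# first 'a' is at index 0, so the original check looks in the wrong place.
print(is_subtuple((a, b, c), (a, a, b, c)))           # False
print(is_subtuple_anywhere((a, b, c), (a, a, b, c)))  # True
```

For the hierarchy data in the question the two functions should agree, since each shorter row starts its longer row at index 0.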

【Solution 2】:

Group the tuples by their first element; use itertools.groupby() (with operator.itemgetter() to make creating the key easy).

Then filter each group separately:

from itertools import groupby, chain
from operator import itemgetter

def filtered_group(group):
    group = list(group)
    maxlen = max(len(l) for l in group)
    return [l for l in group if len(l) == maxlen]

filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
output = list(chain.from_iterable(filtered))

Demo:

>>> from itertools import groupby, chain
>>> from operator import itemgetter
>>> from pprint import pprint
>>> def filtered_group(group):
...     group = list(group)
...     maxlen = max(len(l) for l in group)
...     return [l for l in group if len(l) == maxlen]
... 
>>> filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
>>> pprint(list(chain.from_iterable(filtered)))
[('Worksheet',),
 ('1a', 'Calculated'),
 ('None', 'None', 'None', 'None', 'None'),
 ('1b', 'General', 'Basic', 'Data', 'Line 1'),
 ('1b', 'General', 'Basic', 'Data', 'Line 2'),
 ('None', 'None', 'None', 'None', 'None'),
 ('1c', 'General', 'Basic', 'Data'),
 ('None', 'None', 'None', 'None', 'None'),
 ('2', 'Active', 'Passive'),
 ('None', 'None', 'None', 'None', 'None')]

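【Comments】:

This kills ('1b', 'General', 'Basic', 'Data', 'Line 1').
@kojiro: Ah, do you need the first one of length 5, or the last one available?
@Martijin Pieters - Thanks for the quick reply! But I'm getting an error message: IndexError: tuple index out of range
I don't know, I'm trying to work out the "rules" myself.
Basically, I'd go with "filter out all tuples that are a subtuple of another tuple". (Here, when I say subtuple, I mean it like substring.)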

【Solution 3】:

from pprint import pprint

l=[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General'),
    ('1b', 'General', 'Basic'),
    ('1b', 'General', 'Basic', 'Data'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None')
    #...
]

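# Walk the list in place, comparing each row with the one after it.
i = 0
while i < len(l) - 1:
    l0 = l[i]
    l1 = l[i + 1]
    # Drop l0 when l1 is l0 plus exactly one extra trailing item.
    if len(l1) == len(l0) + 1 and l1[:-1] == l0:
        del l[i]
    else:
        i += 1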

pprint(l)

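Logic: compare each row (except the last) with the next one. If the next row is the same apart from one additional item, delete the first. Otherwise, advance to the next row.

This isn't a recursive solution, but it could be reworked into one. It's a filtering operation where the condition needs the next item.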
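Since each row is only ever compared with the row that originally followed it, the same filter can be sketched without mutating the list, by pairing every row with its successor. The function name drop_extended_rows is hypothetical, and this is just one way to express the loop above.

```python
def drop_extended_rows(rows):
    """Return a new list without rows that the next row extends by one item."""
    if not rows:
        return []
    kept = [
        current
        for current, following in zip(rows, rows[1:])
        if not (len(following) == len(current) + 1
                and following[:-1] == current)
    ]
    # The last row has no successor, so it is always kept.
    kept.append(rows[-1])
    return kept


rows = [
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
]
print(drop_extended_rows(rows))
# [('1c', 'General', 'Basic', 'Data'), ('None', 'None', 'None', 'None', 'None')]
```

This leaves the input list untouched, which may be convenient if the original rows are needed afterwards.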

Just for fun, here is a recursive Haskell version (this kind of recursion is efficient in Haskell and Scheme, but not in Python):

prefixfilt :: Eq a => [[a]] -> [[a]]
prefixfilt [] = []
prefixfilt [x] = [x]
prefixfilt (x0:x1:xs) =
    if x0 == init x1 then rest else (x0:rest)
    where rest = prefixfilt (x1:xs)
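For comparison, a rough Python transliteration of the Haskell definition above might look like the following. It is only a sketch: it uses an explicit length check in place of init, and it uses one stack frame and one list slice per row. CPython's default recursion limit is 1000, so an input with more rows than roughly that would raise RecursionError unless the limit is raised.

```python
def prefixfilt(rows):
    """Recursively drop each row that the following row extends by one item."""
    # Base cases: an empty list or a single row has nothing to compare.
    if len(rows) < 2:
        return list(rows)
    first, second = rows[0], rows[1]
    rest = prefixfilt(rows[1:])
    if len(second) == len(first) + 1 and second[:-1] == first:
        return rest
    return [first] + rest


rows = [
    ('2', 'Active'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None'),
]
print(prefixfilt(rows))
# [('2', 'Active', 'Passive'), ('None', 'None', 'None', 'None', 'None')]
```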

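【Comments】:

This does produce the output I wanted, so thank you! I'm working on seeing whether I can make it recursive. The lists of tuples I'll be working with will be longer and more complex, so recursion would be best. Thanks again!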