Create a new list from another list based on condition

【Question title】: Create a new list from another list based on condition
【Posted】: 2018-10-24 12:17:45
【Question description】:

I am trying to create a new list from another list based on a condition:

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

table, instrlist = '', ''; code, instructions = [], []; qty = 0

for idx, l in enumerate(lst):
    table = l[0]
    if not l[1].startswith('#'):
        code = l[1]; qty = l[2]; instructions = []
    else:
        instructions.append(l[1])
    print idx, table, code, instructions, qty

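Whenever a code tuple follows a tuple containing '#', I need to pass the completed row to another part of the program and reset to start processing the next row. I set up a series of conditions and got this result: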

0 Id01 Code1 [] 1
1 Id01 Code1 ['#instr1'] 1
2 Id01 Code1 ['#instr1', '#instr2'] 1
3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
4 Id01 Code2 [] 1
5 Id01 Code2 ['#instr3'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
7 Id02 Code2 [] 1
8 Id02 Code2 ['#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1

However, the result I actually need is:

3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1

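Which condition do I still need to filter on?

I am not skilled enough to use list comprehensions or the built-in filter, and I would like to keep the code as readable as possible (for a newbie), at least until I learn more.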
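One way to express the missing condition is a look-ahead: a row is complete when the current item is the last one, or when the next item does not start with '#'. The following is a minimal sketch under that assumption, not the approach taken by any of the answers; the function name `collect_rows` is made up for illustration.

```python
lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]


def collect_rows(items):
    """Return one (index, table, code, instructions, qty) row per code group.

    collect_rows is a hypothetical helper name, not part of the original post.
    """
    rows = []
    code, qty, instructions = None, 0, []
    for idx, (table, name, num) in enumerate(items):
        if name.startswith('#'):
            instructions.append(name)
        else:
            # A code tuple starts a new group; its number is the quantity.
            code, qty, instructions = name, num, []
        # The group is complete at the end of the list, or when the
        # next item is a code tuple (it does not start with '#').
        is_last = idx == len(items) - 1
        if is_last or not items[idx + 1][1].startswith('#'):
            rows.append((idx, table, code, instructions, qty))
    return rows


for row in collect_rows(lst):
    print(row)
```

Note that under this condition a code tuple that is immediately followed by another code tuple also produces a row, with an empty instruction list.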

Update:

The solution provided by jpp seems the most efficient and readable:

from collections import defaultdict
from itertools import count, chain

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

d = defaultdict(list)
enums = []
c = count()

for ids, action, num in lst:
    if not action.startswith('#'):
        my_ids, my_action = ids, action
        enums.append(next(c))
    else:
        d[(my_ids, my_action)].append([action, num])
        next(c)
enums = enums[1:] + [len(lst)]

for idx, ((key1, key2), val) in enumerate(d.items()):
    print (enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1])

But I ran into some problems.

For some reason the order is wrong (the last row comes first). Result:

(3, 'Id02', 'Code2', ['#instr2', 1, '#instr5', 1], 1)

(6, 'Id01', 'Code1', ['#instr1', 1, '#instr2', 1, '#instr4', 1], 1)

(9, 'Id01', 'Code2', ['#instr3', 1, '#instr2', 1], 1)
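The number field in the tuples is not always 1, and the script does not always respect it (this information was missing on my side): it always takes the number found in the instruction tuples. The number needs to come from the 'Code' tuple; the one on the '#' tuples can be ignored.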

I am working on these problems and will update my post as soon as I can.

【Question discussion】:

Tags: python python-2.7 list filter conditional-statements

【Solution 1】:

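collections.defaultdict offers an intuitive solution. The idea is to build a dictionary whose key is set to the first two components of a tuple whenever the second component does not start with '#'. Then iterate over the dictionary and print in the format you want.

There is some messy work with itertools.count to get the indices you want. I am sure you can improve on that part.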

from collections import defaultdict
from itertools import count, chain

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

d = defaultdict(list)
enums = []
c = count()

for ids, action, num in lst:
    if not action.startswith('#'):
        my_ids, my_action = ids, action
        enums.append(next(c))
    else:
        d[(my_ids, my_action)].append([action, num])
        next(c)

enums = enums[1:] + [len(lst)]

Result:

for idx, ((key1, key2), val) in enumerate(d.items()):
    print(enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1])

3 Id01 Code1 ['#instr1', 1, '#instr2', 1, '#instr4', 1] 1
6 Id01 Code2 ['#instr3', 1, '#instr2', 1] 1
9 Id02 Code2 ['#instr2', 1, '#instr5', 1] 1
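The printed order above depends on the dictionary preserving insertion order, which the Python language guarantees only from version 3.7 on; under Python 2.7 a defaultdict may iterate in a different order. A minimal sketch of one possible adjustment, assuming collections.OrderedDict is acceptable: it keeps the groups in input order, stores the instructions as a flat list, and takes the quantity from the code tuple rather than from the first instruction. The name `group_by_code` and the dictionary layout are illustrative and not part of the answer.

```python
from collections import OrderedDict

lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]


def group_by_code(items):
    """Map (id, code) to its quantity and flat instruction list, in input order.

    group_by_code is a hypothetical helper name used only for this sketch.
    """
    groups = OrderedDict()
    key = None
    for ids, action, num in items:
        if not action.startswith('#'):
            key = (ids, action)
            # Assumption: each (id, code) pair occurs once; a repeated
            # pair would overwrite the earlier entry here.
            groups[key] = {'qty': num, 'instructions': []}
        elif key is not None:
            # Instructions seen before any code tuple are ignored.
            groups[key]['instructions'].append(action)
    return groups


for (ids, code), info in group_by_code(lst).items():
    print('{} {} {} {}'.format(ids, code, info['instructions'], info['qty']))
```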

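【Discussion】:

This kind of code reminds me how little I know about Python. Thanks; it is not very readable to me (I will analyze the script carefully), but it just... works! As far as I can see there is only one small issue: the items starting with '#' are lists within a list. Is it possible to get them as a single (non-nested) list? Note that the idx I put in the list is only needed to identify the position for debugging. It is not needed, or at least it will be ignored. Sorry, that information was missing.
Awesome! I am now reading about the collections and itertools modules. Thanks, I had completely missed these modules in the past.
A little help: if you run the script again, you will notice that the row that should have idx 9 is displayed first by your statement print(enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1]); it should be index 9, but the index shown is 3.
@FedericoLeoni, sorry, comments are really not suited to code. I cannot read it at all. If you need a particular part explained, let me know. As I said, if you understand the code, you should be able to improve some parts. If there is a point you do not understand, please tell me and I will be happy to explain :).
Yes, putting code in comments is nearly impossible. I am starting to understand the logic, and it is much simpler than I thought. But could you take a quick look at the output of your script? The print statement at the end of the for loop does not respect the list order shown in your post: when I run the script in PyCharm, the last tuple is on the first line.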

【Solution 2】:

You can use itertools.groupby:

import itertools

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
   ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
   ("Id02","#instr2",1),("Id02","#instr5",1)]
# Group the tuples by id; groupby needs its input sorted by the same key.
results = {a:list(b) for a, b in itertools.groupby(sorted(lst, key=lambda x:x[0]), key=lambda x:x[0])}
# Within each id, split into alternating runs of code tuples (True) and instruction tuples (False).
code_groupings = {a:[[c, list(d)] for c, d in itertools.groupby(b, key=lambda x:'Code' in x[1])] for a, b in results.items()}
count = 0
last_code = None
for a, b in sorted(code_groupings.items(), key=lambda x:x[0]):
  for c, group in b:
    if c:
      count += 3  # hard-coded step; it reproduces the sample indices 3, 6, 9 only
      last_code = group[0][1]
    else:
      # the trailing quantity is printed as a fixed 1
      print('{} {} {} {} 1'.format(count, a, last_code, str([i[1] for i in group])))

Output:

3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1

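【Discussion】:

The solution provided by @jpp seems cleaner (to me), but thanks, man! I will study both and see which one fits better.
I see you used a fixed '1' in the print statement, but it is actually a value that can change, like a quantity...
@FedericoLeoni What is the rule for its value?
The last number in the tuple is the quantity of the code to be fetched from the database. The quantity on the #instr entries follows the quantity of the code and can be omitted.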
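Given that rule, the quantity can be read from the code tuple instead of being printed as a fixed 1. A minimal sketch, assuming the tuples of one code group are consecutive in the list: itertools.groupby splits the list into runs of code tuples and runs of instruction tuples, and each instruction run is paired with the code tuple just before it. The name `rows_with_quantity` and the quantities in `sample` are invented for illustration.

```python
from itertools import groupby

# Hypothetical sample: the quantities 4 and 2 are made up to show that
# the value comes from the code tuple, not from the '#' tuples.
sample = [("Id01", "Code1", 4), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1),
          ("Id02", "Code2", 2), ("Id02", "#instr5", 1)]


def rows_with_quantity(items):
    """Return (id, code, instructions, qty) rows, with qty taken from the code tuple."""
    rows = []
    header = None
    for is_instr, run in groupby(items, key=lambda t: t[1].startswith('#')):
        run = list(run)
        if not is_instr:
            # Several code tuples in a row: only the last one receives the
            # instructions that follow; the earlier ones produce no row.
            header = run[-1]
        elif header is not None:
            rows.append((header[0], header[1], [t[1] for t in run], header[2]))
    return rows


for row in rows_with_quantity(sample):
    print(row)
```

Unlike the version above, this does not sort by id first, so the rows come out in the order in which the groups appear in the input.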

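【Solution 3】:

Since I could not fix the problems I found in the solution provided by jpp (my fault, I need to spend some free time studying more), I elaborated my own code. It is clearly not the "Pythonic way", but it works: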

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

instr, newline = [], []
for table, codex, qtx in reversed(lst):  # walking the list backwards makes the condition simpler
    if codex.startswith('#'):
        instr.insert(0, codex)  # insert at the front to keep the instructions in original order
    else:
        newline.append((table, codex, qtx) + tuple(instr))  # a code tuple closes the group
        instr = []

newline = newline[::-1]  # reverse again to restore the order of the original list (lst)

for n in newline:
    print(n)  # with a single argument this prints the same in Python 2 and 3

Result:

('Id01', 'Code1', 1, '#instr1', '#instr2', '#instr4')
('Id01', 'Code2', 1, '#instr3', '#instr2')
('Id02', 'Code2', 1, '#instr2', '#instr5')

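The basic idea is to reverse the input list (lst), because the condition in the for loop is then simpler to write. After formatting the tuples, I need to reverse the output list (newline) to get the right order. I took the liberty of adding some comments so that newbies like me can read it more easily.

I know this is dirty coding and I am fairly sure I could do better, but right now I have serious trouble combining the various list comprehension routines. Over time I will improve my coding skills.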
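The same tuples can also be built in a single forward pass, without reversing the list twice: each code tuple opens a new row, and every following '#' item is appended to the most recent row. This is a sketch of an alternative, not a correction of the code above; the name `merge_forward` is made up for illustration.

```python
lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]


def merge_forward(items):
    """Return one tuple per code: (id, code, qty, instr1, instr2, ...).

    merge_forward is a hypothetical helper name used only for this sketch.
    """
    rows = []
    for table, name, qty in items:
        if name.startswith('#'):
            if rows:
                # Attach the instruction to the most recent code row.
                rows[-1].append(name)
            # Instructions before the first code tuple are dropped.
        else:
            # A code tuple opens a new row.
            rows.append([table, name, qty])
    return [tuple(row) for row in rows]


for n in merge_forward(lst):
    print(n)
```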

【Discussion】: