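【Question title】: Extract lists within lists containing a string in Python
【Posted】: 2016-04-15 18:43:02
【Question】:

I am trying to split a nested list into two nested lists using list comprehensions. I have not been able to do it without converting the inner lists to strings, which in turn ruins my ability to access, print, or control the values later.

I tried this: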

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ...]

derived = [k for k in paragraphs3 if 'Derived:' in k]
therest = [k for k in paragraphs3 if 'Derived:' not in k]

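What happens is that everything in paragraphs3 ends up in therest, unless I do this:

paragraphs4 = []
for i in paragraphs3:
    i = str(i)
    paragraphs4.append(i)

If I then feed paragraphs4 to the list comprehensions, I get two lists, just as I wanted. But they are no longer nested lists, because: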

    for i in therest:
        g.write('\n'.join(i))
        g.write('\n\n') 

writes every single character in therest on a separate line:

'
P
a
g
e
:

2
'

So I am looking for a better way to split paragraphs3, or perhaps the solution lies elsewhere? The end result I am after is:

Page: 2
Bib: Something
Derived: This n that

Page: 3
Bib: Something
.
.
.

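【Comments】:

  • Can you describe the desired output better? My impression is that your input is already the output you want.
  • Is the nesting depth fixed or arbitrary?
  • @Pynchia: It is. I just want to separate the two groups of items, because I write them to separate files later.
  • @Lav: Fixed. That is, paragraphs3 is always a list of lists, and the inner lists never contain sub-lists.

Tags: python list nested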


【Solution 1】:

This code separates the sublists according to whether they contain a string that starts with 'Derived:'.

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ]

def show(paragraphs):
    for para in paragraphs:
        print('\n'.join(para), '\n')

derived = []
therest = []

print('---input---')
show(paragraphs3)

for para in paragraphs3:
    if any(item.startswith('Derived:') for item in para):
        derived.append(para)
    else:
        therest.append(para)

print('---derived---')
show(derived)

print('---therest---')
show(therest)

Output

---input---
Page: 2
Bib: Something
Derived:  This n that 

Page: 3
Bib: Something
Argument: Wouldn't you like to know? 

---derived---
Page: 2
Bib: Something
Derived:  This n that 

---therest---
Page: 3
Bib: Something
Argument: Wouldn't you like to know? 

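The most important part of this code is

`any(item.startswith('Derived:') for item in para)`

This iterates over the individual strings in para (the current paragraph) and returns True as soon as it finds one that starts with 'Derived:'.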
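The short-circuit behaviour described above can be observed directly. The following is a minimal sketch: the helper starts_with_derived and the seen list are hypothetical names introduced only for this illustration, and the sample paragraph is reordered so that the match is not the last item.

```python
def starts_with_derived(items, seen):
    """Yield one startswith() result per item, recording each item tested."""
    for item in items:
        seen.append(item)  # record that this item was actually examined
        yield item.startswith('Derived:')

para = ['Page: 2', 'Derived:  This n that', 'Bib: Something']
seen = []
found = any(starts_with_derived(para, seen))

print(found)  # True
print(seen)   # ['Page: 2', 'Derived:  This n that']
```

The third string is never tested, because any stops consuming the generator at the first True.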


FWIW, the for loop can be condensed to:

for para in paragraphs3:
    (therest, derived)[any(item.startswith('Derived:') for item in para)].append(para)

Because False and True evaluate to 0 and 1 respectively, they can be used to index the (therest, derived) tuple. However, many people would consider this almost unreadable. :)
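A small sketch of why the indexing trick works, followed by a spelling that some readers may find easier to follow. The names buckets, is_derived, and target are assumptions made for this example and do not appear in the answer.

```python
therest, derived = [], []
buckets = (therest, derived)

# bool is a subclass of int, so False and True act as the indexes 0 and 1.
print(int(False), int(True))      # 0 1
print(buckets[False] is therest)  # True
print(buckets[True] is derived)   # True

# The same choice written as a conditional expression.
para = ['Page: 2', 'Bib: Something', 'Derived:  This n that']
is_derived = any(item.startswith('Derived:') for item in para)
target = derived if is_derived else therest
target.append(para)
```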

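【Comments】:

  • I checked your answer first, and it worked! Thank you. I will try the others later, and I am sure many of them are correct, but I am most comfortable with the good old for loop, although I have heard it is the slowest option?
  • @treakec: Thanks! An old-fashioned for loop with append is slower than the equivalent list comprehension, but not by much. For this application, however, using append is faster than running two list comprehensions, because the list comprehension version has to scan and test everything twice: once for the derived list and once for the therest list.
  • @treakec: As I mentioned in my answer, using any on a generator expression returns True as soon as a match is found; it only has to scan the whole list when no match is found.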
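The performance claim in these comments can be checked with timeit. This is only a sketch: the function names are hypothetical, the sample data is repeated artificially, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import timeit

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
] * 1000  # repeated sample data, only to make the timing measurable

def has_derived(para):
    return any(item.startswith('Derived:') for item in para)

def split_one_pass(paragraphs):
    """Test each paragraph once and append it to one of two lists."""
    derived, therest = [], []
    for para in paragraphs:
        (derived if has_derived(para) else therest).append(para)
    return derived, therest

def split_two_comprehensions(paragraphs):
    """Test each paragraph twice, once per comprehension."""
    derived = [p for p in paragraphs if has_derived(p)]
    therest = [p for p in paragraphs if not has_derived(p)]
    return derived, therest

# Both approaches must produce the same split.
assert split_one_pass(paragraphs3) == split_two_comprehensions(paragraphs3)

for func in (split_one_pass, split_two_comprehensions):
    seconds = timeit.timeit(lambda: func(paragraphs3), number=100)
    print(f'{func.__name__}: {seconds:.3f} s')
```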
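【Solution 2】:

The code you wrote is almost right. You need to check whether 'Derived:' occurs in the third element of each inner list. In your comprehension, k is a whole inner list of paragraphs3, so 'Derived:' in k looks for an element that is exactly equal to 'Derived:' rather than for a substring.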

>>> paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?']]
>>> paragraphs3[0]
['Page: 2', 'Bib: Something', 'Derived:  This n that']
>>> paragraphs3[0][2] # Here is where you want to check the condition
'Derived:  This n that'

So you only need to change the condition to if 'Derived:' in k[2]

>>> [k for k in paragraphs3 if 'Derived:' in k[2]]
[['Page: 2', 'Bib: Something', 'Derived:  This n that']]

>>> [k for k in paragraphs3 if 'Derived:' not in k[2]]
[['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]
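The difference between testing membership in a list and in a string explains the behaviour seen in the question. A short sketch using one inner list from the sample data; the final line is a position-independent variant, offered as an assumption for the case where 'Derived:' is not guaranteed to sit at index 2.

```python
k = ['Page: 2', 'Bib: Something', 'Derived:  This n that']

print('Derived:' in k)       # False: no element equals 'Derived:' exactly
print('Derived:' in k[2])    # True: substring test on a single string
print('Derived:' in str(k))  # True, but str(k) is one string, not a list

# Does not depend on where the 'Derived:' entry appears in the inner list.
print(any('Derived:' in item for item in k))  # True
```

This is also why converting the inner lists with str made the original comprehension appear to work, at the cost of losing the list structure.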

【Comments】:

【Solution 3】:

Solution

    derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
    therest = [l for l in paragraphs3 if not any(filter(lambda k: 'Derived: ' in k, l))]
    

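Detailed explanation

Copy the whole list:

    [l for l in paragraphs3]

Copy the list with a condition:

    [l for l in paragraphs3 if sublist_contains('Derived: ', l)]

The function sublist_contains does not exist yet, so let's implement it.

Keep only the items that match condition_check:

    filter(condition_check, l)

Since condition_check can be expressed as a lambda function:

    filter(lambda k: 'Derived: ' in k, l)

Convert the result to a boolean (True if at least one matching item was found):

    any(filter(lambda k: 'Derived: ' in k, l))

And replace sublist_contains with the resulting inline code: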

    derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
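Instead of inlining the check, sublist_contains can also be kept as a named function. This is one possible sketch, not necessarily what the answer's author had in mind; it uses a generator expression in place of filter with a lambda, and the loop variable sub is an arbitrary choice.

```python
def sublist_contains(needle, sublist):
    """Return True if any string in sublist contains needle as a substring."""
    return any(needle in item for item in sublist)

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

derived = [sub for sub in paragraphs3 if sublist_contains('Derived: ', sub)]
therest = [sub for sub in paragraphs3 if not sublist_contains('Derived: ', sub)]

print(derived)  # [['Page: 2', 'Bib: Something', 'Derived:  This n that']]
print(therest)  # [['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]
```

Note that the complement is written as not sublist_contains(...). Testing whether any item lacks the substring would select every paragraph in this data.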
    

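【Comments】:

【Solution 4】:

Your inner lists appear to have structure: each list is itself a single value, not just a list of unrelated values. With that in mind, you can write a class to represent that data.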

      paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?'], ...]
      
      class Paragraph(object):
          def __init__(self, page, bib, extra):
              self.page = page
              self.bib = bib
              self.extra = extra
      
          @property
          def is_derived(self):
              return 'Derived: ' in self.extra
      
      # Unpack each inner list into the page, bib, and extra arguments.
      paras = [Paragraph(*p) for p in paragraphs3]
      

You can then use the partition recipe from the itertools documentation to split that list into two iterators.

      from itertools import filterfalse, tee

      def partition(pred, iterable):
          'Use a predicate to partition entries into false entries and true entries'
          # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
          t1, t2 = tee(iterable)
          return filterfalse(pred, t1), filter(pred, t2)
      
      (not_derived_paras, derived_paras) = partition(lambda p: p.is_derived, paras)
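Putting the class and the partition recipe together gives the following self-contained sketch for Python 3. It assumes every inner list has exactly three items (page, bib, extra), as in the sample data. Because partition returns lazy iterators, the example converts them to lists before use; the names not_derived_iter and derived_iter are introduced here for illustration.

```python
from itertools import filterfalse, tee

class Paragraph:
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

def partition(pred, iterable):
    """Split iterable into (items failing pred, items passing pred)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

paras = [Paragraph(*p) for p in paragraphs3]
not_derived_iter, derived_iter = partition(lambda p: p.is_derived, paras)

# Materialize the iterators so the results can be reused or indexed.
not_derived_paras = list(not_derived_iter)
derived_paras = list(derived_iter)

print([p.page for p in derived_paras])      # ['Page: 2']
print([p.page for p in not_derived_paras])  # ['Page: 3']
```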
      

【Comments】:

【Solution 5】:

In my view, this is the most straightforward way to do it:

        [p for p in paragraphs3 if 'Derived:' in '\n'.join(p)]
        [p for p in paragraphs3 if 'Derived:' not in '\n'.join(p)]
        

However, if you like, you can get fancier and do it in one line (although it will be more complicated than necessary).

        result = {k:[p for p in paragraphs3 if ('Derived:' in '\n'.join(p)) == test]  for k,test in {'derived': True, 'therest': False}.items()}
        

This produces a dict with 'derived' and 'therest' as keys. Now you can do this:

        for k,p in result.items():
            print(k)
            for i in p:
                print('\n'.join(i))
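The same dict can also be built in a single pass over the data, so that each paragraph is tested only once. A minimal sketch under the assumption that the two keys 'derived' and 'therest' are all that is needed; the loop variable names are arbitrary.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

result = {'derived': [], 'therest': []}
for p in paragraphs3:
    # Choose the bucket once per paragraph, then append the intact inner list.
    key = 'derived' if any('Derived:' in item for item in p) else 'therest'
    result[key].append(p)

for name, paras in result.items():
    print(f'---{name}---')
    for para in paras:
        # One item per line, with a blank line after each paragraph.
        print('\n'.join(para), end='\n\n')
```

The inner lists stay lists throughout, so they can still be indexed or joined later when writing to a file.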
        

【Comments】:
