【Title】:Deleting elements with specific substring in python
【Posted】:2013-06-25 17:48:54
【Question】:

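I have a list with many elements that I extracted from an HTML page using Beautiful Soup. Many of the elements in this list share the same substring, and I would like to remove every element that contains that substring.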

My list looks like this:

[
u'File:Saddam Hussein (107).jpg',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'File:AlBakr.jpg',
... (and so on) ...
]

I want to remove the elements that contain the string "(page does not exist)".

Any ideas on how to do this?

【Comments】:

    Tags: python list


    【Solution 1】:

    Use a list comprehension:

    >>> lis = [u'File:Saddam Hussein (107).jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'File:AlBakr.jpg', u'Template:Fn (page does not exist)', u'File:Chiracsaddam.jpg', u'File:Donald saddam.jpg', u'Template:Fn (page does not exist)', u'File:SaddamandCuellar.jpg.jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'File:SaddamBaghdadwalkabout.jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Kurdish Patriotic Front (page does not exist)', u'File:TrialSaddam.jpg', u'Mohammad Rashdan (page does not exist)', u'Emmanuel Ludot (page does not exist)', u'Marc Henzelin (page does not exist)', u'Adnan Khairallah Tuffah (page does not exist)', u'Nidal al-Hamdani (page does not exist)', u'Ali Hussein (page does not exist)', u'File:SaddamandRana.jpg.jpg', u'Saddam Kamel Majid (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)']
    

    To modify the original list in place:

    >>> lis[:] = [item for item in lis if "(page does not exist)" not in item]
    

    Or to create a new list:

    new_lis = [item for item in lis if "(page does not exist)" not in item]
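The filter above can also be wrapped in a small function and tried on a shortened version of the question's data. This is only a sketch: the helper name `without_substring` and the four-item sample list are assumptions made for illustration, not part of the original answer.

```python
# Hypothetical helper; the name and signature are illustrative only.
def without_substring(items, substring):
    """Return a new list with the items that do not contain `substring`."""
    return [item for item in items if substring not in item]


# Shortened sample based on the titles shown in the question.
titles = [
    u'File:Saddam Hussein (107).jpg',
    u'Template:Fn (page does not exist)',
    u'File:AlBakr.jpg',
    u'Kurdish Patriotic Front (page does not exist)',
]

kept = without_substring(titles, "(page does not exist)")
print(kept)
```

Because the helper builds a new list, `titles` itself is left untouched; use slice assignment (`titles[:] = ...`) if the original list object has to change.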
    

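    【Discussion】:

    • Why the copy, [:]? I'm pretty sure it's unnecessary.
    • @johnthexiii lis[:] is not a copy here, see stackoverflow.com/questions/11297774/…
    • @AshwiniChaudhary, maybe a better question is why keep the original reference? I'm not implying it's a bad thing, I'm just curious.
    • @johnthexiii The OP said they "want to remove" the elements, so I offered both options. That's the only reason.
    • @johnthexiii: Sometimes the change has to happen in place, e.g. os.walk() lets you control which directories are visited by modifying the dirs list.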
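The difference discussed in these comments can be seen in a short sketch. The variable names (`alias`, `other`, `visited`) and the temporary directory layout are assumptions chosen for illustration; the `os.walk()` part relies on its documented behaviour that, in the default top-down mode, editing the directory-name list in place prunes the walk.

```python
import os
import tempfile

marker = "(page does not exist)"

# Slice assignment changes the existing list object, so every name
# that refers to it sees the filtered contents.
lis = [u'File:AlBakr.jpg', u'Template:Fn (page does not exist)']
alias = lis
lis[:] = [item for item in lis if marker not in item]
same_object = alias is lis          # still one shared list

# Plain assignment rebinds the name to a new list; the old object
# (still reachable through other_alias) keeps its original contents.
other = [u'File:AlBakr.jpg', u'Template:Fn (page does not exist)']
other_alias = other
other = [item for item in other if marker not in item]

# os.walk() hands out its own list of subdirectory names, so pruning
# only works when that very list is modified in place.
visited = []
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep", "inner"))
    os.makedirs(os.path.join(root, "skip_me", "inner"))
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.startswith("skip")]
        visited.append(os.path.relpath(dirpath, root))
```

After this runs, `alias` holds only the `File:` entry, `other_alias` still holds both entries, and `visited` never includes anything under `skip_me`.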
    【Solution 2】:
    >>> for i in range(len(l)-1, -1, -1):  # stop at -1 so index 0 is checked too
    ...    if l[i].find('(page does not exist)') > -1:
    ...       del l[i]
    ...
    >>> l
    [u'File:Saddam Hussein (107).jpg']
    >>>
    

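    【Discussion】:

    • del l[i] - you don't need the parentheses. Also, L is a better variable name than l.
    • Note that del and pop are expensive operations on lists. (pop and del differ only slightly in speed.)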
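As a rough sketch of the index-based approach, the loop below walks the indices from last to first and stops at -1 so that index 0 is examined as well. The function name `remove_in_reverse` and the sample data are assumptions for illustration. Each `del` on a list shifts every later element down by one, so repeated deletion can take quadratic time in the worst case, whereas a list comprehension makes a single pass.

```python
# Hypothetical helper; the name is illustrative only.
def remove_in_reverse(items, substring):
    """Delete matching items in place, iterating from the last index to 0."""
    for i in range(len(items) - 1, -1, -1):
        if substring in items[i]:
            del items[i]  # shifts all later elements one slot to the left


data = [
    u'Template:Fn (page does not exist)',   # index 0 must be checked too
    u'File:AlBakr.jpg',
    u'Template:Fnb (page does not exist)',
    u'File:TrialSaddam.jpg',
]

# Single-pass reference result for comparison.
expected = [item for item in data if "(page does not exist)" not in item]

remove_in_reverse(data, "(page does not exist)")
print(data == expected)
```

Iterating backwards matters because deleting an element only changes the positions of items after it, which the loop has already visited.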