Extract substrings without any spaces from list of strings [duplicate]: answers

【Question title】: Extract substrings without any spaces from list of strings [duplicate]
【Posted】: 2021-10-25 12:35:00
【Question description】:

Suppose I have the following list:

l1 = ['apples', ' bananas' , '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']

What is the best way to extract each word and discard the extra whitespace?

The result I am after is:

l2 = ['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']

What I have tried so far:

    import re

    clean_l = []

    # Get rid of white spaces
    for item in l1:
        clean = re.sub(r"(?m)^\s+", "", item)
        clean_l.append(clean)

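But this returns exactly the same content as l1.

【Discussion】:

For one thing, your regex explicitly matches whitespace only at the start of the string.
You might as well use " ".join(l1).split().
The simplest regex approach is probably re.findall: [w for string in l1 for w in re.findall(r"\w+", string)]
@WiktorStribiżew You missed one, I guess
@DaniMesejo Output: ['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']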

Tags: python string list

【Solution 1】:

Use:

l1 = ['apples', ' bananas' , '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
res = [ei for e in l1 for ei in e.strip().split()]
print(res)

Output

['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']

If you insist on using a regex, although I would not recommend it for this particular problem (see here), use:

import re

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']
res = [ei for e in l1 for ei in re.findall(r"\w+", e)]
print(res)

Output

['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']

A third option (@WiktorStribiżew) is to use:

res = " ".join(l1).split()
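The three variants agree on the list from the question, but they are not interchangeable in general. The following is a minimal sketch, assuming a hypothetical extra item ('  sun-dried tomatoes ') that is not part of the original question: str.split() with no arguments splits only on runs of whitespace and already ignores leading and trailing whitespace, whereas \w+ also breaks on punctuation such as hyphens.

```python
import re

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']

# split() with no arguments discards leading, trailing and repeated whitespace.
by_split = [w for item in l1 for w in item.split()]
by_join = " ".join(l1).split()
by_regex = [w for item in l1 for w in re.findall(r"\w+", item)]

# All three agree on the list from the question.
print(by_split == by_join == by_regex)  # True

# Hypothetical item, not from the original question.
extra = ['  sun-dried tomatoes ']
extra_split = " ".join(extra).split()
extra_regex = [w for item in extra for w in re.findall(r"\w+", item)]
print(extra_split)  # ['sun-dried', 'tomatoes']
print(extra_regex)  # ['sun', 'dried', 'tomatoes']
```

If the goal is strictly "split on whitespace", the split-based variants are the closer match; the \w+ pattern is only equivalent when the items contain nothing but word characters and spaces.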

Timings

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  '] * 1000
import re
%timeit [ei for e in l1 for ei in e.strip().split()]
1.76 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit " ".join(l1).split()
453 µs ± 3.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [ei for e in l1 for ei in re.findall(r"\w+", e)]
7.77 ms ± 59.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
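The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The loop count (number=20) and the dictionary names below are arbitrary choices for this sketch, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  '] * 1000

# The three approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "strip + split": lambda: [ei for e in l1 for ei in e.strip().split()],
    "join + split": lambda: " ".join(l1).split(),
    "re.findall": lambda: [ei for e in l1 for ei in re.findall(r"\w+", e)],
}

number = 20  # arbitrary loop count for this sketch
timings = {}
for name, func in candidates.items():
    total = timeit.timeit(func, number=number)  # total seconds for all loops
    timings[name] = total / number
    print(f"{name}: {timings[name] * 1e3:.3f} ms per loop")
```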

【Discussion】:

Why doesn't the regex work?
@jonrsharpe That was already covered in a comment on the question, but I try to avoid regex where I can
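To make the point about the original pattern concrete, here is a small sketch run on the list from the question. A pattern anchored with ^ lets re.sub remove whitespace only at the start of each string (or of each line under (?m)); trailing whitespace and the spaces between words are left alone, and re.sub never turns one string into several items. The variable names are illustrative only.

```python
import re

l1 = ['apples', ' bananas', '  coconuts', '   dates figs guavas', 'lemons ', 'mangoes  ']

# Anchored pattern from the question: strips leading whitespace only.
leading_only = [re.sub(r"(?m)^\s+", "", item) for item in l1]
print(leading_only)
# ['apples', 'bananas', 'coconuts', 'dates figs guavas', 'lemons ', 'mangoes  ']

# Unanchored alternative: collect every run of non-whitespace characters.
words = [w for item in l1 for w in re.findall(r"\S+", item)]
print(words)
# ['apples', 'bananas', 'coconuts', 'dates', 'figs', 'guavas', 'lemons', 'mangoes']
```

The \S+ pattern behaves like str.split() with no arguments on this input, which is why the split-based answers need no regex at all.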