No item in list in string Python: answers

[Question title]: No item in list in string Python
[Posted]: 2015-06-11 03:11:23
[Question]:

I have a list of things I want to filter out of a csv, and I'm trying to figure out a Pythonic way to do it. E.g., this is what I'm doing:

import csv

with open('output.csv', 'wb') as outf:
    with open('input.csv', 'rbU') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            (if none of notstrings in row[3])
                outwriter.writerow(row)

I don't know what to put in the parentheses (or whether there is a better overall approach to this).

[Question discussion]:

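Do you mean you want to exclude a row if column 4 contains any of these words?
What values are in row[3]? Is it a sentence? Is there punctuation? Should only whole words be matched?
No, just column 3. Also, row[3] is supposed to be a name, but I'm slowly building a list of filters to weed out non-names (better too zealous than not zealous enough). However, I'm using this more to learn the best approach than for this one application specifically.
I'm counting from 1: row[0] is column 1, and so on.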

Tags: python list python-2.7 csv

[Solution 1]:

You can use the any() function to test each word in the list against the column:

if not any(w in row[3] for w in notstrings):
    # none of the strings are found, write the row
    outwriter.writerow(row)

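any() is true if any of those strings appears in row[3], so the condition above holds only when none of them do. However, it matches substrings, so a value like false-positive would be excluded, because 'a' in 'false-positive' is true.

Put into context: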
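To make the substring caveat concrete, here is a small sketch (Python 3) of the same any() test wrapped in a function. The helper name contains_any_substring and the sample strings are hypothetical; only the word list comes from the question.

```python
# Word list taken from the question.
notstrings = ['and', 'or', '&', 'is', 'a', 'the']

def contains_any_substring(text, words):
    """Return True if any word occurs anywhere in text, even inside another word."""
    return any(w in text for w in words)

print(contains_any_substring('false-positive', notstrings))  # True: 'a' is inside 'false'
print(contains_any_substring('Bob Smith', notstrings))       # False
```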

import csv

with open('output.csv', 'wb') as outf:
    with open('input.csv', 'rbU') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            # keep the row only if no string from notstrings occurs in row[3]
            if not any(w in row[3] for w in notstrings):
                outwriter.writerow(row)
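A self-contained way to try the any() filter without real files is sketched below. It assumes Python 3 and uses io.StringIO in place of input.csv/output.csv; the generator filter_rows and the sample data are hypothetical. Notice that 'Carol' is dropped too, because 'a' is a substring of it.

```python
import csv
import io

notstrings = ['and', 'or', '&', 'is', 'a', 'the']

def filter_rows(rows, column=3):
    """Yield only rows whose column contains none of notstrings as a substring."""
    for row in rows:
        if not any(w in row[column] for w in notstrings):
            yield row

# Hypothetical input standing in for input.csv.
sample = "1,x,y,Bob Smith\n2,x,y,Smith and Sons\n3,x,y,Carol\n"
reader = csv.reader(io.StringIO(sample))

# Write the surviving rows to an in-memory buffer standing in for output.csv.
out = io.StringIO()
csv.writer(out).writerows(filter_rows(reader))
print(repr(out.getvalue()))  # '1,x,y,Bob Smith\r\n'
```

With real files in Python 3, they would be opened in text mode with newline='' rather than 'wb'/'rbU'.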

If you need to respect word boundaries, a regular expression is a better idea here:

import re

notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')
if not notstrings.search(row[3]):
    # none of the words are found, write the row
    outwriter.writerow(row)
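Here is a sketch (Python 3) that applies the answer's pattern to a few hypothetical rows, using a made-up helper keep_row, to show which values survive. The pattern is case-sensitive, so 'AT' does not match 'a'.

```python
import re

# Pattern from the answer: whole words from the list, or a standalone '&'.
notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')

def keep_row(row, column=3):
    """Return True if row[column] contains none of the blocked words as whole words."""
    return not notstrings.search(row[column])

rows = [
    ['1', 'x', 'y', 'Carol'],           # kept: 'a' only appears inside a word
    ['2', 'x', 'y', 'Smith and Sons'],  # dropped: whole word 'and'
    ['3', 'x', 'y', 'Tom & Jerry'],     # dropped: standalone '&'
    ['4', 'x', 'y', 'AT&T'],            # kept: '&' sits between word characters
]
print([row[0] for row in rows if keep_row(row)])  # ['1', '4']
```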

I created a Regex101 demo for the expression to show how it works. It has two branches:

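\b(?:and|or|is|a|the)\b - matches any of the words in the list, as long as each is at the start, at the end, or between non-word characters (punctuation, whitespace, etc.)
\B&\B - matches the & character if it is at the start, at the end, or between non-word characters. You cannot use \b here because & itself is not a word character.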
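The difference between the two branches can be checked directly. This Python 3 sketch uses hypothetical sample strings to show that \b cannot surround an & that has spaces on both sides, while \B can, and that \ba\b ignores the 'a' inside 'false'.

```python
import re

# \b needs a word character on exactly one side, so it cannot surround '&'
# when '&' has spaces (non-word characters) on both sides.
print(re.search(r'\b&\b', 'Tom & Jerry'))                    # None
print(re.search(r'\B&\B', 'Tom & Jerry') is not None)        # True
print(re.search(r'\b&\b', 'AT&T') is not None)               # True: word chars on both sides
print(re.search(r'\ba\b', 'false-positive'))                 # None: 'a' is inside 'false'
print(re.search(r'\ba\b', 'a false positive') is not None)   # True
```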

[Discussion]:

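How would I use \ba\b to avoid the false positives with a? Is r'\ba\b' enough?
@Xodarap777: That won't work for &, because & is not a word character. For the rest it's enough. You can test all of them as a single regex in one step, without any(): r'\b(and|or|is|a|the)\b'. I'll look into mixing the & in there.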
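One way to keep a single regex in sync with the notstrings list is to generate it. The helper build_pattern below is hypothetical (Python 3.4+ for re.fullmatch): entries made only of word characters get \b on both sides, and other entries such as & get \B, mirroring the hand-written pattern in the answer. It assumes every entry is either all word characters or all non-word characters; a mixed entry like "e.g." would need different handling.

```python
import re

def build_pattern(words):
    """Hypothetical helper: compile one regex from a list of blocked words.

    Word-character entries are wrapped in word boundaries; all other entries
    (for example '&') are wrapped in non-boundaries.
    """
    parts = []
    for w in words:
        escaped = re.escape(w)  # guard against regex metacharacters
        if re.fullmatch(r'\w+', w):
            parts.append(r'\b' + escaped + r'\b')
        else:
            parts.append(r'\B' + escaped + r'\B')
    return re.compile('|'.join(parts))

pattern = build_pattern(['and', 'or', '&', 'is', 'a', 'the'])
print(bool(pattern.search('Tom & Jerry')))  # True
print(bool(pattern.search('Carol')))        # False
print(bool(pattern.search('AT&T')))         # False
```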

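[Solution 2]:

You can use sets. In this code I convert your list to a set, split row[3] into a set of words, and check the intersection of the two sets. If the intersection is empty, none of the words in notstrings appear in row[3].

Using sets, you only match whole words, not parts of words.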

import csv

with open('output.csv', 'wb') as outf:
    with open('input.csv', 'rbU') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = set(['and', 'or', '&', 'is', 'a', 'the'])
        for row in read:
            # an empty intersection means no space-separated word is blocked
            if not notstrings.intersection(set(row[3].split(' '))):
                outwriter.writerow(row)
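The set approach can be tried on hypothetical rows with a small helper, keep_row (also hypothetical), in Python 3. The last row shows the limitation raised in the discussion: the token is 'and.', which is not in the set, so the row is kept.

```python
notstrings = {'and', 'or', '&', 'is', 'a', 'the'}

def keep_row(row, column=3):
    """Keep the row if no space-separated token of row[column] is in notstrings."""
    return not notstrings.intersection(row[column].split(' '))

rows = [
    ['1', 'x', 'y', 'Carol'],           # kept: 'a' is not a token on its own
    ['2', 'x', 'y', 'Tom & Jerry'],     # dropped: '&' is a token
    ['3', 'x', 'y', 'Smith and Sons'],  # dropped: 'and' is a token
    ['4', 'x', 'y', 'Sons and.'],       # kept: the token is 'and.', not 'and'
]
print([row[0] for row in rows if keep_row(row)])  # ['1', '4']
```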

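[Discussion]:

Does this approach avoid the false positives of any()?
This requires row[3] to contain only spaces and words; it won't work if punctuation is involved.
@MartijnPieters Right. If there is punctuation, the string needs to be split on multiple delimiters or parsed with a regex.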
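Splitting on multiple delimiters, as suggested in the last comment, could look like the Python 3 sketch below. The delimiter set (whitespace plus some common punctuation) is an assumption; & is deliberately left out so that a standalone & still becomes its own token, while AT&T stays one token. keep_row is a hypothetical helper.

```python
import re

notstrings = {'and', 'or', '&', 'is', 'a', 'the'}

# Assumed delimiters: whitespace and common punctuation, but not '&'.
DELIMITERS = re.compile(r'[\s,.;:!?()"]+')

def keep_row(row, column=3):
    """Keep the row if none of the tokens of row[column] is in notstrings."""
    tokens = DELIMITERS.split(row[column])  # may include '' at the ends; harmless
    return notstrings.isdisjoint(tokens)

for name in ['Sons and.', 'Smith,and Sons', 'AT&T', 'Carol']:
    print(name, keep_row(['1', 'x', 'y', name]))
# Sons and. False
# Smith,and Sons False
# AT&T True
# Carol True
```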