Python re: Match all words consisting of letters only

【Question title】: Python re: Match all the word consisting of letters only
【Posted】: 2021-10-15 02:34:13
【Question】:

I just want to find all the words in a sentence that consist of letters only. I tried the following, but it does not work. Thanks.

import re

string1= "This is a nice day - 23 Sep 2019 Oct2021"

word1=re.findall('\b[a-zA-Z]+\b', string1, flags=0)
word2=re.findall('^[a-zA-Z]+$', string1, flags=0)
word3=re.findall('\w+', string1, flags=0)

print(word1)  # --> []
print(word2)  # --> []
print(word3)  # --> ['This', 'is', 'a', 'nice', 'day', '23', 'Sep', '2019', 'Oct2021']

Expected result:

['This', 'is', 'a', 'nice', 'day', 'Sep']

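【Comments】:

Assuming your language doesn't use diacritics, it looks like you are trying to match [a-zA-Z]+\b. Your problem is the missing r in front of the string, which causes the `\` to be treated as the start of an escape sequence.
@Grismar Well, I think it should be \b[a-zA-Z]+\b rather than [a-zA-Z]+\b.
That's not required, @MartesBerkeley - it will start matching at the first letter anyway.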

Tags: python python-re

【Solution 1】:

You are only missing an r to mark the pattern as a raw string, so that the backslashes are passed to re unchanged:

import re

subject = "This is a nice day - 23 Sep 2019 Oct2021"
for match in re.findall(r"[a-zA-Z]+\b", subject):
    print(match)

print(re.findall(r'[a-zA-Z]+\b', subject))

Result:

This
is
a
nice
day
Sep
['This', 'is', 'a', 'nice', 'day', 'Sep']
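
To see why the r prefix matters, here is a small sketch comparing the two string literals. The variable names `plain` and `raw` are only illustrative. In a normal string literal, Python turns `\b` into a backspace character before re ever sees the pattern, so the pattern looks for backspaces instead of word boundaries:

```python
import re

subject = "This is a nice day - 23 Sep 2019 Oct2021"

# Normal string literal: "\b" is the backspace character (0x08).
plain = "\b[a-zA-Z]+\b"
# Raw string literal: the backslash is kept, so re receives the \b anchor.
raw = r"\b[a-zA-Z]+\b"

print(plain == "\x08[a-zA-Z]+\x08")  # True
print(len(plain), len(raw))          # 11 13

# The subject contains no backspace characters, so nothing matches.
print(re.findall(plain, subject))    # []
print(re.findall(raw, subject))      # ['This', 'is', 'a', 'nice', 'day', 'Sep']
```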

Passing flags=0 is redundant, since 0 is already the default.

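Since some people insist on adding the leading \b: if your text contains elements that start with a digit (or another word character, such as an underscore) followed only by letters, and you do not want those letters matched, you should use the leading \b. A leading symbol such as a hyphen makes no difference, because \b still matches between it and the first letter: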

import re

subject = "It all -depends on your 2source text"
print(re.findall(r'[a-zA-Z]+\b', subject))
print(re.findall(r'\b[a-zA-Z]+\b', subject))

Output:

['It', 'all', 'depends', 'on', 'your', 'source', 'text']
['It', 'all', 'depends', 'on', 'your', 'text']
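
Putting the two points together, the pattern with both boundaries could be wrapped in a small helper. This is only a sketch: the names `LETTER_WORD` and `find_letter_words` are made up for illustration, and the third test string is an extra example that is not part of the question.

```python
import re

# Hypothetical helper; the pattern is the one with \b on both sides.
LETTER_WORD = re.compile(r"\b[a-zA-Z]+\b")

def find_letter_words(text):
    """Return runs of ASCII letters that do not touch a digit or underscore."""
    return LETTER_WORD.findall(text)

print(find_letter_words("This is a nice day - 23 Sep 2019 Oct2021"))
# ['This', 'is', 'a', 'nice', 'day', 'Sep']
print(find_letter_words("It all -depends on your 2source text"))
# ['It', 'all', 'depends', 'on', 'your', 'text']
print(find_letter_words("snake_case x2 done"))
# ['done']
```

Compiling the pattern once is optional here; it mainly keeps the pattern in one place if the helper is called repeatedly.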

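【Comments】:

You should probably have word boundaries on both sides, i.e. use: \b[a-zA-Z]+\b
@TimBiegeleisen That's what I thought too. Although this pattern matches all the given cases, it would not cover all "the words consisting of letters only".
@MartesBerkeley Your comment is incorrect; \b[A-Za-z]+\b will match all words that contain only letters.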
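
One caveat raised in the question comments is that [a-zA-Z] assumes text without diacritics. If accented letters need to count as letters, one possible variation (an assumption on my part, not something the answer proposes) is the class `[^\W\d_]`, which for str patterns is approximately "any Unicode letter". The sample text is made up and uses escapes so the accented characters are precomposed:

```python
import re

# "Café déjà vu - 23 Sep 2019 Oct2021", written with escapes.
text = "Caf\u00e9 d\u00e9j\u00e0 vu - 23 Sep 2019 Oct2021"

# ASCII-only pattern from the answer: words containing é or à are dropped,
# because the accented letter is a word character and blocks the \b.
ascii_words = re.findall(r"\b[a-zA-Z]+\b", text)

# [^\W\d_]: a word character that is neither a decimal digit nor "_".
# This is roughly "a letter"; a few non-decimal numeric characters
# (for example superscript digits) may still slip through.
unicode_words = re.findall(r"\b[^\W\d_]+\b", text)

print(ascii_words)    # ['vu', 'Sep']
print(unicode_words)  # ['Café', 'déjà', 'vu', 'Sep']
```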