Regex for words of even length: answers

【Question title】: Regex expression for words with length of even number
【Posted】: 2022-01-22 19:28:39
【Question description】:

I want to write a regular expression that matches words of even length.

For example, from a list containing the following words:
{"blue", "ah", "sky", "wow", "neat"}, I want the output to be {"blue", "ah", "neat"}.

I know that the expressions \w{2} or \w{4} match 2-letter or 4-letter words, but I want something that works for every even length. I tried \w{%2==0}, but it does not work.

【Question discussion】:

Just curious, but why not len() % 2? This may be an XY problem.

Tags: python regex

【Solution 1】:

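You can repeat a group of 2 word characters between the anchors ^, which asserts the start of the string, and $, which asserts the end of the string, or between word boundaries \b.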

^(?:\w{2})+$

See the regex demo.

import re

strings = [
    "blue",
    "ah",
    "sky",
    "wow",
    "neat"
]

for s in strings:
    # re.match already anchors at the start of the string, so the leading ^ can be omitted
    m = re.match(r"(?:\w{2})+$", s)
    if m:
        print(m.group())

Output

blue
ah
neat
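
The \b variant mentioned above is handy when the words sit inside a longer string rather than in a list. A minimal sketch, assuming a made-up sample sentence (the `text` variable is hypothetical and not part of the original question):

```python
import re

# Hypothetical sample text, for illustration only.
text = "blue ah sky wow neat"

# \b on both sides forces each match to span a whole word;
# (?:\w{2})+ can only succeed when that word has an even number of characters.
even_words = re.findall(r"\b(?:\w{2})+\b", text)
print(even_words)
# => ['blue', 'ah', 'neat']
```

Because the group is non-capturing, re.findall returns the full matches rather than the contents of the last group.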

【Discussion】:

【Solution 2】:

If you do not need any extra validation of the strings in the set, you can simply use

words = {"blue", "ah", "sky", "wow", "neat"}
print([w for w in words if len(w) % 2 == 0])
# => ['ah', 'blue', 'neat'] (a set is unordered, so the order may vary)

See this Python demo.

If you want to make sure the words you return consist only of letters, you can use

import re
words = {"blue", "ah", "sky", "wow", "neat"}
rx = re.compile(r'(?:[^\W\d_]{2})+')   # For any Unicode letter words
# rx = re.compile(r'(?:[a-zA-Z]{2})+') # For ASCII only letter words
print([w for w in words if rx.fullmatch(w)])
# => ['blue', 'ah', 'neat'] (a set is unordered, so the order may vary)

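See this Python demo. The (?:[^\W\d_]{2})+ pattern matches one or more occurrences of any two Unicode letters. Combined with re.fullmatch, it requires the whole string to consist of an even number of letters.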
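
To see what the letters-only character class changes compared with \w, here is a small sketch. The sample strings and the names `word_chars`, `letters_only`, and `results` are assumptions chosen for illustration; they do not appear in the original answer.

```python
import re

word_chars = re.compile(r"(?:\w{2})+")          # letters, digits and underscore
letters_only = re.compile(r"(?:[^\W\d_]{2})+")  # excludes digits and underscore

# Hypothetical test strings, all four characters long.
samples = ["neat", "ab12", "a_b_", "caf\u00e9"]

# Map each sample to (matches \w version, matches letters-only version).
results = {
    s: (bool(word_chars.fullmatch(s)), bool(letters_only.fullmatch(s)))
    for s in samples
}

for s, (any_word, letters) in results.items():
    print(s, any_word, letters)
# neat True True
# ab12 True False
# a_b_ True False
# café True True
```

All four strings have even length, so the \w version accepts every one of them, while the letters-only version rejects the two that contain digits or underscores.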

【Discussion】: