Python: How to test if a string contains one of the strings in a list, accent-insensitively? (Answers)

【Question title】: Python: How to test if a string contains one of strings in a list accent insensitive?
【Posted】: 2020-05-11 23:28:50
【Question description】:

I need to test whether a string contains one of the strings in a list, ignoring accents.

I tried combining for + in + if + unidecode, without success:

from unidecode import unidecode

def temServentiaExclusiva(nome_orgao):
    # FIXME: get ids dynamically
    regras = [
        {'especializada_id':70, 'termos': [u'orfaos e sucessoes', u'familia']}
    ]

    for r in regras:
      #if(unidecode(nome_orgao) in s for s in r['termos']):
      if([t for t in r['termos'] if(t in unidecode(nome_orgao))]):
        return r['especializada_id']


print(temServentiaExclusiva('orfãos'))
print(temServentiaExclusiva('Cartório da 6ª Vara de Orfãos e Sucessões'))

Both calls print None :(

So, how can I achieve this?

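【Question discussion】:

You may want to add code at the start of temServentiaExclusiva() that goes through nome_orgao, finds any accented characters, and replaces them with their unaccented versions before checking.
@SpencerLutz This is a "proof of concept" for something bigger.

Tags: python string unicode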

【Solution 1】:

You can do this with nested for loops instead of a list comprehension:

from unidecode import unidecode

def temServentiaExclusiva(nome_orgao):
    regras = [
        {'especializada_id':70, 'termos': [u'orfaos e sucessoes', u'familia']}
    ]

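    # Normalize once: strip accents, then lowercase.
    uni_nome_orgao = unidecode(nome_orgao).lower()

    for r in regras:
        for t in r['termos']:
            # Match in either direction: the name inside a term, or a term inside the name.
            if uni_nome_orgao in t or t in uni_nome_orgao:
                return r['especializada_id']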

print(temServentiaExclusiva('orfãos'))

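The key is to convert nome_orgao to a standard form once, and then check it against every term. As you are already doing, unidecode removes the accents. Appending .lower() makes everything lowercase, which matters because the in operator is case-sensitive and the terms are written in lowercase. Then loop over each r in regras and each t in its termos, and check whether t is in uni_nome_orgao or uni_nome_orgao is in t.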
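To see why the two steps above matter, here is a minimal sketch that uses only plain strings. It assumes the inputs look roughly like this after unidecode has removed the accents; the helper name combina is hypothetical and not part of the original answer.

```python
# Strings roughly as they look once the accents have been removed.
termo = 'orfaos e sucessoes'
curto = 'orfaos'
longo = 'Cartorio da 6a Vara de Orfaos e Sucessoes'


def combina(nome, termo):
    """Hypothetical helper: lowercase the name, then test containment both ways."""
    nome = nome.lower()
    return nome in termo or termo in nome


# The one-directional, case-sensitive check from the question fails twice:
print(termo in curto)  # False: the term is longer than the name
print(termo in longo)  # False: 'Orfaos e Sucessoes' is capitalized in the name

# Lowercasing plus the two-way check matches both inputs:
print(combina(curto, termo))  # True
print(combina(longo, termo))  # True
```

The first input only matches because of the reverse check (the name inside the term). The second only matches because of .lower().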

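Hope that helps!

【Discussion】:

Do you know why the second one doesn't work? repl.it/repls/PurpleSizzlingAddition
Do you want to know whether nome_orgao is in any of the termos, or whether any of the termos is in nome_orgao?
Could it be both?
A note on the accent-removal part: depending on what "accent insensitive" means for your task, unidecode may not be the right tool, especially if you want to strip accents from scripts other than Latin, or if it is a problem that unidecode turns "€" into "EUR" and "1°" into "1deg". This happens because unidecode transliterates strictly to ASCII. Here is a range of alternative approaches.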
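One such alternative, sketched here with the standard library only, is to decompose the text with unicodedata and drop the combining marks. This is an illustration under stated assumptions, not a drop-in replacement for unidecode: the names strip_accents, normalize and contains_any are hypothetical, and the check only looks for a term inside the text, not the reverse.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    # casefold() is a more aggressive lower(), intended for caseless matching.
    return strip_accents(text).casefold()


def contains_any(text, terms):
    """Return True if any term occurs in text, ignoring accents and case."""
    haystack = normalize(text)
    return any(normalize(term) in haystack for term in terms)


termos = ['orfaos e sucessoes', 'familia']
print(contains_any('Cartório da 6ª Vara de Orfãos e Sucessões', termos))  # True
print(strip_accents('€ 1°'))  # symbols without combining marks are left untouched
```

Unlike unidecode, this leaves characters such as "€" and "°" as they are, because they have no canonical decomposition. Letters that are not built from a base letter plus a combining mark (for example "ø" or "ß") are also left unchanged, so whether this fits depends on the input you expect.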