【Question title】: Removing acronyms using regex, based on uppercase characters following a parenthesis
【Posted】: 2021-07-26 18:53:58
【Question】:

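How can I remove the following:

  • Acronyms that begin with an opening parenthesis followed by uppercase letters or digits, e.g. '(ABC', '(ABC)', '(ABC-2A)' or '(ABC-1)'.

But NOT words in parentheses that start with an uppercase letter followed by lowercase letters, e.g. '(Bobby)' or '(Bob went to the beach..)' --> this is the part I am struggling with.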


import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z]*\)?', '', string)
  print(cleaned_acronyms)

#current output:
>> 'went to the beach' #Correct
>>'The girl -2A) is walking' #Not correct
>>'The dog obby) is being walked' #Not correct
>>'They are there' #Correct


#desired & correct output:
>> 'went to the beach'
>>'The girl is walking'
>>'The dog (Bobby) is being walked' #(Bobby) is NOT an acronym (uppercase+lowercase)
>>'They are there'

【Comments】:

    Tags: python regex string uppercase re


    【Solution 1】:

    Use the pattern \([A-Z0-9\-]+\)

    For example:

    import re
    
    text = ['ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
    ptrn = re.compile(r"\([A-Z0-9\-]+\)")
    for i in text:
        print(ptrn.sub("", i))
    

    Output:

    ABC went to the beach
    The girl  is walking
    The dog (Bobby) is being walked
    They are there
    

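    【Comments】:

    • It looks like you have the wrong sample data here. The first element starts with (ABC and has no closing parenthesis.
    • Oh... I thought that was a typo.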
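
The point raised in these comments can be checked directly. The sketch below is an illustration, not part of the original answer: it compares Solution 1's pattern with a variant whose closing parenthesis is made optional. The names `strict` and `loose` are hypothetical. The strict pattern leaves the unclosed '(ABC' untouched, while simply adding `?` also eats the '(B' of '(Bobby)'.

```python
import re

# Solution 1's pattern: the closing parenthesis is required.
strict = re.compile(r"\([A-Z0-9\-]+\)")
# Hypothetical variant: the closing parenthesis is optional.
loose = re.compile(r"\([A-Z0-9\-]+\)?")

unclosed = "(ABC went to the beach"
name = "The dog (Bobby) is being walked"

# The strict pattern finds no match in the unclosed acronym.
print(repr(strict.sub("", unclosed)))
# The loose pattern removes "(ABC" ...
print(repr(loose.sub("", unclosed)))
# ... but it also matches "(B" inside "(Bobby)".
print(repr(loose.sub("", name)))
```

So an optional closing parenthesis needs an extra guard, such as a minimum length or a lookahead, to keep capitalized words intact.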
    【Solution 2】:

    Use \([A-Z\-0-9]{2,}\)? in the following context:

    import re
    
    text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
    for string in text:
      cleaned_acronyms = re.sub(r'\([A-Z\-0-9]{2,}\)?', '', string)
      print(cleaned_acronyms)
    

    I got these results:

    ' went to the beach'
    'The girl  is walking'
    'The dog (Bobby) is being walked'
    'They are there '
    

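    【Comments】:

    • You may also want to capture the whitespace to get a better result, as in the OP. =) ++
    • I realized that, but after seeing that your answer was much better than my own, I decided it would be selfish to try to take "best answer" away from you. Good job!
    • Don't worry about things like that. Improve your answer and let the OP decide which answer suits them best. Don't sell yourself short!
    • Both of your answers are great. Thank you very much for your effort!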
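
A minimal sketch of the whitespace suggestion from the first comment, assuming it means appending `\s*` to Solution 2's pattern. The final `.strip()` is an added assumption, used to drop the space left before an acronym at the end of a string. Note that `{2,}` means a one-character acronym such as '(A)' would not be removed.

```python
import re

# Solution 2's pattern with \s* appended to consume whitespace after the acronym.
pattern = re.compile(r"\([A-Z\-0-9]{2,}\)?\s*")

text = [
    "(ABC went to the beach",
    "The girl (ABC-2A) is walking",
    "The dog (Bobby) is being walked",
    "They are there (ABC)",
]

# strip() removes the space that remains when the acronym ends the string.
cleaned = [pattern.sub("", s).strip() for s in text]
for line in cleaned:
    print(line)
```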
    【Solution 3】:

    Try a negative lookahead:

    \((?![A-Z][a-z])[A-Z\d-]+\)?\s*
    

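    See an online demo

    • \( - A literal opening parenthesis.
    • (?![A-Z][a-z]) - A negative lookahead asserting that the position is not followed by an uppercase letter and then a lowercase letter.
    • [A-Z\d-]+ - Match 1+ uppercase letters, digits, or hyphens.
    • \)? - An optional literal closing parenthesis.
    • \s* - 0+ whitespace characters.

    A sample Python script: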

    import re
    text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
    for string in text:
      cleaned_acronyms = re.sub(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*', '', string)
      print(cleaned_acronyms)
    

    Prints:

    went to the beach
    The girl is walking
    The dog (Bobby) is being walked
    They are there
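
For reuse, the same pattern could be compiled once with `re.VERBOSE`, so that the breakdown above sits next to each piece of the regex. This is a sketch under assumptions: the names `ACRONYM` and `remove_acronyms` are hypothetical, and the `.strip()` call is an addition that removes the space left over when an acronym ends the string.

```python
import re

# The pattern from this answer, written in verbose mode with inline comments.
ACRONYM = re.compile(
    r"""
    \(                # literal opening parenthesis
    (?![A-Z][a-z])    # not followed by an uppercase then a lowercase letter
    [A-Z\d-]+         # 1+ uppercase letters, digits, or hyphens
    \)?               # optional literal closing parenthesis
    \s*               # 0+ whitespace characters
    """,
    re.VERBOSE,
)


def remove_acronyms(s):
    """Remove parenthesized acronyms and trim leftover edge whitespace."""
    return ACRONYM.sub("", s).strip()


text = [
    "(ABC went to the beach",
    "The girl (ABC-2A) is walking",
    "The dog (Bobby) is being walked",
    "They are there (ABC)",
]
results = [remove_acronyms(s) for s in text]
for line in results:
    print(line)
```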
    

    【Comments】:

    • All the answers were very helpful. I accepted yours because you explained each part, which really helped me understand.