【Question title】: Search for a comma (,) in a string and, if one is present, print the word immediately after it in Python
【Posted】: 2018-11-19 07:21:45
【Question】:

I am new to regex matching, and I have a string like the following:

"karthika has symptoms cold,cough her gender is female and his age is 45"

In the first step I check the string for the keyword "symptoms" and pick the word that follows the keyword, like this:

regexp = re.compile(r"symptoms\s(\w+)")
symptoms = regexp.search(textoutput).group(1)

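This sets symptoms to "cold", but my text contains more than one symptom. So in a second step I need to check the text after "cold" for a comma (,). If a comma is present, I need to use a regex to print the value immediately after the comma, i.e. "cough".

Please help me achieve this.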

【Comments on the question】:

Tags: python regex


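【Solution 1】:

You can use a regex that finds the first word after 'symptoms' and, optionally, further matches that each start with a comma, maybe some whitespace, and then more word characters: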

import re

pattern = r"symptoms\s+(\w+)(?:,\s*(\w+))*"
regex = re.compile(pattern)

t = "kathy has symptoms cold,cough her gender is female. john's symptoms  hunger, thirst."
symptoms = regex.findall(t)

print(symptoms)

Output:

[('cold', 'cough'), ('hunger', 'thirst')]

Explanation:

r"symptoms\s+(\w+)(?:,\s*(\w+))*"
# symptoms\s+                      literal "symptoms" followed by 1+ whitespace characters
#            (\w+)                 followed by 1+ word characters (first symptom) as group 1
#                 (?:,\s*     )*   non-capturing group, repeated 0+ times: comma plus optional whitespace
#                        (\w+)     1+ word characters as group 2 (when repeated, only the last repetition is kept)
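
One caveat is worth checking before relying on group 2: in Python's `re` module, a capture group inside a repeated group reports only its last repetition. The short sketch below runs the same pattern on one, two, and three symptoms; the shortened test sentences are made up for illustration.

```python
import re

# Same pattern as in the answer above.
regex = re.compile(r"symptoms\s+(\w+)(?:,\s*(\w+))*")

# Illustrative inputs (not from the original post).
one = regex.findall("symptoms cold her gender is female")
two = regex.findall("symptoms cold,cough her gender is female")
three = regex.findall("symptoms cold,cough,fever her gender is female")

print(one)    # [('cold', '')]       group 2 never matched, so it is ''
print(two)    # [('cold', 'cough')]
print(three)  # [('cold', 'fever')]  'cough' is lost: only the last repetition is kept
```

So this pattern is enough for exactly two symptoms, as in the question, but it silently drops the middle ones in longer lists.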

Another way:

import re

pattern = r"symptoms\s+(\w+(?:,\s*\w+)*(?:\s+and\s+\w+)?)"

regex = re.compile(pattern)

t1 = "kathy has symptoms cold,cough,fever and noseitch her gender is female. "
t2 = "john's symptoms  hunger, thirst."
symptoms = regex.findall(t1+t2)

print(symptoms)

Output:

['cold,cough,fever and noseitch', 'hunger, thirst']

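This only works for lists written without a comma before the final "and", as is common in British English. The American style with a serial (Oxford) comma

"kathy has symptoms cold,cough,fever, and noseitch" 

would only match cold,cough,fever, and because the word "and" is consumed as if it were another symptom.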
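
If both styles need to be supported, one possible adjustment is sketched below. It is an assumed variation on the pattern above, not part of the original answer: a negative lookahead stops the comma loop from treating "and" as a symptom, and the comma before the final "and" becomes optional. The variable names are illustrative.

```python
import re

# Assumed variation on the answer's pattern (not from the original answer):
#   (?!and\b)  keeps the comma loop from swallowing the word "and"
#   ,?         allows an optional comma before the final "and"
oxford = re.compile(r"symptoms\s+(\w+(?:,\s*(?!and\b)\w+)*(?:,?\s+and\s+\w+)?)")

us = "kathy has symptoms cold,cough,fever, and noseitch her gender is female."
uk = "kathy has symptoms cold,cough,fever and noseitch her gender is female."

us_match = oxford.findall(us)
uk_match = oxford.findall(uk)

print(us_match)  # ['cold,cough,fever, and noseitch']
print(uk_match)  # ['cold,cough,fever and noseitch']
```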

You can split each match at ',' and " and " to get the individual symptoms:

sym = [ inner.split(",") for inner in (x.replace(" and ",",") for x in symptoms)] 
print(sym)

Output:

[['cold', 'cough', 'fever', 'noseitch'], ['hunger', ' thirst']]
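
Note that the last entry keeps its leading space (' thirst'). As a minimal sketch, assuming surrounding whitespace should be dropped, the split can be done with `re.split` instead of `replace` plus `str.split`. The `separator` pattern is an illustration, not part of the original answer.

```python
import re

# Matches as produced by the pattern above.
symptoms = ['cold,cough,fever and noseitch', 'hunger, thirst']

# Split on a comma (with optional surrounding whitespace and an optional
# following "and"), or on a standalone " and ".
separator = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

sym = [separator.split(match) for match in symptoms]
print(sym)  # [['cold', 'cough', 'fever', 'noseitch'], ['hunger', 'thirst']]
```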

【Comments】:

    【Solution 2】:

    You can use a regex capture group, for example:

    # the following pattern looks for 
    # symptoms<many spaces><many word chars><comma><many word chars>
    
    s_re = re.compile(r"symptoms\s+\w+,(\w+)")
    

    The complete code is:

    import re
    from typing import Optional
    
    s_re = re.compile(r"symptoms\s+\w+,(\w+)")
    
    def get_symptom(text: str) -> Optional[str]:
        found = s_re.search(text)
    
        if found:
            return found.group(1)
        return None
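
A short usage sketch of the function above, run on the string from the question. The second and third inputs are made up to show when the pattern finds nothing.

```python
import re
from typing import Optional

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

def get_symptom(text: str) -> Optional[str]:
    """Return the word right after the comma that follows 'symptoms <word>'."""
    found = s_re.search(text)
    return found.group(1) if found else None

text = "karthika has symptoms cold,cough her gender is female and his age is 45"

print(get_symptom(text))                      # cough
print(get_symptom("she has symptoms cold"))   # None (no comma present)
print(get_symptom("symptoms cold, cough"))    # None (space after the comma)
```

The last call returns None because the pattern does not allow whitespace after the comma. If such input should match, putting `\s*` after the comma would be one way to allow it.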
    

    【Comments】:
