【Question title】: Modification of extracting age variations using regex
【Posted】: 2019-10-03 02:42:17
【Question description】:
    import re
    s = '99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old 23 year old 29 years old but not 25-year-old and 91year old cousin is 99 now and 90-year-old or 102 year old'
    reg = r'(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old'
    r1 = re.findall(reg,s)
    r1
    ['99year old', '91year old', '90-year-old', '102 year old']

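The code above works well; it is taken from the question "extracting age variations using regex".

My goal is to extract the elements listed in r1 as well as any number of 90 or above that ends with y.o. or yo. My desired output is:

    ['99year old', '93yo', '100 yo', '97y.o.', '93 y.o.', '91year old', '90-year-old', '102 year old']

I have tried changing reg as follows, but this does not quite work: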

reg = r'(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old(?:9\d|1\d{2})y.o.|(?:9\d|1\d{2})yo' 

How can I change reg to get my desired output?

【Question discussion】:

    Tags: regex python-3.x string text


    【Solution 1】:

    My guess is that an expression along these lines,

    \b(?:9\d|1\d{2})\s*-?y(?:ears?)?\.?\s*-?o(?:ld)?\.?\b
    

    might be worth looking into.
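As a reading aid, the expression above can be spelled out piece by piece. The sketch below rewrites it with Python's `re.VERBOSE` flag so that each part carries a comment. The names `AGE_RE` and `s` are only illustrative, and the pattern is intended to behave the same as the one-line form.

```python
import re

# Sample text from the question.
s = ('99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old '
     '23 year old 29 years old but not 25-year-old and 91year old cousin '
     'is 99 now and 90-year-old or 102 year old')

# Same expression as above, split up with re.VERBOSE (whitespace and
# '#' comments inside the pattern are ignored by the regex engine).
AGE_RE = re.compile(r"""
    \b(?:9\d|1\d{2})   # a number from 90 to 199, starting at a word boundary
    \s*-?              # optional whitespace, then an optional hyphen
    y(?:ears?)?        # 'y', 'year' or 'years'
    \.?                # optional dot, as in 'y.'
    \s*-?              # optional whitespace, then an optional hyphen
    o(?:ld)?           # 'o' or 'old'
    \.?\b              # optional dot, then a word boundary
""", re.VERBOSE)

found = AGE_RE.findall(s)
print(found)
```

Because both the `ears?` and the `ld` parts are optional, the expression is fairly permissive: it would also accept a form such as `95y old`, which may or may not be wanted.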

    Test

    import re
    
    regex = r'\b(?:9\d|1\d{2})\s*-?y(?:ears?)?\.?\s*-?o(?:ld)?\.?\b'
    string = '''
    99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old 23 year old 29 years old but not 25-year-old and 91year old cousin is 99 now and 90-year-old or 102 year old
    '''
    
    print(re.findall(regex, string))
    

    Output

    ['99year old', '93yo', '100 yo', '97y.o', '93 y.o', '91year old', '90-year-old', '102 year old']


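    If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you like, you can also watch in this link how it matches against some sample inputs.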
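One detail worth noting: the question's desired output lists `'97y.o.'` and `'93 y.o.'` with the final dot, whereas an expression ending in `\.?\b` cannot keep a dot that is followed by a space, because there is no word boundary between `.` and a space. The following is a minimal sketch of one possible variant, not the expression from the answer: it replaces the closing `\b` with the negative lookahead `(?!\w)` and separates the "year old" forms from the "y.o." forms. The name `dotted` is hypothetical.

```python
import re

# Sample text from the question.
s = ('99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old '
     '23 year old 29 years old but not 25-year-old and 91year old cousin '
     'is 99 now and 90-year-old or 102 year old')

# Hypothetical variant (an assumption, not the answer's original pattern).
dotted = re.compile(
    r'\b(?:9\d|1\d{2})'           # a number from 90 to 199
    r'(?:[\s-]?years?[\s-]?old'   # 'year old', 'years old', 'year-old', ...
    r'|\s?y\.?o\.?)'              # 'yo', 'y.o', 'y.o.' with an optional space
    r'(?!\w)'                     # the match must not run into a word character
)

matches = dotted.findall(s)
print(matches)
# ['99year old', '93yo', '100 yo', '97y.o.', '93 y.o.', '91year old', '90-year-old', '102 year old']
```

Splitting the two spellings into separate alternatives also makes the pattern stricter than the one above, so mixed forms such as `95y old` are no longer accepted; whether that is preferable depends on the data.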


    【Discussion】:
