【Question title】:How to find numbers following specific punctuation
【Posted】:2021-11-25 05:06:58
【Question description】:

Text:

3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD 

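Background: the input is delimited by |, and the goal is to find the first number plus every number that comes right after a |.

Preferred result = [3, 5, 10, 47]

Note: 234 and 7-38-030 must be excluded.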


    Tags: python pandas string


    【Answer 1】:

    Another option (I think re is the better choice, but anyway):

    data = '3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD'


    def _get_int(val: str):
        # Accept only tokens of the form '<integer>.', e.g. '3.' or '10.'
        if not val.endswith('.'):
            return None
        try:
            return int(val[:-1])
        except ValueError:
            # Ends with a dot but is not a number, e.g. 'ISSUED.'
            return None


    numbers = []
    # split() breaks on whitespace, so '234:' and '7-38-030(C),' are single tokens that get rejected
    for x in data.split():
        z = _get_int(x)
        if z is not None:
            numbers.append(z)
    print(numbers)
    

    Output:

    [3, 5, 10, 47]
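A variant of the same idea that relies on the | structure described in the question instead of inspecting every token: split on |, then take only a number at the very start of each segment. This is a sketch assuming every segment begins with an item number followed by a dot; leading_numbers is a hypothetical helper name.

```python
import re


def leading_numbers(text: str) -> list[int]:
    """Return the number at the start of each '|'-separated segment (hypothetical helper)."""
    numbers = []
    for segment in text.split('|'):
        # Only the start of the segment is examined, so 234 or 7-38-030 further in are ignored
        m = re.match(r'\s*(\d+)\.', segment)
        if m:
            numbers.append(int(m.group(1)))
    return numbers


data = ('3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | '
        '5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | '
        '10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD')
print(leading_numbers(data))  # [3, 5, 10, 47]
```

Unlike the token scan, a stray token such as 12. in the middle of a segment would not be picked up here.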
    


      【Answer 2】:

      My approach misses the 3 and wrongly includes 234, 7, -38 and -030 in the result.

      Result: [' 234', '. | 5', ': | 10', ' 7', '-38', '-030', '. | 47']

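      import re

      text = '3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD'

      def multi_re_find(patterns, phrase):
          '''
          Takes in a list of regex patterns
          Prints a list of all matches
          '''
          for pattern in patterns:
              print('Searching the phrase using the re check: %r' % pattern)
              print(re.findall(pattern, phrase))
              print('\n')

      # One or more non-word characters followed by digits
      test_patterns = [r'\W+\d+']
      multi_re_find(test_patterns, text)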
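One possible fix for the attempt above, keeping the findall style: anchor the match to the start of the string or to a |, and wrap the digits in a capture group so that only the number is returned. A sketch, assuming the wanted numbers are always followed by a dot:

```python
import re

text = ('3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | '
        '5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | '
        '10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD')

# (?:^|\|) : start of the string or a literal '|'
# \s*      : any spaces after the '|'
# (\d+)\.  : the number itself (captured), followed by a dot
pattern = r'(?:^|\|)\s*(\d+)\.'

# With exactly one capture group, re.findall returns only the captured digits
print(re.findall(pattern, text))  # ['3', '5', '10', '47']
```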
      


        【Answer 3】:

        You can use a regular expression with a lookahead:

        import re

        s = '3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD'

        # \d+ matches the digits; (?=\.) requires a '.' right after them without consuming it
        re.findall(r'\d+(?=\.)', s)
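A small check of what the lookahead does, on fragments of the text: (?=\.) requires a dot right after the digits but leaves it out of the match, and a number followed by something else, such as 234:, is not matched at all.

```python
import re

# The lookahead checks for the '.' but does not consume it
m = re.search(r'\d+(?=\.)', '10. ADEQUATE HANDWASHING SINKS')
print(m.group(), m.span())  # 10 (0, 2)  -> the dot at index 2 is not part of the match

# '234' is followed by ':' rather than '.', so the lookahead rejects it
print(re.search(r'\d+(?=\.)', 'Comments 234:'))  # None
```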
        

        Alternatively, to make sure the number is at the start of the string or right after a "| ", you can also add a lookbehind:

        re.findall(r'(?:(?<=^)|(?<=\| ))\d+(?=\.)', s)
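The lookbehind only makes a difference when a number followed by a dot can also appear inside a segment. A sketch on a made-up input (not from the question) where 12. sits in the middle of a segment:

```python
import re

# Hypothetical input: '12.' appears mid-segment and should not be reported
sample = '1. FIRST ITEM | 2. SECOND ITEM, SEE NOTE 12. | 3. THIRD'

lookahead_only = re.findall(r'\d+(?=\.)', sample)
with_lookbehind = re.findall(r'(?:(?<=^)|(?<=\| ))\d+(?=\.)', sample)

print(lookahead_only)   # ['1', '2', '12', '3']
print(with_lookbehind)  # ['1', '2', '3']
```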
        

        Output:

        ['3', '5', '10', '47']
        

        And to get a list of integers:

        list(map(int, re.findall(r'\d+(?=\.)', s)))
        

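        【Comments】:

        • Thank you so much, it works! Could you explain a bit more: which part of the pattern stands for |, and how should (?=\.) be read? Many thanks!!!
        • I just added an option for | (see my edit). (?=\.) checks that there is a dot after the number, but does not include the dot in the match.
        • Why not (\d+)\.?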
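On the last comment: a capture group works too, because when a pattern contains exactly one group, re.findall returns only that group's text. Without a group, the dot becomes part of each result. A quick comparison on the question's text:

```python
import re

s = ('3. MANAGEMENT, FOOD EMPLOYEE  Comments 234:  FOUND NO EMPLOYEE  ISSUED. | '
     '5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments:   | '
     '10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED.  | 47. FOOD & NON-FOOD')

print(re.findall(r'\d+(?=\.)', s))  # ['3', '5', '10', '47']    lookahead: dot checked, not matched
print(re.findall(r'(\d+)\.', s))    # ['3', '5', '10', '47']    one group: findall returns the group
print(re.findall(r'\d+\.', s))      # ['3.', '5.', '10.', '47.']  no group: the dot is included
```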