【Question Title】: Split this string using regular expression - python
【Posted】: 2012-09-13 09:10:48
【Question】:
Input string
---------------
South Africa 109/0 
Australia 100
Sri Lanka 111
Sri Lanka 331/4

Expected Output
---------------
['South Africa', '109', '0']
['Australia', '100']
['Sri Lanka', '111']
['Sri Lanka', '331', '4']

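I tried several regular expressions but couldn't come up with the right one. Splitting on spaces doesn't help here, because a country name may or may not contain a space (South Africa, India). Thanks in advance.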

【Question Discussion】:

    Tags: python regex python-2.7


    【Solution 1】:

    We can use the regular expression:

    r'(\D+)\s(\d+)(?:/(\d+))?'
    

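    ("One or more non-digits, followed by a whitespace character, followed by one or more digits, then optionally a slash followed by one or more digits.")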

    This returns, for example:

    >>> import re
    >>> [re.match(r'(\D+)\s(\d+)(?:/(\d+))?', x).groups() 
    ...  for x in ['South Africa 109/0', 
    ...            'Australia 100',
    ...            'Sri Lanka 111',
    ...            'Sri Lanka 331/4']]
    [('South Africa', '109', '0'), 
     ('Australia', '100', None), 
     ('Sri Lanka', '111', None), 
     ('Sri Lanka', '331', '4')]
    

    Note the None values; you may need to filter them out manually.
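For the manual filtering mentioned above, here is a minimal sketch, assuming one record per line. `parse_line` is a hypothetical helper name, not part of any library:

```python
import re

# Same pattern as above: non-digits, one whitespace char, digits, optional "/digits".
PATTERN = re.compile(r'(\D+)\s(\d+)(?:/(\d+))?')

def parse_line(line):
    """Hypothetical helper: split one line and drop the unmatched optional group."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None  # the line does not have the "name number[/number]" shape
    return [g for g in m.groups() if g is not None]

lines = ['South Africa 109/0 ', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
results = [parse_line(line) for line in lines]
print(results)
```

Calling strip() first keeps stray leading whitespace out of the country name, since \D+ would otherwise swallow it.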

    【Discussion】:

    • Shouldn't you use [\w\s] instead of \D, so that it fails on 'Au$tralia'?
    • @PierreGM: What if the OP wants Bishop's Stortford or Xi'an to succeed? Maybe Áŭ$t®å£ià really is considered valid.
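To make the trade-off in this exchange concrete, here is a small comparison of the two character classes on a few sample names (the inputs are illustrative, not taken from the question):

```python
import re

# \D accepts any non-digit, so apostrophes and symbols are allowed in the name.
LOOSE = re.compile(r"(\D+)\s(\d+)(?:/(\d+))?")
# [\w\s] accepts only word characters and whitespace, so ' or $ stops the name early.
STRICT = re.compile(r"([\w\s]+)\s(\d+)(?:/(\d+))?")

samples = ["Australia 100", "Au$tralia 100", "Xi'an 100", "Bishop's Stortford 100"]
for s in samples:
    print("%-25s loose=%-5s strict=%s" % (s, bool(LOOSE.match(s)), bool(STRICT.match(s))))
```

Only "Australia 100" matches both; the other three match the \D version and fail the [\w\s] one, so the choice depends on which names should be accepted.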
    【Solution 2】:

    Try:

    import re
    re.split(r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/", "South Africa 109/0")
    # ['South Africa', '109', '0']
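Because re.split works on one string at a time, a sketch that applies the same idea to every input line might look like this. The middle alternative is written as the lookbehind (?<=\d), since a lookahead (?=\d) immediately followed by \s+ can never match:

```python
import re

# Split between a letter and a digit, between a digit and a letter, or on "/".
SPLITTER = re.compile(r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/")

text = """South Africa 109/0
Australia 100
Sri Lanka 111
Sri Lanka 331/4"""

# Apply the split line by line so each record stays a separate list.
rows = [SPLITTER.split(line.strip()) for line in text.splitlines()]
for row in rows:
    print(row)
```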
    

    【Discussion】:

      【Solution 3】:
      import re
      re.compile(r"^([\w\s]+)\s(\d+)\/?(\d+)?")
      

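      gives you three groups. We can break it down:

      • At the start of the line (^), a group of letters and spaces ([\w\s]+; strictly, \w also matches digits and the underscore)
      • One whitespace character (\s)
      • A group of at least one digit (\d+)
      • An optional / (\/?)
      • An optional group of digits ((\d+)?, which is None when absent)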
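A minimal sketch of applying this compiled pattern to each line of the question's input and dropping the None group:

```python
import re

# Solution 3's pattern, as a raw string so the backslashes reach the regex engine.
PATTERN = re.compile(r"^([\w\s]+)\s(\d+)/?(\d+)?")

lines = ["South Africa 109/0", "Australia 100", "Sri Lanka 111", "Sri Lanka 331/4"]
results = []
for line in lines:
    m = PATTERN.match(line)
    # The last group is None when there is no second number, so filter it out.
    results.append([g for g in m.groups() if g is not None])
print(results)
```

Because \w also matches digits, a name containing a number (for example a hypothetical "Team 2 100") is accepted, with "Team 2" as the first group.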

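      【Discussion】:

      • This outputs Australia 100 and Sri Lanka 111 in the first group.
      • No, it gives you an empty group at the end, just like @KennyTM's version.
      【Solution 4】:

      Here's the regex you need: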

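      import re
      inputText = "South Africa 109/0\nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"
      for match in re.finditer(r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$", inputText):
          country = match.group("Country")
          number1 = match.group("Number1")
          number2 = match.group("Number2")  # '' (not None) when there is no "/" part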
      

      You can see the results here.

      Here is an explanation of the pattern:

      # ^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$
      # 
      # Options: ^ and $ match at line breaks
      # 
      # Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
      # Match the regular expression below and capture its match into backreference with name “Country” «(?P<Country>.*?)»
      #    Match any single character that is not a line break character «.*?»
      #       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
      #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      # Match the regular expression below and capture its match into backreference with name “Number1” «(?P<Number1>\d+)»
      #    Match a single digit 0..9 «\d+»
      #       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
      #    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      # Match the character “/” literally «/?»
      #    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
      #    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      # Match the regular expression below and capture its match into backreference with name “Number2” «(?P<Number2>\d*?)»
      #    Match a single digit 0..9 «\d*?»
      #       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
      #    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      # Assert position at the end of a line (at the end of the string or before a line break character) «$»
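A sketch of collecting the named groups into the list shape the question asks for, assuming the whole input arrives as one multi-line string. Number2 matches as an empty string rather than None, so it is appended only when non-empty:

```python
import re

PATTERN = re.compile(r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$")

inputText = """South Africa 109/0
Australia 100
Sri Lanka 111
Sri Lanka 331/4"""

records = []
for match in PATTERN.finditer(inputText):
    row = [match.group("Country"), match.group("Number1")]
    # Number2 is '' (not None) when the line has no "/" part; skip it in that case.
    if match.group("Number2"):
        row.append(match.group("Number2"))
    records.append(row)
print(records)
```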
      

      【Discussion】:

      【Solution 5】:

      You've already got regex answers, but I'd suggest also considering the built-in str methods (for this use case, at least):

      s = 'South Africa 109/0'
      country, numbers = s.rsplit(' ', 1)
      # ('South Africa', '109/0')
      new_list = [country] + numbers.split('/')
      # ['South Africa', '109', '0'] 
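The same str-method approach can be wrapped up for all four lines. This sketch uses rsplit(None, 1) instead of rsplit(' ', 1): the first input line in the question appears to carry a trailing space, and 'South Africa 109/0 '.rsplit(' ', 1) would give ['South Africa 109/0', '']. split_record is a hypothetical helper name:

```python
def split_record(s):
    """Hypothetical helper: split 'Country N' or 'Country N/M' with str methods only."""
    # rsplit(None, 1) splits once on the last whitespace run and ignores trailing whitespace;
    # unpacking raises ValueError if the line contains no whitespace at all.
    country, numbers = s.rsplit(None, 1)
    return [country] + numbers.split('/')

lines = ['South Africa 109/0 ', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
results = [split_record(line) for line in lines]
print(results)
```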
      

      【Discussion】:
