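【Question Title】: Regex to split by comma, but ignore commas proceeding words near a colon
【Posted】: 2022-01-17 00:49:36
【Question】:

I'm trying to use Python to split a string by commas, but allow users to include commas within certain key-value pairs. Here are two example strings I'm working with: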

title.search:The relation between visualization size, grouping, and user performance,publication_year:2020

author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020

I would like to turn them into:

["title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]

["author.id:c33432", "title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]

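What helps is that the part before the colon (the key) is always written in one of three formats, for example:

  1. type
  2. author.id
  3. author.institutions.country_code

So a key is either a single word, two words separated by a period, or three words separated by periods.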
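The three key formats listed above can be written as a single regular expression. A minimal sketch, assuming a "word" means a run of `\w` characters (letters, digits, underscore); the helper name `is_valid_key` is hypothetical:

```python
import re

# One word, optionally followed by up to two ".word" segments.
# Assumption: a "word" is a run of \w characters (letters, digits, underscore).
KEY_PATTERN = re.compile(r"\w+(?:\.\w+){0,2}")


def is_valid_key(key: str) -> bool:
    """Return True if `key` has one of the three stated key formats."""
    return KEY_PATTERN.fullmatch(key) is not None


for key in ["type", "author.id", "author.institutions.country_code", "a.b.c.d"]:
    print(key, is_valid_key(key))
```

The `{0,2}` quantifier caps the key at three dot-separated parts, so a four-part name such as `a.b.c.d` is rejected.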

Any ideas on whether this is possible?

【Question Comments】:

Tags: python regex


【Solution 1】:

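As far as I understand, you are trying to split on the commas that join two words directly (with no space after the comma); in that case, the regex is \w,\w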
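A sketch of how `\w,\w` behaves with `re.split` on the first example string. Because the pattern consumes one character on each side of the comma, those letters are lost from the result; moving them into a lookbehind and lookahead, `(?<=\w),(?=\w)`, checks them without consuming them. Either way, this approach assumes that commas inside a value are always followed by a space, which the question does not guarantee.

```python
import re

s = ("title.search:The relation between visualization size, grouping, "
     "and user performance,publication_year:2020")

# \w,\w consumes one word character on each side of the comma,
# so "performance" loses its final "e" and "publication_year" its "p".
print(re.split(r"\w,\w", s))

# Lookbehind/lookahead test the neighbouring characters without consuming them,
# so only the comma itself is removed.
print(re.split(r"(?<=\w),(?=\w)", s))
```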

【Discussion】:

    【Solution 2】:

    Please try the following:

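    #!/usr/bin/python
    
    import re
    
    s = ['title.search:The relation between visualization size, grouping, and user performance,publication_year:2020',
         'author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020']
    
    # Split only on commas that are followed by optional whitespace,
    # a key made of word characters and/or dots, and a colon.
    # The lookahead (?=...) checks this without consuming the key.
    for text in s:  # "text" avoids shadowing the built-in str
        parts = re.split(r',(?=\s*[\w.]+:)', text)
        print(parts)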
    

    Output:

    ['title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']
    ['author.id:c33432', 'title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']
    

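    The regex ,(?=\s*[\w.]+:) matches a comma only when it is followed by (checked with a lookahead, so these characters are not consumed):

    • zero or more whitespace characters
    • one or more word characters and/or dots
    • a colon

    in that order.
    The string is then split at every comma that satisfies these conditions. Commas inside the title, such as the one in "size, grouping", are followed by a word and then a comma or a space rather than a colon, so they are left intact.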
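Since the question states that keys are always one to three words joined by periods, the lookahead can be tightened to exactly that shape. A minimal sketch under that assumption; it also drops the `\s*`, assuming a separating comma is never followed by a space (true in both examples). The function name `split_pairs` is hypothetical:

```python
import re

# Split on a comma only if it is directly followed by a key of the form
# word, word.word or word.word.word, and then a colon. The lookahead
# means the key itself stays in the next piece.
SEPARATOR = re.compile(r",(?=\w+(?:\.\w+){0,2}:)")


def split_pairs(text: str) -> list[str]:
    """Split a filter string into its key:value parts."""
    return SEPARATOR.split(text)


examples = [
    "title.search:The relation between visualization size, grouping, and user performance,publication_year:2020",
    "author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020",
]
for text in examples:
    print(split_pairs(text))
```

One difference from `[\w.]+`: a title containing something like `, e.g.: ` would be split by the original pattern (` e.g.` matches `\s*[\w.]+`), but not by this one. Neither pattern can distinguish a real key from title text that happens to look like `word:` directly after a comma.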

    【Discussion】:
