【Question title】: Splitting a dot-delimited string into words, but with a special case
【Posted】: 2013-05-25 12:50:18
【Question】:

I'm not sure whether there is a simple way to split the following string:

'school.department.classes[cost=15.00].name'

into this:

['school', 'department', 'classes[cost=15.00]', 'name']

Note: I want to keep 'classes[cost=15.00]' intact.

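【Comments on the question】:

  • This is an example, but it isn't very specific. What is the condition for not splitting? Is it always inside [], or can it also be inside {}, (), "" and so on? Or are the rules more complicated?
  • Is [cost=15.00] really a qualifier for classes, or for school.department.classes?
  • @PaulMcGuire I made this up, but it is a qualifier for classes. I've been rolling my own JSON path for simple queries. The Python ones I found online don't filter by attribute (e.g. python-jsonpath-rw).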

Tags: python regex parsing split


【Solution 1】:
>>> import re
>>> text = 'school.department.classes[cost=15.00].name'
>>> re.split(r'\.(?!\d)', text)
['school', 'department', 'classes[cost=15.00]', 'name']

A more specific version:

>>> re.findall(r'([^.\[]+(?:\[[^\]]+\])?)(?:\.|$)', text)
['school', 'department', 'classes[cost=15.00]', 'name']

The same pattern in verbose form:

>>> re.findall(r'''(                      # main group
                    [^  .  \[    ]+       # 1 or more of anything except . or [
                    (?:                   # (non-capture) optional [x=y,...]
                       \[                 # start [
                       [^   \]   ]+       # 1 or more of any non ]
                       \]                 # end ]
                    )?                    # this group [x=y,...] is optional
                   )                      # end main group
                   (?:\.|$)               # find a dot or the end of string
                ''', text, flags=re.VERBOSE)
['school', 'department', 'classes[cost=15.00]', 'name']
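
To make the difference between the two patterns concrete, here is a small sketch that runs both on a few inputs. The function names and the extra sample strings are made up for illustration and are not part of the original answer. The first pattern only refuses to split on a dot that is followed by a digit, while the second one treats a whole [...] block as part of its segment.

```python
import re

# The two patterns from the answer above.
SPLIT_NOT_BEFORE_DIGIT = r'\.(?!\d)'
SEGMENT_WITH_OPTIONAL_FILTER = r'([^.\[]+(?:\[[^\]]+\])?)(?:\.|$)'


def split_simple(path):
    """Split on every dot that is not immediately followed by a digit."""
    return re.split(SPLIT_NOT_BEFORE_DIGIT, path)


def split_specific(path):
    """Collect segments, each optionally followed by one [...] block."""
    return re.findall(SEGMENT_WITH_OPTIONAL_FILTER, path)


# Hypothetical inputs chosen to show where the two patterns differ.
samples = [
    'school.department.classes[cost=15.00].name',
    'classes[cost=foo.bar].name',   # dot inside brackets, no digit after it
    'school.2nd_floor.name',        # segment that starts with a digit
]

for sample in samples:
    print(sample)
    print('  simple:  ', split_simple(sample))
    print('  specific:', split_specific(sample))

# split_simple('classes[cost=foo.bar].name')
#   -> ['classes[cost=foo', 'bar]', 'name']
# split_specific('classes[cost=foo.bar].name')
#   -> ['classes[cost=foo.bar]', 'name']
# split_simple('school.2nd_floor.name')
#   -> ['school.2nd_floor', 'name']
```

The second pattern still assumes at most one, non-nested [...] block per segment, so it is a narrower fix rather than a general parser.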

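【Comments】:

  • This may fail if the string contains something like classes[cost=foo.bar].
  • I agree with Ashwini. The solution seems too specific.
  • I think the answer is as good as the question. What are you going to use to suppress the split? Brackets? The '=' sign?
  • @AshwiniChaudhary Yes, I have made a separate one that supports [...]
  • @jamylak Thanks for the detailed answer.
【Solution 2】:

Skip dots that are inside brackets: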

import re
s='school.department.classes[cost=15.00].name'
print(re.split(r'[.](?![^][]*\])', s))

Output:

['school', 'department', 'classes[cost=15.00]', 'name']
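
How the lookahead works: a dot is rejected as a split point when it is followed by any run of non-bracket characters and then a closing ]. The sketch below wraps the same pattern in a helper (the name split_outside_brackets and the extra inputs are made up for illustration) and shows one case it handles that a digit-based check would not, plus one nested-bracket case where it breaks down.

```python
import re

# Same pattern as above: split on '.', unless what follows is a run of
# non-bracket characters and then a closing ']'.
DOT_OUTSIDE_BRACKETS = re.compile(r'[.](?![^][]*\])')


def split_outside_brackets(path):
    """Split a dotted path, leaving dots inside a flat [...] block alone."""
    return DOT_OUTSIDE_BRACKETS.split(path)


print(split_outside_brackets('school.department.classes[cost=15.00].name'))
# -> ['school', 'department', 'classes[cost=15.00]', 'name']

print(split_outside_brackets('classes[cost=foo.bar].name'))
# -> ['classes[cost=foo.bar]', 'name']

# Limitation: with nested brackets the lookahead runs into '[' before it
# reaches ']', so a dot inside the outer brackets is treated as a split point.
print(split_outside_brackets('a[b.c[d]].e'))
# -> ['a[b', 'c[d]]', 'e']
```

For flat, non-nested filters this is enough; once brackets can nest, a real parser is the safer choice.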

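【Comments】:

  • I originally had this, but what if your variable name is department2
【Solution 3】:

This can get messy fast; you may need to actually parse this string rather than just split it: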

from pyparsing import (Forward,Suppress,Word,alphas,quotedString,
                        alphanums,Regex,oneOf,Group,delimitedList)


# define some basic punctuation, numerics, operators
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word(alphas+'_',alphanums+'_')
real = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
compOper = oneOf('= != < > <= >=')

# a full reference may be composed of full references, i.e., a recursive
# grammar - forward declare a full reference
fullRef = Forward()

# a value in a filtering expression could be a full ref or numeric literal
value = fullRef | real | integer | quotedString
filterExpr = Group(value + compOper + value)

# a single dotted ref could be one with a bracketed filter expression
# (which we would want to keep together in a group) or just a plain identifier
ref = Group(ident + LBRACK + filterExpr + RBRACK) | ident

# now insert the definition of a fullRef, using '<<' instead of '='
fullRef << delimitedList(ref, '.')

# try it out
s = 'school.department.classes[cost=15.00].name'
print(fullRef.parseString(s))
s = 'school[size > 10000].department[school.type="TECHNICAL"].classes[cost=15.00].name'
print(fullRef.parseString(s))

Prints:

['school', 'department', ['classes', ['cost', '=', 15.0]], 'name']
[['school', ['size', '>', 10000]], ['department', ['school', 'type', '=', '"TECHNICAL"']], ['classes', ['cost', '=', 15.0]], 'name']

(If needed, it would not be hard to rejoin 'classes[cost=15.00]' into a single string.)
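
One possible way to do that rejoining is sketched below. It works on plain nested lists shaped like the printed output above, so it assumes the pyparsing result has first been converted to a list (for example with asList()). The helper names rebuild, join_ref and regroup are hypothetical. Because the grammar converts numbers with float() and int(), the original formatting is not recoverable: 15.00 comes back as 15.0, and whitespace around operators is dropped.

```python
# Comparison operators, copied from the compOper definition in the grammar.
OPERATORS = {'=', '!=', '<', '>', '<=', '>='}


def rebuild(token):
    """Turn one parsed item back into text (hypothetical helper)."""
    if not isinstance(token, list):
        return str(token)                      # plain identifier or literal
    name, filter_expr = token                  # [ident, [lhs..., op, rhs...]]
    # Locate the comparison operator inside the flat filter expression.
    op_index = next(i for i, part in enumerate(filter_expr)
                    if isinstance(part, str) and part in OPERATORS)
    lhs = join_ref(filter_expr[:op_index])
    rhs = join_ref(filter_expr[op_index + 1:])
    return '%s[%s%s%s]' % (name, lhs, filter_expr[op_index], rhs)


def join_ref(tokens):
    """Join the parts of a dotted reference back together with '.'."""
    return '.'.join(rebuild(tok) for tok in tokens)


def regroup(parsed):
    """Return one string per top-level dotted segment."""
    return [rebuild(tok) for tok in parsed]


# Nested lists shaped like the two printed results above.
first = ['school', 'department', ['classes', ['cost', '=', 15.0]], 'name']
second = [['school', ['size', '>', 10000]],
          ['department', ['school', 'type', '=', '"TECHNICAL"']],
          ['classes', ['cost', '=', 15.0]],
          'name']

print(regroup(first))
# -> ['school', 'department', 'classes[cost=15.0]', 'name']
print(regroup(second))
# -> ['school[size>10000]', 'department[school.type="TECHNICAL"]',
#     'classes[cost=15.0]', 'name']
```

If the exact original text of each segment matters, keeping the raw substring during parsing is likely a better route than rebuilding it afterwards.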

【Comments】:

  • Very cool. If I need a more powerful parser, I might consider this.