【Question title】: Pyparsing finding more matches than expected
【Posted】: 2017-11-16 16:40:10
【Question】:

I'm writing code to parse lines of basic computer instructions. My input string looks like this: ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)

I'm expecting a result like this:

<line>
  <instruction>
    <type>ADD</type>
    <args>
      <ITEM>input1</ITEM>
      <ITEM>input2</ITEM>
    </args>
  </instruction>
  <instruction>
    <type>DEL</type>
    <args>
      <ITEM>input3</ITEM>
    </args>
  </instruction>
</line>
<line>
  <instruction>
    <type>SUB</type>
    <args>
      <ITEM>input1</ITEM>
      <ITEM>input2</ITEM>
    </args>
  </instruction>
  <instruction>
    <type>INS</type>
    <args>
      <ITEM>input3</ITEM>
    </args>
  </instruction>
</line>

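My actual result has the general structure I'm looking for, but the line and instruction parsers seem to match in the wrong places, or the tags end up in the wrong places.

Actual result: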

<line>
  <line>
    <instruction>
      <type>ADD</type>
      <args>
        <ITEM>input1</ITEM>
        <ITEM>input2</ITEM>
      </args>
    </instruction>
    <instruction>
      <type>DEL</type>
      <args>
        <ITEM>input3</ITEM>
      </args>
    </instruction>
  </line>
  <instruction>
    <instruction>
      <type>SUB</type>
      <args>
        <ITEM>input1</ITEM>
        <ITEM>input2</ITEM>
      </args>
    </instruction>
    <instruction>
      <type>INS</type>
      <args>
        <ITEM>input3</ITEM>
      </args>
    </instruction>
  </instruction>
</line>

Result dump:

[[['OTE', ['output1']]], [['XIO', ['input2']], ['OTE', ['output2']]]]
- branch: [[['OTE', ['output1']]], [['XIO', ['input2']], ['OTE', ['output2']]]]
  [0]:
    [['OTE', ['output1']]]
    - instruction: ['OTE', ['output1']]
      - args: ['output1']
      - type: 'OTE'
  [1]:
    [['XIO', ['input2']], ['OTE', ['output2']]]
    - instruction: ['OTE', ['output2']]
      - args: ['output2']
      - type: 'OTE'

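For some reason, line matches the whole structure, and the second line of instructions is matched as a single instruction group. I tried calling .setDebug() on instruction, but I'm not sure how to interpret the output. I don't see why the last line should be matched as an instruction, since it doesn't follow the Word(Word) pattern.

My code: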

#!python3
from pyparsing import nestedExpr,alphas,Word,Literal,OneOrMore,alphanums,delimitedList,Group,Forward

theInput = r"ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)"

instructionType = Word(alphanums+"_")("type")
argument = Word(alphanums+"_[].")
arguments = Group(delimitedList(argument))("args")
instruction = Group(instructionType + Literal("(").suppress() + arguments + Literal(")").suppress())("instruction")

line = (delimitedList(Group(OneOrMore(instruction))))("line")

parsedInput = line.parseString(theInput).asXML()
print(parsedInput)

Debug output:

Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 0(1,1)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['ADD', ['input1', 'input2']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 18(1,19)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['DEL', ['input3']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 30(1,31)
Exception raised:Expected W:(ABCD...) (at char 30), (line:1, col:31)
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 32(1,33)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['SUB', ['input1', 'input2']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 50(1,51)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['INS', ['input3']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 62(1,63)
Exception raised:Expected W:(ABCD...) (at char 62), (line:1, col:63)

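What am I doing wrong?

【Comments】:

  • Don't use asXML to print the results; use dump instead.
  • @PaulMcG Thanks for the help! I've added the dump to the original question.
  • :) You wrote print(line.parseString(theInput).dump); it has to be print(line.parseString(theInput).dump())
  • @PaulMcG Heh, oops. The question has been updated.

Tags: python-3.x pyparsing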


【Solution 1】:

The dump output for the code you posted looks like this:

ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)

[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
  [0]:
    [['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
    - instruction: ['DEL', ['input3']]
      - args: ['input3']
      - type: 'DEL'
  [1]:
    [['SUB', ['input1', 'input2']], ['INS', ['input3']]]
    - instruction: ['INS', ['input3']]
      - args: ['input3']
      - type: 'INS'

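We can see in the dump() output that every instruction was parsed, but only the last instruction in each group appears under the "instruction" name. This happens because, as with a Python dict, when multiple values (as you might get inside ZeroOrMore or OneOrMore) are assigned to the same key, only the last value is kept.

There are two solutions. One is to remove the ("instruction") results name, so that you get the parsed instructions by position in each sublist: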
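The last-value-wins behavior can be reproduced in a few lines. This is a minimal sketch built from the grammar in the question; the snake_case variable names and the shortened input string are illustrative choices, not part of the original code.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

# Same grammar as in the question, with snake_case names (illustrative only).
instruction_type = Word(alphanums + "_")("type")
argument = Word(alphanums + "_[].")
arguments = Group(delimitedList(argument))("args")
instruction = Group(
    instruction_type + Suppress("(") + arguments + Suppress(")")
)("instruction")

# Two instructions in a row, both stored under the same results name.
result = OneOrMore(instruction).parseString("ADD(input1,input2) DEL(input3)")

all_tokens = result.asList()            # every parsed instruction, by position
named = result["instruction"].asList()  # only the most recent match under the name

print(all_tokens)
print(named)
```

Both instructions are present in the token list, while the "instruction" name only refers to DEL, the last one matched.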

[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
  [0]:
    [['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
    [0]:
      ['ADD', ['input1', 'input2']]
      - args: ['input1', 'input2']
      - type: 'ADD'
    [1]:
      ['DEL', ['input3']]
      - args: ['input3']
      - type: 'DEL'
  [1]:
    [['SUB', ['input1', 'input2']], ['INS', ['input3']]]
    [0]:
      ['SUB', ['input1', 'input2']]
      - args: ['input1', 'input2']
      - type: 'SUB'
    [1]:
      ['INS', ['input3']]
      - args: ['input3']
      - type: 'INS'
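A sketch of the first option, assuming the rest of the grammar stays as in the question: without the "instruction" results name, each instruction is reached by iterating over its group, and the "type" and "args" names inside each instruction still work. The variable names and the `summary` structure are illustrative only.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

the_input = "ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)"

instruction_type = Word(alphanums + "_")("type")
argument = Word(alphanums + "_[].")
arguments = Group(delimitedList(argument))("args")

# No results name on the instruction group this time.
instruction = Group(instruction_type + Suppress("(") + arguments + Suppress(")"))

line = delimitedList(Group(OneOrMore(instruction)))("line")
parsed = line.parseString(the_input)

# Walk the nested structure positionally: groups, then instructions in each group.
summary = [
    [(ins["type"], ins["args"].asList()) for ins in group]
    for group in parsed
]
print(summary)
```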

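The other is to keep every match under the name. In pyparsing, there are times when multiple values should be saved for a given name. The setResultsName() method has an optional argument, listAllMatches, that enables this behavior. When using the callable shortcut for setResultsName, you can't pass listAllMatches=True; instead, end the results name with '*':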

instruction = Group(instructionType
                    + Literal("(").suppress()
                    + arguments
                    + Literal(")").suppress())("instruction*")

This gives the following output:

[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
  [0]:
    [['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
    - instruction: [['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
      [0]:
        ['ADD', ['input1', 'input2']]
        - args: ['input1', 'input2']
        - type: 'ADD'
      [1]:
        ['DEL', ['input3']]
        - args: ['input3']
        - type: 'DEL'
  [1]:
    [['SUB', ['input1', 'input2']], ['INS', ['input3']]]
    - instruction: [['SUB', ['input1', 'input2']], ['INS', ['input3']]]
      [0]:
        ['SUB', ['input1', 'input2']]
        - args: ['input1', 'input2']
        - type: 'SUB'
      [1]:
        ['INS', ['input3']]
        - args: ['input3']
        - type: 'INS'
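To compare the two spellings of this option side by side, here is a small sketch that builds the instruction expression once with the trailing "*" and once with setResultsName(..., listAllMatches=True), then reads all matches back through the "instruction" name. The helper `types_per_group` and the variable names are hypothetical, introduced only for this illustration.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

the_input = "ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)"

instruction_type = Word(alphanums + "_")("type")
argument = Word(alphanums + "_[].")
arguments = Group(delimitedList(argument))("args")
body = instruction_type + Suppress("(") + arguments + Suppress(")")

# Shortcut form: a trailing "*" on the name keeps every match.
with_star = Group(body)("instruction*")
# Explicit form: the same request through setResultsName's listAllMatches argument.
with_keyword = Group(body).setResultsName("instruction", listAllMatches=True)


def types_per_group(instruction):
    """Hypothetical helper: list the instruction types found in each group."""
    line = delimitedList(Group(OneOrMore(instruction)))
    parsed = line.parseString(the_input)
    return [[ins["type"] for ins in group["instruction"]] for group in parsed]


types_star = types_per_group(with_star)
types_keyword = types_per_group(with_keyword)
print(types_star)
print(types_keyword)
```

Both variants should report the same instruction types per group, so the choice between them is mostly a matter of which spelling reads better in the grammar.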

You can choose whichever approach suits you better.
