【Question title】: Parsing AUTOSAR XML using Beautiful Soup in Python 3
【Posted】: 2019-05-06 06:20:18
【Question】:

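I am trying to parse an AUTOSAR-specific .arxml file (an XML-based format) with Python, but I can't read the file's contents. I want to get the DEFINITION-REF values defined inside the multiple ECUC-CONTAINER-VALUE tags, for example: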

/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef

I have tried several approaches, but nothing gets printed.

from bs4 import BeautifulSoup as Soup

def parseArxml():
    handler = open('input.arxml').read()
    soup = Soup(handler,"html.parser")
    for ecuc_container in soup.findAll('ECUC-CONTAINER-VALUE'):
        print(ecuc_container)

if __name__ == "__main__":
    parseArxml()

Here is part of the arxml file:

<?xml version="1.0" encoding="UTF-8"?>
<AUTOSAR xmlns="http://autosar.org/schema/r4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://autosar.org/schema/r4.0 autosar_4-2-1.xsd">
    <ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">
      <SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>
      <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
      <REFERENCE-VALUES>
        <ECUC-REFERENCE-VALUE>
          <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
          <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_0/BswMConfig_0/BswMArbitration_0/BswMModeCondition_2</VALUE-REF>
        </ECUC-REFERENCE-VALUE>
      </REFERENCE-VALUES>
    </ECUC-CONTAINER-VALUE>

    <ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">
      <SHORT-NAME>BswMLogicalExpression_3</SHORT-NAME>
      <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
      <REFERENCE-VALUES>
        <ECUC-REFERENCE-VALUE>
          <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
          <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_2/BswMConfig_2/BswMArbitration_2/BswMModeCondition_3</VALUE-REF>
        </ECUC-REFERENCE-VALUE>
      </REFERENCE-VALUES>
    </ECUC-CONTAINER-VALUE>
</AUTOSAR>

【Discussion】:

    Tags: python python-3.x parsing beautifulsoup xml-parsing


    【Answer 1】:

    If you print(soup), you will see that the parser has converted the tag names to lowercase. So use lowercase names when searching for tags:

    for ecuc_container in soup.findAll('ECUC-CONTAINER-VALUE'.lower()):
    

    Or simply:

    for ecuc_container in soup.findAll('ecuc-container-value'):
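The lowercasing happens before Beautiful Soup ever searches: its "html.parser" backend wraps Python's standard-library html.parser.HTMLParser, which lowercases tag (and attribute) names while tokenizing. A minimal stdlib-only sketch makes this visible; TagNameCollector is a hypothetical helper written for illustration, and the snippet is a trimmed piece of the question's sample.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Hypothetical helper: records every start-tag name the parser reports."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased `tag` by the time this is called.
        self.names.append(tag)


# Trimmed fragment of the question's ARXML sample.
snippet = (
    '<ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">'
    '<SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>'
    '<DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">'
    '/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression'
    '</DEFINITION-REF>'
    '</ECUC-CONTAINER-VALUE>'
)

collector = TagNameCollector()
collector.feed(snippet)
collector.close()
print(collector.names)  # ['ecuc-container-value', 'short-name', 'definition-ref']
```

Since the tree built on top of this parser only ever holds the lowercase names, a search for 'ECUC-CONTAINER-VALUE' has nothing to match, which is why the loop in the question printed nothing.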
    

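    Or, even better, parse the document explicitly as XML so that the case of the tag names is preserved (this parser requires the lxml package):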

    soup = Soup(handler,'xml')
    

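    Here is how to get a list of the texts in the <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF"> elements:

    from bs4 import BeautifulSoup as Soup

    def parseArxml():
        with open('input.arxml', encoding='utf-8') as f:
            soup = Soup(f.read(), 'xml')  # 'xml' keeps the original tag case (needs lxml)
        # .get() avoids a KeyError on DEFINITION-REF elements without a DEST attribute
        dest = [d.text for d in soup.find_all('DEFINITION-REF')
                if d.get('DEST') == 'ECUC-CHOICE-REFERENCE-DEF']
        print(dest)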
    

    Output:

    ['/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef',
    '/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef']
    

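    Or, if you want all DEFINITION-REF tags regardless of their DEST attribute, use the following (with the 'xml' parser, tag names are case-sensitive, so keep them uppercase):

    dest = [d.text for d in soup.find_all('DEFINITION-REF')]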
    

    【Comments】:

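    • Thanks. I also went ahead and tried the "lxml-xml" parser with Beautiful Soup, and the case of the tags was not modified (so case is no longer an issue). Could you suggest which parser to use for more complex XML files?
    • I suggest parsing the document explicitly as XML with soup = Soup(handler,'xml'). That preserves the uppercase tag names ('xml' and 'lxml-xml' select the same lxml-based XML parser). See crummy.com/software/BeautifulSoup/bs4/doc/…
    • If the XML is well-formed, also consider using a pure XML parser such as ElementTree or lxml directly instead of BeautifulSoup; it will be faster.
    • ElementTree works perfectly; walking the XML is just like traversing a tree. Thanks :)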
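For reference, here is a minimal standard-library sketch of the ElementTree route the comments settle on. It assumes the file keeps the default namespace http://autosar.org/schema/r4.0 declared on the sample's root element. Because of that xmlns, ElementTree stores every tag as {http://autosar.org/schema/r4.0}NAME, so a plain name like 'DEFINITION-REF' matches nothing and searches need a prefix map. The prefix "ar", the NS dict and choice_reference_defs are illustrative names, not part of any API; the inline ARXML string is a trimmed copy of the sample, and for the real file ET.parse('input.arxml').getroot() returns the equivalent root.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map for ElementTree searches; "ar" is an arbitrary prefix.
NS = {"ar": "http://autosar.org/schema/r4.0"}

# Trimmed copy of the question's sample (XML declaration and xsi attributes omitted).
ARXML = """\
<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">
    <SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
    <REFERENCE-VALUES>
      <ECUC-REFERENCE-VALUE>
        <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
        <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_0/BswMConfig_0/BswMArbitration_0/BswMModeCondition_2</VALUE-REF>
      </ECUC-REFERENCE-VALUE>
    </REFERENCE-VALUES>
  </ECUC-CONTAINER-VALUE>
</AUTOSAR>
"""


def choice_reference_defs(root):
    """Return the text of every DEFINITION-REF whose DEST is ECUC-CHOICE-REFERENCE-DEF."""
    # [@DEST='...'] is ElementTree's (limited) XPath predicate for attribute equality.
    path = ".//ar:DEFINITION-REF[@DEST='ECUC-CHOICE-REFERENCE-DEF']"
    return [el.text for el in root.iterfind(path, NS)]


root = ET.fromstring(ARXML)
print(choice_reference_defs(root))
# Without the namespace prefix the same search finds nothing:
print(root.findall(".//DEFINITION-REF"))  # []
```

The DEST filter from Answer 1 moves into the search path here instead of a list-comprehension condition; both approaches give the same texts for this sample.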
    【Answer 2】:

    The parser you are using (html.parser) converts the tag names to lowercase.

    Search with lowercase tag names instead:

    from bs4 import BeautifulSoup as Soup

    def parseArxml():
        with open('input.arxml', encoding='utf-8') as f:
            # html.parser lowercases tag names, so search with lowercase names
            soup = Soup(f.read(), "html.parser")
        for ecuc_container in soup.find_all('ecuc-container-value'):
            for def_ref in ecuc_container.find_all('definition-ref'):
                print(def_ref.get_text())
    
    if __name__ == "__main__":
        parseArxml()
    

    Output:

    /AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression
    /AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef
    /AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression
    /AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef
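When the goal is to see which references belong to which container, the same per-container walk can be sketched with ElementTree. This is an illustration under stated assumptions, not the answer's method: it pairs each ECUC-REFERENCE-VALUE's DEFINITION-REF with its VALUE-REF, labels the pairs with the container's SHORT-NAME, and reads only the container's direct REFERENCE-VALUES children, so references inside nested sub-containers (not shown in the sample) are reported under their own container rather than the parent. references_by_container, NS and ARXML are hypothetical names; the namespace is the one declared in the sample.

```python
import xml.etree.ElementTree as ET

NS = {"ar": "http://autosar.org/schema/r4.0"}  # "ar" is an arbitrary prefix

# Trimmed copy of the question's two containers (UUIDs omitted).
ARXML = """\
<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <ECUC-CONTAINER-VALUE>
    <SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
    <REFERENCE-VALUES>
      <ECUC-REFERENCE-VALUE>
        <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
        <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_0/BswMConfig_0/BswMArbitration_0/BswMModeCondition_2</VALUE-REF>
      </ECUC-REFERENCE-VALUE>
    </REFERENCE-VALUES>
  </ECUC-CONTAINER-VALUE>
  <ECUC-CONTAINER-VALUE>
    <SHORT-NAME>BswMLogicalExpression_3</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
    <REFERENCE-VALUES>
      <ECUC-REFERENCE-VALUE>
        <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
        <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_2/BswMConfig_2/BswMArbitration_2/BswMModeCondition_3</VALUE-REF>
      </ECUC-REFERENCE-VALUE>
    </REFERENCE-VALUES>
  </ECUC-CONTAINER-VALUE>
</AUTOSAR>
"""


def references_by_container(root):
    """Return [(SHORT-NAME, [(DEFINITION-REF, VALUE-REF), ...]), ...] per container."""
    result = []
    for container in root.iterfind(".//ar:ECUC-CONTAINER-VALUE", NS):
        name = container.findtext("ar:SHORT-NAME", namespaces=NS)
        # Relative path = direct children only; sub-containers get their own entry.
        refs = container.iterfind("ar:REFERENCE-VALUES/ar:ECUC-REFERENCE-VALUE", NS)
        pairs = [
            (ref.findtext("ar:DEFINITION-REF", namespaces=NS),
             ref.findtext("ar:VALUE-REF", namespaces=NS))
            for ref in refs
        ]
        result.append((name, pairs))
    return result


root = ET.fromstring(ARXML)
for name, pairs in references_by_container(root):
    for def_ref, value_ref in pairs:
        print(name, def_ref, value_ref)
```

A list of tuples is used instead of a dict so that repeated SHORT-NAMEs in different packages of a full file would not overwrite each other.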
    

    【Comments】:
