【Posted】: 2022-01-21 09:51:12
【Question】:
I am looking for a good way to split the following XML
<?xml version='1.0' encoding='US-ASCII'?><cml xmlns="http://www.chemaxon.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.chemaxon.com/marvin/schema/mrvSchema_20_20_0.xsd" version="ChemAxon file format v20.20.0, generated by v21.14.0">
<MDocument><MChemicalStruct><molecule molID="m1"><atomArray atomID="a1 a2 a3" elementType="C C O"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
<MDocument><MChemicalStruct><molecule molID="m2"><atomArray atomID="a1 a2 a3 a4" elementType="C C C C"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/><bond id="b3" atomRefs2="a3 a4" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
</cml>
into fragments, one per MDocument child of the root (two in this case):
<?xml version='1.0' encoding='US-ASCII'?><cml xmlns="http://www.chemaxon.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.chemaxon.com/marvin/schema/mrvSchema_20_20_0.xsd" version="ChemAxon file format v20.20.0, generated by v21.14.0">
<MDocument><MChemicalStruct><molecule molID="m1"><atomArray atomID="a1 a2 a3" elementType="C C O"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
</cml>
and
<?xml version='1.0' encoding='US-ASCII'?><cml xmlns="http://www.chemaxon.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.chemaxon.com/marvin/schema/mrvSchema_20_20_0.xsd" version="ChemAxon file format v20.20.0, generated by v21.14.0">
<MDocument><MChemicalStruct><molecule molID="m2"><atomArray atomID="a1 a2 a3 a4" elementType="C C C C"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/><bond id="b3" atomRefs2="a3 a4" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
</cml>
I am experimenting with the code below, but it does not look very elegant. Is there a better way to achieve this?
from copy import deepcopy

from lxml import etree
starting_xml_string = '''<?xml version='1.0' encoding='US-ASCII'?><cml xmlns="http://www.chemaxon.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.chemaxon.com/marvin/schema/mrvSchema_20_20_0.xsd" version="ChemAxon file format v20.20.0, generated by v21.14.0">
<MDocument><MChemicalStruct><molecule molID="m1"><atomArray atomID="a1 a2 a3" elementType="C C O"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
<MDocument><MChemicalStruct><molecule molID="m2"><atomArray atomID="a1 a2 a3 a4" elementType="C C C C"/><bondArray><bond id="b1" atomRefs2="a1 a2" order="1"/><bond id="b2" atomRefs2="a2 a3" order="1"/><bond id="b3" atomRefs2="a3 a4" order="1"/></bondArray></molecule></MChemicalStruct></MDocument>
</cml>'''
root = etree.fromstring(starting_xml_string.encode('utf-8'))
encoding = root.getroottree().docinfo.encoding

# build an empty copy of the root element (the "envelope")
envelope = deepcopy(root)
del envelope[:]  # remove all children

fragments = []
for fragment in list(root):
    tmp = deepcopy(envelope)
    tmp.append(fragment)  # note: this moves the element out of root
    tmp = etree.tostring(tmp, xml_declaration=True, encoding=encoding).decode(encoding)
    fragments.append(tmp)
Thank you very much for your help.
【Discussion】:
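One possible tidier shape for the splitting step, offered as a minimal sketch rather than a definitive answer: instead of deep-copying the whole tree and then emptying it, build a fresh childless root with the same tag and attributes for each fragment and attach a copy of one child. The sketch below deliberately swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages. The names split_children, CML_NS and SAMPLE are made up for illustration, SAMPLE is a trimmed stand-in for the file in the question, and the xml_declaration argument of ET.tostring assumes Python 3.8 or newer.

```python
import copy
import xml.etree.ElementTree as ET

CML_NS = "http://www.chemaxon.com"  # default namespace used in the question's file

# Keep the default namespace unprefixed on output (otherwise ElementTree writes ns0:).
ET.register_namespace("", CML_NS)


def split_children(xml_bytes, encoding="US-ASCII"):
    """Return one serialized document per direct child of the root element."""
    root = ET.fromstring(xml_bytes)
    fragments = []
    for child in root:
        # Fresh, childless root with the same tag and attributes as the original.
        shell = ET.Element(root.tag, dict(root.attrib))
        shell.text = "\n"
        # Copy the child so the parsed input tree is left untouched.
        child_copy = copy.deepcopy(child)
        child_copy.tail = "\n"
        shell.append(child_copy)
        data = ET.tostring(shell, encoding=encoding, xml_declaration=True)
        fragments.append(data.decode(encoding))
    return fragments


# Trimmed-down stand-in for the document in the question (hypothetical sample data).
SAMPLE = b"""<?xml version='1.0' encoding='US-ASCII'?><cml xmlns="http://www.chemaxon.com" version="demo">
<MDocument><MChemicalStruct><molecule molID="m1"/></MChemicalStruct></MDocument>
<MDocument><MChemicalStruct><molecule molID="m2"/></MChemicalStruct></MDocument>
</cml>"""

fragments = split_children(SAMPLE)
```

With lxml, which the question uses, the same shape should carry over by creating the shell with etree.Element(root.tag, attrib=dict(root.attrib), nsmap=root.nsmap), so the namespace declarations travel with the new root. That variant is not tested here.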