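[Posted]: 2022-01-22 16:40:17
[Question]:
I have a complex, deeply nested XML file that I am trying to extract data from and convert into a data frame for subsequent plotting, analysis, and so on. A solution in either R or Python would be fine, but I have never worked with XML files before and I am struggling to understand how to extract the data I need (I am reading up on XPath syntax, which is new to me).
I have tried the R packages XML, xml2, and xmltools, and I have also tried Python's ElementTree. Most of the examples I worked through used simpler XML files, and I have not figured out how to extend their logic to my own case; my attempts have ended in a mess.
The structure of the XML file is: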
QUANDATASET
├── XMLFILE
├── DATASET
└── GROUPDATA
    └── GROUP
        ├── METHODDATA
        ├── SAMPLELISTDATA
        │   ├── SAMPLE
        │   │   ├── USERDATA
        │   │   ├── COMPOUND
        │   │   │   ├── METHOD
        │   │   │   ├── USERDATA
        │   │   │   └── PEAK
        │   │   │       └── ISPEAK
        │   │   └── COMPOUND
        │   │       ├── METHOD
        │   │       ├── USERDATA
        │   │       └── PEAK
        │   │           └── ISPEAK
        │   └── SAMPLE
        │       ├── USERDATA
        │       ├── COMPOUND
        │       │   ├── METHOD
        │       │   ├── USERDATA
        │       │   └── PEAK
        │       │       └── ISPEAK
        │       └── COMPOUND
        │           ├── METHOD
        │           ├── USERDATA
        │           └── PEAK
        │               └── ISPEAK
        └── CALIBRATIONDATA
            ├── COMPOUND
            │   ├── RESPONSE
            │   └── CURVE
            │       └── RESPONSEFACTOR
            └── COMPOUND
                ├── RESPONSE
                └── CURVE
                    ├── CALIBRATIONCURVE
                    └── DETERMINATION
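I only care about what is inside the SAMPLELISTDATA section. Also, I have shown only 2 SAMPLEs, each with 2 COMPOUNDs, but the real file has many of both. Every tag in the tree also carries many attributes, and those are what I need to extract.
The actual XML is large, but here is a (somewhat) minimal example: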
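Restricting a query to the SAMPLELISTDATA section matters because COMPOUND elements also occur under CALIBRATIONDATA. The following is a minimal sketch using Python's standard-library ElementTree, which supports a limited subset of XPath; the tiny stand-in document and its attribute values are illustrative assumptions, not the real file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document with the same nesting as the real file
# (attribute values here are illustrative placeholders).
XML = """
<QUANDATASET>
  <GROUPDATA>
    <GROUP id="1">
      <SAMPLELISTDATA>
        <SAMPLE id="1" name="S1">
          <COMPOUND id="1" name="A"/>
          <COMPOUND id="14" name="B"/>
        </SAMPLE>
        <SAMPLE id="2" name="S2">
          <COMPOUND id="1" name="A"/>
        </SAMPLE>
      </SAMPLELISTDATA>
      <CALIBRATIONDATA>
        <COMPOUND id="1" name="Calibration A"/>
      </CALIBRATIONDATA>
    </GROUP>
  </GROUPDATA>
</QUANDATASET>
"""

root = ET.fromstring(XML)

# ".//COMPOUND" matches every COMPOUND, including the calibration one.
all_compounds = root.findall(".//COMPOUND")

# Anchoring the path on SAMPLELISTDATA/SAMPLE keeps only sample compounds.
sample_compounds = root.findall(".//SAMPLELISTDATA/SAMPLE/COMPOUND")

print(len(all_compounds), len(sample_compounds))  # prints: 4 3
```

The same path expression should also work in tools with fuller XPath support; only the descendant search and child steps are used here.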
<QUANDATASET description="" version="1">
<XMLFILE filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\quandata.xml" modifieddate="20 Dec 2021" modifiedtime="15:53:06"/>
<DATASET filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\211220_MAA_Jack.qld" modifieddate="20 Dec 2021" modifiedtime="15:50:10" creationdate="20 Dec 2021" creationtime="14:18:02"/>
<GROUPDATA count="1">
<GROUP id="1" name="MAA_JACK">
<METHODDATA id="1" filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\MethDB\MAA_Jack.mdb" modifieddate="20 Dec 2021" modifiedtime="14:04:55" creationdate="20 Dec 2021" creationtime="14:04:55"/>
<SAMPLELISTDATA filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\SampleDB\MAA_211220.SPL" modifieddate="20 Dec 2021" modifiedtime="09:55:58" count="12">
<SAMPLE id="1" groupid="1" name="MAA_211220_01" createdate="20-Dec-21" createtime="10:00:08" type="Analyte" desc="'Umbilicalis' laver filtrate 7D7" dilutionfac="0.0000000000" extractvolume="0.0000000000" initamount="0.0000000000" injectvolume="2.0000000000" job="MAA_211220" sampleid="" samplenumber="1" stdconc="0.0000000000" stockdilutionfac="0.0000000000" subjecttext="" subjecttime="0.0000000000" userdilutionfac="0.0000000000" vial="1:A,1" inletmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAA_Dev_17" msmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAAs SIR5.EXP" prerunmethodname="" postrunmethodname="" switchmethodname="" hplcmethodname="" tunemethodname="C:\Masslynx Projects\Histamine_QDA_Dev.PRO\ACQUDB\Default.ipr" fractionlynxname="" instrument="ACQ-QDA#KAD3691" lab="" conditions="" submitter="" task="" user="" reinjections="0" text="'Umbilicalis' laver filtrate 7D7">
<COMPOUND id="1" sampleid="1" groupid="1" name="Palythine" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="514" foundrt="1.7100000381" foundrrt="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" area="89222.9220000000" height="1567686.0000000000" response="89222.9220000000" pkflags="MM!" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="20-Dec-21" modifiedtime="14:22:50" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="1.6399999857" endrt="1.7532999516" startht="-10476.0000000000" endht="-10476.0000000000" absresponse="89222.9220000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="11.0944900513" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="318_322" peaks="0" pkwidth="3.0210000000" pksigma="6.3800000000" pkskew="-0.1190000000" pkkurt="-0.4500000000" heightdivarea="17.5704400266" baselinewidth="6.7979979515" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="141303.1146768486" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.0700000003" peaktailwidth="0.0430000015" peakasymmetryvalue="0.6190000176" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="0.0000000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" 
peakmissing="0" peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="318_322" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Palythine" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="1" groupid="1"/>
</COMPOUND>
<COMPOUND id="14" sampleid="1" groupid="1" name="Porphyra 334 SIR" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="161" foundrt="3.3292999268" foundrrt="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" area="2140861.2500000000" height="16134221.0000000000" response="2140861.2500000000" pkflags="bb" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="" modifiedtime="" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="3.1303999424" endrt="3.7107000351" startht="3651.8000000000" endht="16670.4000000000" absresponse="2140861.2500000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="334.2170715332" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="347.1" peaks="0" pkwidth="7.7870000000" pksigma="3.2770000000" pkskew="0.6590000000" pkkurt="1.4860000000" heightdivarea="7.5363225898" baselinewidth="34.8180055618" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="48274.6764729440" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.2000000030" peaktailwidth="0.3799999952" peakasymmetryvalue="1.8999999762" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="6160.2280000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" peakmissing="0" 
peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="347.1" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Porphyra 334 SIR" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="1" groupid="1"/>
</COMPOUND>
<USERDATA sampleid="1" groupid="1"/>
</SAMPLE>
<SAMPLE id="2" groupid="1" name="MAA_211220_02" createdate="20-Dec-21" createtime="10:11:04" type="Analyte" desc="'Umbilicalis' laver filtrate 3D9" dilutionfac="0.0000000000" extractvolume="0.0000000000" initamount="0.0000000000" injectvolume="2.0000000000" job="MAA_211220" sampleid="" samplenumber="2" stdconc="0.0000000000" stockdilutionfac="0.0000000000" subjecttext="" subjecttime="0.0000000000" userdilutionfac="0.0000000000" vial="1:A,2" inletmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAA_Dev_17" msmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAAs SIR5.EXP" prerunmethodname="" postrunmethodname="" switchmethodname="" hplcmethodname="" tunemethodname="C:\Masslynx Projects\Histamine_QDA_Dev.PRO\ACQUDB\Default.ipr" fractionlynxname="" instrument="ACQ-QDA#KAD3691" lab="" conditions="" submitter="" task="" user="" reinjections="0" text="'Umbilicalis' laver filtrate 3D9">
<COMPOUND id="1" sampleid="2" groupid="1" name="Palythine" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="517" foundrt="1.7200000286" foundrrt="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" area="69654.0080000000" height="1250121.0000000000" response="69654.0080000000" pkflags="MM!" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="20-Dec-21" modifiedtime="14:24:57" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="1.6000000238" endrt="1.7599999905" startht="0.0000000000" endht="10847.0340000000" absresponse="69654.0080000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="4.1693286896" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="318_322" peaks="0" pkwidth="3.0090000000" pksigma="6.4940000000" pkskew="-0.4530000000" pkkurt="0.7820000000" heightdivarea="17.9475817099" baselinewidth="9.5999979973" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="299837.4781816338" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.1199999973" peaktailwidth="0.0399999991" peakasymmetryvalue="0.3330000043" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="0.0000000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" 
peakmissing="0" peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="318_322" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Palythine" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="2" groupid="1"/>
</COMPOUND>
<COMPOUND id="14" sampleid="2" groupid="1" name="Porphyra 334 SIR" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="162" foundrt="3.3459000587" foundrrt="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" area="1934833.8750000000" height="14881056.0000000000" response="1934833.8750000000" pkflags="bb" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="" modifiedtime="" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="3.1800999641" endrt="3.7107000351" startht="5267.0000000000" endht="16324.8000000000" absresponse="1934833.8750000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="208.7208557129" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="347.1" peaks="0" pkwidth="7.5160000000" pksigma="3.2120000000" pkskew="0.6470000000" pkkurt="1.3920000000" heightdivarea="7.6911285213" baselinewidth="31.8360042572" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="71296.4497446734" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.1669999957" peaktailwidth="0.3639999926" peakasymmetryvalue="2.1860001087" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="5185.1130000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" peakmissing="0" 
peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="347.1" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Porphyra 334 SIR" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="2" groupid="1"/>
</COMPOUND>
<USERDATA sampleid="2" groupid="1"/>
</SAMPLE>
</SAMPLELISTDATA>
<CALIBRATIONDATA filename="C:\Masslynx Projects\Caffeine.PRO\CurveDB\Meth1.cdb" modifieddate="25 Sep 2015" modifiedtime="00:20:14" count="2">
<COMPOUND id="1" name="Compound A ( 430.5 )">
<RESPONSE type="External Std" ref="" rah="Area"/>
<CURVE type="RF" origin="" weighting="" axistrans="">
<RESPONSEFACTOR cc="15552.5556000000" stddev="2208.2674143620" percrelsd="0.1319874310"/>
</CURVE>
</COMPOUND>
<COMPOUND id="2" name="Compound B ( 458.5 )">
<RESPONSE type="Internal Std" ref="1" rah="Area * ( IS Conc. / IS Area )"/>
<CURVE type="Linear" origin="Exclude" weighting="1/x" axistrans="None">
<CALIBRATIONCURVE curve="0.012594 * x + 0.005516"/>
<DETERMINATION rsquared="0.9741537568"/>
</CURVE>
</COMPOUND>
</CALIBRATIONDATA>
</GROUP>
</GROUPDATA>
</QUANDATASET>
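What I would like to end up with is a single data frame (in R or Python/pandas) in which each row holds all the data (attributes) associated with one SAMPLE/COMPOUND pair. In my example above, with 2 samples of 2 compounds each, that would be 4 rows, with many columns containing all the attributes of all the nodes and children associated with each pair.
A list of data frames (one per sample) would also work, but the sample name would need to be associated with each data frame in that list, so I think one large data frame is probably easier.
Any help, insight, tips, or advice would be greatly appreciated.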
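[Comments]:
- It would be easier for us to help if we had the XML file and an example of what you have already tried.
- I have provided the XML code. It is a shortened version with fewer records, but it has the complete file structure. I am working on MREs for all the main approaches I tried and will add them once they are done.
- What is the expected output? (Based on the XML you posted.)
- Please post the expected output data frame.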
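One possible reading of the requested layout, one row per SAMPLE/COMPOUND pair with every attribute as a column, can be sketched with Python's standard-library ElementTree. This is a sketch under assumptions rather than a confirmed solution: the helper names `prefixed` and `sample_compound_rows` are hypothetical, the embedded XML is a trimmed copy of the posted example with most attributes removed, and the column-prefix scheme (`SAMPLE_`, `COMPOUND_`, `PEAK_`, `ISPEAK_`, `METHOD_`) is an illustrative choice. Prefixing is needed because attribute names such as `id`, `name`, `predrt`, and `area` appear on more than one element.

```python
import xml.etree.ElementTree as ET

def prefixed(element, prefix):
    """Return the element's attributes with prefixed keys, e.g. PEAK_area."""
    return {f"{prefix}_{key}": value for key, value in element.attrib.items()}

def sample_compound_rows(root):
    """Build one dict per SAMPLE/COMPOUND pair found under SAMPLELISTDATA."""
    rows = []
    for sample in root.iterfind(".//SAMPLELISTDATA/SAMPLE"):
        sample_attrs = prefixed(sample, "SAMPLE")
        for compound in sample.iterfind("COMPOUND"):
            row = dict(sample_attrs)  # repeat sample columns on every row
            row.update(prefixed(compound, "COMPOUND"))
            # PEAK and METHOD are direct children; ISPEAK sits inside PEAK.
            for tag, path in (("PEAK", "PEAK"),
                              ("ISPEAK", "PEAK/ISPEAK"),
                              ("METHOD", "METHOD")):
                child = compound.find(path)
                if child is not None:
                    row.update(prefixed(child, tag))
            rows.append(row)
    return rows

# Trimmed copy of the posted example (most attributes omitted for brevity).
XML = """
<QUANDATASET>
  <GROUPDATA>
    <GROUP id="1" name="MAA_JACK">
      <SAMPLELISTDATA count="12">
        <SAMPLE id="1" name="MAA_211220_01">
          <COMPOUND id="1" name="Palythine">
            <PEAK foundrt="1.7100000381" area="89222.9220000000"><ISPEAK area=""/></PEAK>
            <METHOD predrt="1.7500000000"/>
          </COMPOUND>
          <COMPOUND id="14" name="Porphyra 334 SIR">
            <PEAK foundrt="3.3292999268" area="2140861.2500000000"><ISPEAK area=""/></PEAK>
            <METHOD predrt="3.3099999428"/>
          </COMPOUND>
        </SAMPLE>
        <SAMPLE id="2" name="MAA_211220_02">
          <COMPOUND id="1" name="Palythine">
            <PEAK foundrt="1.7200000286" area="69654.0080000000"><ISPEAK area=""/></PEAK>
            <METHOD predrt="1.7500000000"/>
          </COMPOUND>
          <COMPOUND id="14" name="Porphyra 334 SIR">
            <PEAK foundrt="3.3459000587" area="1934833.8750000000"><ISPEAK area=""/></PEAK>
            <METHOD predrt="3.3099999428"/>
          </COMPOUND>
        </SAMPLE>
      </SAMPLELISTDATA>
    </GROUP>
  </GROUPDATA>
</QUANDATASET>
"""

rows = sample_compound_rows(ET.fromstring(XML))
print(len(rows))  # prints: 4
print(rows[0]["SAMPLE_name"], rows[0]["COMPOUND_name"], rows[0]["PEAK_area"])
```

For a file on disk, `ET.parse(path).getroot()` would supply the root instead of `ET.fromstring`. If pandas is available, `pandas.DataFrame(rows)` turns the list of dicts into a data frame with one column per key; every value is still a string at that point, so numeric columns would need a conversion such as `pandas.to_numeric`.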