Posted: 2022-01-12 15:52:35
Question:
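The EPA CompTox Chemicals Dashboard has been updated, and my old code can no longer retrieve chemicals' boiling points. Can anyone help me scrape the experimental average boiling point? I need R code that can loop over many chemicals.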
Example pages:
Acetone: https://comptox.epa.gov/dashboard/chemical/properties/DTXSID8021482
Methane: https://comptox.epa.gov/dashboard/chemical/properties/DTXSID8025545
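Both example URLs share the pattern `https://comptox.epa.gov/dashboard/chemical/properties/` followed by a DTXSID, so a loop over many chemicals can start from a vector of DTXSIDs. This is a minimal sketch that assumes every chemical's properties page follows that same pattern. `fetch_pages()` is a hypothetical helper name. It only downloads raw page lines with base R's `readLines()` and leaves parsing to a separate step.

```r
# Hypothetical list of chemicals to process, keyed by a readable name.
dtxsids <- c(acetone = "DTXSID8021482", methane = "DTXSID8025545")

# Assumption: every properties page uses the same URL pattern as the examples.
base_url <- "https://comptox.epa.gov/dashboard/chemical/properties/"
urls <- setNames(paste0(base_url, dtxsids), names(dtxsids))

# Hypothetical helper: download each page's raw lines, pausing between
# requests and recording NA instead of stopping when a request fails.
fetch_pages <- function(urls, pause = 1) {
  lapply(urls, function(u) {
    Sys.sleep(pause)  # be polite to the server between requests
    tryCatch(readLines(u, warn = FALSE, encoding = "UTF-8"),
             error = function(e) NA_character_)
  })
}
```

Calling `pages <- fetch_pages(urls)` returns a list named by chemical (for example, `pages$acetone`), because `lapply()` keeps the names of `urls`. Keeping download and parsing separate means a page layout change only breaks the parsing step.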
I tried read_html() and xmlParse(), but neither worked: the experimental average boiling point (ExpAvBP) value does not appear in the parsed XML.
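I have tried ContentScraper() from Rcrawler, but no matter what I try it only returns NA. Even if it worked, it would only work for the first page listed, because the cell ID changes with each chemical.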
library(Rcrawler)
ContentScraper(Url = "https://comptox.epa.gov/dashboard/chemical/properties/DTXSID8021482", XpathPatterns = "//*[@id='cell-225']")
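I have tried readLines(), but all the information is crammed into the last script tag, and I'm not sure how to isolate just the ExpAvBP value. The values seem to be stored somewhere else. For example, below is what I believe is the boiling point information in the last script tag.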
Acetone:
{unit:c_,name:"Boiling Point",predicted:{rawData:[{value:c$,minValue:e,maxValue:e,source:am,description:an,modelName:"TEST_BP",modelId:T,hasOpera:d,globalApplicability:e,hasQmrfPdf:d,details:{value:B,link:"https:\u002F\u002Fs3.amazonaws.com\u002Fepa-comptox\u002Ftest-reports\u002FDTXCID101482-TEST_BP.html",showLink:a},qmrf:{value:e,link:e,showLink:d}},{value:44.8,minValue:e,maxValue:e,source:ci,description:cj,modelName:"EPISUITE_BP",modelId:dV,hasOpera:d,globalApplicability:e,hasQmrfPdf:d,details:{value:M,link:e,showLink:d},qmrf:{value:e,link:e,showLink:d}},{value:46.458,minValue:e,maxValue:e,source:ad,description:V,modelName:"ACD_BP",modelId:135,hasOpera:d,globalApplicability:e,hasQmrfPdf:d,details:{value:M,link:e,showLink:d},qmrf:{value:e,link:e,showLink:d}},{value:da,minValue:e,maxValue:e,source:aL,description:bo,modelName:"OPERA_BP",modelId:dS,hasOpera:a,globalApplicability:q,hasQmrfPdf:a,details:{value:B,link:"http:\u002F\u002Fcomptox-dev.epa.gov\u002Fdashboard\u002Fdsstoxdb\u002Fcalculation_details?model_id=27&search=21482",showLink:a},qmrf:{value:B,link:"http:\u002F\u002Fcomptox-dev.epa.gov\u002Fdashboard\u002Fdsstoxdb\u002Fdownload_qmrf_pdf?model=27",showLink:a}}],count:bu,mean:47.06289999999999,min:c$,max:da,range:[c$,da],median:45.629},experimental:{rawData:[{value:db,minValue:e,maxValue:e,source:aN,description:aO,experimentalDetails:[]},{value:ak,minValue:ak,maxValue:ak,source:ck,description:cl,experimentalDetails:[]},{value:ak,minValue:ak,maxValue:ak,source:ck,description:cl,experimentalDetails:[]},{value:ak,minValue:ak,maxValue:ak,source:"Food and Agriculture Organization of the United Nations",description:"The Joint FAO\u002FWHO Expert Committee on Food Additives (JECFA) is an international scientific expert committee administered jointly by the Food and Agriculture Organization of the United Nations (FAO) and the World Health Organization (WHO). Website: \u003Ca href="http:\u002F\u002Fwww.fao.org\u002Fhome\u002F" target="_blank"\u003Ehttp:\u002F\u002Fwww.fao.org\u002Fhome\u002F\u003C\u002Fa\u003E",experimentalDetails:[]},{value:56.05,minValue:e,maxValue:e,source:"Abooali et al. Int. J. Refrig. 2014, 40, 282–293",description:"Abooali, D.; Sobati, M. A. Novel method for prediction of normal boiling point and enthalpy of vaporization at normal boiling point of pure refrigerants: A QSPR approach. (\u003Ca href="http:\u002F\u002Fdx.doi.org\u002F10.1016\u002Fj.ijrefrig.2013.12.007" target="_blank"\u003EInt. J. Refrig. 2014, 40, 282–293\u003C\u002Fa\u003E)\r\n",experimentalDetails:[]},{value:bO,minValue:bO,maxValue:bO,source:hI,description:hJ,experimentalDetails:[]}],count:dK,mean:55.98518333333333,min:db,max:bO,range:[db,bO],median:ak},arrKey:"BOILING_POINT"}
Methane:
{unit:cO,name:"Boiling Point",predicted:{rawData:[{value:at,minValue:f,maxValue:f,source:bB,description:bb,modelName:"ACD_BP",modelId:135,hasOpera:d,globalApplicability:f,hasQmrfPdf:d,details:{value:ag,link:f,showLink:d},qmrf:{value:f,link:f,showLink:d}},{value:hl,minValue:f,maxValue:f,source:aF,description:ba,modelName:"OPERA_BP",modelId:dv,hasOpera:a,globalApplicability:s,hasQmrfPdf:a,details:{value:O,link:"http:\u002F\u002Fcomptox-dev.epa.gov\u002Fdashboard\u002Fdsstoxdb\u002Fcalculation_details?model_id=27&search=25545",showLink:a},qmrf:{value:O,link:"http:\u002F\u002Fcomptox-dev.epa.gov\u002Fdashboard\u002Fdsstoxdb\u002Fdownload_qmrf_pdf?model=27",showLink:a}},{value:cP,minValue:f,maxValue:f,source:bZ,description:b_,modelName:"EPISUITE_BP",modelId:dy,hasOpera:d,globalApplicability:f,hasQmrfPdf:d,details:{value:ag,link:f,showLink:d},qmrf:{value:f,link:f,showLink:d}}],count:bH,mean:-129.25300000000001,min:at,max:cP,range:[at,cP],median:hl},experimental:{rawData:[{value:at,minValue:at,maxValue:at,source:hm,description:hn,experimentalDetails:[]},{value:cQ,minValue:f,maxValue:f,source:bC,description:bD,experimentalDetails:[]}],count:H,mean:ho,min:at,max:cQ,range:[at,cQ],median:ho},arrKey:"BOILING_POINT"}
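Based on these snippets, one possible base-R approach works in four steps:

1. Take the body of the last `<script>` tag.
2. Find the `arrKey:"BOILING_POINT"` object.
3. Move back to the last `experimental:{` before it.
4. Read the `mean:` that follows the `],count:...` closing its `rawData` array.

This is a minimal sketch that assumes the payload keeps the key order shown above. `last_script_text()` and `extract_exp_mean_bp()` are hypothetical helpers. In the methane snippet the experimental mean is `ho` rather than a number. These short tokens look like references to minified JavaScript variables defined elsewhere in the script, so the sketch returns `NA` in that case instead of guessing.

```r
# Hypothetical helper: return the body of the last <script> element in a page
# read with readLines(), or NA if the page has no script tags.
last_script_text <- function(html_lines) {
  html <- paste(html_lines, collapse = "\n")
  # (?s) lets "." match newlines; ".*?" keeps each match to one script tag.
  tags <- regmatches(html, gregexpr("(?s)<script[^>]*>.*?</script>", html,
                                    perl = TRUE))[[1]]
  if (length(tags) == 0) return(NA_character_)
  # Strip the opening and closing tags from the last match.
  sub("(?s)^<script[^>]*>(.*)</script>$", "\\1", tags[length(tags)],
      perl = TRUE)
}

# Hypothetical helper: extract the experimental mean boiling point.
# Assumes the "experimental:{" block sits just before arrKey:"BOILING_POINT",
# as in the snippets above.
extract_exp_mean_bp <- function(script_text) {
  if (length(script_text) != 1 || is.na(script_text)) return(NA_real_)
  # End of the boiling-point object; other properties have their own arrKey.
  bp_end <- regexpr('arrKey:"BOILING_POINT"', script_text, fixed = TRUE)[1]
  if (bp_end == -1) return(NA_real_)
  before <- substr(script_text, 1, bp_end - 1)
  # The boiling-point "experimental" block is the last one before its arrKey.
  starts <- gregexpr("experimental:{", before, fixed = TRUE)[[1]]
  if (starts[1] == -1) return(NA_real_)
  block <- substring(before, max(starts))
  # The rawData array closes with "],count:<x>,mean:<value>".
  m <- regmatches(block, regexec("\\],count:[^,]+,mean:([^,}]+)", block))[[1]]
  if (length(m) < 2) return(NA_real_)
  # Non-numeric tokens such as "ho" become NA (warning suppressed).
  suppressWarnings(as.numeric(m[2]))
}
```

For a single page, usage would look like `extract_exp_mean_bp(last_script_text(readLines("https://comptox.epa.gov/dashboard/chemical/properties/DTXSID8021482", warn = FALSE)))`. This sketch cannot guarantee that the live page still embeds the data this way. Resolving tokens like `ho` would require also parsing the function arguments that define them.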
Any help or insight would be greatly appreciated!
Discussion:
Tags: r web-scraping xml-parsing html-parsing rvest