Google Sheets =importXML() returns "Imported content is empty": answers

【Question title】: Google Sheets =importXML() returns "Imported Content is Empty"
【Posted】: 2021-02-20 23:30:04
【Question】:

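I am trying to import the most popular Library of Congress classification number for each book in a list, using Google Sheets. The XML file for one ISBN is http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true. For convenience, the XML is pasted below. I want to get lcc/mostPopular[@nsfa], but the formula =importxml("http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true","lcc/mostPopular[@nsfa]") returns "Imported content is empty".

Is the xpath_query I entered wrong?

I know the link works, because I can import the whole response with =importdata("http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true"), but that produces gibberish in the spreadsheet.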

<classify xmlns="http://classify.oclc.org">
<response code="0"/>
<!-- Classify is a product of OCLC Online Computer Library Center: http://classify.oclc.org -->
<work author="Piper, John, 1946-" editions="4" eholdings="97" format="Book" holdings="184" itemtype="itemtype-book" owi="769061307" title="Bloodlines : race, cross, and the Christian">696100305</work>
<authors>
<author lc="n78072014" viaf="109537817">Piper, John, 1946-</author>
</authors>
<orderBy>thold desc</orderBy>
<input type="isbn">1433528525</input>
<recommendations>
<ddc>
<mostPopular holdings="280" nsfa="270.089" sfa="270.089"/>
<mostRecent holdings="280" sfa="270.089"/>
<latestEdition holdings="280" sf2="22" sfa="270.089"/>
</ddc>
<lcc>
<mostPopular holdings="280" nsfa="BT738.27" sfa="BT738.27"/>
<mostRecent holdings="280" sfa="BT738.27"/>
</lcc>
</recommendations>
</classify>

【Comments】:

What should the output be? BT738.27?
@player0 That's right, that is exactly what I want.

Tags: xml web-scraping google-sheets google-sheets-formula importerror

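【Solution 1】:

I think the XPath in your formula =importxml("http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true","lcc/mostPopular[@nsfa]") needs to be modified. In this answer, I would like to propose a modified XPath that achieves your goal. How about the following?

Modified formula: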

=IMPORTXML(A1,"//*[local-name()='lcc']//@nsfa")

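In this formula, put the URL http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true in cell A1.
The likely reason the original query matches nothing is that every element sits in the default namespace declared on <classify xmlns="http://classify.oclc.org">, so the bare name lcc does not match; local-name() compares the element name with the namespace ignored. Note also that a predicate such as [@nsfa] selects the element that has the attribute, while /@nsfa selects the attribute value itself.
In this case, there is only one nsfa attribute under the lcc tag, so I think you can use //*[local-name()='lcc']//@nsfa as the XPath.
If you want to keep the lcc/mostPopular[@nsfa] structure, you can also use the XPath //*[local-name()='lcc']/*[local-name()='mostPopular']/@nsfa.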
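
For readers who want to see the namespace effect outside of Sheets, here is a minimal Python sketch. It is not how IMPORTXML works internally; it only illustrates the same idea with the standard library. The sample is a trimmed copy of the XML from the question, and the function names are made up for illustration. Python's ElementTree does not support local-name(), so the sketch uses its `{*}` wildcard (assuming Python 3.8 or later) as the closest stand-in.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response quoted in the question.
SAMPLE = """<classify xmlns="http://classify.oclc.org">
<recommendations>
<ddc>
<mostPopular holdings="280" nsfa="270.089" sfa="270.089"/>
</ddc>
<lcc>
<mostPopular holdings="280" nsfa="BT738.27" sfa="BT738.27"/>
<mostRecent holdings="280" sfa="BT738.27"/>
</lcc>
</recommendations>
</classify>"""

# Prefix "c" is an arbitrary label chosen for this sketch.
NS = {"c": "http://classify.oclc.org"}


def nsfa_bare_names(xml_text):
    """Bare element names: these do not match elements in a default namespace."""
    root = ET.fromstring(xml_text)
    node = root.find(".//lcc/mostPopular")
    return None if node is None else node.get("nsfa")


def nsfa_with_prefix(xml_text):
    """Namespace-qualified names: the elements are found."""
    root = ET.fromstring(xml_text)
    node = root.find(".//c:lcc/c:mostPopular", NS)
    return None if node is None else node.get("nsfa")


def nsfa_any_namespace(xml_text):
    """'{*}' matches the tag in any namespace, similar in spirit to local-name()."""
    root = ET.fromstring(xml_text)
    node = root.find(".//{*}lcc/{*}mostPopular")
    return None if node is None else node.get("nsfa")


if __name__ == "__main__":
    print(nsfa_bare_names(SAMPLE))
    print(nsfa_with_prefix(SAMPLE))
    print(nsfa_any_namespace(SAMPLE))
```

The attribute itself is read with a plain `get("nsfa")` because unprefixed attributes are not placed in the default namespace; only the element names need the namespace handling.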

【Comments】:

【Solution 2】:

Try:

=REGEXEXTRACT(QUERY(FLATTEN(SPLIT(QUERY(IMPORTDATA(
 "http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true"),,9^9), 
 "<lcc>", 0)), 
 "where Col1 contains 'mostPopular' offset 1"), 
 "nsfa=""([^\s]+)""")

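【Comments】:

Perfect! Thank you! I don't have much coding knowledge, but it looks like you took =importdata(), split it into a table, and then searched that table?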
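
Roughly, yes. The steps of the Solution 2 formula can be sketched in Python as follows. This is only an analogy under simplifying assumptions: it treats the whole response as one string and does not model how IMPORTDATA breaks the text into cells. The sample is a trimmed copy of the XML from the question, and the function name is hypothetical.

```python
import re

# Trimmed copy of the response quoted in the question, as plain text.
SAMPLE_TEXT = """<classify xmlns="http://classify.oclc.org">
<recommendations>
<ddc>
<mostPopular holdings="280" nsfa="270.089" sfa="270.089"/>
</ddc>
<lcc>
<mostPopular holdings="280" nsfa="BT738.27" sfa="BT738.27"/>
<mostRecent holdings="280" sfa="BT738.27"/>
</lcc>
</recommendations>
</classify>"""


def extract_lcc_nsfa(text):
    """Mimic the formula: split on '<lcc>', filter, skip one row, regex-extract."""
    # SPLIT(..., "<lcc>", 0): cut the text at the literal "<lcc>" marker.
    pieces = text.split("<lcc>")
    # QUERY "where Col1 contains 'mostPopular'": keep pieces mentioning it.
    matching = [piece for piece in pieces if "mostPopular" in piece]
    # "offset 1": skip the first match (the ddc part) and keep the lcc part.
    if len(matching) < 2:
        return None
    # REGEXEXTRACT with the same pattern as the formula.
    found = re.search(r'nsfa="([^\s]+)"', matching[1])
    return found.group(1) if found else None


if __name__ == "__main__":
    print(extract_lcc_nsfa(SAMPLE_TEXT))
```

Because this approach treats the XML as plain text, it depends on the exact order of the ddc and lcc blocks, which is something the XPath approach in Solution 1 does not rely on.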