"XML content does not seem to be XML": Error in xmlTreeParse in R

【Question title】: "XML content does not seem to be XML": Error in xmlTreeParse in R
【Posted】: 2012-09-24 20:30:55
【Question】:

I am working through the topicmodels tutorial in R. Around page 12, the authors strip HTML markup and Greek letters:

R> library("XML")
R> remove_HTML_markup <- function(s) {
+ doc <- htmlTreeParse(s, asText = TRUE, trim = FALSE)
+ xmlValue(xmlRoot(doc))
+ }
R> remove_HTML_markup(JSS_papers[1,"description"])
Error: XML content does not seem to be XML, nor to identify a file name ...

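JSS_papers stores metadata for a collection of papers downloaded from a journal. The entry under the description tag is the article's abstract. This one does not contain any tags: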

JSS_papers[1,"description"] = "The fit of a variogram model to spatially-distributed 
    data is often difficult to assess. A graphical diagnostic written in S-plus is   
    introduced that allows the user to determine both the general quality of the fit of a 
    variogram model, and to find specific pairs of locations that do not have measurements 
    that are consonant with the fitted variogram. It can help identify nonstationarity,    
    outliers, and poor variogram fit in general. Simulated data sets and a set of soil      
    nitrogen concentration data are examined using this graphical diagnostic."

【Comments on the question】:

It works for me. Can you post your sessionInfo()?

Tags: r xml-parsing

【Solution 1】:

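I ran into the same problem recently. There was a typo in the variable I had assigned the URL to. Double-check your variable s and see whether something is wrong there.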
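One way to act on this advice is to inspect the value before it reaches htmlTreeParse(). The sketch below uses only base R. The helper name check_markup_input is hypothetical and not part of the XML package, and the specific checks (list vs. character, length, missing or empty value) are assumptions about what might go wrong with s, not a confirmed diagnosis of the error in the question.

```r
# Hypothetical helper (not part of the XML package): fail early with a
# readable message if the input is not a single usable character string.
check_markup_input <- function(s) {
  if (is.list(s)) {
    stop("input is a list; extract the character element first, e.g. s[[1]]")
  }
  if (!is.character(s)) {
    stop("input has class '", class(s)[1], "', expected a character string")
  }
  if (length(s) != 1L || is.na(s) || !nzchar(trimws(s))) {
    stop("input must be a single, non-missing, non-empty string")
  }
  invisible(s)
}

# Look at what is actually being passed in, then validate it.
s <- "The fit of a variogram model is often difficult to assess."
str(s)
check_markup_input(s)
```

Calling check_markup_input(JSS_papers[1, "description"]) before remove_HTML_markup() would show whether the value is a plain string or something else, such as a list element or an empty entry.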

【Comments】: