xpathApply: How to pass multiple paths or nodes? (Answers)

【Question title】: xpathApply: How to pass multiple paths or nodes?
【Posted】: 2016-01-02 08:59:57
【Question description】:

# parse PubMed data 

library(XML) # xpath
library(rentrez) # entrez_fetch

pmids <- c("25506969","25032371","24983039","24983034","24983032","24983031","26386083",
           "26273372","26066373","25837167","25466451","25013473","23733758")

# The IDs above are a mix of books and journal articles
# ID 23733758 is a journal article and has no abstract
data.pubmed <- entrez_fetch(db = "pubmed", id = pmids, rettype = "xml",
               parsed = TRUE)
abstracts <-  xpathApply(data.pubmed, "//Abstract", xmlValue)
names(abstracts) <- pmids

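This works well if every record has an abstract. However, when a PMID (#23733758) has no published abstract (or the record is a book article or something else), that record is skipped, which causes the error: 'names' attribute [5] must be the same length as the vector [4]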
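The mismatch can be reproduced without calling PubMed at all. Below is a minimal sketch on a made-up two-record document (the element names Set, Rec and Id are invented for illustration, not PubMed's schema): "//Abstract" returns one item per matching node, not one item per record, so a record with no abstract simply contributes nothing.

```r
library(XML)

# Toy document, made up for illustration: two records, only the first has an abstract
doc <- xmlParse(
  "<Set>
     <Rec><Id>1</Id><Abstract>first abstract</Abstract></Rec>
     <Rec><Id>2</Id></Rec>
   </Set>",
  asText = TRUE)

ids <- c("1", "2")

# One list element per <Abstract> node found anywhere in the document
abstracts <- xpathApply(doc, "//Abstract", xmlValue)

length(ids)        # number of records requested
length(abstracts)  # number of abstracts found; shorter than ids here,
                   # so names(abstracts) <- ids would raise the length error
```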

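Q: How can I pass multiple paths/nodes so that I can extract journal articles, books, or reviews? Update: hrbrmstr's solution helps with the NA problem. But can xpathApply take multiple nodes, something like c(//Abstract, //ReviewArticle, etc.)?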

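【Comments on the question】:

You can use try() or tryCatch()
Hi Richard, I'm not sure I understand your solution. If my input is 5 PMIDs, my goal is to get 5 abstracts as output. If a record has no abstract, it should still return a null value (4 abstracts and 1 null). That way, when I add the PMIDs as names, I will know which PMID has no abstract.

Tags: r xpath ncbi rentrez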

【Solution 1】:

You have to attack it one element higher in the tree, so that every record yields exactly one result:

abstracts <-  xpathApply(data.pubmed, "//PubmedArticle//Article", function(x) {
  val <- xpathSApply(x, "./Abstract", xmlValue)
  if (length(val)==0) val <- NA_character_
  val
})
names(abstracts) <- pmids

str(abstracts)
## List of 5
## $ 24019382: chr "Adenocarcinoma of the lung, a leading cause of cancer death, frequently displays mutational activation of the KRAS proto-oncoge"| __truncated__
## $ 23927882: chr "Mutations in components of the mitogen-activated protein kinase (MAPK) cascade may be a new candidate for target for lung cance"| __truncated__
## $ 23825589: chr "Aberrant activation of MAP kinase signaling pathway and loss of tumor suppressor LKB1 have been implicated in lung cancer devel"| __truncated__
## $ 23792568: chr "Sorafenib, the first agent developed to target BRAF mutant melanoma, is a multi-kinase inhibitor that was approved by the FDA f"| __truncated__
## $ 23733758: chr NA

Per your comment, here is another way to do it:

str(xpathApply(data.pubmed, '//PubmedArticle//Article', function(x) {
  xmlValue(xmlChildren(x)$Abstract)
}))

## List of 5
##  $ : chr "Adenocarcinoma of the lung, a leading cause of cancer death, frequently displays mutational activation of the KRAS proto-oncoge"| __truncated__
##  $ : chr "Mutations in components of the mitogen-activated protein kinase (MAPK) cascade may be a new candidate for target for lung cance"| __truncated__
##  $ : chr "Aberrant activation of MAP kinase signaling pathway and loss of tumor suppressor LKB1 have been implicated in lung cancer devel"| __truncated__
##  $ : chr "Sorafenib, the first agent developed to target BRAF mutant melanoma, is a multi-kinase inhibitor that was approved by the FDA f"| __truncated__
##  $ : chr NA

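【Discussion】:

Hi hrbrmstr, thanks for the reply. I'm afraid that if I do it this way, it will be limited to PubmedArticle. In a bulk search of PMIDs, some may be books, reviews, or other types of publication.
You can change the XPath for the wrapper. (It can select more than one thing, with conditions combined using and.)
When I run it, it returns a list (I added the output to the example in the answer).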
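On the "multiple paths" part of the question: XPath itself has a union operator, |, so several paths can go into one expression string instead of a c(...) vector. Here is a minimal sketch on a made-up document. It assumes that book records arrive as PubmedBookArticle elements next to PubmedArticle, and that the first PMID element inside a record is that record's own ID; check both against what entrez_fetch actually returns for your IDs.

```r
library(XML)

# Toy document, made up for illustration: a journal article, a book record,
# and a journal article without an abstract
doc <- xmlParse(
  "<PubmedArticleSet>
     <PubmedArticle><MedlineCitation><PMID>1</PMID>
       <Article><Abstract><AbstractText>journal abstract</AbstractText></Abstract></Article>
     </MedlineCitation></PubmedArticle>
     <PubmedBookArticle><BookDocument><PMID>2</PMID>
       <Abstract><AbstractText>book abstract</AbstractText></Abstract>
     </BookDocument></PubmedBookArticle>
     <PubmedArticle><MedlineCitation><PMID>3</PMID>
       <Article></Article>
     </MedlineCitation></PubmedArticle>
   </PubmedArticleSet>",
  asText = TRUE)

# The union operator | selects record wrappers of either kind in one query
records <- getNodeSet(doc, "//PubmedArticle | //PubmedBookArticle")

# One value per record: the abstract text, or NA when the record has none
abstracts <- sapply(records, function(rec) {
  val <- xpathSApply(rec, ".//Abstract", xmlValue)
  if (length(val) == 0) NA_character_ else paste(val, collapse = " ")
})

# Name each value by the PMID found inside the same record (assumption: the
# first PMID in the record is its own ID), not by the order of the input vector
names(abstracts) <- sapply(records, function(rec) {
  xpathSApply(rec, ".//PMID", xmlValue)[1]
})

abstracts
```

Taking the names from inside each record, rather than from pmids, means the result does not depend on the union returning records in the same order as the input IDs, which I would not rely on.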