[Posted]: 2018-06-13 01:12:52
[Question]:
I am new to R. I want to parse an XML file (https://da5020.weebly.com/uploads/8/6/5/9/8659576/pubmedsample.jun18.xml), and everything parses correctly except the number of authors per article. I adapted a piece of code from "Efficiently get the number of children with specific name using XML and R":
authors_number = xpathSApply(xmldata, "count(//PubmedArticle/MedlineCitation/Article/AuthorList/Author/LastName)", xmlValue)
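But it returns the total number of authors in the whole XML file. The rest of the parsing is done by:
library(tidyverse)
library(lubridate) # provides make_date(); tidyverse releases before 2.0.0 do not attach it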
library(XML)
library(methods)
xmldata <- xmlParse("pubmedsample.jun18.xml", useInternalNodes = TRUE)
publication <- tibble(
  PMID = as.numeric(xpathSApply(xmldata, '//MedlineCitation/PMID', xmlValue)),
  ISSN = xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./Article/Journal/ISSN)")) {
      xpathSApply(x, "./Article/Journal/ISSN", xmlValue)
    } else {
      NA
    }
  }), # parse ISSN
  data_completed_year = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateCompleted/Year)")) {
      xpathSApply(x, "./DateCompleted/Year", xmlValue)
    } else {
      NA
    }
  })),
  data_completed_month = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateCompleted/Month)")) {
      xpathSApply(x, "./DateCompleted/Month", xmlValue)
    } else {
      NA
    }
  })),
  data_completed_day = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateCompleted/Day)")) {
      xpathSApply(x, "./DateCompleted/Day", xmlValue)
    } else {
      NA
    }
  })),
  data_revised_year = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateRevised/Year)")) {
      xpathSApply(x, "./DateRevised/Year", xmlValue)
    } else {
      NA
    }
  })),
  data_revised_month = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateRevised/Month)")) {
      xpathSApply(x, "./DateRevised/Month", xmlValue)
    } else {
      NA
    }
  })),
  data_revised_day = as.numeric(xpathSApply(xmldata, '//PubmedArticle/MedlineCitation', function(x) {
    if (xpathSApply(x, "boolean(./DateRevised/Day)")) {
      xpathSApply(x, "./DateRevised/Day", xmlValue)
    } else {
      NA
    }
  })),
  publication_type = as.character(xpathSApply(xmldata, '//PublicationTypeList/PublicationType[1]', xmlValue)), # first type of each article, if more than one
  article_title = as.character(xpathSApply(xmldata, '//ArticleTitle', xmlValue))) %>%
  mutate(completed_date = as.character(make_date(data_completed_year, data_completed_month, data_completed_day)),
         revised_date = as.character(make_date(data_revised_year, data_revised_month, data_revised_day))) %>%
  select(PMID, ISSN, completed_date, revised_date, publication_type, article_title)
Could someone show me how to get the number of authors for each article? Many thanks!
[Comments]:
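One possible direction, offered as a sketch rather than a verified fix for the linked file: an XPath count() evaluated from the document root sums over every article, so the count likely needs to be evaluated once per PubmedArticle node, using a relative path. The inline sample below is made up for illustration (it is not taken from pubmedsample.jun18.xml), and the names sample_xml, doc, and articles are hypothetical.

```r
library(XML)

# Made-up miniature of the PubMed layout, used only for illustration.
sample_xml <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID><Article>
    <AuthorList>
      <Author><LastName>Smith</LastName></Author>
      <Author><LastName>Jones</LastName></Author>
    </AuthorList>
  </Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID><Article/></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- xmlParse(sample_xml, asText = TRUE)

# One node per article; each count() is then evaluated relative to that node
# (note the leading "./"), not relative to the whole document.
articles <- getNodeSet(doc, "//PubmedArticle")
authors_number <- sapply(articles, function(article) {
  xpathSApply(article, "count(./MedlineCitation/Article/AuthorList/Author)")
})

print(authors_number)
```

Because the result has one entry per PubmedArticle (zero when an article has no AuthorList), it should line up with the other per-article columns and could be added to the tibble as another column, assuming the real file follows the same structure.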