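【Posted】: 2020-10-28 01:32:37
【Question】:
I am trying to read the five tables from this Alzheimer's disease database into R. What is the solution? If it were a CSV file on the web, I could read it with read.table, but how do you read tables that are not available as such a file?
The tables I want to read are here: Link
【Question discussion】: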
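Here is one approach using rvest and xml2:
First, open the page and locate the table you want. In Chrome, for example, press F12, switch to the Elements tab, and expand the elements until the table is highlighted when you hover over an element.
Right-click that element and choose "Copy XPath".
Now it is straightforward: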
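library(xml2)
library(rvest)
library(magrittr)  # for %>% (rvest also re-exports it)
url <- "https://www.alzforum.org/mutations/search?genes=&diseases%5B%5D=145&keywords-entry=&keywords=#results"
my_xpath <- '//*[@id="results"]/article/div/table'
# Parse the page, select the table node(s) by XPath, and convert each to a data frame.
# Named `tables` rather than `table` to avoid masking base::table().
tables <- read_html(url) %>% html_nodes(xpath = my_xpath) %>% html_table()
# html_table() returns a list here; show rows 1-10 and columns 1-4 of the first table.
tables[[1]][1:10, 1:4]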
#   Mutation Clinical Phenotype                 Pathogenicity              Neuropathology
#1  A201V    None, Parkinson's Disease Dementia AD : Not Pathogenic        Not applicable.
#2  A235V    Alzheimer's Disease                AD : Unclear Pathogenicity Unknown.
#3  D243N    Alzheimer's Disease                AD : Unclear Pathogenicity Unknown.
#4  E246K    Alzheimer's Disease                AD : Unclear Pathogenicity Unknown.
#5  E296K    Alzheimer's Disease                AD : Unclear Pathogenicity Unknown.
#6  P299L    Alzheimer's Disease                AD : Unclear Pathogenicity Unknown.
#7  R468H    None                               AD : Not Pathogenic        Not applicable.
#8  A479S    None                               AD : Not Pathogenic        Not applicable.
#9  K496Q    Alzheimer's Disease                AD : Unclear Pathogenicity One reported carrier of this variant had autopsy-confirmed AD.
#10 A500T    None                               AD : Not Pathogenic        Not applicable.
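To see why the result is indexed with [[1]], it may help to run the same steps offline on a small HTML string. The sketch below is only an illustration: the HTML is a made-up stand-in that copies the nesting the XPath expects, not the real page, and the variable names are assumptions.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for the page, nested the way the XPath expects:
# an element with id="results" > article > div > table.
html <- '
<html><body>
  <div id="results">
    <article><div>
      <table>
        <tr><th>Mutation</th><th>Pathogenicity</th></tr>
        <tr><td>A201V</td><td>AD : Not Pathogenic</td></tr>
        <tr><td>A235V</td><td>AD : Unclear Pathogenicity</td></tr>
      </table>
    </div></article>
  </div>
</body></html>'

page  <- read_html(html)  # read_html() also accepts literal HTML text
nodes <- html_nodes(page, xpath = '//*[@id="results"]/article/div/table')

n_matched <- length(nodes)   # how many table nodes the XPath matched
tables <- html_table(nodes)  # a list with one data frame per matched node
first  <- tables[[1]]        # the first (here: only) table
print(first)
```

If n_matched is 0 on the real page, the copied XPath probably does not match the HTML that R receives, which can differ from what the browser's Elements tab displays.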
Repeat the process as needed to download the other tables.
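One possible way to avoid repeating the steps by hand is to wrap them in a small function and apply it to a vector of page addresses. This is a sketch, not part of the original answer: read_first_table and read_all_tables are hypothetical helper names, it assumes the other result pages share the same XPath (which should be checked for each page), and only the URL given above is listed because the addresses of the other tables are not stated here.

```r
library(xml2)
library(rvest)

# Hypothetical helper: read one page and return the first table matching `xpath`.
# `source` can be anything read_html() accepts (a URL, a file path, or literal HTML).
read_first_table <- function(source, xpath) {
  nodes <- html_nodes(read_html(source), xpath = xpath)
  if (length(nodes) == 0) {
    stop("No table matched the XPath: ", xpath)
  }
  html_table(nodes)[[1]]
}

# Hypothetical helper: apply read_first_table() to each source.
# Returns a list of data frames that keeps the names of `sources`.
read_all_tables <- function(sources, xpath) {
  lapply(sources, read_first_table, xpath = xpath)
}

my_xpath <- '//*[@id="results"]/article/div/table'
urls <- c(
  page1 = "https://www.alzforum.org/mutations/search?genes=&diseases%5B%5D=145&keywords-entry=&keywords=#results"
  # add the addresses of the other result pages here
)
```

Calling read_all_tables(urls, my_xpath) would then fetch every listed page and return one data frame per entry, so the first table is available as the element named page1.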
【Discussion】: