【Question title】: Download multiple data tables from a website
【Posted】: 2020-10-28 01:32:37
【Question】:

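I am trying to read these five tables from this Alzheimer's disease database into R. How can I do that? If they were CSV files on the web, I could read them with read.table, but how do you read tables that are not provided as files and only appear in the page itself?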

The tables I want to read are here: Link

【Question comments】:

    Tags: r web url


    【Solution 1】:

    Here is one approach using rvest and xml2:

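    First, open the page and find where the relevant table sits in the HTML. In Chrome, for example, press F12, switch to the Elements tab, and expand the elements until hovering over one highlights the table on the page.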

    Right-click that element and choose "Copy XPath".

    From there, the rest is straightforward:

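    library(xml2)
    library(rvest)
    library(magrittr) # for %>%
    url <- "https://www.alzforum.org/mutations/search?genes=&diseases%5B%5D=145&keywords-entry=&keywords=#results"
    # XPath copied from the browser's developer tools
    my_xpath <- '//*[@id="results"]/article/div/table'
    # html_table() returns a list with one data frame per matched table
    tables <- read_html(url) %>% html_nodes(xpath = my_xpath) %>% html_table()
    # First 10 rows and first 4 columns of the first table
    tables[[1]][1:10, 1:4]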
    #   Mutation                 Clinical Phenotype              Pathogenicity                                                 Neuropathology
    #1     A201V None, Parkinson's Disease Dementia        AD : Not Pathogenic                                                Not applicable.
    #2     A235V                Alzheimer's Disease AD : Unclear Pathogenicity                                                       Unknown.
    #3     D243N                Alzheimer's Disease AD : Unclear Pathogenicity                                                       Unknown.
    #4     E246K                Alzheimer's Disease AD : Unclear Pathogenicity                                                       Unknown.
    #5     E296K                Alzheimer's Disease AD : Unclear Pathogenicity                                                       Unknown.
    #6     P299L                Alzheimer's Disease AD : Unclear Pathogenicity                                                       Unknown.
    #7     R468H                               None        AD : Not Pathogenic                                                Not applicable.
    #8     A479S                               None        AD : Not Pathogenic                                                Not applicable.
    #9     K496Q                Alzheimer's Disease AD : Unclear Pathogenicity One reported carrier of this variant had autopsy-confirmed AD.
    #10    A500T                               None        AD : Not Pathogenic                                                Not applicable.
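
An XPath copied from the browser does not always match the HTML that R downloads. Browsers insert `tbody` elements into tables, for instance, and some pages build content with JavaScript. A quick way to catch this is to count the matched nodes before calling `html_table()`. The sketch below runs offline on a toy page: the markup is a hypothetical stand-in, not the real alzforum.org HTML, and only the XPath is taken from the answer.

```r
library(xml2)
library(rvest)

# Toy page standing in for the real site (hypothetical markup)
page <- read_html('
<html><body>
  <div id="results">
    <article><div><table>
      <tr><th>Mutation</th><th>Pathogenicity</th></tr>
      <tr><td>A201V</td><td>AD : Not Pathogenic</td></tr>
    </table></div></article>
  </div>
</body></html>')

my_xpath <- '//*[@id="results"]/article/div/table'
nodes <- html_nodes(page, xpath = my_xpath)

# Zero matches means the XPath does not fit the HTML that R actually parsed
n_matches <- length(nodes)
if (n_matches == 0) stop("XPath matched no nodes: ", my_xpath)

# html_table() on a node set returns a list of data frames
first_table <- html_table(nodes)[[1]]
print(n_matches)
print(names(first_table))
```

If the count is zero on the real page, a looser expression such as `//*[@id="results"]//table` may be worth trying before assuming the content is rendered by JavaScript.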
    

    Repeat the process as needed to download the other tables.
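
Rather than copying one XPath per table, a broader XPath can collect several tables in a single pass. The following is a minimal sketch, assuming all the wanted tables sit somewhere under the element with id `results`; that has not been checked against the live page. The helper name `read_tables_by_id` and the toy markup are hypothetical, and the example runs offline.

```r
library(xml2)
library(rvest)

# Hypothetical helper: return every table nested under the element with this id
read_tables_by_id <- function(page, id = "results") {
  xpath <- sprintf('//*[@id="%s"]//table', id)
  html_table(html_nodes(page, xpath = xpath))
}

# Toy page with two tables (placeholder data, not from the real database)
page <- read_html('
<html><body><div id="results">
  <article><div><table>
    <tr><th>Mutation</th><th>Note</th></tr>
    <tr><td>toy-1</td><td>first</td></tr>
    <tr><td>toy-2</td><td>second</td></tr>
  </table></div></article>
  <article><div><table>
    <tr><th>Mutation</th><th>Note</th></tr>
    <tr><td>toy-3</td><td>third</td></tr>
  </table></div></article>
</div></body></html>')

tables <- read_tables_by_id(page)

# Save each table as its own CSV file in a temporary directory
out_dir <- tempdir()
for (i in seq_along(tables)) {
  out_file <- file.path(out_dir, sprintf("table_%d.csv", i))
  write.csv(tables[[i]], out_file, row.names = FALSE)
}
```

For the real site, the same helper would be called as `read_tables_by_id(read_html(url))`, with `url` as defined in the answer.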

    【Comments】:
