【Question title】: Read an HTML table from Dropbox with the XML package
【Posted】: 2016-04-04 11:52:28
【Question】:

I am trying to use the XML package to read an HTML table stored in Dropbox, but the XML::readHTMLTable function does not work on the HTML file in Dropbox and I don't know why. Can anyone help me?

My code:

require(httr)
require(XML) 

# Open the HTML table file in Dropbox

FILE <- GET(url="https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0") 

# Read the table

tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[2],
                        header = c("empresa","desc_projeto","desc_regiao",
                                   "cadastrador_por","cod_talhao","descricao",
                                   "formiga_area","qtd_destruido","latitude",
                                   "longitude","data_cadastro"),
                        colClasses = c("character","character","character",
                                       "character","character","character",
                                       "character","character","character",
                                       "character","character"),
                        trim = TRUE, stringsAsFactors = FALSE
                        )
head(FE_tab) ### Doesn’t work

【Question comments】:

  • Make your URL https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0&raw=1

Tags: xml r html-table httr rvest


【Answer 1】:

You can do it like this:

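require(rvest)
# dl=1 makes Dropbox serve the raw .htm file instead of its preview page
doc <- read_html("https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=1")
# html_table() returns a list of data frames, one per <table>; take the first
FE_tab <- doc %>% html_table() %>% `[[`(1)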

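In your code, you need to end the URL with ?dl=1. Otherwise you get the source code of the Dropbox preview page that is shown when you open https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0 in a browser, not the HTML file itself.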
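If you handle several shared links, the dl=0 to dl=1 switch can be done programmatically. The following is a minimal base-R sketch; `dropbox_direct_url()` is a hypothetical helper (not part of httr, XML or rvest), and it assumes the share link either has no query string or carries the `dl` parameter in the usual `?dl=0` / `&dl=0` form.

```r
# Hypothetical helper (not from httr, XML or rvest): turn a Dropbox share
# link into a direct-download link by forcing the query parameter dl=1.
dropbox_direct_url <- function(url) {
  if (grepl("[?&]dl=0(&|$)", url, perl = TRUE)) {
    # Replace an existing dl=0 with dl=1, keeping any following parameters
    sub("([?&])dl=0(&|$)", "\\1dl=1\\2", url, perl = TRUE)
  } else if (grepl("[?&]dl=1(&|$)", url, perl = TRUE)) {
    url                       # already a direct-download link
  } else if (grepl("?", url, fixed = TRUE)) {
    paste0(url, "&dl=1")      # other query parameters are present
  } else {
    paste0(url, "?dl=1")      # no query string at all
  }
}

share_url <- "https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=0"
dropbox_direct_url(share_url)
```

The result can then be passed to GET() or read_html() in place of the hand-edited URL.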

If you still want to use the XML package, do the following:

require(httr)
require(XML)
# Note dl=1: download the file itself rather than the Dropbox preview page
FILE <- GET(url="https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=1")
# getNodeSet() returns a list of <table> nodes; this file contains only one
tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[[1]],
                        header = c("empresa","desc_projeto","desc_regiao",
                                   "cadastrador_por","cod_talhao","descricao",
                                   "formiga_area","qtd_destruido","latitude",
                                   "longitude","data_cadastro"),
                        colClasses = c("character","character","character",
                                       "character","character","character",
                                       "character","character","character",
                                       "character","character"),
                        trim = TRUE, stringsAsFactors = FALSE
                        )
head(FE_tab)

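Since tables is a list, use tables[[1]] (double brackets) to get the table node itself, and use 1 instead of 2, because the list has only one element: the file contains just one table.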
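The difference between single and double brackets can be seen with an ordinary R list. This is only an analogy, assuming a plain one-element list as a stand-in for the node set returned by getNodeSet() (the real object also carries the class "XMLNodeSet"):

```r
# A plain one-element list standing in for the result of getNodeSet():
# the downloaded file contains a single <table>.
tables <- list("the only table")

length(tables)      # 1, so there is no second element
class(tables[1])    # "list": single brackets return a sub-list
class(tables[[1]])  # "character": double brackets return the element itself
tables[2]           # out of range with single brackets: a list holding NULL

# Double brackets with an out-of-range index raise an error instead
err <- tryCatch(tables[[2]], error = function(e) conditionMessage(e))
err                 # "subscript out of bounds"
```

readHTMLTable() needs the node itself, which is why tables[[1]] works where tables[2] does not.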

【Comments】:

  • Both packages work great!! Thanks a lot, Floo0