【Question title】: Rvest output returning "character(0)" instead of the column highlighted with SelectorGadget
【Posted】: 2015-06-26 02:40:59
【Question description】:

I am trying to use rvest to scrape a few columns from the Gates Foundation's table of awarded grants. Here is my code:

library(rvest)
data1 <- html('http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015')
table1 <- data1 %>% html_nodes('td:nth-child(5) , td:nth-child(3)') %>% html_text()
table1

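Evaluating table1 prints the following:

character(0)

Is there something wrong with the CSS selector I am using? Or is this kind of table incompatible with rvest?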

【Comments】:

  • Which data do you want to get from this site?

Tags: r web-scraping rvest


【Answer 1】:
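
Before the fix, it may help to see why the rvest-only attempt comes back empty. Everything after # in the URL is a fragment that the browser keeps to itself, so the server never sees the program and year filters. The table rows are most likely filled in by JavaScript after the page loads, and rvest only parses the HTML as delivered. The sketch below reproduces that situation offline. Both HTML strings, the grants id, the app.js script name, and all cell values are made up for illustration and are not taken from the real page.

```r
library(rvest)

# Hypothetical stand-in for what the server sends: the table body is empty
# because a script is expected to fill it in later, inside the browser.
static_html <- '<html><body>
  <table id="grants"><tbody></tbody></table>
  <script src="app.js"></script>
</body></html>'

# Hypothetical stand-in for the DOM after the script has run.
rendered_html <- '<html><body>
  <table id="grants"><tbody>
    <tr><td>Grantee A</td><td>2015</td><td>College-Ready</td><td>WA</td><td>$100,000</td></tr>
    <tr><td>Grantee B</td><td>2015</td><td>Community Grants</td><td>OR</td><td>$200,000</td></tr>
  </tbody></table>
</body></html>'

# Same selector as in the question.
selector <- "td:nth-child(5) , td:nth-child(3)"

static_cells   <- read_html(static_html)   %>% html_nodes(selector) %>% html_text()
rendered_cells <- read_html(rendered_html) %>% html_nodes(selector) %>% html_text()

static_cells            # character(0): no td elements exist yet
length(rendered_cells)  # 4: two cells from each of the two rows
```

If a selector matches nothing in the delivered HTML but does match in the browser's developer tools, the selector itself is fine and the page needs a rendering step first.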

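Here is sample code that retrieves the last two columns using RSelenium (you need the phantomjs driver in your working directory to run the code below). See here for details.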

library(RSelenium)
library(rvest)

# Start PhantomJS and open a browser session against it
pJS <- phantom()
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = FALSE)
remDr$navigate("http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015")

# Parse the page source from the browser; html() is deprecated in newer versions of rvest
test.html <- read_html(remDr$getPageSource()[[1]])
test.text <- test.html %>%
  html_nodes("td:nth-child(5) , td:nth-child(3)") %>%
  html_text()

# The cells alternate program, amount, so fill a two-column matrix row by row
test.df <- data.frame(matrix(test.text, ncol = 2, byrow = TRUE))
names(test.df) <- c("program", "amount")

# Close the browser session and stop the PhantomJS process
remDr$close()
pJS$stop()

test.df
                    program     amount
1     Postsecondary Success   $498,727
2          Community Grants   $200,000
3  Global Policy & Advocacy $1,035,523
4     Postsecondary Success    $95,000
5     Postsecondary Success    $25,000
6             College-Ready $1,257,526
7             College-Ready $1,066,403
8    Strategic Partnerships    $50,000
9             College-Ready $1,195,581
10            College-Ready   $300,000
11            College-Ready   $100,000
12            College-Ready    $21,200
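
The amount column comes back as text such as $498,727, which cannot be summed or sorted numerically. Here is a minimal sketch of the reshaping step plus a numeric conversion, using the first two rows of the output above as hard-coded input. The names cells, grants, and amount_usd are hypothetical and not part of the original answer.

```r
# Cell text in document order: program, amount, program, amount, ...
# (values copied from the first two rows of the output above)
cells <- c("Postsecondary Success", "$498,727",
           "Community Grants",      "$200,000")

# byrow = TRUE fills the matrix one row at a time, so each consecutive
# pair of cells becomes one row of the data frame.
grants <- data.frame(matrix(cells, ncol = 2, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(grants) <- c("program", "amount")

# Strip "$" and "," so the amounts can be treated as numbers.
grants$amount_usd <- as.numeric(gsub("[$,]", "", grants$amount))

sum(grants$amount_usd)  # 698727
```

Setting stringsAsFactors = FALSE keeps both columns as plain character vectors on older versions of R, where data.frame() would otherwise convert them to factors.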
