【Posted】: 2015-05-08 12:27:53
【Question】:
I have a data frame that looks like this:
# Labels are ordered to match the URLs (US first, then Canada)
country <- c("US", "Canada", "Japan", "China")
url <- c("http://en.wikipedia.org/wiki/United_States", "http://en.wikipedia.org/wiki/Canada",
         "http://en.wikipedia.org/wiki/Japan", "http://en.wikipedia.org/wiki/China")
# Keep the URLs as character strings rather than factors
df <- data.frame(country, url, stringsAsFactors = FALSE)
  country                                        url
1      US http://en.wikipedia.org/wiki/United_States
2  Canada        http://en.wikipedia.org/wiki/Canada
3   Japan         http://en.wikipedia.org/wiki/Japan
4   China         http://en.wikipedia.org/wiki/China
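Using rvest, I want to scrape the table of contents for each URL and bind the results into a single output.
This code extracts the table of contents for a single URL: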
library(rvest)
# read_html() takes a single URL, so pick one element of the vector
toc <- read_html(url[1]) %>%
  html_nodes(".toctext") %>%
  html_text()
Desired output:
country  toc
US       Etymology
         History
         Native American and European contact
         Settlements
         ...
Canada   Etymology
         History
         Aboriginal peoples
         European colonization
         ...etc
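One possible way to get from the single-URL snippet to a combined table is to wrap the extraction in a function, apply it to each page, and row-bind the pieces. The sketch below is only an illustration under stated assumptions: `toc_for` is a hypothetical helper name, and the `pages` list uses small inline HTML strings as offline stand-ins for the Wikipedia pages so the example runs without network access. With live data, each element of `pages` would instead come from calling `read_html()` on one URL.

```r
library(rvest)

# Hypothetical helper (not from the original post): pull the ".toctext"
# entries out of one parsed page and label each entry with its country.
toc_for <- function(country, page) {
  entries <- page %>%
    html_nodes(".toctext") %>%
    html_text()
  data.frame(country = rep(country, length(entries)),
             toc = entries,
             stringsAsFactors = FALSE)
}

# Assumption: inline HTML stand-ins for the real pages. For live data one
# could try something like: pages <- setNames(lapply(df$url, read_html), df$country)
pages <- list(
  US = read_html('<div><span class="toctext">Etymology</span><span class="toctext">History</span></div>'),
  Canada = read_html('<div><span class="toctext">Etymology</span><span class="toctext">Aboriginal peoples</span></div>')
)

# Build one small data frame per page, then stack them into a single output
toc_list <- lapply(names(pages), function(nm) toc_for(nm, pages[[nm]]))
result <- do.call(rbind, toc_list)
print(result)
```

Note that this repeats the country label on every row rather than showing it only on the first row of each group, which is usually easier to work with downstream; the grouped layout in the desired output could be produced afterwards for display.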
【Comments】: