【Posted】: 2015-03-04 20:01:47
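【Question】:
How can I use the rvest::follow_link() function in a loop to scrape linked web pages?
Use case:
- Identify all cast members of The Lego Movie
- Follow the link for every cast member
- For each cast member, grab the table of every movie (+ year)
The selectors I need are as follows: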
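library(rvest)
# read_html() supersedes the deprecated html()
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
# keep the parsed page in lego_movie; store the extracted text separately
cast <- lego_movie %>%
  html_nodes(".itemprop , .character a") %>%
  html_text()
# selector for following the cast links
(".itemprop .itemprop")
# selector for grabbing the table of all movies and dates for each cast member
(".year_column , b a")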
Desired output:
castMember       movie     year
Will Arnett      Lego      2017
Will Arnett      BoJack    2014
Will Arnett      Wander    2014
............
Elizabeth Banks  Moonbeam  2015
Elizabeth Banks  Wet Hot   2015
............
Alison Brie      Get Hard  2015
Alison Brie      GetaJob   2015
.....etc.....
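To make the three steps of the use case concrete, here is a rough sketch of the loop, not a definitive solution. It does not use follow_link(); instead it reads each cast member's href directly with read_html(), which is a swapped-in approach. The helper names scrape_filmography and scrape_cast, the read_page argument, and the ".itemprop a" link selector are all assumptions made for illustration, and the selectors may not match IMDb's current markup.

```r
library(rvest)

# Hypothetical helper: one row per film for a single cast member.
# Assumes "b a" and ".year_column" match the same number of nodes, in order;
# data.frame() raises an error if the two counts differ.
scrape_filmography <- function(name, url, read_page = read_html) {
  page <- read_page(url)
  movie <- html_text(html_nodes(page, "b a"))
  year <- trimws(html_text(html_nodes(page, ".year_column")))
  data.frame(
    castMember = rep(name, length(movie)),
    movie = movie,
    year = year,
    stringsAsFactors = FALSE
  )
}

# Hypothetical driver: find the cast links on the title page, visit each one,
# and stack the per-member tables into one data frame.
# link_selector is an assumed selector for the <a> elements of cast members.
scrape_cast <- function(title_url, link_selector = ".itemprop a",
                        read_page = read_html) {
  links <- html_nodes(read_page(title_url), link_selector)
  cast_names <- trimws(html_text(links))
  # hrefs are usually relative, so resolve them against the title URL
  cast_urls <- xml2::url_absolute(html_attr(links, "href"), title_url)
  rows <- lapply(seq_along(cast_urls), function(i) {
    scrape_filmography(cast_names[i], cast_urls[i], read_page)
  })
  do.call(rbind, rows)
}
```

It could be called as scrape_cast("http://www.imdb.com/title/tt1490017/"). The read_page argument exists only so the logic can be tried against local HTML without touching the network.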
【Discussion】:
Tags: r web-scraping rvest