【发布时间】:2021-10-27 06:41:35
【问题描述】:
我想在 R 中编码,它可以下载此 URL 上给出的所有 pdf:https://www.rbi.org.in/scripts/AnnualPublications.aspx?head=Handbook%20of%20Statistics%20on%20Indian%20Economy 然后下载文件夹中的所有pdf。我在https://towardsdatascience.com 的帮助下尝试了以下代码,但代码错误为
library(tidyverse)
library(rvest)
library(stringr)
library(purrr)
page <- read_html("https://www.rbi.org.in/scripts/AnnualPublications.aspx?
head=Handbook%20of%20Statistics%20on%20Indian%20Economy") %>%
raw_list <- page %>% # takes the page above for which we've read the html
html_nodes("a") %>% # find all links in the page
html_attr("href") %>% # get the url for these links
str_subset("\\.pdf") %>% # find those that end in pdf only
str_c("https://rbi.org.in", .) %>% # prepend the website to the url
map(read_html) %>% # take previously generated list of urls and read them
map(html_node, "#raw-url") %>% # parse out the 'raw' url - the link for the download button
map(html_attr, "href") %>% # return the set of raw urls for the download buttons
str_c("https://www.rbi.org.in", .) %>% # prepend the website again to get a full url
for (url in raw_list)
{ download.file(url, destfile = basename(url), mode = "wb")
}
我无法解释为什么代码会出错。如果有人可以帮助我。
【问题讨论】:
标签: html r json url web-scraping