Web scraping in R: "Error in open.connection(x, "rb") : HTTP error 403." (answers)

【Question title】: Webscraping in R "Error in open.connection(x, "rb") : HTTP error 403."
【Posted】: 2020-09-22 14:37:38
【Question description】:

I want to scrape the following page with the rvest package: 'https://www.idealista.com/alquiler-viviendas/girona-provincia/', but it gives me this error: 'Error in open.connection(x, "rb") : HTTP error 403.'

library(rvest)
library(curl)
library(xml2)

url= 'https://www.idealista.com/alquiler-viviendas/girona-provincia/'
webidealista=read_html(url)

Running it produces:

> webidealista = read_html(url)
Error in open.connection(x, "rb") : HTTP error 403.
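HTTP 403 means the server understood the request but refused it. One workaround that people often try, which is not part of the original thread and is not guaranteed to help on this site (it may use bot detection that a header alone does not get past), is to request the page through httr with a browser-like User-Agent and check the status code before parsing. The helper name `fetch_html` and the User-Agent string below are illustrative assumptions:

```r
library(httr)
library(rvest)

# Hypothetical helper: fetch a page with an explicit User-Agent header and
# fail with a readable message instead of the bare "HTTP error 403".
fetch_html <- function(url,
                       ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)") {
  resp <- GET(url, user_agent(ua))
  code <- status_code(resp)
  if (code != 200) {
    stop(sprintf("Request to %s failed with HTTP status %s", url, code))
  }
  # Parse the response body as HTML
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (needs network access; the server may still answer 403):
# page <- fetch_html("https://www.idealista.com/alquiler-viviendas/girona-provincia/")
```

If this still returns 403, the site is most likely rejecting non-browser clients, which is why driving a real browser is the more reliable route.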

Can someone help me solve this? I would be very grateful.

【Question discussion】:

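Please don't post images of code/data/errors: they can't be copied or searched (SEO), they break screen readers, and they may not display well on some mobile devices. Please add your data using dput and show the expected output for that data. Please read about How to ask a good question and how to give a Reproducible example.
What exactly do you want to scrape from the web page?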

Tags: r web-scraping

【Solution 1】:

I was able to get the page's HTML content with the following code:

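library(RSelenium)
# Start a Selenium Firefox container; host port 4445 maps to Selenium's 4444.
# shell() is Windows-only; on macOS/Linux use system() instead.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
Sys.sleep(5)  # give the container a few seconds to start before connecting
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()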
remDr$navigate("https://www.idealista.com/alquiler-viviendas/girona-provincia/")

# Dismiss the cookie-consent pop-up by clicking its "agree" button
web_Obj_Accept <- remDr$findElement("xpath", "//*[@id='didomi-notice-agree-button']/span")
web_Obj_Accept$clickElement()

# Get the rendered page source as a single HTML string
html_Content <- remDr$getPageSource()[[1]]
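The page source returned by `getPageSource()` is a plain character string, so it can be handed back to rvest for parsing. A minimal sketch follows; the helper name `extract_links` and the `"a"` selector are illustrative assumptions, so inspect the real page to choose selectors for the listing data you actually need:

```r
library(rvest)

# Hypothetical helper: parse the HTML string obtained via RSelenium and
# return the href of every link that has one.
extract_links <- function(page_source) {
  doc <- read_html(page_source)
  hrefs <- html_attr(html_elements(doc, "a"), "href")
  hrefs[!is.na(hrefs)]  # drop <a> tags without an href attribute
}

# Usage after the RSelenium steps:
# links <- extract_links(html_Content)
# remDr$close()  # close the browser session when finished
```

Parsing the string with rvest keeps the rest of the workflow identical to the original `read_html(url)` approach; only the download step moves to the browser.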

【Discussion】: