【Question title】: Unable to get results when web scraping with rvest
【Posted】: 2019-12-19 14:03:21
【Question description】:

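I want to get the locations of ATMs in Spain from the VISA ATM locator. The results for Spain are shown in a table, but I can't figure out how to access its elements. I tried: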

library(rvest)

link <- "https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:Spain))"
visa_webpage <- read_html(link)
visa_webpage %>%
  html_nodes("visaATMResultListItem") %>%
  html_text()
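Two things work against this snippet. The selector `"visaATMResultListItem"` is a type selector that matches elements *named* `visaATMResultListItem`; if that is a class name, CSS needs a leading dot (`".visaATMResultListItem"`). More fundamentally, everything after `#` in the URL is a fragment: the page's JavaScript reads it in the browser, but it is not part of the HTTP request, so `read_html()` gets the same page whatever query follows `#`. A small base-R sketch showing the URL the server actually receives (the helper `requested_url` is hypothetical, not part of rvest):

```r
# URL from the question: everything after "#" is a fragment that the page's
# JavaScript reads in the browser; it is not sent as part of the HTTP request.
link <- "https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:Spain))"

# Hypothetical helper: drop the fragment to get the URL the server receives
requested_url <- function(url) sub("#.*$", "", url)

print(requested_url(link))
# [1] "https://www.visa.com/atmlocator/mobile/index.jsp"
```

This is consistent with the comment below: the "Spain" query only takes effect once JavaScript runs, which rvest does not do.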

【Question comments】:

  • The table is probably built dynamically by JavaScript, so rvest won't be able to capture it. You could try another tool, such as RSelenium.

Tags: r rvest


【Solution 1】:

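The data is retrieved dynamically by a jQuery request that the browser makes. You can send a simplified version of that request yourself to retrieve the data: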

library(httr)
library(stringr)
library(jsonlite)

headers = c('User-Agent' = 'Mozilla/5.0')

params = list(
  'callback' = 'jQuery112403101782845756018_1577837576284',
  'request' = '{"wsRequestHeaderV2":{"requestTs":"","applicationId":"VATMLOC","requestMessageId":"test12345678","userId":"CDISIUserID","userBid":"10000108","correlationId":"909420141104053819418"},"requestData":{"culture":"en-US","distance":"60","distanceUnit":"mi","metaDataOptions":0,"location":{"address":null,"placeName":"Spain","geocodes":{"latitude":"40.227949660000036","longitude":"-3.6460631049999392"}},"options":{"sort":{"primary":"distance","direction":"asc"},"range":{"start":0,"count":8},"operationName":"and","findFilters":[{"filterName":"PLACE_NAME","filterValue":""},{"filterName":"CARD_ACCEPT","filterValue":""},{"filterName":"OPER_HRS","filterValue":""},{"filterName":"AIRPORT_CD","filterValue":""},{"filterName":"WHEELCHAIR","filterValue":""},{"filterName":"BRAILLE_AUDIO","filterValue":""},{"filterName":"BALANCE_INQUIRY","filterValue":""},{"filterName":"CHIP_CAPABLE","filterValue":""},{"filterName":"PIN_CHANGE","filterValue":""},{"filterName":"RESTRICTED","filterValue":""},{"filterName":"PLUS_ALLIANCE_NO_SURCHARGE_FEE","filterValue":""},{"filterName":"ACCEPTS_PLUS_SHARED_DEPOSIT","filterValue":""},{"filterName":"V_PAY_CAPABLE","filterValue":""},{"filterName":"READY_LINK","filterValue":""}],"useFirstAmbiguous":true}}}',
  '_' = '1577837576288'
)

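# Call the same endpoint the page uses; the response is JSONP (JSON wrapped in the callback)
r <- httr::GET(url = 'https://www.visa.com/atmlocator_services/rest/findNearByATMs', httr::add_headers(.headers = headers), query = params)

# Read the body as text, strip the jQuery callback wrapper, then parse the JSON inside it
body <- httr::content(r, as = "text", encoding = "UTF-8")
data <- jsonlite::fromJSON(str_match(body, 'jQuery112403101782845756018_1577837576284\\((.*)\\)')[1, 2])
# Collect the found ATM locations into a data frame
locations <- data.frame(data$responseData$foundATMLocations[1])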
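The regex above hard-codes the jQuery callback name, so it stops matching if the `callback` parameter is ever changed. A minimal base-R sketch of a more general JSONP unwrapper, assuming the response has the usual `callbackName(...)` shape (the helper name `strip_jsonp` is hypothetical):

```r
# Hypothetical helper: return the JSON text inside a JSONP response such as
# 'callbackName({...})'. The callback name is not hard-coded: the pattern keeps
# everything between the first "(" and the last ")", allowing a trailing ";".
# Input that does not look like JSONP is returned unchanged.
strip_jsonp <- function(txt) {
  # (?s) lets "." match newlines too, in case the JSON spans several lines
  sub("(?s)^[^(]*\\((.*)\\)\\s*;?\\s*$", "\\1", txt, perl = TRUE)
}

example <- 'jQuery112403101782845756018_1577837576284({"responseData":{}})'
print(strip_jsonp(example))
# [1] "{\"responseData\":{}}"
```

In the answer's code it could stand in for the `str_match()` call, e.g. `data <- jsonlite::fromJSON(strip_jsonp(httr::content(r, as = "text", encoding = "UTF-8")))`.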
