Web scraping with R - How to click on a button in a dynamic web page using AJAX? Answers

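【Question title】: Web scraping with R - How to click on a button in a dynamic web page using AJAX?
【Posted】: 2019-03-09 21:14:13
【Question description】:

How can I modify the following R code to extract the Quarterly data? I am trying to get data from Yahoo Finance, which is a dynamic web page using AJAX, so the address stays the same for both the annual and the quarterly data. The selector for the button is "button.P\(0px\)". So far I have managed to extract the annual data from the AAPL income statement table, but I am still struggling to get the quarterly data. Any suggestions are welcome :)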

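library(rvest)
# Financials page for AAPL; the same URL serves both annual and quarterly views
url <- 'https://finance.yahoo.com/quote/AAPL/financials?p=AAPL'
# Fetches the static HTML only, so the default (annual) view is what gets parsed
webpage <- read_html(url)
# Select the income statement table by its CSS class and convert it to a data frame
tableIS <- html_table(html_nodes(webpage,'table.Lh\\(1\\.7\\)'), header = NA, trim = TRUE, fill = FALSE, dec = ".")
print(tableIS)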

【Question discussion】:

Tags: javascript r ajax web-scraping

【Solution 1】:

This should get you going in the right direction.

result <- read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:AAPL&region=usa&culture=en-US&cur=&reportType=is&period=3&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)
print(result)

You may also be interested in this.

# financial metrics and ratios (skip the two title rows, drop column 12)
spreadsheet <- read.csv("http://financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL", header = TRUE, stringsAsFactors = FALSE, skip = 2)[, -12]
View(spreadsheet)

【Discussion】:

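Thanks bro, I really appreciate your help. It works great! Just thinking out loud: do you think it would be possible to write similar code to scrape the Yahoo Finance page? I have been watching the JS container while clicking the "Annual/Quarterly" button, and this is the message: //cdn.mookie1.com/containr.js?at=1&anId=6862&advId=434861&campId=26326417&pubId=273&chanId=134644197&placementId=300x250&adsafe_par=&bidurl= https%3A%2F%2Ffinance.yahoo.com%2Fquote%2FAAPL%2Ffinancials&bidPr=&uId=&impId=3899873530262390300&BEGIN__ADSAFE=&prc=1047638&END__ADSAFE=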
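I looked into the same thing a long time ago. I never figured out why annual and quarterly both use the exact same URL. I think you would need to use Selenium to click the "Quarterly" link and then scrape the data once the page refreshes. I don't know how to do that in R. I can do it in Python, but that doesn't help much here.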
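
For what it's worth, the Selenium idea from the comment above can be sketched in R with the RSelenium package. This is only a rough, untested sketch and not a confirmed solution: it assumes RSelenium and a local Firefox are installed, the helper names `extract_tables` and `scrape_quarterly` are made up for illustration, the two CSS selectors are simply the ones quoted in the question (they may match a different button or no longer exist on the live page), and the fixed `Sys.sleep()` waits are a crude stand-in for a proper wait condition.

```r
library(RSelenium)
library(rvest)

# Hypothetical helper: parse every table matching `table_css` out of an HTML string.
extract_tables <- function(page_source, table_css) {
  page <- read_html(page_source)
  html_table(html_nodes(page, table_css), header = NA, trim = TRUE)
}

# Hypothetical helper: open the page in a real browser, click the button,
# then parse the tables from the re-rendered page.
scrape_quarterly <- function(url, button_css, table_css, wait_secs = 3) {
  # Assumption: Firefox is available locally; rsDriver() starts a Selenium server and a client
  driver <- rsDriver(browser = "firefox", verbose = FALSE)
  remote <- driver$client
  # Always shut the browser and the server down, even if a step below fails
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  remote$navigate(url)
  Sys.sleep(wait_secs)  # crude wait for the initial render

  button <- remote$findElement(using = "css selector", value = button_css)
  button$clickElement()
  Sys.sleep(wait_secs)  # crude wait for the AJAX refresh

  # getPageSource() returns a list; the first element is the rendered HTML
  extract_tables(remote$getPageSource()[[1]], table_css)
}

# Example call (selectors copied from the question, not verified against the live site):
# quarterly <- scrape_quarterly(
#   url        = "https://finance.yahoo.com/quote/AAPL/financials?p=AAPL",
#   button_css = "button.P\\(0px\\)",
#   table_css  = "table.Lh\\(1\\.7\\)"
# )
# print(quarterly)
```

The parsing step is kept in its own function so it can be tried on any saved HTML without starting a browser. If the class-based button selector turns out to match more than one button, locating the button by its visible text would probably be more robust.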