【Question title】: Quarterly Yahoo Finance Data using R
【Posted】: 2020-11-27 22:31:32
【Question】:

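I'm trying to scrape Yahoo Finance data. I found a solution that works for some of the data, but I can't figure out how to make the jump to quarterly data, and I'm wondering whether I'm on the wrong track. Here is the solution that works for me; I just don't know how to get quarterly data instead of annual data: R: web scraping yahoo.finance after 2019 change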

【Comments】:

  • Please show us what you have tried so far to solve this, what result you expected, and what you actually got. Thanks!

Tags: r rvest


【Solution 1】:

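One problem with scraping that page is that it defaults to annual data. The quarterly data is only loaded into the browser after the user clicks the "Quarterly" button. That is bad for scraping, but good for intercepting the underlying API request. If you open your browser's developer console, go to the "Network" tab and then click the "Quarterly" button, you will see a request being made (I've put the URL at the bottom because it is really long). That request returns JSON data.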

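Disclaimer: I don't know much about R. However, after some research I found that R has several packages for reading JSON data, so you can do something like this: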

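# using rjson: fromJSON() reads from a file path or URL passed as `file`
url <- "<URL given below>"
data <- rjson::fromJSON(file = url)

# using jsonlite: fromJSON() accepts a JSON string, a file path or a URL
library(jsonlite)

url <- "<URL given below>"
data <- fromJSON(url)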
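The parsed response is deeply nested, so it usually needs flattening before analysis. Below is a minimal sketch of that step, assuming the response was read with `jsonlite::fromJSON(url, simplifyVector = FALSE)` (so everything stays as plain lists) and that it is shaped as `timeseries$result[[i]]`, where `meta$type` names the series and a list of the same name holds points with `asOfDate` and `reportedValue$raw`. Those field names are assumptions from inspecting the response, not documented API, and `timeseries_to_df` is a hypothetical helper name.

```r
# Hypothetical helper (not part of any package).
# ASSUMPTION: data$timeseries$result is a list of series; each has meta$type
# and a list named after that type whose points carry asOfDate and
# reportedValue$raw. Verify these names against a real response.
timeseries_to_df <- function(data) {
  rows <- lapply(data$timeseries$result, function(res) {
    type <- unlist(res$meta$type)[1]
    # padTimeSeries=true can produce null placeholders; drop them
    points <- Filter(Negate(is.null), res[[type]])
    if (length(points) == 0) return(NULL)
    data.frame(
      type = type,
      asOfDate = vapply(points, function(p) p$asOfDate, character(1)),
      value = vapply(points, function(p) {
        raw <- p$reportedValue$raw
        if (is.null(raw)) NA_real_ else as.numeric(raw)
      }, numeric(1)),
      stringsAsFactors = FALSE
    )
  })
  # rbind() silently skips the NULL entries of series without data
  do.call(rbind, rows)
}

# Usage (makes a network request, so not run here):
# data <- jsonlite::fromJSON(url, simplifyVector = FALSE)
# quarterly <- timeseries_to_df(data)
```

The result is in long format (one row per series and date), which avoids having to line up series that report different sets of dates.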

Here is the URL:

https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/premium/timeseries/AAPL?lang=en-US&region=US&symbol=AAPL&padTimeSeries=true&type=annualEbitda%2CtrailingEbitda%2CannualDilutedAverageShares%2CtrailingDilutedAverageShares%2CannualBasicAverageShares%2CtrailingBasicAverageShares%2CannualDilutedEPS%2CtrailingDilutedEPS%2CannualBasicEPS%2CtrailingBasicEPS%2CannualNetIncomeCommonStockholders%2CtrailingNetIncomeCommonStockholders%2CannualNetIncome%2CtrailingNetIncome%2CannualNetIncomeContinuousOperations%2CtrailingNetIncomeContinuousOperations%2CannualTaxProvision%2CtrailingTaxProvision%2CannualPretaxIncome%2CtrailingPretaxIncome%2CannualOtherIncomeExpense%2CtrailingOtherIncomeExpense%2CannualInterestExpense%2CtrailingInterestExpense%2CannualOperatingIncome%2CtrailingOperatingIncome%2CannualOperatingExpense%2CtrailingOperatingExpense%2CannualSellingGeneralAndAdministration%2CtrailingSellingGeneralAndAdministration%2CannualResearchAndDevelopment%2CtrailingResearchAndDevelopment%2CannualGrossProfit%2CtrailingGrossProfit%2CannualCostOfRevenue%2CtrailingCostOfRevenue%2CannualTotalRevenue%2CtrailingTotalRevenue&merge=false&period1=493590046&period2=1596836602&corsDomain=finance.yahoo.com
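Note that the `type` parameter of this URL lists `annual*` and `trailing*` series, and `period1`/`period2` are Unix timestamps in seconds bounding the date range. The sketch below builds a similar URL for any ticker and field list. It assumes, without confirmation from this answer, that quarterly series use a `quarterly` prefix (for example `quarterlyTotalRevenue`), so compare its output with the request you actually see in the Network tab. `build_timeseries_url` is a hypothetical helper name.

```r
# Hypothetical helper: rebuild the fundamentals-timeseries URL shown above.
# ASSUMPTION: quarterly series are requested as "quarterly" + field name;
# pass prefix = "annual" to reproduce the annual request instead.
build_timeseries_url <- function(symbol, fields, prefix = "quarterly",
                                 period1 = 493590046,
                                 period2 = floor(as.numeric(Sys.time()))) {
  types <- paste0(prefix, fields)
  paste0(
    "https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/premium/timeseries/",
    symbol,
    "?lang=en-US&region=US&symbol=", symbol,
    "&padTimeSeries=true",
    "&type=", paste(types, collapse = "%2C"),  # %2C is a URL-encoded comma
    "&merge=false",
    # sprintf avoids scientific notation for large timestamps
    "&period1=", sprintf("%.0f", period1),
    "&period2=", sprintf("%.0f", period2),
    "&corsDomain=finance.yahoo.com"
  )
}

# Example:
# url <- build_timeseries_url("AAPL", c("TotalRevenue", "Ebitda", "NetIncome"))
```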

There is another URL you can use to get quarterly income statement data, but it seems a bit unreliable for companies outside the US:

https://query2.finance.yahoo.com/v10/finance/quoteSummary/aapl?modules=incomeStatementHistoryQuarterly
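The quoteSummary response nests the statements differently. A sketch of flattening it into long format (one row per quarter and line item), assuming it was read with `jsonlite::fromJSON(url, simplifyVector = FALSE)`, that the statements sit under `quoteSummary$result[[1]]$incomeStatementHistoryQuarterly$incomeStatementHistory`, and that each line item is an object with a numeric `raw` field. These field names are assumptions to check against a real response, and `quote_summary_to_long` is a hypothetical helper name.

```r
# Hypothetical helper: flatten a parsed quoteSummary response for
# modules=incomeStatementHistoryQuarterly.
# ASSUMPTION: each statement has endDate$fmt and line items shaped {raw, fmt}.
quote_summary_to_long <- function(data) {
  res <- data$quoteSummary$result[[1]]
  statements <- res$incomeStatementHistoryQuarterly$incomeStatementHistory
  rows <- lapply(statements, function(s) {
    end_date <- s$endDate$fmt
    # Keep only line items carrying a numeric "raw" value; empty objects
    # and scalar fields (such as maxAge) are skipped.
    items <- Filter(function(v) is.list(v) && is.numeric(v$raw), s)
    items$endDate <- NULL
    if (length(items) == 0) return(NULL)
    data.frame(
      endDate = end_date,
      item = names(items),
      value = vapply(items, function(v) as.numeric(v$raw), numeric(1)),
      stringsAsFactors = FALSE,
      row.names = NULL
    )
  })
  do.call(rbind, rows)
}
```

Long format is used here because different quarters may report different line items, which would make a wide row-per-quarter table fail to bind.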


【Solution 2】:

You can also switch to quarterly data with the R package RSelenium:

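library(rvest)
library(stringr)
library(magrittr)
library(RSelenium)

# Start a Selenium server in Docker (requires Docker to be installed).
# shell() only exists on Windows; system() works on all platforms.
system("docker run -d -p 4445:4444 selenium/standalone-firefox")
Sys.sleep(5)  # give the container time to start

remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://finance.yahoo.com/quote/AAPL/financials?p=AAPL")

# Click the "Quarterly" button (the XPath depends on Yahoo's page layout)
web_Obj_Quarterly <- remDr$findElement("xpath", '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button/div/span')
web_Obj_Quarterly$clickElement()
Sys.sleep(3)  # wait for the quarterly table to render

# Parse the rendered page
page_Content <- remDr$getPageSource()[[1]]
page <- read_html(page_Content)
nodes <- page %>% html_nodes(".fi-row")
df <- NULL

# Each ".fi-row" is one line item: its title followed by one cell per period
for (i in nodes) {
  r <- list(i %>% html_nodes("[title],[data-test='fin-col']") %>% html_text())
  df <- rbind(df, as.data.frame(matrix(r[[1]], ncol = length(r[[1]]), byrow = TRUE), stringsAsFactors = FALSE))
}

# Column headers: "Breakdown", "TTM", then the dates found on the page
matches <- str_match_all(page %>% html_node('#Col1-3-Financials-Proxy') %>% html_text(), '\\d{1,2}/\\d{1,2}/\\d{4}')
headers <- c('Breakdown', 'TTM', matches[[1]][, 1])
names(df) <- headers
View(df)

remDr$close()  # end the browser session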
    

This answer builds on: R: web scraping yahoo.finance after 2019 change
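The scraped `df` holds every value as text. If you want numbers, a minimal sketch is below; it assumes the cells look like `"1,234,567"` with comma thousands separators and that `"-"` (or an empty cell) marks a missing value, and it leaves the first column ("Breakdown") untouched. `to_numeric_cols` is a hypothetical helper name.

```r
# Hypothetical helper: convert scraped text cells to numbers.
# ASSUMPTION: commas are thousands separators and "-" / "" mean missing.
to_numeric_cols <- function(df) {
  df[-1] <- lapply(df[-1], function(col) {
    col <- gsub(",", "", col, fixed = TRUE)   # "1,234" -> "1234"
    col[col %in% c("-", "")] <- NA            # placeholders -> NA
    as.numeric(col)                           # negatives like "-1234" survive
  })
  df
}

# Usage after the scraping code:
# df_num <- to_numeric_cols(df)
```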

