How to scrape data from a chart with R: answers

【Question title】: How to Scrape data from chart with R
【Posted】: 2021-09-14 19:50:18
【Question】:

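I would like to use R to scrape the data behind the tweet-volume chart on https://bitinfocharts.com into a data file. I am new to this, and after a lot of searching online I have no choice but to ask for your help. I found the same question on this forum, but it is for Python (How to Scrape data from chart on https://bitinfocharts.com).

The chart in question is: https://bitinfocharts.com/comparison/decred-tweets.html#alltime

I am looking for a data table with each date and the corresponding number of tweets for that day as columns.

I hope your experience can help me.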

【Comments】:

Please provide enough code so that others can better understand or reproduce the problem.

Tags: r web-scraping charts

【Solution 1】:

This code should help you extract the data you need:

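library('rvest')
library('stringr')

url <- 'https://bitinfocharts.com/comparison/decred-tweets.html#alltime'
# Download the page and convert it to plain text so it can be searched with a regex
webpage <- as.character(read_html(url))
# Capture the data array passed to the Dygraph constructor in the page's JavaScript
res <- str_match(webpage, 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels')
res[,2]  # first capture group: the raw array as a string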

Once this is done, you should parse res[,2] and transform it as needed.
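To see what the pattern captures, here is a minimal sketch that runs it against a made-up one-line excerpt of page source. The page_source string and its two data points are assumptions for illustration, not real data from the site, and base R's regexec is used in place of str_match so that the snippet runs without extra packages.

```r
# Hypothetical excerpt of the page source; the real page embeds a much longer array.
page_source <- 'var d = new Dygraph(document.getElementById("container"), [[new Date("2014/04/09"),null],[new Date("2014/04/10"),17]], {labels: ["Date","Tweets"]});'

# Same idea as the pattern in the answer: match the constructor call literally,
# then lazily capture everything up to ", {labels".
pattern <- 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels'

# perl = TRUE selects the PCRE engine; element 1 is the full match, element 2 the capture group.
m <- regmatches(page_source, regexec(pattern, page_source, perl = TRUE))[[1]]
captured <- m[2]
cat(captured, "\n")
```

The captured text is the first argument after the container element, which is the array the chart is drawn from. It is still JavaScript rather than JSON because of the new Date(...) wrappers.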

【Comments】:

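Thank you very much, Dan, even though I don't know where 'new Dygraph\\(document.getElementById\\(\"container\\"\\),\\s*(.*?)\\s*, \\{labels' comes from. Could you give me some hints on parsing the resulting matrix? How do I get rid of all the brackets in the character string?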

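【Solution 2】:

The new Dygraph part comes from the page source. If you search for it in the page source (view source in your browser: https://bitinfocharts.com/comparison/decred-tweets.html), you will find it. Basically, the website builds the chart from this data. To parse the matrix, you first need to remove the "new Date(" and ")" parts of the string, and then parse the whole string with a JSON library.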
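As a rough sketch of that clean-up step, the snippet below works on a short, made-up captured string (an assumption for illustration, not real site data). It uses a slightly more targeted substitution than deleting every ")": each new Date("...") wrapper is replaced by its quoted date, so any other parentheses would be left alone. The jsonlite call is guarded so the snippet still runs if that package is not installed.

```r
# Hypothetical captured array, in the shape the chart's JavaScript appears to use.
captured <- '[[new Date("2014/04/09"),null],[new Date("2014/04/10"),17]]'

# Replace each new Date("...") wrapper with just the quoted date string.
json_text <- gsub('new Date\\(("[^"]*")\\)', '\\1', captured)
cat(json_text, "\n")

# The result is valid JSON and can be handed to a JSON parser.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON(json_text)
  print(parsed)
}
```

For this input the two-step gsub approach produces the same string; the single substitution is only a design choice that avoids touching unrelated closing parentheses.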

Here is the complete code that should help you:

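library('rvest')
library('stringr')
library('jsonlite')

url <- 'https://bitinfocharts.com/comparison/decred-tweets.html#alltime'
# Download the page and convert it to plain text so it can be searched with a regex
webpage <- as.character(read_html(url))
# Capture the data array passed to the Dygraph constructor in the page's JavaScript
res <- str_match(webpage, 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels')
# Strip the JavaScript Date wrappers so that the string becomes valid JSON
res[,2] <- gsub("new Date\\(", "", res[,2])
res[,2] <- gsub("\\)", "", res[,2])
# Parse the JSON array (one entry per day)
document <- fromJSON(txt=res[,2])
document
print(document[1, 1])  # date in the first row
print(document[1, 2])  # tweet count in the first row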
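The question asks for a table with the date and the tweet count as columns, saved to a data file. A minimal sketch of that last step follows. It assumes the parsed result is a two-column character matrix with dates written as YYYY/MM/DD and missing counts as NA; the matrix below is a hypothetical stand-in with made-up values, and the file name is likewise only an example.

```r
# Hypothetical stand-in for the parsed result: column 1 = date, column 2 = tweet count.
document <- matrix(
  c("2014/04/09", NA,
    "2014/04/10", "17",
    "2014/04/11", "23"),
  ncol = 2, byrow = TRUE
)

# Build a data frame with properly typed columns.
tweets <- data.frame(
  date   = as.Date(document[, 1], format = "%Y/%m/%d"),
  tweets = as.numeric(document[, 2])
)

# Write the table to a CSV file (a temporary directory is used here as an example).
out_file <- file.path(tempdir(), "decred_tweets.csv")
write.csv(tweets, out_file, row.names = FALSE)
print(tweets)
```

If the real parsed object has a different shape, for example a list instead of a matrix, the two column extractions would need to be adapted accordingly.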

【Comments】: