How to export data from this website to Excel? (Answers)

【Question title】: How to export data from this website to Excel?
【Posted】: 2022-08-19 01:03:48
【Question】:

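https://www.inc.com/inc5000/2022 has a list of about 5000 companies.

I would like to extract this data and put it into Excel. It would be even better if I could extract only the software companies (tick the "Software" filter under "Industries").

I can also do that filtering in Excel, though, so either way works for me. Right now I just need help getting the data from the website into Excel. I tried pasting the link directly into Excel, but that did not work.

I tried using R with code I found in a Reddit post, but the resulting CSV file has only 98 rows instead of the ~5000 it should have. I am not a programmer, so please keep the explanation very simple.

I only managed to run the .R file I got from Reddit after downloading R and RStudio, and then adjusting this code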

library(tidyverse)
library(jsonlite)
df <- "https://api.inc.com/rest/i5list/2021" %>%
  fromJSON() %>%
  .$companies %>%
  bind_rows() %>%
  unnest(article) %>%
  select(-editorsPick) %>%
  write_csv("inc.csv")

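so that it pulls from the 2022 edition of the Inc. 5000 list rather than the 2021 edition.

Can the data be pulled from their API (api.inc.com/rest/i5list/2022) into Excel and filtered by the industry "Software"? I could not find any API documentation online.
I can fetch the data with curl https://api.inc.com/rest/i5list/2022 > companies_2022.txt. I found 592 software companies in the new txt file.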

Tags: web-scraping

【Solution 1】:

If you just want to write a simple script, the easiest option is probably Python with the requests library.

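import csv
import requests

# Fetch the full 2022 list from the JSON endpoint
response = requests.get("https://api.inc.com/rest/i5list/2022")
response.raise_for_status()
data = response.json()
# csv.writer quotes fields, so commas inside company names do not split columns
with open("data.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["Rank", "Company", "Growth", "Industry", "State", "City"])
    for i in data["companies"]:
        writer.writerow([i["rank"], i["company"], f'{i["growth"]}%', i["industry"], i["state_s"], i["city"]])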

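The code above should work for you; it writes all the companies to a CSV file (which you can open in Excel). The JSON response contains more attributes, but the ones I included are those shown on the front-end website.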
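
Since the question also asks for software companies only, here is a minimal sketch of how the records could be filtered before writing. It assumes each record's "industry" field holds exactly the string "Software", which is a guess based on the website's filter label and should be checked against the real response. The helper names filter_by_industry and write_companies_csv, and the sample records, are hypothetical and used only for illustration.

```python
import csv
import io


def filter_by_industry(companies, industry="Software"):
    """Keep only the records whose "industry" field equals `industry`."""
    return [c for c in companies if c.get("industry") == industry]


def write_companies_csv(companies, fileobj):
    """Write the columns used in the answer above to an open text file."""
    fields = ["rank", "company", "growth", "industry", "state_s", "city"]
    # extrasaction="ignore" skips any other attributes present in a record
    writer = csv.DictWriter(fileobj, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(companies)


# Hypothetical sample records standing in for data["companies"]
sample = [
    {"rank": 1, "company": "Acme, Inc.", "growth": 100.5,
     "industry": "Software", "state_s": "CA", "city": "San Diego"},
    {"rank": 2, "company": "Bolt Foods", "growth": 90.0,
     "industry": "Food & Beverage", "state_s": "TX", "city": "Austin"},
]

buffer = io.StringIO()
write_companies_csv(filter_by_industry(sample), buffer)
print(buffer.getvalue())
```

With the real data, pass data["companies"] from the request above instead of sample, and an open file (opened with newline="") instead of the in-memory buffer.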

【Discussion】: