How to use Python to extract/download and web-scrape a docs.google.com/spreadsheets link found in a website's source code?

【Question title】: How to use Python to extract/download and web-scrape a docs.google.com/spreadsheets link found in a website's source code?
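【Posted】: 2021-01-27 18:55:46
【Question】:

Thanks for taking a look at my question.

While inspecting a page's source, I found a lot of data that I would like to retrieve. In the browser's Network tab for the site, I found an XHR/.js request that contains the useful data, and its Headers tab shows the following: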

Request URL: https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0
Request Method: GET
Status Code: 200 
Remote Address: 172.217.12.206:443
Referrer Policy: strict-origin-when-cross-origin

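Does anyone know a way to download this docs.google.com data, preferably with Python and one of its libraries?

Thanks

【Comments】:

Have you tried following the Download Files guide for the Drive API?

Tags: python web-scraping google-sheets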

【Solution 1】:

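import requests

# gviz endpoint copied from the request headers in the question
url = 'https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0'

r = requests.get(url, timeout=30)
r.raise_for_status()  # stop on a non-2xx response instead of saving an error page

# Write the raw response bytes to a file in the current working directory
with open('google_docs.txt', 'wb') as f:
    f.write(r.content)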
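
The saved file is the raw response, not a plain table. As a rough sketch of a possible next step, the snippet below assumes the gviz endpoint wraps its JSON in a JavaScript call of the form `google.visualization.Query.setResponse({...});` and that the payload holds a `table` with `cols` and `rows`. The helper names `parse_gviz_response` and `rows_from_table` and the `sample` string are made up for illustration; the sample is not data from the sheet in the question.

```python
import json


def parse_gviz_response(text: str) -> dict:
    """Extract the JSON object from a gviz-style response body.

    Assumption: the body is JavaScript that wraps one JSON object, so the
    JSON runs from the first '{' to the last '}'.
    """
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])


def rows_from_table(payload: dict) -> list:
    """Flatten payload['table']['rows'] into lists of raw cell values."""
    rows = []
    for row in payload["table"]["rows"]:
        # Assumption: an empty cell may appear as null (None) instead of a dict.
        rows.append([cell.get("v") if cell else None for cell in row["c"]])
    return rows


# Hypothetical sample shaped like a gviz response (not real sheet data).
sample = (
    '/*O_o*/\n'
    'google.visualization.Query.setResponse({"version":"0.6","reqId":"0",'
    '"status":"ok","table":{"cols":[{"id":"A","label":"name","type":"string"},'
    '{"id":"B","label":"score","type":"number"}],'
    '"rows":[{"c":[{"v":"alice"},{"v":1.0}]},{"c":[{"v":"bob"},null]}]}});'
)

payload = parse_gviz_response(sample)
print(rows_from_table(payload))
```

With a real download, `parse_gviz_response(r.text)` would take the place of the sample string. If the actual response is shaped differently, the slicing logic would need adjusting.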

【Comments】:

Thanks. Where will this google_docs.txt file be saved?
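In the directory the script is run from (the current working directory), which is the script's own directory if you launch it from there.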
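
To check the location rather than guess, a relative filename passed to `open()` can be resolved against the current working directory. This is only a small sketch, and `resolve_output_path` is a hypothetical helper name, not something from the answer above.

```python
from pathlib import Path


def resolve_output_path(filename: str) -> Path:
    """Return the absolute path that a relative filename would be written to."""
    # open() resolves relative names against the current working directory,
    # which is not necessarily the folder that contains the script.
    return (Path.cwd() / filename).resolve()


print(resolve_output_path("google_docs.txt"))
```

If the file should always land next to the script regardless of where it is launched from, building the path from `Path(__file__).resolve().parent` is one common alternative.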