How to extract a table from Weather Underground (answers)

【Question title】: How to extract a table from Weather Underground
【Posted】: 2020-05-13 02:38:44
【Question】:

URL: https://www.wunderground.com/history/daily/KLGA/date/2020-5-5

import requests

url = 'https://www.wunderground.com/history/daily/KLGA/date/2020-5-5'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}
response = requests.get(url, headers=headers)

【Discussion】:

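What have you tried so far?
The page is rendered dynamically, and I can't find anything that references the data in the XHR JSON or in the HTML elements. Maybe it fetches the data from an API somewhere.
Spoke too soon! I did find an XHR and have added it as an answer.
Hi all, thanks. I applied soup = BeautifulSoup(response.text), but I can't get the table information out of the soup and I don't know why. I'm a beginner with HTML.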
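One way to check whether the table is rendered client-side, as suggested in the comments, is to count the table elements in the raw HTML that requests returns, since requests does not run any JavaScript. The sketch below is a diagnostic added for illustration, not part of the original thread: it uses the standard-library html.parser instead of BeautifulSoup so it has no extra dependencies, and the names TableCounter, count_tables and fetch_table_count are hypothetical.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased, so <TABLE> matches too
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return the number of <table> elements present in a static HTML string."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_table_count(url: str, headers: dict) -> int:
    """Download a page (no JavaScript is executed) and count its <table> tags."""
    import requests  # third-party; imported here so count_tables works without it

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return count_tables(response.text)
```

Calling fetch_table_count(url, headers) with the question's url and headers and getting 0 would be consistent with the table being filled in by JavaScript after page load, which is why BeautifulSoup on response.text finds nothing and the XHR approach in the answer below is needed.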

Tags: python web-scraping

【Solution 1】:

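I found an XHR that returns the data you need. If you only want the temperatures, no changes are needed apart from converting the Unix timestamps into a human-readable form: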

import json
import requests

url = 'https://api.weather.com/v1/location/KLGA:9:US/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&units=e&startDate=20200505&endDate=20200505'
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors, e.g. a rejected apiKey
json_response = response.json()

json_fields = [ # filters for fields in the json
    'valid_time_gmt',
    'temp'
]
root_node = 'observations'

temps = [{field: row[field] for field in json_fields}
         for row in json_response[root_node]]

print(json.dumps(temps, indent=4))

Output:

[
    {
        "valid_time_gmt": 1588654260,
        "temp": 51
    },
    {
        "valid_time_gmt": 1588657860,
        "temp": 50
    },
    {
        "valid_time_gmt": 1588661460,
        "temp": 49
    },
    {
        "valid_time_gmt": 1588665060,
        "temp": 49
    },
    ...
]

You don't actually need to filter the fields, but doing so makes the result easier to work with.
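The answer notes that valid_time_gmt is a Unix timestamp that still needs converting. A minimal sketch of that step with the standard library, using two rows from the output above as sample data: readable_temps is a hypothetical helper, and the fixed UTC-4 offset is an assumption that only matches New York daylight time (correct for KLGA on 2020-05-05, wrong for winter dates). On Python 3.9+, zoneinfo.ZoneInfo("America/New_York") can be passed as tz instead and handles the daylight-saving switch automatically.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-4 offset = New York daylight time (EDT).
# It is only correct between the spring and autumn clock changes.
EDT = timezone(timedelta(hours=-4), "EDT")


def readable_temps(observations, tz=timezone.utc):
    """Replace each valid_time_gmt Unix timestamp with an ISO 8601 string in tz."""
    return [
        {
            "time": datetime.fromtimestamp(obs["valid_time_gmt"], tz=tz).isoformat(),
            "temp": obs["temp"],
        }
        for obs in observations
    ]


# Two rows copied from the answer's output
sample = [
    {"valid_time_gmt": 1588654260, "temp": 51},
    {"valid_time_gmt": 1588657860, "temp": 50},
]

for row in readable_temps(sample, tz=EDT):
    print(row)
# {'time': '2020-05-05T00:51:00-04:00', 'temp': 51}
# {'time': '2020-05-05T01:51:00-04:00', 'temp': 50}
```

In the answer's code, the same helper could be applied to temps directly, since each entry there already has the valid_time_gmt and temp keys.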

【Discussion】:

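Thanks! This is very helpful.
When web scraping, I always check for JSON content first. In Chrome, press F12 (Developer Tools), go to the Network tab, and refresh the page. Then inspect every request listed under the XHR filter.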
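Once an XHR endpoint like the one in the answer has been found this way, it can be reused for other dates by rebuilding its query string instead of editing the URL by hand. The sketch below assumes the endpoint keeps accepting the same parameters seen in the answer (apiKey, units, startDate and endDate in YYYYMMDD form) and the same location format (KLGA:9:US). build_history_url is a hypothetical helper, units="e" is simply copied from the answer, and the apiKey is the one that was embedded in the page at the time, so it may stop working.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint and key copied from the answer; the key may expire or be rotated.
BASE_URL = "https://api.weather.com/v1/location/{location}/observations/historical.json"
API_KEY = "6532d6454b8aa370768e63d6ba5a832e"


def build_history_url(location: str, start: date, end: date, units: str = "e") -> str:
    """Build the historical-observations URL for a date range (hypothetical helper)."""
    query = urlencode({
        "apiKey": API_KEY,
        "units": units,
        "startDate": start.strftime("%Y%m%d"),  # e.g. 20200505
        "endDate": end.strftime("%Y%m%d"),
    })
    return BASE_URL.format(location=location) + "?" + query


print(build_history_url("KLGA:9:US", date(2020, 5, 1), date(2020, 5, 7)))
```

Because urlencode keeps the dict's insertion order, build_history_url("KLGA:9:US", date(2020, 5, 5), date(2020, 5, 5)) reproduces the answer's URL exactly, which is a simple way to confirm the helper matches what the browser requested.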