【Question title】: How to extract a table from the NHC website in Python?
【Posted】: 2020-08-01 10:11:34
【Question description】:

On this page:

https://www.nhc.noaa.gov/gis/

there is a table under the "Data and Products" section. I want to extract the table and save it to a CSV file. I wrote this basic code:

from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.nhc.noaa.gov/gis/")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

I only know the basics of scraping. Please guide me from here. Thanks!

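【Question discussion】:

  • Do you want to extract the table itself, or the data that the table links to?
  • I want to extract the data inside the table. Thanks for the pointer!
  • OK, so you want to download the zip and other data types and build a table from them?
  • Yes, I want to download the zip and other files inside the table.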

Tags: python python-3.x web-scraping beautifulsoup


【Solution 1】:

You can use pandas:

import pandas as pd

url = 'https://www.nhc.noaa.gov/gis/'
df = pd.read_html(url)[0]

# create csv file
df.to_csv("mycsv.csv")

【Discussion】:

【Solution 2】:

It's hard to tell, but I guess this is what you want:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.nhc.noaa.gov/gis/')

soup = BeautifulSoup(r.content, 'html.parser')

for a in soup.find_all('a'):
    href = a.get('href')
    if not href:
        continue
    # keep only links whose last path segment looks like a file name
    if ('.' in href.split('/')[-1]
            and 'html' not in href
            and '.php' not in href
            and 'http' not in href
            and 'mailto' not in href):
        # plain concatenation: relative hrefs end up without a separating slash
        print('https://www.nhc.noaa.gov' + href)

Prints:

https://www.nhc.noaa.gov/gis/examples/al112017_5day_020.zip
https://www.nhc.noaa.gov/gis/examples/AL112017_020adv_CONE.kmz
https://www.nhc.noaa.gov/gis/examples/AL112017_020adv_TRACK.kmz
https://www.nhc.noaa.gov/gis/examples/AL112017_020adv_WW.kmz
https://www.nhc.noaa.govforecast/archive/al092020_5day_latest.zip
https://www.nhc.noaa.gov/storm_graphics/api/AL092020_CONE_latest.kmz
https://www.nhc.noaa.gov/storm_graphics/api/AL092020_TRACK_latest.kmz
https://www.nhc.noaa.gov/storm_graphics/api/AL092020_WW_latest.kmz
https://www.nhc.noaa.govforecast/archive/al102020_5day_latest.zip
https://www.nhc.noaa.gov/storm_graphics/api/AL102020_CONE_latest.kmz
https://www.nhc.noaa.gov/storm_graphics/api/AL102020_TRACK_latest.kmz
https://www.nhc.noaa.gov/storm_graphics/api/AL102020_WW_latest.kmz
https://www.nhc.noaa.gov/gis/examples/al112017_fcst_020.zip
https://www.nhc.noaa.gov/gis/examples/AL112017_initialradii_020adv.kmz
https://www.nhc.noaa.gov/gis/examples/AL112017_forecastradii_020adv.kmz
https://www.nhc.noaa.govforecast/archive/al092020_fcst_latest.zip
https://www.nhc.noaa.gov/storm_graphics/api/AL092020_initialradii_latest.kmz
https://www.nhc.noaa.gov/storm_graphics/api/AL092020_forecastradii_latest.kmz
https://www.nhc.noaa.govforecast/archive/al102020_fcst_latest.zip

...and so on...
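Several URLs in the output above come out as `https://www.nhc.noaa.govforecast/...`: some `href` values on the page are relative, and plain string concatenation drops the separating slash. Below is a minimal sketch of one possible fix, assuming the links should be resolved against the page URL the way a browser would. `urljoin` is from the standard library; the helper name `absolute_links` is illustrative, and the two sample `href` values are inferred from the printed output rather than read from the live page.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.nhc.noaa.gov/gis/"


def absolute_links(hrefs, base=PAGE_URL):
    """Resolve each href against the page URL, as a browser would."""
    return [urljoin(base, href) for href in hrefs]


# Sample href values inferred from the output above:
# one root-relative, one relative to the page itself.
sample = [
    "/gis/examples/al112017_5day_020.zip",
    "forecast/archive/al092020_5day_latest.zip",
]
for url in absolute_links(sample):
    print(url)
```

In the loop from the answer, this would amount to passing each link's `href` to `urljoin(PAGE_URL, ...)` instead of concatenating it onto the host name.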

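【Discussion】:

  • I want the output in table form, as in @dimay's answer. However, what that table is missing is the links; it only has text. Is there any way to make the links inside the table work?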
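One possible way to address the comment above, sketched under assumptions: walk the table rows with BeautifulSoup and, for every cell, keep the text together with any link targets found inside it. The function name `table_with_links`, the `text [url]` cell format, and the inline sample HTML are illustrative only; the real page's markup has not been checked here and may differ.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://www.nhc.noaa.gov/gis/"

# Tiny stand-in for the real page; its structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Files</th></tr>
  <tr>
    <td>Forecast Track</td>
    <td><a href="/gis/examples/al112017_5day_020.zip">shp</a></td>
  </tr>
</table>
"""


def table_with_links(html, base=PAGE_URL):
    """Return the first table as rows of 'text [url ...]' strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        row = []
        for cell in tr.find_all(["th", "td"]):
            text = cell.get_text(" ", strip=True)
            # Resolve every link in the cell against the page URL.
            links = [urljoin(base, a["href"]) for a in cell.find_all("a", href=True)]
            row.append(f"{text} [{' '.join(links)}]" if links else text)
        rows.append(row)
    return rows


rows = table_with_links(SAMPLE_HTML)

# Written to an in-memory buffer here; a file opened with
# open("mycsv.csv", "w", newline="") can be passed to csv.writer instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

To try this on the live page, the HTML could come from `requests.get(PAGE_URL).text`. Pandas 1.5 and later also accept an `extract_links` argument in `read_html`, which may be a shorter route if the pandas approach is preferred.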