[Posted]: 2020-09-13 01:46:49
[Question]:
I am trying to download the csv files whose names end with the characters "VX.csv" from this link:
https://www.cboe.com/products/futures/market-data/historical-data-archive
Here is my code, adapted from another similar question:
# Import Key Modules
from bs4 import BeautifulSoup
import requests
import urllib.request

url = 'https://www.cboe.com/products/futures/market-data/historical-data-archive'

def scraper(url):
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html)
    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        href = (tag.get('href', None))
        if href.endswith("VX.csv"):
            csv_url = urlparse.urljoin(url, href)
            # ... do something with the csv file....
            contents = urllib.urlopen(csv_url).read()
            print("csv file size=", len(contents))
            break  # we only needed this one file, so we end the loop.

scraper(url)
It gives me the following error:
AttributeError: 'NoneType' object has no attribute 'endswith'
I'm not sure where I went wrong. Does anyone have a clue?
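For reference, the error is consistent with some `<a>` tags on the page having no `href` attribute, in which case `tag.get('href', None)` returns `None` and `None.endswith(...)` fails. Below is a minimal sketch of guarding against that. It assumes the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs (with BeautifulSoup, `soup.find_all('a', href=True)` would skip such tags). The names `CsvLinkCollector` and `find_csv_links`, the sample HTML, and the `example.com` URLs are hypothetical and used only for illustration. Note also that in Python 3, `urljoin` lives in `urllib.parse` and `urlopen` in `urllib.request`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href=...> links ending in a given suffix."""

    def __init__(self, base_url, suffix):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # href is None when the anchor has no href attribute (or no value)
        href = dict(attrs).get("href")
        if href and href.endswith(self.suffix):
            # Resolve relative links against the page URL
            self.links.append(urljoin(self.base_url, href))


def find_csv_links(html, base_url, suffix="VX.csv"):
    """Return every matching link found in the given HTML text."""
    parser = CsvLinkCollector(base_url, suffix)
    parser.feed(html)
    return parser.links


# Hypothetical sample page: the first anchor has no href at all
sample_html = """
<a name="top">anchor without href</a>
<a href="/data/2020/CFE_F20_VX.csv">January</a>
<a href="/about.html">About</a>
"""
links = find_csv_links(sample_html, "https://example.com/archive")
print(links)
```

To use this against a real page, the HTML text could be fetched with `urllib.request.urlopen(url).read().decode()` and passed in as `html`. Whether the target page actually exposes the csv links as plain anchors is an assumption that would need checking.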
[Discussion]:
Tags: python python-3.x csv web-scraping beautifulsoup