[Posted]: 2017-03-28 19:24:13
[Question]:
Hey, I am trying to download stock data from the NSE India website,
so I am using Python for that.
Here is the code with the link:
import urllib
urllib.urlretrieve("https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip","fo01JAN2016bhav.csv.zip")
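But when I try to open the downloaded file, it says
compressed zipped file is invalid
When I download it the normal way, by simply pasting the link into the browser, the downloaded file opens fine.
The link:
https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip
And if I try urllib2 instead, I get this: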
f=urllib2.urlopen('https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip')
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    f=urllib2.urlopen('https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
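How do I solve this?
It only happens with this link; I tried downloading an image from imgur and the same code worked fine.
Why do I get an HTTP 403 error when I can access the file normally through the browser?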
[Comments]:
- The site does some header validation. Setting the User-Agent and Accept headers seems to be enough.
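A minimal sketch of what the comment suggests, not a confirmed fix: it sends browser-like User-Agent and Accept headers with the request. The header values and the helper names build_request and download are assumptions made for illustration; the thread does not say which exact values the site accepts, and the site may have changed since 2017. The import fallback lets the same code run on Python 3 and on the Python 2.7 used in the question.

```python
import zipfile

try:
    # Python 3
    from urllib.request import Request, urlopen
except ImportError:
    # Python 2.7, as used in the question
    from urllib2 import Request, urlopen

URL = ("https://www.nseindia.com/content/historical/DERIVATIVES/"
       "2016/JAN/fo01JAN2016bhav.csv.zip")

# Assumed values: any browser-like strings. The exact headers the
# site checks are not confirmed anywhere in the thread.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
}


def build_request(url, headers=HEADERS):
    """Return a Request object that carries the given headers."""
    return Request(url, headers=headers)


def download(url, filename, headers=HEADERS):
    """Save the response body to filename; return True if it is a zip."""
    response = urlopen(build_request(url, headers))
    try:
        data = response.read()
    finally:
        response.close()
    with open(filename, "wb") as out:
        out.write(data)
    # If the server sent an error page instead of the archive,
    # the saved file will not be a valid zip.
    return zipfile.is_zipfile(filename)
```

Calling download(URL, "fo01JAN2016bhav.csv.zip") would then attempt the download and report whether the saved file is a real zip archive. If the server still answers 403, urlopen raises HTTPError as in the traceback above, which would suggest the site checks more than these two headers.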