Python urllib2 does not work on some sites

【Question title】: Python urllib2 does not work on some sites
【Posted】: 2017-03-02 15:39:28
【Question description】:

    import  urllib2

    def download(url,user_agent = 'wswp',num_retries=2):
        print 'downloading:',url
        headers = {'User-Agent': 'Mozilla/5.0'}
        request = urllib2.Request(url,headers=headers)
        try:
            html = urllib2.urlopen(request).read()
        except urllib2.URLError as e:
            print  "download error:"
            html = None
            if num_retries>0:
                if hasattr(e,'code') and 500<=e.code<600:
                    print "e.code = ",e.code
                    return download(url,num_retries-1)
        return  html
    print download("http://www.huaru.cc/mobile/product/xsim.html")

Result: C:\Python27\python.exe E:/py2_7/untitled1/secondClass_Agent downloading: http://www.huaru.cc/mobile/product/xsim.html

Process finished with exit code 0

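【Question comments】:

Works on my machine, after I fixed the indentation.
Works on my machine as well. Check your indentation.
Hello, do you mean that you can download the full source of this page? Could you paste your result? Thanks.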

Tags: python web-crawler urllib2

【Solution 1】:

In Python, indentation is significant.

    import urllib2


    def download(url, user_agent='wswp', num_retries=2):
        print('downloading:', url)
        # Note: user_agent is accepted but the header below is hard-coded.
        headers = {'User-Agent': 'Mozilla/5.0'}
        request = urllib2.Request(url, headers=headers)
        try:
            html = urllib2.urlopen(request).read()
        except urllib2.URLError as e:
            print("download error: {}".format(e))
            html = None
            if num_retries > 0:
                # Retry only on 5xx server errors; a 404 is not retried.
                if hasattr(e, 'code') and 500 <= e.code < 600:
                    print("e.code = ", e.code)
                    # Pass user_agent through so num_retries lands in the right parameter.
                    return download(url, user_agent, num_retries - 1)
        return html

    print(download("http://www.huaru.cc/mobile/product/xsim.html"))

It prints the following:

('downloading:', 'http://www.huaru.cc/mobile/product/xsim.html')
download error: HTTP Error 404: Not Found
None

This is because the page returns HTTP 404, so the download fails and the function returns None.
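To see at a glance whether a failure is an HTTP status such as 404 or a connection-level problem, the two exception types can be caught separately. The following is only a minimal sketch, not the answerer's code: the function name `fetch_with_reason` and the `opener` parameter are hypothetical, added so the logic can be exercised without touching the network. `HTTPError` is a subclass of `URLError`, so it has to be caught first.

```python
try:  # Python 2
    from urllib2 import HTTPError, URLError, Request, urlopen
except ImportError:  # Python 3
    from urllib.error import HTTPError, URLError
    from urllib.request import Request, urlopen


def fetch_with_reason(url, opener=urlopen):
    """Return (body, error_text); exactly one of the two is None.

    `opener` is a hypothetical hook (defaults to urlopen) so that a fake
    can be substituted when testing.
    """
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        return opener(request).read(), None
    except HTTPError as e:
        # The server answered, but with an error status (404, 503, ...).
        return None, 'HTTP {}'.format(e.code)
    except URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, 'unreachable: {}'.format(e.reason)
```

Calling `fetch_with_reason("http://www.huaru.cc/mobile/product/xsim.html")` should give `(None, 'HTTP 404')` as long as the server keeps answering with 404 for that URL.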

Tested on Python 2.7.10 and 3.6.
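Note that the `urllib2` module exists only in Python 2; in Python 3 its contents live in `urllib.request` and `urllib.error`. Below is a rough sketch of how the same download-with-retry idea could be written so that it imports on either version. The name `download_with_retries`, the default `user_agent` value, and the `opener` parameter are assumptions made for illustration. It uses a loop instead of recursion, so the retry count does not need to be passed along by hand.

```python
try:  # Python 2
    from urllib2 import URLError, Request, urlopen
except ImportError:  # Python 3: urllib2 was split into urllib.request / urllib.error
    from urllib.error import URLError
    from urllib.request import Request, urlopen


def download_with_retries(url, user_agent='Mozilla/5.0', num_retries=2,
                          opener=urlopen):
    """Return the page body, or None if the download fails.

    Makes one initial attempt plus up to `num_retries` retries, and retries
    only on 5xx responses. `opener` is a hypothetical hook for testing.
    """
    request = Request(url, headers={'User-Agent': user_agent})
    for attempts_left in range(num_retries, -1, -1):
        try:
            return opener(request).read()
        except URLError as e:
            # HTTPError (a URLError subclass) carries a numeric .code.
            code = getattr(e, 'code', None)
            retryable = code is not None and 500 <= code < 600
            if not retryable or attempts_left == 0:
                return None
    return None
```

With this structure a 404 returns `None` after a single attempt, while a persistent 503 is tried three times with the default `num_retries=2`.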

See PEP 8: https://www.python.org/dev/peps/pep-0008/#id17

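【Comments】:

I know about indentation, but I did not know how to correct it on this site. I apologize for that. The code you pasted still has an error. Could you run it and paste your result? Thank you very much.
@Zhang.h No worries, there is no need to apologize. Please try again: I modified my code to show you where the error is. The URL clearly returns HTTP 404, which means Not Found. The website itself also displays a 404 page.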