python(四)网页下载器

python(四)网页下载器
网页下载器有两种:
1 urllib2 —Python官网基础模块
2 requests —第三方包更强大

urllib2
最简洁的方法
urllib2.urlopen(url)
python(四)网页下载器

urllib2下载网页方法2 :添加data,http header
python(四)网页下载器

python(四)网页下载器

urllib2方法3 :添加特殊情景的处理器
HTTPCookieProcessor :有些网站需要登录才可以使用,我们就用这个.
ProxyHandle:有些网页需要代理才可用使用,我们用这个.
HTTPSHandler:有些网页是使用HTTPS加密访问的,我们使用这个
HTTPRedirectHandler:有些网页URL自动跳转的关系,我们使用这个.
python(四)网页下载器
举个coockie的栗子

代码栗子:

import urllib.request,http.cookiejar

url = “http://www.baidu.com”
print(“第一种方法”)
response1 = urllib.request.urlopen(url)
print(response1.getcode())
print(len(response1.read()))

print(“第二种方法”)
request = urllib.request.Request(url)
request.add_header(“user-agent”, ‘Mozilla/5.0’)
response2 = urllib.request.urlopen(request)
print(response2.getcode())
print(len(response2.read()))

print(“第三种方法”)
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(cj)
print(response3.read())

学习自:慕课网