Python 3 Web Scraping Summary: Proxy Requests

1. Page requests and data requests

urllib.request


Send headers with the request so that it looks like a browser when fetching a page or a data endpoint.
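A minimal sketch of attaching browser-like headers with urllib.request. The URL and the User-Agent string are placeholders chosen for illustration, and the helper names build_request and fetch are hypothetical.

```python
import urllib.request

# Placeholder URL and header values, used only for illustration.
URL = "https://example.com/"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=HEADERS):
    """Return a Request object that carries browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, timeout=10):
    """Send the request and return the body decoded as UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Building the request does not touch the network; fetch(URL) would.
req = build_request(URL)
```

Calling fetch(URL) performs the actual download; the Request object can be inspected beforehand to confirm which headers will be sent.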

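To reach resources through a rotating high-anonymity proxy, also supply a mapping such as proxies={'http': 'IP:Port'}. Note that urllib.request.Request itself has no proxies parameter: with urllib the mapping is passed to urllib.request.ProxyHandler, whereas Requests accepts proxies= directly. Keep a pool of IP:port pairs and pick one at random for each request. Xici Proxy (https://www.xicidaili.com/nn/) is a recommended source of high-anonymity proxies.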
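The proxy pool idea can be sketched with the standard library as follows. The addresses in PROXY_POOL are non-routable documentation placeholders, not working proxies, and the helper names are hypothetical.

```python
import random
import urllib.request

# Placeholder addresses from reserved documentation ranges.
# Replace them with live proxies before sending real traffic.
PROXY_POOL = [
    "192.0.2.10:8080",
    "198.51.100.23:3128",
    "203.0.113.5:8000",
]


def pick_proxy(pool=PROXY_POOL):
    """Pick one IP:port entry at random from the pool."""
    return random.choice(pool)


def build_opener_with_proxy(proxy):
    """Return an opener that routes http:// requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, headers=None, timeout=10):
    """Fetch url through a randomly chosen proxy and return the raw bytes."""
    opener = build_opener_with_proxy(pick_proxy())
    req = urllib.request.Request(url, headers=headers or {})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read()


# Constructing the opener does not open any connection.
proxy = pick_proxy()
opener = build_opener_with_proxy(proxy)
```

Only the 'http' scheme is mapped here, matching the text above; https traffic would need its own 'https' entry. With Requests the same mapping can be passed as requests.get(url, proxies={'http': 'http://IP:Port'}).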

Requests


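Personally I prefer Requests: it offers get and post and makes passing parameters easy. For a POST request, pass the parameters as data={'key1':'value1','key2':'value2'}, and use headers to set the Content-Type.

Four body types are common: {'Content-Type':'application/x-www-form-urlencoded'}, {'Content-Type':'multipart/form-data'}, {'Content-Type':'application/json'}, and raw binary, which is normally sent as {'Content-Type':'application/octet-stream'} because 'binary' is not a registered MIME type.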
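As a rough illustration of how Requests picks the Content-Type, the sketch below prepares two POST requests without sending them. The endpoint URL is hypothetical, and send is a helper name introduced here.

```python
import json

import requests

URL = "https://example.com/api"  # hypothetical endpoint
payload = {"key1": "value1", "key2": "value2"}

# data= with a dict produces a form-encoded body.
form_req = requests.Request("POST", URL, data=payload).prepare()

# json= serializes the dict and sets a JSON Content-Type.
json_req = requests.Request("POST", URL, json=payload).prepare()


def send(prepared, timeout=10):
    """Send a prepared request; this is the step that uses the network."""
    with requests.Session() as session:
        return session.send(prepared, timeout=timeout)


print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)
```

In everyday use the shorter requests.post(URL, data=payload) or requests.post(URL, json=payload) does the same preparation and sends in one step. For multipart/form-data it is usually safer to pass files= and let Requests generate the boundary than to set the header by hand.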

Note: set the encoding to utf-8 (in Requests, response.encoding = 'utf-8') so that non-ASCII text decodes correctly.
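Why the encoding setting matters can be shown with plain bytes, without any network access. This is only an illustration of the decoding step; the sample string is arbitrary.

```python
# Bytes as a server would send them for a UTF-8 page.
raw = "爬虫".encode("utf-8")

# Decoding with the wrong charset yields mojibake instead of an error.
garbled = raw.decode("iso-8859-1")

# Decoding as UTF-8 recovers the original text.
correct = raw.decode("utf-8")

# With Requests, the equivalent fix is to set
#     response.encoding = "utf-8"
# before reading response.text.
print(repr(garbled))
print(correct)
```

Requests may fall back to ISO-8859-1 for text responses whose headers declare no charset, which is the usual cause of garbled Chinese text.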

2. bs4 and BeautifulSoup

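bs4 parses a document into a tree of objects. According to the Beautiful Soup documentation, the object types are Tag, NavigableString, BeautifulSoup, and Comment.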
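A small sketch of the object types bs4 produces, assuming the four kinds named in the Beautiful Soup documentation (Tag, NavigableString, BeautifulSoup, Comment). The HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

html = "<p class='note'>Hello<!-- hidden --></p>"
soup = BeautifulSoup(html, "html.parser")  # the whole document

p = soup.p                  # a Tag
text, comment = p.contents  # a NavigableString and a Comment

print(type(soup).__name__)
print(type(p).__name__, p.name, p["class"])
print(type(text).__name__, str(text))
print(type(comment).__name__, str(comment))
```

Comment is a subclass of NavigableString, so code that collects text nodes may want to filter comments out explicitly.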


While traversing the HTML tree, you can locate the data to scrape with find, with CSS selectors via select, or with regular expressions.
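The three lookup styles can be sketched side by side. The HTML below is invented for the example, and the class names and URL patterns are assumptions.

```python
import re

from bs4 import BeautifulSoup

html = """
<div id="list">
  <a class="item" href="/post/1">First post</a>
  <a class="item" href="/post/2">Second post</a>
  <a class="ad" href="https://ads.example.com/x">Sponsored</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find: the first element matching a tag name and attributes.
first = soup.find("a", class_="item")

# select: every element matching a CSS selector.
items = soup.select("div#list a.item")

# Regular expression: filter on an attribute value.
posts = soup.find_all("a", href=re.compile(r"^/post/\d+$"))

titles = [a.get_text(strip=True) for a in items]
links = [a["href"] for a in posts]
print(first["href"], titles, links)
```

find returns a single element or None, while select and find_all return lists, so the result of find should be checked before indexing into it on real pages.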

Appendix: the w3cschool scraping tutorial with worked examples: https://www.w3cschool.cn/python3/python3-u6ij2pw3.html

