Python crawler notes: requests

1. Takeaways

    1. Over one chapter of study, I implemented automatic login to Chouti and GitHub.

2. Key points

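--request parameters
----url: the target URL
----params: parameters appended to the URL as a query string
----cookies: cookies to send with the request
----data: data sent in the body of a POST request (form data); data alone is usually enough, and a dict can be turned into a JSON string with json.dumps() when needed
----json: sends data serialized as a JSON string (request payload)
----proxies: proxy servers
----files: file upload
----auth: basic authentication
----allow_redirects: whether to follow redirects; True by default for GET and POST
----stream: used for downloading large files, fetching the content piece by piece
----cert: client certificate
--------needed when crawling niche sites that issue their own certificates and require the client to present one
----verify: whether to verify the server's certificate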
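The difference between params, data and json is easiest to see by preparing requests without sending them. The following is a minimal sketch, assuming the `requests` library is installed; the URL `https://example.com/search` and the field names are made up for illustration, and nothing goes over the network.

```python
import requests

# Hypothetical endpoint used only for illustration; no request is actually sent.
URL = 'https://example.com/search'

# params: key/value pairs are appended to the URL as a query string.
get_req = requests.Request('GET', URL, params={'q': 'python', 'page': 2}).prepare()

# data: a dict is form-encoded into the request body.
form_req = requests.Request('POST', URL, data={'name': 'demo'}).prepare()

# json: a Python object is serialized to JSON and the Content-Type header is set.
json_req = requests.Request('POST', URL, json={'name': 'demo'}).prepare()

print(get_req.url)
print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'])
```

The remaining parameters in the list (proxies, files, auth, allow_redirects, stream, cert, verify) are keyword arguments of requests.get and requests.post and are passed in the same way.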

Reference: official documentation: http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#id4

3. Automatic login to GitHub

import requests
from bs4 import BeautifulSoup
'''
What it does: logs in to GitHub automatically and retrieves the username
'''
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KH'
                  'TML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
}


def login_github(user, password):

    # Step 1: request the GitHub login page to get the token and cookies
    try:
        r1 = requests.get(
            url='https://github.com/login',
            headers=headers
        )

        cookies = r1.cookies.get_dict()
        s1 = BeautifulSoup(r1.text, 'html.parser')
        token = s1.find(name='input', attrs={'name': 'authenticity_token'}).get('value')

        # Step 2: log in to GitHub with the cookies, token, username and password.
        r2 = requests.post(
            url='https://github.com/session',
            headers=headers,
            data={
                'commit': 'Sign in',
                'utf8': '✓',
                'authenticity_token': token,
                'login': user,
                'password': password
            },
            cookies=cookies
        )
        # Parse the page and find the username of the logged-in user
        s2 = BeautifulSoup(r2.text, 'html.parser')
        li = s2.find(name='li', attrs={'class': 'dropdown-header header-nav-current-user css-truncate'})
        username = li.find(name='strong', attrs={'class': 'css-truncate-target'}).string
    except Exception as e:
        return 'sorry, {}'.format(e)
    return username


if __name__ == '__main__':
    name = login_github('username', 'password')
    print(name)
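The script above passes cookies by hand from the first response to the second request. As an alternative sketch, a requests.Session can store and resend cookies on its own. The helper names `extract_token` and `login_with_session` are hypothetical, and the code assumes the login form still exposes an `authenticity_token` input as it did when these notes were written; GitHub's page may have changed since.

```python
import requests
from bs4 import BeautifulSoup


def extract_token(html):
    """Return the value of the authenticity_token input, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find(name='input', attrs={'name': 'authenticity_token'})
    return tag.get('value') if tag is not None else None


def login_with_session(user, password):
    """Hypothetical variant of the two-step login; the Session carries the cookies."""
    with requests.Session() as session:
        r1 = session.get('https://github.com/login')
        token = extract_token(r1.text)
        if token is None:
            # The form layout differs from what this sketch assumes.
            return None
        return session.post(
            'https://github.com/session',
            data={
                'commit': 'Sign in',
                'authenticity_token': token,
                'login': user,
                'password': password,
            },
        )


# Offline check of the parsing step with a made-up HTML fragment.
sample_html = '<form><input type="hidden" name="authenticity_token" value="abc123"></form>'
print(extract_token(sample_html))
```

Only the parsing helper runs here; `login_with_session` is defined but not called, since it needs network access and real credentials.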

posted on 2018-07-05 11:12 撸代码的日子
