leaf-wind

Batch-Scraping Douyin Videos in 60 Lines of Code

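I won't go into how the scraper works here. I'm just posting the code, mainly so I can grab it easily myself. Anyone who needs it is welcome to take it, and please leave a like while you're at it.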

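How to use it: open Douyin, go to a user's profile page, tap the three dots in the top-right corner, tap Share, then tap Copy Link. Run the program, paste the link when prompted (remember to remove the slogan text "抖音,记录美好生活" from what you pasted), and wait for it to finish. It will download every video that user has uploaded.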
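The manual step of deleting the slogan could also be done in code. Here is a minimal sketch, assuming the copied text contains a single http(s) link somewhere in it (I haven't checked every share format). `extract_share_url` is a made-up helper name and is not part of the script below.

```python
import re


def extract_share_url(pasted_text):
    """Return the first http(s) URL found in the pasted share text, or None.

    Assumption: the link only uses ASCII URL characters, so any Chinese
    text around it (such as the slogan) is left out of the match.
    """
    match = re.search(r'https?://[\w./?=&%#-]+', pasted_text, re.ASCII)
    return match.group(0) if match else None
```

With something like this, the prompt could accept the whole copied text, e.g. `root_url = extract_share_url(input(...))`, and the slogan would not need to be removed by hand.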

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# @Time : 2021-03-15
# @Author : wind_leaf
import requests
import json
import re
import sys

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'upgrade-insecure-requests': '1',
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1',
}
# Example request to the post-list API:
'''
https://www.iesdouyin.com/web/api/v2/aweme/post/?
sec_uid=MS4wLjABAAAAeAIH1d_98INk5rNXF9Q4zrbGK9d1Eumyydy7qKL1WPk&
count=21&
max_cursor=0&
aid=1128&
_signature=j0NkqgAA7x12uIyl2MgN6I9DZL&dytk=
'''
# Example redirect link returned for a share URL (it carries the sec_uid):
'''
<a href="https://www.iesdouyin.com/share/user/72673737181?u_code=17fc9cg0a&amp;did=69773896663&amp;iid=1302254358919767&amp;sec_uid=MS4wLjABAAAAeAIH1d_98INk5rNXF9Q4zrbGK9d1Eumyydy7qKL1WPk&amp;timestamp=1615603669&amp;utm_source=copy&amp;utm_campaign=client_share&amp;utm_medium=android&amp;share_app_name=douyin">Found</a>.

'''
# Example share link:
'''eg: https://v.douyin.com/eRENmGV/    # 一条小团团OvO'''

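root_url = input('Enter the share link of the user whose videos you want to download: ').strip()
max_cursor = 0      # pagination cursor
has_more = True     # whether there is another page
page = 1        # page counter; 1 page = 20 videos
# The short share link redirects; the target URL in the Location header carries sec_uid
response = requests.get(url=root_url, headers=headers, allow_redirects=False)
sec_uid = re.search(r'sec_uid=([^&]+)', response.headers['location']).group(1)     # unique user id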

while has_more:
    video_lis = []
    print(f'Fetching video URLs for page {page}---')
    response = requests.get(url=f'https://www.iesdouyin.com/web/api/v2/aweme/post/?sec_uid={sec_uid}&count=21&max_cursor={max_cursor}&aid=1128&_signature=dpcuDQAAFtyPbMYCi7BbQ3aXLh&dytk=', headers=headers)
    print(response.text)
    result = json.loads(response.text)
    # An empty aweme_list leaves max_cursor and has_more unchanged,
    # so the same page is requested again on the next iteration
    if result['aweme_list']:
        max_cursor = result['max_cursor']
        has_more = result['has_more']
        for video_data in result['aweme_list']:
            dic = {'desc': video_data['desc']}
            dic['url'] = video_data['video']['play_addr']['url_list'][2]
            video_lis.append(dic)
    print('Starting download---')
    for i, video in enumerate(video_lis):
        print(f"Page {page}, video {i+1}: {video['desc']}")
        size = 0
        # stream=True so iter_content reads the body in chunks instead of loading it all first
        response = requests.get(url=video['url'], headers=headers, stream=True)
        content_size = int(response.headers.get('content-length', 0))
        sys.stdout.write('----[File size]: %0.2f MB\n' % (content_size / 1024 / 1024))

        # Replace characters that are not allowed in file names
        filename = re.sub(r'[\\/:*?"<>|\r\n]', '_', video['desc']) + '.mp4'
        with open(filename, 'wb') as f:
            for data in response.iter_content(chunk_size=1024):
                f.write(data)
                size += len(data)
    page += 1
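
The script above does three things in one loop: it reads `sec_uid` out of the redirect URL, it follows `max_cursor` page by page, and it names each file after the video description. Purely as a sketch of how those pieces could be separated and tried out without touching the network, something like the following might work. The names `extract_sec_uid`, `safe_filename`, `iter_videos` and the `fetch_page` callback are all made up for illustration and are not part of the script. Also note one difference: this version updates `has_more` on every page, so it stops on an empty last page instead of requesting it again.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_sec_uid(location):
    """Return the sec_uid query parameter of a redirect URL, or None if absent."""
    values = parse_qs(urlparse(location).query).get('sec_uid')
    return values[0] if values else None


def safe_filename(desc, fallback='untitled'):
    """Replace characters that are invalid in file names; never return ''."""
    name = re.sub(r'[\\/:*?"<>|\r\n]+', '_', desc).strip()
    return name or fallback


def iter_videos(fetch_page):
    """Yield {'desc', 'url'} dicts page by page until has_more is false.

    fetch_page(max_cursor) is a hypothetical callback returning the decoded
    JSON of one page; in the script it would wrap the requests.get(...) call.
    """
    max_cursor = 0
    has_more = True
    while has_more:
        result = fetch_page(max_cursor)
        for video_data in result.get('aweme_list') or []:
            yield {
                'desc': video_data['desc'],
                # Same index as the script; assumes at least three URLs are listed
                'url': video_data['video']['play_addr']['url_list'][2],
            }
        has_more = bool(result.get('has_more'))
        max_cursor = result.get('max_cursor', max_cursor)
```

Passing a fake `fetch_page` that returns canned dictionaries makes it possible to check the paging logic offline before pointing it at the real API.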

