【问题标题】:Python expanding YouTube uploadsPython 扩展 YouTube 上传
【发布时间】:2016-04-01 23:44:01
【问题描述】:

所以,我正在开发一个小程序,它会自动检查并从一组给定的 YouTube 频道下载新音乐。我目前正在研究一种获取每个频道拥有的所有上传视频的链接的方法,我正在像刮刀一样做这件事。 (是的,YouTube API 可能是正确的方法,但我还不知道如何正确使用它。)

from __future__ import unicode_literals
from bs4 import BeautifulSoup
import urllib.request

ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
r = urllib.request.urlopen(ytlink).read()
soup = BeautifulSoup(r, "html.parser")
links = soup.find_all('a', {"class": "yt-uix-sessionlink yt-uix-tile-link spf-link  yt-ui-ellipsis yt-ui-ellipsis-2"})

for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print(link)

这是我目前拥有的,问题是,它目前只抓取前 30 个视频链接,因为这些是屏幕上唯一的视频链接。我已经看到,当按下"Load More" 按钮时,它会执行一些由一些JavaScript 启动的Ajax。我的问题是:如何让 Python 继续触发 "Load More" 按钮,直到所有上传都可见?

【问题讨论】:

  • 你不能,除非你使用无头浏览器或类似的浏览器。 Selenium 或 Python Mechanize 是你应该为这些任务考虑的,因为 BeautifulSoup 没有能力为你实际解析 JS。
  • 好吧,否则你很幸运,你会找到 AJAX 调用被发送到的 URL,但是在缩小的页面源中遍历会非常讨厌。
  • 好的,我去看看。谢谢。

标签: python ajax youtube


【解决方案1】:

您可以轻松地模仿 ajax 调用并解析返回的 json 输出,我们只需要拉取 /browse_ajax?action_continuation=... url 并继续请求,直到它不再在返回的 json 中:

from bs4 import BeautifulSoup
import requests
from urlparse import urljoin # python 3 -> from urllib.parse import urljoin


def get_links():
    # cretate all css selectors
    ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
    ajax_css = "button[data-uix-load-more-href]"
    link_css = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2"
    base = "https://www.youtube.com/"


    r = requests.get(ytlink).content
    soup = BeautifulSoup(r, "lxml")

    # yield first visible links
    for link in soup.select(link_css):
        yield urljoin(base, link["href"])

    # Load more button
    ajax = soup.select(ajax_css)[0]["data-uix-load-more-href"]

    while True:
        print(ajax)
        r = requests.get(urljoin('https://www.youtube.com/', ajax))

        # next html is stored in the json.values()
        soup = BeautifulSoup("".join(r.json().values()), "lxml")
        for link in soup.select(link_css):
            yield urljoin(base, link["href"])

        ajax = soup.select(ajax_css)
        # if empty "Load more" button would be gone
        if not ajax:
            break
        ajax = ajax[0]["data-uix-load-more-href"]

这将为您提供所有 87 个链接。

In [26]: links = list(get_links())
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk1yZ0JBQSUzRCUzRA%253D%253D
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

In [27]: len(links)
Out[27]: 87

In [28]: print(links)
['https://www.youtube.com/watch?v=kjmzIu4VJEY', 'https://www.youtube.com/watch?v=ecRpNV8Xob8', 'https://www.youtube.com/watch?v=mdHoaoAhnMo', 'https://www.youtube.com/watch?v=3oqBKEvdrqE', 'https://www.youtube.com/watch?v=VIbvfOd34-A', 'https://www.youtube.com/watch?v=x4G8ge1VO5s', 'https://www.youtube.com/watch?v=EkW0f2iUOCc', 'https://www.youtube.com/watch?v=Ex2NIeXfYl8', 'https://www.youtube.com/watch?v=XMd4pSX-aVs', 'https://www.youtube.com/watch?v=ZS7KjUjlLWA', 'https://www.youtube.com/watch?v=ZEq9sQJLOgg', 'https://www.youtube.com/watch?v=nSgaCowC5TY', 'https://www.youtube.com/watch?v=nV5Ive_zJT4', 'https://www.youtube.com/watch?v=snThWzMroaA', 'https://www.youtube.com/watch?v=Ud6YhBCucPg', 'https://www.youtube.com/watch?v=1nSfyivyxdg', 'https://www.youtube.com/watch?v=b7hf2wqpUY4', 'https://www.youtube.com/watch?v=cVBvxkVt9wc', 'https://www.youtube.com/watch?v=pcI25yU9yso', 'https://www.youtube.com/watch?v=EMIZZS8HY8A', 'https://www.youtube.com/watch?v=xWD3Zi23rIs', 'https://www.youtube.com/watch?v=M-IbllcTi64', 'https://www.youtube.com/watch?v=U_tW_UxG8bM', 'https://www.youtube.com/watch?v=vQd0mopVnQg', 'https://www.youtube.com/watch?v=mG8NJlsg4rI', 'https://www.youtube.com/watch?v=PsaNY6xpnKY', 'https://www.youtube.com/watch?v=839h3eZMSWA', 'https://www.youtube.com/watch?v=Q_yytPtWmP0', 'https://www.youtube.com/watch?v=oGESQfB9dYM', 'https://www.youtube.com/watch?v=mO5R-1uTJhg', 'https://www.youtube.com/watch?v=wgqLck9SFOc', 'https://www.youtube.com/watch?v=GCaFEsxd-Y8', 'https://www.youtube.com/watch?v=VlpMbnOqP20', 'https://www.youtube.com/watch?v=bj1QT5bxFlA', 'https://www.youtube.com/watch?v=SMtKCu6a7gQ', 'https://www.youtube.com/watch?v=RV6x33mf4WI', 'https://www.youtube.com/watch?v=WhlXuTtmNqE', 'https://www.youtube.com/watch?v=7TWN1G5e-tg', 'https://www.youtube.com/watch?v=jgjeYTkROyk', 'https://www.youtube.com/watch?v=0hFkFoOf-aA', 'https://www.youtube.com/watch?v=yH1u_KQapfw', 'https://www.youtube.com/watch?v=5-l-FGDsbjw', 'https://www.youtube.com/watch?v=sFSgyE64Jjw', 'https://www.youtube.com/watch?v=OhDBtfvv2BM', 'https://www.youtube.com/watch?v=uFgPFi04oTo', 'https://www.youtube.com/watch?v=58a45EfYv1g', 'https://www.youtube.com/watch?v=jtYl5TbK2nc', 'https://www.youtube.com/watch?v=TI-1qxoDRnw', 'https://www.youtube.com/watch?v=Q0M90HqibHI', 'https://www.youtube.com/watch?v=Llb19v7QiXU', 'https://www.youtube.com/watch?v=sqhL_Ms6vuY', 'https://www.youtube.com/watch?v=YFFRgAjXs1Y', 'https://www.youtube.com/watch?v=8eHFG5AACHI', 'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', 'https://www.youtube.com/watch?v=9s_XvG3M-UI', 'https://www.youtube.com/watch?v=lzdO01_tKFo', 'https://www.youtube.com/watch?v=uA2KkxfSW_U', 'https://www.youtube.com/watch?v=29Lt1LQtp5k', 'https://www.youtube.com/watch?v=nfJ9p5iJGz8', 'https://www.youtube.com/watch?v=cjMHd1xVlS0', 'https://www.youtube.com/watch?v=tkZ0FISTxkk', 'https://www.youtube.com/watch?v=bkhD8kYi4MI', 'https://www.youtube.com/watch?v=_bQajpTnOrY', 'https://www.youtube.com/watch?v=XglzEbcjP8c', 'https://www.youtube.com/watch?v=KBszbh6Qwag', 'https://www.youtube.com/watch?v=rVGWndVjCYg', 'https://www.youtube.com/watch?v=AgJxj2cUoyQ', 'https://www.youtube.com/watch?v=TaEVwakp_rI', 'https://www.youtube.com/watch?v=-YnpS-IaYCw', 'https://www.youtube.com/watch?v=sEFSFU2a9CY', 'https://www.youtube.com/watch?v=Jc2aVD4pwnk', 'https://www.youtube.com/watch?v=aY1dOJEv4j4', 'https://www.youtube.com/watch?v=bwjXt2pWoBE', 'https://www.youtube.com/watch?v=Dqn26tWxNsI', 'https://www.youtube.com/watch?v=wiv6JqGhcCU', 'https://www.youtube.com/watch?v=IFi47HLPqoM', 'https://www.youtube.com/watch?v=N1zdWugNdy0', 'https://www.youtube.com/watch?v=ngOBscDs3T4', 'https://www.youtube.com/watch?v=RT5dQVZ-VQY', 'https://www.youtube.com/watch?v=bifExgZW7k0', 'https://www.youtube.com/watch?v=fBEbaEgox1Y', 'https://www.youtube.com/watch?v=wDy9aGFngkY', 'https://www.youtube.com/watch?v=i06Iv0k5fVY', 'https://www.youtube.com/watch?v=2NaRXV7uyPE', 'https://www.youtube.com/watch?v=Hl0nIoLJUU0', 'https://www.youtube.com/watch?v=iXo0T4dRdgA', 'https://www.youtube.com/watch?v=i-7H5Wq0_2Y']

我留下了print(ajax) 电话,以便您了解它的变化。

您可以将seleniumPhantomJs 一起使用,如下所示:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException,StaleElementReferenceException


ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
hrefs = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2"
ajax= "button[data-uix-load-more-href]"
dr = webdriver.PhantomJS()
dr.get(ytlink)

while True:
    try:
        load_mode_b = dr.find_element_by_css_selector(ajax)
        load_mode_b.click()
    except StaleElementReferenceException as e:
        print(e)
    except NoSuchElementException as e:
        print(e)
        break

如果我们运行,我们会看到完全相同的输出:

In [32]: l = [a.get_attribute("href") for a in dr.find_elements_by_css_selector(hrefs)]

In [33]: len(l)
Out[33]: 87

In [34]: print(l)
[u'https://www.youtube.com/watch?v=kjmzIu4VJEY', u'https://www.youtube.com/watch?v=ecRpNV8Xob8', u'https://www.youtube.com/watch?v=mdHoaoAhnMo', u'https://www.youtube.com/watch?v=3oqBKEvdrqE', u'https://www.youtube.com/watch?v=VIbvfOd34-A', u'https://www.youtube.com/watch?v=x4G8ge1VO5s', u'https://www.youtube.com/watch?v=EkW0f2iUOCc', u'https://www.youtube.com/watch?v=Ex2NIeXfYl8', u'https://www.youtube.com/watch?v=XMd4pSX-aVs', u'https://www.youtube.com/watch?v=ZS7KjUjlLWA', u'https://www.youtube.com/watch?v=ZEq9sQJLOgg', u'https://www.youtube.com/watch?v=nSgaCowC5TY', u'https://www.youtube.com/watch?v=nV5Ive_zJT4', u'https://www.youtube.com/watch?v=snThWzMroaA', u'https://www.youtube.com/watch?v=Ud6YhBCucPg', u'https://www.youtube.com/watch?v=1nSfyivyxdg', u'https://www.youtube.com/watch?v=b7hf2wqpUY4', u'https://www.youtube.com/watch?v=cVBvxkVt9wc', u'https://www.youtube.com/watch?v=pcI25yU9yso', u'https://www.youtube.com/watch?v=EMIZZS8HY8A', u'https://www.youtube.com/watch?v=xWD3Zi23rIs', u'https://www.youtube.com/watch?v=M-IbllcTi64', u'https://www.youtube.com/watch?v=U_tW_UxG8bM', u'https://www.youtube.com/watch?v=vQd0mopVnQg', u'https://www.youtube.com/watch?v=mG8NJlsg4rI', u'https://www.youtube.com/watch?v=PsaNY6xpnKY', u'https://www.youtube.com/watch?v=839h3eZMSWA', u'https://www.youtube.com/watch?v=Q_yytPtWmP0', u'https://www.youtube.com/watch?v=oGESQfB9dYM', u'https://www.youtube.com/watch?v=mO5R-1uTJhg', u'https://www.youtube.com/watch?v=wgqLck9SFOc', u'https://www.youtube.com/watch?v=GCaFEsxd-Y8', u'https://www.youtube.com/watch?v=VlpMbnOqP20', u'https://www.youtube.com/watch?v=bj1QT5bxFlA', u'https://www.youtube.com/watch?v=SMtKCu6a7gQ', u'https://www.youtube.com/watch?v=RV6x33mf4WI', u'https://www.youtube.com/watch?v=WhlXuTtmNqE', u'https://www.youtube.com/watch?v=7TWN1G5e-tg', u'https://www.youtube.com/watch?v=jgjeYTkROyk', u'https://www.youtube.com/watch?v=0hFkFoOf-aA', u'https://www.youtube.com/watch?v=yH1u_KQapfw', u'https://www.youtube.com/watch?v=5-l-FGDsbjw', u'https://www.youtube.com/watch?v=sFSgyE64Jjw', u'https://www.youtube.com/watch?v=OhDBtfvv2BM', u'https://www.youtube.com/watch?v=uFgPFi04oTo', u'https://www.youtube.com/watch?v=58a45EfYv1g', u'https://www.youtube.com/watch?v=jtYl5TbK2nc', u'https://www.youtube.com/watch?v=TI-1qxoDRnw', u'https://www.youtube.com/watch?v=Q0M90HqibHI', u'https://www.youtube.com/watch?v=Llb19v7QiXU', u'https://www.youtube.com/watch?v=sqhL_Ms6vuY', u'https://www.youtube.com/watch?v=YFFRgAjXs1Y', u'https://www.youtube.com/watch?v=8eHFG5AACHI', u'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', u'https://www.youtube.com/watch?v=9s_XvG3M-UI', u'https://www.youtube.com/watch?v=lzdO01_tKFo', u'https://www.youtube.com/watch?v=uA2KkxfSW_U', u'https://www.youtube.com/watch?v=29Lt1LQtp5k', u'https://www.youtube.com/watch?v=nfJ9p5iJGz8', u'https://www.youtube.com/watch?v=cjMHd1xVlS0', u'https://www.youtube.com/watch?v=tkZ0FISTxkk', u'https://www.youtube.com/watch?v=bkhD8kYi4MI', u'https://www.youtube.com/watch?v=_bQajpTnOrY', u'https://www.youtube.com/watch?v=XglzEbcjP8c', u'https://www.youtube.com/watch?v=KBszbh6Qwag', u'https://www.youtube.com/watch?v=rVGWndVjCYg', u'https://www.youtube.com/watch?v=AgJxj2cUoyQ', u'https://www.youtube.com/watch?v=TaEVwakp_rI', u'https://www.youtube.com/watch?v=-YnpS-IaYCw', u'https://www.youtube.com/watch?v=sEFSFU2a9CY', u'https://www.youtube.com/watch?v=Jc2aVD4pwnk', u'https://www.youtube.com/watch?v=aY1dOJEv4j4', u'https://www.youtube.com/watch?v=bwjXt2pWoBE', u'https://www.youtube.com/watch?v=Dqn26tWxNsI', u'https://www.youtube.com/watch?v=wiv6JqGhcCU', u'https://www.youtube.com/watch?v=IFi47HLPqoM', u'https://www.youtube.com/watch?v=N1zdWugNdy0', u'https://www.youtube.com/watch?v=ngOBscDs3T4', u'https://www.youtube.com/watch?v=RT5dQVZ-VQY', u'https://www.youtube.com/watch?v=bifExgZW7k0', u'https://www.youtube.com/watch?v=fBEbaEgox1Y', u'https://www.youtube.com/watch?v=wDy9aGFngkY', u'https://www.youtube.com/watch?v=i06Iv0k5fVY', u'https://www.youtube.com/watch?v=2NaRXV7uyPE', u'https://www.youtube.com/watch?v=Hl0nIoLJUU0', u'https://www.youtube.com/watch?v=iXo0T4dRdgA', u'https://www.youtube.com/watch?v=i-7H5Wq0_2Y']

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2018-05-22
    • 2014-04-10
    • 1970-01-01
    • 2018-01-01
    • 1970-01-01
    • 2013-10-21
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多