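【Question title】: Scraping the YouTube comment count
【Posted】: 2021-08-25 17:39:34
【Question】:

In my project I am trying to scrape a YouTube video's view count, comment count, and like and dislike counts. I cannot get the comment count; I have tried several approaches, but nothing changed. Here is my code, please help: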

import selenium
from selenium import webdriver
import pandas as pd
import time

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Choose the browser; chromedriver must be on the PATH
driver = webdriver.Chrome()

# DataFrame that will hold the scraped values

data = {'Likes' : [], 'Dislikes' : [], 'Comments' : [], 'Views' : []}

dataframe = pd.DataFrame(data)

# Open the video page

driver.get("https://www.youtube.com/watch?v=fHI8X4OXluQ")

# Wait for the page to load
time.sleep(5)

# Locate each element by its absolute XPath
Likes = driver.find_element_by_xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[8]/div[2]/ytd-video-primary-info-renderer/div/div/div[3]/div/ytd-menu-renderer/div[2]/ytd-toggle-button-renderer[1]/a/yt-formatted-string').text

Dislikes = driver.find_element_by_xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[8]/div[2]/ytd-video-primary-info-renderer/div/div/div[3]/div/ytd-menu-renderer/div[2]/ytd-toggle-button-renderer[2]/a/yt-formatted-string').text

View = driver.find_elements_by_xpath('//div[@id="count"]')

Comments=driver.find_elements_by_xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/ytd-comments/ytd-item-section-renderer/div[1]/ytd-comments-header-renderer/div[1]/h2/yt-formatted-string/span[1]')


print(Likes)
print(Dislikes)
print(View[1].text)
print(Comments)




driver.quit()
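
The `dataframe` above is created but never filled, and `.text` returns display strings rather than numbers. A minimal sketch of one way to turn such strings into integers before storing them follows. The label formats (`'1,234 views'`, `'1,024 Comments'`) and the helper name `parse_count` are assumptions made for illustration; abbreviated labels such as `12K` are not handled and would come back as 12.

```python
import re


def parse_count(label):
    """Return the first integer found in a label, or None if there is none.

    Thousands separators written as commas are removed, so
    '1,024 Comments' becomes 1024.
    """
    match = re.search(r"\d[\d,]*", label)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Hypothetical scraped strings, standing in for the .text values above.
row = {
    "Likes": parse_count("12,345"),
    "Comments": parse_count("1,024 Comments"),
    "Views": parse_count("2,345,678 views"),
}
print(row)
```

A dictionary shaped like `row` has the same keys as three of the four columns in `data`, so it could be appended to the DataFrame once every value has been scraped.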

【Comments】:

  • It is always good practice to write relative XPaths.
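
To illustrate the point about relative XPaths, here is a small offline sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The two markup strings are invented for the example and are not YouTube's real markup. The absolute path spells out every ancestor and stops matching as soon as one wrapper element is added, while the relative path anchors on an attribute and keeps working.

```python
import xml.etree.ElementTree as ET

# Invented markup: the same content before and after a layout change
# that wraps it in one extra <div>.
OLD = "<html><body><div><h2 id='count'><span>12</span></h2></div></body></html>"
NEW = "<html><body><div><div><h2 id='count'><span>12</span></h2></div></div></body></html>"

ABSOLUTE = "./body/div/h2/span"       # depends on every ancestor
RELATIVE = ".//h2[@id='count']/span"  # depends only on a stable attribute


def lookup(markup, path):
    """Return the text of the first node matching `path`, or None."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


for name, path in (("absolute", ABSOLUTE), ("relative", RELATIVE)):
    print(name, lookup(OLD, path), lookup(NEW, path))
```

The same reasoning applies to the long `/html/body/ytd-app/...` paths in the question: any change to the page layout between `<body>` and the target element invalidates them.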

Tags: python selenium web-scraping youtube


【Solution 1】:

See whether this works for the comment count:


# Relative XPath to the comment-count span in the comments header
elem = driver.find_element_by_xpath(".//div[@class='style-scope ytd-comments-header-renderer' and @id='title']//following-sibling::yt-formatted-string[contains(@class,'ytd-comments-header-renderer')]/span[1]")
# Scroll the element into view before reading its text
driver.execute_script("arguments[0].scrollIntoView();", elem)
print(elem.text)
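
One possible reason for a `NoSuchElementException` here is that YouTube appears to render the comments section only after the page has been scrolled towards it, so the element may not exist yet when the lookup runs. The sketch below is a guess at a workaround under that assumption, not a confirmed fix: it scrolls in small steps and polls until the header shows up. The function name `wait_for_comment_count`, its parameters, and the simplified XPath (derived from the absolute path in the question) are illustrative and may not match the live page.

```python
import time

# Simplified relative XPath for the comment-count text. The markup is an
# assumption taken from the paths quoted in this thread and may change.
COUNT_XPATH = "//ytd-comments-header-renderer//h2//yt-formatted-string/span[1]"


def wait_for_comment_count(driver, timeout=15.0, step_px=600, pause=0.5):
    """Scroll down in steps until the comment-count element has text.

    Returns that text, or None if nothing appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "xpath" is the string value of selenium's By.XPATH constant.
        found = driver.find_elements("xpath", COUNT_XPATH)
        if found and found[0].text.strip():
            return found[0].text
        # Not there yet: scroll a little further and poll again.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause)
    return None
```

Called as `wait_for_comment_count(driver)` after `driver.get(...)`, it uses `find_elements`, which returns an empty list instead of raising, so a missing element simply leads to another scroll step.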

【Comments】:

  • Sorry, it does not work. NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":".//div[@class='style-scope ytd-comments-header-renderer' and @id='title']//following-sibling::yt-formatted-string[contains(@class,'ytd-comments-header-renderer')]/span[1]"} (Session info: chrome=91.0.4472.77)
  • You have to bring the comment-count web element into view. I have updated the code.
  • I still get the same error; the element cannot be found. Should I try Firefox instead?
  • I tried it in Chrome and it works fine.

【Solution 2】:

Basically, something like this should work:

import selenium
from selenium import webdriver
import pandas as pd
import time

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Choose the browser; chromedriver must be on the PATH
driver = webdriver.Chrome()

# DataFrame that will hold the scraped values

data = {'Likes' : [], 'Dislikes' : [], 'Comments' : [], 'Views' : []}

dataframe = pd.DataFrame(data)

# Open the video page

driver.get("https://www.youtube.com/watch?v=fHI8X4OXluQ")

# Wait for the page to load
time.sleep(5)


# Locate each element by a relative XPath
likes_xpath = '(//div[@id="top-level-buttons-computed"]//*[contains(@aria-label," likes")])[last()]'
Likes = driver.find_element_by_xpath(likes_xpath).text

dislikes_xpath = '//div[@id="top-level-buttons-computed"]//*[contains(@aria-label," dislikes")]'
Dislikes = driver.find_element_by_xpath(dislikes_xpath).text

views_xpath = '//*[name()="ytd-video-view-count-renderer"]/span[@class="view-count style-scope ytd-video-view-count-renderer"]'
View = driver.find_elements_by_xpath(views_xpath)

comments_xpath = '//*[name()="ytd-comment-renderer"]//*[name()="yt-formatted-string" and @id="content-text"]'
Comments=driver.find_elements_by_xpath(comments_xpath)


print(Likes)
print(Dislikes)
print(View[1].text)
print(Comments)


driver.quit()

However, there are many comments, so to get all of them you have to scroll the page.
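
The scrolling step could be sketched roughly as follows: scroll to the bottom, wait, count the loaded comments, and stop once a round adds nothing new. The name `load_all_comments`, the `pause` and `max_rounds` parameters, and the simplified XPath are assumptions made for this example; the pause length in particular would need tuning against the real page.

```python
import time

# Simplified form of the comment-text XPath used in the answer above.
COMMENT_XPATH = '//ytd-comment-renderer//yt-formatted-string[@id="content-text"]'


def load_all_comments(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until no new comments appear.

    Returns the text of every comment element found at the end.
    """
    seen = -1  # -1 forces at least two rounds before giving up
    for _ in range(max_rounds):
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        time.sleep(pause)  # give the page time to fetch the next batch
        count = len(driver.find_elements("xpath", COMMENT_XPATH))
        if count == seen:  # nothing new loaded, assume the end was reached
            break
        seen = count
    return [el.text for el in driver.find_elements("xpath", COMMENT_XPATH)]
```

Returning the `.text` of each element, rather than printing the list of elements as the code above does, gives readable comment strings. `max_rounds` caps the loop so that a video with a very long comment thread cannot keep the script running indefinitely.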

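【Comments】:

  • NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id="top-level-buttons-computed"] //*[contains(@aria-label,"likes")]"} (Session info: chrome=91.0.4472.77). It gives me this error :(
  • If I use only the comments part of your code, everything works except the comments: they come back as an empty list.
  • Well, the page usually opens with the video filling the screen, so the comments are not in the visible part of the page. I guess that could cause the empty comments list.
  • My chrome.exe window does not open in full screen. I tried the edited code, but I still get the same error: the code cannot locate the element. I also tried scrolling the page, but it did not help.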