Scroll down Google reviews with Selenium: answers

【Question title】: Scroll down Google reviews with Selenium
【Posted】: 2019-09-26 18:57:18
【Question】:

I am trying to scrape the reviews from this link:

https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1

I use the following code to load the page and scroll it:

from selenium import webdriver
import time

# URL of the search results page with the reviews panel open

url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"


# Initialize the Chrome webdriver and open the URL
#driver = webdriver.Chromium()


profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko")
#driver = webdriver.Firefox(profile)
# https://stackoverflow.com/questions/22476112/using-chromedriver-with-selenium-python-ubuntu
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

driver.get(url)

driver.implicitly_wait(2)



SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

The page loads fine, but it does not scroll down. I have used the same code on other sites, such as LinkedIn, and it works there.
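
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the reviews sit inside their own scrollable element, document.body.scrollHeight never changes and window.scrollTo has no visible effect. A minimal sketch of the same loop applied to a single element is below. The helper name scroll_element_to_end and the max_rounds parameter are hypothetical, and locating the right container element on the Google page is left to the reader.

```python
import time


def scroll_element_to_end(driver, element, pause=0.5, max_rounds=50):
    """Scroll `element` to its bottom until its scrollHeight stops growing.

    `driver` is a Selenium WebDriver and `element` is the WebElement that
    owns the scrollbar (an assumption: the caller has already located it).
    Returns the last scrollHeight that was observed.
    """
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight", element)
    for _ in range(max_rounds):
        # Scroll the element itself instead of the window
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", element)
        # Give lazily loaded content time to appear
        time.sleep(pause)
        new_height = driver.execute_script(
            "return arguments[0].scrollHeight", element)
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

The max_rounds cap is only there so the loop cannot run forever on a page that keeps growing.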

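【Comments】:

Are you scrolling down in order to load some elements on the page?
Yes, I need to scroll to see all the reviews.
See my answer below and let me know how it goes. I wasn't sure what you meant by "all", which is why I used desiredReviewsCount in the script.

Tags: python selenium screen-scraping

【Solution 1】: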

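Here is logic you can use to scroll down without JavaScript. It is simple and effective: reading an element's location_once_scrolled_into_view property scrolls that element into view.

As part of the logic below, we scroll to the last loaded review and then check whether the requested number of reviews has been loaded.

Required imports: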

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

Change the value of the desiredReviewsCount variable in the code below to suit your needs.

# XPath matching every review block in the reviews panel
reviews_xpath = "//div[@class='gws-localreviews__general-reviews-block']//div[@class='WMbnJf gws-localreviews__google-review']"

wait = WebDriverWait(driver, 10)
url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"
driver.get(url)
x = 0
desiredReviewsCount = 30
# Wait until the first batch of reviews is present
wait.until(EC.presence_of_all_elements_located((By.XPATH, reviews_xpath)))
while x < desiredReviewsCount:
    # Reading this property scrolls the last loaded review into view
    driver.find_element(By.XPATH, "(" + reviews_xpath + ")[last()]").location_once_scrolled_into_view
    # Count the reviews loaded so far
    x = len(driver.find_elements(By.XPATH, reviews_xpath))

print(len(driver.find_elements(By.XPATH, reviews_xpath)))
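
Note that the loop above never ends if the page has fewer reviews than desiredReviewsCount. A minimal sketch of the same scroll-to-last-review idea with a stall guard is below. The function name load_reviews and the pause and max_stalls parameters are hypothetical, not part of the answer. It passes the plain string "xpath", which is the value of By.XPATH, so the block has no imports beyond time.

```python
import time


def load_reviews(driver, reviews_xpath, desired_count, pause=0.5, max_stalls=5):
    """Scroll the last review into view until enough reviews are loaded.

    Stops early when the review count has not grown for `max_stalls`
    consecutive rounds. Returns the number of reviews found.
    """
    count = 0
    stalls = 0
    while count < desired_count and stalls < max_stalls:
        # "xpath" is the string value of selenium's By.XPATH
        reviews = driver.find_elements("xpath", reviews_xpath)
        if reviews:
            # Reading this property scrolls the element into view
            reviews[-1].location_once_scrolled_into_view
        # Give the page time to load the next batch
        time.sleep(pause)
        new_count = len(driver.find_elements("xpath", reviews_xpath))
        # Track how many rounds in a row brought no new reviews
        stalls = stalls + 1 if new_count == count else 0
        count = new_count
    return count
```

It could be called as load_reviews(driver, reviews_xpath, desiredReviewsCount), where reviews_xpath is the review XPath string used in the answer.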

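【Discussion】:

This doesn't work for me. It seems to move the sidebar to the end of the initially loaded page, but it doesn't force the reviews below to load.
Which versions of FF and Selenium are you using? I don't see any issue (GIF attached). Sorry about the low GIF quality; because of the screenshot size limit I couldn't upload a higher-quality one.
I'm using selenium-3.141 and Chromium 73 on Linux Mint. Which versions are you using?
Ahhha ... it seems the problem is with the Chrome browser (I was able to reproduce your issue in Chrome). FF does not have this problem. Can you switch to FF, or would you like me to dig deeper into this?
That's fine, I can switch to FF. Thanks!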