【Posted】: 2019-09-26 18:57:18
【Question】:
I am trying to scrape the reviews from this link:
I use the following code to load the page:
from selenium import webdriver
import datetime
import time
import argparse
import os

# URL of the page to scrape (hard-coded here; argparse is not used yet)
url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"

# Earlier attempts, kept for reference (Firefox with a custom user agent)
#driver = webdriver.Chromium()
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko")
#driver = webdriver.Firefox(profile)

# Initialize the Chrome webdriver and open the URL
# https://stackoverflow.com/questions/22476112/using-chromedriver-with-selenium-python-ubuntu
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
driver.get(url)
driver.implicitly_wait(2)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
The page loads fine, but it does not scroll down. I have used the same code on other sites, such as LinkedIn, and it works there.
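One possible explanation, stated as an assumption rather than a confirmed diagnosis: the reviews may be rendered inside a scrollable dialog element, in which case document.body.scrollHeight never changes and the loop above exits after a single pass. A minimal sketch of scrolling a specific element instead is shown here. The function name scroll_container_to_end and its css_selector argument are hypothetical; the actual selector of the review pane is not given in the question and would have to be found with the browser's inspector.

```python
import time


def scroll_container_to_end(driver, css_selector, pause=0.5, max_rounds=100):
    """Scroll one element (not the window) until its height stops growing.

    css_selector is a placeholder supplied by the caller; it is an
    assumption that the reviews live in a single scrollable element.
    Returns the number of scroll attempts made.
    """
    # "css selector" is the locator strategy string behind By.CSS_SELECTOR
    pane = driver.find_element("css selector", css_selector)
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight", pane)

    rounds = 0
    for _ in range(max_rounds):
        # Move the element's own scrollbar to the bottom
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", pane)
        # Give lazily loaded content time to appear
        time.sleep(pause)
        rounds += 1

        new_height = driver.execute_script(
            "return arguments[0].scrollHeight", pane)
        if new_height == last_height:
            break  # nothing new was loaded, assume the end was reached
        last_height = new_height
    return rounds
```

With the driver from the script above, the call would take the form scroll_container_to_end(driver, css_selector), where css_selector is whatever selector matches the review pane. The max_rounds cap is a design choice to keep the loop from running forever if the page keeps loading content.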
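【Comments】:
-
Are you scrolling down to load any elements on the page?
-
Yes, I need to scroll to see all the reviews.
-
Check my answer below and let me know how it goes. I wasn't sure what you meant by
all, which is why I used desiredReviewsCount in the script.
Tags: python selenium screen-scraping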