[Question title]: Scraping a web page after accepting cookies in Python
[Posted]: 2021-10-25 15:48:15
[Question]:

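I'm trying to scrape a web page, but a cookie-consent banner appears before the page can be accessed. I'm using Selenium to click the "Accept all cookies" button, but even after clicking it I still don't get the correct HTML page.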

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = 'https://www.wikiparfum.fr/explore/by-name?query=dior'

driver = webdriver.Chrome(executable_path=DRIVER_PATH)

driver.get(url)
driver.find_element_by_id('onetrust-accept-btn-handler').click()

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

print(soup)

This is the beginning of the HTML page that gets printed:

If anyone can help me solve this, thank you!

[Question comments]:

    Tags: python selenium selenium-webdriver web-scraping beautifulsoup


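    [Solution 1]:

    You should wait for the accept-cookies button element to appear before clicking it. The button is typically not yet on the page when driver.get() returns, so an explicit wait is needed before the click.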

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup
    
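    url = 'https://www.wikiparfum.fr/explore/by-name?query=dior'
    
    # DRIVER_PATH is the path to your chromedriver binary and must be defined beforehand.
    # Note: recent Selenium 4 releases no longer accept executable_path and expect a Service object instead.
    driver = webdriver.Chrome(executable_path=DRIVER_PATH)
    # Explicit wait: poll for up to 20 seconds before giving up with a TimeoutException
    wait = WebDriverWait(driver, 20)
    
    driver.get(url)
    # Block until the accept button is visible, then click it
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#onetrust-accept-btn-handler"))).click()
    
    # Grab the rendered HTML only after the banner has been accepted
    html = driver.page_source
    soup = BeautifulSoup(html, 'lxml')
    
    print(soup)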
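
To make the idea behind the explicit wait concrete, here is a minimal pure-Python sketch of the polling pattern: call a condition repeatedly until it returns something truthy, or give up after a timeout. This is not Selenium's actual implementation; `wait_until`, `FakeBanner`, and `find_button` are hypothetical names used only for illustration, and the "banner" is simulated so the snippet runs without a browser.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    (Hypothetical helper; it only imitates the general idea of an explicit wait.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


class FakeBanner:
    """Simulates a button that only 'appears' after a few lookups."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_button(self):
        # Return None while the button is still "loading", then a stand-in value
        self.calls += 1
        return "accept-button" if self.calls >= self.ready_after else None


banner = FakeBanner(ready_after=3)
button = wait_until(banner.find_button, timeout=2.0, poll_interval=0.01)
print(button, banner.calls)
```

In the question's code the click is attempted once, immediately after `driver.get(url)`, which corresponds to calling `find_button()` a single time here. The same polling approach can also be applied after the click, for example to wait for the page content to load before reading `driver.page_source`.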
    

    [Discussion]:
