Python/Selenium - How to webscrape this dropdown

【Question title】: Python/Selenium - How to webscrape this dropdown
【Posted】: 2021-07-23 02:53:33
【Question description】:

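My code runs fine and prints the title for every row except the rows that have a dropdown.

For example, row 4 shows a dropdown when clicked. I added a try block that, in theory, opens the dropdown and then pulls the titles.

But nothing is printed from the click/scrape of the rows that have these dropdowns.

Expected output - print all titles, including the ones inside the dropdowns.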

from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list')
time.sleep(4)
page_source = driver.page_source
soup = BeautifulSoup(page_source,'html.parser')

productlist=soup.find_all('div',class_='card item-container session')
for property in productlist:
    sessiontitle=property.find('h4',class_='session-title card-title').text
    print(sessiontitle)
    try:
        ifDropdown=driver.find_elements_by_class_name('item-expand-action expand')
        ifDropdown.click()
        time.sleep(4)
        newTitle=driver.find_element_by_class_name('card-title').text
        print(newTitle)
    except:
        newTitle='none'

【Question comments】:

item-expand-action expand should be item-expand-action.expand
Your find call, once adjusted as per the previous comment, returns a list of 8 items to open
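
To illustrate the comment above: a class attribute such as 'item-expand-action expand' lists two separate classes, and a CSS selector matches an element carrying all of them by joining the names with dots. The following is only a minimal sketch; classes_to_css is a hypothetical helper written for this illustration, not part of Selenium or BeautifulSoup.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("expected at least one class name")
    # Each class name gets a leading dot; an optional tag name goes in front.
    return tag + "".join("." + name for name in class_names)


# The two class strings that appear in the question's code:
print(classes_to_css("item-expand-action expand"))
print(classes_to_css("card item-container session", tag="div"))
```

The resulting strings can be passed to a CSS-selector lookup; the class strings themselves are taken from the question.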

Tags: python selenium web-scraping beautifulsoup

【Solution 1】:

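There are a couple of problems. First, when you locate by class from the driver and the element has multiple classes, you need to separate them with dots rather than spaces, so the driver knows it is dealing with another class. Second, find_elements returns a list, and a list has no .click(), so you get an error; your except catches it but treats it as meaning there was no link to click. I rewrote the code (without soup for now) so that it checks, with dots replacing the spaces, for a link to open within each session, then loops over the new titles that appear. Note that this only gets the sessions and subsessions currently in view; you will need to add logic to scroll and get the rest. Here is what I tested: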

# Driver initialization goes above this point (I used Firefox).
# Note: the find_element(s)_by_* helpers used here were removed in Selenium 4.3;
# on newer versions use driver.find_elements(By.CSS_SELECTOR, ...) instead.
# Open the website page
URL = "https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list"
driver.get(URL)
time.sleep(4)  # time for page to populate

# Get all top-level sessions
product_list = driver.find_elements_by_css_selector('div.card.item-container.session')
for product in product_list:
    session_title = product.find_element_by_css_selector('h4.card-title').text
    print(session_title)
    # Find the dropdown within this session, if any
    dropdowns = product.find_elements_by_class_name('item-expand-action.expand')
    if len(dropdowns) == 0:  # nothing more for this session
        continue  # move to next session
    # Still here: click the dropdown via execute_script because a link can overlap the chevron
    driver.execute_script("arguments[0].scrollIntoView(true); arguments[0].click();",
                          dropdowns[0])
    time.sleep(4)  # wait for subsessions to appear
    sub_titles = product.find_elements_by_css_selector('h4.card-title')
    # The first h4 is the master session title, already printed above, so skip it
    for sub_title in sub_titles[1:]:
        print("    " + sub_title.text)  # indent for clarity
    # Still to do: handle other sessions that only get paged into view when you scroll;
    # that is a different question
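
The second problem described in this answer (calling .click() on a list, with the error swallowed by the except) can be reproduced without a browser. The sketch below uses stand-ins: FakeElement and find_elements are hypothetical names made up for this illustration and only mimic the one behavior that matters here, namely that a find_elements call returns a list.

```python
class FakeElement:
    """Stand-in for a WebElement; it only records whether it was clicked."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


def find_elements():
    """Stand-in for a find_elements call: it always returns a list."""
    return [FakeElement(), FakeElement()]


# What the question's code does: call .click() on the list itself.
# A bare `except:` would hide this error; catching it by name makes it visible.
try:
    find_elements().click()
    outcome = "clicked"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"
print(outcome)

# What works: check that the list is non-empty, then click one element.
elements = find_elements()
if elements:
    elements[0].click()
print(elements[0].clicked)
```

Catching a named exception instead of using a bare except would have surfaced the AttributeError in the original code rather than silently setting newTitle to 'none'.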

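For the scrolling that the answer leaves as a to-do, one possible approach is to scroll to the bottom repeatedly until the number of matching session cards stops growing. This is a sketch under assumptions, not something tested against the site: it assumes the page loads more sessions when the window itself is scrolled, and scroll_until_stable, pause, and max_rounds are names invented for this example. It relies only on driver.find_elements(by, value) and driver.execute_script(script).

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def scroll_until_stable(driver, selector, pause=2.0, max_rounds=20):
    """Scroll to the page bottom until the count of matching elements stops growing.

    Assumption: new content is appended when the window is scrolled down.
    Returns the final number of elements matching `selector`.
    """
    seen = len(driver.find_elements(CSS, selector))
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more items
        now = len(driver.find_elements(CSS, selector))
        if now == seen:  # nothing new appeared, so stop scrolling
            break
        seen = now
    return seen
```

If the assumption holds, calling scroll_until_stable(driver, 'div.card.item-container.session') before collecting the sessions would bring the remaining cards into the page first. If the site scrolls an inner container instead of the window, the script string would need to target that container.
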
【Comments】: