【Question title】: Unable to extract URL names from table using Selenium webdriver
【Posted】: 2021-06-11 06:01:05
【Question】:

I have a table like the one below:

The goal is to extract the names using Selenium WebDriver.

I tried the following code to get the names via XPath:

wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.deakin.edu.au/information-technology/staff-listing")

names = wd.find_element_by_xpath('//*[@id="table09355"]/tbody/tr[1]/td/a').text

The output is empty, i.e. ''. How can I extract the names using XPath in Selenium WebDriver? The names are URL hyperlinks.

Thanks,

【Question comments】:

    Tags: python-3.x selenium selenium-webdriver xpath


    【Solution 1】:

    You may want to use the following XPath:

    //a[contains(@href,'https://')]
    

    Then use find_elements to collect all the anchor tags into a list, like this:

    for names in wd.find_elements(By.XPATH, "//a[contains(@href,'https://')]"):
        print(names.text)
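
The loop above prints every matching anchor, including those whose text is blank. As a minimal sketch, the names could instead be gathered into a list together with their link targets. `collect_links` and `scrape_link_names` are hypothetical helpers, not part of Selenium; the sketch assumes only that each element exposes `.text` and `.get_attribute()`, as the objects returned by `find_elements` do.

```python
from typing import Iterable, List, Tuple


def collect_links(elements: Iterable) -> List[Tuple[str, str]]:
    """Return (name, href) pairs for anchors whose visible text is not blank.

    `elements` is any iterable of objects exposing `.text` and
    `.get_attribute(name)`, such as the list returned by find_elements.
    """
    links = []
    for element in elements:
        name = (element.text or "").strip()
        if not name:
            continue  # hidden or icon-only links render no text
        links.append((name, element.get_attribute("href")))
    return links


def scrape_link_names(url: str) -> List[Tuple[str, str]]:
    """Hypothetical driver code; needs Selenium and a usable chromedriver."""
    # Imported lazily so collect_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        anchors = driver.find_elements(By.XPATH, "//a[contains(@href,'https://')]")
        return collect_links(anchors)
    finally:
        driver.quit()  # always release the browser session
```

Keeping the filtering in a function that never touches the browser makes it easy to check with stand-in objects before running it against the live page.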
    

    Update 1:

    driver.maximize_window()
    wait = WebDriverWait(driver, 10)
    driver.get('https://www.deakin.edu.au/information-technology/staff-listing')
    wait.until(EC.element_to_be_clickable((By.ID, "popup-accept"))).click()
    ActionChains(driver).move_to_element(wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Emeritus Professors']")))).perform()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Emeritus Professors']"))).click()
    ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(text(), 'Emeritus Professors')]/ancestor::h3/following-sibling::div/descendant::a")))).perform()
    for names in driver.find_elements(By.XPATH, "//span[contains(text(), 'Emeritus Professors')]/ancestor::h3/following-sibling::div/descendant::a"):
        print(names.text)
    

    Output:

    Emeritus Professor Lynn Batten
    Emeritus Professor Andrzej Goscinski
    
    Process finished with exit code 0
    

    Imports:

    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
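
The Update 1 code expands the 'Emeritus Professors' section before reading the links. Selenium's `.text` returns only text that is actually rendered, so an anchor inside a collapsed section yields an empty string. That may be why the lookup in the question came back as ''. A possible alternative, sketched here, falls back to the DOM `textContent` property, which is populated even when the element is hidden. `element_name` is a hypothetical helper, and whether the hidden text is complete on this particular page is an assumption.

```python
def element_name(element) -> str:
    """Return the element's visible text, falling back to its DOM textContent."""
    visible = (element.text or "").strip()
    if visible:
        return visible
    # textContent is filled in even when the element is not rendered
    hidden = element.get_attribute("textContent") or ""
    # Collapse the newlines and indentation carried over from the HTML source
    return " ".join(hidden.split())
```

It could replace `names.text` in the loop, for example `print(element_name(names))`.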
    

    If you want to run this on Google Colab, try the following code:

    !pip install selenium
    !apt-get update 
    !apt install chromium-chromedriver
    
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.action_chains import ActionChains
    
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # Start a single headless Chrome session (chromedriver must be on PATH)
    driver = webdriver.Chrome(options=chrome_options)
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.deakin.edu.au/information-technology/staff-listing")
    wait.until(EC.element_to_be_clickable((By.ID, "popup-accept"))).click()
    ActionChains(driver).move_to_element(wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Emeritus Professors']")))).perform()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Emeritus Professors']"))).click()
    ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(text(), 'Emeritus Professors')]/ancestor::h3/following-sibling::div/descendant::a")))).perform()
    for names in driver.find_elements(By.XPATH, "//span[contains(text(), 'Emeritus Professors')]/ancestor::h3/following-sibling::div/descendant::a"):
        print(names.text)
    

    【Comments】:

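    • Thanks for your answer. Could you show an example using the code above? I still can't get it to work, as I'm new to this.
    • @user3046211: I've updated the code under the Update 1 section.
    • Got it, but I get a timeout exception when I run your code. Did you run into this exception?
    • Ah! Is that so? Let me try it again on Colab.
    • Exception handling is recommended.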
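
The exception handling suggested in the comments might look like the sketch below. One plausible cause of the timeout, not confirmed in the thread, is that the `popup-accept` button never appears in some sessions (for instance a headless one), so the first `wait.until` raises `TimeoutException`. `run_optional` and `accept_popup_if_shown` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```python
from typing import Callable, Tuple, Type


def run_optional(step: Callable[[], object],
                 ignored: Tuple[Type[BaseException], ...]) -> bool:
    """Run `step`; return True on success, False if it raised one of `ignored`."""
    try:
        step()
    except ignored:
        return False
    return True


def accept_popup_if_shown(driver, timeout: float = 5) -> bool:
    """Click the consent button when it appears; carry on when it does not."""
    # Imported lazily so run_optional stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    def click_accept():
        wait.until(EC.element_to_be_clickable((By.ID, "popup-accept"))).click()

    # Only a timeout is treated as "popup not shown"; other errors propagate.
    return run_optional(click_accept, (TimeoutException,))
```

Calling `accept_popup_if_shown(driver)` in place of the bare `wait.until(...).click()` line lets the rest of the script continue when the popup is absent, while still surfacing any unexpected error.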