【Question title】: Can't collect information at the same time from two different depths using selenium
【Posted】: 2019-09-01 10:53:23
【Question description】:

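I've written a script in python using selenium to fetch the name and reputation from the landing page with the get_names() function, and then click the links of the different posts to reach the inner pages and parse the title from there with the get_additional_info() function.

All the information I'm trying to parse is available in the landing page as well as in the inner pages. Moreover, none of it is dynamic, so selenium is definitely overkill. However, my intention is to use selenium to scrape information from two different depths at the same time.

In the script below, if I comment out the name and rep lines, I can see that the script clicks the links on the landing page and parses the titles from the inner pages flawlessly.

However, when I run the script as it is, I get a selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document error, which points at this line: name = item.find_element_by_css_selector().

How can I get rid of this error and make the script work with the logic I've already implemented?

What I've tried so far: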

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

lead_url = 'https://stackoverflow.com/questions/tagged/web-scraping'

def get_names():
    driver.get(lead_url)
    for count, item in enumerate(wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".summary")))):
        usableList = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".summary .question-hyperlink")))

        name = item.find_element_by_css_selector(".user-details > a").text
        rep = item.find_element_by_css_selector("span.reputation-score").text

        driver.execute_script("arguments[0].click();",usableList[count])
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h1 > a.question-hyperlink")))

        title = get_additional_info()
        print(name,rep,title)

        driver.back()
        wait.until(EC.staleness_of(usableList[count]))

def get_additional_info():
    title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h1 > a.question-hyperlink"))).text
    return title

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,5)
    get_names()

【Question discussion】:

    Tags: python python-3.x selenium selenium-webdriver web-scraping


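    【Solution 1】:

    Keeping broadly in line with your design pattern... don't work off item. It was located before the first click, so its reference is stale once driver.back() reloads the landing page. Instead, use count to index into a list of elements taken from the current page_source, e.g.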

    driver.find_elements_by_css_selector(".user-details > a")[count].text
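
A variation on the same idea, as a minimal sketch rather than the answer's exact code: re-locate the `.summary` rows on every pass and read both fields from the row at position `count`. Reading from a single row also keeps the name and the reputation paired, which two separately indexed lists would not do if some row lacked one of the two elements. The helper name `read_row` and the constant `CSS` are made up for illustration; `CSS` holds the literal string behind `By.CSS_SELECTOR`, and `find_element(by, value)` is the locator form available in both Selenium 3 and Selenium 4.

```python
# Literal value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def read_row(driver, count):
    """Return (name, rep) for the count-th question row of the current page.

    The rows are looked up again on every call, so the reference used here
    always belongs to the DOM that is loaded right now.
    """
    row = driver.find_elements(CSS, ".summary")[count]  # fresh reference
    name = row.find_element(CSS, ".user-details > a").text
    rep = row.find_element(CSS, "span.reputation-score").text
    return name, rep
```

Inside the loop, `name, rep = read_row(driver, count)` would take the place of the two `item.find_element_by_css_selector(...)` lines.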
    

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    lead_url = 'https://stackoverflow.com/questions/tagged/web-scraping'
    
    def get_names():
        driver.get(lead_url)
        for count, item in enumerate(wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".summary")))):
            usableList = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".summary .question-hyperlink")))
    
            name = driver.find_elements_by_css_selector(".user-details > a")[count].text
            rep = driver.find_elements_by_css_selector("span.reputation-score")[count].text
    
            driver.execute_script("arguments[0].click();",usableList[count])
            wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h1 > a.question-hyperlink")))
    
            title = get_additional_info()
            print(name,rep,title)
    
            driver.back()
            wait.until(EC.staleness_of(usableList[count]))
    
    def get_additional_info():
        title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h1 > a.question-hyperlink"))).text
        return title
    
    if __name__ == '__main__':
        driver = webdriver.Chrome()
        wait = WebDriverWait(driver,5)
        get_names()
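
Another way to sidestep stale references altogether, sketched here under the assumption that clicking is not strictly required: copy everything needed from the landing page into plain strings first, including each link's `href`, and only then visit the inner pages with `driver.get()`. Strings do not go stale, so no `driver.back()` or staleness wait is needed. The function names `snapshot_rows` and `scrape` are hypothetical, and the selectors are the ones from the question, which may no longer match the live site.

```python
# Literal value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def snapshot_rows(driver):
    """Copy name, reputation and link of every row into plain Python values."""
    rows = []
    for item in driver.find_elements(CSS, ".summary"):
        link = item.find_element(CSS, ".question-hyperlink")
        rows.append({
            "name": item.find_element(CSS, ".user-details > a").text,
            "rep": item.find_element(CSS, "span.reputation-score").text,
            "href": link.get_attribute("href"),
        })
    return rows


def scrape(driver, lead_url):
    """Return a list of (name, rep, title) tuples, one per question."""
    driver.get(lead_url)
    rows = snapshot_rows(driver)  # only strings are kept past this point
    results = []
    for row in rows:
        driver.get(row["href"])  # open the inner page directly
        title = driver.find_element(CSS, "h1 > a.question-hyperlink").text
        results.append((row["name"], row["rep"], title))
    return results
```

With a real browser this would be called as `scrape(webdriver.Chrome(), lead_url)`; adding the explicit waits from the original script around the lookups is a reasonable hardening step if pages load slowly.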
    

    【Discussion】:
