【Question title】: Waiting for invisible elements not on the page
【Posted】: 2017-12-20 13:54:05
【Question】:

I am trying to scrape this web page with the following script.

I cannot wait for this element, and it is not being scraped correctly.

clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='style-scope match-pop-market']")))

According to Chrome's inspector, the element selector is correct.

//a[@class='style-scope match-pop-market'] 

How can I get elem_href for the current page only, instead of other elements that seem to belong to other pages and are not visible?

//div[@class='mpm_match_title' and .//div[@class='mpm_match_title style-scope match-pop-market']]//a[@class='style-scope match-pop-market'] 

In theory this should solve the problem, but it does not work. Any ideas? Current output:

None
None
None
None
None
None
None
None
None
None
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6381070
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386987
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386988
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386989
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386990
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386991
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386992
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387025
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387026
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387027
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387028

The wait cannot succeed, because it targets elements that are not visible on the current page.

So:

//div[contains(@class, 'mpm_match_title')] #TEXT
//div[contains(@class, 'mpm_match_title style-scope match-pop-market')]  #BAR
//a[contains(@class, 'style-scope match-pop-market')] #HREF
style-scope match-pop-market

Combined:

//div[contains(@class, 'mpm_match_title') and .//div[contains(@class, 'mpm_match_title style-scope match-pop-market')]]//a[@class='style-scope match-pop-market']

Nothing is found.

Desired output:

https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6381070
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386987
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386988
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386989
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386990
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386991
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6386992
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387025
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387026
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387027
https://www.palmerbet.com/sports/soccer/italy-serie-b/match/6387028

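【Discussion】:

    Tags: python css python-3.x selenium xpath


    【Solution 1】:

    Using the code from the pastebin link in the comments, I basically just modified the XPath to search for specific elements that identify the links on the current page.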

    from random import shuffle
    
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as wait
    
    driver = webdriver.Chrome()
    driver.set_window_size(1024, 600)
    driver.maximize_window()
    driver.get('https://www.palmerbet.com/sports/soccer')
    
    # Wait for the tournament filter labels, then count them.
    clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, 
        ('//*[contains(@class,"filter_labe")]'))))
    options = driver.find_elements(By.XPATH, '//*[contains(@class,"filter_labe")]')
    
    # Visit the tournaments in random order.
    indexes = list(range(len(options)))
    shuffle(indexes)
    
    # Match only rows that are not hidden and that contain a link with an href.
    xp = '//sport-match-grp[not(contains(@style, "display: none;"))]' \
        '//match-pop-market[@class="sport-match-grp" and ' \
        'not(contains(@style, "display: none;")) and ' \
        './/a[@id="match_link" and boolean(@href)]]'
    
    for index in indexes:
        print(f'Loading index {index}')
        driver.get('https://www.palmerbet.com/sports/soccer')
        clickMe1 = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,
            '(//ul[@id="tournaments"]//li//input)[%s]' % str(index + 1))))
        driver.execute_script("arguments[0].scrollIntoView();", clickMe1)
        clickMe1.click()
    
        try:
            # Wait until at least one matching row on the page is clickable.
            clickMe = wait(driver, 3).until(EC.element_to_be_clickable((
                By.XPATH, xp)))
            elems = driver.find_elements(By.XPATH, xp)
    
            elem_href = []
            for elem in elems:
                # The href lives on the nested <a>, not on the row element.
                href = elem.find_element(By.XPATH, './/a[@id="match_link"]') \
                    .get_attribute('href')
                print(href)
                elem_href.append(href)
        except TimeoutException:
            # Some tournament pages contain no match links at all.
            print(f'There are no matches in index {index}.')
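
As an alternative to encoding visibility in the XPath, the filtering can be done in Python. The following is a minimal sketch, not the answer's method: the helper name `visible_match_links` and its parameters are hypothetical, and it assumes `driver` is a Selenium WebDriver whose rows expose `is_displayed()`. The locator strategy is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH`.

```python
def visible_match_links(driver, row_xpath, link_xpath='.//a[@id="match_link"]'):
    """Return the hrefs of links inside rows that are currently displayed."""
    hrefs = []
    # "xpath" is the string value of selenium's By.XPATH.
    for row in driver.find_elements("xpath", row_xpath):
        if not row.is_displayed():
            continue  # skip rows that are present in the DOM but hidden
        for link in row.find_elements("xpath", link_xpath):
            href = link.get_attribute("href")
            if href:  # ignore anchors without an href
                hrefs.append(href)
    return hrefs
```

Inside the loop above, this could be called as `elem_href = visible_match_links(driver, '//match-pop-market')`; the row XPath in that call is only an illustration and may need adjusting to the page.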
    

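    【Comments】:

    • The wait does not work when clicking through the pages. See: @Line28 clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, ("//a[@class='style-scope match-pop-market']")))) @ 987654321@. This produces better output, but nothing is printed because it waits for elements that are not visible on the page.
    • Note that it works for a single page and the first click. After that, nada.
    • Thanks for showing me the code; I was a bit confused by the original request. I have updated my answer to reflect the new solution.
    • Also, just a heads-up: the England FA Cup and World Cup pages do not contain links. I added a try / except to handle them when the wait fails.
    • You're a legend! :) This works really well. What is the XPath selector for the #match_title (team names) that appears on every page? That way all the data gets scraped, instead of some pages being missed.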
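
The problem raised in the comments, a wait that targets elements present in the DOM but not visible, can also be approached with a custom wait condition. This is a hedged sketch under stated assumptions: `any_visible` is a hypothetical helper, not part of Selenium, and it relies on `WebDriverWait.until` accepting any callable that takes the driver and polling it until the return value is truthy. Selenium's `expected_conditions` module also provides `visibility_of_any_elements_located`, which serves a similar purpose.

```python
def any_visible(xpath):
    """Build a wait condition that is truthy once a match is displayed."""
    def condition(driver):
        # "xpath" is the string value of selenium's By.XPATH.
        shown = [el for el in driver.find_elements("xpath", xpath)
                 if el.is_displayed()]
        # WebDriverWait.until keeps polling while the result is falsy.
        return shown or False
    return condition

# Usage with Selenium (assumes `wait` is WebDriverWait and `xp` is the row XPath):
#     rows = wait(driver, 3).until(any_visible(xp))
```

If no element becomes visible before the timeout, `until` raises `TimeoutException`, which is what the try / except in the answer handles for pages without links.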