【Question title】: Python, Selenium - parse. Can't get info from dynamic filling
【Posted】: 2021-11-06 20:23:43
【Question】:

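I want to learn how to get information from dynamically generated fields. When I tried simple websites, everything worked. Then I decided to try something harder, and now I can't figure it out. I have spent about two weeks working through the solutions I found on the Internet and crossing them off one after another. Now I'm not sure whether the information shown on the site can be obtained this way at all. Most likely I'm doing something wrong, but I have run out of new ideas, so I decided to ask here. Perhaps someone understands this and can give me a hint. If so, please give me some examples.

The site I'm using to learn: kbp.aero/en

The information I want to get (the arrivals schedule): .tbody .tr .td

For example, I tried: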

1.

import time

import requests
from bs4 import BeautifulSoup

URL = 'https://kbp.aero/en/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
}
time.sleep(1)
response = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(response.content, 'html.parser')
# Find the wrapper div, then collect the rows inside it
items = soup.find('div', class_='table_wrp out yesterday')
items = items.find_all('tr', class_='tr')
comps = []
for item in items:
    comps.append({
        'title': item.find('td', class_='td').get_text(strip=True),
    })
for comp in comps:
    print(comp['title'])
# for item in items:
#     comps.append({
#         'text': item.get_text(strip=True)
#     })
#
# for comp in comps:
#     print(comp['text'])
2.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def main():
    driver = webdriver.Chrome()
    driver.get("https://kbp.aero/en/")

    wait = WebDriverWait(driver, 10)
    # Note: text_to_be_present_in_element yields a boolean, not a WebElement,
    # so the find_elements call below raises AttributeError.
    element = wait.until(EC.text_to_be_present_in_element((By.CLASS_NAME, 'tbody'), ''))

    tds = element.find_elements(By.CLASS_NAME, "td")
    for td in tds:
        print(td.text)

    # try:
    #     element = WebDriverWait(driver, 10).until(
    #         EC.presence_of_element_located((By.CLASS_NAME, "tbody"))
    #     )
    #     tds = element.find_elements(By.CLASS_NAME, "td")
    #     for td in tds:
    #         print(td.text)
    #
    # finally:
    #     driver.quit()


if __name__ == "__main__":
    main()

Thanks for any advice.

【Comments】:

    Tags: python selenium parsing


    【Solution 1】:

    This gets the entire table data:

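       from time import sleep
       from selenium import webdriver
       from selenium.webdriver.chrome.service import Service
       from selenium.webdriver.common.by import By
    
       # Placeholder: path to the chromedriver executable
       PATH = r"chromedriverexe path"
       # Selenium 4 takes the driver path through a Service object
       driver = webdriver.Chrome(service=Service(executable_path=PATH))
    
       driver.get("https://kbp.aero/en/")
       driver.maximize_window()
       # Give the page's scripts time to fill in the table
       sleep(3)
       print(driver.find_element(By.CSS_SELECTOR, "div.table_wrp.out.today > table").text)
       driver.quit()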
    

    Output:

    Рейс Час Призначення Перевізник Термінал Гейт Статус
    TK 1256 15:05 Istanbul Turkish Airlines D D5 Boarding Completed
    PS 9556 15:05 Istanbul Ukraine International Airlines D D5 Boarding Completed
    7W 163 15:10 Lviv Wind Rose D D19 Boarding
    FR 3167 15:10 Warsaw Ryanair D D9 Boarding
    PS 9013 15:15 Ivano-Frankivsk Ukraine International Airlines D D18 Boarding
    7W 113 15:15 Ivano-Frankivsk Wind Rose D D18 Boarding
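
If separate columns are needed rather than one block of text, a variation on the snippet above could read the table cell by cell and wait explicitly for the rows instead of sleeping for a fixed time. This is only a sketch under some assumptions: `rows_to_dicts` and `scrape_table` are hypothetical helper names, the first `tr` of the table is assumed to be the header row, and the selector from the answer is assumed to still match the page.

```python
from typing import Dict, List


def rows_to_dicts(header: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each row's cells with the header names; skip rows of a different width."""
    return [dict(zip(header, row)) for row in rows if len(row) == len(header)]


def scrape_table(url: str, table_css: str, timeout: float = 10.0) -> List[Dict[str, str]]:
    """Open the page, wait for the table to be filled, and return its rows as dicts."""
    # Selenium is imported here so rows_to_dicts stays usable without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: a Chrome driver can be found without passing an explicit path.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one data cell exists inside the table (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, f"{table_css} td"))
        )
        table = driver.find_element(By.CSS_SELECTOR, table_css)
        # One list of cell texts per table row, header row included.
        grid = [
            [cell.text.strip() for cell in tr.find_elements(By.CSS_SELECTOR, "th, td")]
            for tr in table.find_elements(By.CSS_SELECTOR, "tr")
        ]
    finally:
        driver.quit()

    if not grid:
        return []
    # Assumption: the first row holds the column names.
    return rows_to_dicts(grid[0], grid[1:])
```

Calling `scrape_table("https://kbp.aero/en/", "div.table_wrp.out.today > table")` should then give one dictionary per flight, keyed by the column names the site displays. The explicit wait returns as soon as the first cell appears, so it is usually faster than a fixed `sleep(3)` and raises a `TimeoutException` if the table never fills.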
    

    【Comments】:

    • Thank you very much! :)