【Published】: 2020-05-03 09:50:40
【Question】:
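I am trying to scrape a table from an internal website that spans 16 pages. When I run the code below, the table on the last page is not scraped and I get the following error: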
Traceback (most recent call last):
  File "C:/Users/mb4ig/PycharmProjects/Python/Test.py", line 56, in <module>
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
  File "C:\Users\mb4ig\Python\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
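When I set max_page to 15, the code runs without errors and the tables on 15 of the 16 pages are scraped, but page 16 is not.
Can someone please help? Thanks.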
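# Assumes `driver` is an already-open WebDriver showing page 1 of the table.
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

page = 1
max_page = 16  # Only works when I select 15 pages but the last page isn't scraped.
name = []
desc = []
while page <= max_page:
    # Wait until all table rows on the current page are visible
    rows = WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='container']/table/tbody/tr")))
    for row in rows:
        name.append(row.find_element(By.XPATH, './td[1]').text)
        desc.append(row.find_element(By.XPATH, './td[2]').text)
    # Wait for the "Next" link to become clickable, then go to the next page
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
    page = page + 1
    print('navigate to page: ' + str(page))
driver.close()
df = pd.DataFrame({"Name": name, "Description": desc})
print(df)
df.to_csv('Test.txt', index=False)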
【Discussion】:
Tags: python selenium selenium-webdriver html-table