Is there a better way to fetch text from an HTML table using Selenium? (Answers)

[Question title]: Is there a better way to fetch text from an HTML table using Selenium?
[Posted]: 2021-03-15 14:31:52
[Question]:

I have been trying to get the text circled in the image attached below.

(Image: the table, with the target text circled)

(Link: website URL)

My code:

driver.find_element_by_xpath('/html/body/chrome/div/mat-sidenav-container/mat-sidenav-content/div/ng-component/entity-v2/page-layout/div/div/div/page-centered-layout[3]/div/div/div[1]/row-card[1]/profile-section/section-card/mat-card/div[2]/div/list-card/div/table/tbody/tr/td[2]/field-formatter/identifier-formatter/a/div/div')

Running it produces this error:

NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/chrome/div/mat-sidenav-container/mat-sidenav-content/div/ng-component/entity-v2/page-layout/div/div/div/page-centered-layout[3]/div/div/div[1]/row-card[1]/profile-section/section-card/mat-card/div[2]/div/list-card/div/table/tbody/tr/td[2]/field-formatter/identifier-formatter/a/div/div"}

(Session info: chrome=89.0.4389.82)

How can I fix this?

[Comments on the question]:

You can iterate over the table and use .text to get the text. If you need more help, you will have to clarify your question.
Thanks. The text I want to extract is "Post-IPO Debt - Climeon". I'd appreciate help with the code if possible. Thanks in advance.
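As an alternative to locating every cell with Selenium (this is not what either answer below does), you could let Selenium render the page and then parse driver.page_source with Python's standard-library html.parser. This is a minimal sketch assuming the rendered table uses ordinary tr/td markup; the class TableTextParser, the function table_rows and the sample HTML are hypothetical. Because the page is built by JavaScript, you would still need to wait for the table to appear before reading page_source.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, including text split across nested tags
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the non-empty rows of all tables in html as lists of cell text."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


# In a real script this would be table_rows(driver.page_source)
sample = """
<table><tbody>
  <tr><td><a><div>Post-IPO Debt - Climeon</div></a></td><td>second cell</td></tr>
</tbody></table>
"""
for row in table_rows(sample):
    print(row)
```

Nested tags inside a cell (such as the a/div/div chain in the question's XPath) are handled because all text is accumulated until the cell's closing tag.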

Tags: python selenium web-scraping selenium-chromedriver selenium-webdriver-python

[Solution 1]:

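To get the values from a dynamically loaded table, use WebDriverWait() to wait for visibility_of_all_elements_located() with the XPath below. It anchors on the "Funding Rounds" heading rather than on a long absolute path, which is brittle.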

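driver.get("https://www.crunchbase.com/organization/climeon/company_financials")
# Wait up to 20 seconds until every cell of the first table after the
# "Funding Rounds" heading is visible; returns the list of <td> elements
columnRecords = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, "//h2[.='Funding Rounds']/following::table[1]//tbody//tr//td")))

for col in columnRecords:
    print(col.text)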

You need the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
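Because the XPath above ends in //tr//td, columnRecords is one flat list of cells across all rows. If you want one list per table row, a small sketch can regroup the texts, assuming every row has the same number of cells; group_rows and n_cols are hypothetical names, and the sample texts are placeholders.

```python
def group_rows(cell_texts, n_cols):
    """Split a flat list of cell texts into consecutive rows of n_cols cells."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(cell_texts) % n_cols:
        raise ValueError("cell count is not a multiple of n_cols; rows differ in length")
    return [cell_texts[i:i + n_cols] for i in range(0, len(cell_texts), n_cols)]


# In a real script: texts = [col.text for col in columnRecords]
texts = ["a1", "a2", "a3", "b1", "b2", "b3"]
for row in group_rows(texts, 3):
    print(" | ".join(row))
```

If rows can have different numbers of cells, it is safer to locate the tr elements first and read each row's td children separately.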

[Comments]:

[Solution 2]:

Here is an example of how to get the text of each element in the table.

from selenium.webdriver.common.by import By

# Every <td> in the table row (absolute XPath taken from the question)
table = driver.find_elements(By.XPATH, '/html/body/chrome/div/mat-sidenav-container/mat-sidenav-content/div/ng-component/entity-v2/page-layout/div/div/div/page-centered-layout[3]/div/div/div[1]/row-card[1]/profile-section/section-card/mat-card/div[2]/div/list-card/div/table/tbody/tr/td')
# XPath positions start at 1, so loop from 1 to len(table) inclusive
for x in range(1, len(table) + 1):
    # Find which number varies between items and
    # substitute x for that number
    text = driver.find_element(By.XPATH, f'/html/body/chrome/div/mat-sidenav-container/mat-sidenav-content/div/ng-component/entity-v2/page-layout/div/div/div/page-centered-layout[3]/div/div/div[1]/row-card[1]/profile-section/section-card/mat-card/div[2]/div/list-card/div/table/tbody/tr/td[{x}]/field-formatter/identifier-formatter/a/div/div').text
    print(text)

I used the XPath from your question, but I don't know whether it is correct, so test it and let me know.
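Since table already holds the td elements, the index loop above re-queries the page for something you already have. A simpler sketch reads .text from each element directly and stops at the first match. first_cell_containing is a hypothetical helper; it only needs objects with a .text attribute, so SimpleNamespace stand-ins replace real WebElements here.

```python
from types import SimpleNamespace


def first_cell_containing(cells, needle):
    """Return the stripped text of the first cell containing needle, or None."""
    for cell in cells:
        text = cell.text.strip()
        if needle in text:
            return text
    return None


# Stand-ins for the elements returned by driver.find_elements(By.XPATH, ...)
cells = [SimpleNamespace(text="other cell"), SimpleNamespace(text=" Post-IPO Debt - Climeon ")]
print(first_cell_containing(cells, "Post-IPO Debt"))
```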

[Comments]:

So @DeasSec, I tried it and got this error message: for r in range(1, row + 1): File "", line 1 for r in range(1, row + 1): ^ SyntaxError: unexpected EOF while parsing
I guess in your case row needs to be the length of the table (row = len(table)).
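"SyntaxError: unexpected EOF while parsing" means the input ended right after the for line, before an indented loop body (for example, when that line is run on its own in a console or notebook cell). row must also be defined before the loop. The sketch below assumes row = len(table) as suggested above and uses a shortened, hypothetical XPath template to show the 1-based positions the loop produces.

```python
def indexed_xpaths(template, count):
    """Build XPaths for positions 1..count; XPath positions start at 1, not 0."""
    return [template.format(i) for i in range(1, count + 1)]


row = 3  # in a real script: row = len(table)
for xpath in indexed_xpaths("//table/tbody/tr/td[{}]", row):
    print(xpath)  # the loop body must be indented under the for line
```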