How do I extract the data from these JavaScript tables using Selenium and Python? (Answers)

【Question title】: How do I extract the data from these JavaScript tables using Selenium and Python?
【Posted】: 2020-07-16 00:39:03
【Question description】:

I am very new to Python, JavaScript, and web scraping. I am trying to write code that writes all of the data in tables like these to a CSV file. The web page is "https://www.mcmaster.com/cam-lock-fittings/material~aluminum/"

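I started by looking for the data in the HTML, but then realized that the site uses JavaScript. I then tried Selenium, but I could not find the actual data shown in these tables anywhere in the JavaScript code. I wrote the code below to dump the page so I could check whether the displayed data appears anywhere, but I could not find it.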

from urllib.request import urlopen
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'


options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path='C:/Users/Brian Knoll/Desktop/chromedriver.exe', options=options)

driver.get(url)
html = driver.execute_script("return document.documentElement.outerHTML")
driver.close()

filename = "McMaster Text.txt"
fo = open(filename, "w")
fo.write(html)
fo.close()

I'm sure there is an obvious answer that is going right over my head. Any help would be greatly appreciated! Thank you!

【Question comments】:

Tags: javascript python selenium web-scraping beautifulsoup

【Solution 1】:

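I think you need to wait until the table you are looking for has finished loading.
To do that, add the following line, which waits up to 10 seconds for the table container to appear before you start scraping the data: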

fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))
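
To make the "up to 10 seconds" behaviour concrete, here is a minimal pure-Python sketch of the polling idea behind an explicit wait. It is a conceptual illustration only, not Selenium's actual implementation; `wait_until`, `WaitTimeout`, and the parameter names are hypothetical and exist only for this example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    The value is returned as soon as it appears, so a page that loads
    quickly does not cost the full timeout. WaitTimeout is raised once
    the deadline has passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

Under this model, a condition that can never succeed, for example a locator that matches no element on the page, always ends in the timeout error rather than returning early.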

Here is the full code:

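import os

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'

options = webdriver.ChromeOptions()
# Suppress ChromeDriver's console logging
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# Expects the chromedriver binary in the current working directory.
# Note: newer Selenium 4 releases take the driver path through a Service object
# instead of the executable_path argument.
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), options=options)

driver.get(url)
# Wait up to 10 seconds for a div whose class contains 'ItmTblCntnr' to be present in the DOM
fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))

# Grab the DOM as rendered by the browser, after the JavaScript has run
html = driver.execute_script("return document.documentElement.outerHTML")
driver.close()

filename = "McMaster Text.txt"
# Write as UTF-8 so non-ASCII characters in the page do not raise an encoding error
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)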
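
Since the stated goal is a CSV file rather than an HTML dump, the rendered HTML still has to be turned into rows. The following is a rough sketch using only the standard library. It assumes the data ends up in ordinary `<table>`/`<tr>`/`<td>` markup, which this thread does not confirm for the McMaster page; `TableRowCollector`, `html_tables_to_csv`, and the sample markup are made up for illustration. Nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_tables_to_csv(html, out_file):
    """Parse html, write every table row to out_file as CSV, return the rows."""
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    csv.writer(out_file).writerows(parser.rows)
    return parser.rows


# Made-up sample markup for illustration; not taken from the real page.
sample_html = (
    "<table><tr><th>Size</th><th>Price</th></tr>"
    "<tr><td> 2 </td><td>$4.50</td></tr></table>"
)
buffer = io.StringIO()
rows = html_tables_to_csv(sample_html, buffer)
```

To write a real file, pass a handle opened with `open("tables.csv", "w", newline="", encoding="utf-8")` in place of the in-memory buffer and feed it the `html` string captured from the browser.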

【Comments】:

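Thanks for the reply, Rola. I am getting the following error: NameError: name 'By' is not defined
Put from selenium.webdriver.common.by import By at the top of your script. That will give you access to By.ID.
Thanks for your answer, AaronS. It ran that time but threw a timeout exception.
@bknoll16 That is because the element does not exist; sorry, I did not notice that the id is auto-generated. I have modified my code, please try again now.
@Rola Thanks for sticking with this. I am still getting a timeout exception with the updated code. Does it work on your end?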