How do I scrape a specific table if the website has multiple tables? (Answer)

[Question Title]: How do I scrape a specific table if the website has multiple tables?
[Posted]: 2020-03-23 05:13:40
[Question]:

I recently wrote a script that scrapes some financial data from a website (https://www.cmegroup.com/trading/interest-rates/cleared-otc.html) so that I can track changes in product trading volumes.

However, they seem to have changed the HTML slightly, and my script no longer works.

I used to use the following code to grab the values from 'table20':

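from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Options for Chrome Driver (Selenium)
options = webdriver.ChromeOptions()
# Selenium 4 takes the driver path through a Service object; the older
# chrome_options= and executable_path= keyword arguments are deprecated
driver = webdriver.Chrome(service=Service(r'C:\Program Files\Anaconda\chromedriver\chromedriver.exe'), options=options)
driver.get("https://www.cmegroup.com/trading/interest-rates/cleared-otc.html")

current_page = driver.page_source
driver.quit()

# Grab all the information from the website HTML
soup = BeautifulSoup(current_page, 'html.parser')
tbl = soup.find("div", {"id": "table20"})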

However, tbl is now None (a NoneType) with nothing in it.

I also tried the following, without success:

table_2 = soup.find(lambda tag: tag.name == 'table' and tag.has_attr('id') and tag['id'] == 'table20')

So my question is: how do I scrape all of these currency values for table20?
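Assuming the table is present in the HTML you actually receive, picking one table out of several comes down to matching an attribute such as its id. As a dependency-free sketch, the code below uses the standard library's html.parser in place of BeautifulSoup; TableById and extract_table are hypothetical names, and the sample page is invented for illustration. It returns None when no table carries the requested id, which mirrors the NoneType result described above.

```python
from html.parser import HTMLParser


class TableById(HTMLParser):
    """Collect the cell text of the first <table> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 = outside the target table; >1 = inside a nested table
        self.rows = None    # stays None if no table with that id is found
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1                      # nested table inside the target
            elif self.rows is None and dict(attrs).get("id") == self.target_id:
                self.depth = 1                       # entered the table we want
                self.rows = []
        elif self.depth == 1 and tag == "tr":
            self._row = []
        elif self.depth == 1 and tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:                      # left the target table
                self._row = self._cell = None
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif self.depth == 1 and tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:                   # only text inside a cell is kept
            self._cell.append(data)


def extract_table(html, table_id):
    """Return the rows of the table with the given id, or None if it is absent."""
    parser = TableById(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

With BeautifulSoup, the equivalent lookup is soup.find("table", id="table20"). Either way, if the table is only rendered by JavaScript after the page loads, it is not in the initial HTML and the lookup returns None.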


Tags: python web-scraping beautifulsoup

[Answer 1]:

Well, I see no reason to use Selenium in this case, since it will only slow your task down.

The website loads its data through a JavaScript event that renders the data dynamically after the page has loaded.

The requests library cannot render JavaScript on the fly, so you would need Selenium or requests_html for that; there are quite a few modules that can do it.

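However, there is another option here: trace where the table's data actually comes from. I was able to find the XHR request that the page uses to retrieve the data from the back-end API and render it on the client side.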

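You can find such XHR requests by opening your browser's Developer Tools, switching to the Network tab, and inspecting the XHR/JS requests (which filter to use depends on how the call is made, e.g. fetch).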

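from io import StringIO

import pandas as pd
import requests

# The XHR endpoint the page calls to fetch the IRS settlement totals (found via DevTools)
r = requests.get("https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/irs_settlement_TOTALS.xsl&url=/md/Clearing/IRS?date=03/20/2020&exchange=XCME")
r.raise_for_status()

# read_html returns one DataFrame per <table>; the data is in the second one,
# and [:-1] drops its last row (newer pandas wants a file-like object, hence StringIO)
df = pd.read_html(StringIO(r.text), header=0)[1][:-1]

# Keep the first five columns and save them to CSV
df.iloc[:, :5].to_csv("data.csv", index=False)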


[Comments]:

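@nut_flush You can easily change date=03/20/2020 to fetch a different day. If my answer helped you, feel free to accept it by clicking the check mark, and upvote it if you like.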
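Because the XHR URL is just a string with the date embedded as MM/DD/YYYY, fetching other days only requires formatting that date. A minimal sketch follows; settlement_url and business_days are hypothetical helpers, the URL template is copied verbatim from the answer, and skipping weekends is an assumption (exchange holidays are not handled, so some dates may return no data).

```python
from datetime import date, timedelta

# URL template copied from the answer; only the date part varies
TEMPLATE = ("https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do"
            "?xlstDoc=/XSLT/md/irs_settlement_TOTALS.xsl"
            "&url=/md/Clearing/IRS?date={day}&exchange=XCME")


def settlement_url(day: date) -> str:
    """Build the XHR URL for one trading day (hypothetical helper)."""
    # The endpoint expects the date as MM/DD/YYYY, e.g. 03/20/2020
    return TEMPLATE.format(day=day.strftime("%m/%d/%Y"))


def business_days(start: date, end: date):
    """Yield Monday-Friday dates from start to end inclusive (holidays are not skipped)."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield day
        day += timedelta(days=1)


if __name__ == "__main__":
    for d in business_days(date(2020, 3, 16), date(2020, 3, 20)):
        print(settlement_url(d))
```

Each generated URL can then be passed to requests.get and pd.read_html exactly as in the answer above.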