Answers to: BeautifulSoup can't find table

【Question title】: BeautifulSoup can't find table
【Posted】: 2021-03-25 18:12:54
【Question description】:

I am trying to get the tables from this link: https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall

import requests
from bs4 import BeautifulSoup

url = 'https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
soup.find_all('table')

However, I get back an empty list. I looked at the site's HTML and I can see the table tags. What am I missing to pull these tables?

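【Question discussion】:

This might help: stackoverflow.com/questions/2935658/…
It looks like the table is loaded via JavaScript, so unfortunately a plain request to the URL will not include it. You can check this by printing out what requests actually loaded.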
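
The check suggested in that comment can be sketched as follows. This is a minimal illustration, not code from the thread: TableCounter, count_tables and the two HTML strings are made-up names and sample data, and the standard library's HTMLParser is used in place of BeautifulSoup so the snippet runs without extra packages.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "table":
            self.count += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements open in the given markup."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins: what requests might receive from a
# JavaScript-rendered page versus what the browser shows after scripts run.
static_html = '<div id="__next"></div><script src="app.js"></script>'
rendered_html = '<div id="__next"><table><tr><td>Team</td></tr></table></div>'

print(count_tables(static_html))    # 0
print(count_tables(rendered_html))  # 1
```

Against the real page, a result of 0 from count_tables(requests.get(url).text) would be consistent with the table being added by JavaScript after the initial response, which is what the empty list from soup.find_all('table') suggests.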

Tags: python-3.x beautifulsoup

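【Solution 1】:

You need Selenium to extract the table data, because the data is loaded via JavaScript. As an example, here I extract the data of the first table and save it to a CSV file.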

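from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Widen pandas' display limits so the full table prints in the console.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

url = 'https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall'
# Path to a local chromedriver binary (Selenium 3 style call;
# Selenium 4 expects the path to be wrapped in a Service object instead).
driver = webdriver.Chrome(r"C:\Users\Subrata\Downloads\chromedriver.exe")
driver.get(url)

# Parse the HTML as rendered by the browser, then close the browser.
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Each matching container holds one standings table.
tables = soup.select('div.StandingsGridRender_standingsContainer__2EwPy')
table1 = []
for row in tables[0].find_all('tr'):
    cells = [cell.getText(strip=True, separator=' ') for cell in row]
    table1.append(cells)

# The first row is the header; the remaining rows are the data.
df = pd.DataFrame(table1[1:], columns=table1[0])

df.to_csv('x.csv')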

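【Discussion】:

This works on my local machine, but when I deploy the code to Heroku, bs4 can't find the table and I get an IndexError when trying to iterate over the tables. Do you know why this happens?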
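
The thread does not say why the Heroku deployment fails. Two causes that are common in this kind of setup, offered as assumptions rather than a diagnosis: page_source is read before the JavaScript has inserted the table, or Chrome on a server needs headless and container flags. The sketch below addresses both with Selenium 4's explicit wait. The names chrome_arguments and fetch_rendered_html, the 'table' selector and the 15 second timeout are hypothetical choices, and '--headless=new' assumes a recent Chrome. Installing Chrome and its driver on the server is outside this sketch.

```python
def chrome_arguments(headless=True):
    """Command-line switches often used when Chrome runs in a container."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        # Assumes a Chrome version that understands the "new" headless mode.
        args.insert(0, "--headless=new")
    return args


def fetch_rendered_html(url, wait_css="table", timeout=15):
    """Load url in Chrome and return the HTML once wait_css matches."""
    # Imported here so chrome_arguments() can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)

    # Assumes chromedriver is on PATH or can be located by Selenium itself.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Raises TimeoutException if nothing matches within `timeout` seconds,
        # which is easier to diagnose than a later IndexError on tables[0].
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can be passed to BeautifulSoup exactly as driver.page_source is in the answer. Checking that soup.select(...) returned a non-empty list before indexing tables[0] would also turn the IndexError into a clearer message.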