Scrape table with BeautifulSoup--Python: Answers

【Question title】: Scrape table with BeautifulSoup--Python
【Posted】: 2020-07-05 05:25:33
【Question】:

I am trying to scrape a table from a website (the URL is in the code below).

I am using the following code:

import requests
from bs4 import BeautifulSoup

URL = 'https://covidactnow.org/state/CA'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

soup.find_all('tr')

I believe the code should find the table, but it returns an empty list.

【Comments】:

The data is rendered by JavaScript. You need a browser automation tool such as Selenium; BeautifulSoup cannot execute JavaScript.
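To make the comment above concrete, here is a minimal sketch of why `find_all('tr')` comes back empty. Both HTML strings are hypothetical stand-ins, not the real markup of the site: the first imitates what `requests` might receive before any script runs, the second imitates the DOM after a browser has executed the script.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML as the server might send it: an empty mount point plus a script.
STATIC_HTML = """
<html><body>
  <div id="root"></div>
  <script src="/static/js/main.js"></script>
</body></html>
"""

# Hypothetical HTML after a browser has run the script and built the table.
RENDERED_HTML = """
<html><body>
  <div id="root">
    <table>
      <tr><th>County</th><th>Cases</th></tr>
      <tr><td>Example County</td><td>123</td></tr>
    </table>
  </div>
</body></html>
"""


def count_rows(html):
    """Return the number of <tr> elements BeautifulSoup finds in the markup."""
    return len(BeautifulSoup(html, "html.parser").find_all("tr"))


static_rows = count_rows(STATIC_HTML)      # no table yet, so no rows
rendered_rows = count_rows(RENDERED_HTML)  # header row plus one data row
print(static_rows, rendered_rows)
```

BeautifulSoup only parses the text it is given, so the rows become visible to it only after something else (here, a browser driven by Selenium) has run the JavaScript.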

Tags: web-scraping beautifulsoup

【Solution 1】:

@KunduK is right. You need to use Selenium.

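import time
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Recent Selenium 4 releases take the driver path through a Service object
# rather than the executable_path keyword.
service = Service('Your:/Path/to/chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get("https://covidactnow.org/state/CA")
time.sleep(5)  # give the page's JavaScript time to build the table
html = driver.page_source
driver.quit()
# Parse every <table> in the rendered HTML and keep the last one.
# StringIO is used because newer pandas versions deprecate raw HTML strings here.
tables = pd.read_html(StringIO(html))
data = tables[-1]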

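【Discussion】:

Hello dear Prakhar, many thanks for sharing your ideas and for helping. I want to learn, and I am just diving into all this BS4 and Selenium stuff. One question: do I have to install the webdriver first in order to run this sample code on my Windows machine? I would love to hear from you. Regards, zero
Hi @zero, yes, you do need to install Selenium; pip install selenium should do the job. Next, download the Chrome driver that matches your Chrome version and, once it is downloaded, supply the path to the exe as the driver path in the code above. Let me know if anything is unclear.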
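The two setup steps described above can be checked before running the scraper. The following is only a rough sketch using the standard library: `check_setup` is a hypothetical helper name, and it assumes the driver executable is called `chromedriver` and sits somewhere on `PATH` (if you pass an explicit path to Selenium instead, the second check does not apply).

```python
import importlib.util
import shutil


def check_setup(driver_name="chromedriver"):
    """Report whether the selenium package and a driver executable can be found."""
    return {
        # True if `pip install selenium` has been run in this environment.
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # Full path to the driver if it is on PATH, otherwise None.
        "driver_on_path": shutil.which(driver_name),
    }


if __name__ == "__main__":
    print(check_setup())
```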