How to scrape data from a specific table in HTML using BeautifulSoup, Requests, Python? (Answers)

【Question title】: How to scrape data from a specific table in HTML using BeautifulSoup, Requests, Python?
【Posted】: 2017-11-13 09:59:47
【Question】:

This is the code I have so far:

from bs4 import BeautifulSoup

import requests

url  = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065").content

soup = BeautifulSoup(url, 'html.parser')

table = soup.find('table', {'class': 'sidearm-table play-by-play'})

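My table variable keeps coming back empty (None). This may just be a syntax issue. I am quite proficient in Matlab, but I am new to Python, BeautifulSoup, Requests, and so on.

Any pointers would be greatly appreciated.

I am mainly trying to extract the data from the play-by-play table so that I can parse it in a separate program and assemble data structures for individual players. I am confident I can do that part once I have collected the data.

Thanks for your help!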

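【Comments on the question】:

Sorry, I have already answered this one. I thought the problem was a different one, with .content not working properly, but I was wrong. The site just needed some kind of authentication.

Tags: python web-scraping beautifulsoup python-requests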

【Solution 1】:

from bs4 import BeautifulSoup
import requests

# The site appears to require a browser-like User-agent header.
header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

# Fetch the page and keep the decoded HTML text.
html = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065", headers=header).text

soup = BeautifulSoup(html, 'html.parser')
# Match the table by its full class attribute string.
table = soup.find('table', {'class': 'sidearm-table play-by-play'})

print(table)

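The problem seems to be that the site requires some kind of request header. Even though the requests module is well supported, you still have to pass something like the User-agent header shown above.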
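
Once the table is found, the asker's next step (pulling the play-by-play rows out for further parsing) could look roughly like the sketch below. The helper names fetch_html and table_rows are hypothetical, the User-agent value is simply the one from the answer, and the assumption that each play sits in a tr/td row has not been checked against the live page.

```python
from bs4 import BeautifulSoup
import requests

# Same browser-like User-agent as in the answer; any similar value may work (assumption).
HEADERS = {
    "User-agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) "
                  "Gecko/20091102 Firefox/3.5.5"
}


def fetch_html(url):
    """Download a page with the custom header and return its HTML text."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Raise on 4xx/5xx so an error page is not parsed by mistake.
    response.raise_for_status()
    return response.text


def table_rows(html, css_class="sidearm-table play-by-play"):
    """Return the cell texts of each row of the first table whose class
    attribute is exactly css_class, or an empty list if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": css_class})
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows without any cells
            rows.append(cells)
    return rows
```

A call such as table_rows(fetch_html("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065")) would then give a list of rows, each a list of strings. Returning an empty list when the table is missing makes the "None" case from the question easy to detect.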

【Comments】: