【Question title】: Scraping box scores with BeautifulSoup and using pandas to export to Excel
【Posted】: 2017-09-12 03:34:37
【Question】:

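I've been trying to figure out how to scrape baseball box scores from Fangraphs using Python 3.6 with the BeautifulSoup and pandas modules. My end goal is to save different sections of the web page to different sheets in an Excel workbook.

To do that, I think I have to pull each table separately by its id attribute. The code further down attempts this for the four tables (below the chart on the page) that will make up the first Excel sheet. Running it results in this error: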

Traceback (most recent call last):
  File "Fangraphs Box Score Scraper.py", line 14, in <module>
    df1 = pd.read_html(soup,attrs={'id': ['WinsBox1_dghb','WinsBox1_dghp','WinsBox1_dgab','WinsBox1_dgap']})
  File "C:\Python36\lib\site-packages\pandas\io\html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "C:\Python36\lib\site-packages\pandas\io\html.py", line 743, in _parse
    raise_with_traceback(retained)
  File "C:\Python36\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
    raise exc.with_traceback(traceback)
TypeError: 'NoneType' object is not callable

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=Red%20Sox&dh=0&season=2017'
response = requests.get(url)
soup = BeautifulSoup(response.text,"lxml")

df1 = pd.read_html(soup,attrs={'id': ['WinsBox1_dghb','WinsBox1_dghp','WinsBox1_dgab','WinsBox1_dgap']})

writer = pd.ExcelWriter('Box Scores.xlsx')
df1.to_excel(writer,'Traditional Box Scores')

【Comments】:

  • Please add the full error traceback.
  • Sorry about that. I've just added it.

Tags: python excel pandas beautifulsoup


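【Solution 1】:

You are using the wrong ids: you took them from the <div> elements, but the attrs argument of read_html needs the ids of the <table> tags. I also don't think you need BeautifulSoup here. Try this: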

import pandas as pd

url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=Red%20Sox&dh=0&season=2017'
df1 = pd.read_html(
    url,
    attrs={'id': ['WinsBox1_dghb_ctl00', 'WinsBox1_dgab_ctl00']}
)

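# df1 is now a list of DataFrames, one per matched table
writer = pd.ExcelWriter('Box Scores.xlsx')
row = 0
for df in df1:
    df.to_excel(writer, sheet_name='tables', startrow=row, startcol=0)
    # leave a gap before the next table
    row = row + len(df.index) + 3

# close() writes the workbook to disk; writer.save() was removed in pandas 2.0
writer.close()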
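
The loop above stacks every table on a single sheet, whereas the question asked for different parts of the page on different worksheets. A minimal sketch of a one-sheet-per-table variant follows. The helper names `safe_sheet_name` and `write_tables` are hypothetical, and the sketch assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
import re

import pandas as pd

# Characters Excel rejects in worksheet names, plus its 31-character limit.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LEN = 31


def safe_sheet_name(label):
    """Return a version of `label` that Excel accepts as a sheet name."""
    cleaned = _INVALID_SHEET_CHARS.sub("_", label).strip()
    return cleaned[:_MAX_SHEET_NAME_LEN] or "Sheet"


def write_tables(tables, path):
    """Write each DataFrame in `tables` (label -> DataFrame) to its own sheet."""
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path) as writer:
        for label, df in tables.items():
            df.to_excel(writer, sheet_name=safe_sheet_name(label), index=False)
```

With the list returned by `pd.read_html`, a call such as `write_tables({'Table 1': df1[0], 'Table 2': df1[1]}, 'Box Scores.xlsx')` would be expected to produce two sheets. The labels here are placeholders; which table is which should be checked against the page.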

【Comments】:

  • Glad I could help. Please don't forget to accept the answer.
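
One way to check the point made in the answer, that the ids must come from the <table> tags rather than from the <div> elements, is to list every id in the page source grouped by tag name. The sketch below uses only the standard library. `IdCollector`, `ids_by_tag`, and the `SAMPLE` markup are hypothetical, and the sample assumes each <div> wraps its table, which should be verified against the real page source.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the id attribute of every element, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.ids_by_tag = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        element_id = dict(attrs).get("id")
        if element_id:
            self.ids_by_tag.setdefault(tag, []).append(element_id)


def ids_by_tag(html):
    """Return {tag name: [ids...]} for the given HTML text."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids_by_tag


# Hypothetical markup shaped like the case in the answer: the wrapper <div>
# carries one id and the <table> inside it carries another.
SAMPLE = """
<div id="WinsBox1_dghb">
  <table id="WinsBox1_dghb_ctl00"><tr><td>1</td></tr></table>
</div>
"""

found = ids_by_tag(SAMPLE)
print(found.get("table"))  # ids that can be passed to read_html's attrs
print(found.get("div"))    # ids on wrappers, which read_html does not match
```

For the real page, the text passed to `ids_by_tag` would be the downloaded HTML, for example `requests.get(url).text` from the question's code.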