尝试抓取 FBref 网页时出现错误消息答案

【问题标题】：Error Message when trying to scrape FBref webpage尝试抓取 FBref 网页时出现错误消息
【发布时间】：2022-12-04 21:46:17
【问题描述】：

免责声明：我仍然是一个 python 初学者，第一次尝试抓取。

我正在尝试从当前 (22/23) 冠军联赛赛季中抓取球员统计数据并将其转换为 .csv 文件。如果您看到任何其他明显的错误，请指出。

网址：https://fbref.com/en/comps/8/stats/Champions-League-Stats

我试图更改以下代码以使其满足我的需要，但我没有成功： https://colab.research.google.com/drive/1PoHtZWcy8WaU1hnWmL7eCVUbxzci3-fr#scrollTo=2qYGN7pfk3gK

可以直接下载 .csv 文件，但我需要实际抓取网页。

这是我的（从上面修改过的）代码，我收到以下错误消息并且不知道如何解决该问题：

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re


# Functions to get the data in a dataframe using BeautifulSoup

def get_tables(url, text):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("", res.text), 'lxml')
    all_tables = soup.findAll("table")

    player_table = all_tables[2]
    if text == 'for':
        return player_table
    if text != 'for':
        pass


def get_frame(features, player_table):
    pre_df_player = dict()
    features_wanted_player = features
    rows_player = player_table.find_all('tr')
    for row in rows_player:
        if (row.find('th', {"scope": "row"}) is not None):

            for f in features_wanted_player:
                cell = row.find("td", {"data-stat": f})
                a = cell.data.text().encode()
                text = a.decode("utf-8")
                if (text == ''):
                    text = '0'
                if ((f != 'player') & (f != 'nationality') & (f != 'position') & (f != 'squad') & (f != 'age') & (
                        f != 'birth_year')):
                    text = float(text.replace(',', ''))
                if f in pre_df_player:
                    pre_df_player[f].append(text)
                else:
                    pre_df_player[f] = [text]
    df_player = pd.DataFrame.from_dict(pre_df_player)
    return df_player


def frame_for_category(category, top, end, features):
    url = (top + category + end)
    player_table = get_tables(url, 'for')
    df_player = get_frame(features, player_table)
    return df_player


# Function to get the player data for outfield player, includes all categories - standard stats, shooting
# passing, passing types, goal and shot creation, defensive actions, possession, and miscallaneous
def get_outfield_data(top, end):
    df1 = frame_for_category('stats', top, end, stats)
    df2 = frame_for_category('shooting', top, end, shooting2)
    df3 = frame_for_category('passing', top, end, passing2)
    df4 = frame_for_category('passing_types', top, end, passing_types2)
    df5 = frame_for_category('gca', top, end, gca2)
    df6 = frame_for_category('defense', top, end, defense2)
    df7 = frame_for_category('possession', top, end, possession2)
    df8 = frame_for_category('misc', top, end, misc2)
    df = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8], axis=1)
    df = df.loc[:, ~df.columns.duplicated()]
    return df


# Function to get keeping and advance goalkeeping data
def get_keeper_data(top, end):
    df1 = frame_for_category('keepers', top, end, keepers)
    df2 = frame_for_category('keepersadv', top, end, keepersadv2)
    df = pd.concat([df1, df2], axis=1)
    df = df.loc[:, ~df.columns.duplicated()]
    return df

#This cell is to get the outfield player data for any competition

#Go to the 'Standard stats' page of the league
#For Champions League 2022/23, the link is this: https://fbref.com/en/comps/8/stats/Champions-League-Stats
#Remove the 'stats', and pass the first and third part of the link as parameters like below
df_outfield = get_outfield_data('https://fbref.com/en/comps/8/','/Champions-League-Stats')

#Save csv file to Desktop
df_outfield.to_csv('CL2022_23_Outfield.csv',index=False)

df_outfield

错误信息：

Traceback (most recent call last):
  File "/home/student/Pycharm/Scraping FBREF.py", line 123, in <module>
    df_outfield = get_outfield_data('https://fbref.com/en/comps/8/','/Champions-League-Stats')
  File "/home/student/Pycharm/Scraping FBREF.py", line 97, in get_outfield_data
    df1 = frame_for_category('stats', top, end, stats)
  File "/home/student/Pycharm/Scraping FBREF.py", line 90, in frame_for_category
    df_player = get_frame(features, player_table)
  File "/home/student/Pycharm/Scraping FBREF.py", line 72, in get_frame
    a = cell.data.text().encode()
AttributeError: 'NoneType' object has no attribute 'text'

【问题讨论】：

cell.data 是 None。在尝试访问 .text 属性之前，您需要检查该条件。
这回答了你的问题了吗？ Why do I get AttributeError: 'NoneType' object has no attribute 'something'?
@JohnGordon 它说有一个 beautifulsoup 标签元素，但我似乎无法获取数据。我真的很感激进一步的帮助。

标签： python web-scraping beautifulsoup

【解决方案1】：

自己找到了解决方案。我必须添加一个 if 语句，以便仅在单元格不是 None 时进行编码：

            for f in features_wanted_player:
            cell = row.find("td", {"data-stat": f})
            if cell is not None:
                a = cell.text.strip().encode()

现在它工作得很好。

【讨论】：