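【Question title】: Nested list elements to data frame in Python
【Posted】: 2020-03-06 04:14:21
【Question description】:

Fair warning: this question requires a non-standard Python package, nba_api. I have a list of 3 elements, and each element of the list contains another list of 2 elements: a player data frame and a team data frame. What is the recommended way to reach the following desired result: 1 combined player data frame and 1 combined team data frame? Coming from an R background, I would solve this by: 1. joining the players data frames with the team data frames into joined_list, then 2. using do.call(rbind, joined_list) to row-bind the results into one data frame. I know this is probably very basic for many experienced Python users, but after a lot of searching I am struggling to find the right approach.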

import nba_api
import requests
import pandas as pd

from nba_api.stats.endpoints import boxscoreadvancedv2

# vector of game ids (test purposes)
gameids = ['0021900001','0021900002','0021900012']

headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}

# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))

# manually access elements of list and output to data frame
## there has to be an easier way to access list elements and rowbind the results!!!
df_out0 = temp[0].get_data_frames()
df_player0 = df_out0[0]
df_team0 = df_out0[1]

df_out1 = temp[1].get_data_frames()
df_player1 = df_out1[0]
df_team1 = df_out1[1]

【Question comments】:

  • Could you provide a sample of the data?

Tags: python pandas list


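【Solution 1】:

After a bit more reading (and clarity), I was able to fold the manual part of my code into for loops, producing one list containing the player data and one list containing the team data. Then, using this post: Concatenate a list of pandas dataframes together, I was able to combine the player and team lists into their respective data frames.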

## output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0])         # index 0 will always contain player frame

df_players = pd.concat(df_players)
print(df_players)

## output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1])            # index 1 will always contain team frame

df_team = pd.concat(df_team)
print(df_team)
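
As a small sketch of what pd.concat does with a list of frames, the example below uses two made-up stand-in frames rather than real nba_api output (the column names and values are hypothetical). By default each frame keeps its own row labels, so the combined frame has repeated index values; passing ignore_index=True renumbers the rows, which is closer to what rbind gives in R.

```python
import pandas as pd

# Hypothetical stand-ins for two per-game player frames; the real frames
# returned by get_data_frames() have many more columns.
game_a = pd.DataFrame({"GAME_ID": ["A", "A"], "PLAYER": ["p1", "p2"]})
game_b = pd.DataFrame({"GAME_ID": ["B", "B"], "PLAYER": ["p3", "p4"]})

# Default: row-binds the frames and keeps each frame's original index labels.
stacked = pd.concat([game_a, game_b])

# ignore_index=True discards the old labels and renumbers the rows from 0.
renumbered = pd.concat([game_a, game_b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```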

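【Comments】:

    【Solution 2】:

    First of all, congratulations on sticking with it and finding a solution yourself! :D

    Comments and tips

    You can iterate over a list directly; no index is needed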

    lst_1 = [1, 2, 3, 4]
    
    for i in range(len(lst_1)):
        print(lst_1[i])
    

    can be written as

    lst_1 = [1, 2, 3, 4]
    
    for item in lst_1:
        print(item)
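
If the position is needed as well as the item, the built-in enumerate provides both. This is a small addition to the tip above rather than part of the original answer, and the list values are made up for illustration.

```python
lst_1 = [10, 20, 30, 40]

# enumerate yields (index, item) pairs, so no manual indexing is needed.
pairs = []
for idx, item in enumerate(lst_1):
    pairs.append((idx, item))
    print(idx, item)
```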
    

    List comprehensions and generator expressions are great

    Bonus: note the changes I made to the variable names. For a general reference on Python style, see PEP 8.

    gameids = ['0021900001','0021900002','0021900012']
    
    headers1 = {
        'Host': 'stats.nba.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://stats.nba.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }
    
    # store player and team results for each gameids as elements of list temp
    temp = list()
    for i in range(len(gameids)):
        temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))
    

    can be written as

    game_ids = ['0021900001','0021900002','0021900012']
    
    api_headers = {
        'Host': 'stats.nba.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://stats.nba.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }
    
    api_results = [boxscoreadvancedv2.BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids]
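
To illustrate the difference between a list comprehension and a generator expression without calling the real API, here is a minimal sketch. fetch_box_score is a hypothetical stand-in for the BoxScoreAdvancedV2 call and only records that it was invoked.

```python
calls = []

def fetch_box_score(game_id):
    """Hypothetical stand-in for the API call; records each invocation."""
    calls.append(game_id)
    return f"result-{game_id}"

game_ids = ['0021900001', '0021900002', '0021900012']

# List comprehension: all calls happen immediately, results are kept in a list.
eager_results = [fetch_box_score(g) for g in game_ids]
calls_after_list = len(calls)        # 3

# Generator expression: nothing is called until the generator is consumed.
lazy_results = (fetch_box_score(g) for g in game_ids)
calls_after_generator = len(calls)   # still 3

consumed = list(lazy_results)        # now the three calls happen
exhausted = list(lazy_results)       # a generator can only be consumed once
```

The list form is the safer choice when the results are needed more than once; the generator form avoids holding intermediate results when they are consumed in a single pass.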
    

    You are iterating over the same thing twice

    # output player frames
    i=0
    df_out=[]
    df_players=[]
    for i in range(len(temp)):
        df_out = temp[i].get_data_frames()
        df_players.append(df_out[0])         # index 0 will always contain player frame
    
    df_players = pd.concat(df_players)
    print(df_players)
    
    # output team frames
    i=0
    df_out=[]
    df_team=[]
    for i in range(len(temp)):
        df_out = temp[i].get_data_frames()
        df_team.append(df_out[1])            # index 1 will always contain team frame
    
    df_team = pd.concat(df_team)
    print(df_team)
    

    Using the two previous tips, we end up with something like this:

    players_lst = []
    team_lst = []
    
    for curr_res in api_results:
        curr_dfs = curr_res.get_data_frames()
        players_lst.append(curr_dfs[0])
        team_lst.append(curr_dfs[1])
    
    players_df = pd.concat(players_lst)
    team_df = pd.concat(team_lst)
    

    My solution

    Here it is, broken down a bit for clarity.

    import pandas as pd
    from nba_api.stats.endpoints.boxscoreadvancedv2 import BoxScoreAdvancedV2
    
    game_ids = ['0021900001', '0021900002', '0021900012']
    
    api_headers = {
        'Host': 'stats.nba.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://stats.nba.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }
    
    # generator of results from the API
    api_results = (BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids)
    
    # generator of lists of DataFrames from the API results
    # think of it like: [[Player DF, Team DF], [Player DF, Team DF], ...]
    api_res_dfs = (curr_res.get_data_frames() for curr_res in api_results)
    
    # unpacking the size 2 lists of DataFrames into 2 flat lists
    # [[Player DF, Team DF], [Player DF, Team DF], ...] -> [Player DF, Player DF, ...], [Team DF, Team DF, ...]
    # see https://stackoverflow.com/q/2921847/11301900 for more on the use of the asterisk (*)
    players_tupe, team_tupe = zip(*api_res_dfs)
    
    # concatenating the various DataFrames, exactly the same as in your original code
    players_df = pd.concat(players_tupe)
    team_df = pd.concat(team_tupe)
    
    print(players_df)
    print(team_df)
    

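    This relies on the fact that, as you noted, the player DataFrame is always first in the list and the team DataFrame is always second, and that these are the only two items in each result list.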
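
The zip(*...) step can be tried without the API. The sketch below uses made-up stand-in frames (hypothetical column names) in place of the get_data_frames() results, and adds an explicit length check, which is an extra safeguard not present in the answer's code, so that an unexpected layout fails early instead of silently mixing frames.

```python
import pandas as pd

# Hypothetical stand-ins for get_data_frames() results: one
# [player frame, team frame] pair per game. Columns are made up.
per_game_frames = [
    [pd.DataFrame({"PLAYER": ["p1", "p2"]}), pd.DataFrame({"TEAM": ["t1"]})],
    [pd.DataFrame({"PLAYER": ["p3"]}), pd.DataFrame({"TEAM": ["t2"]})],
]

# Fail early if a result does not hold exactly two frames, since the
# unpacking below relies on that layout.
for frames in per_game_frames:
    if len(frames) != 2:
        raise ValueError(f"expected 2 frames per game, got {len(frames)}")

# zip(*...) transposes the pairs: one tuple of player frames, one of team frames.
player_frames, team_frames = zip(*per_game_frames)

# ignore_index=True renumbers the rows of the combined frames from 0.
players_df = pd.concat(player_frames, ignore_index=True)
team_df = pd.concat(team_frames, ignore_index=True)

print(players_df)
print(team_df)
```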


    Let me know if you have any questions :)

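    【Comments】:

    • This is a really good, high-quality answer! I very much appreciate the feedback on my code, since this is the first Python code I have written. I am accepting your answer because it is the better response. Thanks!
    • For a first Python program, this is very solid! You must have experience in other languages, right? Also, do you have any questions? I may not have explained things clearly enough.
    • My background is in R, so I definitely have that experience to lean on. I have no questions about your answer. It works very well for what I am trying to accomplish. As I keep learning Python I will certainly have more questions, so I am sure you will see some of them from time to time. Thanks again!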