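【Question title】: Efficiently Reshape Pandas Dataframe with Equivalent Variable
【Posted】: 2021-03-05 13:40:00
【Question】:

I'm struggling to reshape a pandas DataFrame. I'm experimenting with sports statistics and trying to tidy up the scores. I have a DataFrame that lists the visiting and home teams along with their scores, each in its own column, like this: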

Before Table

df = pd.DataFrame(data={'GameID': [1, 2], 'Date': ['9/10/2020', '9/13/2020'], 'Visitor': ['Houston Texans', 'Seattle Seahawks'], 'Score_V': [20,38], 'Home':['Kansas City Chiefs', 'Atlanta Falcons'], 'Score_H':[34,25]})

I want to reshape the DataFrame so that the teams and the scores each end up in a single column of their own, giving a result like this:

After Table

df2 = pd.DataFrame(data={'GameID': [1, 1, 2, 2], 'Date': ['9/10/2020', '9/10/2020', '9/13/2020', '9/13/2020'], 'Team': ['Houston Texans', 'Kansas City Chiefs', 'Seattle Seahawks', 'Atlanta Falcons'], 'Location':['Away', 'Home', 'Away', 'Home'], 'Score': [20,34,38,25]})

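I came up with the following solution, in which I melt the DataFrame and then use .loc logic to find and replace values in the individual columns. It doesn't feel very elegant, though, and I suspect I'm missing some obvious feature of pandas.melt().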

df = df.melt(id_vars=['GameID','Date','Home','Visitor'])
df = df.rename(columns={"variable": "Location", "value": "Score"})
df.loc[df['Location'] == 'Score_V', 'Team'] = df['Visitor']
df.loc[df['Location'] == 'Score_V', 'Location'] = 'Away'
df.loc[df['Location'] == 'Score_H', 'Team'] = df['Home']
df.loc[df['Location'] == 'Score_H', 'Location'] = 'Home'
df = df.drop(columns=['Home', 'Visitor'])

Is there a simpler solution than this?

【Comments】:

    Tags: python pandas dataframe reshape melt


    【Solution 1】:

    I don't see a way to do this with a plain melt that is simpler than your solution. However, you could look at reshaping with pandas lreshape:

    df['location_V'] = 'Away'
    df['location_H'] = 'Home'
    
    cols = ['GameID','Date','Team','Location','Score']
    
    pd.lreshape(df, {'Team':['Visitor','Home'],
                     'Score':['Score_V','Score_H'],
                     'Location':['location_V','location_H']})\
      .sort_values(['GameID','Location']).reset_index(drop=True)[cols]
    
    >>> 
    
    #   GameID   Date              Team         Location    Score
    # 0   1    9/10/2020    Houston Texans        Away       20
    # 1   1    9/10/2020    Kansas City Chiefs    Home       34
    # 2   2    9/13/2020    Seattle Seahawks      Away       38
    # 3   2    9/13/2020    Atlanta Falcons       Home       25
    
    

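    Here is another way to approach the problem. It uses lambda functions to extract the values from each row and then builds a new DataFrame from those values: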

    home_score = lambda x: (x['GameID'],x['Date'],x['Home'],'Home',x['Score_H'])
    away_score = lambda x: (x['GameID'],x['Date'],x['Visitor'],'Away',x['Score_V'])
    
    data = pd.concat([df.apply(home_score,axis=1), df.apply(away_score,axis=1)],axis=0).to_list()
    
    pd.DataFrame(data, columns =['GameID','Date','Team','Location','Score']).sort_values(['GameID','Location']).reset_index(drop=True)
    
    >>> 
    
    #   GameID   Date              Team         Location    Score
    # 0   1    9/10/2020    Houston Texans        Away       20
    # 1   1    9/10/2020    Kansas City Chiefs    Home       34
    # 2   2    9/13/2020    Seattle Seahawks      Away       38
    # 3   2    9/13/2020    Atlanta Falcons       Home       25
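The row-wise `apply` above calls a Python function once per row, which can become slow on large frames. As a minimal sketch of the same idea done column-wise, the block below builds one frame per side and concatenates them. The helper name `one_side` is hypothetical (it is not part of pandas), and the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame(data={
    'GameID': [1, 2],
    'Date': ['9/10/2020', '9/13/2020'],
    'Visitor': ['Houston Texans', 'Seattle Seahawks'],
    'Score_V': [20, 38],
    'Home': ['Kansas City Chiefs', 'Atlanta Falcons'],
    'Score_H': [34, 25],
})


def one_side(frame, team_col, score_col, location):
    """Hypothetical helper: long-format rows for one side of each game."""
    side = frame[['GameID', 'Date', team_col, score_col]].rename(
        columns={team_col: 'Team', score_col: 'Score'})
    # Place the constant Location column between Team and Score.
    side.insert(3, 'Location', location)
    return side


away = one_side(df, 'Visitor', 'Score_V', 'Away')
home = one_side(df, 'Home', 'Score_H', 'Home')

# Stack both sides, then order the rows by game with Away before Home.
result = (pd.concat([away, home], ignore_index=True)
            .sort_values(['GameID', 'Location'], ignore_index=True))
print(result)
```

Sorting on `Location` puts the visiting team first only because 'Away' happens to sort before 'Home' alphabetically.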
    
    

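    【Comments】:

      【Solution 2】:

      To solve this, we need a way to rename the columns so that the home team's columns are paired with "Home" and the visiting team's columns with "Away":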

      renamed = df.rename(
          columns=lambda column: "Team_Away"
          if column == "Visitor"
          else "Team_Home"
          if column == "Home"
          else f"{column[:-2]}_Away"
          if column.endswith("V")
          else f"{column[:-2]}_Home"
          if column.endswith("H")
          else column
      )

         GameID       Date         Team_Away  Score_Away           Team_Home  Score_Home
      0       1  9/10/2020    Houston Texans          20  Kansas City Chiefs          34
      1       2  9/13/2020  Seattle Seahawks          38     Atlanta Falcons          25
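With only four columns to rename, an explicit dictionary may be easier to read than the chained conditional expression. The following is a sketch of that alternative, not the answer author's code. The name `column_map` is introduced here for illustration, and the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame(data={
    'GameID': [1, 2],
    'Date': ['9/10/2020', '9/13/2020'],
    'Visitor': ['Houston Texans', 'Seattle Seahawks'],
    'Score_V': [20, 38],
    'Home': ['Kansas City Chiefs', 'Atlanta Falcons'],
    'Score_H': [34, 25],
})

# Each wide column becomes "<stub>_<location>"; columns not listed are left alone.
column_map = {
    'Visitor': 'Team_Away',
    'Home': 'Team_Home',
    'Score_V': 'Score_Away',
    'Score_H': 'Score_Home',
}
renamed = df.rename(columns=column_map)
print(renamed.columns.tolist())
```

The lambda version scales better if there are many `*_V` / `*_H` stat columns, since it derives the new names from the suffix instead of listing each one.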
      

      You can then reshape with wide_to_long:

      pd.wide_to_long(
          renamed,
          stubnames=["Team", "Score"],
          i=["GameID", "Date"],
          j="Location",
          sep="_",
          suffix=".+",
      )
      
                                      Team          Score
      GameID  Date    Location        
         1    9/10/2020   Away    Houston Texans      20
                          Home    Kansas City Chiefs  34
         2    9/13/2020   Away    Seattle Seahawks    38
                          Home    Atlanta Falcons     25
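The result of `wide_to_long` keeps `GameID`, `Date` and `Location` in a MultiIndex, whereas the desired table in the question has them as ordinary columns. A minimal sketch of the extra step, assuming the renamed column layout shown above (the rename is done here with a plain dict for brevity, and the variable names `long_df` and `flat` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(data={
    'GameID': [1, 2],
    'Date': ['9/10/2020', '9/13/2020'],
    'Visitor': ['Houston Texans', 'Seattle Seahawks'],
    'Score_V': [20, 38],
    'Home': ['Kansas City Chiefs', 'Atlanta Falcons'],
    'Score_H': [34, 25],
})
renamed = df.rename(columns={
    'Visitor': 'Team_Away', 'Home': 'Team_Home',
    'Score_V': 'Score_Away', 'Score_H': 'Score_Home',
})

long_df = pd.wide_to_long(
    renamed,
    stubnames=['Team', 'Score'],
    i=['GameID', 'Date'],
    j='Location',
    sep='_',
    suffix='.+',
)

# Move the index levels back into columns, fix the row order explicitly,
# and arrange the columns as in the question's target frame.
flat = (long_df.reset_index()
               .sort_values(['GameID', 'Location'], ignore_index=True)
               [['GameID', 'Date', 'Team', 'Location', 'Score']])
print(flat)
```

The explicit `sort_values` is there so the row order does not depend on how `wide_to_long` happens to arrange its output.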
      

      You can also use the pivot_longer function from pyjanitor; currently you have to install the latest development version from GitHub:

      # install latest dev version
      # pip install git+https://github.com/ericmjl/pyjanitor.git
      import janitor

      renamed.pivot_longer(index=["GameID", "Date"],
                           names_to=(".value", "Location"),
                           names_sep="_",
                           sort_by_appearance=True)
      
        GameID    Date      Location      Team          Score
      0   1      9/10/2020    Away    Houston Texans      20
      1   1      9/10/2020    Home    Kansas City Chiefs  34
      2   2      9/13/2020    Away    Seattle Seahawks    38
      3   2      9/13/2020    Home    Atlanta Falcons     25
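If installing pyjanitor is not an option, the same long layout can be reached for this example with pandas alone by splitting the renamed column names into a two-level column index and stacking the location level. This is a sketch under the assumption that every value column follows the `<stub>_<location>` pattern. It is not a description of how `pivot_longer` works internally, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(data={
    'GameID': [1, 2],
    'Date': ['9/10/2020', '9/13/2020'],
    'Visitor': ['Houston Texans', 'Seattle Seahawks'],
    'Score_V': [20, 38],
    'Home': ['Kansas City Chiefs', 'Atlanta Falcons'],
    'Score_H': [34, 25],
})
renamed = df.rename(columns={
    'Visitor': 'Team_Away', 'Home': 'Team_Home',
    'Score_V': 'Score_Away', 'Score_H': 'Score_Home',
})

wide = renamed.set_index(['GameID', 'Date'])
# "Team_Away" -> ("Team", "Away"): first level is the stub, second the location.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split('_')) for name in wide.columns],
    names=[None, 'Location'],
)

# Stacking the Location level turns Away/Home into rows and keeps
# Team and Score as columns.
long_df = (wide.stack(level='Location')
               .reset_index()
               .sort_values(['GameID', 'Location'], ignore_index=True)
               [['GameID', 'Date', 'Location', 'Team', 'Score']])
print(long_df)
```

Depending on the installed pandas version, `stack` may emit a FutureWarning about its newer implementation; the reshaped values are the same for this data.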
      

      【Comments】:
