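【Question title】: Better coding practice when using pandas?
【Posted】: 2020-07-20 22:37:49
【Question】:

I am trying to improve my coding practice when writing functions. My end goal is to get the share of time spent on each type of task within a particular team. I start from a dataframe built with this code: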

    import pandas as pd

    data = {'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'], 'Time': [20, 21, 19, 18, 17, 15, 22, 25], 'Type': ['Bike', 'Car', 'Walk', 'Scooter', 'Bike', 'Car', 'Walk', 'Scooter']}

    df_new = pd.DataFrame(data)

Then I write a function such as:

def timer(df):
    team_A = df[df['Team'] == 'A']
    team_A_time_total = team_A.Time.sum()

    team_A_biketime_ = (team_A[team_A['Type'] == 'Bike'].Time.sum() / team_A_time_total)
    team_A_cartime_ = (team_A[team_A['Type'] == 'Car'].Time.sum() / team_A_time_total)
    team_A_walktime_ = (team_A[team_A['Type'] == 'Walk'].Time.sum() / team_A_time_total)
    team_A_scootertime_ = (team_A[team_A['Type'] == 'Scooter'].Time.sum() / team_A_time_total)

    team_B = df[df['Team'] == 'B']
    team_B_time_total = team_B.Time.sum()

    team_B_biketime_ = (team_B[team_B['Type'] == 'Bike'].Time.sum() / team_B_time_total)
    team_B_cartime_ = (team_B[team_B['Type'] == 'Car'].Time.sum() / team_B_time_total)
    team_B_walktime_ = (team_B[team_B['Type'] == 'Walk'].Time.sum() / team_B_time_total)
    team_B_scootertime_ = (team_B[team_B['Type'] == 'Scooter'].Time.sum() / team_B_time_total)

    return team_A_biketime_,team_A_cartime_, team_A_walktime_, team_A_scootertime_,team_B_biketime_,team_B_cartime_, team_B_walktime_, team_B_scootertime_

I know this code can be written more concisely, but I am struggling to get it right. I tried:

def timer(df):
    types = ['Bike','Car','Walk','Scooter']
    teams = ['A','B']

    for team in teams: 
        df_team = df[df['Team'] == team]
        df_team_time = df_team.Time.sum()
        for value in types:
            df_value = df_team[df_team['Type'] == value]
            df_value_time = df_value.Time.sum() / df_team_time
    return df_value_time

This doesn't seem right to me.

【Comments on the question】:

    Tags: python pandas loops dataframe for-loop


    【Solution 1】:

    As for better pandas practice, it looks like you need to learn more about df.loc[].

    I think this is what you want:

    import pandas as pd
    
    data = {'Team':['A', 'A', 'A', 'B','B','B','A','B'], 'Time':[20, 21, 19, 18,17,15,22,25],'Type':['Bike', 'Car', 'Walk', 'Scooter','Bike', 'Car', 'Walk', 'Scooter']} 
    
    df_new = pd.DataFrame(data) 
    
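    def timer(df, team, task_type):
        # Combine both conditions into a single boolean mask
        # (task_type avoids shadowing the built-in `type`)
        return df.loc[(df['Team'] == team) & (df['Type'] == task_type)]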
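The `timer` function in this answer returns the matching rows rather than the share of time the question asks for. A minimal sketch of how the same `df.loc[]` filtering could be extended to return that share follows; the name `time_share` and its parameter `task_type` are hypothetical and not part of the original answer.

```python
import pandas as pd

data = {
    'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
    'Time': [20, 21, 19, 18, 17, 15, 22, 25],
    'Type': ['Bike', 'Car', 'Walk', 'Scooter', 'Bike', 'Car', 'Walk', 'Scooter'],
}
df_new = pd.DataFrame(data)


def time_share(df, team, task_type):
    """Hypothetical helper: share of a team's total time spent on one type."""
    # All rows belonging to the requested team
    team_rows = df.loc[df['Team'] == team]
    # Of those, the rows for the requested type
    type_rows = team_rows.loc[team_rows['Type'] == task_type]
    # Time for this type divided by the team's total time
    return type_rows['Time'].sum() / team_rows['Time'].sum()


share_a_walk = time_share(df_new, 'A', 'Walk')        # 41 / 82
share_a_scooter = time_share(df_new, 'A', 'Scooter')  # no such rows for team A
```

Calling the helper once per team and type still repeats the filtering work, which is the motivation for a `groupby`-based approach.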
    

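    【Discussion】:

      【Solution 2】:

      You are writing a function to do something that pandas already does in an optimized way. So, whenever possible, you should try to use groupby together with aggregation functions.

      For simple aggregations, you can use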

      df_new.groupby(['Team', 'Type'])['Time'].agg(['sum', 'mean'])
      

      which produces this output

                    sum  mean
      Team Type              
      A    Bike      20  20.0
           Car       21  21.0
           Walk      41  20.5
      B    Bike      17  17.0
           Car       15  15.0
           Scooter   43  21.5
      

      Once you are comfortable with the basics, you can move on to more complex operations

      summdf = df_new.groupby(['Team', 'Type'])['Time'].agg(['sum'])
      summdf = summdf.reset_index()
      summdf = summdf.rename(columns={'sum': 'team_type_sum'})
      summdf['team_tot'] = summdf.groupby('Team')['team_type_sum'].transform('sum')
      summdf['team_type_pct'] = summdf['team_type_sum'] / summdf['team_tot']
      
      

      which produces

        Team     Type  team_type_sum  team_tot  team_type_pct
      0    A     Bike             20        82       0.243902
      1    A      Car             21        82       0.256098
      2    A     Walk             41        82       0.500000
      3    B     Bike             17        75       0.226667
      4    B      Car             15        75       0.200000
      5    B  Scooter             43        75       0.573333
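If only the percentages are needed, the steps in this answer could be wrapped in a small function that replaces the original `timer`. This is a sketch under that assumption, not the answer author's code; the function name `time_share_by_team` is hypothetical.

```python
import pandas as pd

data = {
    'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
    'Time': [20, 21, 19, 18, 17, 15, 22, 25],
    'Type': ['Bike', 'Car', 'Walk', 'Scooter', 'Bike', 'Car', 'Walk', 'Scooter'],
}
df_new = pd.DataFrame(data)


def time_share_by_team(df):
    """Hypothetical helper: share of each team's time spent per type."""
    # Total time for every (Team, Type) pair that occurs in the data
    by_type = df.groupby(['Team', 'Type'])['Time'].sum()
    # Team totals, repeated on every row of that team's group
    by_team = by_type.groupby(level='Team').transform('sum')
    # Element-wise division keeps the (Team, Type) MultiIndex
    return by_type / by_team


shares = time_share_by_team(df_new)
walk_share_a = shares.loc[('A', 'Walk')]
```

The result is a Series indexed by (Team, Type), so a single value can be looked up with a tuple as in the last line. Pairs that never occur in the data, such as team A with Scooter, are simply absent from the result.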
      
      

      【Discussion】:
