【Question title】: How to select specific rows in a dataframe, group them and find the sum using python?
【Posted】: 2020-12-12 02:38:04
【Question description】:

Here is some sample data:

import pandas as pd

mydf = {'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'Freq': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
        }
my_df = pd.DataFrame(mydf, columns=['Month', 'Freq'])
my_df

    Month  Freq
0       1     5
1       2    10
2       3    15
3       4    20
4       5    25
5       6    30
6       7    35
7       8    40
8       9    45
9      10    50
10     11    55
11     12    60

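How can I create a new dataframe that groups the months into seasons and finds the sum of the frequencies for each season, with the output still being a dataframe?

I want something like this: (Winter is Month = 12, 1, 2) (Spring is Month = 3, 4, 5) (and so on...)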

   Season Freq
0  Winter 75
1  Spring 60
2  Summer 105
3  Autumn 150

I tried selecting the rows and concatenating them, but unfortunately I keep getting errors.

【Question comments】:

Tags: python pandas dataframe pandas-groupby


【Solution 1】:

You can create a new column that holds the season and then group on that column:

my_df['Season']=my_df['Month'].apply(lambda x: 'Winter' if x in (12,1,2) else 'Spring' if x in (3,4,5) else 'Summer' if x in (6,7,8) else 'Autumn')

res=my_df.groupby('Season')['Freq'].sum()

>>> print(res)

Season
Autumn    150
Spring     60
Summer    105
Winter     75
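
The result above is a Series sorted alphabetically by season, while the question asks for a DataFrame in Winter, Spring, Summer, Autumn order. Here is a minimal sketch of one way to get there, assuming the same sample data as in the question. The `season_order` list and the `reindex` / `rename_axis` / `reset_index` chain are my additions for illustration, not part of the answer above:

```python
import pandas as pd

my_df = pd.DataFrame({'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                      'Freq': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]})

# Same season assignment as in the answer above
my_df['Season'] = my_df['Month'].apply(
    lambda x: 'Winter' if x in (12, 1, 2)
    else 'Spring' if x in (3, 4, 5)
    else 'Summer' if x in (6, 7, 8)
    else 'Autumn')

# Assumed display order (hypothetical helper list, taken from the question's table)
season_order = ['Winter', 'Spring', 'Summer', 'Autumn']

res_df = (my_df.groupby('Season')['Freq'].sum()
          .reindex(season_order)   # put the seasons in the requested order
          .rename_axis('Season')   # make sure the index is named before resetting
          .reset_index())          # turn the Series back into a DataFrame
print(res_df)
```

With the sample data this should give the rows Winter 75, Spring 60, Summer 105, Autumn 150, matching the table in the question.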

【Comments】:

【Solution 2】:

One of the simplest ways is to build a month-to-season mapper and then use pandas' map function:

season_map = {1: 'Winter', 2: 'Winter', 3: 'Spring', 4: 'Spring', 5: 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer', 9: 'Autumn', 10: 'Autumn', 11: 'Autumn', 12: 'Winter'}
my_df.loc[:, 'season'] = my_df.Month.map(season_map)
my_df.groupby('season')['Freq'].sum()
    

If you don't want to build the mapper by hand, you can use this answer: Python: Datetime to season
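
For plain month numbers like those in the question, the season can also be computed arithmetically instead of being listed by hand. The sketch below is my own illustration, not the linked answer, and assumes Winter = 12, 1, 2 as in the question. The `seasons` list and the `(month % 12) // 3` trick are assumptions introduced here:

```python
import pandas as pd

my_df = pd.DataFrame({'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                      'Freq': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]})

# Hypothetical helper list: position 0..3 corresponds to the season index below
seasons = ['Winter', 'Spring', 'Summer', 'Autumn']

# month % 12 sends December to 0, so (month % 12) // 3 yields
# 0 for 12, 1, 2; 1 for 3, 4, 5; 2 for 6, 7, 8; 3 for 9, 10, 11
season_idx = (my_df['Month'] % 12) // 3
my_df['season'] = season_idx.map(dict(enumerate(seasons)))

# sort=False keeps the groups in order of first appearance
result = my_df.groupby('season', sort=False)['Freq'].sum().reset_index()
print(result)
```

Because month 1 comes first in the sample data, `sort=False` happens to produce the Winter, Spring, Summer, Autumn order. With differently ordered input the groups would follow that input's order instead.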

【Comments】:

【Solution 3】:

This is not as elegant as it should be (because of the many if statements), but it does work:

import pandas as pd

mydf = {'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'Freq': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]}
my_df = pd.DataFrame(mydf, columns=['Month', 'Freq'])

# One running total per season
winter_counter = 0
spring_counter = 0
summer_counter = 0
autumn_counter = 0

for i in range(len(my_df)):
    # .at reads a single cell by row label and column name
    month = my_df.at[i, 'Month']
    freq = my_df.at[i, 'Freq']
    if month in (12, 1, 2):
        winter_counter = winter_counter + freq
    elif month in (3, 4, 5):
        spring_counter = spring_counter + freq
    elif month in (6, 7, 8):
        summer_counter = summer_counter + freq
    elif month in (9, 10, 11):
        autumn_counter = autumn_counter + freq

data_for_result = {
    'Season': ['Winter', 'Spring', 'Summer', 'Autumn'],
    'Freq': [winter_counter, spring_counter, summer_counter, autumn_counter],
}
my_result = pd.DataFrame(data_for_result, columns=['Season', 'Freq'])
print(my_result)
      

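In case you need an explanation:

.at: accesses a single value by [row, columnName]. I use it first to check which season the row belongs to, and then to read Freq so it can be added to the corresponding counter.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html#pandas.DataFrame.at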
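
To make the `.at` description concrete, here is a small sketch on the same sample data. It reads single cells with `.at` and then, as a contrast, computes the Winter total without an explicit loop by using `isin` and a boolean mask. The variable names are illustrative only:

```python
import pandas as pd

my_df = pd.DataFrame({'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                      'Freq': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]})

# .at returns one scalar, addressed by row label and column name
first_month = my_df.at[0, 'Month']   # row label 0, column 'Month'
last_freq = my_df.at[11, 'Freq']     # row label 11, column 'Freq'
print(first_month, last_freq)

# Loop-free equivalent of the winter counter:
# select the winter rows with a boolean mask, then sum their Freq values
winter_mask = my_df['Month'].isin([12, 1, 2])
winter_total = my_df.loc[winter_mask, 'Freq'].sum()
print(winter_total)
```

The mask-based version does the same work as the winter branch of the loop, and the other three seasons follow the same pattern with their own month lists.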

【Comments】:
