【Question title】: How to sum values by day in Pandas library?
【Posted】: 2020-09-22 13:56:18
【Question】:

I have created the following dictionary:

for k, er in dicio.items():
    #dicio[k]['Return %'] = er.iloc[:, 0].pct_change(-1)*100
    dicio[k]['Day'] = er.index.day
dicio

 {'WDOFUT':             WDOFUT  Day
 Data                   
 2020-09-11  5325.0   11
 2020-09-10  5325.0   10
 2020-09-09  5312.5    9
 2020-09-08  5366.0    8
 2020-09-04  5303.0    4
 ...            ...  ...
 1994-07-08     NaN    8
 1994-07-07     NaN    7
 1994-07-06     NaN    6
 1994-07-05     NaN    5
 1994-07-04     NaN    4
 
 [6482 rows x 2 columns],
 'WEGE3':             WEGE3  Day
 Data                  
 2020-09-11  62.42   11
 2020-09-10  62.42   10
 2020-09-09  64.93    9
 2020-09-08  63.00    8
 2020-09-04  64.49    4
 ...           ...  ...
 1994-07-08    NaN    8
 1994-07-07    NaN    7
 1994-07-06    NaN    6
 1994-07-05    NaN    5
 1994-07-04    NaN    4
 
 [6482 rows x 2 columns],
 'YDUQ3':             YDUQ3  Day
 Data                  
 2020-09-11  27.31   11
 2020-09-10  27.31   10
 2020-09-09  27.99    9
 2020-09-08  28.75    8
 2020-09-04  27.78    4
 ...           ...  ...
 1994-07-08    NaN    8
 1994-07-07    NaN    7
 1994-07-06    NaN    6
 1994-07-05    NaN    5
 1994-07-04    NaN    4
 
 [6482 rows x 2 columns]}

I can group by day, but only for the last item of the dictionary (YDUQ3):

grouped_by_day = dicio[k].groupby('Day')
grouped_by_day.describe()

     YDUQ3
     count       mean        std   min     25%     50%      75%    max
Day
1     86.0  13.974651   9.391865  2.96  5.4450  11.770  21.2000  39.75
2     95.0  15.022842  10.624683  2.57  5.6900  13.290  21.4050  49.19
3    102.0  15.262549  11.061839  2.44  5.8950  12.800  21.8575  53.85
..     ...        ...        ...   ...     ...     ...      ...    ...
29    96.0  14.498229  10.321219  2.61  5.4150  12.975  21.0425  50.88
30    92.0  14.914674  10.701043  2.61  5.5125  13.120  21.7150  51.32
31    51.0  15.339608  10.676544  2.96  6.1350  13.420  21.7150  51.73

I can see the data grouped by day, as shown below, but only for the last item (I need all of them):

list(grouped_by_day)

[(1,
              YDUQ3  Day
  Data                  
  2020-09-01  27.89    1
  2020-07-01  34.41    1
  2020-06-01  29.82    1
  2020-04-01  21.30    1
  2019-11-01  39.75    1
  ...           ...  ...
  1995-02-01    NaN    1
  1994-12-01    NaN    1
  1994-11-01    NaN    1
  1994-09-01    NaN    1
  1994-08-01    NaN    1      
  [182 rows x 2 columns]),
   ......................
   ......................
  (31,
              YDUQ3  Day
  Data                  
  2020-08-31  26.95   31
  2020-07-31  33.89   31
  2020-03-31  21.76   31
  2020-01-31  51.73   31
  2019-10-31  38.52   31
  ...         ...    ...
  1995-05-31    NaN   31
  1995-03-31    NaN   31
  1995-01-31    NaN   31
  1994-10-31    NaN   31
  1994-08-31    NaN   31
  
  [113 rows x 2 columns])]

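Questions:

  • How can I show all 3 items of the dictionary? (dicio[k] only picks one key, the last one.)

  • I want to add up all the returns that fall on the same day of the month.

    • If the span is 10 years, there will be ~120 day-01 values, ~120 day-02 values, and so on.

    • Each symbol will then have a 31 x ~120 table, from which we can pick the day with the highest cumulative return and the day with the lowest cumulative return.

    • Then I want to show the highest/lowest return across the whole stock portfolio and the day on which each occurs.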
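
On the first point: once the `for` loop has finished, `k` still holds the last key it visited, so `dicio[k]` always refers to `YDUQ3`. A minimal sketch of grouping every item instead, assuming `dicio` maps each ticker to a DataFrame with a `DatetimeIndex` and a price column named after the ticker. The tickers `AAA` and `BBB` and their prices are invented stand-ins, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real dicio: two made-up tickers, four dates each.
dates = pd.to_datetime(["2020-09-01", "2020-08-03", "2020-08-01", "2020-07-03"])
dicio = {
    "AAA": pd.DataFrame({"AAA": [10.0, 11.0, 12.0, 13.0]}, index=dates),
    "BBB": pd.DataFrame({"BBB": [20.0, 21.0, 22.0, 23.0]}, index=dates),
}

for k, er in dicio.items():
    er["Day"] = er.index.day  # same step as in the loop from the question

# Build one grouped summary per key instead of reusing the leftover value of k.
described = {k: er.groupby("Day")[k].describe() for k, er in dicio.items()}

for k, summary in described.items():
    print(k)
    print(summary)
```

Keeping the results in a dict keyed by ticker mirrors the structure of `dicio`, so each summary can be looked up the same way as the original frame.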

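【Question comments】:

    Tags: python pandas dictionary


    【Solution 1】:

    I can't tell for sure from the details of your question, but from the way it is framed you seem to have a separate dataframe for each stock. If that is the case, you could try combining them all into a single dataframe. I put this example together to illustrate what I mean.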

      import pandas as pd
      import numpy as np
      dicio =  {
          'WDOFUT': [              
       [pd.Timestamp(year=2020, month= 9, day= 11),  5325.0, 11],
       [pd.Timestamp(year=2020, month= 9, day= 10),  5325.0, 10],
       [pd.Timestamp(year=2020, month= 9, day= 9),  5312.5, 9],
       [pd.Timestamp(year=2020, month= 9, day= 8),  5366.0, 8],
       [pd.Timestamp(year=2020, month= 9, day= 4),  5303.0, 4],
       [pd.Timestamp(year=1994, month= 7, day= 8),  np.nan,  8],
       [pd.Timestamp(year=1994, month= 7, day= 7),  np.nan, 7],
       [pd.Timestamp(year=1994, month= 7, day= 6),  np.nan, 6],
       [pd.Timestamp(year=1994, month= 7, day= 5),  np.nan,  5],
       [pd.Timestamp(year=1994, month= 7, day= 4),  np.nan, 4],],
          'WEGE3': [
       [pd.Timestamp(year=2020, month=9, day= 11),  62.42, 11],
       [pd.Timestamp(year=2020, month=9, day= 10),  62.42, 10],
       [pd.Timestamp(year=2020, month=9, day= 9),  64.93,  9],
       [pd.Timestamp(year=2020, month=9, day= 8), 63.00,  8],
       [pd.Timestamp(year=2020, month=9, day= 4),  64.49,  4],
       [pd.Timestamp(year=1994, month=7, day= 8), np.nan,  8],
       [pd.Timestamp(year=1994, month=7, day= 7), np.nan,  7],
       [pd.Timestamp(year=1994, month=7, day= 6), np.nan, 6],
       [pd.Timestamp(year=1994, month=7, day=5), np.nan,  5],
       [pd.Timestamp(year=1994, month=7, day=4), np.nan,  4]
       ],
          'YDUQ3':[                  
       [pd.Timestamp(year=2020, month=9, day= 11),  27.31,   11],
       [pd.Timestamp(year=2020, month=9, day= 10),  27.31,    10],
       [pd.Timestamp(year=2020, month=9, day= 9),  27.99,    9],
       [pd.Timestamp(year=2020, month=9, day= 8),  28.75,    8],
       [pd.Timestamp(year=2020, month=9, day= 4),  27.78,   4],
       [pd.Timestamp(year=1994, month=7, day= 8), np.nan,   8],
       [pd.Timestamp(year=1994, month=7, day= 7), np.nan,  7],
       [pd.Timestamp(year=1994, month=7, day= 6), np.nan,   6],
       [pd.Timestamp(year=1994, month=7, day= 5), np.nan,  5],
       [pd.Timestamp(year=1994, month=7, day= 4), np.nan,  4]],
       }
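      # Flatten the dict into rows of [Stock, Date, Return, Day]
      data_list = []
      for stk in dicio.keys():
          for itm in dicio[stk]:
              dline = [stk]
              dline.extend(itm)
              data_list.append(dline)
      df = pd.DataFrame(data=data_list, columns=['Stock', 'Date', 'Return', 'Day'])
      # Average only the numeric 'Return' column for each (Day, Stock) pair
      grouped_by_day = df.groupby(by=['Day', 'Stock'])[['Return']].mean()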
        
    

    Printing grouped_by_day yields:

                 
    Day Stock   Return
    4   WDOFUT  5303.00
        WEGE3   64.49
        YDUQ3   27.78
    5   WDOFUT  NaN
        WEGE3   NaN
        YDUQ3   NaN
    6   WDOFUT  NaN
        WEGE3   NaN
        YDUQ3   NaN
    7   WDOFUT  NaN
        WEGE3   NaN
        YDUQ3   NaN
    8   WDOFUT  5366.00
        WEGE3   63.00
        YDUQ3   28.75
    9   WDOFUT  5312.50
        WEGE3   64.93
        YDUQ3   27.99
    10  WDOFUT  5325.00
        WEGE3   62.42
        YDUQ3   27.31
    11  WDOFUT  5325.00
        WEGE3   62.42
        YDUQ3   27.31
    

    I think you should be able to derive the result you are looking for from this groupby result.
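
If `dicio` holds one DataFrame per stock, as in the question, rather than lists of rows, the same long table can probably be built with `pd.concat` instead of nested loops. A minimal sketch, assuming each frame has a `DatetimeIndex` sorted newest-first and a single price column named after the ticker. The tickers `AAA` and `BBB` and their prices are invented for illustration, and the return formula is the one commented out in the question:

```python
import pandas as pd

# Hypothetical prices, newest row first, as in the question's printout.
dates = pd.to_datetime(["2020-09-02", "2020-09-01", "2020-08-03", "2020-08-01"])
dicio = {
    "AAA": pd.DataFrame({"AAA": [110.0, 100.0, 100.0, 80.0]}, index=dates),
    "BBB": pd.DataFrame({"BBB": [50.0, 50.0, 40.0, 50.0]}, index=dates),
}

frames = []
for stock, prices in dicio.items():
    frame = pd.DataFrame({
        "Stock": stock,
        "Date": prices.index,
        # With newest-first rows, pct_change(-1) compares each row
        # with the next (older) row, giving the daily return in percent.
        "Return": prices[stock].pct_change(-1).to_numpy() * 100,
    })
    frame["Day"] = frame["Date"].dt.day
    frames.append(frame)

# One long table for all stocks, then the sum of returns per day of month.
df = pd.concat(frames, ignore_index=True)
summed = df.groupby(["Stock", "Day"])["Return"].sum().unstack("Stock")
print(summed)
```

The oldest row of each stock has no earlier price, so its return is NaN; `sum()` skips NaN by default.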

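    【Comments】:

    • Sorry, this is not what I'm looking for.
    • What are you trying to achieve? How does your desired result differ from the suggested solution?
    • There will be multiple tickers from 2010 to 2020. I calculate the daily return for each ticker. Then, for each ticker, there will be ~120 day-01 values, ~120 day-02 values, and so on. I want to know the cumulative return for each day of the month. Say stock WEGE3 has its highest cumulative return of the 31 days on day 03 and its lowest on day 19. Please ask if you need more details.
    • You can refer to this post: stackoverflow.com/questions/63901027/… ....... There, Mr. Trenton McKinney obtained the result using the returns without taking the day into account. I want to find the best and worst days for each symbol.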
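
The goal described in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: `returns` is a hypothetical wide table of daily percent returns (one column per ticker, dates in the index), and the tickers `AAA` and `BBB` and all numbers are invented:

```python
import pandas as pd

# Hypothetical daily returns in percent: one column per made-up ticker.
returns = pd.DataFrame(
    {"AAA": [1.0, -2.0, 0.5, 3.0, -1.5, 2.5],
     "BBB": [-1.0, 4.0, 1.5, -0.5, 3.0, -6.0]},
    index=pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-03",
                          "2020-08-03", "2020-09-01", "2020-09-02"]),
)

# Sum the returns that fall on the same day of the month, per ticker.
by_day = returns.groupby(returns.index.day).sum()
by_day.index.name = "Day"

# Best and worst day of the month for each ticker.
summary = pd.DataFrame({
    "best_day": by_day.idxmax(),
    "best_sum": by_day.max(),
    "worst_day": by_day.idxmin(),
    "worst_sum": by_day.min(),
})
print(summary)

# Across the whole portfolio: the (ticker, Day) pairs with the extreme sums.
flat = by_day.unstack()
print("highest:", flat.idxmax(), flat.max())
print("lowest:", flat.idxmin(), flat.min())
```

This adds the percent returns as the question asks. A compounded figure would instead need the product of `1 + r` per day, which is a different calculation.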