【Question title】: Most recent max/min value
【Posted】: 2015-08-10 22:24:36
【Question】:

I have the following DataFrame:

date          value
2014-01-20    10
2014-01-21    12
2014-01-22    13
2014-01-23    9
2014-01-24    7
2014-01-25    12
2014-01-26    11

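I need to be able to track when the most recent maximum and minimum values occurred within a given rolling window. For example, with a rolling window of 5 periods, I would need output like the following: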

date          value   rolling_max_date    rolling_min_date
2014-01-20    10      2014-01-20          2014-01-20
2014-01-21    12      2014-01-21          2014-01-20
2014-01-22    13      2014-01-22          2014-01-20
2014-01-23    9       2014-01-22          2014-01-23
2014-01-24    7       2014-01-22          2014-01-24
2014-01-25    12      2014-01-22          2014-01-24
2014-01-26    11      2014-01-25          2014-01-24

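All this shows is the date of the most recent maximum and minimum value within the rolling window. I know pandas has rolling_min and rolling_max, but I'm not sure how to keep track of the index/date at which the most recent max/min occurred within the window.

【Comments】:

    Tags: python pandas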


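    【Solution 1】:

    There is also a more general rolling_apply, to which you can pass your own function. However, the custom function receives each window as an array rather than a DataFrame, so the index information is not available (and you therefore cannot use idxmin/idxmax).

    But let's try to achieve this in two steps: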

    In [41]: df = df.set_index('date')
    In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
    Out[42]:
                value
    date
    2014-01-20      0
    2014-01-21      0
    2014-01-22      0
    2014-01-23      3
    2014-01-24      4
    2014-01-25      3
    2014-01-26      2
    

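    This gives you the position within the window at which the minimum was found. However, this position is relative to that particular window, not to the whole DataFrame. So let's add the start of each window and then use the resulting integer position to look up the corresponding index label: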

    In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
    
    In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
    
    In [47]: ilocs
    Out[47]:
    date
    2014-01-20    0
    2014-01-21    0
    2014-01-22    0
    2014-01-23    3
    2014-01-24    4
    2014-01-25    4
    2014-01-26    4
    Name: value, dtype: float64
    
    In [48]: df.index.take(ilocs)
    Out[48]:
    Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
           u'2014-01-24', u'2014-01-24', u'2014-01-24'],
          dtype='object', name=u'date')
    
    In [49]: df['rolling_min_date'] = df.index.take(ilocs)
    
    In [50]: df
    Out[50]:
                value rolling_min_date
    date
    2014-01-20     10       2014-01-20
    2014-01-21     12       2014-01-20
    2014-01-22     13       2014-01-20
    2014-01-23      9       2014-01-23
    2014-01-24      7       2014-01-24
    2014-01-25     12       2014-01-24
    2014-01-26     11       2014-01-24
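
The session above uses the top-level `pd.rolling_apply`, which was removed from later pandas releases. As a rough sketch only, the same two steps can be written with the `.rolling(...).apply(...)` method, assuming a recent pandas version. The names `pos_in_window` and `window_start` are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, indexed by date.
df = pd.DataFrame(
    {
        "date": ["2014-01-20", "2014-01-21", "2014-01-22", "2014-01-23",
                 "2014-01-24", "2014-01-25", "2014-01-26"],
        "value": [10, 12, 13, 9, 7, 12, 11],
    }
).set_index("date")

window = 5

# Step 1: position of the minimum *inside* each window.
# raw=True passes each window to the function as a plain NumPy array.
pos_in_window = (
    df["value"]
    .rolling(window, min_periods=1)
    .apply(lambda x: np.argmin(x), raw=True)
    .astype(int)
)

# Step 2: add the integer position at which each window starts
# (0 for the first, still-growing windows).
window_start = np.maximum(np.arange(len(df)) - window + 1, 0)
ilocs = pos_in_window.to_numpy() + window_start

# Translate the absolute positions back into index labels.
df["rolling_min_date"] = df.index.take(ilocs)
print(df)
```

Computing `window_start` from the window size avoids hard-coding the `[0, 0, 0, 0]` prefix, so the sketch should carry over to other window lengths.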
    

    The same can be done for the maximum:

    ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmax(), min_periods=1)
    ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
    df['rolling_max_date'] = df.index.take(ilocs)
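
One caveat worth illustrating: NumPy's `argmin`/`argmax` return the first occurrence when a window contains ties, whereas the question asks for the most recent one. A possible workaround, sketched here under the assumption of a recent pandas version, is to search the reversed window. The helpers `last_argmax` and `last_argmin` and the sample series are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def last_argmax(x):
    # x is a NumPy array (raw=True). Searching the reversed array makes
    # ties resolve to the latest position in the window.
    return len(x) - 1 - np.argmax(x[::-1])

def last_argmin(x):
    return len(x) - 1 - np.argmin(x[::-1])

# Hypothetical data with a repeated maximum (12) and a repeated minimum (7).
s = pd.Series(
    [12, 7, 12, 7, 9],
    index=pd.date_range("2014-01-20", periods=5, freq="D"),
)
window = 3

# Integer position at which each window starts.
starts = np.maximum(np.arange(len(s)) - window + 1, 0)

roll = s.rolling(window, min_periods=1)
max_pos = roll.apply(last_argmax, raw=True).astype(int).to_numpy() + starts
min_pos = roll.apply(last_argmin, raw=True).astype(int).to_numpy() + starts

result = s.to_frame("value")
result["rolling_max_date"] = s.index.take(max_pos)
result["rolling_min_date"] = s.index.take(min_pos)
print(result)
```

With plain `argmax`, the third row would point at 2014-01-20 (the first 12); the reversed search points at 2014-01-22 instead.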
    

    【Comments】:

      【Solution 2】:

      Here is a workaround.

      import pandas as pd
      import numpy as np
      
      # sample data
      # ===============================================
      np.random.seed(0)
      df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
      df
      
                  value
      2015-01-01     13
      2015-01-02     16
      2015-01-03     22
      2015-01-04      1
      2015-01-05      4
      2015-01-06     28
      2015-01-07      4
      2015-01-08      8
      2015-01-09     10
      2015-01-10     20
      2015-01-11     22
      2015-01-12     19
      2015-01-13      5
      2015-01-14     24
      2015-01-15      7
      2015-01-16     25
      2015-01-17     25
      2015-01-18     13
      2015-01-19     27
      2015-01-20      2
      
      # processing
      # ==========================================
      # your custom function to track the max/min value and date
      def track_minmax(df):
          return pd.Series({'current_date': df.index[-1], 'rolling_max_val': df['value'].max(), 'rolling_max_date': df['value'].idxmax(), 'rolling_min_val': df['value'].min(), 'rolling_min_date': df['value'].idxmin()})
      
      window = 5
      # use list comprehension to do the for loop
      pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)
      
                 rolling_max_date  rolling_max_val rolling_min_date  rolling_min_val
      2015-01-01              NaT              NaN              NaT              NaN
      2015-01-02              NaT              NaN              NaT              NaN
      2015-01-03              NaT              NaN              NaT              NaN
      2015-01-04              NaT              NaN              NaT              NaN
      2015-01-05       2015-01-03               22       2015-01-04                1
      2015-01-06       2015-01-06               28       2015-01-04                1
      2015-01-07       2015-01-06               28       2015-01-04                1
      2015-01-08       2015-01-06               28       2015-01-04                1
      2015-01-09       2015-01-06               28       2015-01-05                4
      2015-01-10       2015-01-06               28       2015-01-07                4
      2015-01-11       2015-01-11               22       2015-01-07                4
      2015-01-12       2015-01-11               22       2015-01-08                8
      2015-01-13       2015-01-11               22       2015-01-13                5
      2015-01-14       2015-01-14               24       2015-01-13                5
      2015-01-15       2015-01-14               24       2015-01-13                5
      2015-01-16       2015-01-16               25       2015-01-13                5
      2015-01-17       2015-01-16               25       2015-01-13                5
      2015-01-18       2015-01-16               25       2015-01-15                7
      2015-01-19       2015-01-19               27       2015-01-15                7
      2015-01-20       2015-01-19               27       2015-01-20                2
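
The loop above only produces rows for complete windows, which is why the first four rows are NaT/NaN. If partial windows at the start should also be reported, as in the output requested in the question, one option is to clip the window start at zero. This is a minimal sketch applied to the question's data, not the answerer's code; here `track_minmax` returns a plain dict rather than a Series.

```python
import pandas as pd

def track_minmax(win):
    # win is a DataFrame slice, so the index-aware idxmax/idxmin are available.
    return {
        "current_date": win.index[-1],
        "rolling_max_val": win["value"].max(),
        "rolling_max_date": win["value"].idxmax(),
        "rolling_min_val": win["value"].min(),
        "rolling_min_date": win["value"].idxmin(),
    }

# Sample data from the question.
df = pd.DataFrame(
    {"value": [10, 12, 13, 9, 7, 12, 11]},
    index=pd.date_range("2014-01-20", periods=7, freq="D"),
)
window = 5

# Each window ends at row i and starts at most window - 1 rows earlier,
# so the first few windows are simply shorter.
rows = [
    track_minmax(df.iloc[max(0, i - window + 1): i + 1])
    for i in range(len(df))
]
out = pd.DataFrame(rows).set_index("current_date")
print(out)
```

Like the original loop, this slices the frame once per row, so it is easy to read but may be slow on long frames.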
      

      【Comments】:
