【Question title】: pandas - Pythonic way to slice a DataFrame with DatetimeIndex
【Posted】: 2018-10-04 21:39:37
【Question】:

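This question uses Python 3.7 and pandas 0.23.4.

I am working with a financial dataset from which I need to retrieve the data between 08:15 and 13:45 on every trading day.

Variable setup

To illustrate, I have a DataFrame with a DatetimeIndex at continuous one-minute frequency, declared as follows: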

import pandas as pd

y = (
    pd.DataFrame(columns=['x', 'y'])
    .reindex(pd.date_range('20100101', '20100105', freq='1min'))
)

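Problem introduction

I want to slice the data between 08:15 and 13:45 for every day. The following code seems to work, but I do not think it is very Pythonic, and given the double indexing at the end it does not look very memory-efficient either: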

In [108]: y[y.index.hour.isin(range(8,14))][15:][:-14]
Out[108]: 
                       x    y
2010-01-01 08:15:00  NaN  NaN
2010-01-01 08:16:00  NaN  NaN
2010-01-01 08:17:00  NaN  NaN
2010-01-01 08:18:00  NaN  NaN
2010-01-01 08:19:00  NaN  NaN
...                  ...  ...
2010-01-04 13:41:00  NaN  NaN
2010-01-04 13:42:00  NaN  NaN
2010-01-04 13:43:00  NaN  NaN
2010-01-04 13:44:00  NaN  NaN
2010-01-04 13:45:00  NaN  NaN

[1411 rows x 2 columns]

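Edit: after checking the data thoroughly, the indexing above does not solve the problem, because the result still contains times after 2010-01-01 13:45:00 and before 2010-01-02 08:15:00: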

In [147]: y[y.index.hour.isin(range(8,14))][15:][:-14].index[300:400]
Out[147]: 
DatetimeIndex(['2010-01-01 13:15:00', '2010-01-01 13:16:00',
               '2010-01-01 13:17:00', '2010-01-01 13:18:00',
               '2010-01-01 13:19:00', '2010-01-01 13:20:00',
               ...
               '2010-01-01 13:35:00', '2010-01-01 13:36:00',
               '2010-01-01 13:37:00', '2010-01-01 13:38:00',
               '2010-01-01 13:39:00', '2010-01-01 13:40:00',
               '2010-01-01 13:41:00', '2010-01-01 13:42:00',
               '2010-01-01 13:43:00', '2010-01-01 13:44:00',
               '2010-01-01 13:45:00', '2010-01-01 13:46:00', # 13:46:00 should be excluded
               '2010-01-01 13:47:00', '2010-01-01 13:48:00', # this should be excluded
               '2010-01-01 13:49:00', '2010-01-01 13:50:00', # this should be excluded
               '2010-01-01 13:51:00', '2010-01-01 13:52:00', # this should be excluded
               '2010-01-01 13:53:00', '2010-01-01 13:54:00', # this should be excluded
               '2010-01-01 13:55:00', '2010-01-01 13:56:00', # this should be excluded
               '2010-01-01 13:57:00', '2010-01-01 13:58:00', # this should be excluded
               '2010-01-01 13:59:00', '2010-01-02 08:00:00', # this should be excluded
               '2010-01-02 08:01:00', '2010-01-02 08:02:00', # this should be excluded
               '2010-01-02 08:03:00', '2010-01-02 08:04:00', # this should be excluded
               '2010-01-02 08:05:00', '2010-01-02 08:06:00', # this should be excluded
               '2010-01-02 08:07:00', '2010-01-02 08:08:00', # this should be excluded
               '2010-01-02 08:09:00', '2010-01-02 08:10:00', # this should be excluded
               '2010-01-02 08:11:00', '2010-01-02 08:12:00', # this should be excluded
               '2010-01-02 08:13:00', '2010-01-02 08:14:00', # this should be excluded
               '2010-01-02 08:15:00', '2010-01-02 08:16:00',
               '2010-01-02 08:17:00', '2010-01-02 08:18:00',
               '2010-01-02 08:19:00', '2010-01-02 08:20:00',
               ...
               '2010-01-02 08:47:00', '2010-01-02 08:48:00',
               '2010-01-02 08:49:00', '2010-01-02 08:50:00',
               '2010-01-02 08:51:00', '2010-01-02 08:52:00',
               '2010-01-02 08:53:00', '2010-01-02 08:54:00'],
              dtype='datetime64[ns]', freq=None)

Attempted workaround

I tried combining several boolean masks, but the following code cuts out minutes 0-14 and 46-59 of every hour:

y[(
    y.index.hour.isin(range(8,14)) & y.index.minute.isin(range(15, 46))
)]

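Question

There must be a better, more efficient way to do this that I am missing (or perhaps pandas already has this functionality). What is a more precise, Pythonic way to slice data with a DatetimeIndex? For example, something like: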

y[(y.index.day("everyday") & y.index.time_between('08:15', '13:45'))]

Or even better:

y[y.index("everyday 08:15 to 13:45")]

【Comments】:

    Tags: python pandas slice datetimeindex


    【Solution 1】:

    Yes, this functionality is built in as DataFrame.between_time:

    y.between_time("08:15", "13:45")
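
As a quick check of what this call returns, here is a minimal sketch. It rebuilds the question's minute-frequency index but, as an assumption for illustration only, fills a placeholder column `x` with integers instead of NaN; the names `frame`, `active`, and `rows_per_day` are hypothetical. With the default arguments both endpoints, 08:15 and 13:45, are kept.

```python
import pandas as pd

# Minute-frequency index matching the question's setup.
idx = pd.date_range("20100101", "20100105", freq="1min")
# Hypothetical placeholder data; the question's frame holds only NaN.
frame = pd.DataFrame({"x": range(len(idx))}, index=idx)

# Keep rows whose time of day lies in 08:15..13:45, on every date.
active = frame.between_time("08:15", "13:45")

# Earliest and latest time of day that survived the filter.
first_time = min(active.index.time)
last_time = max(active.index.time)

# Number of rows kept per calendar date.
rows_per_day = active.groupby(active.index.date).size()
print(first_time, last_time)
print(rows_per_day)
```

Each of the four full days in this index contributes 331 rows (330 minutes plus the inclusive endpoint), and nothing between 13:45 and 08:15 of the next day remains.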
    

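    【Discussion】:

    • My, my, my... I have been searching for this answer all my life. I assumed the function would be called time_between, not between_time. Thank you very much!
    【Solution 2】:

    You almost guessed the right function name. You can use DataFrame.between_time to get the filtering you want.

    Example: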

    y_active = y.between_time('08:15', '13:45')
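
For comparison, the boolean-mask attempt from the question can be repaired by comparing minutes since midnight instead of combining separate hour and minute masks. This is only a sketch of an equivalent filter, not how pandas implements `between_time`; the placeholder column `x` and the names `frame`, `minute_of_day`, and `masked` are assumptions made for illustration.

```python
import pandas as pd

# Minute-frequency index matching the question's setup.
idx = pd.date_range("20100101", "20100105", freq="1min")
# Hypothetical placeholder data; the question's frame holds only NaN.
frame = pd.DataFrame({"x": range(len(idx))}, index=idx)

# Minutes elapsed since midnight for every timestamp.
minute_of_day = frame.index.hour * 60 + frame.index.minute

start = 8 * 60 + 15   # 08:15
end = 13 * 60 + 45    # 13:45

# A single range test avoids dropping minutes 0-14 and 46-59 of each hour.
mask = (minute_of_day >= start) & (minute_of_day <= end)
masked = frame[mask]

# Compare against the built-in method.
same_as_between_time = masked.equals(frame.between_time("08:15", "13:45"))
print(same_as_between_time)
```

The mask selects the same rows as `between_time`, which remains the shorter and more readable choice.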
    

    【Discussion】:
