Pandas: include new timestamp rows in a dataframe if condition is met

【Question title】: Pandas: include new timestamp rows in a dataframe if condition is met
【Posted】: 2019-04-02 06:16:56
【Question】:

I have a dataframe that looks like this:

    value                 timestamp
18.832939   2019-03-04 12:37:26 UTC
18.832939   2019-03-04 12:38:26 UTC
18.832939   2019-03-04 12:39:27 UTC
18.955200   2019-03-04 12:40:28 UTC
18.784912   2019-03-04 12:44:32 UTC
18.784912   2019-03-04 12:45:33 UTC
20.713936   2019-03-04 17:59:36 UTC
20.871742   2019-03-04 18:08:31 UTC
20.871742   2019-03-04 18:09:32 UTC
20.873871   2019-03-04 18:10:32 UTC

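I would like the following result, in which every gap longer than 2 minutes but shorter than 15 minutes (2 min < gap < 15 min) is identified and filled with new rows: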

    value                 timestamp
18.832939   2019-03-04 12:37:26 UTC
18.832939   2019-03-04 12:38:26 UTC
18.832939   2019-03-04 12:39:27 UTC
18.955200   2019-03-04 12:40:28 UTC
      NaN   2019-03-04 12:41:28 UTC
      NaN   2019-03-04 12:42:28 UTC
      NaN   2019-03-04 12:43:28 UTC
18.784912   2019-03-04 12:44:32 UTC
18.784912   2019-03-04 12:45:33 UTC
20.713936   2019-03-04 17:59:36 UTC
      NaN   2019-03-04 18:00:36 UTC
      NaN   2019-03-04 18:01:36 UTC
      NaN   2019-03-04 18:02:36 UTC
      NaN   2019-03-04 18:03:36 UTC
      NaN   2019-03-04 18:04:36 UTC
      NaN   2019-03-04 18:05:36 UTC
      NaN   2019-03-04 18:06:36 UTC
      NaN   2019-03-04 18:07:36 UTC
20.871742   2019-03-04 18:08:31 UTC
20.871742   2019-03-04 18:09:32 UTC
20.873871   2019-03-04 18:10:32 UTC

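In other words, to get there I have to do two things:

Identify the gaps that meet my condition. There may also be gaps longer than 15 minutes, and I am not interested in those.
Once they are identified, create new rows inside each gap, stepping forward 1 minute at a time, or otherwise using the timestamps to create evenly spaced values.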

I can do the first part with this:

gap = df['timestamp'].diff()  # time elapsed since the previous row
df['aux_1'] = ((gap > pd.Timedelta(minutes=2)) & (gap < pd.Timedelta(minutes=15))).astype(int)  # 1 marks the end of a gap
df['aux_2'] = df['aux_1'].shift(-1)  # 1 marks the beginning of a gap (NaN on the last row)
df['intervals'] = df['aux_1'] + df['aux_2']  # beginning and end flags combined in a single column
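
For anyone who wants to reproduce this first step, here is a minimal self-contained sketch. It rebuilds the sample data from the table above and applies the same gap test. The names `ends_gap` and `starts_gap` are illustrative stand-ins for `aux_1` and `aux_2`, and parsing the timestamps as UTC is an assumption based on the "UTC" suffix in the table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "value": [18.832939, 18.832939, 18.832939, 18.955200, 18.784912,
              18.784912, 20.713936, 20.871742, 20.871742, 20.873871],
    "timestamp": pd.to_datetime([
        "2019-03-04 12:37:26", "2019-03-04 12:38:26", "2019-03-04 12:39:27",
        "2019-03-04 12:40:28", "2019-03-04 12:44:32", "2019-03-04 12:45:33",
        "2019-03-04 17:59:36", "2019-03-04 18:08:31", "2019-03-04 18:09:32",
        "2019-03-04 18:10:32",
    ], utc=True),
})

gap = df["timestamp"].diff()  # time elapsed since the previous row (NaT on row 0)

# True on the row that ends a gap longer than 2 and shorter than 15 minutes.
ends_gap = (gap > pd.Timedelta(minutes=2)) & (gap < pd.Timedelta(minutes=15))

# Shifting up by one row marks the row that starts each such gap.
starts_gap = ends_gap.shift(-1, fill_value=False)

print(df.loc[starts_gap, "timestamp"].tolist())
print(df.loc[ends_gap, "timestamp"].tolist())
```

The jump from 12:45:33 to 17:59:36 is longer than 15 minutes, so neither of its rows is flagged.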

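However, I am not sure how to do the second part, at least not in a "pandas-like" way. Ideally I would somehow identify the start and end of each timestamp interval I intend to fill, apply asfreq('1min'), and use the resulting vector to fill the gaps I want. I just don't know how to do this properly.

Can someone help me? Thanks in advance.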

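【Comments】:

My suggestion: 1) generate a dataframe whose timestamp column is spaced at 1-minute intervals; 2) join your data back onto the newly created dataframe, using the timestamp truncated to minute precision as the key.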
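
A rough sketch of what this suggestion could look like, using three rows of the sample data. The column name `minute` and the choice of a left join are assumptions, not something the comment specifies. Note that this approach pads every missing minute, so gaps longer than 15 minutes would be filled too unless they are filtered out separately, and the inserted rows sit on whole minutes rather than at the original seconds offset.

```python
import pandas as pd

# Three rows from the question's sample data, spanning one 4-minute gap.
df = pd.DataFrame({
    "value": [18.955200, 18.784912, 18.784912],
    "timestamp": pd.to_datetime(
        ["2019-03-04 12:40:28", "2019-03-04 12:44:32", "2019-03-04 12:45:33"],
        utc=True,
    ),
})

# Key each reading by its timestamp truncated to the minute.
df["minute"] = df["timestamp"].dt.floor("min")

# One row per minute between the first and the last reading.
grid = pd.DataFrame({
    "minute": pd.date_range(df["minute"].min(), df["minute"].max(), freq="min")
})

# Left join: minutes without a reading get NaN in value and NaT in timestamp.
merged = grid.merge(df, on="minute", how="left")
print(merged)
```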

Tags: python pandas dataframe timestamp

【Solution 1】:

It is not very pandas-like, but I would do the following.

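import pandas as pd

# Assumes df has the default 0..n-1 index and the aux_2 column from the question.
new_timestamp = []
for i, row in df.iterrows():
    if row['aux_2'] == 0:
        # No gap starts here: keep the original timestamp.
        new_timestamp.append(row['timestamp'])
    elif row['aux_2'] == 1:
        # A gap starts here: add 1-minute steps from this row up to the next one.
        new_timestamp += pd.date_range(row['timestamp'], df.iloc[i + 1]['timestamp'], freq='min').to_list()
    # The last row has aux_2 == NaN, so it matches neither branch and is left out.

# reindex (rather than .loc) inserts NaN rows for timestamps that are not in the index.
new_df = df.set_index('timestamp').reindex(new_timestamp)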

This results in

print(new_df)
timestamp                   value       aux_1   aux_2   intervals
2019-03-04 12:37:26+00:00   18.832939   0.0     0.0     0.0
2019-03-04 12:38:26+00:00   18.832939   0.0     0.0     0.0
2019-03-04 12:39:27+00:00   18.832939   0.0     0.0     0.0
2019-03-04 12:40:28+00:00   18.955200   0.0     1.0     1.0
2019-03-04 12:41:28+00:00   NaN     NaN     NaN     NaN
2019-03-04 12:42:28+00:00   NaN     NaN     NaN     NaN
2019-03-04 12:43:28+00:00   NaN     NaN     NaN     NaN
2019-03-04 12:44:28+00:00   NaN     NaN     NaN     NaN
2019-03-04 12:44:32+00:00   18.784912   1.0     0.0     1.0
2019-03-04 12:45:33+00:00   18.784912   0.0     0.0     0.0
2019-03-04 17:59:36+00:00   20.713936   0.0     1.0     1.0
2019-03-04 18:00:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:01:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:02:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:03:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:04:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:05:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:06:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:07:36+00:00   NaN     NaN     NaN     NaN
2019-03-04 18:08:31+00:00   20.871742   1.0     0.0     1.0
2019-03-04 18:09:32+00:00   20.871742   0.0     0.0     0.0

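【Discussion】:

This is good. What I mean is, I was hoping there was a way to do this without having to set the timestamp as the index. But if I want, I can move the column positions and set a new ascending numeric index. Thanks for this!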
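
One possible way to avoid setting the timestamp as the index is to build the filler rows separately and concatenate them. The sketch below is not the answerer's method. `fill_gaps` and its parameters are hypothetical names, and it assumes `df` is sorted by a datetime `timestamp` column. It also assumes that a filler landing within half a step of the next real reading should be dropped, which is one reading of why the expected output in the question has no 12:44:28 row.

```python
import pandas as pd

def fill_gaps(df, lower="2min", upper="15min", step="1min"):
    """Insert NaN rows every `step` inside gaps where lower < gap < upper.

    Hypothetical helper: assumes df is sorted by a datetime 'timestamp' column.
    """
    lower, upper, step = (pd.Timedelta(x) for x in (lower, upper, step))
    ts = df["timestamp"]
    nxt = ts.shift(-1)                      # timestamp of the following row
    gap = nxt - ts                          # time until the following row
    mask = (gap > lower) & (gap < upper)    # rows that start a qualifying gap

    pieces = [df]
    for start, end in zip(ts[mask], nxt[mask]):
        stamps = pd.date_range(start + step, end, freq=step)
        # Assumption: drop fillers closer than half a step to the next reading.
        stamps = stamps[stamps <= end - step / 2]
        if len(stamps):
            pieces.append(pd.DataFrame({"timestamp": stamps}))

    out = pd.concat(pieces, ignore_index=True)
    return out.sort_values("timestamp", kind="stable").reset_index(drop=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "value": [18.832939, 18.832939, 18.832939, 18.955200, 18.784912,
              18.784912, 20.713936, 20.871742, 20.871742, 20.873871],
    "timestamp": pd.to_datetime([
        "2019-03-04 12:37:26", "2019-03-04 12:38:26", "2019-03-04 12:39:27",
        "2019-03-04 12:40:28", "2019-03-04 12:44:32", "2019-03-04 12:45:33",
        "2019-03-04 17:59:36", "2019-03-04 18:08:31", "2019-03-04 18:09:32",
        "2019-03-04 18:10:32",
    ], utc=True),
})

result = fill_gaps(df)
print(result)
```

The result keeps `timestamp` as an ordinary column with a fresh 0..n-1 index, and the last original row is preserved because the original frame is concatenated whole rather than rebuilt row by row.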