Extracting discrete timestamp values of a dataset from continuous timestamps: answers

[Question title]: Extracting discrete timestamp values of a dataset from continuous timestamps
[Posted]: 2020-09-20 16:14:39
[Question description]:

I have a dataset with one data point per minute, say df_cont, whose Value column holds cumulatively increasing numbers:

                Value
datetimeindex  
09:00:00         34
09:01:00         45
09:02:00         48
09:03:00         50
.                .
.                .
.                .
18:58:00         55
18:59:00         65
19:00:00         68

I have another dataset that contains only time values, say df_time:

Time_1       Time_2
09:05:00     09:15:00
10:05:00     10:25:00
11:55:00     12:15:00
17:05:00     17:15:00

Now I need to find the change in "Value" over each time window from Time_1 to Time_2.

I can do this manually for a single window using the DateTime functionality:

df_cont["2020-5-14 09:05:00":"2020-5-14 09:15:00"].Value.max() - df_cont["2020-5-14 09:05:00":"2020-5-14 09:15:00"].Value.min()

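However, I have not been able to automate this for all the timestamp pairs, either with a loop or with some other pandas functionality. Any help would be appreciated.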

[Question comments]:

Tags: python pandas datetime indexing timestamp

[Solution 1]:

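You have to join df_cont onto df_time twice, once on time_1 and once on time_2, and then compute the difference between the two matched values. In your case, you may need to reset the index first so that the timestamps are available as a column.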

import pandas as pd

# Toy data: integer "times" stand in for the real timestamps.
df_cont = pd.DataFrame({'time':[1, 2, 3, 4, 5, 6], 'Value':[45, 48, 53, 55, 60, 64]})
df_time = pd.DataFrame({'time1':[1, 2, 5], 'time2':[3, 4, 6]})

# Look up the Value at time1, then the Value at time2, for every row of df_time.
df = (df_time.merge(df_cont.rename(columns={'time':'time1', 'Value':'value1'}), on='time1')
                  .merge(df_cont.rename(columns={'time':'time2', 'Value':'value2'}), on='time2'))

df['value_diff'] = df['value2'] - df['value1']
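
The note about resetting the index can be sketched as follows for a frame that has a real DatetimeIndex. This is only an illustration under stated assumptions: the data is synthetic, and the index name `time` and the column names `value_1` and `value_2` are made up for the example. It also assumes that every Time_1 and Time_2 appears exactly in the index, and that Value never decreases, so that end minus start equals max minus min over the window.

```python
import pandas as pd

# Synthetic per-minute data with a steadily increasing Value (illustrative only).
index = pd.date_range("2020-05-14 09:00", "2020-05-14 19:00", freq="min", name="time")
df_cont = pd.DataFrame({"Value": range(len(index))}, index=index)

# Two windows taken from the index itself: 09:05-09:15 and 10:05-10:25.
df_time = pd.DataFrame({
    "Time_1": index[[5, 65]],
    "Time_2": index[[15, 85]],
})

# Turn the DatetimeIndex into an ordinary column so it can serve as a merge key.
lookup = df_cont.reset_index()

result = (
    df_time
    .merge(lookup.rename(columns={"time": "Time_1", "Value": "value_1"}), on="Time_1", how="left")
    .merge(lookup.rename(columns={"time": "Time_2", "Value": "value_2"}), on="Time_2", how="left")
)
result["value_diff"] = result["value_2"] - result["value_1"]
print(result)
```

With `how="left"`, a window whose start or end time is missing from df_cont is kept with NaN instead of being silently dropped, which makes such gaps easier to spot.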

[Comments]:

[Solution 2]:

You can use cut to assign each row to an interval of an IntervalIndex and then group by those intervals:

import pandas as pd

# Use the same date as df_time; a bare '09:00' would default to today's date and match no interval.
df_cont = pd.DataFrame(index=pd.date_range('2020-06-02 09:00', '2020-06-02 19:00', freq='min'))
df_cont['Value'] = df_cont.index.hour*100 + df_cont.index.minute
df_time = pd.DataFrame({
    'Time_1': pd.to_datetime(['2020-06-02 09:05', '2020-06-02 10:05', '2020-06-02 11:55', '2020-06-02 17:05']),
    'Time_2': pd.to_datetime(['2020-06-02 09:15', '2020-06-02 10:25', '2020-06-02 12:15', '2020-06-02 17:15']),
})

# Closed on both ends so the rows at Time_1 and Time_2 are both included.
idx = pd.IntervalIndex.from_arrays(df_time.Time_1, df_time.Time_2, closed='both')
groups = pd.cut(df_cont.index, idx)
df_cont.groupby(groups, observed=True).Value.apply(lambda x: x.max() - x.min())

Result:

[2020-06-02 09:05:00, 2020-06-02 09:15:00]    10
[2020-06-02 10:05:00, 2020-06-02 10:25:00]    20
[2020-06-02 11:55:00, 2020-06-02 12:15:00]    60
[2020-06-02 17:05:00, 2020-06-02 17:15:00]    10
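
One caveat: `pd.cut` requires the bins of an IntervalIndex to be non-overlapping. If the windows in df_time can overlap, a plain loop over the rows that repeats the manual slice from the question is a possible fallback. The sketch below is not part of the original answer; the helper name `value_range` and the overlapping third window are made up for illustration.

```python
import pandas as pd

index = pd.date_range("2020-06-02 09:00", "2020-06-02 19:00", freq="min")
df_cont = pd.DataFrame(index=index)
df_cont["Value"] = df_cont.index.hour * 100 + df_cont.index.minute

# Windows: 09:05-09:15, 10:05-10:25, and 10:10-10:20 (overlaps the second one).
df_time = pd.DataFrame({
    "Time_1": index[[5, 65, 70]],
    "Time_2": index[[15, 85, 80]],
})

def value_range(start, end):
    """Return max - min of Value over the window; label slicing includes both ends."""
    window = df_cont.loc[start:end, "Value"]
    return window.max() - window.min()

df_time["value_diff"] = [
    value_range(start, end)
    for start, end in zip(df_time["Time_1"], df_time["Time_2"])
]
print(df_time)
```

This does one slice per window, so it is slower than the grouped version for many windows, but it places no restriction on how the windows relate to each other.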

[Comments]: