【Question title】: What is the most efficient way to count how many rows in a dataframe were "active" for every minute of a day?
【Posted】: 2020-02-03 06:17:49
【Question description】:

I have a dataframe in the following format:

object_id  start_time  end_time
123        13:23       13:28
234        13:25       13:26

I would like to convert it into this format:

time    number_of_objects_active
13:22                          0
13:23                          1
13:24                          1
13:25                          2
13:26                          1
13:27                          1
13:28                          1
13:29                          0

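Each row gives a minute of the day and how many objects were active at that point, where active means the time is greater than or equal to the start time and less than the end time.

I tried to come up with a way to do this with a groupby, but failed. One not-so-good solution is to loop over every minute of the day and count the rows that were active in that minute: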

results_dictionary = {}
for minute in minutes:
    results_dictionary[minute] = df.loc[(df.start_time <= minute) & (df.end_time > minute)].shape[0]

But I suspect there is a better pandas/Pythonic way to do this.
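
For reference, the loop above only runs once `df` and `minutes` exist. The following is a minimal sketch of that baseline on the two sample rows. The date 2020-02-03 and the reporting window are assumptions added for illustration; the activity test is the one described above (start_time <= minute < end_time), so the end minute itself is not counted.

```python
import pandas as pd

# Sample rows from the question. The date 2020-02-03 is an assumption;
# only the time of day matters for the counts.
df = pd.DataFrame({
    "object_id": [123, 234],
    "start_time": pd.to_datetime(["2020-02-03 13:23", "2020-02-03 13:25"]),
    "end_time": pd.to_datetime(["2020-02-03 13:28", "2020-02-03 13:26"]),
})

# Minutes to report on (an assumed window; use a full day if needed).
minutes = pd.date_range("2020-02-03 13:22", "2020-02-03 13:29", freq="min")

# Same test as the loop above: start_time <= minute < end_time.
results_dictionary = {}
for minute in minutes:
    is_active = (df.start_time <= minute) & (df.end_time > minute)
    results_dictionary[minute.time()] = int(is_active.sum())
```

Each iteration scans the whole dataframe, so a full day costs 1440 passes over the data, which is the inefficiency the question is asking about.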

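【Question comments】:

In your original DataFrame, are the times stored as strings, as datetime objects, or as something else?
Right now they are datetime objects, but I'm not attached to that representation.

Tags: python pandas timestamp time-series

【Solution 1】:

If you are using pandas v0.25 or later, use explode: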

import pandas as pd

# Convert `start_time` and `end_time` to Timestamps if they are not
# already. This is also the place to adjust intervals that cross
# the day boundary, e.g. 23:00 - 02:00.
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

# Build a `time` column holding one timestamp per minute of each
# interval; it is exploded into individual rows below. Note that
# date_range includes both endpoints, so the end minute itself is
# counted as active.
f = lambda row: pd.date_range(row['start_time'], row['end_time'], freq='min')
df['time'] = df.apply(f, axis=1)

# The reporting range; adjust as needed
t = pd.date_range('13:23', '13:30', freq='min')

# One row per (object, minute), then count the rows for each minute
result = (
    df.explode('time')
      .groupby('time').size()
      .reindex(t).fillna(0)
      .to_frame('active')
)
result.index = result.index.time

Result:

          active
13:23:00     1.0
13:24:00     1.0
13:25:00     2.0
13:26:00     2.0
13:27:00     1.0
13:28:00     1.0
13:29:00     0.0
13:30:00     0.0
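
The output above counts an object in its end minute (13:26 shows 2), because the generated range includes both endpoints. If the half-open definition from the question is wanted instead (start <= time < end), one possible alternative is to treat every start as a +1 event and every end as a -1 event and take a running total. This is a minimal sketch of that idea, not the method used in the answer above; the date 2020-02-03, the reporting window and the variable names `starts`, `ends` and `delta` are assumptions for illustration.

```python
import pandas as pd

# Sample rows from the question; the date is an assumption.
df = pd.DataFrame({
    "object_id": [123, 234],
    "start_time": pd.to_datetime(["2020-02-03 13:23", "2020-02-03 13:25"]),
    "end_time": pd.to_datetime(["2020-02-03 13:28", "2020-02-03 13:26"]),
})

# Reporting range (assumed); adjust as needed.
t = pd.date_range("2020-02-03 13:22", "2020-02-03 13:30", freq="min")

# Net change per timestamp: +1 for each start, -1 for each end.
starts = df["start_time"].value_counts()
ends = df["end_time"].value_counts()
delta = starts.sub(ends, fill_value=0)

# The running total is the number of active objects from each event
# onward; forward-fill it onto the minute grid and use 0 before the
# first event.
active = (
    delta.sort_index()
    .cumsum()
    .reindex(t, method="ffill")
    .fillna(0)
    .astype(int)
)

result = active.to_frame("active")
result.index = result.index.time
```

This avoids materializing one row per object per minute, so the work grows with the number of rows plus the number of reporting minutes instead of with the total length of all intervals.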

【Comments】: