Pandas group by number (instead of time): answers

[Question title]: Pandas group by number (instead of time)
[Posted]: 2018-08-22 12:24:33
[Question]:

With pd.Grouper we can group by time, for example with a frequency of 10s. The input

Time      Count
10:05:03   2
10:05:04   3
10:05:05   4
10:05:11   3
10:05:12   4

gives the following result:

Time  Count
10:05:10  9
10:05:20  7
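
For reference, the time-based grouping described above might look like the following sketch. The frame is rebuilt by hand from the sample rows, and `label="right"` is an assumption made here so that each 10-second bin is labelled by its right edge, as in the result table; with the default label the bins would be named 10:05:00 and 10:05:10 instead.

```python
import pandas as pd

# Sample rows from the question; parsing with an explicit format puts the
# times on the placeholder date 1900-01-01, which does not affect the bins.
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["10:05:03", "10:05:04", "10:05:05", "10:05:11", "10:05:12"],
        format="%H:%M:%S",
    ),
    "Count": [2, 3, 4, 3, 4],
})

# Sum Count within 10-second bins, labelling each bin by its right edge
binned = df.groupby(pd.Grouper(key="Time", freq="10s", label="right"))["Count"].sum()
print(binned)
```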

I am looking for the reverse. Can I group the time by count instead, for example in steps of 5?

Count Time (s)
5    (4-3)=1s
5    (11-5)=6s
5    (12-11)=1s

Thanks a lot!

[Comments]:

Can you explain how you arrived at count = 5 and the corresponding times?

Tags: python pandas pandas-groupby

[Solution 1]:

If I understand your question correctly, you could try:

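import io
import numpy as np
import pandas as pd

df_txt = """
Time    Count
10:05:03    2
10:05:04    3
10:05:05    4
10:05:11    3
10:05:12    4"""

# The columns are separated by whitespace, so split on any run of it
df = pd.read_csv(io.StringIO(df_txt), sep=r'\s+')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
# Running total of Count, and the multiple of 5 it has reached so far
df['CumCount'] = df.Count.cumsum()
df['Ind1'] = df.CumCount // 5
df['Ind2'] = df.Ind1.shift()
# Blank the previous row's time wherever the row stays within the same multiple of 5
df['LagTime'] = df.Time.shift()
df.loc[df.Ind1 == df.Ind2, 'LagTime'] = np.nan
# Back-fill so that every row carries the start time of its group
df['StartTime'] = df.LagTime.bfill()
out = df.groupby(['StartTime'], as_index=False).last()
out['Time (s)'] = out.Time.values - out.StartTime.values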

Output:

print(out['Time (s)'])
# 0   00:00:01
# 1   00:00:06
# 2   00:00:01
# Name: Time (s), dtype: timedelta64[ns]
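
To see how the helper columns lead to those three groups, it may help to look at them for the sample data. The sketch below rebuilds the frame by hand and keeps the answer's `CumCount` and `Ind1` names; the `Crossed` column is a hypothetical addition for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["10:05:03", "10:05:04", "10:05:05", "10:05:11", "10:05:12"],
        format="%H:%M:%S",
    ),
    "Count": [2, 3, 4, 3, 4],
})

df["CumCount"] = df["Count"].cumsum()   # 2, 5, 9, 12, 16
df["Ind1"] = df["CumCount"] // 5        # 0, 1, 1, 2, 3
# Hypothetical column: True where Ind1 differs from the previous row
df["Crossed"] = df["Ind1"].ne(df["Ind1"].shift())
print(df)
```

The rows at 10:05:04, 10:05:11 and 10:05:12 are the ones where the running total reaches a new multiple of 5, and each is measured against the row before it. The first row is flagged as well because it has no predecessor; in the answer its lagged time is missing, so the back-fill merges it into the group that ends at 10:05:04.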

[Comments]:

[Solution 2]:

Maybe this is what you have in mind. Start from a pandas Series df:

2018-03-14 06:38:46.308425+00:00     2
2018-03-14 06:38:47.308425+00:00     3
2018-03-14 06:38:48.308425+00:00     4
2018-03-14 06:38:54.308425+00:00     3
2018-03-14 06:38:55.308425+00:00     4
dtype: int64

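Find the positions where the cumulative sum reaches a new multiple of 5:

import numpy as np

df[:] = df.values.cumsum() // 5 * 5
# Series.nonzero() was removed in pandas 1.0, so apply NumPy to the boolean mask
hit5 = np.flatnonzero((df.diff() == 5).to_numpy())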

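In this case the result is array([1, 3, 4]). Then iterate over these positions and take the difference between each index value and the previous one: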

for i in hit5:
    print(df.index[i] - df.index[i-1])

which gives:

0 days 00:00:01
0 days 00:00:06
0 days 00:00:01
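
The loop above could also be written without mutating the Series or iterating in Python. The following is only a sketch under a few assumptions: `gaps_at_multiples` is a hypothetical helper name, the Series is rebuilt by hand from the timestamps shown above, and the comparison uses `> 0` rather than `== 5`, so a row whose count jumps past more than one multiple of the step is still picked up once.

```python
import numpy as np
import pandas as pd

def gaps_at_multiples(s, step=5):
    """Hypothetical helper: time since the previous row, at each row where
    the running total of `s` reaches a new multiple of `step`."""
    bucket = s.cumsum() // step
    # diff() is NaN for the first row, so the first row is never selected
    hit = np.flatnonzero(bucket.diff().to_numpy() > 0)
    return s.index.to_series().diff().iloc[hit]

idx = pd.to_datetime([
    "2018-03-14 06:38:46.308425+00:00",
    "2018-03-14 06:38:47.308425+00:00",
    "2018-03-14 06:38:48.308425+00:00",
    "2018-03-14 06:38:54.308425+00:00",
    "2018-03-14 06:38:55.308425+00:00",
])
s = pd.Series([2, 3, 4, 3, 4], index=idx)

gaps = gaps_at_multiples(s)
print(gaps)
```

For the sample data this selects the same three rows as the loop, with gaps of 1, 6 and 1 seconds.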

[Comments]: