Find nlargest(2) with corresponding value after groupby: answers

【Question title】: Find nlargest(2) with corresponding value after groupby
【Posted】: 2020-09-06 16:12:49
【Question description】:

I have a dataframe like the following:

Datetime             Volume       Price
2020-08-05 09:15:00  1033         504
2020-08-05 09:15:00  1960         516
2020-08-05 09:15:00  1724         520
2020-08-05 09:15:00  1870         540
2020-08-05 09:20:00  1024         576
2020-08-05 09:20:00  1960         548
2020-08-05 09:20:00  1426         526
2020-08-05 09:20:00  1968         518
2020-08-05 09:30:00  1458         511
2020-08-05 09:30:00  1333         534
2020-08-05 09:30:00  1322         555
2020-08-05 09:30:00  1425         567
2020-08-05 09:30:00  1245         598

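After grouping by the Datetime column, I want to find the two largest Volume values in each group, together with their corresponding Price.

The resulting dataframe should look like this: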

Datetime             Volume       Price
2020-08-05 09:15:00  1960         516
2020-08-05 09:15:00  1870         540
2020-08-05 09:20:00  1960         548
2020-08-05 09:20:00  1968         518
2020-08-05 09:30:00  1458         511
2020-08-05 09:30:00  1425         567

【Question discussion】:

Tags: python python-3.x pandas dataframe group-by

【Solution 1】:

Use sort_values before the groupby, then take the first two rows of each group:

print (df.sort_values("Volume", ascending=False)
         .groupby("Datetime").head(2).sort_index())

               Datetime  Volume  Price
1   2020-08-05 09:15:00    1960    516
3   2020-08-05 09:15:00    1870    540
5   2020-08-05 09:20:00    1960    548
7   2020-08-05 09:20:00    1968    518
8   2020-08-05 09:30:00    1458    511
11  2020-08-05 09:30:00    1425    567
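
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample data in the question; the variable names `top2` and `top2_by_volume` are mine, not from the answer. The second variant is only a suggestion: it sorts by both columns so that rows stay grouped by Datetime with the larger Volume first inside each group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2020-08-05 09:15:00"] * 4
        + ["2020-08-05 09:20:00"] * 4
        + ["2020-08-05 09:30:00"] * 5
    ),
    "Volume": [1033, 1960, 1724, 1870, 1024, 1960, 1426, 1968,
               1458, 1333, 1322, 1425, 1245],
    "Price": [504, 516, 520, 540, 576, 548, 526, 518,
              511, 534, 555, 567, 598],
})

# Approach from the answer: sort by Volume, keep the first two rows
# of every Datetime group, then restore the original row order.
top2 = (df.sort_values("Volume", ascending=False)
          .groupby("Datetime").head(2).sort_index())

# Variant (an assumption about what may be wanted): keep each group
# ordered from largest to smallest Volume instead of by original index.
top2_by_volume = (df.sort_values(["Datetime", "Volume"],
                                 ascending=[True, False])
                    .groupby("Datetime").head(2))

print(top2)
print(top2_by_volume)
```

Because `head(2)` simply takes the first two rows of each group in the sorted order, this always returns at most two rows per group, even when volumes are tied.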

【Discussion】:

【Solution 2】:

Use groupby.rank plus boolean indexing:

df[df.groupby("Datetime")['Volume'].rank(ascending=False).le(2)]

               Datetime  Volume  Price
1   2020-08-05 09:15:00    1960    516
3   2020-08-05 09:15:00    1870    540
5   2020-08-05 09:20:00    1960    548
7   2020-08-05 09:20:00    1968    518
8   2020-08-05 09:30:00    1458    511
11  2020-08-05 09:30:00    1425    567
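
One thing to watch with the rank approach is ties. By default `rank` uses `method="average"`, so two rows tied for second place each get rank 2.5 and both fail the `le(2)` test. The sketch below uses a small made-up frame (the `ties` data is hypothetical, not from the question) to show the difference, and how `method="first"` gives every row a distinct rank so that exactly two rows survive.

```python
import pandas as pd

# Hypothetical data: one group with a tie for second place.
ties = pd.DataFrame({
    "Datetime": pd.to_datetime(["2020-08-05 09:15:00"] * 3),
    "Volume": [10, 8, 8],
    "Price": [500, 510, 520],
})

grouped = ties.groupby("Datetime")["Volume"]

# Default method="average": the two 8s both get rank 2.5,
# so only the row with Volume 10 passes le(2).
top2_avg = ties[grouped.rank(ascending=False).le(2)]

# method="first": tied rows receive distinct ranks (1, 2, 3),
# so exactly two rows pass le(2).
top2_first = ties[grouped.rank(method="first", ascending=False).le(2)]

print(top2_avg)
print(top2_first)
```

If you would rather keep every tied row, `method="min"` is another option; which one is right depends on how you want ties treated.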

【Discussion】:

【Solution 3】:

Since you mentioned nlargest, you can apply it to each group:

out = df.groupby('Datetime', as_index=False).apply(lambda x: x.nlargest(2, columns=['Volume']))
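
As an alternative sketch that avoids `apply` (newer pandas releases warn when `apply` operates on the grouping column), you can call `nlargest` on the grouped Volume series and use the resulting index to select rows from the original frame. This assumes the result carries a two-level index of (Datetime, original row label), which is what I would expect when each group has more than two rows as here; the name `picked` is mine.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2020-08-05 09:15:00"] * 4
        + ["2020-08-05 09:20:00"] * 4
        + ["2020-08-05 09:30:00"] * 5
    ),
    "Volume": [1033, 1960, 1724, 1870, 1024, 1960, 1426, 1968,
               1458, 1333, 1322, 1425, 1245],
    "Price": [504, 516, 520, 540, 576, 548, 526, 518,
              511, 534, 555, 567, 598],
})

# Two largest Volume values per group; the last index level holds
# the original row labels of the selected rows.
picked = df.groupby("Datetime")["Volume"].nlargest(2)

# Look those rows up in the original frame to recover Price as well.
out = df.loc[picked.index.get_level_values(-1)]

print(out)
```

The rows come back grouped by Datetime with the larger Volume first in each group; add `.sort_index()` if you want the original row order instead.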

【Discussion】: