pandas find median after group by: answers

【Question title】: pandas find median after group by
【Posted】: 2023-03-14 21:01:01
【Question description】:

df.head(10).to_clipboard(sep=';', index=True)

I have a dataframe as shown above, with the following column descriptions:

•   Id - the uuid of this delivery
•   PlanId - the uuid of the plan (the plan for deliveries of a given day)
•   PlanDate - the date of delivery
•   MinTime - the minimal time (seconds from midnight) for delivering this delivery
•   MaxTime - the maximal time (seconds from midnight) for delivering this delivery
•   RouteId - the uuid of the route this delivery belongs to
•   ETA - the estimated time of arrival of this delivery on this date (from the ETA you can of course order the deliveries in a route)
•   TTN - the time to the next delivery in the route, i.e., at index 3 that would be the time between delivery index 3 and delivery index 4
•   DTN - the distance to the next delivery in the route

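I need to find the median number of deliveries per route in a given plan.

The median distance travelled per route in a given plan.

The average time travelled per route in a given plan.

How can I do this?

I wonder whether this is simply a matter of grouping and aggregating to compute the median. I tried something like this to find the median distance: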

Tx = df.groupby(by=['plan_id','route_id'], as_index=False)['dtn'].sum()


Tx.groupby(['plan_id','route_id'])['dtn'].median()

However, I am not sure whether this is correct.
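
A small sketch of what the attempt above computes, using made-up toy data (the values and the ids `p1`, `r1`, `r2` are hypothetical). After the first `groupby` there is exactly one row per (`plan_id`, `route_id`) pair, so grouping by the same two keys again and taking the median returns each route's sum unchanged. Grouping the route totals by `plan_id` alone is what would give a median across routes:

```python
import pandas as pd

# Hypothetical toy data: one plan with two routes.
df = pd.DataFrame({
    "plan_id": ["p1", "p1", "p1", "p1", "p1"],
    "route_id": ["r1", "r1", "r2", "r2", "r2"],
    "dtn": [10.0, 20.0, 5.0, 5.0, 30.0],
})

# Total distance per route: one row per (plan_id, route_id).
Tx = df.groupby(by=["plan_id", "route_id"], as_index=False)["dtn"].sum()

# Same keys again: every group holds a single row, so the median equals the sum.
same_keys = Tx.groupby(["plan_id", "route_id"])["dtn"].median()

# Grouping by plan_id only takes the median across the routes of each plan.
per_plan = Tx.groupby("plan_id")["dtn"].median()

print(same_keys)
print(per_plan)
```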

【Question comments】:

Please provide the sample data as text. An example cannot be reproduced from an image.
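
One possible way to produce such a text sample is to serialise the first rows to a delimited string and paste that into the question. This is only a sketch: the toy dataframe below is hypothetical, and its column names simply follow the code in the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real deliveries table.
df = pd.DataFrame({
    "plan_id": ["p1", "p1"],
    "route_id": ["r1", "r2"],
    "dtn": [10.0, 5.0],
})

# Same separator and index option as the to_clipboard call in the question,
# but returned as a string that can be pasted as text.
sample_text = df.head(10).to_csv(sep=";", index=True)
print(sample_text)
```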

Tags: python pandas pandas-groupby

【Solution 1】:

Here is a way to display the numbers you want:

# Subset dataframe to only have the desired plan_id
sub_Tx = Tx[Tx['plan_id'] == '869BB6FB-.....']

# median of deliveries per route in the given plan
# .copy() avoids assigning a new column to a view of sub_Tx
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

# median distance travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

# median time travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

Good luck.

Update:

So you can compute, for each plan_id, the median of the per-route figures (number of deliveries, distance, and time) as follows:

# median of deliveries per route in the given plan
# .copy() avoids assigning a new column to a view of sub_Tx
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'count_deliveries']].rename(columns={'count_deliveries': 'median_deliveries'})
sub_df.groupby(by=['plan_id'], as_index=False).median()

# median distance travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'dtn']].rename(columns={'dtn': 'median_dtn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()

# median time travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'ttn']].rename(columns={'ttn': 'median_ttn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
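
The three per-plan medians can also be computed in one pass. The sketch below is an alternative formulation, not the code from this answer: it uses named aggregation to build one row per route and then takes the median per plan. The toy data and the names `per_route` and `per_plan` are hypothetical, and it assumes the raw dataframe has one row per delivery.

```python
import pandas as pd

# Hypothetical toy data: one row per delivery, two plans.
df = pd.DataFrame({
    "plan_id": ["p1", "p1", "p1", "p1", "p1", "p2", "p2"],
    "route_id": ["r1", "r1", "r2", "r3", "r3", "r4", "r4"],
    "dtn": [10.0, 20.0, 5.0, 1.0, 2.0, 7.0, 3.0],
    "ttn": [100.0, 200.0, 50.0, 10.0, 20.0, 70.0, 30.0],
})

# Step 1: one row per route with its delivery count and totals.
per_route = (
    df.groupby(["plan_id", "route_id"])
    .agg(
        count_deliveries=("dtn", "size"),  # rows per route, NaN included
        dtn=("dtn", "sum"),
        ttn=("ttn", "sum"),
    )
    .reset_index()
)

# Step 2: median of the per-route figures within each plan.
per_plan = per_route.groupby("plan_id")[["count_deliveries", "dtn", "ttn"]].median()
print(per_plan)
```

Replacing `.median()` with `.mean()` in step 2 would give the average per route instead, if that is what is wanted for the travel time.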

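【Comments】:

Hi, thanks for your reply. Have you noticed that the median results are the same as the sum results? Is that right?
Hi, do you have several different route_id values for each plan_id?
I have just edited my post to compute the median for each plan_id.
sub_df = sub_df.groupby(by=['plan_id', 'route_id','dtn'], axis=0, as_index=False).sum() followed by sub_df.groupby(by=['plan_id'], axis=0, as_index=False).median(): is that how I should handle distance and time?
Yes, that seems to be the way to do it. Did you get the expected results?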