【Question title】: pandas find median after group by
【Posted】: 2023-03-14 21:01:01
【Question】:

df.head(10).to_clipboard(sep=';', index=True)

I have a dataframe as shown above, with the following column descriptions:

•   Id - the uuid of this delivery
•   PlanId - the uuid of the plan (the plan for deliveries of a given day)
•   PlanDate - the date of delivery
•   MinTime - the minimal time (seconds from midnight) for delivering this delivery
•   MaxTime - the maximal time (seconds from midnight) for delivering this delivery
•   RouteId - the uuid of the route this delivery belongs to
•   ETA - the estimated time for arrival of this delivery on this date (from the eta you can of course order the deliveries in a route)
•   TTN - the time to next delivery in the route, i.e., at index 3 that would be the time distance between delivery index 3 and delivery index 4
•   DTN - the distance to next delivery in the route.

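I need to find the median number of deliveries per route in a given plan.

The median distance travelled per route in a given plan.

The median time travelled per route in a given plan.

How can I do this?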

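I wonder whether this is simply a matter of grouping and aggregating to compute the median. I tried something like this to find the median distance:

Tx = df.groupby(by=['plan_id','route_id'], as_index=False)['dtn'].sum()

Tx.groupby(['plan_id','route_id'])['dtn'].median()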

But I am not sure whether this is correct.
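A small check makes the doubt concrete. The sketch below uses a made-up frame (the IDs and values are hypothetical, and the snake_case column names follow the code above rather than the column list) to show that grouping a second time on the same keys leaves one row per group, so median() just returns each route's sum. Grouping the per-route totals by plan_id alone is what yields a median across routes.

```python
import pandas as pd

# Hypothetical toy data: two plans, deliveries spread across routes.
df = pd.DataFrame({
    "plan_id":  ["p1", "p1", "p1", "p1", "p1", "p2", "p2"],
    "route_id": ["r1", "r1", "r2", "r3", "r3", "r4", "r4"],
    "dtn":      [10.0, 20.0, 5.0, 1.0, 2.0, 7.0, 8.0],
})

# Step 1: total distance per route.
per_route = df.groupby(["plan_id", "route_id"], as_index=False)["dtn"].sum()

# Grouping again on the same keys: every group holds a single row,
# so the "median" is just that row's sum.
same_keys = per_route.groupby(["plan_id", "route_id"])["dtn"].median()

# Grouping by plan only: the median is taken across the routes of each plan.
per_plan = per_route.groupby("plan_id")["dtn"].median()
```

Here same_keys carries the same numbers as per_route, while per_plan has one value per plan.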

【Comments】:

  • Please provide sample data as text. An example cannot be reproduced from an image.

Tags: python pandas pandas-groupby


【Solution 1】:

Here is one way to show the desired numbers:

# Subset the delivery-level dataframe (df, not the aggregated Tx) to the desired plan_id
sub_Tx = df[df['plan_id'] == '869BB6FB-.....']

# median of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()  # copy so the new column is not set on a slice
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
# note: regrouping by the same keys leaves one row per group,
# so median() returns the sums unchanged (see the update below)
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

# median distance travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

# median time travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()

Good luck.

Update:

You can compute the median of the route-level figures (number of deliveries, distance, and time) for each plan_id as follows:

# median of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()  # copy so the new column is not set on a slice
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'count_deliveries']].rename(columns={'count_deliveries': 'median_deliveries'})
sub_df.groupby(by=['plan_id'], as_index=False).median()

# median distance travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'dtn']].rename(columns={'dtn': 'median_dtn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()

# median time travelled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'ttn']].rename(columns={'ttn': 'median_ttn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
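The three blocks above follow the same two-step pattern, so they can probably be folded into one pass. The following is only a sketch under assumptions, not the answer's own method: the toy frame and its values are invented, an id column is assumed to identify each delivery, and pandas named aggregation is used in place of the helper column of ones.

```python
import pandas as pd

# Hypothetical toy data, one row per delivery.
df = pd.DataFrame({
    "id":       ["d1", "d2", "d3", "d4", "d5", "d6"],
    "plan_id":  ["p1", "p1", "p1", "p1", "p2", "p2"],
    "route_id": ["r1", "r1", "r2", "r3", "r4", "r4"],
    "dtn":      [10.0, 20.0, 5.0, 3.0, 7.0, 8.0],
    "ttn":      [100.0, 200.0, 50.0, 30.0, 70.0, 80.0],
})

# Step 1: one row per (plan, route) with its delivery count and totals.
per_route = df.groupby(["plan_id", "route_id"], as_index=False).agg(
    deliveries=("id", "count"),
    dtn=("dtn", "sum"),
    ttn=("ttn", "sum"),
)

# Step 2: median of the route-level figures within each plan.
per_plan = per_route.groupby("plan_id", as_index=False).agg(
    median_deliveries=("deliveries", "median"),
    median_dtn=("dtn", "median"),
    median_ttn=("ttn", "median"),
)
```

To restrict the result to a single plan, filter per_plan (or df beforehand) on plan_id, as in the subset step of the original answer.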

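【Comments】:

  • Hi, thanks for your reply. If you look, the median results are the same as the sum results. Is that right?
  • Hi, do you have different route_id values for each plan_id?
  • I just edited my post to compute the median for each plan_id.
  • sub_df = sub_df.groupby(by=['plan_id', 'route_id','dtn'], axis=0, as_index=False).sum() followed by sub_df.groupby(by=['plan_id'], axis=0, as_index=False).median(): is that how I should handle distance and time?
  • Yes, that seems to be the way to do it. Did you get the expected result?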