【Posted】: 2020-12-24 16:58:47
【Question】:
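I have a DataFrame with a few thousand rows. It holds the unit identifiers and response times for units within my organization. Its columns are ["Event#", "UnitID", "First UnitEnroute", "First UnitArrived", "First UnitAtHospital"].
There are many different rows for the same Event#. In the end I want only one row per Event#, with ["First UnitEnroute", "First UnitArrived", "First UnitAtHospital"] filled in from the other rows that share the same Event#.
The reason for this is a billing failure at the end of the quarter: we need to know whether each event has these 3 times spread across different units. I don't need the units listed, only the first non-0 value pulled from the other rows of the same event.
Here is some sample data: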
Event# Unit First UnitEnroute First UnitArrived First UnitAtHospital
2020000394 37 ['1/1/2020', '10:45:34 PM'] ['1/1/2020', '10:48:33 PM'] ['1/1/2020', '11:45:01 PM']
2020000394 38 ['1/1/2020', '10:45:34 PM'] ['1/1/2020', '10:48:33 PM'] ['1/1/2020', '11:45:01 PM']
2020000394 36 ['1/1/2020', '10:45:34 PM'] ['1/1/2020', '10:48:33 PM'] ['1/1/2020', '11:45:01 PM']
2020000394 39 ['1/1/2020', '10:45:34 PM'] ['1/1/2020', '10:48:33 PM'] ['1/1/2020', '11:45:01 PM']
2020000617 58 ['1/2/2020', '12:06:13 PM'] ['1/2/2020', '12:07:39 PM'] ['1/2/2020', '12:43:10 PM']
2020000849 74 ['1/2/2020', '6:42:19 PM'] ['1/2/2020', '6:53:53 PM'] ['1/2/2020', '7:28:32 PM']
2020000849 75 ['0'] ['0'] ['0']
2020000927 81 ['0'] ['0'] ['0']
2020000927 80 ['0'] ['0'] ['0']
2020000997 86 ['0'] ['0'] ['0']
2020000997 87 ['0'] ['0'] ['0']
2020001218 99 ['1/3/2020', '11:50:39 AM'] ['1/3/2020', '11:52:40 AM'] ['1/3/2020', '12:29:37 PM']
2020001218 98 ['0'] ['1/3/2020', '11:52:40 AM'] ['0']
2020001255 102 ['1/3/2020', '12:44:30 PM'] ['0'] ['0']
2020001255 103 ['1/3/2020', '12:40:19 PM'] ['0'] ['0']
2020001258 98 ['1/3/2020', '12:49:00 PM'] ['1/3/2020', '12:57:22 PM'] ['1/3/2020', '1:39:03 PM']
2020001258 103 ['0'] ['0'] ['0']
2020001258 104 ['0'] ['0'] ['0']
2020001258 105 ['0'] ['0'] ['0']
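A minimal sketch of the result being described, under the assumption that each time cell is a plain string and that "0" marks a missing time (the sample above shows list-like cells, so a real frame may need flattening first). The names `df`, `time_cols`, `cleaned`, and `result` are illustrative only. The rows are copied from the sample data; the sketch treats "0" as missing and takes the first non-missing value per column within each Event#.

```python
import pandas as pd

time_cols = ["First UnitEnroute", "First UnitArrived", "First UnitAtHospital"]

# A few rows from the sample data, in their original order.
# Assumption: cells are plain strings and "0" means "no time recorded".
df = pd.DataFrame(
    [
        [2020000849, 74, "1/2/2020 6:42:19 PM", "1/2/2020 6:53:53 PM", "1/2/2020 7:28:32 PM"],
        [2020000849, 75, "0", "0", "0"],
        [2020000927, 81, "0", "0", "0"],
        [2020000927, 80, "0", "0", "0"],
        [2020001218, 99, "1/3/2020 11:50:39 AM", "1/3/2020 11:52:40 AM", "1/3/2020 12:29:37 PM"],
        [2020001218, 98, "0", "1/3/2020 11:52:40 AM", "0"],
        [2020001255, 102, "1/3/2020 12:44:30 PM", "0", "0"],
        [2020001255, 103, "1/3/2020 12:40:19 PM", "0", "0"],
    ],
    columns=["Event#", "UnitID"] + time_cols,
)

# Turn the "0" placeholders into missing values so pandas can skip them.
cleaned = df.copy()
cleaned[time_cols] = cleaned[time_cols].mask(cleaned[time_cols] == "0")

# GroupBy.first() returns the first non-missing value of each column per group,
# which collapses every Event# to a single row. UnitID is left out on purpose.
result = cleaned.groupby("Event#", as_index=False)[time_cols].first()

print(result)
```

Events where no unit recorded a given time keep a missing value in that column, which makes them easy to find afterwards with `result[time_cols].isna()`.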
Here is what I have tried:
- Brute-force iterating over every row to find a non-0 value, then attaching that value to the row.
    for row in DF:
        compare = list()
        for i in DF:
            if i[0] == row[0]:
                addition = list(i)
                compare = compare.append(addition)
                print("Compare: {}".format(compare))
                return compare
        for el in row.index:
            whatisit = row[el]
            if whatisit == 0:
                for item in compare.index:
                    if item[el] == 0:
                        return
                    else:
                        replacement = item[el]
                        print("Replacement: {}".format(replacement))
                        return replacement
                row[el] = replacement
    return DF
- Using pandas.groupby().fillna() to group rows with the same Event# and backfill the times. I never got anything close to an acceptable result. Most of the time it chopped my DF into some weird shapes. I may be misunderstanding how to use it.
    DF = DF.groupby("Event#")["Unit", "First UnitEnroute", "First UnitArrived", "First UnitAtHospital"].fillna(method="bfill")
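A small sketch of what a grouped backfill returns, which may account for the "weird shapes". It assumes plain-string cells with "0" as the placeholder, and the names `df`, `time_cols`, and `filled` are illustrative. A grouped `bfill()` is a row-preserving operation: it returns one row per input row, leaves out the grouping column, and only fills real missing values (NaN/None), so "0" strings are left untouched. The sketch also uses a list of columns in double brackets, since selecting several columns after `groupby` with a bare comma-separated tuple is not accepted by recent pandas versions.

```python
import pandas as pd

time_cols = ["First UnitEnroute", "First UnitArrived", "First UnitAtHospital"]

# Two events from the sample data, in their original row order.
# Assumption: cells are plain strings and "0" means "no time recorded".
df = pd.DataFrame(
    [
        [2020001218, 99, "1/3/2020 11:50:39 AM", "1/3/2020 11:52:40 AM", "1/3/2020 12:29:37 PM"],
        [2020001218, 98, "0", "1/3/2020 11:52:40 AM", "0"],
        [2020001258, 98, "1/3/2020 12:49:00 PM", "1/3/2020 12:57:22 PM", "1/3/2020 1:39:03 PM"],
        [2020001258, 103, "0", "0", "0"],
    ],
    columns=["Event#", "UnitID"] + time_cols,
)

# .bfill() is the method form of fillna(method="bfill").
filled = df.groupby("Event#")[time_cols].bfill()

print(filled.shape)                    # same row count as df, not one row per event
print("Event#" in filled.columns)      # the grouping column is not part of the output
print((filled == "0").sum().sum())     # "0" strings are not NaN, so none were filled
```

Because the output keeps every row, a fill step on its own cannot reduce the frame to one row per Event#; an aggregation or a de-duplication step is still needed afterwards.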
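Any direction is appreciated, and I apologize if this has been posted before; I spent a lot of time looking for possible answers. I think I haven't yet developed the intuition I need to look at code and see how to apply it to my own project. I'm not a professional developer, I'm more of a hands-on, heavy-lifting kind of employee, haha.
【Comments】:
Tags: python pandas dataframe indexing pandas-groupby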