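【Posted】: 2021-01-30 07:55:09
【Question】:
Suppose there is a large time-indexed DataFrame a whose time index contains some duplicates.
Some of the duplicated timestamps may hold NaN in their first row, while the second/third/... duplicate does have a value.
How can the values be "pushed up" into the NaNs above them (so that they get filled), with all duplicates except the first then dropped? (This backfilling should only happen between rows that share the same datetime, e.g. 12.06.2019 00:00:05.)
What is an appropriate/efficient way to do this with pandas or numpy?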
Time                 A       B        C        D
12.06.2019 00:00:00  1.1412  NaN      1.21412  1.21412
12.06.2019 00:00:01  1.1464  1.12643  1.21412  1.21412
12.06.2019 00:00:02  NaN     1.12634  NaN      1.21445
12.06.2019 00:00:02  1.1453  NaN      1.21423  NaN
12.06.2019 00:00:03  1.1536  1.12589  1.21445  2.2452
12.06.2019 00:00:04  1.1612  1.12978  1.21445  4.12451
12.06.2019 00:00:05  1.1275  NaN      NaN      NaN
12.06.2019 00:00:05  NaN     1.12978  1.21445  NaN
12.06.2019 00:00:06  1.1612  1.12978  1.21445  4.12451
import numpy as np
import pandas as pd

# Sample frame: the timestamps 00:00:02 and 00:00:05 each appear twice in the index.
a = pd.DataFrame(
    {'A': [1.1412, 1.1464, np.nan, 1.1453, 1.1536, 1.1612, 1.1275, np.nan, 1.1612],
     'B': [np.nan, 1.12643, 1.12634, np.nan, 1.12589, 1.12978, np.nan, 1.12978, 1.12978],
     'C': [1.21412, 1.21412, np.nan, 1.21423, 1.21445, 1.21445, np.nan, 1.21445, 1.21445],
     'D': [1.21412, 1.21412, 1.21445, np.nan, 2.2452, 4.12451, np.nan, np.nan, 4.12451]},
    index=pd.DatetimeIndex(["12.06.2019 00:00:00", "12.06.2019 00:00:01", "12.06.2019 00:00:02",
                            "12.06.2019 00:00:02", "12.06.2019 00:00:03", "12.06.2019 00:00:04",
                            "12.06.2019 00:00:05", "12.06.2019 00:00:05", "12.06.2019 00:00:06"]))
Expected result:
Time                 A       B        C        D
12.06.2019 00:00:00  1.1412  NaN      1.21412  1.21412
12.06.2019 00:00:01  1.1464  1.12643  1.21412  1.21412
12.06.2019 00:00:02  1.1453  1.12634  1.21423  1.21445
12.06.2019 00:00:03  1.1536  1.12589  1.21445  2.2452
12.06.2019 00:00:04  1.1612  1.12978  1.21445  4.12451
12.06.2019 00:00:05  1.1275  1.12978  1.21445  NaN
12.06.2019 00:00:06  1.1612  1.12978  1.21445  4.12451
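One possible way to produce the expected result, offered as a minimal sketch rather than a definitive answer: group the rows by the index and take the first non-null value per column, which pandas exposes as GroupBy.first(). The second variant follows the wording of the question more literally (backfill within each timestamp, then keep only the first row of every duplicate). The names idx, result, filled and alt are illustrative, and the value of D at 00:00:03 is assumed to be 2.2452.

```python
import numpy as np
import pandas as pd

# Sample data from the question (D at 00:00:03 assumed to be 2.2452).
idx = pd.DatetimeIndex(
    ["12.06.2019 00:00:00", "12.06.2019 00:00:01", "12.06.2019 00:00:02",
     "12.06.2019 00:00:02", "12.06.2019 00:00:03", "12.06.2019 00:00:04",
     "12.06.2019 00:00:05", "12.06.2019 00:00:05", "12.06.2019 00:00:06"],
    name="Time",
)
a = pd.DataFrame(
    {"A": [1.1412, 1.1464, np.nan, 1.1453, 1.1536, 1.1612, 1.1275, np.nan, 1.1612],
     "B": [np.nan, 1.12643, 1.12634, np.nan, 1.12589, 1.12978, np.nan, 1.12978, 1.12978],
     "C": [1.21412, 1.21412, np.nan, 1.21423, 1.21445, 1.21445, np.nan, 1.21445, 1.21445],
     "D": [1.21412, 1.21412, 1.21445, np.nan, 2.2452, 4.12451, np.nan, np.nan, 4.12451]},
    index=idx,
)

# Variant 1: for every timestamp, take the first non-null value of each column.
# A column that is NaN in every duplicate of a timestamp stays NaN.
result = a.groupby(level=0).first()

# Variant 2: backfill only inside each group of identical timestamps,
# then keep the first row of every duplicated timestamp.
filled = a.groupby(level=0).bfill()
alt = filled[~filled.index.duplicated(keep="first")]

print(result)
```

Both variants leave the NaN in B at 00:00:00 and in D at 00:00:05 untouched, because no row with the same timestamp supplies a value. Note that groupby sorts the result by timestamp, while the second variant preserves the original row order; on an already sorted index the two agree.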
【Comments】:
-
Hi, interesting. Maybe filter out the null values?
-
@IronMan Could you clarify how that filtering would work?
Tags: pandas numpy interpolation nan data-cleaning