【Posted】: 2020-12-28 16:37:59
【Question】:
TL;DR: I want to right-align this df, overwriting the NaNs (i.e. shifting them to the left):
In [6]: series.str.split(':', expand=True)
Out[6]:
        0       1       2
0       1  25.842    <NA>
1    <NA>    <NA>    <NA>
2       0  15.413    <NA>
3  54.154    <NA>    <NA>
4       3       2  06.284
so that the data is contiguous and fills the rightmost columns:
        0       1       2
0       0       1  25.842   # 0 or NA
1    <NA>    <NA>    <NA>   # this NA should remain
2       0       0  15.413
3       0       0  54.154
4       3       2  06.284
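For illustration only, here is a minimal sketch of one way to produce that right-aligned frame. It is not taken from any of the answers. It assumes every value has at most three fields (the hypothetical `width` variable) and pads missing leading fields with the string '0', while rows that are entirely missing stay as NA:

```python
import pandas as pd

series = pd.Series(['1:25.842', pd.NA, '0:15.413', '54.154', '3:2:06.284'], dtype='string')

width = 3  # assumption: at most H, M and S fields per value

rows = []
for value in series:
    if pd.isna(value):
        # a missing input stays missing in every column
        rows.append([pd.NA] * width)
    else:
        fields = value.split(':')
        # left-pad with '0' so the seconds always land in the last column
        rows.append(['0'] * (width - len(fields)) + fields)

right_aligned = pd.DataFrame(rows, index=series.index, columns=range(width))
```

The values are still strings at this point, so a numeric conversion would be needed before doing any arithmetic on them.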
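What I'm really trying to do:
I have a Pandas Series of durations/timedeltas, roughly in H:M:S format, but sometimes the "H" or "H:M" part may be missing, so I can't simply pass it to Timedelta or datetime. What I want to do is convert them to seconds, which I've managed, but it seems a bit convoluted: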
In [1]: import pandas as pd
...:
...: series = pd.Series(['1:25.842', pd.NA, '0:15.413', '54.154', '3:2:06.284'], dtype='string')
...: t = series.str.split(':') # not using `expand` helps for the next step
...: t
Out[1]:
0       [1, 25.842]
1              <NA>
2       [0, 15.413]
3          [54.154]
4    [3, 2, 06.284]
dtype: object
In [2]: # reverse it so seconds are first; and NA's are just empty
...: rows = [i[::-1] if i is not pd.NA else [] for i in t]
In [3]: smh = pd.DataFrame.from_records(rows).astype('float')
...: # left-aligned is okay since it's continuous Secs->Mins->Hrs
...: smh
Out[3]:
        0    1    2
0  25.842  1.0  NaN
1     NaN  NaN  NaN
2  15.413  0.0  NaN
3  54.154  NaN  NaN
4   6.284  2.0  3.0
If I skip this fillna(0) step, the conversion to seconds later produces NaN for any row with a missing minutes or hours field.
In [4]: smh.iloc[:, 1:] = smh.iloc[:, 1:].fillna(0) # NaN's in first col = NaN from data; so leave
...: # convert to seconds
...: smh.iloc[:, 0] + smh.iloc[:, 1] * 60 + smh.iloc[:, 2] * 3600
Out[4]:
0       85.842
1          NaN
2       15.413
3       54.154
4    10926.284
dtype: float64
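^ This is the expected end result.
(Alternatively, I could write a small pure-Python function that splits on : and then converts based on how many values each list has.)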
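A rough sketch of what such a pure-Python function might look like. The name `to_seconds` is hypothetical, and it assumes each value is colon-separated with seconds last, which is the layout used in the sample data:

```python
import pandas as pd

def to_seconds(value):
    """Convert '[H:][M:]S' text to seconds; missing values become NaN."""
    if pd.isna(value):
        return float('nan')
    # reverse so index 0 is seconds, 1 is minutes, 2 is hours
    fields = value.split(':')[::-1]
    return sum(float(field) * 60 ** i for i, field in enumerate(fields))

series = pd.Series(['1:25.842', pd.NA, '0:15.413', '54.154', '3:2:06.284'], dtype='string')
seconds = pd.Series([to_seconds(v) for v in series], index=series.index, dtype='float')
```

Because the multiplier comes from each field's position, this does not depend on how many columns a split of the whole series would produce.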
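【Comments】:
- To future visitors: see the performance of the various solutions in my answer below. The accepted answer is more robust because it handles the split producing either 2 or 3 columns, depending on the full input series; those checks would need to be added to the other solutions.
Tags: python pandas dataframe series timedelta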