How to apply some logic to a dataframe column faster and with elegant coding: answers

【Question title】: How to apply some logic to dataframe column faster and with elegant coding
【Posted】: 2019-12-19 10:14:31
【Question description】:

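I am reading a csv file with many columns, one of which is TOD (time of day). Some events run past midnight, and instead of rolling over to 00:00 the time just keeps increasing past 24:00 (for example 23:59:50, 24:00:01, 24:00:10, ...). EntryTOD is parsed as a string.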

I want to apply a simple rule: if the hour is 24 or greater, subtract 24 hours. Here is my code:

for row in f2.itertuples():
    # Fix times > 24h
    if int(row.EntryTOD[0:2]) >= 24:
        actualTime =  int(row.EntryTOD[0:2]) - 24
        f2.EntryTOD[row.Index-1] = str(actualTime) + row.EntryTOD[2:]

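This code works, but it is a bit slow for 80k+ rows. It takes about 30-40 seconds to run.

My questions are:

1) Is there a faster way?

2) Also, since I am not good at Python, is there a more elegant way? It may still involve going through the whole column, but I feel this could be done in 1 line of code.

Thank you in advance,

Guido

Solution, thanks to Rene: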

f2.EntryTOD = f2.EntryTOD.apply(lambda x: str(int(x.split(':')[0])-24)+x[2:] if int(x.split(':')[0]) > 23 else x)

This is a very fast one-liner!

【Question comments】:

Tags: python pandas performance dataframe iterator

【Solution 1】:

I think this is what you are looking for:

import pandas as pd

# Sample df
data = [
    ['25:22:22', 1, 5],
    ['01:01:01', 36, 2]
]
cols = ['EntryTOD', 'two', 'three']

df = pd.DataFrame(data, columns = cols)

df

    EntryTOD    two three
0   25:22:22    1   5
1   01:01:01    36  2

Solution:

df['hour'] = (df['EntryTOD'].str[0:2]).astype(int)

df.loc[
    df.hour >= 24, 'hour'
] = df.loc[df.hour >= 24, 'hour'] - 24

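# Rebuild EntryTOD from the zero-padded hour plus the original ":MM:SS" tail.
# This avoids the row loop and chained assignment, and it does not touch
# minutes or seconds that happen to equal the hour (e.g. '25:25:25').
df['EntryTOD'] = df['hour'].astype(str).str.zfill(2) + df['EntryTOD'].str[2:]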

Output:

    EntryTOD    two three   hour
0   01:22:22    1    5      1
1   01:01:01    36   2      1
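
As a possible variation on the same idea, the hour can be wrapped with a modulo and the string rebuilt without a helper column. This is only a sketch: the sample values below are made up to mirror the question's EntryTOD column, and it assumes every value contains at least one colon.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's EntryTOD column
f2 = pd.DataFrame({"EntryTOD": ["23:59:50", "24:00:01", "25:22:22", "01:01:01"]})

# Split once at the first colon: column 0 is the hour, column 1 is "MM:SS"
parts = f2["EntryTOD"].str.split(":", n=1, expand=True)

# Wrap hours past midnight with modulo, then zero-pad back to two digits
hours = (parts[0].astype(int) % 24).astype(str).str.zfill(2)

# Reassemble the time string and write it back to the column
f2["EntryTOD"] = hours + ":" + parts[1]
print(f2["EntryTOD"].tolist())
```

Everything here runs as whole-column operations, so there is no Python-level loop over the rows.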

【Comments】:

【Solution 2】:

You can try:

f2 = pd.DataFrame(['23:59', '23:59:59', '24:00', '24:01', '25:25:25'], columns=['TOD'])
f2.TOD.apply(lambda x: f"{int(x.split(':')[0])-24}:{x.split(':')[1]}" if int(x.split(':')[0]) > 23 else x)

Result:

0       23:59
1    23:59:59
2        0:00
3        0:01
4        1:25
Name: TOD, dtype: object
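
Note that the result above loses the seconds of '25:25:25' and the leading zero of the hour. If those matter, one way to keep them is a small named function. The helper name `wrap_hour` is made up for this sketch and is not part of the original answer.

```python
import pandas as pd

def wrap_hour(tod: str) -> str:
    """Hypothetical helper: wrap an hour >= 24 into 00-23, keep the rest as-is."""
    # partition splits at the first colon only, so rest is "MM" or "MM:SS"
    hour, sep, rest = tod.partition(":")
    return f"{int(hour) % 24:02d}{sep}{rest}"

f2 = pd.DataFrame(["23:59", "23:59:59", "24:00", "24:01", "25:25:25"], columns=["TOD"])

# apply returns a new Series, so assign it back to update the column
f2["TOD"] = f2["TOD"].apply(wrap_hour)
print(f2["TOD"].tolist())
```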

【Comments】:

Great, I think we are very close. For some reason, it doesn't work. Also, wouldn't this be more compact: f2.EntryTOD.apply(lambda x: str(int(x.split(':')[0])-24)+x[2:] if int(x.split(':')[0]) > 23 else x)
I tested the logic with some prints and it works. f2.EntryTOD just isn't being updated...
This works: f2.EntryTOD = f2.EntryTOD.apply(lambda x: str(int(x.split(':')[0])-24)+x[2:] if int(x.split(':')[0]) > 23 else x)
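
The difference between the two calls in these comments comes down to `Series.apply` returning a new Series rather than modifying the column in place. A minimal sketch with made-up sample values (the function name `fix` is only for illustration):

```python
import pandas as pd

f2 = pd.DataFrame({"EntryTOD": ["24:00:10", "12:30:00"]})

def fix(x):
    # Same expression as in the comments above
    return str(int(x.split(":")[0]) - 24) + x[2:] if int(x.split(":")[0]) > 23 else x

# Without assignment the returned Series is discarded and f2 stays unchanged
f2["EntryTOD"].apply(fix)
unchanged = f2["EntryTOD"].tolist()

# Assigning the returned Series back is what updates the column
f2["EntryTOD"] = f2["EntryTOD"].apply(fix)
updated = f2["EntryTOD"].tolist()

print(unchanged, updated)
```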