[Posted]: 2019-02-11 14:46:18
[Question]:
I have some code that runs noticeably faster if I avoid multiple assignment and instead assign on separate lines, e.g.
Fast:
onset = pitch_df.loc[idx, 'onset_time']
dur = pitch_df.loc[idx, 'duration']
Slow:
onset, dur = pitch_df.loc[idx, ['onset_time', 'duration']]
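Is there an obvious reason for this, or is there a more "pandas" way to do what I'm doing? I want to assign to local names here to make my code more readable (i.e. I don't want to write .loc[...] everywhere).
Here is a minimal working example (a 4x speedup here):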
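One likely cause (not confirmed in the thread itself): selecting a list of columns with .loc builds a brand-new Series, with its own index and with the int and float values upcast to a common dtype, before the tuple unpacking even starts. A single-label lookup just returns a scalar. The sketch below uses a hypothetical two-row frame to make that intermediate object visible:

```python
import pandas as pd

# Hypothetical two-row frame: one int column, one float column.
df = pd.DataFrame({'onset_time': [0, 1], 'duration': [4, 0.5]})

# Scalar lookups: each call returns a single scalar, no new pandas object.
onset = df.loc[0, 'onset_time']
dur = df.loc[0, 'duration']

# List-of-labels lookup: pandas first builds a new Series holding both values,
# then tuple unpacking iterates over that Series.
row = df.loc[0, ['onset_time', 'duration']]
onset2, dur2 = row

print(type(row).__name__)  # the intermediate object is a Series
print(row.dtype)           # int and float values share one upcast dtype
```

Building that Series (index, dtype resolution, internal blocks) on every iteration is overhead the two scalar lookups never pay.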
import pandas as pd
import numpy as np
from timeit import timeit

df = pd.DataFrame(
    {'onset_time': [0, 0, 1, 2, 3, 4],
     'pitch': [61, 60, 60, 61, 60, 60],
     'duration': [4, 1, 1, 0.5, 0.5, 2]}
).sort_values(['onset_time', 'pitch']).reset_index(drop=True)

def foo():
    for pitch, pitch_df in df.groupby('pitch'):
        for iloc in range(len(pitch_df)):
            idx = pitch_df.index[iloc]
            onset = pitch_df.loc[idx, 'onset_time']
            dur = pitch_df.loc[idx, 'duration']
            note_off = onset + dur

def bar():
    for pitch, pitch_df in df.groupby('pitch'):
        for iloc in range(len(pitch_df)):
            idx = pitch_df.index[iloc]
            onset, dur = pitch_df.loc[idx, ['onset_time', 'duration']]
            note_off = onset + dur

print(f'foo time: {timeit(foo, number=100)}')
print(f'bar time: {timeit(bar, number=100)}')
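For the "more pandas" part of the question, here is a sketch of two alternatives, assuming note_off is all the loop needs to compute. itertuples keeps a readable per-row loop without any .loc calls, and a single column addition removes the loop entirely. The function name baz and the note_off column are illustrative, not from the original post.

```python
import pandas as pd

df = pd.DataFrame(
    {'onset_time': [0, 0, 1, 2, 3, 4],
     'pitch': [61, 60, 60, 61, 60, 60],
     'duration': [4, 1, 1, 0.5, 0.5, 2]}
).sort_values(['onset_time', 'pitch']).reset_index(drop=True)

def baz():
    """Per-group loop; itertuples yields namedtuples, so fields read like names."""
    note_offs = []
    for pitch, pitch_df in df.groupby('pitch'):
        for row in pitch_df.itertuples(index=False):
            note_offs.append(row.onset_time + row.duration)
    return note_offs

# Fully vectorized: one column-wise addition, no Python-level loop at all.
df['note_off'] = df['onset_time'] + df['duration']
```

The vectorized form only applies if each note_off depends on its own row; if the real loop compares neighbouring notes within a pitch group, the itertuples version is the closer drop-in.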
The image below is for easier reading.
[Discussion]:
- You could also try .at instead of .loc to access a single cell; it should be faster.
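A sketch of that suggestion applied to foo, rebuilding the example df so the snippet runs on its own. foo_at is a hypothetical name, and the loop iterates pitch_df.index directly instead of going through iloc. No timing numbers are claimed here, since they depend on the pandas version and the machine.

```python
import pandas as pd
from timeit import timeit

df = pd.DataFrame(
    {'onset_time': [0, 0, 1, 2, 3, 4],
     'pitch': [61, 60, 60, 61, 60, 60],
     'duration': [4, 1, 1, 0.5, 0.5, 2]}
).sort_values(['onset_time', 'pitch']).reset_index(drop=True)

def foo_at():
    note_offs = []
    for pitch, pitch_df in df.groupby('pitch'):
        for idx in pitch_df.index:
            # .at is the scalar-only accessor: exactly one row label and one column label
            onset = pitch_df.at[idx, 'onset_time']
            dur = pitch_df.at[idx, 'duration']
            note_offs.append(onset + dur)
    return note_offs

print(f'foo_at time: {timeit(foo_at, number=100)}')
```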