【Posted】: 2022-01-04 01:44:04
【Question】:
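I have 19 data frames with datetime indexes, and I want to iterate over all of them in parallel. I start with one df, slice it to a given time range, and do the same for the other dfs; that completes one pass of the while loop. In the next pass I want to create a new slice that starts at the end of the old slice and runs up to the next closest timestamp across all data frames. I came up with the following code. It works, but because of the large number of iterations it is very time consuming, and I would like to know whether there is a faster way to do this.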
import datetime

import numpy as np  # needed for np.searchsorted in the main loop
import pandas as pd
# creating test data frames
df1 = pd.DataFrame({'A': range(9)})
df1.index = [pd.Timestamp('20130101 09:00:00'),
             pd.Timestamp('20130101 09:01:00'),
             pd.Timestamp('20130101 09:30:00'),
             pd.Timestamp('20130101 09:44:00'),
             pd.Timestamp('20130101 09:50:00'),
             pd.Timestamp('20130101 10:16:00'),
             pd.Timestamp('20130101 10:47:00'),
             pd.Timestamp('20130101 10:53:00'),
             pd.Timestamp('20130101 11:22:00')]
df2 = pd.DataFrame({'B': range(9)})
df2.index = [pd.Timestamp('20130101 09:00:00'),
             pd.Timestamp('20130101 09:01:00'),
             pd.Timestamp('20130101 09:04:00'),
             pd.Timestamp('20130101 09:05:00'),
             pd.Timestamp('20130101 09:09:00'),
             pd.Timestamp('20130101 10:10:00'),
             pd.Timestamp('20130101 10:15:00'),
             pd.Timestamp('20130101 10:16:00'),
             pd.Timestamp('20130101 11:18:00')]
db_dict = {"a": df1, "b": df2}
time_dict_start = {}
time_dict_end = {}
complete_list = []
start_time = datetime.datetime.now()
# starting the main loop
while True:
    # check if all data has been processed
    if len(complete_list) == len(db_dict):
        print(datetime.datetime.now() - start_time)
        break
    # iterate over every data frame
    for name in db_dict:
        # skip completed data frames
        if name in complete_list:
            continue
        db = db_dict[name]
        # first iteration
        if name not in time_dict_start:
            start = db.index[0]
            end = start + datetime.timedelta(seconds=10)
        # all other iterations
        else:
            start = time_dict_start[name]
            # get smallest time stamp
            end = min(time_dict_end.values())
        time_dict_start[name] = end + datetime.timedelta(seconds=1)
        split = db.loc[start: end]
        try:
            # find next closest index
            next_idx = db.index[np.searchsorted(db.index, end + datetime.timedelta(seconds=1))]
            time_dict_end[name] = next_idx
        except IndexError:
            # no later timestamp left: this data frame is finished
            time_dict_end.pop(name, None)
            complete_list.append(name)
        # do something with the sliced data frame
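One possible way to cut down the per-iteration lookups is to compute all slice boundaries once and then slice by integer position. The sketch below is only an illustration under stated assumptions: every index is sorted, and every frame is meant to be cut at the same boundaries, taken here to be the sorted union of all timestamps. It is not guaranteed to produce exactly the same windows as the loop above, and the name `aligned_slices` is hypothetical.

```python
from functools import reduce

import numpy as np
import pandas as pd


def aligned_slices(frames):
    """Yield (cut, {name: slice}) for shared cut points across all frames.

    Assumption: each frame has a sorted DatetimeIndex, and the cut points
    are the sorted union of all timestamps (an illustrative choice).
    """
    # Sorted union of every index; Index.union sorts the result by default.
    cuts = reduce(lambda a, b: a.union(b), (df.index for df in frames.values()))
    # For each frame: position of the first row strictly after each cut point,
    # computed in one vectorized call instead of once per iteration.
    stops = {
        name: np.searchsorted(df.index.to_numpy(), cuts.to_numpy(), side="right")
        for name, df in frames.items()
    }
    starts = {name: 0 for name in frames}
    for i, cut in enumerate(cuts):
        window = {}
        for name, df in frames.items():
            stop = stops[name][i]
            # Rows after the previous cut, up to and including this cut.
            window[name] = df.iloc[starts[name]:stop]
            starts[name] = stop
        yield cut, window
```

With the test frames above, `for cut, window in aligned_slices(db_dict): ...` would hand over one slice per frame for each boundary; a slice is simply empty when a frame has no rows in that window.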
【Comments】:
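- Hi, take a look at the multiprocessing module and its Pool class. Waiting for several parallel pieces of code to finish and sending their results on to another part can be a bit tricky, though.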
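A minimal sketch of what the comment suggests, assuming the work done on each data frame is independent of the other frames (which is not the case for the shared boundaries in the question's loop, so those would have to be computed first). `process_frame` and `run_parallel` are hypothetical names, and the worker body is only a placeholder.

```python
from multiprocessing import Pool

import pandas as pd


def process_frame(item):
    """Hypothetical per-frame worker: returns (name, number of rows)."""
    name, df = item
    # Placeholder for the real per-slice work.
    return name, len(df)


def run_parallel(frames, processes=2):
    """Send each (name, frame) pair to a worker process and collect results."""
    with Pool(processes=processes) as pool:
        # map() blocks until every worker has returned its result.
        return dict(pool.map(process_frame, frames.items()))


if __name__ == "__main__":
    idx = pd.date_range("2013-01-01 09:00", periods=3, freq="min")
    frames = {
        "a": pd.DataFrame({"A": range(3)}, index=idx),
        "b": pd.DataFrame({"B": range(3)}, index=idx),
    }
    print(run_parallel(frames))
```

Each frame is pickled and copied to a worker process, so this tends to pay off only when the per-frame work outweighs that transfer cost.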
Tags: python-3.x pandas dataframe loops