【Posted】: 2020-05-13 18:32:22
【Question】:
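I have two DataFrames. Both have X and Y coordinates, but DF1 is much denser than DF2. I want to downsample DF1 onto the X/Y coordinates of DF2. Specifically, for each X/Y pair in DF2, I select the DF1 rows whose X lies between x - delta and x + delta and whose Y lies between y - delta and y + delta, and compute the mean of their Z values. New_DF1 will have the same X/Y coordinates as DF2, with Z values obtained by averaging the DF1 data in each window.
Below are some sample data and the function I wrote for this. My problem is that it is far too slow for large datasets. If anyone has a better idea that uses vectorized operations instead of a crude loop, it would be much appreciated.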
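Written as a formula (a restatement of the rule above, with δ standing for delta, i.e. the `half_range` argument below, and strict inequalities matching the `>`/`<` used in the function), the new Z value for the j-th DF2 point (x_j, y_j), given DF1 rows (X_i, Y_i, Z_i), is:

```latex
S_j = \{\, i : x_j - \delta < X_i < x_j + \delta,\; y_j - \delta < Y_i < y_j + \delta \,\},
\qquad
\bar{Z}_j = \frac{1}{|S_j|} \sum_{i \in S_j} Z_i
```

If the window S_j is empty, the mean is undefined and pandas returns NaN for that point.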
Creating the sample data:
import pandas as pd

DF1 = pd.DataFrame({'X':[0.6,0.7,0.9,1.1,1.3,1.8,2.1,2.8,2.9,3.0,3.3,3.5],"Y":[0.6,0.7,0.9,1.1,1.3,1.8,2.1,2.8,2.9,3.0,3.3,3.5],'Z':[1,2,3,4,5,6,7,8,9,10,11,12]})
DF2 = pd.DataFrame({'X':[1,2,3],'Y':[1,2,3],'Z':[10,20,30]})
Function:
def DF1_match_DF2_target(half_range, DF2, DF1):
    ### half_range: scalar, half-width of the square area around each dbf target
    ### DF2: dbf data (target X, Y points)
    ### DF1: raw pwg pixel map (dense X, Y, Z)
    results = list()
    for i in DF2.index:
        # Select target XY from DF2
        x = DF2.at[i, 'X']
        y = DF2.at[i, 'Y']
        # Select X,Y range for DF1
        upper_lmt_X = x + half_range
        lower_lmt_X = x - half_range
        upper_lmt_Y = y + half_range
        lower_lmt_Y = y - half_range
        # Select data from DF1 according to X,Y range, calculate average Z
        subset_X = DF1.loc[(DF1['X'] > lower_lmt_X) & (DF1['X'] < upper_lmt_X)]
        subset_XY = subset_X.loc[(subset_X['Y'] > lower_lmt_Y) & (subset_X['Y'] < upper_lmt_Y)]
        result = subset_XY.mean(axis=0, skipna=True)
        # Set X,Y in new_DF1 the same as the X,Y in DF2
        # (label-based; positional result[0] on a Series is deprecated in recent pandas)
        result['X'] = x
        result['Y'] = y
        results.append(result)
    results = pd.DataFrame(results)
    return results
Test and result:
new_DF1 = DF1_match_DF2_target(0.5,DF2,DF1)
new_DF1
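One possible vectorized formulation, offered only as a sketch under stated assumptions: it replaces the Python loop with NumPy broadcasting, building one boolean matrix that marks which DF1 points fall inside each DF2 window, then computes every window mean with a single matrix product. The function name `downsample_broadcast` is hypothetical, and the approach assumes len(DF2) × len(DF1) booleans fit in memory; for very large inputs DF2 would have to be processed in chunks.

```python
import numpy as np
import pandas as pd


def downsample_broadcast(half_range, DF2, DF1):
    """Hypothetical vectorized counterpart of DF1_match_DF2_target.

    Memory grows as len(DF2) * len(DF1) because of the full boolean mask.
    """
    # DF2 targets as column vectors (n2, 1), DF1 samples as row vectors (1, n1).
    x2 = DF2["X"].to_numpy(dtype=float)[:, None]
    y2 = DF2["Y"].to_numpy(dtype=float)[:, None]
    x1 = DF1["X"].to_numpy(dtype=float)[None, :]
    y1 = DF1["Y"].to_numpy(dtype=float)[None, :]
    z1 = DF1["Z"].to_numpy(dtype=float)

    # Broadcasting gives an (n2, n1) mask; same strict window as the loop.
    mask = (
        (x1 > x2 - half_range) & (x1 < x2 + half_range)
        & (y1 > y2 - half_range) & (y1 < y2 + half_range)
    )

    # Per-window sum of Z and number of DF1 points in the window.
    sums = mask.astype(float) @ z1
    counts = mask.sum(axis=1)

    # Mean Z per window; windows without any DF1 point stay NaN.
    means = np.full(len(DF2), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)

    return pd.DataFrame({"X": DF2["X"].to_numpy(),
                         "Y": DF2["Y"].to_numpy(),
                         "Z": means})
```

With the sample DF1/DF2 and half_range = 0.5, this yields Z = 3.0, 6.5 and 9.5 for the three DF2 points (the point at 3.5 is excluded by the strict `<`), the same values the loop produces for those inputs.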
【Discussion】: