【Posted】: 2020-01-23 17:14:47
【Question】:
I have weekly data for various stores in the following format:
import pandas as pd
# Weekly sales and customer counts per store
df = pd.DataFrame({'Store': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'], 'Week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Sales': [20, 30, 40, 21, 31, 41, 22, 32, 42], 'Cust_count': [2, 4, 6, 3, 5, 7, 4, 6, 8]})
  Store  Week  Sales  Cust_count
0    S1     1     20           2
1    S1     2     30           4
2    S1     3     40           6
3    S2     1     21           3
4    S2     2     31           5
5    S2     3     41           7
6    S3     1     22           4
7    S3     2     32           6
8    S3     3     42           8
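As you can see, the data is at the store-week level. I want to compute the Euclidean distance between every pair of stores within the same week, and then average those distances across weeks. For example, the calculation for stores S1 and S2 looks like this: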
For week 1: sqrt((20-21)^2 + (2-3)^2) = sqrt(2)
For week 2: sqrt((30-31)^2 + (4-5)^2) = sqrt(2)
For week 3: sqrt((40-41)^2 + (6-7)^2) = sqrt(2)
The final distance between S1 and S2 is the average of the three weekly
distances, i.e. (3 * sqrt(2)) / 3 = sqrt(2)
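The arithmetic above can be checked with a short sketch. It assumes both stores have exactly one row for each week, so that sorting by `Week` lines the rows up. The variable names (`s1`, `s2`, `weekly`, `mean_dist`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Store': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'],
    'Week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Sales': [20, 30, 40, 21, 31, 41, 22, 32, 42],
    'Cust_count': [2, 4, 6, 3, 5, 7, 4, 6, 8],
})

features = ['Sales', 'Cust_count']

# Assumption: one row per store per week, so after sorting by Week
# row i of s1 and row i of s2 describe the same week.
s1 = df[df['Store'] == 'S1'].sort_values('Week')[features].to_numpy(dtype=float)
s2 = df[df['Store'] == 'S2'].sort_values('Week')[features].to_numpy(dtype=float)

# One Euclidean distance per week, then the average over the weeks.
weekly = np.linalg.norm(s1 - s2, axis=1)
mean_dist = weekly.mean()

print(weekly)     # three values, each sqrt(2)
print(mean_dist)  # sqrt(2), about 1.414
```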
The final output should look like this:
        S1        S2        S3
S1       0     1.414     2.828
S2   1.414         0  some val
S3   2.828  some val         0
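I have some ideas about grouping the DataFrame and using scipy.spatial.distance.cdist to compute Euclidean distances over the columns, but I can't connect these concepts into a working solution.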
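One possible way to combine the two ideas, offered as a minimal sketch rather than a definitive solution: group by `Week`, build a store-by-store distance matrix for each week with `cdist`, then average the weekly matrices element-wise. It assumes every store has exactly one row in every week (a missing store-week would produce NaN). The names `features`, `stores`, `weekly_matrices`, and `mean_dist` are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

df = pd.DataFrame({
    'Store': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'],
    'Week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Sales': [20, 30, 40, 21, 31, 41, 22, 32, 42],
    'Cust_count': [2, 4, 6, 3, 5, 7, 4, 6, 8],
})

features = ['Sales', 'Cust_count']
stores = sorted(df['Store'].unique())

weekly_matrices = []
for week, grp in df.groupby('Week'):
    # Reindex so rows are in the same store order for every week.
    pts = grp.set_index('Store').reindex(stores)[features].to_numpy(dtype=float)
    # Pairwise Euclidean distances between all stores for this week.
    weekly_matrices.append(cdist(pts, pts, metric='euclidean'))

# Element-wise mean over the weekly matrices gives the averaged distance matrix.
mean_dist = pd.DataFrame(np.mean(weekly_matrices, axis=0),
                         index=stores, columns=stores)
print(mean_dist.round(3))
```

For the sample data this gives sqrt(2) ≈ 1.414 for S1 vs S2 and for S2 vs S3, and sqrt(8) ≈ 2.828 for S1 vs S3. Since `Sales` and `Cust_count` are on different scales, the larger-valued column will dominate the distance unless the features are standardized first.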
【Comments】:
Tags: python pandas dataframe scipy euclidean-distance