【Question title】: Vectorised average K-Nearest Neighbour distance in Python
【Posted】: 2014-07-10 17:40:07
【Question description】:

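This is a K-nearest-neighbour algorithm for points in R^n. It should compute the average distance from each point to its k nearest neighbours. The problem is that, although it is vectorised, it is inefficient in the sense that I repeat myself. I would be glad if someone could help me improve this code: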

import numpy as np
from scipy.spatial.distance import pdist
from scipy.spatial.distance import squareform

def nn_args_R_n_squared(points):
    """Calculate pairwise squared distances of points and return the matrix together with the matrix of indices that sorts each of its rows"""
    dist_mat=squareform(pdist(points,'sqeuclidean'))
    return dist_mat,np.argsort(dist_mat,axis=1)

def knn_avg_dist(X,k):
    """Calculates, for the points in the rows of X, the average squared distance of each to its k nearest neighbours"""
    X_dist_mat,X_sorted_arg=nn_args_R_n_squared(X)
    # Column 0 of X_sorted_arg is the point itself, so take columns 1..k
    X_matrices=(X[X_sorted_arg[:,1:k+1]]-X[:,None,:]).astype(np.float64)
    # The distances are recomputed here although X_dist_mat already holds them
    return np.mean(np.linalg.norm(X_matrices,axis=2)**2,axis=1)

X=np.random.randn(30).reshape((10,3))
print(X)
print(knn_avg_dist(X,3))

Output:

[[-1.87979713  0.02832699  0.18654558]
 [ 0.95626677  0.4415187  -0.90220505]
 [ 0.86210012 -0.88348927  0.32462922]
 [ 0.42857316  1.66556448 -0.31829065]
 [ 0.26475478 -1.6807253  -1.37694585]
 [-0.08882175 -0.61925033 -1.77264525]
 [-0.24085553  0.64426394 -0.01973027]
 [-0.86926425  0.93439913 -0.31657442]
 [-0.30987468  0.02925649 -1.38556347]
 [-0.41801804  1.40210993 -1.04450895]]
[ 3.37983833  2.1257945   3.60884158  1.67051682  2.85013297  1.66756279
  1.2678029   1.20491026  1.54623574  1.30722388]

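As you can see, I compute the distances twice, but I could not come up with a way to read the same information from X_dist_mat, because I have to read several elements from each row at once.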
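One possible way to read several elements per row from the distance matrix is to gather them by index. The sketch below assumes NumPy 1.15 or later, where np.take_along_axis is available; the function name knn_avg_sq_dist is hypothetical and not part of the code above. Like the code above, it averages squared distances.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def knn_avg_sq_dist(X, k):
    """Mean squared distance from each row of X to its k nearest neighbours."""
    dist_mat = squareform(pdist(X, 'sqeuclidean'))
    # The first sorted entry of every row is a zero distance (the point itself),
    # so keep columns 1..k of the sorted indices.
    nn_idx = np.argsort(dist_mat, axis=1)[:, 1:k + 1]
    # For every row i, pick the entries dist_mat[i, nn_idx[i, :]].
    nn_dist = np.take_along_axis(dist_mat, nn_idx, axis=1)
    return nn_dist.mean(axis=1)
```

On older NumPy versions the same gather can be written with integer-array indexing, dist_mat[np.arange(len(X))[:, None], nn_idx]. Either way the full n-by-n matrix is still built, so this only removes the repeated distance computation, not the quadratic memory use.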

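【Comments】:

  • If you add the imports and the generation of dummy data to your code, people can copy and paste it to take a look. Otherwise, you should be able to get inspiration from the existing implementation in sklearn.

Tags: python numpy vectorization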


【Solution 1】:

Use scipy.spatial.cKDTree:

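>>> import numpy as np
>>> import scipy.spatial
>>> data = np.random.rand(1000, 3)

>>> kdt = scipy.spatial.cKDTree(data)
>>> k = 5 # number of nearest neighbors
>>> # query k+1 neighbours: the closest hit for each point is the point itself (distance 0)
>>> dists, neighs = kdt.query(data, k+1)
>>> avg_dists = np.mean(dists[:, 1:], axis=1)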
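Note that cKDTree.query returns plain Euclidean distances by default, whereas the code in the question averages squared distances. A minimal sketch of a wrapper that covers both cases follows; the function name knn_avg_dist_tree and the squared flag are hypothetical, introduced only for illustration, and k is assumed to be at least 1.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_avg_dist_tree(X, k, squared=False):
    """Average distance from each row of X to its k nearest neighbours."""
    tree = cKDTree(X)
    # Ask for k + 1 neighbours: the nearest hit for each query point
    # is a zero distance (the point itself), which is dropped below.
    dists, _ = tree.query(X, k + 1)
    dists = dists[:, 1:]
    if squared:
        # Square the Euclidean distances to match the question's quantity.
        dists = dists ** 2
    return dists.mean(axis=1)
```

With squared=True the result should agree with knn_avg_dist(X, k) from the question up to floating-point error, without ever building the full pairwise distance matrix.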

【Comments】:

  • Thanks! You rock in the Python world! :)