Pandas: finding points within maximum distance

【Question title】: Pandas: finding points within maximum distance
【Posted】: 2014-11-12 01:50:34
【Question】:

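I am trying to find pairs of (x,y) points that lie within a maximum distance of each other. I thought the simplest approach would be to build a DataFrame and go through the points one at a time, checking whether any point with coordinates (x,y) lies within a distance r of the given point (x_0, y_0). Then I would divide the total number of pairs found by 2.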

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def find_nbrs(low, high, num, max_d):
    x = np.random.uniform(low, high, num)
    y = np.random.uniform(low, high, num)
    points = pd.DataFrame({'x': x, 'y': y})

    tot_nbrs = 0

    for i in range(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]

        # Select every row whose squared distance to (x_0, y_0) is below max_d**2.
        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plt.plot(pt_nbrz.x, pt_nbrz.y, 'r-')

    plt.plot(points.x, points.y, 'b.')
    return tot_nbrs

print(find_nbrs(0, 1, 50, 0.1))

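1. First, it does not always find the right pairs (I see points within the specified distance that are not marked).
2. If I write plot(..., 'or'), it highlights all the points. That means pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2] returns at least one (x,y). Why? Shouldn't it return an empty array when the comparison is False?
3. How can I do all of the above more elegantly in Pandas? For example, without looping over every element.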

【Comments】:

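Correct me if I'm wrong, but you are doing an O(n) search when I think what you want is an O(n^2) search. You are basically checking the distances between x0:y0, x1:y1, x2:y2 ... when I think what you want to do is check x0:y0, x0:y1, ... x1:y0, x1:y1, x1:y2 ....
But if I'm wrong about what you want, then this will work well for you: stackoverflow.com/questions/1401712/…
Thanks for the link. Despite the answers there, I am having some trouble figuring out how to compute the distance with numpy.linalg.norm. In the example, what format should a and b be in? Re: O(n^2), I think that is what I am doing: going through each DataFrame element and finding all other elements that satisfy the comparison. This should identify every pair twice, so to get the number I just divide the final count by 2.
Actually, I just realized the answer to question 2: apparently Pandas compares each point's distance to itself (which is zero and therefore less than max_d), and that is why every point gets "marked" as part of a pair. So here is another question: how do I perform the comparison without comparing a point to itself?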
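The two open points in these comments (what to pass to numpy.linalg.norm, and how to skip the self-comparison) can be illustrated with a minimal sketch. It is not taken from the thread: the function name count_pairs_excluding_self and the seeded random data are assumptions made for illustration. numpy.linalg.norm accepts an array of coordinate differences; with axis=1 it returns one Euclidean length per row.

```python
import numpy as np
import pandas as pd

def count_pairs_excluding_self(points, max_d):
    """Count unordered pairs of rows in `points` closer than max_d.

    Hypothetical helper; `points` is assumed to have columns 'x' and 'y'.
    """
    xy = points[['x', 'y']].to_numpy()
    total = 0
    for i in range(len(xy)):
        # Distance from point i to every point, itself included (that one is 0).
        d = np.linalg.norm(xy - xy[i], axis=1)
        mask = d < max_d
        mask[i] = False  # drop the zero-distance match of the point with itself
        total += int(mask.sum())
    # Every pair was counted once from each end, so halve the total.
    return total // 2

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
points = pd.DataFrame({'x': rng.uniform(0, 1, 50), 'y': rng.uniform(0, 1, 50)})
n_pairs = count_pairs_excluding_self(points, 0.1)
print(n_pairs)
```

Clearing mask[i] is the only change needed relative to the loop in the question; the rest of the logic, including the final division by 2, stays the same.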

Tags: python pandas distance

【Solution 1】:

The functionality you are looking for is included in scipy's spatial distance module.

Here is an example of how to use it. The real magic is in squareform(pdist(points)).

from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt

points = np.random.uniform(-.5, .5, (1000,2))

# Compute the distance between each different pair of points in X with pdist.
# Then, just for ease of working, convert to a typical symmetric distance matrix
# with squareform.
dists = squareform(pdist(points))

poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min

print("There are {} other points within a distance of {} from the point "
    "({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))

There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)
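The example above looks at a single point of interest. For the original goal of counting every pair within the distance, a possible sketch is to work on the condensed vector that pdist returns before it is expanded by squareform. That vector holds each unordered pair exactly once and has no self-distances, so no subtraction or division by 2 is needed. The seeded generator and the variable names condensed and n_pairs are assumptions for illustration, not part of the answer.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
points = rng.uniform(-.5, .5, (1000, 2))
dist_min = .1

# pdist returns n*(n-1)/2 distances: one entry per unordered pair (i, j), i < j.
condensed = pdist(points)

# Count the pairs that are closer than dist_min.
n_pairs = int((condensed < dist_min).sum())
print(n_pairs)
```

The same number can be recovered from the square matrix by counting entries below dist_min, subtracting the n zero entries on the diagonal, and halving the result.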

For visualization purposes:

f,ax = plt.subplots(subplot_kw=
    dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:,0], points[:,1], 'b+ ')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points,0], points[close_points,1],
    marker='o', mfc='none', mec='r', ls='')  # draw all points within distance

t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()
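The full distance matrix grows as n^2, which can become expensive for many points. As an alternative that the answer itself does not use, scipy.spatial.cKDTree offers query_pairs, which returns the index pairs within a radius without building that matrix. The following is a minimal sketch under that assumption; note that query_pairs treats the radius as inclusive (distance <= r), whereas the code above uses a strict comparison.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
points = rng.uniform(-.5, .5, (1000, 2))
dist_min = .1

# Build a k-d tree once, then ask for all pairs within dist_min of each other.
tree = cKDTree(points)
pairs = tree.query_pairs(r=dist_min)  # set of (i, j) index tuples with i < j

print(len(pairs))
```

Each pair appears once and a point is never paired with itself, so len(pairs) is the pair count directly.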

【Comments】: