Calculating distances between unique Python array regions? Answers

【Question title】: Calculating distances between unique Python array regions?
【Posted】: 2018-09-12 22:17:46
【Question description】:

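I have a raster with a set of unique ID patches/regions, which I have converted into a two-dimensional Python numpy array. I would like to calculate the pairwise Euclidean distances between all regions, to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a raster, a solution needs to account for diagonal distances across cells (I can always convert any distances measured in cells back to metres by multiplying by the raster resolution).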
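To make the unit conversion concrete, here is a minimal sketch of measuring a distance in cell units and scaling it to metres. It assumes square cells; the resolution value `cell_size_m` and the two cell positions are made up for illustration and do not come from the real raster.

```python
import numpy as np

# Hypothetical raster resolution in metres per cell (assumed value).
cell_size_m = 30.0

# Two cell positions given as (row, column) indices (illustrative only).
cell_a = np.array([6, 2])
cell_b = np.array([3, 6])

# Straight-line distance in cell units. Diagonal offsets are handled
# naturally because both the row and the column difference contribute.
distance_cells = np.linalg.norm(cell_a - cell_b)

# Convert to metres by scaling with the raster resolution.
distance_m = distance_cells * cell_size_m

print(distance_cells, distance_m)  # 5.0 150.0
```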

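I have experimented with the cdist function from scipy.spatial.distance, as suggested in this answer to a related question, but so far I have been unable to solve my problem using the available documentation. As an end result, I would ideally have a 3 by X array in the form of "from ID, to ID, distance", including the distances between all possible combinations of regions.

Here is a sample dataset resembling my input data: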

import numpy as np
import matplotlib.pyplot as plt

# Sample study area array
example_array = np.array([[0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                          [0, 0, 2, 0, 2, 2, 0, 6, 0, 3, 3, 3],
                          [0, 0, 0, 0, 2, 2, 0, 0, 0, 3, 3, 3],
                          [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3],
                          [1, 1, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 5, 5, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]])

# Plot array
plt.imshow(example_array, cmap="nipy_spectral", interpolation='nearest')
plt.show()

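【Question comments】:

Could you provide a sample output?
Without going through the above array exhaustively, a sample of the first few results would ideally look something like this, with the first column being the "from" region, the second the "to" region, and the third the "distance" column. The specific results would of course vary depending on the algorithm used to calculate the distances, but I am after something in that ballpark.

Tags: python arrays numpy scipy distance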

【Solution 1】:

Distances between the labeled regions of an image can be calculated with the following code,

import itertools
import numpy as np
from scipy.spatial.distance import cdist

# making sure that IDs are integer
example_array = np.asarray(example_array, dtype=int)
# we assume that IDs are consecutive integers starting from 1; n is the largest ID
n = example_array.max()

indexes = []
for k in range(1, n):
    tmp = np.nonzero(example_array == k)
    tmp = np.asarray(tmp).T
    indexes.append(tmp)

# calculating the distance matrix
distance_matrix = np.zeros((n-1, n-1), dtype=float)
for i, j in itertools.combinations(range(n-1), 2):
    # use squared Euclidean distance (more efficient), and take the square root only of the single element we are interested in.
    d2 = cdist(indexes[i], indexes[j], metric='sqeuclidean') 
    distance_matrix[i, j] = distance_matrix[j, i] = d2.min()**0.5

# mapping the distance matrix to labeled IDs (could be improved/extended)
labels_i, labels_j = np.meshgrid( range(1, n), range(1, n))  
results = np.dstack((labels_i, labels_j, distance_matrix)).reshape((-1, 3))

print(distance_matrix)
print(results)

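This assumes integer IDs, and would need to be extended if that is not the case. For instance, with the test data above, the calculated distance matrix is,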

# From  1             2         3            4              5         # To
[[  0.           4.12310563   4.           9.05538514   5.        ]   # 1
 [  4.12310563   0.           3.16227766  10.81665383   8.24621125]   # 2
 [  4.           3.16227766   0.           4.24264069   2.        ]   # 3 
 [  9.05538514  10.81665383   4.24264069   0.           3.16227766]   # 4
 [  5.           8.24621125   2.           3.16227766   0.        ]]  # 5

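while the full output can be found here. Note that this takes the Euclidean distance from the center of each pixel. For instance, the distance between regions 3 and 5 is 2.0, while they are separated by 1 pixel.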
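If the gap between the nearest cell edges is wanted rather than the center-to-center distance, one possible adjustment is sketched below. It is not part of the code above: `edge_gap` is a hypothetical helper, and it assumes every cell is a unit square, so the empty space along each axis is the index offset minus one (never less than zero).

```python
import numpy as np

def edge_gap(cells_a, cells_b):
    """Minimum edge-to-edge distance between two sets of unit cells.

    cells_a, cells_b: integer arrays of shape (n, 2) with (row, col) indices.
    """
    # Absolute row/column offsets for every pair of cells, shape (na, nb, 2).
    offsets = np.abs(cells_a[:, None, :] - cells_b[None, :, :])
    # Each cell is one unit wide, so the gap along an axis is offset - 1,
    # clipped at zero for cells that touch or share a row/column.
    gaps = np.clip(offsets - 1, 0, None)
    return np.sqrt((gaps ** 2).sum(axis=-1)).min()

# Two small illustrative patches with one empty row between them.
a = np.array([[8, 7], [8, 8]])
b = np.array([[10, 7], [10, 8]])
print(edge_gap(a, b))  # 1.0, whereas the center-to-center distance is 2.0
```

Like cdist, this compares every pair of cells, so it has the same brute-force cost.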

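This is a brute-force approach, in which we calculate all the pairwise distances between the pixels of different regions. It should be sufficient for most applications. Still, if you need better performance, have a look at scipy.spatial.cKDTree, which would be more efficient than cdist at computing the minimum distance between two regions.

【Comments】: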
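A minimal sketch of the cKDTree idea follows. The helper name `min_distance_kdtree` and the small test array are made up for illustration; the approach is to build a tree on the pixels of one region and query it with the pixels of the other, so the full pairwise matrix is never materialised.

```python
import numpy as np
from scipy.spatial import cKDTree

def min_distance_kdtree(labels, id_a, id_b):
    """Minimum center-to-center distance between two labeled regions."""
    # (row, col) coordinates of every pixel in each region.
    coords_a = np.argwhere(labels == id_a)
    coords_b = np.argwhere(labels == id_b)
    # Index region B once, then find the nearest B pixel for each A pixel.
    tree = cKDTree(coords_b)
    distances, _ = tree.query(coords_a, k=1)
    return distances.min()

# Small illustrative array (not the question's data).
small = np.array([[1, 0, 0, 2],
                  [1, 0, 0, 0],
                  [0, 0, 0, 3]])
print(min_distance_kdtree(small, 1, 2))  # 3.0
```

Whether this is actually faster than cdist depends on the region sizes, so it is worth timing on real data.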

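Thanks for the great answer. The code works well, except that it does not calculate distances for the region with ID=6 (the range function excludes the final element; this can be easily fixed by adding 1 to n = input_array.max()). My only question (and probably the fault of the example array I provided) is that in my real data arrays, the ID numbers may not always start from zero or be consecutive: i.e. I might have a set of regions with IDs 3, 8, 22 and 450 in the same array. How could I generalise the above to account for this?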
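One way to handle arbitrary, non-consecutive IDs is to take the labels from np.unique instead of from a range, as in the sketch below. This is not the answerer's code: the function name `pairwise_region_distances` and the `background` parameter are assumptions, with 0 treated as "no region". Because it iterates over the IDs actually present, it also avoids skipping the largest ID.

```python
import itertools
import numpy as np
from scipy.spatial.distance import cdist

def pairwise_region_distances(labels, background=0):
    """Return rows of (from ID, to ID, distance) for every pair of regions."""
    # Use the IDs actually present in the array, whatever their values.
    ids = np.unique(labels)
    ids = ids[ids != background]
    # (row, col) coordinates of the pixels belonging to each region.
    coords = {k: np.argwhere(labels == k) for k in ids}
    rows = []
    for a, b in itertools.combinations(ids, 2):
        # Squared distances for all pixel pairs; one square root at the end.
        d2 = cdist(coords[a], coords[b], metric="sqeuclidean")
        rows.append((a, b, np.sqrt(d2.min())))
    return np.array(rows, dtype=float)

# Small illustrative array using the IDs mentioned in the comment.
sparse_ids = np.array([[3, 0, 0, 8],
                       [3, 0, 0, 0],
                       [0, 0, 22, 450]])
result = pairwise_region_distances(sparse_ids)
print(result)
```

Each pair appears once (from ID smaller than to ID); the result has one row per pair and three columns, so transpose it if a 3 by X layout is preferred.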