Approach #1
Here's one approach with np.tril_indices -
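import numpy as np

def n_largest_indices_tril(a, n=2):
    m = a.shape[0]
    # Row/column indices of the strictly lower triangle (k=-1 skips the diagonal)
    r,c = np.tril_indices(m,-1)
    # Positions of the n largest values among those entries (in no particular order)
    idx = a[r,c].argpartition(-n)[-n:]
    # list() so the pairs also display as a list under Python 3
    return list(zip(r[idx], c[idx]))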
Sample run -
In [39]: a
Out[39]:
array([[ 1.  ,  0.4 ,  0.59,  0.15,  0.29],
       [ 0.4 ,  1.  ,  0.03,  0.57,  0.57],
       [ 0.59,  0.03,  1.  ,  0.9 ,  0.52],
       [ 0.15,  0.57,  0.9 ,  1.  ,  0.37],
       [ 0.29,  0.57,  0.52,  0.37,  1.  ]])
In [40]: n_largest_indices_tril(a, n=2)
Out[40]: [(2, 0), (3, 2)]
In [41]: n_largest_indices_tril(a, n=3)
Out[41]: [(4, 1), (2, 0), (3, 2)]
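Note that argpartition only guarantees which entries land in the top n, not their order, which is why the pairs in the sample run are not sorted by value. If ordered output is wanted, a minimal sketch could look like the following. The function name n_largest_indices_tril_sorted is hypothetical (not part of the approach above), and ties between equal values may come back in either order.

```python
import numpy as np

def n_largest_indices_tril_sorted(a, n=2):
    # Hypothetical variant: same selection as n_largest_indices_tril,
    # but with the pairs ordered from largest value to smallest.
    m = a.shape[0]
    r, c = np.tril_indices(m, -1)            # strictly lower-triangular positions
    vals = a[r, c]
    idx = vals.argpartition(-n)[-n:]         # n largest, arbitrary order
    idx = idx[np.argsort(vals[idx])[::-1]]   # reorder: largest value first
    return [(int(i), int(j)) for i, j in zip(r[idx], c[idx])]

a = np.array([[1.  , 0.4 , 0.59, 0.15, 0.29],
              [0.4 , 1.  , 0.03, 0.57, 0.57],
              [0.59, 0.03, 1.  , 0.9 , 0.52],
              [0.15, 0.57, 0.9 , 1.  , 0.37],
              [0.29, 0.57, 0.52, 0.37, 1.  ]])

print(n_largest_indices_tril_sorted(a, n=2))   # [(3, 2), (2, 0)]
```

The extra argsort only touches the n selected values, so it should add little cost when n is small.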
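Approach #2
For performance, we might want to avoid generating all the lower-triangular indices and instead work with a mask, which gives us a second approach to the problem, like so -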
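def n_largest_indices_tril_v2(a, n=2):
    m = a.shape[0]
    r = np.arange(m)
    # Boolean mask of the strictly lower triangle (row index > column index)
    mask = r[:,None] > r
    # Flat positions (within a[mask]) of the n largest values
    idx = a[mask].argpartition(-n)[-n:]

    # Row i contributes i masked elements, so cumulative counts give
    # the first and last flat position of each row 1..m-1
    clens = np.arange(m).cumsum()
    grp_start = clens[:-1]
    grp_stop = clens[1:]-1

    # Map each flat position back to its (row, column) pair
    rows = np.searchsorted(grp_stop, idx)+1
    cols = idx - grp_start[rows-1]
    return zip(rows, cols)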
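The least obvious step in this approach is how a flat position in a[mask] is turned back into a (row, column) pair. As a sanity check, here is a small sketch that applies the same cumsum/searchsorted arithmetic to every flat position for an assumed size of m = 5 and compares the result with the coordinates reported by np.nonzero on the mask. The size and the names ref_rows and ref_cols are illustrative only.

```python
import numpy as np

m = 5                                       # illustrative size
r = np.arange(m)
mask = r[:, None] > r                       # strictly lower triangle

# Every flat position, in the row-major order that a[mask] produces
idx = np.arange(m * (m - 1) // 2)

clens = np.arange(m).cumsum()               # [0, 1, 3, 6, 10]
grp_start = clens[:-1]                      # first flat position of rows 1..m-1
grp_stop = clens[1:] - 1                    # last flat position of rows 1..m-1

rows = np.searchsorted(grp_stop, idx) + 1   # row containing each flat position
cols = idx - grp_start[rows - 1]            # offset within that row

# Reference coordinates of the True entries, also in row-major order
ref_rows, ref_cols = np.nonzero(mask)
print(np.array_equal(rows, ref_rows), np.array_equal(cols, ref_cols))  # True True
```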
Runtime test
In [143]: # Setup symmetric array
...: N = 1000
...: a = np.random.rand(N,N)*0.9
...: np.fill_diagonal(a,1)
...: m = a.shape[0]
...: r,c = np.tril_indices(m,-1)
...: a[r,c] = a[c,r]
In [144]: %timeit n_largest_indices_tril(a, n=2)
100 loops, best of 3: 12.5 ms per loop
In [145]: %timeit n_largest_indices_tril_v2(a, n=2)
100 loops, best of 3: 7.85 ms per loop
For n smallest indices
To get the n smallest ones instead, simply use ndarray.argpartition(n)[:n] in place of argpartition(-n)[-n:] in both approaches.
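As a rough illustration of that swap applied to the first approach, a sketch might look like this. The name n_smallest_indices_tril is hypothetical, and the sketch assumes n is smaller than the number of below-diagonal entries, since argpartition(n) needs n to be a valid position in the array.

```python
import numpy as np

def n_smallest_indices_tril(a, n=2):
    # Hypothetical counterpart of n_largest_indices_tril.
    # Assumes n < m*(m-1)/2 so that n is a valid kth for argpartition.
    m = a.shape[0]
    r, c = np.tril_indices(m, -1)          # strictly lower-triangular positions
    idx = a[r, c].argpartition(n)[:n]      # n smallest, arbitrary order
    return [(int(i), int(j)) for i, j in zip(r[idx], c[idx])]

a = np.array([[1.  , 0.4 , 0.59, 0.15, 0.29],
              [0.4 , 1.  , 0.03, 0.57, 0.57],
              [0.59, 0.03, 1.  , 0.9 , 0.52],
              [0.15, 0.57, 0.9 , 1.  , 0.37],
              [0.29, 0.57, 0.52, 0.37, 1.  ]])

# The two smallest below-diagonal values are 0.03 at (2, 1) and 0.15 at (3, 0);
# sorted() is used because argpartition does not fix their order.
print(sorted(n_smallest_indices_tril(a, n=2)))   # [(2, 1), (3, 0)]
```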