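At each iteration, we are checking whether the floating-point numbers in pts fall within each integer bin. So the trick we can use is to convert those floating-point numbers to their floor values, which are exactly the bin indices. Additionally, we need to keep only the valid points, i.e. those whose coordinates fall within range(M) and range(K). That's it!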
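Here is a tiny sketch of the floor-and-mask idea on four made-up points and an illustrative 3 x 2 grid (these values are not from the original question). It also shows why the mask should reject negative coordinates: `astype(int)` truncates toward zero, which agrees with the floor only for non-negative values.

```python
import numpy as np

# Hypothetical example: 4 points on an M x K = 3 x 2 grid of unit bins
M, K = 3, 2
pts = np.array([[0.5, 1.9],
                [2.99, 0.0],
                [3.2, 0.4],    # x >= M: outside the grid
                [-0.5, 1.0]])  # x < 0: outside the grid

# Flooring gives each coordinate's integer bin index
bins = np.floor(pts).astype(int)

# astype(int) alone truncates toward zero: -0.5 -> 0, whereas floor(-0.5) -> -1
truncated = pts.astype(int)

# Only points with 0 <= x < M and 0 <= y < K belong to some bin
valid = (pts[:, 0] >= 0) & (pts[:, 0] < M) & (pts[:, 1] >= 0) & (pts[:, 1] < K)

print(bins[valid])            # bin (row, col) of each in-grid point: (0, 1) and (2, 0)
print(np.flatnonzero(valid))  # indices of those points in pts: 0 and 1
```

Inside the grid, truncation and flooring agree. That is why a plain `astype(int)` is enough once the invalid points have been masked away.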
Here's the implementation -
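```python
import numpy as np

def binpts(pts, M, K):
    N = len(pts)
    # in_bin_out[i, j, n] is True when point n lies in bin (i, j)
    in_bin_out = np.zeros((M, K, N), dtype=bool)
    # Keep only points inside the grid: 0 <= x < M and 0 <= y < K
    mask = (pts[:, 0] >= 0) & (pts[:, 0] < M) & (pts[:, 1] >= 0) & (pts[:, 1] < K)
    pts_f = pts[mask]
    # Truncation equals floor here, since all remaining coordinates are non-negative
    r, c = pts_f.astype(int).T
    # The boolean mask acts like its nonzero() indices, aligned with r and c
    in_bin_out[r, c, mask] = True
    return in_bin_out
```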
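The assignment `in_bin_out[r, c, mask] = ...` mixes integer index arrays with a boolean mask. NumPy treats a boolean index combined with integer arrays as the integer positions of its True entries (as `mask.nonzero()` would give), so the three index arrays line up element by element. A small sketch checking that equivalence on made-up arrays (the sizes and indices below are illustrative only):

```python
import numpy as np

M, K, N = 3, 2, 5
mask = np.array([True, False, True, True, False])
# Made-up bin indices for the 3 points where mask is True
r = np.array([0, 2, 1])
c = np.array([1, 0, 1])

a = np.zeros((M, K, N), dtype=bool)
a[r, c, mask] = True  # boolean mask in the last index position

b = np.zeros((M, K, N), dtype=bool)
b[r, c, np.flatnonzero(mask)] = True  # explicit integer indices [0, 2, 3]

print(np.array_equal(a, b))  # the two assignments set the same elements
print(np.argwhere(a))        # (row, col, point) triples that were set
```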
Benchmarking
Timings on large arrays, with the range of the floating-point values in pts scaled in proportion to the sizes used in the given sample -
Case #1:
In [2]: M = 100
...: K = 101
...: N = 10000
...: np.random.seed(0)
...: pts = 2000 * np.random.rand(N, 2)
# @hpaulj's soln
In [3]: %%timeit
...: x0=(pts[:,0]>=np.arange(M)[:,None]) & (pts[:,0]<np.arange(1,M+1)[:,None])
...: x1=(pts[:,1]>=np.arange(K)[:,None]) & (pts[:,1]<np.arange(1,K+1)[:,None])
...: xx = x0[:,None,:] & x1[None,:,:]
10 loops, best of 3: 47.5 ms per loop
# @user545424's soln
In [6]: %timeit bin_points(pts,M,K)
1000 loops, best of 3: 331 µs per loop
In [7]: %timeit binpts(pts,M,K)
10000 loops, best of 3: 125 µs per loop
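Note:
@hpaulj's solution is memory-intensive: I ran out of memory when running it on the larger datasets, so it is not included in the cases below.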
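A rough way to see how memory grows: the broadcast result `xx` has shape (M, K, N), and NumPy stores booleans at one byte per element, so it occupies M*K*N bytes. The sketch below only does that arithmetic for the three cases benchmarked here; it ignores the smaller (M, N) and (K, N) temporaries and any NumPy overhead. Note that `binpts` also returns an array of this same shape.

```python
import numpy as np

# Benchmark sizes (M, K, N) from the three cases in this answer
cases = {"Case #1": (100, 101, 10_000),
         "Case #2": (100, 101, 100_000),
         "Case #3": (100, 101, 1_000_000)}

def dense_bool_bytes(M, K, N):
    # Bytes needed for a dense (M, K, N) boolean array
    return M * K * N * np.dtype(bool).itemsize

sizes = {name: dense_bool_bytes(*shape) for name, shape in cases.items()}
for name, nbytes in sizes.items():
    print(f"{name}: {nbytes / 1e9:.3f} GB")
```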
Case #2:
In [8]: M = 100
...: K = 101
...: N = 100000
...: np.random.seed(0)
...: pts = 20000 * np.random.rand(N, 2)
In [9]: %timeit bin_points(pts,M,K)
...: %timeit binpts(pts,M,K)
100 loops, best of 3: 2.31 ms per loop
1000 loops, best of 3: 585 µs per loop
Case #3:
In [10]: M = 100
...: K = 101
...: N = 1000000
...: np.random.seed(0)
...: pts = 200000 * np.random.rand(N, 2)
In [11]: %timeit bin_points(pts,M,K)
...: %timeit binpts(pts,M,K)
10 loops, best of 3: 34.6 ms per loop
100 loops, best of 3: 2.78 ms per loop
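To reproduce the comparison outside IPython, here is a sketch using the standard-library `timeit` module. `binpts_broadcast` is a hypothetical wrapper around the three expressions from @hpaulj's cell in Case #1. @user545424's `bin_points` is not reproduced in this answer, so it is left out. The copy of `binpts` includes a non-negativity check in the mask. Only Case #1 sizes are used, since the broadcasting version may exhaust memory on the larger cases. Absolute timings will vary by machine.

```python
import timeit
import numpy as np

def binpts(pts, M, K):
    # Floor-and-mask approach from this answer
    N = len(pts)
    in_bin_out = np.zeros((M, K, N), dtype=bool)
    mask = (pts[:, 0] >= 0) & (pts[:, 0] < M) & (pts[:, 1] >= 0) & (pts[:, 1] < K)
    r, c = pts[mask].astype(int).T
    in_bin_out[r, c, mask] = True
    return in_bin_out

def binpts_broadcast(pts, M, K):
    # Hypothetical wrapper around @hpaulj's broadcasting expressions
    x0 = (pts[:, 0] >= np.arange(M)[:, None]) & (pts[:, 0] < np.arange(1, M + 1)[:, None])
    x1 = (pts[:, 1] >= np.arange(K)[:, None]) & (pts[:, 1] < np.arange(1, K + 1)[:, None])
    return x0[:, None, :] & x1[None, :, :]

if __name__ == "__main__":
    M, K, N = 100, 101, 10_000  # Case #1 sizes
    np.random.seed(0)
    pts = 2000 * np.random.rand(N, 2)

    # Both approaches should produce identical (M, K, N) masks
    assert np.array_equal(binpts(pts, M, K), binpts_broadcast(pts, M, K))

    for fn in (binpts_broadcast, binpts):
        best = min(timeit.repeat(lambda: fn(pts, M, K), number=10, repeat=3)) / 10
        print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```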