【Question title】: How to speed up NumPy array filtering/selection?
【Posted】: 2018-11-22 15:56:53
【Question】:

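I have about 40k rows, and I want to test various combinations of selections on those rows. By selection I mean a boolean mask. The number of masks/filters is around 250MM (250 million).

Current simplified code: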

np_arr = np.random.randint(1, 40000, 40000)
results = np.empty(250000000)
filters = np.random.randint(1, size=(250000000, 40000))
for i in range(250000000):
    row_selection = np_arr[filters[i].astype(np.bool_)] # Select rows based on next filter
    # Performing simple calculations such as sum, prod, count on selected rows and saving to result
    results[i] = row_selection.sum() # Save simple calculation result to results array

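I have tried Numba and multiprocessing, but since most of the time is spent in the filter selection rather than in the calculation, they did not help much.

What is the most efficient way to solve this? Is there any way to parallelize it? As far as I can tell, I need to loop over every filter and then compute the sum, product, count, etc. for each one separately, because I cannot apply the filters in parallel (even though the calculation after applying a filter is very simple).

Any suggestions on performance improvements/speedups are appreciated.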

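【Comments】:

  • Are all the functions you want to apply available in Numba, or at least easy to implement?
  • For all i, j, filters[i, j] == 0. Use randint(2, ...) instead.
  • Hi, yes, the calculations are easy to implement in Numba, but the tricky part is the loop that applies the filters 250MM times.
  • Where do you get the filter array from in your calculation? A boolean array of size (250000000, 40000) takes 10 TB and will not fit in RAM. Or do you want to create some random numbers inside the loop that applies the filters?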
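The 10 TB figure in the last comment can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `filter_bytes` and the 1 GB chunk budget are assumptions made for this example, not something from the question.

```python
# Back-of-the-envelope memory estimate for the full filter array.
N_FILTERS = 250_000_000
N_ROWS = 40_000

def filter_bytes(n_filters, n_rows, bits_per_element):
    """Bytes needed to store n_filters x n_rows elements (hypothetical helper)."""
    return n_filters * n_rows * bits_per_element // 8

bool_bytes = filter_bytes(N_FILTERS, N_ROWS, 8)     # np.bool_ stores one byte per element
packed_bytes = filter_bytes(N_FILTERS, N_ROWS, 1)   # one bit per element, e.g. after np.packbits
int64_bytes = filter_bytes(N_FILTERS, N_ROWS, 64)   # default integer output of randint on most 64-bit platforms

print(bool_bytes / 1e12)    # 10.0 (TB)
print(packed_bytes / 1e12)  # 1.25 (TB)
print(int64_bytes / 1e12)   # 80.0 (TB)

# Assumed budget: how many boolean filters fit in roughly 1 GB at a time.
CHUNK_BUDGET_BYTES = 1_000_000_000
filters_per_chunk = CHUNK_BUDGET_BYTES // N_ROWS
print(filters_per_chunk)    # 25000
```

Whatever the storage format, the filters would have to be generated or loaded in chunks rather than held in memory all at once.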

Tags: python performance numpy parallel-processing multiprocessing


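【Solution 1】:

To get good performance in Numba, simply avoid masking, and with it the very expensive array copies that boolean indexing creates. You have to implement the filters yourself, but that should not be a problem for the operations you mention.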
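To illustrate the copy being referred to, here is a small NumPy-only sketch (it is not part of the original answer). It shows that boolean indexing returns a freshly allocated array, and that `np.sum` with its `where=` argument (available since NumPy 1.17) gives the same result without building that intermediate selection. Whether this is actually faster depends on the data and should be measured.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(1, 4000, size=4000)
mask = rng.integers(0, 2, size=4000).astype(np.bool_)

# Boolean indexing allocates a new array holding the selected elements.
selected = arr[mask]
copy_is_independent = not np.shares_memory(selected, arr)

# Same reduction, once via the copy and once directly over the masked elements.
sum_with_copy = selected.sum()
sum_without_copy = np.sum(arr, where=mask)

# The count needs no selection at all.
count = np.count_nonzero(mask)
```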

Parallelization is also easy to do.

Example

import numpy as np
import numba as nb

max_num = 250000 #250000000
max_num2 = 4000#40000
np_arr = np.random.randint(1, max_num2, max_num2)
filters = np.random.randint(low=0,high=2, size=(max_num, max_num2)).astype(np.bool_)

# Implement your functions as explicit loops like this to avoid masking
# Sum filter
@nb.njit(fastmath=True)
def sum_filter(mask, arr):
  total = 0.
  for i in range(mask.shape[0]):
    if mask[i]:  # only accumulate rows selected by the mask
      total += arr[i]
  return total

# Implement your functions as explicit loops like this to avoid masking
# Prod filter
@nb.njit(fastmath=True)
def prod_filter(mask, arr):
  prod = 1.
  for i in range(mask.shape[0]):
    if mask[i]:  # only multiply rows selected by the mask
      prod *= arr[i]
  return prod

@nb.njit(parallel=True)
def main_func(np_arr, filters):
  results = np.empty(filters.shape[0])
  # Each filter is independent, so the outer loop can run in parallel
  for i in nb.prange(filters.shape[0]):
    results[i] = sum_filter(filters[i], np_arr)
    #results[i] = prod_filter(filters[i], np_arr)
  return results
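As a NumPy-only alternative to the loop above (a different technique from the Numba approach in this answer), the per-filter sum can be written as a matrix-vector product, because summing the selected rows equals the dot product of the 0/1 mask with the data. The sketch below assumes the filters arrive in chunks that fit in memory. The function name `sums_and_counts_chunked` and the chunk size are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, n_rows = 1000, 400  # small illustrative sizes, not the real problem size
np_arr = rng.integers(1, n_rows, size=n_rows)
filters = rng.integers(0, 2, size=(n_filters, n_rows)).astype(np.bool_)

def sums_and_counts_chunked(filters, arr, chunk_size=256):
    """Per-filter sum and count, processed chunk by chunk (hypothetical helper)."""
    arr_f = arr.astype(np.float64)
    sums = np.empty(filters.shape[0])
    counts = np.empty(filters.shape[0], dtype=np.int64)
    for start in range(0, filters.shape[0], chunk_size):
        chunk = filters[start:start + chunk_size]
        # mask (as 0.0/1.0) @ data == sum of the selected elements, one value per filter
        sums[start:start + chunk_size] = chunk.astype(np.float64) @ arr_f
        # number of selected rows per filter
        counts[start:start + chunk_size] = chunk.sum(axis=1)
    return sums, counts

sums, counts = sums_and_counts_chunked(filters, np_arr)

# Reference: the masking loop from the question.
reference = np.array([np_arr[f].sum() for f in filters], dtype=np.float64)
```

This covers sums and counts only. A product does not reduce to a single matrix product in the same way, so the explicit loop remains the simpler option for it.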

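【Comments】:

    【Solution 2】:

    One improvement is to move the astype call outside the loop. In my tests this cut the execution time by more than half. For comparison, check the following two snippets: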

    import numpy as np
    import time
    
    max_num = 250000 #250000000
    max_num2 = 4000#40000
    np_arr = np.random.randint(1, max_num2, max_num2)
    results = np.empty(max_num)
    filters = np.random.randint(1, size=(max_num, max_num2))  # note: randint(1, ...) yields only zeros; randint(2, ...) gives 0/1 masks
    start = time.time()
    for i in range(max_num):
        row_selection = np_arr[filters[i].astype(np.bool_)] # Select rows based on next filter
        # Performing simple calculations such as sum, prod, count on selected rows and saving to result
        results[i] = row_selection.sum() # Save simple calculation result to results array
    
    end = time.time()
    print(end - start)
    

    takes 2.12 seconds

    while

    import numpy as np
    import time
    
    max_num = 250000 #250000000
    max_num2 = 4000#40000
    np_arr = np.random.randint(1, max_num2, max_num2)
    results = np.empty(max_num)
    filters = np.random.randint(1, size=(max_num, max_num2)).astype(np.bool_)  # note: randint(1, ...) yields only zeros; randint(2, ...) gives 0/1 masks
    start = time.time()
    for i in range(max_num):
        row_selection = np_arr[filters[i]] # Select rows based on next filter
        # Performing simple calculations such as sum, prod, count on selected rows and saving to result
        results[i] = row_selection.sum() # Save simple calculation result to results array
    
    end = time.time()
    print(end - start)
    

    takes 0.940 seconds

    【Comments】:
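The two snippets above build their filters with `randint(1, ...)`, which only produces zeros, so every selection is empty. The following sketch repeats the comparison with real 0/1 masks at a smaller size and checks that both variants return the same results. The `run` helper and the sizes are assumptions made for this example. No timings are quoted here, since they depend on the machine.

```python
import time
import numpy as np

max_num, max_num2 = 2000, 4000  # reduced sizes so the sketch runs quickly
np_arr = np.random.randint(1, max_num2, max_num2)
filters_int = np.random.randint(0, 2, size=(max_num, max_num2))  # real 0/1 masks
filters_bool = filters_int.astype(np.bool_)                      # converted once, outside the loop

def run(filters, convert_in_loop):
    """Apply every filter and time the loop (hypothetical helper)."""
    results = np.empty(filters.shape[0])
    start = time.perf_counter()
    for i in range(filters.shape[0]):
        # Converting here allocates a new boolean array on every iteration.
        mask = filters[i].astype(np.bool_) if convert_in_loop else filters[i]
        results[i] = np_arr[mask].sum()
    return results, time.perf_counter() - start

res_in_loop, t_in_loop = run(filters_int, convert_in_loop=True)
res_outside, t_outside = run(filters_bool, convert_in_loop=False)

print(f"astype inside loop:  {t_in_loop:.3f} s")
print(f"astype outside loop: {t_outside:.3f} s")
```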
