【Question title】: put numpy array items into "bins" [duplicate]
【Posted】: 2017-11-15 14:45:56
【Question description】:

I have a numpy array containing some integers, e.g.,

a = numpy.array([1, 6, 6, 4, 1, 1, 4])

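I would now like to group the items into "bins" by value, such that the bin with label 1 contains all indices of a that have the value 1. For the above example: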

bins = {
    1: [0, 4, 5],
    6: [1, 2],
    4: [3, 6],
    }

A combination of unique and where does the trick,

uniques = numpy.unique(a)
bins = {u: numpy.where(a == u)[0] for u in uniques}

but this doesn't seem ideal, since the number of unique entries may be large.
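To make the concern concrete, here is a minimal runnable sketch of the unique/where combination on the sample array. Each `a == u` comparison scans the whole array, so the work grows roughly with (array length) x (number of unique values). The `int(u)` conversion is an addition for readable keys and is not part of the original snippet.

```python
import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])

# np.unique returns the sorted distinct values: [1, 4, 6]
uniques = np.unique(a)

# One full pass over `a` per unique value.
# int(u) is an assumption here, used only to get plain Python int keys.
bins = {int(u): np.where(a == u)[0] for u in uniques}

for label, idx in bins.items():
    print(label, idx.tolist())
```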

【Comments】:

    Tags: python arrays numpy


    【Solution 1】:

    A defaultdict with append would do the trick:

    from collections import defaultdict
    
    d = defaultdict(list)
    
    for ix, val in enumerate(a):
      d[val].append(ix)
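
A self-contained version of the loop above on the sample array might look like the sketch below. Iterating over `a.tolist()` instead of `a` is an assumption made here so that the keys are plain Python ints rather than NumPy integer scalars; the original snippet iterates over the array directly.

```python
from collections import defaultdict

import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])

d = defaultdict(list)

# a.tolist() yields plain Python ints (an assumption of this sketch),
# so the dictionary keys are int rather than NumPy scalars.
for ix, val in enumerate(a.tolist()):
    d[val].append(ix)

bins = dict(d)  # keys appear in order of first occurrence
print(bins)     # {1: [0, 4, 5], 6: [1, 2], 4: [3, 6]}
```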
    

    【Discussion】:

      【Solution 2】:

      Here is one way, using broadcasting, np.where(), and np.split():

      In [66]: unique = np.unique(a)
      
      In [67]: rows, cols = np.where(unique[:, None] == a)
      
      In [68]: indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)
      
      In [69]: dict(zip(unique, indices))
      Out[69]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
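
The intermediate arrays in the session above can be inspected with the following sketch. It shows that the broadcast comparison builds a boolean array of shape (number of unique values, length of a), which is where the memory cost of this approach comes from. The names `mask` and `boundaries` are introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])
unique = np.unique(a)                  # [1, 4, 6]

# Broadcasting a (3, 1) column against a (7,) row gives a (3, 7) mask.
mask = unique[:, None] == a

# np.where walks the mask in row-major order, so `rows` is non-decreasing.
rows, cols = np.where(mask)            # rows: [0 0 0 1 1 2 2], cols: [0 4 5 3 6 1 2]

# Positions where the row number changes mark the start of the next bin.
boundaries = np.where(np.diff(rows) != 0)[0] + 1   # [3 5]
indices = np.split(cols, boundaries)

print(mask.shape)
print(dict(zip(unique.tolist(), [ix.tolist() for ix in indices])))
```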
      

      【Discussion】:

        【Solution 3】:

        Here's one approach -

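        import numpy as np

        def groupby_uniqueness_dict(a):
            sidx = a.argsort()  # indices that would sort a
            b = a[sidx]  # sorted values, so equal values are adjacent
            cut_idx = np.flatnonzero(b[1:] != b[:-1])+1  # start of each new run of values
            parts = np.split(sidx, cut_idx)  # original indices, one chunk per value
            out = dict(zip(b[np.r_[0,cut_idx]], parts))  # first value of each run is the key
            return out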
        

        More efficient, by avoiding the use of np.split -

        def groupby_uniqueness_dict_v2(a):
            sidx = a.argsort()  # use .tolist() for output dict values as lists
            b = a[sidx]
            cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
            idxs = np.r_[0,cut_idx, len(b)]
            out = {b[i]:sidx[i:j] for i,j in zip(idxs[:-1], idxs[1:])}
            return out
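
The `.tolist()` comment in the function above hints at a variant that returns lists instead of arrays. One way to read that hint is sketched below; the function name `groupby_uniqueness_dict_lists` is hypothetical, and `kind="stable"` is an addition so that the indices inside each bin come out in ascending order. The sketch assumes a non-empty 1-D array.

```python
import numpy as np

def groupby_uniqueness_dict_lists(a):  # hypothetical name, not from the answer
    # A stable sort keeps equal values in their original relative order.
    order = a.argsort(kind="stable")
    b = a[order]
    cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1
    # Run boundaries as plain ints: start of each run, plus the end of the array.
    idxs = np.r_[0, cut_idx, len(b)].tolist()
    # Convert once up front so that slicing yields Python lists.
    sidx = order.tolist()
    keys = b.tolist()
    return {keys[i]: sidx[i:j] for i, j in zip(idxs[:-1], idxs[1:])}

bins = groupby_uniqueness_dict_lists(np.array([1, 6, 6, 4, 1, 1, 4]))
print(bins)  # {1: [0, 4, 5], 4: [3, 6], 6: [1, 2]}
```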
        

        Sample run -

        In [161]: a
        Out[161]: array([1, 6, 6, 4, 1, 1, 4])
        
        In [162]: groupby_uniqueness_dict(a)
        Out[162]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
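
For readers who want to follow the sort-based grouping step by step, the sketch below prints the intermediate arrays for the sample input. It passes `kind="stable"` to argsort, which is an addition: the default sort kind is not guaranteed to be stable, so without it the indices inside a bin are not guaranteed to be ascending on larger inputs.

```python
import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])

# Indices that sort `a`; the stable kind is an assumption added in this sketch.
sidx = a.argsort(kind="stable")                  # [0 4 5 3 6 1 2]
b = a[sidx]                                      # [1 1 1 4 4 6 6]

# Compare each sorted value with its predecessor to find where a new run starts.
cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1    # [3 5]

keys = b[np.r_[0, cut_idx]]                      # [1 4 6]
parts = np.split(sidx, cut_idx)                  # one chunk of indices per key

for k, p in zip(keys.tolist(), parts):
    print(k, p.tolist())
```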
        

        Runtime test

        Other approach -

        from collections import defaultdict
        
        def defaultdict_app(a): # @Grisha's soln
            d = defaultdict(list)
            for ix, val in enumerate(a):
                d[val].append(ix)
            return d
        

        Timings -

        Case #1: dictionary values as arrays

        In [226]: a = np.random.randint(0,1000, 10000)
        
        In [227]: %timeit defaultdict_app(a)
             ...: %timeit groupby_uniqueness_dict(a)
             ...: %timeit groupby_uniqueness_dict_v2(a)
        100 loops, best of 3: 4.06 ms per loop
        100 loops, best of 3: 3.06 ms per loop
        100 loops, best of 3: 2.02 ms per loop
        
        In [228]: a = np.random.randint(0,10000, 100000)
        
        In [229]: %timeit defaultdict_app(a)
             ...: %timeit groupby_uniqueness_dict(a)
             ...: %timeit groupby_uniqueness_dict_v2(a)
        10 loops, best of 3: 43.5 ms per loop
        10 loops, best of 3: 29.1 ms per loop
        100 loops, best of 3: 19.9 ms per loop
        

        Case #2: dictionary values as lists

        In [238]: a = np.random.randint(0,1000, 10000)
        
        In [239]: %timeit defaultdict_app(a)
             ...: %timeit groupby_uniqueness_dict(a)
             ...: %timeit groupby_uniqueness_dict_v2(a)
        100 loops, best of 3: 4.15 ms per loop
        100 loops, best of 3: 4.5 ms per loop
        100 loops, best of 3: 2.44 ms per loop
        
        In [240]: a = np.random.randint(0,10000, 100000)
        
        In [241]: %timeit defaultdict_app(a)
             ...: %timeit groupby_uniqueness_dict(a)
             ...: %timeit groupby_uniqueness_dict_v2(a)
        10 loops, best of 3: 57.5 ms per loop
        10 loops, best of 3: 54.6 ms per loop
        10 loops, best of 3: 34 ms per loop
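
The timings above come from IPython's `%timeit`. Outside IPython, a comparable measurement could be set up with the standard `timeit` module, as in the sketch below. The seed, the repeat counts, and the consistency check are choices made for this sketch, and the numbers it prints will vary by machine and NumPy version.

```python
import timeit
from collections import defaultdict

import numpy as np

def defaultdict_app(a):
    d = defaultdict(list)
    for ix, val in enumerate(a):
        d[val].append(ix)
    return d

def groupby_uniqueness_dict_v2(a):
    sidx = a.argsort()
    b = a[sidx]
    cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1
    idxs = np.r_[0, cut_idx, len(b)]
    return {b[i]: sidx[i:j] for i, j in zip(idxs[:-1], idxs[1:])}

# Same size as the first timing case; the seed is arbitrary.
rng = np.random.default_rng(0)
a = rng.integers(0, 1000, 10000)

# Check that both approaches produce the same bins before timing them.
# Indices are sorted because the default argsort is not guaranteed stable.
ref = defaultdict_app(a)
out = groupby_uniqueness_dict_v2(a)
same = len(out) == len(ref) and all(sorted(out[k].tolist()) == ref[k] for k in ref)
print("results agree:", same)

for fn in (defaultdict_app, groupby_uniqueness_dict_v2):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```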
        

        【Discussion】:
