Numpy datatype causes repetition of the array

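【Question title】: Numpy datatype causes repetition of the array
【Posted】: 2020-06-23 16:15:03
【Question】:

I want to create a numpy array using a user-defined numba data structure. When I include a dummy variable in the structure, everything works, but when I remove it, the resulting matrix is a repetition of the data I wanted. I don't know why numpy repeats my data or how to avoid it.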

import numpy as np
from numba.types import float64, Record, NestedArray

poly = np.random.rand (3,2)
args_dtype = Record.make_c_struct([
            ('dummy', float64),
            ('poly', NestedArray(dtype=float64, shape=poly.shape)),])

args = np.array((0,poly), dtype=args_dtype)
print(args)
print('-------------------------')
args_dtype = Record.make_c_struct([
            ('poly', NestedArray(dtype=float64, shape=poly.shape)),])

args = np.array(poly, dtype=args_dtype)
print(args)

Output:

(0., [[0.72543644, 0.77155485], [0.08560247, 0.11165251], [0.48421994, 0.15144579]])
-------------------------
[[([[0.72543644, 0.72543644], [0.72543644, 0.72543644], [0.72543644, 0.72543644]],)
  ([[0.77155485, 0.77155485], [0.77155485, 0.77155485], [0.77155485, 0.77155485]],)]
 [([[0.08560247, 0.08560247], [0.08560247, 0.08560247], [0.08560247, 0.08560247]],)
  ([[0.11165251, 0.11165251], [0.11165251, 0.11165251], [0.11165251, 0.11165251]],)]
 [([[0.48421994, 0.48421994], [0.48421994, 0.48421994], [0.48421994, 0.48421994]],)
  ([[0.15144579, 0.15144579], [0.15144579, 0.15144579], [0.15144579, 0.15144579]],)]]

Edit: printing the dtype for both stages:

{'names':['dummy','poly'], 'formats':['<f8',('<f8', (3, 2))], 'offsets':[0,8], 'itemsize':56, 'aligned':True}
-------------------------
{'names':['poly'], 'formats':[('<f8', (3, 2))], 'offsets':[0], 'itemsize':48, 'aligned':True}

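【Comments】:

Print the dtype for both stages. I don't know what numba is doing, but I do know that this kind of repetition can occur when converting to or from structured arrays. numpy.lib.recfunctions has a pair of functions that handle this correctly.
@hpaulj The dtypes have been added to the question.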
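
The pair of functions mentioned in the comment above is presumably `unstructured_to_structured` and `structured_to_unstructured` from `numpy.lib.recfunctions`; the comment does not name them, so that is an assumption. A minimal sketch of the first one, using a fixed `poly` in place of the random array and a dtype with the same layout as the one-field dtype printed in the question:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Fixed stand-in for the random (3, 2) array in the question (assumption).
poly = np.arange(6, dtype=np.float64).reshape(3, 2)

# Same layout as the one-field dtype printed in the question.
dt2 = np.dtype([('poly', '<f8', (3, 2))], align=True)

# The last axis of the input must hold one value per scalar element of the
# dtype (6 here), so the array is flattened before the conversion.
args = rfn.unstructured_to_structured(poly.reshape(-1), dtype=dt2)

print(args.shape)    # a single record rather than a (3, 2) grid of records
print(args['poly'])  # the original values, not repeated
```

The flattening step is the part that is easy to miss: the function maps the last axis onto the fields, so passing the (3, 2) array directly would not match the six elements the dtype expects.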

Tags: python numpy types casting numba

【Solution 1】:

In [4]: poly = np.random.rand(3,2)
In [5]: dt1 = np.dtype({'names':['dummy','poly'], 'formats':['<f8',('<f8', (3, 2))],
   ...:                 'offsets':[0,8], 'itemsize':56, 'aligned':True})
In [6]: dt2 = np.dtype({'names':['poly'], 'formats':[('<f8', (3, 2))], 'offsets':[0],
   ...:                 'itemsize':48, 'aligned':True})
In [7]: dt1
Out[7]: dtype([('dummy', '<f8'), ('poly', '<f8', (3, 2))], align=True)

Creating the first array:

In [8]: np.array((0,poly), dtype=dt1)                                                          
Out[8]: 
array((0., [[0.06466034, 0.43310972], [0.58102027, 0.53106307], [0.23957058, 0.26556208]]),
      dtype={'names':['dummy','poly'], 'formats':['<f8',('<f8', (3, 2))], 'offsets':[0,8], 'itemsize':56, 'aligned':True})

The second dtype has one field; even so, the data still has to be supplied as a tuple or a list of tuples:

In [9]: dt2                                                                                    
Out[9]: dtype([('poly', '<f8', (3, 2))], align=True)
In [10]: np.array((poly,), dt2)                                                                
Out[10]: 
array(([[0.06466034, 0.43310972], [0.58102027, 0.53106307], [0.23957058, 0.26556208]],),
      dtype={'names':['poly'], 'formats':[('<f8', (3, 2))], 'offsets':[0], 'itemsize':48, 'aligned':True})
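
To see why the tuple matters, the sketch below compares both calls on a fixed `poly` (an assumption, replacing the random array so the values are predictable). Without the tuple, `np.array` treats each of the six floats as its own record and fills that record's whole (3, 2) field with the one value, which is the repetition shown in the question. The last part, assigning into the field of a preallocated record, is an alternative that is not part of the original answer.

```python
import numpy as np

# Fixed stand-in for the random (3, 2) array (assumption).
poly = np.arange(6, dtype=np.float64).reshape(3, 2)
dt2 = np.dtype([('poly', '<f8', (3, 2))], align=True)

# Without the tuple: one record per input element, each value repeated
# across that record's (3, 2) field.
repeated = np.array(poly, dtype=dt2)
print(repeated.shape)          # (3, 2)
print(repeated['poly'].shape)  # (3, 2, 3, 2)

# With the tuple: a single 0-d record holding the whole array.
single = np.array((poly,), dtype=dt2)
print(single.shape)            # ()

# Alternative: allocate one empty record and assign the field directly.
filled = np.zeros((), dtype=dt2)
filled['poly'] = poly
print(np.array_equal(filled['poly'], single['poly']))  # True
```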
