Methods of creating a structured array: answers

【Question title】: Methods of creating a structured array
【Posted】: 2015-11-20 08:25:30
【Question description】:

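I have the following information, from which I can produce a numpy array of the required structure. Note that the x and y values have to be generated separately because their ranges may differ, so I can't use: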

xy = np.random.random_integers(0,10,size=(N,2))

The extra list(...) call around zip is needed for this to work in Python 3.4. It is not needed in Python 2.7, but it does no harm there.
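Why the list(...) wrapper matters can be seen in a small sketch. Under Python 3, zip returns a lazy iterator, and np.array treats that iterator as one opaque object rather than as a sequence of rows. The names ids and xy and their values are made up for illustration.

```python
import numpy as np

ids = np.arange(3)
xy = np.array([[7, 6], [7, 7], [5, 9]])   # illustrative (x, y) pairs

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# Python 3: zip() is an iterator, so NumPy wraps it as a single object
lazy = np.array(zip(ids, xy))
print(lazy.shape, lazy.dtype)

# Materialising the iterator hands NumPy a real list of (id, [x, y]) tuples
arr = np.array(list(zip(ids, xy)), dtype=dt)
print(arr['Shape'])
```

In Python 2.7 zip already returns a list, which is why the wrapper is redundant but harmless there.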

The following works:

>>> # attempts to formulate [id,(x,y)] with specified dtype 
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
    array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
           (4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
           (7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])], 
          dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

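I cleverly thought I could sidestep the nasty bits above by simply building the array in the desired vertical layout and applying my dtype to it, hoping that it would work. The stacked array is correct in its vertical form: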

>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0,  7,  6],
       [ 1,  7,  7],
       [ 2,  5,  9],
       [ 3,  0,  1],    
       [ 4,  6,  1],
       [ 5,  6,  6],
       [ 6,  7,  6],
       [ 7, 10,  9],
       [ 8,  3,  2],
       [ 9,  7,  8]])

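I tried several ways of reshaping the array above so that my dtype would work, but I couldn't figure it out (this included vstacking a vstack, etc.). So my question is: how can I take the vstack version and get it into a format that meets my dtype requirements, without going through the procedure I used above? I hope the answer is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.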

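Summary

Many thanks to hpaulj. I have included two incarnations based on his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.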

"""
Script:  pnts_StackExch
Author:  Dan.Patterson@carleton.ca
Modified: 2015-08-24
Purpose: 
    To provide some timing options on point creation in preparation for
    point-to-point distance calculations using einsum.
Reference:
    http://stackoverflow.com/questions/32224220/
    methods-of-creating-a-structured-array
Functions:
    decorators:  profile_func, timing, arg_deco
    main:  make_pnts, einsum_0
"""
import numpy as np
import random
import time
from functools import wraps

np.set_printoptions(edgeitems=5,linewidth=75,precision=2,suppress=True,threshold=5)

# .... wrapper funcs .............
def delta_time(func):
    """timing decorator function"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nTiming function for... {}".format(func.__name__))
        t0 = time.time()                # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.time()                # end time
        print("Results for... {}".format(func.__name__))
        print("  time taken ...{:12.9f} sec.".format(t1-t0))
        #print("\n  print results inside wrapper or use <return> ... ")
        return result                   # return the result of the function
    return wrapper

def arg_deco(func):
    """This wrapper just prints some basic function information."""
    @wraps(func)
    def wrapper(*args,**kwargs):
        print("Function... {}".format(func.__name__))
        #print("File....... {}".format(func.__code__.co_filename))
        print("  args.... {}\n  kwargs. {}".format(args,kwargs))
        #print("  docs.... {}\n".format(func.__doc__))
        return func(*args, **kwargs)
    return wrapper

# .... main funcs ................
@delta_time
@arg_deco
def pnts_IdShape(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """Make N points from uniformly distributed random integers,
       with optional min/max values for Xs and Ys
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))]) 
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N) # note below
    Ys = np.random.random_integers(y_min,y_max,size=N)
    a = np.array([(i,j) for i,j in zip(IDs,np.column_stack((Xs,Ys)))],dt)
    return IDs,Xs,Ys,a

@delta_time
@arg_deco
def alternate(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """ after hpaulj and his mods to the above and this.  See docs
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N)  # honour the limits passed in
    Ys = np.random.random_integers(y_min,y_max,size=N)
    c_stack = np.column_stack((IDs,Xs,Ys))
    a = np.ones(N, dtype=dt)
    a['ID'] = c_stack[:,0]
    a['Shape'] = c_stack[:,1:]
    return IDs,Xs,Ys,a

if __name__=="__main__":
    """time testing for various methods
    """
    id_1,xs_1,ys_1,a_1 = pnts_IdShape(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
    id_2,xs_2,ys_2,a_2 = alternate(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)

Timing results for 1,000,000 points:

Timing function for... pnts_IdShape
Function... **pnts_IdShape**
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
  time taken ... **0.680652857 sec**.

Timing function for... **alternate**
Function... alternate
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
  time taken ... **0.060056925 sec**.
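A single time.time() measurement around one call is fairly coarse. As a cross-check, the same comparison can be sketched with the standard-library timeit module. This is not part of the original script: the function names by_tuples and by_fields, the smaller N, and the use of np.random.default_rng are all assumptions made for the sketch, and the timings will vary by machine.

```python
import timeit
import numpy as np

N = 100_000   # smaller than the 1,000,000 used above, to keep the sketch quick
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
rng = np.random.default_rng(0)
ids = np.arange(N)
xs = rng.integers(0, 10, size=N, endpoint=True)
ys = rng.integers(0, 10, size=N, endpoint=True)

def by_tuples():
    """Row-wise fill: build a Python list of (id, [x, y]) tuples first."""
    return np.array(list(zip(ids, np.column_stack((xs, ys)))), dtype=dt)

def by_fields():
    """Field-wise fill: allocate once, then assign whole columns."""
    out = np.zeros(N, dtype=dt)
    out['ID'] = ids
    out['Shape'] = np.column_stack((xs, ys))
    return out

for fn in (by_tuples, by_fields):
    best = min(timeit.repeat(fn, number=1, repeat=3))   # best of 3 runs
    print("{:10s} {:.4f} s".format(fn.__name__, best))
```

Both functions should produce identical arrays, so any difference in the reported times comes from how the array is filled rather than from what it contains.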

【Comments】:

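You are using one method of getting values into a structured array: a list of tuples. The other is to initialize it and then fill it field by field. With 2 fields that should be fast. When the fields are not uniform in length, those are the only two options.
@hpaulj Not sure I follow; the fields are uniform in length, if you mean the number of rows. I am just trying to figure out how to reshape the array so that the id column is kept but 'Shape' is the result of combining the x and y columns. I can put x and y together with hstack or zip, but then I have to zip again to combine id with 'Shape'. Obviously I can't apply the dtype to the 3-column array, since it isn't in the right format. What I am trying to get is [[id,(x,y)],...[idn,(xn,yn)]]
Fill in the `ID` field in one statement. Then the other.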

Tags: arrays python-2.7 python-3.x numpy multidimensional-array

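【Solution 1】:

There are two ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays): by row (or rows, with a list of tuples) and by field.

To do it by field, create the empty structured array and assign values by field name: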

In [19]: a=np.column_stack((id,x,y))
# same as your vstack().T

In [20]: Y=np.zeros(a.shape[0], dtype=dt)
# empty, ones, etc
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]
# (2,) field takes a 2 column array
In [23]: Y
Out[23]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
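Since the question stresses that x and y may have different ranges, the field-wise fill can also skip the intermediate 3-column array and write each coordinate straight into its slot. The following is a minimal sketch under that assumption. The helper name make_points, its parameters, and the use of np.random.default_rng are hypothetical and not from the original answer.

```python
import numpy as np

def make_points(n, x_range=(0, 10), y_range=(0, 100), seed=0):
    """Hypothetical helper: n points with independent integer ranges for x and y."""
    rng = np.random.default_rng(seed)
    dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
    pts = np.zeros(n, dtype=dt)
    pts['ID'] = np.arange(n)
    # pts['Shape'] is an (n, 2) float view into pts, so columns can be assigned directly
    pts['Shape'][:, 0] = rng.integers(x_range[0], x_range[1], size=n, endpoint=True)
    pts['Shape'][:, 1] = rng.integers(y_range[0], y_range[1], size=n, endpoint=True)
    return pts

pts = make_points(5)
print(pts)
```

The integer draws are cast to float on assignment because the 'Shape' field is declared as '<f8'.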

On the surface,

arr = np.array(list(zip(id,np.hstack((x,y)))),dt)

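looks like a nice way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y. I'll have to look at what is wrong.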
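One likely explanation for the duplicated x values, offered here as a sketch rather than as part of the original answer: np.hstack on two 1-D arrays produces a single 1-D array of length 2N, so zip pairs each id with one scalar, and that scalar is broadcast across both slots of the (2,) field. The small input values below are made up for illustration.

```python
import numpy as np

ids = np.arange(3)
x = np.array([7, 7, 5])
y = np.array([6, 7, 9])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

flat = np.hstack((x, y))            # 1-D, length 2N: all x values, then all y values
paired = np.column_stack((x, y))    # 2-D, shape (N, 2): one (x, y) row per point
print(flat.shape, paired.shape)

# zip stops after N items, so each id meets a single x scalar, and that
# scalar fills both positions of the 'Shape' field.
wrong = np.array(list(zip(ids, flat)), dtype=dt)
right = np.array(list(zip(ids, paired)), dtype=dt)
print(wrong['Shape'])
print(right['Shape'])
```

The y values never appear in wrong because zip is exhausted before it reaches the second half of flat.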

If the dtypes are compatible, you can take a view of an array like a: the data buffer of 3 int columns is laid out the same way as one with 3 int fields.

a.view('i4,i4,i4')

But your dtype requires 'i4,f8,f8', a mix of 4- and 8-byte fields and a mix of int and float. The a buffer would have to be converted to achieve that; view can't do it. (Don't even ask about .astype.)
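The size mismatch can be checked directly. In addition, NumPy releases from 1.16 onward (which postdate this thread) include numpy.lib.recfunctions.unstructured_to_structured, a helper that performs the converting copy in one call. The sketch below assumes such a NumPy version is installed and uses made-up values.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ids = np.arange(3)
x = np.array([8, 8, 6])
y = np.array([8, 0, 2])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

a = np.column_stack((ids, x, y)).astype('<i4')      # 3 int32 columns per row
print(a.dtype.itemsize * a.shape[1], dt.itemsize)   # bytes per row vs bytes per record

try:
    a.view(dt)            # pure reinterpretation; the byte sizes do not line up
except ValueError as err:
    print("view failed:", err)

# Converting copy: one column per scalar slot in dt (ID, Shape[0], Shape[1])
converted = rfn.unstructured_to_structured(a, dt)
print(converted)
```

Unlike view, this allocates a new array and casts the two coordinate columns to float, so later changes to a are not reflected in converted.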

Corrected list-of-tuples approach:

In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

The list comprehension produces a list like:

[(0, array([8, 8])),
 (1, array([8, 0])),
 (2, array([6, 2])),
 ....]

For each tuple in the list, [0] goes into the first field of the dtype, and [1] (a small array) goes into the second.

The tuples could also be constructed with

[(i,[j,k]) for i,j,k in zip(id,x,y)]

dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])

is a view-compatible dtype (still 3 integers):

In [42]: a.view(dtype=dt1)
Out[42]: 
array([[(0, [8, 8])],
       [(1, [8, 0])],
       [(2, [6, 2])],
       [(3, [8, 8])],
       [(4, [3, 2])],
       [(5, [6, 1])],
       [(6, [5, 6])],
       [(7, [7, 7])],
       [(8, [6, 1])],
       [(9, [6, 6])]], 
      dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])
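Two details of the view are worth spelling out in a short sketch. First, it only lines up when a really holds 4-byte integers; np.arange yields 8-byte integers on many platforms, so the example casts explicitly (the output above presumably came from a platform with 4-byte default integers). Second, the result has shape (N, 1) and still shares memory with a. The input values are made up for illustration.

```python
import numpy as np

ids = np.arange(3)
x = np.array([8, 8, 6])
y = np.array([8, 0, 2])
dt1 = np.dtype([('ID', '<i4'), ('Shape', ('<i4', (2,)))])

# Force contiguous 4-byte ints so each 3-column row is exactly one 12-byte record
a = np.ascontiguousarray(np.column_stack((ids, x, y)), dtype='<i4')

v = a.view(dt1)      # shape (N, 1): each 3-int row becomes one record
recs = v[:, 0]       # drop the length-1 axis; this is still a view, not a copy
print(v.shape, recs.shape)

recs['ID'][0] = 99   # writes through to the original buffer
print(a[0, 0], np.shares_memory(a, recs))
```

Because no data is copied, this route is cheap, but it only applies when every field has the same type and size as the columns of the source array.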

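【Comments】:

Very nice... I have made some modifications to my working script based on your ideas, in case they are of interest. For my discipline, the speedup in filling the array is substantial even for a large number of points. I have included them in the summary of my original post.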