Force numpy to create an array of objects: answers

【Question title】: Force numpy to create array of objects
【Posted】: 2020-04-20 16:54:00
【Question】:

I have an array:

x = np.array([[1, 2, 3], [4, 5, 6]])

I want to create another array with shape=(1, 1) and dtype=object whose only element is x.

I tried this code:

a = np.array([[x]], dtype=object)

But it produces an array of shape (1, 1, 2, 3).

Of course, I can do:

a = np.zeros(shape=(1, 1), dtype=object)
a[0, 0] = x

But I want the solution to scale easily to larger shapes of a, for example:

[[x, x], [x, x]]

without running a for loop over all the indices.

Any suggestions on how to achieve this?

UPD1

The arrays may differ, for example:

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
[[x, y], [u, v]]

They may also have different shapes, but in that case a plain np.array([[x, y], [u, v]]) constructor works fine.

UPD2

I really want a solution that works for arbitrary shapes of x, y, u, v, not necessarily all the same.

【Question comments】:

Tags: python arrays numpy

【Solution 1】:

a = np.empty(shape=(2, 2), dtype=object)
a.fill(x)

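【Comments】:

Thanks for this. Sorry, I used the same-x array example for brevity, but in practice the arrays may differ: [[x, y], [u, v]]. The original problem for me was that the result depends on whether all input arrays have the same shape.
This fill puts the same pointer to x in all 4 slots. It carries the same danger as replicating a list with [mutable_object]*4.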
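
The aliasing described in the comment above can be seen in a short sketch. The names `shared` and `independent` are illustrative only and do not come from the thread. The copy loop is one possible workaround, not something the commenters proposed.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# fill() stores a reference to the same array object in every slot
shared = np.empty((2, 2), dtype=object)
shared.fill(x)

# One way to get separate arrays: store an explicit copy per slot
independent = np.empty((2, 2), dtype=object)
for idx in np.ndindex(independent.shape):
    independent[idx] = x.copy()

# An in-place edit through one slot of `shared` changes x itself,
# and is therefore visible through every other slot
shared[0, 0][0, 0] = 99
```

After the last line, `shared[1, 1]` and `x` both show the edit, while the arrays held in `independent` are unaffected.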

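【Solution 2】:

This is a very general method: it works with nested lists and with lists of lists of arrays, regardless of whether the shapes of those arrays are different or equal. It also works when the data are lumped together in a single array, which is actually the trickiest case. (The other methods posted so far do not work in that case.)

Let's start with the difficult case, one big array: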

# create example
# pick outer shape and inner shape
>>> osh, ish = (2, 3), (2, 5)
# total shape
>>> tsh = (*osh, *ish)
# make data
>>> data = np.arange(np.prod(tsh)).reshape(tsh)
>>>
# recalculate inner shape to cater for different inner shapes
# this will return the consensus bit of all inner shapes
>>> ish = np.shape(data)[len(osh):]
>>> 
# block them
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> 
# admire
>>> data_blocked
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
        array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]]),
        array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])],
       [array([[30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]]),
        array([[40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]]),
        array([[50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59]])]], dtype=object)

Using the OP's example, which is a list of lists of arrays:

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [0, 1, 2]])
>>> u = np.array([[3, 4, 5], [6, 7, 8]])
>>> v = np.array([[9, 0, 1], [2, 3, 4]])
>>> data = [[x, y], [u, v]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> 
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 0, 1],
       [2, 3, 4]])]], dtype=object)

And an example with subarrays of different shapes (note the v.T):

>>> data = [[x, y], [u, v.T]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 2],
       [0, 3],
       [1, 4]])]], dtype=object)

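【Comments】:

Thanks for the answer, but it is very important to me that the solution works for arbitrary shapes of x, y, u, v, not necessarily all identical. Sorry for not stating this explicitly in the OP.
I wrote an alternative that uses ndindex instead. I think it is easier to understand. But what really matters is whether one is more general than the other.
Another object array case: stackoverflow.com/a/49226113/901925, complicated by the fact that the user wanted a 2d array of tuples. Our methods produce an array of arrays (because they first turn the nested list into an array).

【Solution 3】:

Found a solution myself: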

a = np.zeros(shape=(2, 2), dtype=object)
a[:] = [[x, x], [x, x]]

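【Comments】:

【Solution 4】:

@PaulPanzer's use of np.frompyfunc is clever, but the reshaping and the use of __getitem__ make it hard to follow.

Separating the creation of the function from its application may help: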

func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)

This highlights the distinction between the ish dimensions and the osh dimensions.

I also suspect that a lambda function could replace __getitem__.
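
A minimal sketch of that lambda variant, using the same example data as Solution 2. The names `flat`, `pick` and `blocked` are illustrative and not from the thread, and `np.arange` is used here in place of `range` for the index sequence.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)            # outer and inner shapes, as in Solution 2
tsh = (*osh, *ish)
data = np.arange(np.prod(tsh)).reshape(tsh)

# Collapse the outer dimensions so one integer index selects one inner block
flat = np.reshape(data, (-1, *ish))

# The lambda plays the role of flat.__getitem__
pick = np.frompyfunc(lambda i: flat[i], 1, 1)

# Apply it to every flat index, then restore the outer shape
blocked = pick(np.arange(len(flat))).reshape(osh)
```

`blocked` has shape `osh` and object dtype, and each element is one `ish`-shaped block of `data`.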

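This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc, but lets us specify different otypes. Both, however, pass scalars to the function, which is why Paul's approach uses a flattened range and getitem. np.vectorize with signature lets us pass arrays to the function, but it iterates with ndindex rather than frompyfunc.

Inspired by this, here is an np.empty plus fill approach, but using ndindex as the iterator: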

In [385]: osh, ish = (2, 3), (2, 5)
     ...: tsh = (*osh, *ish)
     ...: data = np.arange(np.prod(tsh)).reshape(tsh)
     ...: ish = np.shape(data)[len(osh):]
     ...: 
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
     ...:     res[idx] = data[idx]
     ...:     
In [391]: res
Out[391]: 
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
       ....
       [55, 56, 57, 58, 59]])]], dtype=object)

The second example:

In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
     ...:     res[idx] = arr[idx]

In the third case, np.array(data) already creates the desired (2,2) object dtype array. This res creation and fill still works, even though it produces the same thing.
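
As a variation on the ndindex loop, the nested list can be indexed directly, skipping the np.array(data) conversion. This is only a sketch: `block_nested` is a hypothetical helper name, not something from the answers, and it assumes the input nests at least as deep as `osh` is long.

```python
import numpy as np

def block_nested(nested, osh):
    """Hypothetical helper: pack the items of a nested list into an object array of shape osh."""
    res = np.empty(osh, dtype=object)
    for idx in np.ndindex(*osh):
        item = nested
        for i in idx:        # walk down one outer level at a time
            item = item[i]
        res[idx] = item      # a full integer index stores the object itself
    return res

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

# v.T has shape (3, 2), so the inner shapes differ
res = block_nested([[x, y], [u, v.T]], (2, 2))
```

Because nothing is converted with np.array first, the result does not depend on whether the inner shapes agree.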

The speed difference is small (though this example is tiny):

In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
     ...: arr = np.array(data)
     ...: res = np.empty(osh, object)
     ...: for idx in np.ndindex(osh): res[idx] = arr[idx]
     ...: 
54.7 µs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is effectively np.array(data).reshape(-1, *ish). The list has to be converted to an array first.
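
A quick check of that equivalence on the OP's equal-shape example. The names `via_function` and `via_method` are illustrative only.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
data = [[x, y], [u, v]]              # nested list, all inner shapes (2, 3)
ish = (2, 3)

# np.reshape accepts the list and converts it to an array internally
via_function = np.reshape(data, (-1, *ish))

# The same thing spelled out: convert first, then reshape
via_method = np.array(data).reshape(-1, *ish)
```

Both results have shape (4, 2, 3) and hold the same values.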

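Speed aside, it would be interesting to see whether one approach is more general than the other. Is there a case that one can handle and the other cannot?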

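【Comments】:

Performance-wise, the old stick-a-None-in-the-first-cell method looks quite good: tmp = list(np.reshape(data, (-1, *ish))); swap = tmp[0]; tmp[0] = None; result = np.array(tmp); result[0] = swap; result = result.reshape(osh) is more than twice as fast as frompyfunc on the first example.
Here is one that may work with yours but not with mine. (It works in principle, but not with what I did to make it general.)
@PaulPanzer, my ((10,3),(10,8)) case fails because it cannot make an ndarray. But for a simple list we do not need ndindex to iterate; enumerate is enough.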
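
The enumerate idea from the last comment could look like the sketch below. It assumes a flat list of two arrays with shapes (10, 3) and (10, 8), matching the case mentioned there. The names `arrays` and `res` are illustrative.

```python
import numpy as np

# Two arrays that share a first dimension but differ in the second
arrays = [np.zeros((10, 3)), np.ones((10, 8))]

# Preallocate an object array and assign one list item per slot
res = np.empty(len(arrays), dtype=object)
for i, arr in enumerate(arrays):
    res[i] = arr
```

Each slot holds the original array object, so nothing is copied or broadcast.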