Merging two numpy arrays with sequential rows: answers

【Question title】: Merging two numpy arrays with sequential rows
【Posted】: 2021-11-07 06:45:45
【Question description】:

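I have two numpy arrays and want to merge them, without using any for loops, according to the following rule.

Take the first n rows from the first array.
Add the first m rows from the second array.
Add rows n to 2n from the first array.
Add rows m to 2m from the second array.

.....

Add the last m rows from the second array.

For example, assume I have two arrays and n=2, m=3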

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))

[[5 0 2 2 6 1]
 [4 8 9 2 7 2]
 [5 5 0 5 3 0]
 [2 1 4 7 9 4]
 [8 1 1 9 2 8]
 [4 1 1 0 1 1]
 [2 9 3 5 7 9]
 [3 6 6 6 0 4]
 [4 4 7 3 7 9]
 [7 3 7 1 5 2]] 

[[ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]]

The desired output is

[[ 5  0  2  2  6  1]
 [ 4  8  9  2  7  2]
 [ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [ 5  5  0  5  3  0]
 [ 2  1  4  7  9  4]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 8  1  1  9  2  8]
 [ 4  1  1  0  1  1]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 2  9  3  5  7  9]
 [ 3  6  6  6  0  4]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]
 [ 4  4  7  3  7  9]
 [ 7  3  7  1  5  2]]
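
For reference, the rule can be written as a plain loop. This is only a sketch of my reading of the rule (the name interleave_reference is made up for illustration); it is not loop-free, but it gives a baseline to check vectorized answers against.

```python
import numpy as np

def interleave_reference(x, y, n, m):
    """Loop-based baseline: alternate n-row blocks of x with m-row blocks of y."""
    blocks = []
    i = j = 0
    while i < len(x) or j < len(y):
        # Slicing past the end yields an empty block, which vstack ignores
        blocks.append(x[i:i + n])
        blocks.append(y[j:j + m])
        i += n
        j += m
    return np.vstack(blocks)

# Single-column arrays make the row order easy to read
x = np.arange(10).reshape(10, 1)
y = 100 + np.arange(12).reshape(12, 1)
out = interleave_reference(x, y, 2, 3)
print(out.ravel().tolist())
# [0, 1, 100, 101, 102, 2, 3, 103, 104, 105, 4, 5, 106, 107, 108,
#  6, 7, 109, 110, 111, 8, 9]
```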

Tags: python numpy concatenation numpy-ndarray

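【Solution 1】:

You can create an output array and place the inputs into it by index. The output always has x.shape[0] + y.shape[0] rows and the same number of columns as the inputs: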

output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

You can generate the output indices as follows:

idx = (np.arange(0, output.shape[0] - n + 1, m + n)[:, None] + np.arange(n)).ravel()
idy = (np.arange(n, output.shape[0] - m + 1, m + n)[:, None] + np.arange(m)).ravel()

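This creates a column vector of starting indices and adds the offsets 0 to n-1 (or 0 to m-1) to each of them, which marks every output row where the inputs belong. You can then assign the inputs directly: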

output[idx, :] = x
output[idy, :] = y
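
Putting the pieces above together, here is a small end-to-end sketch on single-column arrays so the index arrays are easy to inspect. The data is made up for illustration. As far as I can tell, the index formulas assume the blocks balance out: len(x) is a multiple of n, len(y) is a multiple of m, and x has either as many blocks as y or exactly one more.

```python
import numpy as np

n, m = 2, 3
x = np.arange(10).reshape(10, 1)        # 5 blocks of n rows
y = 100 + np.arange(12).reshape(12, 1)  # 4 blocks of m rows

output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

# Column vector of block start rows, broadcast against the in-block offsets
idx = (np.arange(0, output.shape[0] - n + 1, m + n)[:, None] + np.arange(n)).ravel()
idy = (np.arange(n, output.shape[0] - m + 1, m + n)[:, None] + np.arange(m)).ravel()
print(idx.tolist())  # [0, 1, 5, 6, 10, 11, 15, 16, 20, 21]
print(idy.tolist())  # [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]

output[idx, :] = x
output[idy, :] = y
print(output.ravel().tolist())
# [0, 1, 100, 101, 102, 2, 3, 103, 104, 105, 4, 5, 106, 107, 108,
#  6, 7, 109, 110, 111, 8, 9]
```

If the block counts do not balance, idx or idy will not have as many entries as x or y has rows, and the assignment raises a shape error instead of silently misplacing rows.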

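【Discussion】:

First of all, sorry for contacting you this way. I admire your numpy skills, and I am trying to learn it. Are there any good tutorials or courses you know of?
@wwnde. The official documentation and lots of practice. The best way to learn a tool is to have a specific problem you want to solve with it. Numpy is just a tool, and so is Python. Learning a tool makes little sense if you have no purpose for it.
Great. I work with data and often run into all sorts of problems, but I prefer doing things the pandas or pyspark way. Maybe that is something to reconsider. Thanks.
@wwnde. There is nothing wrong with pandas. It is another layer of abstraction built on top of numpy. Many things are easy to do in one but not in the other. It really depends on your needs.
Helpful. Advice taken, I will work on it.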

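【Solution 2】:

You can create a function that splits an array into consecutive slices (chunks). Then chunk both arrays and interleave the chunks with itertools.zip_longest. Finally, wrap the result in np.vstack to get the new array.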

import numpy as np
from itertools import zip_longest
from math import ceil

def chunk(arr, n):
    """Split an array `arr` into n-sized chunks along its first axis"""
    for i in range(ceil(len(arr)/n)):
        ix = slice(i * n, (i+1) * n)
        yield arr[ix]

def chunk_stack(a, b, n, m):
    """Splits the arrays `a` and `b` into `n` and `m` sized chunks. 
    Returns an array of the interleaved chunks.
    """
    chunker_a = chunk(a, n)
    chunker_b = chunk(b, m)
    arr = []
    for cha, chb in zip_longest(chunker_a, chunker_b):
        if cha is not None:
            arr.append(cha)
        if chb is not None:
            arr.append(chb)
    return np.vstack(arr)

Testing it on your example arrays:

x = np.array(
[[5, 0, 2, 2, 6, 1],
 [4, 8, 9, 2, 7, 2],
 [5, 5, 0, 5, 3, 0],
 [2, 1, 4, 7, 9, 4],
 [8, 1, 1, 9, 2, 8],
 [4, 1, 1, 0, 1, 1],
 [2, 9, 3, 5, 7, 9],
 [3, 6, 6, 6, 0, 4],
 [4, 4, 7, 3, 7, 9],
 [7, 3, 7, 1, 5, 2]])

y = np.array(
[[3, 15, 3, 8, 12, 12],
 [19, 12, 13, 0, 19, 16],
 [11, 2, 18, 16, 9, 19],
 [19, 15, 15, 11, 13, 2],
 [19, 14, 1, 6, 13, 17],
 [19, 14, 19, 14, 13, 3],
 [0, 1, 13, 0, 19, 10],
 [19, 13, 19, 5, 16, 13],
 [12, 4, 15, 11, 12, 17],
 [4, 19, 17, 2, 11, 12],
 [9, 12, 10, 9, 15, 3],
 [13, 7, 2, 5, 13, 10]])

chunk_stack(x, y, 2, 3)
# returns:
array([[ 5,  0,  2,  2,  6,  1],
       [ 4,  8,  9,  2,  7,  2],
       [ 3, 15,  3,  8, 12, 12],
       [19, 12, 13,  0, 19, 16],
       [11,  2, 18, 16,  9, 19],
       [ 5,  5,  0,  5,  3,  0],
       [ 2,  1,  4,  7,  9,  4],
       [19, 15, 15, 11, 13,  2],
       [19, 14,  1,  6, 13, 17],
       [19, 14, 19, 14, 13,  3],
       [ 8,  1,  1,  9,  2,  8],
       [ 4,  1,  1,  0,  1,  1],
       [ 0,  1, 13,  0, 19, 10],
       [19, 13, 19,  5, 16, 13],
       [12,  4, 15, 11, 12, 17],
       [ 2,  9,  3,  5,  7,  9],
       [ 3,  6,  6,  6,  0,  4],
       [ 4, 19, 17,  2, 11, 12],
       [ 9, 12, 10,  9, 15,  3],
       [13,  7,  2,  5, 13, 10],
       [ 4,  4,  7,  3,  7,  9],
       [ 7,  3,  7,  1,  5,  2]])
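
The same idea can be sketched more compactly with np.split, and the second call below shows what happens when one array has more chunks than the other. The name chunk_stack_split and the test data are made up for illustration; this is a variant under those assumptions, not a replacement for the functions above.

```python
from itertools import chain, zip_longest

import numpy as np

def chunk_stack_split(a, b, n, m):
    """Interleave n-row chunks of `a` with m-row chunks of `b`."""
    # Split at rows n, 2n, ... (the last chunk may be shorter)
    chunks_a = np.split(a, np.arange(n, len(a), n))
    chunks_b = np.split(b, np.arange(m, len(b), m))
    # zip_longest pads the shorter chunk list with None, which is dropped here
    pairs = zip_longest(chunks_a, chunks_b)
    return np.vstack([c for c in chain.from_iterable(pairs) if c is not None])

a = np.arange(10).reshape(10, 1)
b = 100 + np.arange(12).reshape(12, 1)
balanced = chunk_stack_split(a, b, 2, 3)

# Only 2 chunks in the first array but 3 in the second
uneven = chunk_stack_split(a[:4], b[:9], 2, 3)
print(uneven.ravel().tolist())
# [0, 1, 100, 101, 102, 2, 3, 103, 104, 105, 106, 107, 108]
```

Once the shorter array runs out, the remaining chunks of the longer one are simply appended in order.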

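【Solution 3】:

We reshape x and y so that their rows are grouped into blocks of n and m.

Then we stack the blocks horizontally, so that the n-row and m-row blocks alternate.

Finally, we append whatever rows of x and y are left over.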

import numpy as np

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))
n, m = 2, 3
output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

x_dim_1 = x.shape[0] // n  # 5
y_dim_1 = y.shape[0] // m  # 4

common_dim = min(x_dim_1, y_dim_1) # 4

x_1 = x[:common_dim * n].reshape(common_dim, n, -1) # (4, 2, 6)
y_1 = y[:common_dim * m].reshape(common_dim, m, -1) # (4, 3, 6)

# We stack horizontally x_1, y_1 to (4, 5, 6) then convert 4, 5 -> 4*5
# make n's and m's alternate
assign_til = common_dim * (n + m)
output[:assign_til] = np.hstack([x_1, y_1]).reshape(assign_til, x.shape[1])

# Remaining x's and y's
r_x = x[common_dim * n:]
r_y = y[common_dim * m:]

# Next entry in output will be of r_x, since alternate
# Choose n entries or whatever remaining and append those
rem = min(r_x.shape[0], n)
output[assign_til:assign_til + rem] = r_x[:rem]
assign_til += rem

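# Next append all remaining y's; bound the slice so that r_y is not
# broadcast over (or rejected by) rows reserved for the rest of x
output[assign_til:assign_til + r_y.shape[0]] = r_y
assign_til += r_y.shape[0]

# If by chance x_dim_1 > y_dim_1 then r_x has at least n elements;
# append whatever follows the first rem rows
output[assign_til:] = r_x[rem:]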
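
To see why the reshape-and-stack step alternates the blocks, here is a minimal sketch on tiny single-column arrays with made-up data. It uses np.concatenate with axis=1, which for 3-D inputs does the same thing as the np.hstack call above.

```python
import numpy as np

n, m, k = 2, 3, 2                     # k = number of blocks in each array
x = np.arange(4).reshape(4, 1)        # rows 0..3
y = 100 + np.arange(6).reshape(6, 1)  # rows 100..105

x_blocks = x.reshape(k, n, -1)  # shape (2, 2, 1): one n-row block per entry
y_blocks = y.reshape(k, m, -1)  # shape (2, 3, 1): one m-row block per entry

# Joining along axis 1 puts block i of y right after block i of x
paired = np.concatenate([x_blocks, y_blocks], axis=1)  # shape (2, 5, 1)

# Flattening the first two axes gives the interleaved rows
merged = paired.reshape(k * (n + m), -1)
print(merged.ravel().tolist())
# [0, 1, 100, 101, 102, 2, 3, 103, 104, 105]
```

This only covers the part where both arrays still have full blocks; the leftover rows have to be appended separately, as in the code above.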
