Numpy split data with delimiter at once: answers

【Question title】: Numpy split data with delimiter at once
【Posted】: 2017-05-19 11:58:18
【Question description】:

I have a numpy array that looks like this:

[
[1,2,6,1,5]
[3,6,46]
[7,7,6,6,6,62,4]
[2,4,52,85,78]
]

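The data is heterogeneous (the rows have different lengths).

My question is whether it is possible to split the data with a delimiter all at once, rather than row by row (meaning without a "for loop").

The result should look like this (a 3D array):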

[
[[1][2][6][1][5]]
[[3][6][46]]
[[7][7][6][6][6][62][4]]
[[2][4][52][85][78]]
]

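【Discussion of the question】:

Are you trying to avoid loops, or are you just after performance?
Performance, of course. My array is very large.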

Tags: arrays python-2.7 numpy

【Solution 1】:

Approach #1: Here is one approach that uses a flattened list version of the input array and then simply splits it -

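import numpy as np

def extend_dims_objectarr(a):
    # Flatten all rows, add a trailing axis, and convert to nested lists
    v = np.concatenate(a)[:,None].tolist()
    # Start/stop offsets of each original row within the flattened list
    idx = np.r_[0,np.cumsum(list(map(len,a)))]
    # dtype=object is required for ragged input on recent NumPy versions
    return np.array([v[i:j] for i,j in zip(idx[:-1], idx[1:])], dtype=object)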

Sample input and output -

In [81]: a
Out[81]: 
array([[1, 2, 6, 1, 5], [3, 6, 46], [7, 7, 6, 6, 6, 62, 4],
       [2, 4, 52, 85, 78]], dtype=object)

In [82]: extend_dims_objectarr(a)
Out[82]: 
array([[[1], [2], [6], [1], [5]], [[3], [6], [46]],
       [[7], [7], [6], [6], [6], [62], [4]], 
       [[2], [4], [52], [85], [78]]], dtype=object)
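To make the boundary bookkeeping in Approach #1 concrete, here is a minimal sketch on the question's sample data. The variable names (`rows`, `lengths`, `flat`, `pieces`) are illustrative and not part of the original answer.

```python
import numpy as np

rows = [[1, 2, 6, 1, 5], [3, 6, 46], [7, 7, 6, 6, 6, 62, 4], [2, 4, 52, 85, 78]]

# Row lengths and their cumulative offsets into the flattened data
lengths = [len(r) for r in rows]
idx = np.r_[0, np.cumsum(lengths)]

# Flatten once, add a trailing axis, then convert to nested Python lists
flat = np.concatenate(rows)[:, None].tolist()

# Each consecutive pair of offsets delimits one original row
pieces = [flat[i:j] for i, j in zip(idx[:-1], idx[1:])]

print(idx.tolist())  # [0, 5, 8, 15, 20]
print(pieces[1])     # [[3], [6], [46]]
```

The only per-row work left is slicing a Python list, which is cheap compared with converting every row to an array separately.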

Approach #2: If an array of arrays is acceptable as output, here is another approach that uses a list comprehension -

np.array([np.array(i)[:,None] for i in a], dtype=object)

To get an array of lists as output instead, just append .tolist(): np.array(i)[:,None].tolist().
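As a self-contained sketch of Approach #2, the helper below fills a preallocated object array instead of relying on NumPy to infer a shape from ragged input. The function name `to_column_arrays` and its `as_lists` flag are hypothetical and only serve the illustration.

```python
import numpy as np

def to_column_arrays(rows, as_lists=False):
    """Hypothetical helper: turn each row into an (n, 1) column."""
    # Preallocate a 1-D object array so no shape inference is needed
    out = np.empty(len(rows), dtype=object)
    for k, row in enumerate(rows):
        col = np.asarray(row)[:, None]  # shape (len(row), 1)
        out[k] = col.tolist() if as_lists else col
    return out

rows = [[1, 2, 6, 1, 5], [3, 6, 46], [7, 7, 6, 6, 6, 62, 4], [2, 4, 52, 85, 78]]
arrays = to_column_arrays(rows)
lists = to_column_arrays(rows, as_lists=True)
print(arrays[1].shape)  # (3, 1)
print(lists[1])         # [[3], [6], [46]]
```

Storing each element by integer index keeps the outer array one-dimensional even if every row happens to have the same length.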

Runtime test

In [108]: a = np.array([np.random.randint(0,9,(i)).tolist() \
                  for i in np.random.randint(2,9,(10000))])

# @Allen's soln
In [109]: %timeit np.r_[list(map(lambda x: np.asarray(x).reshape(-1,1),a))]
100 loops, best of 3: 15.2 ms per loop

# Proposed in this post
In [110]: %timeit np.array([np.array(i)[:,None] for i in a])
100 loops, best of 3: 9.94 ms per loop
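For readers without IPython, a comparison along the same lines can be sketched with the standard `timeit` module. This is an approximation, not a reproduction of the measurement above: it times only the per-row conversion (the outer `np.array` / `np.r_` wrapper is left out), the function names are made up for the example, and absolute numbers will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

# Random ragged test data: 10000 rows of length 2..8 with values 0..8
rng = np.random.default_rng(0)
rows = [rng.integers(0, 9, n).tolist() for n in rng.integers(2, 9, 10000)]

def with_comprehension(rows):
    # Per-row conversion as in Approach #2
    return [np.array(r)[:, None] for r in rows]

def with_map_reshape(rows):
    # Per-row conversion as in the map/reshape variant
    return list(map(lambda r: np.asarray(r).reshape(-1, 1), rows))

for fn in (with_comprehension, with_map_reshape):
    seconds = timeit.timeit(lambda: fn(rows), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```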

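【Discussion】:

I think the OP wants to avoid using a for loop.
@Allen I thought so too, until the OP clarified that they are looking for performance. I guess it always pays to wait for the OP to clarify their actual goal :)
Glad I came up with more ways to do the same thing :-)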

【Solution 2】:

Setup

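import numpy as np

# dtype=object is required for ragged input on recent NumPy versions
a = np.asarray([
[1,2,6,1,5],
[3,6,46],
[7,7,6,6,6,62,4],
[2,4,52,85,78],
], dtype=object)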

Solution

# Put the array into a DataFrame, then reshape each row into a column array.
import pandas as pd
a2 = pd.DataFrame(a).applymap(lambda x: np.asarray(x).reshape(-1,1)).values

print(a2)
Out[264]: 
array([[array([[1],
       [2],
       [6],
       [1],
       [5]])],
       [array([[ 3],
       [ 6],
       [46]])],
       [ array([[ 7],
       [ 7],
       [ 6],
       [ 6],
       [ 6],
       [62],
       [ 4]])],
       [array([[ 2],
       [ 4],
       [52],
       [85],
       [78]])]], dtype=object)

Update

Another approach that does not use pandas, only numpy and built-in functions.

a2 = np.r_[list(map(lambda x: np.asarray(x).reshape(-1,1),a))]

print(a2)

Out[312]: 
array([array([[1],
       [2],
       [6],
       [1],
       [5]]),
       array([[ 3],
       [ 6],
       [46]]),
       array([[ 7],
       [ 7],
       [ 6],
       [ 6],
       [ 6],
       [62],
       [ 4]]),
       array([[ 2],
       [ 4],
       [52],
       [85],
       [78]])], dtype=object)
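A further NumPy-only variation, not taken from the answers above, is to flatten once and let `np.split` cut the column at the cumulative row lengths. This is a minimal sketch on the question's sample data; note that `np.split` still iterates over the split points internally, so it moves the loop into NumPy rather than removing it.

```python
import numpy as np

rows = [[1, 2, 6, 1, 5], [3, 6, 46], [7, 7, 6, 6, 6, 62, 4], [2, 4, 52, 85, 78]]

# Row lengths; the last cumulative offset is dropped because
# np.split expects interior split points only
lengths = np.array([len(r) for r in rows])
split_points = np.cumsum(lengths)[:-1]

# Flatten once into a single (20, 1) column, then split it back into rows
flat = np.concatenate(rows)[:, None]
parts = np.split(flat, split_points)

print([p.shape for p in parts])  # [(5, 1), (3, 1), (7, 1), (5, 1)]
print(parts[1].tolist())         # [[3], [6], [46]]
```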

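【Discussion】:

Isn't applymap essentially a loop hidden under the hood?
Probably, but it happens behind the scenes :-)
Can this be done without pandas? With numpy only?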