How to select elements row-wise from a NumPy array? Answers

【Question title】: How to select elements row-wise from a NumPy array?
【Posted】: 2011-11-24 11:43:03
【Question】:

I have a numpy array like this

dd= [[foo 0.567 0.611]
     [bar 0.469 0.479]
     [noo 0.220 0.269]
     [tar 0.480 0.508]
     [boo 0.324 0.324]]

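How can I loop over the array, selecting foo and getting 0.567 and 0.611 as single floats, then selecting bar and getting 0.469 and 0.479 as single floats, and so on?

I can get a vector of the first elements, as a list, by using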

dv=  dd[:,1]

The 'foo' and 'bar' elements are not fixed values; they can change.

How would I change this if the element is in position [1]?

[[0.567 foo2 0.611]
  [0.469 bar2 0.479]
  [0.220 noo2 0.269]
  [0.480 tar2 0.508]
  [0.324 boo2 0.324]]

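【Comments】:

What are 'foo', 'bar', etc.? Strings? Or just placeholders for other numbers?
How can you construct a numpy array that contains both floats and strings?
@Merlin: But a numpy ndarray can have only one type. It is not possible to hold both strings and float values in the same array. So the array is either a record array or an ndarray of object type in which each entry is a list. So which is it?
@tal "ndarray of object type"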

Tags: python multidimensional-array numpy scipy

【Solution 1】:

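You put the NumPy tag on your question, so I'll assume you want NumPy syntax, which my earlier answer did not use.

If you do in fact want to use NumPy, then you probably don't want strings in your array; otherwise you will also have to represent your floats as strings.

What you are looking for is the NumPy syntax for accessing the elements of a 2D array by row (and excluding the first column).

The syntax is: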

M[row_index,1:]        # selects all but 1st col from row given by 'row_index'

With respect to the second scenario in your question, selecting non-adjacent columns:

M[row_index,[0,2]]     # selects 1st & 3rd cols from row given by 'row_index'
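
As a minimal sketch of the two indexing forms above, assuming `M` is already a plain 2D float array (the numbers are taken from the question; the variable names `tail` and `picked` are only illustrative):

```python
import numpy as np

# Assumption: M holds only numbers; column 0 is a numeric row label.
M = np.array([
    [0.0, 0.567, 0.611],
    [1.0, 0.469, 0.479],
    [2.0, 0.220, 0.269],
])

row_index = 1  # any integer row position

# Slice: every column except the first, for one row.
tail = M[row_index, 1:]

# Integer-list indexing: columns 0 and 2 only (non-adjacent columns).
picked = M[row_index, [0, 2]]

print(tail)
print(picked)
```

Note that the slice form returns a view into `M`, while indexing with a list of columns returns a copy.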

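The small complication in your question is just that you want to use a string for row_index, so it is necessary to remove the strings (so that you can create a 2D NumPy array of floats), replace them with numerical row indices, and then create a look-up table to map the strings to the numerical row indices: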

>>> import numpy as NP
>>> # create a look-up table so you can remove the strings from your python nested list,
>>> # which will allow you to represent your data as a 2D NumPy array with dtype=float
>>> keys
      ['foo', 'bar', 'noo', 'tar', 'boo']
>>> values    # 1D index array with one integer row index for each unique string in 'keys'
      array([0, 1, 2, 3, 4])
>>> LuT = dict(zip(keys, values))

>>> # add an index to data by inserting 'values' as first column of the data matrix
>>> # ('A' starts out holding only the two float columns)
>>> A = NP.hstack((values[:, None], A))
>>> A
      array([[ 0.   ,  0.567,  0.611],
             [ 1.   ,  0.469,  0.479],
             [ 2.   ,  0.22 ,  0.269],
             [ 3.   ,  0.48 ,  0.508],
             [ 4.   ,  0.324,  0.324]])

>>> # so now to look up an item, by 'key':
>>> # write a small function to perform the look-ups:
>>> def select_row(key):
        return A[LuT[key],1:]

>>> select_row('foo')
      array([ 0.567,  0.611])

>>> select_row('noo')
      array([ 0.22 ,  0.269])

The second scenario in your question: what if the index column changes?

>>> # e.g., move index to column 1 (as in your Q)
>>> A = NP.roll(A, 1, axis=1)
>>> A
      array([[ 0.611,  0.   ,  0.567],
             [ 0.479,  1.   ,  0.469],
             [ 0.269,  2.   ,  0.22 ],
             [ 0.508,  3.   ,  0.48 ],
             [ 0.324,  4.   ,  0.324]])

>>> # the original function is changed slightly, to select non-adjacent columns:
>>> def select_row2(key):
        return A[LuT[key],[0,2]]

>>> select_row2('foo')
        array([ 0.611,  0.567])
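
The look-up-table idea can also be sketched end to end as a script, starting from a nested Python list. This is only an illustration under stated assumptions: the list `data` and the name `key_col` are hypothetical, and this variant leaves the index column out of the array because the dictionary already records each row position, which is a simplification rather than what the session above does.

```python
import numpy as np

# Hypothetical nested list standing in for the question's data.
data = [
    ["foo", 0.567, 0.611],
    ["bar", 0.469, 0.479],
    ["noo", 0.220, 0.269],
    ["tar", 0.480, 0.508],
    ["boo", 0.324, 0.324],
]

key_col = 0  # position of the string column; use 1 for the second layout

keys = [row[key_col] for row in data]

# Keep only the numeric entries of each row, in their original order.
A = np.array(
    [[v for i, v in enumerate(row) if i != key_col] for row in data],
    dtype=float,
)

# Map each string to its zero-based integer row index.
LuT = {key: i for i, key in enumerate(keys)}


def select_row(key):
    """Return the float values stored for `key` as a 1D array."""
    return A[LuT[key]]


foo_a, foo_b = select_row("foo")  # unpack the row into two scalars
print(foo_a, foo_b)
```

Because the key column is filtered out by position, switching `key_col` to 1 handles the second layout without changing `select_row`.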

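【Comments】:

M[row_index,[0,2]] doesn't work; where is this 'row_index' function?
@Merlin: Yes, it does work. 'row_index' is a placeholder or variable; it just means *row index*, that is, the index of the row (some integer value).
@Merlin: I showed you in my answer how to build the key-value store. Again, for example, start with two lists, one for keys and one for values: keys = ['key1', 'key2', 'key3'], vals = range(3). Call 'zip' on the two lists to pair them up, then call 'dict' on the result; what you get is a dictionary: LuT = dict(zip(keys, vals))
@doug: I guess keys = dd[:,0]; vals = np.arange(1,len(keys)+1)
Well, the keys would probably be extracted from a Python list rather than a NumPy array. The whole point of using a look-up table is to remove the strings so that you can represent your data as a NumPy array. So to get the keys from a nested Python list called 'data' (assuming the keys are in the first column), use: keys = [row[0] for row in data]. For the values, it's best to use 0-based indexing, otherwise it is easy to get confused, so vals = range(len(keys))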

【Solution 2】:

First, the vector of first elements is

dv = dd[:,0]

(Python is 0-indexed)

Second, to loop over the array (and, for example, store the rows in a dictionary):

dc = {}
ind = 0 # this corresponds to the column with the names
for row in dd:
    dc[row[ind]] = row[1:]
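
A runnable sketch of the same loop, assuming `dd` is an object-dtype array as stated in the question comments. The helper name `rows_by_name` is hypothetical, and `np.delete` is swapped in for `row[1:]` so that the name column may sit at any position:

```python
import numpy as np

# Assumption: `dd` is an object-dtype array mixing strings and floats.
dd = np.array(
    [
        ["foo", 0.567, 0.611],
        ["bar", 0.469, 0.479],
        ["noo", 0.220, 0.269],
    ],
    dtype=object,
)


def rows_by_name(arr, ind=0):
    """Map each row's name (column `ind`) to its remaining values as floats."""
    out = {}
    for row in arr:
        # np.delete drops the name column wherever it sits in the row.
        out[row[ind]] = np.delete(row, ind).astype(float)
    return out


dc = rows_by_name(dd)
for name, (a, b) in dc.items():
    print(name, a, b)

# Second layout from the question: names in column 1.
dd2 = dd[:, [1, 0, 2]]
dc2 = rows_by_name(dd2, ind=1)
```

Converting each row with `astype(float)` gives ordinary float arrays, so the two values can be unpacked and used in arithmetic directly.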

【Comments】: