Numpy genfromtxt iterate over columns: answers

【Question title】: Numpy genfromtxt iterate over columns
【Posted】: 2015-11-22 05:29:57
【Question】:

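I am using NumPy's genfromtxt to read columns from a CSV file.

Each column needs to be split and assigned to a separate SQLAlchemy SystemRecord, combined with some other columns and attributes, and added to the database.

What is the best practice for iterating over columns f1 to f9 and adding them to the session object?

So far I have used the following code, but I don't want to repeat the same thing for every f column: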

t = np.genfromtxt(
    FILE_NAME,
    dtype=[(np.str_, 20)] * 11,   # eleven string fields, named f0 .. f10
    delimiter=',',
    filling_values="None",
    skip_header=0,                # skip_header replaces the older skiprows keyword
    usecols=range(11),
)

for r in t:   # each r is one record; enumerate(t) would yield (index, record) tuples
    _acol = r['f1'].split('-')
    _bcol = r['f2'].split('-')
    ....
    arec = t_SystemRecords(first=_acol[0], second=_acol[1], third=_acol[2], ... )
    db.session.add(arec)
    db.session.commit()

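【Comments】:

Isn't it possible to iterate over the transpose of t, simply: for col in t.T: ... ?
That's interesting - I'll try it
Usually (always?) genfromtxt produces a 1-D structured array. transpose does nothing to it.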

Tags: python numpy genfromtxt

【Solution 1】:

Look at t.dtype. Or r.dtype.

Make a sample structured array (which is what genfromtxt returns):

t = np.ones((5,), dtype='i4,i4,f8,S3')

It looks like:

array([(1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'),
       (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1')], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

The dtype and dtype.names are:

In [135]: t.dtype
Out[135]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

In [138]: t.dtype.names
Out[138]: ('f0', 'f1', 'f2', 'f3')

Iterate over the names to see the individual columns:

In [139]: for n in t.dtype.names:
   .....:     print(t[n])
   .....:     
[1 1 1 1 1]
[1 1 1 1 1]
[ 1.  1.  1.  1.  1.]
[b'1' b'1' b'1' b'1' b'1']

Or, in your case, iterate over the "rows", and then over the names:

In [140]: for i,r in enumerate(t):
   .....:     print(r)
   .....:     for n in r.dtype.names:
   .....:         print(r[n])
   .....:         
(1, 1, 1.0, b'1')
1
1
1.0
b'1'
(1, 1, 1.0, b'1')
...
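
Applied to the question, the same row-then-name loop could look roughly like the sketch below. The Record namedtuple is only a hypothetical stand-in for the SQLAlchemy model (the real t_SystemRecords signature is not shown in the question), and the sample lines are made up for illustration.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the question's SQLAlchemy model.
Record = namedtuple("Record", ["source_field", "first", "second", "third"])

# Made-up sample data: one id column followed by two 'a-b-c' style columns.
lines = ["id1,a-b-c,d-e-f", "id2,g-h-i,j-k-l"]

t = np.genfromtxt(lines, dtype="U20,U20,U20", delimiter=",")

records = []
for r in t:
    # Skip f0 and treat every remaining field the same way.
    for name in t.dtype.names[1:]:
        first, second, third = r[name].split("-")
        records.append(Record(name, first, second, third))

print(len(records))
print(records[0])
```

In the real code, the records.append(...) line would presumably be replaced by db.session.add(...), with a single db.session.commit() after the loops.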

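For r, which is 0d (check r.shape), you can select items by number, or iterate:

r[1]  # == r[r.dtype.names[1]]
for i in r: print(i)

For t, which is 1d, this does not work; t[1] refers to an item (a whole record), not to field f1.

A 1d structured array behaves somewhat like a 2d array, but not exactly. The usual talk of row and column has to be replaced with row (or item) and field.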
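
A small sketch of that difference, reusing the sample array from above (nothing here is specific to genfromtxt):

```python
import numpy as np

t = np.ones((5,), dtype="i4,i4,f8,S3")

print(t.shape)      # (5,): one axis holding five records
print(t.T.shape)    # (5,): transposing a 1d array changes nothing
print(t[1])         # indexing by number gives one whole record
print(t["f1"])      # indexing by name gives one whole field
print(t[1]["f2"])   # record first, then field, gives a single value

# 2d-style indexing is not available on a 1d structured array.
try:
    t[:, 1]
except IndexError as exc:
    print("t[:, 1] fails:", exc)
```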

Make a t that might be closer to your case:

In [175]: txt=[b'one-1, two-23, three-12',b'four-ab, five-ss, six-ss']

In [176]: t=np.genfromtxt(txt,dtype=[(np.str_,20),(np.str_,20),(np.str_,20)])

In [177]: t
Out[177]: 
array([('one-1,', 'two-23,', 'three-12'),
       ('four-ab,', 'five-ss,', 'six-ss')], 
      dtype=[('f0', '<U20'), ('f1', '<U20'), ('f2', '<U20')])

np.char has string functions that can be applied to arrays:

In [178]: np.char.split(t['f0'],'-')
Out[178]: array([['one', '1,'], ['four', 'ab,']], dtype=object)

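It does not work on the structured array as a whole, but it does work on individual fields. That output can be indexed as a list of lists (it is not 2d).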
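
To illustrate the "list of lists" point, here is a minimal sketch. The f0 array is a made-up stand-in for a single field such as t['f0'], and the conversion to a 2d array at the end assumes every value splits into the same number of pieces.

```python
import numpy as np

f0 = np.array(["one-1", "four-ab"])   # stand-in for a single field such as t['f0']
parts = np.char.split(f0, "-")

print(parts.shape)    # (2,): a 1d object array whose elements are Python lists
print(parts[0])       # the pieces of the first value
print(parts[0][1])    # chained indexing works; parts[0, 1] would raise IndexError

# Only valid when every row splits into the same number of pieces.
parts2d = np.array(parts.tolist())
print(parts2d.shape)  # (2, 2)
```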

【Comments】: