Loading a table in numpy with row- and column-indices, like in R? Answers

【Question title】: Loading a table in numpy with row- and column-indices, like in R?
【Posted】: 2014-02-07 09:37:39
【Question description】:

I'd like to load a table in numpy so that the first row and first column are treated as text labels. Something equivalent to this R code:

read.table("filename.txt", header=TRUE, row.names=1)

The file is a delimited text file like this:

   A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5

So that, once read in, I would have an array:

[[5,4,3,2],
 [1,0,9,9],
 [8,7,6,5]]

together with some way of getting at the row names ["X","Y","Z"] and the column names ["A","B","C","D"].

Is there such a class or mechanism?

【Comments】:

See example 2 in the documentation for numpy.loadtxt.
Does it have to be native numpy, or is pandas allowed? You've tagged this matplotlib.

Tags: python numpy header row-number indices

【Solution 1】:

Numpy arrays aren't really suited to table-like structures. pandas.DataFrames, however, are.

For what you want, use pandas.

For your example, you would do

data = pandas.read_csv('filename.txt', sep=r'\s+', index_col=0)

As a more complete example (using StringIO to simulate your file):

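from io import StringIO  # Python 3 (Python 2 used `from StringIO import StringIO`)
import pandas as pd

# The header row has one field fewer than the data rows
f = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
x = pd.read_csv(f, sep=r'\s+', index_col=0)

print('The DataFrame:')
print(x)

print('Selecting a column')
print(x['D'])  # or "x.D" if there aren't spaces in the name

print('Selecting a row')
print(x.loc['Y'])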

This yields:

The DataFrame:
   A  B  C  D
X  5  4  3  2
Y  1  0  9  9
Z  8  7  6  5
Selecting a column
X    2
Y    9
Z    5
Name: D, dtype: int64
Selecting a row
A    1
B    0
C    9
D    9
Name: Y, dtype: int64
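
The question also asked for a way to get at the row and column names themselves. A minimal sketch, assuming the same sample table as above and a reasonably recent pandas (the variable names `row_names`, `col_names` and `cell` are only illustrative):

```python
from io import StringIO

import pandas as pd

# Same sample table as above; the header row has one field fewer than the data rows.
f = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")
x = pd.read_csv(f, sep=r"\s+", index_col=0)

row_names = x.index.tolist()    # labels taken from the first column
col_names = x.columns.tolist()  # labels taken from the header row
cell = x.loc["Y", "C"]          # look up a single value by row and column label

print(row_names)
print(col_names)
print(cell)
```

Label-based lookup with `.loc` is probably the closest pandas equivalent of indexing an R data frame by row and column name.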

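Also, as @DSM pointed out, if you do need a "raw" numpy array, it's very useful to know about things like DataFrame.values or DataFrame.to_records(). (pandas is built on top of numpy. In a simple, non-strict sense, each column of a DataFrame is stored as a 1D numpy array.)

For example: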

In [2]: x.values
Out[2]:
array([[5, 4, 3, 2],
       [1, 0, 9, 9],
       [8, 7, 6, 5]])

In [3]: x.to_records()
Out[3]:
rec.array([('X', 5, 4, 3, 2), ('Y', 1, 0, 9, 9), ('Z', 8, 7, 6, 5)],
      dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8'), ('D', '<i8')])
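
As a rough sketch of how these conversions might be used afterwards: the snippet below builds the same DataFrame by hand (an assumption made only to keep it self-contained) and uses `DataFrame.to_numpy()`, which newer pandas versions offer alongside `.values`. The names `arr`, `rec`, `col_a` and `labels` are illustrative.

```python
import pandas as pd

# Hand-built copy of the example table, so the snippet runs on its own.
x = pd.DataFrame(
    [[5, 4, 3, 2], [1, 0, 9, 9], [8, 7, 6, 5]],
    index=["X", "Y", "Z"],
    columns=["A", "B", "C", "D"],
)

arr = x.to_numpy()    # plain 2-D ndarray; the row and column labels are dropped
rec = x.to_records()  # record array: one named field per column, plus 'index'

col_a = rec["A"]       # access a column by field name
labels = rec["index"]  # the row labels travel along as their own field

print(arr.shape)
print(col_a)
print(labels)
```

The plain ndarray is the better fit for numerical work, while the record array keeps the labels at the cost of being a 1-D array of records rather than a 2-D matrix.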

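【Comments】:

And then you can use data.values to get an ndarray, or data.to_records() to get a recarray, and so on (although IMHO numpy's structured arrays are just useful enough to tempt you into trying to do more with them than they're really designed for...)
@DSM - Good point! (And I completely agree about structured arrays. "...just useful enough to tempt you into..." is a rather apt quote!)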
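
If pandas is not an option, the numpy.loadtxt route mentioned in the comments on the question could look roughly like the sketch below. It is not taken from the thread: it assumes whitespace-separated fields, integer values, and labels without spaces, and it keeps the labels in ordinary Python lists next to a plain array. The names `col_names`, `row_names`, `values` and `lookup` are hypothetical.

```python
from io import StringIO

import numpy as np

# Stand-in for the file from the question.
text = StringIO("""   A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")

col_names = text.readline().split()  # consume the header line: column labels
raw = np.loadtxt(text, dtype=str)    # remaining rows, read as strings
row_names = raw[:, 0].tolist()       # first column: row labels
values = raw[:, 1:].astype(int)      # the rest: numeric data


def lookup(row, col):
    """Return the value at the given row and column labels (illustrative helper)."""
    return values[row_names.index(row), col_names.index(col)]


print(values)
print(lookup("Y", "C"))
```

This keeps the data as an ordinary 2-D ndarray, but every label lookup has to be done by hand, which is the bookkeeping a DataFrame handles for you.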