How to use the name of my arrays as filenames?

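【Question title】: How to use the name of my arrays as filenames?
【Posted】: 2018-01-30 07:24:39
【Question】:

My code does some math and saves the output in multiple NumPy arrays.

At the end I write the output to disk, and I want to use the names of the arrays as the individual filenames that each array is written to.

For example, if I have the following multidimensional arrays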

time = [...]
force = [...]
pressure = [...]
energy = [...]

and so on, I would like to do something like

for array in [time, force, pressure, energy, ....]:
    # filename should be the name of the array; pickle needs binary mode
    with open(filename, 'wb') as file:
        pickle.dump(array, file)

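But how do I set the filename so that it takes the array's name?

I went through many similar questions (though they were asked with other motives). The answers say that array (or any variable) names are just labels and are not meant to be retrieved like this. But my motivation here, naming files, seems like a genuine need (at least to me), hence the question. If possible, I would prefer to write in HDF5 format and use the array names as the different datasets. All of this can be done manually, but then why do we code?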

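【Question discussion】:

In which format do you want to save the files? Do you mean that the array name should be the filename and the array elements the file content?
The file format doesn't matter. I am pickling right now, but could move to something else later. But yes, I need the array name as the filename and the array elements as the data in the file. Thanks.

Tags: python arrays python-3.x numpy file-io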

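【Solution 1】:

If I make a list from a set of variables, I can't retrieve the names of those variables. I can only retrieve the objects that the variables reference.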

In [324]: x = np.arange(3)
In [325]: y = np.ones((3,3))
In [326]: alist = [x,y]
In [327]: alist
Out[327]: 
[array([0, 1, 2]), array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])]
In [328]: id(x)
Out[328]: 2851921416
In [329]: id(alist[0])
Out[329]: 2851921416

alist[0] does not reference the variable name 'x' in any way.

A dictionary is a better way to associate names or strings with objects:

In [331]: adict = {'x':x, 'y':y}
In [332]: adict['x']
Out[332]: array([0, 1, 2])

With such a dictionary I can save the arrays with savez:

In [334]: np.savez('temp', **adict)
In [336]: d = np.load('temp.npz')
In [337]: list(d.keys())
Out[337]: ['y', 'x']

The npz archive contains two files, named after the dictionary keys:

In [340]: !unzip -l temp.npz
Archive:  temp.npz
  Length      Date    Time    Name
---------  ---------- -----   ----
      200  2018-01-29 23:58   y.npy
      140  2018-01-29 23:58   x.npy
---------                     -------
      340                     2 files
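
If separate files are wanted rather than one archive, the same dictionary can drive a loop in which each key becomes a filename. A minimal sketch, assuming NumPy is installed; the helper name save_each and the temporary output directory are made up for illustration:

```python
import os
import tempfile

import numpy as np


def save_each(arrays, directory):
    """Write every array to <directory>/<key>.npy and return the paths."""
    paths = {}
    for name, array in arrays.items():
        path = os.path.join(directory, name + ".npy")
        np.save(path, array)  # one .npy file per dictionary key
        paths[name] = path
    return paths


arrays = {"x": np.arange(3), "y": np.ones((3, 3))}
out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = save_each(arrays, out_dir)

# Load the files back, keyed by the same names.
restored = {name: np.load(path) for name, path in paths.items()}
```

The names are typed once, as dictionary keys, and reused for both writing and reading.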

This dictionary would also be useful when creating HDF5 datasets.
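
For the HDF5 case, a rough sketch of how the dictionary keys could become dataset names. This assumes the third-party h5py package is installed; the file name results.h5 and the sample arrays are placeholders, not part of the original question:

```python
import os
import tempfile

import h5py
import numpy as np

arrays = {"time": np.linspace(0.0, 1.0, 5), "force": np.ones((2, 3))}
path = os.path.join(tempfile.mkdtemp(), "results.h5")  # hypothetical file name

# Write: each dictionary key becomes the name of one dataset.
with h5py.File(path, "w") as f:
    for name, array in arrays.items():
        f.create_dataset(name, data=array)

# Read back: the dataset names are the keys of the file object.
with h5py.File(path, "r") as f:
    names = sorted(f.keys())
    restored = {name: f[name][...] for name in names}
```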

Some examples of saving/loading variables (and dictionaries) with pickle:

How to load/view structure of pickled object in Ipython console ? (Windows 7, Spyder, Ipython console)

Here is an attempt to save and load a workspace (or part of one), as is commonly done with MATLAB:

IPython loading variables to workspace: can you think of a better solution than this?

IPython: how to automagically load npz file and assign values to variables?

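【Discussion】:

This looks like something I can use. I will have to modify my main math routine and write the NumPy arrays into a dictionary instead of separate variables. Do you think this will have an (additional) impact on memory usage and speed? My data is large: the arrays are multidimensional and the file sizes already easily reach 1.5 GB.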
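
On the memory question: a dictionary stores references to the arrays, not copies, so collecting existing arrays in one should add only the small overhead of the dictionary itself. A quick check, assuming NumPy; the array here is just a stand-in for a large result:

```python
import numpy as np

pressure = np.zeros((1000, 1000))  # stand-in for a large result array
results = {"pressure": pressure}   # stores a reference, no data is copied

same_object = results["pressure"] is pressure
shares_buffer = np.shares_memory(results["pressure"], pressure)

# Writing through the dictionary changes the original array too.
results["pressure"][0, 0] = 42.0
```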

【Solution 2】:

You can use numpy.dtype.names. Here is an example.

# inputs
In [196]: A
Out[196]: 
array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])

In [197]: B
Out[197]: 
array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4]])

# their dtype
In [198]: A.dtype, B.dtype
Out[198]: (dtype('int64'), dtype('int64'))

# their size
In [199]: A.size, B.size
Out[199]: (16, 16)

# store it as a list of tuples
In [200]: dt = np.dtype([('A', A.dtype, A.size), ('B', B.dtype, B.size)])

# get all arrays
In [201]: dt.names
Out[201]: ('A', 'B')


In [202]: dt['A']
Out[202]: dtype(('<i8', (16,)))

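You can also omit variable.size. Note that the size is not inferred in that case: each field becomes a single scalar of the given dtype. The field names are still available.

In [233]: dt = np.dtype([('A', A.dtype), ('B', B.dtype)])

# no shape given, so each field is one int64: 2 * 8 = 16 bytes
In [234]: dt.itemsize
Out[234]: 16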

In [235]: dt.names
Out[235]: ('A', 'B')

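【Discussion】:

This seems to work. But don't you think that typing the line dt = np.dtype([('A', A.dtype, A.size), ('B', B.dtype, B.size)]) for all of my arrays would take more space than typing their names by hand into a list like ['time', 'force', 'pressure', 'energy', ...]?
@nsk I see your point :) Actually, you can leave out the size. However, I feel this is a bit cleaner than putting all the names as strings in a list.
Yes, but if I drop the *.dtype and np.dtype parts, I am left with the list of strings. :P Thanks for the answer. I learned something new.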

【Solution 3】:

I wouldn't do it that way at all.

I would do this:

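import pickle

time = [...]
force = [...]
pressure = [...]
energy = [...]

# map each filename to the array that should be written to it
file_data = {'time': time, 'force': force, 'pressure': pressure, 'energy': energy}
for filename, array in file_data.items():
    with open(filename, 'wb') as file:  # pickle writes bytes, so open in binary mode
        pickle.dump(array, file)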

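Before Python 3.7 this is not guaranteed to iterate in insertion order (CPython 3.6 preserves it only as an implementation detail), but I don't think order matters in this case.

If order does matter, I would do this: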

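# a list of (filename, array) pairs keeps the order on every Python version
file_data = [('time', time), ('force', force), ('pressure', pressure), ('energy', energy)]
for filename, array in file_data:
    with open(filename, 'wb') as file:  # binary mode is required by pickle
        pickle.dump(array, file)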
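
A self-contained round trip of the same idea may help, including reading the files back and recovering each name from its filename. This is only a sketch: plain lists stand in for the real arrays, and the .pkl extension and the temporary directory are assumptions added for the example:

```python
import os
import pickle
import tempfile

# Plain lists stand in for the real arrays.
file_data = [("time", [0.0, 0.1, 0.2]), ("force", [1.5, 2.5, 3.5])]
out_dir = tempfile.mkdtemp()  # hypothetical output directory

for name, array in file_data:
    with open(os.path.join(out_dir, name + ".pkl"), "wb") as file:
        pickle.dump(array, file)

# Load everything back, recovering each name from its filename.
loaded = {}
for filename in sorted(os.listdir(out_dir)):
    name, _ext = os.path.splitext(filename)
    with open(os.path.join(out_dir, filename), "rb") as file:
        loaded[name] = pickle.load(file)
```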

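【Discussion】:

This is what @hpaulj suggested, and I think I will do it this way. But there should also be a direct way to retrieve the names, because when the for loop reads an array it first reads it by name and then looks up its value.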
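
One reason there is no direct way back: lookup only goes from a name to an object, and the same object can be bound to several names (or to none, as with a list element). A small sketch of that ambiguity; the helper names_bound_to is made up for illustration:

```python
def names_bound_to(obj, namespace):
    """Return every name in `namespace` that is bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)


def demo():
    time = [0.0, 0.1, 0.2]
    elapsed = time  # a second name for the same list, not a copy
    return names_bound_to(time, locals())


found = demo()
print(found)  # ['elapsed', 'time']
```

With two names bound to one object, there is no single "name of the array" to use as a filename, which is why the answers keep the names as explicit strings.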

【Solution 4】:

Local variables can be fetched by name, though it is usually not the best idea. But if you need it:

Code:

locals()[var_name]

Test code:

x = 1
y = 2
z = 3
for var_name in ('x', 'y', 'z'):
    print(locals()[var_name])

Results:

1
2
3

Local example:

So, applying this to your example:

for array_name in ['time', 'force', 'pressure', 'energy', ....]:
    with open(array_name, 'wb') as file:  # binary mode is required by pickle
        pickle.dump(locals()[array_name], file)
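
Because the names are typed as strings anyway, the name and the value can be paired in one step, which gives both the filename and the data. A sketch under that assumption; collect is a hypothetical helper, and locals() is only read here, never written to:

```python
def collect(names, namespace):
    """Build {name: object} for the given variable names."""
    return {name: namespace[name] for name in names}


def run():
    time = [0.0, 0.1]
    force = [9.8, 9.9]
    scratch = "not saved"  # not listed below, so it is left out
    return collect(["time", "force"], locals())


file_data = run()
for name, array in file_data.items():
    print(name, array)
```

The resulting dictionary is the same name-to-array mapping that the other answers build by hand.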

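【Discussion】:

I did come across locals() while searching for an answer, but I was looking for a more elegant solution if possible. Anyway, in your test code I would want to retrieve x, y, z rather than 1, 2, 3. I guess some tweaking is needed?
Maybe I am missing something in your code. But I need the variable "names" for the filenames, not their "values". That is the problem. Your code gives the same result as this: for i in (x, y, z): print(i)
I changed the example into something closer to your code.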