Forcing datatype for hdf5 file using h5py: Answers

【Question title】: Forcing datatype for hdf5 file using h5py
【Posted】: 2016-03-29 01:49:50
【Question】:

I have a csv file that contains "Date", "Time", and about 10 other columns:

Date,Time,C
20020515,123000000,10293
20020515,160000000,10287
20020516,111800000,10270
20020516,160000000,10260
20020517,130500000,10349
20020517,160000000,10276
20020520,123700000,10313
20020520,160000000,10258
20020521,114500000,10223

I am trying to load this into an hdf5 file with Date and Time stored as strings rather than int32. This is what I am doing:

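import h5py
import numpy as np

# dtype=None lets genfromtxt infer each column's type; names=True reads the header row
my_data = np.genfromtxt("/tmp/data.txt", delimiter=",", dtype=None, names=True)
myFile = "/tmp/data.h5"
with h5py.File(myFile, "a") as f:
    dset = f.create_dataset('foo', data=my_data)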

I want "Date" and "Time" to be stored in the HDF5 file as strings, not as Int32.

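【Comments】:

I don't think that's possible. According to the docs: "Datasets are very similar to NumPy arrays. They are homogenous collections of data elements, with an immutable datatype and (hyper)rectangular shape." That means all columns must have the same dtype.
Do you want to change how the data is stored in the HDF5 file, or do you want to be able to convert those columns to strings after reading the file?
I want to change how I store the data. I want to store them as strings instead of integers.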

Tags: python numpy hdf5 h5py

【Solution 1】:

A simple solution is to change the dtype of my_data before writing it to the file:

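# The sample Time values have 9 digits, so 'S8' would cut off the last one; use 'S9'
newtype = np.dtype([('Date', 'S8'), ('Time', 'S9'), ('C', '<i8')])
# f is the open h5py.File from the question
dset2 = f.create_dataset('foo2', data=my_data.astype(newtype))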
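The cast itself can be tried without touching HDF5 at all. The following is a minimal sketch, assuming a few rows copied from the question are fed to genfromtxt through an in-memory buffer instead of /tmp/data.txt. The field widths (8 bytes for Date, 9 for Time) are an assumption based on the sample values; a width that is too narrow appears to truncate the digits silently, so it is worth checking against the real data.

```python
import io

import numpy as np

# A few rows copied from the question; the real data would come from a file.
csv_text = """Date,Time,C
20020515,123000000,10293
20020515,160000000,10287
20020516,111800000,10270
"""

# dtype=None lets genfromtxt infer a type per column; here all three are integers.
my_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=None, names=True)
print(my_data.dtype)

# Target layout: fixed-width byte strings for Date and Time, an integer for C.
# Widths are an assumption taken from the sample (8 digits for Date, 9 for Time).
newtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

# astype converts the structured array field by field.
converted = my_data.astype(newtype)
print(converted.dtype)
print(converted["Date"][0], converted["Time"][0], converted["C"][0])
```

Printing the dtype before and after the cast makes it easy to confirm that only Date and Time changed type.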

You could also create an empty dataset by passing the appropriate dtype= and shape= arguments to f.create_dataset, then fill it with the values from my_data:

dset3 = f.create_dataset('foo3', shape=my_data.shape, dtype=newtype)
dset3[:] = my_data.astype(newtype)
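To check what actually ends up in the file, the dataset can be read back and its dtype inspected. This is a rough sketch, not the answer's own code: the integer array stands in for the genfromtxt result, the file name demo.h5 is a placeholder, and the in-memory "core" driver is used only so that nothing is written to disk. The 'S9' width for Time is an assumption based on the sample values.

```python
import h5py
import numpy as np

# Stand-in for the array genfromtxt returns: every column is an integer.
int_type = np.dtype([("Date", "<i8"), ("Time", "<i8"), ("C", "<i8")])
my_data = np.array(
    [(20020515, 123000000, 10293), (20020515, 160000000, 10287)],
    dtype=int_type,
)

# Assumed target layout: byte strings for Date and Time, integer for C.
newtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # Create the empty dataset first, then fill it with the cast values.
    dset3 = f.create_dataset("foo3", shape=my_data.shape, dtype=newtype)
    dset3[:] = my_data.astype(newtype)

    stored_dtype = dset3.dtype  # dtype as reported by the dataset
    read_back = dset3[:]        # copy the contents into a NumPy array

print(stored_dtype)
print(read_back)
```

If the cast worked, the Date and Time fields of read_back hold byte strings such as b'20020515' rather than integers.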

Note that I still had to cast my_data to newtype before writing; h5py does not seem able to handle the type conversion itself:

In [15]: dset3[:] = my_data
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-15-6e62dae3d59a> in <module>()
----> 1 dset3[:] = my_data

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

/home/alistair/.venvs/core3/lib/python3.4/site-packages/h5py/_hl/dataset.py in __setitem__(self, args, val)
    584         mspace = h5s.create_simple(mshape_pad, (h5s.UNLIMITED,)*len(mshape_pad))
    585         for fspace in selection.broadcast(mshape):
--> 586             self.id.write(mspace, fspace, val, mtype)
    587 
    588     def read_direct(self, dest, source_sel=None, dest_sel=None):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

h5py/h5d.pyx in h5py.h5d.DatasetID.write (/tmp/pip-build-aayglkf0/h5py/h5py/h5d.c:3421)()

h5py/_proxy.pyx in h5py._proxy.dset_rw (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1794)()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dwrite (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1501)()

OSError: Can't prepare for writing data (No appropriate function for conversion path)
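One way to avoid forgetting the cast is to wrap the write in a small function that always converts to the dataset's own dtype first. The sketch below is only an illustration: write_cast is a hypothetical helper name, not part of h5py, and the in-memory file and sample rows are assumptions made to keep the example self-contained.

```python
import h5py
import numpy as np


def write_cast(dset, data):
    """Hypothetical helper: cast data to the dataset's dtype, then write it."""
    dset[...] = data.astype(dset.dtype)


# Stand-in for the integer-typed array produced by genfromtxt.
my_data = np.array(
    [(20020515, 123000000, 10293), (20020516, 111800000, 10270)],
    dtype=[("Date", "<i8"), ("Time", "<i8"), ("C", "<i8")],
)

# Assumed target layout; widths are taken from the sample values.
newtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

# In-memory file so the sketch leaves nothing on disk; the name is a placeholder.
with h5py.File("helper_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("foo3", shape=my_data.shape, dtype=newtype)
    write_cast(dset, my_data)  # integer input, cast inside the helper
    result = dset[:]

print(result)
```

Because the helper reads the target type from dset.dtype, the caller does not need to keep newtype around after the dataset has been created.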

【Discussion】: