【Question Title】: Forcing datatype for hdf5 file using h5py
【Posted】: 2016-03-29 01:49:50
【Question】:

I have a CSV file containing "Date", "Time", and other columns (10 or so), for example:

Date,Time,C
20020515,123000000,10293
20020515,160000000,10287
20020516,111800000,10270
20020516,160000000,10260
20020517,130500000,10349
20020517,160000000,10276
20020520,123700000,10313
20020520,160000000,10258
20020521,114500000,10223

I am trying to load it into an HDF5 file with Date and Time typed as strings rather than 32-bit integers. This is what I am doing:

import h5py
import numpy as np

my_data = np.genfromtxt("/tmp/data.txt", delimiter=",", dtype=None, names=True)
myFile = "/tmp/data.h5"
with h5py.File(myFile, "a") as f:
    dset = f.create_dataset('foo', data=my_data)

I want "Date" and "Time" stored as strings in the HDF5 file, not as Int32.
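If the goal is for Date and Time to be strings from the moment the CSV is parsed, one option is to pass an explicit structured dtype to np.genfromtxt instead of dtype=None. This is a minimal sketch of that alternative, not the asker's code: it assumes only the three columns shown in the sample, uses io.StringIO in place of /tmp/data.txt, and picks the widths 'S8' and 'S9' from the sample values (Time has nine digits). Fixed-length byte strings ('S') are used rather than NumPy's 'U' type because h5py does not store 'U' arrays directly.

```python
from io import StringIO

import numpy as np

# A few rows from the sample CSV; StringIO stands in for /tmp/data.txt (assumption).
csv_text = StringIO(
    "Date,Time,C\n"
    "20020515,123000000,10293\n"
    "20020515,160000000,10287\n"
    "20020516,111800000,10270\n"
)

# Column types declared up front: Date and Time as fixed-length byte strings.
# 'S9' is needed for Time because values like 123000000 have nine digits.
coltypes = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

# skip_header=1 skips the header row; the field names come from coltypes instead.
my_data = np.genfromtxt(csv_text, delimiter=",", dtype=coltypes, skip_header=1)

print(my_data["Date"], my_data["Time"])
```

An array parsed this way can be handed to f.create_dataset(..., data=my_data) without a separate cast.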

【Comments】:

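  • I don't think it's possible. According to the docs, "Datasets are very similar to NumPy arrays. They are homogenous collections of data elements, with an immutable datatype and (hyper)rectangular shape." That means all columns must have the same dtype.
  • Do you want to change how the data is stored in the HDF5 file, or do you want to be able to convert these columns to strings after reading them from the file?
  • I want to change how I store the data. I want to store them as strings instead of integers.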
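The docs quote in the first comment is about the dataset as a whole: a dataset has one dtype, but that dtype can be a NumPy structured dtype, which h5py stores as an HDF5 compound type whose fields each have their own type. A small NumPy-only illustration, with field widths taken from the sample data:

```python
import numpy as np

# One structured dtype with three differently typed fields.
rowtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

# Every element of the array shares that single dtype...
rows = np.zeros(2, dtype=rowtype)
rows[0] = (b"20020515", b"123000000", 10293)

# ...while each field keeps its own type.
for name in rowtype.names:
    print(name, rowtype[name])
```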

Tags: python numpy hdf5 h5py


【Solution 1】:

A simple solution is to change the dtype of my_data before writing it to the file:

# Time values such as 123000000 have nine digits, so Time needs 'S9', not 'S8'
newtype = np.dtype([('Date', 'S8'), ('Time', 'S9'), ('C', '<i8')])
dset2 = f.create_dataset('foo2', data=my_data.astype(newtype))
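Hard-coding the string widths is easy to get wrong, since an 'S8' field is one byte too short for the nine-digit Time values. A minimal sketch of deriving the widths from the data instead; string_fields_dtype is a hypothetical helper, not part of NumPy or h5py, and it assumes the columns being converted hold non-empty integer data:

```python
import numpy as np


def string_fields_dtype(arr, fields):
    """Hypothetical helper: copy arr's structured dtype, replacing each field
    named in `fields` with a fixed-length byte string ('S<n>') wide enough to
    hold the longest decimal representation in that column."""
    spec = []
    for name in arr.dtype.names:
        if name in fields:
            # max() raises ValueError on an empty array; acceptable for a sketch.
            width = max(len(str(v)) for v in arr[name])
            spec.append((name, f"S{width}"))
        else:
            spec.append((name, arr.dtype[name]))
    return np.dtype(spec)


# In-memory stand-in for the genfromtxt result (two sample rows).
my_data = np.array(
    [(20020515, 123000000, 10293), (20020516, 111800000, 10270)],
    dtype=[("Date", "<i8"), ("Time", "<i8"), ("C", "<i8")],
)

newtype = string_fields_dtype(my_data, {"Date", "Time"})
converted = my_data.astype(newtype)
print(newtype)
```

The resulting newtype can be used in either create_dataset call in place of the hand-written dtype.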

You can also create an empty dataset by passing the appropriate dtype= and shape= arguments to f.create_dataset, and then fill in the values from my_data:

dset3 = f.create_dataset('foo3', shape=my_data.shape, dtype=newtype)
dset3[:] = my_data.astype(newtype)
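A self-contained version of the create-then-fill approach that also reads the data back to check it. It assumes h5py and NumPy are installed, writes to a temporary file instead of /tmp/data.h5, and builds my_data in memory from three sample rows rather than calling genfromtxt:

```python
import os
import tempfile

import h5py
import numpy as np

# Three sample rows standing in for the genfromtxt result.
my_data = np.array(
    [(20020515, 123000000, 10293),
     (20020515, 160000000, 10287),
     (20020516, 111800000, 10270)],
    dtype=[("Date", "<i8"), ("Time", "<i8"), ("C", "<i8")],
)

# Target compound type: Date and Time as fixed-length byte strings.
newtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    with h5py.File(path, "w") as f:
        # Create an empty dataset with the desired shape and compound dtype...
        dset3 = f.create_dataset("foo3", shape=my_data.shape, dtype=newtype)
        # ...then fill it, doing the int -> string conversion in NumPy first.
        dset3[:] = my_data.astype(newtype)
    # Reopen read-only and read the whole dataset back.
    with h5py.File(path, "r") as f:
        back = f["foo3"][:]

print(back["Date"], back["Time"])
```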

Note that I still had to convert my_data to newtype before writing it; h5py does not seem to be able to handle the type conversion itself:

In [15]: dset3[:] = my_data
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-15-6e62dae3d59a> in <module>()
----> 1 dset3[:] = my_data

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

/home/alistair/.venvs/core3/lib/python3.4/site-packages/h5py/_hl/dataset.py in __setitem__(self, args, val)
    584         mspace = h5s.create_simple(mshape_pad, (h5s.UNLIMITED,)*len(mshape_pad))
    585         for fspace in selection.broadcast(mshape):
--> 586             self.id.write(mspace, fspace, val, mtype)
    587 
    588     def read_direct(self, dest, source_sel=None, dest_sel=None):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

h5py/h5d.pyx in h5py.h5d.DatasetID.write (/tmp/pip-build-aayglkf0/h5py/h5py/h5d.c:3421)()

h5py/_proxy.pyx in h5py._proxy.dset_rw (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1794)()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dwrite (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1501)()

OSError: Can't prepare for writing data (No appropriate function for conversion path)
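Since the conversion has to happen on the NumPy side, one option is to cast to whatever dtype the target dataset already has, so the target type is written down only once. write_with_cast is a hypothetical helper; it assumes the source array has the same field names, in the same order, as the dataset's compound type:

```python
import os
import tempfile

import h5py
import numpy as np


def write_with_cast(dset, data):
    """Hypothetical helper: cast `data` to the dataset's own dtype in NumPy,
    then write it, so h5py never has to convert int -> string itself."""
    dset[...] = data.astype(dset.dtype)


# Two sample rows with integer Date/Time, as genfromtxt(dtype=None) would give.
my_data = np.array(
    [(20020517, 130500000, 10349), (20020517, 160000000, 10276)],
    dtype=[("Date", "<i8"), ("Time", "<i8"), ("C", "<i8")],
)
newtype = np.dtype([("Date", "S8"), ("Time", "S9"), ("C", "<i8")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("foo3", shape=my_data.shape, dtype=newtype)
        write_with_cast(dset, my_data)  # instead of dset[:] = my_data
        stored = dset[...]  # read back while the file is still open
```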

【Discussion】:
