【Question title】: Tensorflow: convert .hdf5 to tfrecord
【Posted】: 2018-08-08 20:16:35
【Question】:

I have the COCO dataset in .h5 format. I need to convert it to .record (a TFRecord file) so that I can train my embedded_ssd_mobilenet with the Object Detection API. How should I do this?

【Question comments】:

    Tags: python tensorflow object-detection


    【Solution 1】:

    Have a look at this conversion script. It reads points and labels from an .hdf5 file and exports them to a TFRecords file of Example protos.

    【Comments】:

    • With this script I got: Writing object_detection/images/train/train.record Unique values in dataset 'train': {} Writing object_detection/images/test/val.record Unique values in dataset 'val': {}. The new train.record file is 0 KB in size! Something must be wrong.
    【Solution 2】:

    Here is a script I used to convert some data from hdf5 to tfrecord. You will obviously have to modify the column names.

    import h5py
    import os
    import tensorflow as tf
    
    CHUNK_SIZE = 5000
    COL_NAMES = {
        "index": lambda x: tf.train.Feature(int64_list=tf.train.Int64List(value=[x])),
        "vals": lambda x: tf.train.Feature(float_list=tf.train.FloatList(value=x.reshape(-1))),
        "vals_shape": lambda x: tf.train.Feature(int64_list=tf.train.Int64List(value=list(x.shape))),
    }
    DATASET_NAME = "data"
    
    def write_tfrecords(file_path, features_list):
        # tf.io.TFRecordWriter replaces tf.python_io.TFRecordWriter, which is not available in TensorFlow 2.x.
        with tf.io.TFRecordWriter(file_path) as writer:
            for features in features_list:
                writer.write(tf.train.Example(features=features).SerializeToString())
    
    def hdf5_row_to_features(hdf5_row):
        feature_dict = dict()
        for col_name in COL_NAMES.keys():
            # "vals_shape" is derived from "vals" below; it is assumed not to be a column of the HDF5 row.
            if col_name == "vals_shape":
                continue
            if col_name == "index" and hdf5_row["index"] % 100 == 0:
                print("index: %d" % hdf5_row["index"])
            feature_dict[col_name] = COL_NAMES[col_name](hdf5_row[col_name])
            if col_name == "vals":
                feature_dict["vals_shape"] = COL_NAMES["vals_shape"](hdf5_row[col_name])
        return tf.train.Features(feature=feature_dict)
    
    def convert_records(file_path):
        base_file_name = os.path.splitext(file_path)[0]
        tfrecord_file_name_template = "%s-%d.tfrecord"
        tfrecord_file_counter = 0

        # The context manager closes the HDF5 file even if the conversion fails.
        with h5py.File(file_path, "r") as hdf5_file:
            dataset = hdf5_file[DATASET_NAME]
            features_list = list()
            print("Dataset size: %d" % dataset.size)
            for index in range(dataset.size):
                if index % 100 == 0:
                    print("iteration index: %d" % index)
                features_list.append(hdf5_row_to_features(dataset[index]))

                # Write a full chunk of CHUNK_SIZE rows to its own file.
                if len(features_list) == CHUNK_SIZE:
                    write_tfrecords(
                        tfrecord_file_name_template % (base_file_name, tfrecord_file_counter),
                        features_list)
                    tfrecord_file_counter += 1
                    features_list = list()

            # Write the remainder. Testing the list itself avoids both dropping
            # the last rows and writing an empty final file.
            if features_list:
                write_tfrecords(
                    tfrecord_file_name_template % (base_file_name, tfrecord_file_counter),
                    features_list)
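
The way rows are split across output files can be checked without TensorFlow or h5py. The following is a minimal sketch, assuming the rows are cut into consecutive ranges of at most `chunk_size` rows and that the files follow the `%s-%d.tfrecord` naming used in the script. `plan_shards` is a hypothetical helper introduced for illustration; it is not part of the script.

```python
def plan_shards(base_file_name, num_rows, chunk_size):
    """Return one (file_name, start, stop) tuple per output file.

    start is inclusive and stop is exclusive, so each file holds the rows
    in range(start, stop). Hypothetical helper, for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    shards = []
    for counter, start in enumerate(range(0, num_rows, chunk_size)):
        # The last shard may be shorter than chunk_size.
        stop = min(start + chunk_size, num_rows)
        file_name = "%s-%d.tfrecord" % (base_file_name, counter)
        shards.append((file_name, start, stop))
    return shards


if __name__ == "__main__":
    for shard in plan_shards("data", 12, 5):
        print(shard)
```

Comparing such a plan against the files actually produced is a quick way to spot a missing or empty last file, which is the usual symptom of an off-by-one in the chunk boundary.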
    

    During development you can also inspect the data in a tfrecord file, to check that everything was written correctly, with this snippet:

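    def inspect_tf_record_file(file_path, result_chunking=1):
        count = 0
        # tf.compat.v1.io.tf_record_iterator is the compatibility alias of
        # tf.python_io.tf_record_iterator (deprecated in favor of tf.data.TFRecordDataset).
        for example in tf.compat.v1.io.tf_record_iterator(file_path):
            result = tf.train.Example.FromString(example)
            if count % result_chunking == 0:
                print("result: %s" % result)
            count += 1
        print("Total count: %d" % count)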
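
When the only question is whether a file contains any records at all (for example, a 0 KB output), the record count can also be obtained without TensorFlow. The sketch below assumes an uncompressed TFRecord file, in which each record is stored as a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. `count_tfrecord_records` is a hypothetical helper; it does not verify the CRCs and does not decode the `Example` protos.

```python
import struct


def count_tfrecord_records(file_path):
    """Count records in an uncompressed TFRecord file by walking its framing.

    Hypothetical helper, for illustration only. CRC fields are skipped,
    not verified, so corrupted payloads are not detected.
    """
    count = 0
    with open(file_path, "rb") as f:
        while True:
            # 8-byte little-endian length followed by a 4-byte CRC of the length.
            header = f.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            # Payload followed by a 4-byte CRC of the payload.
            body = f.read(length + 4)
            if len(body) < length + 4:
                raise ValueError("truncated record body")
            count += 1
    return count
```

A result of 0 means the writer was never given any examples, which points at the reading side (dataset name or column names) rather than at the TFRecord writing.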
    

    【Comments】:
