【Question title】: Alternative for ImageDataGenerator for custom dataset
【Posted】: 2019-01-23 23:25:45
【Question】:

Below is my csv file:

file,pt1,pt2,pt3,pt4,pt5,pt6
object/obj0.png,66.0335639098,39.0022736842,30.2270075188,36.4216781955,59.582075188,39.6474225564
object/obj0.png,66.0335639098,39.0022736842,30.2270075188,36.4216781955,59.582075188,39.6474225564
object/obj0.png,66.0335639098,39.0022736842,30.2270075188,36.4216781955,59.582075188,39.6474225564

How can I load these images and annotations to train my simple CNN?

I tried using "ImageDataGenerator" as shown below, but it did not help. Is there any other option?

train_datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

【Comments】:

    Tags: python csv keras annotations computer-vision


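    【Answer 1】:

    ImageDataGenerator objects let you yield data from numpy arrays or directly from directories. In the latter case, the labels are inferred automatically from the folder structure of the data: each class of images must live in its own folder. Whenever the label structure is more complex, as in your case, you can write your own custom generator instead. If you do, use Keras' Sequence object, which allows safe multiprocessing. The Keras website contains a boilerplate example. In your case, the code would look like this: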

    import math
    import random

    import numpy as np
    import pandas as pd
    from keras.utils import Sequence
    from keras.preprocessing.image import load_img, img_to_array

    class DataSequence(Sequence):

        def __init__(self, csv_path, batch_size, mode='train'):
            self.df = pd.read_csv(csv_path)  # read your csv file with pandas
            self.bsz = batch_size  # batch size
            self.mode = mode  # shuffle when in train mode

            # Keep the labels and the list of image locations in memory
            self.labels = self.df[['pt1', 'pt2', 'pt3', 'pt4', 'pt5', 'pt6']].values
            self.im_list = self.df['file'].tolist()
            self.on_epoch_end()  # build the initial sample order

        def __len__(self):
            # Number of batches to yield per epoch
            return int(math.ceil(len(self.df) / float(self.bsz)))

        def on_epoch_end(self):
            # Reshuffle the sample order after each epoch if in training mode
            self.indexes = list(range(len(self.im_list)))
            if self.mode == 'train':
                random.shuffle(self.indexes)

        def get_batch_labels(self, batch_idx):
            # Fetch the labels for the given sample indexes, shape (batch, 6)
            return self.labels[batch_idx]

        def get_batch_features(self, batch_idx):
            # Load the images for the given sample indexes
            # (assumes all images share the same size)
            return np.array([img_to_array(load_img(self.im_list[i])) for i in batch_idx])

        def __getitem__(self, idx):
            # Sample indexes that belong to batch number idx
            batch_idx = self.indexes[idx * self.bsz: (idx + 1) * self.bsz]
            batch_x = self.get_batch_features(batch_idx)
            batch_y = self.get_batch_labels(batch_idx)
            return batch_x, batch_y
    

    You can use this Sequence object to train your model with model.fit_generator():

    sequence = DataSequence('./path_to/csv_file.csv', batch_size)
    model.fit_generator(sequence, epochs=1, use_multiprocessing=True)
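The batching arithmetic in `__len__` and `__getitem__` can be checked without Keras or any images. The following is only a minimal sketch: `batch_indexes` is a hypothetical helper (not part of Keras) that mirrors the `ceil(n / batch_size)` batch count and the `idx * bsz : (idx + 1) * bsz` slicing used above, and the sample count of 10 and batch size of 4 are made-up values.

```python
import math
import random


def batch_indexes(n_samples, batch_size, shuffle=False, seed=None):
    """Return one list of sample indexes per batch, covering every sample once."""
    indexes = list(range(n_samples))
    if shuffle:
        # Shuffle the sample order, as a training-mode sequence would each epoch
        random.Random(seed).shuffle(indexes)
    # Same batch count as DataSequence.__len__
    n_batches = int(math.ceil(n_samples / float(batch_size)))
    # Same slicing as DataSequence.__getitem__
    return [indexes[b * batch_size:(b + 1) * batch_size] for b in range(n_batches)]


batches = batch_indexes(n_samples=10, batch_size=4)
print(len(batches))                # 3
print([len(b) for b in batches])   # [4, 4, 2]
```

Note that the last batch is smaller than the others whenever the number of samples is not a multiple of the batch size.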
    

    See also this related question

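    【Comments】:

    • I get the following error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 2 target samples. Is this loading the labels twice for each sample?
    • No, it should work fine. I'm not sure about the exact nature of your data or your model's input/output specification. Do you have one input tensor and one output tensor?
    • It is a multi-output model that will produce four x,y coordinate points.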
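For a model with several outputs, Keras expects the targets as a list with one array per output rather than a single array. Whether that is what triggers the error above is not established in this thread, so the following is only a sketch under assumptions: `split_targets` is a hypothetical helper, the label columns are assumed to be ordered as consecutive x,y pairs, and the numbers are rounded values from the csv in the question.

```python
import numpy as np


def split_targets(labels, coords_per_point=2):
    """Split a (batch, n_points * coords_per_point) array into a list
    with one (batch, coords_per_point) array per point."""
    labels = np.asarray(labels, dtype="float32")
    n_cols = labels.shape[1]
    if n_cols % coords_per_point != 0:
        raise ValueError("label columns do not divide evenly into points")
    return [labels[:, i:i + coords_per_point]
            for i in range(0, n_cols, coords_per_point)]


# Two identical rows, rounded from the question's csv (assumed x,y ordering)
batch_y = np.array([
    [66.03, 39.00, 30.23, 36.42, 59.58, 39.65],
    [66.03, 39.00, 30.23, 36.42, 59.58, 39.65],
])
targets = split_targets(batch_y)
print(len(targets), targets[0].shape)
```

If this matches the model, `__getitem__` could return `batch_x, split_targets(batch_y)`, with the list length equal to the number of model outputs.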