PIL Unidentified Image Error when running my CNN model.fit

【Question title】: PIL Unidentified Image Error when running my CNN model.fit
【Posted】: 2021-08-17 15:21:59
【Question】:

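New programmer here, so please forgive any missing detail or knowledge. Whenever I run my model.fit(), the first epoch gets partway through and then fails with the following error: PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at ...>. How can I skip the images that raise the error, or fix the problem altogether?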

Relevant code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_gen=ImageDataGenerator(rotation_range=30,
                            width_shift_range=20,
                            height_shift_range=20,
                            horizontal_flip=True,
                            rescale=1/255,
                            zoom_range=0.3,
                            fill_mode='nearest')
image_gen.flow_from_directory(trainpath)
Found 16418 images belonging to 120 classes.
<tensorflow.python.keras.preprocessing.image.DirectoryIterator at 0x7fad88ea1f40>
test_gen=ImageDataGenerator(rescale=1/255)
test_gen.flow_from_directory(testpath)
Found 2153 images belonging to 120 classes.
<tensorflow.python.keras.preprocessing.image.DirectoryIterator at 0x7fad88ea4ee0>
Model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, MaxPool2D, Flatten
model=Sequential()

model.add(Conv2D(filters=32, kernel_size=(3,3),input_shape=img_shape, activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(filters=64, kernel_size=(3,3),input_shape=img_shape, activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(filters=64, kernel_size=(3,3),input_shape=img_shape, activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Flatten())

model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(120, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_8 (Conv2D)            (None, 98, 98, 32)        896       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 49, 49, 32)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 47, 47, 64)        18496     
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 23, 23, 64)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 21, 21, 64)        36928     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 10, 10, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6400)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               819328    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 120)               15480     
=================================================================
Total params: 891,128
Trainable params: 891,128
Non-trainable params: 0
_________________________________________________________________
from tensorflow.keras.callbacks import EarlyStopping
earlystop=EarlyStopping(monitor='val_loss', patience=3)
batch_size=16
train_image_gen=image_gen.flow_from_directory(trainpath,
                                             target_size=img_shape[:2],
                                             color_mode='rgb',
                                             batch_size=batch_size,
                                             class_mode='categorical')
Found 16418 images belonging to 120 classes.
test_image_gen=image_gen.flow_from_directory(testpath,
                                             target_size=img_shape[:2],
                                             color_mode='rgb',
                                             batch_size=batch_size,
                                             class_mode='categorical',
                                            shuffle=False)
Found 2153 images belonging to 120 classes.
from PIL import Image
results=model.fit_generator(train_image_gen,
                            epochs=20,
                           validation_data=test_image_gen,
                           callbacks=[earlystop])
/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:1940: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  warnings.warn('`Model.fit_generator` is deprecated and '
Epoch 1/20
 109/1027 [==>...........................] - ETA: 2:18 - loss: 4.7821 - accuracy: 0.0110
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-219-155640c4966a> in <module>
----> 1 results=model.fit_generator(train_image_gen,
      2                             epochs=20,
      3                            validation_data=test_image_gen,
      4                            callbacks=[earlystop])

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1941                   'will be removed in a future version. '
   1942                   'Please use `Model.fit`, which supports generators.')
-> 1943     return self.fit(
   1944         generator,
   1945         steps_per_epoch=steps_per_epoch,

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1181                 _r=1):
   1182               callbacks.on_train_batch_begin(step)
-> 1183               tmp_logs = self.train_function(iterator)
   1184               if data_handler.should_sync:
   1185                 context.async_wait()

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    887 
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    890 
    891       new_tracing_count = self.experimental_get_tracing_count()

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    915       # In this case we have created variables on the first call, so we run the
    916       # defunned version which is guaranteed to never create variables.
--> 917       return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
    918     elif self._stateful_fn is not None:
    919       # Release the lock early so that multiple threads can perform the call

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   3021       (graph_function,
   3022        filtered_flat_args) = self._maybe_define_function(args, kwargs)
-> 3023     return graph_function._call_flat(
   3024         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
   3025 

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1958         and executing_eagerly):
   1959       # No tape is watching; skip to running the function.
-> 1960       return self._build_call_outputs(self._inference_function.call(
   1961           ctx, args, cancellation_manager=cancellation_manager))
   1962     forward_backward = self._select_forward_and_backward_functions(

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    589       with _InterpolateFunctionError(self):
    590         if cancellation_manager is None:
--> 591           outputs = execute.execute(
    592               str(self.signature.name),
    593               num_outputs=self._num_outputs,

~/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57   try:
     58     ctx.ensure_initialized()
---> 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

UnknownError:  UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fad89356680>
Traceback (most recent call last):

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 249, in __call__
    ret = func(*args)

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 645, in wrapper
    return func(*args, **kwargs)

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 961, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 837, in wrapped_generator
    for data in generator_fn():

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 963, in generator_fn
    yield x[i]

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/keras_preprocessing/image/iterator.py", line 65, in __getitem__
    return self._get_batches_of_transformed_samples(index_array)

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/keras_preprocessing/image/iterator.py", line 227, in _get_batches_of_transformed_samples
    img = load_img(filepaths[j],

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/keras_preprocessing/image/utils.py", line 114, in load_img
    img = pil_image.open(io.BytesIO(f.read()))

  File "/Users/liatkatz/opt/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 2967, in open
    raise UnidentifiedImageError(

PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fad89356680>


     [[{{node PyFunc}}]]
     [[IteratorGetNext]] [Op:__inference_train_function_3481]

Function call stack:
train_function

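Thanks in advance!

【Comments】:

Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot, not a link to an external portal). It contains other useful information.

Tags: python tensorflow keras conv-neural-network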

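【Solution 1】:

I ran into something similar a few days ago. I handled it by removing the images that caused the error, using exception handling to find them. Try adding exception handling to clean up your image dataset; it worked in my case, so it may work for you too.

I iterate over the whole image dataset and check for images that raise an error, then print their paths and remove them. In my case I had a person dataset and was extracting features; some images raised errors and crashed my program, so I cleaned the dataset this way and got my results. In your case, you can wrap the image open call in a try/except: if the image opens successfully it is fine, and if not, remove it. The code looks like this.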

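import os
from PIL import Image

# image_address: path of the image file to check
try:
    with Image.open(image_address) as img:
        img.verify()  # check file integrity without decoding the pixel data
except Exception:  # Pillow can raise several exception types for bad files
    print('Error occurred on ' + image_address)
    os.remove(image_address)  # remove the unreadable file, as described above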
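
Extending the snippet above to a whole dataset, a minimal sketch could walk the directory tree and collect every file Pillow fails to open or verify. The function name find_unreadable_images and its root_dir argument are hypothetical, not part of the original answer; Image.open and Image.verify are Pillow calls. Keep in mind that verify() checks file integrity without decoding the pixel data, so a truncated file may still pass.

```python
import os
from PIL import Image


def find_unreadable_images(root_dir):
    """Walk root_dir and return the paths Pillow cannot open or verify.

    Hypothetical helper: it only reports files, it does not delete them.
    """
    bad_paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    img.verify()  # integrity check, no pixel decoding
            except Exception as exc:  # Pillow raises several exception types
                print(f"Unreadable image: {path} ({exc})")
                bad_paths.append(path)
    return bad_paths
```

Calling find_unreadable_images(trainpath) with the question's trainpath would list the suspect files. Non-image files such as hidden system files are reported too, so it is worth reviewing the list before deleting anything with os.remove.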

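【Comments】:

Can you add some sample code?
Always put code, data and the full error message in the question (not in a comment) as text (not a screenshot, not a link).

【Solution 2】:

This problem comes up often, particularly when you use datasets prepared by other people, for example on Kaggle, where not much care was taken and the dataset contains bad image files. You used flow_from_directory. Note that its parameter list does not include a parameter called validate_filenames. flow_from_dataframe does have this parameter. The documentation states: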

validate_filenames: Boolean, whether to validate image filenames in x_col.
If True, invalid images will be ignored. 
Disabling this option can lead to speed-up in the execution of this function. Defaults to True.

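I don't know whether flow_from_directory checks the image files, but I expect it does not. So I think the first thing to try is flow_from_dataframe, to see whether it detects the bad image files and skips them. Below is code you can use to read in the images. I assume your directory structure looks like the one shown below.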

sdir
----train
    ---- class 0 directory
         ----- class 0 image 0
         ----- class 0 image 1
         -----
         ----- class 0 image N
    ---- class 1 directory
         ----- class 1 image 0
         -----
         ----- class 1 image M
    ----
    ---- class 119 directory
         ----- class 119 image 0
         ----- 
         ----- class 119 image K
---- test
    ---- class 0 directory
         ----- class 0 image 0
         ----- class 0 image 1
         -----
         ----- class 0 image N
    ---- class 1 directory
         ----- class 1 image 0
         -----
         ----- class 1 image M
    ----
    ---- class 119 directory
         ----- class 119 image 0
         ----- 
         ----- class 119 image K

import os
import pandas as pd
from sklearn.model_selection import train_test_split

def preprocess(sdir, trsplit):
    # build one dataframe of file paths and labels per top-level folder
    for category in ['train', 'test']:
        filepaths=[]
        labels=[]
        catpath=os.path.join(sdir, category)
        classlist=os.listdir(catpath)
        for klass in classlist:
            classpath=os.path.join(catpath,klass)
            flist=os.listdir(classpath)
            for f in flist:
                fpath=os.path.join(classpath,f)
                filepaths.append(fpath)
                labels.append(klass)
        Fseries=pd.Series(filepaths, name='filepaths')
        Lseries=pd.Series(labels, name='labels')
        if category == 'train':
            df=pd.concat([Fseries, Lseries], axis=1)
        else:
            test_df=pd.concat([Fseries, Lseries], axis=1)
    # split df into train_df and valid_df, stratified by label
    strat=df['labels']
    train_df, valid_df=train_test_split(df, train_size=trsplit, shuffle=True, random_state=123, stratify=strat)
    print('train_df length: ', len(train_df), '  test_df length: ',len(test_df), '  valid_df length: ', len(valid_df))
    print(train_df['labels'].value_counts())
    return train_df, test_df, valid_df

sdir=r'c:\sdir' # set this to the path for your directory
train_split=.9 # fraction (0 to 1) of the train folder used for training; the rest becomes the validation set
train_df, test_df, valid_df= preprocess(sdir, train_split)

Now you have 3 dataframes: train_df, valid_df and test_df. Now create the generators.

img_size = (224,224) # set this to your desired image size
channels=3 # for color images
image_shape=(img_size[0], img_size[1], channels)
batch_size= 32 # set this to your desired batch size
length=len(test_df)
# two lines of code below determines batch size for test_gen such that
# test_batch_size X test_steps = number of samples in the test_df.
# this ensures you  go through the test data exactly once.
test_batch_size=sorted([int(length/n) for n in range(1,length+1) if length % n ==0 and length/n<=80],reverse=True)[0]  
test_steps=int(length/test_batch_size)
print ( 'test batch size: ' ,test_batch_size, '  test steps: ', test_steps)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# augmentation is applied to the training data only
trgen=ImageDataGenerator(rotation_range=30,
                            width_shift_range=20,
                            height_shift_range=20,
                            horizontal_flip=True,
                            rescale=1/255,
                            zoom_range=0.3,
                            fill_mode='nearest')
tvgen=ImageDataGenerator(rescale=1/255)
train_gen=trgen.flow_from_dataframe( train_df, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
                                    color_mode='rgb', shuffle=True, batch_size=batch_size)
test_gen=tvgen.flow_from_dataframe( test_df, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
                                    color_mode='rgb', shuffle=False, batch_size=test_batch_size)

valid_gen=tvgen.flow_from_dataframe( valid_df, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
                                    color_mode='rgb', shuffle=True, batch_size=batch_size)
classes=list(train_gen.class_indices.keys())
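
The test batch size calculation above picks the largest divisor of the test set size that is at most 80. The sketch below restates it as a function (largest_divisor_batch_size is a hypothetical name, not from the answer) and runs it on two sizes: 2153, the test set size reported in the question, and 2000, an arbitrary value for comparison.

```python
def largest_divisor_batch_size(n_samples, max_batch=80):
    """Return the largest batch size <= max_batch that divides n_samples exactly.

    Returns None when n_samples is 0, since no batch size applies.
    """
    for size in range(min(max_batch, n_samples), 0, -1):
        if n_samples % size == 0:
            return size
    return None


for n in (2153, 2000):
    size = largest_divisor_batch_size(n)
    # print the sample count, the chosen batch size and the number of steps
    print(n, size, n // size)
```

Since 2153 is prime, its only divisor not exceeding 80 is 1, so the test generator would use a batch size of 1 and 2153 steps; for 2000 the result is a batch size of 80 and 25 steps.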

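When you run the generator code, if flow_from_dataframe detects bad image files it should print a warning, and the generator will not use those files. After this, create your model and run model.fit to see whether the problem is avoided. If not, let me know, because there are other things we can do to detect the bad files.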
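
One caveat: as far as I can tell from the keras_preprocessing source, validate_filenames only checks that each path exists and has an accepted extension, so it may not catch a file whose contents are not an image. A cheap pre-filter, sketched below with only the standard library, is to check the leading bytes of each file before building the dataframe. The names SIGNATURES, has_image_signature and drop_suspect_files are hypothetical, and the signature list assumes the dataset contains only JPEG, PNG, GIF or BMP files. This catches things like an HTML error page saved with a .jpg extension, but not a truncated image.

```python
# Leading bytes of a few common formats (assumption: the dataset only
# contains JPEG, PNG, GIF or BMP files; extend the tuple for other formats).
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a", b"GIF89a",  # GIF
    b"BM",                 # BMP
)


def has_image_signature(path):
    """Return True if the file starts with one of the known signatures."""
    try:
        with open(path, "rb") as fh:
            header = fh.read(8)
    except OSError:
        return False  # missing or unreadable files are treated as suspect
    return header.startswith(SIGNATURES)


def drop_suspect_files(filepaths, labels):
    """Keep only the (path, label) pairs whose file passes the header check."""
    kept = [(p, lab) for p, lab in zip(filepaths, labels) if has_image_signature(p)]
    kept_paths = [p for p, _ in kept]
    kept_labels = [lab for _, lab in kept]
    return kept_paths, kept_labels
```

In the preprocess function above, the filepaths and labels lists could be passed through drop_suspect_files before they are turned into pandas Series, so the bad rows never reach the generators.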

【Comments】: