Avoid loading an image dataset multiple times in the PyCharm IDE (load only once): answers

【Question title】: Avoid loading an image dataset in the PyCharm IDE multiple times (load only once)
【Posted】: 2023-03-12 08:25:01
【Question description】:

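I am working on an image classification problem with Keras/TensorFlow. Since I am using an IDE like PyCharm (I also use Jupyter Notebook), I would like to know whether there is a way to load the dataset from the directory only once, so that when I rerun the whole .py file it simply uses the images from the data that was already loaded.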

import os

import cv2
import numpy as np

labels = ['rugby', 'soccer']
img_size = 224

def get_data(data_dir):
    data = []
    for label in labels:
        path = os.path.join(data_dir, label)
        class_num = labels.index(label)
        for img in os.listdir(path):
            try:
                img_arr = cv2.imread(os.path.join(path, img))[..., ::-1]  # convert BGR to RGB format
                resized_arr = cv2.resize(img_arr, (img_size, img_size))  # resize image to the preferred size
                data.append([resized_arr, class_num])
            except Exception as e:
                # unreadable files make cv2.imread return None, which fails above; report and skip them
                print(e)
    # dtype=object is needed because each row pairs an image array with an int label
    return np.array(data, dtype=object)
Now we can easily fetch our train and validation data.


train = get_data('../input/traintestsports/Main/train')
val = get_data('../input/traintestsports/Main/test')

Every time get_data is called, it takes extra time to load the entire dataset again.

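【Comments】:

You should use a Sequence data generator to implement an efficient input pipeline. I am voting to close this question because it lacks detail. Please add more details to your question to have the close vote retracted.

Tags: python tensorflow machine-learning keras deep-learning

【Solution 1】: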

You can read each image with cv2.imread(), collect all the images into a single array, and use np.save() to write that array to a binary file in .npy format:

import cv2
import numpy as np

imgs = ['image1.png', 'image2.png', 'image3.png', 'image4.png']

# Map each str to cv2.imread, convert map object to list, and convert list to array
# (the images must all have the same shape to form a single numeric array)
arr = np.array(list(map(cv2.imread, imgs))) 

np.save('data.npy', arr)

When you want to access the data again, use np.load():

import numpy as np

arr = np.load('data.npy')
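
Putting the two snippets together, a cache-on-first-run helper might look like the sketch below. It is only an illustration, not part of the original answer: load_or_build, build_fn, and the cache file name are hypothetical, and it stores images and labels as two arrays in one .npz file via np.savez() instead of a single .npy file.

```python
import os

import numpy as np


def load_or_build(cache_path, build_fn):
    """Return (images, labels), reading them from cache_path when it exists.

    build_fn is a hypothetical zero-argument callable that does the slow
    work (e.g. looping over cv2.imread) and returns (images, labels).
    cache_path should end in ".npz", because np.savez appends that suffix
    to any other name and the existence check would then never succeed.
    """
    if os.path.exists(cache_path):
        # Fast path: the arrays were saved by an earlier run.
        with np.load(cache_path) as cached:
            return cached["images"], cached["labels"]

    # Slow path: build the arrays once, then save them for later runs.
    images, labels = build_fn()
    images, labels = np.asarray(images), np.asarray(labels)
    np.savez(cache_path, images=images, labels=labels)
    return images, labels
```

With a builder that wraps the loading loop from the question, something like train_x, train_y = load_or_build('train_cache.npz', build_train) would read the image folders on the first run only. The cache file has to be deleted by hand whenever the image folders change, since this sketch does not detect stale data.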

You can install cv2 (OpenCV) from the command prompt:

pip install opencv-python

and numpy along with it:

pip install numpy

If you have a more complex data type, you can use pickle.dump() to serialize your data and save it to a file:

import pickle

data = {"data": ['test', 1, 2, 3]} # Replace this with your dataset

with open("data.pickle", "wb") as f:
    pickle.dump(data, f)

When you want to access the data again, use pickle.load():

import pickle

with open("data.pickle", "rb") as f:
    data = pickle.load(f)

print(data)

Output:

{'data': ['test', 1, 2, 3]}

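The pickle module is part of the Python standard library.

【Comments】:

This does not answer the OP's question. It is more about how to read and write files in Python. Although the question is unclear, the OP mentions a classification problem with tensorflow / keras - so the answer should at least focus on an efficient data input pipeline using the tf.data API or a Sequence data generator in keras.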
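
For reference, the tf.data approach this comment alludes to might look roughly like the following. This is a sketch under assumptions, not a tested answer from the thread: it assumes a TensorFlow 2.x release that provides tf.keras.utils.image_dataset_from_directory, it reuses the rugby/soccer folder layout from the question, and make_cached_dataset and cache_file are hypothetical names.

```python
def make_cached_dataset(data_dir, cache_file, img_size=224, batch_size=32):
    """Build a tf.data pipeline that decodes the images once and caches them."""
    # Imported here so that defining the function does not require TensorFlow.
    import tensorflow as tf

    ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        class_names=["rugby", "soccer"],  # subfolder names, as in the question
        image_size=(img_size, img_size),
        batch_size=batch_size,
    )
    # cache(cache_file) stores the decoded batches on disk during the first
    # full pass over the dataset; later passes read the cached batches
    # instead of decoding the image files again.
    # prefetch lets data loading overlap with training.
    return ds.cache(cache_file).prefetch(tf.data.AUTOTUNE)
```

The returned dataset could be passed straight to model.fit(). Because the cache sits after the directory loader, the batches are replayed in the order of the first pass, so a shuffle step after cache() may be worth adding. Calling cache() with no argument keeps the data in memory instead, which only helps within a single process.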