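【Posted】: 2019-02-28 07:38:45
【Question】:
I have a folder containing 10,000 images in 3 subfolders, and each subfolder holds a different number of images. I want to import only a small portion of these images for training, with a limited size that I choose by hand each time I select a subset of the data. I already have this Python code: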
import os

import cv2
import numpy as np
import skimage.transform
from sklearn.model_selection import train_test_split
from tqdm import tqdm

# This folder contains 10,000 images and 3 subfolders; each folder contains a different number of images
train_dir = 'folder/train/'
# imageSize (the target side length in pixels) is assumed to be defined elsewhere

def get_data(folder):
    """
    Load the data and labels from the given folder.
    """
    X = []
    y = []
    for folderName in os.listdir(folder):
        if not folderName.startswith('.'):
            # Map each subfolder name to an integer class label
            if folderName in ['Name1']:
                label = 0
            elif folderName in ['Name2']:
                label = 1
            elif folderName in ['Name3']:
                label = 2
            else:
                label = 4
            for image_filename in tqdm(os.listdir(folder + folderName)):
                img_file = cv2.imread(folder + folderName + '/' + image_filename)
                # cv2.imread returns None for files it cannot decode
                if img_file is not None:
                    img_file = skimage.transform.resize(img_file, (imageSize, imageSize, 1))
                    img_arr = np.asarray(img_file)
                    X.append(img_arr)
                    y.append(label)
    X = np.asarray(X)  # Keras only accepts data as numpy arrays
    y = np.asarray(y)
    return X, y

X_test, y_test = get_data(train_dir)
X_train, X_test, y_train, y_test = train_test_split(X_test, y_test, test_size=0.2)
I want to add a size parameter so that I can choose how many images to import. The number of images imported from each subfolder should be equal.
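One way to get an equal number of images per subfolder is to choose the file paths first and only then read the chosen files. The following is a minimal sketch under that assumption, not a drop-in replacement for get_data: sample_image_paths, n_per_class, and seed are hypothetical names, and labels are assigned by sorted subfolder order rather than by the Name1/Name2/Name3 mapping above.

```python
import os
import random


def sample_image_paths(folder, n_per_class, seed=None):
    """Pick up to n_per_class file paths from each subfolder of folder.

    Returns a list of (path, label) pairs. The label is the index of the
    subfolder in sorted order (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    # Only visible subdirectories count as classes; loose files are ignored.
    class_names = sorted(
        d for d in os.listdir(folder)
        if not d.startswith('.') and os.path.isdir(os.path.join(folder, d))
    )
    samples = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(folder, class_name)
        files = sorted(
            f for f in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, f))
        )
        # random.sample raises ValueError when asked for more items than
        # exist, so cap the request at the number of files available.
        chosen = rng.sample(files, min(n_per_class, len(files)))
        samples.extend((os.path.join(class_dir, f), label) for f in chosen)
    return samples
```

Each returned path can then be passed to cv2.imread and resized exactly as in the loop above. A subfolder with fewer than n_per_class files contributes everything it has, so the classes are equal only when every subfolder holds at least n_per_class images.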
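【Comments】:
- It looks like what you need is the Keras ImageDataGenerator class and flow_from_directory. keras.io/preprocessing/image/#imagedatagenerator-class
- Is it possible to specify the number of images imported from the folder with ImageDataGenerator? If so, how?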
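On the last comment: as far as I know, flow_from_directory has no argument that caps the number of images read, and the iterator it returns keeps yielding batches for as long as it is consumed. One possible workaround, sketched here under that assumption, is to draw only as many batches as are needed to cover the desired number of images. take_batches is a hypothetical helper, not part of Keras, and it works on any iterator of batches.

```python
import math
from itertools import islice


def take_batches(batch_iter, n_images, batch_size):
    """Draw just enough batches from batch_iter to cover n_images.

    Returns a list of batches. The total number of images may exceed
    n_images by up to batch_size - 1, because whole batches are taken.
    """
    n_batches = math.ceil(n_images / batch_size)
    # islice stops after n_batches items, so an endless iterator is safe.
    return list(islice(batch_iter, n_batches))
```

With a Keras directory iterator, batch_iter would be the object returned by flow_from_directory and batch_size the value passed to it. This limits only the total count: when batches are shuffled, it does not guarantee the same number of images from each subfolder.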
Tags: python numpy machine-learning scikit-learn python-import