Transfer learning [resnet18] using PyTorch. Dataset: Dog-Breed-Identification

[Question title]: Transfer learning [resnet18] using PyTorch. Dataset: Dog-Breed-Identification
[Posted]: 2018-05-04 08:35:41
[Question description]:

I am trying to implement a transfer learning approach in PyTorch. This is the dataset that I am using: Dog-Breed

These are the steps I am following.

1. Load the data and read the CSV using pandas.
2. Resize the train images to (60, 60) and store them as a numpy array.
3. Apply stratification and split the train data 7:1:2 (train:validation:test).
4. Use the resnet18 model and train.

Location of the dataset

LABELS_LOCATION = './dataset/labels.csv'
TRAIN_LOCATION = './dataset/train/'
TEST_LOCATION = './dataset/test/'
ROOT_PATH = './dataset/'

Reading the CSV (labels.csv)

def read_csv(csvf):
    # print(pandas.read_csv(csvf).values)
    data=pandas.read_csv(csvf).values
    labels_dict = dict(data)
    idz=list(labels_dict.keys())
    clazz=list(labels_dict.values())
    return labels_dict,idz,clazz

I did this because of a constraint that I will mention later, when loading the data with DataLoader.

def class_hashmap(class_arr):
    uniq_clazz = Counter(class_arr)
    class_dict = {}
    for i, j in enumerate(uniq_clazz):
        class_dict[j] = i
    return class_dict

labels, ids, class_names = read_csv(LABELS_LOCATION)
train_images = os.listdir(TRAIN_LOCATION)
class_numbers = class_hashmap(class_names)

Next, I resize the images to 60x60 using OpenCV and store the result as a numpy array.

resize = []
indexed_labels = []
for t_i in train_images:
    # resize.append(transform.resize(io.imread(TRAIN_LOCATION+t_i), (60, 60, 3)))  # (60,60) is the height and width; 3 is the number of channels
    # cv2 returns (height, width, channels); transpose (not reshape) to get channels-first (3, 60, 60)
    resize.append(cv2.resize(cv2.imread(TRAIN_LOCATION+t_i), (60, 60)).transpose(2, 0, 1))
    indexed_labels.append(class_numbers[labels[t_i.split('.')[0]]])

resize = np.asarray(resize)
print(resize.shape)

In indexed_labels, I give each label a number.

Next, I split the data 7:1:2.

X = resize  # numpy array of images [training data]
y = np.array(indexed_labels)  # indexed labels for images [training labels]

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sss.get_n_splits(X, y)


for train_index, test_index in sss.split(X, y):
    X_temp, X_test = X[train_index], X[test_index]  # split train into train and test [data]
    y_temp, y_test = y[train_index], y[test_index]  # labels

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.123, random_state=0)
sss.get_n_splits(X_temp, y_temp)

for train_index, test_index in sss.split(X_temp, y_temp):
    print("TRAIN:", train_index, "VAL:", test_index)
    X_train, X_val = X_temp[train_index], X_temp[test_index]  # training and validation data (indices refer to X_temp, not X)
    y_train, y_val = y_temp[train_index], y_temp[test_index]  # training and validation labels
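
For reference, the 7:1:2 stratified split described above can also be sketched with the standard library alone. This is a minimal illustration, not the code from the question: `stratified_split`, its `fractions` parameter and the toy labels are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into train/val/test lists, class by class."""
    rng = random.Random(seed)

    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever is left goes to test
    return train, val, test


# Toy labels (hypothetical): 10 samples of class 0 and 20 of class 1.
labels = [0] * 10 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The returned index lists can be used to index numpy arrays such as X and y directly, so every class keeps roughly the same 7:1:2 proportions in each part.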

Next, I load the data from the previous step into torch DataLoaders.

batch_size = 500
learning_rate = 0.001

train = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=False)

val = torch.utils.data.TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
val_loader = torch.utils.data.DataLoader(val, batch_size=batch_size, shuffle=False)

test = torch.utils.data.TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)

# print(train_loader.size)

dataloaders = {
    'train': train_loader,
    'val': val_loader
}

Next, I load the pretrained resnet model.

model_ft = models.resnet18(pretrained=True)

# freeze all model parameters
# for param in model_ft.parameters():
#     param.requires_grad = False

num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, len(class_numbers))

if use_gpu:
    model_ft = model_ft.cuda()
    model_ft.fc = model_ft.fc.cuda()

criterion = nn.CrossEntropyLoss()

# Only the parameters of the new final layer (fc) are passed to the optimizer
optimizer_ft = optim.SGD(model_ft.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

Then I use train_model, a method described here in PyTorch's docs.

However, when I run this, I get an error.

Traceback (most recent call last):
  File "/Users/nirvair/Sites/pyTorch/TL.py",
    line 244, in <module>
        num_epochs=25)
      File "/Users/nirvair/Sites/pyTorch/TL.py", line 176, in train_model
        outputs = model(inputs)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/Library/Python/2.7/site-packages/torchvision/models/resnet.py", line 149, in forward
        x = self.avgpool(x)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/pooling.py", line 505, in forward
        self.padding, self.ceil_mode, self.count_include_pad)
      File "/Library/Python/2.7/site-packages/torch/nn/functional.py", line 264, in avg_pool2d
        ceil_mode, count_include_pad)
      File "/Library/Python/2.7/site-packages/torch/nn/_functions/thnn/pooling.py", line 360, in forward
        ctx.ceil_mode, ctx.count_include_pad)
    RuntimeError: Given input size: (512x2x2). Calculated output size: (512x0x0). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/SpatialAveragePooling.c:64

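I can't seem to figure out what is going wrong here.

[Question comments]:

Please mention the line where you are getting the error.
@WasiAhmad updated the question

Tags: python deep-learning conv-neural-network pytorch resnet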

[Solution 1]:

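Your network is too deep for the size of images you are using (60x60). As you know, CNN layers produce smaller and smaller feature maps as the input image propagates through them. In ResNet-18 this comes from the strided convolution and pooling layers, which together reduce the spatial resolution by a factor of 32.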

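Your error says that the average-pooling layer received 512 feature maps of size 2 pixels x 2 pixels, and that pooling them would produce 512 maps of size 0x0. An input smaller than the pooling window is what triggers the error.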
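
To make the shrinkage concrete, here is a small sketch that follows the spatial size of the feature map through ResNet-18 using the standard output-size formula for convolution and pooling layers. The helper names `conv_out` and `resnet18_feature_size` are made up for this example, and the 7x7 pooling window is an assumption based on the torchvision release that appears in the traceback.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_feature_size(size):
    """Side length of the feature map that reaches ResNet-18's average pool."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
    size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool
    # layer1 keeps the resolution; layer2, layer3 and layer4 each halve it
    # through a 3x3 convolution with stride 2 and padding 1.
    for _ in range(3):
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


POOL_WINDOW = 7  # assumed: the average pool uses a 7x7 window in this version

for side in (60, 224):
    feature = resnet18_feature_size(side)
    print(side, feature, "fits" if feature >= POOL_WINDOW else "too small")
```

Under these assumptions a 60x60 input arrives at the pooling layer as a 2x2 map, matching the (512x2x2) in the error message, while a 224x224 input arrives as 7x7, which is exactly one pooling window.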

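Generally, all stock networks, such as ResNet-18, Inception, etc., require input images of size 224x224 (at least). You can do this more easily using torchvision transforms [1]. You can also use larger image sizes, with one exception: AlexNet, which has the size of its feature vector hardcoded, as explained in my answer in [2].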

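Bonus tip: if you are using the network in pre-trained mode, you will need to normalize (whiten) the data using the parameters given in the PyTorch documentation at [3].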
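
As a rough sketch of what that normalization does, the snippet below standardises a single 8-bit RGB pixel with the ImageNet channel means and standard deviations listed in the torchvision documentation for its pretrained models. The function name `normalize_pixel` is hypothetical; in practice the same arithmetic is applied to whole image tensors, for example with `torchvision.transforms.Normalize`.

```python
# ImageNet channel statistics from the torchvision docs, in RGB order.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Scale one 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD)
    )


print(normalize_pixel((0, 0, 0)))        # a black pixel
print(normalize_pixel((255, 255, 255)))  # a white pixel
```

Note that these statistics assume RGB channel order, whereas `cv2.imread` returns BGR, so images loaded with OpenCV would need their channels reversed first.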

Links

[Comments]:

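I cannot use torchvision transforms, because I have to stratify the data before feeding it to the model. If there is an alternative to my solution, could you suggest one?
You don't need torchvision transforms to resize the images. You are already doing that with OpenCV; you only need to replace (60, 60) with (224, 224). I must admit that the lack of an sklearn interface in PyTorch is one of its biggest drawbacks. If you want to do this properly, I'd suggest writing your own data loader and including parameters for the train-test split. If you do, please share the code. This is a common problem.
After changing the image size to 224, I get an error at the loss function, in loss.backward(). It says RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66
That is because a batch size of 500 is too large for your GPU memory. The input images are much larger than the 60x60 size you started with. Reduce batch_size to 16.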
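
A back-of-the-envelope estimate shows how much the input batch alone grows when moving from 60x60 to 224x224. This is only a sketch: it assumes float32 inputs (4 bytes per value), `batch_megabytes` is a hypothetical helper, and the intermediate activations kept for the backward pass, which usually dominate GPU memory, are not counted.

```python
def batch_megabytes(batch_size, side, channels=3, bytes_per_value=4):
    """Size in MB (10**6 bytes) of one input batch of square images."""
    return batch_size * channels * side * side * bytes_per_value / 1e6


for batch_size, side in ((500, 60), (500, 224), (16, 224)):
    print(batch_size, side, batch_megabytes(batch_size, side))
```

With batch size 500, the input batch goes from roughly 21.6 MB at 60x60 to roughly 301 MB at 224x224, about 14 times larger, and the activations scale in the same way. Dropping to a batch of 16 brings the input back below 10 MB.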