【问题标题】:require_grad = True in pytorch model despite changing require_grad = false for all parameters尽管更改了所有参数的 require_grad = false,但在 pytorch 模型中 require_grad = True
【发布时间】:2021-09-22 22:30:57
【问题描述】:

我正在尝试从 pytorch 中的预训练模型中提取特征,然后使用这些特征进行进一步的训练。 我已导入模型并将所有参数的 require_grad 设置为 false,如下所示:

import torchvision.models as models
vgg_model = models.vgg19_bn(pretrained=True)
for param in vgg_model.parameters():
  param.requires_grad = False

现在,我定义了我的模型,它提取特征然后在其他层上进行如下训练:

class VGGModel(nn.Module):
  def __init__(self):
    '''Input Image Size: (227, 227)'''
    super(VGGModel, self).__init__()
    self.inception = list(model.children())[0]
    # self.inception = incept_model
    self.conv1 = nn.Conv2d(in_channels = 512, out_channels = 128, kernel_size = 5)
    self.dropout = nn.Dropout(0.4)
    self.fc1 = nn.Linear(128, 5)
  def forward(self, x):
    x = self.inception(x)
    x = F.relu(x)
    x = self.conv1(x)
    x = F.relu(x)
    x = F.max_pool2d(x, kernel_size=3)
    x = torch.flatten(x, 1)
    x = self.dropout(x)
    x = self.fc1(x)
    x = F.log_softmax(x, dim=1)
    return x

但是当我检查模型的 require_grad 时,它会将 VGG 层作为需要 require_grad 的层。

model = VGGModel().to(device)
model.requires_grad_

输出:

<bound method Module.requires_grad_ of VGGModel(
  (inception): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (16): ReLU(inplace=True)
    (17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (19): ReLU(inplace=True)
    (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (22): ReLU(inplace=True)
    (23): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (24): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (25): ReLU(inplace=True)
    (26): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (27): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (28): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (29): ReLU(inplace=True)
    (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (31): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (32): ReLU(inplace=True)
    (33): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (34): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (35): ReLU(inplace=True)
    (36): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (37): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (38): ReLU(inplace=True)
    (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (42): ReLU(inplace=True)
    (43): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (44): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (45): ReLU(inplace=True)
    (46): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (47): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (48): ReLU(inplace=True)
    (49): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (50): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (51): ReLU(inplace=True)
    (52): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv1): Conv2d(512, 128, kernel_size=(5, 5), stride=(1, 1))
  (dropout): Dropout(p=0.4, inplace=False)
  (fc1): Linear(in_features=128, out_features=5, bias=True)
)>

如何防止预训练模型再次训练?

【问题讨论】:

    标签: pytorch feature-extraction transfer-learning vgg-net pre-trained-model


    【解决方案1】:

    你应该运行这个方法:

    model.requires_grad_(False)
    

    不过,您可能只想冻结部分网络,在您的情况下,您应该更改 fc1 属性:

    model.fc1 = torch.nn.Linear(128, num_classes)
    

    其中 num_classes 是你拥有的类的数量(你至少应该解冻最后一个线性层)。

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2023-03-29
      • 1970-01-01
      • 2021-09-01
      • 2014-03-24
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2020-04-15
      相关资源
      最近更新 更多