How to convert a PyTorch nn.Module into a HuggingFace PreTrainedModel object? (Answers)

【Question title】: How to convert a PyTorch nn.Module into a HuggingFace PreTrainedModel object?
【Posted】: 2022-10-18 18:50:03
【Question description】:

Given a simple neural network in PyTorch, for example:

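import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

net = nn.Sequential(
      nn.Linear(3, 4),
      nn.Sigmoid(),
      nn.Linear(4, 1),
      nn.Sigmoid()
      ).to(device)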

How can it be converted into a Hugging Face PreTrainedModel object?

The goal is to turn the PyTorch nn.Module object built with nn.Sequential into a Hugging Face PreTrainedModel object, and then run something like:

import torch.nn as nn
from transformers.modeling_utils import PreTrainedModel


net = nn.Sequential(
      nn.Linear(3, 4),
      nn.Sigmoid(),
      nn.Linear(4, 1),
      nn.Sigmoid()
      ).to(device)

# Do something to convert the Pytorch nn.Module to the PreTrainedModel object.
shiny_model = do_some_magic(net, some_args, some_kwargs)

# Save the shiny model that is a `PreTrainedModel` object.
shiny_model.save_pretrained("shiny-model")

PreTrainedModel.from_pretrained("shiny-model")

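It seems that building or converting any native PyTorch model into a Hugging Face model requires some configuration: https://huggingface.co/docs/transformers/main_classes/configuration

There are many tutorials on training a model "from scratch", for example:

[Uses BertLMHeadModel, not really from scratch] https://www.kaggle.com/code/mojammel/train-model-from-scratch-with-huggingface/notebook (this is also fine-tuning BERT, not training from scratch)
[Not really from scratch, uses RoBERTa as a template] https://huggingface.co/blog/how-to-train (this fine-tunes from RoBERTa and is not truly training from scratch)
[Sort of uses a config template] https://www.thepythoncode.com/article/pretraining-bert-huggingface-transformers-in-python (this is somewhat from scratch, but it uses BERT's template to generate the config; if we want to change how the model works, what should the config look like?)
[Kind of defines a template, but uses RobertaForMaskedLM] https://skimai.com/roberta-language-model-for-spanish/ (this looks a bit like defining a template, but it is limited to the RobertaForMaskedLM template)

Sub-questions:

If we have a much simpler PyTorch model, like the code snippet above, how do we create a pretrained model from scratch in Hugging Face?
How do we create the pretrained model configuration that Hugging Face requires, so that the conversion from a native PyTorch nn.Module works?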

【Comments on the question】:

Tags: python machine-learning pytorch huggingface-transformers pre-trained-model

【Solution 1】:

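One way is to put the model inside a class that inherits from PreTrainedModel. The wrapped model could be, for example, a pretrained resnet34, a timm model, or your own "net" model. I recommend checking the documentation for more details on configuration; I will use the example from this link: https://huggingface.co/docs/transformers/custom_models#sharing-custom-models

Configuration (note: you can add your own fields, such as version, and read them back later from config.json.)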

from transformers import PretrainedConfig
from typing import List

class ModelConfig(PretrainedConfig):
    # Identifier written to config.json; the Auto* classes use it to find this config.
    model_type = "mymodel"

    def __init__(
        self,
        version = 1,
        layers: List[int] = [3, 4, 6, 3],
        num_classes: int = 1000,
        input_channels: int = 3,
        stem_type: str = "",
        **kwargs,
    ):
        # Validate before storing anything; report the offending value itself.
        if stem_type not in ["", "deep", "deep-tiered"]:
            raise ValueError(f"`stem_type` must be '', 'deep' or 'deep-tiered', got {stem_type}.")

        # Custom fields are serialized to config.json together with the built-in ones.
        self.version = version
        self.layers = layers
        self.num_classes = num_classes
        self.input_channels = input_channels
        self.stem_type = stem_type
        super().__init__(**kwargs)
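The note about reading custom fields back from config.json can be checked with a small round trip. The sketch below is illustrative only: `TinyConfig`, its `layer_width` field, and the `"tiny-sketch"` model type are hypothetical names, not part of the answer, and the exact contents of config.json may vary between transformers versions.

```python
import json
import os
import tempfile

from transformers import PretrainedConfig


class TinyConfig(PretrainedConfig):
    # Hypothetical config, defined only for this sketch.
    model_type = "tiny-sketch"

    def __init__(self, version=1, layer_width=4, **kwargs):
        self.version = version
        self.layer_width = layer_width
        super().__init__(**kwargs)


with tempfile.TemporaryDirectory() as tmp_dir:
    # save_pretrained writes a config.json file into the directory.
    TinyConfig(version=2, layer_width=8).save_pretrained(tmp_dir)

    # Read the raw JSON to see what was stored.
    with open(os.path.join(tmp_dir, "config.json")) as f:
        saved = json.load(f)

    # Rebuild a config object from the same directory.
    reloaded = TinyConfig.from_pretrained(tmp_dir)

print(saved["model_type"], saved["version"], saved["layer_width"])
print(reloaded.version, reloaded.layer_width)
```

Non-default values are used on purpose, so that the check does not depend on how default values are serialized.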

As I said, your network model could just as well be resnet34.

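import torch
from torch import nn
from transformers import PreTrainedModel

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

net = nn.Sequential(
      nn.Linear(3, 4),
      nn.Sigmoid(),
      nn.Linear(4, 1),
      nn.Sigmoid()
      ).to(device)

class MyModel(PreTrainedModel):
    config_class = ModelConfig

    def __init__(self, config):
        super().__init__(config)
        # Wrap the existing module; its parameters become part of this model's state dict.
        self.model = net

    def forward(self, tensor):
        return self.model(tensor)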
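One consequence of assigning a module-level `net` in `__init__` is that every instance of the wrapper holds the same module object, so they all share one set of weights. A minimal sketch of that behaviour follows; `SharedConfig`, `SharedNetModel`, and the `"shared-net-sketch"` model type are hypothetical names used only here.

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

# A single module object created once at import time, as in the answer.
net = nn.Sequential(nn.Linear(3, 4), nn.Sigmoid(), nn.Linear(4, 1), nn.Sigmoid())


class SharedConfig(PretrainedConfig):
    model_type = "shared-net-sketch"  # hypothetical name


class SharedNetModel(PreTrainedModel):
    config_class = SharedConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = net  # the same object for every instance

    def forward(self, tensor):
        return self.model(tensor)


first = SharedNetModel(SharedConfig())
second = SharedNetModel(SharedConfig())

# Both wrappers point at the one global module, so they share parameters.
shares_module = first.model is second.model
print(shares_module)
```

If independent instances are needed, building the layers inside `__init__` from values stored in the config avoids the sharing.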

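Test the model

import torch

config = ModelConfig()
model = MyModel(config)
# Create the dummy input on whatever device the model's parameters live on.
dummy_input = torch.randn(1, 3).to(model.device)
with torch.no_grad():
    output = model(dummy_input)
print(output.shape)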

Push to the Hugging Face Hub (note: you need to log in with a token; you can push repeatedly to update the model)

model.push_to_hub("mymodel-test")

Download the model (note: this uses the MyModel class; if you want to create a model like ..bert.modeling_bert.BertModel, I think you need to follow the library's structure.)

my_model = MyModel.from_pretrained("User/mymodel-test")

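【Comments】:

【Solution 2】:

To create a custom model, you need to define a custom configuration class and a custom model class. It is important to define the attributes model_type and config_class in these classes: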

import torch.nn as nn
from transformers import PreTrainedModel, PretrainedConfig
from transformers import AutoModel, AutoConfig

class MyConfig(PretrainedConfig):
    model_type = 'mymodel'
    def __init__(self, important_param=42, **kwargs):
        super().__init__(**kwargs)
        self.important_param = important_param

class MyModel(PreTrainedModel):
    config_class = MyConfig
    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.model = nn.Sequential(
                          nn.Linear(3, self.config.important_param),
                          nn.Sigmoid(),
                          nn.Linear(self.config.important_param, 1),
                          nn.Sigmoid()
                          )
    def forward(self, input):
        return self.model(input)

Now you can run:

config = MyConfig()
model = MyModel(config)
model.save_pretrained('./my_model_dir')

new_model = MyModel.from_pretrained('./my_model_dir')
new_model
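For the original question of converting an already existing (possibly trained) nn.Sequential, one possible approach is to build the same layers inside the wrapper and copy the weights across with `load_state_dict`. This is a sketch under assumptions, not an official conversion API: `MLPConfig`, `MLPModel`, the `"mlp-sketch"` model type, and the `in_features` / `hidden_units` fields are hypothetical names chosen for illustration.

```python
import tempfile

import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class MLPConfig(PretrainedConfig):
    model_type = "mlp-sketch"  # hypothetical name

    def __init__(self, in_features=3, hidden_units=4, **kwargs):
        super().__init__(**kwargs)
        self.in_features = in_features
        self.hidden_units = hidden_units


class MLPModel(PreTrainedModel):
    config_class = MLPConfig

    def __init__(self, config):
        super().__init__(config)
        # The architecture is rebuilt from the config, so from_pretrained
        # can recreate it without access to the original `net` object.
        self.model = nn.Sequential(
            nn.Linear(config.in_features, config.hidden_units),
            nn.Sigmoid(),
            nn.Linear(config.hidden_units, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.model(x)


# The existing plain PyTorch network from the question.
net = nn.Sequential(nn.Linear(3, 4), nn.Sigmoid(), nn.Linear(4, 1), nn.Sigmoid())

# Create the wrapper with matching sizes and copy the weights over.
wrapped = MLPModel(MLPConfig(in_features=3, hidden_units=4))
wrapped.model.load_state_dict(net.state_dict())

with tempfile.TemporaryDirectory() as tmp_dir:
    wrapped.save_pretrained(tmp_dir)
    reloaded = MLPModel.from_pretrained(tmp_dir)

# The reloaded model should reproduce the original network's outputs.
x = torch.randn(2, 3)
net.eval()
reloaded.eval()
with torch.no_grad():
    same = torch.allclose(net(x), reloaded(x))
print(same)
```

The layer sizes in the config have to match the existing network, otherwise `load_state_dict` raises a shape mismatch error.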

If you want to use AutoModel, you must register your classes:

AutoConfig.register("mymodel", MyConfig)
AutoModel.register(MyConfig, MyModel)

new_model = AutoModel.from_pretrained('./my_model_dir')
new_model
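The registration works because the saved config.json stores the `model_type` string, which the Auto classes use to look up the registered config and model classes. The sketch below checks this end to end; `DemoConfig`, `DemoModel`, and the `"automodel-sketch"` model type are hypothetical names, and a fresh model type is used because registering one that is already taken raises an error.

```python
import json
import os
import tempfile

import torch.nn as nn
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class DemoConfig(PretrainedConfig):
    model_type = "automodel-sketch"  # hypothetical name

    def __init__(self, hidden_units=4, **kwargs):
        super().__init__(**kwargs)
        self.hidden_units = hidden_units


class DemoModel(PreTrainedModel):
    config_class = DemoConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = nn.Sequential(nn.Linear(3, config.hidden_units), nn.Sigmoid())

    def forward(self, x):
        return self.model(x)


# The string passed here must equal DemoConfig.model_type.
AutoConfig.register("automodel-sketch", DemoConfig)
AutoModel.register(DemoConfig, DemoModel)

with tempfile.TemporaryDirectory() as tmp_dir:
    DemoModel(DemoConfig(hidden_units=5)).save_pretrained(tmp_dir)

    # The model_type stored on disk is what the Auto classes dispatch on.
    with open(os.path.join(tmp_dir, "config.json")) as f:
        stored_type = json.load(f)["model_type"]

    auto_config = AutoConfig.from_pretrained(tmp_dir)
    auto_model = AutoModel.from_pretrained(tmp_dir)

print(stored_type)
print(type(auto_config).__name__, type(auto_model).__name__)
```

The registration lives only in the current Python process, so it has to be repeated (for example at import time of your package) before calling `AutoModel.from_pretrained` in a new session.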

【Comments】: