Config change for a pre-trained transformer model: answers

【Question title】: Config change for a pre-trained transformer model
【Posted】: 2020-10-17 13:09:22
【Question】:

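I am trying to implement a classification head for the Reformer transformer. The classification head works fine, but when I try to change one of the config parameters, config.axial_pos_shape (i.e. the sequence length parameter of the model), it throws an error: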

size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([512, 1, 64]) from checkpoint, the shape in current model is torch.Size([64, 1, 64]).
size mismatch for reformer.embeddings.position_embeddings.weights.1: copying a param with shape torch.Size([1, 1024, 192]) from checkpoint, the shape in current model is torch.Size([1, 128, 192]).
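
The shapes in this message follow from the config. As a minimal sketch (the helper name `axial_weight_shapes` is hypothetical, and the rule is inferred from the shapes in the error message rather than quoted from the library), each axial weight keeps one axis of `axial_pos_shape`, sets the other axes to 1, and appends the matching entry of `axial_pos_embds_dim`:

```python
def axial_weight_shapes(axial_pos_shape, axial_pos_embds_dim):
    """Return the expected shape of each axial position embedding weight."""
    shapes = []
    for axis, embd_dim in enumerate(axial_pos_embds_dim):
        shape = [1] * len(axial_pos_shape)
        shape[axis] = axial_pos_shape[axis]
        shapes.append(tuple(shape) + (embd_dim,))
    return shapes


# Shapes of a model built with axial_pos_shape = [64, 128]
current = axial_weight_shapes((64, 128), (64, 192))
# The checkpoint shapes in the message imply axial_pos_shape = (512, 1024)
checkpoint = axial_weight_shapes((512, 1024), (64, 192))
print(current)
print(checkpoint)
```

Under that reading, the checkpoint was saved with a larger `axial_pos_shape` than the freshly built model, which is why both weights are rejected.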

Config:

{
  "architectures": [
    "ReformerForSequenceClassification"
  ],
  "attention_head_size": 64,
  "attention_probs_dropout_prob": 0.1,
  "attn_layers": [
    "local",
    "lsh",
    "local",
    "lsh",
    "local",
    "lsh"
  ],
  "axial_norm_std": 1.0,
  "axial_pos_embds": true,
  "axial_pos_embds_dim": [
    64,
    192
  ],
  "axial_pos_shape": [
    64,
    256
  ],
  "chunk_size_feed_forward": 0,
  "chunk_size_lm_head": 0,
  "eos_token_id": 2,
  "feed_forward_size": 512,
  "hash_seed": null,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.05,
  "hidden_size": 256,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": true,
  "layer_norm_eps": 1e-12,
  "local_attention_probs_dropout_prob": 0.05,
  "local_attn_chunk_length": 64,
  "local_num_chunks_after": 0,
  "local_num_chunks_before": 1,
  "lsh_attention_probs_dropout_prob": 0.0,
  "lsh_attn_chunk_length": 64,
  "lsh_num_chunks_after": 0,
  "lsh_num_chunks_before": 1,
  "max_position_embeddings": 8192,
  "model_type": "reformer",
  "num_attention_heads": 2,
  "num_buckets": [
    64,
    128
  ],
  "num_chunks_after": 0,
  "num_chunks_before": 1,
  "num_hashes": 1,
  "num_hidden_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 100
    }
  },
  "vocab_size": 320
}

Python code:

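import torch
from transformers import ReformerConfig, ReformerForSequenceClassification

# Build a fresh config and shrink the axial position shape (64 * 128 = 8192)
config = ReformerConfig()
config.max_position_embeddings = 8192
config.axial_pos_shape = [64, 128]

#config = ReformerConfig.from_pretrained('./cnp/config.json', output_attention=True)

# Loading the checkpoint into this model is what raises the size-mismatch error
model = ReformerForSequenceClassification(config)
model.load_state_dict(torch.load("./cnp/pytorch_model.bin"))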
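
Before calling `load_state_dict`, it can help to list which parameters differ in shape. The sketch below works on plain dictionaries that map parameter names to shapes; `find_shape_mismatches` is a hypothetical helper, and the example values are copied from the error message above. With PyTorch, such a dictionary could be built as `{name: tuple(t.shape) for name, t in state_dict.items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing shapes."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


prefix = "reformer.embeddings.position_embeddings.weights."
checkpoint_shapes = {prefix + "0": (512, 1, 64), prefix + "1": (1, 1024, 192)}
model_shapes = {prefix + "0": (64, 1, 64), prefix + "1": (1, 128, 192)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, cur) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs model {cur}")
```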

【Question comments】:

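You are trying to load a model whose layer sizes differ from those of the model you are initializing. That will not work, and that is what the error message is telling you. I have not worked with Reformer, but you could load it first and resize it afterwards. I am not sure whether that would break the pre-training, though.
@cronoik I agree with your comment; that is what is happening. Unfortunately you posted it as a comment, otherwise I would have accepted it as the answer.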

Tags: pytorch huggingface-transformers pre-trained-model

【Solution 1】:

I ran into the same problem when trying to halve the default maximum sequence length of 65536 (128*512) used in Reformer pre-training.

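As @cronoik mentioned, you have to:

Load the pretrained Reformer
Resize it by dropping the unnecessary weights
Save this new model
Load this new model to perform your desired task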

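Those unnecessary weights come from the position embeddings layer. In the Reformer model, the position embeddings are learned with the axial position encoding strategy (rather than kept in one full table with a vector per position, as in BERT). Axial position encodings store the position embeddings in a memory-efficient way, using two small tensors instead of one large one.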
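
To make the memory saving concrete, here is a rough count using the values from the config in the question (`axial_pos_shape` of [64, 256], `axial_pos_embds_dim` of [64, 192], `hidden_size` of 256). The function names are hypothetical, and the count covers only the position embedding weights.

```python
from math import prod


def full_table_params(axial_pos_shape, hidden_size):
    """One hidden_size vector per position, stored in a single table."""
    return prod(axial_pos_shape) * hidden_size


def axial_params(axial_pos_shape, axial_pos_embds_dim):
    """One small table per axis: axis length times that axis's embedding size."""
    return sum(n * d for n, d in zip(axial_pos_shape, axial_pos_embds_dim))


full = full_table_params([64, 256], 256)
axial = axial_params([64, 256], [64, 192])
print(f"full table: {full} parameters, axial: {axial} parameters")
```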

However, the idea behind position embeddings stays the same: each position gets a different embedding.
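
A small sketch of how two short tables can still give every position its own embedding. It assumes a row-major factorization, position = row * n_cols + col, where `row` indexes the first tensor and `col` the second; `axial_index` is a hypothetical helper, not a library function.

```python
def axial_index(position, axial_pos_shape):
    """Split a flat position into (row, col) indices for the two axial tables."""
    n_rows, n_cols = axial_pos_shape
    if not 0 <= position < n_rows * n_cols:
        raise ValueError("position outside the range covered by axial_pos_shape")
    return divmod(position, n_cols)


# With shape (128, 256), every flat position maps to its own (row, col) pair,
# so concatenating the row vector and the column vector is unique per position.
shape = (128, 256)
pairs = {axial_index(p, shape) for p in range(128 * 256)}
print(len(pairs))
print(axial_index(300, shape))
```

Under this assumption, shrinking the second axis from 512 to 256 keeps every (row, col) pair inside the truncated tables, although a given flat position may then use a different pair than it did during pre-training.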

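That said, in theory (please correct me if I have misunderstood something), removing the last position embeddings to match your custom maximum sequence length should not hurt performance. You can refer to this post from HuggingFace for a more detailed explanation of axial position encodings and to see where to truncate your position embedding tensors.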

I managed to resize and use a Reformer with a custom maximum length of 32768 (128*256) with the following code:

import torch
from transformers import ReformerForSequenceClassification

# Load initial pretrained model
model = ReformerForSequenceClassification.from_pretrained('google/reformer-enwik8', num_labels=2)

# Truncate the second axial position embedding tensor to its first 256 entries,
# slicing along dim 1 so the leading singleton dimension is preserved
position_embeddings = model.reformer.embeddings.position_embeddings
position_embeddings.weights[1] = torch.nn.Parameter(position_embeddings.weights[1][:, :256].detach().clone())

# Update the config to match the custom max seq length
model.config.axial_pos_shape = (128, 256)
model.config.max_position_embeddings = 128 * 256  # 32768

# Save model with custom max length; reload it from this path for the downstream task
output_model_path = "path/to/model"
model.save_pretrained(output_model_path)
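
Before saving, a quick consistency check on the edited values can catch arithmetic slips. This is a sketch with a hypothetical helper, `check_axial_config`. It assumes that `axial_pos_embds_dim` should sum to `hidden_size`, and it follows this answer's convention of keeping `max_position_embeddings` equal to the product of `axial_pos_shape`, which is treated here as a convention rather than a confirmed library requirement.

```python
from math import prod


def check_axial_config(axial_pos_shape, max_position_embeddings,
                       axial_pos_embds_dim=None, hidden_size=None):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    covered = prod(axial_pos_shape)
    if covered != max_position_embeddings:
        problems.append(
            f"axial_pos_shape covers {covered} positions, "
            f"but max_position_embeddings is {max_position_embeddings}"
        )
    # The embedding-size check only runs when both values are supplied
    if axial_pos_embds_dim is not None and hidden_size is not None:
        if sum(axial_pos_embds_dim) != hidden_size:
            problems.append(
                f"axial_pos_embds_dim sums to {sum(axial_pos_embds_dim)}, "
                f"but hidden_size is {hidden_size}"
            )
    return problems


# Values set in the code above
answer_problems = check_axial_config((128, 256), 128 * 256)
# Values from the config posted in the question
question_problems = check_axial_config([64, 256], 8192, [64, 192], 256)
print(answer_problems)
print(question_problems)
```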

【Comments】: