How to implement SciBERT with pytorch; error while loading (answers)

【Question title】: How to implement SciBERT with pytorch; error while loading
【Posted】: 2019-10-12 01:54:53
【Question description】:

I am trying to use the SciBERT pretrained model scibert-scivocab-uncased as follows:

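    !pip install pytorch-pretrained-bert   # notebook (Jupyter/Colab) syntax
    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertModel
    # Note: this is the bert-base-uncased vocabulary, not SciBERT's scivocab
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    # Hypothetical example input; `tokenized_text` was undefined in the original snippet
    text = "[CLS] example sentence [SEP]"
    tokenized_text = tokenizer.tokenize(text)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    segments_ids = [0] * len(tokenized_text)  # single-sentence input uses segment 0
    tokens_tensor = torch.tensor([indexed_tokens])
    segments_tensors = torch.tensor([segments_ids])
    # This call raises the EOFError shown below
    model = BertModel.from_pretrained('/Users/.../Downloads/scibert_scivocab_uncased-3.tar.gz')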

I get the following error:

EOFError: Compressed file ended before the end-of-stream marker was reached

I downloaded the file from the website (https://github.com/allenai/scibert)
and converted it from "tar" to gzip.

None of this worked.
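The message "Compressed file ended before the end-of-stream marker was reached" comes from Python's gzip layer and typically means the gzip stream stops early, for example because a download was cut off. Also, renaming a `.tar` file to `.tar.gz` does not compress it. A minimal stdlib sketch for checking both points before calling `from_pretrained` follows; `inspect_archive` is a hypothetical helper, not part of pytorch-pretrained-bert or any other library.

```python
import tarfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def inspect_archive(path) -> dict:
    """Report whether `path` is gzip-compressed and whether it reads as a complete tar archive."""
    path = Path(path)
    with path.open("rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    try:
        # "r:*" lets tarfile pick plain tar, gzip, bz2 or xz on its own
        with tarfile.open(path, mode="r:*") as tar:
            # getnames() walks every member header; a gzip stream that ends early
            # typically raises EOFError here, the same error as in the question
            names = tar.getnames()
        readable = True
    except (tarfile.TarError, EOFError, OSError):
        names, readable = [], False
    return {"is_gzip": is_gzip, "readable_tar": readable, "members": names}
```

For example, `inspect_archive("scibert_scivocab_uncased.tar")` (hypothetical filename): if `is_gzip` is False the file is a plain tar whatever its extension says, and if `readable_tar` is False, re-downloading is the first thing to try.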

Any hints on how to solve this?

Thanks!

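【Question discussion】:

Tags: error-handling neural-network nlp tar word-embedding

【Solution 1】:

In the newer version of pytorch-pretrained-BERT, i.e. in `transformers`, you can load the pretrained model after extracting the archive as follows: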

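    from transformers import AutoModelForTokenClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("/your/local/path/to/scibert_scivocab_uncased")
    model = AutoModelForTokenClassification.from_pretrained("/your/local/path/to/scibert_scivocab_uncased")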
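The answer above assumes the archive has already been extracted. A minimal stdlib sketch of that step follows. It assumes, without confirming the exact layout, that the download may contain a nested `.tar.gz` (for instance a weights archive, consistent with Solution 2's mention of a JSON file to rename), and it unpacks one level of such nested archives in place. `extract_all` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def _extract(tar: tarfile.TarFile, dest: Path) -> None:
    # Use the "data" extraction filter where available (Python 3.12+ and recent
    # security backports); it refuses members that would land outside `dest`.
    if hasattr(tarfile, "data_filter"):
        tar.extractall(dest, filter="data")
    else:
        tar.extractall(dest)


def extract_all(archive, dest) -> Path:
    """Extract `archive` into `dest`, then unpack one level of nested *.tar.gz files in place."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" detects plain tar vs. gzip/bz2/xz automatically, so no renaming is needed
    with tarfile.open(archive, mode="r:*") as tar:
        _extract(tar, dest)
    # Snapshot the list first so newly extracted files are not re-scanned
    for nested in list(dest.rglob("*.tar.gz")):
        with tarfile.open(nested, mode="r:gz") as tar:
            _extract(tar, nested.parent)
        nested.unlink()  # remove the nested archive once it is unpacked
    return dest
```

The folder that ends up holding the model files is the path to pass as `/your/local/path/to/scibert_scivocab_uncased`.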

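【Discussion】:

【Solution 2】:

You need to extract the archive and rename the JSON file to config.json, then pass the path of the folder the archive was extracted to. That should work.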
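The renaming step from this answer can be sketched as follows. It assumes, hypothetically, that the extracted folder holds exactly one JSON config (for example `bert_config.json`) and renames it to `config.json`, the file name that `transformers` looks for; `ensure_config_json` is an illustrative name, not a library function.

```python
from pathlib import Path


def ensure_config_json(model_dir) -> Path:
    """Rename the single *.json file in `model_dir` to config.json (no-op if it already exists)."""
    model_dir = Path(model_dir)
    target = model_dir / "config.json"
    if target.exists():
        return target
    candidates = sorted(model_dir.glob("*.json"))
    if len(candidates) != 1:
        # Refuse to guess when there are zero or several JSON files
        names = [p.name for p in candidates]
        raise FileNotFoundError(f"expected exactly one *.json in {model_dir}, found {names}")
    candidates[0].rename(target)
    return target
```

Raising instead of picking one file keeps the helper from silently renaming the wrong JSON when a folder contains several.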

【Discussion】: