【Title】: Parsing the Hugging Face Transformer Output
【Posted】: 2021-04-01 19:17:33
【Question】:

I want to use the bert-english-uncased-finetuned-pos transformer mentioned here:

https://huggingface.co/vblagoje/bert-english-uncased-finetuned-pos?text=My+name+is+Clara+and+I+live+in+Berkeley%2C+California.

This is how I query the transformer:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")

model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")

text = "My name is Clara and I live in Berkeley, California."
input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
outputs = model(input_ids)

But outputs comes back as:

(tensor([[[-1.8196e+00, -1.9783e+00, -1.7416e+00, 1.2082e+00, -7.0337e-02, -7.0322e-03, 3.4300e-01, -9.6914e-01, -1.3546e+00, 7.7266e-03, 3.7128e+00, -3.4061e-01, 4.8385e+00, -1.2548e+00, -5.1845e-01, 7.0140e-01, 1.0394e+00],
[-1.2702e+00, -1.5518e+00, -1.1553e+00, -4.4077e-01, -9.8661e-01, -3.2680e-01, -6.5338e-01, -3.9779e-01, -7.5383e-01, -1.2677e+00, 9.6353e+00, 1.9938e-01, -1.0282e+00, -7.5071e-01, -1.0307e+00, -8.0589e-01, 4.2073e-01],
[-9.6988e-01, -5.0090e-01, -1.3858e+00, -1.0554e+00, -1.4040e+00, -7.5977e-01, -7.4156e-01, 8.0594e+00, -5.1854e-01, -1.9098e+00, -1.6362e-02, 1.0594e+00, -8.4962e-01, -1.7415e+00, -1.0628e+00, -1.7485e-01, -1.1490e+00],
[-1.4368e+00, -1.6313e-01, -1.3202e+00, 8.7465e+00, -1.3782e+00, -9.8889e-01, -1.1371e+00, -1.0917e+00, -9.8495e-01, -9.3237e-01, -9.6111e-01, -4.1658e-01, -7.3133e-01, -9.6004e-01, -9.5337e-01, 3.1836e+00, -8.3462e-01],
[-7.9476e-01, -7.9640e-01, -9.0027e-01, -6.9506e-01, -8.9706e-01, -6.9383e-01, -3.1590e-01, 1.2390e+00, -1.0443e+00, -9.9977e-01, -8.8189e-01, 8.7941e+00, -9.9445e-01, -1.2076e+00, -1.1424e+00, -9.7801e-01, 5.6683e-01],
[-8.2837e-01, -5.5060e-01, -2.1352e-01, -8.8721e-01, 9.5536e+00, 1.0478e+00, -5.6208e-01, -7.1037e-01, -7.0248e-01, 1.1298e-01

...

-7.3788e-01, 4.3640e-03, 1.6994e+00, 1.1528e-01, -1.0983e+00, -8.9202e-01, -1.2869e+00, 4.9141e+00, -6.2096e-01, 4.8374e+00, 3.2384e-01, 4.6213e-01],
[-1.3622e+00, 2.0772e+00, -1.6680e+00, -8.8679e-01, -8.6959e-01, -1.7468e+00, -1.1424e+00, 1.6996e+00, 3.5800e-01, -4.3927e-01, -3.6129e-01, -4.2220e-01, -1.7912e+00, 8.0154e-01, 7.4594e-01, -1.0620e+00, 3.8152e+00],
[-1.2889e+00, -2.9379e-01, -1.6543e+00, -4.3326e-01, -2.4919e-01, -4.0112e-01, -4.4255e-01, 2.2697e-01, -4.6042e-01, -3.7862e-03, -6.3061e-01, -1.3280e+00, 8.5533e+00, -4.6881e-01, 2.3882e+00, 2.4533e-01, -1.4095e-01],
[-9.5640e-01, -5.7213e-01, -1.0245e+00, -5.3566e-01, -1.5287e-01, -6.6977e-01, -5.3392e-01, -3.1967e-02, -7.3077e-01, -3.1048e-01, -7.2973e-01, -3.1701e-01, 1.0196e+01, -5.2346e-01, 4.0820e-01, -2.1350e-01, 1.0340e+00]]], grad_fn=),)

But according to the documentation, I expect the output to be in JSON format:

[
  { "entity_group": "PRON", "score": 0.9994694590568542, "word": "my" },
  { "entity_group": "NOUN", "score": 0.997125506401062, "word": "name" },
  { "entity_group": "AUX", "score": 0.9938186407089233, "word": "is" },
  { "entity_group": "PROPN", "score": 0.9983252882957458, "word": "clara" },
  { "entity_group": "CCONJ", "score": 0.9991229772567749, "word": "and" },
  { "entity_group": "PRON", "score": 0.9994894862174988, "word": "i" },
  { "entity_group": "VERB", "score": 0.9983153939247131, "word": "live" },
  { "entity_group": "ADP", "score": 0.999370276927948, "word": "in" },
  { "entity_group": "PROPN", "score": 0.9987357258796692, "word": "berkeley" },
  { "entity_group": "PUNCT", "score": 0.9996636509895325, "word": "," },
  { "entity_group": "PROPN", "score": 0.9985638856887817, "word": "california" },
  { "entity_group": "PUNCT", "score": 0.9996631145477295, "word": "." }
]

What am I doing wrong? How can I parse the current output into the desired JSON output?

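【Comments】:

    Tags: huggingface-transformers huggingface-tokenizers


    【Answer 1】:

    What you are seeing is the output of Hugging Face's proprietary inference API. That API is not part of the transformers library, but you can build something similar. All you need is the TokenClassificationPipeline: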

    from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline
    
    tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
    
    model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
    p = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
    p('My name is Clara and I live in Berkeley, California.')
    

    Output:

    [{'word': 'my', 'score': 0.9994694590568542, 'entity': 'PRON', 'index': 1},
     {'word': 'name', 'score': 0.9971255660057068, 'entity': 'NOUN', 'index': 2},
     {'word': 'is', 'score': 0.9938186407089233, 'entity': 'AUX', 'index': 3},
     {'word': 'clara', 'score': 0.9983252882957458, 'entity': 'PROPN', 'index': 4},
     {'word': 'and', 'score': 0.9991229772567749, 'entity': 'CCONJ', 'index': 5},
     {'word': 'i', 'score': 0.9994894862174988, 'entity': 'PRON', 'index': 6},
     {'word': 'live', 'score': 0.9983154535293579, 'entity': 'VERB', 'index': 7},
     {'word': 'in', 'score': 0.999370276927948, 'entity': 'ADP', 'index': 8},
     {'word': 'berkeley',
      'score': 0.9987357258796692,
      'entity': 'PROPN',
      'index': 9},
     {'word': ',', 'score': 0.9996636509895325, 'entity': 'PUNCT', 'index': 10},
     {'word': 'california',
      'score': 0.9985638856887817,
      'entity': 'PROPN',
      'index': 11},
     {'word': '.', 'score': 0.9996631145477295, 'entity': 'PUNCT', 'index': 12}]
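
The pipeline returns a list of Python dicts rather than a JSON string, and its keys differ slightly from the widget output quoted in the question (`entity` instead of `entity_group`, plus an extra `index`). A minimal sketch of the conversion follows, assuming the result has the shape shown above. The helper name `to_inference_api_json` is hypothetical, and the `float()` cast is a precaution in case a given transformers version returns NumPy scalars for `score`, which `json.dumps` cannot serialize.

```python
import json


def to_inference_api_json(pipeline_results):
    """Reshape pipeline dicts into the widget-style records and dump them as JSON.

    Hypothetical helper: it only renames keys and casts the score to a plain float.
    """
    rows = [
        {
            "entity_group": r["entity"],   # widget calls this key "entity_group"
            "score": float(r["score"]),    # plain float so json.dumps accepts it
            "word": r["word"],
        }
        for r in pipeline_results
    ]
    return json.dumps(rows, indent=2)


# Two records copied from the pipeline output above, used as sample input.
sample = [
    {'word': 'my', 'score': 0.9994694590568542, 'entity': 'PRON', 'index': 1},
    {'word': 'name', 'score': 0.9971255660057068, 'entity': 'NOUN', 'index': 2},
]
json_text = to_inference_api_json(sample)
print(json_text)
```

With the real pipeline this would be called as `to_inference_api_json(p('My name is Clara and I live in Berkeley, California.'))`.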
    

    You can find the other available pipelines, which the inference API probably uses, here.
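
If you would rather parse the raw tensor from the question yourself, the idea is the same thing the pipeline does internally: each row of the tensor holds one score (logit) per tag for one token, so you take a softmax over the row, pick the highest-scoring index, and map it to a tag name. The sketch below illustrates that on toy data in plain Python. The function name `decode_token_logits` and the toy values are hypothetical, not part of transformers.

```python
import math


def decode_token_logits(logits, tokens, id2label):
    """Map each token's logit row to its most likely label and probability."""
    results = []
    for token, row in zip(tokens, logits):
        # Numerically stable softmax over the label dimension.
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        # Index of the largest logit, i.e. the predicted label id.
        best = max(range(len(row)), key=row.__getitem__)
        results.append({
            "word": token,
            "entity": id2label[best],
            "score": exps[best] / total,
        })
    return results


# Toy stand-ins for the real model outputs (3 made-up labels, 2 tokens).
toy_logits = [[0.1, 4.0, -1.0], [3.0, 0.0, 0.5]]
toy_tokens = ["my", "name"]
toy_id2label = {0: "NOUN", 1: "PRON", 2: "VERB"}

decoded = decode_token_logits(toy_logits, toy_tokens, toy_id2label)
print(decoded)
```

With the model from the question, the inputs would presumably come from `outputs[0][0].tolist()`, `tokenizer.convert_ids_to_tokens(input_ids[0].tolist())` and `model.config.id2label`. You would also want to skip the `[CLS]` and `[SEP]` positions. Note that appending `'</s>'` to the text is not needed for a BERT tokenizer, since it adds its own special tokens.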

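    【Comments】:

    • Thanks, this works. Is there any documentation on how to use other proprietary models like this one? It would be great if you could include it; that would help everyone.
    • @red-devil I added a link to the documentation for the other pipelines.
    • Sorry, I missed that. Implemented as you suggested.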