How to predict (classify) a user sentence with a BERT model and TensorFlow Lite

【Question title】: How to predict (classify) user sentence with BERT model and TensorflowLite
【Posted】: 2021-08-12 12:48:54
【Question description】:

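I am trying to train a MobileBERT model with TFLite Model Maker. Training works, and so does evaluation (I can call mb_model.evaluate(mb_test_data)).

But I have no idea how to predict the result for a plain string sentence in Python...

Here is a sample training script: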

import os
import tensorflow as tf
assert tf.__version__.startswith('2')
from tflite_model_maker import configs
from tflite_model_maker import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker.text_classifier import DataLoader

mb_spec = model_spec.get('mobilebert_classifier')
mb_train_data = DataLoader.from_csv(
    filename=os.path.join(data_dir, 'nlu_train.tsv'),
    text_column='sentence',
    label_column='label',
    model_spec=mb_spec,
    delimiter='\t',
    is_training=True)
mb_test_data = DataLoader.from_csv(
    filename=os.path.join(data_dir, 'nlu_test.tsv'),
    text_column='sentence',
    label_column='label',
    model_spec=mb_spec,
    delimiter='\t',
    is_training=False)
mb_model = text_classifier.create(mb_train_data, model_spec=mb_spec, epochs=30, batch_size=8)
config = configs.QuantizationConfig.for_float16()
config._experimental_new_quantizer = True
mb_model.export(export_dir='/')

It exports /model.tflite.

I can run the exported model on a dummy input like this:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="nlu (6).tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.int32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

But I want to use a custom sentence instead of input_data = np.array(np.random.random_sample(input_shape), dtype=np.int32), for example:

input_data = "My user sentence"
output_data = interpreter.predict(input_data)

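Does anyone know how to do this? I have not found any documentation, and reverse-engineering the TFLite Model Maker sources (and the BERT code under official.nlp.data) is hard...

I have not found the complete preprocessing and tokenization pipeline that turns a raw sentence string into the int32 lists the model expects :/

Thanks!

【Discussion】:

Tags: python tensorflow-lite bert-language-model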

【Solution 1】:

You can use BertNLClassifier for inference. It handles both the preprocessing and the postprocessing.

【Discussion】:

But it has no Python API. I could make an external call to the C++ API, but I am looking for a Python solution.
I see. Could you try token_ids = mb_spec.preprocess('My user sentence')? We will consider adding a predict method to model_maker later.
It fails: from tflite_model_maker import model_spec; mb_spec = model_spec.get('bert_classifier'); token_ids = mb_spec.preprocess('Est-ce qu\'il est grand ?'); print(token_ids) raises AttributeError: 'BertClassifierModelSpec' object has no attribute 'preprocess'
Could you try this? from official.nlp.data import classifier_data_lib as lib; example = lib.InputExample(guid='0', text_a='Your sentence', label=None); feature = lib.convert_single_example(ex_index=0, example=example, label_list=None, max_seq_length=mb_spec.seq_len, tokenizer=mb_spec.tokenizer). The three inputs should then be feature.input_ids, feature.input_mask and feature.segment_ids. Check the names in input_details to see the order of the three inputs.
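To make the three inputs concrete, here is a minimal sketch of how one sentence becomes input_ids, input_mask and segment_ids. It is not what convert_single_example does internally: the vocabulary, the token IDs and the helper name toy_bert_features are made up for illustration, and the sentence is assumed to be already split into tokens, whereas the real tokenizer applies WordPiece.

```python
import numpy as np

# Hypothetical toy vocabulary; a real BERT vocab file assigns different IDs.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "my": 4, "user": 5, "sentence": 6}


def toy_bert_features(tokens, vocab, max_seq_length):
    """Build (input_ids, input_mask, segment_ids) for a single sentence.

    Each result is an int32 array of shape (1, max_seq_length).
    """
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token
    segment_ids = [0] * len(input_ids)  # one sentence, so one segment

    # Pad all three lists to the fixed sequence length.
    pad = max_seq_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad

    def as_batch(values):
        return np.array([values], dtype=np.int32)

    return as_batch(input_ids), as_batch(input_mask), as_batch(segment_ids)


ids, mask, segments = toy_bert_features(["my", "user", "sentence"], TOY_VOCAB, 8)
print(ids)
print(mask)
print(segments)
```

The point is only the layout: three parallel int32 arrays of the same fixed length, which is what the exported model takes in place of the raw string.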
As for the second error: as I said, there are three inputs, so check input_details and map the variables by name. The mapping is: 1. feature.segment_ids: 'serving_default_input_type_ids:0', 2. feature.input_ids: 'serving_default_input_word_ids:0', 3. feature.input_mask: 'serving_default_input_mask:0'
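The name-based mapping described in the last comment can be sketched as a small helper that matches each entry of input_details to the right feature by tensor name rather than by position. The helper name tensors_by_index and the fake input_details and feature used in the demo are assumptions for illustration; with a real model they would come from interpreter.get_input_details() and convert_single_example.

```python
from types import SimpleNamespace

import numpy as np

# Substring of the tensor name -> attribute on the feature object,
# following the mapping given in the comment above.
NAME_TO_FEATURE = {
    "input_word_ids": "input_ids",
    "input_mask": "input_mask",
    "input_type_ids": "segment_ids",
}


def tensors_by_index(input_details, feature):
    """Return {tensor_index: int32 array of shape (1, seq_len)}."""
    tensors = {}
    for detail in input_details:
        for key, attr in NAME_TO_FEATURE.items():
            if key in detail["name"]:
                values = getattr(feature, attr)
                tensors[detail["index"]] = np.array([values], dtype=np.int32)
                break
        else:
            raise ValueError(f"Unexpected input tensor: {detail['name']}")
    return tensors


# Stand-ins for interpreter.get_input_details() and a converted feature.
fake_details = [
    {"name": "serving_default_input_type_ids:0", "index": 2},
    {"name": "serving_default_input_word_ids:0", "index": 0},
    {"name": "serving_default_input_mask:0", "index": 1},
]
fake_feature = SimpleNamespace(
    input_ids=[2, 4, 3, 0], input_mask=[1, 1, 1, 0], segment_ids=[0, 0, 0, 0])

tensors = tensors_by_index(fake_details, fake_feature)

# With a real interpreter, the tensors would then be fed like this:
# for index, tensor in tensors.items():
#     interpreter.set_tensor(index, tensor)
# interpreter.invoke()
# scores = interpreter.get_tensor(output_details[0]['index'])
```

Matching on the name avoids relying on the order in which the converter happens to list the three inputs, which is the source of confusion the comment points at.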