Tensorflow Tokenizer: reverting fit_on_sequences (answers)

[Question title]: Tensorflow Tokenizer revert fit_on_sequences
[Posted]: 2021-09-29 07:20:05
[Question]:

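The Tensorflow Tokenizer tokenizes text and encodes it into machine-readable vectors. First we call fit_on_texts on a large body of text to build the dictionary, then we call fit_on_sequences on the input text to build the corresponding vector encoding.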
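As a rough illustration of those two steps, here is a pure-Python sketch. It is not Keras's actual implementation. It assumes simplified tokenization (lowercase, split on whitespace; Keras additionally strips punctuation via its `filters` argument), ranks words by frequency so the most common word gets index 1, and leaves 0 unused so it can serve as padding. The encoding step corresponds to what Keras calls `texts_to_sequences`. The helper names `build_word_index` and `encode` are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to a 1-based index, most frequent word first (0 is left for padding)."""
    counts = Counter()
    for text in texts:
        # Simplified tokenization (an assumption): lowercase and split on whitespace only.
        counts.update(text.lower().split())
    # sorted() is stable, even with reverse=True, so equal counts keep first-seen order.
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def encode(texts, word_index):
    """Replace each known word with its index; words not in the vocabulary are skipped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


corpus = ["the cat sat", "the dog sat down"]
word_index = build_word_index(corpus)
print(word_index)  # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4, 'down': 5}
print(encode(["the dog sat"], word_index))  # [[1, 4, 2]]
```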

(Related question: What does Keras Tokenizer method exactly do?)

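However, there seems to be no built-in method for the reverse operation, i.e. retrieving the text from a numeric vector using the dictionary.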

Something like this can be implemented in Python:

 # map predicted word index to word
 out_word=''
 for word, index in tokenizer.word_index.items():
     if index==yhat:
         out_word=word
         break
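The loop above scans the whole of `word_index` for every lookup, so each decoded word costs time proportional to the vocabulary size. A common alternative, sketched below in plain Python with a hypothetical hard-coded `word_index`, is to invert the dictionary once and then look up each index directly. (A fitted Keras `Tokenizer` already keeps this inverse mapping in its `index_word` attribute.) The helper `index_to_word` is hypothetical.

```python
# Hypothetical vocabulary, shaped like tokenizer.word_index.
word_index = {"life": 1, "is": 2, "beautiful": 3}

# Invert once: index -> word. Every later lookup is a single dict access.
index_word = {index: word for word, index in word_index.items()}


def index_to_word(yhat, default=""):
    """Return the word for a predicted index, or `default` if no word has that index (e.g. 0)."""
    return index_word.get(yhat, default)


print(index_to_word(3))        # beautiful
print(repr(index_to_word(0)))  # ''
```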

Is there a good way to get the text back from the numbers? In other words, is there a built-in reverse of fit_to_sequences?

[Question discussion]:

Tokenizer has two main methods: texts_to_sequences to get the sequences, and sequences_to_texts to do the inverse. I'm not sure what you mean by a reverse of fit_to_sequences.

Tags: python tensorflow machine-learning encoding embedding

[Solution 1]:

There are built-in methods for retrieving text from numeric vectors.

For example, see the code below:

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["Life is Beautiful"]

# Build the vocabulary (word -> index) from the corpus
tokenizer = Tokenizer(num_words=30)
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index
print("Word Index: ", word_index)

# Invert the vocabulary (index -> word); tokenizer.index_word holds the same mapping
reverse_word_index = {value: key for key, value in word_index.items()}
print("Reversed Word Index:", reverse_word_index)

# Text -> integer sequences
seq = tokenizer.texts_to_sequences(sentences)
print("Texts to Numbers:", seq)

# Integer sequences -> text (the built-in inverse)
seq_to_wrd = tokenizer.sequences_to_texts(seq)
print("Numbers to Texts:", seq_to_wrd)

Output:

Word Index:  {'life': 1, 'is': 2, 'beautiful': 3}
Reversed Word Index: {1: 'life', 2: 'is', 3: 'beautiful'}
Texts to Numbers: [[1, 2, 3]]
Numbers to Texts: ['life is beautiful']
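Two details matter when decoding real model inputs. First, sequences are often padded with 0, and index 0 is never assigned to a word, so there is nothing to map it back to. As I understand the Keras source, `sequences_to_texts` skips such indices, and when `num_words` is set it also drops indices `>= num_words` unless an `oov_token` is configured. Second, decoding is lossy: "Life is Beautiful" comes back as "life is beautiful" because the tokenizer lowercases the text. The pure-Python sketch below imitates the no-`oov_token` case; `decode` is a hypothetical helper, not part of Keras.

```python
def decode(sequences, index_word, num_words=None):
    """Turn lists of indices back into space-joined strings.

    Imitates the no-oov_token case: indices with no word (such as the 0
    used for padding) are skipped, and so are indices >= num_words.
    """
    texts = []
    for seq in sequences:
        words = []
        for num in seq:
            word = index_word.get(num)
            if word is None:
                continue  # e.g. padding index 0
            if num_words and num >= num_words:
                continue  # outside the kept vocabulary
            words.append(word)
        texts.append(" ".join(words))
    return texts


index_word = {1: "life", 2: "is", 3: "beautiful"}
padded = [[0, 0, 1, 2, 3]]
print(decode(padded, index_word))               # ['life is beautiful']
print(decode(padded, index_word, num_words=3))  # ['life is']
```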

See this link for more of the Tokenizer's built-in functions.

[Discussion]: