【Question Title】: Language model/set does not contain </s>
【Posted】: 2014-09-20 19:12:34
【Question】:

I am developing an ASR system with PocketSphinx and have followed every step on this page. When I run pocketsphinx_continuous I get the following error:

ERROR: "ngram_search.c", line 221: Language model/set does not contain </s>, recognition will fail

But my language model does contain the </s> tag.

My language model is as follows:

This is an ARPA-format language model file, generated by CMU Sphinx
\data\
ngram 1=3
ngram 2=1
ngram 3=1

\1-grams:
-0.4770 <s>Alif</s> -0.3010
-0.4770 <s>Baa</s> 0.0000
-0.4770 <s>Jeem</s> 0.0000

\2-grams:
-0.1761 <s>Alif</s> <s>Baa</s> -0.1249

\3-grams:
-0.3010 <s>Alif</s> <s>Baa</s> <s>Jeem</s> 

\end\

The corpus file used to build it is:

<s> Alif </s>
<s> Baa </s>
<s> Jeem </s>

Thanks a lot for any help with this issue.

【Comments】:

  • You may want to share the language model. Most likely it does not contain </s>, so you need to check it more carefully.
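One way to do that check is to list the words in the \1-grams: section of the ARPA file and see whether <s> and </s> appear there as standalone entries. The error message suggests the decoder needs </s> as a word in the LM vocabulary, so a unigram such as <s>Alif</s> does not count. Below is a minimal sketch, assuming a plain-text ARPA file; `arpa_unigrams` and `missing_sentence_markers` are hypothetical helper names, not part of PocketSphinx.

```python
# Sample taken from the language model in the question.
SAMPLE = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-0.4770 <s>Alif</s> -0.3010",
    "-0.4770 <s>Baa</s> 0.0000",
    "-0.4770 <s>Jeem</s> 0.0000",
    "",
    "\\2-grams:",
    "-0.1761 <s>Alif</s> <s>Baa</s> -0.1249",
    "",
    "\\end\\",
])


def arpa_unigrams(text):
    """Return the set of words listed in the \\1-grams: section of an ARPA LM."""
    words = set()
    in_unigrams = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams:
            continue
        if line.startswith("\\"):  # next section (\2-grams:, \end\, ...)
            break
        if not line:
            continue
        # A unigram line is: log10 probability, word, optional back-off weight.
        words.add(line.split()[1])
    return words


def missing_sentence_markers(text):
    """List which of <s> and </s> are absent as standalone unigrams."""
    unigrams = arpa_unigrams(text)
    return [marker for marker in ("<s>", "</s>") if marker not in unigrams]


if __name__ == "__main__":
    print(sorted(arpa_unigrams(SAMPLE)))
    print(missing_sentence_markers(SAMPLE))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `missing_sentence_markers`; an empty list means both markers are present as separate words.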

Tags: cmusphinx pocketsphinx-android


【Answer 1】:

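When you prepared the corpus, there were no spaces between <s> and Alif, so LM training counted <s>Alif</s> as a single word. You need the spaces; the correct language model should look like this: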

\data\
ngram 1=5
ngram 2=6
ngram 3=0


\1-grams:
-0.3010 </s> 0.0000
-99.0000 <s> -7.3814
-0.7782 Alif -99.0000
-0.7782 Baa -99.0000
-0.7782 Jeem -99.0000

\2-grams:
-0.4771 <s> Alif 0.0000
-0.4771 <s> Baa 0.0000
-0.4771 <s> Jeem 0.0000
0.0000 Alif </s> 0.0000
0.0000 Baa </s> 0.0000
0.0000 Jeem </s> 0.0000

\3-grams:

\end\

This correct LM has a separate entry for </s>.
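The spacing fix can be scripted before retraining, and the probabilities in the corrected LM can be checked against simple counts. Below is a minimal sketch in Python. `normalize_line` and `mle_log10` are hypothetical helpers, and plain maximum-likelihood estimates are an assumption: real LM toolkits typically apply discounting and compute back-off weights, which this sketch skips. It also leaves out the <s> unigram, which the corrected LM gives -99.0000 because <s> only ever serves as context. For this tiny corpus, the rounded unigram and bigram log10 values come out the same as those in the corrected LM (for example log10(3/6) for </s> and log10(1/3) for <s> Alif).

```python
import math
import re
from collections import Counter

# A sentence marker (<s> or </s>) together with any whitespace around it.
MARKER = re.compile(r"\s*(</?s>)\s*")


def normalize_line(line):
    """Force whitespace around <s> and </s> so each becomes its own token."""
    spaced = MARKER.sub(r" \1 ", line)
    return " ".join(spaced.split())


def mle_log10(corpus_lines):
    """Maximum-likelihood unigram/bigram log10 probabilities, rounded to 4 places."""
    sentences = [normalize_line(l).split() for l in corpus_lines if l.strip()]
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        # <s> is never predicted, only used as context, so it is not counted here.
        unigrams.update(t for t in tokens if t != "<s>")
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    # How often each word appears as the left-hand context of a bigram.
    contexts = Counter()
    for (left, _), count in bigrams.items():
        contexts[left] += count
    uni = {w: round(math.log10(c / total), 4) for w, c in unigrams.items()}
    bi = {
        (a, b): round(math.log10(c / contexts[a]), 4)
        for (a, b), c in bigrams.items()
    }
    return uni, bi


if __name__ == "__main__":
    corpus = ["<s>Alif</s>", "<s>Baa</s>", "<s>Jeem</s>"]
    for line in corpus:
        print(normalize_line(line))
    unigram_probs, bigram_probs = mle_log10(corpus)
    print(unigram_probs)
    print(bigram_probs)
```

In practice you would write the normalized lines back to the corpus file and rerun the LM training on it, rather than hand-building the model.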

【Comments】:
