【Question title】: How to get the word result from audio using Sphinx
【Posted】: 2015-09-25 03:37:24
【Question】:

I am trying to get the word result from an audio file with Sphinx using the code below, but it returns no words. Can anyone help?

This is the WAV audio: http://download.wavetlan.com/SVV/Media/HTTP/OtherWAV2.wav

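import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;
import java.io.FileInputStream;
import java.io.IOException;

Configuration configuration = new Configuration();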

// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);

    recognizer.startRecognition(new FileInputStream("1.wav"));
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();


    // Print utterance string without filler words.
    System.out.println(result.getHypothesis());

    System.out.println("================word result=============="+result.getWords().size());
    // Get individual words and their times.
    for (WordResult r : result.getWords()) {
        System.out.println(r);
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Here is the output:

19:12:30.264 INFO lexTreeLinguist      Max CI Units 43
19:12:30.264 INFO lexTreeLinguist      Unit table size 79507
19:12:30.273 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.273 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.273 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.273 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.273 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.273 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.294 INFO speedTracker            This  Time Audio: 1.38s  Proc: 0.01s  Speed: 0.00 X real time
19:12:30.295 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.295 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.295 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb
19:12:30.295 INFO trieNgramModel       LM Cache Size: 0 Hits: 0 Misses: 0
19:12:30.314 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.314 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.314 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.314 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.314 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.314 INFO speedTracker         Score                2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Prune                5       0.0000s   0.0000s   0.0000s   0.0000s   0.0000s   
19:12:30.315 INFO speedTracker         Grow                 7       0.0000s   0.0000s   0.0040s   0.0007s   0.0050s   
19:12:30.315 INFO speedTracker         Frontend             2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.315 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.315 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.315 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb

================word result==============0

【Question comments】:

    Tags: java audio cmusphinx


    【Answer 1】:

    The audio must have the following format:

    RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
    

    Your audio has this format:

    RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz
    

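    It cannot be decoded with the default models. This audio also cannot be usefully converted to the correct format, because its sample rate is below 16000 Hz and it is only 8-bit rather than 16-bit; upsampling cannot restore detail that was never recorded. You need to make sure the original audio is converted to the correct format before decoding.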
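
    To see which of the two formats a given file has before handing it to the recognizer, the header can be inspected with the standard javax.sound.sampled API. The following is only a minimal sketch: the class name WavFormatCheck and the helper isSphinxCompatible are hypothetical names made up for illustration, the expected values (signed PCM, 16 bit, mono, 16000 Hz, little-endian) are taken from the format line quoted above, and "1.wav" is the file name used in the question.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, not part of Sphinx.
class WavFormatCheck {

    // True only for signed PCM, 16 bit, mono, 16000 Hz, little-endian,
    // i.e. the format the answer says the default models require.
    static boolean isSphinxCompatible(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        // Falls back to the file name from the question if no argument is given.
        File wav = new File(args.length > 0 ? args[0] : "1.wav");

        // Reads only the file header; no audio data is decoded here.
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();

        System.out.println("Detected format: " + format);
        System.out.println(isSphinxCompatible(format)
                ? "Format matches what the default models expect."
                : "Format does not match; re-export from the original source first.");
    }
}
```

    Running this check first makes a format mismatch visible immediately, instead of showing up only as an empty word list after decoding.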

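    【Discussion】:

    • Another question is how to improve the accuracy, since right now only one word is detected from the audio?
    • To get help with accuracy, you need to provide the latest audio and the results you obtained.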