Microsoft Speech Recognition: Alternate results with confidence score?

[Question title]: Microsoft Speech Recognition: Alternate results with confidence score?
[Posted]: 2013-09-24 06:15:49
[Question]:

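I'm new to the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK version 11), and I'm trying to get it to output the n-best recognition matches from a simple grammar, along with the confidence score for each.

According to the documentation (and as mentioned in the answer to this question), it should be possible to use e.Result.Alternates to access the recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0 (which should mean that nothing is rejected), I still get only one result and no alternates, although the SpeechHypothesized events show that at least one of the other words was recognized with non-zero confidence at some point.

My question: can anyone explain why I get only one recognized word even with the confidence rejection threshold set to zero? How can I get the other possible matches and their confidence scores? What am I missing here?

My code is below. Thanks in advance to anyone who can help :)

In the example below, the recognizer is sent a wav file of the single word "news" and has to choose among similar words ("noose", "newts"). I want to extract a list of the recognizer's confidence scores for each word (they should all be non-zero), even though it will only return the best one ("news") as the result.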

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Speech.Recognition;

namespace SimpleRecognizer
{
    class Program
    {
        static readonly string[] settings = new string[] {
            "CFGConfidenceRejectionThreshold",
            "HighConfidenceThreshold", 
            "NormalConfidenceThreshold",
            "LowConfidenceThreshold"};

        static void Main(string[] args)
        {
            // Create a new SpeechRecognitionEngine instance.
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); //en-US SRE

            // Configure the input to the recognizer.
            sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");

            // Display Recognizer Settings (Confidence Thresholds)
            ListSettings(sre);

            // Set Confidence Threshold to Zero (nothing should be rejected)
            sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
            sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);

            // Display New Recognizer Settings
            ListSettings(sre);

            // Build a simple Grammar with three choices
            Choices topics = new Choices();
            topics.Add(new string[] { "news", "newts", "noose" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(topics);
            Grammar g = new Grammar(gb);
            g.Name = "g";

            // Load the Grammar
            sre.LoadGrammar(g);

            // Register handlers for Grammar's SpeechRecognized Events
            g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);

            // Register a handler for the recognizer's SpeechRecognized event.
            sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

            // Register Handler for SpeechHypothesized
            sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

            // Start recognition.
            sre.Recognize();

            Console.ReadKey(); //wait to close

        }
        static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
            Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
        }
        private static void ListSettings(SpeechRecognitionEngine recognizer)
        {
            foreach (string setting in settings)
            {
                try
                {
                    object value = recognizer.QueryRecognizerSetting(setting);
                    Console.WriteLine("  {0,-30} = {1}", setting, value);
                }
                catch
                {
                    Console.WriteLine("  {0,-30} is not supported by this recognizer.",
                      setting);
                }
            }
            Console.WriteLine();
        }
    }
}

This gives the following output:

Original recognizer settings:
  CFGConfidenceRejectionThreshold = 20
  HighConfidenceThreshold        = 80
  NormalConfidenceThreshold      = 50
  LowConfidenceThreshold         = 20

Updated recognizer settings:
  CFGConfidenceRejectionThreshold = 0
  HighConfidenceThreshold        = 0
  NormalConfidenceThreshold      = 0
  LowConfidenceThreshold         = 0

Speech from grammar g hypothesized: noose, 0.2214646
Speech from grammar g hypothesized: news, 0.640804

Number of Alternates from Grammar g: 1
news, 0.9208503

Speech recognized: news, 0.9208503
Number of Alternates from Recognizer: 1
news, 0.9208503

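I have also tried using a separate phrase for each word (instead of one phrase with three choices), and even a separate grammar for each word/phrase. The results are essentially the same: only one "alternate".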

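[Comments]:

What is the value of recognizer.MaxAlternates?
MaxAlternates appears to be 10 (by default, I assume).
"I want to extract a list of the recognizer's confidence scores for each word (they should all be non-zero)" - that is not necessarily the case. As I understand the SAPI engine-side contract, the engine is allowed to drop "infeasible" alternates from the final recognition.
@EricBrown Thanks. So is that why the "hypothesized" recognitions (like "noose" in my example) seem to get thrown away along the way? In that case, how can I change the engine's behavior to disable this pruning or lower the pruning threshold (assuming that is how it decides)? That is what I thought "CFGConfidenceRejectionThreshold" was for...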

Tags: .net speech-recognition speech microsoft-speech-platform microsoft-speech-api

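[Answer 1]:

I believe this is another place where SAPI lets you ask for something that the SR engine doesn't really support.

Both Microsoft.Speech.Recognition and System.Speech.Recognition use the underlying SAPI interfaces to do their work; the only difference is which SR engine is used. (Microsoft.Speech.Recognition uses the server engine; System.Speech.Recognition uses the desktop engine.)

Alternates are designed primarily for dictation, not for context-free grammars (CFGs). You can always get one alternate for a CFG, but the alternate-generation code doesn't appear to expand alternates for CFGs.

Unfortunately, the Microsoft.Speech.Recognition engine doesn't support dictation. (It does, however, handle much lower-quality audio, and it doesn't require training.)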
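
One possible workaround, offered only as a sketch and not as something the engine documents or guarantees: since the SpeechHypothesized events in the question's output do report other words with non-zero confidence, those interim values could be recorded as they arrive. The HypothesisLog class below is a hypothetical helper, not part of Microsoft.Speech; it only assumes that e.Result.Text and e.Result.Confidence are read inside the handler, as the question's code already does. Hypothesis confidences are interim figures (in the question's output, "news" was hypothesized at 0.64 but finally recognized at 0.92), so they are not equivalent to final n-best scores, and a word that is never hypothesized ("newts" here) gets no entry at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Microsoft.Speech): remembers the highest
// confidence reported so far for each hypothesized phrase.
class HypothesisLog
{
    private readonly Dictionary<string, float> best = new Dictionary<string, float>();

    // Store the confidence if the phrase is new or the value beats the stored one.
    public void Record(string text, float confidence)
    {
        float previous;
        if (!best.TryGetValue(text, out previous) || confidence > previous)
        {
            best[text] = confidence;
        }
    }

    // Phrases ordered from highest to lowest recorded confidence.
    public IEnumerable<KeyValuePair<string, float>> Ranked()
    {
        return best.OrderByDescending(pair => pair.Value);
    }
}

class Demo
{
    static void Main()
    {
        var log = new HypothesisLog();

        // In the question's program the log would be fed from the event handler:
        //   sre.SpeechHypothesized += (s, e) => log.Record(e.Result.Text, e.Result.Confidence);
        // Here the two hypotheses from the question's output are replayed by hand.
        log.Record("noose", 0.2214646f);
        log.Record("news", 0.640804f);

        foreach (var pair in log.Ranked())
        {
            Console.WriteLine("{0}, {1}", pair.Key, pair.Value);
        }
    }
}
```

Keeping the log separate from the recognizer means it can be cleared before each utterance and inspected inside the SpeechRecognized handler, next to e.Result.Alternates.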

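[Comments]:

I see, thanks for the clarification. I need to use Microsoft.Speech because I'm simulating a telephone-based dialogue system. Too bad alternates are designed for the desktop engine/dictation.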