【发布时间】:2013-09-24 06:15:49
【问题描述】:
我是使用 Microsoft.Speech 识别器(使用 Microsoft Speech Platform SDK 版本 11)的新手,我正试图让它从一个简单的语法中输出 n 最佳识别匹配,以及对每个。
根据文档(以及提到的in the answer to this question),应该能够使用e.Result.Alternates 访问除了得分最高的单词之外的识别单词。但是,即使将置信度拒绝阈值重置为 0(这应该意味着什么都不会被拒绝),我仍然只得到一个结果,并且没有替代结果(尽管 SpeechHypothesized 事件表明至少其他单词中的一个似乎是在某些时候以非零置信度识别)。
我的问题:谁能向我解释为什么即使置信拒绝阈值设置为零,我也只能得到一个识别词?如何获得其他可能的匹配项及其置信度分数?我在这里错过了什么?
下面是我的代码。提前感谢任何可以提供帮助的人:)
在下面的示例中,识别器被发送一个单词“news”的 wav 文件,并且必须从相似的单词(“noose”、“newts”)中进行选择。我想提取每个单词的识别器置信度列表(它们都应该不为零),即使它只会返回最好的一个(“新闻”)作为结果。
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Speech.Recognition;
namespace SimpleRecognizer
{
class Program
{
static readonly string[] settings = new string[] {
"CFGConfidenceRejectionThreshold",
"HighConfidenceThreshold",
"NormalConfidenceThreshold",
"LowConfidenceThreshold"};
static void Main(string[] args)
{
// Create a new SpeechRecognitionEngine instance.
SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); //en-US SRE
// Configure the input to the recognizer.
sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");
// Display Recognizer Settings (Confidence Thresholds)
ListSettings(sre);
// Set Confidence Threshold to Zero (nothing should be rejected)
sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);
// Display New Recognizer Settings
ListSettings(sre);
// Build a simple Grammar with three choices
Choices topics = new Choices();
topics.Add(new string[] { "news", "newts", "noose" });
GrammarBuilder gb = new GrammarBuilder();
gb.Append(topics);
Grammar g = new Grammar(gb);
g.Name = "g";
// Load the Grammar
sre.LoadGrammar(g);
// Register handlers for Grammar's SpeechRecognized Events
g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);
// Register a handler for the recognizer's SpeechRecognized event.
sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
// Register Handler for SpeechHypothesized
sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);
// Start recognition.
sre.Recognize();
Console.ReadKey(); //wait to close
}
static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
foreach (RecognizedPhrase phrase in e.Result.Alternates)
{
Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
}
}
static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
foreach (RecognizedPhrase phrase in e.Result.Alternates)
{
Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
}
}
static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
}
private static void ListSettings(SpeechRecognitionEngine recognizer)
{
foreach (string setting in settings)
{
try
{
object value = recognizer.QueryRecognizerSetting(setting);
Console.WriteLine(" {0,-30} = {1}", setting, value);
}
catch
{
Console.WriteLine(" {0,-30} is not supported by this recognizer.",
setting);
}
}
Console.WriteLine();
}
}
}
这给出了以下输出:
Original recognizer settings:
CFGConfidenceRejectionThreshold = 20
HighConfidenceThreshold = 80
NormalConfidenceThreshold = 50
LowConfidenceThreshold = 20
Updated recognizer settings:
CFGConfidenceRejectionThreshold = 0
HighConfidenceThreshold = 0
NormalConfidenceThreshold = 0
LowConfidenceThreshold = 0
Speech from grammar g hypothesized: noose, 0.2214646
Speech from grammar g hypothesized: news, 0.640804
Number of Alternates from Grammar g: 1
news, 0.9208503
Speech recognized: news, 0.9208503
Number of Alternates from Recognizer: 1
news, 0.9208503
我还尝试为每个单词使用一个单独的短语(而不是一个具有三个选项的短语),甚至为每个单词/短语使用单独的语法来实现这一点。结果基本相同:只有一个“替代品”。
【问题讨论】:
-
recognizer.MaxAlternates的值是多少?
-
MaxAlternates 似乎是 10(我猜默认情况下)。
-
“我想为每个单词提取识别器的置信度分数列表(它们都应该非零)” - 不一定是这种情况。根据我对 SAPI 引擎端合同的理解,允许引擎从最终识别中删除“不可行”的替代方案。
-
@EricBrown 谢谢。那么这就是为什么“假设的”识别(如我的例子中的“套索”)似乎在此过程中被抛弃的原因吗?在那种情况下,如何改变引擎的行为来禁用这种修剪或降低修剪阈值(假设它是这样决定的)?这就是我认为
"CFGConfidenceRejectionThreshold"的用途......
标签: .net speech-recognition speech microsoft-speech-platform microsoft-speech-api