Speech recognition quality is extremely poor, especially compared to Word

【Question title】: SpeechRecognition quality is extremely poor, especially compared to Word
【Posted】: 2021-07-15 04:36:14
【Question description】:

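I'm using the WPF speech recognition library, trying to use it in a desktop application as an alternative to menu commands. (I want to focus on the tablet experience, where there is no keyboard.) It works, sort of, except that the recognition accuracy is so poor that it is unusable. So I tried dictating into Word. Word worked reasonably well. In both cases I'm using the built-in laptop microphone, and both programs can hear the same speech simultaneously (provided Word keeps keyboard focus), but Word gets it right and WPF does a terrible job.

I've tried the generic DictationGrammar() and a small specialized grammar, and I've tried both "en-US" and "en-AU"; in all cases Word performs well and WPF performs poorly. Even comparing the specialized grammar in WPF against the general grammar in Word, WPF gets it wrong 50% of the time, for example hearing "size small" as "color small".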

    private void InitSpeechRecognition()
    {
        recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

        // Create and load a grammar.  
        if (false)
        {
            GrammarBuilder grammarBuilder = new GrammarBuilder();
            Choices commandChoices = new Choices("weight", "color", "size");
            grammarBuilder.Append(commandChoices);
            Choices valueChoices = new Choices();
            valueChoices.Add("normal", "bold");
            valueChoices.Add("red", "green", "blue");
            valueChoices.Add("small", "medium", "large");
            grammarBuilder.Append(valueChoices);
            recognizer.LoadGrammar(new Grammar(grammarBuilder));
        }
        else
        {
            recognizer.LoadGrammar(new DictationGrammar());
        }

        // Add a handler for the speech recognized event.  
        recognizer.SpeechRecognized +=
                            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.  
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.  
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }

Sample results from Word:

Hello 
make it darker 
I want a brighter colour 
make it reader 
make it greener 
thank you 
make it bluer 
make it more blue
make it darker 
turn on debugging 
turn off debugging 
zoom in 
zoom out

The same audio in WPF, with the dictation grammar:

a lower
make it back
when Ted Brach
making reader
and he
liked the
ethanol and
act out
to be putting
it off the parking
zoom in
and out

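I obtained the assembly via NuGet. I'm using Runtime version=v4.0.30319 and version=4.0.0.0. If I'm supposed to "train" it, the documentation doesn't explain how to do that, and I don't know whether the training is shared with other programs such as Word, or where the training is saved. I've been playing with it long enough for it to know my voice by now.

Can anyone tell me what I'm doing wrong?

【Question discussion】:

Tags: c# wpf speech-recognition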

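【Solution 1】:

This is expected. Word's dictation uses a cloud-based, AI/ML-assisted speech service: Azure Cognitive Services - Speech To Text. It is continually trained and updated for the best accuracy. You can easily test this by going offline and trying the dictation feature in Word: it won't work.

.NET's System.Speech uses the offline SAPI5, which as far as I know has not been updated since Windows 7. The core technology itself (Windows 95 era) is much older than what is available on today's phones or in cloud-based services. Microsoft.Speech.Recognition uses a similar core and won't be much better, although you could give it a try.

If you want to explore other offline options, I suggest trying Windows.Media.SpeechRecognition. As far as I know, it is the same technology used by Cortana and other modern speech recognition apps on Windows 8 and later, and it does not use SAPI5.

Examples for Azure or Windows.Media.SpeechRecognition are easy to find online. The best way to use the latter is to update your app to .NET 5 and use C#/WinRT to access the UWP APIs.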
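
For orientation, a minimal sketch of single-utterance recognition with the Azure Speech SDK (the Microsoft.CognitiveServices.Speech NuGet package) might look like the following. The environment variable names are hypothetical and stand in for your own subscription key and region; this is an illustration, not a complete integration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class AzureDictationSketch
{
    static async Task Main()
    {
        // Assumption: key and region are supplied through environment
        // variables with these (hypothetical) names.
        string key = Environment.GetEnvironmentVariable("SPEECH_KEY");
        string region = Environment.GetEnvironmentVariable("SPEECH_REGION");

        var config = SpeechConfig.FromSubscription(key, region);
        config.SpeechRecognitionLanguage = "en-US";

        // With no AudioConfig argument the SDK listens on the default microphone.
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Listens for a single utterance and returns when it ends.
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else
                Console.WriteLine($"No speech recognized ({result.Reason}).");
        }
    }
}
```

Because recognition happens in the cloud, this needs a network connection, which is the trade-off against System.Speech described above.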

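【Discussion】:

Thanks. I've tried out Azure CognitiveServices. It is a bit of a hassle to set up, but it seems to work well. I'd prefer something that works offline, but not if the quality is poor.
Is there a way to make Windows.Media.SpeechRecognition work offline for continuous dictation? The sample from Windows requires an internet connection.

【Solution 2】:

You'd be better off not using DictationGrammar, and instead using specific grammars with whole phrases or key-value assignments: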

using System.Globalization;
using System.Speech.Recognition;

private static SpeechRecognitionEngine CreateRecognitionEngine()
{
    var cultureInf = new CultureInfo("en-US");

    var recoEngine = new SpeechRecognitionEngine(cultureInf);
    recoEngine.SetInputToDefaultAudioDevice();

    // One small grammar per command key: the key word followed by one of its values.
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "weight", new string[] { "normal", "bold", "demibold" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "color", new string[] { "red", "green", "blue" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "size", new string[] { "small", "medium", "large" }));

    // An empty key produces a grammar that matches whole phrases only.
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "", new string[] { "Put whole phrase here", "Put whole phrase here again", "another long phrase" }));

    return recoEngine;
}

// Builds a grammar of the form "<key> <value>", or just "<value>" when key is empty.
static Grammar CreateKeyValuesGrammar(CultureInfo cultureInf, string key, string[] values)
{
    var grBldr = string.IsNullOrWhiteSpace(key)
        ? new GrammarBuilder() { Culture = cultureInf }
        : new GrammarBuilder(key) { Culture = cultureInf };
    grBldr.Append(new Choices(values));

    return new Grammar(grBldr);
}
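
With key/value grammars like these, each recognized phrase still has to be turned into a command, and weak matches are better ignored than acted on. The helper below is a rough sketch of that step. `CommandParser`, `TryParse`, and the 0.6 threshold are hypothetical names and values chosen for illustration; in a `SpeechRecognized` handler the inputs would come from `e.Result.Text` and `e.Result.Confidence`.

```csharp
using System;

static class CommandParser
{
    // Hypothetical threshold; tune it against your own microphone and voice.
    const float MinConfidence = 0.6f;

    // Splits a phrase such as "size small" into key "size" and value "small".
    // Returns false when the confidence is below the threshold or the text
    // is not a two-word key/value phrase.
    public static bool TryParse(string text, float confidence, out string key, out string value)
    {
        key = null;
        value = null;

        if (confidence < MinConfidence || string.IsNullOrWhiteSpace(text))
            return false;

        string[] parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            return false;

        key = parts[0].ToLowerInvariant();
        value = parts[1].ToLowerInvariant();
        return true;
    }
}
```

A handler could then call `CommandParser.TryParse(e.Result.Text, e.Result.Confidence, out var key, out var value)` and apply the command only when it returns true.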

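You can also try Microsoft.Speech.Recognition; see "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"

【Discussion】:

【Solution 3】:

Since you are actually building a voice user interface rather than just doing speech recognition, you should take a look at Speechly. With Speechly it is much easier to create a natural experience that does not rely on hard-coded commands and instead supports many ways of expressing the same thing. Integrating it into your application should also be quite simple. There is a small CodePen on the front page that gives a basic idea.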

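【Discussion】:

Does Speechly have a C# library? Can I use it the same way I use the Microsoft libraries?

【Solution 4】:

If anyone needs a speech recognition engine with Cortana's roughly 90% accuracy, follow the steps below.

Step 1) Download the NuGet package Microsoft.Windows.SDK.Contracts

Step 2) Migrate the project's packages to PackageReference for the SDK --> https://devblogs.microsoft.com/nuget/migrate-packages-config-to-package-reference/

The SDK above gives you the Windows 10 speech recognition system in a Win32 application. This is necessary because otherwise the only way to use this speech recognition engine is to build a Universal Windows Platform application. I don't recommend building an A.I. application on the Universal Windows Platform because of its sandboxing. Sandboxing isolates the application in a container: it does not allow the application to communicate with any hardware, it makes file access very painful, and there is no thread management, only async functions.

Step 3) Add this namespace to your using directives. It contains all the functionality related to online speech recognition.

using Windows.Media.SpeechRecognition;

Step 4) Add the speech recognition implementation.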

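Task.Run(async () =>
{
  try
  {
    using (var speech = new SpeechRecognizer())
    {
      // With no constraints added, the predefined dictation grammar is used.
      await speech.CompileConstraintsAsync();
      SpeechRecognitionResult result = await speech.RecognizeAsync();

      // Assumes TextBox1 is a WPF TextBox: switch to the UI thread before touching it.
      TextBox1.Dispatcher.Invoke(() => { TextBox1.Text = result.Text; });
    }
  }
  catch (Exception ex)
  {
    // Report failures (for example, speech privacy settings turned off) instead of swallowing them.
    System.Diagnostics.Debug.WriteLine(ex);
  }
});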

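Most methods in the Windows 10 SpeechRecognizer class must be called asynchronously, which means you have to run them inside a Task.Run(async () => {}) lambda, an async method, or an async Task method.

For this to work, go to Settings -> Privacy -> Speech in the operating system and check that online speech recognition is allowed.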
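
For command-style input rather than free dictation, the same class can be given a fixed phrase list and run continuously. The following is a sketch only: `CommandListener` and `StartAsync` are hypothetical names, and the phrases are taken from the question. As far as I understand the documentation, list constraints are evaluated on the device while the predefined dictation grammar relies on the online service, but treat that as an assumption to verify.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

static class CommandListener
{
    // Starts continuous recognition over a fixed command list and calls
    // onCommand for each phrase recognized with medium or high confidence.
    public static async Task<SpeechRecognizer> StartAsync(Action<string> onCommand)
    {
        var speech = new SpeechRecognizer();

        // "commands" is an arbitrary tag identifying this constraint.
        speech.Constraints.Add(new SpeechRecognitionListConstraint(
            new[] { "zoom in", "zoom out", "make it darker" }, "commands"));

        SpeechRecognitionCompilationResult compiled = await speech.CompileConstraintsAsync();
        if (compiled.Status != SpeechRecognitionResultStatus.Success)
            throw new InvalidOperationException($"Grammar compilation failed: {compiled.Status}");

        speech.ContinuousRecognitionSession.ResultGenerated += (session, args) =>
        {
            // This handler is not raised on the UI thread; marshal to the
            // UI thread inside onCommand before touching any controls.
            if (args.Result.Confidence == SpeechRecognitionConfidence.High ||
                args.Result.Confidence == SpeechRecognitionConfidence.Medium)
            {
                onCommand(args.Result.Text);
            }
        };

        await speech.ContinuousRecognitionSession.StartAsync();
        return speech; // The caller keeps this alive and disposes it when done.
    }
}
```

The returned recognizer should be kept in a field for as long as listening is needed, then stopped with `ContinuousRecognitionSession.StopAsync()` and disposed.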

【Discussion】: