Azure speech-to-text ignores numbers

【Question title】: Azure speech-to-text ignores numbers
【Posted】: 2023-01-31 22:27:57
【Question】:

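I'm using Azure Speech to Text to find the timestamps of utterances in a WAV file.

The problem I'm running into is that when the user records numbers, for example "I'm going to count to three. One, two, three, here I come", the numbers are left out of the output. This happens in English as well as in other languages. I can understand omitting filler sounds such as "eh" and "ah", but numbers? Why is this the default?

I'm using: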

speechConfig.OutputFormat = OutputFormat.Detailed;
and the default language model.

Can I configure the SpeechRecognizer somehow so that it also outputs numbers?

【Comments】:

Tags: azure speech-recognition speech-to-text

【Answer 1】:

With the following code, I was able to convert a .wav audio file to text without losing any data.

using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

string speechKey = "<Your_Key>";
string speechRegion = "<Your_Region>";

var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechRecognitionLanguage = "en-US";

using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

// RecognizeOnceAsync returns a single utterance and then stops listening.
var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
Console.WriteLine(speechRecognitionResult.Text);

Output:

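However, there appears to be a bug in the recognition model: if there is a pause between "I'm going to count to three." and "One, two, three, here I come", the model omits the sentence "One, two, three, here I come" from the transcript.
Also, I couldn't find anything in the Microsoft documentation for the AudioConfig class that configures audio settings related to this issue.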

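【Comments】:

The "bug" you ran into is caused by using RecognizeOnceAsync. It only returns the first utterance, so if there is a pause in the audio, it stops and never reports the words it could have recognized after the pause. I found this very misleading too.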
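
For anyone hitting the same thing, here is a rough sketch of how the code from the answer could be switched to continuous recognition so that utterances after a pause are reported as well. It reuses the same placeholders for key, region and file path. The `done` completion source and the output format are just my own choices for illustration, and I have not run this against the original audio, so treat it as a starting point rather than a verified fix.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        // Placeholders, same as in the answer above.
        var speechConfig = SpeechConfig.FromSubscription("<Your_Key>", "<Your_Region>");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Illustrative helper: completes once the session ends or recognition is canceled.
        var done = new TaskCompletionSource<bool>();

        // Raised once per recognized utterance, including those after a pause.
        speechRecognizer.Recognized += (sender, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                // OffsetInTicks is the utterance start in 100-nanosecond units.
                var start = TimeSpan.FromTicks(e.Result.OffsetInTicks);
                Console.WriteLine($"[{start} +{e.Result.Duration}] {e.Result.Text}");
            }
        };

        // With file input, reaching the end of the file ends the recognition.
        speechRecognizer.Canceled += (sender, e) => done.TrySetResult(false);
        speechRecognizer.SessionStopped += (sender, e) => done.TrySetResult(true);

        await speechRecognizer.StartContinuousRecognitionAsync();
        await done.Task;
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
```

Each `Recognized` event carries one utterance with its own offset and duration, which should also cover the timestamp use case from the question.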

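【Answer 2】:

I found the reason my results were missing numbers: it was in my own code. In my post-processing I was trying to strip punctuation from the result, and I accidentally stripped the digits as well.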
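
To illustrate the kind of mistake this was, here is a minimal sketch. The original post-processing code was not posted, so `TranscriptCleanup` and both method names are hypothetical. The first method shows one plausible way digits get lost (keeping only letters and whitespace), and the second removes only characters in the Unicode punctuation category.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the code from the original post.
public static class TranscriptCleanup
{
    // Plausible buggy version: keeps only letters and whitespace,
    // so digits are removed together with the punctuation.
    public static string StripNonLetters(string text) =>
        Regex.Replace(text, @"[^\p{L}\s]", "");

    // Removes only Unicode punctuation (\p{P}); letters and digits survive.
    // Note that apostrophes are punctuation too, so "I'm" becomes "Im".
    public static string StripPunctuation(string text) =>
        Regex.Replace(text, @"\p{P}", "");
}
```

For example, `TranscriptCleanup.StripPunctuation("Count to 3. 1, 2, 3, here I come!")` returns `"Count to 3 1 2 3 here I come"`, while `StripNonLetters` drops every digit from the same input.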

【Comments】: