Can System.Speech.Recognition use voice files as grammar?

【Question title】: Can System.Speech.Recognition use voice files as grammar?
【Posted】: 2012-09-05 12:29:08
【Question】:

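I am building a speech-based application in C# on .NET Framework 4.0.

I would like to use voice files (such as .wav) as the grammar instead of strings, because my application will use a non-English language that is hard to transliterate into English characters. For example, there will be expressions such as Khorooj or Taghire 'onvan. Transliteration also causes many problems, such as differences in the letters used to spell a phrase. Using voice files as the reference would therefore be easier.

How should I get started? Thanks!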

【Question comments】:

Tags: c# speech-recognition

【Solution 1】:

As an alternative, I suggest using Google Voice Search (GVS).
GVS accepts FLAC as its input audio format, so you should convert the WAV stream to FLAC with a tool such as CUETools.

    using System.IO;
    using System.Net;
    // CUETools namespaces are assumed here; they may differ between CUETools versions.
    using CUETools.Codecs;
    using CUETools.Codecs.FLAKE;

    // Converts a WAV file to FLAC and returns the sample rate of the audio.
    public static int Wav2Flac(string wavName, string flacName)
    {
        IAudioSource audioSource = new WAVReader(wavName, null);
        AudioBuffer buff = new AudioBuffer(audioSource, 0x10000);

        FlakeWriter audioDest = new FlakeWriter(flacName, audioSource.PCM);
        int sampleRate = audioSource.PCM.SampleRate;

        // Copy the audio block by block until the source is exhausted.
        while (audioSource.Read(buff, -1) != 0)
        {
            audioDest.Write(buff);
        }

        // Close the writer once to finish the FLAC file.
        audioDest.Close();
        return sampleRate;
    }

    // Posts a FLAC file to the speech endpoint and returns the raw response text.
    public static string GoogleSpeechRequest(string flacName, int sampleRate)
    {
        WebRequest request = WebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=ru-RU");
        request.Method = "POST";

        byte[] byteArray = File.ReadAllBytes(flacName);

        // The content type carries the sample rate of the FLAC audio, e.g. 16000.
        request.ContentType = "audio/x-flac; rate=" + sampleRate;
        request.ContentLength = byteArray.Length;

        // Write the FLAC bytes to the request body.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(byteArray, 0, byteArray.Length);
        }

        // Read the whole response body; the using blocks clean up the streams.
        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream))
        {
            return reader.ReadToEnd();
        }
    }

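【Comments】:

Thanks for your answer! It is quite right, but unfortunately there are some problems. First, it needs an internet connection. Second, even when the internet is available it needs a VPN, because Google has blocked the API for Iran :( Any better ideas?
I believe the Google speech API is not meant for use by custom applications. As of today, Google has not published the API or described any terms of service. It is intended only for the Chrome browser and Android phones. It has been reverse engineered, so people have used it, but it is not really available for custom use. For more information, see stackoverflow.com/questions/7879804/….
I know it has been a while since this was posted, but I have a question about one of the variables used in the code. The AudioBuffer variable uses 0x10000; how was that value determined? With this code I get an AudioBuffer format mismatch, and I guess it is related to that value?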

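【Solution 2】:

You cannot use voice files as a grammar. The Microsoft speech recognition engines require grammars in the format specified by the W3C open standards body. A grammar is not a list of all the words the speech recognition engine should understand. A grammar is a set of rules describing the responses expected at a particular point in a dialogue with the system. Put another way, the grammar does not specify the language the speech recognition system will understand. You need to obtain language packs and install them for the specific speech vendor you want to use. For Microsoft, this can also depend on the version of the operating system you are using. Here are the languages supported on Vista. You may need to work with another speech recognition vendor, such as Nuance, to support the language you want.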
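
As a rough illustration of the point above (a sketch, not part of the original answer): with System.Speech the grammar stays a set of textual rules, while a .wav file can serve as the audio input that is matched against those rules. The class name `WavCommandDemo`, its method names, and the example phrases are hypothetical. The code assumes a Windows machine with a reference to System.Speech.dll and an installed recognizer for the requested culture; the `SpeechRecognitionEngine` constructor throws if none is installed.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition; // requires a reference to System.Speech.dll (Windows only)

// Hypothetical helper class, for illustration only.
public static class WavCommandDemo
{
    // Builds a grammar with a single rule: the utterance must be one of the given phrases.
    // The grammar culture has to match the culture of the recognizer that loads it.
    public static Grammar BuildCommandGrammar(CultureInfo culture, params string[] phrases)
    {
        GrammarBuilder builder = new GrammarBuilder();
        builder.Culture = culture;
        builder.Append(new Choices(phrases));
        return new Grammar(builder);
    }

    // Uses a .wav file as the audio INPUT and matches it against the textual grammar.
    // Returns the recognized phrase, or null when nothing in the grammar matched.
    public static string RecognizeWav(string wavPath, CultureInfo culture, params string[] phrases)
    {
        // Assumption: a recognizer (language pack) for this culture is installed.
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine(culture))
        {
            engine.LoadGrammar(BuildCommandGrammar(culture, phrases));
            engine.SetInputToWaveFile(wavPath);

            RecognitionResult result = engine.Recognize();
            return result == null ? null : result.Text;
        }
    }
}
```

Assuming an en-US recognizer is installed, `WavCommandDemo.RecognizeWav("command.wav", new CultureInfo("en-US"), "exit", "change title")` would return the matched phrase or `null`. `SpeechRecognitionEngine.InstalledRecognizers()` can be used to check which cultures are actually available on a given machine.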

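【Comments】:

Just adding some more information to Kevin's answer. Microsoft also offers the Microsoft Speech Platform (msdn.microsoft.com/en-us/library/hh361572.aspx), which can be used on server or desktop operating systems. Here is the list of supported language packs: microsoft.com/en-us/download/details.aspx?id=27224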