IBM Watson STT: How to use Websocket interface with multiple chunks? - Answers

【Question title】: IBM Watson STT: How to use Websocket interface with multiple chunks?
【Posted】: 2019-05-27 06:44:51
【Question】:

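I have developed a streaming speech recognition application in C++, once with another API and once with the IBM Watson Speech to Text service API.

In both programs I use the same file, which contains this audio:

several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday

The file is 641,680 bytes, and I send it to the Speech to Text server in chunks of at most 100,000 bytes.

With the other API, the whole recording is recognized as one utterance. With the IBM Watson API I cannot get that result. This is what I do: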

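Connect to the IBM Watson web server (Speech to Text API)
Send the start frame {"action":"start","content-type":"audio/mulaw;rate=8000"}
Send 100,000 bytes of binary data
Send the stop frame {"action":"stop"}
...repeat the binary data and the stop frame until the last byte.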

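The IBM Watson Speech API only recognizes the chunks individually,
for example:

several tornadoes touch down
a line of severe thunder
swept through Colorado
on Sunday

This looks like the output for individual chunks. Words that straddle a chunk boundary (for example, "thunderstorm" here falls partly at the end of one chunk and partly at the start of the next) are misrecognized or dropped.

What am I doing wrong?

Edit (I am using C++ and the Boost library for the WebSocket interface):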

//Do the websocket handshake 
void IbmWebsocketSession::on_ssl_handshake(beast::error_code ec) {

    auto mToken = mSttServiceObject->GetToken(); // Get the authentication token

    //Complete the websocket handshake and call back the "send_start" function
    mWebSocket.async_handshake_ex(mHost, mUrlEndpoint, [mToken](request_type& reqHead) {reqHead.insert(http::field::authorization,mToken);},
            bind(&IbmWebsocketSession::send_start, shared_from_this(), placeholders::_1));
}

//Send the start frame
void IbmWebsocketSession::send_start(beast::error_code ec) {

    //Send the START_FRAME and call back the "read_resp" function to receive the "state: listening" message
    mWebSocket.async_write(net::buffer(START_FRAME),
            bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}

//Send the binary data
void IbmWebsocketSession::send_binary(beast::error_code ec) {

    streamsize bytes_read = mFilestream.rdbuf()->sgetn(&chunk[0], chunk.size()); //gets the binary data chunks from a file (which is being written at run time)

    // Send binary data
    if (bytes_read > mcMinsize) {  //Minimum size defined by IBM  is 100 bytes.
                                   // If chunk size is greater than 100 bytes, then send the data and then callback "send_stop" function
        mWebSocket.binary(true);

        /**********************************************************************
         *  Wait a second before writing the next chunk.
         **********************************************************************/
        this_thread::sleep_for(chrono::seconds(1));

        mWebSocket.async_write(net::buffer(&chunk[0], bytes_read),
                bind(&IbmWebsocketSession::send_stop, shared_from_this(), placeholders::_1));
    } else {                     //If chunk size is less than 100 bytes, then DO NOT send the data only call "send_stop" function
        shared_from_this()->send_stop(ec);
    }

}

void IbmWebsocketSession::send_stop(beast::error_code ec) {

    mWebSocket.binary(false);
    /*****************************************************************
     * Send the Stop message
     *****************************************************************/
    mWebSocket.async_write(net::buffer(mTextStop),
            bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}

void IbmWebsocketSession::read_resp(beast::error_code ec, size_t bytes_transferred) {
    boost::ignore_unused(bytes_transferred);
        if(mWebSocket.is_open())
        {
            // Read the websocket response and call back the "display_buffer" function
            mWebSocket.async_read(mBuffer, bind(&IbmWebsocketSession::display_buffer, shared_from_this(),placeholders::_1));
        }
        else
            cerr << "Error: websocket is not open: " << ec.message() << endl; // `e` was undeclared; report the error_code passed to this handler

}

void IbmWebsocketSession::display_buffer(beast::error_code ec) {

    /*****************************************************************
     * Get the buffer into stringstream
     *****************************************************************/
    msWebsocketResponse << beast::buffers(mBuffer.data());

    mResponseTranscriptIBM = ParseTranscript(); //Parse the response transcript

    mBuffer.consume(mBuffer.size()); //Clear the websocket buffer

    if ("Listening" == mResponseTranscriptIBM && true != mSttServiceObject->IsGstFileWriteDone()) { // IsGstFileWriteDone -> checks if the user has stopped speaking
        shared_from_this()->send_binary(ec);
    } else {
        shared_from_this()->close_websocket(ec, 0);
    }
}

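【Comments】:

Show the code you used. That is better than a description.
@data_henrik OK! I can share the code, but I do not think this is a coding problem. My guess is that either this is how the IBM API behaves, or I am doing something logically wrong. I will have to make some changes to comply with my organization's policy, though, so the edit may take some time.

Tags: ibm-cloud speech-recognition ibm-watson speech-to-text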

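【Solution 1】:

IBM Watson Speech to Text has several APIs for sending audio and receiving transcribed text. From your description, you appear to be using the WebSocket interface.

With the WebSocket interface, you open the connection (start), then send the individual chunks of data, and, once everything has been transmitted, stop the recognition request.

You have not shared code, but it looks like you are starting and stopping the request for every chunk. Stop only after the last chunk.

I suggest you look at the API documentation, which contains samples in different languages. The Node.js sample shows how to register for events. There are also examples on GitHub, such as one that uses the WebSocket API with Python. Here is another one that shows the chunking.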
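
The ordering described above can be sketched in C++ without tying it to a particular WebSocket library. This is only an illustration under stated assumptions: `FrameSink` and `stream_one_request` are hypothetical names, and the two callbacks stand in for whatever text and binary write calls the client actually uses (in a Boost.Beast client they would wrap the text and binary writes on the stream). The start and stop frames are the ones quoted in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical transport hooks; not part of any IBM or Boost API.
struct FrameSink {
    std::function<void(const std::string&)> send_text;             // sends one text frame
    std::function<void(const char*, std::size_t)> send_binary;     // sends one binary frame
};

// Sends a single recognition request: one start frame, every audio chunk,
// then exactly one stop frame. Returns the number of binary frames sent.
std::size_t stream_one_request(const std::vector<char>& audio,
                               std::size_t chunk_size,
                               const FrameSink& sink) {
    if (chunk_size == 0) {
        return 0;  // a zero chunk size would never make progress
    }
    sink.send_text(R"({"action":"start","content-type":"audio/mulaw;rate=8000"})");

    std::size_t frames = 0;
    for (std::size_t offset = 0; offset < audio.size(); offset += chunk_size) {
        // The last chunk may be shorter than chunk_size.
        const std::size_t n = std::min(chunk_size, audio.size() - offset);
        sink.send_binary(audio.data() + offset, n);
        ++frames;
    }

    // The stop frame goes out once, after the last chunk.
    sink.send_text(R"({"action":"stop"})");
    return frames;
}
```

For the 641,680-byte file from the question and a chunk size of 100,000 bytes, this produces one start frame, seven binary frames (six full chunks and a final one of 41,680 bytes), and one stop frame.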

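【Comments】:

OK! How would I use this for real-time transcription? I am following the second example exchange, sending consecutive chunks.
Hey @data_henrik, I have added the code. It is in C++. I checked the Python code with the nursery rhyme example. I am not sure I fully understand it, but it seems the data is indeed sent in chunks, yet all of it is sent in one go, and the "stop" frame is sent only at the very end. PS: I am not a Python expert.
The code does not show the driver or the flow, only some function definitions.
The flow is as I described in the original question: ...START FRAME >> binary data >> STOP FRAME >> binary data >> STOP FRAME >> binary data >> ... >> STOP FRAME. I read the websocket response only after the start frame and after each stop frame.
I am not sure what your remaining question is. The flow is wrong, but you have not shown it.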

【Solution 2】:

@data_henrik is right, the flow is wrong. It should be: ...START FRAME >> binary data >> binary data >> binary data >> ... >> STOP FRAME

You only need to send the {"action":"stop"} message when there are no more audio chunks to send.
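
In the question's callback chain, the completion handler of every binary write calls `send_stop`. One way to express the corrected flow is to let that handler decide between sending another chunk and stopping. The sketch below shows only that decision; `NextStep` and `next_step` are hypothetical names, it assumes the file is still being written while it is read (as the question's comments suggest), and it leaves out the question's minimum-size check.

```cpp
#include <cstddef>

// What the write-completion handler should do next (hypothetical type).
enum class NextStep { SendBinary, WaitForAudio, SendStop };

// bytes_read:  number of bytes just read from the audio file
// writer_done: true once the recorder has finished writing the file
NextStep next_step(std::size_t bytes_read, bool writer_done) {
    if (bytes_read > 0) {
        return NextStep::SendBinary;    // more audio: keep streaming, no stop frame
    }
    if (!writer_done) {
        return NextStep::WaitForAudio;  // nothing to read yet, but more may arrive
    }
    return NextStep::SendStop;          // no audio left: send {"action":"stop"} once
}
```

With this shape, a handler like `send_binary` would re-arm itself after each successful write, and `send_stop` would be reached only once per recognition request.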

【Comments】: