Speech to text with IBM Watson for live RTP/VoIP/audio call is not converting to text: answers

【Question title】: Speech to text with IBM Watson for live RTP/VoIP/audio call is not converting to text
【Posted】: 2018-08-07 01:33:25
【Question description】:

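I am trying to use the Speech to Text service provided by IBM Watson, but I am running into problems converting speech to text.

Could you help me with the following scenario?

I have set up a VoIP server (Asterisk/FreeSWITCH) with SIP client A and SIP client B registered. A calls B, the call is established, and they are talking over the call using the G.711 µ-law (ULAW) codec.

I have a WebSocket application that connects to IBM Watson Speech to Text and establishes a session. I receive a "state": "listening" response from the Watson server.

Now I am trying to send the raw RTP packet data from the VoIP server to the Watson server, but I get a "session timeout" error from Watson.

These are the configuration parameters I am using.

Because this is a live RTP audio call, I am using en-US_NarrowbandModel with 'content-type': audio/l16;rate=16000

I continuously send the raw data of the RTP packets over the WebSocket connection to the Watson server.

Please help me resolve the problem with this setup.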

【Question comments】:

Tags: speech-recognition speech-to-text ibm-watson watson

【Solution 1】:

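Hi @ram, what do you mean by "I am trying to send the raw RTP packet data"? The Watson STT service does not support RTP packets directly; you need to convert them to a supported audio format. Are you converting the RTP packets to audio/l16;rate=16000 before sending them over the WebSocket?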
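As a rough illustration of the conversion step, here is a minimal Python sketch that strips the RTP header and returns only the audio payload. It assumes standard RFC 3550 packets; the function name `rtp_payload` and the constant `PT_PCMU` are made up for this example, and how the packets are captured from Asterisk/FreeSWITCH is not covered.

```python
import struct
from typing import Tuple

RTP_FIXED_HEADER_LEN = 12
PT_PCMU = 0  # static payload type for G.711 mu-law (RFC 3551)


def rtp_payload(packet: bytes) -> Tuple[int, bytes]:
    """Return (payload_type, payload) for one RTP packet (hypothetical helper)."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")

    first, second = packet[0], packet[1]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")

    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    payload_type = second & 0x7F

    # Skip the fixed header plus any CSRC identifiers (4 bytes each).
    offset = RTP_FIXED_HEADER_LEN + 4 * csrc_count

    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        # The last byte holds the padding length, including itself.
        end -= packet[-1]

    if offset > end:
        raise ValueError("malformed RTP packet")
    return payload_type, packet[offset:end]
```

Only the returned payload bytes should be forwarded as audio. If the payload type is `PT_PCMU`, the bytes are 8 kHz µ-law samples, not 16 kHz linear PCM, so they would not match an audio/l16;rate=16000 declaration.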

Here is the list of supported formats:

audio/basic (Use only with narrowband models.)
audio/flac
audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Specify the sampling rate of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis

https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/#recognize_audio_websockets
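Given the list above, a G.711 µ-law call could probably be sent either as audio/mulaw with its real 8000 Hz rate, or decoded to linear PCM and sent as audio/l16;rate=8000. The sketch below shows both pieces under those assumptions: building the JSON start/stop text frames of the WebSocket interface, and a standard G.711 µ-law decoder. The helper names are hypothetical, and the WebSocket client itself is left out.

```python
import json
import struct


def start_message(content_type: str) -> str:
    """Build the JSON text frame that opens a recognition request."""
    return json.dumps({"action": "start", "content-type": content_type})


# Text frame signalling that no more audio will follow.
STOP_MESSAGE = json.dumps({"action": "stop"})


def ulaw_to_pcm16(payload: bytes) -> bytes:
    """Decode G.711 mu-law bytes to little-endian signed 16-bit PCM."""
    samples = []
    for byte in payload:
        u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
        exponent = (u >> 4) & 0x07    # 3-bit segment number
        mantissa = u & 0x0F           # 4-bit step within the segment
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return struct.pack("<%dh" % len(samples), *samples)
```

A session would then send `start_message("audio/mulaw;rate=8000")` as a text frame, followed by the RTP payload bytes (headers removed) as binary frames, and finally `STOP_MESSAGE`. If you prefer audio/l16, send the output of `ulaw_to_pcm16` and declare rate=8000; since this decoder emits little-endian samples, the endianness parameter mentioned in the list may need to say so.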

【Comments】: