IBM Watson Speech-to-Text "unable to transcode data stream audio/webm -> audio/x-float-array" media MIME types

【Question title】: IBM Watson Speech-to-Text "unable to transcode data stream audio/webm -> audio/x-float-array" media MIME types
【Posted】: 2020-03-18 22:57:36
【Question】:

I'm using mediaDevices.getUserMedia() in Chrome to record short audio files (a few seconds long), saving each file to Firebase Storage, and then trying to send the file to IBM Watson Speech-to-Text. I get this error message:

unable to transcode data stream audio/webm -> audio/x-float-array

In the browser I set up the microphone:

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(stream => {

    var options = {
      audioBitsPerSecond: 128000,
      mimeType: 'audio/webm'
    };

    const mediaRecorder = new MediaRecorder(stream, options);
    mediaRecorder.start();
    ...

According to this answer, Chrome supports only two media types:

audio/webm
audio/webm;codecs=opus

I tried both.

Here is what I send to IBM Watson:

curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/webm" \
--data-binary "https://firebasestorage.googleapis.com/v0/b/my-app.appspot.com/my-file" \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"

The list of supported MIME types includes webm and webm;codecs=opus.

I also tried recording and sending a file in ogg format and got the same error message:

curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/ogg" \
--data-binary @/Users/TDK/LanguageTwo/public/1.ogg \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"

I tried IBM's sample audio file, and it worked fine:

"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "

I got a similar error message from Google Cloud Speech-to-Text.

【Comments】:

Do you know how to send the correct data to IBM Watson?
curl -X POST -u "apikey:$apikey" --header "Content-Type: audio/mp3" --data-binary @"$1" "$url/v1/recognize?timestamps=true&max_alternatives=3"
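
One detail worth spelling out about the command above: curl reads a file only when the --data-binary value starts with @. A value without @, such as a bare https:// URL, is sent verbatim as the request body, so the service would receive the URL text labelled as audio. That may explain the transcode error in the question, though this is an inference and not something confirmed in the thread. A minimal sketch of downloading first and then posting the bytes, assuming $apikey and $url are set as in the command above; stage_audio and recognize are hypothetical helper names:

```shell
#!/bin/bash
# Sketch only: stage_audio and recognize are hypothetical helper names.

# Print a local path for the audio: download http(s) URLs to a temp file,
# pass local paths through unchanged.
stage_audio() {
  local src="$1"
  case "$src" in
    http://*|https://*)
      local tmp
      tmp="$(mktemp)" || return 1
      # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
      curl -fsSL -o "$tmp" "$src" || { rm -f "$tmp"; return 1; }
      printf '%s\n' "$tmp"
      ;;
    *)
      printf '%s\n' "$src"
      ;;
  esac
}

# POST the staged file; the "@" makes curl read the file's bytes
# instead of sending the argument text itself as the body.
# Any downloaded temp file is left in place for the caller to remove.
recognize() {
  local file
  file="$(stage_audio "$1")" || return 1
  curl -X POST -u "apikey:$apikey" \
    --header "Content-Type: $2" \
    --data-binary @"$file" \
    "$url/v1/recognize"
}
```

Usage would look like: recognize "https://example.com/clip.webm" "audio/webm" (the URL here is a placeholder).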

Tags: audio ibm-watson speech-to-text getusermedia ibm-cloud-speech

【Solution 1】:

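Create a bash script named watsonstt.sh (I suggest saving it in ~/bin/), paste in the content below, replace the values of the apikey, url, and savepath variables with your own, and call the script as the comment suggests, putting the single argument in quotes (to handle spaces).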

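At the time of writing, the API credentials are available in the "Manage" tab of the IBM Watson cloud web interface; you need to sign up with credit/debit card details.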


#!/bin/bash

# Call this script with one argument, the POSIX path of the audio file, in quotes, e.g.:
# watsonstt.sh "/user/name/file.mp3"

# 500 mins per month for free
# https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/curl.html?curl#get-token

apikey=XXXXXXXXXXXX
url=YYYYYYYYYY
savepath=~/Desktop/${1##*/}.txt

curl -X POST -u "apikey:$apikey" --header "Content-Type: audio/${1##*.}" --data-binary @"$1" "$url/v1/recognize?timestamps=true&max_alternatives=3" -o "${savepath}"
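
The script above builds the Content-Type straight from the file extension ("audio/${1##*.}"), which works for extensions such as mp3, wav, flac, ogg, and webm but produces a made-up type for anything else. A minimal sketch of an explicit mapping that fails early on unknown extensions; content_type_for is a hypothetical helper name, and the list covers only a few of the formats the service accepts:

```shell
#!/bin/bash
# Sketch only: content_type_for is a hypothetical helper name.

# Map a file path to a Content-Type based on its extension.
content_type_for() {
  local ext="${1##*.}"
  # Lowercase the extension so "FILE.MP3" and "file.mp3" behave the same.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3)     echo "audio/mp3" ;;
    wav)     echo "audio/wav" ;;
    flac)    echo "audio/flac" ;;
    ogg|oga) echo "audio/ogg" ;;
    webm)    echo "audio/webm" ;;
    *)       echo "unsupported extension: $ext" >&2; return 1 ;;
  esac
}
```

In the script, the header could then be written as --header "Content-Type: $(content_type_for "$1")", after checking that the function succeeded.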

【Comments】: