How to convert an audio file in Colab to text?

【Question title】: How to convert an audio file in colab to text?
【Posted】: 2021-10-05 11:26:09
【Question description】:

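I am trying to use the speech_recognition module to convert an audio file in my Colab workspace to text. It does not work, because the audio argument here needs to be audio data, not a file name. How can I load the audio file "audio.wav" into a variable to pass there, or simply pass the file itself?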

import speech_recognition as sr
r = sr.Recognizer()
text = r.recognize_google(audio, language = 'en-IN')
print(text)

【Question comments】:

Tags: python-3.x google-colaboratory

【Solution 1】:

The speech_recognition library has its own procedure for reading audio files. You can do it like this:

inp = sr.AudioFile('path/to/audio/file')
with inp as file:
  audio = r.record(file)

Then pass audio as the first argument to r.recognize_google().
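Putting the two steps together, a minimal sketch might look like the following. The function name `transcribe_wav` and the file name in the usage comment are illustrative and not part of the library; `language="en-IN"` is taken from the question. The import sits inside the function only so that the definition loads before the package is installed.

```python
def transcribe_wav(path, language="en-IN"):
    """Load an audio file and return Google's transcription of it."""
    # Third-party package: pip install SpeechRecognition
    import speech_recognition as sr

    r = sr.Recognizer()
    # AudioFile accepts PCM WAV, AIFF/AIFF-C, or native FLAC files.
    with sr.AudioFile(path) as source:
        # record() reads the whole file into an AudioData object.
        audio = r.record(source)
    # recognize_google() sends the audio to Google and returns the text.
    return r.recognize_google(audio, language=language)


# Example (assumes "audio.wav" exists in the Colab working directory):
# print(transcribe_wav("audio.wav"))
```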

Here is a good article for learning about this library.

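【Comments】:

I tried that, but I got this error: ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format. It happens even when I try both wav and mp3.
The file may be corrupted. Can you open it and listen to it on your system?
Thank you, but it does not transcribe the text correctly. For example, the audio says "what type of cloth is this", but it returns only the text "what". It seems to work fine in my Python environment on Windows, but not on Colab.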
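One possible cause of the ValueError above is a file whose contents are not really PCM WAV, for example an mp3 that was only renamed to .wav. The sketch below is a guess at a diagnosis, not a confirmed fix for the commenter's file. `is_pcm_wav` uses the standard-library `wave` module to test the file, and `convert_to_pcm_wav` re-encodes it with pydub (installed in Solution 2's pip command), which needs ffmpeg to be available to decode mp3. Both function names and the file names in the usage comment are hypothetical.

```python
import wave


def is_pcm_wav(path):
    """Return True if the stdlib wave module can open the file as WAV."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, an unsupported encoding, or a truncated file.
        return False


def convert_to_pcm_wav(src, dst):
    """Re-encode any format ffmpeg can decode (e.g. mp3) as a WAV file."""
    # Third-party package: pip install pydub (ffmpeg must be installed).
    from pydub import AudioSegment

    AudioSegment.from_file(src).export(dst, format="wav")
    return dst


# Example (hypothetical file names):
# if not is_pcm_wav("audio.wav"):
#     convert_to_pcm_wav("audio.mp3", "audio_pcm.wav")
```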

【Solution 2】:

pip3 install SpeechRecognition pydub

Make sure the current directory contains an audio file with English speech:

import speech_recognition as sr

filename = "16-122828-0002.wav"

The following code loads the audio file and converts the speech to text using Google Speech Recognition:

# initialize the recognizer
r = sr.Recognizer()

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to finish, because it uploads the file to Google and fetches the output.
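Because the recognition step depends on a network call, it can fail in two distinct ways: Google may not understand the speech, or the request itself may not go through. A hedged sketch of handling both cases follows. The wrapper name `transcribe_or_none` is made up for illustration; `sr.UnknownValueError` and `sr.RequestError` are the exceptions the library raises for these cases.

```python
def transcribe_or_none(path, language="en-US"):
    """Return the transcription of an audio file, or None on failure."""
    # Third-party package: pip install SpeechRecognition
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio_data = r.record(source)  # load the whole file into memory

    try:
        return r.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # The service answered but could not make out any speech.
        print("Google Speech Recognition could not understand the audio")
    except sr.RequestError as exc:
        # The service could not be reached or rejected the request.
        print(f"Request to Google Speech Recognition failed: {exc}")
    return None


# Example, using the file name from the answer above:
# print(transcribe_or_none("16-122828-0002.wav"))
```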
