Streaming audio to DialogFlow for real-time intent recognition

[Question title]: Streaming audio to DialogFlow for real-time intent recognition
[Posted]: 2020-02-12 17:58:03
[Question]:

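I am trying to stream audio from the microphone of a Pepper robot to DialogFlow. I have working code for sending a block of audio. When I send the request, the response contains the message None Exception iterating requests!. I have seen this error before when reading from an audio file, but I cannot see what is wrong with the data I am passing now.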

processRemote is called whenever the microphone records something. The approach works when I write sound_data[0].tostring() to a StringIO and later retrieve it in chunks of 4096 bytes.
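The chunked retrieval described above can be sketched roughly as follows. This is only an illustration written for Python 3, where io.BytesIO takes the place of the Python 2 StringIO; the names iter_chunks and CHUNK_SIZE and the placeholder buffer contents are assumptions, not part of the original code.

```python
import io

CHUNK_SIZE = 4096  # bytes per chunk, matching the size mentioned in the question


def iter_chunks(buffer, chunk_size=CHUNK_SIZE):
    """Yield successive pieces of at most chunk_size bytes until the buffer is exhausted."""
    while True:
        chunk = buffer.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Placeholder standing in for recorded 16-bit PCM audio (hypothetical data)
recorded = io.BytesIO(b"\x00" * 10000)
sizes = [len(chunk) for chunk in iter_chunks(recorded)]
print(sizes)  # [4096, 4096, 1808]
```

Each yielded chunk could then be wrapped in its own streaming request.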

self.processing_queue is meant to hold a few chunks of audio that should be processed before any new audio is handled.

The error shows up in the response of self.session_client.streaming_detect_intent(requests).

Any ideas are appreciated.

    def processRemote(self, nbOfChannels, nbOfSamplesByChannel, timeStamp, inputBuffer):
        """audio stream callback method with simple silence detection"""
        sound_data_interlaced = np.fromstring(str(inputBuffer), dtype=np.int16)
        sound_data = np.reshape(sound_data_interlaced,
                                (nbOfChannels, nbOfSamplesByChannel), 'F')
        peak_value = np.max(sound_data)
        chunk = sound_data[0].tostring()
        self.processing_queue.append(chunk)
        if self.is_active:
            # detect sound
            if peak_value > 6000:
                print("Peak:", peak_value)
                if not self.recordingInProgress:
                    self.startRecording()

            # if recording is in progress we send directly to google
            try:
                if self.recordingInProgress:
                    print("preparing request proc remote")
                    requests = [dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk)]
                    print("should send now")
                    responses = self.session_client.streaming_detect_intent(requests)
                    for response in responses:
                        print("checking response")
                        if len(response.fulfillment_text) != 0:
                            print("response not empty")
                            self.stopRecording(response)  # stop if we already know the intent
            except Exception as e:
                print(e)

    def startRecording(self):
        """init a in memory file object and save the last raw sound buffer to it."""
        # session path setup
        self.session_path = self.session_client.session_path(DIALOG_FLOW_GCP_PROJECT_ID, self.uuid)
        self.recordingInProgress = True
        requests = list()

        # set up streaming
        print("start streaming")
        q_input = dialogflow.types.QueryInput(audio_config=self.audio_config)
        req = dialogflow.types.StreamingDetectIntentRequest(
                        session=self.session_path, query_input=q_input)
        requests.append(req)

        # process pre-recorded audio
        print("work on stored audio")
        for chunk in self.processing_queue:
            print("appending chunk")
            try:
                requests.append(dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk))
            except Exception as e:
                print(e)
        print("getting response")
        responses = self.session_client.streaming_detect_intent(requests)
        print("got response")
        print(responses)

        # iterate through responses from pre-recorded audio
        try:
            for response in responses:
                print("checking response")
                if len(response.fulfillment_text) != 0:
                    print("response not empty")
                    self.stopRecording(response)  # stop if we already know the intent
        except Exception as e:
            print(e)

        # otherwise continue listening
        print("start recording (live)")

    def stopRecording(self, query_result):
        """saves the recording to memory"""
        # stop recording
        self.recordingInProgress = False
        self.disable_google_speech(force=True)
        print("stopped recording")

        # process response
        action = query_result.action
        text = query_result.fulfillment_text.encode("utf-8")
        if (action is not None) or (text is not None):
            if len(text) != 0:
                self.speech.say(text)
            if len(action) != 0:
                parameters = query_result.parameters
                self.execute_action(action, parameters)

[Comments]:

Tags: google-cloud-platform dialogflow-es google-speech-api pepper

[Solution 1]:

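According to the source code, the session_client.streaming_detect_intent function expects an iterator as its argument, but you are currently giving it a list of requests. A list is iterable, yet it is not itself an iterator, so it cannot be consumed with next().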

This does not work:

requests = [dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk)]
responses = self.session_client.streaming_detect_intent(requests) 
#None Exception iterating requests!

Alternatives:

# wrap the list in an iterator
requests = [dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk)]
responses = self.session_client.streaming_detect_intent(iter(requests))

# Note: The example in the source code calls the function like this
# but this gave me the same error
requests = [dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk)]
for response in self.session_client.streaming_detect_intent(requests):
    # process response
    pass
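
The difference between a list and an iterator can be reproduced without DialogFlow. The sketch below assumes that the streaming layer pulls requests with next(), which would explain why a plain list fails while iter(requests) works; the consume helper and the placeholder strings are hypothetical and only stand in for the real client and request objects.

```python
def consume(request_iterator):
    """Pull one item with next(), the way a streaming consumer is assumed to."""
    try:
        return next(request_iterator)
    except TypeError as exc:
        # A list supports iteration but has no __next__ method
        return "error: {}".format(exc)


# Placeholder strings standing in for request objects (hypothetical)
requests = ["config-request", "audio-request"]

list_result = consume(requests)            # fails: a list is not an iterator
iterator_result = consume(iter(requests))  # works: iter() returns an iterator

print(list_result)
print(iterator_result)
```

A generator function returns an iterator directly, so it needs no iter() wrapper.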

Using a generator structure

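While this fixes the error, intent detection still does not work. I believe a better program structure is to use a generator, as suggested in the documentation. Something like this (pseudocode):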

def dialogflow_mic_stream_generator():
    # open stream
    audio_stream = ...

    # send configuration request
    query_input = dialogflow.types.QueryInput(audio_config=audio_config)
    yield dialogflow.types.StreamingDetectIntentRequest(session=session_path,
            query_input=query_input)

    # output audio data from stream
    while audio_stream_is_active:
        chunk = audio_stream.read(chunk_size)
        yield dialogflow.types.StreamingDetectIntentRequest(input_audio=chunk)

requests = dialogflow_mic_stream_generator()
responses = session_client.streaming_detect_intent(requests)

for response in responses:
    # process response
    pass
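
One way to connect a callback such as processRemote to a generator like the one above is a thread-safe queue: the callback puts chunks in, and the generator blocks until a chunk arrives. The following is a minimal sketch under that assumption, not a tested DialogFlow integration. The sentinel _END, the function request_generator, and the dict stand-ins are all hypothetical; in real code the two request arguments would build dialogflow.types.StreamingDetectIntentRequest objects instead.

```python
import queue

_END = object()  # sentinel that tells the generator to stop (hypothetical)


def request_generator(audio_queue, config_request, make_audio_request):
    """Yield the configuration request first, then one request per queued audio chunk."""
    yield config_request
    while True:
        chunk = audio_queue.get()  # blocks until the producer delivers a chunk
        if chunk is _END:
            return
        yield make_audio_request(chunk)


# Stand-ins so the sketch runs without DialogFlow or a microphone
audio_queue = queue.Queue()
for chunk in (b"\x00\x01", b"\x02\x03"):
    audio_queue.put(chunk)  # on the robot, the audio callback would do this
audio_queue.put(_END)       # e.g. once recording is stopped

requests = request_generator(
    audio_queue,
    config_request={"session": "hypothetical-session", "query_input": "hypothetical-config"},
    make_audio_request=lambda chunk: {"input_audio": chunk},
)
sent = list(requests)
print(len(sent))  # 3
```

Because the generator blocks while waiting for audio, the streaming call and the loop over its responses would probably need to run in a separate thread from the audio callback.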

[Comments]: