【Question title】: How to detect speech start on iOS Speech API
【Posted】: 2017-09-25 08:31:01
【Question】:

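I have an iOS app developed in Xcode/Objective-C. It uses the iOS Speech API for continuous speech recognition. It works, but I want to turn the microphone icon red when speech starts, and I also want to detect when speech ends.

I implemented the SFSpeechRecognitionTaskDelegate protocol, which gives the callbacks onDetectedSpeechStart and speechRecognitionTask:didHypothesizeTranscription:, but these do not fire until the end of the first word has been processed, not at the start of speech.

I want to detect the very start of speech (or of any noise). I think installTapOnBus: should make this possible, using the AVAudioPCMBuffer it delivers, but I am not sure how to tell silence apart from noise that might be speech.

Also, the Speech API does not fire an event when the person stops talking, i.e. there is no silence detection; it just keeps recording until it times out. I have a hack that detects silence by checking the time since the last event fired, but I am not sure whether there is a better way.

The code is here: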

    NSError * outError;
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
    [audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

    SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

    if (speechRequest == nil) {
        NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
        return;
    }

    audioEngine = [[AVAudioEngine alloc] init];
    AVAudioInputNode* inputNode = [audioEngine inputNode];

    speechRequest.shouldReportPartialResults = true;

    // iOS speech does not detect end of speech, so must track silence.
    lastSpeechDetected = -1;

    speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];

    [inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
        long millis = [[NSDate date] timeIntervalSince1970] * 1000;
        if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
            lastSpeechDetected = -1;
            [speechTask finish];
            return;
        }
        [speechRequest appendAudioPCMBuffer: buffer];
    }];

    [audioEngine prepare];
    [audioEngine startAndReturnError: &outError];
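
A minimal sketch of the timing hack used in the tap block above, isolated in plain C so it can be checked without any audio code. The names `SilenceTracker`, `silence_tracker_mark_speech` and `silence_tracker_should_finish` are hypothetical. The posted code does not show where `lastSpeechDetected` is updated; this sketch assumes it is set from the recognition delegate callbacks.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper mirroring the lastSpeechDetected logic above.
typedef struct {
    int64_t last_speech_ms;  // -1 means no speech has been reported yet
    int64_t timeout_ms;      // silence longer than this ends the utterance
} SilenceTracker;

// Call whenever the recognizer reports activity (assumed: a delegate callback).
static void silence_tracker_mark_speech(SilenceTracker *t, int64_t now_ms) {
    t->last_speech_ms = now_ms;
}

// Call once per audio buffer. Returns true once when the timeout has elapsed
// since the last reported speech, then resets so it does not fire again.
static bool silence_tracker_should_finish(SilenceTracker *t, int64_t now_ms) {
    if (t->last_speech_ms != -1 && now_ms - t->last_speech_ms > t->timeout_ms) {
        t->last_speech_ms = -1;
        return true;
    }
    return false;
}
```

Because the tracker only starts counting after the first reported speech, it never ends a session that has not heard anything yet, which matches the `lastSpeechDetected != -1` check in the tap block.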

【Comments】:

  • Did you try my answer?

Tags: ios objective-c speech-recognition


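【Solution 1】:

I suggest low-pass filtering the power signal, using AVAudioRecorder with an NSTimer for the callback. This way you can detect when the recorder's readings cross a certain threshold, and the low-pass filtering helps to mitigate noise.

In the .h file: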

#import <UIKit/UIKit.h>
#import <AVFoundation/AVFoundation.h>
#import <CoreAudio/CoreAudioTypes.h>

@interface ViewController : UIViewController{
    AVAudioRecorder *recorder;
    NSTimer *levelTimer;
    double lowPassResults;
}

- (void)levelTimerCallback:(NSTimer *)timer;
@end

In the .m file:

#import "ViewController.h"

@interface ViewController ()

@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    // AVAudioSession already set in your code, so no need for these 2 lines.
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
    [[AVAudioSession sharedInstance] setActive:YES error:nil];

    NSURL *url = [NSURL fileURLWithPath:@"/dev/null"];

    NSDictionary *settings = [NSDictionary dictionaryWithObjectsAndKeys:
                              [NSNumber numberWithFloat: 44100.0],                 AVSampleRateKey,
                              [NSNumber numberWithInt: kAudioFormatAppleLossless], AVFormatIDKey,
                              [NSNumber numberWithInt: 1],                         AVNumberOfChannelsKey,
                              [NSNumber numberWithInt: AVAudioQualityMax],         AVEncoderAudioQualityKey,
                              nil];

    NSError *error;

    lowPassResults = 0;

    recorder = [[AVAudioRecorder alloc] initWithURL:url settings:settings error:&error];

    if (recorder) {
        [recorder prepareToRecord];
        recorder.meteringEnabled = YES;
        [recorder record];
        levelTimer = [NSTimer scheduledTimerWithTimeInterval: 0.05 target: self selector: @selector(levelTimerCallback:) userInfo: nil repeats: YES];
    } else
        NSLog(@"%@", [error description]);
}


- (void)levelTimerCallback:(NSTimer *)timer {
    [recorder updateMeters];

    const double ALPHA = 0.05;
    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
    lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;  

    NSLog(@"lowPassResults: %f",lowPassResults);

    // Use a threshold value here to establish whether there is silence or speech
    if (lowPassResults < 0.1) {
        NSLog(@"Silence");
    } else if(lowPassResults > 0.5){
        NSLog(@"Speech");
    }

}


- (void)didReceiveMemoryWarning {
    [super didReceiveMemoryWarning];
    // Dispose of any resources that can be recreated.
}


@end

【Comments】:

    【Solution 2】:

    This is the working code we ended up with.

    The key was installTapOnBus(), and then the magic code to detect the volume,

    float volume = fabsf(*buffer.floatChannelData[0]);

    -(void) doActualRecording {
        NSLog(@"doActualRecording");
    
        @try {
        //if (!recording) {
            if (audioEngine != NULL) {
                [audioEngine stop];
                [speechTask cancel];
                AVAudioInputNode* inputNode = [audioEngine inputNode];
                [inputNode removeTapOnBus: 0];
            }
    
            recording = YES;
            micButton.selected = YES;
    
            //NSLog(@"Starting recording...   SFSpeechRecognizer Available? %d", [speechRecognizer isAvailable]);
            NSError * outError;
            //NSLog(@"AUDIO SESSION CATEGORY0: %@", [[AVAudioSession sharedInstance] category]);
            AVAudioSession* audioSession = [AVAudioSession sharedInstance];
            [audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
            [audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
            [audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];
    
            SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
            //NSLog(@"AUDIO SESSION CATEGORY1: %@", [[AVAudioSession sharedInstance] category]);
            if (speechRequest == nil) {
                NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
                return;
            }
    
            speechDetectionSamples = 0;
    
            // This somehow fixes a crash on iPhone 7
            // Seems like a bug in iOS ARC/lack of gc
            AVAudioEngine* temp = audioEngine;
            audioEngine = [[AVAudioEngine alloc] init];
            AVAudioInputNode* inputNode = [audioEngine inputNode];
    
            speechRequest.shouldReportPartialResults = true;
    
            // iOS speech does not detect end of speech, so must track silence.
            lastSpeechDetected = -1;
    
            speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];
    
            [inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
                @try {
                    long long millis = [[NSDate date] timeIntervalSince1970] * 1000;
                    if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
                        lastSpeechDetected = -1;
                        [speechTask finish];
                        return;
                    }
                    [speechRequest appendAudioPCMBuffer: buffer];
    
                    //Calculate volume level
                    if ([buffer floatChannelData] != nil) {
                        float volume = fabsf(*buffer.floatChannelData[0]);
    
                        if (volume >= speechDetectionThreshold) {
                            speechDetectionSamples++;
    
                            if (speechDetectionSamples >= speechDetectionSamplesNeeded) {
    
                                //Need to change mic button image in main thread
                                [[NSOperationQueue mainQueue] addOperationWithBlock:^ {
    
                                    [micButton setImage: [UIImage imageNamed: @"micRecording"] forState: UIControlStateSelected];
    
                                }];
                            }
                        } else {
                            speechDetectionSamples = 0;
                        }
                    }
                }
                @catch (NSException * e) {
                    NSLog(@"Exception: %@", e);
                }
            }];
    
            [audioEngine prepare];
            [audioEngine startAndReturnError: &outError];
            NSLog(@"Error %@", outError);
        //}
        }
        @catch (NSException * e) {
            NSLog(@"Exception: %@", e);
        }
    }
    

    【Comments】:

      【Solution 3】:

      Have you tried using AVCaptureAudioChannel? Here is a link to the documentation.

      It has a volume property that provides the channel's current volume (gain).

        【Comments】:
