【Question title】: How to find timestamps of a specific sound in a .wav file?
【Posted】: 2021-05-10 02:04:12
【Question】:

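I have a .wav file in which I recorded myself talking for a few minutes. Suppose I want to find the exact times at which I say "Mike" in the audio. I looked into speech recognition and ran some tests with the Google Speech API, but the timestamps I got were far from accurate.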

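As an alternative, I recorded a very short .wav file in which I only say "Mike". I am trying to compare the two .wav files and find every timestamp in the longer .wav file at which "Mike" is spoken. I came across SleuthEye's amazing answer.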

This code works well for finding one timestamp, but I can't figure out how to find multiple start/end times:

import numpy as np
import sys
from scipy.io import wavfile
from scipy import signal

snippet = sys.argv[1]
source  = sys.argv[2]

# read the sample to look for
rate_snippet, snippet = wavfile.read(snippet)
snippet = np.array(snippet, dtype='float')

# read the source
rate, source = wavfile.read(source)
source = np.array(source, dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate*len(snippet)/rate_snippet))
    snippet = signal.resample(snippet, num)

# compute the cross-correlation (default mode='full')
z = signal.correlate(source, snippet)

# strongest match; in 'full' mode, index k is where the snippet's last sample lands
peak = np.argmax(np.abs(z))
start = (peak-len(snippet)+1)/rate
end   = peak/rate

print("start {} end {}".format(start, end))
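A quick way to check the start/end arithmetic in this code is to run it on synthetic data where the answer is known. The sketch below is illustrative only: the random "snippet", the 8 kHz rate and the offset of 1234 samples are all made up. It relies on the fact that, in the default `mode='full'`, index `k` of `signal.correlate(source, snippet)` is where the snippet's last sample lines up, so its first sample sits at `k - (len(snippet) - 1)`.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
rate = 8000                                # hypothetical sampling rate (Hz)
snippet = rng.standard_normal(400)         # stand-in for the short "Mike" recording
source = 0.1 * rng.standard_normal(rate)   # one second of background noise
true_start = 1234                          # hypothetical sample where the word begins
source[true_start:true_start + snippet.size] += snippet

z = signal.correlate(source, snippet)      # default mode='full', length N + M - 1
peak = np.argmax(np.abs(z))                # full-mode index of the snippet's LAST sample
start_sample = peak - snippet.size + 1     # shift back by M - 1 to get the first sample
print("start {:.4f} s, end {:.4f} s".format(start_sample / rate, peak / rate))
```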

【Question discussion】:

    Tags: python audio scipy signal-processing wav


    【Answer 1】:

    You're almost there. You can use scipy.signal.find_peaks, for example:

    import numpy as np
    from scipy.io import wavfile
    from scipy import signal
    import matplotlib.pyplot as plt
    
    snippet = 'snippet.wav'
    source  = 'source.wav'
    
    # read the sample to look for
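    rate_snippet, snippet = wavfile.read(snippet)
    snippet = np.array(snippet, dtype='float')
    if snippet.ndim > 1:  # multi-channel file: keep the first channel only
        snippet = snippet[:, 0]
    
    # read the source
    rate, source = wavfile.read(source)
    source = np.array(source, dtype='float')
    if source.ndim > 1:
        source = source[:, 0]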
    
    # resample such that both signals are at the same sampling rate (if required)
    if rate != rate_snippet:
        num = int(np.round(rate*len(snippet)/rate_snippet))
        snippet = signal.resample(snippet, num)
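If the two recordings differ in format (mono vs. stereo, or different integer sample types), a small loader that always returns a 1-D float signal can help. This is a hedged sketch: `read_mono` is a hypothetical helper, and both the scaling of integer PCM by its dtype range and the averaging of channels (instead of keeping only channel 0) are assumptions, not part of the original answer.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

def read_mono(path):
    """Read a .wav file as (rate, 1-D float array in roughly [-1, 1]).

    Assumptions: integer PCM is scaled by its dtype's range, and
    multi-channel audio is down-mixed by averaging the channels.
    """
    rate, data = wavfile.read(path)
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.integer):
        info = np.iinfo(data.dtype)
        if info.min == 0:   # unsigned 8-bit PCM is centred on 128
            half = (info.max + 1) / 2
            data = (data.astype(float) - half) / half
        else:               # signed PCM: divide by |min|, e.g. 32768 for int16
            data = data.astype(float) / -float(info.min)
    else:
        data = data.astype(float)
    if data.ndim > 1:       # shape (n_samples, n_channels) -> (n_samples,)
        data = data.mean(axis=1)
    return rate, data

# usage sketch: write a tiny stereo int16 file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")
    stereo = np.array([[16384, -16384], [32767, 32767], [0, -32768]], dtype=np.int16)
    wavfile.write(path, 8000, stereo)
    demo_rate, demo = read_mono(path)
```

Scaling to roughly [-1, 1] keeps correlation values on a comparable scale across files stored with different sample formats.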
    

    My source and snippet:

    # after the optional resampling, the snippet is sampled at `rate`
    x_snippet = np.arange(0, snippet.size) / rate
    
    plt.plot(x_snippet, snippet)
    plt.xlabel('seconds')
    plt.title('snippet')
    

    x_source = np.arange(0, source.size) / rate
    
    plt.plot(x_source, source)
    plt.xlabel('seconds')
    plt.title('source')
    

    Now we compute the cross-correlation:

    # compute the cross-correlation
    z = signal.correlate(source, snippet, mode='same')
    

    I use mode='same' so that source and z have the same length:

    source.size == z.size
    True
    

    Now we can define a minimum peak height, for example:

    x_z = np.arange(0, z.size) / rate
    
    plt.plot(x_z, z)
    plt.axhline(2e20, color='r')
    plt.title('correlation')
    

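    and find the peaks that are at least a minimum distance apart (you will probably need to choose your own height and distance, in samples, for your recordings):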

    peaks = signal.find_peaks(
        z,
        height=2e20,
        distance=50000
    )
    
    peaks
    (array([ 117390,  225754,  334405,  449319,  512001,  593854,  750686,
             873026,  942586, 1064083]),
     {'peak_heights': array([8.73666562e+20, 9.32871542e+20, 7.23883305e+20, 9.30772354e+20,
             4.32924341e+20, 9.18323020e+20, 1.12473608e+21, 1.07752019e+21,
             1.12455724e+21, 1.05061734e+21])})
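The `height=2e20` value depends on the loudness and length of both recordings, so it has to be re-tuned for each word and file. One standard alternative, which is not what this answer uses, is normalized cross-correlation: each window of the source is compared with the snippet after removing the mean and dividing by the standard deviation, so scores lie in [-1, 1] and a threshold such as 0.5 (an arbitrary illustrative choice) does not depend on volume. A minimal sketch, with `normalized_xcorr` as a hypothetical helper and made-up synthetic data:

```python
import numpy as np
from scipy import signal

def normalized_xcorr(source, snippet, eps=1e-12):
    """Normalized cross-correlation of snippet against every window of source.

    Uses mode='valid', so output index k compares the snippet with
    source[k:k + len(snippet)]; k is therefore directly the start sample.
    Values lie in [-1, 1] whatever the recording volume.
    """
    source = np.asarray(source, dtype=float)
    snippet = np.asarray(snippet, dtype=float)
    m = snippet.size
    # zero-mean, unit-variance template, pre-divided by its length
    template = (snippet - snippet.mean()) / (snippet.std() * m)
    num = signal.correlate(source, template, mode='valid')
    # mean and variance of every length-m window of the source via cumulative sums
    c1 = np.concatenate(([0.0], np.cumsum(source)))
    c2 = np.concatenate(([0.0], np.cumsum(source ** 2)))
    mean = (c1[m:] - c1[:-m]) / m
    var = (c2[m:] - c2[:-m]) / m - mean ** 2
    # eps guards against division by zero in digital silence
    return num / np.sqrt(np.maximum(var, eps))

# usage sketch on synthetic data (made-up signal, positions and volumes)
rng = np.random.default_rng(1)
snippet = rng.standard_normal(500)
source = 0.1 * rng.standard_normal(20000)
source[3000:3500] += snippet            # loud occurrence
source[12000:12500] += 0.3 * snippet    # quiet occurrence
ncc = normalized_xcorr(source, snippet)
starts, props = signal.find_peaks(ncc, height=0.5, distance=snippet.size)
print(starts, props['peak_heights'].round(2))
```

In the demo, both occurrences score well above the 0.5 threshold even though the second one is about three times quieter. For recordings many minutes long, the cumulative sums can lose some float64 precision; that is a limitation of this sketch.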
    

    We take the peak indices:

    peaks_idxs = peaks[0]
    
    plt.plot(x_z, z)
    plt.plot(x_z[peaks_idxs], z[peaks_idxs], 'or')
    

    Since, with mode='same', the peaks fall (almost) at the middle of the matched snippet, we can do:

    fig, ax = plt.subplots(figsize=(12, 5))
    plt.plot(x_source, source)
    plt.xlabel('seconds')
    plt.title('source signal and correlation')
    for i, peak_idx in enumerate(peaks_idxs):
        start = (peak_idx-snippet.size/2) / rate
        center = (peak_idx) / rate
        end   = (peak_idx+snippet.size/2) / rate
        plt.axvline(start,  color='g')
        plt.axvline(center, color='y')
        plt.axvline(end,    color='r')
        print(f"peak {i}: start {start:.2f} end {end:.2f}")
    
    peak 0: start 2.34 end 2.98
    peak 1: start 4.80 end 5.44
    peak 2: start 7.27 end 7.90
    peak 3: start 9.87 end 10.51
    peak 4: start 11.29 end 11.93
    peak 5: start 13.15 end 13.78
    peak 6: start 16.71 end 17.34
    peak 7: start 19.48 end 20.11
    peak 8: start 21.06 end 21.69
    peak 9: start 23.81 end 24.45
    

    But there may be a better way to define the start and end more precisely.
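The index arithmetic, at least, can be made exact. With `mode='same'`, SciPy returns the central `len(source)` samples of the `'full'` output, which drops its first `(M - 1) // 2` samples (M being the snippet length); combined with the `M - 1` offset of `'full'` mode, a peak at index `q` corresponds to the snippet starting at sample `q - M // 2`. The loop above uses `peak_idx - snippet.size/2`, which agrees for even `M` and is half a sample off for odd `M`, so any remaining imprecision probably comes from how well the recorded snippet matches the spoken word rather than from the indexing. A sketch, with `same_mode_bounds` as a hypothetical helper and synthetic data:

```python
import numpy as np
from scipy import signal

def same_mode_bounds(peak_idx, snippet_len, rate):
    """Convert peak indices of correlate(source, snippet, mode='same') to seconds.

    'same' keeps the central len(source) samples of the 'full' output,
    dropping the first (M - 1) // 2 of them, so a peak at index q means the
    snippet starts at sample q - M // 2 and ends at sample q - M // 2 + M - 1.
    """
    start = np.asarray(peak_idx) - snippet_len // 2
    end = start + snippet_len - 1
    return start / rate, end / rate

# usage sketch on synthetic data (hypothetical rate and word positions)
rng = np.random.default_rng(2)
rate = 8000
snippet = rng.standard_normal(401)          # odd length, where M / 2 is not an integer
source = 0.1 * rng.standard_normal(5 * rate)
for s in (5000, 21000):
    source[s:s + snippet.size] += snippet
z = signal.correlate(source, snippet, mode='same')
peaks, _ = signal.find_peaks(z, height=0.5 * z.max(), distance=snippet.size)
starts, ends = same_mode_bounds(peaks, snippet.size, rate)
for i, (s, e) in enumerate(zip(starts, ends)):
    print(f"peak {i}: start {s:.4f} end {e:.4f}")
```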

    【Discussion】:

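    • Hey, this is great, thanks! I learned something new from the plots. One issue is that, depending on the word I'm trying to find timestamps for, I may need to change the threshold: some peaks are very clear, but others are right at the limit, and I occasionally miss peaks.
    • Yes, you have to strike a compromise between false positives and false negatives. Maybe you could apply a filter before processing to remove noise and amplify the signal.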
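As a sketch of the filtering idea in the comment above: a zero-phase band-pass filter applied to both the source and the snippet before `signal.correlate` attenuates low-frequency hum and high-frequency hiss outside the speech band. The helper name `bandpass`, the 300-3400 Hz band and the filter order are assumptions to tune, not values from the thread.

```python
import numpy as np
from scipy import signal

def bandpass(x, rate, low_hz=300.0, high_hz=3400.0, order=4):
    """Zero-phase Butterworth band-pass filter.

    300-3400 Hz is an assumed 'telephone speech' band, not a tuned value,
    and high_hz must stay below rate / 2. sosfiltfilt filters forwards and
    then backwards, so the output has no delay that would shift timestamps.
    """
    sos = signal.butter(order, [low_hz, high_hz], btype='bandpass', fs=rate, output='sos')
    return signal.sosfiltfilt(sos, x)

# usage sketch: an in-band 1 kHz tone plus out-of-band 50 Hz mains hum
rate = 16000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 1000 * t)
hum = np.sin(2 * np.pi * 50 * t)
cleaned = bandpass(tone + hum, rate)
```

Filter `snippet` and `source` the same way, after resampling them to a common rate, so that the correlation compares like with like.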