How to amplify sounds without distortion in Python: answers

【Question title】: How to amplify sounds without distortion in Python
【Posted】: 2016-04-22 09:23:42
【Question】:

I'm trying to do a simple volume adjustment on a sound file. I'm using Python 2.7 and the following libraries:

import numpy as np
import scipy.io.wavfile as wv
import matplotlib.pyplot as plt
import pyaudio
import wave

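I tried 2 methods, in both of which I try to amplify the sound by a factor of 2, i.e. n=2. The first is a dynamic range limiter approach adapted from here (http://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html):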

def limiter(self, n):
    #best version so far
    signal=self.snd_array
    attack_coeff = 0.01
    framemax=2**15-1
    threshold=framemax
    for i in np.arange(len(signal)):
        #if amplitude value * amplitude gain factor is > threshold set an interval to decrease the amplitude
        if signal[i]*n > threshold:
            gain=1
            jmin=0
            jmax=0
            if i-100>0:
                jmin=i-100
            else:
                jmin=0
            if i+100<len(signal):
                jmax=i+100
            else:
                jmax=len(signal)
            for j in range(jmin,jmax):
                #target gain is amplitude factor times exponential to smoothly decrease the amp factor (n)
                target_gain = n*np.exp(-10*(j-jmin))
                gain = (gain*attack_coeff + target_gain*(1-attack_coeff))
                signal[j]=signal[j]*gain
        else:
            signal[i] = signal[i]*n
    print max(signal),min(signal)
    plt.figure(3)
    plt.plot(signal)
    return signal

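The second method applies hard-knee compression to reduce the amplitude of sample values above a threshold, and then amplifies the whole signal by the gain factor n.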

def compress(self,n):
    print 'start compress'
    threshold=2**15/n+1000
    #compress all values above the threshold, therefore limiting the audio amplitude range
    for i in np.arange(len(self.snd_array)):
        if abs(self.snd_array[i])>threshold:
            factor=1+(threshold-abs(self.snd_array[i]))/threshold
        else:
            factor=1.0
        #apply compression factor and amp gain factor (n)
        self.snd_array[i] = self.snd_array[i]*factor*n
    print np.min(self.snd_array),np.max(self.snd_array)
    plt.figure(2)
    plt.plot(self.snd_array,'k')
    return self.snd_array

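With both methods the file sounds distorted. At points where the amplitude is close to the threshold, the music sounds clipped and crackly. I think this is because the waveform "flattens out" near the threshold. I tried applying an exponential in the limiter function, but even when I make it decay very quickly it does not completely remove the crackling. If I change to n=1.5, the sound is not distorted. Any pointers on how to remove the crackling distortion, or links to other volume-modulation code, would be greatly appreciated.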
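The crackling described above is consistent with what flattening does to a waveform's spectrum: a plain gain keeps a sine wave a pure sine, while clipping it adds harmonics that were not in the original. The sketch below measures the third harmonic in both cases with NumPy's FFT; the helper `harmonic_level_db` and the synthetic 441 Hz test tone are assumptions for illustration, not the asker's file.

```python
import numpy as np

def harmonic_level_db(y, sr, f0, k):
    """Level of the k-th harmonic of f0 relative to the fundamental, in dB (hypothetical helper)."""
    spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    fundamental = spectrum[np.argmin(np.abs(freqs - f0))]
    harmonic = spectrum[np.argmin(np.abs(freqs - k * f0))]
    return 20 * np.log10(harmonic / fundamental + 1e-12)

sr, f0 = 44100, 441.0
t = np.arange(sr) / sr                 # one second of samples
x = 0.8 * np.sin(2 * np.pi * f0 * t)   # synthetic test tone, peak 0.8

linear = 1.2 * x                       # plain gain, peak 0.96: waveform shape unchanged
clipped = np.clip(2.0 * x, -1.0, 1.0)  # gain 2, then flattened at full scale

print(harmonic_level_db(linear, sr, f0, 3))   # very low: no third harmonic was created
print(harmonic_level_db(clipped, sr, f0, 3))  # much higher: clipping added a third harmonic
```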

【Question discussion】:

Tags: python audio

【Solution 1】:

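This may not be 100% on topic, but maybe it is interesting for you anyway. If you don't need real-time processing, things get much easier. Limiting and dynamic compression can be seen as applying a transfer function: a function that simply maps input sample values to output values. A linear function returns the original audio, while a "curved" function performs compression or expansion. Applying a transfer function is as simple as: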

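import numpy as np
from scipy.interpolate import interp1d
from scipy.io import wavfile

def apply_transfer(signal, transfer, interpolation='linear'):
    # the transfer curve is sampled on an evenly spaced grid of input values in [-1, 1]
    constant = np.linspace(-1, 1, len(transfer))
    # map every input sample to its output value; inputs outside [-1, 1] raise ValueError
    interpolator = interp1d(constant, transfer, kind=interpolation)
    return interpolator(signal)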
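A quick way to check the claim that a linear transfer function returns the original audio is to feed a straight-line curve through the helper. This sketch repeats `apply_transfer` so it runs on its own; the test signal is made up.

```python
import numpy as np
from scipy.interpolate import interp1d

def apply_transfer(signal, transfer, interpolation='linear'):
    # copy of the answer's helper: the curve is sampled on an even grid over [-1, 1]
    constant = np.linspace(-1, 1, len(transfer))
    interpolator = interp1d(constant, transfer, kind=interpolation)
    return interpolator(signal)

x = 0.9 * np.sin(np.linspace(0, 20, 500))   # made-up test signal inside [-1, 1]

identity = np.linspace(-1, 1, 1000)          # straight line, slope 1: output equals input
half = 0.5 * identity                        # straight line, slope 0.5: a plain gain of 0.5

print(np.allclose(apply_transfer(x, identity), x))      # True
print(np.allclose(apply_transfer(x, half), 0.5 * x))    # True
```

Because `interp1d` raises `ValueError` by default for inputs outside the grid, the signal has to be scaled into [-1, 1] before it is passed to `apply_transfer`.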

Limiting or compression is then just a matter of choosing a different transfer function:

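# hard limiting: linear in the middle (slope about 1/threshold), flat at -1 and +1 outside it
def limiter(x, threshold=0.8):
    transfer_len = 1000
    # flat (clipped) points on each side; round() keeps float error from dropping points
    n_flat = int(round((1 - threshold) / 2 * transfer_len))
    transfer = np.concatenate([ np.repeat(-1, n_flat),
                                np.linspace(-1, 1, transfer_len - 2 * n_flat),
                                np.repeat(1, n_flat) ])
    return apply_transfer(x, transfer)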

# smooth compression: if factor is small, its near linear, the bigger it is the
# stronger the compression
def arctan_compressor(x, factor=2):
    constant = np.linspace(-1, 1, 1000)
    transfer = np.arctan(factor * constant)
    transfer /= np.abs(transfer).max()
    return apply_transfer(x, transfer)
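A sketch of why the compressor helps with the original goal of making the file louder without exceeding full scale: for a normalized signal, the arctan curve keeps the peak at 1 but pulls quieter samples up, so the RMS level rises and the ratio of peak to RMS (the crest factor) falls. The two helpers are copies of the answer's functions; `crest_factor` and the Laplace-distributed test signal are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

def apply_transfer(signal, transfer, interpolation='linear'):
    # copy of the answer's helper: the curve is sampled on an even grid over [-1, 1]
    grid = np.linspace(-1, 1, len(transfer))
    return interp1d(grid, transfer, kind=interpolation)(signal)

def arctan_compressor(x, factor=2):
    # copy of the answer's smooth compressor
    grid = np.linspace(-1, 1, 1000)
    transfer = np.arctan(factor * grid)
    transfer /= np.abs(transfer).max()
    return apply_transfer(x, transfer)

def crest_factor(y):
    # peak-to-RMS ratio: for the same peak, a lower value means a louder average level
    return np.abs(y).max() / np.sqrt(np.mean(y ** 2))

rng = np.random.default_rng(0)
x = rng.laplace(size=44100)   # hypothetical "peaky" test signal
x /= np.abs(x).max()          # scale to [-1, 1] as the answer requires

for factor in (1, 2, 5):
    y = arctan_compressor(x, factor)
    print(factor, round(crest_factor(x), 2), round(crest_factor(y), 2))
```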

This example assumes the input is a 16-bit mono WAV file:

sr, x = wavfile.read("input.wav")
x = x.astype(np.float64)  # convert to float first; on Python 2, int / int is floor division
x = x / np.abs(x).max()   # x scale between -1 and 1

x2 = limiter(x)
x2 = np.int16(x2 * 32767)
wavfile.write("output_limit.wav", sr, x2)

x3 = arctan_compressor(x)
x3 = np.int16(x3 * 32767)
wavfile.write("output_comp.wav", sr, x3)

Maybe this clean offline code can help you benchmark your real-time code.
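One way to use the offline version as a benchmark: a transfer function acts on each sample independently, so processing the signal one buffer at a time, as a real-time callback would, must give the same result as processing it all at once. In this sketch `process_in_blocks` is a hypothetical stand-in for a real-time loop, and the block size of 1024 and the random test signal are assumptions.

```python
import numpy as np
from scipy.interpolate import interp1d

def arctan_transfer(factor=2, n=1000):
    # same curve as arctan_compressor in the answer, returned as (grid, transfer)
    grid = np.linspace(-1, 1, n)
    transfer = np.arctan(factor * grid)
    return grid, transfer / np.abs(transfer).max()

def process_offline(x, grid, transfer):
    # whole signal at once, as in the answer
    return interp1d(grid, transfer)(x)

def process_in_blocks(x, grid, transfer, block_size=1024):
    # hypothetical stand-in for a real-time callback that sees one buffer at a time;
    # np.interp does the same linear interpolation without building an object per block
    out = np.empty_like(x)
    for start in range(0, len(x), block_size):
        stop = start + block_size
        out[start:stop] = np.interp(x[start:stop], grid, transfer)
    return out

rng = np.random.default_rng(1)
x = np.clip(rng.normal(scale=0.3, size=10000), -1, 1)  # made-up test signal in [-1, 1]
grid, transfer = arctan_transfer(factor=2)

# a per-sample curve has no memory, so the block size must not change the result
print(np.allclose(process_offline(x, grid, transfer), process_in_blocks(x, grid, transfer)))  # True
```

A real-time implementation that fails this comparison is doing something other than a plain transfer function, for example carrying gain state from one buffer into the next.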

【Discussion】:

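Thanks for your reply, Frank Zalkow. I actually don't need real-time processing, so this code is much faster. I tried running it, but afterwards my output sounds even more distorted, like a "hard rock" version of the original sound. I plotted the sound file before (left) and after (right) applying the limiter and arctan_compressor transforms. Can you explain why all the values are negative? The plot is here: dropbox.com/s/ivzfe1m51cn76c0/Capture.jpg.
See the example in my answer: before applying limiter or arctan_compressor, scale your audio signal to between -1 and +1. Afterwards, rescale it to the maximum amplitude (32767 for a 16-bit file). Does that help?
I found the problem: after reading x from the audio file, it needs to be converted to float, otherwise the scaled values go from 0 to -1 instead of from 1 to -1. Thanks!
Oh, right! I use Python 3, which does that conversion automatically... Yes, on Python 2, convert it to float before scaling.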
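The fix from this exchange, sketched as two small helpers (the names `int16_to_float` and `float_to_int16` are hypothetical). Converting to float before scaling also sidesteps a related int16 pitfall: `np.abs` of -32768 wraps around in int16, so the peak measured on the raw samples can be wrong and the "normalized" signal can end up outside [-1, 1].

```python
import numpy as np

def int16_to_float(x):
    # convert to float before dividing: avoids integer division on Python 2
    # and the int16 overflow of abs(-32768)
    x = x.astype(np.float64)
    return x / np.abs(x).max()

def float_to_int16(y):
    # clip guards against small overshoot before casting back to 16 bit
    return (np.clip(y, -1.0, 1.0) * 32767).astype(np.int16)

raw = np.array([-32768, -1000, 0, 1000, 16384], dtype=np.int16)  # made-up samples

print(np.abs(raw).max())                     # 16384: abs(-32768) wraps around in int16
print(int16_to_float(raw))                   # first value -1.0, everything else in (-1, 1)
print(float_to_int16(int16_to_float(raw)))
```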
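Although this approach looks appropriate, this kind of compression (i.e. a nonlinear transformation of the amplitudes of individual samples) produces a distorted sound, even far below the maximum allowed amplitude. This is due to the nature of harmonics and the way we hear: with nonlinear scaling we actually distort the proportions of the amplitudes, and therefore the waveform itself. The audio sounds "overdriven" even when it is not loud. You can make the scaling more linear, but then you won't compress much. Linear scaling with a coefficient that depends on the local context works much better.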
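One reading of "linear scaling with a coefficient that depends on the local context" is a look-ahead limiter: for every sample, compute the largest gain that keeps it under a ceiling, take the running minimum of that over a window so the gain starts dropping before a peak arrives, and smooth the result so the gain ramps instead of jumping. The sketch below is that idea under stated assumptions (float input in [-1, 1], a window of 441 samples, about 10 ms at 44.1 kHz, and the hypothetical name `smooth_gain_limiter`); it is not code from the thread. Because the gain still varies over time it is not perfectly distortion-free, but it changes over milliseconds rather than per sample, and away from loud peaks the output is simply the input times `gain`.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d

def smooth_gain_limiter(x, gain=2.0, ceiling=0.99, window=441):
    """Scale float signal x (in [-1, 1]) by `gain`, lowering the gain smoothly around
    samples where gain * |x| would exceed `ceiling`. `window` is in samples."""
    # largest gain each sample can take without exceeding the ceiling (capped at `gain`)
    needed = np.minimum(gain, ceiling / np.maximum(np.abs(x), 1e-12))
    # running minimum over +/- window samples: the gain starts dropping before a peak
    envelope = minimum_filter1d(needed, size=2 * window + 1, mode='nearest')
    # moving average over `window` samples turns the steps into ramps
    envelope = uniform_filter1d(envelope, size=window, mode='nearest')
    return x * envelope

sr = 44100
t = np.arange(sr) / sr
x = 0.3 * np.sin(2 * np.pi * 220 * t)   # made-up quiet test tone
x[20000:21000] *= 3                     # made-up loud burst, peaks near 0.9

y = smooth_gain_limiter(x, gain=2.0)
print(np.abs(y).max() <= 0.99 + 1e-9)            # True: never above the ceiling
print(np.allclose(y[:10000], 2.0 * x[:10000]))   # True: far from the burst it is plain gain
```

The moving average is narrower than the running minimum, so every value it averages already includes the current sample's own limit; that is why the output stays at or below `ceiling` without a final hard clip.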