Cocktail party - audio source separation: answers

【Question title】: Cocktail party - audio source separation
【Posted】: 2019-01-09 19:27:07
【Question description】:

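I am trying to solve the "cocktail party problem".

Here is a video that explains and solves the problem nicely.

In the video, he claims that a single line of code solves the problem. So I downloaded the same audio files he uses in the video from here, and I included the line of code he uses in the video (line 5), but my results are much worse. My code basically just outputs the same original mixed audio files at a lower volume.

Here is my code in Octave: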

[x1, Fs1] = audioread('mixed1.wav');
[x2, Fs2] = audioread('mixed2.wav');
xx = [x1, x2]';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');
a = W*xx;
audiowrite('refined1.wav', a(1,:), Fs1);
audiowrite('refined2.wav', a(2,:), Fs1);

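I don't understand why this doesn't work. He actually shows it working in the video; maybe not 100% accurately, but it clearly works.

What am I doing wrong, and how can I fix it?

【Comments】:

Tags: audio machine-learning octave svd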

【Solution 1】:

Here is Octave code that demonstrates how to:

Mix 2 sound files.
Separate them again.

# Read original (unmixed) signals.
[o1, Fs1] = audioread('original1.wav');
[o2, Fs2] = audioread('original2.wav');

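# Sampling rates Fs1 and Fs2 must be equal; both files are assumed to be mono and of equal length.
assert(Fs1 == Fs2, 'Sampling rates of the two input files differ');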

# o Nx2 contains original signals
o = [o1, o2];

# A is a mixing matrix to make a linear combination of the input sounds.
# It can be arbitrarily changed (must be invertible).
A = [.8,.5 ; .1,.4];

# m Nx2 contains mixed signals 
m = o * A;

# Save mixed files
audiowrite('mixed1.wav', m(:, 1), Fs1);
audiowrite('mixed2.wav', m(:, 2), Fs1);

# Uncomment to read your own mixed files.
#[m1, Fs1] = audioread('mymix1.wav');
#[m2, Fs2] = audioread('mymix2.wav');
#m = [m1, m2];

if 0  
    # Precise solution
    # W1 is ideal unmixing matrix
    W1 = inv(A);

    # s Nx2 contains separated signals 
    s = m * W1; 
else
    # Compute W by a magic algo
    # See https://cs.nyu.edu/~roweis/kica.html
    xx = m';

    yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
    [W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

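    # W is orthogonal, so its transpose maps the whitened data back to the
    # sources (identical to W * yy whenever W is symmetric).
    ss = W' * yy;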

    # Scale down by an empirical factor (the whitened signals have unit
    # variance and could otherwise clip when written to a file)
    s = ss * 0.5;

    # s Nx2 contains separated signals 
    s = s';
end

audiowrite('separated1.wav', s(:, 1), Fs1);
audiowrite('separated2.wav', s(:, 2), Fs1);
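
The separation step can be sanity-checked without any audio files. The following is a minimal sketch, assuming two synthetic sources (a square wave and uniform noise, both chosen here purely for illustration) and the same mixing matrix A as above. The sample count, the seed, and the name C are assumptions of this sketch, not part of the original answer. It projects with the transpose W', which is the unmixing direction expected for an orthogonal W obtained from this SVD; when W happens to be symmetric, W and W' give the same result.

```octave
# Synthetic check of the whitening + SVD separation; no audio files needed.
rand('state', 1);                 # fixed seed (assumption), for repeatability
N = 20000;                        # number of samples (assumption)
t = (0:N-1)';

# Two independent, non-Gaussian sources standing in for real sounds.
o = [sign(sin(2*pi*t/97)), rand(N, 1) - 0.5];   # square wave, uniform noise

A = [.8,.5 ; .1,.4];              # same mixing matrix as in the answer
m = o * A;                        # m Nx2 contains mixed signals

# Same two lines as in the answer: whiten, then SVD of the weighted covariance.
xx = m';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

# Project the whitened data; ss Nx2 contains the separated signals.
ss = (W' * yy)';

# Absolute correlation between each original (rows) and each output (columns).
C = abs(corr(o, ss));
disp(C);
```

If the separation worked, each row of C should contain one value close to 1 and one close to 0, in either column order, since the method does not fix the order or the sign of its outputs.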

Unfortunately, this approach does not work for real audio recorded with 2 microphones.
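
One plausible reason, not stated in the original answer, is that real microphones do not record an instantaneous mix: each sound reaches the two microphones at slightly different times and with room reflections, while the code above assumes every recording is a plain weighted sum of the sources at the same instant. The toy sketch below illustrates only the delay part, under strong assumptions: two synthetic white-noise sources, a hypothetical delay d of 5 samples, and a made-up attenuation of 0.8 at the far microphone. White noise exaggerates the effect, because a delayed copy of it is uncorrelated with the original.

```octave
# Toy illustration: a small inter-microphone delay breaks the instantaneous model.
rand('state', 2); randn('state', 2);   # fixed seeds (assumption), for repeatability
N = 20000;                             # number of samples (assumption)
d = 5;                                 # hypothetical delay in samples

# Two independent, white, non-Gaussian sources with unit variance.
o = [sign(randn(N, 1)), sqrt(12) * (rand(N, 1) - 0.5)];

# Each source reaches the far microphone d samples later and attenuated by 0.8.
m = [o(:,1) + 0.8 * circshift(o(:,2), d), ...
     0.8 * circshift(o(:,1), d) + o(:,2)];

# Same whitening + SVD lines as in the answer.
xx = m';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

# Project with the transpose of the orthogonal matrix W.
ss = (W' * yy)';

# Absolute correlation between each original (rows) and each output (columns).
C = abs(corr(o, ss));
disp(C);
```

In this construction no entry of C gets close to 1: every output remains a mixture, because no weighted sum of the two recordings at a single instant can cancel a source that arrives at two different times. Delayed or reverberant mixing of this kind is usually treated as a separate, harder problem (convolutive blind source separation).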

【Comments】: