【问题标题】:Faster multiple matrix value assignments in MATLABMATLAB 中更快的多个矩阵值分配
【发布时间】:2017-03-12 10:54:21
【问题描述】:

我想计算包含NaN 的大矩阵(数十行数千列)每一列的norm。在计算norm 之前,每列都减去其平均值。所有NaN 值都被视为0。因此我将它们执行为

nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
X = bsxfun(@minus, X, sum(X)./nx);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));

我认为这很有效,除了将所有 NaN 值分配为 0 的两行。profile 表明这两行花费了所有计算的 50% 以上。使用到 2000 年大小为 70 的数据矩阵,超过 10 s 用于运行分配 10,000 次。有什么建议吗?

===============

根据要求,测试功能可以是:

%%
function test
    a = randn(80,3000);
    [r,c] = size(a);
    b = randperm(r*c);
    nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
    a(nanix) = NaN;
    nnx = sum(~isnan(a));
    tic;
    for i = 1:1000
        t=a;
        t(nanix)=0;
        tm = sum(t)./nnx;
        t = bsxfun(@minus, t, tm);
        t(nanix) = 0;
        tnorm = sqrt(sum(t.^2));
    end
    tt = toc;
    fprintf('time: %.4f',tt);
end

输出是

>> test
time: 3.4734

profile 显示第一个t(nanix)=0; 花费了总运行时间的 42%,第二个花费了 16.3%。

【问题讨论】:

  • 如果你有统计和机器学习工具箱,你可以使用nanmean
  • @beaker 已测试。 6.772 s 返回,比我的慢近 2 倍。由于mean 函数以及nanmean 执行大量检查,然后使用sum 计算平均值,因此预计我使用的方式会更快。
  • 一个加速,如果你有 2016b:你可以直接用 centeredA = A - mean(A,'omitnan'); 代替 bsxfun 表示中心。此外,与其将NaN 值设置为零,不如不通过逻辑索引将它们包括在内:sum(t(~ nanix))。我很快就可以试试了。
  • @Jon t(~ nanix) 会将索引子矩阵复制到新矩阵,并且很可能没有写时复制。在我的测试中,使用这种方法比 OP 慢 50%。
  • 如果@Jon的方法可以测试就好了。我以前没见过这样的关于子索引性能的讨论。

标签: matlab matrix


【解决方案1】:

更新:我已将 OP 的方式纳入比较,并处理了几个错误。


您正在寻找向量的范数,它是向量减去另一个特殊向量的结果(与原始向量中的 ~nan 元素相对应的元素的常数值)。

这里我使用平方律 (a-b)^2 = a^2 - 2ab + b^2,其中 a 和 b 是向量。 这将避免一次零分配以及奇点扩展。

另外,根据@Elkan,对于向量 a 及其平均值 b,sum((a-b)^2) = sum(a ^2)-2*sum(a)*b+n * b^2 = sum(a ^2)-2*n * b* b+n* b^2 = sum(a ^2)-n*b^2,其中 n 是用于计算均值的非零点数。

这两种方法的关键是避免评估居中向量a-b,这需要第二个X(nanix) = 0;

根据 Profiler,最耗时的行是:

  1. X(nanix) = 0; (~>30%)

  2. X2 = sum(X.^2); (~10%)

  3. bsxfun(令人惊讶地~10%)


正如@Jon 在 cmets 中提到的,X(~nanix) 会将所有非 nan 数字作为所需的输入提取出来。然而,这个操作需要一个内存复制,这需要相当多的时间。更重要的是,由于 nan 的数量在所有列向量中并不一致,因此很难对过程进行向量化(必须使用 for 循环来处理每一列,或者促进更慢的事情,例如 cellfun)。


完整的测试代码:

clear;clc;close all

a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
nnx = sum(~isnan(a));
clearvars -except a
tic
for i = 1:1e3
    X = a;
    nanix = isnan(X);
    nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
    X(nanix) = 0;
    bsxminus = sum(X)./nx;
    X = bsxfun(@minus, X, bsxminus);
    X(nanix) = 0;
    xnorm = sqrt(sum(X.^2));
end
toc
clearvars -except a xnorm
tic
for i = 1:1e3
    X = a;
    nanix = isnan(X);
    nx = sum(~nanix);
    X(nanix) = 0;
    Xsum = sum(X);
    Xmean = Xsum./nx;
    X2 = sum(X.^2);
    Xmean2 = Xmean.^2.*nx;
    XXmean = Xsum.*Xmean;
    xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm2)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
    X = a;
    nanix = isnan(X);
    nx = sum(~nanix);
    X(nanix) = 0;
    Xsum = sum(X);
    X2sum = sum(X.^2); % 50% time consumed here
    xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm3)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
    X = a;
    nanix = isnan(X);
    X = X(~nanix);
    bsxminus = mean(X);
    X = bsxfun(@minus, X, bsxminus);
    xnorm4 = sqrt(sum(X.^2));
end
toc
norm(abs(xnorm-xnorm4)./xnorm) % I can't think of a working way

输出:

Elapsed time is 6.326877 seconds.
Elapsed time is 3.780087 seconds.

ans =

   8.8214e-15

Elapsed time is 3.690037 seconds.

ans =

   8.8283e-15

Elapsed time is 3.632071 seconds.

ans =

   3.0369e+03

>> 

可以看出,前两种方法的速度相似,而我确实观察到第二种方法的时间较短。这是因为缺少计算与mean(X) 相关的那些事情可能会节省非常少的时间。

同时,第三个不能给出正确答案;问题出在索引命令X = X(~nanix);

【讨论】:

  • 干得好!用norm(a-b)=sqrt(sum(a.^2)-nx.*Xmean.^2 进一步简化怎么样?这是因为对于具有平均 b 的向量 a,sum((a-b)^2) = sum( a ^2)-2*sum(a)*b+nb^2 = sum(a ^2)- 2*nbb+nb^2 = sum(a ^2)-n*b^2,其中n是非零的个数计算平均值的点。明天测试。
  • 这似乎不是一个好的测试(不是说方法不快),因为你没有清除X,所以当你到达时没有NaNs替代方法,可能保存a,并在clearvars 之后设置X。我错过了什么吗?
  • @Jon 啊,我的错。稍后会重做。
  • @Yvon 感谢您的帮助。我已经测试了所有方法,简化版本,即使用获得xnorm3的版本并没有提高性能。但是我认为如果X在计算过程中没有改变,就像你提出的版本一样,X=aX(nanix)=0的代码可以在for循环之前运行。我试过这个,结果表明获得了显着的改进,从 3.5829 到 0.8288 s。所以使用我的 MATLAB (2015a),作业仍然花费最多的时间,显然超过 50%。
  • X=a 部分有点棘手。正如@Jon 指出的那样,我没有在每个循环中清除X,因此在每次迭代中没有新的分配,X 将不包含NaN,除非在第一个循环中。但是,此作业绝对是写时复制。所以下一次分配给X 会触发内存写入,因此需要更多时间,这不是我们感兴趣的。也许我们应该在X=a 行之后强制写入内存?
【解决方案2】:

嗯,这就是我的意思,但@Yvon,你是对的,它比将NaNs 分配给0 要慢。此外,使用 TIMEIT 功能进行更好的测试(Elkan 的答案似乎不正确)。我确信有一些小的改进和其他方法,但我想我会添加这个。

    Test Results:

func4 %# Jon potential improves on func3
  runTime: 0.00374437,
  error:5.32907e-15
func3 %# Elkan's comment on Yvon's answer
  runTime: 0.00375083,
  error:5.32907e-15
func2 %# Expanded Norm Yvon
  runTime: 0.00386379,
  error:5.32907e-15
func1 %# Original
  runTime: 0.00515395,
  error:0
func5 %# Jon's no zero-assign version of func3
  runTime: 0.00884188,
  error:5.32907e-15
func6 %# Jon's no zero-assign version of Original (norm-mean directly)
  runTime: 0.0105508,
  error:2.66454e-15

代码:

function test()

a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
% nnx = sum(~isnan(a));

f = {...
        @() func1(a);   %# Original
        @() func2(a);   %# Expanded Yvon
        @() func3(a);   %# Elkan's comment
        @() func4(a);   %# Jon potential improves on 3
        @() func5(a);   %# Jon's no zero-assign Elkan's
        @() func6(a);   %# Jon's no zero-assign on Original
    };
    %Time each function/method
    timings = cellfun(@timeit, f);
    validity = cellfun(@feval, f, 'UniformOutput',false);
    err=max( abs( bsxfun(@minus,validity{1},vertcat(validity{:})) ),[],2 );
    %Display in order of speed:
    idxTime=1:size(timings,1);
    [sortedTime,sortMap]=sort(timings);
    fprintf('Test Results:\n\n');
    fprintf('func%d\r  runTime: %g,\r  error:%g\n',[idxTime(sortMap);sortedTime.';err(sortMap).'])
end

%% Functions below
function xnorm = func1(X)

    nanix = isnan(X);
    nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
    X(nanix) = 0;
    bsxminus = sum(X)./nx;
    X = bsxfun(@minus, X, bsxminus);
    X(nanix) = 0;
    xnorm = sqrt(sum(X.^2));

end
function xnorm2 = func2(X)


    nanix = isnan(X);
    nx = sum(~nanix);
    X(nanix) = 0;
    Xsum = sum(X);
    Xmean = Xsum./nx;
    X2 = sum(X.^2);
    Xmean2 = Xmean.^2.*nx;
    XXmean = Xsum.*Xmean;
    xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx

end
function xnorm3 = func3(X)

    nanix = isnan(X);
    nx = sum(~nanix);
    X(nanix) = 0;
    Xsum = sum(X);
    X2sum = sum(X.^2); % 50% time consumed here
    xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx

 % relative error
end

%Try column version of setting to zero, Replace element-wise powers with multiplication, add realsqrt
function xnorm4 = func4(X)

    [r,c] = size(X);
    nanix = isnan(X);
    nx = sum(~nanix);%prob can't speedup without Mex
    Xtemp=X(:);
    Xtemp(nanix(:))=0;
    X=reshape(Xtemp,r,c); %Sometimes faster if linear?
    Xsum = sum(X);
    X2sum = sum(X.*X);
    xnorm4 = realsqrt( X2sum - (Xsum.*Xsum)./nx ); % avoid bsx

end
%Now using indexing instead of setting to zero
function xnorm5 = func5(X)

    nnanix = ~isnan(X);
    nx = sum(nnanix);
%     Xtemp=X(nnanix); %40% of time lost
    Xtemp=X(:);
    Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
    %Now have two vectors, one of values, and one of # of values per
    %original column... avoid cells!
    ind(1,numel(Xtemp)) = 0;%hack pre-allocation
    ind(cumsum(nx(end:-1:1))) = 1;
    ind = cumsum(ind(end:-1:1)); %grouping index for accumarray
    Xsum = accumarray(ind(:), Xtemp(:)); %sum each "original" columns
    X2sum = accumarray(ind(:), Xtemp(:).*Xtemp(:));%./nx(:) for mean directly

    xnorm5 = realsqrt( X2sum - (Xsum.*Xsum)./nx(:) ).'; % avoid bsx

end

%Now using indexing instead of setting to zero
function xnorm6 = func6(X)

    nnanix = ~isnan(X);
    nx = sum(nnanix); %only Mex could speed up
%     Xtemp=X(nnanix); %40% of time lost
    Xtemp=X(:);
    Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
    numNoNans=numel(Xtemp);
    %Now have two vectors, one of values, and one of # of values per
    %original column... avoid cells! It's like a "run length encoding"

    %Almost like "run-length encoding" FEX: RunLength, for MEX
    ind(1,numNoNans) = 0;%hack pre-allocation
    ind(cumsum(nx(end:-1:1))) = 1;
    ind = cumsum(ind(end:-1:1)); %// generate grouping index for accumarray
    Xmean = (accumarray(ind(:), Xtemp(:))./nx(:)).'; %means of each col
    %"Run-length decoding" %Repelem is fastest if 2015b>
    idx(1,numNoNans)=0;
    idx([1 cumsum(nx(1:end-1))+1]) = diff([0 Xmean]);
    XmeanRepeated = cumsum(idx);
    XminusMean = Xtemp(:).'-XmeanRepeated; %Each col subtracted by col mean
    XSumOfSquares = accumarray(ind(:), XminusMean.*XminusMean); %sum each "original" columns

    xnorm6 = realsqrt(XSumOfSquares).';

end

【讨论】:

    【解决方案3】:

    我很惊讶地看到,我提出的方式在被@Yvon 实施时花费了如此多的时间。所以我做了更多的实验并作为新的答案发布。

    clear;clc;close all
    
    a = randn(80,3000);
    [r,c] = size(a);
    b = randperm(r*c);
    nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
    a(nanix) = NaN;
    % nan values
    nanix = isnan(a);
    % count the number of non-NaN values in each column for calculating mean
    nx = sum(~nanix);
    clearvars -except a nanix nx
    tic
    for i = 1:1e3
        X = a;
        X(nanix) = 0;
        bsxminus = sum(X)./nx;
        X = bsxfun(@minus, X, bsxminus);
        X(nanix) = 0;
        xnorm = sqrt(sum(X.^2));
    end
    t = toc;
    fprintf('Reference edition: time elapese: %.4f\n',t) % relative error
    
    clearvars -except a xnorm nanix nx
    tic
    for i = 1:1e3
        X = a;
        X(nanix) = 0;
        Xsum = sum(X);
        Xmean = Xsum./nx;
        X2 = sum(X.^2);
        Xmean2 = Xmean.^2.*nx;
        XXmean = Xsum.*Xmean;
        xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
    end
    t = toc;
    fprintf('Yvon''s first edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm2))) % relative error
    
    clearvars -except a xnorm nanix nx
    tic
    for i = 1:1e3
        X = a;
        X(nanix) = 0;
        Xsum = sum(X);
        X2sum = sum(X.^2); % 50% time consumed here
        xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
    end
    t = toc;
    fprintf('Yvon''s second edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm3))) % relative error
    
    clearvars -except a xnorm nanix nx
    tic
    for i = 1:1e3
        X = a;
        X = X(~nanix);
        bsxminus = mean(X);
        X = bsxfun(@minus, X, bsxminus);
        xnorm4 = sqrt(sum(X.^2));
    end
    t = toc;
    fprintf('Yvon''s third edition: time elapsed: %.4fs, max error: %e\n', t, max(abs(xnorm-xnorm4)))
    
    clearvars -except a xnorm nanix nx
    tic
    X = a;
    X(nanix) = 0;
    for i = 1:1e3
        xsum = sum(X);
        x2sum = sum(X.^2);
        xnorm5 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
    end
    t = toc;
    fprintf('My simplified edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm5)))
    
    clearvars -except a xnorm nanix nx
    tic
    nanix = find(isnan(a));
    for i = 1:1e3
        X = a;
        X(nanix) = 0;
        xsum = sum(X);
        x2sum = sum(X.^2);
        xnorm6 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
    end
    t = toc;
    fprintf('Yvon''s edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm6)))
    
    clearvars -except a xnorm nanix nx
    tic
    nanix = find(isnan(a));
    for i = 1:1e3
        X = a;
        X(nanix) = 0;
        bsxminus = sum(X)./nx;
        X = bsxfun(@minus, X, bsxminus);
        X(nanix) = 0;
        xnorm7 = sqrt(sum(X.^2));
    end
    t = toc;
    fprintf('Reference edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm7)))
    

    输出是:

    Reference edition: time elapese: 6.1678
    Yvon's first edition: time elapsed: 3.6075s, max efrror: 4.440892e-15
    Yvon's second edition: time elapsed: 3.5760s, max error: 3.552714e-15
    Yvon's third edition: time elapsed: 4.0059s, max error: 4.043873e+02
    My simplified edition: time elapsed: 0.8743s, max error: 3.552714e-15
    Yvon's edition with linear indices: time elapsed: 2.3531s, max error: 3.552714e-15
    Reference edition with linear indices: time elapsed: 3.5527s, max error: 0.000000e+00
    

    使用find将逻辑索引转换为线性索引可以进一步显着提高速度。正如 MATLAB 建议使用逻辑索引来提高性能,我想知道为什么这与我的结果相矛盾。

    【讨论】:

      猜你喜欢
      • 2013-08-14
      • 2016-05-13
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2021-02-20
      • 2015-12-30
      相关资源
      最近更新 更多