Matlab: searching arrays for similar values and creating a new array containing all values

【Question title】: Matlab: searching arrays for similar values and create a new array containing all values
【Posted】: 2016-08-15 10:01:45
【Question】:

I have three column vectors:

A = [1;2;5;9;15]
B = [2;3;5;11;15]
C = [5;7;11;20;25]

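I want to create a new column vector D by searching through all the elements of A, B and C, collecting every value that occurs and avoiding duplicates in D.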

I would like D to be:

D = [1;2;3;5;7;9;11;15;20;25]

How can I do this?
Thanks!

【Comments】:

unique([A;B;C])?
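Possible duplicate of "faster way to achieve unique() in matlab if assumed 1d pre-sorted vector?"
Thanks. Is there any other way besides using MATLAB's built-in 'unique'?
Check the suggested duplicate post - it contains a method that does not use unique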

Tags: arrays matlab find

【Solution 1】:

Here is another (super fast) way that uses neither unique nor loops, provided you are dealing with integers only:

A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C]; % concat the vectors
R = min(tmp):max(tmp)+1; % the range of the values
ind = histcounts(tmp,R)>0; % find all elements within tmp
D = R(ind).' % extract the relevant values
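
The histcounts call above effectively records which integers in the range min(tmp):max(tmp) occur at least once. The same idea can be sketched with a plain logical lookup table. The names lo and seen are illustrative and not part of the original answer, and the sketch assumes integer-valued inputs with a modest range, since the table holds one flag per integer in that range.

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C];                       % concatenate the vectors
lo = min(tmp);                       % smallest value, used as an offset
seen = false(max(tmp) - lo + 1, 1);  % one flag per integer in the range
seen(tmp - lo + 1) = true;           % mark every value that occurs
D = find(seen) + lo - 1              % map flagged positions back to values
```

For the sample vectors this gives D = [1;2;3;5;7;9;11;15;20;25].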

This method can be generalized to doubles:

A = [1.2;2.62;5.74;9.29;15.31];
B = [2.3;3;5;9.29;15.31];
C = [1.2;2.62;11;20;25];
tmp = sort([A;B;C]); % concat and sort the vectors
R = [tmp; max(tmp)+1]; % the range of the values
ind = histcounts(tmp,R)>0; % find all elements within tmp
D = tmp(ind) % extract the relevant values

However, having to sort the values first (in tmp) makes it slower than the other methods.
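
The approaches above compare values exactly, so doubles that differ only by rounding error count as distinct. If "similar" is meant to include such near-equal values, uniquetol may be closer to what is wanted, in MATLAB releases that provide it. A small sketch, where x is an illustrative vector that is not taken from the question:

```matlab
x = [0.1 + 0.2; 0.3; 1.5];    % first two entries differ only by rounding error
n_exact = numel(unique(x))    % 3: exact comparison keeps both near-0.3 values
n_tol = numel(uniquetol(x))   % 2: the default tolerance merges them
```

Whether merging near-equal values is desirable depends on the data, so this is an option to consider rather than a replacement for the exact methods.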

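【Comments】:

@user5916581 You may find some other techniques here
This method seems to require integer values. However, if that restriction is acceptable, it seems to perform well.
@patrik I added a general method, but it seems to outperform the other methods only for integers.
I think this has to do with the call to sort(). Sorting methods are usually heavy. In the best case you may get down to O(n*log(n)) operations, but in the worst case it is often O(n^2).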

【Solution 2】:

This code should do what you want:

% Your sample arrays
A=[1;2;5;9;15]
B=[2;3;5;11;15]
C=[5;7;11;20;25]

% [A,B,C] places the column vectors side by side in one 5x3 matrix
% unique returns the distinct values of all elements of that matrix, sorted
[D, IA, ID] = unique([A,B,C]);

disp(D);

% D = column vector with the sorted unique values

% IA = linear indices into the input such that D = X(IA),
% where X = [A,B,C]

% ID = for each element of the input, the index of its value in D,
% so that X(:) = D(ID)

% D and ID can be used to recreate the original array
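
The relationship between the three outputs of unique can be checked directly. A short sketch, using lowercase ia and ic (the names used in MATLAB's documentation) in place of IA and ID, and X as an illustrative name for the concatenated input:

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
X = [A;B;C];                     % all values in one column vector
[D, ia, ic] = unique(X);         % D holds the sorted unique values
ok_forward = isequal(D, X(ia))   % true: ia picks one occurrence of each value
ok_back = isequal(X, D(ic))      % true: ic maps every element of X into D
```

If only the unique values are needed, the extra outputs can simply be omitted.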

A solution that does not use unique, which is probably less efficient:

% SOLUTION WITHOUT USING UNIQUE

% Your variables
A=[1;2;5;9;15];
B=[2;3;5;11;15];
C=[5;7;11;20;25];

% Allocate a temporary array with your arrays concatenated
temp = sort([A;B;C]);
rep_count = 0; % Count number of repeat values

% Allocate a blank array for your output
D = zeros(length(temp),1);
D(1) = temp(1); % Initialise first element (is always unique)

% Iterate through temp and output unique values to D
for i = 2:length(temp)
    if (temp(i) == D(i-1-rep_count))
        rep_count = rep_count+1;
    else
        D(i-rep_count) =  temp(i);
    end
end

% Remove zeros at the end of D
D = D(1:length(D)-rep_count);

disp(D)

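【Comments】:

Thanks. Is there any other way besides using MATLAB's built-in 'unique'?
@user5916581 I have edited my solution above to give you an alternative. It is probably slower than unique...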

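【Solution 3】:

It is possible to sort the data and then check for unique values. This seems to be about as efficient as using the function unique(). Using sort() and diff() may have a slight advantage. However, this may depend on the hardware, and given the simplicity of D = unique([A;B;C]); the difference is fairly negligible.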

function test()

% A=[1;2;5;9;15];
% B=[2;3;5;11;15];
% C=[5;7;11;20;25];

A = 500*rand(10000000,1);
B= 500*rand(10000000,1);
C = 500*rand(10000000,1);

f1 = @() testA(A,B,C);
f2 = @() testB(A,B,C);

time1 = timeit(f1,1);
time2 = timeit(f2,1);
disp(time1);
disp(time2);

function D = testA(A,B,C)
d = sort([A;B;C]);
idx = diff(d);
D = d([1;idx]>0);

function D = testB(A,B,C)
D = unique([A;B;C]);

Test results (in seconds, testA first, then testB):

1.9085

1.9968
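
What testA does is easier to see on the small vectors from the question: after sorting, equal values sit next to each other, so diff is zero exactly at the repeats. A minimal sketch, where keep is an illustrative name:

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
d = sort([A;B;C]);            % equal values become neighbours
keep = [true; diff(d) > 0];   % first element, plus every element larger than its predecessor
D = d(keep)                   % D = [1;2;3;5;7;9;11;15;20;25]
```

Writing the mask as [true; diff(d) > 0] makes it explicit that the first element is always kept.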

【Comments】:

I have tested this and the use of histcounts (testC) on my computer, with these results: testA = 1.6110, testB = 1.5125, testC = 0.1835