Aligning data arrays by closest time: answers

【Question title】: Aligning data arrays by closest time
【Posted】: 2018-07-21 05:41:58
【Question description】:

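I have 2 data vectors with corresponding time vectors. The data are sampled at nearly the same instants, but their timestamps differ slightly (from machine precision, transmission delays, etc.). Because of telemetry problems, one or both data vectors occasionally have dropped samples and doubled samples.

I want to match up the data arrays where their times match, so that I can perform some math operations between them. Basically, remove the points from y1 and y2 that have no corresponding time in x1 and x2 (times within about half a sample interval count as a match).

Note: I do not want to interpolate y1 & y2.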

%Sample time stamps: Real ones are much faster and not as neat.
x1 = [1  2 3 4 5 5.1 6   7   8       10  ]; %note double sample at ~5.
x2 = [.9       4.9    5.9 6.9 8.1 9.1 10.1]; %Slightly different times.

%Sample data:  y is basically y1+1 if no data was missing
y1 = [1 2 3 4 5 5 6 7 8    10];
y2 = [2       6   7 8 9 10 11];

So the result should look like this:

y1_m = [1 5 6 7 8 10];
y2_m = [2 6 7 8 9 11];

What I have so far: I use interp1 to find the nearest time points between the two time arrays, and then get the time delta between them like this:

>> idx = interp1(x2,1:numel(x2),x1,'nearest','extrap')
idx =
     1     1     2     2     2     2     3     4     5     7

>> xDelta = abs(x2(idx) - x1)
xDelta =
    0.1000    1.1000    1.9000    0.9000    0.1000    0.2000    0.1000    0.1000    0.1000    0.1000

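Now I think I need to find the minimum xDelta for each unique idx, which should give me all the matching points. However, I haven't come up with a clever way to do that... It seems like accumarray should be useful here, but so far I have failed to get it to work.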
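One way accumarray might be applied to the idx and xDelta vectors above is sketched here. It takes the smallest xDelta for each x2 index and keeps the x1 samples that attain it. The threshold tol (half of an assumed 1-unit sample interval) and the names minDelta and keep are assumptions introduced for illustration.

```matlab
% Sample data from the question
x1 = [1 2 3 4 5 5.1 6 7 8 10];
x2 = [.9 4.9 5.9 6.9 8.1 9.1 10.1];
y1 = [1 2 3 4 5 5 6 7 8 10];
y2 = [2 6 7 8 9 10 11];

tol = 0.5;  % assumed: half of a nominal 1-unit sample interval

% Nearest x2 index for every x1 sample, and the time error of that pairing
idx    = interp1(x2, 1:numel(x2), x1, 'nearest', 'extrap');
xDelta = abs(x2(idx) - x1);

% Smallest error per x2 index; Inf marks x2 samples that no x1 sample chose
minDelta = accumarray(idx(:), xDelta(:), [numel(x2) 1], @min, Inf);

% Keep x1 samples that attain the minimum for their x2 index and lie within tol
keep = (xDelta(:) == minDelta(idx(:))) & (xDelta(:) <= tol);

y1_m = y1(keep);
y2_m = y2(idx(keep));
```

On the sample data this gives y1_m = [1 5 6 7 8 10] and y2_m = [2 6 7 8 9 11]. If two x1 samples were exactly equally close to the same x2 sample, both would be kept, so a tie-break would be needed in that case.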

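【Question discussion】:

Interesting question! Try writing it as a loop first; it may be fast enough, and more readable. A vectorized solution would probably involve making a single array with the indices and the deltas, then sorting it by rows (so you sort first by index and, for equal indices, by delta). Next, pick the first element for each index, and find the corresponding location in the unsorted array. I think unique on the sorted index column would pick the first one for each index?
@CrisLuengo I started writing a loop and it got complicated, with indices of indices and find statements... it was ugly. Anyway, based on your comment I got a working vectorized solution. If you write your comment up as an answer I will accept it; otherwise I'll submit mine so the question doesn't go unanswered. Thanks.
It will be easier for you to write that answer. I don't think my sketch above is a complete answer; I would have to put in the effort to write one, and you have already made that effort...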
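
For comparison, the loop form suggested in the first comment might look roughly like the sketch below. The tolerance tol and the mutual-nearest check (which is what discards doubled samples) are assumptions added for illustration; neither is spelled out in the thread.

```matlab
% Sample data from the question
x1 = [1 2 3 4 5 5.1 6 7 8 10];
x2 = [.9 4.9 5.9 6.9 8.1 9.1 10.1];
y1 = [1 2 3 4 5 5 6 7 8 10];
y2 = [2 6 7 8 9 10 11];

tol = 0.5;    % assumed: half of a nominal 1-unit sample interval
keep1 = [];   % indices into x1/y1 that found a partner
keep2 = [];   % indices into x2/y2 of those partners

for k = 1:numel(x2)
    % Closest x1 sample to x2(k)
    [d, j] = min(abs(x1 - x2(k)));
    if d > tol
        continue   % nothing in x1 is close enough to this x2 sample
    end
    % Accept the pair only if x2(k) is also the closest x2 sample to x1(j)
    [~, kk] = min(abs(x2 - x1(j)));
    if kk == k
        keep1(end+1) = j;
        keep2(end+1) = k;
    end
end

y1_m = y1(keep1);
y2_m = y2(keep2);
```

On the sample data this yields y1_m = [1 5 6 7 8 10] and y2_m = [2 6 7 8 9 11]. Each pass scans all of x1, so the cost grows with numel(x1)*numel(x2); for long, sorted time vectors a two-pointer walk or the vectorized approach would scale better.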

Tags: matlab indexing time-series sampling closest

【Solution 1】:

Here is a rough idea, using unique and ismembertol, that you could improve upon:

function [y1_m, y2_m] = q48723002
%% Stage 0 - Setup:
%Sample time stamps: Real ones are much faster and not as neat.
x1 = [1  2 3 4 5   5.1 6   7   8       10  ]; %note double sample at ~5.
x2 = [.9       4.9     5.9 6.9 8.1 9.1 10.1]; %Slightly different times.

%Sample data:  y is basically y1+1 if no data was missing
y1 = [1 2 3 4 5 5 6 7 8    10];
y2 = [2       6   7 8 9 10 11];
%% Stage 1 - Remove repeating samples:
SR = 0.5; % Sampling rate, for rounding.
[~,Loc1] = ismembertol(x1,round(x1/SR)*SR,SR/2,'DataScale',1);
[~,Loc2] = ismembertol(x2,round(x2/SR)*SR,SR/2,'DataScale',1);
u1 = unique(Loc1);
u2 = unique(Loc2);
x1u = x1(u1);
y1u = y1(u1);
x2u = x2(u2);
y2u = y2(u2);
clear Loc1 Loc2
%% Stage 2 - Get a vector of reference time steps:
ut = union(u1,u2);
%% Stage 3 - Only keep times found in both
[In1,Loc1] = ismembertol(ut,x1u,SR/2,'DataScale',1);
[In2,Loc2] = ismembertol(ut,x2u,SR/2,'DataScale',1);
valid = In1 & In2;
%% Stage 4 - Output:
y1_m = ut(Loc1(valid)); % equivalently: y1_m = ut(valid)
y2_m = y1_m + 1;

ans =

     1     5     6     7     8     9

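See also: uniquetol.

【Discussion】:

That's an interesting approach. I wasn't aware of the function ismembertol, probably because my company only recently upgraded from 2014b and it is new in 2015a. However, something seems to be off, because the answer is incorrect: y1_m should be [1 5 6 7 8 10].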
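
A possible way to get the expected result with ismembertol is sketched below. It is not the answer author's code: it matches x1 against x2 directly with an absolute tolerance and then collapses doubled samples with unique. The tolerance tol and the names tf, loc, i1 and i2 are assumptions for illustration.

```matlab
% Sample data from the question
x1 = [1 2 3 4 5 5.1 6 7 8 10];
x2 = [.9 4.9 5.9 6.9 8.1 9.1 10.1];
y1 = [1 2 3 4 5 5 6 7 8 10];
y2 = [2 6 7 8 9 10 11];

tol = 0.5;  % assumed matching tolerance, in the same units as x1 and x2

% For each x1 sample: is some x2 sample within tol, and at which index?
% 'DataScale',1 makes tol an absolute tolerance rather than a relative one.
[tf, loc] = ismembertol(x1, x2, tol, 'DataScale', 1);

i1 = find(tf);   % x1 indices that have a partner in x2
i2 = loc(tf);    % index of that partner in x2

% Doubled samples map to the same x2 index; keep the first occurrence of each
[i2, ia] = unique(i2, 'first');
i1 = i1(ia);

y1_m = y1(i1);
y2_m = y2(i2);
```

On the sample data this gives y1_m = [1 5 6 7 8 10] and y2_m = [2 6 7 8 9 11]. Note that it keeps the first of a pair of doubled samples, not necessarily the closest one, and that when more than one x2 sample lies within tol of an x1 sample, loc reports only one of them.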

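【Solution 2】:

This is a solution based on @Cris Luengo's comment on the original question.

It uses sortrows and unique to pick, for each matched x2 index, the x1 sample with the lowest time error.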

%Sample time stamps: Real ones are much faster and not as neat.
x1 = [1  2 3 4 5 5.1 6   7   8       10  ]; %note double sample at ~5.
x2 = [.9       4.9    5.9 6.9 8.1 9.1 10.1]; %Slightly different times.

%Sample data:  y is basically y1+1 if no data was missing
y1 = [1 2 3 4 5 5 6 7 8    10];
y2 = [2       6   7 8 9 10 11];

%Find the nearest match
idx   = interp1(x2,1:numel(x2),x1,'nearest','extrap');
xDiff = abs(x2(idx) - x1);

% Combine the matched indices & the deltas together & sort by rows.
%So lowest delta for a given index is first.
[A, idx1]    = sortrows([idx(:) xDiff(:)]);
[idx2, uidx] = unique(A(:,1),'first');
idx1         = idx1(uidx); %resort idx1

%output
y1_m = y1(idx1)
y2_m = y2(idx2)


y1_m =
     1     5     6     7     8    10
y2_m =
     2     6     7     8     9    11

【Discussion】:

@CrisLuengo Here is the solution based on your comment.