【问题标题】：Moving window averages on unequal dimensions不等维度上的移动窗口平均值
【发布时间】：2018-02-03 08:22:51
【问题描述】：

TL;DR：有没有办法摆脱我的第二个for-loop？

我在二维网格上有一个时间序列的点。为了摆脱它们位置的快速波动，我在一个帧窗口上平均坐标。现在就我而言，这些点可能会比平时覆盖更大的距离。我不想包含特定点的帧，如果它比cut_off 值更远。

在第一个for-loop 中，我遍历所有帧并定义移动窗口。然后我计算当前帧和移动窗口中每个帧之间的距离。在我只从所有帧中抓取这些位置之后，x 和y 组件的移动距离都没有超过cut_off。现在我想从移动窗口的所有这些选定帧中计算每个点的平均位置（注意：选定帧的数量可以小于n_window） .这导致我进入第二个for-loop。在这里，我遍历所有点并实际从帧中获取位置，其中当前点没有比cut_off 移动得更远。从这些选定的帧中，我计算坐标的平均值并将其用作当前帧的新值。

最后一个for-loop 减慢了整个处理速度。我想不出更好的方法来完成这个计算。有什么建议吗？

MWE

放入 cmets 以便澄清。

import numpy as np

# Generate a timeseries with 1000 frames, each 
# containing 50 individual points defined by their 
# x and y coordinates
n_frames = 1000
n_points = 50
n_coordinates = 2

timeseries = np.random.randint(-100, 100, [n_frames, n_points, n_coordinates])

# Set window size to 20 frames
n_window = 20

# Distance cut off
cut_off = 60

# Set up empty array to hold results
avg_data_store = np.zeros([n_frames, timeseries.shape[1], 2])

# Iterate over all frames
for frame in np.arange(0, n_frames):
    # Set the frame according to the window size that we're looking at
    t_before = int(frame - (n_window / 2))
    t_after = int(frame + (n_window / 2))

    # If we're trying to access frames below 0, set the lowest one to 0
    if t_before < 0:
        t_before = 0

    # Trying to access frames that are not in the trajectory, set to last frame
    if t_after > n_frames - 1:
        t_after = n_frames - 1

    # Grab x and y coordinates for all points in the corresponding window
    pos_before = timeseries[t_before:frame]
    pos_after = timeseries[frame + 1:t_after + 1]
    pos_now = timeseries[frame]

    # Calculate the distance between the current frame and the windows before/after
    d_before = np.abs(pos_before - pos_now)
    d_after = np.abs(pos_after - pos_now)

    # Grab indices of frames+points, that are below the cut off
    arg_before = np.argwhere(np.all(d_before < cut_off, axis=2))
    arg_after = np.argwhere(np.all(d_after < cut_off, axis=2))

    # Iterate over all points
    for i in range(0, timeseries.shape[1]):
        # Create temp array
        temp_stack = pos_now[i]

        # Grab all frames in which the current point did _not_
        # travel farther than `cut_off`
        all_before = arg_before[arg_before[:, 1] == i][:, 0]
        all_after = arg_after[arg_after[:, 1] == i][:, 0]

        # Grab the corresponding positions for this points in these frames
        all_pos_before = pos_before[all_before, i]
        all_pos_after = pos_after[all_after, i]

        # If we have any frames for that point before / after
        # stack them into the temp array
        if all_pos_before.size > 0:
            temp_stack = np.vstack([all_pos_before, temp_stack])
        if all_pos_after.size > 0:
            temp_stack = np.vstack([temp_stack, all_pos_after])

        # Calculate the moving window average for the selection of frames
        avg_data_store[frame, i] = temp_stack.mean(axis=0)

【问题讨论】：

这听起来有点像你在尝试重新发明卡尔曼滤波器。
类似：stackoverflow.com/questions/13901997/kalman-2d-filter-in-python
只是为了澄清：问题是 (a) 存在您想要摆脱的测量伪影/异常值，或者 (b) 您的粒子运动中存在真正的跳跃不想通过移动窗口平均值求平均值？
@Paul 这是第二个问题。基本上我在一个盒子里模拟粒子，并且通过周期性边界条件，它们能够跳过盒子障碍。如果粒子在给定的帧窗口中连续跳过屏障，则其平均位置在框的中间。

标签： python numpy

【解决方案1】：

如果您可以分别计算 x 和 y 中的截止距离，您可以使用scipy.ndimage.generic_filter。

import numpy as np
from scipy.ndimage import generic_filter

def _mean(x, cutoff):
    is_too_different = np.abs(x - x[len(x) / 2]) > cutoff
    return np.mean(x[~is_too_different])

def _smooth(x, window_length=5, cutoff=1.):
    return generic_filter(x, _mean, size=window_length, mode='nearest', extra_keywords=dict(cutoff=cutoff))

def smooth(arr, window_length=5, cutoff=1., axis=-1):
    return np.apply_along_axis(_smooth, axis, arr, window_length=window_length, cutoff=cutoff)

# --------------------------------------------------------------------------------

def _simulate_movement_2d(T, fraction_is_jump=0.01):

    # generate random velocities with a few "jumps"
    velocity = np.random.randn(T, 2)
    is_jump = np.random.rand(T) < fraction_is_jump
    jump = 10 * np.random.randn(T, 2)
    jump[~is_jump] = 0.

    # pre-allocate position and momentum arrays
    position = np.zeros((T,2))
    momentum = np.zeros((T,2))

    # initialise the first position
    position[0] = np.random.randn(2)

    # update position using velocity vector:
    # smooth movement by not applying the velocity directly
    # but rather by keeping track of the momentum
    for ii in range(2,T):
        momentum[ii] = 0.9 * momentum[ii-1] + 0.1 * velocity[ii-1]
        position[ii] = position[ii-1] + momentum[ii] + jump[ii]

    # add some measurement noise
    noise = np.random.randn(T,2)
    position += noise
    return position

def demo(nframes=1000, npoints=3):
    # create data
    positions = np.array([_simulate_movement_2d(nframes) for ii in range(npoints)])

    # format to (nframes, npoints, 2)
    position = positions.transpose([1, 0, 2])

    # smooth
    smoothed = smooth(positions, window_length=11, cutoff=5., axis=1)

    # plot
    x, y = positions.T
    xs, ys = smoothed.T

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1,1)
    ax.plot(x, y, 'o')
    ax.plot(xs, ys, 'k-', alpha=0.3, lw=2)
    plt.show()

demo()

【讨论】：