时间序列分割答案

【问题标题】：Time series segmentation时间序列分割
【发布时间】：2012-04-02 13:07:27
【问题描述】：

我有时间序列数组，每个数组平均有大约 1000 个值。我需要独立识别每个数组中的时间序列段。

每当每个项目之间的经过时间超过它时，我目前正在使用该方法来计算数组和分段项目的平均值。我找不到太多关于如何实现这一点的标准信息。我相信还有更合适的方法。

这是我目前正在使用的代码。

def time_cluster(input)
    input.sort!
    differences = (input.size-1).times.to_a.map {|i| input[i+1] - input[i] }
    mean = differences.mean

    clusters = []
    j = 0

    input.each_index do |i|
      j += 1 if i > 0 and differences[i-1] > mean
      (clusters[j] ||= []) << input[i]
    end

    return clusters
  end

此代码中的几个示例

time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])

输出

1  2  3  4  7  9, sparsity 1.3
250  254  258  270  292,  sparsity 8.4
340  345  349  371  375  382  405  407  409, sparsity 7
520  527, sparsity 3

另一个数组

time_cluster([1, 2, 3, 4 , 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])

输出

1  2  3  4  5  6  7  8  9  10, sparsity 0.9
1000  1020  1040  1060  1080, sparsity 16
1200

【问题讨论】：

那么您的确切问题是什么？您是否看过文献中的方法，例如scholar.google.com/scholar?q=time+series+segmentation
查看对这个问题的回复：stackoverflow.com/questions/8940049/…

标签： data-mining time-series

【解决方案1】：

使用 K 均值。 http://ai4r.rubyforge.org/machineLearning.html

gem install ai4r

奇异值分解也可能会让您感兴趣。 http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/

如果在 Ruby 中做不到，这里有一个很好的 Python 示例。

Unsupervised clustering with unknown number of clusters

【讨论】：