【Question title】:Periodic Patterns Identification in R
【Posted】:2021-12-23 06:04:05
【Question description】:

I want to identify temporal patterns in a time series.

dat <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
"v", "w", "x"), `2016/01` = c(1, NA, NA, 1, NA, NA, 1, NA, NA, 
1, NA, 1, 1, 1, NA, 1, NA, NA, 1, NA, NA, 1, NA, NA), `2016/02` = c(NA, 
1, NA, NA, 1, NA, NA, 1, NA, NA, 1, 1, 1, NA, 1, NA, 1, NA, NA, 
1, NA, NA, 1, NA), `2016/03` = c(NA, NA, 1, NA, NA, 1, NA, NA, 
1, 1, NA, 1, 1, 1, NA, NA, NA, 1, NA, NA, 1, NA, NA, 1), `2016/04` = c(NA, 
NA, NA, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, 
1, NA, NA, NA, NA, NA), `2016/05` = c(NA, NA, NA, NA, 1, NA, 
NA, NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, 
NA), `2016/06` = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, 
1, 1, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA), `2016/07` = c(NA, 
NA, NA, 1, NA, NA, 1, NA, NA, 1, NA, 1, 1, 1, NA, 1, NA, NA, 
1, NA, NA, NA, NA, NA), `2016/08` = c(NA, NA, NA, NA, 1, NA, 
NA, 1, NA, NA, 1, 1, 1, NA, 1, NA, 1, NA, NA, 1, NA, NA, NA, 
NA), `2016/09` = c(NA, NA, NA, NA, NA, 1, NA, NA, 1, 1, NA, 1, 
1, 1, NA, NA, NA, 1, NA, NA, 1, NA, NA, NA), `2016/10` = c(NA, 
NA, NA, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, 
1, NA, NA, NA, NA, NA), `2016/11` = c(NA, NA, NA, NA, 1, NA, 
NA, NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, 
NA), `2016/12` = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, 
1, 1, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA), `2017/01` = c(1, 
NA, NA, 1, NA, NA, 1, NA, NA, 1, NA, 1, 1, 1, NA, 1, NA, NA, 
1, NA, NA, 1, NA, NA), `2017/02` = c(NA, 1, NA, NA, 1, NA, NA, 
1, NA, NA, 1, 1, 1, NA, 1, NA, 1, NA, NA, 1, NA, NA, 1, NA), 
    `2017/03` = c(NA, NA, 1, NA, NA, 1, NA, NA, 1, 1, NA, 1, 
    1, 1, NA, NA, NA, 1, NA, NA, 1, NA, NA, 1), `2017/04` = c(NA, 
    NA, NA, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, 
    NA, 1, NA, NA, NA, NA, NA), `2017/05` = c(NA, NA, NA, NA, 
    1, NA, NA, NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA, 1, 
    NA, NA, NA, NA), `2017/06` = c(NA, NA, NA, NA, NA, 1, NA, 
    NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, 
    NA), `2017/07` = c(NA, NA, NA, 1, NA, NA, 1, NA, NA, 1, NA, 
    1, 1, 1, NA, 1, NA, NA, 1, NA, NA, NA, NA, NA), `2017/08` = c(NA, 
    NA, NA, NA, 1, NA, NA, 1, NA, NA, 1, 1, 1, NA, 1, NA, 1, 
    NA, NA, 1, NA, NA, NA, NA), `2017/09` = c(NA, NA, NA, NA, 
    NA, 1, NA, NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 
    1, NA, NA, NA), `2017/10` = c(NA, NA, NA, 1, NA, NA, NA, 
    NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, 
    NA), `2017/11` = c(NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, 
    NA, 1, 1, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA), `2017/12` = c(1, 
    NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, 
    NA, NA, NA, 1, 1, NA, NA), `2018/01` = c(NA, 1, NA, 1, NA, 
    NA, 1, NA, NA, 1, NA, 1, 1, 1, NA, 1, NA, NA, 1, NA, NA, 
    NA, 1, NA), `2018/02` = c(NA, NA, 1, NA, 1, NA, NA, 1, NA, 
    NA, 1, 1, 1, NA, 1, NA, 1, NA, NA, 1, NA, NA, NA, 1), `2018/03` = c(NA, 
    NA, NA, NA, NA, 1, NA, NA, 1, 1, NA, 1, 1, 1, NA, NA, NA, 
    1, NA, NA, 1, NA, NA, NA), `2018/04` = c(NA, NA, NA, 1, NA, 
    NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, 1, NA, NA, 
    NA, NA, NA), `2018/05` = c(NA, NA, NA, NA, 1, NA, NA, NA, 
    NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA
    ), `2018/06` = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, 
    1, 1, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA), `2018/07` = c(NA, 
    NA, NA, 1, NA, NA, 1, NA, NA, 1, NA, 1, 1, 1, NA, 1, NA, 
    NA, 1, NA, NA, NA, NA, NA), `2018/08` = c(NA, NA, NA, NA, 
    1, NA, NA, 1, NA, NA, 1, 1, 1, NA, 1, NA, 1, NA, NA, 1, NA, 
    NA, NA, NA), `2018/09` = c(NA, NA, NA, NA, NA, 1, NA, NA, 
    1, 1, NA, 1, 1, 1, NA, NA, NA, 1, NA, NA, 1, NA, NA, NA), 
    `2018/10` = c(NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, 1, 1, 
    1, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA), `2018/11` = c(NA, 
    NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, 
    NA, NA, 1, NA, NA, NA, NA), `2018/12` = c(NA, NA, NA, NA, 
    NA, 1, NA, NA, NA, NA, 1, 1, 1, NA, 1, NA, NA, NA, NA, NA, 
    1, NA, NA, NA)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

In the data frame above:

a has the same pattern as v, b has the same pattern as w, and c has the same pattern as x.

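In the data frame above, individuals a, b, c, v, w and x share the same frequency: once a year.

There are other cases as well, such as bimonthly, quarterly and semiannual.

My goal is to identify all of these cases and classify every individual by its temporal pattern.

I think the package arulesSequences might be useful.

Can you help me?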

【Question discussion】:

    Tags: r time-series classification arules


    【Solution 1】:

    I think a good start would be a full hierarchical clustering:

    library(gplots)
    library(dendsort)
    
    # data preparation
    dm <- matrix( as.numeric(!is.na(dat[,-1])), nrow=nrow(dat[,-1]) )
    rownames(dm) <- dat$ID
    colnames(dm) <- colnames(dat[,-1])
    
    heatmap.2( dm, trace="none", hclustfun=function(x){
      dendsort(hclust(x, method="single"), type="average")
      }, col=c("grey90","darkblue") )
    

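    All time-related links across the columns are clearly visible. I added dendsort to place similar clusters next to each other, which makes the ID-related patterns more apparent.

    In addition, plotting only the row clusters gives you a better view of the temporal patterns.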

    heatmap.2( dm, trace="none", Colv=NA, dendrogram="row", 
      hclustfun=function(x){ dendsort(hclust(x, method="single"), 
      type="average") }, col=c("grey90","darkblue") )
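
To read the period of each row as a number rather than off the heatmap, one option is to look at the gaps between consecutive months in which an ID occurs. This is a minimal sketch and a different technique from the clustering above; `dominant_gap` is a hypothetical helper and `toy` is made-up data standing in for `dm`.

```r
# Toy 0/1 matrix standing in for dm: rows are IDs, columns are 24 consecutive months
months <- 1:24
toy <- rbind(
  yearly     = as.numeric(months %% 12 == 1),  # months 1, 13
  semiannual = as.numeric(months %% 6 == 1),   # months 1, 7, 13, 19
  quarterly  = as.numeric(months %% 3 == 1),   # months 1, 4, 7, ...
  bimonthly  = as.numeric(months %% 2 == 1),   # months 1, 3, 5, ...
  monthly    = rep(1, 24)
)

# Hypothetical helper: most frequent gap (in columns) between consecutive occurrences
dominant_gap <- function(x) {
  gaps <- diff(which(x == 1))
  if (length(gaps) == 0) return(NA_integer_)   # fewer than two occurrences
  as.integer(names(which.max(table(gaps))))
}

gap <- apply(toy, 1, dominant_gap)

# Translate the gap length into a frequency label; other gaps become NA
freq <- factor(gap, levels = c(1, 2, 3, 6, 12),
               labels = c("monthly", "bimonthly", "quarterly",
                          "semiannual", "yearly"))
data.frame(ID = rownames(toy), gap = gap, frequency = freq)
```

On the question's data the equivalent call would be `apply(dm, 1, dominant_gap)`. Rows whose occurrences drift by a month (ID a moves from January to December in 2017) can produce gaps outside the listed levels and come out as NA, so some tolerance may be needed.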
    

    Adding a summary and k-means for comparison:

    Hierarchical clustering

    dis <- dist(dm, method="euclidean")
    hc <- hclust(dis, method="single")
    # choose the height at which to cut the tree
    # a lower h gives finer-grained clusters, i.e. fewer members per cluster
    cutree(hc, h=4)
    a b c d e f g h i j k l m n o p q r s t u v w x 
    1 2 1 3 2 4 1 2 1 5 6 7 7 5 6 1 2 1 3 2 4 1 2 1
    # a higher h gives larger clusters, i.e. more members per cluster
    cutree(hc, h=5)
    a b c d e f g h i j k l m n o p q r s t u v w x 
    1 2 1 1 2 1 1 2 1 1 2 3 3 1 2 1 2 1 1 2 1 1 2 1
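
To see which IDs end up together, the vector returned by `cutree` can be split by cluster label. A minimal sketch on made-up data (the matrix `toy` is an assumption, not the question's `dm`), using the same distance and linkage as above:

```r
# Toy 0/1 matrix: a/v share one pattern, b/w share another
toy <- rbind(
  a = c(1, 0, 0, 1, 0, 0),
  v = c(1, 0, 0, 1, 0, 0),
  b = c(0, 1, 0, 0, 1, 0),
  w = c(0, 1, 0, 0, 1, 0)
)

hc <- hclust(dist(toy, method = "euclidean"), method = "single")

# Identical rows have distance 0, so they are already merged below h = 1;
# the two patterns are at distance 2 and stay separate
cl <- cutree(hc, h = 1)

# One list element per cluster, holding the IDs assigned to it
members <- split(names(cl), cl)
members
```

With `dm` in place of `toy`, `members` gives a direct listing of the groups for whichever `h` is chosen.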
    

    k-means

    # pre-defining k=6, has to be rerun to change k
    km <- kmeans(dm, 6, algorithm="Hartigan-Wong")
    km$cluster
    a b c d e f g h i j k l m n o p q r s t u v w x 
    2 5 2 6 5 4 2 5 4 3 1 1 1 3 1 2 5 4 6 5 4 2 5 2
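
Because `kmeans` starts from randomly chosen rows, the labels can differ between runs; fixing the seed and using several starts makes the result repeatable. For a much larger table in which many rows share an identical pattern, one option is to cluster only the distinct patterns and map the labels back. This is a sketch under that assumption, on made-up data (`patterns`, `big` and the seed are illustrative):

```r
set.seed(42)  # arbitrary seed, only for repeatability

# Six distinct 0/1 patterns over 12 months, in three loose families
patterns <- rbind(
  c(1,0,0,1,0,0,1,0,0,1,0,0), c(1,0,0,1,0,0,1,0,0,0,0,0),
  c(0,1,0,1,0,1,0,1,0,1,0,1), c(0,1,0,1,0,1,0,1,0,1,0,0),
  rep(1, 12),                 c(1,1,1,1,1,1,1,1,1,1,1,0)
)
big <- patterns[rep(1:6, each = 200), ]   # 1200 rows, heavy duplication

# Collapse each row to a string key and keep one representative per pattern
key  <- apply(big, 1, paste, collapse = "")
uniq <- big[!duplicated(key), , drop = FALSE]

# Cluster the distinct patterns only; nstart repeats the random initialisation
km <- kmeans(uniq, centers = 3, nstart = 10)

# Map the cluster label of each distinct pattern back to every original row
cluster <- km$cluster[match(key, key[!duplicated(key)])]
table(cluster)
```

Whether this helps in practice depends on how many distinct patterns the full data actually contains. Note that `centers` has to be smaller than the number of distinct rows.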
    

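    【Discussion】:

    • Thanks, Andre. But I have about 800,000 observations. Wouldn't a k-means analysis be more efficient and easier to interpret?
    • As for the final result, I would say hierarchical clustering gives you the full picture without any pre-selection. The branches give a good estimate of the structure of the data. As the dataset grows, k-means may be more efficient because the number of clusters is fixed in advance. I'll see whether I can add k-means for comparison!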