【Question title】: How to count persistence in days for characters
【Posted】: 2018-10-02 15:18:33
【Question description】:

Suppose we have this data:

type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
mydataframe <- data.frame(type, dates)

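I have seen in other posts that rle might do the job, but I would like to get a data frame that holds, for each type, the average persistence in days. Something like: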

> print(persistance)
  type1 type2 type3
1     2   1.5   2.5

Does anyone know how to do this? Thanks!

【Comments on the question】:

  • runs <- rle(mydataframe$type); aggregate(lengths ~ values, unclass(runs), mean)
  • Thanks for your help!
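
To make the one-liner in the comment above easier to follow, here is a minimal base R sketch of what rle returns for this data and how aggregate averages it. The names runs_df and avg_long are illustrative only. It assumes type is a character vector; in R versions before 4.0, data.frame() converts strings to factors by default and rle() rejects factors, which is presumably why the answers below wrap the column in as.character().

```r
# Same example data as in the question
type <- paste("type", c(1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3, 3, 1, 1))

# rle() collapses consecutive repeats into (length, value) pairs
runs <- rle(type)
runs$lengths  # length of each run, in rows (one row per day here)
runs$values   # the type that each run belongs to

# Turn the rle object into a plain data frame, then average per type
runs_df <- as.data.frame(unclass(runs), stringsAsFactors = FALSE)
avg_long <- aggregate(lengths ~ values, data = runs_df, FUN = mean)
print(avg_long)
```

Each row of avg_long is one type, and the lengths column holds its mean run length.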

Tags: r count duration run-length-encoding


【Solution 1】:

Another (grouping-based) solution:

type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
mydataframe <- data.frame(type, dates)

library(dplyr)

mydataframe %>%
  count(type, group = cumsum(type != lag(type, default = first(type)))) %>%
  group_by(type) %>%
  summarise(Avg = mean(n))

# # A tibble: 3 x 2
#     type     Avg
#    <fct>  <dbl>
# 1 type 1   2  
# 2 type 2   1.5
# 3 type 3   2.5
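
The grouping key in the pipeline above is a run id: a counter that goes up by one every time type differs from the previous row. The following base R sketch reproduces that idea without dplyr so the intermediate values can be inspected; changed, run_id, run_len, run_type and avg are illustrative names, not part of the original answer.

```r
# Same example data as in the question
type <- paste("type", c(1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3, 3, 1, 1))

# TRUE wherever the type differs from the previous row (first row: FALSE)
changed <- c(FALSE, type[-1] != type[-length(type)])

# Cumulative sum of the change flags gives one id per consecutive run
run_id <- cumsum(changed)

# Rows per run, and the type each run belongs to (first row of each run)
run_len  <- as.vector(table(run_id))
run_type <- type[!duplicated(run_id)]

# Mean run length per type
avg <- tapply(run_len, run_type, mean)
print(avg)
```

Counting rows per (type, run id) pair and then averaging per type is what the count(), group_by() and summarise() steps do in the dplyr version.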

【Comments】:

    【Solution 2】:

    data.table

    library(data.table)
    runs <- setDT(rle(as.character(mydataframe$type)))
    runs[, mean(lengths), values]
    
    #    values  V1
    # 1: type 1 2.0
    # 2: type 2 1.5
    # 3: type 3 2.5
    

    tidyverse & magrittr

    library(tidyverse)
    library(magrittr)
    
    rle(as.character(mydataframe$type)) %$% 
      tibble(lengths, values) %>% 
      group_by(values) %>% 
      summarise_all(mean)
    
    # # A tibble: 3 x 2
    #   values lengths
    #   <chr>    <dbl>
    # 1 type 1    2.00
    # 2 type 2    1.50
    # 3 type 3    2.50
    

    dplyr

    library(dplyr)
    rle(as.character(mydataframe$type)) %>% 
      unclass %>%
      as.data.frame %>% 
      group_by(values) %>% 
      summarise_all(mean)
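
The variants above all return one row per type, whereas the question asked for a single row with one column per type. A possible way to get that layout in base R is sketched below; stripping the space from the names (so that "type 1" becomes type1) is an assumption made to match the column names shown in the question, and avg is an illustrative name.

```r
# Same example data as in the question
type <- paste("type", c(1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3, 3, 1, 1))

runs <- rle(type)

# Named numeric vector: mean run length per type
avg <- vapply(split(runs$lengths, runs$values), mean, numeric(1))

# Assumption: drop the space so the names match the question's columns
names(avg) <- gsub(" ", "", names(avg), fixed = TRUE)

# One-row data frame with one column per type
persistance <- as.data.frame(as.list(avg))
print(persistance)
```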
    

    【Comments】:

    • Wow... thank you very much for the extensive answer, this is really useful!
    【Solution 3】:

    You can do this with the base R functions rle and aggregate.

    # set up the data as in your question
    type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
    dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
    mydataframe <- data.frame(type, dates)
    
    # calculate the length of the run using rle 
    runs <- rle(as.character(mydataframe$type))
    # calculate the average length of the run
    aggregate(runs[[1]], by = runs[2], FUN = mean)
    

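    Note that this assumes the dates in your dates column really are consecutive. If your dates have gaps and you want a gap to start a separate run, you will have to change the approach slightly so that it actually uses the dates column.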
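
One way the gap handling described above could look is sketched here in base R. It is not from the original answer: the removed row is a made-up gap for illustration, d, new_run, run_id, run_days, run_type and avg are hypothetical names, and the sketch assumes the rows are sorted by date with at most one row per day.

```r
# Same example data as in the question
type  <- paste("type", c(1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3, 3, 1, 1))
dates <- seq(as.Date("2000-01-01"), by = "days", length.out = length(type))

# Hypothetical gap: drop 2000-01-02 so the first "type 1" run is interrupted
d <- data.frame(type, dates, stringsAsFactors = FALSE)[-2, ]

# A new run starts on the first row, when the type changes,
# or when the date is not exactly one day after the previous row
new_run <- c(TRUE,
             d$type[-1] != d$type[-nrow(d)] |
               as.numeric(diff(d$dates)) != 1)
run_id <- cumsum(new_run)

# With one row per day and no gaps inside a run, rows per run equals days per run
run_days <- as.vector(table(run_id))
run_type <- d$type[new_run]

# Mean persistence in days per type
avg <- tapply(run_days, run_type, mean)
print(avg)
```

With the gap in place, the first "type 1" run is split into two one-day runs, so the average for "type 1" drops from 2 to 1.25 while the other types are unchanged.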

    【Comments】:
