【Question title】: Identify first change in value in a dataframe and ignore subsequent changes
【Posted】: 2023-01-09 23:21:13
【Question】:

I want to use R to identify the first time a condition is met and ignore any subsequent changes. Example data:

df <- data.frame(response = c(1, 1, 1, 0, 1, 0))

Note: the response always starts with 1.

Expected output:

df <- data.frame(response = c(1, 1, 1, 0, 1, 0), Threshold = c("no", "no", "no", "yes", "no", "no"))

【Comments】:

    Tags: r


    【Solution 1】:

    Set everything to "no", then find the first 0 and set that row to "yes":

    df$Threshold <- "no"
    df$Threshold[ which(df$response == 0)[ 1 ] ] <- "yes"
    # df
    #   response Threshold
    # 1        1        no
    # 2        1        no
    # 3        1        no
    # 4        0       yes
    # 5        1        no
    # 6        0        no
    

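    【Comments】:

    • I like this, but wouldn't which(df$response != df$response[1])[1] be more general?
    • @SamR I am assuming they want the first 0 to be "yes", not any change.
    • Thank you so much! Yes, the first response will always be 1.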
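
The more general variant raised in the comments can be sketched as follows. This is only an illustration: the helper name `flag_first_change` is hypothetical and not part of the thread, and it assumes "change" means "differs from the first element".

```r
# Hypothetical helper (name is an assumption, not from the thread):
# mark the first position whose value differs from the first element.
flag_first_change <- function(x) {
  out <- rep("no", length(x))
  idx <- which(x != x[1])[1]          # NA when the value never changes
  if (!is.na(idx)) out[idx] <- "yes"  # only flag when a change exists
  out
}

df <- data.frame(response = c(1, 1, 1, 0, 1, 0))
df$Threshold <- flag_first_change(df$response)
which(df$Threshold == "yes")  # 4
```

Because the response here always starts with 1 and only takes the values 0 and 1, this gives the same result as looking for the first 0.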
    【Solution 2】:

    Using @zx8754's suggestion

    data.table

    df <-
      data.frame(
        response = c(1, 1, 1, 0, 1, 0),
        Threshold = c("no", "no", "no", "yes", "no", "no")
      )
    
    library(data.table)
    library(magrittr)
    setDT(df)[, Threshold_new := "no"] %>% 
      .[response == 0, Threshold_new := fifelse(cumsum(response == 0) == 1, "yes", Threshold_new)] %>% 
      .[]
    #>    response Threshold Threshold_new
    #> 1:        1        no            no
    #> 2:        1        no            no
    #> 3:        1        no            no
    #> 4:        0       yes           yes
    #> 5:        1        no            no
    #> 6:        0        no            no
    

    Created on 2023-01-09 with reprex v2.0.2
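
To see why the `cumsum(response == 0) == 1` condition picks out only the first 0, the same logic can be written in base R. This is a minimal sketch for illustration, not the answer's own code; the intermediate variable names are assumptions.

```r
response <- c(1, 1, 1, 0, 1, 0)

is_zero    <- response == 0        # FALSE FALSE FALSE TRUE FALSE TRUE
zeros_seen <- cumsum(is_zero)      # 0 0 0 1 1 2: running count of zeros

# The first 0 is the only row that is a zero AND has a running count of 1.
first_zero <- is_zero & zeros_seen == 1

Threshold <- ifelse(first_zero, "yes", "no")
which(Threshold == "yes")  # 4
```

In the data.table version the `response == 0` filter in `i` plays the role of `is_zero`, so the cumulative sum is computed only over the rows that are 0.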

    【Comments】:

      【Solution 3】:

      You can use match to get the position of the first 0:

      df$Threshold <- "no"
      df$Threshold[match(0, df$response)] <- "yes"
      
      df
      #  response Threshold
      #1        1        no
      #2        1        no
      #3        1        no
      #4        0       yes
      #5        1        no
      #6        0        no
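
If a single expression is preferred over assign-then-overwrite, one possible sketch uses the `nomatch` argument of `match`, so that a vector without any 0 yields all "no". The function name `first_zero_flag` is hypothetical.

```r
# Hypothetical helper: compare each position with the index of the first 0.
# nomatch = 0L means "no 0 found" maps to index 0, which no position equals.
first_zero_flag <- function(x) {
  ifelse(seq_along(x) == match(0, x, nomatch = 0L), "yes", "no")
}

first_zero_flag(c(1, 1, 1, 0, 1, 0))  # "yes" only at position 4
first_zero_flag(c(1, 1, 1))           # all "no"
```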
      

      Just for fun, a benchmark:

      df <- data.frame(response = c(1, 1, 1, 0, 1, 0), Threshold = "no")
      
      library(data.table) #For Yuriy Saraykin
      library(magrittr)   #For Yuriy Saraykin
      
      bench::mark(check = FALSE, #For Yuriy Saraykin
      zx8754 = {df$Threshold <- "no"
        df$Threshold[ which(df$response == 0)[ 1 ] ] <- "yes"}
      , "Yuriy Saraykin" = {setDT(df)[, Threshold := "no"] %>% 
        .[response == 0, Threshold := fifelse(cumsum(response == 0) == 1, "yes", Threshold)] %>% 
        .[]}
      , GKi = {df$Threshold <- "no"
        df$Threshold[match(0, df$response)] <- "yes"}
      )
      # expression          min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
      #  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
      #1 zx8754          70.19µs  75.84µs    12515.    32.2KB     15.2  5763     7
      #2 Yuriy Saraykin   1.57ms   1.61ms      604.   137.6KB     10.4   289     5
      #3 GKi             68.69µs  72.98µs    13125.    32.2KB     14.7  6230     7
      

      zx8754 and GKi are very close to each other. Yuriy Saraykin needs more time and more memory in this case.

      【Comments】:
