【Question title】: How to merge/copy different rows into one conditionally [R]
【Posted】: 2021-06-22 02:56:57
【Question】:

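I have a large data frame with names and a "classification" variable called sequence. sequence states each row's position relative to the other rows. It has two values: first and additional. The problem is that these values are not evenly distributed, i.e. there is not exactly one additional for each first, and every letters value is unique. The data frame looks like this (simplified):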

letters <- sample(LETTERS, 20)
sequence <- c("first","additional","first","first","first","first","first","additional","additional","additional","first","first","additional","first","additional","additional","first","additional","first","first")
df <- data.frame(sequence, letters)

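Now, what I want to do is paste each additional value in letters into its corresponding first value in letters. For example, the second (row) value in the letters column would be pasted into the first one, since it is its corresponding additional. Likewise, the eighth, ninth and tenth values in letters should be pasted inside (next to) the seventh value of letters (e.g., first additional additional additional).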

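I have tried the following, which has the obvious limitation that it only looks at the single immediately adjacent row (lag() returns the value from the row directly before):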

library(dplyr)    
df <- df %>% mutate(letters_ok = if_else(sequence == "additional",
                                            paste(letters, lag(letters), sep = "; "), letters))

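To highlight my problem: how can I lag conditionally on the values in sequence, so that I can paste the values in letters depending on whether they are classified as first or additional?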

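Since each letters value is unique and tied to a specific sequence value, I have not used group_by. Every other solution I found is beyond my current knowledge of string/character wrangling, so I would really appreciate any help.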

【Comments】:

    Tags: r dataframe if-statement conditional-statements lag


    【Solution 1】:

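    Here is a data.table approach. I modified your sample data slightly, since letters is not a very convenient column name (it shadows base R's built-in letters constant). I also added set.seed(123) for reproducibility.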

    Sample data

    set.seed(123)
    letter <- sample(LETTERS, 20)
    sequence <- c("first","additional","first","first","first","first","first","additional","additional","additional","first","first","additional","first","additional","additional","first","additional","first","first")
    df <- data.frame(sequence, letter)
    
    #      sequence letter
    # 1       first      O
    # 2  additional      S
    # 3       first      N
    # 4       first      C
    # 5       first      J
    # 6       first      R
    # 7       first      K
    # 8  additional      E
    # 9  additional      X
    # 10 additional      Y
    # 11      first      W
    # 12      first      T
    # 13 additional      I
    # 14      first      L
    # 15 additional      U
    # 16 additional      M
    # 17      first      P
    # 18 additional      H
    # 19      first      B
    # 20      first      G
    

    Code

    library( data.table )
    # convert df to a data.table (by reference)
    setDT(df)
    # add a row-number id column, used as the join key
    df[, id := .I ]
    # rolling join: match each "additional" row to the nearest preceding "first" row
    temp <- df[ sequence == "first", ][ df[ sequence == "additional", ], 
                                        .( x.letter, i.letter, i.id, x.id), 
                                        on = .(id), 
                                        roll = Inf ]
    # collapse the additional letters of each "first" row into one string (column V1)
    temp <- temp[, paste0( `i.letter`, collapse = ";" ), by = .(x.id) ]
    # update join onto the "first" rows, then drop the helper id column
    df[sequence == "first", ][ temp, letter := paste( letter, i.V1, sep = ";"), on = .(id = `x.id`) ][, id := NULL]
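
The rolling join is the step that does the conditional "lag", so a stripped-down illustration may help. This is only a sketch on hypothetical toy data (the tables `firsts` and `extras` and the output column names are made up for illustration, not part of the answer above): with `roll = Inf`, each row of the inner table is matched to the last row of the outer table whose `id` is less than or equal to its own.

```r
library(data.table)

# Hypothetical toy data: ids 1 and 4 play the role of "first" rows,
# ids 2, 3 and 5 play the role of "additional" rows.
firsts <- data.table(id = c(1L, 4L), letter = c("A", "D"))
extras <- data.table(id = c(2L, 3L, 5L), letter = c("B", "C", "E"))

# For every row of `extras`, roll = Inf carries the last `firsts` row with
# id <= extras id forward. The x. prefix refers to columns of `firsts`,
# the i. prefix to columns of `extras`.
matched <- firsts[extras,
                  .(first_id = x.id, first_letter = x.letter, extra_letter = i.letter),
                  on = .(id),
                  roll = Inf]

print(matched)
```

Here the rows with id 2 and 3 attach to the row with id 1, and the row with id 5 attaches to the row with id 4, which is the same pairing the code above relies on.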
    

    Output

    #    sequence  letter
    # 1:    first     O;S
    # 2:    first       N
    # 3:    first       C
    # 4:    first       J
    # 5:    first       R
    # 6:    first K;E;X;Y
    # 7:    first       W
    # 8:    first     T;I
    # 9:    first   L;U;M
    #10:    first     P;H
    #11:    first       B
    #12:    first       G
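
Since the question started from dplyr, the same result can probably also be reached without a join. The following is a minimal sketch, not the method used above: it assumes the data starts with a "first" row, builds a running group counter with `cumsum(sequence == "first")`, and collapses each group. The helper column `grp` and the result name `merged` are made up for illustration, and the letters are typed in from the sample data above so the example does not depend on the random seed.

```r
library(dplyr)

# Sample data, with the letters from the example above written out explicitly
letter <- c("O", "S", "N", "C", "J", "R", "K", "E", "X", "Y",
            "W", "T", "I", "L", "U", "M", "P", "H", "B", "G")
sequence <- c("first", "additional", "first", "first", "first", "first", "first",
              "additional", "additional", "additional", "first", "first",
              "additional", "first", "additional", "additional", "first",
              "additional", "first", "first")
df <- data.frame(sequence, letter)

merged <- df %>%
  # every "first" row starts a new group; following "additional" rows inherit it
  mutate(grp = cumsum(sequence == "first")) %>%
  group_by(grp) %>%
  # keep one row per group and paste its letters together in row order
  summarise(sequence = first(sequence),
            letter = paste(letter, collapse = ";"),
            .groups = "drop") %>%
  select(-grp)

print(merged)
```

If the data could begin with an "additional" row, that row would land in a group numbered 0 with no "first" to attach to, so it would need separate handling.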
    

    【Discussion】:
