【Question title】: Merge rows into one based on a condition and replace a value in one row with the value from the other row in R
【Posted】: 2021-05-25 22:48:29
【Question】:

I have a dataset in R that looks like this:

A <- c("X", "Y", "Z", "W", "U")
B <- c("apple", "pear", "apple", "pear", "pear")
C <- c("december", "december" ,"June", "june", "march")
D <- c("Winter", "Summer" ,"Winter", "Summer", "Summer")
df <- data.frame(A,B,C,D);df

  A     B        C      D
1 X apple december Winter
2 Y  pear december Summer
3 Z apple     June Winter
4 W  pear     june Summer
5 U  pear    march Summer

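I want to merge rows by column C (combining row 1 with row 2, and row 3 with row 4), but I also want to replace the values in column B, taking column D into account. Basically, when two values in C are the same (e.g. "december"), the value in B where D is "Summer" ("pear") should always be replaced by the value in B where D is "Winter" ("apple"). I would like to end up with a data frame like this: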

  A     B        C             D
1 X apple december Winter,Summer
2 Z apple     june Winter,Summer
3 U  pear    march        Summer

When two rows are merged, I really want to keep both values in column D.
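For reference, the rule described above can also be written in plain base R. This is a minimal sketch, not taken from the answers below, and it rests on two assumptions: C is compared case-insensitively (the sample data contains both "June" and "june"), and both A and B are taken from the "Winter" row whenever a group has one, as the expected output suggests. `merge_group()` is a hypothetical helper, not part of any package.

```r
# Example data from the question (strings kept as character, not factors)
df <- data.frame(
  A = c("X", "Y", "Z", "W", "U"),
  B = c("apple", "pear", "apple", "pear", "pear"),
  C = c("december", "december", "June", "june", "march"),
  D = c("Winter", "Summer", "Winter", "Summer", "Summer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one group of rows that share the same C value
merge_group <- function(g) {
  # take A and B from the "Winter" row if the group has one, else from the first row
  src <- if ("Winter" %in% g$D) which(g$D == "Winter")[1] else 1
  data.frame(
    A = g$A[src],
    B = g$B[src],
    C = g$key[1],
    D = paste(g$D, collapse = ","),  # keep every D value of the group
    stringsAsFactors = FALSE
  )
}

# Case-insensitive grouping key so "June" and "june" fall together
df$key <- tolower(df$C)
# Factor levels in first-appearance order keep the groups in their original order
groups <- split(df, factor(df$key, levels = unique(df$key)))
res <- do.call(rbind, lapply(groups, merge_group))
rownames(res) <- NULL
res
```

The split/apply/combine structure keeps the rule in one small function, so the condition on D can be changed without touching the grouping code.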

Does anyone have an idea?

【Question discussion】:

    Tags: r conditional-statements row


    【Solution 1】:

    A data.table option:

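    library(data.table)

    setDT(df)[
      ,
      c(
        # per C group: if the group has a "Winter" row, overwrite A and B on the
        # "Summer" rows with the values from the "Winter" row
        lapply(
          setNames(.(A, B), c("A", "B")),
          function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
        ),
        .(D = D)
      ),
      C
    ][
      ,
      # collapse each C group to one row; toString() joins the distinct values with ", "
      lapply(.SD, function(x) toString(unique(x))),
      C
    ][,
      # put the columns back in the original A, B, C, D order
      .SD,
      .SDcols = names(df)
    ]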
    

    gives

       A     B        C              D
    1: X apple december Winter, Summer
    2: Z apple     june Winter, Summer
    3: U  pear    march         Summer
    

    Data

    > dput(df)
    structure(list(A = c("X", "Y", "Z", "W", "U"), B = c("apple",
    "pear", "apple", "pear", "pear"), C = c("december", "december",
    "june", "june", "march"), D = c("Winter", "Summer", "Winter",
    "Summer", "Summer")), class = "data.frame", row.names = c(NA,
    -5L))
    

    【Discussion】:

    • Thanks! This works well. I have another question about it: if I have NAs in column C, how can I keep them from being merged together?
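One possible way to keep rows with NA in C from being merged, sketched with data.table: run the steps of the data.table answer only on the rows where C is not NA, then append the NA rows unchanged. This is not from the thread; it assumes the NA rows should simply be kept as they are, and the extra rows "V" and "T" are hypothetical test data. `list(A = A, B = B)` is used in place of `setNames(.(A, B), ...)`, which should be equivalent.

```r
library(data.table)

# Hypothetical data: the question's rows plus two rows (V, T) whose C is NA
dt <- data.table(
  A = c("X", "Y", "Z", "W", "U", "V", "T"),
  B = c("apple", "pear", "apple", "pear", "pear", "apple", "pear"),
  C = c("december", "december", "june", "june", "march", NA, NA),
  D = c("Winter", "Summer", "Winter", "Summer", "Summer", "Winter", "Summer")
)

# Merge only the rows that have a value in C (same steps as the data.table answer)
merged <- dt[!is.na(C)][
  ,
  c(
    lapply(
      list(A = A, B = B),
      function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
    ),
    list(D = D)
  ),
  by = C
][
  ,
  lapply(.SD, function(x) toString(unique(x))),
  by = C
]

# Append the NA rows untouched (columns matched by name), then restore column order
res <- rbind(merged, dt[is.na(C)], use.names = TRUE)
setcolorder(res, c("A", "B", "C", "D"))
res
```

Splitting the NA rows off first avoids relying on how grouping treats NA keys, since `by` would otherwise put all NA values of C into a single group.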
    【Solution 2】:

    An option with dplyr:

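    library(dplyr)
    library(tidyr)
    df %>% 
        # lower-case C so that "June" and "june" end up in the same group
        group_by(C = tolower(C)) %>% 
        # in groups with more than one D value, set A and B to NA on the "Summer" rows...
        mutate(across(c(A, B), ~ if(n_distinct(D) > 1) replace(., D %in% 'Summer', NA) else
             .)) %>%
        # ...then fill those NAs downward from the row above (here the "Winter" row)
        fill(c(A, B)) %>% 
        # one row per group: first A and B, both D values joined with ", "
        summarise(across(c(A, B), first), D = toString(D), .groups = 'drop')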
    # A tibble: 3 x 4
    #  C        A     B     D             
    #* <chr>    <chr> <chr> <chr>         
    #1 december X     apple Winter, Summer
    #2 june     Z     apple Winter, Summer
    #3 march    U     pear  Summer        
     
    

    【Discussion】:

    • Thanks. I tried this code on a data.frame similar to the one I posted, but with characters instead of factors, and I end up with NAs in columns A and B... Why?
    • @VG-29 You could try converting to character class, i.e. df %>% type.convert(as.is = TRUE) %>% group_by(..
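The conversion suggested in the reply can be checked on its own: `type.convert(as.is = TRUE)` turns factor columns into character columns before they reach the pipeline. A minimal base-R sketch, using a hypothetical copy of the question's data deliberately built with factors; whether this alone explains the NAs the commenter saw is not settled in the thread.

```r
# Hypothetical copy of the question's data with every column stored as a factor
df_fct <- data.frame(
  A = c("X", "Y", "Z", "W", "U"),
  B = c("apple", "pear", "apple", "pear", "pear"),
  C = c("december", "december", "june", "june", "march"),
  D = c("Winter", "Summer", "Winter", "Summer", "Summer"),
  stringsAsFactors = TRUE
)
sapply(df_fct, class)

# as.is = TRUE keeps non-numeric columns as character instead of re-creating factors
df_chr <- type.convert(df_fct, as.is = TRUE)
sapply(df_chr, class)
```

Since R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so factor columns usually come from older code, explicit `stringsAsFactors = TRUE`, or an import step.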