【Question title】:Filling the missing values by selecting the values in the previous row using R [duplicate]
【Posted】:2021-08-18 09:18:13
【Question description】:

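In the table below, the third column holds one of three character values (A, B, or C): Table

I would like to fill each empty cell with the value from the previous non-empty cell, except when that value is C (cells following a C should stay empty): Table

The data frame is: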

df <- data.frame(
  'ID' = c(1,   2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
  ,'V1' = c("desc1",    "desc2",    "desc3",    "desc4",    "desc5",    "desc6",    "desc7",    "desc8",    "desc9",    "desc10",   "desc11",   "desc12",   "desc13",   "desc14",   "desc15",   "desc16",   "desc17",   "desc18",   "desc19",   "desc20",   "desc21",   "desc22",   "desc23")
  ,'V2' =c("A", "", "", "B",    "", "", "", "C",    "", "A",    "", "", "", "B",    "", "", "C",    "", "A",    "B",    "", "C",    ""))

【Comments】:

    Tags: r


    【Solution 1】:
    library(data.table)
    library(magrittr)
    df <- data.frame(
      'ID' = c(1,   2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
      ,'V1' = c("desc1",    "desc2",    "desc3",    "desc4",    "desc5",    "desc6",    "desc7",    "desc8",    "desc9",    "desc10",   "desc11",   "desc12",   "desc13",   "desc14",   "desc15",   "desc16",   "desc17",   "desc18",   "desc19",   "desc20",   "desc21",   "desc22",   "desc23")
      ,'V2' =c("A", "", "", "B",    "", "", "", "C",    "", "A",    "", "", "", "B",    "", "", "C",    "", "A",    "B",    "", "C",    ""))
    
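    # Treat empty strings as missing so they can be forward-filled
    df[df == ""] <- NA
    # Forward-fill V2; na.rm = FALSE keeps the length intact if V2 starts with NA
    setDT(df)[, V2 := zoo::na.locf(V2, na.rm = FALSE)] %>% 
      # id = position of each row within its run of identical V2 values
      .[, id := rowid(V2), by = rleid(V2)] %>% 
      # Keep only the first C of each run; the filled-in C values become NA again
      .[, V2 := ifelse(V2 == "C" & id > 1, NA, V2)] %>% 
      .[, id := NULL] %>% 
      .[]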
    #>     ID     V1   V2
    #>  1:  1  desc1    A
    #>  2:  2  desc2    A
    #>  3:  3  desc3    A
    #>  4:  4  desc4    B
    #>  5:  5  desc5    B
    #>  6:  6  desc6    B
    #>  7:  7  desc7    B
    #>  8:  8  desc8    C
    #>  9:  9  desc9 <NA>
    #> 10: 10 desc10    A
    #> 11: 11 desc11    A
    #> 12: 12 desc12    A
    #> 13: 13 desc13    A
    #> 14: 14 desc14    B
    #> 15: 15 desc15    B
    #> 16: 16 desc16    B
    #> 17: 17 desc17    C
    #> 18: 18 desc18 <NA>
    #> 19: 19 desc19    A
    #> 20: 20 desc20    B
    #> 21: 21 desc21    B
    #> 22: 22 desc22    C
    #> 23: 23 desc23 <NA>
    #>     ID     V1   V2
    

    Created on 2021-08-18 by the reprex package (v2.0.1)
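
The logic of the pipeline above (forward-fill, then blank out any C that was only filled in) can also be sketched in base R without extra packages. This is a minimal illustration, not part of the original answer: `fill_except` and its `skip` argument are hypothetical names introduced here, and the input is assumed to be a character vector that uses "" for empty cells.

```r
# Hypothetical helper (not from the original answers): forward-fill missing
# entries, but never carry the value given in `skip` forward.
fill_except <- function(x, skip = "C") {
  x[!is.na(x) & x == ""] <- NA        # treat empty strings as missing
  last <- NA_character_               # last value actually present in the data
  for (i in seq_along(x)) {
    if (!is.na(x[i])) {
      last <- x[i]                    # remember the observed value
    } else if (!is.na(last) && last != skip) {
      x[i] <- last                    # fill, unless the last value was `skip`
    }
  }
  x
}

v2 <- c("A", "", "", "B", "", "C", "", "A")
fill_except(v2)
#> [1] "A" "A" "A" "B" "B" "C" NA  "A"
```

Applied to the question's data this would be `df$V2 <- fill_except(df$V2)`. An explicit loop is used for readability rather than speed.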

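    【Comments】:

    • OK, thanks. But why do I end up with "type" and "res" at the end? Could you remove them?
    • Corrected for your example data.
    • Thank you very much, Yuriy Saraykin!
    【Solution 2】:

    We can use the fill function from the tidyr package combined with an ifelse condition: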

    library(dplyr)
    library(tidyr)
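    df %>%  
      mutate(V2 = na_if(V2, ""),   # fill() only fills NA, so convert "" to NA first
             V3 = V2) %>% 
      fill(V3) %>%                 # carry the last non-missing value downward
      mutate(V3 = ifelse(V3 == "C", V2, V3))  # keep C only where it was originally present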
             
    

    Output:

          ID V1     V2    V3   
       <dbl> <chr>  <chr> <chr>
     1     1 desc1  A     A    
     2     2 desc2  NA    A    
     3     3 desc3  NA    A    
     4     4 desc4  B     B    
     5     5 desc5  NA    B    
     6     6 desc6  NA    B    
     7     7 desc7  NA    B    
     8     8 desc8  C     C    
     9     9 desc9  NA    NA   
    10    10 desc10 A     A    
    # … with 13 more rows
    

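    【Comments】:

    • Why don't I get the same result as you? For me, V3 is exactly the same as V2!!
    • A step is missing: fill() needs NA rather than "" in order to work. For example, the line after mutate(V3 = V2) %>% needs to be mutate(V3 = ifelse(V3 == "", NA, V3)) %>%
    • Did you load all the packages? library(tidyr).
    • I think I have figured out the problem. Your table contains NA.
    • Thank you very much, TarJae. I am using your solution as well. Thanks!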
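
The point raised in the comments, that fill() skips empty strings and only fills NA, can be checked on a small example. This is a sketch on toy data, not taken from the thread: the `toy`, `raw`, and `fixed` names are made up for illustration, and `na_if()` from dplyr is used here as one possible way to convert "" to NA.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): empty strings stand in for missing cells
toy <- data.frame(V2 = c("A", "", "C", ""))

# Without conversion, fill() finds no NA, so V3 stays identical to V2
raw <- toy %>%
  mutate(V3 = V2) %>%
  fill(V3)

# After converting "" to NA, fill() carries the previous value downward
fixed <- toy %>%
  mutate(V3 = na_if(V2, "")) %>%
  fill(V3)

raw$V3
#> [1] "A" ""  "C" ""
fixed$V3
#> [1] "A" "A" "C" "C"
```

Note that this toy example only shows the fill step; the ifelse condition from the answer is still needed afterwards to stop C from being carried forward.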