【Question title】: Expand a data.table, adding new rows and replacing NA values by group
【Posted】: 2020-06-15 12:50:03
【Question】:

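I am trying to expand a data.table based on a column (Month in the example below) and fill the resulting empty values by group (Group). Take dt as an example: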

library(data.table)

set.seed(0)
dt <- data.table(ID = 1:10, Month = sample(1:10, replace = FALSE), Group = c("A", "B", "C", "A", "B", "C", "A", "B", "A", "A"))
dt[1:4, `:=`(Income = rnorm(4), Tax = rnorm(4), Birth = sample(seq(as.POSIXct('2000/01/01'), as.POSIXct('2002/05/01'), by = "day"), 4))]

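I want to expand the table so that every Group has 10 rows, with the Month column running from 1 to 10. The remaining columns (Income, Tax, Birth) should be filled in from the existing rows: each NA should take the value from the nearest Month. So for Group A the data.table should have 10 rows, as shown below (that is, the final data.table should have 10 rows per group):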

dt_desired<-data.table(
  ID=rep(1:10),
  Group=rep("A",10),
  Income=c(rep(dt[Group=='A'&Month==1]$Income,8),rep(dt[Group=='A'&Month==9]$Income,2)),
  Tax=c(rep(dt[Group=='A'&Month==1]$Tax,8),rep(dt[Group=='A'&Month==9]$Tax,2)),
  Birth=c(rep(dt[Group=='A'&Month==1]$Birth,8),rep(dt[Group=='A'&Month==9]$Birth,2))
  )

【Question comments】:

    Tags: r datatable


    【Solution 1】:

    As far as I know, data.table::nafill() cannot handle non-numeric columns (yet?), so I had to use zoo::na.locf() instead.
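If adding zoo as a dependency is undesirable, the same kind of fill can be sketched in base R. The helpers below, `locf_any` and `nocb_any`, are hypothetical names introduced here for illustration; they are not part of data.table or zoo. The sketch relies on plain indexing, so it should work for character, Date, and POSIXct vectors as well as numeric ones.

```r
# Hypothetical helper (not from data.table or zoo):
# last observation carried forward for any atomic vector type.
locf_any <- function(x) {
  # Position of the most recent non-NA element at each index (0 if none yet)
  idx <- cummax(seq_along(x) * (!is.na(x)))
  # Leading NAs have no earlier observation, so they stay NA
  idx[idx == 0L] <- NA
  x[idx]
}

# Hypothetical helper: next observation carried backward,
# implemented by running locf_any on the reversed vector.
nocb_any <- function(x) rev(locf_any(rev(x)))

x <- c(NA, "a", NA, "b", NA)
locf_any(x)  # NA  "a" "a" "b" "b"
nocb_any(x)  # "a" "a" "b" "b" NA
```

Inside a data.table call, such a helper could be passed to `lapply(.SD, ...)` with `by = Group` in the same way as `zoo::na.locf`.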

    library(data.table)

    # First build the full Group x Month grid with CJ(), then pull in the known values with an update join
    ans <- CJ(Group = dt$Group, Month = dt$Month, unique = TRUE)[
      dt,
      `:=`(Income = i.Income, Tax = i.Tax, Birth = i.Birth),
      on = .(Group, Month)
    ]
    # Columns whose NAs should be filled (everything except Group and Month)
    cols <- names(ans)[-(1:2)]
    # LOCF: carry the last observation forward within each group
    ans[, (cols) := lapply(.SD, zoo::na.locf, na.rm = FALSE), by = Group, .SDcols = cols]
    # NOCB: carry the next observation backward to fill any remaining leading NAs
    ans[, (cols) := lapply(.SD, zoo::na.locf, na.rm = FALSE, fromLast = TRUE), by = Group, .SDcols = cols][]
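As a possible alternative to filling after the join, the expansion and the fill can be combined in a single rolling join. This is only a sketch on toy data, not the answer's method: the table `obs`, its `Label` column, and the values are made up for illustration. It uses data.table's `roll = TRUE` (roll the last observation forward) together with `rollends = c(TRUE, TRUE)` so that months before a group's first observation also receive a value.

```r
library(data.table)

# Toy data (assumed for illustration): only rows that actually carry values
obs <- data.table(
  Group  = c("A", "A", "B"),
  Month  = c(1L, 9L, 4L),
  Income = c(10, 90, 40),
  Label  = c("a1", "a9", "b4")  # non-numeric column, filled the same way
)

# Every Group x Month combination that should exist in the result
grid <- CJ(Group = c("A", "B"), Month = 1:10)

# Rolling join on Month within Group:
# roll = TRUE carries the last observation forward,
# rollends = c(TRUE, TRUE) also rolls the first value backward
# and the last value forward past the observed range.
filled <- obs[grid, on = .(Group, Month), roll = TRUE, rollends = c(TRUE, TRUE)]

filled[Group == "A"]
```

Because the join handles every column of `obs` at once, no per-column fill function is needed; the trade-off is that only rows with observed values should be kept in `obs` before joining.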
    

    【Comments】:
