【Question title】: Sum column using lapply data.table and keep other columns
【Posted】: 2018-03-07 15:26:36
【Question】:

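I have a large dataset and I want to sum one column using lapply. My problem is that the other columns disappear from the result, and I want to keep them.

Here is an example :)

Example:

I have this dataset: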

   X  Y   Z     date     columnSum
1: A  a1  z1   2018.01         4
2: A  a1  z1   2018.01         4
3: B  a2  z3   2018.02        10
4: B  a2  z5   2018.02        30
5: B  a2  z5   2018.02        10
6: C  a2  z3   2018.02        10
7: D  a3  z4   2018.03         0
8: D  a3  z6   2018.03         0

I want to sum "columnSum" grouped by "X", "Y" and "date", and I want to keep column "Z".

I tried:

DT[, lapply(.SD,sum,na.rm=TRUE), .SDcols="columnSum", by=list(X,Y,date)]

Currently I get this result:

   X  Y   date    columnSum
1: A  a1  2018.01         8
2: B  a2  2018.02        50
3: C  a2  2018.02        10
4: D  a3  2018.03         0

I want this result:

   X  Y   Z     date     columnSum
1: A  a1  z1   2018.01         8
2: B  a2  z3   2018.02        50
3: B  a2  z5   2018.02        50
4: C  a2  z3   2018.02        10
5: D  a3  z4   2018.03         0
6: D  a3  z6   2018.03         0

【Comments】:

    Tags: r sum data.table lapply


    【Solution 1】:
    df <- read.table(text = "X  Y   Z     date     columnSum
                     A  a1  z1   2018.01         4
                     A  a1  z1   2018.01         4
                     B  a2  z3   2018.02        10
                     B  a2  z5   2018.02        30
                     B  a2  z5   2018.02        10
                     C  a2  z3   2018.02        10
                     D  a3  z4   2018.03         0
                     D  a3  z6   2018.03         0",
                     header = TRUE, stringsAsFactors = FALSE)
    library(data.table)
    
    setDT(df)
    df[, columnSum := sum(columnSum), by = c("X", "Y", "date")] # summing columnSum by X, Y, date and retaining column Z
    df <- unique(df) # filtering duplicate records
    
       #    X  Y  Z    date columnSum
       # 1: A a1 z1 2018.01         8
       # 2: B a2 z3 2018.02        50
       # 3: B a2 z5 2018.02        50
       # 4: C a2 z3 2018.02        10
       # 5: D a3 z4 2018.03         0
       # 6: D a3 z6 2018.03         0
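
Since the question asks specifically about `lapply`, the same idea can be sketched with `lapply(.SD, ...)` on the right-hand side of `:=`. This is a minimal sketch, not the answerer's code: the names `sum_cols`, `group_cols` and `res` are made up for illustration, and `copy()` is used on the assumption that the original table should stay unchanged.

```r
library(data.table)

# Same data as in the question, with date kept as character
DT <- data.table(
  X = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Y = c("a1", "a1", "a2", "a2", "a2", "a2", "a3", "a3"),
  Z = c("z1", "z1", "z3", "z5", "z5", "z3", "z4", "z6"),
  date = c("2018.01", "2018.01", "2018.02", "2018.02",
           "2018.02", "2018.02", "2018.03", "2018.03"),
  columnSum = c(4, 4, 10, 30, 10, 10, 0, 0)
)

sum_cols   <- "columnSum"          # hypothetical name; add more columns to sum here
group_cols <- c("X", "Y", "date")  # hypothetical name; the grouping columns

res <- copy(DT)                    # work on a copy so DT is not modified by reference
# Overwrite each column in sum_cols with its per-group sum; all other columns stay
res[, (sum_cols) := lapply(.SD, sum, na.rm = TRUE),
    .SDcols = sum_cols, by = group_cols]
res <- unique(res)                 # collapse rows that are now identical
print(res)
```

Because `:=` adds or overwrites columns instead of aggregating, `Z` is never dropped; `unique()` then removes the rows that became duplicates.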
    

    【Comments】:

      【Solution 2】:

      This works:

      # recreated your example
      library(data.table)
      DT <- data.table(X = c("A", "A", "B", "B", "B", "C", "D", "D"),
                       Y = c("a1", "a1", "a2", "a2", "a2", "a2", "a3", "a3"),
                       Z = c("z1", "z1", "z3", "z5", "z5", "z3", "z4", "z6"),
                       date = c("2018.01", "2018.01", "2018.02", "2018.02", 
                                "2018.02", "2018.02", "2018.03", "2018.03"),
                       columnSum = c(4, 4, 10, 30, 10, 10, 0, 0))
      
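      # group sums, with the result column named columnSum instead of the default V1
      sums <- DT[, .(columnSum = sum(columnSum)), by = .(X, Y, date)]
      # distinct (X, Y, Z, date) combinations, so Z is retained
      keep <- unique(DT[, .(X, Y, Z, date)])
      # join the sums back on the grouping columns
      merge(keep, sums, by = c("X", "Y", "date"))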
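
The aggregate-then-join approach above can also be written with data.table's own join syntax instead of `merge`. The following is a sketch under the same assumptions as this answer's example data; the name `res` is introduced only for illustration.

```r
library(data.table)

DT <- data.table(
  X = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Y = c("a1", "a1", "a2", "a2", "a2", "a2", "a3", "a3"),
  Z = c("z1", "z1", "z3", "z5", "z5", "z3", "z4", "z6"),
  date = c("2018.01", "2018.01", "2018.02", "2018.02",
           "2018.02", "2018.02", "2018.03", "2018.03"),
  columnSum = c(4, 4, 10, 30, 10, 10, 0, 0)
)

# One row per (X, Y, date) group holding the group sum
sums <- DT[, .(columnSum = sum(columnSum, na.rm = TRUE)), by = .(X, Y, date)]
# Distinct (X, Y, Z, date) combinations, so Z is retained
keep <- unique(DT[, .(X, Y, Z, date)])
# For every row of sums, pull in the matching rows of keep
res <- keep[sums, on = .(X, Y, date)]
print(res)
```

Naming the columns in `on` makes the join keys explicit rather than relying on whichever column names the two tables happen to share.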
      

      【Comments】:

        【Solution 3】:
        library(dplyr)
        practiceData <- tibble(X=c("A","A","B","B","B","C","D","D"),
                           Y=c('a1', 'a1', 'a2','a2', 'a2', 'a2','a3','a3'),  
                           Z=c('z1','z1', 'z3', 'z5','z5','z3','z4','z6'),
                           date=c('2018.01',
                                  '2018.01',
                                  '2018.02',
                                  '2018.02',
                                  '2018.02',
                                  '2018.02',
                                  '2018.03',
                                  '2018.03'),
                           U=c(4,4,10,30,10,10,0,0))
        
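        # Sum U within each (X, Y, date) group
        NEW <- practiceData %>% group_by(X, Y, date) %>% summarise(colsumnew = sum(U)) %>% ungroup()
        # Distinct (X, Y, Z, date) combinations, so Z is retained
        lol <- unique(practiceData[, c(1, 2, 3, 4)]) %>% data.frame()
        # Attach the group sums to every distinct row, joining on the grouping columns
        lol2 <- left_join(NEW, lol, by = c("X", "Y", "date")) %>% unique()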
        

        【Comments】:
