【Question title】: Difference between rows in R on dataframe grouped by column
【Posted】: 2015-10-10 13:05:00
【Question】:

I want to get the difference in count between successive versions within each app_name. My dataset has the columns: app_name, version_id, count, [difference]

Here is the dataset:

    data = structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
    1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
    200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id", 
    "count"), class = "data.frame", row.names = c(NA, -9L))

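Given this data.frame, how can I get the lagged difference of count by app_name and version_id? For the initial (first) version of each app the difference is zero, since there is no earlier version to compare against.

Below is an example of the final result, including the final "diff" column: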

    structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
    1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
    200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 1.25, 
    0, 0, 2.4)), .Names = c("app_name", "version_id", "count", "diff"
    ), class = "data.frame", row.names = c(NA, -9L))

【Comments】:

Tags: r dataframe diff lag


【Solution 1】:

Try using lag from dplyr:

    library(dplyr)
    data %>% group_by(app_name) %>%
             mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]),
                    diffcount = count - dplyr::lag(count, default = count[1]))

    Source: local data frame [9 x 5]
    Groups: app_name [3]

      app_name version_id count diffvers diffcount
        (fctr)      (dbl) (int)    (dbl)     (int)
    1        a        1.0   600      0.0         0
    2        a        1.1   620      0.1        20
    3        a        2.3   620      1.2         0
    4        b        2.0   200      0.0         0
    5        b        3.1   200      1.1         0
    6        b        3.3   250      0.2        50
    7        b        4.0   250      0.7         0
    8        c        1.1    15      0.0         0
    9        c        2.4    36      1.3        21
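
To see why `default = count[1]` produces a zero in the first row of each group, it may help to build the lagged vector by hand for a single app. The sketch below uses only base R and the counts of app "a" from the question; the names `x` and `lagged` are illustrative, not part of the answer. The hand-built vector is what `dplyr::lag(x, default = x[1])` returns for the same input.

```r
# Counts for app "a", in version order (taken from the question's data)
x <- c(600, 620, 620)

# Shift every element one position down and fill the gap at the top with
# the first element itself, so the first difference comes out as zero.
lagged <- c(x[1], head(x, -1))

lagged       # 600 600 620
x - lagged   # 0 20 0
```

Grouping by app_name simply repeats this step separately for each app.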

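【Comments】:

【Solution 2】:

We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'app_name', loop (lapply(..)) over the columns specified in .SDcols, take the difference between each element and its lag (shift uses type='lag' by default), and assign (:=) the output to create the new columns.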

    library(data.table)#v1.9.6
    setDT(data)[, c('diffvers', 'diffcount') := lapply(.SD, 
                  function(x) x-shift(x, fill=x[1L])), by = app_name, .SDcols=2:3]
    
    data
    #   app_name version_id count diffvers diffcount
    #1:        a        1.0   600      0.0         0
    #2:        a        1.1   620      0.1        20
    #3:        a        2.3   620      1.2         0
    #4:        b        2.0   200      0.0         0
    #5:        b        3.1   200      1.1         0
    #6:        b        3.3   250      0.2        50
    #7:        b        4.0   250      0.7         0
    #8:        c        1.1    15      0.0         0
    #9:        c        2.4    36      1.3        21
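
Both solutions take the "previous" row to be the one physically above the current row, which works here because the question's data is already ordered by version_id within each app_name. If that cannot be assumed, one option is to sort first. The following is a minimal sketch on a small, deliberately shuffled table; the table `dt` and its four rows are made up for illustration (the values are borrowed from the question).

```r
library(data.table)

# Hypothetical shuffled subset of the question's data
dt <- data.table(
  app_name   = c("b", "a", "b", "a"),
  version_id = c(3.1, 1.1, 2, 1),
  count      = c(200L, 620L, 200L, 600L)
)

# Sort by reference so that each app's versions are in ascending order
setorder(dt, app_name, version_id)

# Lagged difference of count within each app; the first row of a group
# is compared with itself and therefore yields 0
dt[, diffcount := count - shift(count, fill = count[1L]), by = app_name]

dt$diffcount   # 0 20 0 0
```

`setorder()` reorders the table in place, so no reassignment is needed before the grouped `:=` step.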
    

【Comments】:
