【Question title】: Conditional aggregating by column pairs in R
【Posted】: 2020-03-23 11:57:22
【Question】:

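Update: I have updated the example because it was not clear enough.

I am trying to aggregate the columns of a data frame in R based on a condition. My data frame looks like this: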

df <- data.frame(year = rep(2005, 8),
             id = 1:8,
             crash_x = c(0, 2, 0, 0, 4, 0,1,2),
             crash_y = c(1, 0, 0, 0, 0, 1,0,0),
             crash_z = c(0, 0, 3, 1, 0, 0,0,0),
             injured_x = c(0, 1, 0, 0, 3, 0,0,0),
             injured_y = c(0, 0, 2, 1, 0, 0,1,2),
             injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1    0       1       0         0        0          3
2005 2    2       0       0         1        0          0
2005 3    0       0       3         0        2          0
2005 4    0       0       1         0        1          0
2005 5    4       0       0         3        0          0
2005 6    0       1       0         0        0          2
2005 7    1       0       0         0        1          0
2005 8    2       0       0         0        2          0

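I want to sum the columns across rows that have values greater than 0 in the same crash_ and injured_ columns (the columns are distinguished by their suffix x, y, or z). In other words, rows with the same pattern of nonzero positions should be combined: rows 1 and 6, rows 3 and 4, rows 2 and 5, rows 7 and 8, and so on.

The output should look like this: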

year crash_x crash_y crash_z injured_x injured_y injured_z
2005     0       2       0         0        0          5
2005     6       0       0         4        0          0
2005     0       0       4         0        3          0
2005     3       0       0         0        3          0

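Is this possible? Thanks!!

【Comments】:

  • Sorry, I'm confused: "if columns crash_ and injured_ have the same non_null columns"?
  • How do you want to aggregate the columns: by computing the sum?
  • Thanks, I hadn't realized I wrote "columns" twice. I've updated the question!

Tags: r if-statement aggregate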


【Solution 1】:

This solution first creates a new column holding the "pattern" of zero and nonzero values:

df <- data.frame(year = rep(2005, 8),
                 id = 1:8,
                 crash_x = c(0, 2, 0, 0, 4, 0,1,2),
                 crash_y = c(1, 0, 0, 0, 0, 1,0,0),
                 crash_z = c(0, 0, 3, 1, 0, 0,0,0),
                 injured_x = c(0, 1, 0, 0, 3, 0,0,0),
                 injured_y = c(0, 0, 2, 1, 0, 0,1,2),
                 injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

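library(dplyr)
library(tidyr)

# Paste the six count columns into one string per row, then turn every
# nonzero number (including multi-digit ones) into "1"
df <- df %>%
  unite("pattern", c(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), remove = FALSE) %>%
  mutate(pattern = gsub("[1-9][0-9]*", "1", pattern))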
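
One detail worth checking when building the pattern with a regular expression is how multi-digit counts are handled. The small sketch below uses made-up pattern strings (the vector `x` and both result names are illustrative only, not part of the answer) to compare replacing single digits with replacing whole nonzero numbers:

```r
# Hypothetical pattern strings, as produced by pasting counts with "_"
x <- c("12_0_10", "1_0_3")

# Replacing digit by digit keeps the number of digits, so a row with 12
# and a row with 1 in the same column end up with different patterns
single_digit <- gsub("[1-9]", "1", x)

# Replacing each whole nonzero number yields exactly one flag per column
whole_number <- gsub("[1-9][0-9]*", "1", x)

print(single_digit)
print(whole_number)
```

With counts below 10, as in the example data, both expressions give the same patterns; they only differ once a count reaches two digits.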

Then summarise each column by pattern group with dplyr:

df %>% group_by(pattern, year) %>% 
  summarise_at(vars(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), sum)
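
For comparison, the same idea can be sketched in base R without any packages. This is only an illustration under the assumption that "same pattern" means the same set of columns holding a value greater than 0; the names `value_cols`, `pattern`, and `res` are introduced here and do not come from the answer.

```r
df <- data.frame(year = rep(2005, 8),
                 id = 1:8,
                 crash_x = c(0, 2, 0, 0, 4, 0, 1, 2),
                 crash_y = c(1, 0, 0, 0, 0, 1, 0, 0),
                 crash_z = c(0, 0, 3, 1, 0, 0, 0, 0),
                 injured_x = c(0, 1, 0, 0, 3, 0, 0, 0),
                 injured_y = c(0, 0, 2, 1, 0, 0, 1, 2),
                 injured_z = c(3, 0, 0, 0, 0, 2, 0, 0))

# Columns that carry counts (everything except the identifiers)
value_cols <- setdiff(names(df), c("year", "id"))

# One key per row: 1 where the count is > 0, else 0, e.g. "0_1_0_0_0_1"
pattern <- apply(df[value_cols] > 0, 1,
                 function(r) paste(as.integer(r), collapse = "_"))

# Sum every count column within each (year, pattern) group
res <- aggregate(df[value_cols],
                 by = list(year = df$year, pattern = pattern),
                 FUN = sum)
print(res)
```

The result has one row per distinct pattern (four for the example data); dropping the `pattern` column afterwards, for instance with `res$pattern <- NULL`, leaves the layout requested in the question.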

【Comments】:

    【Solution 2】:

    The easiest way is to reshape the data (reshape2 combined with base R aggregate):

    library(reshape2)
    
    d <- read.table(text = "year id crash_x crash_y crash_z injured_x injured_y injured_z
    2005 1    0       1       0         0        0          3
    2005 2    2       0       0         1        0          0
    2005 3    0       0       3         0        2          0
    2005 4    0       0       1         0        1          0
    2005 5    4       0       0         3        0          0
    2005 6    0       1       0         0        0          2", header = T, stringsAsFactors = F)
    
    want <- melt(subset(d, select = -id), id.vars = "year", variable.name = "crash", value.name = "val")
    want$postfix <- gsub("(^crash_)|(^injured_)", "", want$crash)
    want <- aggregate(val ~ crash + year + postfix, want, sum)
    dcast(want, year + postfix ~ crash, value.var = "val", fill = 0)
    
    #  year postfix crash_x crash_y crash_z injured_x injured_y injured_z
    #1 2005       x       6       0       0         4         0         0
    #2 2005       y       0       2       0         0         3         0
    #3 2005       z       0       0       4         0         0         5
    

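    【Comments】:

    • Thanks for your answer!! Actually, my question was not clear enough: I want to aggregate rows based on the positions of the numbers. If the numbers are in the same positions, the rows should be aggregated. I will try to adapt your code.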