【Question title】: Reduce a data frame by combining like rows according to two qualitative factors
【Posted】: 2021-02-12 04:38:32
【Question】:

I have a data frame as follows:

observations<- data.frame(X=c("00KS089001","00KS089001","00KS089002","00KS089002","00KS089003","00KS089003","00KS105001","00KS105001", "00KS177011","00KS177011","00P0006","00P006","00P006","00P006"), hzdept = c(0,20,0,15,0,13,0,20,0,16,0,6,13,29), hzdepb = c(20,30,15,30,13,30,20,30,16,30,6,13,29,30),Y=c("Red","White","Red","White","Green","Red","Red","Blue", "Black","Black","Red","White","White","White"), Z = c(0.67,0.33,0.5,0.5,0.43,0.57,0.67,0.33,0.53,0.47,0.2,0.23,0.53,0.04))

I would like to reduce it so that whenever two rows share the same X and Y, their observations are combined into one row, i.e.:

data.frame(X=c("00KS089001","00KS089001","00KS089002","00KS089002","00KS089003","00KS089003","00KS105001","00KS105001", "00KS177011","00P0006","00P006"), hzdept = c(0,20,0,15,0,13,0,20,0,0,6), hzdepb = c(20,30,15,30,13,30,20,30,30,6,30),Y=c("Red","White","Red","White","Green","Red","Red","Blue", "Black","Red","White"), Z = c(0.67,0.33,0.5,0.5,0.43,0.57,0.67,0.33,1.00,0.20,0.80))

Any suggestions on how best to approach this?

【Comments】:

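  • How do you want to combine them? It's not obvious how you get your Z values.
  • Sorry, I made a typo in the Z column; it should be clearer now. I want to combine the rows as follows: hzdept becomes the minimum within each group of like rows, hzdepb becomes the maximum, and Z becomes the sum of all Z values in the group.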

Tags: r dataframe dplyr plyr


【Solution 1】:

Edit: OK, now I see from your comment above how hzdept and hzdepb should be combined:

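library(tidyverse)

# Sum Z within each (X, Y) group: count() with wt = Z adds up Z instead of counting rows
df <- observations %>% count(X, Y, wt = Z, name = "Z")

# Smallest hzdept per group: sort ascending, then keep the first row of each (X, Y)
df_hzdept <- observations %>%
   arrange(hzdept) %>%
   distinct(X, Y, .keep_all = TRUE) %>%
   select(X, Y, hzdept)

# Largest hzdepb per group: sort descending, then keep the first row of each (X, Y)
df_hzdepb <- observations %>%
   arrange(desc(hzdepb)) %>%
   distinct(X, Y, .keep_all = TRUE) %>%
   select(X, Y, hzdepb)

# Join the three per-group summaries on the grouping keys
df <- df %>%
   left_join(df_hzdept, by = c("X", "Y")) %>%
   left_join(df_hzdepb, by = c("X", "Y"))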
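The same structure (one summary per column, then a join on X and Y) can be sketched in base R without the tidyverse, using aggregate() and merge(). This is an alternative, not part of the answer above; the names top, bottom, total and reduced are only illustrative.

```r
# Data from the question (stringsAsFactors = FALSE keeps X and Y as character in any R version)
observations <- data.frame(
  X = c("00KS089001", "00KS089001", "00KS089002", "00KS089002", "00KS089003", "00KS089003",
        "00KS105001", "00KS105001", "00KS177011", "00KS177011",
        "00P0006", "00P006", "00P006", "00P006"),
  hzdept = c(0, 20, 0, 15, 0, 13, 0, 20, 0, 16, 0, 6, 13, 29),
  hzdepb = c(20, 30, 15, 30, 13, 30, 20, 30, 16, 30, 6, 13, 29, 30),
  Y = c("Red", "White", "Red", "White", "Green", "Red", "Red", "Blue",
        "Black", "Black", "Red", "White", "White", "White"),
  Z = c(0.67, 0.33, 0.5, 0.5, 0.43, 0.57, 0.67, 0.33, 0.53, 0.47, 0.2, 0.23, 0.53, 0.04),
  stringsAsFactors = FALSE
)

# One summary per column, each computed within every (X, Y) group
top    <- aggregate(hzdept ~ X + Y, data = observations, FUN = min)  # minimum hzdept
bottom <- aggregate(hzdepb ~ X + Y, data = observations, FUN = max)  # maximum hzdepb
total  <- aggregate(Z ~ X + Y, data = observations, FUN = sum)       # summed Z

# Join the three summaries on the grouping keys (merge() sorts rows by X, then Y)
reduced <- merge(merge(top, bottom, by = c("X", "Y")), total, by = c("X", "Y"))
print(reduced)
```

Three separate aggregate() calls are used because each column needs a different function; the merges then play the role of the left_join() calls.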

【Comments】:

【Solution 2】:

Using dplyr

Here is how to group by two columns and summarise the remaining columns with their minimum, maximum, and sum:

library(magrittr) # For the pipe: %>%
observations %>%
    dplyr::group_by(X, Y) %>%
    dplyr::summarise(hzdept = min(hzdept),
                     hzdepb = max(hzdepb),
                     Z = sum(Z), .groups = 'drop')
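Note that summarise() returns one row per group ordered by X and then Y, so for 00KS105001 the Blue row comes before Red, whereas the expected output in the question keeps rows in order of first appearance. If that order matters, a base-R sketch (an alternative, not part of this answer) can split on a key whose factor levels follow first appearance; key and reduced are illustrative names.

```r
# Data from the question (stringsAsFactors = FALSE keeps X and Y as character in any R version)
observations <- data.frame(
  X = c("00KS089001", "00KS089001", "00KS089002", "00KS089002", "00KS089003", "00KS089003",
        "00KS105001", "00KS105001", "00KS177011", "00KS177011",
        "00P0006", "00P006", "00P006", "00P006"),
  hzdept = c(0, 20, 0, 15, 0, 13, 0, 20, 0, 16, 0, 6, 13, 29),
  hzdepb = c(20, 30, 15, 30, 13, 30, 20, 30, 16, 30, 6, 13, 29, 30),
  Y = c("Red", "White", "Red", "White", "Green", "Red", "Red", "Blue",
        "Black", "Black", "Red", "White", "White", "White"),
  Z = c(0.67, 0.33, 0.5, 0.5, 0.43, 0.57, 0.67, 0.33, 0.53, 0.47, 0.2, 0.23, 0.53, 0.04),
  stringsAsFactors = FALSE
)

# Key identifying each (X, Y) pair; unique() keeps the levels in order of first appearance
key <- paste(observations$X, observations$Y, sep = "|")
key <- factor(key, levels = unique(key))

# split() returns one data frame per level, in level order; collapse each to a single row
reduced <- do.call(rbind, lapply(split(observations, key), function(g) {
  data.frame(X = g$X[1], hzdept = min(g$hzdept), hzdepb = max(g$hzdepb),
             Y = g$Y[1], Z = sum(g$Z), stringsAsFactors = FALSE)
}))
rownames(reduced) <- NULL  # drop the "X|Y" row names inherited from the split list
print(reduced)
```

The "|" separator is an assumption that no X or Y value contains that character; pick another separator if yours might.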
    

【Comments】:
