【问题标题】:data frame grouping and column transforming using dplyr使用 dplyr 进行数据框分组和列转换
【发布时间】:2021-07-24 07:17:59
【问题描述】:

我有以下df:

df<-data.frame(geokey=c("A","A","A","A","A","A","B","B","B","B","B","B"),
               upc=c("100","100","101","101","102","102","200","200","201","201",
                     "202","202"),
                endwk=c("14-07-2021","21-07-2021","14-07-2021","21-07-2021","14-07-2021","21-07-2021",
                               "14-07-2021","21-07-2021","14-07-2021","21-07-2021","14-07-2021","21-07-2021"),
                Base_units=c(2,3,1,2,4,1,1,4,2,3,3,2),
                Base_price=c(0.1,0.2,0.2,0.1,0.1,0.1,0.2,0.3,0.4,0.1,0.2,0.3),
                Incr_units=c(2,1,1,1,2,1,3,2,2,3,1,1),
                incr_price=c(0.1,0.1,0.1,0.3,0.2,0.1,0.1,0.2,0.1,0.2,0.1,0.2))
> df
   geokey upc      endwk Base_units Base_price Incr_units incr_price
1       A 100 14-07-2021          2        0.1          2        0.1
2       A 100 21-07-2021          3        0.2          1        0.1
3       A 101 14-07-2021          1        0.2          1        0.1
4       A 101 21-07-2021          2        0.1          1        0.3
5       A 102 14-07-2021          4        0.1          2        0.2
6       A 102 21-07-2021          1        0.1          1        0.1
7       B 200 14-07-2021          1        0.2          3        0.1
8       B 200 21-07-2021          4        0.3          2        0.2
9       B 201 14-07-2021          2        0.4          2        0.1
10      B 201 21-07-2021          3        0.1          3        0.2
11      B 202 14-07-2021          3        0.2          1        0.1
12      B 202 21-07-2021          2        0.3          1        0.2

预期输出---> 按 geokey--upc---endwk 分组,所有 vol cols 合计(添加)和价格列平均如下所示:

df_merged<-data.frame(geokey=c("A","A","B","B"),
               upc=c("upc_100_101_102","upc_100_101_102","upc_200_201_202","upc_200_201_202"),
               endwk=c("14-07-2021","21-07-2021","14-07-2021","21-07-2021"),
               Base_units_totalled=c(7,6,6,9),
               Base_price_averaged=c(0.133,0.133,0.2667,0.2333),
               Incr_units_totalled=c(5,3,3,6),
               incr_price_averaged=c(0.1333,0.1,0.1,0.2))

> df_merged
  geokey             upc      endwk Base_units_totalled Base_price_averaged Incr_units_totalled incr_price_averaged
1      A upc_100_101_102 14-07-2021                   7              0.1330                   5              0.1333
2      A upc_100_101_102 21-07-2021                   6              0.1330                   3              0.1000
3      B upc_200_201_202 14-07-2021                   6              0.2667                   3              0.1000
4      B upc_200_201_202 21-07-2021                   9              0.2333                   6              0.2000

我们将不胜感激。

【问题讨论】:

    标签: r dplyr grouping


    【解决方案1】:

    我想您想总结upc 列而不是按它分组?

    library(dplyr)
    group_by(geokey, endwk) %>%
    summarize(upc = paste0("upc_", paste(upc, collapse = "_")),
            across(contains("units"), sum),
            across(contains("price"), mean), .groups = "drop")
    
    
    # A tibble: 4 x 7
      geokey endwk      upc             Base_units Incr_units Base_price incr_price
    * <chr>  <chr>      <chr>                <dbl>      <dbl>      <dbl>      <dbl>
    1 A      14-07-2021 upc_100_101_102          7          5      0.133      0.133
    2 A      21-07-2021 upc_100_101_102          6          3      0.133      0.167
    3 B      14-07-2021 upc_200_201_202          6          6      0.267      0.1  
    4 B      21-07-2021 upc_200_201_202          9          6      0.233      0.2 
    

    【讨论】:

    • 我得到了一个不同的 A | incr_price 平均值中的 21-07-2021,您可能有错别字。
    • 嘿,这正是我想要的……太棒了!!
    猜你喜欢
    • 2019-06-29
    • 2022-11-21
    • 1970-01-01
    • 2015-08-13
    • 1970-01-01
    • 2019-11-18
    • 1970-01-01
    • 2019-08-27
    • 2022-01-19
    相关资源
    最近更新 更多