【Question title】: Right join with dplyr, turning rows into columns
【Posted】: 2018-05-27 22:39:06
【Question】:

I want to right join data1 and data2 by ProductCode, and I need to get the desired output table below.

    data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                        region = c("A", "A", "A", "B", "B", "C"))
    data1
    #>   ProductCode region
    #> 1           1      A
    #> 2           1      A
    #> 3           1      A
    #> 4           2      B
    #> 5           2      B
    #> 6           3      C

    data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                        Period = c("promo1", "promo2", "promo3", "promo2", "promo3", "promo1"),
                        promosales = c(15, 12, 7, 18, 20, 2))
    data2
    #>   ProductCode Period promosales
    #> 1           1 promo1         15
    #> 2           1 promo2         12
    #> 3           1 promo3          7
    #> 4           2 promo2         18
    #> 5           2 promo3         20
    #> 6           3 promo1          2

Desired output table:

ProductCode region Promo1_sales Promo2_sales Promo3_sales
          1      A           15           12            7
          2      B           18           20            0
          3      C            2            0            0

If I do this in SQL, I then have to group the result and take the max for each product:

    library(sqldf)
    sqldf("select a.*,
                  case when Period = 'promo1' then b.promosales else 0 end as Promo1_sales,
                  case when Period = 'promo2' then b.promosales else 0 end as Promo2_sales,
                  case when Period = 'promo3' then b.promosales else 0 end as Promo3_sales,
                  case when Period = 'promo4' then b.promosales else 0 end as Promo4_sales
           from data1 a
           left join data2 b on a.ProductCode = b.ProductCode")

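A sketch of the complete query, adding the GROUP BY and MAX step described above. It assumes the sqldf package is installed; the `Promo1_sales` to `Promo3_sales` aliases are chosen to match the desired table, and `promo4` is left out because it does not occur in data2. Because the columns are keyed on Period, product 2 comes out as 0 / 18 / 20 rather than the 18 / 20 / 0 shown in the desired table.

```r
# Assumes the sqldf package is installed (it runs the query in SQLite).
library(sqldf)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c("A", "A", "A", "B", "B", "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3", "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

# Conditional aggregation: each CASE keeps one Period's sales (0.0 otherwise),
# and MAX() collapses the joined rows of each product into a single row.
# A product with no rows in data2 still appears, with 0 in every column.
res <- sqldf("
  select a.ProductCode, a.region,
         max(case when b.Period = 'promo1' then b.promosales else 0.0 end) as Promo1_sales,
         max(case when b.Period = 'promo2' then b.promosales else 0.0 end) as Promo2_sales,
         max(case when b.Period = 'promo3' then b.promosales else 0.0 end) as Promo3_sales
  from (select distinct ProductCode, region from data1) a
  left join data2 b on a.ProductCode = b.ProductCode
  group by a.ProductCode, a.region
  order by a.ProductCode")
res
```

The `distinct` subquery is not strictly needed for MAX (duplicated rows do not change a maximum), but it keeps the join small and would matter if MAX were swapped for SUM.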
Can I do this with dplyr, or in some other way?

Thanks.

【Question discussion】:

    Tags: r join dplyr


    【Solution 1】:

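    Not sure whether this works for your general case, but you can do this (the promo columns are numbered by row order within each product, not by Period):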

    data1 <- data.frame(ProductCode=c(1,1,1,2,2,3),
                        region=c(rep('A', 3), rep('B', 2),'C'))
    data2 <- data.frame(ProductCode=c(1,1,1,2,2,3),
                        Period=c("promo1","promo2","promo3","promo2","promo3","promo1"),
                        promosales=c(15,12,7,18,20,2))
    
    
    library(dplyr)
    library(tidyr)
    
    data1 %>% 
      distinct() %>% 
      inner_join(data2, by = 'ProductCode') %>% 
      group_by(ProductCode) %>% 
      mutate(rownr = paste0('Promo', row_number(), '_sales')) %>% 
      select(-Period) %>% 
      spread(rownr, promosales, fill = 0)
    #> # A tibble: 3 x 5
    #> # Groups:   ProductCode [3]
    #>   ProductCode region Promo1_sales Promo2_sales Promo3_sales
    #>         <dbl> <fct>         <dbl>        <dbl>        <dbl>
    #> 1           1 A                15           12            7
    #> 2           2 B                18           20            0
    #> 3           3 C                 2            0            0
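`spread()` has since been superseded by `pivot_wider()` (tidyr 1.0.0 and later). A sketch of the same row-number approach with `pivot_wider()`, assuming tidyr >= 1.0.0; the `ungroup()` call is an addition so that the result is a plain tibble instead of a grouped one.

```r
library(dplyr)
library(tidyr)

data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c("A", "A", "A", "B", "B", "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3", "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

res <- data1 %>%
  distinct() %>%                               # one row per ProductCode/region
  inner_join(data2, by = "ProductCode") %>%
  group_by(ProductCode) %>%
  # Name each promo by its position within the product, not by Period.
  mutate(rownr = paste0("Promo", row_number(), "_sales")) %>%
  ungroup() %>%
  select(-Period) %>%
  # One column per rownr value; missing combinations are filled with 0.
  pivot_wider(names_from = rownr, values_from = promosales,
              values_fill = list(promosales = 0))
res
```

Passing `values_fill` as a named list works in every tidyr 1.x release; from 1.1.0 on a bare `0` is also accepted.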
    

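    If the columns should be keyed on Period rather than on row order, a simpler approach works (product 2 then gets 0 under promo1, since it has no promo1 sales):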

    data1 %>% 
      distinct() %>% 
      inner_join(data2, by = 'ProductCode') %>% 
      group_by(ProductCode) %>% 
      spread(Period, promosales, fill = 0)
    #> # A tibble: 3 x 5
    #> # Groups:   ProductCode [3]
    #>   ProductCode region promo1 promo2 promo3
    #>         <dbl> <fct>   <dbl>  <dbl>  <dbl>
    #> 1           1 A          15     12      7
    #> 2           2 B           0     18     20
    #> 3           3 C           2      0      0
    

    Created on 2018-05-23 by the reprex package (v0.2.0).
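For comparison, a base-R sketch of the Period-keyed version that needs no packages: `xtabs()` builds the product-by-period table (combinations that never occur become 0), and `merge()` with `all.x = TRUE` keeps every product from data1, which is the right-join behaviour the question asks for. The column names come out as `promo1` to `promo3`; in this sketch a product with no sales at all would get NA rather than 0.

```r
data1 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    region = c("A", "A", "A", "B", "B", "C"))
data2 <- data.frame(ProductCode = c(1, 1, 1, 2, 2, 3),
                    Period = c("promo1", "promo2", "promo3", "promo2", "promo3", "promo1"),
                    promosales = c(15, 12, 7, 18, 20, 2))

# Sum promosales for every ProductCode x Period cell; empty cells are 0.
tab <- xtabs(promosales ~ ProductCode + Period, data = data2)
wide <- as.data.frame.matrix(tab)
# xtabs keeps ProductCode in the row names; turn it back into a column.
wide$ProductCode <- as.numeric(rownames(wide))

# Keep every distinct product from data1, i.e. data2 right-joined onto data1.
res <- merge(unique(data1), wide, by = "ProductCode", all.x = TRUE)
res
```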
