Title: How to create a crosstab table using two tables in R?
Posted: 2018-07-06 18:41:23
Question:

My Excel dataset looks like this:

Weight Quantity Price
72       5      460
73       8      720
75       20     830
95       2      490
91       15     680
82       14     340
88       30     250
89       6      770
78       27     820
98       24     940
99       29     825

I want a Weight vs. Quantity pivot table that shows the sum of Price for each combination of categories, like this:

        0-10     10-20     20-30
70-80   1180     830        820
80-90   770      340        250
90-100  490      680        1765
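Two sample rows sit exactly on a band edge (Quantity 20 and Quantity 30), so it matters which end of each interval is closed. R's cut() uses right-closed intervals (a, b] by default, which is consistent with the table above: the 830 (Quantity 20) lands in 10-20 and the 250 (Quantity 30) lands in 20-30. A small sketch; the variable names are only for illustration:

```r
# Quantity values from the sample data, including two that sit
# exactly on a break (20 and 30).
q <- c(5, 20, 30)

# Default right = TRUE: intervals are closed on the right, (a, b],
# so 20 goes to (10,20] and 30 goes to (20,30].
right_closed <- cut(q, breaks = c(0, 10, 20, 30))
print(right_closed)

# right = FALSE: intervals become [a, b), so 20 moves up to [20,30)
# and 30 falls outside every band (NA).
left_closed <- cut(q, breaks = c(0, 10, 20, 30), right = FALSE)
print(left_closed)
```

If the band edges should be handled differently, the right and include.lowest arguments of cut() are the place to change it.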

Using the dplyr package, I created two tables, one per category variable, to get the count and mean price, like this:

library(dplyr)

# Count and mean price per Weight band
table1 <- group_by(dataset, Weight = cut(Weight, breaks = c(70,80,90,100)))
result1 <- summarise(table1, Count = n(), Avg_Price = mean(Price, na.rm = TRUE))
# Count and mean price per Quantity band
table2 <- group_by(dataset, Quantity = cut(Quantity, breaks = c(0,10,20,30)))
result2 <- summarise(table2, Count = n(), Avg_Price = mean(Price, na.rm = TRUE))

Now, how can I use table1 and table2 to create a crosstab?
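The two summaries above cannot simply be combined into a crosstab: each groups by a single variable, so neither records which Weight band a given Quantity band's prices came from. The grouping has to use both banded variables at once. As a base-R sketch (tapply() is swapped in here for illustration; neither answer below uses it), with the sample data typed in as a data frame named dataset to match the question's code:

```r
# Sample data from the question; the name `dataset` follows the question's code.
dataset <- data.frame(
  Weight   = c(72, 73, 75, 95, 91, 82, 88, 89, 78, 98, 99),
  Quantity = c(5, 8, 20, 2, 15, 14, 30, 6, 27, 24, 29),
  Price    = c(460, 720, 830, 490, 680, 340, 250, 770, 820, 940, 825)
)

# Band both variables with the same breaks used in the question.
weight_band   <- cut(dataset$Weight,   breaks = c(70, 80, 90, 100))
quantity_band <- cut(dataset$Quantity, breaks = c(0, 10, 20, 30))

# tapply() with a list of two factors groups by every Weight x Quantity
# combination and applies sum() to the Price values in each cell.
price_sums <- tapply(dataset$Price,
                     list(Weight = weight_band, Quantity = quantity_band),
                     sum)
print(price_sums)
```

Cells with no matching rows would come back as NA rather than 0 with this approach; in this sample every cell has data.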

Tags: r pivot-table


    Answer 1:

    Perhaps the following is what you want. It uses cut, as you did, and then xtabs.

    Weight = cut(dataset$Weight, breaks = c(70,80,90,100))
    Quantity = cut(dataset$Quantity, breaks = c(0,10,20,30))
    dt2 <- data.frame(Weight, Quantity, Price = dataset$Price)
    xtabs(Price ~ Weight + Quantity, dt2)
    #          Quantity
    #Weight     (0,10] (10,20] (20,30]
    #  (70,80]    1180     830     820
    #  (80,90]     770     340     250
    #  (90,100]    490     680    1765
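To get the "0-10" style headers from the question instead of interval notation, cut() accepts a labels argument, and addmargins() (from the stats package, like xtabs()) appends row and column totals. A sketch that assumes the dataset data frame from the question; the label strings are chosen here only to mirror the desired output:

```r
# Sample data from the question (assumed name: dataset).
dataset <- data.frame(
  Weight   = c(72, 73, 75, 95, 91, 82, 88, 89, 78, 98, 99),
  Quantity = c(5, 8, 20, 2, 15, 14, 30, 6, 27, 24, 29),
  Price    = c(460, 720, 830, 490, 680, 340, 250, 770, 820, 940, 825)
)

# Same breaks as in the answer, but with plain labels matching the desired table.
dt2 <- data.frame(
  Weight   = cut(dataset$Weight,   breaks = c(70, 80, 90, 100),
                 labels = c("70-80", "80-90", "90-100")),
  Quantity = cut(dataset$Quantity, breaks = c(0, 10, 20, 30),
                 labels = c("0-10", "10-20", "20-30")),
  Price    = dataset$Price
)

# Sum Price over each Weight x Quantity cell, then add a "Sum" row and column.
tab <- xtabs(Price ~ Weight + Quantity, data = dt2)
tab_totals <- addmargins(tab)
print(tab_totals)
```

The grand total in the bottom-right cell equals the sum of all prices, which is a quick sanity check that no row fell outside the breaks.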
    


      Answer 2:

      A dplyr and tidyr solution:

      library(dplyr)
      library(tidyr)
      
      df %>% 
        mutate(Weight = cut(Weight, breaks = c(70,80,90,100)),
               Quantity = cut(Quantity, breaks = c(0,10,20,30))) %>% 
        group_by(Weight, Quantity) %>% 
        summarise(Price = sum(Price)) %>% 
        spread(Quantity, Price)
      
      # A tibble: 3 x 4
      # Groups:   Weight [3]
        Weight   `(0,10]` `(10,20]` `(20,30]`
      * <fct>       <int>     <int>     <int>
      1 (70,80]      1180       830       820
      2 (80,90]       770       340       250
      3 (90,100]      490       680      1765
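In this pipeline, mutate() overwrites Weight and Quantity with their banded versions; group_by() plus summarise() collapse the data to one row per Weight/Quantity pair (a long table); and spread(Quantity, Price) turns each Quantity band into its own column filled with the summed Price. Since tidyr 1.0, spread() is superseded by pivot_wider(). A sketch of the same steps using that function, assuming tidyr >= 1.0 and dplyr >= 1.0 (for the .groups argument), with df rebuilt from the values in the Data section:

```r
library(dplyr)
library(tidyr)

# Same values as the `df` given under "Data".
df <- data.frame(
  Weight   = c(72L, 73L, 75L, 95L, 91L, 82L, 88L, 89L, 78L, 98L, 99L),
  Quantity = c(5L, 8L, 20L, 2L, 15L, 14L, 30L, 6L, 27L, 24L, 29L),
  Price    = c(460L, 720L, 830L, 490L, 680L, 340L, 250L, 770L, 820L, 940L, 825L)
)

# Step 1 (long format): one row per Weight x Quantity band with summed Price.
long <- df %>%
  mutate(Weight   = cut(Weight,   breaks = c(70, 80, 90, 100)),
         Quantity = cut(Quantity, breaks = c(0, 10, 20, 30))) %>%
  group_by(Weight, Quantity) %>%
  summarise(Price = sum(Price), .groups = "drop")

# Step 2 (wide format): each Quantity band becomes a column holding Price.
wide <- long %>%
  pivot_wider(names_from = Quantity, values_from = Price)
print(wide)
```

Printing long on its own shows the intermediate nine-row table, which makes it clearer why Quantity supplies the new column names and Price supplies the cell values.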
      

      Data:

      df <- structure(list(Weight = c(72L, 73L, 75L, 95L, 91L, 82L, 88L, 
      89L, 78L, 98L, 99L), Quantity = c(5L, 8L, 20L, 2L, 15L, 14L, 
      30L, 6L, 27L, 24L, 29L), Price = c(460L, 720L, 830L, 490L, 680L, 
      340L, 250L, 770L, 820L, 940L, 825L)), .Names = c("Weight", "Quantity", 
      "Price"), class = "data.frame", row.names = c(NA, -11L))           
      

      Comments:

      • This works. @phiver - could you explain what 'mutate' and 'spread' are doing here? Why spread on Quantity and Price?