【Question title】: How to create a crosstab table using two tables in R?
【Posted】: 2018-07-06 18:41:23
【Question】:

My Excel dataset looks like this:

Weight Quantity Price
72       5      460
73       8      720
75       20     830
95       2      490
91       15     680
82       14     340
88       30     250
89       6      770
78       27     820
98       24     940
99       29     825

I want to get a pivot table of Weight against Quantity, where each cell holds the sum of Price for that category, like this:

        0-10     10-20     20-30
70-80   1180     830        820
80-90   770      340        250
90-100  490      680        1765

Using the dplyr package, I created two tables, one per category, to get the mean and the count, as follows:

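library(dplyr)

# Bin Weight into (70,80], (80,90], (90,100], then summarise each bin
table1 <- group_by(dataset, Weight = cut(Weight, breaks = c(70, 80, 90, 100)))
result1 <- summarise(table1, Count = n(), Avg_Price = mean(Price, na.rm = TRUE))
# Bin Quantity into (0,10], (10,20], (20,30], then summarise each bin
table2 <- group_by(dataset, Quantity = cut(Quantity, breaks = c(0, 10, 20, 30)))
result2 <- summarise(table2, Count = n(), Avg_Price = mean(Price, na.rm = TRUE))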

Now, how can I use table1 and table2 to create a crosstab?

【Comments】:

Tags: r pivot-table

【Solution 1】:

Maybe the following is what you want. It uses cut like you did, then xtabs.

Weight = cut(dataset$Weight, breaks = c(70,80,90,100))
Quantity = cut(dataset$Quantity, breaks = c(0,10,20,30))
dt2 <- data.frame(Weight, Quantity, Price = dataset$Price)
xtabs(Price ~ Weight + Quantity, dt2)
#          Quantity
#Weight     (0,10] (10,20] (20,30]
#  (70,80]    1180     830     820
#  (80,90]     770     340     250
#  (90,100]    490     680    1765
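
A side note on how cut bins the values here: by default its intervals are closed on the right, so a Quantity of 20 falls in (10,20] and 30 falls in (20,30]. The sketch below is only an illustration, not part of the original answer. It uses base R only, rebuilds the question's data under the name `dataset`, and cross-checks the sums with tapply; the names `edge_bins`, `w`, `q` and `price_sums` are made up for this example.

```r
# Rebuild the question's data (the name `dataset` follows the question).
dataset <- data.frame(
  Weight   = c(72, 73, 75, 95, 91, 82, 88, 89, 78, 98, 99),
  Quantity = c(5, 8, 20, 2, 15, 14, 30, 6, 27, 24, 29),
  Price    = c(460, 720, 830, 490, 680, 340, 250, 770, 820, 940, 825)
)

# cut() uses right-closed intervals by default:
# 20 lands in "(10,20]" and 30 lands in "(20,30]".
edge_bins <- as.character(cut(c(20, 30), breaks = c(0, 10, 20, 30)))
print(edge_bins)

# Bin both variables exactly as in the answer above.
w <- cut(dataset$Weight,   breaks = c(70, 80, 90, 100))
q <- cut(dataset$Quantity, breaks = c(0, 10, 20, 30))

# tapply() sums Price within each (Weight bin, Quantity bin) cell.
price_sums <- tapply(dataset$Price, list(Weight = w, Quantity = q), sum)
print(price_sums)
```

One difference to keep in mind: for a cell with no observations, xtabs reports 0 while tapply returns NA. With this data every cell is populated, so the two agree.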

【Comments】:

【Solution 2】:

A dplyr and tidyr solution:

library(dplyr)
library(tidyr)

df %>% 
  mutate(Weight = cut(Weight, breaks = c(70,80,90,100)),
         Quantity = cut(Quantity, breaks = c(0,10,20,30))) %>% 
  group_by(Weight, Quantity) %>% 
  summarise(Price = sum(Price)) %>% 
  spread(Quantity, Price)

# A tibble: 3 x 4
# Groups:   Weight [3]
  Weight   `(0,10]` `(10,20]` `(20,30]`
* <fct>       <int>     <int>     <int>
1 (70,80]      1180       830       820
2 (80,90]       770       340       250
3 (90,100]      490       680      1765

Data:

df <- structure(list(Weight = c(72L, 73L, 75L, 95L, 91L, 82L, 88L, 
89L, 78L, 98L, 99L), Quantity = c(5L, 8L, 20L, 2L, 15L, 14L, 
30L, 6L, 27L, 24L, 29L), Price = c(460L, 720L, 830L, 490L, 680L, 
340L, 250L, 770L, 820L, 940L, 825L)), .Names = c("Weight", "Quantity", 
"Price"), class = "data.frame", row.names = c(NA, -11L))
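
To see what each step of the pipeline contributes, it can help to stop before the reshaping step. The following is a sketch for illustration only, not part of the original answer. mutate overwrites the numeric Weight and Quantity columns with their bin labels, group_by plus summarise gives one row per (Weight, Quantity) combination in long format, and the last step turns the Quantity bins into column headers with Price as the cell values. It uses `pivot_wider`, the newer tidyr counterpart of `spread`, and the `.groups` argument of summarise, so it assumes tidyr and dplyr at version 1.0.0 or later. The names `long` and `wide` are made up for this example.

```r
library(dplyr)
library(tidyr)

# Same data as above, written out as a plain data.frame.
df <- data.frame(
  Weight   = c(72L, 73L, 75L, 95L, 91L, 82L, 88L, 89L, 78L, 98L, 99L),
  Quantity = c(5L, 8L, 20L, 2L, 15L, 14L, 30L, 6L, 27L, 24L, 29L),
  Price    = c(460L, 720L, 830L, 490L, 680L, 340L, 250L, 770L, 820L, 940L, 825L)
)

# Steps 1-3: bin both variables, then sum Price per pair of bins.
# The result is in long format: one row per (Weight bin, Quantity bin).
long <- df %>%
  mutate(Weight   = cut(Weight,   breaks = c(70, 80, 90, 100)),
         Quantity = cut(Quantity, breaks = c(0, 10, 20, 30))) %>%
  group_by(Weight, Quantity) %>%
  summarise(Price = sum(Price), .groups = "drop")
print(long)

# Step 4: reshape long to wide. Quantity supplies the new column names,
# Price supplies the values that fill those columns.
wide <- long %>%
  pivot_wider(names_from = Quantity, values_from = Price)
print(wide)
```

With `spread(Quantity, Price)` the two arguments play the same roles: the first is the column whose values become the new column names, the second is the column whose values fill the cells.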

【Comments】:

This works. @phiver - can you explain what 'mutate' and 'spread' are doing here? Why spread on Quantity and Price?