Pivoting data in R: answers

【Question title】: Pivoting data in R
【Posted】: 2015-05-12 17:55:26
【Question】:

I have a data frame:

dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat

       date val type
1 2015-01-01  10    A
2 2015-01-01  20    A
3 2015-01-01  30    B
4 2015-01-01  50    C
5 2015-02-02 300    A
6 2015-02-02 100    A
7 2015-02-02 200    B
8 2015-02-02 200    C
9 2015-02-02 400    C

I want one row per date showing the mean of val for each type, so the output would be:

Date           A     B     C
2015-01-01    15    30    50
2015-02-02   200   200   300

Also, how would I get the counts, so that the result is:

Date           A     B     C
2015-01-01    2      1     1
2015-02-02    2      1     2

【Comments】:

In R this is called "aggregation". The aggregate function is very helpful for this.

Tags: r

【Solution 1】:

library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")

#         date   A   B   C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

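For dcast, the left-hand side of the formula defines the rows, the right-hand side defines the columns, value.var is the name of the column whose values fill the cells, and fun.aggregate is how those values are computed. If you omit fun.aggregate and a cell has more than one value, dcast falls back to length, the number of values. You asked for the mean, so we use mean. You could also use min, max, sd, IQR, or any function that takes a vector and returns a single value.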
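To make the point about swapping functions concrete, here is a minimal sketch that rebuilds the dat data frame from the question and changes only fun.aggregate. It assumes reshape2 is installed; the object names counts and maxima are illustrative and not part of the original answer.

```r
library(reshape2)

# Data frame from the question
dat <- data.frame(
  date = rep(c("2015-01-01", "2015-02-02"), times = c(4, 5)),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Number of rows for each date/type combination
counts <- dcast(dat, date ~ type, fun.aggregate = length, value.var = "val")

# Largest value for each date/type combination
maxima <- dcast(dat, date ~ type, fun.aggregate = max, value.var = "val")
```

Both results keep the same shape as the mean table above: one row per date and one column per type.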

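【Discussion】:

Nice. I was about to post the same thing.
Hi @Gregor, thanks. What if I want the row counts instead of the mean? Thanks!
@user3022875 You did show the expected output, though. By default you get length (if you don't specify any function).
I was just asking how to get the counts.
@user3022875: fun.aggregate = length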

【Solution 2】:

For the updated question (the counts), you could also use table:

  table(dat[c(1,3)])
  #            type
  #date       A B C
  #2015-01-01 2 1 1
  #2015-02-02 2 1 2

For the first question, I think @Gregor's solution is the best (so far). A possible option with dplyr/tidyr is:

 library(dplyr)
 library(tidyr)
 dat %>%
    group_by(date,type) %>%
    summarise(val=mean(val)) %>% 
    spread(type, val)
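The same pipeline shape can also produce the counts. The sketch below is an illustration rather than part of the original answer: it assumes dplyr and tidyr are installed, rebuilds dat from the question, and uses dplyr's count(), which stores the tally in a column named n.

```r
library(dplyr)
library(tidyr)

# Data frame from the question
dat <- data.frame(
  date = rep(c("2015-01-01", "2015-02-02"), times = c(4, 5)),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Tally rows per date/type, then spread the types into columns
counts <- dat %>%
  count(date, type) %>%
  spread(type, n)
```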

Or a base R option would be (nchar=50, versus nchar=44 for dcast(.., so not bad :-)):

  with(dat, tapply(val, list(date, type), FUN=mean))
  #            A   B   C
  #2015-01-01  15  30  50
  #2015-02-02 200 200 300
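As a small extension of the tapply approach, the sketch below swaps in length to get the counts and converts the resulting matrix to a data frame with an explicit date column. The names count_mat and count_df are illustrative. One difference from table is worth keeping in mind: by default tapply returns NA for a date/type combination with no rows, whereas table returns 0.

```r
# Data frame from the question
dat <- data.frame(
  date = rep(c("2015-01-01", "2015-02-02"), times = c(4, 5)),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Matrix of counts: rows are dates, columns are types
count_mat <- with(dat, tapply(val, list(date, type), FUN = length))

# Same counts as a data frame, with the dates moved out of the row names
count_df <- data.frame(date = rownames(count_mat), count_mat, row.names = NULL)
```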

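【Discussion】:

I like the tapply one; it is probably the best base R solution for this task.
I agree with @DavidArenburg, nicely done. I especially like the simplicity of table for the row counts.
Thank you both for the generous comments.

【Solution 3】:

Personally I would use Gregor's solution with reshape2, but for completeness I'll include a base R solution.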

agg <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = mean))

out <- reshape(agg, timevar = "type", idvar = "date", direction = "wide")

out
#         date x.A x.B x.C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

If you want to get rid of the x. prefix on the column names, you can remove it with gsub.

colnames(out) <- gsub("^x\\.", "", colnames(out))

To get the row counts, replace FUN = mean with FUN = length in the call to aggregate.
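Put together, the count version of this base R approach might look like the following sketch. It rebuilds dat from the question; agg_n and out_n are illustrative names.

```r
# Data frame from the question
dat <- data.frame(
  date = rep(c("2015-01-01", "2015-02-02"), times = c(4, 5)),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Count rows per date/type instead of averaging them
agg_n <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = length))

# Spread the types into columns, then drop the "x." prefix
out_n <- reshape(agg_n, timevar = "type", idvar = "date", direction = "wide")
colnames(out_n) <- gsub("^x\\.", "", colnames(out_n))
```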

【Discussion】:

【Solution 4】:

Using data.table v1.9.5 (the development version at the time of writing), we can do:

require(data.table) ## v1.9.5+
dcast(setDT(dat), date ~ type, fun = list(mean, length), value.var="val")
#          date A_mean_val B_mean_val C_mean_val A_length_val B_length_val C_length_val
# 1: 2015-01-01         15         30         50            2            1            1
# 2: 2015-02-02        200        200        300            2            1            2

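Installation instructions here.

【Discussion】:

【Solution 5】:

I'll add a pivot_wider solution, which is intended to supersede the earlier tidyverse option.

Using pivot_wider with the values_fn option, we can do the following: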

library(tidyr) # At least 1.0.0

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = mean))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300

and

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = length))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Of course, if we want to get fancy, we can do both at once:

library(purrr)
library(rlang)

map(quos(mean, length), 
    ~pivot_wider(dat, names_from = type, values_from = val, values_fn = list(val = eval_tidy(.))))
#> [[1]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Created on 2019-12-04 by the reprex package (v0.3.0)

Note that if you are concerned about speed, it may be worth updating to the dev version of tidyr.
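For readers on a newer tidyr, the calls above can be written a little more compactly. The sketch below assumes tidyr 1.1.0 or later, where values_fn also accepts a single function applied to every value column; it is not part of the original answer, and the names means and counts are illustrative.

```r
library(tidyr) # assumes 1.1.0 or later

# Data frame from the question
dat <- data.frame(
  date = rep(c("2015-01-01", "2015-02-02"), times = c(4, 5)),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Mean of val per date/type
means <- pivot_wider(dat, names_from = type, values_from = val, values_fn = mean)

# Number of rows per date/type
counts <- pivot_wider(dat, names_from = type, values_from = val, values_fn = length)
```

The list form, values_fn = list(val = mean), remains useful when different value columns need different summary functions.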

【Discussion】: