[Question title]: Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
[Posted]: 2021-11-21 18:35:38
[Question]:

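Note: The title of this question has been edited to make it the canonical question for problems that occur when plyr functions mask their dplyr counterparts. The rest of the question is unchanged.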


Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the old plyr, I could create a small table summarizing my data with the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output looks like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33
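For reference, the per-group table that both ddply() and a grouped summarise() should produce can also be computed in base R, which gives a package-free check of what the correct answer looks like. This is a minimal sketch: set.seed(42) is an assumption added only for reproducibility, and the exact numbers will differ from the tables in this thread because the data are random.

```r
# Same simulated data as above; set.seed() is added only so the sketch is reproducible.
set.seed(42)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# One row per group/sex combination; FUN returns a named length-2 vector,
# so aggregate() stores mean and sd together in a matrix column called "age".
res <- aggregate(age ~ group + sex, data = dfx,
                 FUN = function(x) round(c(mean = mean(x), sd = sd(x)), 2))

# Flatten the matrix column into age.mean / age.sd and sort rows like ddply() does.
res <- do.call(data.frame, res)
res <- res[order(res$group, res$sex), ]
print(res)
```

A grouped summarise() that is working correctly should return one row per group/sex combination in the same way; a single row means the ungrouped plyr version was called.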

I'm trying to move my code to dplyr and the %>% operator. My code takes the DF, groups it by group and sex, and then summarises it. That is:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?

[Question comments]:

    Tags: r dplyr plyr r-faq


    [Answer 1]:

    The problem is that you loaded dplyr first and then plyr, so plyr's summarise function masks dplyr's summarise function. When this happens, you get the following warning:

    require(plyr)
    Loading required package: plyr
    ------------------------------------------------------------------------------------------
    You have loaded plyr after dplyr - this is likely to cause problems.
    If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
    library(plyr); library(dplyr)
    ------------------------------------------------------------------------------------------
    
    Attaching package: ‘plyr’
    
    The following objects are masked from ‘package:dplyr’:
    
        arrange, desc, failwith, id, mutate, summarise, summarize
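The masking follows R's search path: a bare function name resolves to the first attached environment that defines it, and the most recently attached package sits closest to the front. A minimal sketch of that rule, using two plain lists as hypothetical stand-ins for dplyr and plyr (fake_dplyr and fake_plyr are illustrative names, not real packages), so it runs without either package installed:

```r
# Hypothetical stand-ins (plain lists, NOT the real packages) that both
# define a function called summarise().
fake_dplyr <- list(summarise = function(...) "grouped summary (fake dplyr)")
fake_plyr  <- list(summarise = function(...) "one-row summary (fake plyr)")

# attach() inserts each list at position 2 of search(), so whatever is
# attached last is found first; attach() also prints a
# "The following object is masked from ..." message, as library() does.
attach(fake_dplyr, name = "fake_dplyr")
attach(fake_plyr, name = "fake_plyr")

first_hit <- utils::find("summarise")  # search-path order: fake_plyr first
before <- summarise()                  # resolves to fake_plyr's version

# Detaching the masking entry exposes the earlier one again,
# analogous to detach(package:plyr).
detach("fake_plyr")
after <- summarise()                   # now resolves to fake_dplyr's version

detach("fake_dplyr")                   # clean up the search path
```

With the real packages, utils::find("summarise") after library(dplyr); library(plyr) should likewise list "package:plyr" before "package:dplyr".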
    

    So, to get your code working, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

    library(dplyr)
    dfx %>% group_by(group, sex) %>% 
      summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
    Source: local data frame [6 x 4]
    Groups: group
    
      group sex  mean    sd
    1     A   F 41.51  8.24
    2     A   M 32.23 11.85
    3     B   F 38.79 11.93
    4     B   M 31.00  7.92
    5     C   F 24.97  7.46
    6     C   M 36.17  9.11
    

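    Alternatively, you can call dplyr's summarise explicitly in your code, so the correct function is called no matter the order in which you load the packages: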

    dfx %>% group_by(group, sex) %>% 
      dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
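The pkg::fun form looks the function up directly in that package's namespace instead of walking the search path, which is why it cannot be masked. A minimal base-R sketch of the same effect, assuming only the stats package; the global sd() here is a deliberately introduced, hypothetical mask playing the role plyr::summarise plays above:

```r
# Deliberately mask stats::sd with a global function of the same name,
# standing in for plyr::summarise masking dplyr::summarise.
sd <- function(x, ...) "masked"

masked_result   <- sd(c(1, 2, 3))         # bare name: the global sd() is found first
explicit_result <- stats::sd(c(1, 2, 3))  # namespace lookup ignores the mask

rm(sd)  # remove the mask again
```

Here masked_result is "masked" while explicit_result is 1, the standard deviation of 1, 2, 3.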
    

    [Comments]:

    • I don't understand why so few people notice this warning :/
    • @hadley fortunes::fortune(9)
    [Answer 2]:

    Because of the order in which you loaded "plyr" and "dplyr", your code is calling plyr::summarise instead of dplyr::summarise.

    Demo:

    library(dplyr) ## I'm guessing this is the order you loaded
    library(plyr)
    dfx %>% group_by(group, sex) %>% 
      summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
    #    mean   sd
    # 1 36.88 9.76
    dfx %>% group_by(group, sex) %>% 
      dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
    # Source: local data frame [6 x 4]
    # Groups: group
    # 
    #   group sex  mean    sd
    # 1     A   F 32.17  6.30
    # 2     A   M 30.98  7.37
    # 3     B   F 38.20  7.67
    # 4     B   M 33.12 12.24
    # 5     C   F 43.91 10.31
    # 6     C   M 47.53  8.25
    

    [Comments]:
