【Title】: Resampling and looping using dplyr is not working [duplicate]
【Posted】: 2019-09-15 19:59:49
【Question】:

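I previously asked this question about resampling and looping with dplyr functions. The accepted solution worked fine earlier, but now, instead of returning 8000 values, it produces only a single mean and a single variance. My R installation also keeps throwing errors related to the "stringi" package and has trouble recognizing it even though it is installed. Are the two problems related? If not, how can I get the 8000 values back instead of one mean and one variance?

The code I am currently running is: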

  library(dplyr)
  fertilizer <- c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P")

    crop <- c("alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group")

    level <- c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low")

    growth <- c(0,0,1,2,90,5,2,5,8,55,1,90,2,4,66,80,1,90,2,33,56,70,99,100,66,80,1,90,2,33,0,0,1,2,90,5,2,2,5,8,55,1,90,2,4,66,0,0)

    dat <- data.frame(fertilizer, crop, level, growth)
    dat %>% 
      group_by(fertilizer, crop, level) %>% 
      sample_n(3*1000, replace = T) %>% 
      mutate(sample_id = rep(1:1000, each = 3)) %>% 
      group_by(sample_id, add = TRUE) %>% 
      summarise(
        mean = mean(growth, na.rm = T),
        var = sd(growth)^2
      ) %>% 
      ungroup()

【Comments】:

  • It is not clear what you expect. If you add sample_id as a grouping variable together with the other groups, check the count for each group: dat %>% group_by(fertilizer, crop, level) %>% sample_n(3*1000, replace = T) %>% mutate(sample_id = rep(1:1000, each = 3)) %>% ungroup %>% count(fertilizer, crop, level, sample_id) returns # A tibble: 8,000 x 5, which means you would get 8000 values of mean and sd.
  • My question is: when I run the code above, why are 8000 values not shown after ungroup()?
  • Well, I do get 8000 values.
  • dat %>% group_by(fertilizer, crop, level) %>% sample_n(3*1000, replace = T) %>% mutate(sample_id = rep(1:1000, each = 3)) %>% group_by(sample_id, add = TRUE) %>% summarise(mean = mean(growth, na.rm = T), var = sd(growth)^2) %>% ungroup() returns # A tibble: 8,000 x 6
  • You may have loaded the plyr package in addition to dplyr. If summarise is masked by the function of the same name from plyr, try writing dplyr::summarise( instead of plain summarise(.

Tags: r for-loop dplyr


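【Solution 1】:

This is probably a case of a function being masked by a function of the same name from another package. It usually shows up when both plyr and dplyr are loaded. plyr::summarise ignores the grouping set up by group_by, so it collapses the whole data frame into a single row. For example, plyr is not loaded here, but calling summarise explicitly as plyr::summarise reproduces the same behavior: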

library(dplyr)
dat %>% 
  group_by(fertilizer, crop, level) %>% 
  sample_n(3*1000, replace = T) %>% 
  mutate(sample_id = rep(1:1000, each = 3)) %>% 
  group_by(sample_id, add = TRUE) %>% 
  # plyr::summarise is not group-aware, so all groups collapse into one row
  plyr::summarise(
    mean = mean(growth, na.rm = T),
    var = sd(growth)^2
  ) %>% 
  ungroup()
#      mean      var
#1 30.98258 1390.291
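
One way to check whether masking is the cause is to ask R which package a bare summarise resolves to in the current session. The following is only a sketch; the helper name summarise_owner is made up for illustration and is not part of dplyr or plyr.

```r
library(dplyr)

# Hypothetical helper: reports the namespace of the `summarise` function that
# a bare call would pick up from the search path.
summarise_owner <- function() {
  environmentName(environment(get("summarise", mode = "function")))
}

summarise_owner()

# TRUE if plyr is attached in this session and could therefore mask dplyr verbs
"package:plyr" %in% search()
```

If the helper reports plyr rather than dplyr, the bare summarise call in the pipeline is not the group-aware dplyr version.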

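The fix is to do one of the following:

1) Start a fresh session with only dplyr loaded

2) Stay in the same session and qualify the function with its package name using ::, that is, write dplyr::summarise( instead of plain summarise(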
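
As a rough illustration of option 2, the sketch below qualifies every verb with dplyr:: so the result does not depend on which packages are attached. The data frame toy and its column plot_id are invented stand-ins for the question's dat (same 2 x 2 x 2 design, random growth values), and sample_id is listed in a full group_by call rather than added with add = TRUE, so the code does not depend on that argument.

```r
library(dplyr)

set.seed(1)  # only to make the sketch reproducible

# Hypothetical toy data: 2 x 2 x 2 design with 6 rows per cell
toy <- expand.grid(
  fertilizer = c("N", "P"),
  crop = c("alone", "group"),
  level = c("low", "high"),
  plot_id = 1:6,
  stringsAsFactors = FALSE
)
toy$growth <- sample(0:100, nrow(toy), replace = TRUE)

res <- toy %>%
  dplyr::group_by(fertilizer, crop, level) %>%
  # draw 1000 resamples of size 3 within each of the 8 cells
  dplyr::sample_n(3 * 1000, replace = TRUE) %>%
  dplyr::mutate(sample_id = rep(1:1000, each = 3)) %>%
  # regroup by all four variables so each resample is summarised separately
  dplyr::group_by(fertilizer, crop, level, sample_id) %>%
  dplyr::summarise(
    mean = mean(growth, na.rm = TRUE),
    var = var(growth)
  ) %>%
  dplyr::ungroup()

nrow(res)  # 8 cells x 1000 resamples = 8000 rows
```

Because the calls are namespaced, the pipeline should return one row per cell and resample even if plyr was attached after dplyr.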
