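[Posted]: 2019-09-15 19:59:49
[Question]:
I previously asked this question about resampling and looping with dplyr functions. The accepted solution worked fine earlier, but now, instead of giving 8000 values, it produces only a single mean and a single variance. My R installation also keeps throwing errors related to the "stringi" package; even though it is installed, R has trouble recognizing it. Are these two problems related? If not, how can I get the 8000 values instead of one mean and one variance?
The code I am currently running is: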
library(dplyr)
fertilizer <- c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P")
crop <- c("alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group")
level <- c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low")
growth <- c(0,0,1,2,90,5,2,5,8,55,1,90,2,4,66,80,1,90,2,33,56,70,99,100,66,80,1,90,2,33,0,0,1,2,90,5,2,2,5,8,55,1,90,2,4,66,0,0)
dat <- data.frame(fertilizer, crop, level, growth)
dat %>%
  group_by(fertilizer, crop, level) %>%
  sample_n(3 * 1000, replace = TRUE) %>%
  mutate(sample_id = rep(1:1000, each = 3)) %>%
  group_by(sample_id, add = TRUE) %>%
  summarise(
    mean = mean(growth, na.rm = TRUE),
    var = sd(growth)^2
  ) %>%
  ungroup()
[Comments]:
-
It is not clear what you expect. If you add sample_id as a grouping variable alongside the other groups, check the count for each group:
dat %>%
  group_by(fertilizer, crop, level) %>%
  sample_n(3*1000, replace = T) %>%
  mutate(sample_id = rep(1:1000, each = 3)) %>%
  ungroup %>%
  count(fertilizer, crop, level, sample_id)
# A tibble: 8,000 x 5
That means you will get 8000 values for the mean and sd.
-
My question is: when I run the code above, why are 8000 values not shown after ungroup()?
-
Well, I do get 8000 values:
-
dat %>%
  group_by(fertilizer, crop, level) %>%
  sample_n(3*1000, replace = T) %>%
  mutate(sample_id = rep(1:1000, each = 3)) %>%
  group_by(sample_id, add = TRUE) %>%
  summarise(
    mean = mean(growth, na.rm = T),
    var = sd(growth)^2
  ) %>%
  ungroup()
# A tibble: 8,000 x 6
-
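Perhaps you have also loaded the plyr package along with dplyr. If summarise is being masked by the function of the same name from plyr, try writing dplyr::summarise( instead of plain summarise.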
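A minimal sketch of how one might check for the masking described above and sidestep it. The toy data frame and its column names (g, x) are made up for illustration and are not the asker's data. As far as I know, plyr's summarise does not understand dplyr groups, so if it is the version being called, the grouped data would collapse to a single row, which matches the symptom in the question.

```r
library(dplyr)

# Hypothetical toy data: two groups of three values each
toy <- data.frame(
  g = rep(c("a", "b"), each = 3),
  x = c(1, 2, 3, 10, 20, 30)
)

# Which package does a bare `summarise` resolve to right now?
# If plyr was attached after dplyr, this reports "plyr" instead of "dplyr".
environmentName(environment(summarise))

# Namespacing the call makes the result independent of the load order
res <- toy %>%
  group_by(g) %>%
  dplyr::summarise(mean = mean(x), var = var(x))

nrow(res)  # one row per group
```

If both packages are really needed, attaching plyr before dplyr is the usual way to keep the dplyr versions on top; otherwise the explicit dplyr:: prefix is the safer habit.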