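【Posted】:2021-10-30 12:59:06
【Question】:
I am trying to write a function that I can reuse to manipulate and subset a larger dataset in several places, so that I don't have to repeat the same dplyr commands over and over. Here is my example code.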
library(tidyverse)

# Mock data.
data <- tibble(
  p = rep("Par_1", 40),
  v = c(rep("slo", 20), rep("mod", 20)),
  n = c(rep(1, 10), rep(2, 10), rep(1, 10), rep(2, 10)),
  r = c(rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5),
        rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5)),
  x_tib_1 = runif(40, min = .100, max = .300),
  x_lum_1 = runif(40, min = .100, max = .300),
  x_tho_1 = runif(40, min = .100, max = .300),
  x_sen_1 = runif(40, min = .100, max = .300),
  x_ves_1 = runif(40, min = .100, max = .300),
)

# Function to manipulate data.
test_function <- function(x, y, z){
  output <- data %>%
    filter(v == x, n == y, r == z) %>%
    select(p, ifelse(z == 800, matches("tib | lum | tho"),
                     c(5:9))) %>% # i.e. x_tib_1:x_ves_1.
    group_by(p) %>%
    summarise_all(list(mean)) %>%
    mutate(across(where(is.numeric), round, 3)) # Would usually have na.rm = TRUE here to account for NA.
  return(output)
}

# Test function.
test <- test_function("slo", 1, 800)
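My goal is to pass values of data$v, data$n, and data$r to test_function and use them to filter the dataset. I then want to select only certain columns depending on the value of z (800 or 100) passed to test_function. I am not sure whether I am using dplyr::matches correctly here, but if z is 800, I only want to select p, x_tib_1, x_lum_1, and x_tho_1.
The code currently does not run, so any help would be appreciated.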
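For reference, here is one possible way to get the behaviour described above. It is a sketch rather than a definitive fix, and it rests on two observations about the selection step. First, the spaces in "tib | lum | tho" are matched literally, so that pattern does not match any of the column names. Second, ifelse() returns a result the same length as its test, so with a single TRUE/FALSE it can only return one value, not a set of columns. The version below works out the column names with a plain if/else before the pipeline and passes them to all_of(). Passing the data frame in as an argument, naming the helper variables x_cols and keep, calling set.seed(), and using grep() in place of matches() are choices made for this illustration, not part of the original code.

```r
library(dplyr)
library(tibble)

set.seed(1)  # assumption: only here to make the mock data reproducible

# Same mock data as in the question, written with rep(each = , times = ).
data <- tibble(
  p = rep("Par_1", 40),
  v = rep(c("slo", "mod"), each = 20),
  n = rep(rep(c(1, 2), each = 10), times = 2),
  r = rep(rep(c(800, 100), each = 5), times = 4),
  x_tib_1 = runif(40, min = 0.1, max = 0.3),
  x_lum_1 = runif(40, min = 0.1, max = 0.3),
  x_tho_1 = runif(40, min = 0.1, max = 0.3),
  x_sen_1 = runif(40, min = 0.1, max = 0.3),
  x_ves_1 = runif(40, min = 0.1, max = 0.3)
)

# The data frame is an explicit argument here instead of a global object.
test_function <- function(data, x, y, z) {
  # All measurement columns, i.e. x_tib_1 to x_ves_1.
  x_cols <- grep("^x_", names(data), value = TRUE)

  # Plain if/else (not ifelse) so a whole vector of names can be returned.
  # The pattern has no spaces around "|".
  keep <- if (z == 800) {
    grep("tib|lum|tho", x_cols, value = TRUE)
  } else {
    x_cols
  }

  data %>%
    filter(v == x, n == y, r == z) %>%
    select(p, all_of(keep)) %>%
    group_by(p) %>%
    # Mean of every remaining column, rounded to 3 decimals, ignoring NA.
    summarise(across(everything(), ~ round(mean(.x, na.rm = TRUE), 3)),
              .groups = "drop")
}

test_800 <- test_function(data, "slo", 1, 800)
test_100 <- test_function(data, "slo", 1, 100)
```

With the mock data, test_800 should contain p plus the three tib/lum/tho columns, and test_100 should contain p plus all five x_ columns, each as a single summary row for Par_1.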
【Comments】: