How to create a function to easily manipulate data multiple times?

【Question title】: How to create a function to easily manipulate data multiple times?
【Posted】: 2021-10-30 12:59:06
【Question】:

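I am trying to create a function that I can use to easily manipulate/subset a larger dataset on multiple occasions, without having to repeat the dplyr commands over and over. Here is my example code.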

library(tidyverse)

# Mock data.

data <- tibble(
  p = rep("Par_1", 40),
  v = c(rep("slo", 20), rep("mod", 20)),
  n = c(rep(1, 10), rep(2, 10), rep(1, 10), rep(2, 10)),
  r = c(rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5),
        rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5)),
  x_tib_1 = runif(40, min = .100, max = .300),
  x_lum_1 = runif(40, min = .100, max = .300),
  x_tho_1 = runif(40, min = .100, max = .300),
  x_sen_1 = runif(40, min = .100, max = .300),
  x_ves_1 = runif(40, min = .100, max = .300),
)

# Function to manipulate data.

test_function <- function(x, y, z){
  
  output <- data %>%
    filter(v == x, n == y, r == z) %>%
    select(p, ifelse(z == 800, matches("tib | lum | tho"),
                                  c(5:9))) %>% # i.e. x_tib_1:x_ves_1.
    group_by(p) %>%
    summarise_all(list(mean)) %>%
    mutate(across(where(is.numeric), round, 3)) # Would usually have na.rm = TRUE here to account for NA.
  
  return(output)
}

# Test function.

test <- test_function("slo", 1, 800)

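My goal is to pass values of data$v, data$n, and data$r into test_function and use them to filter the dataset. Then I want to select only certain columns depending on the value of z (800 or 100). I'm not sure whether I'm using dplyr::matches correctly here, but if z is 800, I only want to select p, x_tib_1, x_lum_1, and x_tho_1.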

The code currently doesn't run, so any help would be appreciated.

【Comments】:

Tags: r function dplyr

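【Solution 1】:

ifelse requires all of its arguments to be the same length, and its result has the length of the test, so it is the wrong tool for choosing between two column selections. We can use if/else instead. In addition, the regex pattern passed to matches contains spaces, i.e. "tib | lum | tho", and those spaces don't occur in the column names, so nothing matches.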
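To see both problems in isolation, here is a small base-R sketch. The variable names (picked_ifelse, picked_if, cols, and so on) are made up for illustration, and grepl is used as a stand-in for how matches treats the pattern; it isn't part of the original code.

```r
z <- 800

# ifelse() shapes its result like the test (length 1 here),
# so only the first element of the chosen vector comes back.
picked_ifelse <- ifelse(z == 800, c("a", "b", "c"), c("d", "e"))

# A plain if/else returns the whole chosen object.
picked_if <- if (z == 800) c("a", "b", "c") else c("d", "e")

# Column names from the question's mock data.
cols <- c("p", "x_tib_1", "x_lum_1", "x_tho_1", "x_sen_1", "x_ves_1")

# With spaces, the alternatives are "tib ", " lum " and " tho",
# none of which occur in the column names.
with_spaces <- grepl("tib | lum | tho", cols)

# Without spaces, the three intended columns match.
without_spaces <- grepl("tib|lum|tho", cols)
```

Printing picked_ifelse gives a single "a", while picked_if gives all three letters; with_spaces is FALSE everywhere, and without_spaces is TRUE only for x_tib_1, x_lum_1, and x_tho_1.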

test_function <- function(data, x, y, z){

  data %>%
    filter(v == x, n == y, r == z) %>%
    select(p, if (z == 800) matches("tib|lum|tho") else c(5:9)) %>%
    group_by(p) %>%
    summarise_all(list(mean)) %>%
    mutate(across(where(is.numeric), round, 3))
}

-Testing

> test_function(data, "slo", 1, 800)
# A tibble: 1 x 4
  p     x_tib_1 x_lum_1 x_tho_1
  <chr>   <dbl>   <dbl>   <dbl>
1 Par_1   0.188   0.182   0.229
>  test_function(data, "slo", 1, 100)
# A tibble: 1 x 6
  p     x_tib_1 x_lum_1 x_tho_1 x_sen_1 x_ves_1
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Par_1    0.22   0.226   0.182   0.214   0.225
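The question's comments mention wanting na.rm = TRUE and describe c(5:9) as x_tib_1:x_ves_1, so one possible variant selects those columns by name and drops NAs when averaging. This is only a sketch under assumptions: the name test_function2 and the tiny demo tibble are hypothetical, and starts_with("x_") assumes every measurement column begins with "x_", as in the mock data.

```r
library(dplyr)

# Hypothetical variant of test_function(); not from the original answer.
test_function2 <- function(data, x, y, z) {
  data %>%
    filter(v == x, n == y, r == z) %>%
    # Select by name rather than position, so column order doesn't matter.
    # Assumes all measurement columns start with "x_".
    select(p, if (z == 800) matches("tib|lum|tho") else starts_with("x_")) %>%
    group_by(p) %>%
    # na.rm = TRUE skips missing values when averaging.
    summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}

# Tiny made-up data set containing one NA, for illustration only.
demo <- tibble(
  p = "Par_1", v = "slo", n = 1, r = c(800, 800, 100),
  x_tib_1 = c(0.1, NA, 0.3),
  x_lum_1 = c(0.2, 0.4, 0.3),
  x_sen_1 = c(0.5, 0.5, 0.3)
)

test_function2(demo, "slo", 1, 800)
test_function2(demo, "slo", 1, 100)
```

Writing the rounding step as a formula (~ round(.x, 3)) instead of passing 3 as an extra argument to across is a style choice that newer dplyr versions prefer; the result is the same.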

【Comments】: