【Question title】: How to create a function to easily manipulate data multiple times?
【Posted】: 2021-10-30 12:59:06
【Question】:

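I'm trying to create a function that I can use on multiple occasions to easily manipulate/subset a larger dataset, without having to repeat the same dplyr commands over and over. Here is my example code.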

library(tidyverse)

# Mock data.

data <- tibble(
  p = rep("Par_1", 40),
  v = c(rep("slo", 20), rep("mod", 20)),
  n = c(rep(1, 10), rep(2, 10), rep(1, 10), rep(2, 10)),
  r = c(rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5),
        rep(800, 5), rep(100, 5), rep(800, 5), rep(100, 5)),
  x_tib_1 = runif(40, min = .100, max = .300),
  x_lum_1 = runif(40, min = .100, max = .300),
  x_tho_1 = runif(40, min = .100, max = .300),
  x_sen_1 = runif(40, min = .100, max = .300),
  x_ves_1 = runif(40, min = .100, max = .300),
)

# Function to manipulate data.

test_function <- function(x, y, z){
  
  output <- data %>%
    filter(v == x, n == y, r == z) %>%
    select(p, ifelse(z == 800, matches("tib | lum | tho"),
                                  c(5:9))) %>% # i.e. x_tib_1:x_ves_1.
    group_by(p) %>%
    summarise_all(list(mean)) %>%
    mutate(across(where(is.numeric), round, 3)) # Would usually have na.rm = TRUE here to account for NA.
  
  return(output)
}

# Test function.

test <- test_function("slo", 1, 800)

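My goal is to pass values of data$v, data$n and data$r into test_function, which are then used to filter the dataset. After that, I only want to select certain columns depending on the value of z in test_function (800 or 100). I don't know whether I'm using dplyr::matches correctly here, but if z is 800 I only want to select p, x_tib_1, x_lum_1 and x_tho_1.

The code currently doesn't run, so any help would be appreciated.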

【Discussion】:

    Tags: r function dplyr


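    【Solution 1】:

    ifelse is vectorised: its result has the same length as its test condition, so with a single value of z it returns at most one element of the chosen branch rather than the whole column selection. For a single condition we can use if/else instead. In addition, the regex pattern used in matches, "tib | lum | tho", contains spaces, and those spaces do not occur in the column names, so nothing would match.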
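Both problems can be reproduced in a few lines of base R. This is a minimal sketch: it uses grepl() as a stand-in for the regular-expression matching that matches() performs on column names, and the vector of names is simply the mock data's x_ columns.

```r
# ifelse() is vectorised over its test: a length-1 condition gives a
# length-1 result, silently dropping the rest of the "yes" vector.
picked <- ifelse(TRUE, c(5, 6, 7, 8, 9), 0)
print(picked)     # only the first element survives

# A plain if/else returns the whole branch that was chosen.
picked_if <- if (TRUE) c(5, 6, 7, 8, 9) else 0
print(picked_if)

# Spaces in a regex are literal: "tib | lum | tho" looks for "tib ",
# " lum " or " tho", none of which occur in these names.
cols <- c("x_tib_1", "x_lum_1", "x_tho_1", "x_sen_1", "x_ves_1")
with_spaces    <- grepl("tib | lum | tho", cols)
without_spaces <- grepl("tib|lum|tho", cols)
print(with_spaces)     # all FALSE
print(without_spaces)  # TRUE for the first three names only
```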

    test_function <- function(data, x, y, z){
      
      data %>%
        filter(v == x, n == y, r == z) %>%
        # c(5:9) is x_tib_1:x_ves_1 in the mock data
        select(p, if (z == 800) matches("tib|lum|tho") else c(5:9)) %>%
        group_by(p) %>%
        summarise_all(list(mean)) %>%
        mutate(across(where(is.numeric), ~ round(.x, 3)))
    }
    

    -Test:

    > test_function(data, "slo", 1, 800)
    # A tibble: 1 x 4
      p     x_tib_1 x_lum_1 x_tho_1
      <chr>   <dbl>   <dbl>   <dbl>
    1 Par_1   0.188   0.182   0.229
    >  test_function(data, "slo", 1, 100)
    # A tibble: 1 x 6
      p     x_tib_1 x_lum_1 x_tho_1 x_sen_1 x_ves_1
      <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    1 Par_1    0.22   0.226   0.182   0.214   0.225
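As an alternative sketch (not part of the original answer), the column choice can be made by name with all_of() instead of by position with c(5:9), so the function keeps working if the columns are reordered, and summarise(across(...)) replaces the superseded summarise_all(). The name summarise_subset and the hard-coded column vectors are illustrative assumptions; .env$x makes explicit that x, y and z come from the function arguments rather than from columns of the data. Because the mock data is random, only the shape of the result is checked below.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: filter on v/n/r, keep columns chosen by name, then
# average per participant. na.rm = TRUE handles NA as the question wanted.
summarise_subset <- function(data, x, y, z) {
  # Assumed rule, mirroring the answer: three columns for r == 800, all five otherwise
  keep <- if (z == 800) {
    c("x_tib_1", "x_lum_1", "x_tho_1")
  } else {
    c("x_tib_1", "x_lum_1", "x_tho_1", "x_sen_1", "x_ves_1")
  }

  data %>%
    filter(v == .env$x, n == .env$y, r == .env$z) %>%
    select(p, all_of(keep)) %>%
    group_by(p) %>%
    # across() skips the grouping column p automatically
    summarise(across(everything(), ~ round(mean(.x, na.rm = TRUE), 3)))
}

# Same layout as the question's mock data
set.seed(1)
data <- tibble(
  p = rep("Par_1", 40),
  v = rep(c("slo", "mod"), each = 20),
  n = rep(rep(c(1, 2), each = 10), 2),
  r = rep(rep(c(800, 100), each = 5), 4),
  x_tib_1 = runif(40, min = .100, max = .300),
  x_lum_1 = runif(40, min = .100, max = .300),
  x_tho_1 = runif(40, min = .100, max = .300),
  x_sen_1 = runif(40, min = .100, max = .300),
  x_ves_1 = runif(40, min = .100, max = .300)
)

res_800 <- summarise_subset(data, "slo", 1, 800)
res_100 <- summarise_subset(data, "slo", 1, 100)
print(res_800)
print(res_100)
```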
    
