【Question title】: How do I fill the subset argument with a variable?
【Posted】: 2020-10-25 22:48:26
【Question】:

I want to pass a variable to the `subset` argument of `lm()` so that I can put the whole thing inside a function:


    formula <- paste0(response_name, " ~ .")
    
    if (subset_filter != "") {
      subset_filter <- "G3 < 10"   # example filter value
      
      model <- 
        lm(as.formula(formula),
           subset = subset_filter,   # a character string here is not applied as a row filter
           data = train_dataset)
      
    } else {
      model <- 
        lm(as.formula(formula),
           data = train_dataset)
    }

My dataset is loaded like this:

    library(readr)   # read_csv()
    library(dplyr)   # %>%, as_tibble()
    
    student_performance <-
      read_csv("https://raw.githubusercontent.com/UBC-MDS/ellognea-smwatts-student-performance/master/data/student-math-perf.csv") %>% 
      as_tibble()

My response variable is `G3`, and I split the data into training and test sets with this code:


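    library(caret)   # createDataPartition()
    library(purrr)   # as_vector()
    library(tibble)  # as_tibble()
    
    split_sets <- function(dataset, 
                           response_name,
                           output_set_type){
    
      set.seed(1)
      # Row indices for an 80% training partition; for a numeric response,
      # createDataPartition() samples within quantile groups of that response
      training.samples <- createDataPartition(as_vector(dataset[response_name]), 
                                              p = 0.8,
                                              list = FALSE)
      
      train.data <- suppressWarnings(dataset[training.samples, ])
      test.data <- suppressWarnings(dataset[-training.samples, ])  
      
      l <- list()
      
      l[["train.data"]] <-
        train.data
      
      l[["test.data"]] <-
        test.data
      
      ifelse(output_set_type == "train", 
             return(as_tibble(l$train.data)), 
             return(as_tibble(l$test.data)))
    
    }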

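I want to do it this way so that I can pass a filter into the `subset_filter` argument and have it applied when the model is fitted.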

【Discussion】:

    Tags: r


    【Solution 1】:

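    If the filter has to be passed as a string, we can `parse()` it into an expression and then `eval()` that expression against the columns of the data.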
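    A minimal sketch of those two steps in isolation. It assumes R's built-in `mtcars` data as a stand-in for the student data and uses `"cyl == 4"` purely as an example filter string:

```r
# Assumption: mtcars (shipped with base R) stands in for train_dataset.
subset_filter <- "cyl == 4"

# Step 1: parse() turns the character string into an unevaluated expression.
filter_expr <- parse(text = subset_filter)[[1]]
class(filter_expr)   # "call", not "character"

# Step 2: eval() computes that expression using the data frame's columns,
# giving the logical row selector that lm()'s `subset` argument accepts.
keep_rows <- eval(filter_expr, envir = mtcars)

fit <- lm(mpg ~ wt, data = mtcars, subset = keep_rows)
nobs(fit)   # should equal sum(mtcars$cyl == 4)
```

    The point is that `subset` needs something that evaluates to a logical or index vector; a plain string only becomes one after it has been parsed and evaluated in the context of the data.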

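    library(caret)
    library(readr)
    
    create_model <- function(data, response_name, subset_filter) {
    
      formula <- paste0(response_name, " ~ .")
    
      if (subset_filter != "") {
        # Parse the filter string and evaluate it against the columns of `data`;
        # the result is the logical row selector that `subset` expects
        model <- lm(as.formula(formula),
                    subset = eval(parse(text = subset_filter), envir = data),
                    data = data)
      } else {
        model <- lm(as.formula(formula),
                    data = data)
      }
    
      # Store the formula in place of the full call so print() shows a short header
      model$call <- as.formula(formula)
      return(model)
    }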
    

    - Apply the function to the data:

    create_model(train_dat, "G3", "G3 < 10" )
    #Call:
    #G3 ~ .
    
    #Coefficients:
    #     (Intercept)          schoolMS              sexM               age          addressU        famsizeLE3  
    #        -4.42602           1.18145           0.15315          -0.16790           1.11708          -0.08173  
    #        PstatusT              Medu              Fedu        Mjobhealth         Mjobother      Mjobservices  
    #         1.32870           1.00518          -0.62716          -1.98356          -1.31388          -0.94443  
    #     Mjobteacher        Fjobhealth         Fjobother      Fjobservices       Fjobteacher        reasonhome  
    #        -1.28718           0.03242           0.02968           0.32962          -1.53201          -2.10665  
    #     reasonother  reasonreputation    guardianmother     guardianother        traveltime         studytime  
    #        -0.51770           0.22395          -0.29893           1.85975          -0.39072          -1.56920  
    #        failures      schoolsupyes         famsupyes           paidyes     activitiesyes        nurseryyes  
    #        -0.17344           2.35607           0.35207           0.29857          -0.91373           0.09838  
    #       higheryes       internetyes       romanticyes            famrel          freetime             goout  
    #         1.06065          -0.58727           0.09469           0.69217           0.25081           0.14379  
    #            Dalc              Walc            health          absences                G1                G2  
    #       -1.39164           0.60450           0.86492           0.12033           0.11660           0.78624  
    

    Here, `train_dat` was created with:
    library(purrr)   # as_vector()
    library(tibble)  # as_tibble()
    
    split_sets <- function(dataset, 
                           response_name,
                           output_set_type){
    
      set.seed(1)
      training.samples <- createDataPartition(as_vector(dataset[response_name]), 
                                              p = 0.8,
                                              list = FALSE)
      
      train.data <- suppressWarnings(dataset[training.samples, ])
      test.data <- suppressWarnings(dataset[-training.samples, ])  
      
      l <- list()
      l[["train.data"]] <- train.data
      l[["test.data"]] <- test.data
      
      # Use if/else for control flow; ifelse() is a vectorised function,
      # not a replacement for an if statement
      out <- if (output_set_type == "train") {
        as_tibble(l$train.data)
      } else {
        as_tibble(l$test.data)
      }
      
      return(out)
    }
    
    train_dat <- split_sets(student_performance, "G3", "train")
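    A possible variation, not taken from the answer above: rather than evaluating the filter before calling `lm()`, splice the parsed expression into the call with `bquote()`, so that `lm()` evaluates it inside `data` itself and the stored call shows the filter. The function name `create_model_spliced()` is hypothetical, `str2lang()` requires R 3.6.0 or later, and `mtcars` is used only as example data:

```r
# Hypothetical helper; mtcars is only example data.
create_model_spliced <- function(data, response_name, subset_filter = "") {
  # Build `response ~ .` as a formula object instead of pasting strings.
  fml <- reformulate(".", response = response_name)

  if (nzchar(subset_filter)) {
    # str2lang() parses a single string into a single expression.
    cond <- str2lang(subset_filter)
    # bquote() substitutes .(fml) and .(cond) into the call; lm() then
    # evaluates `cond` within `data`, as if the filter had been typed in.
    eval(bquote(lm(.(fml), data = data, subset = .(cond))))
  } else {
    lm(fml, data = data)
  }
}

fit <- create_model_spliced(mtcars, "mpg", "cyl == 4")
fit$call   # the stored call now contains the filter expression
```

    With this design there is no need to overwrite `model$call` afterwards, because the call that `lm()` records already contains the literal filter.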
    

    【Discussion】:

    • I get the error `Error in eval(parse(text = subset_filter), envir = data) : invalid 'envir' argument of type 'closure'`
    • @potato It works fine for me with the data you showed.
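    One likely cause of that error, offered as an assumption rather than a diagnosis: the line `envir = data` only works where `data` is bound to the data frame, as it is inside `create_model()`. Elsewhere (for example, in a function whose argument is named `train_dataset`) R finds the function `utils::data()`, which is a closure, and `eval()` rejects it. A small reproduction, with `train_dataset` filled from `mtcars` purely for illustration:

```r
# No object called `data` is defined here, so R finds the function
# utils::data() (a closure), which eval() cannot use as an environment.
msg <- tryCatch(
  eval(parse(text = "cyl == 4"), envir = data),
  error = function(e) conditionMessage(e)
)
print(msg)   # an "invalid 'envir' argument" error message

# Passing the actual data frame works; `train_dataset` mirrors the name
# used in the question and is filled with mtcars only for this example.
train_dataset <- mtcars
keep <- eval(parse(text = "cyl == 4"), envir = train_dataset)
sum(keep)
```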