【Question title】: Taking the mean of a column within a function and a for loop
【Posted】: 2021-12-07 12:50:48
【Question】:

I have the following function:

  compute_treatment_effects <- function(dataset, outcome, baseline_outcome, 
                                      covariates, 
                                      standardize){
  
  
  baseline_covariates <- c(baseline_outcome, covariates)
  
    
  dataset <- dataset %>%
    mutate(treat =ifelse(treatment_group == "trt", 1, 
                           ifelse(treatment_group == "control", 0, NA))) %>%
    filter(!is.na(treat))  
    
  if (standardize){
    dataset[,outcome] <- (dataset[,outcome] - mean(dataset[dataset$treat==0,outcome], na.rm=TRUE))/
      sd(dataset[dataset$treat==0,outcome], na.rm=TRUE)
  }
}

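The problem is that whenever the standardization step runs, I get an error:

"Error in is.data.frame(x) : 'list' object cannot be coerced to type 'double' In addition: Warning message: In mean.default(dataset[dataset$treat == 0, outcome], na.rm = TRUE)"

I really don't know why this happens; as far as I can tell, the syntax is correct everywhere.

Here is a sample data frame to use with the code: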

dataframe <- data.frame("var1" = c(1, 2, 5, 1, 642, 5, 1, 2, 5, 9, NA, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 ),
                 "Var2"  = c(1, 3, 5, 1, 642, 5, NA, NA, NA, NA, NA, NA, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 ), 
                 "var3"   = c(1, 2, 635, 9, NA, 1, 2, 5, NA, NA, 12, NA, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
                 "var4"  = c(1, 21, 15, 19, NA, 1, 26656, 56,6 , NA, 512, NA, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), 
                 "cov1" =  c(1, 22,335, 29, NA, NA, NA, 645, NA, NA, 12, NA, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
                 "cov2" =  c(44251, 2322,5, 29, 45, 35, 42, 645, 55, 525, NA, NA, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
                 "cov3" =  c(154, 2552,35, 53529, 5, 3, 53542, 645, 25, 2, 12, 23, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10))


dataframe <- dataframe %>%
  mutate(treatment_group = ifelse(var3 == 2, "trt", ifelse(var3 == 10, "control", NA)))
dataset <- dataframe
outcome <- "Var2"
baseline_outcome <- "var1"
covariates = c("cov1", "cov2","cov3")

Thank you very much!!!

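【Comments】:

  • Your function only shows the if branch; what happens in the else case? Also, it would be better to return the object.
  • Yes, sorry, I only posted the beginning of the function. It is much longer and it should return the dataset, but right now the real problem is the standardization step. In fact, if I just run "mean(dataset[dataset$treat==0,outcome], na.rm=TRUE)", it tells me "argument is not numeric or logical: returning NA", even outside the function...
  • I can't reproduce the error with your function. It works fine. I added return(dataset), but it works even without it.
  • Could it be an R version issue?
  • Thank you so much, akrun, this solved it! As always, your knowledge and intuition amaze me!!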

Tags: r dplyr tidyverse stat


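【Solution 1】:

The OP's original dataset is probably a tibble or a data.table. With either of those, dataset[rows, column] does not reduce the single selected column to a vector, because both behave as drop = FALSE, in contrast to a data.frame, where [ drops a single column to a vector by default (drop = TRUE). mean() and sd() then receive a one-column table (a list) instead of a numeric vector: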

> compute_treatment_effects(as_tibble(dataset), outcome, baseline_outcome, covariates, standardize = TRUE)

Error in is.data.frame(x) : 'list' object cannot be coerced to type 'double' In addition: Warning message: In mean.default(dataset[dataset$treat == 0, outcome], na.rm = TRUE) : argument is not numeric or logical: returning NA
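
The difference in subsetting can be checked directly on a small example. This is a minimal sketch; the objects df, tbl and col are made up for illustration and are not part of the question's data.

```r
library(tibble)

# Toy data, assumed purely for illustration
df  <- data.frame(x = c(1, 2, 3), g = c(0, 0, 1))
tbl <- as_tibble(df)
col <- "x"

# data.frame: selecting one column with `[` drops to a plain numeric vector
df_sub <- df[df$g == 0, col]
is.numeric(df_sub)        # TRUE

# tibble: the same call keeps a one-column tibble (a list underneath),
# which mean() and sd() cannot treat as a numeric vector
tbl_sub <- tbl[tbl$g == 0, col]
is.data.frame(tbl_sub)    # TRUE
is.numeric(tbl_sub)       # FALSE
```

Because the result for the tibble is still a table, mean() returns NA with a warning and sd() stops with the coercion error quoted above.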


The fix is to convert the input to a data.frame with as.data.frame

compute_treatment_effects(as.data.frame(dataset), outcome, baseline_outcome, covariates, standardize = TRUE)

-output

   var1 Var2 var3  var4 cov1 cov2  cov3 treatment_group treat
1     2 -Inf    2    21   22 2322  2552             trt     1
2     1   NA    2 26656   NA   42 53542             trt     1
3    10  NaN   10    10   10   10    10         control     0
4    10  NaN   10    10   10   10    10         control     0
5    10  NaN   10    10   10   10    10         control     0
6    10  NaN   10    10   10   10    10         control     0
7    10  NaN   10    10   10   10    10         control     0
8    10  NaN   10    10   10   10    10         control     0
9    10  NaN   10    10   10   10    10         control     0
10   10  NaN   10    10   10   10    10         control     0
11   10  NaN   10    10   10   10    10         control     0
12   10  NaN   10    10   10   10    10         control     0
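
The -Inf and NaN values in Var2 above are a property of the sample data rather than of the subsetting fix: every control row has Var2 equal to 10, so the control standard deviation is 0 and the division is by zero. A short sketch of that arithmetic, using values read off the sample data frame (the variable names here are illustrative):

```r
# Var2 for the ten control rows in the sample data (all equal to 10)
control_var2 <- rep(10, 10)

ctrl_mean <- mean(control_var2)   # 10
ctrl_sd   <- sd(control_var2)     # 0, because there is no variation

# Treated row with Var2 = 3: a negative number divided by zero
trt_z  <- (3 - ctrl_mean) / ctrl_sd    # -Inf

# Control rows with Var2 = 10: zero divided by zero
ctrl_z <- (10 - ctrl_mean) / ctrl_sd   # NaN
```

With real data that has variation in the control group, the standardized values would be finite.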

Or change the function so that it subsets the column with [[ instead of [, i.e.

compute_treatment_effects <- function(dataset, outcome, baseline_outcome, 
                                       covariates, 
                                       standardize){
  
  
   baseline_covariates <- c(baseline_outcome, covariates)
  
    
   dataset <- dataset %>%
     mutate(treat =ifelse(treatment_group == "trt", 1, 
                            ifelse(treatment_group == "control", 0, NA))) %>%
     filter(!is.na(treat))  
    
   if (standardize){
     dataset[[outcome]] <- (dataset[[outcome]] - 
       mean(dataset[[outcome]][dataset$treat==0], na.rm=TRUE))/
   
       sd(dataset[[outcome]][dataset$treat==0], na.rm=TRUE)
   }
   dataset
 }
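
To see that the [[ approach gives the same result regardless of the input class, here is a reduced sketch. The helper standardize_outcome and the toy data are hypothetical; the helper assumes a treat column already exists and covers only the standardization step of the function above.

```r
library(tibble)

# Hypothetical helper: standardize `outcome` against the control rows
# (treat == 0), extracting the column with `[[` so it is always a vector.
standardize_outcome <- function(dataset, outcome) {
  control_values <- dataset[[outcome]][dataset$treat == 0]
  dataset[[outcome]] <- (dataset[[outcome]] - mean(control_values, na.rm = TRUE)) /
    sd(control_values, na.rm = TRUE)
  dataset
}

# Toy data, assumed for illustration: control mean is 3, control sd is 2
toy <- data.frame(y = c(1, 3, 5, 9), treat = c(0, 0, 0, 1))

res_df  <- standardize_outcome(toy, "y")
res_tbl <- standardize_outcome(as_tibble(toy), "y")

# Both input classes give the same standardized column: -1, 0, 1, 3
all.equal(res_df$y, res_tbl$y)
```

Extracting with [[ first and then filtering the resulting vector avoids relying on the drop behaviour of [ altogether.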
