
【Question title】: How to replicate a function using mapply with multiple arguments to calculate the power of a method?
【Posted】: 2021-12-20 10:59:56
【Question description】:

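I have independent and dependent datasets. I want to test all possible relationships between the dependent and independent variables, and finally calculate the power of the method.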

# dependent dataset
test_A <- data.frame(matrix(rnorm(100), nrow = 10, ncol = 10))
# independent dataset
test_B <- data.frame(matrix(sample(c(0, 1, 2), 500, replace = TRUE), nrow = 50, ncol = 10))
# Find all combinations of the dependent and independent datasets' variables
A_B_pair <- expand.grid(c1 = names(test_A), c2 = names(test_B),
                        stringsAsFactors = FALSE)

# Main function: slope estimate and its p-value for one (x, y) pair of columns
test_function <- function(x, y) {
  c1 <- test_A[[x]]
  c2 <- test_B[[y]]
  # test_A has 10 rows and test_B has 50, so data.frame() recycles c1 five times
  Data <- data.frame(XX = c1, YY = c2)

  model_lm <- lm(YY ~ XX, data = Data)
  coefs <- summary(model_lm)$coefficients
  est_lm <- coefs["XX", "Estimate"]
  pvalue_lm <- coefs["XX", "Pr(>|t|)"]

  c(lm.estimator = est_lm, lm.pvalue = pvalue_lm)
}

# Use mapply to get the estimates and p-values for all pairs
output <- mapply(test_function, x = A_B_pair$c1, y = A_B_pair$c2, USE.NAMES = FALSE)

# Transpose the output so that each row is one pair
output.data <- data.frame(t(output))

# Put the dependent and independent variable names, their estimates and p-values in one data frame
output_final <- cbind(A_B_pair, output.data)
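The pipeline above (expand.grid(), then mapply(), then t(), then cbind()) can be seen on a small scale. The sketch below is illustrative only: it uses two toy data frames and a hypothetical stand-in function `pair_stats()` in place of test_function, just to show the shapes involved. expand.grid() lists every column pair (with the question's 10 × 10 columns that is 100 pairs), mapply() returns one column per pair, and t() turns that into one row per pair.

```r
# Two toy data frames; the column names and values are hypothetical.
toy_A <- data.frame(a1 = c(1, 2, 3, 4), a2 = c(0, 0, 1, 1), a3 = c(5, 5, 5, 5))
toy_B <- data.frame(b1 = c(0, 1, 2, 1), b2 = c(2, 2, 1, 1))

# Every (toy_A column, toy_B column) combination; c1 varies fastest.
pairs <- expand.grid(c1 = names(toy_A), c2 = names(toy_B),
                     stringsAsFactors = FALSE)

# Hypothetical stand-in for test_function: returns a named length-2 vector.
pair_stats <- function(x, y) {
  c(sum_x = sum(toy_A[[x]]), sum_y = sum(toy_B[[y]]))
}

# mapply() simplifies the length-2 results into a 2 x 6 matrix
# (one row per statistic, one column per pair) ...
out <- mapply(pair_stats, pairs$c1, pairs$c2, USE.NAMES = FALSE)

# ... so t() is needed to get one row per pair before cbind().
result <- cbind(pairs, data.frame(t(out)))
print(result)
```

With the question's data the same steps give a 100-row output_final, one row per (c1, c2) pair. `USE.NAMES = FALSE` only stops mapply() from labelling the columns with the repeated c1 names; it does not change the values.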

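My problem is that I need to replicate this function 100 times to check the power of the method and estimate the parameters. The power will be calculated with the following command: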

power <- mean(output_final$lm.pvalue <= 0.05)
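The command above relies on a common R idiom: `mean()` of a logical vector is the proportion of `TRUE` values, so it returns the share of tests with p ≤ 0.05. A toy illustration with made-up p-values (not results from the models above):

```r
# Made-up p-values for illustration only.
pvals <- c(0.001, 0.03, 0.20, 0.049, 0.70)

# pvals <= 0.05 is a logical vector; mean() treats TRUE as 1 and FALSE as 0,
# so the result is the proportion of tests with p <= 0.05.
power <- mean(pvals <= 0.05)
print(power)  # 0.6
```

Because test_A and test_B are simulated independently of each other in this example, every null hypothesis is true, so this proportion estimates the type I error rate (it should come out close to 0.05); estimating power in the strict sense would require simulated data with a real association.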

How can I do this?

【Question discussion】:

Tags: r function mapply replicate

【Solution 1】:

You can try:

main_fn <- function() {
  # dependent dataset
  test_A <- data.frame(matrix(rnorm(100), nrow = 10, ncol = 10))
  # independent dataset
  test_B <- data.frame(matrix(sample(c(0, 1, 2), 500, replace = TRUE), nrow = 50, ncol = 10))
  # Find all combinations of the dependent and independent datasets' variables
  A_B_pair <- expand.grid(c1 = names(test_A), c2 = names(test_B),
                          stringsAsFactors = FALSE)

  # Pass the freshly simulated datasets to test_function explicitly
  output <- mapply(function(x, y) test_function(test_A, test_B, x, y),
                   A_B_pair$c1, A_B_pair$c2, USE.NAMES = FALSE)
  output.data <- data.frame(t(output))
  # Return one row per pair for this replicate
  cbind(A_B_pair, output.data)
}

test_function <- function(test_A, test_B, x, y) {
  c1 <- test_A[[x]]
  c2 <- test_B[[y]]
  Data <- data.frame(XX = c1, YY = c2)

  model_lm <- lm(YY ~ XX, data = Data)
  coefs <- summary(model_lm)$coefficients
  est_lm <- coefs["XX", "Estimate"]
  pvalue_lm <- coefs["XX", "Pr(>|t|)"]

  c(lm.estimator = est_lm, lm.pvalue = pvalue_lm)
}

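# Run main_fn() 100 times and stack the 100 per-replicate data frames into one
result <- do.call(rbind, replicate(100, main_fn(), simplify = FALSE))
# Share of all p-values (over every pair and replicate) at or below 0.05
power <- mean(result$lm.pvalue <= 0.05)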
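A variant of the answer's approach that also records which replicate each row came from, so the combined table can later be summarised per pair or per replicate. This is a sketch under a few assumptions: the helper names `fit_pair()` and `run_once()`, the column name `sim`, and the seed are hypothetical; both simulated datasets use 50 rows so that no recycling occurs (the question's test_A has only 10); and lapply() over replicate indices is used instead of replicate().

```r
set.seed(2021)  # arbitrary seed, only so reruns give identical results

# Same regression as test_function: slope estimate and its p-value for one pair.
fit_pair <- function(dep, indep, x, y) {
  fit <- lm(YY ~ XX, data = data.frame(XX = dep[[x]], YY = indep[[y]]))
  coef(summary(fit))["XX", c("Estimate", "Pr(>|t|)")]
}

# Hypothetical helper: one simulated replicate, tagged with its index `sim`.
run_once <- function(sim, n = 50, p = 10) {
  test_A <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  test_B <- data.frame(matrix(sample(c(0, 1, 2), n * p, replace = TRUE),
                              nrow = n, ncol = p))
  pairs <- expand.grid(c1 = names(test_A), c2 = names(test_B),
                       stringsAsFactors = FALSE)
  # 2 x (p * p) matrix from mapply(), transposed to one row per pair
  stats <- t(mapply(function(x, y) fit_pair(test_A, test_B, x, y),
                    pairs$c1, pairs$c2, USE.NAMES = FALSE))
  data.frame(sim = sim, pairs,
             lm.estimator = stats[, 1], lm.pvalue = stats[, 2])
}

n_rep <- 100
result <- do.call(rbind, lapply(seq_len(n_rep), run_once))
power <- mean(result$lm.pvalue <= 0.05)
```

Expect this to take a little while, since it fits 100 × 100 = 10,000 linear models.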

【Discussion】:

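Thanks for the quick reply, @Ronak Shah. The final result has as many rows as the number of replications times the number of rows in A_B_pair (e.g., the result dataset is 10000 x 4). However, I want to check the p-values for each pair in A_B_pair, so I used the following code to get the desired output: ```cols <- c("c1", "c2"); final.results <- result %>% group_by_at(cols) %>% summarise(across(everything(), list(mean)), .groups = 'drop'); power <- final.results$lm.pvalue_1 <= 0.5```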
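Hi @F. Privé, I would like to use the bigstatsr package to get the same result, since my dataset is large. Could you help me apply the big_apply function to get the same result?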
@DataScientist It seems this question has been solved. If you want to add another component (big data) to it, you should open a new question focused on that.
@F.Privé, thanks. I have asked a new question about this. Please see the question (stackoverflow.com/questions/70080099/…)
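The per-pair check described in the first comment can also be done in base R. Since the question defines power as the share of p-values at or below 0.05, a per-pair power is that share computed within each (c1, c2) group; averaging the p-values first and then thresholding the mean, as in the dplyr snippet, measures a different quantity. A sketch with aggregate(), on a toy result table with made-up p-values (only two pairs and four replicates):

```r
# Toy long-format result: 2 pairs x 4 replicates (made-up p-values only).
result <- data.frame(
  c1 = rep(c("X1", "X2"), times = 4),
  c2 = "X1",
  lm.pvalue = c(0.01, 0.40, 0.03, 0.02, 0.60, 0.04, 0.04, 0.90)
)

# Per-pair power: share of that pair's replicates with p <= 0.05.
power_by_pair <- aggregate(lm.pvalue ~ c1 + c2, data = result,
                           FUN = function(p) mean(p <= 0.05))
names(power_by_pair)[3] <- "power"
print(power_by_pair)
```

Here pair (X1, X1) has three of four p-values at or below 0.05 (power 0.75) and pair (X2, X1) has two of four (power 0.5).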