Why do I get an "unused argument (na.action = NULL)" error?

[Question title]: Why do I get "unused argument (na.action = NULL)" error in aggregate?
[Posted]: 2022-01-17 20:17:37
[Question]:

I am aggregating data that contains NAs, so I include na.action = NULL, as described here. This is the working code:

# Toy data.
df <- data.frame(x= 1:10, group= rep(1:2, 5), other_var= rnorm(10))
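# Aggregate with a formula. The formula is passed positionally because its
# argument is named x from R 4.2.0 on (it was named formula before).
aggregate(x ~ group, data = df, na.action = NULL, FUN = function(i) sum(i))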

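In my case I cannot write the variable names into a formula, because they can change. Instead, I pass them through the x and by arguments, taking the column names from a character vector, like this: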

var_names <- c("x", "group")
aggregate(x= df[ , var_names[1]],  by= list(df[ , var_names[2]]), na.action= NULL, FUN= function(i) sum(i))

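This results in an error. Interestingly, omitting na.action = NULL, as in aggregate(x= df[ , var_names[1]], by= list(df[ , var_names[2]]), FUN= function(i) sum(i)), does not raise an error and returns the expected output. How can I keep rows containing NAs from disappearing when I supply the column names as a vector? I do need to include na.action = NULL because my real data contains NAs.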

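[Comments]:

Study the documentation. Only aggregate.formula has an na.action argument.
I find your example confusing. The linked post describes a different situation, namely having . on the LHS of the formula. Since you don't have that, I don't understand why you are fiddling with the na.action argument.
@Roland You are right, my example data is poor. It does not even contain NAs. I just made up data to reproduce the error. My real data looks different, of course.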

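Tags: r aggregate na

[Solution 1]:

You don't have to use the column names in aggregate.formula; you can index the columns by position instead.
na.pass should cover your na.action requirement.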

setNames(
  aggregate(cbind(df[, 1], df[, 3]) ~ df[, 2], data = df, FUN = sum,
            na.rm = TRUE, na.action = na.pass),
  colnames(df)[c(2, 1, 3)])
#   group  x  other_var
# 1     1 25 -0.7313815
# 2     2 30  0.3231317

Data

(I added NAs)

df <- structure(list(x = 1:10, group = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L), other_var = c(-1.79458090358371, 0.295106071151792, 
NA, -0.589487588239041, 0.325944874015228, NA, 0.737254570399201, 
0.47849317537615, NA, 0.139020009150021)), row.names = c(NA, 
-10L), class = "data.frame")
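
If the column names live in a character vector, one possible way to combine that with na.pass is to build the formula with reformulate(). This is only a sketch: the toy values and the names value_var, group_var and count_na are made up for illustration. Counting the NAs that reach FUN shows what na.action changes:

```r
# Toy data with NAs in other_var (illustrative values, not the asker's data).
df <- data.frame(
  x = 1:10,
  group = rep(1:2, 5),
  other_var = c(-1.8, 0.3, NA, -0.6, 0.3, NA, 0.7, 0.5, NA, 0.1)
)

# Hypothetical variable names, supplied as strings.
value_var <- "other_var"
group_var <- "group"

# Build other_var ~ group from the strings.
f <- reformulate(group_var, response = value_var)

# Helper that counts how many NAs FUN actually receives per group.
count_na <- function(v) sum(is.na(v))

# Default na.action (na.omit): NA rows are dropped before FUN is called.
res_omit <- aggregate(f, data = df, FUN = count_na)

# na.pass: NA rows are kept, so FUN sees them and must deal with them.
res_pass <- aggregate(f, data = df, FUN = count_na, na.action = na.pass)

print(res_omit)
print(res_pass)
```

With na.pass in place, FUN can then decide what to do with the NAs, for example sum with na.rm = TRUE as in the call above.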

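[Comments]:

[Solution 2]:

I'm not entirely sure what the problem is: setting na.action = NULL means nothing is done about NAs, so all values, NAs included, are passed to the function untouched. That is what the non-formula version does by default.

So I suggest you simply omit na.action.

Using mtcars: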

mt <- mtcars
mt$mpg[3] <- NA
var_names <- c("mpg", "cyl")

1. Formula variant:

aggregate(
  as.formula(paste(var_names[1], "~", var_names[2])), data= mt,
  na.action= NULL,
  FUN= function(i) sum(i))
#   cyl   mpg
# 1   4    NA
# 2   6 138.2
# 3   8 211.4

2. Non-formula variant, which fails:

aggregate(
  x= mt[ , var_names[1]],  by= list(mt[ , var_names[2]]),
  na.action= NULL,
  FUN= function(i) sum(i))
# Error in FUN(X[[i]], ...) : unused argument (na.action = NULL)
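
Note that the error is raised by FUN, not by aggregate itself. As far as I can tell, the non-formula methods have no na.action argument, so it falls into ... and is forwarded to FUN, and function(i) sum(i) has nowhere to put it. A small sketch to check this (the names has_na_action, msg and ok are only illustrative):

```r
mt <- mtcars
mt$mpg[3] <- NA

# Which aggregate methods declare na.action as a formal argument?
has_na_action <- c(
  formula    = "na.action" %in% names(formals(getS3method("aggregate", "formula"))),
  data.frame = "na.action" %in% names(formals(getS3method("aggregate", "data.frame")))
)
print(has_na_action)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  aggregate(x = mt$mpg, by = list(mt$cyl), na.action = NULL,
            FUN = function(i) sum(i)),
  error = function(e) conditionMessage(e)
)
print(msg)

# An argument that FUN does understand is forwarded the same way.
ok <- aggregate(x = mt$mpg, by = list(mt$cyl), FUN = sum, na.rm = TRUE)
print(ok)
```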

The fix:

aggregate(
  x= mt[ , var_names[1]],  by= list(mt[ , var_names[2]]),
  # na.action= NULL,
  FUN= function(i) sum(i))
#   Group.1     x
# 1       4    NA
# 2       6 138.2
# 3       8 211.4

If you want an actual sum for the first group instead of NA, handle the NAs in the function itself:

aggregate(
  x= mt[ , var_names[1]],  by= list(mt[ , var_names[2]]),
  FUN= function(i) sum(i, na.rm=TRUE))
#   Group.1     x
# 1       4 270.5
# 2       6 138.2
# 3       8 211.4
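
As a side note, a variation you could try: indexing with single brackets and a column name keeps one-column data frames, so the result carries the original column names instead of Group.1 and x. This is just a sketch on the same modified mtcars data:

```r
mt <- mtcars
mt$mpg[3] <- NA
var_names <- c("mpg", "cyl")

# mt["mpg"] and mt["cyl"] are one-column data frames (which are lists),
# so aggregate() takes the output column names from them.
res <- aggregate(
  x = mt[var_names[1]], by = mt[var_names[2]],
  FUN = sum, na.rm = TRUE)  # na.rm is forwarded to sum()
print(res)
```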

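[Comments]:

Okay, so is there a reason you chose not to use the formula version (as shown in my answer)? It is the only aggregate method that takes the na.action= argument you believe you must use.

[Solution 3]:

This code should solve your problem. It drops the rows where the aggregated column is NA before calling aggregate.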

aggregate(x = df[which(!is.na(df[var_names[1]])), var_names[1]],
      by = list(df[which(!is.na(df[var_names[1]])), var_names[2]]),
      FUN = function(i) sum(i))
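
One difference worth keeping in mind, shown here on made-up data (the data frame and the names keep, filtered and in_fun are only for illustration): filtering the NA rows first removes any group that consists only of NAs, while na.rm = TRUE inside FUN keeps that group with a sum of 0.

```r
# Made-up data: group "c" contains nothing but an NA.
df <- data.frame(x = c(1, NA, 3, NA), group = c("a", "a", "b", "c"))
var_names <- c("x", "group")

# Rows where the aggregated column is not NA.
keep <- !is.na(df[[var_names[1]]])

# Filter first: group "c" has no rows left and drops out of the result.
filtered <- aggregate(x = df[keep, var_names[1]],
                      by = list(df[keep, var_names[2]]),
                      FUN = sum)

# Handle NAs inside FUN: every group is kept.
in_fun <- aggregate(x = df[[var_names[1]]],
                    by = list(df[[var_names[2]]]),
                    FUN = sum, na.rm = TRUE)

print(filtered)
print(in_fun)
```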

[Comments]: