Providing a variable name in a function call and cbinding it to an existing data frame in R

【Question title】: Providing variable name in function call and cbinding to existing data frame in R
【Posted】: 2014-04-15 17:25:42
【Question】:

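The goal is for the last argument of the function call to supply the name of the new column that is cbinded to the original data frame.

This follows on from this and this previous question, and builds on the minimal working example from the first one.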

GroupId <-          c(1,1,1,1,2,2,2,3,3)
IndId <-            c(1,1,2,2,3,4,4,5,5)
IndGroupProperty <- c(1,2,1,2,3,3,4,5,6)
PropertyType <-     c(1,2,1,2,2,2,1,2,2)

df <- data.frame(GroupId, IndId, IndGroupProperty, PropertyType)
df

ValidGroupC <-       c(1,1,1,1,0,0,0,0,0)
df <- data.frame(df, ValidGroupC)
df

library(dplyr)
grouptest <- function(object, group, ind, type, new){
  groupvar <- deparse(substitute(group))
  indvar <- deparse(substitute(ind))
  typevar <- deparse(substitute(type))
  eval(substitute(
    tmp <- object[, c(groupvar, indvar, typevar)] %.%
      group_by(group, ind) %.%
      mutate(type1 = any(type == 1))  %.%
      group_by(group, add = FALSE) %.%
      mutate(tmp2 = all(type1) * 1) %.%
      select(-type1)
  ))
  new <- tmp[, 4]                    # this is the relevant part
  tmp <- cbind(object, new)          # this is the relevant part
}

df <- grouptest(df, GroupId, IndId, PropertyType, ValidGroup)
df

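So most of the code is already the product of the referenced questions. The part relevant to this question is at the end: I put the fourth column of the computed tmp into a new object, whose name should be taken from the new argument of the function call, and then cbind it to the original data frame.

My question: why is the last column of df not named ValidGroup? I don't see what is wrong: new should be replaced by ValidGroup, shouldn't it?

I tried putting the two lines inside the eval(), which results in Error in cbind(df, ValidGroup) : object 'ValidGroup' not found.

I tried wrapping the two lines in another eval(substitute()), with the same error.

I have tried many other variations: moving the lines around, using a deparsed newvar, naming tmp as new, ...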

【Question comments】:

Tags: r function arguments naming-conventions

【Solution 1】:

You want to change the last two highlighted lines of your function to:

object[, new] <- tmp[, 4]
object

Then, when you call the function, pass the new argument as a character string:

> df <- grouptest(df, GroupId, IndId, PropertyType, "ValidGroup")   
> df
  GroupId IndId IndGroupProperty PropertyType ValidGroupC ValidGroup
1       1     1                1            1           1          1
2       1     1                2            2           1          1
3       1     2                1            1           1          1
4       1     2                2            2           1          1
5       2     3                3            2           0          0
6       2     4                3            2           0          0
7       2     4                4            1           0          0
8       3     5                5            2           0          0
9       3     5                6            2           0          0
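
The difference between the two versions can be reproduced on a toy example. The sketch below uses made-up names (df_demo, add_with_cbind, add_by_name) that do not appear in the thread. It assumes the behaviour of base R's cbind() on data frames, which labels an unnamed vector argument with the text of the expression passed to it, so inside the function the label is always "new". Assigning through a character index uses the value of the string instead.

```r
# Toy data frame, made up for illustration only.
df_demo <- data.frame(a = 1:3)

# Mirrors the question: the argument `new` is overwritten before it is ever
# evaluated, and cbind() labels the column with the expression text "new".
add_with_cbind <- function(object, new) {
  new <- object$a * 2
  cbind(object, new)
}

# Mirrors Solution 1: `new` holds a string that is used as the column name.
add_by_name <- function(object, new) {
  object[, new] <- object$a * 2
  object
}

names(add_with_cbind(df_demo, ValidGroup))   # "a" "new"
names(add_by_name(df_demo, "ValidGroup"))    # "a" "ValidGroup"
```

In the first function the symbol ValidGroup is never looked up at all, which is why no error appears there and the name is silently lost.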

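【Comments】:

Thanks, that works! Is it just me, or is R syntax sometimes really quirky? To me this looks very similar to what I originally had...
Everything takes practice. In time it will all become clear. :)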

【Solution 2】:

If the object is always a data.frame, why not simply create a new one?

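tmp <- data.frame(object, new=tmp[,4])
# the appended column is the last one, so index it with ncol(tmp) rather than a fixed 4
names(tmp)[ncol(tmp)] <- as.character(match.call()$new)
return(tmp)

Edit: changed the code to accept a name rather than a character as the new argument. I still don't think that is a good idea, though. Following @hadley's reasoning in this thread, you should at least have an optional argument that switches the second line to names(tmp)[ncol(tmp)] <- new.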
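
The optional argument suggested in the edit could look roughly like the sketch below. The function add_column, its values parameter, and the flag name character.only are hypothetical; the flag is loosely modelled on the argument of the same name in base R's library(). By default the bare name is captured without being evaluated, and with the flag set the argument is evaluated as an ordinary string.

```r
# Hypothetical helper, not from the thread: adds `values` to `object` under a
# column name given either as a bare name or (with the flag) as a string.
add_column <- function(object, values, new, character.only = FALSE) {
  # With character.only = FALSE, `new` is never evaluated, only deparsed.
  col_name <- if (character.only) new else deparse(substitute(new))
  object[[col_name]] <- values
  object
}

df_demo <- data.frame(a = 1:3)   # toy data, made up for illustration

# Interactive style: bare column name.
names(add_column(df_demo, 4:6, ValidGroup))

# Programmatic style: the name is stored in a variable.
col <- "ValidGroup"
names(add_column(df_demo, 4:6, col, character.only = TRUE))
```

Both calls add a column called ValidGroup. The flag matters once the function is called from other code, where the column name usually lives in a variable rather than being typed literally.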

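【Comments】:

I don't understand what you mean. That still wouldn't name the new column correctly?
My mistake :-). I added a solution, but I think this may cause the data.frame to be rewritten. With a data.table you could use setnames() to rename it in place.
This does not work, same result: Error in grouptest(df, GroupId, IndId, PropertyType, ValidGroup) : object 'ValidGroup' not found
ValidGroup has to be a character variable. I'm not sure there is another way.
Well, I also tried deparse() on the value of new, which should turn it into a character string, but that doesn't work either... :(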

【Solution 3】:

I suspect you are looking for the assign function:

assign(deparse(substitute(new)), tmp[,4])

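Apparently I misunderstood the question. Here is another approach. Instead of using cbind, you can simply add the new column to the existing object.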

object[, deparse(substitute(new))] <- tmp[,4]
object
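
For completeness, here is how the deparse(substitute(new)) idea might fit into a whole function. This is only a sketch: it swaps the dplyr pipeline from the question for base R's ave(), so it is not the original code, and the name grouptest_base is made up. It assumes the intended rule is "a group is valid when every individual in it has at least one row with type 1", which is what the question's pipeline computes.

```r
# Example data copied from the question.
df <- data.frame(
  GroupId      = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  IndId        = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
  PropertyType = c(1, 2, 1, 2, 2, 2, 1, 2, 2)
)

grouptest_base <- function(object, group, ind, type, new) {
  # Turn the bare argument names into strings without evaluating them.
  groupvar <- deparse(substitute(group))
  indvar   <- deparse(substitute(ind))
  typevar  <- deparse(substitute(type))
  newvar   <- deparse(substitute(new))

  # TRUE on every row of an individual that has at least one type == 1.
  ind_ok <- ave(object[[typevar]] == 1,
                object[[groupvar]], object[[indvar]], FUN = any)

  # 1 on every row of a group in which all individuals qualify, otherwise 0.
  object[[newvar]] <- ave(as.numeric(ind_ok), object[[groupvar]], FUN = min)
  object
}

out <- grouptest_base(df, GroupId, IndId, PropertyType, ValidGroup)
names(out)
out$ValidGroup
```

Because the new column is written with a character index, its name comes from the caller's fourth argument rather than from the local variable name.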

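【Comments】:

Thanks for your reply. How would I apply this? Simply adding the two lines at the end of the function still leaves me with the last column named "new". I added the two lines assign(deparse(substitute(new)), tmp[,4]) and tmp between tmp <- cbind(object, new) and }.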