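[Posted]: 2021-02-22 18:28:19
[Question]:
I have started writing functions to speed up table generation, but I want the function to respect any grouping choices the user made earlier in the pipeline.
Sample data: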
df<-data.frame(ID=c("A","B","C","A","C","D","A","C","E","B","C","A"),
Year=c(1,1,1,2,2,2,3,3,3,4,4,4),
Credits=c(1,3,4,5,6,7,2,1,1,6,1,2),
Major=c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
Status=c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
Group=c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science"))
Below is the function I am working on. It takes a variable that defines the cohort, a credits variable, and a term variable.
table_headsfte_cohorts<-function(.data,cohortvar,credits,term){
cohortvar<-rlang::ensym(cohortvar)
credits<-rlang::ensym(credits)
term<-rlang::ensym(term)
.data%>%
group_by(!!term,Pidm)%>%
group_by(!!term,!!cohortvar,group_cols())%>%
mutate(on3=1)%>%
mutate(`Headcount`=sum(on3),
`FTE`=round(sum(na.omit(!!credits))/15,1))%>%
mutate(Variable=paste0(cohortvar))%>%
mutate(Category=!!cohortvar)%>%
select(-!!cohortvar)%>%
select(Variable,Category,Headcount,FTE,group_cols())
}
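For users who may want to group by other variables in addition to their chosen cohort variable, I would like the final function to allow usage like this: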
df2<-df%>%
group_by(Status,Group)%>%
table_headsfte_cohorts(Major,Credits,Year)
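The desired end result is a table that respects and preserves the levels of the two grouping variables from the group_by statement above, in addition to the cohortvar and term columns taken from the arguments of table_headsfte_cohorts().
I need to produce this same table for a wide range of grouping variables and for varying numbers of them, so that flexibility would be very helpful.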
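For context, a grouped data frame carries its grouping variables with it, and they can be read back with `dplyr::group_vars()`. The following is only a minimal sketch, assuming dplyr is installed; the `toy` data frame and its values are made up for illustration and are not the sample data above.

```r
library(dplyr)

# Hypothetical illustrative data, not the question's df
toy <- data.frame(
  Status  = c("green", "blue", "green", "blue"),
  Group   = c("Art", "Art", "Music", "Music"),
  Credits = c(1, 3, 4, 5)
)

# Grouping set upstream in the pipeline
grouped <- toy %>% group_by(Status, Group)

# group_vars() returns the grouping column names as a character vector
existing_groups <- group_vars(grouped)
print(existing_groups)

# An ungrouped data frame reports no grouping variables
no_groups <- group_vars(toy)
print(length(no_groups))
```

So the information about upstream groups is available inside a function that receives `.data`; the open question is how to keep it when the function adds its own groups.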
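Edit:
The following seems close, in that it at least allows multiple grouping variables. It is not what I was hoping for, because I would prefer the extra grouping variables to be read from the pipeline: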
table_headsfte_cohorts<-function(.data,cohortvar,credits,term,...){
grps<-enquos(...)
cohortvar<-rlang::ensym(cohortvar)
credits<-rlang::ensym(credits)
term<-rlang::ensym(term)
.data%>%
group_by(!!term,!!cohortvar,!!! grps)%>%
mutate(on3=1)%>%
mutate(`Headcount`=sum(on3),
`FTE`=round(sum(na.omit(!!credits))/15,1))%>%
mutate(Variable=paste0(cohortvar))%>%
mutate(Category=!!cohortvar)%>%
select(-!!cohortvar)%>%
select(Variable,Category,Headcount,FTE,!!!grps)
}
With the function above, this runs successfully:
fdfout<-fdf%>%
table_headsfte_cohorts(Major, Credits, Year)
I can also pass other variables to the function as additional grouping variables:
fdfout_alt<-fdf%>%
table_headsfte_cohorts(Major,Credits,Year,Status,Group)
which produces the desired result.
Unfortunately, when I use
fdf_no<-fdf%>%
group_by(Status, Group)%>%
table_headsfte_cohorts(Major, Credits, Year)
I get:
This output could confuse people using my function, because their group_by() line appears to have done nothing.
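One possible direction is `group_by(..., .add = TRUE)`, which appends to groups already set upstream instead of replacing them. The sketch below is not a confirmed solution: it assumes dplyr >= 1.0.0 (for `.add` and `.groups`), uses a hypothetical function name `table_headsfte_keep_groups`, and swaps the `mutate()`/`select()` steps for `summarise()` on the assumption that one row per group is wanted.

```r
library(dplyr)

# Copy of the sample data from the question
df <- data.frame(
  ID      = c("A","B","C","A","C","D","A","C","E","B","C","A"),
  Year    = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Credits = c(1,3,4,5,6,7,2,1,1,6,1,2),
  Major   = c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
  Status  = rep(c("green","blue"), 6),
  Group   = rep(c("Art","Music","Science"), 4)
)

# Hypothetical variant of the function that keeps upstream groups
table_headsfte_keep_groups <- function(.data, cohortvar, credits, term) {
  cohortvar <- rlang::ensym(cohortvar)
  credits   <- rlang::ensym(credits)
  term      <- rlang::ensym(term)

  .data %>%
    # .add = TRUE adds term and cohort to any existing groups
    group_by(!!term, !!cohortvar, .add = TRUE) %>%
    # Assumption: collapse to one row per group
    summarise(
      Headcount = n(),
      FTE       = round(sum(!!credits, na.rm = TRUE) / 15, 1),
      .groups   = "drop"
    ) %>%
    # Record which column served as the cohort variable
    mutate(Variable = rlang::as_string(cohortvar)) %>%
    rename(Category = !!cohortvar)
}

# Upstream grouping is kept as columns in the result
out_grouped <- df %>%
  group_by(Status, Group) %>%
  table_headsfte_keep_groups(Major, Credits, Year)

# Without upstream grouping, only term and cohort are used
out_plain <- df %>%
  table_headsfte_keep_groups(Major, Credits, Year)
```

With this approach the caller's `group_by(Status, Group)` shows up as `Status` and `Group` columns in the output, and no extra `...` argument is needed.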
[Comments]:
-
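See here for how to make a reproducible example that is easier for people to help with. Without sample data to recreate your setup, it is hard to know exactly what you are trying to do and where you might be stuck.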
-
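@camille Thank you for the suggestion. I have updated the question and hope it is clearer now. If you get a chance to look and find it is still unclear, please let me know what might be missing.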