【发布时间】:2013-02-12 18:50:42
【问题描述】:
我怀疑我做错了,但我想将字符向量作为参数传递给ddply 中的函数。有很多关于删除引号等的问答,但似乎都不适合我(例如Remove quotes from a character vector in R 和http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html)。
# reproducible data
df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d")))
df2<-data.frame(a=sample(1:50,9),b=sample(1:50,9),c=sample(1:50,9),d=(c("e","f","g","e","e","f","f","e","g")))
df3<-data.frame(a=sample(1:50,8),b=sample(1:50,8),c=sample(1:50,8),d=(c("h","i","j","h","h","i","i","h")))
#make a list
list.1<-list(df1=df1,df2=df2,df3=df3)
# desired output
lapply(list.1, function(x) ddply(x, .(d), function(x) data.frame(am=mean(x$a), bm=mean(x$b), cm=mean(x$c))))
$df1
d am bm cm
1 a 31.00000 29.25000 18.50000
2 b 31.66667 24.33333 34.66667
3 c 18.50000 5.50000 24.50000
4 d 36.00000 39.00000 43.00000
$df2
d am bm cm
1 e 18.25000 32.50000 18
2 f 27.66667 41.33333 24
3 g 25.00000 7.50000 42
$df3
d am bm cm
1 h 36.00000 25.00000 20.50000
2 i 25.33333 37.33333 24.33333
3 j 32.00000 32.00000 46.00000
但我的实际用例有许多新列和不同类型的计算,我想在 ddply 函数中进行计算。所以我想做这样的事情:
# here's a simple version of a function that I want to send to ddply
func <- "am=mean(x$a), bm=mean(x$b), cm=mean(x$c)"
# here's how I imagine it might work
lapply(list.1, function(x) ddply(x, .(d), function(x) data.frame(func)) )
# not the desired outcome...
$df1
d func
1 a am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 b am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 c am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
4 d am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
$df2
d func
1 e am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 f am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 g am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
$df3
d func
1 h am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 i am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 j am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
我已经尝试过noquote、deparse、eval(as.symbol())、do.call(data.frame, ...) 以及这里的一些方法:https://github.com/hadley/devtools/wiki/Evaluation on func 无济于事。此时解决方案可能很明显(即融化所有东西!),但如果不是,这里有一个更接近我的用例的更长示例:
# sample data
s <- 23 # number of samples
r <- 10 # number of runs per sample
el <- 17 # number of elements
mydata <- data.frame(ID = unlist(lapply(LETTERS[1:s], function(x) rep(x, r))),
run = rep(1:r, s))
# insert fake element data
mydata[letters[1:el]] <- lapply(1:el, function(i) rnorm(s*r, runif(1)*i^2))
# generate all combinations of 5 runs from ten runs
su <- 5 # number of runs to sample from ten runs
idx <- combn(unique(mydata$run), su)
# RSE function
RSE <- function(x) {100*( (sd(x)/sqrt(length(x)))/mean(x) )}
# make a list of dfs for all samples for each combination of five runs
# to prepare to calculate RSEs
combys1 <- lapply(1:ncol(idx), function(i) mydata[mydata$run %in% idx[,i],] )
# make a list of dfs with RSE for each ID, for each combination of runs
combys2 <- lapply(1:length(combys1), function(i) ddply(combys1[[i]], "ID", summarise, RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c)))
我想将上面最后一行中的 RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c) 替换为此处的对象 doRSE,以避免大量输入:
# prepare to calculate new colums with RSE and means
RSEs <- sapply(3:ncol(mydata), function(j) paste0("RSE",names(mydata[j])))
RSExs <- sapply(3:ncol(mydata), function(j) paste0("RSE(",names(mydata[j]),")"))
doRSE <- paste0(sapply(1:length(RSEs), function(x) paste0(RSEs[x],"=",RSExs[x])), collapse=",", sep="")
我对涉及基础、data.table 和肮脏技巧的解决方案持开放态度。似乎这些接近我想要的,但我不能完全将它们转化为我的问题:
Pass character argument and evaluate,
Force evaluation of multiple variables using vector of character,
Using a vector of characters that correspond to an expression as an argument to a function
更新这里有一个问题:我希望能够修改简单示例中的func(或我的用例中的doRSE)来创建一堆新的列从现有列的各种计算中探索数据。我想要一个允许生成的数据帧具有原始数据帧中没有的新列的工作流。抱歉,原始问题中没有更清楚。我看不出如何调整@Marius 的答案来做到这一点,但@mnel 很有帮助(请参阅下面的更新)
通过 @mnel 出色的肮脏技巧,通过一些小修复,我可以在我的用例中获得所需的结果:
# @mnel's solution, adapted (no period before eval)
combys2 <- lapply(combys1, function(x) do.call(ddply,c(.data = quote(x),
.variables = quote(.(ID)), .fun = quote(summarize),
eval(parse(text = sprintf('.(%s)', doRSE ))))))
head(combys2)
[[1]]
ID RSEa RSEb RSEc RSEd RSEe RSEf RSEg RSEh RSEi
1 A 168.30658 21.68632 5.657228 5.048057 4.162017 2.9581874 1.849009 0.6925148 0.4393491
2 B 26.55071 26.20427 4.782578 4.385409 2.342764 2.1813874 2.719625 1.1576681 0.6427935
3 C 73.83165 14.47216 8.154435 6.273202 3.046978 1.2179457 2.811405 1.1401837 0.8167067
4 D 31.96170 57.89260 9.438220 7.388410 3.755772 0.8601780 3.724875 0.8358204 0.9939387
5 E 63.22537 60.35532 5.839690 11.691304 3.828430 0.9217787 4.204300 0.8217187 0.7876634
6 F 56.37635 65.37907 4.149568 5.496308 2.227544 2.1548455 2.847291 1.1956212 0.2506518
7 G 69.32232 23.63214 4.255847 7.979225 4.917660 1.6185960 3.156521 0.3265555 0.8133279
8 H 29.82015 40.74184 7.372100 7.464792 2.749862 0.6054420 4.061368 0.9973909 1.3807720
9 I 50.58114 19.53732 2.989920 9.767678 4.000249 1.7451322 1.175397 0.9952093 0.9095086
10 J 92.96462 39.77475 6.140688 10.295668 3.407726 2.4663758 3.030444 0.5743419 0.9296482
11 K 90.72381 42.25092 2.483069 6.781054 3.142082 1.8080633 2.891740 1.1996176 0.8525290
12 L -385.24547 40.81267 4.506087 8.148382 2.976488 0.8304432 2.234134 0.2108664 0.4979777
13 M 22.77743 33.98332 2.913926 8.764639 2.307293 0.8366635 3.229944 1.0003125 0.3878567
14 N 66.75163 34.16087 6.611326 13.865377 1.285522 1.3863958 4.165575 0.7379386 0.4515194
15 O 37.37188 100.57479 5.738877 5.724862 2.839638 1.1366610 3.186332 0.7383855 0.3954544
16 P 17.08913 26.62210 6.060130 4.110893 2.688908 2.6970727 1.609043 1.3860834 0.8780010
17 Q 13.96392 74.92279 5.469304 8.467638 2.974131 1.2135436 3.284564 0.6232778 1.0759226
18 R 42.59899 30.75952 4.842832 8.764158 1.874020 1.5791048 3.427342 1.4479638 0.2964455
19 S 26.03307 15.56352 6.968717 7.783876 4.439733 2.0764179 4.683080 0.7459654 1.1268772
20 T 71.57945 33.81362 7.147049 11.201551 2.128315 2.2051611 2.419805 0.2688807 1.1559635
21 U 73.93002 11.77155 7.738910 7.207041 1.478491 1.4409844 4.042419 0.5883490 0.5585716
22 V 67.93166 39.54994 5.701551 8.636122 2.472963 1.6514199 2.627965 1.0359048 0.8747136
23 W 11.23057 12.51272 7.003448 7.424559 4.102693 0.6614847 2.246305 1.3422405 0.2665246
RSEj RSEk RSEl RSEm RSEn RSEo RSEp RSEq
1 0.6366733 0.3713819 2.1993487 0.3865293 0.5436581 0.9187585 0.4344699 0.8915868
2 0.3445095 0.2932025 1.8563179 0.5397595 1.0433388 0.3533622 0.1942316 0.1941072
3 0.2720344 0.5507595 2.0305726 0.4377259 0.8589854 0.5690906 0.1397337 0.4043247
4 0.6606667 0.6769112 3.4737352 0.5674656 1.2519256 0.8718298 0.1162969 0.8287504
5 0.4620774 0.5598069 1.9236112 0.7990046 0.9832732 0.6847352 0.4070675 0.9005185
6 0.7981610 0.4005493 0.9721068 0.2770989 1.7054674 0.3110139 0.4521183 0.8740444
7 0.3969116 0.4717575 4.1341106 0.7510628 0.9998299 0.5342292 0.4319642 1.1861705
8 0.2963956 0.2652221 0.4775827 0.2617120 0.8261874 0.5266087 0.1900943 0.2350553
9 0.2609359 0.5431035 2.6478440 0.1606919 0.7407281 0.6802262 0.1802069 0.7438792
10 0.4239787 0.8753544 3.4218030 0.5467869 0.7404017 0.5581173 0.3682014 0.6361436
11 0.4188502 0.8629862 4.4181479 0.1623873 0.8018811 0.5873609 0.3592134 0.5357984
12 0.5790265 0.5009210 3.7534287 0.1933726 0.5809601 0.5777868 0.3400925 0.4783890
13 0.3562582 0.2552756 2.1393219 0.1849345 0.5796194 0.6129469 0.3363311 0.4382125
14 0.7921502 0.6147990 2.9054634 0.5852325 1.4954072 0.9983203 0.2937837 0.7654504
15 0.5840424 0.2757707 1.5695675 0.3305385 0.8712636 0.5816490 0.1985457 0.7213289
16 0.3301280 0.3008273 2.9014987 0.4540833 0.5966479 0.9042004 0.1631630 0.7262141
17 0.5882511 0.2820978 3.0652666 0.4518936 1.3168151 0.4749311 0.2244693 0.6583083
18 0.4048816 0.3708787 3.2207478 0.2603412 1.3168318 0.3318745 0.3120436 0.6210711
19 0.4425123 0.3602076 3.7609863 0.5399527 0.8302572 0.3246904 0.1952143 0.2915325
20 0.5877835 0.6339015 1.6908570 0.3223056 0.5239339 0.6607198 0.2808094 0.3697380
21 0.4454056 0.7733354 4.3433420 0.4391075 0.5503594 0.5893406 0.2262403 0.2361512
22 0.9583940 0.6365843 3.0033951 0.6507968 0.8610046 0.6363198 0.2866719 0.5736855
23 0.4969730 0.3895182 2.0021608 0.3354475 1.4398250 0.7386870 0.2458906 0.3414804
...
...
【问题讨论】:
-
我不关注。为什么不直接编写一个计算所有这些新列的函数并在 ddply 中使用呢?
-
你能告诉我那个函数的样子吗?
-
等等,什么?如果你不知道函数会是什么样子,即它会做什么,我怎么能?
-
@hadley,感谢您的光临。我希望
ddply返回由一组 colwise 操作产生的新列。例如,RSE、mean、RSD以及所有数字列上的其他自定义函数。colwise可以接受函数列表吗?
标签: r function vector plyr argument-passing