[Posted]: 2015-12-11 17:34:16
[Question]:
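Here is the situation. I have a data.table, and I want to collapse its rows according to certain criteria. I wrote a function, but it can only act on 2 rows at a time, so a single pass can collapse at most 50% of the rows (i.e., starting from 1000 rows, one pass leaves us with 500). At that point the logical thing seems to be to run the function again on the resulting output so the rows collapse further, and to keep doing so until every row that can be collapsed has been.
My function: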
fun = function(x) {
<stuff the function does>
return(output) }
I want to call the function on its own output, and repeat this until further calls no longer change the output.
I tried this:
fun = function(x) {
<stuff>
output = resulting_dt
while (!identical(x,output)) {fun(output)}
return(output)
}
But this gives me an error:
Error in eval(expr, envir, enclos) : object '__' not found
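I'm sure there is a way to make this work, but I'm fairly new to R programming and this is the first real program I've had to write, so any help or advice is much appreciated!
**Edit: I ended up using a combination of the solutions offered by @gregor and @42-**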
fun = function(x) {
<stuff>
output = resulting_dt
if (!identical(x,output)) {return(Recall(output))}
else {return(output)}
}
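To make the pattern concrete, here is a minimal sketch of "apply until the output stops changing" on a toy problem. The names `collapse_step`, `collapse_until_stable`, and `collapse_loop` are hypothetical and stand in for the real function; the toy step merges adjacent equal values two at a time, so like the real function it needs several passes. The `while` in the earlier attempt cannot terminate because neither `x` nor `output` is reassigned inside the loop; a single `if` plus `Recall()` (or an explicit `repeat` that reassigns `x`) avoids that.

```r
# Toy single-pass step (hypothetical): merges adjacent equal values pairwise,
# so one pass removes at most half of the elements.
collapse_step <- function(x) {
  out <- x[0]                      # empty vector of the same type as x
  i <- 1
  while (i <= length(x)) {
    merged <- i < length(x) && x[i] == x[i + 1]  # guard: never read past the end
    out <- c(out, x[i])
    i <- i + if (merged) 2 else 1
  }
  out
}

# Recursive version: recurse only while the pass still changes something.
collapse_until_stable <- function(x) {
  output <- collapse_step(x)
  if (!identical(x, output)) Recall(output) else output
}

# Equivalent loop: note that x is reassigned on every pass.
collapse_loop <- function(x) {
  repeat {
    output <- collapse_step(x)
    if (identical(x, output)) return(output)
    x <- output
  }
}

res_rec  <- collapse_until_stable(c(1, 1, 1, 1, 2))
res_loop <- collapse_loop(c(1, 1, 1, 1, 2))
```

Both versions go through `c(1, 1, 2)` and then `c(1, 2)` before stopping. The loop form avoids deep call stacks if many passes are needed.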
For those interested in a reproducible function: I had trouble coming up with one (it was taking too long), so here is the ugly function I am actually using:
fun <- function (object)
{
num = 1
n = 1
temp = list()
while (n <= object[, length(chr)])
{
if ( (n == (object[, length(chr)])) &&
!( (object[n,chr] == object[n-1,chr]) &&
(abs(object[n,end] - object[n-1,start]) < 500) &&
(((object[n,meth.diff] >= 0) == (object[n-1,meth.diff] >= 0)) ||
((object[n,meth.diff] < 0) == (object[n-1,meth.diff] < 0)))))
{
x = data.table(
chr=object[n,chr], start=object[n,start], end=object[n,end],
meth.diff=object[n,meth.diff], mean_KO=object[n,mean_KO],
mean_WT=object[n,mean_WT], coverage_KO=object[n,coverage_KO],
coverage_WT=object[n,coverage_WT]
)
temp[[num]] = x
n = n + 1
num = num + 1
}
else if ( (object[n,chr] == object[n+1,chr]) &&
(abs(object[n,end] - object[n+1,start]) < 500) &&
(((object[n,meth.diff] >= 0) == (object[n+1,meth.diff] >= 0)) ||
((object[n,meth.diff] < 0) == (object[n+1,meth.diff] < 0))))
{
x = data.table(
chr=object[n,chr], start=object[n,start], end=object[n+1, end], meth.diff= mean(c(object[n,meth.diff], object[n+1,meth.diff])),
mean_KO=(((object[n,mean_KO] * object[n,coverage_KO])/(object[n,coverage_KO] + object[n+1,coverage_KO])) +
((object[n+1,mean_KO] * object[n+1,coverage_KO])/(object[n,coverage_KO] + object[n+1,coverage_KO]))),
mean_WT=(((object[n,mean_WT] * object[n,coverage_WT])/(object[n,coverage_WT] + object[n+1,coverage_WT])) +
((object[n+1,mean_WT] * object[n+1,coverage_WT])/(object[n,coverage_WT] + object[n+1,coverage_WT]))),
coverage_KO=(object[n,coverage_KO] + object[n+1,coverage_KO]),
coverage_WT=(object[n,coverage_WT] + object[n+1,coverage_WT])
)
x[, meth.diff := (mean_KO - mean_WT) ]
temp[[num]] = x
n = n + 2
num = num + 1
}
else
{
x = data.table(
chr=object[n,chr], start=object[n,start], end=object[n,end],
meth.diff=object[n,meth.diff], mean_KO=object[n,mean_KO],
mean_WT=object[n,mean_WT], coverage_KO=object[n,coverage_KO],
coverage_WT=object[n,coverage_WT]
)
temp[[num]] = x
n = n + 1
num = num + 1
}
}
result = rbindlist(temp)
#print(result)
# Recurse on the collapsed table until a pass leaves it unchanged
if (!identical(object,result)) {return(Recall(result))}
else {return(result)}
}
And an example input data.table:
library(data.table)
dt = structure(list(chr = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr1", "chr1", "chr1", "chr1", "chr1"), start = c(842326, 855423,
855426, 855739, 855771, 880164, 880182, 880262, 1005284, 1005315
), end = c(842327L, 855424L, 855427L, 855740L, 855772L, 880165L,
880183L, 880263L, 1005285L, 1005316L), meth.diff = c(9.35200555410902,
19.1839617944039, 29.6734426495636, -12.3375577709254, 4.21809779410175,
50.539925536006, 28.0168014922334, 35.1349192165154, 16.8742940741475,
62.6063420676512), mean_KO = c(9.35200555410902, 19.1839617944039,
32.962962583692, 1.8512250859083, 4.44417336983763, 67.0864799025607,
31.1083297690512, 49.5746020684321, 25.1985773481452, 78.6766354515961
), mean_WT = c(0, 0, 3.28951993412841, 14.1887828568337, 0.226075575735883,
16.5465543665547, 3.09152827681786, 14.4396828519167, 8.32428327399768,
16.0702933839448), coverage_KO = c(139L, 55L, 55L, 270L, 270L,
55L, 55L, 238L, 526L, 499L), coverage_WT = c(120L, 86L, 87L,
444L, 442L, 116L, 115L, 362L, 649L, 647L)), .Names = c("chr",
"start", "end", "meth.diff", "mean_KO", "mean_WT", "coverage_KO",
"coverage_WT"), class = c("data.table", "data.frame"), row.names = c(NA,
-10L))
And an example of the output I want (for posterity, since it isn't strictly relevant to this question):
library(data.table)
dt1 = structure(list(chr = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr1"), start = c(842326, 855423, 855739, 855771, 880164, 1005284
), end = c(842327L, 855427L, 855740L, 855772L, 880263L, 1005316L
), meth.diff = c(9.35200555410902, 24.4191949389371, -12.3375577709254,
4.21809779410175, 36.7726824955192, 39.0419497750433), mean_KO = c(9.35200555410902,
26.073462189048, 1.8512250859083, 4.44417336983763, 49.4237638627169,
51.2332612443618), mean_WT = c(0, 1.65426725011082, 14.1887828568337,
0.226075575735883, 12.6510813671977, 12.1913114693185), coverage_KO = c(139L,
110L, 270L, 270L, 348L, 1025L), coverage_WT = c(120L, 173L, 444L,
442L, 593L, 1296L)), .Names = c("chr", "start", "end", "meth.diff",
"mean_KO", "mean_WT", "coverage_KO", "coverage_WT"), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
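As a point of comparison, the desired output can probably also be reached in one pass, without recursion, by labelling runs of mergeable rows and aggregating per run. The sketch below is an alternative under stated assumptions, not the function used above: `collapse_runs` and `max_gap` are hypothetical names, the toy table holds rounded copies of the first four sample rows, and rows are compared with their original neighbours rather than with already merged rows, which could give a different result if a merged row's `meth.diff` changes sign.

```r
library(data.table)

# Hypothetical one-pass alternative: label runs of adjacent rows that share
# chr, lie within max_gap of each other, and have meth.diff of the same sign.
collapse_runs <- function(dt, max_gap = 500) {
  d <- copy(dt)                    # avoid modifying the caller's table by reference
  new_run <- d$chr != shift(d$chr) |
    abs(shift(d$end) - d$start) >= max_gap |
    (d$meth.diff >= 0) != (shift(d$meth.diff) >= 0)
  new_run[1] <- TRUE               # the first row always starts a run
  d[, run := cumsum(new_run)]
  out <- d[, .(chr = chr[1], start = start[1], end = end[.N],
               mean_KO = weighted.mean(mean_KO, coverage_KO),
               mean_WT = weighted.mean(mean_WT, coverage_WT),
               coverage_KO = sum(coverage_KO),
               coverage_WT = sum(coverage_WT)),
           by = run]
  out[, meth.diff := mean_KO - mean_WT]
  out[, run := NULL]
  setcolorder(out, names(dt))      # restore the original column order
  out[]
}

# Toy input: rounded copies of the first four rows of the sample table.
toy <- data.table(
  chr = rep("chr1", 4),
  start = c(842326, 855423, 855426, 855739),
  end = c(842327L, 855424L, 855427L, 855740L),
  meth.diff = c(9.35, 19.18, 29.67, -12.34),
  mean_KO = c(9.35, 19.18, 32.96, 1.85),
  mean_WT = c(0, 0, 3.29, 14.19),
  coverage_KO = c(139L, 55L, 55L, 270L),
  coverage_WT = c(120L, 86L, 87L, 444L)
)
res <- collapse_runs(toy)
```

Here rows 2 and 3 are merged into one row spanning 855423 to 855427, while rows 1 and 4 stay as they are (row 1 is too far away, row 4 has the opposite sign).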
[Comments]:
-
Please include a real function rather than pseudocode.
-
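The while is taken care of automatically by the recursion. Replace the while line with if (!identical(x, output)) return(fun(output)). For debugging purposes, you may want to put in a print statement or something similar to check how deep it goes.
-
The function is long, is that OK? I can also provide sample input/output data; I just thought the question could be answered without those things. What do you think?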
-
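I think the answer in my comment will work, but it assumes everything else is working as expected. The ideal way to ask a question is to create a minimal working example: one that illustrates the problem and can be run, but leaves out complexities that are orthogonal to the specific part in question.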
-
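@Gregor I tried your solution, and including the print statement helped a lot. The function was in fact looping, but it turns out it was dropping the last row of the data.table every time, so each pass made the table shorter until, on the last pass, there were no rows left to work with. It looks like I need to change the function to make it work. Thanks for your help!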
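The kind of trace that exposes a shrinking table can be sketched as follows. `iterate_traced` and `drop_last` are hypothetical names: `drop_last` mimics a step that silently loses the last element on every pass, and the wrapper reports the size at each depth and stops after `max_depth` passes instead of recursing indefinitely. For a data.table, `nrow(x)` would take the place of `length(x)`.

```r
# Hypothetical buggy step: silently drops the last element on every call.
drop_last <- function(x) x[-length(x)]

# Apply `step` until the output stops changing, reporting the size at each
# recursion depth and giving up after max_depth passes.
iterate_traced <- function(step, x, depth = 0L, max_depth = 50L) {
  message("depth ", depth, ": ", length(x), " elements")
  if (depth >= max_depth) stop("no fixed point after ", max_depth, " passes")
  output <- step(x)
  if (identical(x, output)) return(output)
  iterate_traced(step, output, depth + 1L, max_depth)
}

res <- iterate_traced(drop_last, 1:5)
```

With the buggy step the messages count down from 5 elements to 0, and the call returns an empty vector: the recursion did terminate, but only because nothing was left to collapse.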
Tags: r for-loop while-loop data.table