【Posted】: 2017-06-22 21:40:54
【Question】:
I have a data.table containing sequences and read counts, as shown below:
sequence num_reads
1: AACCTGCCG 1
2: CGCGCTCAA 12
3: AGTGTGAGC 3
4: TGGGTACAC 11
5: GGCCGCGTG 15
6: CCTTAAGAG 2
7: GCGGAACTG 9
8: GCGTTGTAG 17
9: GTTGTAGCG 20
10: ACACGTGAC 16
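For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the name `x` is taken from the attempt further down, and storing `num_reads` as plain numeric is an assumption (the real column may be a factor or character, given the `as.numeric(as.character())` conversion in that attempt).

```r
library(data.table)

# Rebuild the example table; `x` is the name used in the attempt further down.
# Assumption: num_reads is stored as a plain numeric column.
x <- data.table(
  sequence  = c("AACCTGCCG", "CGCGCTCAA", "AGTGTGAGC", "TGGGTACAC", "GGCCGCGTG",
                "CCTTAAGAG", "GCGGAACTG", "GCGTTGTAG", "GTTGTAGCG", "ACACGTGAC"),
  num_reads = c(1, 12, 3, 11, 15, 2, 9, 17, 20, 16)
)

print(x)
```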
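I would like to use data.table to add two new columns to this table, based on the result of applying dpois() with two weights and two lambdas. The correct output should look like this (obtained using a data.frame):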
sequence num_reads clus1 clus2
1 AACCTGCCG 1 2.553269503552647000377e-03 1.610220613932057849571e-03
2 CGCGCTCAA 12 1.053993989051599418361e-02 2.887608256917401083896e-02
3 AGTGTGAGC 3 2.085170094567994833468e-02 1.717568654860860896672e-02
4 TGGGTACAC 11 1.806846838374168498498e-02 4.331412385376097462508e-02
5 GGCCGCGTG 15 1.324248858039188620275e-03 5.415587646672919558410e-03
6 CCTTAAGAG 2 8.936443262434262332916e-03 6.440882455728230530922e-03
7 GCGGAACTG 9 4.056186780023639942838e-02 7.444615037365168164207e-02
8 GCGTTGTAG 17 2.385595369261770803265e-04 1.274255916864215588610e-03
9 GTTGTAGCG 20 1.196285397159046524451e-05 9.538289904012846548518e-05
10 ACACGTGAC 16 5.793588753921446012421e-04 2.707793823336458478163e-03
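The data.frame computation itself is not shown here. As a minimal sketch, assuming the weights are 0.4 and 0.6 and the lambdas are 7 and 8 (the values passed in the attempt below), each expected column is simply weight * dpois(num_reads, lambda). The names `df`, `weights` and `lambdas` are illustrative only.

```r
# Assumed from the attempt below: weights 0.4 / 0.6 and lambdas 7 / 8.
weights <- c(0.4, 0.6)
lambdas <- c(7, 8)

df <- data.frame(
  sequence  = c("AACCTGCCG", "CGCGCTCAA", "AGTGTGAGC", "TGGGTACAC", "GGCCGCGTG",
                "CCTTAAGAG", "GCGGAACTG", "GCGTTGTAG", "GTTGTAGCG", "ACACGTGAC"),
  num_reads = c(1, 12, 3, 11, 15, 2, 9, 17, 20, 16)
)

# One vectorised dpois() call per cluster, scaled by that cluster's weight.
df$clus1 <- weights[1] * dpois(df$num_reads, lambdas[1])
df$clus2 <- weights[2] * dpois(df$num_reads, lambdas[2])

print(df)
```

For the first row this gives 0.4 * dpois(1, 7) and 0.6 * dpois(1, 8), which agree with the clus1 and clus2 values listed above.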
But when I try this with data.table, I can't seem to get the correct result. Here is what I tried (based on similar questions on this topic):
pois = function(n, p, l){return(dpois(as.numeric(as.character(n)), l)*p) }
x = x[, c(paste("clus", seq(1,2), sep = '')) := pois(num_reads, c(0.4,0.6), c(7,8)), by = seq_len(nrow(x))]
The result is as follows:
sequence num_reads clus1 clus2
1: AACCTGCCG 1 2.553269503552647000377e-03 2.553269503552647000377e-03
2: CGCGCTCAA 12 1.053993989051599418361e-02 1.053993989051599418361e-02
3: AGTGTGAGC 3 2.085170094567994833468e-02 2.085170094567994833468e-02
4: TGGGTACAC 11 1.806846838374168498498e-02 1.806846838374168498498e-02
5: GGCCGCGTG 15 1.324248858039188620275e-03 1.324248858039188620275e-03
6: CCTTAAGAG 2 8.936443262434262332916e-03 8.936443262434262332916e-03
7: GCGGAACTG 9 4.056186780023639942838e-02 4.056186780023639942838e-02
8: GCGTTGTAG 17 2.385595369261770803265e-04 2.385595369261770803265e-04
9: GTTGTAGCG 20 1.196285397159046524451e-05 1.196285397159046524451e-05
10: ACACGTGAC 16 5.793588753921446012421e-04 5.793588753921446012421e-04
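The reason I am using data.table rather than data.frame is that my real data has 100,000 rows. I have looked at the answers to this and this, but I have not found a solution yet.
Any hints would be much appreciated. Thanks!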
【Discussion】:
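One likely explanation, offered as a sketch rather than a definitive answer: with by = seq_len(nrow(x)), pois() returns a length-2 atomic vector for every one-row group. When `:=` assigns several columns at once, it expects a list with one element per column. An atomic vector seems to be treated as a single value offered to every column, and only its first element fits a one-row group, so both columns end up holding 0.4 * dpois(num_reads, 7). Depending on the data.table version, the over-long vector may be truncated with a warning or rejected with an error; that is my understanding and has not been checked against the version used in the question. Since dpois() is already vectorised, the per-row grouping can be dropped and the right-hand side built as a list. The names `weights`, `lambdas` and `cols` below are illustrative.

```r
library(data.table)

x <- data.table(
  sequence  = c("AACCTGCCG", "CGCGCTCAA", "AGTGTGAGC", "TGGGTACAC", "GGCCGCGTG",
                "CCTTAAGAG", "GCGGAACTG", "GCGTTGTAG", "GTTGTAGCG", "ACACGTGAC"),
  num_reads = c(1, 12, 3, 11, 15, 2, 9, 17, 20, 16)
)

weights <- c(0.4, 0.6)
lambdas <- c(7, 8)
cols    <- paste0("clus", seq_along(weights))

# Map() returns a list with one numeric vector per (weight, lambda) pair,
# which is the shape `:=` needs when assigning several columns at once.
# No `by` is used: dpois() handles the whole num_reads column in one call.
x[, (cols) := Map(function(p, l) p * dpois(num_reads, l), weights, lambdas)]

print(x)
```

Because nothing is grouped per row, this should also scale to the 100,000 rows mentioned in the question. If num_reads is a factor or character in the real data, it would need the same as.numeric(as.character()) conversion used in the original pois().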
Tags: r dataframe data.table variable-assignment