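【Posted】: 2017-04-14 07:46:37
【Question】:
I have a genome-wide ChIP-seq signal imported from a bedGraph file into a GRanges object. I would like to plot the average signal over fixed-width intervals covering all of the peaks. How can I extract the signal into numeric vectors so that I can average them?
For example, consider: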
library(GenomicRanges)
set.seed(1)
# Toy signal: ten 6-bp bins on chr1, each with a random score
signal <- GRanges(
  seqnames = Rle(c("chr1"), c(10)),
  ranges = IRanges(1:10*10, end = 1:10*10+5),
  score = runif(10))
# Five fixed-width (10 bp) intervals with slightly jittered starts
intervals <- GRanges(
  seqnames = Rle(c("chr1"), c(5)),
  ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))
So the signal looks like:
GRanges with 10 ranges and 1 metadata column:
       seqnames     ranges strand |              score
          <Rle>  <IRanges>  <Rle> |          <numeric>
   [1]     chr1 [ 10,  15]      * |    0.2655086631421
   [2]     chr1 [ 20,  25]      * |   0.37212389963679
   [3]     chr1 [ 30,  35]      * |  0.572853363351896
   [4]     chr1 [ 40,  45]      * |  0.908207789994776
   [5]     chr1 [ 50,  55]      * |  0.201681931037456
   [6]     chr1 [ 60,  65]      * |  0.898389684967697
   [7]     chr1 [ 70,  75]      * |  0.944675268605351
   [8]     chr1 [ 80,  85]      * |  0.660797792486846
   [9]     chr1 [ 90,  95]      * |   0.62911404389888
  [10]     chr1 [100, 105]      * | 0.0617862704675645
  ---
  seqlengths:
   chr1
     NA
And the intervals look like:
GRanges with 5 ranges and 0 metadata columns:
      seqnames     ranges strand
         <Rle>  <IRanges>  <Rle>
  [1]     chr1 [ 20,  29]      *
  [2]     chr1 [ 40,  49]      *
  [3]     chr1 [ 62,  71]      *
  [4]     chr1 [ 81,  90]      *
  [5]     chr1 [103, 112]      *
  ---
  seqlengths:
   chr1
     NA
So I would like to average the following vectors:
Rle(c(0.372, 0), c(6, 4)) # [ 20, 29]
Rle(c(0.908, 0), c(6, 4)) # [ 40, 49]
Rle(c(0.898, 0, 0.945), c(4, 4, 2)) # [ 62, 71]
Rle(c(0.661, 0, 0.629), c(5, 4, 1)) # [ 81, 90]
Rle(c(0.062, 0), c(3, 7)) # [103, 112]
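How can I do this without a for loop and a lot of tedious, error-prone interval arithmetic? I was hoping the GenomicRanges package would include this kind of functionality, but I can't find it in the manual. I have been experimenting with subsetByOverlaps, but it does not seem to carry the signal scores through to the result, nor does it seem to help with extracting the Rle vectors above.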
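For reference, here is a minimal sketch of one possible way to get those vectors. It is not a confirmed answer: it assumes a single chromosome (chr1), and it assumes a made-up chromosome length of 120 so that the last interval does not run off the end of the coverage vector. The idea is to turn the scores into a per-base Rle with coverage(), put Views on it at the intervals, and stack the views into a matrix. The names cov, v, mat and mean_profile are only illustrative.

```r
library(GenomicRanges)

# Rebuild the toy data from the question
set.seed(1)
signal <- GRanges(
  seqnames = Rle(c("chr1"), c(10)),
  ranges = IRanges(1:10*10, end = 1:10*10+5),
  score = runif(10))
intervals <- GRanges(
  seqnames = Rle(c("chr1"), c(5)),
  ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))

# ASSUMPTION: a hypothetical chromosome length; real data would carry
# the true seqlengths. Without it the coverage vector stops at the last
# signal range (105) and the interval ending at 112 would be cut short.
seqlengths(signal) <- c(chr1 = 120L)

# Per-base signal: positions without any range get 0,
# positions inside a range get that range's score
cov <- coverage(signal, weight = "score")

# Views on the chr1 coverage, one per interval (single-chromosome assumption)
v <- Views(cov[["chr1"]], ranges(intervals))

# One column per interval, one row per position within the interval
mat <- viewApply(v, as.numeric)

# Average signal at each of the 10 positions across all intervals
mean_profile <- rowMeans(mat)

# Average signal per interval, if that is what is wanted instead
interval_means <- viewMeans(v)
```

With several chromosomes the same steps would have to be applied per chromosome (for example after splitting the intervals by seqnames); that part is left out here.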
【Discussion】:
Tags: r bioconductor