【Question title】: R: obtain a matrix of overlap between ranges
【Posted】: 2021-01-15 18:12:12
【Question】:

I have a data frame with ranges that looks like this:

df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop = c(37, 45, 29))

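Now I would like to obtain a matrix in which I can see how much overlap (as a percentage) there is between A:B, B:C, A:C, and so on, i.e. how much of range A occurs in range B, etc. The output should look like this: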

          A       B      C
 A        100     76.5   100
 B        74.3    100    100
 C        20      20.6   100

I tried to obtain such a matrix with IRanges or GRanges, but that does not seem to be possible. I hope someone can help me!

【Comments】:

  • Please explain how you arrived at 23.5%.
  • Sorry, that should be 76.5. I will change it in the question.

Tags: r range overlap
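
For reference, the corrected 76.5 cell (row A, column B) can be reproduced by hand. This is only a sketch of the arithmetic, assuming the table reads as "length of the overlap divided by the length of the column's range"; the variable names are illustrative and not from the question.

```r
# Range A and range B from the question's data frame
a_start <- 2;  a_stop <- 37
b_start <- 11; b_stop <- 45

# Length shared by both ranges: min of the stops minus max of the starts
shared <- min(a_stop, b_stop) - max(a_start, b_start)  # 37 - 11

# Assumption: the cell is expressed relative to the length of range B
pct <- 100 * shared / (b_stop - b_start)               # 100 * 26 / 34
round(pct, 1)
```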


【Solution 1】:

Base R

out <- 100 * with(df, t((outer(stop, stop, pmin) - outer(start, start, pmax)) / (stop - start)))
dimnames(out) <- list(df$label, df$label)
out
#           A         B   C
# A 100.00000  76.47059 100
# B  74.28571 100.00000 100
# C  20.00000  20.58824 100
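
One caveat with the expression above: for two ranges that do not overlap at all, `pmin` of the stops minus `pmax` of the starts is negative, so the result would be a negative percentage. A minimal sketch of one way to guard against that follows, assuming that disjoint ranges should report 0. The helper name `overlap_matrix` and the extra range D are hypothetical and not part of the original answer.

```r
# Hypothetical helper, not from the original answer: same calculation as above,
# with negative "overlaps" (disjoint ranges) clamped to zero.
overlap_matrix <- function(start, stop, labels) {
  # Pairwise intersection length; symmetric matrix
  inter <- outer(stop, stop, pmin) - outer(start, start, pmax)
  # Disjoint pairs come out negative; treat them as no overlap (assumption)
  inter <- pmax(inter, 0)
  # Divide each column j by the length of range j, as in the expected output
  out <- 100 * sweep(inter, 2, stop - start, "/")
  dimnames(out) <- list(labels, labels)
  out
}

# The question's data plus an illustrative range D that touches none of the others
df2 <- data.frame(label = c("A", "B", "C", "D"),
                  start = c(2, 11, 22, 50),
                  stop  = c(37, 45, 29, 60))
res <- overlap_matrix(df2$start, df2$stop, df2$label)
round(res, 1)
```

Using `sweep()` over columns instead of the transpose makes the direction of the division explicit; for the three original ranges it gives the same numbers as the one-liner.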

tidyverse

library(dplyr)
library(tidyr)
expand_grid(Var1 = df$label, Var2 = df$label) %>%
  left_join(df, by = c("Var1" = "label")) %>%
  left_join(df, by = c("Var2" = "label")) %>%
  mutate(
    start = pmax(start.y, start.x),
    stop  = pmin(stop.x, stop.y),
    overlap = 100 * (stop - start) / (stop.y - start.y)
  ) %>%
  # name id_cols explicitly; newer tidyr versions reject it as a positional argument
  pivot_wider(id_cols = Var1, names_from = Var2, values_from = overlap)
# # A tibble: 3 x 4
#   Var1      A     B     C
#   <chr> <dbl> <dbl> <dbl>
# 1 A     100    76.5   100
# 2 B      74.3 100     100
# 3 C      20    20.6   100
