If I understand what you are doing correctly, your data looks something like this:
library(tibble)
library(dplyr)
library(ggplot2)

# Simulated stand-in for the real data: 1,000 features on chr1..chr10 and chrM
set.seed(3425)
levs <- paste0("chr", c(1:10, "M"))
join_chr <- tibble(
  seqnames = sample(factor(levs, levels = levs), size = 1000, replace = TRUE),
  gtype = sample(c("pseudogene", "RNA", "prot_coding", "rest"), size = 1000, replace = TRUE),
  value = round(runif(n = 1000, min = 2e4, max = 2e8)))
join_chr
# # A tibble: 1,000 x 3
#    seqnames gtype           value
#    <fct>    <chr>           <dbl>
#  1 chr1     pseudogene   16170520
#  2 chr8     pseudogene  193230157
#  3 chr9     RNA           6846001
#  4 chr8     prot_coding  64930082
#  5 chr8     pseudogene   11873972
#  6 chr1     pseudogene  136993074
#  7 chr9     rest         53026355
#  8 chr6     prot_coding  36841130
#  9 chr5     prot_coding 157630684
# 10 chr10    prot_coding  29793808
# # … with 990 more rows
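You can use summarise() to calculate the density. Doing this as a separate step lets you inspect the summary and make sure it behaves the way you expect.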
dens_chr <- join_chr %>%
  group_by(seqnames, gtype) %>%
  summarise(density = n() / max(value), .groups = "drop")
dens_chr
# # A tibble: 44 x 3
#    seqnames gtype            density
#    <fct>    <chr>              <dbl>
#  1 chr1     prot_coding 0.0000000872
#  2 chr1     pseudogene  0.000000154
#  3 chr1     rest        0.000000162
#  4 chr1     RNA         0.000000119
#  5 chr2     prot_coding 0.000000110
#  6 chr2     pseudogene  0.0000000833
#  7 chr2     rest        0.0000000893
#  8 chr2     RNA         0.000000143
#  9 chr3     prot_coding 0.000000145
# 10 chr3     pseudogene  0.000000126
# # … with 34 more rows
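One way to do that inspection is sketched below. It rebuilds the simulated data so it runs on its own, then checks that the summary has exactly one row per seqnames/gtype combination and that one cell matches a by-hand calculation. The names first_cell, one, by_hand and from_summary are only illustrative and are not part of the original code.

```r
library(tibble)
library(dplyr)

# Rebuild the simulated example data so this block is self-contained
set.seed(3425)
levs <- paste0("chr", c(1:10, "M"))
join_chr <- tibble(
  seqnames = sample(factor(levs, levels = levs), size = 1000, replace = TRUE),
  gtype = sample(c("pseudogene", "RNA", "prot_coding", "rest"), size = 1000, replace = TRUE),
  value = round(runif(n = 1000, min = 2e4, max = 2e8)))

dens_chr <- join_chr %>%
  group_by(seqnames, gtype) %>%
  summarise(density = n() / max(value), .groups = "drop")

# Check 1: one row per seqnames/gtype combination, no duplicates
stopifnot(anyDuplicated(dens_chr[c("seqnames", "gtype")]) == 0)

# Check 2: recompute the first cell by hand and compare it with the summary
first_cell <- dens_chr[1, ]
one <- join_chr %>%
  filter(as.character(seqnames) == as.character(first_cell$seqnames),
         gtype == first_cell$gtype)
by_hand <- nrow(one) / max(one$value)
from_summary <- first_cell$density
stopifnot(isTRUE(all.equal(by_hand, from_summary)))
```

Note that max(value) is taken within each seqnames/gtype group, so the denominator can differ between gene types on the same chromosome. Whether that is what you want depends on how you define density.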
Then you can plot it.
ggplot(data = dens_chr,
       mapping = aes(x = density, y = seqnames)) +
  geom_bar(stat = "summary", fun = max) +
  facet_grid(. ~ gtype) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
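Because dens_chr already holds a single density per seqnames/gtype combination, taking the max over each bar is a no-op, so geom_col() should draw the same bars. The sketch below is an alternative under that assumption, not a correction. It also assumes ggplot2 3.3.0 or later, where horizontal bars are detected from the aesthetics. The object name p is only illustrative.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Rebuild the simulated example data so this block is self-contained
set.seed(3425)
levs <- paste0("chr", c(1:10, "M"))
join_chr <- tibble(
  seqnames = sample(factor(levs, levels = levs), size = 1000, replace = TRUE),
  gtype = sample(c("pseudogene", "RNA", "prot_coding", "rest"), size = 1000, replace = TRUE),
  value = round(runif(n = 1000, min = 2e4, max = 2e8)))

dens_chr <- join_chr %>%
  group_by(seqnames, gtype) %>%
  summarise(density = n() / max(value), .groups = "drop")

# geom_col() plots the pre-computed values directly, one bar per row
p <- ggplot(data = dens_chr,
            mapping = aes(x = density, y = seqnames)) +
  geom_col() +
  facet_grid(. ~ gtype) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

If you later plot the unsummarised data instead, the stat = "summary" version is the one to keep, since it aggregates inside the plot.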