How to calculate and label peak value of distribution by multiple conditions/facets in R ggplot? Answers

【Question title】: How to calculate and label peak value of distribution by multiple conditions/facets in R ggplot?
【Posted】: 2021-02-19 22:37:14
【Question description】:

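While this question appears similar to others, there is a key difference in mine.

I would like to be able to calculate and/or print the peak value of the density curve for each SUB-CONDITION BY FACET (plotting it is the ultimate goal, but calculating it in the data frame is the main goal). The density plot looks like this:

So ideally, I would be able to know the intensity (x-axis value) corresponding to the highest peak of the density curve for each condition.

Here is some dummy data: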

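set.seed(1234)

library(tidyverse)

n <- 100000
# Recycle the labels so that every column has exactly n rows
silence <- factor(rep(c("sil1", "sil2", "sil3", "sil4", "sil5"), length.out = n))
treat <- factor(rep(c("con", "uos", "uos+wnt5a", "wnt5a"), length.out = n))
# Sample with replacement: 4000:10000 holds fewer than n distinct values
intensity <- sample(4000:10000, n, replace = TRUE)

# Build a data frame directly; cbind() would return a matrix and drop the factors
df <- tibble(silence, treat, intensity)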

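What I have tried:

Subsetting the main DF and calculating the density for each condition, but this could take days.
Something close to this answer: Calculating peaks in histograms or density functions, but not quite. I personally think the data looks better as a histogram, but that builds an arbitrary number of bins for the intensity data (a continuous measure). The histogram looks like this:

Again, just getting the peak value for each group (i.e. treatment by silence sub-distributions) in the console would be enough, but adding them as vertical lines to these graphs would be the sweet cherry on top (it may also make the plot busy, so I'll see about that part later).

Thank you!!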

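【Comments】:

What code did you use to plot the density curves?
Please check the code for your dummy data; it produces multiple errors for me, e.g. source = rep(source, n) won't work if source isn't an existing object.
Thanks ^ I have updated it to reflect the desired data.
What code did you use to plot the density curves? I've provided an answer that takes a pre-existing plot and extracts the apex of each curve in each facet, but if you're looking for a solution that finds these values upstream, it will depend on the method and parameters you're using to generate your density curves.
geom_density() in ggplot

Tags: r ggplot2 probability-density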

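【Solution 1】:

Depending on how you're generating the density plot, there may be a more direct way to recreate the density calculation before it goes into ggplot. That would be the easiest way to get the peaks and keep them in data form.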
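A minimal sketch of that upstream route, assuming the curves come from geom_density() with its default settings (Gaussian kernel, bw.nrd0 bandwidth), so that stats::density() with its own defaults is a close stand-in. The helper name peak_x is made up for illustration, and mtcars stands in for the real data. stats::density() evaluates on a slightly different grid than the plot does, so the peaks may differ a little from what is drawn.

```r
library(dplyr)

# Hypothetical helper: x position of the highest point of a kernel density estimate.
# Extra arguments are passed on to stats::density() (e.g. bw, adjust).
peak_x <- function(x, ...) {
  d <- density(x, ...)
  d$x[which.max(d$y)]
}

peaks <- mtcars %>%
  mutate(gear = as.character(gear)) %>%
  group_by(am, gear) %>%
  filter(n() >= 2) %>%   # automatic bandwidth selection needs at least two points
  summarise(peak_wt = peak_x(wt), n_obs = n(), .groups = "drop")

peaks
```

For the data in the question, the same pattern would presumably be group_by(silence, treat) with peak_x(intensity).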

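Lacking that, here's a hack that should work fine, but it requires some kludging to get the extracted points back into the form of your original data.

Here's a plot like yours: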

mtcars %>% 
  mutate(gear = as.character(gear)) %>%
  ggplot(aes(wt, fill = gear, group = gear)) +
  geom_density(alpha = 0.2) +
  facet_wrap(~am) ->my_plot

Here are the components that make up that plot:

ggplot_build(my_plot) -> my_plot_innards

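With some ugly hacking, we can extract the points that make up the curves and make them look a bit like our original data. Some information is destroyed, e.g. the gear values 3/4/5 become group 1/2/3. There might be a cool way to convert back, but I don't know it yet.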
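One possible way to convert back, offered as a sketch rather than a supported API: the built plot carries a layout table that pairs each PANEL with its facet value, and ggplot2 appears to number groups in the sorted order of the grouping variable's unique values. Both points rely on ggplot2 internals and should be checked against the installed version. The names built, layout_tbl, gear_levels and curve_points are illustrative.

```r
library(dplyr)
library(ggplot2)

plot_data <- mtcars %>% mutate(gear = as.character(gear))

my_plot <- ggplot(plot_data, aes(wt, fill = gear, group = gear)) +
  geom_density(alpha = 0.2) +
  facet_wrap(~am)

built <- ggplot_build(my_plot)

# One row per panel, with the facet variable (am) stored next to PANEL
layout_tbl <- built$layout$layout

# Assumption: group k corresponds to the k-th sorted unique value of gear
gear_levels <- sort(unique(plot_data$gear))

curve_points <- tibble(
  wt   = built$data[[1]]$x,
  y    = built$data[[1]]$y,
  gear = gear_levels[as.integer(built$data[[1]]$group)],
  am   = layout_tbl$am[match(built$data[[1]]$PANEL, layout_tbl$PANEL)]
)

count(curve_points, am, gear)
```

This replaces the "+ 2" and "- 1" arithmetic with lookups, so it should carry over to facet and group variables that are not consecutive numbers.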

extracted_points <- tibble(
  wt = my_plot_innards[["data"]][[1]][["x"]],
  y = my_plot_innards[["data"]][[1]][["y"]],
  gear = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
  am = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
)

ggplot(extracted_points, aes(wt, y, fill = gear)) +
  geom_point(size = 0.3) +
  facet_wrap(~am)

extracted_points_notes <- extracted_points %>%
  group_by(gear, am) %>%
  slice_max(y)


my_plot +
  geom_point(data = extracted_points_notes,
             aes(y = y), color = "red", size = 3, show.legend = FALSE) +
  geom_text(data = extracted_points_notes, hjust = -0.5,
             aes(y = y, label = scales::comma(y)), color = "red", size = 3, show.legend = FALSE)
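
The question asks for the x position of each peak and, ideally, vertical lines at those positions. A rough sketch of that variant follows, rebuilding the example plot so the snippet runs on its own. The names peak_lines and marked_plot are illustrative, and the PANEL lookup through the layout table leans on ggplot2 internals.

```r
library(dplyr)
library(ggplot2)

my_plot <- mtcars %>%
  mutate(gear = as.character(gear)) %>%
  ggplot(aes(wt, fill = gear, group = gear)) +
  geom_density(alpha = 0.2) +
  facet_wrap(~am)

built <- ggplot_build(my_plot)
layout_tbl <- built$layout$layout

# Highest point of each curve, with the facet value restored from PANEL
peak_lines <- as_tibble(built$data[[1]]) %>%
  mutate(
    group = as.integer(group),
    am = layout_tbl$am[match(PANEL, layout_tbl$PANEL)]
  ) %>%
  group_by(am, group) %>%
  slice_max(y, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(am, group, peak_wt = x, peak_density = y)

# Dashed vertical line at each peak; the am column places each line in its facet
marked_plot <- my_plot +
  geom_vline(data = peak_lines, aes(xintercept = peak_wt),
             linetype = "dashed", colour = "red")

peak_lines
marked_plot
```

Printing peak_lines gives the peak positions in the console, which the question says would already be enough.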

【Discussion】:

This doesn't work for me. Here is the code I tried:
ggplot_build(f1_dens) -> my_plot_innards
extracted_points <- tibble(
  wt = my_plot_innards[["data"]][[1]][["x"]],
  y = my_plot_innards[["data"]][[1]][["y"]],
  gear = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
  am = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
)
extracted_points <- tibble(
  wt = my_plot_innards[["data"]][[1]][["x"]],
  intensity = my_plot_innards[["data"]][[1]][["y"]],
  treatment = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
  silence = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
)
ggplot(extracted_points, aes(wt, intensity, fill = treatment)) +
  geom_point(size = 0.3) +
  facet_wrap(~am)