【Question title】: Calculate the density at one point within groups
【Posted】: 2021-04-07 10:27:09
【Question description】:

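I am plotting some density curves and want to add a point at each group's mean. However, I want to plot these points along the top of the density curve rather than at 0. Is there a way to get the density value at the mean within each group? The code is below: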

# make df
df<- data.frame(group=c("a","b",'c'),
           value=rnorm(
             3000,
             mean=c(1,2,3),
             sd=c(1,1.5,1)
           )) 
library(tidyverse)
library(ggridges)
library(ggdist)

Approach 1: density ridges from the ggridges package

df %>%

  # calculate the mean value per group to use later
  group_by(group)%>%
  mutate(mean_value=mean(value)) %>%
    
  
  ggplot()+
  aes(x=value,y=group)+
  geom_density_ridges()+
  
  # could do with stat summary - blue points
  stat_summary(
    orientation = "y",
    fun = mean,
    geom = "point", 
    color="blue"
  )+
  
  # or could do with geom_point using precalculated value (red points)
  # nudged so we can see both. 
  geom_point(aes(x=mean_value,y=group),
             color="red",
             position = position_nudge(x=.1)
             )

Approach 2: stat_halfeye from the ggdist package

df %>%
  group_by(group)%>%
  mutate(mean_value=mean(value)) %>%
  
  # mutate(mean_density = density(mean_value,value))
  
  
  ggplot()+
  aes(x=value,y=group)+
  stat_halfeye()+
  
  # could do with stat summary
  stat_summary(
    orientation = "y",
    fun = mean,
    geom = "point", 
    color="blue",
    alpha=.8
  )+
  
  # or could do with geom_point using precalculated value
  # nudged so we can see both. 
  geom_point(aes(x=mean_value,y=group),
             color="red",
             position = position_nudge(x=.1)
  )

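Desired output: the blue or red points sit on top of the density curve. So I need a y aesthetic that is something like "group + density value".

I would prefer to use approach 2 (ggdist) rather than geom_density_ridges.

Thanks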

【Question discussion】:

    Tags: r ggplot2 kernel-density density-plot


    【Solution 1】:

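    I'm not sure there is a way to compute the height of the density curve at the mean from inside the ggplot geom/stat functions, so I wrote a couple of helper functions to do it.

    dens_at_mean computes the height of the density curve at the mean of the data. get_mean_coords runs dens_at_mean by group, scales the height values to match the y values generated by stat_halfeye, and returns a data frame that can be passed to geom_point.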

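    # Packages used below (dplyr/tidyr/ggplot2 via tidyverse, stat_halfeye via ggdist)
    library(tidyverse)
    library(ggdist)

    # Reproducible data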
    set.seed(394)
    df<- data.frame(group=c("a","b",'c'),
                    value=rnorm(
                      3000,
                      mean=c(1,2,3),
                      sd=c(1,1.5,1)
                    )) 
    

    # Function to get height of density curve at mean value
    dens_at_mean = function(x) { 
      d = density(x)
      mean.x = mean(x)
      data.frame(mean.x = mean.x,
                 max.y = max(d$y),
                 mean.y = approx(d$x, d$y, xout=mean.x)$y)
    }
    
    # Function to return data frame with properly scaled heights 
    #  to plot mean points
    get_mean_coords = function(data, value.var, group.var) {
    
      data %>% 
        group_by({{group.var}}) %>% 
        summarise(vals = list(dens_at_mean({{value.var}}))) %>% 
        ungroup %>% 
        unnest_wider(vals) %>% 
        # Scale y-value to work properly with stat_halfeye
        mutate(mean.y = (mean.y/max(max.y) * 0.9 + 1:n())) %>% 
        select(-max.y)
    }
    
    df %>%
      ggplot()+
        aes(x=value, y=group)+
        stat_halfeye() +
        geom_point(data=get_mean_coords(df, value, group), 
                   aes(x=mean.x, y=mean.y),
                   color="red", size=2) +
        theme_bw() +
        scale_y_discrete(expand=c(0.08,0.05))
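
To see what the scaling step produces without drawing the plot, the same arithmetic can be sketched in base R. This is only an illustrative check, not part of the answer's code: the names `per_group`, `coords`, `dens.y`, and `slab_scale` are made up here, and the sketch assumes that stat_halfeye uses its default `scale = 0.9`, normalizes slab heights by the tallest slab across all groups, and places the k-th group's baseline at y = k. It also assumes the plotted curve matches `stats::density()` with default settings; if the installed ggdist version uses a different bandwidth or estimator, the points may sit slightly off the drawn curve.

```r
# Illustrative base-R version of the scaling step (hypothetical names).
set.seed(394)
df <- data.frame(group = c("a", "b", "c"),
                 value = rnorm(3000, mean = c(1, 2, 3), sd = c(1, 1.5, 1)))

# Per group: the mean, the peak of the density curve, and the density
# height interpolated at the mean.
per_group <- lapply(split(df$value, df$group), function(x) {
  d <- density(x)  # assumption: default stats::density() settings
  c(mean.x = mean(x),
    max.y  = max(d$y),
    dens.y = approx(d$x, d$y, xout = mean(x))$y)
})
coords <- as.data.frame(do.call(rbind, per_group))
coords$group <- rownames(coords)

# Assumed stat_halfeye defaults: heights are divided by the tallest slab
# over all groups, multiplied by 0.9, and added to the group index.
slab_scale <- 0.9
coords$mean.y <- coords$dens.y / max(coords$max.y) * slab_scale +
  seq_len(nrow(coords))

print(coords[, c("group", "mean.x", "mean.y")])
```

Each `mean.y` should fall between its group index k and k + 0.9, which is the vertical band the slab for that group occupies under the assumptions above.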
    

    【Discussion】:
