[Posted]: 2019-12-14 17:34:15
[Question]:
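I am trying to produce a variation of a grouped violin plot in R (preferably with ggplot2), similar to the one below:
It was generated with the following reproducible example code: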
# Load libraries #
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 # `times` is the documented argument of rep(); Za/Zb alternate row by row
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
  ggplot(., aes(x = X, y = Y, fill = Z)) +
  geom_violin(draw_quantiles = 0.5) +
  scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
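The variation I want is that the density above the median should be shaded differently from the density below the median, as in the following plot:
I produced the (single) violin plot above for the combination X = X1 and Z = Za in the data using the following code: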
## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>%
  filter(X == "X1", Z == "Za") %>%
  summarise(Y_min = min(Y),
            Y_qnt = quantile(Y, 0.5),
            Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>%
  filter(X == "X1", Z == "Za") %>%
  do(data.frame(LOC = density(.$Y)$x,
                DENS = density(.$Y)$y)) %>%
  filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>%
  mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))
# Find density values at limits #
df.lim.2 <- df.dens %>%
  filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded single violin plot #
df.dens %>%
  ggplot(aes(x = LOC)) +
  geom_area(aes(y = DENS, alpha = COL), fill = "red") +
  geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
  geom_path(aes(y = DENS)) +
  geom_path(aes(y = -DENS)) +
  geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
  coord_flip() +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
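As you will notice in the code, I am building the violin plot from scratch with the density function and then flipping the axes. The problem arises when I try to produce the grouped violin plot, mainly because the axis on which the groups X and Z would appear is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by group, but I am stuck at the last step: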
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>%
  group_by(X, Z) %>%
  summarise(Y_min = min(Y),
            Y_qnt = quantile(Y, 0.5),
            Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>%
  group_by(X, Z) %>%
  do(data.frame(LOC = density(.$Y)$x,
                DENS = density(.$Y)$y)) %>%
  left_join(., df.lim, by = c("X", "Z")) %>%
  filter(LOC >= Y_min, LOC <= Y_max) %>%
  mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))
# Find density values at limits by group #
df.lim.2 <- df.dens %>%
  group_by(X, Z) %>%
  filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded grouped violin plot #
df.dens %>%
  ggplot(aes(x = LOC, group = interaction(X, Z))) +
  # The following two lines don't work when included #
  #geom_area(aes(y = DENS, alpha = COL), fill = "red") +
  #geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
  geom_path(aes(y = DENS)) +
  geom_path(aes(y = -DENS)) +
  geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
  coord_flip() +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
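Running the code above produces the violin outlines for each group, drawn on top of one another. But as soon as I try to include the geom_area lines, the code fails.
My gut tells me that I need to somehow produce the "shaded" violin plot as a new geom that can then be used within the general structure of ggplot2 graphics, but I have no idea how to do that, as my coding skills do not extend that far. Any help or pointers, either along my line of thought or in a different direction, would be much appreciated. Thank you for your time.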
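One possible reading of the geom_area failure, offered as an assumption rather than a confirmed diagnosis: with group = interaction(X, Z), the alpha aesthetic changes inside a single group, which ribbon-type geoms such as geom_area() do not accept. The sketch below adds COL to the grouping, turns off stacking, and puts each X/Z combination in its own facet so the violins no longer overlap. The names `dens` and `p_facet` are hypothetical, and faceting is a workaround swapped in here, not the side-by-side layout asked for above.

```r
library(dplyr)
library(ggplot2)

# Same dummy data as in the question
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Per-group density evaluated only between the group's min and max,
# labelled by whether each grid point lies above the group median
dens <- df %>%
  group_by(X, Z) %>%
  do({
    d <- density(.$Y, from = min(.$Y), to = max(.$Y))
    data.frame(LOC = d$x,
               DENS = d$y,
               COL = ifelse(d$x > median(.$Y), "Empty", "Filled"),
               stringsAsFactors = FALSE)
  }) %>%
  ungroup()

# Grouping by X, Z and COL keeps alpha constant within every drawn area;
# position = "identity" stops areas from being stacked on each other
p_facet <- ggplot(dens, aes(x = LOC, fill = Z, alpha = COL,
                            group = interaction(X, Z, COL))) +
  geom_area(aes(y = DENS), stat = "identity", position = "identity") +
  geom_area(aes(y = -DENS), stat = "identity", position = "identity") +
  facet_wrap(~ X + Z, nrow = 1) +
  coord_flip() +
  scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
```

Printing `p_facet` should draw four panels, one per X/Z combination. Because the two shade categories are separate areas, a thin gap may be visible at the median.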
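[Comments]:
- I don't think geom_area() is going to solve your problem when the violins sit around 0. It is best to replace it with geom_polygon(). The best guide I have found for creating your own geoms is here: cran.r-project.org/web/packages/ggplot2/vignettes/….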
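A minimal sketch of the geom_polygon() route suggested in this comment, under some stated assumptions: the helpers `half_violin()` and `shaded_violin()`, the `half_width` parameter, and the ±0.225 dodge offsets are hypothetical names and values chosen for illustration, and each violin is scaled to the same maximum width, which may differ from how geom_violin() scales. The idea is to place every violin at its own numeric position on the X axis and draw the halves below and above the median as separate polygons, so coord_flip() is not needed.

```r
library(dplyr)
library(ggplot2)

# Same dummy data as in the question
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Hypothetical helper: closed outline of one violin segment
# (up the right-hand side, back down the left-hand side)
half_violin <- function(loc, w, centre, label) {
  data.frame(xpos = c(centre + w, rev(centre - w)),
             ypos = c(loc, rev(loc)),
             COL = label,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: one violin split at the median into two polygons
shaded_violin <- function(y, centre, half_width = 0.2) {
  d <- density(y, from = min(y), to = max(y))
  w <- d$y / max(d$y) * half_width           # scale widths to half_width
  med <- median(y)
  w_med <- approx(d$x, w, xout = med)$y      # interpolated width at the median
  lo <- d$x < med
  rbind(half_violin(c(d$x[lo], med), c(w[lo], w_med), centre, "Filled"),
        half_violin(c(med, d$x[!lo]), c(w_med, w[!lo]), centre, "Empty"))
}

# X1/X2 sit at 1 and 2; Za/Zb are shifted left and right of that position
polys <- df %>%
  mutate(centre = as.numeric(factor(X)) + ifelse(Z == "Za", -0.225, 0.225)) %>%
  group_by(X, Z) %>%
  do(shaded_violin(.$Y, .$centre[1])) %>%
  ungroup()

p_poly <- ggplot(polys, aes(x = xpos, y = ypos, fill = Z, alpha = COL,
                            group = interaction(X, Z, COL))) +
  geom_polygon(colour = "black") +
  scale_x_continuous(breaks = 1:2, labels = c("X1", "X2")) +
  scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1)) +
  labs(x = "X", y = "Y")
```

Both polygons of a violin share the interpolated point at the median, so the black outline also marks the median, much like draw_quantiles = 0.5 does in geom_violin(). Wrapping this logic in a custom geom or stat would be the next step, which this sketch does not attempt.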
Tags: r ggplot2 violin-plot