R: ggplot stacked bar chart with counts on y axis but percentage as label

【Question title】: R: ggplot stacked bar chart with counts on y axis but percentage as label
【Posted】: 2016-10-15 12:40:34
【Question】:

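I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the raw counts (using ggplot). Here is an MWE of the plot without labels: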

library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol= 3,
                       data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                "north", "north", "north", "north", "south", "south", "south",
                                "A", "B", "B", "C", "A", "A", "C"),
                      byrow = FALSE))

colnames(df) <- c("ID", "region", "species")

p <- ggplot(df, aes(x = region, fill = species))
p  + geom_bar()

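I have a much larger table, and R counts the different species per region just fine. Now I want to show both the raw counts (preferably on the y-axis) and the percentages (as labels) so that species proportions can be compared between regions.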
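For reference, the counts that geom_bar() stacks can also be computed explicitly. A minimal base-R sketch, assuming the MWE data is rebuilt with data.frame() so the block runs on its own, cross-tabulating regions against species with table():

```r
# Same toy data as the MWE, rebuilt with data.frame() so this block is standalone
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Rows = regions, columns = species, cells = number of rows in each combination.
# These are the segment heights geom_bar() computes internally.
tab <- table(df$region, df$species)
print(tab)
```

Here north holds 1 A, 2 B and 1 C, and south holds 2 A and 1 C (south/B is 0).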

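I have tried many things with geom_text(), but I think the main differences from other questions (e.g. this one) are that

I don't have a separate column of y values (they are just the counts of the different species per region), and
I need the labels to sum to 100% within each region (since the regions are considered to represent different populations), not across all labels in the whole plot.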
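The second requirement amounts to dividing each count by its region's total rather than by the grand total. A base-R sketch on the same toy data, using prop.table() with margin = 1 (row-wise proportions); the name shares is illustrative only:

```r
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)
tab <- table(df$region, df$species)

# margin = 1 divides every cell by its row (region) total,
# so each region's shares add up to 1, i.e. 100%
shares <- prop.table(tab, margin = 1)
round(100 * shares, 1)
```

North then splits 25% / 50% / 25% and south 66.7% / 0% / 33.3%, each row summing to 100%.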

Any help is greatly appreciated!

【Comments】:

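When you're doing something non-standard, you usually need to compute the numbers yourself. It may be possible to do this inside ggplot, but it isn't straightforward. You're better off using functions built for data manipulation than trying to do the data manipulation in ggplot.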
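Following that advice, here is a base-R sketch of a pre-computed summary: one row per region/species combination with an explicit count column n, which is exactly the y column the question says is missing. The names counts and n are illustrative, not from the thread:

```r
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Long format: columns region, species and n (the count)
counts <- as.data.frame(table(region = df$region, species = df$species),
                        responseName = "n")

# table() also emits combinations that never occur (south/B); drop them
counts <- counts[counts$n > 0, ]
counts
```

A data frame like this can be passed to ggplot() with aes(region, n, fill = species) and geom_bar(stat = "identity").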

Tags: r ggplot2 geom-text

【Solution 1】:

As @Gregor mentioned, summarize the data separately and then feed the summary to ggplot. In the code below, we use dplyr to create the summary on the fly:

library(dplyr)

ggplot(df %>% count(region, species) %>%    # Count rows in each region/species combination
         group_by(region) %>%               # Make sum(n) below a per-region total
         mutate(pct = n/sum(n),             # Calculate percent within each region
                ypos = rev(cumsum(rev(n))) - 0.5*n),  # Segment midpoints; ggplot2 >= 2.2.0 stacks the first species on top
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=ypos))
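The label-position arithmetic can be checked by hand for a single region. A sketch using the north counts from the toy data (A = 1, B = 2, C = 1), assuming the default stacking order: the bottom-up cumulative sum from the original 2016 answer, and the reversed version that matches ggplot2 2.2.0 and later, where the first factor level sits on top of the stack:

```r
# North counts, in factor-level order
n <- c(A = 1, B = 2, C = 1)

# Bottom-up stacking (first level at the bottom): top edge of each
# segment is cumsum(n); the midpoint is half a segment below that
mid_old <- cumsum(n) - 0.5 * n            # A at 0.5, B at 2, C at 3.5

# First level on top: accumulate from the last level instead
mid_new <- rev(cumsum(rev(n))) - 0.5 * n  # A at 3.5, B at 2, C at 0.5

mid_old
mid_new
```

The middle segment lands at 2 either way; only the outer labels swap, which is why the same ypos formula can misplace labels after the stacking-order change.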

Update: with ggplot2 2.2.0 and later, you no longer need to supply y values to center the text within each bar segment. Instead, you can use position_stack(vjust=0.5):

ggplot(df %>% count(region, species) %>%    # Count rows in each region/species combination
         group_by(region) %>%               # Make sum(n) below a per-region total
         mutate(pct=n/sum(n)),              # Calculate percent within each region
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")), 
            position=position_stack(vjust=0.5))
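The label expression turns each within-region share into text such as "50.0%". A small sketch using shares from the toy data (0.25 and 0.5 for north, 2/3 for south/A), comparing the answer's paste0() + sprintf() construction with a single sprintf() call in which %% is sprintf's escape for a literal percent sign:

```r
pct <- c(0.25, 0.5, 2/3)

# The answer's construction: one decimal place, then append "%"
lab1 <- paste0(sprintf("%1.1f", pct * 100), "%")

# Equivalent single call: "%%" prints a literal percent sign
lab2 <- sprintf("%.1f%%", pct * 100)

print(lab1)
print(lab2)
```

Both give "25.0%", "50.0%" and "66.7%"; which form to use inside aes(label = ...) is purely a matter of taste.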

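【Comments】:

Thanks a lot, this is exactly what I was looking for!
Note that the code above does not produce the bar chart shown! You also need a group_by call: df %>% group_by(region) %>% count(region, species) %>% mutate(pct=n/sum(n))
group_by is unnecessary: count(x, y) is equivalent to group_by(x, y) %>% tally().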

【Solution 2】:

I agree with Johanna. You could try:

d <- aggregate(.~region+species, df, length)
d$percent <- paste(round(d$ID/sum(d$ID)*100),'%',sep='')
ggplot(d, aes(region, ID, fill=species)) + geom_bar(stat='identity') + 
  geom_text(position='stack', aes(label=paste(round(ID/sum(ID)*100),'%',sep='')), vjust=5)

【Comments】:

Thanks for your input, but in your solution the percentages within each stack don't sum to 100%. BTW: I think it should be d$percent <- paste(round(d$ID/sum(d$ID)*100),'%',sep='').
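The objection above can be addressed within Solution 2's base-R approach by dividing each count by its region's total instead of the grand total. A sketch, assuming ave() to broadcast each region's sum back onto its rows; the pct column is an illustrative addition:

```r
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# As in Solution 2: count rows per region/species; the count lands in column ID
d <- aggregate(ID ~ region + species, data = df, FUN = length)

# ave() returns each row's region total, so the shares sum to 1 per region
d$pct <- d$ID / ave(d$ID, d$region, FUN = sum)
d$percent <- paste0(round(d$pct * 100), "%")
d
```

With this d, the labels can come from aes(label = percent) while the bars keep using ID (the raw count) as y.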