Don't drop zero count: dodged barplot (answers)

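【Question title】: Don't drop zero count: dodged barplot
【Posted】: 2012-04-26 03:14:39
【Question】:

I am making a dodged barplot in ggplot2, and one grouping has a zero count that I want to display. I remember seeing this on HERE a while back and figured scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?

For instance (code below), in the plot below type8~group4 has no examples. I would still like the plot to show an empty space for the zero count instead of eliminating the bar. How can I do this?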

mtcars2 <- data.frame(type=factor(mtcars$cyl), 
    group=factor(mtcars$gear))

m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
        scale_x_discrete(drop=F)
p2
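
To see which combination is empty before plotting, a cross-tabulation of the two factors can help. This is a minimal base R sketch, not part of the original question; the name tab is only illustrative.

```r
# Rebuild the question's data so the snippet is self-contained
mtcars2 <- data.frame(type = factor(mtcars$cyl),
                      group = factor(mtcars$gear))

# Two-way table of counts; table() keeps every level combination, including empty ones
tab <- table(mtcars2$type, mtcars2$group, dnn = c("type", "group"))
tab

# Row and column position of each empty cell
which(tab == 0, arr.ind = TRUE)
```

The only empty cell is type 8 with group 4, which is the bar that geom_bar() leaves out.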

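【Comments on the question】:

Tags: r ggplot2

【Solution 1】:

You can do this without making a summary table first.
It did not work in my CRAN version (2.2.1), but in the latest development version of ggplot (2.2.1.900) I have no issues.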

ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
  geom_bar(position = position_dodge(preserve = "single"))

http://ggplot2.tidyverse.org/reference/position_dodge.html

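【Comments】:

preserve = "single" saved my life ;)
preserve = "single" works great! However, it does not move the bar labels. When I use geom_text, the labels for the "single" bars are not placed in the middle of the bar.

【Solution 2】:

Update: geom_bar() now needs stat = "identity".

For what it's worth: the table of counts, dat, in Joran's answer (Solution 3 on this page) contains NA. Sometimes it is useful to have an explicit 0 instead, for example if the next step is to put the counts above the bars. The code below does just that, although it is probably no simpler than Joran's. It involves two steps: get a cross-tabulation of the counts with dcast, then melt the table with melt, and finally call ggplot() as usual.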

library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))

dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt

ggplot(dat.melt, aes(x = type,y = value, fill = variable)) + 
  geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
  ylim(0, 14) +
  geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)

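【Comments】:

This works a bit better. I had already finished the graphic and it took some hacky junk, but this takes care of those problems. Nice response. +1

【Solution 3】:

The only way I know of is to pre-compute the counts and add a dummy row: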

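library(plyr)  # provides ddply() and summarise()

# count each type/group combination, then append a dummy row for the missing one (type 8, group 4)
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))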

ggplot(dat,aes(x = type,y = count,fill = group)) + 
    geom_bar(colour = "black",position = "dodge",stat = "identity")

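I thought that using stat_bin(drop = FALSE, geom = "bar", ...) would work, but apparently it does not.

【Comments】:

Not as easy as I had hoped, but I couldn't find a suitable answer in my searches, so I figured it would take some tinkering. Thanks Joran. Works great. +1
@TylerRinker Honestly, I feel like stat_bin(drop = FALSE, geom = "bar", position = "dodge", ...) ought to do this; at the very least, the documentation strongly suggests that it would. I'd be curious to hear from more knowledgeable people on the mailing list why it doesn't.
I'm working on a project right now, but I'll put it up on the list later and report back here.

【Solution 4】:

I had the same question, but I wanted to use only data.table, since it is a faster solution for larger data sets. I added comments to the code so that those who are less experienced and want to understand why I did what I did can follow along easily. Here is how I manipulated the mtcars data set: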

library(data.table)
library(scales)
library(ggplot2)

mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
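mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a completed list of all unique combinations of Cylinders and Gears. Then counts how many of each combination there are and reports it in a column called "N". by = .EACHI makes current data.table versions count per row of the CJ table; without it, .N is a single overall total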

Here is the call that generates the plot:

ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) + 
               geom_bar(position="dodge", stat="identity") + 
               ylab("Count") + theme(legend.position="top") + 
               scale_x_discrete(drop = FALSE)

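It produces this plot:

[Figure: dodged bar chart of Count by Cylinders, filled by Gears]

Additionally, if you have continuous data, like that in the diamonds data set (thanks to mnel):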

library(data.table)
library(scales)
library(ggplot2)

diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut) 
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]

Then using CJ works here as well.

data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
             geom_bar(stat = "identity", position = "dodge") + 
             ylab("Mean Carat") + xlab("Color")

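Which gives us this plot:

[Figure: dodged bar chart of Mean Carat by Color, filled by cut]

【Comments】:

【Solution 5】:

Use count from dplyr and complete from tidyr to do this (both are loaded by library(tidyverse)).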

library(tidyverse)

mtcars %>% 
    mutate(
        type = as.factor(cyl),
        group = as.factor(gear)
    ) %>%
    count(type, group) %>% 
    complete(type, group, fill = list(n = 0)) %>%
    ggplot(aes(x = type, y = n, fill = group)) +
        geom_bar(colour = "black", position = "dodge", stat = "identity")
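
A possible variant, sketched here as an assumption rather than as part of the original answer: when the grouping columns are factors, count() accepts .drop = FALSE, which keeps empty level combinations and makes the complete() step unnecessary. The names counts and p are only illustrative.

```r
library(dplyr)
library(ggplot2)

counts <- mtcars %>%
    mutate(
        type = factor(cyl),
        group = factor(gear)
    ) %>%
    # .drop = FALSE keeps factor level combinations that have no rows (n = 0)
    count(type, group, .drop = FALSE)

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(counts, aes(x = type, y = n, fill = group)) +
    geom_col(colour = "black", position = "dodge")
p
```

This only works because type and group are factors; for character columns, complete() as above remains the way to fill in the missing rows.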

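【Comments】:

Very nice... I suspect the next release of ggplot2 on CRAN will include @S_BRT's answer, which seems to be the best solution: github.com/tidyverse/ggplot2/blob/master/R/position-dodge.r

【Solution 6】:

You can take advantage of a feature of the table() function: it counts the occurrences of a factor for all of its levels, including levels that never occur.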

# load plyr package to use ddply
library(plyr) 

# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise, 
 types = as.numeric(names(table(type))), 
 counts = as.numeric(table(type)))

# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
 geom_bar(stat='identity',colour="black", position="dodge")
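
The same idea can be sketched without plyr: a two-way table() over both factors, converted with as.data.frame(), already contains one row per level combination, zeros included. This is a minimal sketch, not the answerer's code; the names counts and p are illustrative.

```r
library(ggplot2)

# Rebuild the question's data so the snippet is self-contained
mtcars2 <- data.frame(type = factor(mtcars$cyl),
                      group = factor(mtcars$gear))

# One row per type/group combination; empty combinations get count 0
counts <- as.data.frame(table(type = mtcars2$type, group = mtcars2$group),
                        responseName = "count")

p <- ggplot(counts, aes(x = type, y = count, fill = group)) +
 geom_bar(stat = "identity", colour = "black", position = "dodge")
p
```

Because type stays a factor here, the x axis is discrete rather than numeric as in the ddply version above.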

【Comments】: