ggplot2 making strange, smeared lines: answers

【Question title】: ggplot2 making strange, smeared lines
【Posted】: 2017-04-17 00:12:36
【Question】:

I have the following code to make a plot with ggplot2 (here is the data file):

library(ggplot2)
library(dplyr)  # for %>% and filter()

sig1 <- ggplot(var_dat_df %>%
                filter(!(variable %in% c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker"))),
               aes(x = i, y = -log10(value), group = variable, color = variable)) +
          geom_line() +
          scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"),
                             labels = c("CUSUM", "DE", "HR"),
                             name = "Statistic") +
          geom_hline(yintercept = -log10(0.05), color = "red", linetype = "dashed") +
          scale_y_continuous(breaks = c(-log10(0.05), 5, 10, 15, 17),
                             labels = expression(alpha, 5, 10, 15, 17)) +
          xlab("Index") + ylab(expression(-log[10](p))) +
          labs(title = "Statistical Significance of Detected Change",
               subtitle = "Without Using Kernel Estimation for Long-Run Variance") +
          theme_bw() +
          theme(plot.title = element_text(size = rel(2)),
                legend.position = "bottom")

I get the following warning message:

Warning message:
In eval(expr, envir, enclos) : NaNs produced
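The warning can be reproduced in base R without the data. This is a minimal sketch on a made-up vector `p` (not the OP's data): `log10(0)` is `-Inf`, so `-log10(0)` is `Inf`, which ggplot2 draws at the edge of the panel, a plausible source of the bars at the top. A negative input, such as a tiny negative number left over from floating-point error, gives `NaN` and is what triggers "NaNs produced".

```r
# Hypothetical p-values: an ordinary one, an exact zero, and a tiny
# negative value such as floating-point error might leave behind.
p <- c(0.01, 0, -1e-17)

# -log10(0) is +Inf; -log10 of a negative number is NaN.
neg_log_p <- suppressWarnings(-log10(p))
print(neg_log_p)

# Only the negative input raises the "NaNs produced" warning;
# an exact zero silently becomes Inf.
res <- tryCatch(-log10(p), warning = function(w) conditionMessage(w))
print(res)
```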

Here is the resulting plot:

What are the green bars at the top? Why do they appear, and how can I get rid of them?

【Question comments】:

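Can't tell without the data, which is probably where the cause lies (sorry, I can't access Dropbox). By the way, some friendly advice: you may personally think dplyr is God's gift to R programming, but some other people don't. If you use base R for simple subsetting operations, you widen the pool of potential answerers. In any case, the subsetting is irrelevant to your question and shouldn't be part of a minimal reproducible example.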
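Following that advice, the `filter()` step in the question has a direct base R equivalent. A minimal sketch, assuming a hypothetical toy `var_dat_df` with made-up variable names `cusum`, `de` and `hr`, since the real data file is not available here:

```r
# Hypothetical toy data standing in for var_dat_df; only the columns
# used in the question (i, variable, value) are included.
var_dat_df <- data.frame(
  i        = rep(1:3, times = 4),
  variable = rep(c("cusum", "de", "hr", "cusum_ker"), each = 3),
  value    = c(0.5, 0.01, 1e-6, 0.4, 0.02, 1e-8,
               0.3, 0.03, 1e-10, 0.2, 0.1, 0.05)
)

drop_vars <- c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker")

# Base R equivalent of:
#   var_dat_df %>% filter(!(variable %in% drop_vars))
plot_df <- var_dat_df[!(var_dat_df$variable %in% drop_vars), ]

# subset() gives the same rows with slightly shorter syntax.
plot_df2 <- subset(var_dat_df, !(variable %in% drop_vars))
print(unique(plot_df$variable))
```

Either form can be passed as the `data` argument of `ggplot()` in place of the dplyr pipeline.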

Tags: r plot ggplot2 statistics nan

【Answer 1】:

This happens because some of the values you pass to log10 are zero (or very close to zero). You can try this:

value_for_log0 <- NA # the y value to plot instead of -log10(value) when value is (nearly) 0

ggplot(var_dat_df %>%
         filter(!(variable %in% c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker"))),
       aes(x = i, y = ifelse(round(value, 15)==0, value_for_log0,-log10(value)), group = variable, color = variable)) +
  geom_line() +
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"),
                     labels = c("CUSUM", "DE", "HR"),
                     name = "Statistic") +
  geom_hline(yintercept = -log10(0.05), color = "red", linetype = "dashed") +
  scale_y_continuous(breaks = c(-log10(0.05), 5, 10, 15, 17),
                     labels = expression(alpha, 5, 10, 15, 17)) +
  xlab("Index") + ylab(expression(-log[10](p))) +
  labs(title = "Statistical Significance of Detected Change",
       subtitle = "Without Using Kernel Estimation for Long-Run Variance") +
  theme_bw() +
  theme(plot.title = element_text(size = rel(2)),
        legend.position = "bottom")
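To see what the `y` aesthetic above does on its own, here is a sketch that wraps the same `ifelse()` expression in a function. The name `neglog10_or` is hypothetical and not part of ggplot2; the input vector is made up.

```r
# Hypothetical helper reproducing the answer's y = ifelse(...) expression.
neglog10_or <- function(value, value_for_log0 = NA) {
  # Values that round to 0 at 15 decimal places (exact zeros, tiny
  # negative rounding errors, and genuine p-values below about 5e-16)
  # are replaced by value_for_log0; everything else becomes -log10(value).
  # ifelse() still evaluates -log10(value) for every element, so a
  # negative input raises "NaNs produced" even though that NaN is discarded.
  ifelse(round(value, 15) == 0, value_for_log0, -log10(value))
}

p <- c(0.05, 1e-10, 0, -1e-17)
y <- suppressWarnings(neglog10_or(p))
print(y)

# With a finite replacement the points stay on the plot instead of being dropped:
print(suppressWarnings(neglog10_or(p, value_for_log0 = 16)))
```

Note that the `round(value, 15)` threshold also catches real p-values smaller than about 5e-16, which is the concern raised in the comments below.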

【Comments】:

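This pretends the most significant results never happened. That seems like a problem to me. The OP should either compute the p-values on the log scale or make a plot that can show these as p < ...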
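@Roland, yes, in that case we probably want a finite value for -log10 of the p ~ 0 (most significant) values: define value_for_log0 as slightly above the highest possible -log10(p) value. Since the y axis here shows 0 to 15, we could set value_for_log0 to 16 or so, so that these points are displayed as the most significant ones. That's why the constant value_for_log0 is kept configurable.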
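Both suggestions in this thread can be sketched in a few lines. The sketch assumes, purely for illustration, normal upper-tail p-values computed from hypothetical statistics `z`; the question does not say how the CUSUM, DE and HR p-values are obtained, although many of R's distribution functions (`pnorm`, `pchisq`, `pt`, ...) accept `log.p = TRUE`. The cap of 16 follows the value_for_log0 suggestion and is otherwise arbitrary.

```r
# Option 1 (log-scale p-values): ask the distribution function for log(p)
# directly, so very small p-values never underflow to 0.
z <- c(2, 10, 40)                                   # hypothetical test statistics
p_plain <- pnorm(z, lower.tail = FALSE)             # the last one underflows to 0
log_p   <- pnorm(z, lower.tail = FALSE, log.p = TRUE)  # natural log of p
neglog10_p <- -log_p / log(10)                      # convert ln(p) to -log10(p)
print(p_plain)
print(neglog10_p)

# Option 2 (a variant of the value_for_log0 idea): cap -log10(p) just above
# the plotted range, so the most significant points sit at the top of the
# panel instead of disappearing. pmin() also turns Inf into the cap.
cap <- 16
neglog10_capped <- pmin(neglog10_p, cap)
print(neglog10_capped)
print(pmin(-log10(p_plain), cap))
```

Option 1 keeps the true magnitudes (useful for a table or a "p < ..." annotation), while option 2 only controls how they are drawn.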