【Question title】: ggplot2: Colour specific ID red and other points conditional on separate variable
【Posted】: 2021-11-23 22:26:02
【Question description】:

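I want to send the participants in my cohort their own results, displayed graphically. There are more than a hundred participants, each with multiple timepoints. I'd like each participant's own data points coloured red, while the other sample IDs stay coloured by group (i.e. vaccine status). I know how to colour points by group, but I'm having trouble adding a second condition based on a specific study ID. The code below works seamlessly in Plotly, but Plotly has its own problems, and I'd prefer to use ggplot2 to generate the graphs.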

Unfortunately I can't share the dataset, so here is a short description:

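  • Sample = unique study ID for each participant
  • Timepoint = blood sample collection relative to the vaccination date (first or second dose)
  • Group = condition describing each participant's vaccine and infection status at each timepoint (i.e. group 0 -> no vaccine; 1 -> single dose, etc.)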

Here is my code:

df %>% 
  mutate(Group = as.factor(Group)) %>% 
  mutate(Timepoint = fct_relevel(Timepoint, c("Pre-vaccine",
                                              "< 3.5 weeks after first",
                                              "3-6 weeks after first",
                                              "6-12 weeks after first", 
                                              "> 12 weeks after first",
                                              
                                              "< 3 weeks after second", 
                                              "3-6 weeks after second", 
                                              "6-12 weeks after second",
                                              "> 12 weeks after second"))) %>% 
  droplevels() %>% 
  filter(Assay == "Antibody levels" & Group %in% c(0,1,2,3,4)) %>% 
  ggplot(aes(Timepoint, Concentration)) +
  geom_jitter(position = position_jitter(width =  0.0001), 
              aes(fill = ifelse(str_detect(Sample, "V1"), Sample, Group)), # Here is where I specify the colour fill of data points if they match the study ID, if not they are coloured by 'Group'
              pch = 21, 
              size = 2.5) +
  scale_y_log10(labels = scales::comma,
                limits = c(10,10000000),
                breaks = breaks, 
                minor_breaks = minor_breaks) +
  theme_classic()+
  labs(title = "Antibody levels",
       x = "",
       y = "Concentration (AU/ml)") +
  annotation_logticks(base = 10, sides = "l") +
  scale_fill_manual(values = pal) +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.y = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
        legend.position = "none")

Figure 1. Example of colouring by group only. The sample ID is ignored.

Figure 2. Plot generated in Plotly showing the intended result. Can this be done in ggplot2?

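Edit: it actually seems to work in ggplot, but the red points are hidden behind other data points. Is there a way to bring them to the front with as little extra code as possible?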

【Question discussion】:

  • It may be worth looking at the gghighlight package

Tags: r ggplot2 colors


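【Solution 1】:

I'll do my best to help, but I can't do much without your data, so I generated data that should resemble yours. Note one small detail about Timepoint: it is converted to a factor using the labels argument, which is a little different from your data.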

library(tidyverse)

n <- 400
TimepointLev = c("Pre-vaccine",
                 "< 3.5 weeks after first",
                 "3-6 weeks after first",
                 "6-12 weeks after first", 
                 "> 12 weeks after first",
                 "< 3 weeks after second", 
                 "3-6 weeks after second", 
                 "6-12 weeks after second",
                 "> 12 weeks after second")

df = tibble(
  Sample = 1:n,
  Group = sample(0:4, n, replace = TRUE) %>% paste() %>% factor(),
  Timepoint = sample(1:9, n, replace = TRUE) %>% paste() %>%
    factor(labels = TimepointLev),
  Concentration = sample(1:15000, n, replace = TRUE)
)
df

Output

# A tibble: 400 x 4
   Sample Group Timepoint               Concentration
    <int> <fct> <fct>                           <int>
 1      1 2     6-12 weeks after second         12021
 2      2 1     > 12 weeks after second         13608
 3      3 4     6-12 weeks after second         10417
 4      4 0     3-6 weeks after second           2545
 5      5 2     Pre-vaccine                      2167
 6      6 2     6-12 weeks after first          13725
 7      7 3     3-6 weeks after second           3367
 8      8 0     Pre-vaccine                      3900
 9      9 1     > 12 weeks after second           144
10     10 0     < 3 weeks after second           8219

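Now let's prepare the plot. The data you want to highlight should be drawn a second time, in an extra layer. In my case, that is the rows where Sample is divisible by 13.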

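df %>% ggplot(aes(Timepoint, Concentration, fill = Group)) +
  # All points, filled by Group
  geom_jitter(position = position_jitter(width = 0.1), pch = 21, size = 2.5) +
  # The rows to highlight are drawn again as red diamonds; this layer comes later, so it sits on top.
  # It is jittered independently, so each diamond lands near (not exactly on) its original point.
  geom_point(data = df %>% filter(Sample %% 13 == 0),
             position = position_jitter(width = 0.2), pch = 23, size = 3,
             fill = "red", color = "red")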

Note that I applied a data filter, data = df %>% filter(Sample %% 13 == 0). You can write your own filter to suit your data.
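Here is a minimal sketch of the same two-layer idea, filtering on a specific study ID instead of a divisibility rule. It assumes made-up data with hypothetical IDs "V1" to "V30". `target_id` is an illustrative name, not something from your dataset. Both layers use geom_point() without jitter, so the red point is drawn exactly over the participant's original point instead of beside it:

```r
library(ggplot2)
library(dplyr)

set.seed(1)
# Made-up stand-in data: 30 participants ("V1" to "V30"), 3 timepoints each
tp_levels <- c("Pre-vaccine", "3-6 weeks after first", "3-6 weeks after second")
df <- data.frame(
  Sample        = rep(paste0("V", 1:30), each = 3),
  Timepoint     = factor(rep(tp_levels, times = 30), levels = tp_levels),
  Group         = factor(sample(0:4, 90, replace = TRUE)),
  Concentration = sample(10:15000, 90, replace = TRUE)
)

target_id <- "V1"  # hypothetical ID of the participant who receives this plot

p <- ggplot(df, aes(Timepoint, Concentration, fill = Group)) +
  # Layer 1: every participant, filled by Group
  geom_point(pch = 21, size = 2.5) +
  # Layer 2: only the target participant (exact match), drawn later and therefore on top;
  # the constant fill = "red" overrides the inherited Group mapping
  geom_point(data = filter(df, Sample == target_id),
             pch = 21, size = 3, fill = "red")
p
```

To produce one plot per participant, change `target_id` and rebuild the plot, for example in a loop over `unique(df$Sample)`.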

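Finally, one more thing. I don't understand why you use geom_jitter with position_jitter(width = 0.0001). It makes no sense. geom_jitter exists to spread the data out a little so the points don't overlap, but with width = 0.0001 it is as if you weren't jittering horizontally at all.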
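The check below shows what width = 0.0001 actually does, using made-up data. The names `d`, `p_jit` and `p_spread` are illustrative. Horizontally, the points stay within 0.0001 of the category centre, so there is no visible spread. Note also that `height` was not set. In that case position_jitter() defaults it to 40% of the resolution of the data, so the y values are nudged slightly as well. For measured concentrations, `height = 0` keeps them exact:

```r
library(ggplot2)

# Made-up data: 3 timepoints (discrete x axis), 20 measurements each
d <- data.frame(
  Timepoint     = factor(rep(c("T0", "T1", "T2"), each = 20)),
  Concentration = rep(c(100, 1000, 10000), each = 20)
)

# As in the question: width = 0.0001, height left at its default
p_jit <- ggplot(d, aes(Timepoint, Concentration)) +
  geom_jitter(position = position_jitter(width = 0.0001, seed = 1))
ld <- layer_data(p_jit)
x <- unclass(ld$x)  # numeric x positions; category centres are 1, 2, 3

max(abs(x - round(x)))            # horizontal offset from the centre: at most 0.0001
max(abs(ld$y - d$Concentration))  # vertical offset: not zero, because height defaulted

# A visible horizontal spread that leaves the measured values untouched
p_spread <- ggplot(d, aes(Timepoint, Concentration)) +
  geom_jitter(width = 0.1, height = 0)
ld_spread <- layer_data(p_spread)
all.equal(ld_spread$y, d$Concentration)  # TRUE: no vertical jitter
```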

【Discussion】:

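    【Solution 2】:

    I'll start by saying that using red to mean one thing, when colour is already being used to mean something else, is confusing.

    You could draw a red circle around the point of interest instead, or add an arrow.

    As suggested, gghighlight may give you an option. Or it may not.

    Without wanting to redraw your whole plot to my personal taste... I'd suggest that geom_beeswarm() from ggbeeswarm could make the distribution of the data much clearer.

    OK, now to the underlying problem. It's always tricky when we don't have a sample of your data.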

    require(ggplot2)
    require(tidyverse)
    set.seed(42)
    someData <- tibble(
        Timepoint = as.factor(rep(seq(0,8),10)),
        Concentration = sample(1:100000, 90, replace = FALSE),
        Group = rep(seq(0,4), 18 )
    ) %>%
        mutate( Sample = paste0("V",ceiling(row_number()/9)))
    
    someData %>%
        mutate(Group = as.factor(Group)) %>% 
        mutate(Timepoint = fct_recode(Timepoint,  `Pre-vaccine` = "0",
                                                 "< 3.5 weeks after first" = "1",
                                                 "3-6 weeks after first" = "2",
                                                 "6-12 weeks after first" ="3", 
                                                 "> 12 weeks after first" = "4",
                                                    "< 3 weeks after second" = "5", 
                                                    "3-6 weeks after second" = "6", 
                                                    "6-12 weeks after second" ="7" ,
                                                    "> 12 weeks after second" = "8")) -> someData
    
    # You have defined some constants that aren't explained
    pal <- c("V1" = "red", "0"= "Purple", "1" = "Blue", "2" = "Green", "3" = "Yellow", "4" = "Black", "5"="Pink")
    # I've simply omitted breaks and minor_breaks from your code below
    

    This is just your plot, using the sample data above:

    someData %>%
    ggplot(aes(Timepoint, Concentration)) +
        geom_jitter(position = position_jitter(width =  0.0001), 
                    aes(fill = ifelse(str_detect(Sample, "V1"), Sample, Group)), # Here is where I specify the colour fill of data points if they match the study ID, if not they are coloured by 'Group'
                    pch = 21, 
                    size = 2.5) +
        scale_y_log10(labels = scales::comma,
                      limits = c(10,10000000),
                      #breaks = breaks, 
                      #minor_breaks = minor_breaks
                      ) +
        theme_classic()+
        labs(title = "Antibody levels",
             x = "",
             y = "Concentration (AU/ml)") +
        annotation_logticks(base = 10, sides = "l") +
        scale_fill_manual(values = pal) +
        theme(plot.title = element_text(hjust = 0.5),
              axis.text.y = element_text(face = "bold"),
              axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
              legend.position = "none")
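There is one more detail in this code, and in the question. str_detect() does a substring match, so str_detect(Sample, "V1") is also TRUE for "V10", "V11" and so on. With the example data above, the "V10" rows fall into the V1 branch. Because pal has no "V10" entry, those points do not get their group colour. An exact comparison avoids this. A minimal check, with a made-up vector `ids`:

```r
library(stringr)

ids <- c("V1", "V2", "V10", "V11")

str_detect(ids, "V1")    # substring match: TRUE FALSE TRUE TRUE
str_detect(ids, "^V1$")  # anchored regex:  TRUE FALSE FALSE FALSE
ids == "V1"              # plain equality:  TRUE FALSE FALSE FALSE
```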
    
    

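    You can simply add a second geom_jitter on top of the first (layers that come later in the code are drawn on top of earlier ones) and fill only the rows you want to highlight. That gives you what you asked for: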
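An alternative stays within a single layer. ggplot2 draws the rows of a layer in the order they appear in the data, so sorting the highlighted participant's rows to the end of the data frame also brings them to the front. This is a minimal sketch, assuming made-up data shaped like someData and a hypothetical `target_id`. The `Fill` column combines the group and the highlight into one variable, as the question's ifelse() does:

```r
library(ggplot2)
library(dplyr)

set.seed(42)
# Made-up data shaped like someData: 10 participants ("V1" to "V10") x 9 timepoints
dat <- data.frame(
  Sample        = paste0("V", rep(1:10, each = 9)),
  Timepoint     = factor(rep(0:8, times = 10)),
  Group         = factor(rep(0:4, length.out = 90)),
  Concentration = sample(1:100000, 90)
)

target_id <- "V1"  # hypothetical ID of the participant to highlight

dat_ordered <- dat %>%
  # One fill variable: "highlight" for the target, otherwise the Group label.
  # Exact match, so "V10" is not caught as well.
  mutate(Fill = if_else(Sample == target_id, "highlight", as.character(Group))) %>%
  # FALSE sorts before TRUE, so the target's rows move to the end and are drawn last
  arrange(Sample == target_id)

p <- ggplot(dat_ordered, aes(Timepoint, Concentration, fill = Fill)) +
  geom_point(pch = 21, size = 2.5) +
  scale_y_log10() +
  scale_fill_manual(values = c(highlight = "red", "0" = "purple", "1" = "blue",
                               "2" = "green", "3" = "yellow", "4" = "black"))
p
```

The trade-off is that the data must be re-sorted for each participant. The two-layer approach leaves the data untouched.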

    someData %>%
        ggplot(aes(Timepoint, Concentration)) +
        geom_jitter(position = position_jitter(width =  0.0001), 
                    aes(fill = Group), 
                    pch = 21, 
                    size = 2.5) +
        # Second layer: only the highlighted participant's rows (exact match, so "V10" is not caught too),
        # filled red; it comes later in the code, so it is drawn on top
        geom_jitter(data = function(d) filter(d, Sample == "V1"),
                    position = position_jitter(width =  0.0001), 
                    fill = "red",
                    pch = 21, 
                    size = 2.5) +    
        scale_y_log10(labels = scales::comma,
                      limits = c(10,10000000),
                      #breaks = breaks, 
                      #minor_breaks = minor_breaks
        ) +
        theme_classic()+
        labs(title = "Antibody levels",
             x = "",
             y = "Concentration (AU/ml)") +
        annotation_logticks(base = 10, sides = "l") +
        scale_fill_manual(values = pal) +
        theme(plot.title = element_text(hjust = 0.5),
              axis.text.y = element_text(face = "bold"),
              axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
              legend.position = "none")
    

    In my opinion, it's better to keep the fill as Group and highlight the key data points instead:
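The same ring idea can be written without the all-NA palette. If the ring layer gets only the rows of the participant of interest, nothing needs to be mapped to NA and no rows-removed warning appears. This is a minimal sketch with made-up data and a hypothetical `target_id`. It uses geom_point() instead of geom_jitter(), so each ring is centred exactly on its point:

```r
library(ggplot2)

set.seed(42)
# Made-up data: 10 participants ("V1" to "V10") x 9 timepoints
dat <- data.frame(
  Sample        = paste0("V", rep(1:10, each = 9)),
  Timepoint     = factor(rep(0:8, times = 10)),
  Group         = factor(rep(0:4, length.out = 90)),
  Concentration = sample(1:100000, 90)
)

target_id <- "V1"  # hypothetical ID of the participant to ring

p <- ggplot(dat, aes(Timepoint, Concentration)) +
  # Every point keeps its Group fill
  geom_point(aes(fill = Group), pch = 21, size = 2.5) +
  # Ring layer gets only the target's rows; constants stay outside aes()
  geom_point(data = function(d) d[d$Sample == target_id, ],
             pch = 21, size = 5, stroke = 1, colour = "red", fill = NA) +
  scale_y_log10()
p
```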

    # I've added a new palette that will highlight the sample of interest
    pal2 <- c("V1" = "red", "V2"= NA, "V3" = NA, "V4" = NA, "V5" = NA, 
              "V6"=NA, "V7" = NA, "V8" = NA, "V9" = NA, "V10"= NA)
    
    someData %>%
        ggplot(aes(Timepoint, Concentration)) +
        geom_jitter(position = position_jitter(width =  0.0001), 
                    aes(fill = Group), 
                    pch = 21, 
                    size = 2.5) +
        
        # You will get a warning that rows containing missing values were removed... that's because
        # only "V1" has a colour in pal2, so the rings for all other samples are dropped
        # If you need to, save the plot as an object using -> gg at the end
        # and then suppressWarnings(print(gg))
        
        geom_jitter(position = position_jitter(width =  0.0001), 
                    aes(color = Sample),
                    stroke = 1, fill = NA,  # constants go outside aes(); inside it, fill = NA would go through the fill scale
                    pch = 21,
                    size = 5) +    
        
        scale_y_log10(labels = scales::comma,
                      limits = c(10,10000000),
                      #breaks = breaks, 
                      #minor_breaks = minor_breaks
        ) +
        theme_classic()+
        labs(title = "Antibody levels",
             x = "",
             y = "Concentration (AU/ml)") +
        annotation_logticks(base = 10, sides = "l") +
        scale_fill_manual(values = pal) +
        scale_color_manual(values = pal2) +
        theme(plot.title = element_text(hjust = 0.5),
              axis.text.y = element_text(face = "bold"),
              axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
              legend.position = "none")
    
    

    And, purely for my own indulgence, a beeswarm version:

    require(ggbeeswarm)
    someData %>%
        ggplot(aes(Timepoint, Concentration)) +
        geom_beeswarm( 
            cex=1.75,
                    aes(fill = Group),
                    pch = 21, 
                    size = 2.5) +
        scale_y_log10(labels = scales::comma,
                      limits = c(10,10000000),
                      #breaks = breaks, 
                      #minor_breaks = minor_breaks
        ) +
        
        geom_beeswarm( 
            cex=1.75,
            aes(color = Sample), stroke = 1, fill = NA,
            pch = 21,
            size = 5 
        ) +
        theme_classic()+
        labs(title = "Antibody levels",
             x = "",
             y = "Concentration (AU/ml)") +
        annotation_logticks(base = 10, sides = "l") +
        scale_fill_manual(values = pal) +
        scale_color_manual(values = pal2) +
        theme(plot.title = element_text(hjust = 0.5),
              axis.text.y = element_text(face = "bold"),
              axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
              legend.position = "none")
    
    

    【Discussion】:

    • Sorry, I was busy writing code and didn't realise Marek had already answered. His approach is good. I agree: the jitter doesn't help!