【Question title】: How can I add these data points clearly to a chart without them looking the way they do?
【Posted】: 2021-02-22 15:49:53
【Question description】:

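I have two data frames (dput()s at the end of this question) that I want to plot on the same chart.

I want to be able to show the number of first and second appointments for any given date (the columns), as well as the number of each vaccine administered on each date, broken down by location. I ran a count on the raw data (using dplyr), but I think that, because I am plotting by site per day, my chart shows stacked values rather than single/total values.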

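I strongly suspect that my approach is wrong and that this is what makes the columns and lines look the way they do; it seems wrong on many levels.

I think the columns are broken into segments (because each one is made up of several values) that are stacked on top of one another, and I believe the same is true of the lines.

As for the line, something is clearly wrong, because it seems to jump from one column to the next; there is no smooth transition. I have split the data into single-day values, but this still happens.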

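(I have added bold colours for the sake of this example; the chart is not in its final form.)

I tried using merge to combine the data sets, but I still get the same result; I am sure there is a better way to do this.

Any suggestions would be great.

Code to merge the data frames: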

merged <- merge(df, df2, by = 1)
colnames(merged)[1] <- "apptDTS" # Change first column name

Chart code:

library(ggplot2)

ggplot(merged) +
  geom_col(aes(apptDTS, n.x), fill = "yellow", colour = "black") +
  geom_col(aes(apptDTS, n.y), fill = "blue", colour = "black") +
  geom_line(aes(x = apptDTS, y = n.x),
            colour = "green") +
  geom_line(aes(x = apptDTS, y = n.y),
            colour = "red")

dputs:

df <- structure(list(FirstApptDTS = structure(c(1609718400, 1609718400, 
1609718400, 1609718400, 1609804800, 1609804800, 1609804800, 1609804800, 
1609891200, 1609891200, 1609891200, 1609891200, 1609977600, 1609977600, 
1609977600, 1609977600, 1610064000, 1610064000, 1610064000, 1610064000, 
1610150400, 1610150400, 1610150400, 1610150400, 1610409600, 1610409600, 
1610409600, 1610409600, 1610409600, 1610496000, 1610496000, 1610496000, 
1610496000, 1610496000, 1610582400, 1610582400, 1610582400, 1610582400, 
1610582400, 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 
1610755200, 1610755200, 1610755200, 1610755200, 1610755200, 1610928000, 
1610928000, 1610928000, 1610928000, 1610928000, 1610928000, 1611014400, 
1611014400, 1611014400, 1611014400, 1611014400, 1611014400, 1611100800, 
1611100800, 1611100800, 1611100800, 1611100800, 1611100800, 1611187200, 
1611187200, 1611187200, 1611187200, 1611187200, 1611273600, 1611273600, 
1611273600, 1611273600, 1611273600, 1611360000, 1611360000, 1611360000, 
1611360000, 1611360000, 1611360000, 1611532800, 1611532800, 1611532800, 
1611532800, 1611532800, 1611532800, 1611532800, 1611619200, 1611619200, 
1611619200, 1611619200, 1611619200, 1611705600, 1611705600, 1611705600, 
1611705600, 1611705600, 1611792000, 1611792000, 1611792000, 1611792000, 
1611792000, 1611878400, 1611878400, 1611878400, 1611878400, 1611878400, 
1611964800, 1611964800, 1611964800, 1611964800, 1611964800), class = c("POSIXct", 
"POSIXt"), tzone = ""), firstSiteLocation = c("GHGA", "LBVC1", 
"STHSTVC", "STHSTVC", "GHGA", "LBVC1", "STHSTVC", "STHSTVC", 
"GHGA", "LBVC1", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", "STHSTVC", 
"STHSTVC", "GHGA", "LBVC1", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", 
"STHSTVC", "STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", 
"GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", 
"STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "VC2", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", 
"STHSTVC", "WBVC1"), VaccineTypeCD = c("DEF", "DEF", "ABC", "DEF", 
"DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", 
"ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", 
"DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "DEF", 
"ABC", "DEF", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF"), n = c(134L, 283L, 3L, 10L, 122L, 120L, 
18L, 128L, 148L, 534L, 481L, 22L, 151L, 520L, 529L, 7L, 174L, 
539L, 535L, 3L, 185L, 540L, 494L, 3L, 91L, 321L, 491L, 12L, 495L, 
82L, 329L, 493L, 6L, 534L, 86L, 423L, 517L, 2L, 496L, 111L, 394L, 
505L, 2L, 498L, 401L, 547L, 518L, 2L, 362L, 443L, 481L, 555L, 
1L, 524L, 153L, 446L, 452L, 493L, 1L, 426L, 288L, 472L, 463L, 
558L, 1L, 381L, 317L, 491L, 592L, 610L, 566L, 471L, 496L, 606L, 
615L, 572L, 561L, 472L, 564L, 557L, 1L, 577L, 584L, 534L, 598L, 
570L, 1L, 594L, 1L, 553L, 492L, 581L, 570L, 610L, 573L, 484L, 
580L, 575L, 571L, 554L, 482L, 590L, 596L, 533L, 395L, 489L, 570L, 
606L, 486L, 413L, 495L, 497L, 538L, 441L, 264L)), row.names = c(59L, 
61L, 63L, 64L, 66L, 68L, 70L, 71L, 73L, 74L, 76L, 77L, 79L, 81L, 
83L, 84L, 86L, 88L, 90L, 91L, 93L, 95L, 97L, 98L, 109L, 111L, 
113L, 115L, 116L, 118L, 120L, 122L, 124L, 125L, 127L, 129L, 131L, 
133L, 134L, 136L, 138L, 140L, 142L, 143L, 145L, 147L, 149L, 151L, 
152L, 154L, 156L, 158L, 160L, 161L, 163L, 165L, 167L, 169L, 171L, 
172L, 174L, 176L, 178L, 180L, 182L, 183L, 185L, 187L, 189L, 191L, 
193L, 195L, 197L, 199L, 201L, 203L, 205L, 207L, 209L, 211L, 213L, 
214L, 216L, 218L, 220L, 222L, 224L, 225L, 228L, 229L, 231L, 233L, 
235L, 237L, 239L, 241L, 243L, 245L, 247L, 249L, 251L, 253L, 255L, 
257L, 259L, 261L, 263L, 265L, 267L, 269L, 271L, 273L, 275L, 277L, 
279L), class = "data.frame")

df2 <- structure(list(SecondApptDTS = structure(c(1609545600, 1609804800, 
1609891200, 1609977600, 1610064000, 1610150400, 1610409600, 1610409600, 
1610496000, 1610496000, 1610496000, 1610582400, 1610582400, 1610668800, 
1610668800, 1610668800, 1610755200, 1611014400, 1611187200, 1611705600, 
1611878400, 1611964800, NA), class = c("POSIXct", "POSIXt"), tzone = ""), 
    secondSiteLocation = c("GHGA", "GHGA", "GHGA", "GHGA", "GHGA", 
    "GHGA", "GHGA", "LBVC1", "GHGA", "LBVC1", "STHSTVC", "GHGA", 
    "LBVC1", "GHGA", "LBVC1", "LBVC2", "GHGA", "LBVC1", "GHGA", 
    "GHGA", "STHSTVC", "GHGA", NA), VaccineType2CD = c("DEF", 
    "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
    "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
    "DEF", "DEF", "DEF", NA), n = c(1L, 1L, 254L, 199L, 274L, 
    269L, 325L, 157L, 284L, 197L, 2L, 295L, 123L, 257L, 123L, 
    1L, 1L, 1L, 4L, 2L, 1L, 3L, NA)), row.names = c("24", "28", 
"31", "34", "37", "40", "47", "49", "51", "53", "55", "57", "59", 
"62", "64", "66", "67", "68", "73", "75", "77", "78", "NA"), class = "data.frame")

【Question comments】:

    Tags: r ggplot2


    【Solution 1】:

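    If I understand correctly, the OP wants to show

    • the number of first and second appointments for any given date
    • the number of each vaccine administered on each date
    • broken down by location.

    However, I am not sure I have fully understood the requirements, so my answer may need to be adjusted based on the OP's feedback.

    Here is what I would do with my preferred tools (I am more familiar and faster with data.table than with dplyr). Most importantly, I do not merge() the two input data sets but rbind() them, with an id column that distinguishes first and second appointments.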

    library(data.table)
    library(magrittr)
    library(ggplot2)
    cols <- c("appDTS", "siteLocation", "vaccineType", "n")
    combi <- list(df, df2) %>% 
      lapply(setDT) %>% 
      lapply(setnames, cols) %>% 
      rbindlist(idcol = "appt") %>%
      .[, appt := factor(appt, labels = c("First", "Second"))]
    
    # 1st plot
    ggplot(combi) + 
      aes(appDTS, n, fill = appt) + 
      geom_col() +
      scale_fill_brewer(palette = "Paired")
    

    # 2nd plot
    ggplot(combi) + 
      aes(appDTS, n, fill = vaccineType) + 
      geom_col() +
      scale_fill_brewer(palette = "Accent")
    

    # 3rd plot
    ggplot(combi) + 
      aes(appDTS, n, fill = siteLocation) + 
      geom_col()
    

    Note that I chose a different colour palette for each plot to make clear that a different variable is colour-coded in each one.
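As for why stacking with rbind() is preferable to the question's merge(df, df2, by = 1): merging on the date alone pairs every first-appointment row with every second-appointment row of the same date, so counts get repeated. The sketch below illustrates this with two tiny made-up data frames (the names `first`, `second`, `date`, `site` and all values are assumptions for illustration, not the OP's data) and uses base R only.

```r
# Toy counts per date and site (illustrative values, not the OP's data).
first  <- data.frame(date = c("2021-01-11", "2021-01-11"),
                     site = c("GHGA", "LBVC1"),
                     n    = c(91L, 321L))
second <- data.frame(date = c("2021-01-11", "2021-01-11"),
                     site = c("GHGA", "LBVC1"),
                     n    = c(325L, 157L))

# merge() on the date alone pairs every row of `first` with every row of
# `second` for that date: 2 x 2 = 4 rows, so each count appears twice.
merged <- merge(first, second, by = "date")
print(nrow(merged))     # 4
print(sum(merged$n.x))  # 824, twice the true first-appointment total of 412

# Stacking keeps exactly one row per original observation and records
# where it came from in an id column.
stacked <- rbind(cbind(appt = "First",  first),
                 cbind(appt = "Second", second))
totals <- tapply(stacked$n, stacked$appt, sum)
print(totals)           # First 412, Second 482
```

With duplicated rows like those in `merged`, geom_col() stacks every copy and geom_line() visits every copy, which would be consistent with the segmented bars and jumping lines described in the question.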

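    Edit

    The OP has clarified:

    I want a single plot with dates on the x-axis and counts on the y-axis, with bars as well as two lines that show how many of each vaccine were administered on each day.

    To plot the number of vaccines administered per day, we need to aggregate the data further. With data.table, this is done by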
    combi[!is.na(n), .(n = sum(n)), by = .(appDTS, vaccineType)]
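For readers who do not use data.table, roughly the same aggregation can be sketched in base R with aggregate(). This is an alternative to, not a restatement of, the data.table call above; the data frame `toy` and its values are made up, and only mimic the column names of combi.

```r
# Toy long-format data shaped like `combi` (illustrative values only).
toy <- data.frame(
  appDTS      = as.Date(c("2021-01-04", "2021-01-04", "2021-01-04",
                          "2021-01-05", NA)),
  vaccineType = c("DEF", "DEF", "ABC", "DEF", NA),
  n           = c(134L, 283L, 3L, 122L, NA)
)

# Drop rows without a count, then sum the counts per date and vaccine type.
daily <- aggregate(n ~ appDTS + vaccineType,
                   data = toy[!is.na(toy$n), ],
                   FUN  = sum)

# Sort by date so that a line drawn through the rows runs left to right.
daily <- daily[order(daily$appDTS, daily$vaccineType), ]
print(daily)
```

The result has one row per date and vaccine type, which is what a line layer needs: with several rows per date, the line would pass through each of them in turn.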
    

    Now the plot with the line overlay can be created with

    ggplot(combi) + 
      aes(appDTS, n, fill = appt) + 
      geom_col() +
      scale_fill_brewer(palette = "Paired") + 
      geom_line(
        aes(appDTS, n, colour = vaccineType),
        data = combi[!is.na(n), .(n = sum(n)), by = .(appDTS, vaccineType)],
        inherit.aes = FALSE, size = 1) +
      scale_color_brewer(palette = "Set1")
    

    inherit.aes = FALSE is needed to avoid an error message caused by the appt variable (mapped to the fill aesthetic) being missing from the aggregated data set.
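The effect of inherit.aes can be reproduced on a small scale. The following is a minimal sketch with made-up data (`bars`, `totals` and the column `day` are hypothetical names): the line layer gets its own data without an `appt` column, and the plot is built once with the inherited mapping and once with inherit.aes = FALSE.

```r
library(ggplot2)

# Toy data (illustrative values): the bars are filled by `appt`.
bars <- data.frame(
  day  = as.Date(c("2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05")),
  appt = c("First", "Second", "First", "Second"),
  n    = c(10, 4, 12, 6)
)

# Aggregated data for the line; it has columns `day` and `n`, but no `appt`.
totals <- aggregate(n ~ day, data = bars, FUN = sum)

base <- ggplot(bars) + aes(day, n, fill = appt) + geom_col()

# The line layer inherits fill = appt from the plot-level mapping, but
# `totals` has no such column, so building this plot is expected to fail.
inherited <- base + geom_line(aes(day, n), data = totals)
msg <- tryCatch({ ggplot_build(inherited); "no error" },
                error = function(e) conditionMessage(e))
print(msg)

# With inherit.aes = FALSE the layer uses only its own mapping.
fixed <- base + geom_line(aes(day, n), data = totals, inherit.aes = FALSE)
line_data <- layer_data(fixed, 2)  # computed data of the second (line) layer
print(nrow(line_data))
```

An alternative would be to keep the inheritance and set fill = NULL inside the layer's aes(); switching inheritance off altogether is simply the more explicit of the two.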

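    【Comments】:

    • First of all, these are all brilliant ways of showing this data and I really like them; thank you. To answer your question: what I have in mind is a single plot with dates on the x-axis and counts on the y-axis, with bars as well as two lines that show how many of each vaccine were administered on each day. That may be too crowded to make sense, but I am still curious to see how crowded it gets; it might be clearer than we expect. Could you show me how to produce a single plot that combines all three approaches?
    • @Mus, I have updated my answer to meet your requirements (well, at least as far as I believe I have understood them). Please let me know whether this is heading in the right direction.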