Question title: Group and count observations per year from a dataset containing interval data
Posted: 2017-07-12 13:52:26
Question:

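I have activity data for many different writers. For each writer, the data records the start.date and end.date of a writing period (a writer can have more than one period):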

library("tidyverse")
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer, ~topic,
  12, 18, "a", sample(letters[10:20],1),
  14, 20, "b", sample(letters[10:20],1),
  17, 22, "c", sample(letters[10:20],1),
  15, 30, "a", sample(letters[10:20],1)
)

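I eventually want to make a joyplot of this data, which requires generating the following data structure (one row per year and writer, counting how many of that writer's periods cover the year):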

desired_output <- tribble(
  ~year, ~count, ~writer,
  12, 1, "a",
  13, 1, "a",
  14, 1, "a",
  14, 1, "b",
  15, 2, "a",
  15, 1, "b",
  16, 2, "a",
  16, 1, "b",
  17, 2, "a",
  17, 1, "b",
  17, 1, "c",
  18, 2, "a",
  18, 1, "b",
  18, 1, "c",
  19, 1, "a",
  19, 1, "b",
  19, 1, "c",
  20, 1, "a",
  20, 1, "b",
  20, 1, "c",
  21, 1, "a",
  21, 1, "c",
  22, 1, "a",
  22, 1, "c",
  23, 1, "a",
  24, 1, "a",
  25, 1, "a",
  26, 1, "a",
  27, 1, "a",
  28, 1, "a",
  29, 1, "a",
  30, 1, "a"
)

From this chart we can see how the writers are distributed over the period of interest:

desired_output %>%
  ggplot(aes(x = year, y = count, fill = writer)) + geom_col()

How can I generate desired_output from writing_period_data?

Comments:

    Tags: r dplyr


    Solution 1:

    A tidyverse solution; dt is the final output.

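    library(tidyverse)
    
    dt <- writing_period_data %>%
      # expand each period into the integer vector of years it covers (inclusive)
      mutate(year = map2(start.date, end.date, `:`)) %>%
      # one row per writer-year; naming the list column avoids the tidyr >= 1.0 warning
      unnest(year) %>%
      # count how many periods cover each (year, writer) pair
      count(year, writer) %>%
      select(year, count = n, writer)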
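If you prefer to avoid purrr and tidyr, the same expand-then-count idea can be sketched in base R. This is a minimal sketch, assuming only start.date, end.date and writer matter (topic is dropped because the count ignores it); the name dt_base is hypothetical. It should give the same rows as dt, though the column types may differ slightly.

```r
# Base-R sketch: expand each writing period into its years, then count.
# Assumes only start.date, end.date and writer are needed (topic is omitted).
writing_period_data <- data.frame(
  start.date = c(12, 14, 17, 15),
  end.date   = c(18, 20, 22, 30),
  writer     = c("a", "b", "c", "a")
)

# Years covered by each period, both ends inclusive (like `:` in the answer)
years <- Map(seq, writing_period_data$start.date, writing_period_data$end.date)

# One row per writer-year: repeat each writer once per year in its period
long <- data.frame(
  year   = unlist(years),
  writer = rep(writing_period_data$writer, lengths(years))
)

# Count overlapping periods per (year, writer) pair
dt_base <- aggregate(count ~ year + writer,
                     data = transform(long, count = 1), FUN = sum)

# Match the column order and row order of desired_output
dt_base <- dt_base[order(dt_base$year, dt_base$writer), c("year", "count", "writer")]
rownames(dt_base) <- NULL

print(dt_base)
```

Like the tidyverse version, this materialises one row per year of each period before counting, which is the long format the stacked bar chart and a joyplot both expect.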
    

    Comments:

    • This is really powerful. I'm new to purrr, so thanks for this great example!