【Question title】: Add missing rows per group in R [duplicate]
【Posted】: 2019-02-21 12:09:41
【Question】:

Dataset:

original <- data.frame(
            type = c(1,1,1,1,2,2,2,2),
            day = as.POSIXct(c("01-01-2000 00:00:00",
                               "01-01-2000 00:01:00",
                               "01-01-2000 00:02:00",
                               "01-01-2000 00:04:00",
                               "01-01-2000 12:00:00",
                               "01-01-2000 12:01:00",
                               "01-01-2000 12:02:00",
                               "01-01-2000 12:04:00"), format="%m-%d-%Y %H:%M:%S"),
            value = c(4, 3, 1, 1, 3, 5, 6, 3))

I have a data frame like this:

  type                 day value
1    1 2000-01-01 00:00:00     4
2    1 2000-01-01 00:01:00     3
3    1 2000-01-01 00:02:00     1
4    1 2000-01-01 00:04:00     1
5    2 2000-01-01 12:00:00     3
6    2 2000-01-01 12:01:00     5
7    2 2000-01-01 12:02:00     6
8    2 2000-01-01 12:04:00     3

Within each type, I want to add a row with value = 0 for every missing minute.

So the expected output would be:

  type                 day value
1    1 2000-01-01 00:00:00     4
2    1 2000-01-01 00:01:00     3
3    1 2000-01-01 00:02:00     1
4    1 2000-01-01 00:03:00     0
5    1 2000-01-01 00:04:00     1
6    2 2000-01-01 12:00:00     3
7    2 2000-01-01 12:01:00     5
8    2 2000-01-01 12:02:00     6
9    2 2000-01-01 12:03:00     0
10   2 2000-01-01 12:04:00     3

I can solve this with padr, but I am looking for a data.table solution. Is it possible to do this per group of type?

【Comments】:

  • You may need complete from tidyr: original %>% group_by(type) %>% complete(day = seq(first(day), last(day), by = "1 min"), fill = list(value = 0))
  • Is it not possible with data.table?

Tags: r datetime data.table


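【Solution 1】:

Using data.table, we can expand the original dataset to a full per-type minute sequence and then join the observed values back in:

library(data.table)

# Build every minute between the first and last timestamp of each type,
# with value defaulting to 0 (assumes day is sorted within each type)
new <- setDT(original)[, .(day = seq(first(day), last(day), by = "1 min"), value = 0),
  by = type]
# Update join: overwrite the default 0 wherever an observed row exists
new[original, value := i.value, on = .(type, day)][]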
#    type                 day value
# 1:    1 2000-01-01 00:00:00     4
# 2:    1 2000-01-01 00:01:00     3
# 3:    1 2000-01-01 00:02:00     1
# 4:    1 2000-01-01 00:03:00     0
# 5:    1 2000-01-01 00:04:00     1
# 6:    2 2000-01-01 12:00:00     3
# 7:    2 2000-01-01 12:01:00     5
# 8:    2 2000-01-01 12:02:00     6
# 9:    2 2000-01-01 12:03:00     0
#10:    2 2000-01-01 12:04:00     3
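
A variant of the same expand-then-join idea, as a minimal sketch rather than a definitive implementation: build only the minute grid, join the original rows onto it, and replace the resulting NA values with 0. The names `grid` and `filled` are illustrative, the data is rebuilt here so the block runs on its own, and `tz = "UTC"` is an assumption added to make the timestamps independent of the local time zone. It uses min() and max() instead of first() and last(), so it does not rely on day being sorted.

```r
library(data.table)

# Same data as in the question, rebuilt so this block is self-contained.
# tz = "UTC" is an assumption, not part of the original post.
original <- data.table(
  type  = c(1, 1, 1, 1, 2, 2, 2, 2),
  day   = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-01 00:01:00",
                       "2000-01-01 00:02:00", "2000-01-01 00:04:00",
                       "2000-01-01 12:00:00", "2000-01-01 12:01:00",
                       "2000-01-01 12:02:00", "2000-01-01 12:04:00"),
                     tz = "UTC"),
  value = c(4, 3, 1, 1, 3, 5, 6, 3)
)

# One row per minute between the earliest and latest timestamp of each type
grid <- original[, .(day = seq(min(day), max(day), by = "1 min")), by = type]

# Keep every grid row; value is NA where the original had no row
filled <- original[grid, on = .(type, day)]

# Fill the gaps with 0, modifying filled by reference
filled[is.na(value), value := 0]

print(filled)
```

Compared with the update join above, this form makes the missing rows visible as NA before they are filled, which can be handy when a different fill rule is wanted later.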

Or using tidyverse:

library(tidyverse)
original %>%
   group_by(type) %>% 
   complete(day = seq(first(day), last(day), by = "1 min"), fill = list(value = 0))
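
Whichever approach is used, it can be worth checking that no minute is still missing within a type. The following is a small base R sketch of such a check; `has_no_gaps` is a hypothetical helper name, not a function from data.table or tidyr, and the sample data (including `tz = "UTC"`) is made up for illustration.

```r
# Hypothetical helper: TRUE when, within every type, consecutive
# timestamps are exactly 60 seconds apart.
has_no_gaps <- function(df) {
  steps_ok <- tapply(
    as.numeric(df$day),  # POSIXct as seconds since the epoch
    df$type,
    function(x) all(diff(sort(x)) == 60)
  )
  all(steps_ok)
}

# Small illustrative data set with no gaps
filled <- data.frame(
  type  = c(1, 1, 1, 2, 2),
  day   = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-01 00:01:00",
                       "2000-01-01 00:02:00", "2000-01-01 12:00:00",
                       "2000-01-01 12:01:00"), tz = "UTC"),
  value = c(4, 0, 1, 3, 5)
)

complete_ok <- has_no_gaps(filled)        # every minute present per type
gap_found   <- has_no_gaps(filled[-2, ])  # 00:01 dropped from type 1

print(c(complete_ok, gap_found))
```

The check is done per type on purpose: the jump from type 1 (around 00:00) to type 2 (around 12:00) is not a gap that needs filling.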
