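[Posted]: 2021-03-02 11:09:42
[Question]:
Once again, a simple problem is driving me crazy.
The data.table structure is shown below. I have two data.tables with the same structure and similar contents. The second needs to be "merged" into the first, but rows from the second table whose dates have no match in the first must be kept as well. N_Events is a counter variable giving the number of events grouped by Date. Each table stores its own counter variable.
The challenge: the two tables do not store the same dates.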
Date_1 N_Events_1
1990-01-01 1
1992-02-01 3
1992-02-01 3
1992-02-01 3
1997-04-01 2
1997-04-01 2
Date_2 N_Events_2
1990-01-01 1
1992-02-01 4
1992-02-01 4
1992-02-01 4
1992-02-01 4
1999-04-01 1
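I would like to extract N_Events for each unique date to obtain the tables below, and then join them. I would also like to store the intermediate results in a dt.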
# Intermediate aggregation results stored in dt_summ_1
Date_1 N_Events_1
1990-01-01 1
1992-02-01 3
1997-04-01 2
# Intermediate aggregation results stored in dt_summ_2
Date_2 N_Events_2
1990-01-01 1
1992-02-01 4
1999-04-01 1
# Desired result after joining dt_summ_1 and dt_summ_2
Date N_Events_1 N_Events_2
1990-01-01 1 1
1992-02-01 3 4
1997-04-01 2 NA
1999-04-01 NA 1
# NAs could also be stored as zero as I subsequently convert
# the NAs to zero to allow plotting the time series of N_Events 1 & 2
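The steps described above (one row per date in each table, then a join that keeps unmatched dates from both sides) could be sketched roughly as follows. This is only a sketch under assumptions: the sample data is typed in from the tables above, the counter is taken to be constant within each date (so `unique()` by date is enough), and `dt_joined` and `dt_plot` are hypothetical names that do not appear in the question.

```r
library(data.table)

# Sample data typed in from the tables in the question
dt_1 <- data.table(
  Date_1     = as.Date(c("1990-01-01", "1992-02-01", "1992-02-01",
                         "1992-02-01", "1997-04-01", "1997-04-01")),
  N_Events_1 = c(1L, 3L, 3L, 3L, 2L, 2L)
)
dt_2 <- data.table(
  Date_2     = as.Date(c("1990-01-01", "1992-02-01", "1992-02-01",
                         "1992-02-01", "1992-02-01", "1999-04-01")),
  N_Events_2 = c(1L, 4L, 4L, 4L, 4L, 1L)
)

# Intermediate results: keep the first row for each date.
# Assumption: N_Events is identical on every row of a given date.
dt_summ_1 <- unique(dt_1, by = "Date_1")
dt_summ_2 <- unique(dt_2, by = "Date_2")

# Full outer join: all = TRUE keeps dates found in only one of the tables
dt_joined <- merge(dt_summ_1, dt_summ_2,
                   by.x = "Date_1", by.y = "Date_2", all = TRUE)
setnames(dt_joined, "Date_1", "Date")

# Optional copy with NA replaced by 0 for plotting
dt_plot <- copy(dt_joined)
setnafill(dt_plot, fill = 0L, cols = c("N_Events_1", "N_Events_2"))

print(dt_joined)
print(dt_plot)
```

Because each summary table has at most one row per date, the join can no longer multiply rows. If the counter were not constant within a date, an explicit aggregation in `j` with `by` would be needed instead of `unique()`.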
What I have tried so far:
setkey(dt, Date)
dt_1[, N_Events, by = Date] # not giving me unique dates
dt_1[, .(unique(Date), N_Events)] # warning about item 1 (being date) being recycled with remainder
merge(dt_1, dt_2, by.x = "Date_1", by.y = "Date_2", all = TRUE)
# Errors in 185736 rows; more than 37510 = nrow(x)+nrow(i).
# Check for duplicate key values in i each of which join to the
# same group in x over and over again.
What am I doing wrong here? Any pointers are much appreciated!
[Discussion]:
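One likely reading of the error message is that the raw tables contain the same date many times on both sides, so every duplicate on one side is paired with every duplicate on the other. The sketch below illustrates this on the small sample from the question; the names `n_1`, `n_2`, `dup` and `raw` are hypothetical, and `allow.cartesian = TRUE` is used only to let the oversized join run so its row count can be inspected.

```r
library(data.table)

# Sample data typed in from the tables in the question
dt_1 <- data.table(
  Date_1     = as.Date(c("1990-01-01", "1992-02-01", "1992-02-01",
                         "1992-02-01", "1997-04-01", "1997-04-01")),
  N_Events_1 = c(1L, 3L, 3L, 3L, 2L, 2L)
)
dt_2 <- data.table(
  Date_2     = as.Date(c("1990-01-01", "1992-02-01", "1992-02-01",
                         "1992-02-01", "1992-02-01", "1999-04-01")),
  N_Events_2 = c(1L, 4L, 4L, 4L, 4L, 1L)
)

# How many rows each table holds per date
n_1 <- dt_1[, .(rows_1 = .N), by = .(Date = Date_1)]
n_2 <- dt_2[, .(rows_2 = .N), by = .(Date = Date_2)]

# For dates present in both tables, a join on the raw data yields
# rows_1 * rows_2 rows for that date
dup <- merge(n_1, n_2, by = "Date")
dup[, joined_rows := rows_1 * rows_2]
print(dup)

# The same join on the raw tables, explicitly allowing the row blow-up
raw <- merge(dt_1, dt_2, by.x = "Date_1", by.y = "Date_2",
             all = TRUE, allow.cartesian = TRUE)
print(nrow(raw))
```

On this sample the date 1992-02-01 alone contributes 3 * 4 = 12 rows, which is why reducing each table to one row per date before merging avoids the problem.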