[Posted]: 2018-03-15 13:34:39
[Question]:
Sample data:
library(data.table)
dat <- data.table(yr = c(2013,2013,2013,2013,2013,2013,2013,2013,2013,2013,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012),
                  location = c("Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go","Bh","Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go"),
                  time.period = c("t4","t5","t6","t7","t8","t3","t4","t5","t6","t7","t3","t4","t5","t6","t7","t8","t3","t4","t5","t6","t7"),
                  period = c(20,21,22,23,24,19,20,21,22,23,19,20,21,22,23,24,19,20,21,22,23),
                  value = runif(21))
key <- data.table(time.period = c("t1","t2","t3","t4","t5","t6","t7","t8","t9","t10"),
                  period = c(17,18,19,20,21,22,23,24,25,26))
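key gives the period associated with each time.period.
In the data table dat, for each location and yr, I want to insert an extra row for every time.period/period pair that is missing.
For example, for location Bh and yr 2013: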
dat[location == "Bh" & yr == 2013,]
yr location time.period period value
1: 2013 Bh t4 20 0.7167561
2: 2013 Bh t5 21 0.5659722
3: 2013 Bh t6 22 0.8549229
4: 2013 Bh t7 23 0.1046213
5: 2013 Bh t8 24 0.8144670
I want to get:
yr location time.period period value
1: 2013 Bh t1 17 0
2: 2013 Bh t2 18 0
3: 2013 Bh t3 19 0
4: 2013 Bh t4 20 0.7167561
5: 2013 Bh t5 21 0.5659722
6: 2013 Bh t6 22 0.8549229
7: 2013 Bh t7 23 0.1046213
8: 2013 Bh t8 24 0.8144670
9: 2013 Bh t9 25 0
10: 2013 Bh t10 26 0
I tried this:
dat %>% group_by(location,yr) %>% complete(period = seq(17, max(26), 1L))
# A tibble: 40 x 5
# Groups: location, yr [4]
location yr period time.period value
<chr> <dbl> <dbl> <chr> <dbl>
1 Bh 2012 17 <NA> NA
2 Bh 2012 18 <NA> NA
3 Bh 2012 19 t3 0.46757583
4 Bh 2012 20 t4 0.07041745
5 Bh 2012 21 t5 0.58707367
6 Bh 2012 22 t6 0.83271673
7 Bh 2012 23 t7 0.76918731
8 Bh 2012 24 t8 0.25368225
9 Bh 2012 25 <NA> NA
10 Bh 2012 26 <NA> NA
# ... with 30 more rows
As you can see, time.period is not filled in for the added rows, and value is NA rather than 0. How can I fill in that column?
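For reference, one possible way to get the desired shape with data.table alone might look like the sketch below. It is only a sketch under assumptions: the small dat here is a made-up stand-in for the sample data (key is copied from above), and the intermediate names groups, full and res are hypothetical. The idea is to pair every (yr, location) group with every row of key, so that time.period and period always arrive together, then left-join the observed values and set the missing ones to 0.

```r
library(data.table)

# Made-up stand-in for dat (assumption: same columns as in the question).
dat <- data.table(
  yr          = c(2013, 2013, 2012),
  location    = c("Bh", "Bh", "Go"),
  time.period = c("t4", "t5", "t3"),
  period      = c(20, 21, 19),
  value       = c(0.7, 0.5, 0.4)
)
# Lookup table as defined in the question.
key <- data.table(
  time.period = c("t1","t2","t3","t4","t5","t6","t7","t8","t9","t10"),
  period      = c(17,18,19,20,21,22,23,24,25,26)
)

# Every (yr, location) combination that actually occurs in dat.
groups <- unique(dat[, .(yr, location)])

# Repeat all rows of key once per group: one row per group per time.period.
full <- groups[, as.list(key), by = .(yr, location)]

# Left-join the observed values onto the full grid; unmatched rows get NA.
res <- merge(full, dat,
             by = c("yr", "location", "time.period", "period"),
             all.x = TRUE)

# Replace the missing values with 0, as in the desired output.
res[is.na(value), value := 0]

# Order the rows by period within each group.
setorder(res, yr, location, period)
print(res)
```

Because the grid is built from key itself rather than from a numeric sequence, time.period is never left empty in the inserted rows.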
[Comments]:
-
Thanks. I have tried these solutions. I still have a problem, and I have edited the question.
Tags: r dplyr data.table