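[Posted]: 2021-02-15 19:04:01
[Question]:
I have two data frames. The first has three variables (ID, Latitude, Longitude) and 368 rows. The second has three variables (ID, Date, value) and 3,058,478 rows. Each ID has several observations per day, and the second data set covers 10 years of daily measurements.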
DT1:                         DT2:
ID  Latitude  Longitude      ID  Date        value
1   38.2      -121.1         1   2000-01-01  3.1
1   38.0      -123.1         1   2000-01-01  3.1
1   33.8      -118.1         1   2000-01-01  3.1
1   34.9      -117.1         1   2000-01-01  3.8
1   32.6      -117.1         1   2000-01-01  4.3
1   37.6      -119.1         10  2000-01-01  3.2
10  38.3      -121.1         10  2000-01-01  3.6
10  39.8      -122.1         10  2000-01-01  1.2
10  37.9      -122.1         10  2000-01-01  3.6
10  39.5      -122.1         10  2000-01-01  1.1
10  38.3      -122.1
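I want to take the first 5 observations for ID 1 from DT1 and merge them with the ID 1 rows of DT2, then repeat this for every ID in DT2. The number of observations per ID in DT1 is always equal to or greater than the number per ID in DT2. Whenever an ID has more observations in DT1, I only want the first n, where n is the number of observations for that ID in DT2. DT2 has to be grouped by Date and ID, and the Latitude and Longitude measurements can then be column-bound to each group to give this final result: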
End result:
ID Date value Latitude Longitude
1 2000-01-01 3.1 38.2 -121.1
1 2000-01-01 3.1 38.0 -123.1
1 2000-01-01 3.1 33.8 -118.1
1 2000-01-01 3.8 34.9 -117.1
1 2000-01-01 4.3 32.6 -117.1
10 2000-01-01 3.2 38.3 -121.1
10 2000-01-01 3.6 39.8 -122.1
10 2000-01-01 1.2 37.9 -122.1
10 2000-01-01 3.6 39.5 -122.1
10 2000-01-01 1.1 38.3 -122.1
Data:
DT2<-structure(list(Date = structure(c(10957, 10957, 10957,
10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957,
10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957,
10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957, 10957
), class = "Date"), value = c(3.1, 3.1, 3.1, 3.8, 4.3,
3.2, 3.6, 1.2, 3.6, 1.1, 2.6, 3.8, 1.7, 4.8, 2.5, 1.7, 2.2, 2.8,
2.8, 1.8, 2.8, 3, 2.9, 3.6, 2, 2.4, 2.3, 3.4, 5.3, 5),ID = c("1",
"1", "1", "1", "1", "10", "10", "10", "10", "10", "1001", "1001",
"1001", "1001", "1001", "1002", "1002", "1002", "1002", "1002",
"1003", "1003", "1003", "1003", "1003", "1004", "1004", "1004",
"1004", "1004")), row.names = c(NA,
-30L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
Date = structure(c(10957, 10957, 10957, 10957, 10957,
10957), class = "Date"), ID = c("1", "10", "1001",
"1002", "1003", "1004"), .rows = list(1:5, 6:10, 11:15, 16:20,
21:25, 26:30)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
DT1<-structure(list(ID = c(1, 1, 1, 1, 1, 1, 10, 10, 10, 10,
10, 10, 10, 10, 10, 1001, 1001, 1001, 1001, 1001, 1001, 1002,
1002, 1002, 1002, 1002, 1002, 1003, 1003, 1003, 1003, 1003, 1003,
1003, 1003, 1004, 1004, 1004, 1004, 1004, 1004, 1004, 1004),
Latitude = c(38.201852, 37.97231, 33.821353, 34.895007, 32.631231,
37.64571, 38.725282, 35.385574, 38.558228, 34.421389, 37.138333,
38.0313, 37.7603, 33.747236, NA, 37.535833, 32.952124, 37.482934,
39.338504, 37.226862, 35.1019, 39.202935, 38.006311, 34.17605,
33.127711, 37.950741, 37.7481, 37.9642, 36.69676, 33.67464,
38.654069, 38.66121, 32.79222, 37.8375, 37.07206, 36.314399,
34.10374, 34.448048, 37.9604, 40.776944, 37.7478, 33.9397,
39.166017), Longitude = c(-120.681567, -122.520004, -117.91427,
-117.024484, -117.059075, -118.96652, -120.821916, -119.015009,
-121.492981, -119.701111, -119.266667, -122.1318, -122.1925,
-115.820124, NA, -121.961823, -117.264088, -122.20337, -120.171291,
-121.979675, -115.7767, -122.017728, -121.641918, -118.31712,
-117.075325, -121.268523, -119.5917, -122.3403, -121.637182,
-117.92568, -122.901857, -121.73269, -115.56306, -119.45,
-122.00764, -119.64457, -117.62914, -119.231321, -122.356811,
-124.1775, -119.5917, -115.4108, -120.148833)), row.names = c(NA,
-43L), class = c("tbl_df", "tbl", "data.frame"))
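For reference, the operation I am describing could be sketched in base R roughly as follows. This is only a minimal sketch on toy data copied from the example tables above, not on the full `dput` data. The helper column `idx` and the use of `ave()` plus `merge()` are my own assumptions about how this might be done. `ID` is converted to character because it is numeric in DT1 but character in DT2.

```r
# Toy data modelled on the example tables (illustrative values only).
DT1 <- data.frame(
  ID        = c(1, 1, 1, 1, 1, 1, 10, 10, 10, 10, 10),
  Latitude  = c(38.2, 38.0, 33.8, 34.9, 32.6, 37.6,
                38.3, 39.8, 37.9, 39.5, 38.3),
  Longitude = c(-121.1, -123.1, -118.1, -117.1, -117.1, -119.1,
                -121.1, -122.1, -122.1, -122.1, -122.1)
)
DT2 <- data.frame(
  ID    = c("1", "1", "1", "1", "1", "10", "10", "10", "10", "10"),
  Date  = as.Date("2000-01-01"),
  value = c(3.1, 3.1, 3.1, 3.8, 4.3, 3.2, 3.6, 1.2, 3.6, 1.1)
)

# ID is numeric in DT1 and character in DT2, so align the types first.
DT1$ID <- as.character(DT1$ID)

# idx (hypothetical helper column): row position within each ID in DT1,
# and within each (ID, Date) group in DT2.
DT1$idx <- ave(seq_len(nrow(DT1)), DT1$ID, FUN = seq_along)
DT2$idx <- ave(seq_len(nrow(DT2)), DT2$ID, format(DT2$Date), FUN = seq_along)

# Keep every DT2 row and attach the DT1 row with the same ID and position.
# Surplus DT1 rows (here the 6th row of ID 1) have no partner and drop out.
res <- merge(DT2, DT1, by = c("ID", "idx"), all.x = TRUE)
res <- res[order(res$ID, res$Date, res$idx),
           c("ID", "Date", "value", "Latitude", "Longitude")]
rownames(res) <- NULL
res
```

Because `idx` restarts for every (ID, Date) group in DT2, each date would get the same first n coordinates for a given ID, which is what I am after. I have not checked how this behaves on the full 3,058,478 rows.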
[Comments]:
Tags: r data-binding merge