Binding rows based on common id: answers

【Question title】: Binding rows based on common id
【Posted】: 2021-06-10 06:53:14
【Question description】:

I have a very simple case: I want to combine multiple data frames into one, based on the common id values of one particular data frame.

Example:

id <- c(1, 2, 3)
x <- c(10, 12, 14)

data1 <- data.frame(id, x)
  
id <- c(2, 3)
x <- c(20, 22)

data2 <- data.frame(id, x)

id <- c(1, 3)
x <- c(30, 32)

data3 <- data.frame(id, x)

This gives us:

$data1
  id  x
1  1 10
2  2 12
3  3 14

$data2
  id  x
1  2 20
2  3 22

$data3
  id  x
1  1 30
2  3 32

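Now I want to combine all three data frames based on the ids of data3. The expected output should look like this

I am trying the following, but it does not give the expected output.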

library(dplyr)
library(tidyr)
combined <- bind_rows(data1, data2, data3, .id = "id") %>% arrange(id)

Any idea how to get the expected output?

【Comments】:

Why is there no 2 in the final dataset?
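
The question seems to answer this itself: the result is keyed on the ids of data3, and 2 is not one of them. A minimal base R sketch of that check, assuming the three data frames from the question (`all_ids` and `dropped` are names introduced here for illustration only):

```r
# Data frames as defined in the question.
data1 <- data.frame(id = c(1, 2, 3), x = c(10, 12, 14))
data2 <- data.frame(id = c(2, 3), x = c(20, 22))
data3 <- data.frame(id = c(1, 3), x = c(30, 32))

# Every id that occurs in any of the three frames.
all_ids <- Reduce(union, list(data1$id, data2$id, data3$id))

# Ids that are not in data3 and therefore fall out of the result.
dropped <- setdiff(all_ids, data3$id)
dropped
```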

Tags: r dplyr tidyr

【Solution 1】:

Does this work:

library(dplyr)
library(tidyr)
data1 %>% full_join(data2, by = 'id') %>% full_join(data3, by = 'id') %>% arrange(id) %>% right_join(data3, by = 'id') %>% 
   pivot_longer(cols = -id) %>% select(-name) %>% distinct()
# A tibble: 6 x 2
     id value
  <dbl> <dbl>
1     1    10
2     1    NA
3     1    30
4     3    14
5     3    22
6     3    32

【Comments】:

【Solution 2】:

Bind the 3 data frames together with an identifier column, then use filter to keep only the ids that occur in the third data frame.

library(dplyr)
library(tidyr)

bind_rows(data1, data2, data3, .id = "new_id") %>%
  filter(id %in% id[new_id == 3]) %>%
  complete(new_id, id)

#  new_id    id     x
#  <chr>  <dbl> <dbl>
#1 1          1    10
#2 1          3    14
#3 2          1    NA
#4 2          3    22
#5 3          1    30
#6 3          3    32

【Comments】:

【Solution 3】:

A pure base R solution is also possible

lst <- list(data1, data2, data3)
reshape(
  subset(
    reshape(
      do.call(rbind, Map(cbind, lst, grp = seq_along(lst))),
      idvar = "id",
      timevar = "grp",
      direction = "wide"
    ),
    id %in% lst[[3]]$id
  ),
  idvar = "id",
  varying = -1,
  direction = "long"
)[c("id", "x")]

which gives

【Comments】:
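
The double reshape in this solution can be hard to follow. A shorter base R route to the same shape, offered as a sketch and not as the answerer's method: stack the frames with a source index, build the grid of every (source frame, id in data3) pair, and left-join the stacked data onto that grid. The names `stacked`, `grid` and `res` are introduced here for illustration only.

```r
# Data frames as defined in the question.
data1 <- data.frame(id = c(1, 2, 3), x = c(10, 12, 14))
data2 <- data.frame(id = c(2, 3), x = c(20, 22))
data3 <- data.frame(id = c(1, 3), x = c(30, 32))

lst <- list(data1, data2, data3)

# Stack the frames, tagging each row with the index of its source frame.
stacked <- do.call(rbind, Map(cbind, lst, grp = seq_along(lst)))

# Every (grp, id) pair wanted in the result: each id of data3, once per frame.
grid <- expand.grid(grp = seq_along(lst), id = data3$id)

# Left join: all grid rows are kept, pairs with no match get x = NA.
res <- merge(grid, stacked, by = c("id", "grp"), all.x = TRUE)

# Order by id, then by source frame, and keep only the id and x columns.
res <- res[order(res$id, res$grp), c("id", "x")]
rownames(res) <- NULL
res
```

The `all.x = TRUE` argument is what produces the NA row for id 1 in data2; without it, `merge` would silently drop that pair.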

【Solution 4】:

Using base R

do.call(rbind, unname(lapply(mget(ls(pattern = "^data\\d+$")), \(x) {
        x1 <- subset(x, id %in% data3$id)
        v1 <- setdiff(data3$id, x1$id)
        if(length(v1) > 0) rbind(x1, cbind(id = v1, x = NA)) else x1
    })))

-output

【Comments】:
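
Note that `mget(ls(pattern = "^data\\d+$"))` picks up every object in the calling environment whose name matches the pattern, and `\(x)` needs R 4.1 or later. A variant that takes an explicit list and a named helper might look like the sketch below; `frames`, `ref_ids` and `pad_to_ref` are hypothetical names, not part of the original answer.

```r
# Data frames as defined in the question.
data1 <- data.frame(id = c(1, 2, 3), x = c(10, 12, 14))
data2 <- data.frame(id = c(2, 3), x = c(20, 22))
data3 <- data.frame(id = c(1, 3), x = c(30, 32))

frames <- list(data1, data2, data3)
ref_ids <- frames[[3]]$id  # ids of the reference frame (data3)

# Keep only rows whose id is in ref_ids, then append an NA row
# for each reference id that this frame does not contain.
pad_to_ref <- function(df) {
  kept <- df[df$id %in% ref_ids, , drop = FALSE]
  missing_ids <- setdiff(ref_ids, kept$id)
  if (length(missing_ids) > 0) {
    kept <- rbind(kept, data.frame(id = missing_ids, x = NA))
  }
  kept
}

out <- do.call(rbind, lapply(frames, pad_to_ref))
rownames(out) <- NULL
out
```

Rows come out grouped by source frame, with any padded NA rows at the end of each group, so an `order(out$id)` step would be needed to match the id-sorted layout of the dplyr answers.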

【Solution 5】:

library(dplyr)
library(tidyr)

bind_rows(data1, data2, data3, .id = 'grp') %>%
  complete(id, grp) %>%
  select(-grp) %>%
  filter(id %in% data3$id)

# A tibble: 6 x 2
     id     x
  <dbl> <dbl>
1     1    10
2     1    NA
3     1    30
4     3    14
5     3    22
6     3    32

【Comments】: