Join of column values for specific row values: answers

[Question title]: Join of column values for specific row values
[Posted]: 2021-11-17 20:10:28
[Question]:

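I want to join (left_join) one tibble (df2) to another (df1) only for the rows where col2 in df1 is NA. The code I currently use is not very elegant. Any suggestions on how to shorten it would be much appreciated!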

library(tidyverse)

# df1 contains NAs that need to be replaced by values from df2, for relevant col1 values
df1 <- tibble(col1 = c("a", "b", "c", "d"), col2 = c(1, 2, NA, NA), col3 = c(10, 20, 30, 40))
df2 <- tibble(col1 = c("a", "b", "c", "d"), col2 = c(5, 6, 7, 8), col3 = c(50, 60, 70, 80))

# my current approach
df3 <- df1 %>%
  filter(!is.na(col2))

df4 <- df1 %>%
  filter(is.na(col2)) %>%
  select(col1) %>%
  left_join(df2, by = "col1")  # explicit key avoids the "Joining, by" message

# output tibble that is expected
df_final <- df3 %>%
  bind_rows(df4)

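[Comments]:

Is there a reason to use a join rather than bind_rows() and a filter?
@akash87 I think it is because the values of col2 differ between df1 and df2, so if you bind the rows and filter you end up with df2's values, which seems undesirable.

Tags: r join tidyverse tibble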

[Solution 1]:

We can use a data.table approach

library(data.table)
setDT(df1)[setDT(df2), col2 := fcoalesce(col2, i.col2), on = .(col1)]

-output

> df1
   col1 col2 col3
1:    a    1   10
2:    b    2   20
3:    c    7   30
4:    d    8   40
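
Note that setDT() converts df1 and df2 to data.tables by reference, so the original df1 is overwritten. A minimal sketch of the same update join on copies, assuming you want to keep the inputs untouched (the names dt1 and dt2 are only illustrative):

```r
library(data.table)

# Same example data as in the question, here as plain data.frames
df1 <- data.frame(col1 = c("a", "b", "c", "d"),
                  col2 = c(1, 2, NA, NA),
                  col3 = c(10, 20, 30, 40))
df2 <- data.frame(col1 = c("a", "b", "c", "d"),
                  col2 = c(5, 6, 7, 8),
                  col3 = c(50, 60, 70, 80))

# as.data.table() returns copies, so df1 and df2 are left as they were
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Update join on col1: inside j, col2 is dt1's column and i.col2 is dt2's.
# fcoalesce() keeps col2 where it is not NA and falls back to i.col2 otherwise.
dt1[dt2, col2 := fcoalesce(col2, i.col2), on = "col1"]

print(dt1)
```

Only col2 is patched here; col3 keeps the values from df1.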

Or an option with tidyverse

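library(dplyr)
library(stringr)

df1 %>%
  left_join(df2, by = "col1") %>%
  # for each df1 column (suffix .x), fall back to its df2 counterpart (suffix .y);
  # fixed() matches ".x" literally instead of treating "." as a regex wildcard
  transmute(col1, across(ends_with(".x"),
    ~ coalesce(., get(str_replace(cur_column(), fixed(".x"), ".y"))),
    .names = "{str_remove(.col, fixed('.x'))}"))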

-output

# A tibble: 4 x 3
  col1   col2  col3
  <chr> <dbl> <dbl>
1 a         1    10
2 b         2    20
3 c         7    30
4 d         8    40
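
If only col2 needs patching, the across() machinery may be more than is required. A shorter sketch under that assumption: join just the replacement column under a temporary name (col2_new is a made-up name) and coalesce it, or let rows_patch() fill the NAs (available in dplyr 1.0.0 and later; it requires every key in the second table to exist in the first, which holds for this example).

```r
library(dplyr)

# Same example data as in the question
df1 <- tibble(col1 = c("a", "b", "c", "d"),
              col2 = c(1, 2, NA, NA),
              col3 = c(10, 20, 30, 40))
df2 <- tibble(col1 = c("a", "b", "c", "d"),
              col2 = c(5, 6, 7, 8),
              col3 = c(50, 60, 70, 80))

# Variant 1: bring in df2's col2 under a temporary name, then coalesce
patched_join <- df1 %>%
  left_join(select(df2, col1, col2_new = col2), by = "col1") %>%
  mutate(col2 = coalesce(col2, col2_new)) %>%
  select(-col2_new)

# Variant 2: rows_patch() overwrites only the NA values of df1
patched_rows <- rows_patch(df1, select(df2, col1, col2), by = "col1")

print(patched_join)
print(patched_rows)
```

Both variants leave col3 as it is in df1, like the outputs shown above.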

[Discussion]:

[Solution 2]:

Here is a small dplyr answer that works for me, though it may get slow if you have a lot of rows:

df1 %>%
  filter(is.na(col2)) %>%
  select(col1) %>%
  left_join(df2, by = "col1") %>%
  bind_rows(df1, .) %>%
  filter(!is.na(col2))
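
One thing worth checking is which rows end up where: for the rows where col2 was NA, this pipeline takes the whole row from df2, so col3 also comes from df2 (70 and 80 rather than 30 and 40). That matches the asker's own two-step code but differs from an approach that patches col2 only. A small sketch that compares the two (the names expected and result are illustrative):

```r
library(dplyr)

# Same example data as in the question
df1 <- tibble(col1 = c("a", "b", "c", "d"),
              col2 = c(1, 2, NA, NA),
              col3 = c(10, 20, 30, 40))
df2 <- tibble(col1 = c("a", "b", "c", "d"),
              col2 = c(5, 6, 7, 8),
              col3 = c(50, 60, 70, 80))

# The asker's original approach: complete rows plus replacement rows from df2
expected <- bind_rows(
  filter(df1, !is.na(col2)),
  df1 %>% filter(is.na(col2)) %>% select(col1) %>% left_join(df2, by = "col1")
)

# The single pipeline from this answer
result <- df1 %>%
  filter(is.na(col2)) %>%
  select(col1) %>%
  left_join(df2, by = "col1") %>%
  bind_rows(df1, .) %>%
  filter(!is.na(col2))

# TRUE if both approaches give the same rows in the same order
print(isTRUE(all.equal(expected, result)))
print(result)
```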

[Discussion]: