R: merge only parts of one column into an existing column of another data frame

【Question title】: R Merge only parts of one column into an existent column from another dataframe
【Posted】: 2018-06-27 12:17:19
【Question】:

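I know merging is a widely discussed topic. If you think this is a duplicate, I am happy to be pointed to the existing answer, but I have not found it yet (sorry!). Thanks.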

I have two data frames:

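library(dplyr)
set.seed(1)
# 40 ids with 3 rows each; age is NA for the first 20 ids (60 rows)
large_df <- tibble(id = rep(paste0('id', 1:40), each = 3),
                   age = c(rep(NA, 60), rep(sample(20), each = 3)),
                   col3 = rep(letters[1:20], 6), col4 = rep(1:60, 2))
# One row per id, holding the ages that are missing from large_df
small_df <- tibble(id = paste0('id', 1:20),
                   age = sample(20))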

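large_df contains incomplete data (large_df$age); the missing values are available in small_df. I want to bring the information from small_df$age into large_df$age, matched on the correct 'id'. I assume this can be done with merge or one of dplyr's join functions, but none of the combinations I tried gave the result I want.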

I also tried a for loop over the rows:

for(i in nrow(large_df)) {
  if (large_df[i,'id'] %in% small_df$id == TRUE) {
    large_df[i,'age'] <- small_df$age[which(small_df$id %in% large_df[i,'id'])]
  }
}

But this does not help; it does not even change anything. (Does anyone know why not?)
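A likely reason the loop appears to do nothing: `for(i in nrow(large_df))` iterates over the single value 120 rather than over 1 to 120, and row 120 belongs to id40, which has no match in small_df. Below is a minimal sketch of a corrected loop. The data frames `big` and `small` are hypothetical toy data made up for this illustration, not the data from the question.

```r
# Hypothetical toy data, assumed for illustration only
big <- data.frame(id  = rep(c("id1", "id2", "id3"), each = 2),
                  age = c(NA, NA, NA, NA, 30L, 30L),
                  stringsAsFactors = FALSE)
small <- data.frame(id  = c("id1", "id2"),
                    age = c(25L, 41L),
                    stringsAsFactors = FALSE)

# seq_len(n) yields 1, 2, ..., n; a bare nrow() would yield only n
for (i in seq_len(nrow(big))) {
  hit <- match(big$id[i], small$id)  # row position in small, or NA if absent
  if (!is.na(hit)) {
    big$age[i] <- small$age[hit]
  }
}
big$age
```

Like the original loop, this overwrites the age of every row whose id occurs in the lookup table. A join is usually the more idiomatic choice for larger data.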

My result should look like this:

large_df$age[1:60] <- rep(small_df$age, each = 3)
large_df
# A tibble: 120 x 4
   id      age col3   col4
   <chr> <int> <chr> <int>
 1 id1       6 a         1
 2 id1       6 b         2
 3 id1       6 c         3
 4 id2       8 d         4
 5 id2       8 e         5
 6 id2       8 f         6
 7 id3      11 g         7
 8 id3      11 h         8
 9 id3      11 i         9
10 id4      16 j        10
# ... with 110 more rows

【Comments】:

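Try left_join(select(large_df, -age), small_df, by = "id")
Thanks @kath, good idea, but this introduces NAs in place of the values I already have.
The standard approach is merge(large_df, small_df, by = "id", all.x = T, sort = F). It gives you two age columns, age.x and age.y, which you have to combine if needed.
Thanks! "You have to combine them if needed" is exactly my problem... @awchisholm shows a nice way to combine the two columns with a conditional statement.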
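The merge() suggestion from the comments, including the step of combining age.x and age.y, could look roughly like the sketch below. It uses base R only; `big` and `small` are hypothetical toy data frames assumed for illustration.

```r
# Hypothetical toy data, assumed for illustration only
big <- data.frame(id   = rep(c("id1", "id2", "id3"), each = 2),
                  age  = c(NA, NA, NA, NA, 30L, 30L),
                  col3 = letters[1:6],
                  stringsAsFactors = FALSE)
small <- data.frame(id  = c("id1", "id2"),
                    age = c(25L, 41L),
                    stringsAsFactors = FALSE)

# Left join: keep every row of big; the clashing age columns
# get the default suffixes .x (from big) and .y (from small)
merged <- merge(big, small, by = "id", all.x = TRUE, sort = FALSE)

# Keep age.x where it is present, otherwise fall back to age.y
merged$age <- ifelse(is.na(merged$age.x), merged$age.y, merged$age.x)

# Drop the two helper columns
merged$age.x <- NULL
merged$age.y <- NULL
merged
```

Note that merge() does not promise to preserve the row order of its first argument, so sort the result explicitly if the order matters.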

Tags: r join merge dplyr

【Solution 1】:

This works with your data frames.

result = 
  large_df %>% 
  left_join(small_df, by = 'id') %>% 
  mutate(age = ifelse(is.na(age.x), age.y, age.x)) %>%
  dplyr::select(-age.x, -age.y)
result
# A tibble: 120 x 4
      id  col3  col4   age
   <chr> <chr> <int> <int>
 1   id1     a     1    19
 2   id1     b     2    19
 3   id1     c     3    19
 4   id2     d     4     5

If both age.x and age.y are NA, the resulting age is NA as well.
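As a possible variation on the ifelse() step, dplyr's coalesce() returns the first non-missing value at each position, which expresses the same "use age.x, else age.y" rule. This is a sketch of an alternative, not the answer's own method; `big` and `small` are hypothetical toy tibbles assumed for illustration.

```r
library(dplyr)

# Hypothetical toy data, assumed for illustration only
big <- tibble(id  = rep(c("id1", "id2", "id3", "id4"), each = 2),
              age = c(NA, NA, NA, NA, 30L, 30L, NA, NA))
small <- tibble(id = c("id1", "id2"), age = c(25L, 41L))

result <- big %>%
  left_join(small, by = "id") %>%            # yields age.x and age.y
  mutate(age = coalesce(age.x, age.y)) %>%   # first non-NA of the two
  select(-age.x, -age.y)
result
```

Here id4 has no age in either table, so its age stays NA, matching the behaviour described above.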

【Comments】:

Great, @awchisholm. Thanks, I had not thought of using mutate() with a conditional statement :)