Concatenate string field using dplyr when some values are NA: answer

[Question title]: Concatenate string field using dplyr when some values are NA
[Posted]: 2021-09-24 07:19:54
[Question description]:

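I have a data frame that includes a comment field. Some locations have a single row with no comment (NA in the Comment field). Other locations have more than one row, and each of those rows may or may not contain a comment.

The data looks like this (although the real data has many more fields):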

input <- data.frame(
  stringsAsFactors = FALSE,
          Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
           Comment = c("This is a comment", NA, "This is another comment", "This is a comment", NA, "This is a comment", NA)
)

Location  Comment
1         This is a comment
1         NA
1         This is another comment
2         This is a comment
2         NA
3         This is a comment
4         NA

I can concatenate the comments with group_by and summarise like this:

library(dplyr)

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | "))

But this turns the NA values into the string "NA":

Location  Comment
1         "This is a comment | NA | This is another comment"
2         "This is a comment | NA"
3         "This is a comment"
4         "NA"
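This happens because paste() coerces every NA element to the literal string "NA" before collapsing. A minimal base-R illustration, where x is a hypothetical stand-in for the comments of a single location:

```r
# Hypothetical comments for one location; the second one is missing
x <- c("This is a comment", NA, "This is another comment")

# paste() converts the NA into the two-character string "NA" before joining
pasted <- paste(x, collapse = " | ")
print(pasted)
# [1] "This is a comment | NA | This is another comment"

# Removing the missing values first keeps "NA" out of the result
print(paste(x[!is.na(x)], collapse = " | "))
# [1] "This is a comment | This is another comment"
```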

What I actually want is output that leaves NA out of the combined comment, unless the only comment for a location is NA:

outputDesired <- data.frame(
  stringsAsFactors = FALSE,
          Location = c(1L, 2L, 3L, 4L),
          Comment = c("This is a comment | This is another comment", "This is a comment", "This is a comment", NA)
)

Location  Comment
1         This is a comment | This is another comment
2         This is a comment
3         This is a comment
4         NA

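I can easily convert the "NA" text for location 4 into a real NA value, and I'm thinking of removing " | NA" wherever it appears, but I could use some help putting that into a case_when statement, something like:

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | ")) %>%
  mutate(Comment = case_when(
    Comment == "NA" ~ NA_character_,         # the location's only comment was NA
    grepl(" | NA", Comment, fixed = TRUE) ~  # a pasted NA follows a real comment
      gsub(" | NA", "", Comment, fixed = TRUE),  # (a leading "NA | " is not caught)
    TRUE ~ Comment                           # no NA was pasted in
  ))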

Ideally, though, I would ignore the NA comments in the first place while still keeping every location in the final output.
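A sketch of the "ignore the NAs first" idea: running filter(!is.na(Comment)) before grouping would drop location 4 altogether, so here the NA values are removed inside summarise() instead, and an explicit NA is returned when a location has nothing left. This assumes dplyr is installed; collapse_comments is a hypothetical helper name, not a dplyr function:

```r
library(dplyr)

input <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Comment = c("This is a comment", NA, "This is another comment",
              "This is a comment", NA, "This is a comment", NA)
)

# Hypothetical helper: join the non-missing comments of one group,
# or return NA when the group has no non-missing comment at all
collapse_comments <- function(x, sep = " | ") {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

# NA handling happens per group, so no Location is dropped
output <- input %>%
  group_by(Location) %>%
  summarise(Comment = collapse_comments(Comment))
```

Because every group still produces one row, the result should hold the same values as outputDesired, just as a tibble rather than a data.frame.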

Note that in practice this is part of a much larger dplyr pipeline, so I'd prefer a tidyverse solution, but I'm happy to explore other options.

Any ideas?

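[Question discussion]:

Tags: r dplyr concatenation na

[Solution 1]:

You can use na.omit to drop the NA values, and na_if to turn the empty string that remains (when a location has only NA comments) back into NA.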

library(dplyr)

input %>%
  group_by(Location) %>%
  summarise(Comment = na_if(paste0(na.omit(Comment), collapse = '|'), ''))

#  Location Comment                                  
#     <int> <chr>                                    
#1        1 This is a comment|This is another comment
#2        2 This is a comment                        
#3        3 This is a comment                        
#4        4 NA
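The answer relies on two behaviours: when na.omit() removes every value, paste0(..., collapse = ...) returns an empty string "", and na_if() then converts that "" into NA. The sketch below checks both steps and reruns the answer's pipeline with " | " as the separator, so that the comments match outputDesired from the question. The names all_missing, empty and result are illustrative, and dplyr is assumed to be installed:

```r
library(dplyr)

# A location whose comments are all missing
all_missing <- c(NA_character_, NA_character_)

# na.omit() leaves a zero-length vector, which paste0() collapses to ""
empty <- paste0(na.omit(all_missing), collapse = " | ")
print(empty)
# [1] ""

# na_if() replaces that empty string with NA
print(na_if(empty, ""))
# [1] NA

input <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Comment = c("This is a comment", NA, "This is another comment",
              "This is a comment", NA, "This is a comment", NA)
)

# Same pipeline as the answer, but with " | " to match outputDesired
result <- input %>%
  group_by(Location) %>%
  summarise(Comment = na_if(paste0(na.omit(Comment), collapse = " | "), ""))
```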

[Discussion]:

Thanks, that solved it.