将重复值粘贴在一起的新列添加到数据框[重复]答案

【问题标题】：Add new column to dataframe with duplicated values pasted together [duplicate]将重复值粘贴在一起的新列添加到数据框[重复]
【发布时间】：2019-11-18 20:07:33
【问题描述】：

我有一个df，看起来像这样：

ID  Country
55  Poland
55  Romania
55  France
98  Spain
98  Portugal
98  UK
65  Germany
67  Luxembourg
84  Greece
22  Estonia
22  Lithuania

其中一些ID 被重复，因为它们属于同一组。我想要做的是将paste 与所有Country 与相同的ID 一起，得到这样的输出。

到目前为止，我尝试过 ifelse(df[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE),], paste('Countries', df$Country), NA) 但这不是检索预期的输出。

【问题讨论】：

标签： r dataframe

【解决方案1】：

与purrr 和dplyr：

    df %>%
    nest(-ID) %>% 
    mutate(new_name = map_chr(data, ~ paste0(.x$Country, collapse = " + "))) %>% 
    unnest()

表：

  ID new_name                  Country     
  55 Poland + Romania + France Poland    
  55 Poland + Romania + France Romania   
  55 Poland + Romania + France France    
  98 Spain + Portugal + UK     Spain     
  98 Spain + Portugal + UK     Portugal  
  98 Spain + Portugal + UK     UK        
  65 Germany                   Germany   
  67 Luxembourg                Luxembourg
  84 Greece                    Greece    
  22 Estonia + Lithuania       Estonia   
  22 Estonia + Lithuania       Lithuania

【讨论】：

【解决方案2】：

使用data.table

library(data.table)

setDT(df)[, New_Name := c(paste0(Country, collapse = " + ")[1L],  rep(NA, .N -1)), by = ID]

#df
#ID    Country                  New_Name
#1: 55     Poland Poland + Romania + France
#2: 55    Romania                      <NA>
#3: 55     France                      <NA>
#4: 98      Spain     Spain + Portugal + UK
#5: 98   Portugal                      <NA>
#6: 98         UK                      <NA>
#7: 65    Germany                   Germany
#8: 67 Luxembourg                Luxembourg
#9: 84     Greece                    Greece
#10: 22    Estonia       Estonia + Lithuania
#11: 22  Lithuania                      <NA>

【讨论】：

另一种可能：setDT(df)[rowid(ID)==1L, nn := df[, paste(Country, collapse=" + "), ID]$V1]

【解决方案3】：

仅在第一次使用aggregate 后使用match：

flat <- function(x) paste("Countries:", paste(x,collapse=", "))
tmp <- aggregate(Country ~ ID, data=dat, FUN=flat)
dat$Country <- NA
dat$Country[match(tmp$ID, dat$ID)] <- tmp$Country

#   ID                            Country
#1  55 Countries: Poland, Romania, France
#2  55                               <NA>
#3  55                               <NA>
#4  98     Countries: Spain, Portugal, UK
#5  98                               <NA>
#6  98                               <NA>
#7  65                 Countries: Germany
#8  67              Countries: Luxembourg
#9  84                  Countries: Greece
#10 22      Countries: Estonia, Lithuania
#11 22                               <NA>

【讨论】：

【解决方案4】：

使用dplyr，一种方法是

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(new_name = paste0(Country,collapse = " + "), 
         new_name = replace(new_name, duplicated(new_name), NA))

#     ID Country    new_name                 
#   <int> <fct>      <chr>                    
# 1    55 Poland     Poland + Romania + France
# 2    55 Romania    NA                       
# 3    55 France     NA                       
# 4    98 Spain      Spain + Portugal + UK    
# 5    98 Portugal   NA                       
# 6    98 UK         NA                       
# 7    65 Germany    Germany                  
# 8    67 Luxembourg Luxembourg               
# 9    84 Greece     Greece                   
#10    22 Estonia    Estonia + Lithuania      
#11    22 Lithuania  NA

但是，为了获得您的确切预期输出，我们可能需要

df %>%
   group_by(ID) %>%
   mutate(new_name = if (n() > 1) 
         paste0("Countries ", paste0(Country,collapse = " + ")) else Country,
         new_name = replace(new_name, duplicated(new_name), NA))



#     ID Country    new_name                           
#    <int> <fct>      <chr>                              
# 1    55 Poland     Countries Poland + Romania + France
# 2    55 Romania    NA                                 
# 3    55 France     NA                                 
# 4    98 Spain      Countries Spain + Portugal + UK    
# 5    98 Portugal   NA                                 
# 6    98 UK         NA                                 
# 7    65 Germany    Germany                            
# 8    67 Luxembourg Luxembourg                         
# 9    84 Greece     Greece                             
#10    22 Estonia    Countries Estonia + Lithuania      
#11    22 Lithuania  NA

【讨论】：

要得到原题的准确结果，加...mutate(new_name = paste("Countries",paste0(Country,collapse = " + ")), ...
@RonakShah 谢谢！！但是我怎样才能在国家组的开头只添加一次Countries，而不是每次都列出一个新国家？即Countries Poland + Romania + France.
@Biostatician 哎呀……对不起。没有注意到 Countries 部分被重复了。更新了答案。

【解决方案5】：

使用基础 R，

replace(v1 <- with(df, ave(as.character(Country), ID, FUN = toString)), duplicated(v1), NA)

#[1] "Poland, Romania, France" NA      NA    "Spain, Portugal, UK"     NA        NA    "Germany"      "Luxembourg"              "Greece"                  "Estonia, Lithuania"     
#[11] NA

【讨论】：